While some realtime kernels or executives provide support for
memory protection in the development environment, few provide protected memory support for the runtime
configuration, citing penalties in memory and performance as reasons. But with memory protection becoming
common on many embedded processors, the benefits of memory protection far outweigh the very small penalties in
performance for enabling it.
The key advantage gained by adding memory protection to
embedded applications, especially for mission-critical systems, is improved robustness.
With memory protection, if one of the processes executing in
a multitasking environment attempts to access memory that hasn't been explicitly declared or allocated for the
type of access attempted, the MMU hardware can notify Neutrino, which can then abort the thread (at the
failing/offending instruction).
This "protects" process address spaces from each other,
preventing coding errors in a thread in one process from "damaging" memory used by threads in other processes
or even in the OS. This protection is useful both for development and for the installed runtime system,
because it makes postmortem analysis possible.
During development, common coding errors (e.g. stray pointers
and indexing beyond array bounds) can result in one process/thread accidentally overwriting the data space of
another process. If the overwrite touches memory that isn't referenced again until much later, you can spend
hours of debugging - often using in-circuit emulators and logic analyzers - in an attempt to find the "guilty
party."
With an MMU enabled, the OS can abort the process the instant
the memory-access violation occurs, providing immediate feedback to the programmer instead of mysteriously
crashing the system some time later. The OS can then provide the location of the errant instruction in the
failed process, or position a symbolic debugger directly to this instruction.
Memory Management Units (MMUs)
A typical MMU operates by dividing physical memory into a
number of 4K pages. The hardware within the processor then makes use of a set of page tables stored in
system memory that define the mapping of virtual addresses (i.e. the memory addresses used within the
application program) to the addresses emitted by the CPU to access physical memory.
While the thread executes, it uses memory addresses much as
it would in a non-MMU system, except that the page tables managed by the OS control how these addresses are
"mapped" onto the physical memory attached to the processor.
Figure: On an x86, page tables managed by the OS control how virtual addresses are mapped onto physical memory.
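The mapping can be sketched in a few lines of C. This is a deliberately simplified model, not how any particular MMU or Neutrino itself is implemented: it assumes 4K pages and a hypothetical single-level table indexed by virtual page number, whereas real processors typically use multi-level tables. The names translate, PAGE_SHIFT, and PAGE_SIZE are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* 4K pages: 2^12 bytes per page */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical single-level page table: the index is the virtual page
 * number and the stored value is the physical frame number. */
static uint32_t translate(const uint32_t *page_table, uint32_t n_pages,
                          uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;        /* which virtual page   */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* byte within the page */

    assert(vpn < n_pages);  /* a real MMU would raise a fault instead */
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}
```

With a table of {7, 3, 9, 1}, virtual address 0x1ABC lies in virtual page 1, which maps to physical frame 3, so the physical address is 0x3ABC. The offset within the page is unchanged by translation.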
For a large address space with many processes and threads,
the number of page-table entries needed to describe these mappings can be significant - more than can be
stored within the processor. To maintain performance, the processor caches frequently used portions of the
external page tables within a TLB (translation look-aside buffer).
The servicing of "misses" on the TLB cache is part of the
overhead imposed by enabling the MMU. Neutrino uses various clever page-table arrangements to minimize this
overhead.
Associated with these page tables are bits that define the
attributes of each page of memory. Pages can be marked as read-only, read-write, etc. Typically, the memory of
an executing process would be described with read-only pages for code and read-write for the data and
stack.
When Neutrino performs a context switch (i.e. suspends the
execution of one thread and resumes another), it will manipulate the MMU to use a potentially different set of
page tables for the newly resumed thread. If Neutrino is switching between threads within a single
process, no MMU manipulations are necessary.
When the new thread resumes execution, any addresses
generated as the thread runs are mapped to physical memory through the assigned page tables. If the thread
tries to use an address not mapped to it or to use an address in a way that violates the defined attributes
(e.g. writing to a read-only page), the CPU will receive a "fault" (similar to a divide-by-zero error),
typically implemented as a special type of interrupt.
By examining the instruction pointer pushed on the stack by
the interrupt, the OS can determine the address of the instruction that caused the memory-access fault within
the thread/process and can act accordingly.
Memory protection at run time
While memory protection is useful during development, it can
also provide greater reliability for embedded systems installed in the field. Many embedded systems already
employ a hardware "watchdog timer" to detect if the software or hardware has "lost its mind," but this
approach lacks the finesse of an MMU-assisted watchdog.
Hardware watchdog timers are usually implemented as a
retriggerable monostable timer attached to the processor reset line. If the system software doesn't strobe the
hardware timer regularly, the timer will expire and force a processor reset. Typically, some component of the
system software will check for system integrity and strobe the timer hardware to indicate the system is
"sane."
Although this approach enables recovery from a lockup related
to a software or hardware glitch, it results in a complete system restart and perhaps significant "downtime"
while this restart occurs.
Software watchdog
When an intermittent software error occurs in a
memory-protected system, the OS can catch the event and pass control to a user-written thread instead of the
memory dump facilities. This thread can make an intelligent decision about how best to recover from the
failure, instead of forcing a full reset as the hardware watchdog timer would do. The software watchdog
could:
- Abort the process that failed due to a memory access violation and simply restart that process without
shutting down the rest of the system.
- Abort the failed process and any related processes, initialize the hardware to a "safe" state, and then
restart the related processes in a coordinated manner.
- If the failure is very critical, perform a coordinated shutdown of the entire system and sound an
audible alarm.
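The first strategy in the list above can be sketched as a small supervisor, assuming a POSIX-style system with fork() and waitpid(). This is not Neutrino's own facility: supervise(), worker_fn, and the two sample workers are hypothetical names, and the workers simulate a memory fault with raise(SIGSEGV) so that the example stays self-contained.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*worker_fn)(int attempt);

/* Runs the worker in its own process and restarts it after an abnormal
 * termination. Returns the number of restarts that were needed, or -1 if
 * the worker still failed after max_restarts restarts (or on error). */
static int supervise(worker_fn worker, int max_restarts)
{
    assert(worker != NULL && max_restarts >= 0);

    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(attempt));      /* child: do the real work */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;              /* clean exit: done */

        /* Abnormal termination. A real watchdog would log the failure,
         * put hardware into a safe state, or archive a dump file here
         * before the loop restarts the process. */
    }
    return -1;                           /* escalate: restarts exhausted */
}

/* Hypothetical worker that faults on its first two runs. */
static int flaky_worker(int attempt)
{
    if (attempt < 2)
        raise(SIGSEGV);                  /* simulated memory-access fault */
    return 0;
}

/* Hypothetical worker that faults on every run. */
static int broken_worker(int attempt)
{
    (void)attempt;
    raise(SIGSEGV);
    return 1;
}
```

Calling supervise(flaky_worker, 3) returns 2, because the worker needs two restarts before it completes. Calling supervise(broken_worker, 2) returns -1, which is where a real system might move on to the second or third strategy in the list.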
The important distinction here is that we retain intelligent,
programmed control of the embedded system, even though various processes and threads within the control
software may have failed for various reasons. A hardware watchdog timer is still of use to recover from
hardware "latch-ups," but for software failures we now have much better control.
While performing some variation of these recovery strategies,
the system can also collect information about the nature of the software failure. For example, if the embedded
system contains or has access to some mass storage (flash memory, hard drive, a network link to another
computer with disk storage), the software watchdog can generate a chronologically archived sequence of dump
files. These dump files could then be used for postmortem diagnostics.
Embedded control systems often employ these "partial restart"
approaches to surviving intermittent software failures without the operators experiencing any system
"downtime" or even being aware of these quick-recovery software failures. Since the dump files are available,
the developers of the software can detect and correct software problems without having to deal with the
emergencies that result when critical systems fail at inconvenient times. If we compare this to the hardware
watchdog timer approach and the prolonged interruptions in service that result, it's obvious what our
preference is!
Postmortem dump-file analysis is especially important for
mission-critical embedded systems. Whenever a critical system fails in the field, significant effort should be
made to identify the cause of the failure so that a "fix" can be engineered and applied to other systems
before they experience similar failures.
Dump files give programmers the information they need to fix
the problem - without them, programmers may have little more to go on than a customer's cryptic complaint that
"the system crashed."
Quality control
By dividing embedded software into a team of cooperating,
memory-protected processes (containing threads), we can readily treat these processes as "components" to be
used again in new projects. Because of the explicitly defined (and hardware-enforced) interfaces, these
processes can be integrated into applications with confidence that they won't disrupt the system's overall
reliability. In addition, because the exact binary image (not just the source code) of the process is being
reused, we can better control changes and instabilities that might have resulted from recompilation of source
code, relinking, new versions of development tools, header files, library routines, etc.
Since the binary image of the process is reused (with its
behavior perhaps modified by command-line options), the confidence we have in that binary module from acquired
experience in the field more easily carries over to new applications than if the binary image of the process
were changed.
As much as we strive to produce error-free code for the
systems we deploy, the reality of software-intensive embedded systems is that programming errors will end up
in released products. Rather than pretend these bugs don't exist (until the customer calls to report them), we
should adopt a "mission-critical" mindset. Systems should be designed to be tolerant of, and able to recover
from, software faults. Making use of the memory protection delivered by integrated MMUs in the embedded
systems we build is a good step in that direction.
Memory protection
The performance cost of increased protection is insignificant
for most systems. Perhaps more important is the increased memory cost of the MMU page tables needed to
implement greater levels of protection.
The Neutrino Process Manager (procnto) offers the choice of either full protection or none at all:
| Protection Type | MMU Memory Overhead |
| No protection | none |
| Full protection VM | 4K to 8K per process |
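To put the per-process figure in perspective, the total page-table overhead can be estimated by multiplying it by the number of processes. The C sketch below does only that arithmetic; the names are hypothetical, and the 4K and 8K bounds are taken from the figures above rather than measured.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated page-table overhead, in kilobytes. */
struct overhead_kb {
    size_t low;   /* assuming 4K per process */
    size_t high;  /* assuming 8K per process */
};

static struct overhead_kb page_table_overhead_kb(size_t n_processes)
{
    assert(n_processes <= (size_t)-1 / 8u);  /* guard against overflow */

    struct overhead_kb estimate = { n_processes * 4u, n_processes * 8u };
    return estimate;
}
```

For example, a system running 40 processes would need roughly 160K to 320K of memory for page tables under the full-protection model.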
The full-protection model relocates all code in the image into a
new virtual space, enabling the MMU hardware and setting up the initial page-table mappings. This allows
procnto to start in a correct, MMU-enabled environment. The Process Manager
will then take over this environment, changing the mapping tables as needed by the processes it
starts.
No protection
In this model, every process in the system is relocated
during system build time to an absolute physical location. Without an MMU, there can be no distinction or
protection between kernel and user address spaces - all processes run in a common, shared address
space.
Figure: No protection - all processes run in a shared address space.
This kernel memory configuration is particularly well suited
to cost-reduced embedded processors that may lack a paged MMU. This memory configuration is typical of what
most realtime executives and kernels provide.
Although there's no memory protection or virtual memory
mapping in this runtime model, Neutrino still provides an mmap() function that makes a "best effort" at
servicing each request within the memory model it runs in. As a result, without source or object-file changes,
application programs can map memory and use it in various ways (hardware access, shared interprocess memory,
etc.), regardless of the underlying memory-protection services.
Full protection VM
In this model, each process is given its own private virtual
memory, which starts at 0 and extends to 2 or 3.5 gigabytes (depending on the CPU). This is accomplished by
using the CPU's MMU. The performance cost for a process switch and a message pass will increase due to the
increased complexity of obtaining addressability between two completely private address spaces.
Figure: Full protection VM - each user process is given its own private virtual memory. This example assumes an x86 CPU.
The memory cost per process may increase by 4K to 8K for each
process's page tables. Note that this memory model supports the POSIX fork() call.
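The effect of private address spaces is easy to observe with fork(). In the sketch below (hypothetical names, assuming a POSIX-style system), the child changes a global variable, but the change lands in the child's own copy of the page and the parent's value is untouched.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int config_value = 1;  /* lives in each process's private data space */

struct fork_result {
    int parent_sees;  /* value in the parent after the child has run */
    int child_saw;    /* value the child had after modifying its copy */
};

/* Returns 0 on success and fills in *out, or -1 on error. */
static int fork_isolation(struct fork_result *out)
{
    assert(out != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        config_value = 2;     /* modifies only the child's private copy */
        _exit(config_value);  /* report the child's view via exit status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    out->child_saw = WEXITSTATUS(status);
    out->parent_sees = config_value;
    return 0;
}
```

After a successful call, child_saw is 2 while parent_sees is still 1: the two processes use the same virtual address for config_value, but their page tables map it to different physical memory.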