Kumar R., Singhania A., Castner A., Kohler E Proceedings of Design Automation Conference Pages: June /7/13

Abstract
Many embedded systems contain resource-constrained microcontrollers where applications, operating system components and device drivers reside within a single address space with no form of memory protection. Programming errors in one application can easily corrupt the state of the operating system and other applications on the microcontroller. In this paper we propose a system that provides memory protection in tiny embedded processors. Our system consists of a software run-time working with minimal low-cost architectural extensions to the processor core that prevents corruption of state by buggy applications. We restrict memory accesses and control flow of applications to protection domains within the address space. The software run-time consists of a memory map: a flexible and efficient data structure that records ownership and layout information of the entire address space.
Memory map checks are done for store instructions by hardware accelerators that significantly improve the performance of our system. We preserve control flow integrity by maintaining a safe stack that stores return addresses in a protected memory region. Cross domain function calls are redirected through a software-based jump table. Enhancements to the microcontroller call and return instructions use the jump table to track the current active domain. We have implemented our scheme on a VHDL model of the ATMEGA103 microcontroller. Our evaluations show that embedded applications can enjoy the benefits of memory protection with minimal impact on performance and a modest increase in the area of the microcontroller.

What's the Problem: Memory Corruption on Tiny Embedded Processors
- Microcontroller address space: a single address space shared by applications, drivers and the OS
- Memory is accessible to all software modules via this single address space
- Buggy applications can easily corrupt the state of the OS and other applications
- Memory protection is an enabling technology for building robust embedded software
[Figure: CPU and a single address space holding Program1, Program2, Program3 and the OS]

Related Work
- An MMU can provide protection domains. However, embedded microcontrollers have no MMU
  - MMU hardware requires a lot of RAM
  - It increases area and power consumption
  - Poor performance: high context-switch overhead
- Memory Protection Unit (MPU): static partition of the address space into segments. However, it is not suited for complex embedded software (such as an OS)
  - Supports only two domains (user mode and supervisor mode)
  - Protects the kernel from applications, but not applications from one another
- Software-based Fault Isolation: run-time checks ensure that all memory accesses of a module reside within the segment allocated to it
  - The run-time checks are introduced through the compiler or binary rewriting
  - However, binary rewriting is quite error-prone
[Figure: MPU separating supervisor and user modes; P1, P2, P3]

The Proposed Memory Protection Method
- Memory protection suited for low-end microcontrollers
- Solves memory write protection
  - Covers the "store", "call" and "return" instructions
- Hardware/software co-design approach to memory protection
[Figure: Program1, Program2, Program3 and the OS placed in Domain A, Domain B, Domain C, Domain D ... Domain N; block labels: HW extension, Memory Map Checker, Control Flow Manager, Software Routine, Memory Map Table, Jump Table, Safe Stack]

Protection Domain
- Domains: logical partitions of the address space
- Every software module stores its state in its own protection domain
- Each domain is protected from corruption by other domains
- Modules are restricted from writing to memory outside their domain through run-time checks
- There is one single trusted domain in the system that is allowed to access all memory

Memory Map Data Structure
- Fine-grained layout and ownership information
- The address space is partitioned into blocks
- Memory is allocated to domains as segments (sets of contiguous blocks)
- A domain can be allocated multiple segments
- Encoded information is stored for every block
  - Ownership: domain ID
  - Layout: start of a logical segment
- Efficiently encoded using 4 bits per block
  - xxx0: start block of a segment
  - xxx1: later block of a segment
  - xxx is the 3-bit domain ID
[Figure: address space blocks owned by the User Domain and the Kernel Domain]
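
The 4-bit record can be illustrated with a small Python model. This is only a sketch of the encoding stated on the slide (3-bit domain ID in the upper bits, start/later flag in the lowest bit); the function names are hypothetical and do not come from the original work.

```python
def encode_record(domain_id, is_start):
    """Pack a 3-bit domain ID and a start/later flag into a 4-bit record."""
    if not 0 <= domain_id <= 7:
        raise ValueError("domain ID must fit in 3 bits")
    # xxx0 = start block of a segment, xxx1 = later block of a segment
    return (domain_id << 1) | (0 if is_start else 1)


def decode_record(record):
    """Return (domain_id, is_start) for a 4-bit record."""
    return (record >> 1) & 0x7, (record & 1) == 0


def segment_records(domain_id, n_blocks):
    """Records for one segment: a start block followed by later blocks."""
    return [encode_record(domain_id, i == 0) for i in range(n_blocks)]
```

For instance, a three-block segment owned by domain 2 would be recorded as `0b0100, 0b0101, 0b0101`.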

Memory Map Checker
- Functional unit that validates store operations
  - Programs can write only into their own domain
  - Invoked before every write access
- Triggered on a store instruction
- Operations performed by the checker
  - Look up the memory map for the issued write address
  - Retrieve the permission from the memory map and validate the store
    - Verify that the currently executing domain is the block owner
[Figure: Memory Map Checker placed between the CPU and RAM on DATA_BUS; signals CPU_ADDR, CPU_WR_EN, CPU_STALL, MMC_ADDR, MMC_WR_EN, ST_INSTR]

Memory Map Lookup
- Assuming a block size of 8 bytes, the nine most significant bits of the 12-bit address (bits 11-3) give the block number; bits 2-0 are the byte offset within the block
- Permissions are packed into bytes
  - If the encoded information is stored in four bits, each byte contains the records of two contiguous blocks
- The last bit of the block number selects which of the two records in the byte applies
- The remaining bits (address bits 11-4) index into the memory map table, starting at mem_map_base
[Figure: address (bits 11-0) split into block number (bits 11-3) and byte offset (bits 2-0); memory map offset (bits 11-4) added to mem_map_base to index the memory map table; 1 byte holds 2 memory map records]
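
The lookup above can be sketched in Python as follows. The 12-bit address and 8-byte block size come from the slide; which half of the byte holds the even-numbered block is not stated, so placing it in the low nibble is an assumption, and the function names are hypothetical.

```python
BLOCK_SHIFT = 3     # 8-byte blocks: address bits 2-0 are the byte offset
ADDR_MASK = 0xFFF   # 12-bit data address (bits 11-0)


def map_offset(addr):
    """Return (byte offset into the memory map table, record selector)."""
    block_number = (addr & ADDR_MASK) >> BLOCK_SHIFT  # address bits 11-3
    # Address bits 11-4 index the table; the last bit of the block number
    # picks one of the two records packed in that byte.
    return block_number >> 1, block_number & 1


def read_record(mem_map, addr):
    """Fetch the 4-bit record for addr from a table of packed bytes."""
    offset, selector = map_offset(addr)
    # Assumption: selector 0 is the low nibble, selector 1 the high nibble.
    return (mem_map[offset] >> (4 * selector)) & 0xF
```

In hardware the byte address presented to memory would be `mem_map_base` plus the returned offset; here the table is indexed from zero for simplicity.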

Operations Performed by the Memory Map Checker (MMC)
- In cycle 2
  - The MMC first stalls processor execution and takes control of the address bus to memory
  - It performs address translation to look up the memory map for the issued write address
  - It reads the memory map table to retrieve the permission
- In cycle 3
  - It compares the ownership information retrieved from the memory map with the current executing domain ID
  - If the check is successful, the MMC issues the write operation to data memory
[Figure: timing diagram over cycles 1-3 showing CLK, CPU_ADDR (CPU_WR_ADDR), CPU_WR_EN, MMC_ADDR (MMC_RD_ADDR), MMC_WR_EN and CPU_STALL, in regular mode and in protected mode]
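
A behavioral sketch of the store check, written as a Python model rather than hardware, may help make the sequence concrete. The class and exception names are hypothetical, the trusted domain ID of 0 and the low-nibble-first packing are assumptions, and raising an exception merely stands in for whatever the hardware does on a failed check.

```python
class ProtectionFault(Exception):
    """Raised by this model when a store fails the ownership check."""


class MemoryMapChecker:
    def __init__(self, mem_map, ram, trusted_domain=0):
        self.mem_map = mem_map                # packed 4-bit records
        self.ram = ram                        # data memory
        self.trusted_domain = trusted_domain  # assumed ID of trusted domain

    def owner_of(self, addr):
        block = addr >> 3                     # 8-byte blocks
        record = (self.mem_map[block >> 1] >> (4 * (block & 1))) & 0xF
        return record >> 1                    # upper 3 bits: domain ID

    def store(self, current_domain, addr, value):
        # "Cycle 2": translate the address and read the memory map record.
        owner = self.owner_of(addr)
        # "Cycle 3": compare the owner with the executing domain, then write.
        if current_domain != self.trusted_domain and owner != current_domain:
            raise ProtectionFault(hex(addr))
        self.ram[addr] = value
```

With block 0 recorded as owned by domain 2, a store from domain 2 to address 0x005 goes through, while the same store from domain 3 is rejected and leaves memory unchanged.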

Memory Map Software Library
- The software library manages all the available memory
- It ensures that the memory map accurately reflects current ownership and layout
  - The library provides "malloc", "free" and "change_own" calls that automatically update the memory map data structure
- Only the block owner is permitted to free a block or change its ownership
  - To enforce this condition, the software library reads the current active domain ID
- The memory map is set up in a protected region
  - This prevents corruption of the memory map data structure
- The library initializes the MMC with the proper block size, number of protection domains and the range of the protected address space
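
The owner checks in the library can be sketched with a toy allocator over a list of 4-bit records. This is a minimal model under stated assumptions, not the library's implementation: the first-fit policy, the reserved `FREE` domain ID used to mark unallocated blocks, and the call signatures (the caller's domain is passed explicitly instead of being read from hardware) are all hypothetical.

```python
BLOCK_SIZE = 8
FREE = 7  # assumption: a reserved domain ID marks unallocated blocks


def _rec(owner, is_start):
    return (owner << 1) | (0 if is_start else 1)


class MemMapAllocator:
    def __init__(self, n_blocks):
        self.records = [_rec(FREE, True)] * n_blocks

    def _segment(self, domain, addr):
        """Return the block range of the segment at addr, if domain owns it."""
        first = addr // BLOCK_SIZE
        if addr % BLOCK_SIZE or self.records[first] != _rec(domain, True):
            raise PermissionError("caller does not own a segment starting here")
        end = first + 1
        while end < len(self.records) and self.records[end] == _rec(domain, False):
            end += 1
        return first, end

    def _mark(self, first, end, owner):
        self.records[first] = _rec(owner, True)
        for j in range(first + 1, end):
            self.records[j] = _rec(owner, False)

    def malloc(self, domain, nbytes):
        need = -(-nbytes // BLOCK_SIZE)  # round up to whole blocks
        run = 0
        for i, rec in enumerate(self.records):
            run = run + 1 if rec == _rec(FREE, True) else 0
            if run == need:
                first = i - need + 1
                self._mark(first, i + 1, domain)
                return first * BLOCK_SIZE
        return None  # no run of free blocks is long enough

    def free(self, domain, addr):
        first, end = self._segment(domain, addr)
        for j in range(first, end):
            self.records[j] = _rec(FREE, True)

    def change_own(self, domain, addr, new_owner):
        first, end = self._segment(domain, addr)
        self._mark(first, end, new_owner)
```

Both `free` and `change_own` go through the same ownership test, which mirrors the extra checks these two calls carry compared with `malloc`.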

Control Flow Manager
- Control flow can become corrupt at run time
  - Example: returns on a corrupted stack (return addresses are stored on the stack)
  - The memory map cannot prevent such internal memory corruption: programming errors can cause a module to corrupt its own state
- The control flow manager ensures that control can never flow out of a domain, except
  - via calls to functions exported by other domains
  - via returns to calls from other domains
- The currently executing domain also needs to be tracked
  - Required by the memory map checker to validate write accesses
- Control flow integrity is preserved through the safe stack that stores return addresses

Cross Domain Linking
- Each domain has its own jump table in flash memory, which contains the set of functions exported by that domain
- The jump table cannot be corrupted, because modules are not allowed to write to flash memory
- Each entry in the jump table is a jump instruction to a valid exported function
- Cross domain calls are redirected through the jump table to functions exported by a domain
[Figure: program memory. Domain A executes "call fooJT"; the Domain B jump table entry "fooJT: jmp foo" transfers control to "foo: … ret" in Domain B. Cross domain call: verify the call into the jump table and compute the callee domain ID; otherwise raise a jump exception]

Domain Tracking
- The jump tables of all domains are stored at a fixed location in flash memory
  - This simplifies verifying the target address of a call: a valid target address has to reside in the jump table
  - The ID of the called domain can be easily determined: first compute the offset of the target address from the base address of the jump table, then divide it by the size of a domain's jump table
- The cross domain call state machine
  - pushes the current domain ID onto the stack during a cross domain call
  - restores the previous domain ID and transfers control back to the caller's domain during a cross domain return
[Figure: comparator logic checking call_addr against jmp_tbl_base_address and jmp_tbl_upper_bound (<=, <, AND)]
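
The range check and the offset/divide computation can be sketched as a small Python model. All names and the numeric values in the usage note are hypothetical; the model only covers calls that leave the current domain and does not represent how calls within a domain are handled.

```python
class JumpException(Exception):
    """Raised by this model when a cross domain call misses the jump table."""


class DomainTracker:
    def __init__(self, jmp_tbl_base, table_size, n_domains, current=0):
        self.base = jmp_tbl_base
        self.table_size = table_size              # size of one domain's table
        self.upper = jmp_tbl_base + table_size * n_domains
        self.current = current                    # currently executing domain
        self.saved = []                           # stands in for the stack

    def callee_domain(self, call_addr):
        """Return the domain whose jump table contains call_addr, else None."""
        if not (self.base <= call_addr < self.upper):
            return None
        return (call_addr - self.base) // self.table_size

    def cross_call(self, call_addr):
        callee = self.callee_domain(call_addr)
        if callee is None:
            raise JumpException(hex(call_addr))
        self.saved.append(self.current)           # push current domain ID
        self.current = callee

    def cross_ret(self):
        self.current = self.saved.pop()           # restore caller's domain ID
```

With a base of 0x1000 and 0x80 bytes per table, a call to 0x1180 lands in the table of domain 3, and the matching return restores the caller's domain ID.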

Run-Time Stack Protection
- A single stack is shared by all domains
- Protection model: prevent corruption of the stack belonging to a domain by any module belonging to a different domain
- Bounds are set during a cross domain call
  - The processor copies the current stack pointer into a stack_bound register
- Enforced by the MMC before all writes
  - No writes beyond the stack bound
[Figure: run-time stack with the caller domain stack frame between Stack Base and Stack_Bound and the callee domain stack frame between Stack_Bound and Stack Ptr.]
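
The stack bound rule can be written as a one-line predicate. The sketch below assumes a stack that grows toward lower addresses and treats the address held in stack_bound itself as writable by the callee; neither detail is stated on the slide, and the function names are hypothetical.

```python
def set_stack_bound(stack_pointer):
    """On a cross domain call, the bound is a copy of the stack pointer."""
    return stack_pointer


def stack_write_allowed(addr, stack_bound, stack_base):
    """Reject writes into the caller's frames, above the bound.

    Assumption: the stack grows downward, so the caller's frames occupy
    the addresses above stack_bound, up to and including stack_base.
    """
    in_caller_frames = stack_bound < addr <= stack_base
    return not in_caller_frames
```

Addresses outside the stack region pass this predicate and would still be subject to the memory map check.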

Safe Stack
- Although the stack is protected from corruption by modules in other domains, programming errors can still cause a module to corrupt its own stack
- Therefore, an extra stack is maintained in protected memory
  - Return addresses are stored in a separate stack that resides in a different protection domain
- The safe stack is set up at the end of all global data and grows up toward the run-time stack
[Figure: memory layout with HEAP and GLOBALS, SAFE STACK and RUN-TIME STACK; the safe stack and the run-time stack approach one another]
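
A toy model of the safe stack shows the two stacks growing toward each other. The 2-byte return address, the collision check and all names are assumptions made for illustration; the slides do not say how a collision between the two stacks is handled.

```python
class StackCollision(Exception):
    """Raised by this model when the safe stack would reach the run-time stack."""


class SafeStack:
    RET_ADDR_BYTES = 2  # assumption: 16-bit return addresses

    def __init__(self, base):
        self.base = base  # first address after the global data
        self.sp = base    # next free safe-stack address (grows upward)
        self.mem = {}

    def push_return(self, ret_addr, runtime_stack_low):
        """Store a return address; runtime_stack_low is the lowest address
        currently used by the downward-growing run-time stack."""
        if self.sp + self.RET_ADDR_BYTES > runtime_stack_low:
            raise StackCollision(hex(self.sp))
        self.mem[self.sp] = ret_addr & 0xFF
        self.mem[self.sp + 1] = (ret_addr >> 8) & 0xFF
        self.sp += self.RET_ADDR_BYTES

    def pop_return(self):
        if self.sp == self.base:
            raise IndexError("safe stack is empty")
        self.sp -= self.RET_ADDR_BYTES
        return self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
```

Because return addresses live only in this structure, a module that overwrites its own run-time stack frame cannot redirect a later return.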

Performance Overhead (CPU Cycles) Introduced by the Memory Protection Mechanism
- Compared with a software-based approach through binary rewriting
- Superior performance of run-time checks in hardware
  - The software-based memory map checker has a high overhead, because it requires bit-shift operations to translate the write address into a memory map lookup
- Cross domain call and return each have an overhead of five cycles
  - On a call, the "current domain ID", "stack bound" and "return address" are pushed onto the stack
    - Five bytes of information need to be pushed, and one byte can be written every cycle
  - On a return, the values read from the stack are restored
- Saving and restoring return addresses introduces no added overhead
  - The store of the return address is simply redirected to the safe stack when the processor pushes the return address to the run-time stack
[Table; unit: CPU cycles]

Performance Overhead (CPU Cycles) of the Software Library Introduced by the Protection Mechanism
- Compares the overhead of the memory allocation routines in the presence and absence of the protection mechanism
- Overhead is introduced in the memory map software library, because the memory map needs to be updated during allocation, free and transfer of memory
- The free and change_own calls have higher overheads, due to additional checks that prevent illegal freeing or ownership transfer of memory by non-owners
[Table; unit: CPU cycles]

Code and Data Memory Usage of the Software Library
- The memory map size is 256 bytes for multi-domain protection
  - This represents an overhead of 6.25% (256 bytes / 4 KB)
- Flexible data structure: RAM can be traded off for protection
  - The size of the memory map can be reduced by restricting the portion of the address space that the memory map protects
- The total code memory usage of the software library is 3674 bytes, an overhead of 2.8% (3674 bytes / 128 KB)
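
The 256-byte figure is consistent with the 8-byte block size and 4-bit records used in the lookup example earlier in the deck. The short calculation below reproduces the quoted numbers; the helper name is hypothetical, and the 16-byte case is only an illustration of the RAM trade-off, not a configuration reported in the slides.

```python
def mem_map_bytes(protected_bytes, block_size, bits_per_block=4):
    """Size of the memory map table for a protected address range."""
    n_blocks = protected_bytes // block_size
    return n_blocks * bits_per_block // 8


ram_bytes = 4 * 1024      # 4 KB of data memory
flash_bytes = 128 * 1024  # 128 KB of program memory

map_size = mem_map_bytes(ram_bytes, block_size=8)  # 512 blocks * 4 bits
ram_overhead_pct = 100 * map_size / ram_bytes
code_overhead_pct = 100 * 3674 / flash_bytes

# Illustration only: doubling the block size would halve the table.
coarser_map_size = mem_map_bytes(ram_bytes, block_size=16)
```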

Hardware Overhead of the Memory Protection Mechanism
- Most of the additions to the core area are in the memory map decoder, which supports arbitrary bit shifts in a single cycle
  - This overhead can be eliminated for a fixed block size and number of protection domains
- 32% overall increase in the core area
  - This represents a modest increase in the overall chip area, as the core occupies only a small fraction of the overall area

Conclusions
- HW/SW co-design approach for memory protection
  - Enabling technology for reliable embedded software systems
  - Combines the flexibility of software with the efficiency of hardware
- Building blocks for memory protection
  - Memory map checker
  - Control flow manager
- Practical system with widespread applications
  - Low resource utilization
  - Minimal performance overhead
  - Binary compatible with existing software and tool-chains
    - The software library provides a standard programming interface
    - Does not modify the instruction set architecture of the processor

The Proposed Memory Protection Method
- Memory protection suited for low-end microcontrollers
- Does not statically partition the address space
  - Relies on a memory map data structure that records ownership and layout information of the entire address space
- Does not rewrite binaries to introduce run-time checks
  - Enhances the "store", "call" and "return" instructions to perform run-time checks in hardware
- Hardware/software co-design approach to memory protection
- A low-cost architecture extension and a software library work together to isolate modules from one another
[Figure: hardware extensions: Memory Map Checker (STORE instruction extension), Domain Tracker (CALL instruction extension), Domain Tracker (RETURN instruction extension); software routine: Memory Map, Jump Table, Safe Stack]