Efficient Instruction Set Randomization Using Software Dynamic Translation Michael Crane Wei Hu
Outline Introduction Code injection, ISR Problem Inefficiency of ISR Tools Diablo, Strata Solution Efficient ISR using Strata Conclusion
Introduction Most exploits involve some kind of code injection Stack overflows Heap overflows etc Several defenses proposed StackGuard Address Space Randomization ISR
Instruction Set Randomization A general defense against code injection All application code must be decoded before execution Injected code will not be encoded with the proper key, and will fail This approach works regardless of how or where code is injected However, it does not address exploits that do not inject code, e.g. return-to-libc attacks
Disadvantage of ISR High overhead Every instruction must be decoded at run-time Most implementations use an emulator, e.g. Bochs or Valgrind Emulators can be many times slower than native execution, depending on the application Our goal is to use Strata, a software dynamic translation tool, to overcome this overhead
Our Approach Randomize a binary statically before run-time We are using a binary rewriting tool called Diablo Instructions are XORed with a key Derandomize the program at run-time using software dynamic translation We are using an SDT system called Strata All instructions are XORed with the same key
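The randomize/derandomize round trip can be sketched in a few lines. This is only an illustration, not Diablo or Strata code: the 4-byte key, the byte-wise repeating XOR, and the name `xor_with_key` are assumptions made for the example. Because XOR is its own inverse, the same routine serves both steps.

```python
from itertools import cycle

def xor_with_key(code: bytes, key: bytes) -> bytes:
    """XOR every byte of `code` with the repeating `key`."""
    return bytes(b ^ k for b, k in zip(code, cycle(key)))

# Hypothetical 4-byte key (the slides do not state the key width).
KEY = bytes.fromhex("a5c3f00d")

# x86 bytes for: push ebp; mov ebp, esp; ret
original = bytes.fromhex("5589e5c3")

randomized = xor_with_key(original, KEY)   # done once, statically, before run-time
restored = xor_with_key(randomized, KEY)   # done at run-time, before execution
```

Applying the same key twice returns the original bytes, which is why a single key suffices for both the static and the run-time side.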
Diablo A retargetable link-time binary rewriting framework. Only works on statically linked programs. Its inputs are the object files and libraries from which the program is built, instead of just the program executable.
How Diablo Works Merges code sections of each object file Disassembles the binary text Builds a control flow graph Does some optimizations Reassembles the assembly code Writes the result to a binary file
How We Can Use Diablo Diablo does a lot of work for us Disassembles binaries Builds the control flow graph Writes binaries Modify Diablo to randomize the code
Strata A software dynamic translation infrastructure, written in C and assembly Main design by Kevin Scott, Jack Davidson’s PhD student here at UVa So far it has been targeted to SPARC, MIPS, and x86
Strata Imposes a translation layer between the executable and the processor Code fragments are fetched from the binary, optionally translated, and then placed into a fragment cache Processor control is switched to the fragment for direct execution
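The fetch, translate, cache, execute cycle can be modeled with a toy interpreter. This is a sketch of the general SDT idea only; the instruction format, `build_fragment`, and `run` are invented for the example and do not reflect Strata's actual interfaces.

```python
# Toy program: address -> (opcode, operand). "jmp" and "halt" end a fragment.
PROGRAM = {
    0: ("add", 5),
    1: ("add", 7),
    2: ("jmp", 4),
    3: ("add", 100),   # never reached: skipped by the jump
    4: ("add", 1),
    5: ("halt", None),
}

def build_fragment(program, pc):
    """Fetch instructions starting at pc until control flow changes."""
    body = []
    while True:
        op, arg = program[pc]
        if op == "jmp":
            return body, arg      # next fragment starts at the jump target
        if op == "halt":
            return body, None     # no successor fragment
        body.append((op, arg))    # a translator could rewrite the instruction here
        pc += 1

def run(program, entry=0, cache=None):
    """Execute via a fragment cache; return (accumulator, fragments built)."""
    cache = {} if cache is None else cache   # start address -> (body, next pc)
    builds = 0
    acc, pc = 0, entry
    while pc is not None:
        if pc not in cache:                  # miss: fetch and translate once
            cache[pc] = build_fragment(program, pc)
            builds += 1
        body, pc = cache[pc]
        for _op, arg in body:                # stands in for direct execution
            acc += arg
    return acc, builds

shared_cache = {}
first = run(PROGRAM, cache=shared_cache)
second = run(PROGRAM, cache=shared_cache)
```

The second run builds no fragments because everything it needs is already cached, which is where SDT recovers most of the cost that an emulator pays on every instruction.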
Software Dynamic Translation
Using Strata (diagram labels: Application, Linker, Strata, New Application)
Apache Under Strata We can successfully run Apache under Strata with no randomization Performance was tested with a tool called Flood that generates HTTP requests Native execution requests/second Under Strata requests/second 1.39x slowdown
How We Can Use Strata Implement a custom fetch routine Any time an instruction is fetched from the binary, XOR it with the key before placing it into the fragment cache
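A minimal sketch of such a fetch routine, under stated assumptions: a hypothetical 32-bit key, fixed 32-bit little-endian instruction words (as on MIPS; x86 instructions are variable-length), and made-up instruction values. The names `randomize` and `fetch` are illustrative and are not Strata's API.

```python
import struct

KEY = 0xA5C3F00D  # hypothetical 32-bit key

def randomize(code: bytes, key: int) -> bytes:
    """Static step: XOR each 32-bit word of the code with the key."""
    count = len(code) // 4
    words = struct.unpack(f"<{count}I", code)
    return struct.pack(f"<{count}I", *(w ^ key for w in words))

def fetch(memory: bytes, pc: int, key: int) -> int:
    """Custom fetch: read one 32-bit word and derandomize it."""
    (word,) = struct.unpack_from("<I", memory, pc)
    return word ^ key

legit = struct.pack("<2I", 0x11111111, 0x22222222)  # stand-ins for real instructions
injected = struct.pack("<I", 0x90909090)            # attacker bytes, never randomized
memory = randomize(legit, KEY) + injected

# Every word passes through fetch() before it reaches the fragment cache.
fragment_cache = [fetch(memory, pc, KEY) for pc in range(0, len(memory), 4)]
```

The two legitimate words come out of `fetch` unchanged, while the injected word is XORed with a key it was never encoded with and so no longer matches what the attacker wrote.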
Implementation (diagram labels: Application, Diablo, Strata, key, Randomized Application)
Issues with Our Approach Strata is designed to be linked in directly with the application Strata must share the process space with the randomized application This introduces several problems Selective randomization Shared libraries Program start up and shut down
Selective Randomization We have to distinguish the code of the application from that of Strata, and only randomize the code of the application. We modified Diablo so that: it remembers the source object file of every code section it checks the source object file before randomization and does nothing if the code is from Strata
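The check described above might look roughly like the following. This is not Diablo's code: `CodeSection`, the object file names, and the one-byte key are hypothetical and chosen only to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class CodeSection:
    source_object: str   # object file this section was read from
    code: bytes

def randomize_selectively(sections, key, strata_objects):
    """XOR application code with `key`; leave sections from Strata untouched."""
    result = []
    for section in sections:
        if section.source_object in strata_objects:
            result.append(section)   # Strata code must stay directly executable
        else:
            scrambled = bytes(b ^ key for b in section.code)
            result.append(CodeSection(section.source_object, scrambled))
    return result

# Hypothetical object files and a hypothetical one-byte key.
STRATA_OBJECTS = {"strata_fetch.o", "strata_cache.o"}
sections = [
    CodeSection("httpd_main.o", bytes.fromhex("5589e5c3")),
    CodeSection("strata_fetch.o", bytes.fromhex("5589e5c3")),
]
output = randomize_selectively(sections, 0x5A, STRATA_OBJECTS)
```

Recording the source object on each section is what makes the decision possible after the code sections have been merged.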
Shared Libraries Functions in glibc are called both by the application and by Strata What’s wrong? Everything outside Strata should be randomized If the shared function is randomized, the program will crash when Strata calls it
Shared Libraries Solution Create a separate copy of the glibc functions that Strata calls Use objcopy to rename the glibc functions called by Strata The code bloat incurred by the separate copy of glibc for Strata is about 400Kb (Strata does not use much of glibc)
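One way to drive the renaming is a symbol map, sketched below. The slides say only that objcopy was used; the `--redefine-syms` option (which reads one "old new" pair per line), the `strata_` prefix, and the list of functions are assumptions for illustration.

```python
def redefine_syms_lines(symbols, prefix="strata_"):
    """Build 'old new' pairs, one per line, as read by objcopy --redefine-syms."""
    return [f"{name} {prefix}{name}" for name in sorted(symbols)]

# Hypothetical list of glibc functions that Strata calls.
strata_glibc_calls = {"memcpy", "malloc", "printf"}

lines = redefine_syms_lines(strata_glibc_calls)
print("\n".join(lines))
```

The resulting file would be applied both to the Strata objects and to the private glibc copy, so that Strata's calls resolve to the unrandomized copies while the application keeps the original names.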
Current Status We can run real server applications (Apache) under Strata without randomization We can successfully randomize and derandomize a simple program There are still some sharing issues with real programs
Things Left To Do Get Apache working with ISR Verify that ISR will protect against a buffer overflow attack Collect performance numbers
Insider Attack Our current implementation assumes a remote attack, where the attacker does not have access to the randomized binary The encryption scheme is simple The key is stored in the binary itself We must address these issues in order to protect against an insider attack
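To see why a simple XOR scheme is weak against an insider, consider a known-plaintext sketch. It assumes a hypothetical 4-byte repeating key and an attacker who can read the randomized binary and guess a common x86 function prologue; none of these details come from the implementation itself.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the repeating `key`."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

SECRET_KEY = bytes.fromhex("a5c3f00d")   # hypothetical key, unknown to the attacker

# x86 bytes for: push ebp; mov ebp, esp; push ebx; ret
function = bytes.fromhex("5589e553c3")
randomized = xor_with_key(function, SECRET_KEY)

# The attacker guesses that the function starts with a standard prologue.
guessed_plaintext = bytes.fromhex("5589e553")
recovered_key = xor_with_key(randomized[:4], guessed_plaintext)

# With the key in hand, a payload can be pre-encoded so it decodes correctly.
payload = bytes.fromhex("90909090")
encoded_payload = xor_with_key(payload, recovered_key)
```

XORing ciphertext with the guessed plaintext yields the key directly, so access to the binary defeats the scheme even if the key were not stored in it.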
How Effective is ISR? Our primary goal was to show an efficient implementation of ISR Strata can also be modified to check the current PC and reject execution of any code on the stack or in another illegal location Does this accomplish the same thing at a much lower cost?
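The PC check could be as simple as a range test performed before a fragment is built. The sketch below models it as a whitelist of legal code regions; the addresses and the name `make_pc_checker` are hypothetical and are not taken from Strata.

```python
def make_pc_checker(legal_regions):
    """Return a predicate that accepts a PC only inside a legal code region."""
    def is_legal(pc: int) -> bool:
        return any(start <= pc < end for start, end in legal_regions)
    return is_legal

# Hypothetical layout: one text segment; the stack lies far above it.
TEXT_SEGMENT = (0x08048000, 0x08060000)
is_legal = make_pc_checker([TEXT_SEGMENT])

in_text = is_legal(0x08048100)    # ordinary application code
on_stack = is_legal(0xBFFFF000)   # typical location of injected stack code
```

A whitelist of this kind rejects all stack addresses, including the legitimate uses of executable stacks, which is the trade-off against ISR.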
ISR vs. Non-Executable Stack There are legitimate uses for an executable stack [Cowan 1998] gcc uses executable stacks for function trampolines for nested functions Linux uses executable user stacks for signal handling Functional programming languages, and some other programs, rely on executable stacks for run-time code generation In these cases, ISR can help us differentiate between injected code and legitimate code on the stack
Conclusions Implementing ISR using emulation is very inefficient Software dynamic translation can be used for a more efficient approach However, implementation can be very tricky due to sharing problems
Questions?