Binary Program Rewriting with Diablo – Bjorn De Sutter – 2006-6-11 – Engineering Sciences Faculty – Electronics and Information Systems Department

Presentation transcript:

Binary Program Rewriting with Diablo – Bjorn De Sutter – Engineering Sciences Faculty – Electronics and Information Systems Department p. 1 Binary Program Rewriting with Diablo Bjorn De Sutter Ghent University PLDI06, June 06, 2006

p. 2 Credits
Bruno De Bus, Dominique Chanet, Ludo Van Put, Matias Madou, Bertrand Anckaert, Koen De Bosschere

p. 3 Overview
- Some background
- Diablo
  - Extensibility
  - Retargetability
  - Reliability

p. 4 Overview
- INTRODUCTION (45 min)
- DATA STRUCTURES (1 hr)
- ANALYSES AND TRANSFORMATIONS (45 min)
- BACKENDS (30 min)

p. 5 Program Development
[Diagram: programming -> source code -> compiler -> assembly code -> assembler -> object files (code, data, meta) -> linker -> executable]

p. 6 Linking
[Diagram: the linker combines three object files (Code 1-3, Data 1-3, Meta 1-3) into one program that holds Code 1, Code 2, Code 3 and Data 1, Data 2, Data 3. Labels shown: start, print, stop.]

p. 7 Program Development
[Diagram: the tool chain of p. 5 (programming, source code, compiler, assembly code, assembler, object files, linker, executable), extended with Squeeze++, which produces a compacted executable.]

p. 8 Additional optimization opportunities?
[Diagram labels: linker, Exec, Meta, Squeeze++, Exec]

p. 9 Results (Squeeze++)
Code size reduction obtained with optimization and code abstraction
[Chart; benchmark categories: media, int, fp, C++]
[De Sutter, De Bus and De Bosschere, ACM TOPLAS, Sept 2005]

p. 10 Problems
[Diagram labels: linker, Exec, Meta, Squeeze++, Exec]
- Only for the Alpha architecture: not retargetable
- Only for compaction/optimization: not extensible
- Small change implies days of debugging: not reliable

p. 11 Other rewriters?
[Diagram labels: Exec, OM, three object files (code, data, meta)]
Same problems:
- not retargetable
- not extensible
- not reliable

p. 12 OM's evolution
[Diagram of related tools: Mahler (Wall89), Epoxie (Wall92), OM mips (Srivastava92), OM alpha (Srivastava94), OM ATOM (Srivastava94), Spike NT (Cohn97), Spike Tru]

p. 13 Alto's Evolution
[Diagram of related tools: Alto, Squeeze, Squeeze++, Plto, Ilto, NNOT]

p. 14 Static binary rewriting is useful… but it is a bit problematic…
Applications
- Optimization, compaction
- Instrumentation
- Obfuscation
- Program understanding, visualisation
- Debugging
- ...
Problems
- Not retargetable
- Not extensible
- Not reliable

p. 15 Overview
- Some background
- Diablo
  - Extensibility
  - Retargetability
  - Reliability

p. 16 Pre-link or post-link?
Pre-link
+ More meta information = more aggressive transformations
- No program overview
Post-link
- Less meta information = more conservative transformations
+ Whole-program overview
Link-time
+ More meta information
+ Whole-program overview
- More implementation work
[Diagram labels: Exec, Diablo, three object files (code, data, meta), Exec]

p. 17 Extensibility – The problem
- compactor
- instrumentator
- obfuscator
- performance optimizer

p. 18 Designed for extensibility
[Diagram labels: Exec, Diablo, three object files (code, data, meta), Exec]
Frontends
- Frontend optimization
- Frontend instrumentation
Components
- ROFs: reads object files, links them, writes a program, ...
- Instructions: creates instructions, schedules instructions, ...
- CFG: creates functions, removes basic blocks, ...
- Analyses: liveness, constant propagation, ...
- Optimizations: dead code elimination, inlining, ...

p. 19 Operation at different levels
[Diagram: object files (code, data, meta) -> linking -> code and data -> disassembly -> CFG construction -> layout; assembling -> Exec. The frontend operates on top of these levels.]

p. 20 Extensibility
- Application optimization and compaction frontend: Diablo (LCTES'04)
- Linux kernel specialization frontend: kDiablo (LCTES'05)
- Instrumentation frontend: FIT (PASTE'04)
- Interactive binary program editor: Lancet (PASTE'05)
- Steganography frontend: Stilo (ICISC'04)
- Interactive program obfuscator: Loco (PEPM'05)

p. 21 Overview
- Some background
- Diablo
  - Extensibility
  - Retargetability
  - Reliability

p. 22 Retargetability - The problem
- Object formats
- Linkers
- Architectures

p. 23 Operation at different levels
[Diagram: object files (code, data, meta) -> linking -> code and data -> disassembly -> CFG construction -> layout; assembling -> Exec. The frontend operates on top of these levels.]

p. 24 Retargeting multiple ROF & Linkers
[Diagram: the pipeline linking -> disassembly -> CFG construction -> layout; assembling -> Exec, with the retargetable parts marked:]
- Object file backend
- Linker map & linker script
- Architecture backend
- Layout script & layout code
- Exe backend

p. 25 Major implemented backends
ARM:    Diablo, FIT, kDiablo, Lancet
MIPS32: Diablo
x86:    Diablo, FIT, Stilo, kDiablo, Lancet, Loco
Alpha:  Diablo, FIT
IA64:   Diablo
Publications cited on the slide: LCTES'04, ESA'04, Europar'04, LCTES'05

p. 26 Overview
- Some background
- Diablo
  - Extensibility
  - Retargetability
  - Reliability

p. 27 Reliability - The Problem
Reliability involves:
- CFG construction
- Relocation
- Data flow analysis
Trade-off: conservative for safety vs. aggressive, precise enough to be useful

p. 28 CFG Construction
Disassembling
Potential problems:
- Differentiate data from code
- Detect self-modifying code
- Detect unrewritable code
Solutions:
- Section information
- Symbols annotate data in code (ARM ABI)
- Self-modifying code in data: no problem at this point
- True self-modifying code: look at system calls and protection
Conservatively modelling control flow
Potential problems:
- Indirect control flow transfers
- Code that is treated as data
- Unrealizable paths (procedures)
Solutions:
- Use relocation information: identifies computable addresses
- Use pattern matching: identifies known address computations
- Use knowledge on compiler-generated code

p. 29 Detecting Data
[Left column of the slide: the range 0x0080-0x00b4 as raw 32-bit words, e.g. 0x0080: 0x34900abc, 0x0088: 0xe25437ff, 0x0094: 0x858599bd, 0x0098: 0x9912f4d7, 0x009c: 0xde8723c4, 0x00b4: 0x6798fedc]
Right column, disassembled with mapping symbols:
    $code
    0x0080: mov r2, 0x0a0
    0x0084: cmp r1, $0
    0x0088: jl 0x0b4
    0x008c: cmp r1, 5
    0x0090: jge 0x0b4
    0x0094: add r1, r2, r1
    0x0098: ldr r1, [r1]
    0x009c: jmp r1
    $data
    0x00a0: 0x…
    0x00a4: 0x…c
    0x00a8: 0x000000d0
    0x00ac: 0x…
    0x00b0: 0x…
    0x00b4: mov r3, r5
Solution: add mapping symbols

p. 30 Detecting Control Flow Targets
[Left column of the slide: the range 0x0080-0x00b4 as raw 32-bit words]
    $code
    0x0080: mov r2, 0x0a0
    0x0084: cmp r1, $0
    0x0088: jl 0x0b4
    0x008c: cmp r1, 5
    0x0090: jge 0x0b4
    0x0094: add r1, r2, r1
    0x0098: ldr r1, [r1]
    0x009c: jmp r1
    $data
    0x00a0: 0x…
    0x00a4: 0x…c
    0x00a8: 0x000000d0
    0x00ac: 0x…
    0x00b0: 0x…
    0x00b4: mov r3, r5
    ...
    0x00d0: add r4, r6, r6
    ...
    0x0120: ldr r4, [r5]
Direct control flow: trivial
Indirect control flow (jmp r1): only to code addresses that are targets of relocations!
Problem: what about unrelocated computations on code addresses?

p. 31 Control Flow: Pattern Matching
[Listing as on p. 30]
Use pattern matching to improve accuracy of control flow graph: disallow computations on code addresses that are not part of a recognized pattern

p. 32 Pattern Matching: example
Switch statement, with the parts of the pattern marked:
    $code
    0x0080: mov r2, 0x0a0
    0x0084: cmp r1, $0          <- lower bounds check
    0x0088: jl 0x0b4
    0x008c: cmp r1, 5           <- upper bounds check
    0x0090: jge 0x0b4
    0x0094: add r1, r2, r1
    0x0098: ldr r1, [r1]
    0x009c: jmp r1              <- switch jump
    $data
    0x00a0: 0x…                 <- jump table (0x00a0-0x00b0)
    0x00a4: 0x…c
    0x00a8: 0x000000d0
    0x00ac: 0x…
    0x00b0: 0x…
    0x00b4: mov r3, r5
    ...
    0x00d0: add r4, r6, r6
    ...
    0x0120: ldr r4, [r5]
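The switch pattern on this slide can be sketched as a small matcher. This is only an illustration under stated assumptions: the `Insn`, `Opcode` and `SwitchInfo` types and the `match_switch` function are hypothetical and are not Diablo's data structures or API. The sketch checks for the slide's eight-instruction sequence and, when it matches, reports the table base, the number of entries, and the default target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction form; these are not Diablo's data structures. */
typedef enum {
    OP_MOV_IMM, OP_CMP_IMM, OP_JL, OP_JGE, OP_ADD, OP_LDR, OP_JMP_REG, OP_OTHER
} Opcode;

typedef struct {
    Opcode   op;
    int      rd;   /* destination register, or the register compared by cmp */
    int      rs1;  /* first source / address register, -1 if unused */
    int      rs2;  /* second source register, -1 if unused */
    uint32_t imm;  /* immediate value or branch target */
} Insn;

typedef struct {
    uint32_t table_base;     /* address of the first jump-table entry */
    uint32_t num_entries;    /* exclusive upper bound of the index */
    uint32_t default_target; /* taken when the index is out of range */
} SwitchInfo;

/* Returns 1 and fills *out if insns[0..7] form the bounds-checked
 * table jump from the slide; returns 0 for anything else. */
int match_switch(const Insn *insns, size_t n, SwitchInfo *out)
{
    assert(insns != NULL && out != NULL);
    if (n < 8) return 0;

    const Insn *mov = &insns[0], *lo = &insns[1], *jl = &insns[2];
    const Insn *hi = &insns[3], *jge = &insns[4], *add = &insns[5];
    const Insn *ldr = &insns[6], *jmp = &insns[7];

    if (mov->op != OP_MOV_IMM) return 0;                      /* table base   */
    if (lo->op != OP_CMP_IMM || lo->imm != 0) return 0;       /* lower bound  */
    if (jl->op != OP_JL) return 0;
    if (hi->op != OP_CMP_IMM || hi->rd != lo->rd) return 0;   /* upper bound  */
    if (jge->op != OP_JGE || jge->imm != jl->imm) return 0;

    int base = mov->rd, index = lo->rd;
    if (add->op != OP_ADD) return 0;
    int uses_both = (add->rs1 == base && add->rs2 == index) ||
                    (add->rs1 == index && add->rs2 == base);
    if (!uses_both) return 0;
    if (ldr->op != OP_LDR || ldr->rs1 != add->rd) return 0;   /* table load   */
    if (jmp->op != OP_JMP_REG || jmp->rs1 != ldr->rd) return 0; /* switch jump */

    out->table_base = mov->imm;
    out->num_entries = hi->imm;
    out->default_target = jl->imm;
    return 1;
}
```

With the table base and entry count known, a rewriter could read the table words and add one CFG edge per entry instead of treating `jmp r1` as a jump to any relocated address.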

p. 33 Pattern Matching: example 2
[Left column of the slide: the range 0x0080-0x00b4 as raw 32-bit words]
    $code
    0x0080: mov r2, 0x09c
    0x0084: cmp r1, $0
    0x0088: jl 0x0b4
    0x008c: cmp r1, $5
    0x0090: jge 0x0b4
    0x0094: add r1, r2, r1
    0x0098: jmp r1
    0x009c: jmp 0x0120
    0x00a0: jmp 0x012c
    0x00a4: jmp 0x00d0
    0x00a8: jmp 0x0248
    0x00ac: jmp 0x0210
    0x00b0: nop
    0x00b4: mov r3, r5
    ...
    0x00d0: add r4, r6, r6
    ...
    0x0120: ldr r4, [r5]
Switch statement 2: the address table is replaced by a series of direct jumps to the switch cases. Unrecognized pattern!
Solution: add pattern to Diablo

p. 34 Procedure Calls and Returns
Function calls and returns are often just "special" indirect jumps: not recognizing them makes the flow graph much too conservative
Solution: use pattern matching to recognize them:
- rely on ABI
- rely on compiler conventions
ARM indirect procedure call:
    mov r14, pc
    mov pc, r2
ARM procedure return:
    mov pc, r14
or
    ldr pc, [r13], #4
or
    ldmia r13!, {r4-r7,r15}
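The call and return idioms on this slide can be recognized directly on instruction words. The sketch below is a minimal illustration, not Diablo's implementation: `FlowKind` and `classify` are hypothetical names, and it assumes classic 32-bit ARM encodings with the "always" condition and only the exact forms shown on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FLOW_OTHER, FLOW_INDIRECT_CALL, FLOW_RETURN } FlowKind;

/* Classic 32-bit ARM encodings, condition field AL (assumption: no other
 * conditions or addressing variants are considered in this sketch). */
#define ARM_MOV_LR_PC 0xE1A0E00Fu  /* mov r14, pc        */
#define ARM_MOV_PC_LR 0xE1A0F00Eu  /* mov pc, r14        */
#define ARM_POP_PC    0xE49DF004u  /* ldr pc, [r13], #4  */

/* mov pc, rN with an unshifted register operand */
static int is_mov_pc_reg(uint32_t w)
{
    return (w & 0xFFFFFFF0u) == 0xE1A0F000u;
}

/* ldmia r13!, {..., r15}: any register list that includes the pc */
static int is_ldm_sp_pc(uint32_t w)
{
    return (w & 0xFFFF8000u) == 0xE8BD8000u;
}

/* Classifies the instruction at index i of a code array of n words. */
FlowKind classify(const uint32_t *code, size_t n, size_t i)
{
    assert(code != NULL && i < n);
    uint32_t w = code[i];

    if (w == ARM_MOV_PC_LR || w == ARM_POP_PC || is_ldm_sp_pc(w))
        return FLOW_RETURN;
    /* A jump through a register is a call only when the previous
     * instruction saved the return address in r14. */
    if (is_mov_pc_reg(w) && i > 0 && code[i - 1] == ARM_MOV_LR_PC)
        return FLOW_INDIRECT_CALL;
    return FLOW_OTHER;
}
```

A `mov pc, r2` that is not preceded by `mov r14, pc` stays `FLOW_OTHER` here, so it would still be modelled as a plain indirect jump.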

p. 35 Reliability - The Problem
Reliability involves:
- CFG construction
- Relocation
- Data flow analysis

p. 36 Data flow analyses
Stack analysis
Problem
- Difficult to analyse
- Necessary to improve precision
- Especially for C++-like languages (calls through function pointers)
Solution
- rely on calling conventions
- use symbol information
- use mapping symbols
- use source code information
- use stack unwind information

p. 37 Calling convention adherence
A.c:
    extern int B(int x);
    int A(int x) { return B(x); }
- B is unknown
- call to B respects calling conventions
B.c:
    int B(int x) { return x * 2; }
    int C(int x) { return B(x)*2; }
- C is known but A is unknown
- B respects calling convention

p. 38 Calling convention adherence
B.c:
    static int B(int x) { return x * 2; }
    int C(int x) { return B(x)*2; }
- C is known
- no unknown callers of B
- B does not need to respect calling convention

p. 39 Calling convention adherence
B.s:
    B: shl %ecx, #1
       ret
A.c:
    extern int B(int x);
    int A(int x) {
      int y;
      asm(" movl %ecx, x
            call B
            movl y, %ecx ");
      return y;
    }
- even though B is global, the programmer has control over all call sites
- B does not need to adhere to calling conventions
Solution: identify assembler code through mapping symbols (for inline assembler) and object file header info
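The three calling-convention slides boil down to a small decision rule, sketched below. The `FuncFacts` struct and `may_assume_calling_convention` function are hypothetical names for illustration; the assumption is that a rewriter has already derived both facts from symbol tables, mapping symbols, and object file headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function facts; not a Diablo data structure. */
typedef struct {
    int is_global;       /* symbol is visible outside its object file    */
    int is_handwritten;  /* code comes from assembler, not the compiler  */
} FuncFacts;

/* Returns 1 when an analysis may assume the function follows the
 * calling convention, 0 when it has to inspect the code itself. */
int may_assume_calling_convention(const FuncFacts *f)
{
    assert(f != NULL);
    /* Hand-written assembler: the programmer controls every call site,
     * so nothing forces the code to follow the convention. */
    if (f->is_handwritten) return 0;
    /* Compiler-generated and global: the compiler had to expect unknown
     * callers. Static functions may have been specialized freely. */
    return f->is_global;
}
```

The rule is deliberately conservative: whenever it returns 0, the analysis falls back to looking at the actual callers and callee rather than trusting the ABI.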

p. 40 Reliability - The Problem
Reliability involves:
- CFG construction
- Relocation
- Data flow analysis

p. 41 Relocation
Producing binary program again
Problem
- How to write a correct program?
- How to lay out data?
- How to update pointers?
- How to update addresses?
Observation
- Most "strange" requirements come from linker manipulations
Solution
- make relocations expressive
- make relocations first-class objects
- let transformations update relocations
- use linker scripts
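Treating relocations as first-class objects can be sketched as follows. The `Block` and `Reloc` types and the `apply_relocs` function are hypothetical and much simpler than anything a real rewriter would use; the point is only that a relocation refers to its target object rather than to a fixed address, so it can be re-evaluated after every layout change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocatable unit: its address is only final after layout. */
typedef struct {
    uint32_t address;
} Block;

/* Hypothetical relocation kept alive through all transformations. */
typedef struct {
    size_t       patch_index;  /* index of the image word to patch       */
    const Block *target;       /* object whose address is stored there   */
    uint32_t     addend;       /* constant offset added to that address  */
} Reloc;

/* Recomputes every relocated word from the current block addresses. */
void apply_relocs(uint32_t *image, size_t image_words,
                  const Reloc *relocs, size_t count)
{
    assert(image != NULL && relocs != NULL);
    for (size_t i = 0; i < count; i++) {
        assert(relocs[i].patch_index < image_words);
        assert(relocs[i].target != NULL);
        image[relocs[i].patch_index] =
            relocs[i].target->address + relocs[i].addend;
    }
}
```

If a transformation removes code and a block moves, calling `apply_relocs` again after the new layout rewrites every stored pointer, for example a jump-table entry, to the block's new address.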

p. 42 Overview
- Some background
- Diablo
  - Extensibility
  - Retargetability
  - Reliability (no, really now)

p. 43 Reliability - The Real Problem
Reliability involves:
- CFG construction
- Relocation
- Data flow analysis
Trade-off: conservative for safety vs. aggressive, precise enough to be useful

p. 44 Reliability - The Real Problem
Reliability: CFG construction
Trade-off: conservative for safety vs. aggressive, precise enough to be useful
Solution: limit most conservative assumptions to parts of program

p. 45 Limit imprecision to some parts
[Diagram labels: three object files (code, data, meta), three code/data pairs, Data, Exec]

p. 46 Limit imprecision to some parts
[Diagram labels: three object files (code, data, meta), three code/data pairs, Data, Exec, Code]

p. 47 What program parts?
Sections from object files
- only refer to each other via symbols
- special code addresses identified by relocations
- extend relocations where necessary
  - no relaxation
  - annotate PIC code with relocations if necessary
  - mark data
  - ...

p. 48 When/why does this work?
Under separate compilation
- Partial-separate compilation
- Compiler-generated code only, not manually written assembler
Compiler needs to maintain conventions
Assembly writers do not know compiler-generated code
- Because multiple compiler versions are available
Whenever imprecision could become viral, the linker (rewriter) is informed!