University of Maryland Compiler-Assisted Binary Parsing Tugrul Ince PD Week – 27 March 2012

Parsing Binary Files
Binary analysis is common for
  o Performance modeling
  o Computer security
  o Maintenance
  o Binary modification
Parsing: first step in most binary analyses
  o Not straightforward
  o Time consuming

Objective
Improve parsing speed and accuracy
Store more data in binary files
  o Basic block locations
  o Edge information (source, target, type)
Binary analysis tools read this extra information
  o Create basic block, edge, and finally CFG abstractions

Difficulties in Parsing
Distinguishing code and data
Disassembly is tricky
  o Identifying functions
  o Finding instruction boundaries
    − Variable-length instruction set architectures
Building Control Flow Graphs
  o Identify basic block boundaries
  o Identify edges between basic blocks

Compiler Assistance for Parsing
Developed new compilation mechanism
  o Wrappers for GNU compiler suite (gcc/g++)
  o Transparent to the end user
Support most standard flags
  o Pass flags to underlying system compiler
  o Intercept output flags (-c, -S, -o, etc.)
Augments binary files with tables
  o Basic Block Table
  o Edge Table
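
A minimal sketch of how such a wrapper might plan its work, assuming the simplest case of one source file compiled with -c. The function name `plan_wrapper_steps` and the `augment-tables` tool are hypothetical and are not taken from the talk; only the gcc flags -c, -S, and -o are real.

```python
import os


def plan_wrapper_steps(argv, real_compiler="gcc"):
    """Return the commands a wrapper could run for one compiler invocation.

    Only "-c" with a single source file is rewritten; every other
    invocation is handed to the real compiler unchanged.
    Attached forms such as "-ofoo.o" are not handled in this sketch.
    """
    mode = "link"       # no -c / -S seen: compile and link
    output = None
    sources = []
    passthrough = []    # flags forwarded untouched (e.g. -O2, -Wall)

    args = iter(argv)
    for arg in args:
        if arg == "-o":
            output = next(args, None)   # intercept the output file name
        elif arg in ("-c", "-S"):
            mode = arg                  # intercept the output-kind flag
        elif arg.endswith((".c", ".cc", ".cpp")):
            sources.append(arg)
        else:
            passthrough.append(arg)

    if mode != "-c" or len(sources) != 1:
        return [[real_compiler] + list(argv)]

    src = sources[0]
    stem = os.path.splitext(os.path.basename(src))[0]
    asm = stem + ".s"
    obj = output or stem + ".o"
    return [
        # 1. Stop after compilation proper to get the intermediate assembly.
        [real_compiler] + passthrough + ["-S", src, "-o", asm],
        # 2. Hypothetical step: add the tables to the assembly file.
        ["augment-tables", asm],
        # 3. Assemble the augmented file into the object the user asked for.
        [real_compiler] + passthrough + ["-c", asm, "-o", obj],
    ]
```

For example, `plan_wrapper_steps(["-O2", "-c", "foo.c", "-o", "foo.o"])` yields three commands: compile to `foo.s`, augment it, then assemble it to `foo.o`. Because the user types the same command line as before, the extra steps stay invisible to them.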

Compiler Infrastructure
Analyze intermediate assembly files
  o Generate information about basic blocks and edges
  o Store in a section that is not loaded at runtime
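
The analysis step can be illustrated with a small sketch. It assumes x86 assembly in AT&T syntax for a single function, treats every mnemonic starting with "j" as a branch, and does not end blocks at calls. The function name and the edge-type strings are hypothetical; the talk does not describe the actual analysis at this level of detail.

```python
RETURNS = {"ret", "retq"}


def build_blocks_and_edges(asm_lines):
    """Split one function's assembly into basic blocks and list the edges.

    Returns (blocks, edges): blocks is a list of instruction lists, and
    edges is a list of (source_block, target_block, type) tuples.
    """
    blocks = []
    label_to_block = {}
    pending_labels = []
    start_new = True                     # next instruction opens a block

    for raw in asm_lines:
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        if line.endswith(":"):           # a label starts a new block
            pending_labels.append(line[:-1])
            start_new = True
            continue
        if line.startswith("."):         # assembler directive, not code
            continue
        if start_new:
            blocks.append([])
            for name in pending_labels:
                label_to_block[name] = len(blocks) - 1
            pending_labels = []
            start_new = False
        blocks[-1].append(line)
        mnemonic = line.split()[0]
        if mnemonic.startswith("j") or mnemonic in RETURNS:
            start_new = True             # a branch or return ends the block

    edges = []
    for i, insns in enumerate(blocks):
        parts = insns[-1].split()
        mnemonic = parts[0]
        if mnemonic in RETURNS:
            continue                     # no intraprocedural successor
        if mnemonic.startswith("j"):
            target = parts[1] if len(parts) > 1 else None
            if target in label_to_block:  # indirect jumps are skipped
                kind = "jump" if mnemonic == "jmp" else "taken"
                edges.append((i, label_to_block[target], kind))
            if mnemonic == "jmp":
                continue                 # unconditional: no fall-through
            kind = "not-taken"
        else:
            kind = "fallthrough"
        if i + 1 < len(blocks):
            edges.append((i, i + 1, kind))
    return blocks, edges
```

The compiler-generated assembly already contains labels and explicit branch mnemonics, so block boundaries can be found by text inspection here, without decoding any machine instructions.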

Basic Block - Edge Tables

Assembly Modification
Function Model
  o Block of code
  o “.type …”
  o “.size …”
Modifications
  o Add Basic Block and Edge Tables
  o Add shadow symbol
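
A hedged sketch of what the added assembly could look like, generated from Python. The section name `.bbtab`, the symbol naming scheme, and the table layout (two counts, then block start/end offsets, then edge triples) are all assumptions made for illustration. The GNU assembler directives themselves are real: a `.section` with an empty flags string is not allocatable, so it is not loaded at runtime, and `.previous` switches back to the section that was active before.

```python
import re


def _sanitize(name):
    """Turn a file name into something usable inside a symbol name."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


def shadow_label(func, source_file):
    """Hypothetical shadow symbol to place inside the function body.

    Its name carries the source file name so a tool can recover it later.
    """
    return f"__shadow_{_sanitize(source_file)}_{func}:"


def emit_tables(func, source_file, blocks, edges):
    """Return assembler lines that define the tables for one function.

    blocks: list of (start_label, end_label) pairs
    edges:  list of (source_index, target_index, type_code) triples
    """
    table = f"__bbtab_{_sanitize(source_file)}_{func}"   # hypothetical name
    lines = ['\t.section\t.bbtab,"",@progbits', f"{table}:"]
    lines.append(f"\t.long\t{len(blocks)}, {len(edges)}")
    for start, end in blocks:
        # Offsets are stored relative to the function start symbol.
        lines.append(f"\t.long\t{start} - {func}, {end} - {func}")
    for src, dst, kind in edges:
        lines.append(f"\t.long\t{src}, {dst}, {kind}")
    lines.append("\t.previous")
    return lines
```

Storing offsets from the function start, rather than absolute addresses, keeps the tables valid wherever the linker finally places the function.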

Merge Duplicate Functions
Weak functions are merged by linker
  o Functions included multiple times
  o Binary code might slightly differ
  o Only one weak function survives
Tables cannot be merged
  o Need to uniquely match functions and tables
  o Use shadow symbol in function to extract file name
  o Use file name and function name to identify tables
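
The matching rule can be sketched as a lookup keyed by the (file name, function name) pair. The data below is invented for illustration: two source files each contribute a copy of the weak function `helper` with slightly different code, and the shadow symbol of the surviving copy is assumed to name `b.cpp`.

```python
def find_tables(tables, function_name, shadow_file_name):
    """Return the tables for the surviving copy of a function.

    The function name alone is ambiguous when several object files
    carried a weak copy, so the file name recovered from the shadow
    symbol is part of the key. Returns None when no entry matches.
    """
    return tables.get((shadow_file_name, function_name))


# Hypothetical tables: both entries survive linking, but only one copy
# of the function's code does.
tables = {
    ("a.cpp", "helper"): {"blocks": [(0, 6), (6, 14)], "edges": [(0, 1, "fallthrough")]},
    ("b.cpp", "helper"): {"blocks": [(0, 8), (8, 16)], "edges": [(0, 1, "fallthrough")]},
}

# The linker kept the copy from b.cpp, so its tables are the valid ones.
surviving = find_tables(tables, "helper", "b.cpp")
```

Picking the entry for `a.cpp` here would describe block boundaries that do not exist in the code that was actually kept.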

Reconstruction
Binary analysis tools operate on executables directly
  o No interaction with the compiler

Reconstruction
Parsing a function involves:
  o Finding the shadow symbol stored in the function
    − File name is extracted
  o Locating Basic Block and Edge Tables with the function name and file name pair
  o Reading in the tables
  o Adding function start address to offsets
  o Creating basic block and edge abstractions
No need to parse individual instructions
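
The last three steps can be sketched as follows. The binary layout is an assumption made for this example (little-endian 32-bit values: block count, edge count, then start/end offsets per block, then source index, target index, and type per edge); the talk does not specify the real encoding.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicBlock:
    start: int   # absolute address of the first byte
    end: int     # absolute address just past the last byte


@dataclass(frozen=True)
class Edge:
    source: int  # start address of the source block
    target: int  # start address of the target block
    kind: int    # edge type code


def read_tables(blob, func_start):
    """Decode one function's tables and rebase offsets to addresses."""
    n_blocks, n_edges = struct.unpack_from("<II", blob, 0)
    pos = 8

    blocks = []
    for _ in range(n_blocks):
        start_off, end_off = struct.unpack_from("<II", blob, pos)
        pos += 8
        # Stored offsets are relative to the function start address.
        blocks.append(BasicBlock(func_start + start_off, func_start + end_off))

    edges = []
    for _ in range(n_edges):
        src, dst, kind = struct.unpack_from("<III", blob, pos)
        pos += 12
        edges.append(Edge(blocks[src].start, blocks[dst].start, kind))

    return blocks, edges
```

A short usage example with made-up values:

```python
blob = struct.pack("<II", 2, 1)          # two blocks, one edge
blob += struct.pack("<II", 0, 8)         # block 0: offsets 0..8
blob += struct.pack("<II", 8, 12)        # block 1: offsets 8..12
blob += struct.pack("<III", 0, 1, 0)     # edge from block 0 to block 1
blocks, edges = read_tables(blob, func_start=0x400000)
```

The whole reconstruction is a linear read of the tables plus one addition per offset, which is why no instruction decoding is needed.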

Evaluation
Benchmarks
  o SPEC CINT2006
  o PETSc snes package
  o Firefox (v9.0.1)
Systems
  o 64-bit Linux machines
  o server: 24-core Intel Xeon, 48 GB total memory
  o laptop: AMD Turion, 2 GB total memory
Methodology
  o Ran each running-time experiment 5 times
  o Report the mean

Normalized Parsing Time: SPEC CINT

Normalized Parsing Time: PETSc snes Package

Normalized Parsing Time: Firefox Version 9.0.1

Build Time Metrics
File size increase on disk
  o Not reflected in memory footprint
Small increase in compilation time
  o One-time cost
  o Not reflected in running time performance

              File Size                     Compilation Time
              Without Debug   With Debug
  SPEC CINT                   1.38x         1.25x
  PETSc       1.50x           1.09x         1.32x
  Firefox     1.17x           1.21x         1.13x
  OVERALL     1.63x                         1.23x

Runtime Metrics
Virtually no change in runtime metrics
  o Memory requirement is almost constant
  o Change in running time is within noise
Hard to measure Firefox running time
  o No workload
  o Use V8 Benchmark

              Memory Footprint   Running Time
  SPEC CINT                      0.97x
  PETSc       1.00x              0.95x
  Firefox     1.00x              0.94x
  OVERALL     1.00x              0.95x

V8 Benchmarks for Firefox
V8: JavaScript benchmark
  o Higher scores are better
  o Cannot be converted to time
No significant change in performance

                                V8 Benchmark Value
  Firefox with gcc
  Firefox with our mechanism    2587.6

Limitations / Future Work
Hand-written assembly
  o When branches use offsets in assembly
2n more symbols (n: number of functions)
Compilation takes 23% more time
  o Integrate compilation mechanism into gcc
File size increases
  o Compress tables – about 78% compression ratio

Conclusion
Developed a new compilation mechanism
  o Creates Basic Block and Edge Tables
  o Transparent to end user
Improved parsing speed
  o On average 73% decrease in parsing time
  o No memory or runtime overhead