University of Maryland Compiler-Assisted Binary Parsing Tugrul Ince PD Week 2012 26 – 27 March 2012.


Parsing Binary Files
Binary analysis is common for
  o Performance modeling
  o Computer security
  o Maintenance
  o Binary modification
Parsing: first step in most binary analyses
  o Not straightforward
  o Time-consuming

Objective
Improve parsing speed and accuracy
Store more data in binary files
  o Basic block locations
  o Edge information (source, target, type)
Binary analysis tools read this extra information
  o Create basic block, edge, and finally CFG abstractions
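
The stored information can be pictured as two small record types. The sketch below is only an illustration: the field names, the use of block indices in edges, and the `EdgeType` values are assumptions, since the slides state only that edges carry a source, a target, and a type.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    # Hypothetical edge kinds; the slides only say each edge has a "type".
    FALLTHROUGH = 0
    JUMP = 1
    COND_TAKEN = 2
    CALL = 3


@dataclass(frozen=True)
class BasicBlockEntry:
    start: int  # offset of the first byte of the block from the function start
    end: int    # offset one past the last byte of the block


@dataclass(frozen=True)
class EdgeEntry:
    source: int  # index of the source block in the basic block table (assumed)
    target: int  # index of the target block in the basic block table (assumed)
    kind: EdgeType


def successors(edges, block_index):
    """Return the target block indices reachable from one block."""
    return [e.target for e in edges if e.source == block_index]


# A three-block function: block 0 either falls through to 1 or branches to 2.
blocks = [BasicBlockEntry(0, 8), BasicBlockEntry(8, 20), BasicBlockEntry(20, 24)]
edges = [
    EdgeEntry(0, 1, EdgeType.FALLTHROUGH),
    EdgeEntry(0, 2, EdgeType.COND_TAKEN),
    EdgeEntry(1, 2, EdgeType.FALLTHROUGH),
]
```

With records like these, a tool can answer CFG queries such as `successors(edges, 0)` without decoding any instructions.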

Difficulties in Parsing
Distinguishing code and data
Disassembly is tricky
  o Identifying functions
  o Finding instruction boundaries
    − Variable-length instruction set architectures
Building Control Flow Graphs
  o Identify basic block boundaries
  o Identify edges between basic blocks

Compiler Assistance for Parsing
Developed new compilation mechanism
  o Wrappers for GNU compiler suite (gcc/g++)
  o Transparent to the end user
Supports most standard flags
  o Passes flags to underlying system compiler
  o Intercepts output flags (-c, -S, -o, etc.)
Augments binary files with tables
  o Basic Block Table
  o Edge Table
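
A wrapper of this kind has to separate the flags that control the output from the flags it forwards unchanged. The following is a minimal sketch of that split, not the authors' wrapper: the function names are hypothetical, only the `-o file` spelling is handled, and the idea of first asking the compiler for assembly is inferred from the later slide on intermediate assembly files.

```python
def split_args(argv):
    """Separate output-controlling flags from flags passed through unchanged."""
    stop_after = None   # "-c" or "-S" if the user asked to stop early
    output = None       # value given to "-o", if any
    passthrough = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-c", "-S"):
            stop_after = arg
        elif arg == "-o" and i + 1 < len(argv):
            output = argv[i + 1]
            i += 1  # skip the file name that follows -o
        else:
            passthrough.append(arg)
        i += 1
    return stop_after, output, passthrough


def assembly_command(argv, asm_path, compiler="gcc"):
    """Build the command that makes the system compiler emit assembly."""
    _, _, passthrough = split_args(argv)
    return [compiler, *passthrough, "-S", "-o", asm_path]
```

For example, `split_args(["-O2", "-c", "foo.c", "-o", "foo.o"])` keeps `-O2` and `foo.c` for the underlying compiler and remembers that the user wanted an object file named `foo.o`.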

Compiler Infrastructure
Analyze intermediate assembly files
  o Generate information about basic blocks and edges
  o Store in a section that is not loaded at runtime
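
To illustrate why assembly text is a convenient place to find basic blocks, here is a simplified scan that starts a new block at every label and after every branch. It is a sketch under stated assumptions: GNU-style x86 assembly, `#` comments, and a deliberately incomplete branch list. The actual analysis in the tool is not described in the slides.

```python
import re

LABEL = re.compile(r"^([.\w$]+):")
# Simplified set of x86 control-transfer mnemonics; not exhaustive.
BRANCHES = {"jmp", "je", "jne", "jl", "jle", "jg", "jge", "ret"}


def find_blocks(asm_lines):
    """Group instructions into basic blocks.

    A new block starts at every label and after every branch.
    Returns a list of (label_or_None, [instructions]) tuples.
    """
    blocks = []
    current_label, current = None, []

    def close():
        nonlocal current_label, current
        if current:
            blocks.append((current_label, current))
        current_label, current = None, []

    for raw in asm_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        m = LABEL.match(line)
        if m:
            close()  # a label always begins a new block
            current_label = m.group(1)
            continue
        if line.startswith("."):
            continue  # assembler directive, not an instruction
        current.append(line)
        if line.split()[0] in BRANCHES:
            close()  # a branch always ends the current block
    close()
    return blocks


asm = [
    "f:",
    "    cmpl $0, %edi",
    "    jle .L2",
    "    movl $1, %eax",
    "    ret",
    ".L2:",
    "    movl $0, %eax",
    "    ret",
]
```

On the small function above, the scan yields three blocks: the entry block labelled `f`, an unlabelled fall-through block, and the block at `.L2`.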

Basic Block - Edge Tables

Assembly Modification
Function Model
  o Block of code
  o “.type …”
  o “.size …”
Modifications
  o Add Basic Block and Edge Tables
  o Add shadow symbol
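
One way to append such a table to an assembly file is to emit it into an ELF section without the allocate flag, which GNU as creates when the flags string of `.section` is empty. The generator below is a hedged sketch: the section name `.bbtable`, the record layout, and the `.Lbb…` label names are all hypothetical and are not taken from the slides.

```python
def emit_block_table(func, block_labels, section=".bbtable"):
    """Return assembly text for a basic block table of one function.

    func         -- name of the function symbol
    block_labels -- list of (start_label, end_label) pairs, one per block
    section      -- hypothetical name of the non-loaded section
    """
    # An empty flags string means the section is not allocated at runtime.
    lines = [f'\t.section {section},"",@progbits']
    lines.append(f"\t.long {len(block_labels)}")  # number of blocks
    for start, end in block_labels:
        # Store offsets relative to the function start, resolved by the assembler.
        lines.append(f"\t.long {start} - {func}")
        lines.append(f"\t.long {end} - {func}")
    lines.append("\t.text")  # switch back to the code section
    return "\n".join(lines) + "\n"
```

Calling `emit_block_table("f", [(".Lbb0", ".Lbb1")])` produces a short directive sequence that can be appended after the function body. Storing label differences instead of absolute addresses keeps the table valid wherever the linker places the function.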

Merge Duplicate Functions
Weak functions are merged by linker
  o Functions included multiple times
  o Binary code might slightly differ
  o Only one weak function survives
Tables cannot be merged
  o Need to uniquely match functions and tables
  o Use shadow symbol in function to extract file name
  o Use file name and function name to identify tables
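
The matching step amounts to a lookup keyed by the (file name, function name) pair. The sketch below assumes the file name has already been recovered from the shadow symbol; how the shadow symbol encodes it is not given in the slides, and the function and variable names here are hypothetical.

```python
def select_tables(tables, surviving):
    """Pick the table that belongs to each surviving copy of a function.

    tables    -- {(file_name, function_name): table}
    surviving -- {function_name: file_name}, as recovered from shadow symbols
    """
    return {func: tables[(file_name, func)] for func, file_name in surviving.items()}


# The same weak function was compiled in two files with different block layouts.
tables = {
    ("a.cpp", "max<int>"): [(0, 8), (8, 16)],
    ("b.cpp", "max<int>"): [(0, 6), (6, 18)],
}
# The linker kept the copy that came from b.cpp.
surviving = {"max<int>": "b.cpp"}
chosen = select_tables(tables, surviving)
```

Keying on the function name alone would be ambiguous here, because both files contribute a table for `max<int>` but only one copy of the code survives linking.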

Reconstruction
Binary analysis tools operate on executables directly
  o No interaction with the compiler

Reconstruction
Parsing a function involves:
  o Finding the shadow symbol stored in the function
    − File name is extracted
  o Locating Basic Block and Edge Tables with the function name and file name pair
  o Reading in the tables
  o Adding function start address to offsets
  o Creating basic block and edge abstractions
No need to parse individual instructions
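
The last two steps, rebasing offsets and creating the abstractions, can be sketched in a few lines. The table layout is an assumption made for illustration: blocks are (start offset, end offset) pairs relative to the function start, and edges refer to blocks by index.

```python
def reconstruct(func_start, block_table, edge_table):
    """Turn function-relative tables into absolute blocks and edges.

    func_start  -- address of the function in the binary
    block_table -- list of (start_offset, end_offset) pairs
    edge_table  -- list of (source_index, target_index, kind) triples
    """
    # Add the function start address to every stored offset.
    blocks = [(func_start + s, func_start + e) for s, e in block_table]
    # Express each edge in terms of absolute block start addresses.
    edges = [(blocks[src][0], blocks[dst][0], kind) for src, dst, kind in edge_table]
    return blocks, edges


blocks, edges = reconstruct(
    0x400500,
    [(0, 8), (8, 20)],
    [(0, 1, "fallthrough")],
)
```

No instruction is decoded at any point: the work is proportional to the number of table entries, which is consistent with the slide's claim that individual instructions need not be parsed.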

Evaluation
Benchmarks
  o SPEC CINT2006
  o PETSc snes package
  o Firefox (v9.0.1)
Systems
  o 64-bit Linux machines
  o server: 24-core Intel Xeon, 48 GB total memory
  o laptop: AMD Turion, 2 GB total memory
Methodology
  o Ran each running-time experiment 5 times
  o Reporting mean
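
The normalized times reported on the following slides can be read as the mean time with the new mechanism divided by the mean baseline time. The sketch below shows that arithmetic; the function name is hypothetical and the run times are made-up illustrative numbers, not measurements from the talk.

```python
from statistics import mean


def normalized_time(baseline_runs, new_runs):
    """Mean time of the new approach relative to the mean baseline time."""
    return mean(new_runs) / mean(baseline_runs)


# Made-up timings in seconds for five runs each (not data from the slides).
baseline = [10.0, 10.0, 10.0, 10.0, 10.0]
with_tables = [2.0, 3.0, 3.0, 3.0, 2.5]
ratio = normalized_time(baseline, with_tables)
decrease_percent = (1 - ratio) * 100
```

A normalized value below 1.0 means parsing became faster; a value of 0.27, for instance, corresponds to a 73% decrease in parsing time.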

Normalized Parsing Time: SPEC CINT2006 [figure]

Normalized Parsing Time: PETSc snes Package [figure]

Normalized Parsing Time: Firefox Version 9.0.1 [figure]

Build Time Metrics
File size increase on disk
  o Not reflected in memory footprint
Small increase in compilation time
  o One-time cost
  o Not reflected in running time performance
Table columns: File Size (Without Debug), File Size (With Debug), Compilation Time
  SPEC CINT2006: [missing], 1.38x, 1.25x
  PETSc:         1.50x, 1.09x, 1.32x
  Firefox:       1.17x, 1.21x, 1.13x
  OVERALL:       1.63x, 1.23x

Runtime Metrics
Virtually no change in runtime metrics
  o Memory requirement is almost constant
  o Change in running time is within noise
Hard to measure Firefox running time
  o No workload
  o Use V8 Benchmark
Table columns: Memory Footprint, Running Time
  SPEC CINT2006: [missing], 0.97x
  PETSc:         1.00x, 0.95x
  Firefox:       1.00x, 0.94x
  OVERALL:       1.00x, 0.95x

V8 Benchmarks for Firefox
V8: JavaScript benchmark
  o Higher scores are better
  o Cannot be converted to time
No significant change in performance
V8 Benchmark Value
  Firefox with gcc:           [missing]
  Firefox with our mechanism: 2587.6

Limitations / Future Work
Hand-written assembly
  o When branches use offsets in assembly
2n more symbols (n: number of functions)
Compilation takes 23% more time
  o Integrate compilation mechanism into gcc
File size increases
  o Compress tables: about 78% compression ratio

Conclusion
Developed a new compilation mechanism
  o Creates Basic Block and Edge Tables
  o Transparent to end user
Improved parsing speed
  o On average 73% decrease in parsing time
  o No memory or runtime overhead