Optimizing Your Dyninst Program

Slides:



Advertisements
Similar presentations
Week 8 Stack and Subroutines. Stack  The stack is a section of data memory (or RAM) that is reserved for storage of temporary data  The data may represent.
Advertisements

INSTRUCTION SET ARCHITECTURES
University of Maryland Smarter Code Generation for Dyninst Nick Rutar.
Princess Sumaya Univ. Computer Engineering Dept. Chapter 2: IT Students.
Eliminating Stack Overflow by Abstract Interpretation John Regehr Alastair Reid Kirk Webb University of Utah.
Machine-Level Programming III: Procedures Apr. 17, 2006 Topics IA32 stack discipline Register saving conventions Creating pointers to local variables CS213.
– 1 – , F’02 ICS05 Instructor: Peter A. Dinda TA: Bin Lin Recitation 4.
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan.
Overview C programming Environment C Global Variables C Local Variables Memory Map for a C Function C Activation Records Example Compilation.
1. 2 FUNCTION INLINE FUNCTION DIFFERENCE BETWEEN FUNCTION AND INLINE FUNCTION CONCLUSION 3.
Stacks and Frames Demystified CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han.
1 CSC 2405: Computer Systems II Spring 2012 Dr. Tom Way.
Programming Models CT213 – Computing Systems Organization.
IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.
© 2008, Renesas Technology America, Inc., All Rights Reserved 1 Purpose  This training module provides an overview of optimization techniques used in.
Assembly Questions תרגול 12.
Programmer's view on Computer Architecture by Istvan Haller.
Compiler Construction
IT253: Computer Organization Lecture 3: Memory and Bit Operations Tonga Institute of Higher Education.
Languages and the Machine Chapter 5 CS221. Topics The Compilation Process The Assembly Process Linking and Loading Macros We will skip –Case Study: Extensions.
Race Conditions Defined 1. Every data structure defines invariant conditions. defines the space of possible legal states of the structure defines what.
Richard P. Paul, SPARC Architecture, Assembly Language Programming, and C Chapter 7 – Subroutines These are lecture notes to accompany the book SPARC Architecture,
13-Nov-15 (1) CSC Computer Organization Lecture 7: Input/Output Organization.
CompSci 100E 39.1 Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
November 2005 New Features in Paradyn and Dyninst Matthew LeGendre Ray Chen
by Richard P. Paul, 2nd edition, 2000.
Machine-level Programming III: Procedures Topics –IA32 stack discipline –Register saving conventions –Creating pointers to local variables.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Chapter 5 Index and Clustering
CS 155 Section 1 PP1 Eu-Jin Goh. Setting up Environment Demo.
University of Amsterdam Computer Systems – the instruction set architecture Arnoud Visser 1 Computer Systems The instruction set architecture.
Subprograms - implementation. Calling a subprogram  transferring control to a subprogram: save conditions in calling program pass parameters allocate.
Calling Procedures C calling conventions. Outline Procedures Procedure call mechanism Passing parameters Local variable storage C-Style procedures Recursion.
CSc 453 Linking and Loading
Preocedures A closer look at procedures. Outline Procedures Procedure call mechanism Passing parameters Local variable storage C-Style procedures Recursion.
Correct RelocationMarch 20, 2016 Correct Relocation: Do You Trust a Mutated Binary? Drew Bernat
1 University of Maryland Using Information About Cache Evictions to Measure the Interactions of Application Data Structures Bryan R. Buck Jeffrey K. Hollingsworth.
Binding & Dynamic Linking Presented by: Raunak Sulekh(1013) Pooja Kapoor(1008)
F453 Module 8: Low Level Languages 8.1: Use of Computer Architecture.
December 1, 2006©2006 Craig Zilles1 Threads & Atomic Operations in Hardware  Previously, we introduced multi-core parallelism & cache coherence —Today.
Section 5: Procedures & Stacks
Efficient Instrumentation for Code Coverage Testing
Architectures of Digital Information Systems Part 1: Interrupts and DMA dr.ir. A.C. Verschueren Eindhoven University of Technology Section of Digital.
Performance Optimizations in Dyninst
F453 Computing Questions and Answers
Chapter 12 Variables and Operators
143A: Principles of Operating Systems Lecture 4: Calling conventions
Chapter 5 Conclusion CIS 61.
The Stack & Procedures CSE 351 Spring 2017
Chapter 7 Subroutines Dr. A.P. Preethy
Machine-Level Programming 4 Procedures
A configurable binary instrumenter
Optimizing Malloc and Free
CSCE Fall 2013 Prof. Jennifer L. Welch.
Lecture 30 (based on slides by R. Bodik)
Machine-Level Programming III: Procedures Sept 18, 2001
Understanding Program Address Space
by Richard P. Paul, 2nd edition, 2000.
Homework Any Questions?.
Runtime Environments What is in the memory?.
Where is all the knowledge we lost with information? T. S. Eliot
02/02/10 20:53 Assembly Questions תרגול 12 1.
Lecture 4: Instruction Set Design/Pipelining
CSc 453 Final Code Generation
Dynamic Binary Translators and Instrumenters
Topic 2b ISA Support for High-Level Languages
Return-to-libc Attacks
Presentation transcript:

Optimizing Your Dyninst Program Optimizing Your Dyninst Programs Optimizing Your Dyninst Program Matthew LeGendre legendre@cs.wisc.edu Wandering titles Cut out complete sentences, highlight in yellow Inconsistent instrumentation Matthew LeGendre

Optimizing Your Dyninst Programs Optimizing Dyninst Dyninst is being used to insert more instrumentation into bigger programs. For example: Instrumenting every memory instruction Working with binaries 200MB in size Performance is a big consideration What can we do, and what can you do to help? Matthew LeGendre

Performance Problem Areas Optimizing Your Dyninst Programs Performance Problem Areas Parsing: Analyzing the binary and reading debug information. Instrumenting: Rewriting the binary to insert instrumentation. Runtime: Instrumentation slows down a mutatee at runtime. Matthew LeGendre

Optimizing Your Dyninst Programs Optimizing Dyninst Programmer Optimizations Telling Dyninst not to output tramp guards. Dyninst Optimizations Reducing the number of registers saved around instrumentation. More examples (parsing) Matthew LeGendre

Optimizing Your Dyninst Programs Parsing Overview Control Flow Identifies executable regions Data Flow Analyzes code prior to instrumentation Debug Reads debugging information, e.g. line info Symbol Reads from the symbol table Lazy Parsing: Not parsed until it is needed Remove lazy parsing define? Rephrase? Matthew LeGendre

Optimizing Your Dyninst Programs Control Flow Dyninst needs to analyze a binary before it can instrument it. Identifies functions and basic blocks Granularity Parses all of a module at once. Triggers Operating on a BPatch_function Requesting BPatch_instPoint objects Performing Stackwalks (on x86) Explicititly say “triggers”, “granularity” Picture of module/function? Matthew LeGendre

Optimizing Your Dyninst Programs Data Flow Dyninst analyzes a function before instrumenting it. Live register analysis Reaching allocs on IA-64 Granularity Analyzes a function at a time. Triggers The first time a function is instrumented Little redudency here. Use explicit labels. Matthew LeGendre

Optimizing Your Dyninst Programs Debug Information Reads debug information from a binary. Granularity Parses all of a module at once. Triggers Line number information Type information Local variable information Mention “Optional” Allows you to “avoid or defer” Matthew LeGendre

Optimizing Your Dyninst Programs Symbol Table Extracts function and data information from the symbol Granularity Parses all of a module at once. Triggers Not done lazily. At module load. Mention “Optional” Allows you to “avoid or defer” Matthew LeGendre

Optimizing Your Dyninst Programs Lazy Parsing Overview Granularity Triggered By Control Flow Module BPatch_function Queries Data Flow Function Instrumentation Debug Debug Info Queries Symbol Automatically Talking point, symbol table parsing is cheap earlier. Lazy parsing allows you to avoid or defer costs. Matthew LeGendre

Inserting Instrumentation Optimizing Your Dyninst Programs Inserting Instrumentation What happens when we re-instrument a function? foo: 0x1000: push ebp 0x1001: movl esp,ebp 0x1002: push $1 0x1004: call bar 0x1005: leave 0x1006: ret foo: 0x2000: jmp entry_instr 0x2005: push ebp 0x2006: movl esp,ebp 0x2007: push $1 0x2009: call bar 0x200F: leave 0x2010: ret foo: 0x3000: jmp entry_instr 0x3005: push ebp 0x3006: movl esp,ebp 0x3007: push $1 0x3009: jmp call_instr 0x300F: call bar 0x3014: leave 0x3015: ret foo: 0x4000: jmp entry_instr 0x4005: push ebp 0x4006: movl esp,ebp 0x4007: push $1 0x4009: jmp call_instr 0x400F: call bar 0x4014: leave 0x4015: jmp exit_instr 0x401A: ret Use easier to read font. Better highlighting of changes. Laser pointer them. Draw attention away from content, summarize what will talk about Matthew LeGendre

Inserting Instrumentation Optimizing Your Dyninst Programs Inserting Instrumentation Bracket instrumentation requests with: beginInsertionSet() . endInsertionSet() Batches instrumentation Allows for transactional instrumentation Improves efficiency (rewrite) Matthew LeGendre

Optimizing Your Dyninst Programs Runtime Overhead Two factors determine how instrumentation slows down a program. What does the instrumentation cost? Increment a variable Call a cache simulator How often does instrumentation run? Every time read/write are called Every memory instruction Additional Dyninst overhead on each instrumentation point. Frequent instrumentation needs to be inexpensive Matthew LeGendre

Runtime Overhead - Basetramps Optimizing Your Dyninst Programs Runtime Overhead - Basetramps A Basetramp save all GPR save all FPR t = DYNINSTthreadIndex() if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false } restore all FPR restore all GPR Save Registers Calculate Thread Index Check Tramp Guards Run All Minitramps Boldify Save GPR -> save All GPR Move box down, caption below Cross out things as deleted Restore Registers Matthew LeGendre

Runtime Overhead – Registers Optimizing Your Dyninst Programs Runtime Overhead – Registers A Basetramp save all GPR save all FPR t = DYNINSTthreadIndex() if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false } restore all FPR restore all GPR Analyzes minitramps for register usage. Analyzes functions for register liveness. Only saves what is live and used. Matthew LeGendre

Runtime Overhead – Registers Optimizing Your Dyninst Programs Runtime Overhead – Registers A Basetramp Called functions are recursively analyzed to a max call depth of 2. save all GPR save all FPR t = DYNINSTthreadIndex() if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false } restore all FPR restore all GPR minitramp call foo() call bar() Matthew LeGendre

Runtime Overhead – Registers Optimizing Your Dyninst Programs Runtime Overhead – Registers A Basetramp Use shallow function call chains under instrumentation, so Dyninst can analyze all reachable code. Use BPatch::setSaveFPR() to disable all floating point saves. save live GPR t = DYNINSTthreadIndex() if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false } restore live GPR Save some GPR -> save live GPR Matthew LeGendre

Runtime Overhead – Tramp Guards Optimizing Your Dyninst Programs Runtime Overhead – Tramp Guards A Basetramp Prevents recursive instrumentation. Needs to be thread aware. save live GPR t = DYNINSTthreadIndex() if (!guards[t]) { guards[t] = true jump to minitramps guards[t] = false } Restore live GPR Matthew LeGendre

Runtime Overhead – Tramp Guards Optimizing Your Dyninst Programs Runtime Overhead – Tramp Guards A Basetramp Build instrumentation that doesn’t make function calls (no BPatch_funcCallExpr snippets) Use setTrampRecursive() if you’re sure instrumentation won’t recurse. save live GPR t = DYNINSTthreadIndex() jump to minitramps restore live GPR Matthew LeGendre

Runtime Overhead – Threads Optimizing Your Dyninst Programs Runtime Overhead – Threads A Basetramp Returns an index value (0..N) unique to the current thread. Used by tramp guards and for thread local storage by instrumentation Expensive save live GPR t = DYNINSTthreadIndex() jump to minitramps restore live GPR Matthew LeGendre

Runtime Overhead – Threads Optimizing Your Dyninst Programs Runtime Overhead – Threads A Basetramp Not needed if there are no tramp guards. Only used on mutatees linked with a threading library (e.g. libpthread) save live GPR jump to minitramps restore live GPR Matthew LeGendre

Runtime Overhead – Minitramps Optimizing Your Dyninst Programs Runtime Overhead – Minitramps A Basetramp Minitramps contain the actual instrumentation. What can we do with minitramps? save live GPR jump to minitramps restore live GPR Matthew LeGendre

Runtime Overhead – Minitramps Optimizing Your Dyninst Programs Runtime Overhead – Minitramps Minitramp A Created by our code generator, which assumes a RISC like architecture. Instrumentation linked by jumps. //Increment var by 1 load var -> reg reg = reg + 1 store reg -> var jmp Minitramp B Minitramp B //Call foo(arg1) push arg1 call foo jmp BaseTramp Matthew LeGendre

Runtime Overhead – Minitramps Optimizing Your Dyninst Programs Runtime Overhead – Minitramps Minitramp A New code generator recognizes common instrumentation snippets and outputs CISC instructions. Works on simple arithmetic, and stores. //Increment var by 1 load var -> reg reg = reg + 1 store reg -> var jmp Minitramp B //Increment var by 1 inc var jmp Minitramp B Minitramp B //Call foo(arg1) push arg1 call foo jmp BaseTramp Matthew LeGendre

Runtime Overhead – Minitramps Optimizing Your Dyninst Programs Runtime Overhead – Minitramps Minitramp A Mergetramp New merge tramps combine minitramps together with basetramp. Faster execution, slower re-instrumentation. Change behavior with BPatch::setMergeTramp //Increment var by 1 inc var jmp Minitramp B //Increment var by 1 load var -> reg reg = reg + 1 store reg -> var jmp Minitramp B //Increment var by 1 inc var jmp Minitramp B Minitramp B //Call foo(arg1) push arg1 call foo jmp BaseTramp Matthew LeGendre

Optimizing Your Dyninst Programs Runtime Overhead Where does the Dyninst’s runtime overhead go? 87% Calculating thread indexes 12% Saving registers 1% Trampoline Guards Dyninst allows inexpensive instrumentation to be inexpensive. Piechart Inexp -> Inexp is stupid Matthew LeGendre

Optimizing Your Dyninst Programs Summary Make use of lazy parsing Use insertion sets when inserting instrumentation. Small, easy to understand snippets are easier for Dyninst to optimize. Try to avoid function calls in instrumentation. Matthew LeGendre

Optimizing Your Dyninst Programs Questions? Matthew LeGendre