Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
Itai Gurari (gurari@cs.wisc.edu)
Computer Science Department, University of Wisconsin, 1210 W. Dayton St., Madison, WI 53706-1685
Paradyn/Condor Week, Madison, WI, March 12-14, 2001

Introduction
Dynamic instrumentation: inserting instrumentation into an application while it is executing.
- Paradyn uses it to gather performance data.
- Paradyn inserts instrumentation at three types of points: function entry, function exit, and call sites.

Instrumentation Points
Paradyn instruments three types of points in the executable code of a function such as foo () { call <bar> }:
- Entry, e.g. startTimer()
- Call (call <bar>), e.g. counter++
- Exit, e.g. stopTimer()

Goal
Transfer control from a function to its instrumentation code as quickly as possible.

Control Transfer
To switch execution from a function to its instrumentation code:
- Overwrite instructions in the function with a control transfer instruction.
- Copy equivalents of the overwritten instructions to the code patch area.
On the x86, Paradyn by default uses a 5-byte jump to transfer control to the instrumentation code; its 32-bit displacement can reach the whole address space. If a 5-byte instruction will not fit, Paradyn uses a 1-byte trap (the int3 instruction).
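As a rough illustration of the two control-transfer encodings, the C sketch below writes a 5-byte jmp rel32 (opcode E9) or, when fewer than 5 bytes are available, a 1-byte int3 (opcode CC). The names encode_jmp_rel32 and encode_transfer are hypothetical, not Paradyn's API; the sketch assumes a 32-bit address space and that the caller already knows how many bytes may safely be overwritten.

```c
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5    /* opcode E9 followed by a 32-bit displacement */
#define INT3_OPCODE   0xCC /* one-byte breakpoint trap */

/* Encode "jmp rel32" for a jump located at address `from` that should land
   at `to`. The displacement is relative to the end of the 5-byte jump. */
static void encode_jmp_rel32(uint8_t *out, uint32_t from, uint32_t to)
{
    uint32_t disp = to - (from + JMP_REL32_LEN); /* wraps modulo 2^32 */
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(disp >> (8 * i)); /* little-endian */
}

/* Choose the transfer for a point with `room` overwritable bytes: a jump if
   5 bytes fit, otherwise an int3 trap. Returns the number of bytes written. */
static size_t encode_transfer(uint8_t *out, size_t room, uint32_t from, uint32_t to)
{
    if (room >= JMP_REL32_LEN) {
        encode_jmp_rel32(out, from, to);
        return JMP_REL32_LEN;
    }
    if (room >= 1) {
        out[0] = INT3_OPCODE;
        return 1;
    }
    return 0; /* nothing can be written */
}
```

Because the displacement is computed with 32-bit unsigned arithmetic, a jump at any address can wrap around to any other address, which is why the 5-byte form covers the whole address space.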

Inserting Control Transfer Instructions
- Dynamically rewrite the function in place.
- Use different techniques for different types of instrumentation points.

Jumps and Traps: Instrument Entry Point, Case 1
Prologue: push / mov / sub
There is enough room to replace the instructions with a jump: jmp <instrumentation>
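To decide whether the entry has room for a jump, an instrumenter has to overwrite whole instructions, never part of one. A minimal sketch, assuming instruction lengths come from a disassembler (not shown) and using the hypothetical helper insns_to_cover: for a prologue of push %ebp (1 byte), mov %esp,%ebp (2 bytes) and sub $imm8,%esp (3 bytes), all three instructions must be overwritten, freeing 6 bytes for the 5-byte jump.

```c
#include <stddef.h>

/* Return how many leading instructions must be overwritten so that at least
   `needed` bytes are covered by whole instructions, or 0 if the `n`
   instructions run out first. `len[i]` is the length of instruction i.
   If `covered` is non-NULL it receives the number of bytes covered. */
static size_t insns_to_cover(const size_t *len, size_t n, size_t needed, size_t *covered)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++) {
        bytes += len[i];
        if (bytes >= needed) {
            if (covered != NULL)
                *covered = bytes;
            return i + 1;
        }
    }
    return 0;
}
```

Assuming the code patch area jumps back to the instruction after the overwritten ones, the sixth byte left over behind the jump is never executed.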

Jumps and Traps: Instrument Entry Point, Case 2
Prologue: push / mov / ..., with a later backward jmp whose target is the mov.
Inserting a jump instruction would interfere with the target of the backward jump: jmp <instrumentation> would overwrite the mov.
Instead, use a trap instruction to get to the instrumentation: int3 / mov / ... / jmp
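Case 2 comes down to a check on branch targets: if any jump in the function lands strictly inside the bytes the entry jump would overwrite, the jump cannot be used and a trap is needed. The sketch below assumes the instrumenter already has the list of branch-target addresses from its control-flow analysis; patch_conflicts is a hypothetical name. A target equal to the start of the patch still lands on a valid instruction, since the new jump begins there.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if overwriting the `patch_len` bytes starting at `start` would make
   some branch land in the middle of the new jump, i.e. if any target t
   satisfies start < t < start + patch_len. In that case the caller should
   fall back to a one-byte int3 at `start`. */
static bool patch_conflicts(uint32_t start, size_t patch_len,
                            const uint32_t *targets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (targets[i] > start && (size_t)(targets[i] - start) < patch_len)
            return true;
    }
    return false;
}
```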

Jumps and Traps: Instrument Call Point
call <Foo>
There is enough room to replace the call instruction with a jump: jmp <instrumentation>

Jumps and Traps: Instrument Exit Point, Case 1
Epilogue: mov / leave / ret
Back up far enough to replace whole instructions with a jump: jmp <instrumentation>
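Backing up from the ret can be sketched the same way: walk backward over whole instructions until 5 bytes are covered, and give up when a call is reached, which is exit case 2. The struct insn type and exit_patch_start are hypothetical; the sketch assumes decoded instruction lengths and ignores branch targets inside the backed-up region, which a real instrumenter would also have to check. In the example test, mov -4(%ebp),%eax is 3 bytes, so mov, leave and ret together give exactly 5.

```c
#include <stdbool.h>
#include <stddef.h>

struct insn {
    size_t len;   /* encoded length in bytes */
    bool is_call; /* true for call instructions */
};

/* Walk backward from the last instruction (the ret) over whole instructions
   until at least `needed` bytes are covered. Returns the index of the first
   instruction to overwrite, or -1 if a call is reached first (exit case 2)
   or the function is too short. */
static int exit_patch_start(const struct insn *code, int n, size_t needed)
{
    size_t bytes = 0;
    for (int i = n - 1; i >= 0; i--) {
        if (code[i].is_call)
            return -1;
        bytes += code[i].len;
        if (bytes >= needed)
            return i;
    }
    return -1;
}
```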

Jumps and Traps: Instrument Exit Point, Case 2
Epilogue: call <Foo> / leave / ret
A jump would interfere with the preceding call: call <Foo> / jmp <instrumentation>

Jumps and Traps: Instrument Exit Point, Case 2a
Epilogue: call <Foo> / leave / ret / ? ? ? / beginning of the next function (on a 4-byte boundary)
The compiler pads the end of the function with "bonus bytes" (? ? ?) up to the 4-byte boundary.
Replace the instructions after the call, together with the bonus bytes, with a jump: call <Foo> / jmp <instrumentation>

Jumps and Traps: Instrument Exit Point, Case 2b
Epilogue: call <Foo> / leave / ret / ?
There are not enough bonus bytes (if any) to overwrite with a jump.
Overwrite the return with a trap: call <Foo> / leave / int3 / ?
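The choice between cases 2a and 2b reduces to arithmetic on the padding. A sketch, assuming the next function starts at the next 4-byte boundary and that the padding bytes hold no code or data; bonus_bytes, choose_exit_transfer and patch_ret_with_trap are hypothetical names. With leave and ret (2 bytes) after the call, at most 3 bonus bytes are available, so a jump fits only when the function ends exactly 3 bytes short of the boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_LEN     5
#define RET_OPCODE  0xC3 /* plain one-byte ret */
#define INT3_OPCODE 0xCC

enum exit_transfer { EXIT_JUMP, EXIT_TRAP };

/* Padding ("bonus bytes") between the end of the function at address `end`
   and the next `align`-byte boundary. */
static uint32_t bonus_bytes(uint32_t end, uint32_t align)
{
    return (align - end % align) % align;
}

/* `tail_len` is the size of the whole instructions after the last call
   (e.g. leave + ret). Case 2a if a jump fits, otherwise case 2b. */
static enum exit_transfer choose_exit_transfer(size_t tail_len, uint32_t end, uint32_t align)
{
    return tail_len + bonus_bytes(end, align) >= JMP_LEN ? EXIT_JUMP : EXIT_TRAP;
}

/* Case 2b: replace the one-byte ret with int3. Refuses if the byte is not
   a plain ret (for example, "ret imm16" is a 3-byte C2 instruction). */
static bool patch_ret_with_trap(uint8_t *ret_byte)
{
    if (*ret_byte != RET_OPCODE)
        return false;
    *ret_byte = INT3_OPCODE;
    return true;
}
```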

Jumps and Traps: Extra Slot
Prologue: push / mov / sub / mov, with no jumps to the first ten bytes of the function.
There is enough space to overwrite the entry with a jump: jmp <instrumentation> / mov
The first ten bytes hold room for two 5-byte jumps, so the second five bytes form an "extra slot". Make a 2-byte jump to the extra slot, and overwrite the extra slot with a jump to the instrumentation: jmp <instrumentation> / jmp <instrumentation>
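The extra-slot technique relies on a 2-byte short jump. A sketch under these assumptions: the extra slot is the second 5-byte slot in the branch-free first ten bytes (offset 5), and the short jump is written at another point in the same function; encode_jmp_rel8, jump_to_extra_slot and EXTRA_SLOT_OFFSET are hypothetical. The short jump reaches the slot only if the displacement fits in a signed byte, so in this sketch a short jump starting more than 131 bytes past the function start could not use it.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXTRA_SLOT_OFFSET 5 /* assumed: second 5-byte slot in the first 10 bytes */

/* Encode "jmp rel8" (EB cb) located at `from` and landing at `to`. The
   displacement is measured from the end of the 2-byte jump and must fit in
   a signed byte: at most 127 bytes forward or 128 bytes backward. */
static bool encode_jmp_rel8(uint8_t out[2], uint32_t from, uint32_t to)
{
    int64_t disp = (int64_t)to - ((int64_t)from + 2);
    if (disp < -128 || disp > 127)
        return false; /* out of reach: a trap would be needed instead */
    out[0] = 0xEB;
    out[1] = (uint8_t)(int8_t)disp;
    return true;
}

/* Write a 2-byte jump at `point` to the extra slot of the function starting
   at `func_start`. The extra slot itself would then be overwritten with a
   5-byte jump to the point's instrumentation (not shown). */
static bool jump_to_extra_slot(uint8_t out[2], uint32_t func_start, uint32_t point)
{
    return encode_jmp_rel8(out, point, func_start + EXTRA_SLOT_OFFSET);
}
```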

Control Transfer: Traps on x86
- A trap generates an exception that is caught either by the application (Solaris, Linux) or by the Paradyn daemon (Windows NT).
- The address of the trap instruction is used to determine which instrumentation code to execute.
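Mapping a trap back to its instrumentation can be sketched as a lookup keyed by the trap's address. One detail matters: on Linux, for example, the program counter saved in the signal context after an int3 points one byte past the int3, so the handler subtracts 1. This sketch assumes that convention (other systems may report the int3 address itself); the table layout and the names trap_entry and lookup_instrumentation are assumptions for illustration, and how the PC is obtained is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per int3 that was planted; the table is sorted by trap_addr. */
struct trap_entry {
    uint32_t trap_addr;  /* address of the int3 byte */
    uint32_t instr_addr; /* address of the instrumentation to run */
};

static int cmp_trap(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t a = ((const struct trap_entry *)elem)->trap_addr;
    return (k > a) - (k < a);
}

/* `reported_pc` is the PC delivered with the trap, assumed to point one byte
   past the int3. Returns the instrumentation address, or 0 if the trap was
   not planted by the instrumenter. */
static uint32_t lookup_instrumentation(const struct trap_entry *tab, size_t n,
                                       uint32_t reported_pc)
{
    uint32_t trap_addr = reported_pc - 1;
    const struct trap_entry *e = bsearch(&trap_addr, tab, n, sizeof *tab, cmp_trap);
    return e != NULL ? e->instr_addr : 0;
}
```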

Problem
Trap handling is slow:
- On Solaris 2.6, jumps are over 1000 times faster than traps.
- On Linux 2.2, jumps are over 200 times faster than traps.
Traps limit instrumentation: we cannot insert as much, or at as fine a granularity.
Trap handling logic is difficult:
- Susceptible to bugs
- Difficult to understand and maintain

Solution
Rewrite functions that do not have enough room for jumps into functions that do. The function is rewritten on the fly, combining dynamic instrumentation with binary rewriting.

Dynamic Rewriting
- Overwrite existing instructions
- Expand instrumentation points
- Relocate the function

Function Rewriting and Relocation
In Paradyn we rewrite a function:
- only if the function contains an instrumentation point that would require a trap to instrument;
- the first time a request to instrument the function is made, even if the instrumentation to be inserted is for a point that does not require a trap. For example, if the exit needs a trap, the entry can use a jump, and the request is to instrument the entry, the function is still rewritten.

Function Rewriting and Relocation (continued)
All instrumentation points that cannot use a jump are expanded.
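The rewrite policy can be summarized as a check run on each instrumentation request. A minimal sketch, with hypothetical types and names (inst_point, function, prepare_for_instrumentation) and the actual relocation left as a placeholder comment: the decision is made on the first request and depends on whether any point needs a trap, not on which point was requested.

```c
#include <stdbool.h>
#include <stddef.h>

enum point_kind { POINT_ENTRY, POINT_CALL, POINT_EXIT };

struct inst_point {
    enum point_kind kind;
    bool needs_trap; /* true if no jump fits at this point */
};

struct function {
    struct inst_point *points;
    size_t npoints;
    bool examined;  /* has an instrumentation request been seen yet? */
    bool relocated;
};

/* Called for every request to instrument `f`. On the first request, f is
   relocated if any of its points would need a trap, whichever point was
   requested. Returns whether f is (now) relocated. */
static bool prepare_for_instrumentation(struct function *f, enum point_kind requested)
{
    (void)requested; /* the decision does not depend on the requested point */
    if (!f->examined) {
        f->examined = true;
        for (size_t i = 0; i < f->npoints; i++) {
            if (f->points[i].needs_trap) {
                /* Placeholder: copy f, expand every point that cannot use
                   a jump, and redirect the original entry to the copy. */
                f->relocated = true;
                break;
            }
        }
    }
    return f->relocated;
}
```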

Rewriting a Function
Original function: push / mov / call <Foo> / call <Bar> / ret, with instrumentation points at the entry, the two calls, and the exit.
1. Insert a nop at the entry: push / nop / mov / call <Foo> / call <Bar> / ret
2. Overwrite the expanded entry with a jump: jmp <instrumentation> / call <Foo> / call <Bar> / ret
3. Insert nops at the exit: ... / call <Bar> / ret / nop / nop / nop / nop
4. Overwrite the expanded exit with a jump: jmp <instrumentation> / call <Foo> / call <Bar> / jmp <instrumentation>
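Expansion can be pictured as copying the function while inserting padding. The sketch below handles only the exit case from these slides: each one-byte ret (C3) is followed by four nops (90) so that the exit spans 5 bytes and can later be overwritten with a jump. Instruction lengths are assumed to come from a disassembler, copy_with_exit_padding is a hypothetical name, and PC-relative branches in the copy are deliberately left unfixed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_LEN 5
#define NOP     0x90
#define RET     0xC3

/* Copy `n` instructions (lengths in `lens`) from `code` into `out`,
   appending NOPs after each one-byte ret so every exit spans JMP_LEN bytes.
   Returns the new size, or 0 if `cap` bytes are not enough. */
static size_t copy_with_exit_padding(const uint8_t *code, const size_t *lens, size_t n,
                                     uint8_t *out, size_t cap)
{
    size_t in = 0, o = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = lens[i];
        size_t pad = (len == 1 && code[in] == RET) ? JMP_LEN - 1 : 0;
        if (o + len + pad > cap)
            return 0;
        memcpy(out + o, code + in, len); /* copy the instruction unchanged */
        memset(out + o + len, NOP, pad); /* room for a later jump */
        o += len + pad;
        in += len;
    }
    return o;
}
```

Every inserted byte shifts the code after it, which is exactly why jumps inside the relocated copy need their displacements updated.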

Rewriting a Function (continued)
Original function: push / mov / call <Foo> / call <Bar> / ret
Overwrite the entry of the original function with a jump to the rewritten function: jmp <rewritten function> / call <Foo> / call <Bar> / ret

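Update Jumps and Calls
PC-relative jump and call instructions must be updated after relocation:
- Those with destinations outside the function will have incorrect displacements, because the instruction has moved but its target has not.
- Some jumps to locations inside the function will have incorrect displacements, because inserted instructions change the distance between the jump and its target.
2-byte jumps have a signed 8-bit displacement measured from the end of the instruction, so they reach 127 bytes forward and 128 bytes backward. If the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction (6 bytes for a conditional jump) that has a longer reach.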
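Re-emitting a relocated short jump, widening it when the rel8 displacement no longer fits, might look like the sketch below. It assumes the caller supplies the jump's absolute target (unchanged for targets outside the function, the new address for targets inside it); emit_short_jump is a hypothetical name. An unconditional jmp rel8 (EB) widens to the 5-byte jmp rel32 (E9); a conditional Jcc rel8 (70 to 7F) widens to the 6-byte 0F 80+cc rel32 form. Other rel8 branches such as loop and jecxz have no rel32 form and are rejected here. Calls need only the displacement recomputation, since call rel32 already has 32-bit reach.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Re-emit a short jump with opcode `op` at its new address `new_at`, keeping
   the absolute `target`. Returns the number of bytes written to `out`
   (2, 5 or 6), or 0 for an unsupported opcode. */
static size_t emit_short_jump(uint8_t op, uint32_t new_at, uint32_t target, uint8_t out[6])
{
    bool is_jmp = (op == 0xEB);
    bool is_jcc = (op >= 0x70 && op <= 0x7F);
    if (!is_jmp && !is_jcc)
        return 0;

    /* Keep the 2-byte form if the target is still within rel8 reach. */
    int64_t d8 = (int64_t)target - ((int64_t)new_at + 2);
    if (d8 >= -128 && d8 <= 127) {
        out[0] = op;
        out[1] = (uint8_t)(int8_t)d8;
        return 2;
    }

    /* Otherwise widen to the rel32 form of the same jump. */
    size_t len = is_jmp ? 5 : 6;
    if (is_jmp) {
        out[0] = 0xE9;
    } else {
        out[0] = 0x0F;
        out[1] = (uint8_t)(0x80 | (op & 0x0F)); /* same condition code */
    }
    uint32_t d32 = target - (new_at + (uint32_t)len);
    for (size_t i = 0; i < 4; i++)
        out[len - 4 + i] = (uint8_t)(d32 >> (8 * i)); /* little-endian */
    return len;
}
```

Widening changes the size of the copy, which can push other short jumps out of range, so in practice the layout would be recomputed until no further jump needs widening.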

Status
Dynamic rewriting and function relocation are operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

Current Limitations
We do not relocate a function if:
- the application is executing within the function we want to instrument, or
- the function has a jump table.
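Both limitations can be phrased as a pre-check. The sketch below is an illustration under stated assumptions: it treats "executing within the function" as any thread's current program counter falling inside the function's address range, and takes the jump-table flag from the instrumenter's own analysis; can_relocate is a hypothetical name. A stricter check might also scan stack frames for return addresses inside the function, which this sketch does not do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return false if the function occupying [start, end) must not be relocated:
   it contains a jump table, or some thread's current PC lies inside it. */
static bool can_relocate(uint32_t start, uint32_t end,
                         const uint32_t *thread_pcs, size_t nthreads,
                         bool has_jump_table)
{
    if (has_jump_table)
        return false;
    for (size_t i = 0; i < nthreads; i++) {
        if (thread_pcs[i] >= start && thread_pcs[i] < end)
            return false;
    }
    return true;
}
```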

Jumps vs. Traps
Average time to get to instrumentation and back (microseconds):

           Trap    Jump
Solaris    37.6    0.03
Linux       8.3    0.04
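Reading the table as Solaris (trap 37.6, jump 0.03 microseconds) and Linux (trap 8.3, jump 0.04 microseconds), dividing trap time by jump time reproduces the ratios quoted on the Problem slide:

```latex
\frac{37.6}{0.03} \approx 1253 \quad \text{(Solaris)}, \qquad
\frac{8.3}{0.04} = 207.5 \quad \text{(Linux)}
```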

Jumps vs. Traps (continued)
- Relocating functions that are performance bottlenecks leads to the greatest speedup.
- More instrumentation can be inserted, since perturbation of the system is minimized.
- In Paradyn, the speedup depends on the type of metric (e.g. CPU time, number of procedure calls).

Some Results: bubba (circuit layout)
- Instrumented 9 functions for CPU time; all required a trap for the exit point.
- 5 relocated functions were called 400 thousand times and consumed 20% of the CPU.
- 23 seconds to execute using relocation
- 42 seconds to execute without relocation

Some Results: fspx (2-D heat transfer simulation)
- 4 of 46 functions required traps, all for exit points.
- Instrumented __atan for CPU time; it required a trap for its exit, was called 107 million times, and consumed 25% of the CPU.
- 7.5 minutes to execute using relocation
- 115 minutes to execute without relocation
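Taking simple ratios of the reported execution times for bubba and fspx gives the speedup from relocation in each run:

```latex
\text{bubba: } \frac{42\ \text{s}}{23\ \text{s}} \approx 1.8, \qquad
\text{fspx: } \frac{115\ \text{min}}{7.5\ \text{min}} \approx 15.3
```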

Conclusions
Dynamic rewriting and function relocation:
- Used by Paradyn to allow jumps, instead of traps, when profiling applications, improving performance.
- Crucial for large-scale and fine-grained instrumentation.