Dynamic Binary Translators and Instrumenters


By Brian McClannahan

Static Compilation
- Compile the program before running it
- Link code before run time
- Optimize code before run time
- Do everything before run time

Static Compilation Challenges
- Hard to predict dynamic behavior
- Difficult to get profiling information
- Phase changes are not indicated during static compilation
- OOP: runtime bindings

Solution
- Compile the program dynamically
- Profile the program as it runs

DynamoRIO
- Released in 2002; current version 7.1, released February 2019
- Works on Linux and Windows
- Created as a collaboration between HP and MIT
- Open-sourced in 2009

Code Cache
- Code is copied into the code cache one basic block at a time.
- Control returns to DynamoRIO after each block is executed (a minimal sketch of this cycle follows).
- Blocks do not end at direct jumps, and call instructions are followed into their targets.
- A block ends at any other control transfer.
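
A minimal sketch of the translate-execute-return cycle, assuming hypothetical helper names; this is illustrative only, not DynamoRIO's actual internals:

```c
/* Illustrative sketch of a dynamic-translation dispatch loop: look the next
 * target up in the code cache, translate the block on a miss, run the cached
 * copy, and repeat with whatever target the block exits to. All names here
 * are hypothetical, not DynamoRIO's real internal API. */
typedef struct fragment {
    void *app_pc;    /* original application address of the block   */
    void *cache_pc;  /* address of the translated copy in the cache */
} fragment_t;

fragment_t *code_cache_lookup(void *app_pc);   /* hypothetical helpers */
fragment_t *translate_block(void *app_pc);
void       *execute_fragment(fragment_t *f);   /* returns next app target */

void dispatch(void *app_pc) {
    for (;;) {
        fragment_t *f = code_cache_lookup(app_pc);
        if (f == NULL)
            f = translate_block(app_pc);   /* copy one block into the cache */
        app_pc = execute_fragment(f);      /* run it; control comes back with
                                              the next application target   */
    }
}
```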

New Code
- When a fragment targets code that is not yet in the code cache, control jumps back to DynamoRIO.
- DynamoRIO compiles the new fragment.
- The previous fragment is then linked to the new fragment, as sketched below.
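
A small continuation of the earlier sketch, reusing its hypothetical fragment_t struct: once the new fragment exists, the exit of the previous fragment is patched to jump to it directly, so later executions bypass the dispatcher. The helper name is invented.

```c
/* Illustrative sketch of fragment linking: patch the exit branch of the
 * previous fragment so it jumps straight to the newly translated fragment,
 * avoiding the round trip through DynamoRIO on later executions.
 * patch_exit_branch() is a hypothetical helper. */
void patch_exit_branch(void *exit_stub_pc, void *target_cache_pc);

void link_fragments(fragment_t *prev, fragment_t *next) {
    patch_exit_branch(prev->cache_pc, next->cache_pc);
}
```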

Self-Modifying Code
- Not allowed
- Uncommon in large-scale applications

Threads
- Each thread has its own code cache.
- The cache is split into a basic block cache and a trace cache.
- This enables thread-specific optimizations.

Traces
- A trace is a group of consecutive blocks of code.
- A trace can be exited at the joins between its basic blocks.
- Indirect jumps are inlined into traces, but a comparison is inserted to guarantee that execution drops out of the trace if the target of the indirect branch does not match the target recorded when the trace was created.
- A trace head is a basic block fragment that is either:
  - the target of a backwards branch, or
  - the target of an exit from an existing trace.

Trace Creation
- Each trace head has a counter; once the head becomes hot, trace creation starts (see the sketch below).
- The trace is built starting from the initial trace head until a backwards branch or another trace is reached.
- The new trace represents a commonly executed grouping of fragments.
- The targets of all exits from the new trace become trace heads.
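
A sketch of the trace-head counting idea; the threshold value and all names are invented for illustration, not DynamoRIO's actual implementation.

```c
/* Illustrative sketch of trace-head profiling: each trace head carries a
 * counter, and once it crosses a "hot" threshold the blocks that execute
 * next are recorded into a trace. */
#define HOT_THRESHOLD 50   /* invented value */

typedef struct trace_head {
    void *app_pc;   /* address of the head basic block  */
    int   count;    /* times this head has been entered */
} trace_head_t;

/* Hypothetical helper: record blocks until a backwards branch or an
 * existing trace is reached. */
void begin_trace_recording(void *app_pc);

void on_trace_head_entry(trace_head_t *head) {
    if (++head->count == HOT_THRESHOLD)
        begin_trace_recording(head->app_pc);
}
```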

Trace Efficiency

Execution Flow

Branch Prediction

Decode-Dispatch Interpreters
- It is hard to create traces across an interpreter's central dispatch switch statement, as the sketch below illustrates.
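
In a decode-dispatch interpreter every bytecode funnels through one switch, so traces built over the interpreter's native code keep hitting the same unpredictable dispatch point instead of following the guest program's control flow. The opcode set below is invented for the example.

```c
/* Minimal decode-dispatch interpreter loop. The single switch (an indirect
 * branch in the compiled native code) executes for every bytecode, so a
 * tracer that only sees native PCs cannot tell the guest program's hot
 * paths apart. */
enum { OP_PUSH, OP_ADD, OP_JUMP, OP_HALT };

int interpret(const unsigned char *bytecode) {
    int stack[64], sp = 0;
    unsigned pc = 0;
    for (;;) {
        switch (bytecode[pc]) {
        case OP_PUSH: stack[sp++] = bytecode[pc + 1]; pc += 2; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; pc += 1; break;
        case OP_JUMP: pc = bytecode[pc + 1]; break;
        case OP_HALT: return sp ? stack[sp - 1] : 0;
        }
    }
}
```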

DynamoRIO with Logical PC
- Define a new PC as the pair of a native PC and a logical PC.
- This allows DynamoRIO to track information about jumps in the interpreted program.
- Traces are created for the interpreted program rather than for the interpreter itself.
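
A sketch of the pairing; notify_logical_pc() is an invented stand-in for whatever annotation call the real system exposes to the interpreter writer.

```c
/* Illustrative sketch of the logical-PC idea: the interpreter announces which
 * bytecode address it is about to execute, and the translator keys traces on
 * the pair (native PC, logical PC) instead of the native PC alone. */
typedef struct {
    void       *native_pc;   /* address inside the interpreter binary        */
    const void *logical_pc;  /* address of the guest bytecode being executed */
} dual_pc_t;

void notify_logical_pc(const void *logical_pc);   /* invented annotation call */

void dispatch_one(const unsigned char *bytecode, unsigned pc) {
    notify_logical_pc(&bytecode[pc]);   /* tell the translator where the guest
                                           program is before decoding         */
    /* ... switch on bytecode[pc] as in the interpreter loop above ... */
}
```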

Interpreter Optimizations
- Call-return matching
- Constant propagation
- Dead code removal
- Stack cleanup
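
An invented before/after example of how constant propagation and dead code removal pay off once a trace is specialized to a logical PC where the opcode is known:

```c
/* Once a trace is specialized to a logical PC whose opcode is known to be
 * OP_ADD, the opcode load, the comparison, and the untaken arm all become
 * dead and can be removed. */
enum { OP_ADD, OP_SUB };

/* Before specialization: the trace still tests the opcode every time. */
int step_generic(const unsigned char *bc, unsigned pc, int a, int b) {
    if (bc[pc] == OP_ADD)
        return a + b;
    return a - b;
}

/* After constant propagation (bc[pc] == OP_ADD) and dead code removal: */
int step_specialized(int a, int b) {
    return a + b;
}
```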

Optimizations

Optimizations

Valgrind
- Created in 2000
- Initially created as a free memory debugger for Linux
- Expanded into a general dynamic instrumenter
- Divided into a core system and skins
- Comes with some default skins: Memcheck, Addrcheck, Cachegrind, Helgrind, Nulgrind

Coverage
- Manages all code and libraries, even if source code is unavailable
- Cannot control system calls, but they can be observed
- Uses a JIT compiler

UCode
- The intermediate language used in Valgrind
- A two-address language
- The JIT compiler translates code from x86 to UCode and back to x86
- For example, a single x86 register-to-register add expands into UCode operations that load the two registers into virtual temporaries, add the temporaries, and write the result back

UCode cont.

Base Block
- Stores the state of the simulated CPU.
- The simulated CPU's registers are tracked in memory.
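
A hypothetical sketch of what "registers tracked in memory" means; the field names are illustrative, not Valgrind's actual base-block layout.

```c
/* Keeping the simulated CPU in memory: every guest register access in the
 * translated code becomes a load or store to a struct like this. */
#include <stdint.h>

typedef struct simulated_cpu {
    uint32_t eax, ebx, ecx, edx;   /* simulated general-purpose registers */
    uint32_t esi, edi, ebp, esp;
    uint32_t eip;                  /* simulated program counter           */
    uint32_t eflags;               /* simulated condition codes           */
} simulated_cpu_t;

static simulated_cpu_t guest_state;   /* one such block per simulated thread */
```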

Basic Block Translation
- Disassembly
- Optimization
- Instrumentation
- Register allocation
- Code generation

Basic Block Jumps
- If the target is known at translation time, a direct jump is inserted.
- Otherwise control returns to the dispatcher, which checks a small address cache.
- If the target is not in that cache, the entire translation table is checked.
- If it is not there either, control drops out to the Valgrind scheduler and the new target is translated (see the sketch below).
- Control is also returned to the Valgrind scheduler if a system call or client request needs to be handled.
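
A sketch of the dispatcher's two-level lookup; names and sizes are invented, not Valgrind's actual code.

```c
/* Probe a small direct-mapped cache first, then the full translation table;
 * a miss in both falls back to the scheduler, which translates the target. */
#include <stddef.h>
#include <stdint.h>

#define FAST_CACHE_SIZE 1024   /* invented size */

typedef struct { uintptr_t guest_addr; void *trans_addr; } cache_entry_t;
static cache_entry_t fast_cache[FAST_CACHE_SIZE];

void *full_table_lookup(uintptr_t guest_addr);   /* hypothetical helpers */
void *scheduler_translate(uintptr_t guest_addr);

void *lookup_target(uintptr_t guest_addr) {
    cache_entry_t *e = &fast_cache[(guest_addr >> 2) % FAST_CACHE_SIZE];
    if (e->guest_addr == guest_addr)
        return e->trans_addr;                    /* fast-cache hit        */

    void *t = full_table_lookup(guest_addr);     /* slower full lookup    */
    if (t == NULL)
        t = scheduler_translate(guest_addr);     /* translate on demand   */

    e->guest_addr = guest_addr;                  /* refill the fast cache */
    e->trans_addr = t;
    return t;
}
```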

Signal Processing
- An instruction is added at the beginning of every block to decrement a signal counter.
- When the counter hits 0, control drops back to the Valgrind scheduler.
- In the scheduler, any pending signals and thread switches are processed.
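
A sketch of the countdown that every block's prologue performs; the counter value and names are invented.

```c
/* Every translated block is prefixed with code equivalent to this check, so
 * the scheduler regains control periodically to deliver signals and switch
 * threads. */
static int blocks_until_yield = 10000;   /* invented value */

void return_to_scheduler(void);          /* hypothetical: deliver pending
                                            signals, maybe switch threads */

void block_prologue(void) {
    if (--blocks_until_yield == 0) {
        blocks_until_yield = 10000;
        return_to_scheduler();
    }
}
```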

System Call Procedure
- Save the Valgrind stack pointer.
- Copy the simulated registers, except the PC, into the real registers.
- Execute the system call.
- Copy the real registers back into the simulated registers.
- Restore the stack pointer.
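
The same five steps as a sketch, reusing the simulated_cpu struct from the earlier sketch; all helper names are invented.

```c
/* Forwarding a guest system call: materialize the simulated register state
 * into the real registers, enter the kernel, then copy the results back. */
struct simulated_cpu;                                 /* from the sketch above */

void *save_tool_stack_pointer(void);
void  load_real_registers_from(struct simulated_cpu *guest);   /* all but EIP */
void  execute_real_syscall(void);
void  store_real_registers_into(struct simulated_cpu *guest);
void  restore_tool_stack_pointer(void *sp);

void do_guest_syscall(struct simulated_cpu *guest) {
    void *saved_sp = save_tool_stack_pointer();    /* 1. save Valgrind's SP     */
    load_real_registers_from(guest);               /* 2. simulated -> real regs */
    execute_real_syscall();                        /* 3. run the system call    */
    store_real_registers_into(guest);              /* 4. real -> simulated regs */
    restore_tool_stack_pointer(saved_sp);          /* 5. restore the SP         */
}
```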

Floating Point Operations
- The FPU is not simulated the way the CPU is.
- When a floating-point instruction needs to run:
  - move the simulated registers into the real registers,
  - match the integer registers of the simulated CPU to the real CPU if needed,
  - copy the results back into the simulated CPU.

Client Requests
- A signal or query sent from the client program to a skin.
- A client request is compiled into a special no-op instruction sequence in the client's code.
- When Valgrind sees this sequence, it drops out and processes the request.
- Arguments can be passed to a client request, and the request can return a value to the client.
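
From the client program's side, requests look like ordinary macro calls. The example below uses macros from Valgrind's public headers, assuming the development headers are installed; outside Valgrind the macros are nearly free, and under Memcheck the check reports the uninitialized bytes.

```c
/* Client requests issued from the program being analyzed. The macros expand
 * to the special no-op sequence described above. */
#include <stdlib.h>
#include <valgrind/valgrind.h>
#include <valgrind/memcheck.h>

int main(void) {
    if (RUNNING_ON_VALGRIND)                        /* query: are we under Valgrind? */
        VALGRIND_PRINTF("running under Valgrind\n");

    char *buf = malloc(16);
    /* Ask Memcheck whether these bytes are initialized; it reports that
     * they are not, since the buffer was never written. */
    VALGRIND_CHECK_MEM_IS_DEFINED(buf, 16);
    free(buf);
    return 0;
}
```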

Self-Modifying Code
- Not supported by Valgrind.
- Valgrind does allow code regions to be marked as ignored.

Signals
- Valgrind does not allow programs to interact with signals directly; if it did, Valgrind could permanently lose control of the program.
- Instead, Valgrind intercepts the system calls used to register signal handlers.
- Every few thousand basic blocks, any pending signals are processed.

Threading
- Valgrind supports the pthreads model by providing a replacement for the libpthread library.
- Threads exist in user space; all threads run on a single kernel thread.

Execution Spaces
- User space
  - The vast majority of operations happen here
  - Covers all JIT-compiled code
- Core space
  - Signal handling
  - Pthread operations
  - Scheduling
- Kernel space
  - System calls
  - Process scheduling

Skins
- Needs: core services a skin wishes to use
- Trackable events: core-space events a skin wishes to be notified about
- Instrumentation: the skin reads and modifies UCode (a conceptual sketch follows)
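
A purely conceptual sketch of those three parts; every name below is invented for illustration and is not Valgrind's actual skin interface.

```c
/* Conceptual sketch of a skin: declare needs, register for trackable events,
 * and provide an instrumentation hook over UCode. */
typedef struct ucode_block ucode_block_t;            /* opaque UCode block  */
typedef enum { NEED_ERROR_REPORTING } skin_need_t;   /* invented need id    */
typedef enum { EVENT_MEM_WRITE } core_event_t;       /* invented event id   */

void core_declare_need(skin_need_t need);            /* invented core calls */
void core_track_event(core_event_t ev, void (*cb)(void *addr, unsigned len));

static void on_mem_write(void *addr, unsigned len) {
    /* a real skin would record or check the write here */
    (void)addr; (void)len;
}

void skin_init(void) {
    core_declare_need(NEED_ERROR_REPORTING);          /* "needs"            */
    core_track_event(EVENT_MEM_WRITE, on_mem_write);  /* "trackable events" */
}

/* "instrumentation": called for each translated block; the skin may read and
 * modify the UCode before code generation. */
ucode_block_t *skin_instrument(ucode_block_t *block) {
    return block;   /* a no-op skin, like Nulgrind, returns the block unchanged */
}
```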