Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems Kim Hazelwood Greg Lueck Robert Cohn.

Presentation transcript:

Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems Kim Hazelwood Greg Lueck Robert Cohn

Hazelwood – ISMM
Dynamic Binary Instrumentation
Inserts or modifies arbitrary instructions in executing binaries, e.g., instruction count:
  sub $0xff, %edx
  cmp %esi, %edx
  jle
  mov $0x1, %edi
  add $0x10, %eax
Inserted code: counter++;

Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
$ pin -t inscount.so -- /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
Count

How Does it Work?
- Generates and caches modified copies of instructions
- Modified (cached) instructions are executed in lieu of original instructions
[Diagram: EXE, Transform, Code Cache, Execute, Profile]
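The translate-once, execute-from-cache idea on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not Pin's implementation: the `blocks` table, the `path` list of executed block addresses, and the function name `run_with_code_cache` are all hypothetical, and instructions are plain strings rather than machine code.

```python
from typing import Dict, List, Tuple


def run_with_code_cache(
    blocks: Dict[int, List[str]], path: List[int]
) -> Tuple[int, int]:
    """Run `path` (a sequence of block addresses) through a toy code cache.

    Returns (instruction_count, number_of_translations).
    """
    code_cache: Dict[int, List[str]] = {}  # hypothetical cache: address -> modified copy
    translations = 0
    counter = 0

    for addr in path:
        if addr not in code_cache:
            # "Transform": build a modified copy with a counter bump
            # inserted before every original instruction.
            modified: List[str] = []
            for ins in blocks[addr]:
                modified.append("counter++")
                modified.append(ins)
            code_cache[addr] = modified
            translations += 1

        # "Execute": run the cached copy in lieu of the original block.
        for ins in code_cache[addr]:
            if ins == "counter++":
                counter += 1
            # Original instructions are not interpreted in this toy model.

    return counter, translations


blocks = {0x0: ["mov"], 0x4: ["add", "cmp", "jle"], 0x8: ["ret"]}
path = [0x0, 0x4, 0x4, 0x4, 0x8]  # the block at 0x4 runs three times
count, translations = run_with_code_cache(blocks, path)
print(count, translations)
```

The block at `0x4` is transformed once and then reused on each later visit, which is the reason for caching the modified copies.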

Why “Dynamic” Instrumentation?
Robustness!
- No need to recompile or relink
- Discover code at runtime
- Handle dynamically-generated code
- Attach to running processes
The Code Discovery Problem on x86
[Diagram of a code region: Instr 1, Instr 2, Instr 3, Jump Reg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8. Annotations: indirect jump to ??; data interspersed with code; pad for alignment]

Intel Pin
- A dynamic binary instrumentation system
- Easy-to-use instrumentation interface
- Supports multiple platforms
  – Four ISAs – IA32, Intel64, IPF, ARM
  – Four OSes – Linux, Windows, FreeBSD, MacOS
- Popular and well supported
  – 32,000+ downloads
  – 400+ citations
  – 500+ mailing list subscribers

Research Applications
- Gather profile information about applications
- Compare programs generated by competing compilers
- Generate a select stream of live information for event-driven simulation
- Add security features
- Emulate new hardware
- Anything and everything multicore

The Problem with Modern Tools
- Many research tools do not support multithreaded guest applications
- Providing support for MT apps is mostly straightforward
- Providing scalable support can be tricky!

Issues that Arise
- Gaining control of executing threads
- Determining what should be private vs. shared between threads
- Code cache maintenance and consistency
- Concurrent instruction writes
- Providing/handling thread-local storage
- Handling indirect branches
- Handling signals / system calls

The Pin Architecture
[Diagram labels: JIT Compiler, Syscall Emulator, Signal Emulator, Dispatcher, Instrumentation Code, Call-Back Handlers, Analysis Code, Code Cache, Pin, Pin Tool; threads T1 and T2 shown under Serialized and under Parallel]

Code Cache Consistency
Cached code must be removed for a variety of reasons:
- Dynamically unloaded code
- Ephemeral/adaptive instrumentation
- Self-modifying code
- Bounded code caches
[Diagram: EXE, Transform, Code Cache, Execute, Profile]

Motivating a Bounded Code Cache
[Chart: The Perl Benchmark]

Flushing the Code Cache
- Option 1: All threads have a private code cache (oops, doesn’t scale)
- Option 2: Shared code cache across threads
  – If one thread flushes the code cache, other threads may resume in stale memory

Naïve Flush
- Wait for all threads to return to the code cache
- Could wait indefinitely!
[Timeline diagram: Thread1, Thread2, and Thread3 alternate between the VM and code cache CC1; threads stall in the VM during the flush delay before resuming in CC2]

Generational Flush
- Allow threads to continue to make progress in a separate area of the code cache
- Requires a high water mark
[Timeline diagram: Thread1, Thread2, and Thread3 alternate between the VM and code caches CC1 and CC2 over time]
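One way to picture a generational flush is as bookkeeping over which threads are still executing in which generation of the cache. The sketch below is an assumption-laden illustration, not Pin's data structures: the class `GenerationalCodeCache` and its methods `enter`, `leave`, and `flush` are hypothetical names, threads are plain string ids, and the high water mark that would trigger `flush` before the cache is completely full is left out.

```python
from typing import Dict, List, Set


class GenerationalCodeCache:
    """Toy bookkeeping for a generational flush (hypothetical design)."""

    def __init__(self) -> None:
        self.current_gen = 0
        self.occupants: Dict[int, Set[str]] = {0: set()}  # generation -> threads inside
        self.thread_gen: Dict[str, int] = {}              # thread -> generation it runs in
        self.freed: List[int] = []                        # generations already reclaimed

    def enter(self, tid: str) -> int:
        """Thread leaves the VM and starts running code in the current generation."""
        gen = self.current_gen
        self.occupants[gen].add(tid)
        self.thread_gen[tid] = gen
        return gen

    def leave(self, tid: str) -> None:
        """Thread returns to the VM; reclaim its generation if it was the last one out."""
        gen = self.thread_gen.pop(tid)
        self.occupants[gen].discard(tid)
        self._reclaim(gen)

    def flush(self) -> None:
        """Retire the current generation without waiting for any thread."""
        old = self.current_gen
        self.current_gen += 1
        self.occupants[self.current_gen] = set()
        self._reclaim(old)

    def _reclaim(self, gen: int) -> None:
        # A generation is freed only once it is retired and empty.
        if gen < self.current_gen and not self.occupants[gen]:
            del self.occupants[gen]
            self.freed.append(gen)


cache = GenerationalCodeCache()
cache.enter("T1")
cache.enter("T2")
cache.flush()                       # generation 0 retired, but T1 and T2 still run in it
freed_after_flush = list(cache.freed)
cache.leave("T1")
gen_t1 = cache.enter("T1")          # T1 makes progress in the new generation
cache.leave("T2")                   # last occupant gone: generation 0 is reclaimed
print(freed_after_flush, gen_t1, cache.freed)
```

In this model no thread ever stalls: a flush only redirects future entries to a fresh generation, and the stale one is reclaimed when its last occupant leaves.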

Memory Scalability of the Code Cache
Ensuring scalability also requires carefully configuring the code stored in the cache
Trace Lengths
- First basic block is non-speculative, others are speculative
- Longer traces = fewer entries in the lookup table, but more unexecuted code
- Shorter traces = two off-trace paths at ends of basic blocks with conditional branches = more exit stub code

[Chart: Effect of Trace Length on Trace Count]

[Chart: Effect of Trace Length on Memory]

Rewriting Instructions
- Pin must regularly rewrite branches
- No atomic branch write on x86
- We use a neat trick*:
  “old” 5-byte branch → 2-byte self branch → n-2 bytes of “new” branch → “new” 5-byte branch
* Sundaresan et al. 2006
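The three-step rewrite can be traced on a byte buffer. The sketch below only simulates the byte sequence; it assumes (this is not stated on the slide) that each 2-byte write lands atomically, which is what lets another thread see either a self branch or a complete branch at every step. The helper names `encode_jmp` and `rewrite_branch` are hypothetical; the encodings used are the standard x86 `jmp rel32` (opcode `0xE9`) and `jmp rel8` with displacement -2 (`0xEB 0xFE`).

```python
import struct
from typing import List

SELF_BRANCH = bytes([0xEB, 0xFE])  # jmp rel8 with displacement -2: branches to itself


def encode_jmp(rel32: int) -> bytes:
    """Encode a 5-byte x86 near jump: opcode 0xE9 plus a little-endian rel32."""
    return b"\xE9" + struct.pack("<i", rel32)


def rewrite_branch(code: bytearray, offset: int, new_branch: bytes) -> List[bytes]:
    """Replace the 5-byte branch at `offset`, returning the bytes seen after each step."""
    assert len(new_branch) == 5
    states: List[bytes] = []

    # Step 1: overwrite the first 2 bytes with a self branch, so any thread
    # arriving here spins in place instead of decoding a half-written branch.
    code[offset:offset + 2] = SELF_BRANCH
    states.append(bytes(code[offset:offset + 5]))

    # Step 2: write the remaining n-2 (here 3) bytes of the new branch.
    code[offset + 2:offset + 5] = new_branch[2:]
    states.append(bytes(code[offset:offset + 5]))

    # Step 3: overwrite the self branch with the first 2 bytes of the new branch.
    code[offset:offset + 2] = new_branch[:2]
    states.append(bytes(code[offset:offset + 5]))
    return states


old = encode_jmp(0x100)
new = encode_jmp(0x12345678)
code = bytearray(old)
states = rewrite_branch(code, 0, new)
for s in states:
    print(s.hex(" "))
```

Every intermediate state either begins with the self branch or is the finished new branch, so a concurrent reader never executes a mix of old and new displacement bytes.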

Performance Results
- We use the SPEC OMP 2001 benchmarks
  – OMP_NUM_THREADS environment variable
- We compare
  – Native performance and scalability
  – Pin (no Pintool) performance scalability
  – Pin (lightweight Pintool) scalability: InsCount Pintool – counts instructions at BB granularity
  – Pin (middleweight Pintool) scalability: MemTrace Pintool – records memory addresses
  – Pin (heavyweight Pintool) scalability: CMP$im – collects memory addresses and applies a software model of the CMP cache
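Counting at basic-block (BB) granularity, as the InsCount Pintool is described as doing, can be contrasted with per-instruction counting in a toy model. This is a sketch under assumptions, not the Pintool's code: `blocks`, `path`, and both function names are hypothetical, and it relies on a basic block executing all of its instructions once entered.

```python
from typing import Dict, List, Tuple


def count_per_instruction(blocks: Dict[int, List[str]], path: List[int]) -> Tuple[int, int]:
    """One analysis call per executed instruction. Returns (count, analysis_calls)."""
    count = calls = 0
    for addr in path:
        for _ in blocks[addr]:
            count += 1
            calls += 1
    return count, calls


def count_per_basic_block(blocks: Dict[int, List[str]], path: List[int]) -> Tuple[int, int]:
    """One analysis call per executed block, adding the block's size each time."""
    count = calls = 0
    for addr in path:
        count += len(blocks[addr])
        calls += 1
    return count, calls


blocks = {0x0: ["mov"], 0x4: ["add", "cmp", "jle"], 0x8: ["ret"]}
path = [0x0, 0x4, 0x4, 0x4, 0x8]  # the block at 0x4 runs three times
per_ins = count_per_instruction(blocks, path)
per_bb = count_per_basic_block(blocks, path)
print(per_ins, per_bb)
```

Both variants report the same instruction count, but the block-level variant makes fewer analysis calls, which is why it serves as the lightweight case in the comparison.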

[Chart: Native Scalability of SPEC OMP 2001]

[Chart: Performance Scalability (No Instrumentation)]

[Chart: Performance Scalability (LightWeight Instrumentation)]

[Chart: Performance Scalability (MiddleWeight Instrumentation)]

[Chart: Performance Scalability (HeavyWeight Instrumentation)]

[Chart: Memory Scalability]

Summary
- Dynamic instrumentation tools are useful
- In the multicore era, we must provide support for MT application analysis and simulation
- Providing MT support in Pin was easy
- Making it robust and scalable was not easy