Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,

Presentation transcript:

PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs (slide 1)
Harish Patil, Cristiano Pereira, Mack Stallcup, Gregory Lueck, James Cownie
Software & Services Group, Intel Corporation
CGO 2010, Toronto, Canada

Non-Determinism (slide 2)
Program execution is not repeatable across runs
  – Interactions with environment (single-threaded)
  – Shared-memory interleaving (multi-threaded)
Source of many problems
  – Hard to predict and test behaviors -> leads to bugs
  – Very hard and unpleasant to debug
  – Breaks program analyses that rely on repeatability
Obstacle for adoption of parallel programming

Dealing with Non-Determinism (slide 3)
Eliminate it
  – Deterministic program execution enforced by runtime (e.g. constrained execution [ISCA’09])
Deterministic Replay
  – Let it be but capture and reproduce execution if needed
  – Every instruction gets same input as in original run
This paper: User-level Deterministic Replay
  – Implementation, challenges and usage examples
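The replay idea on this slide (every instruction gets the same input as in the original run) can be illustrated with a toy sketch. This is not PinPlay code: the `Recorder` and `Replayer` classes and the `nondet` helper are hypothetical names chosen for illustration, and the sketch works at the level of Python function calls rather than machine instructions.

```python
import random
import time


class Recorder:
    """Runs non-deterministic operations for real and logs their results."""

    def __init__(self):
        self.log = []

    def nondet(self, fn):
        value = fn()
        self.log.append(value)
        return value


class Replayer:
    """Returns logged results instead of re-running the operations."""

    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, fn):
        # fn is never called on replay; the logged value is injected instead.
        return next(self._values)


def program(env):
    # Two inputs that differ from run to run.
    stamp = env.nondet(time.perf_counter_ns)
    noise = env.nondet(lambda: random.randrange(1 << 30))
    return (stamp ^ noise) % 1000


recorder = Recorder()
original = program(recorder)
replayed = program(Replayer(recorder.log))
assert replayed == original
```

Because the program only sees its environment through `nondet`, replaying the log reproduces the original result regardless of the clock or the random generator state.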

Requirements (slide 4)
No OS or hardware changes
No changes in user environment
Manageable log sizes for long runs
Reasonable run-time overhead
Multi-threaded and multi-processed applications
Integration with other existing analysis tools (e.g. Dynamic analyzers, debuggers, profilers)
No assumptions about synchronization APIs

Rest of the Talk (slide 5)
Motivation & Requirements
PinPlay Overview
Usage Examples
Results
Summary

PinPlay: User-level deterministic replay and analysis (slide 6)
[Diagram, capture: Binary + Input run under PinPlay on the OS (Linux® or Windows®), producing Logs (pinballs) + Normal Program Output]
[Diagram, replay: Logs (pinballs) run under PinPlay on the OS (Linux® or Windows®), with Analysis Tools + Debuggers]
Run in application’s native environment
Replays user code
OS independent: cross-OS replay!
Easily integrates w/ other tools and debuggers

Replay Models (slide 7)
Parallel-capture and parallel-replay
  [Diagram: threads T0, T1, T2 captured by PinPlay into Logs (pinballs), then replayed by PinPlay from those logs]
Parallel-capture and isolated-replay
  [Diagram: threads T0, T1, T2 captured by PinPlay into Logs (pinballs), then replayed by PinPlay from those logs]

Information Captured For Replay (slide 8)
1. Subset of Memory Values
   – Shadow-memory to capture first reads without prior writes and OS side-effects automatically [Sigmetrics’06]
   – Values changed by remote threads
2. Initial registers and OS register side-effects: Signals/Exceptions/APCs/system calls
3. Code executed (user and libraries)
4. Position of code and stack
5. Output of some instructions (e.g. RDTSC)
6. Subset of shared-memory access interleaving (transitive opt. - FDR [ISCA’03])
[Diagram: of all memory values, the captured ones are reads without prior writes, OS side-effects used by app, and values from remote threads; all other values are not captured]
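Item 1 above can be sketched as follows. This is a simplified illustration under stated assumptions, not PinPlay's implementation: memory is modeled as a Python dictionary, and `LoggingMemory`, `ReplayMemory` and `external_write` are hypothetical names. The shadow holds the values the program is known to have; any read whose real value differs from the shadow (a first read, or a location changed by the OS or another thread) is logged and later injected at the same read index on replay.

```python
class LoggingMemory:
    """Real memory plus a shadow used to decide which reads must be logged."""

    def __init__(self, initial):
        self.mem = dict(initial)  # actual memory contents
        self.shadow = {}          # values the program itself wrote or already read
        self.log = []             # entries of (read_index, address, value)
        self.reads = 0

    def write(self, addr, value):
        self.mem[addr] = value
        self.shadow[addr] = value

    def external_write(self, addr, value):
        # OS or remote thread: changes memory but bypasses the shadow.
        self.mem[addr] = value

    def read(self, addr):
        value = self.mem[addr]
        if addr not in self.shadow or self.shadow[addr] != value:
            self.log.append((self.reads, addr, value))
            self.shadow[addr] = value
        self.reads += 1
        return value


class ReplayMemory:
    """Starts empty and injects logged values just before the matching read."""

    def __init__(self, log):
        self.mem = {}
        self.pending = {index: (addr, value) for index, addr, value in log}
        self.reads = 0

    def write(self, addr, value):
        self.mem[addr] = value

    def external_write(self, addr, value):
        pass  # external activity does not happen on replay

    def read(self, addr):
        if self.reads in self.pending:
            logged_addr, logged_value = self.pending[self.reads]
            self.mem[logged_addr] = logged_value
        self.reads += 1
        return self.mem[addr]


def program(m):
    a = m.read(0x10)            # first read without prior write: logged
    m.write(0x20, a + 1)
    b = m.read(0x20)            # program's own write: not logged
    m.external_write(0x20, 99)  # e.g. a system call filling a buffer
    c = m.read(0x20)            # differs from shadow: logged
    d = m.read(0x10)            # unchanged since last read: not logged
    return (a, b, c, d)


recorder = LoggingMemory({0x10: 7})
original = program(recorder)
replayed = program(ReplayMemory(recorder.log))
assert replayed == original
```

Only two of the four reads end up in the log, which is the point of capturing a subset of memory values rather than all of them.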

PinPlay Architecture (slide 9)
Capable of logging, replaying and relogging execution (recapture from a replaying run)
[Diagram, user land on top of the OS (Linux® or Windows®):]
  – Application code and data
  – Intel’s Pin (JIT compiler and instrumentor)
  – PinPlay Lib
      Logger: instrumentation and analysis to capture logs
      Replayer: instrumentation and analysis to inject side-effects
  – Your Pin-based Tool
  – pinball

Cross-OS Replay and Challenges (slide 10)
Log on one OS and replay on another
System call translations
  – Most OS activity does not happen on replay (only side-effects restored)
  – Semantics is translated across OSes (e.g. create thread)
Memory mapping
  – Problem: address space different across OSes
  – Solution: use Pin’s Fetch API to redirect code and memory operand rewriting to redirect data
[Diagram: code and data in the address space on Windows® are remapped (Remap code, Remap data) into the address space on Linux®]
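The memory-mapping problem above comes down to translating an address seen in the logged run into the address where the same bytes live during replay. The sketch below models only that lookup; it does not use Pin's Fetch API or operand rewriting, and the `Remapper` class, its methods and the example addresses are hypothetical.

```python
from bisect import bisect_right


class Remapper:
    """Maps addresses from the logged address space to the replay address space."""

    def __init__(self):
        self._starts = []   # sorted logged start addresses
        self._regions = []  # parallel list of (logged_start, size, replay_start)

    def add_region(self, logged_start, size, replay_start):
        i = bisect_right(self._starts, logged_start)
        self._starts.insert(i, logged_start)
        self._regions.insert(i, (logged_start, size, replay_start))

    def translate(self, addr):
        # Find the last region starting at or before addr.
        i = bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, size, replay_start = self._regions[i]
            if addr < start + size:
                return replay_start + (addr - start)
        raise KeyError(hex(addr))


remap = Remapper()
remap.add_region(0x00400000, 0x1000, 0x7F0000000000)  # code region (made-up addresses)
remap.add_region(0x00600000, 0x2000, 0x7F0000100000)  # data region (made-up addresses)

assert remap.translate(0x00400010) == 0x7F0000000010
assert remap.translate(0x00601FFF) == 0x7F0000101FFF
```

In a real tool the same translation would have to be applied both to instruction fetches and to memory operands, which is what the two mechanisms named on the slide address.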

Usage Example: Program Analysis (slide 11)
Sampling and checkpointing for simulation
  – One run for profiling and finding representative regions, another for checkpointing
  – Requirement: both runs must be identical
[Diagram: Multi-process MPI program -> PinPlay -> Logs (pinballs), one per-process pinball -> PinPlay + Profiler -> Representative Regions -> PinPlay + Checkpointer -> Checkpoints for simulation]
Pinballs are used to share workloads for Pin-based analyses among architects

Usage Example: Replay for Debugging (slide 12)
Capture a buggy run and replay under debugger
  – Guaranteed to reproduce the bug and helps root causing
  – Works w/ off-the-shelf unmodified debuggers (e.g. GDB)
  – PinPlay based tool extends GDB commands w/ your own
  – Limitation: debugger can’t change control-flow
Used to debug various multi-threaded applications
Also using it for in-house debugging of concurrency issues with a major database vendor
[Diagram: Logs (pinballs) and Binary feed the PinPlay Enabled Debugger Tool running on Intel’s Pin, which talks to GDB (unmodified) over the remote protocol]

Results (slide 13)
Columns: Benchmark/Application, Average Icount (Billions), Size (MB)
SPEC2006 (single-threaded) 92439
SPECOMP2001 (4-threaded openmp) 30791
McBench (4-threaded RMS)
MILC-8p (numerical simulator/MPI)
POP-8p (ocean circulation model/MPI)
WRF-8p (Weather Prediction/MPI)
EnergyApp-8p (Energy Exploration/MPI)
Isolated replay

Sources of Slowdown (slide 14)
Instrumentation of every memory operation to identify system call side-effects and log data
  – Could be done by OS at the cost of OS modification or OS-specific analysis (doesn’t work on Windows®)
Locks for shadow-memory accesses
  – Could be eliminated by using a shadow-copy per thread at the cost of significant increase in log sizes
Other optimizations possible (please look at the paper)
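One plausible reading of the lock versus per-thread shadow trade-off is sketched below; it is an assumption-laden model, not a description of PinPlay's internals. With one shared shadow, a value written by one thread is already known when another thread reads it, so nothing is logged, but the shared structure needs a lock in a real multi-threaded tool. With a private shadow per thread, no lock is needed, but every value produced by another thread looks new and gets logged. The `count_logged` function and the event format are hypothetical, and the trace is processed sequentially, so no actual locking appears here.

```python
def count_logged(events, per_thread):
    """Count reads that would be logged for a trace of (tid, op, addr, value)."""
    mem = {}
    shadows = {}
    logged = 0
    for tid, op, addr, value in events:
        key = tid if per_thread else "shared"
        shadow = shadows.setdefault(key, {})
        if op == "w":
            mem[addr] = value
            shadow[addr] = value
        else:
            current = mem[addr]
            if addr not in shadow or shadow[addr] != current:
                logged += 1
                shadow[addr] = current
    return logged


# Thread 0 writes a location twice; thread 1 reads it after each write.
events = [
    (0, "w", 0xA0, 1),
    (1, "r", 0xA0, None),
    (0, "w", 0xA0, 2),
    (1, "r", 0xA0, None),
    (0, "r", 0xA0, None),
]

shared_count = count_logged(events, per_thread=False)
private_count = count_logged(events, per_thread=True)
assert shared_count < private_count
```

In this small trace the shared shadow logs nothing while the per-thread shadows log both of thread 1's reads, which matches the direction of the log-size increase mentioned on the slide.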

Summary (slide 15)
User-level deterministic capture and replay
  – No OS changes, special hardware, or virtualization
  – Integrates w/ other Pin-tools for repeatable analysis and debugging
Replay occurs on any machine and works across OSes (Windows to Linux)
Pinballs are OS-independent and self-contained
  – Ideal for sharing workloads among researchers, for Pin-based analyses
We will release PinPlay libraries in future

Q&A (slide 16)