Chrysalis Analysis: Incorporating Synchronization Arcs in Dataflow-Analysis-Based Parallel Monitoring Michelle Goodstein*, Shimin Chen †, Phillip B. Gibbons.

Presentation transcript:

Chrysalis Analysis: Incorporating Synchronization Arcs in Dataflow-Analysis-Based Parallel Monitoring
Michelle Goodstein*, Shimin Chen†, Phillip B. Gibbons‡, Michael A. Kozuch‡ and Todd C. Mowry*
*Carnegie Mellon University, †HP Labs China, ‡Intel Labs Pittsburgh

Motivation
Software bugs are common, even in sequential code
Chip multiprocessors are increasing the importance of parallel software
Parallel software introduces new “species” of bugs
Bugs can lead to crashes, security exploits and other harm to the system
We would like to detect bugs before they cause harm
One solution: monitor programs at runtime using lifeguards

Dynamic Program Monitoring
Application is dynamically monitored by a lifeguard as it runs
 – Monitors each dynamic instruction
Lifeguard maintains a finite-state machine model of correct execution
 – Checks metadata to see if the program does something wrong
Ex: Is performing *p2 safe (e.g., is p2 untainted)?
[Figure: the application executes "taint p2" and then "*p2" in commit order. On "taint p2" the lifeguard updates p2's metadata (Tainted?) to 1.]

Dynamic Program Monitoring (check step)
Ex: Is performing *p2 safe (e.g., is p2 untainted)?
[Figure: when the application reaches "*p2", the lifeguard checks p2's metadata, finds it tainted (1) and reports "ERROR: metadata for p2 tainted".]
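The update and check steps on these two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lifeguard from the talk: the `(operation, pointer)` event tuples, the `run_taintcheck` name and the choice to treat unseen pointers as untainted are all invented for the example.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (operation, pointer name); hypothetical trace format


def run_taintcheck(trace: List[Event]) -> List[str]:
    """Process a sequential trace in commit order and collect error reports."""
    tainted: Dict[str, bool] = {}  # metadata: pointer -> "Tainted?" bit
    errors: List[str] = []
    for op, ptr in trace:
        if op == "taint":
            tainted[ptr] = True            # update metadata
        elif op == "untaint":
            tainted[ptr] = False           # update metadata
        elif op == "deref":
            # check metadata; unseen pointers are assumed untainted here
            if tainted.get(ptr, False):
                errors.append(f"ERROR: metadata for {ptr} tainted")
        else:
            raise ValueError(f"unknown operation: {op}")
    return errors


# The slide's example: taint p2, then dereference it.
slide_errors = run_taintcheck([("taint", "p2"), ("deref", "p2")])
```

With a single sequential trace the metadata is always exact, which is why the later slides focus on what changes when several threads run concurrently.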

Dynamically Monitoring Parallel Programs
Updating metadata is straightforward for sequential programs
Intuition: monitor parallel applications with parallel lifeguards
Parallel apps: inter-thread data dependences complicate lifeguards
 – Ideal: lifeguards process the trace in the app instructions' global commit order
   Cannot be measured using today's hardware
   Relaxed memory consistency models: no total order
 – Butterfly Analysis [ASPLOS 2010]: no inter-thread data dependences
[Figure: Thread 1 executes "untaint p; *p" and Thread 2 executes "taint p" in commit order; Threads 0-2 are monitored by Lifeguards 0-2.]

Butterfly Analysis: Dynamic Parallel Monitoring
Butterfly Analysis
+ Proceeds without capturing inter-thread data dependences
+ Supports relaxed memory consistency models
- Ignores explicit software synchronization
[Figure: Thread 1 executes "untaint p; *p" and Thread 2 executes "taint p" in commit order; Threads 0-2 are monitored by Lifeguards 0-2.]

Chrysalis Analysis: Generic Dynamic Dataflow Analysis Platform
Generic parallel dynamic dataflow analysis framework
 – Lifeguards can be built on top of generic dataflow examples
 – This talk: TaintCheck
Not only race detection: analyses are robust even when races are present
Behaves conservatively but correctly
 – When two conflicting metadata values are possible, assume the worst case
Incorporates high-level synchronization arcs
 – Our experiments: 97% reduction in false positives (relative to Butterfly)
[Figure: Thread 1 executes "lock L; untaint p; *p; unlock L" and Thread 2 executes "lock L; taint p; unlock L" in commit order; Threads 0-2 are monitored by Lifeguards 0-2.]

Roadmap for Remainder of Talk
Review of Butterfly Analysis
Highlight key changes to execution model to incorporate sync arcs
 – Vector clocks
 – Asymmetry
Illustrate research challenges and solutions
 – Calculating local/global states
 – Computing side-in/side-out primitives
Experimental evaluation
Template color coding: Butterfly, Chrysalis

Butterfly Analysis: Fundamentals
Key insight: only consider a window W of uncertainty
 – W must account for all buffering in the pipeline and memory system
   Large relative to ROB size and memory access latency
   Small relative to total execution
 – Our experiments: 1000s-10,000s of instructions/thread
[Figure: window W marked on a commit-order trace containing "untaint p", "*p" and "taint p".]

Butterfly Analysis: Reasoning About Concurrent Regions
Lifeguard must behave conservatively
Three possible orderings of A, B and C:
 – C falls between A and B: p tainted, *p unsafe
 – C falls before A: p untainted, *p safe
 – C falls after B: p untainted, *p safe
[Figure: concurrent region of execution traces. Thread 1 executes A: untaint p, B: *p; Thread 2 executes C: taint p.]
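The three orderings can be enumerated mechanically. The sketch below is a brute-force illustration of the conservative verdict, not how Butterfly Analysis is implemented (it uses dataflow summaries rather than enumeration); the event encoding and the helper names `interleavings` and `deref_is_unsafe` are assumptions made for the example.

```python
from typing import Iterator, List, Sequence, Tuple

Event = Tuple[str, str]  # (operation, pointer); hypothetical encoding


def interleavings(a: Sequence[Event], b: Sequence[Event]) -> Iterator[List[Event]]:
    """Yield every merge of a and b that preserves each thread's program order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def deref_is_unsafe(order: Sequence[Event]) -> bool:
    """Return True if some dereference happens while its pointer is tainted."""
    tainted = set()
    for op, ptr in order:
        if op == "taint":
            tainted.add(ptr)
        elif op == "untaint":
            tainted.discard(ptr)
        elif op == "deref" and ptr in tainted:
            return True
    return False


thread1 = [("untaint", "p"), ("deref", "p")]  # A, B
thread2 = [("taint", "p")]                    # C
orders = list(interleavings(thread1, thread2))

# Conservative verdict: report an error if any ordering is unsafe.
conservative_verdict = any(deref_is_unsafe(order) for order in orders)
```

Only the ordering A, C, B is unsafe, but because the lifeguard cannot rule it out, the dereference is flagged.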

Butterfly Analysis: Ignoring Sync Arcs Causes False Positives
Butterfly Analysis considers an impossible interleaving to be valid
[Figure: concurrent region of execution traces. Thread 1 executes D: lock L, A: untaint p, B: *p, E: unlock L; Thread 2 executes F: lock L, C: taint p, G: unlock L. Ignoring the lock, Butterfly Analysis still considers three orderings of A, B and C, including the one where C falls between A and B (p tainted, *p unsafe).]

Chrysalis Analysis: Incorporating Sync Arcs Improves Precision
Two possible orderings:
 – D A B E, then F C G: p untainted, *p safe
 – F C G, then D A B E: p untainted, *p safe
Under all possible orderings, *p is safe!
[Figure: concurrent region of execution traces. Thread 1 executes D: lock L, A: untaint p, B: *p, E: unlock L; Thread 2 executes F: lock L, C: taint p, G: unlock L.]
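Because both threads hold lock L for their whole sequence, each critical section is atomic with respect to the other, so only whole-section orderings remain. The sketch below checks both of them; it is an illustration under assumptions (the event encoding, the section lists and `deref_is_unsafe` are hypothetical), and it enumerates orderings directly rather than using the vector clocks the talk relies on.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Event = Tuple[str, str]  # (operation, operand); hypothetical encoding

# Critical sections from the slide, both protected by lock L.
section_t1: List[Event] = [("lock", "L"), ("untaint", "p"), ("deref", "p"), ("unlock", "L")]  # D A B E
section_t2: List[Event] = [("lock", "L"), ("taint", "p"), ("unlock", "L")]                   # F C G


def deref_is_unsafe(order: Sequence[Event]) -> bool:
    """Return True if some dereference happens while its pointer is tainted."""
    tainted = set()
    for op, operand in order:
        if op == "taint":
            tainted.add(operand)
        elif op == "untaint":
            tainted.discard(operand)
        elif op == "deref" and operand in tainted:
            return True
    return False  # lock/unlock events do not change taint metadata


# Mutual exclusion: the sections cannot overlap, so order them as whole units.
orders = [first + second for first, second in permutations([section_t1, section_t2])]
all_safe = not any(deref_is_unsafe(order) for order in orders)
```

Both orderings leave p untainted at the dereference, which matches the slide's conclusion that no error needs to be reported.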

Chrysalis Analysis: Incorporating Sync Arcs Into Butterfly Analysis
Chrysalis Analysis: generalize Butterfly Analysis to include sync arcs
+ Improved precision (compared to Butterfly Analysis)
+ Relaxed consistency models OK, no explicit hardware required
Research challenges solved
 – More complex thread execution model
 – More complex dataflow analysis framework
[Figure: Thread 1 executes D: lock L, A: untaint p, B: *p, E: unlock L; Thread 2 executes F: lock L, C: taint p, G: unlock L; Threads 0-2 are monitored by Lifeguards 0-2.]

Butterfly Analysis: A Brief Review
Consider an online execution trace
[Figure: commit-order trace containing "untaint p", "*p" and "taint p".]

Butterfly Analysis: Epochs Partition Thread Execution
Execution divided into epochs separated by at least W events/thread
[Figure: trace split into Epochs 0-4 along commit order, with W marked and the "taint p", "untaint p" and "*p" events shown.]

Epochs: Reasoning About Concurrency
From the perspective of the center epoch:
Most epochs are non-adjacent
 – Instructions in these epochs execute strictly before or strictly after
Two epochs are adjacent to the center epoch
3-epoch window of potentially concurrent instructions
Sliding window limited to 3 epochs
[Figure: epochs relative to the center epoch, with W marked and the "taint p", "untaint p" and "*p" events shown in commit order.]
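The epoch rule reduces to simple index arithmetic. The sketch below is a simplification under assumptions: it cuts a trace into fixed-size chunks of `w` events (the slides only require epochs to be separated by at least W events per thread), and the function names are invented for the example.

```python
from typing import List


def split_into_epochs(trace: List[str], w: int) -> List[List[str]]:
    """Cut one thread's trace into consecutive chunks of w events.

    Simplification: fixed-size chunks, with a possibly shorter final chunk.
    """
    if w <= 0:
        raise ValueError("w must be positive")
    return [trace[i:i + w] for i in range(0, len(trace), w)]


def potentially_concurrent(epoch_a: int, epoch_b: int) -> bool:
    """Events in the same or adjacent epochs may be concurrent."""
    return abs(epoch_a - epoch_b) <= 1


def strictly_before(epoch_a: int, epoch_b: int) -> bool:
    """Events in epoch_a are ordered before events in a non-adjacent later epoch."""
    return epoch_b - epoch_a >= 2


def window(center: int) -> List[int]:
    """The 3-epoch sliding window around a center epoch."""
    return [center - 1, center, center + 1]
```

For a center epoch 2, `window(2)` gives epochs 1 to 3; anything in epoch 0 is ordered strictly before it and can be folded into a summary instead of being tracked individually.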

Butterfly Analysis: Concurrency Within Three Epoch Window
[Figure: butterfly diagram for thread t over epochs l-1, l and l+1, showing head, body, tail and wings in commit order.]

Butterfly Analysis: Parallel Forward Dataflow Analysis
Extends standard dataflow primitives (In, Out, Gen, Kill)
Introduces two new primitives: Side-Out and Side-In
 – Side-Out: effects of concurrency a block exposes to other threads
 – Side-In: effects of concurrency other threads expose to a block
[Figure: butterfly diagram for thread t over epochs l-1, l and l+1, showing head, body, tail and wings in commit order.]

Butterfly Analysis: Parallel Dataflow Analysis
Two-pass lifeguard analysis over 3-epoch sliding window
Lifeguard threads execute in parallel
Maintains state
 – Global state: summarizes earlier epochs outside the window
 – Local state: global state augmented with info from the head
[Figure: butterfly diagram for thread t over epochs l-1, l and l+1, showing head, body, tail and wings in commit order.]

Generalizing Butterfly Analysis: Incorporating Sync Arcs
Butterfly Analysis: p conservatively tainted at *p in Thread 0, epoch 2
If mutual exclusivity is enforced, p must be untainted at *p!
 – Useful ordering information implied by sync is also lost
[Figure: two versions of the same trace across epochs 1 and 2. With sync: Thread 1 executes "lock L; taint p; unlock L" and Thread 0 executes "lock L; untaint p; *p; unlock L". As seen by Butterfly Analysis: Thread 1 executes "taint p" and Thread 0 executes "untaint p; *p".]

Chrysalis Analysis: Incorporating Sync Arcs To Improve Precision
Goal: incorporate synchronization-based happens-before arcs
Butterfly Analysis framework not general enough to handle arbitrary arcs…
[Figure: across epochs 1 and 2, Thread 1 executes "lock L; taint p; unlock L" and Thread 0 executes "lock L; untaint p; *p; unlock L" in commit order.]

Chrysalis Analysis: Incorporating Synchronization Arcs
Goal: incorporate synchronization-based happens-before arcs
Instrument sync with vector clocks to capture happens-before arcs
Calculate dataflow primitives (In, Out, Side-In, Side-Out, Gen, Kill) at boundaries
Chrysalis Analysis considers p untainted at *p in subblock
No longer simple, symmetric graph…
Asymmetry causes complexity
[Figure: across epochs 1 and 2, Thread 1 executes "lock L; taint p; unlock L" and Thread 0 executes "lock L; untaint p; *p; unlock L" in commit order.]
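The slide only states that synchronization is instrumented with vector clocks. As a reminder of how vector clocks expose a happens-before arc, here is a textbook-style sketch of the unlock/lock example; the function names, the tuple representation and the way the lock carries the releaser's clock are assumptions for illustration, not the instrumentation used in the talk.

```python
from typing import Tuple

Clock = Tuple[int, ...]  # one logical-time entry per thread


def tick(clock: Clock, tid: int) -> Clock:
    """Advance thread tid's own entry by one."""
    return tuple(c + 1 if i == tid else c for i, c in enumerate(clock))


def join(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum: everything known to a or b."""
    return tuple(max(x, y) for x, y in zip(a, b))


def happens_before(a: Clock, b: Clock) -> bool:
    """a happens-before b if a <= b element-wise and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: Clock, b: Clock) -> bool:
    return a != b and not happens_before(a, b) and not happens_before(b, a)


# Thread 1: lock L; taint p; unlock L
taint_clock = tick((0, 0), 1)          # clock at "taint p"
lock_clock = taint_clock               # unlock L leaves thread 1's clock on the lock

# Thread 0: lock L; untaint p; *p; unlock L
untaint_clock = tick(join((0, 0), lock_clock), 0)  # lock L joins the lock's clock

# A thread 0 event that never synchronized with thread 1, for contrast.
unsynced_clock = tick((0, 0), 0)
```

Here `happens_before(taint_clock, untaint_clock)` holds, so "taint p" is ordered before "untaint p", while the unsynchronized event stays concurrent with "taint p" and would still need conservative treatment.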

Butterfly Analysis: Recall Graph Model
Original Butterfly Analysis: from the perspective of the body
[Figure: butterfly diagram for thread t over epochs l-1, l and l+1, showing head, body, tail and wings in commit order.]

Butterfly Analysis: Creating Local State
Local State ( ) calculated by augmenting Global State with effects of Head
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings, containing "taint p", "untaint p" and "*p" in commit order.]

Butterfly Analysis: Calculating Side-Out
Each block in the wings has a side-out ( ) generated by the lifeguard
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings, containing "taint p", "untaint p" and "*p" in commit order; the side-out shown is p: 1, taint: {p}.]

Butterfly Analysis: Computing Side-In
All side-outs from the wings are combined into one side-in ( )
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings, containing "taint p", "untaint p" and "*p" in commit order; the combined side-in shown is p: 1, taint: {p}.]
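A heavily simplified sketch of the side-out and side-in steps for TaintCheck follows. It keeps only the "taint: {p}" part shown on the slides and ignores the rest of the dataflow equations; the block keys `(epoch, thread)`, the event encoding and the function names are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]    # (operation, pointer); hypothetical encoding
BlockId = Tuple[int, int]  # (epoch, thread)


def side_out(block: List[Event]) -> Set[str]:
    """Pointers this block may taint, as exposed to other threads."""
    return {ptr for op, ptr in block if op == "taint"}


def side_in(blocks: Dict[BlockId, List[Event]], epoch: int, thread: int) -> Set[str]:
    """Union of side-outs from the wings: other threads' blocks in epochs l-1, l, l+1."""
    combined: Set[str] = set()
    for (e, t), events in blocks.items():
        if t != thread and abs(e - epoch) <= 1:
            combined |= side_out(events)
    return combined


blocks: Dict[BlockId, List[Event]] = {
    (1, 0): [],
    (2, 0): [("untaint", "p"), ("deref", "p")],  # body: thread 0, epoch 2
    (3, 0): [],
    (0, 1): [("taint", "q")],                    # outside the 3-epoch window
    (1, 1): [("taint", "p")],                    # in the wings
    (2, 1): [],
}

body_side_in = side_in(blocks, epoch=2, thread=0)
```

The body sees `{"p"}` as its side-in, so a Butterfly-style lifeguard treats p as possibly tainted at the dereference; the taint of q lies outside the window and does not contribute.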

Chrysalis Analysis: Incorporating Sync Arcs
In general: sync introduces asymmetry/complexity, in body and wings
[Figure: butterfly diagram for thread t over epochs l-1, l and l+1, showing head, body, tail and wings in commit order, with sync arcs touching the head and body.]

Chrysalis Analysis: Calculating Local State
Highlighted blocks involved in local state computation for body
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings. A meet of the "taint p" state (p: 1, taint: {p}) and the "untaint p" state (p: 0, untaint: {p}) feeds the subblock containing *p.]
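For TaintCheck, the worst-case rule stated earlier in the talk (when two conflicting metadata values are possible, assume the worst case) suggests a meet that keeps a pointer tainted if any incoming state taints it. The sketch below shows only that merge; the `State` encoding and the `meet` function are assumptions, and how Chrysalis selects and orders predecessor states from the sync arcs is not reproduced here.

```python
from typing import Dict, Iterable

State = Dict[str, int]  # pointer -> 1 (tainted) or 0 (untainted)


def meet(states: Iterable[State]) -> State:
    """Worst-case merge: a pointer is tainted if any incoming state taints it."""
    merged: State = {}
    for state in states:
        for ptr, bit in state.items():
            merged[ptr] = max(merged.get(ptr, 0), bit)
    return merged


# The two conflicting states from the figure.
local_state = meet([{"p": 1}, {"p": 0}])
```

The merged state marks p as tainted, which is the conservative answer whenever neither predecessor is known to come last.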

Chrysalis Analysis: Calculating Local State
Calculating local state becomes increasingly complex with more arcs
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings, containing "taint p", "untaint p" and "*p"; a meet feeds the subblock containing *p.]

Chrysalis Analysis: Side-In/Side-Out
Arcs to/from the body alter the wings for each subblock, and the side-in
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings, containing "taint p", "untaint p" and "*p" in commit order.]

Chrysalis Analysis: Side-In/Side-Out (Reversed Arc)
Each subblock in the body can have a different set of wings
[Figure: butterfly for thread t over epochs l-1, l and l+1 with wings, containing "taint p", "untaint p" and "*p" in commit order.]

Contrast: Butterfly vs Chrysalis Analyses
Butterfly Analysis
 – Local state: calculated from head
 – One set of wings/side-in per body
 – “Simple” epoch summary updates global state
- False positives due to missed sync
Chrysalis Analysis (research challenges)
 – Local state: calculated from all predecessors
 – Wings/side-in differ for each body subblock
 – Epoch summary must consider partial order
   Includes arcs from epochs l+1 to l [extended epoch]
+ Improved precision
[Figure: two butterfly diagrams for thread t over epochs l-1, l and l+1, one per analysis, each showing head, body, tail and wings.]

Chrysalis Analysis: Parallel Forward Dataflow Analysis With Sync Arcs
General dataflow analysis framework
 – 2-pass lifeguards + global state update
 – Canonical examples: Reaching Definitions, Available Expressions
 – Memory/security lifeguards: TaintCheck, AddrCheck
Provably sound
 – Framework never misses an error (zero false negatives)
Efficient analysis
 – Use dataflow meet to avoid excessive recomputations
[Figure: butterfly diagram for thread t over epochs l-1, l and l+1, showing head, body, tail and wings in commit order.]

Experimental Methodology
Prototype built upon the Log-Based Architecture (LBA) framework [Chen08]
 – Full Butterfly & Chrysalis Analysis stacks implemented in software
 – Simulated hardware on shared-memory CMP using Simics
 – Used LBA for dynamic instruction traces, inserting epoch boundaries
 – Used LBA shim library to dynamically instrument synchronization calls
Measured 2 CMP configurations: {4,8} cores
 – Corresponds to {2,4} application and {2,4} lifeguard threads
4 SPLASH benchmarks: FFT, FMM, LU, BARNES
Comparison of Butterfly Analysis and Chrysalis Analysis

Performance Results: Chrysalis Slowdown (relative to Butterfly)
Average Slowdown: 1.9x
[Figure: slowdown chart.]

Precision Results: Potential Errors, Chrysalis vs Butterfly
Average Reduction in Reported Errors: 17.9x
[Figure: chart of potential errors reported by each analysis.]

Precision Results: Percent Reduction in Potential Errors
Average Reduction in Reported Errors: 97%
[Figure: chart of percent reduction in potential errors.]
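The two precision slides report the same kind of result in two units, a reduction factor and a percent reduction. For a single measurement the two are related by simple arithmetic, sketched below with hypothetical helper names. The slides do not say how each average was taken, so the two reported averages need not convert into one another exactly.

```python
def percent_reduction(factor: float) -> float:
    """Percent reduction implied by an N-fold reduction: 100 * (1 - 1/N)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 100.0 * (1.0 - 1.0 / factor)


def reduction_factor(percent: float) -> float:
    """N-fold reduction implied by a percent reduction: 1 / (1 - percent/100)."""
    if not 0.0 <= percent < 100.0:
        raise ValueError("percent must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


implied_percent = round(percent_reduction(17.9), 1)
implied_factor = round(reduction_factor(97.0), 1)
```

Applied to single values, a 17.9x reduction corresponds to about 94.4%, and a 97% reduction to about 33.3x.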

Chrysalis Analysis: Conclusions and Future Work
General-purpose parallel dynamic dataflow analysis platform
 – Provably sound (never misses an error)
Generalization retains advantages of Butterfly Analysis
 – Supports relaxed memory consistency models
 – Software framework
 – No detailed inter-thread data dependence tracking
TaintCheck implementation
 – Large reduction in false positives (average: 17.9x)
 – Modest relative increase in overhead (average: 1.9x)
Future work: build many sophisticated runtime analysis tools in the framework

Questions?