Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically.

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Michael Bond Milind Kulkarni Man Cao Minjia Zhang Meisam Fathi Salmi Swarnendu Biswas Aritra Sengupta Jipeng Huang Ohio State Purdue.
The Case for a SC-preserving Compiler Madan Musuvathi Microsoft Research Dan Marino Todd Millstein UCLA University of Michigan Abhay Singh Satish Narayanasamy.
An Case for an Interleaving Constrained Shared-Memory Multi-Processor Jie Yu and Satish Narayanasamy University of Michigan.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets Tayfun Elmas 1, Shaz Qadeer 2, Serdar Tasiran 1 1 Koç University, İstanbul,
A Rely-Guarantee-Based Simulation for Verifying Concurrent Program Transformations Hongjin Liang, Xinyu Feng & Ming Fu Univ. of Science and Technology.
OSDI ’10 Research Visions 3 October Epoch parallelism: One execution is not enough Jessica Ouyang, Kaushik Veeraraghavan, Dongyoon Lee, Peter Chen,
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
DRF x A Simple and Efficient Memory Model for Concurrent Programming Languages Dan Marino Abhay Singh Todd Millstein Madan Musuvathi Satish Narayanasamy.
CSE , Autumn 2011 Michael Bond.  Name  Program & year  Where are you coming from?  Research interests  Or what’s something you find interesting?
Parallel Inclusion-based Points-to Analysis Mario Méndez-Lojo Augustine Mathew Keshav Pingali The University of Texas at Austin (USA) 1.
SOS: Saving Time in Dynamic Race Detection with Stationary Analysis Du Li, Witawas Srisa-an, Matthew B. Dwyer.
ADVERSARIAL MEMORY FOR DETECTING DESTRUCTIVE RACES Cormac Flanagan & Stephen Freund UC Santa Cruz Williams College PLDI 2010 Slides by Michelle Goodstein.
DMITRI PERELMAN IDIT KEIDAR TRANSACT 2010 SMV: Selective Multi-Versioning STM 1.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
[ 1 ] Agenda Overview of transactional memory (now) Two talks on challenges of transactional memory Rebuttals/panel discussion.
Formalisms and Verification for Transactional Memories Vasu Singh EPFL Switzerland.
University of Michigan Electrical Engineering and Computer Science 1 Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software.
1 Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, “lazy” implementation.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
Cormac Flanagan UC Santa Cruz Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs Jaeheon Yi UC Santa Cruz Stephen Freund.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
RCDC SLIDES README Font Issues – To ensure that the RCDC logo appears correctly on all computers, it is represented with images in this presentation. This.
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
Accelerating Precise Race Detection Using Commercially-Available Hardware Transactional Memory Support Serdar Tasiran Koc University, Istanbul, Turkey.
School of Information Technologies Michael Cahill 1, Uwe Röhm and Alan Fekete School of IT, University of Sydney {mjc, roehm, Serializable.
Eraser: A Dynamic Data Race Detector for Multithreaded Programs STEFAN SAVAGE, MICHAEL BURROWS, GREG NELSON, PATRICK SOBALVARRO, and THOMAS ANDERSON Ethan.
Foundations of the C++ Concurrency Memory Model Hans-J. Boehm Sarita V. Adve HP Laboratories UIUC.
DoubleChecker: Efficient Sound and Precise Atomicity Checking Swarnendu Biswas, Jipeng Huang, Aritra Sengupta, and Michael D. Bond The Ohio State University.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin.
Efficient Deterministic Replay of Multithreaded Executions in a Managed Language Virtual Machine Michael Bond Milind Kulkarni Man Cao Meisam Fathi Salmi.
Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics Minjia Zhang, 1 Jipeng Huang, Man Cao, Michael D. Bond.
Data races, informally [More formal definition to follow] “race condition” means two different things Data race: Two threads read/write, write/read, or.
Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support Man Cao Minjia Zhang.
…and region serializability for all JESSICA OUYANG, PETER CHEN, JASON FLINN & SATISH NARAYANASAMY UNIVERSITY OF MICHIGAN.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.
CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution Tom Bergan Owen Anderson, Joe Devietti, Luis Ceze, Dan Grossman To appear.
AtomCaml: First-class Atomicity via Rollback Michael F. Ringenburg and Dan Grossman University of Washington International Conference on Functional Programming.
Aritra Sengupta, Man Cao, Michael D. Bond and Milind Kulkarni PPPJ 2015, Melbourne, Florida, USA Toward Efficient Strong Memory Model Support for the Java.
Explicitly Parallel Programming with Shared-Memory is Insane: At Least Make it Deterministic! Joe Devietti, Brandon Lucia, Luis Ceze and Mark Oskin University.
Prescient Memory: Exposing Weak Memory Model Behavior by Looking into the Future MAN CAO JAKE ROEMER ARITRA SENGUPTA MICHAEL D. BOND 1.
Lecture 20: Consistency Models, TM
Lightweight Data Race Detection for Production Runs
Healing Data Races On-The-Fly
An Operational Approach to Relaxed Memory Models
Speculative Lock Elision
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni
Threads Cannot Be Implemented As a Library
Specifying Multithreaded Java semantics for Program Verification
Anders Gidenstam Håkan Sundell Philippas Tsigas
Threads and Memory Models Hal Perkins Autumn 2011
Man Cao Minjia Zhang Aritra Sengupta Michael D. Bond
Changing thread semantics
Atomicity in Multithreaded Software
Threads and Memory Models Hal Perkins Autumn 2009
Lecture 22: Consistency Models, TM
Hybrid Transactional Memory
Design and Implementation Issues for Atomicity
Locking Protocols & Software Transactional Memory
Relaxed Consistency Part 2
Why we have Counterintuitive Memory Models
Compilers, Languages, and Memory Models
Lecture: Consistency Models, TM
Rethinking Support for Region Conflict Exceptions
Presentation transcript:

Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

Programming Language Semantics? Data Races C++ no guarantee of semantics – “catch-fire” semantics Java provides weak semantics

Weak Semantics T1 T2 a = new A(); init = true; if (init) a.field++; A a = null; boolean init = false;

Weak Semantics T1 T2 a = new A(); init = true; if (init) a.field++; No data depende nce

Weak Semantics T1 T2 a = new A(); init = true; if (init) a.field++; A a = null; boolean init = false;

Weak Semantics T1T2 init = true; a = new A(); if (init) a.field++;

Weak Semantics T1T2 init = true; a = new A(); if (init) a.field++; Null derefere nce

DRF0 Atomicity of synchronization-free regions for data-race-free programs Data races - no semantics C++, Java follow variants of DRF0 – Adve and Hill, ISCA, 1990

Need for Stronger Memory Models “The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in the foundation of our languages and systems…” Give better semantics to programs with data races Stronger memory models – Adve and Boehm, CACM, 2010

Sequential Consistency (SC) Shared memory accesses interleave arbitrarily while each thread maintains program order

Sequential Consistency DRF0 Sequential Consistency

An Example Program Under SC T1T2 buffer[pos++ ]= 5 buffer[pos++ ] = 6 int pos = 0 int [ ] buffer 0

An Example Program Under SC T1T2 t1 = pos buffer [t1] = 5 t1 = t1 + 1 pos = t1 t2 = pos buffer [t2] = 6 t2 = t2 + 1 pos = t2 0 int pos = 0 int [ ] buffer

An Example Program Under SC T1T2 t1 = pos buffer [t1] = 5 t1 = t1 + 1 pos = t1 t2 = pos buffer [t2] = 6 t2 = t2 + 1 pos = t2 Time int pos = 0 int [ ] buffer 0

An Example Program Under SC T1T2 t1 = pos buffer [t1] = 5 t1 = t1 + 1 pos = t1 t2 = pos buffer [t2] = 6 t2 = t2 + 1 pos = t2 Time pos = = 1 buffer 6 0 int pos = 0 int [ ] buffer 0

An Example Program Under SC T1 T2 buffer[pos++ ]= 5 buffer[pos++ ] = 6 pos = = 1 buffer pos = = 2 buffer buffer int pos = 0 int [ ] buffer 0

An Example Program Under SC T1 T2 buffer[pos++ ]= 5 buffer[pos++ ] = 6 SC execution pos = = 1 buffer 6 0 int pos = 0 int [ ] buffer 0

Programmer Assumption Atomicity of high- level operations

Can SC Eliminate Common Concurrency Bugs? “...programmers do not reason about correctness of parallel code in terms of interleavings of individual memory accesses…” SC does not prevent common concurrency bugs Data races dangerous even under SC – Adve and Boehm, CACM 2010

Run-time cost vs Strength Run-time cost Strength 1. Ouyang et al.... and region serializability for all. In HotPar, DRF0 Synchronization- free region serializability 1 SC

Run-time cost vs Strength Run-time cost Strength 1. Ouyang et al.... and region serializability for all. In HotPar, DRF0 SC Synchronization- free region serializability 1

Run-time cost vs Strength Run-time cost Strength Statically Bounded Region Serializability

Contribution Run-time cost Strength Statically Bounded Region Serializability EnfoRSer: An analysis to enforce SBRS practically Evaluation: Low run- time cost, eliminates real bugs

New Memory Model: Statically Bounded Region Serializability (SBRS)

Program Execution Behaviors Statically Bounded Region Serializability SC DRF0

Statically Bounded Region Serializability (SBRS) rel(lock) acq(lock) methodCall() Synchroniz ation operations Method calls Loop backedges

Statically Bounded Region Serializability (SBRS) rel(lock) acq(lock) methodCall()

Statically Bounded Region Serializability (SBRS) Statically and dynamicall y bounded Loop backedge s

Under SBRS T1T2 buffer[pos++ ]= 5 buffer[pos++ ] = 6 pos = 0 buffer 0

Under SBRS T1T2 t1 = pos buffer [t1] = 5 t1 = t1 + 1 pos = t1 t2 = pos buffer [t2] = 6 t2 = t2 + 1 pos = t2 pos = 0 buffer 0

Under SBRS T1T2 t1 = pos buffer [t1] = 5 t1 = t1 + 1 pos = t1 t1 = pos buffer [t1] = 6 t1 = t1 + 1 pos = t1 pos = = 2 buffer pos = 0 buffer 0 5 6

Under SBRS T1T2 t1 = pos buffer [t1] = 5 t1 = t1 + 1 pos = t1 t1 = pos buffer [t1] = 6 t1 = t1 + 1 pos = t1 pos = = 2 buffer 6 5 pos = 0 buffer 0

Bug Elimination x+= 42; if (o != null) {...= o.f; } buffer[pos++] = val; read–modify– write check before use multi- variable operation

EnfoRSer: A Hybrid Static- Dynamic Analysis to Enforce SBRS

EnfoRSer, an efficient enforcement of SBRS Compiler Transformations Runtime Enforcement Two-phase Locking

Basic Mechanism Y = = X Y =

Basic Mechanism Y = = X Y = Read lock Write lock

Basic Mechanism Y = = X Y =

Basic Mechanism Y = = X Y = Program access

Basic Mechanism Y = = X Y = Ownership transferred

Basic Mechanism X = = X Y =

Basic Mechanism Y = = X Y = X = = Y Z = T1T2 Conflict

Basic Mechanism Y = = X Y = X = = Y Z = T1T2 Deadlock

Basic Mechanism Y = = X Y = X = = Y Z = T1T2 Deadlock Lightweight reader-writer locks 2 Biased synchronization Lose ownership while acquiring locks 2. Bond et al. Octet: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA, 2013.

Basic Mechanism Y = = X Y = X = = Y Z = T1T2 Ownership transferred Two-phase locking violated

Basic Mechanism Y = = X Y = X = = Y Z = T1T2

Basic Mechanism Locks acquired Y = = X Y =

Challenges in Basic Mechanism Y = = X Y = Stores executed

EnfoRSer Atomicity Transformations Idempotent: Defer stores until all locks are acquired Speculation: Execute stores speculatively and roll back in case of a conflict

Idempotent Transformation X = = Y X =

Idempotence Transformation X = = Y X =

Idempotence Transformation … = Y … X = … Stores deferred

Idempotence Mechanism … = Y … X = … Conflict

Idempotence Mechanism … = Y … X = … Side-effect free

Idempotence Mechanism … = Y … X = … Execute stores

Idempotence Challenges = Y X = … X = = X

Idempotence Challenges = Y … X = = X Loads data dependent on stores

Idempotence Challenges = Y … X = = Z Aliasing between loads and stores Data dependence?

Speculation Transformation X = = Y X =

Speculation Transformation X = = Y Z = Z = old_Z X = old_X old_X = X old_X = Z Backup store values Generate roll- back code

Speculation Mechanism Z = old_Z X = old_X X = = Y Z = old_X = X old_X = Z Conflict Execute roll-back code

Speculation Mechanism Z = old_Z X = old_X X = = Y Z = old_X = X old_X = Z Conflict

Speculation Mechanism Z = old_Z X = = Y Z = old_X = X old_X = Z All locks acquired X = old_X

Speculation Challenges = Z X = Y =

Speculation Challenges = Z X = Y = Executed store

Speculation Challenges = Z X = Y = Executed store Conflict Y = old_Y X = old_X

Similar to Software Transactional Memory (STM)? Idempotent approach similar to STMs that defer stores until a transaction commits Speculation approach similar to STMs that execute stores but undo them if a transaction needs to abort

Similar to Software Transactional Memory (STM)? Idempotent approach similar STMs that defer stores until a transaction commit Speculation approach similar STMs that undo stores before a transaction abort EnfoRSer provides atomicity of statically bounded regions more efficiently!

Similar to Software Transactional Memory (STM)? Idempotent approach similar STMs that defer stores until a transaction commit Speculation approach similar STMs that undo stores before a transaction abort EnfoRSer provides atomicity of statically bounded regions more efficiently! Bounded regions: efficient code generation Short regions: conservative conflict detection

Implementation and Evaluation

Implementation Developed in Jikes RVM Code publicly available on Jikes RVM Research Archive

Experimental Methodology Benchmarks DaCapo 2006, 9.12-bach Fixed-workload versions of SPECjbb2000 and SPECjbb2005 Platform AMD Opteron system: 32 cores

Whole-Program Static Analysis Remove instrumentation from data-race- free accesses [Naik et al.’s 2006 race detection algorithm, Chord]

EnfoRSer: Run-time Performance % overhead over unmodified JVM 32% overhead on average

EnfoRSer: Run-time Performance % overhead over unmodified JVM 27% overhead on average

EnfoRSer: Run-time Performance % overhead over unmodified JVM pjbb2005, over 100% overhead Cao et al., WoDet 2014

EnfoRSer: Run-time Performance % overhead over unmodified JVM 53% overhead on average (speculation via log)

EnfoRSer: Run-time Performance with and without static race detection % overhead over unmodified JVM geomean w race det geomean w/o race det

Evaluation: Concurrency Errors Avoidance SBRS’s potential to eliminate concurrency bugs exposed on relaxed memory models

Avoiding Concurrency Errors DRF0 (AM)SCSBRS hsqldb6Infinite loopCorrect sunflow9Null pointer exception Correct jbb2000Corrupt output Correct jbb2000Infinite loopCorrect sorInfinite loopCorrect lufactInfinite loopCorrect moldynInfinite loopCorrect raytracerFails validation Correct AM = Adversarial Memory, Flanagan and Freund, PLDI 2010

Avoiding Concurrency Errors DRF0(AM)SCSBRS hsqldb6Infinite loopCorrect sunflow9Null pointer exception Correct jbb2000Corrupt output Correct jbb2000Infinite loopCorrect sorInfinite loopCorrect lufactInfinite loopCorrect moldynInfinite loopCorrect raytracerFails validation Correct AM = Adversarial Memory, Flanagan and Freund, PLDI 2010

Avoiding Concurrency Errors DRF0SCSBRS hsqldb6Infinite loopCorrect sunflow9Null pointer exception Correct jbb2000Corrupt output Correct jbb2000Infinite loopCorrect sorInfinite loopCorrect lufactInfinite loopCorrect moldynInfinite loopCorrect raytracerFails validation Correct AM = Adversarial Memory, Flanagan and Freund, PLDI 2010 Avoids all the errors exposed by AM.

Related Work Checks conflicts in bounded region DRFx, Marino et al., PLDI 2010 Checks conflicts in synchronization-free regions Conflict Exceptions, Lucia et al., ISCA 2010 Enforces atomicity of bounded regions Bulk Compiler, Ahn et al., MICRO 2009 Enforces atomicity of synchronization free regions … and region serializability for all, Ouyang et al., HotPar 2013 Requires customized hardware Requires additional cores

Conclusion Run-time cost Strength Statically Bounded Region Serializability EnfoRSer: An analysis to enforce SBRS practically Evaluation: Low run- time cost, eliminates real bugs