Getting Rid of Store-Buffers in TSO Analysis

Presentation transcript:

Getting Rid of Store-Buffers in TSO Analysis
Mohamed Faouzi Atig, Uppsala University, Sweden
Ahmed Bouajjani, LIAFA, University of Paris 7, France
Gennaro Parlato ✓, University of Southampton, UK

Sequential consistency memory model (SC)
- Write(var, val): sh_mem[var] ← val (immediately visible to all threads)
- Read(var): returns sh_mem[var]
- SC = actions of different threads are interleaved in any order; actions of the same thread keep their execution order
- WMM (weak memory models) = for performance reasons, modern multiprocessors reorder memory operations of the same thread
[Figure: threads T1 … Tn connected to a shared memory]

Total Store Ordering (TSO)
- Each thread has its own store-buffer (FIFO)
- Write(var, val): the pair (var ← val) is sent to the buffer
- Memory update = execution of a Write taken from some buffer
- Read(var) returns val if either
  - (var ← val) is the last write to var still in the store-buffer, or
  - the buffer does not contain any Write to var, and sh_mem(var) = val
- fence requires that the store-buffer is empty
[Figure: threads T1 … Tn, each with its own store-buffer M1 … Mn holding pending writes such as (x ← 4) (z ← 7) (y ← 3), in front of the shared memory]
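The store-buffer rules above can be sketched in a few lines of Python. This is only an illustrative model, not code from the talk: the class name `TSOThread` and its method names are assumptions, and `fence` is simplified to draining the buffer, whereas the slide only states that a fence requires the buffer to be empty.

```python
from collections import deque


class TSOThread:
    """One thread's view of memory under TSO (illustrative model only)."""

    def __init__(self, sh_mem):
        self.sh_mem = sh_mem      # shared memory: dict var -> value
        self.buffer = deque()     # FIFO store-buffer of (var, val) pairs

    def write(self, var, val):
        # A write is only appended to the buffer; shared memory is unchanged.
        self.buffer.append((var, val))

    def read(self, var):
        # The newest buffered write to var wins; otherwise read shared memory.
        for buffered_var, val in reversed(self.buffer):
            if buffered_var == var:
                return val
        return self.sh_mem[var]

    def memory_update(self):
        # Execute the oldest buffered write against shared memory.
        var, val = self.buffer.popleft()
        self.sh_mem[var] = val

    def fence(self):
        # Simplification (assumption): drain the buffer so it is empty afterwards.
        while self.buffer:
            self.memory_update()


sh_mem = {"x": 0, "y": 0}
t1, t2 = TSOThread(sh_mem), TSOThread(sh_mem)
t1.write("x", 4)
own_view = t1.read("x")      # t1 sees its own buffered write
other_view = t2.read("x")    # t2 still sees the old value in shared memory
t1.fence()
after_fence = t2.read("x")   # the write has now reached shared memory
```

The gap between `own_view` and `other_view` is exactly the behaviour that SC forbids and TSO allows.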

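Correct under SC -- Wrong under TSO: Dekker's mutual exclusion protocol
Thread 1:
  a: y := 1
  b: r1 := x
  c: if (r1 == 0) then
  d:   critical section
Thread 2:
  1: x := 1
  2: r2 := y
  3: if (r2 == 0) then
  4:   critical section
Bad schedule for TSO: Thread 1 runs a b c d while (y ← 1) is still in its store-buffer, so Thread 2 can also read y = 0: both threads are in the critical section!
[Figure: store-buffers holding the pending writes (y ← 1) and (x ← 1)]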
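The claim can be checked by brute force on this four-line fragment. The sketch below is an assumption-laden toy explorer, not the tool from the talk: the state encoding, `successors`, and `both_in_critical_section` are hypothetical names. It enumerates every interleaving, once with writes going straight to memory (SC) and once through per-thread FIFO buffers (TSO).

```python
def with_item(tup, i, value):
    """Return a copy of tuple tup with position i replaced by value."""
    return tup[:i] + (value,) + tup[i + 1:]


def successors(state, tso):
    """Yield all one-step successors. mem = (x, y); thread t sets mem[1 - t]
    and reads mem[t], so thread 0 is 'Thread 1' of the slide."""
    pcs, regs, mem, bufs = state
    for t in (0, 1):
        flag, other = 1 - t, t
        if pcs[t] == 0:                      # a / 1: flag := 1
            if tso:
                new_buf = bufs[t] + ((flag, 1),)
                yield (with_item(pcs, t, 1), regs, mem, with_item(bufs, t, new_buf))
            else:
                yield (with_item(pcs, t, 1), regs, with_item(mem, flag, 1), bufs)
        elif pcs[t] == 1:                    # b / 2: r := other flag
            val = mem[other]
            for var, buffered in bufs[t]:    # last buffered write wins
                if var == other:
                    val = buffered
            yield (with_item(pcs, t, 2), with_item(regs, t, val), mem, bufs)
        elif pcs[t] == 2:                    # c / 3: test; pc 3 = critical section
            yield (with_item(pcs, t, 3 if regs[t] == 0 else 4), regs, mem, bufs)
        if bufs[t]:                          # memory update from the buffer head
            (var, val), rest = bufs[t][0], bufs[t][1:]
            yield (pcs, regs, with_item(mem, var, val), with_item(bufs, t, rest))


def both_in_critical_section(tso):
    """Depth-first search for a state where both threads are at pc 3."""
    init = ((0, 0), (None, None), (0, 0), ((), ()))
    seen, stack = {init}, [init]
    while stack:
        state = stack.pop()
        if state[0] == (3, 3):
            return True
        for nxt in successors(state, tso):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


violation_under_sc = both_in_critical_section(tso=False)
violation_under_tso = both_in_critical_section(tso=True)
```

Under these assumptions the search finds no violation for SC and finds one for TSO, matching the slide.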

Verification for TSO?
- For finite-state programs, reachability is non-primitive recursive [Atig, Bouajjani, Burckhardt, Musuvathi, POPL'10]
- What shall we do?
- Symbolic representation of the store buffers? [Linden, Wolper, SPIN'10]: regular model-checking
- Our approach: reduce the analysis from TSO to SC; this can be done only with approximations …

What is this talk about
If we restrict to executions where each thread is executed at most k times with no interruption (for a fixed k), we can translate any concurrent program P_TSO (recursion, thread creation, heap, …) into another program P_SC such that:
- P_SC (under SC) simulates all possible executions of P_TSO (under TSO) where each thread is executed at most k times
- P_SC has no buffer at all! The store-buffers are simulated using 2k copies of the shared variables as locals
- P_SC has linear size in the size of P_TSO
Advantage: use off-the-shelf SC tools for the analysis of TSO programs

Code-to-code translation from TSO to SC

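k-round (for each thread) reachability
Run = $(T_{i_1} + M_{i_1})^{+}\,(T_{i_2} + M_{i_2})^{+} \cdots$, where the first factor is a round of $P_{i_1}$, the second a round of $P_{i_2}$, and so on.
A k-round run: $\forall i:\ \#\mathrm{rounds}(P_i) \le k$
[Figure: each process P_i consists of a thread T_i and its store-buffer M_i, in front of the shared memory]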
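As a small illustration of the k-round condition, the sketch below counts rounds in a schedule given as a list of process indices, where each entry stands for one action of that process (by its thread T_i or its buffer M_i). The function names are hypothetical, and a round is taken here to be a maximal block of consecutive actions of one process, which is an assumption about how to read the definition.

```python
from collections import Counter
from itertools import groupby


def rounds_per_process(schedule):
    """Count, for each process, its maximal blocks of consecutive actions."""
    return Counter(pid for pid, _ in groupby(schedule))


def is_k_round(schedule, k):
    """True iff every process has at most k rounds in the schedule."""
    return all(n <= k for n in rounds_per_process(schedule).values())


# Process 1 runs in three separate blocks, processes 2 and 3 in one each.
schedule = [1, 1, 2, 2, 2, 1, 3, 1]
counts = rounds_per_process(schedule)
```

Here `is_k_round(schedule, 2)` fails because process 1 needs three rounds, while `is_k_round(schedule, 3)` holds. Note that the bound is per process, so the total number of context switches still grows with the number of threads.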

Compositional reasoning
$[(T_i + M_i)^{*}]^{k}$
[Figure: rounds 0, 1 and 2 of a thread, each summarized by a pair (Mask_0, Buff_0), (Mask_1, Buff_1), (Mask_2, Buff_2)]

Getting rid of store-buffers
The store-buffer is replaced by the pairs (Mask_0, Buff_0), (Mask_1, Buff_1), (Mask_2, Buff_2), where
- Buff_i is a copy of the shared vars (as locals)
- Mask_i is a copy of the shared vars as Booleans (as locals)

Invariant
- Mask_i[var] = 1 iff there is a store for var in the store-buffer that updates the shared memory at round i
- Buff_i[var] then contains the last such value sent for var
[Figure: the store-buffer at each point of the simulation, e.g. (x ← 0) (y ← 1) (z ← 4) (y ← 7) (x ← 0) (x ← 4) (x ← 7) (x ← 3) (x ← 7) (y ← 5), split into the stores that reach memory at round 0, round 1 and round 2 and summarized by (Mask_0, Buff_0), (Mask_1, Buff_1), (Mask_2, Buff_2)]

Simulation
Before simulation: all Masks set to False; r_SC ← 0; r_TSO ← 0.
Simulation: all statements not involving shared vars are executed unchanged.
- Write(var, val): Mask_{r_TSO}[var] ← T; Buff_{r_TSO}[var] ← val; (Buff is also called Queue)
- Read(var): let i be the greatest index s.t. i >= r_SC and Mask_i[var] = 1; if such an i exists, return Buff_i[var], else return var;
- End of round (update shared vars): for all var, if Mask_{r_SC}[var] == 1 then var ← Buff_{r_SC}[var];
[Figure: rounds 0, 1 and 2 with their pairs (Mask_0, Buff_0), (Mask_1, Buff_1), (Mask_2, Buff_2)]
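The three rules can be tried out on a single thread with the following sketch. It is a hedged reading of the slide, not the authors' translation: `RoundSim` and its method names are hypothetical, the nondeterministic choice of a later round is exposed as an explicit call, and r_SC is advanced at the end of a round (an assumption). Rounds beyond k are outside the sketch.

```python
class RoundSim:
    """Simulate one thread's store-buffer with k + 1 (Mask, Buff) pairs."""

    def __init__(self, shared, k):
        self.shared = shared                                   # SC shared variables
        self.mask = [dict.fromkeys(shared, False) for _ in range(k + 1)]
        self.buff = [dict.fromkeys(shared, None) for _ in range(k + 1)]
        self.r_sc = 0     # round currently being simulated
        self.r_tso = 0    # round at which the next write reaches memory

    def guess_later_round(self):
        # One step of the nondeterministic "delay the next writes" choice.
        if self.r_tso < len(self.mask) - 1:
            self.r_tso += 1

    def write(self, var, val):
        self.mask[self.r_tso][var] = True
        self.buff[self.r_tso][var] = val

    def read(self, var):
        # Greatest round i >= r_sc holding a pending store to var, if any.
        for i in range(len(self.mask) - 1, self.r_sc - 1, -1):
            if self.mask[i][var]:
                return self.buff[i][var]
        return self.shared[var]

    def end_round(self):
        # Flush the stores scheduled for this round, then move on.
        for var in self.shared:
            if self.mask[self.r_sc][var]:
                self.shared[var] = self.buff[self.r_sc][var]
        self.r_sc += 1
        self.r_tso = max(self.r_tso, self.r_sc)


shared = {"x": 0, "y": 0}
sim = RoundSim(shared, k=2)
sim.write("x", 4)            # reaches memory at round 0
sim.guess_later_round()
sim.write("x", 7)            # reaches memory at round 1
seen_before = sim.read("x")  # the thread sees its latest pending store
memory_before = shared["x"]  # other threads still see the initial value
sim.end_round()
memory_after_round0 = shared["x"]
seen_after = sim.read("x")
sim.end_round()
memory_after_round1 = shared["x"]
```

Because r_TSO never decreases, stores are assigned to rounds in the order they were issued, which is what preserves the FIFO behaviour of the buffer in this sketch.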

Skeleton of the translation

Original program:
  Shared sh_vars;
  Thread_i()
  Begin
    locals l_vars;
    stmt_1;
    stmt_2;
    …
    stmt_n;
  end

Translated program: each thread gets the additional locals
  r_TSO, r_SC, sim, Mask_0, Buff_0, …, Mask_k, Buff_k;
  Init(); // initialize Masks to False, r_SC = 0, r_TSO = 0, sim = 0

Each statement is instrumented (* is a nondeterministic choice):
  stmt_j  →  before(); stmt_j; after();

  before() { // start round
    if (!sim) {
      lock;
      sim = 1;
      r_SC++;
      if (r_TSO < r_SC) r_TSO = r_SC;
    }
    while (*) r_TSO++;
  }

  after() {
    if (*) { // end round
      Update_shared(r_SC, Mask, Buff);
      sim = 0;
      unlock;
    }
  }

Characteristics of the translation
- For fixed k, P_SC is linear in the size of P_TSO
- 2k copies of the shared variables as locals (no store-buffer)
- P_SC and P_TSO are in the same class: no restriction on the programs is imposed
- The reachable shared states are the same in P_SC and P_TSO: a state S is reachable in P_TSO with at most k rounds per thread iff S is reachable in P_SC

Bounding Store Ages
- Observation: when r_SC = 1, (Mask_0, Buff_0) are not used any longer
- Reuse the Mask and Buff (Queue) variables
- Translation: (Mask_j, Buff_j) are used circularly (modulo k+1)
- k store-ages: unbounded rounds!
- Constraint: each write pair remains in the store-buffer for at most k rounds
[Figure: (Mask_0, Buff_0) (Mask_1, Buff_1) (Mask_2, Buff_2) (Mask_0, Buff_0) … reused in a cycle]
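A tiny sketch of why k+1 pairs suffice once store ages are bounded: if every store leaves the buffer within k rounds, the only rounds that can still hold pending stores are r_SC, …, r_SC + k, and these map to distinct slots modulo k+1. The helper names `slot` and `live_slots` are hypothetical; clearing a slot before it is reused is presumably required but is not spelled out on the slide.

```python
def slot(round_index, k):
    """Index of the (Mask, Buff) pair used for a given round."""
    return round_index % (k + 1)


def live_slots(r_sc, k):
    """Slots of all rounds that may still hold pending stores at round r_sc."""
    return [slot(r, k) for r in range(r_sc, r_sc + k + 1)]


k = 2
first_six = [slot(r, k) for r in range(6)]   # rounds 0..5 cycle through 3 slots
window_at_round_4 = live_slots(4, k)         # rounds 4, 5, 6
```

Since the live window never contains two rounds with the same slot, the number of rounds no longer has to be bounded, only the age of each store.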

How can we use this code-to-code translation?

Corollaries: decidability results for TSO reachability
Our code-to-code translation is a linear reduction TSO -> SC, so decidability is inherited from SC.

Schedules (k fixed) | Concurrent Boolean programs | Complexity | References
k store-ages | no recursion | Pspace |
k context-switches | recursion | Exptime | [Qadeer, Rehof, TACAS'05]
k round-robin | recursion; finite # threads / parameterized | Exptime | [Lal, Reps, CAV'08] [La Torre, P., Madhusudan, CAV'09] [La Torre, P., Madhusudan, CAV'10]
k rounds per thread | recursion, thread creation | 2-Expspace | [Atig, Bouajjani, Qadeer, TACAS'09]
k delay bound | recursion, thread creation | Exptime | [Emmi, Qadeer, Rakamaric, POPL'11]
k-compositional | recursion, thread creation | Exptime | [Bouajjani, Emmi, P., SAS'11]

Tools for SC → Tools for TSO (our code-to-code translation as a plug-in)
A convenient way to get new tools for TSO …
SC tools for concurrent programs:
- Bounded model checking: ESBMC (FSE'11), Poirot (by MSR), Storm (CAV'09), …
- Boolean programs: Boom, Boppo, Getafix (PLDI'09), jMoped [SPIN'08], …
- CHESS (MSR)
- Sequentialization + sequential tools

Experiments: mutual exclusion protocols
Tool: Poirot (by MSR). Loop unrolling: 2. D stands for delay bound.

Protocol | No fences (buggy for TSO), D=1 | With fences (correct for TSO), D=1 | With fences (correct for TSO), D=2
Dekker | 7 s | 6 s | 72 s
Lamport | 26 s | 110 s | 1608 s
Peterson | 5 s | 6 s | 47 s
Szymanski | 8 s | 6 s | 978 s

- Poirot: SMT-based bounded model-checker for SC programs
- Errors due to TSO discovered in a few seconds!
- Poirot can also be a model-checker for TSO!

Conclusions

Conclusions
We have proposed a code-to-code translation from TSO to SC that
- allows existing and future tools designed for SC to be used to analyze programs running under TSO
- is an under-approximation (error finding)
- restricts the analyzed runs in a way that is still useful for finding errors in programs
Beyond TSO? Generic approach?
Thanks!