Getting Rid of Store-Buffers in TSO Analysis
Mohamed Faouzi Atig, Uppsala University, Sweden
Ahmed Bouajjani, LIAFA, University of Paris 7, France
Gennaro Parlato, University of Southampton, UK

Sequential consistency memory model (SC)
- Write(var, val): sh_mem[var] ← val (immediately visible to all threads)
- Read(var): returns sh_mem[var]
- SC = actions of different threads are interleaved in any order; actions of the same thread keep their execution order
- WMM (weak memory models) = for performance reasons, modern multiprocessors reorder memory operations of the same thread
[Figure: threads T1 … Tn connected directly to the shared memory]

Total Store Ordering (TSO)
- Each thread has its own store-buffer (FIFO)
- Write(var, val): the pair (var ← val) is sent to the buffer
- Memory update = execution of a Write taken from some buffer
- Read(var) returns val if either
  - (var ← val) is the last write to var still in the thread's store-buffer, or
  - the buffer does not contain any Write to var, and sh_mem[var] = val
- fence requires that the store-buffer is empty
[Figure: threads T1 … Tn, each with its store-buffer M1 … Mn in front of the shared memory; example buffer contents (x ← 4) (z ← 7) (y ← 3) and (z ← 4) (y ← 4)]
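The TSO rules on this slide can be sketched as a toy memory model. This is only an illustration under assumptions: the class name `TSOMachine` and its method names are hypothetical and do not come from the talk; the fence is modelled as a check that the buffer is empty.

```python
from collections import deque


class TSOMachine:
    """Toy TSO memory: one FIFO store-buffer per thread in front of sh_mem."""

    def __init__(self, n_threads, init):
        self.sh_mem = dict(init)                      # shared memory
        self.buffers = [deque() for _ in range(n_threads)]

    def write(self, tid, var, val):
        # A write only enqueues the pair (var <- val) in the thread's buffer.
        self.buffers[tid].append((var, val))

    def read(self, tid, var):
        # Newest pending write to var in the thread's own buffer wins ...
        for pending_var, pending_val in reversed(self.buffers[tid]):
            if pending_var == var:
                return pending_val
        # ... otherwise the value comes from shared memory.
        return self.sh_mem[var]

    def memory_update(self, tid):
        # Execute the oldest buffered write of thread tid on shared memory.
        var, val = self.buffers[tid].popleft()
        self.sh_mem[var] = val

    def can_fence(self, tid):
        # A fence is enabled only when the thread's buffer is empty.
        return not self.buffers[tid]
```

With two threads, a write by thread 0 is visible to thread 0 at once but reaches thread 1 only after `memory_update(0)`.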

Correct under SC -- Wrong under TSO: Dekker's mutual exclusion protocol
Thread 1
  a: y := 1
  b: r1 := x
  c: if (r1 == 0) then
  d:   critical section
Thread 2
  1: x := 1
  2: r2 := y
  3: if (r2 == 0) then
  4:   critical section
Bad schedule for TSO: a b c d 1 2 3 4, with (y ← 1) and (x ← 1) still in the store-buffers: both threads are in the critical section!
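The difference between the two memory models on this protocol can be checked with a small exhaustive search. The sketch below is an illustration, not the authors' tool: `both_in_critical` and `_set` are hypothetical names, and each thread is reduced to one write followed by one read.

```python
def _set(tup, i, value):
    """Return a copy of tuple tup with position i replaced by value."""
    return tup[:i] + (value,) + tup[i + 1:]


def both_in_critical(tso):
    """Explore all schedules; True iff both threads can read 0 and enter."""
    progs = [("y", "x"), ("x", "y")]   # (variable written, variable read)
    init = ((0, 0), (None, None), ((), ()), (("x", 0), ("y", 0)))
    seen, stack = {init}, [init]
    while stack:
        pcs, regs, bufs, mem = stack.pop()
        if pcs == (2, 2) and regs == (0, 0):
            return True                # both threads passed their test
        memd = dict(mem)
        succs = []
        for t in (0, 1):
            wvar, rvar = progs[t]
            if pcs[t] == 0:            # the write step
                if tso:                # TSO: enqueue in the store-buffer
                    nb = bufs[t] + ((wvar, 1),)
                    succs.append((_set(pcs, t, 1), regs, _set(bufs, t, nb), mem))
                else:                  # SC: update shared memory directly
                    nm = dict(memd)
                    nm[wvar] = 1
                    succs.append((_set(pcs, t, 1), regs, bufs,
                                  tuple(sorted(nm.items()))))
            elif pcs[t] == 1:          # the read step
                pending = [v for (x, v) in bufs[t] if x == rvar]
                val = pending[-1] if pending else memd[rvar]
                succs.append((_set(pcs, t, 2), _set(regs, t, val), bufs, mem))
            if bufs[t]:                # memory update from the buffer head
                (x, v), rest = bufs[t][0], bufs[t][1:]
                nm = dict(memd)
                nm[x] = v
                succs.append((pcs, regs, _set(bufs, t, rest),
                              tuple(sorted(nm.items()))))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False
```

Calling `both_in_critical(tso=True)` finds the bad schedule, while `both_in_critical(tso=False)` reports that no SC interleaving lets both threads in.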

Verification for TSO?
- For finite-state programs, reachability is non-primitive recursive [Atig, Bouajjani, Burckhardt, Musuvathi, POPL'10]
- What shall we do?
  - Symbolic representation of the store-buffers? [Linden, Wolper, SPIN'10]: regular model checking
  - Our approach: reduce the analysis from TSO to SC
    - can be done only with approximations …

What is this talk about
If we restrict to executions where each thread is executed at most k times with no interruption (for a fixed k), we can translate any concurrent program P_TSO (recursion, thread creation, heap, …) into another program P_SC such that:
- P_SC (under SC) simulates all possible executions of P_TSO (under TSO) where each thread is executed at most k times
- P_SC has no buffer at all! The store-buffers are simulated using 2k copies of the shared variables as locals
- P_SC has linear size in the size of P_TSO
Advantage: use off-the-shelf SC tools for the analysis of TSO programs

Code-to-code translation from TSO to SC

k-round (for each thread) reachability
Run = (T_{i_1} + M_{i_1})^+ (T_{i_2} + M_{i_2})^+ …, where the j-th block is a round of P_{i_j}
A k-round run: ∀i, #rounds of P_i ≤ k
[Figure: threads T_1 … T_i with store-buffers M_1 … M_i in front of the shared memory; P_i groups T_i with M_i]
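The k-round condition can be checked on a concrete schedule. In this sketch a schedule is assumed to be a list of process indices, one entry per action (a thread step T_i or a memory update M_i); the function names are hypothetical.

```python
from itertools import groupby


def rounds_per_process(schedule):
    """Count the rounds of each process.

    A round of process i is a maximal block of consecutive entries equal to i.
    """
    counts = {}
    for pid, _block in groupby(schedule):
        counts[pid] = counts.get(pid, 0) + 1
    return counts


def is_k_round(schedule, k):
    """True iff every process has at most k rounds in the schedule."""
    return all(n <= k for n in rounds_per_process(schedule).values())
```

For the schedule `[1, 1, 2, 2, 2, 1, 3, 1]`, process 1 has three rounds, so the run is 3-round but not 2-round.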

Compositional reasoning
[(T_i + M_i)^*]^k
[Figure: the run of P_i split into round 0, round 1, round 2, with the pairs (Mask_0, Buff_0), (Mask_1, Buff_1), (Mask_2, Buff_2) attached to the rounds]

Getting rid of store-buffers
(Mask_0, Buff_0) (Mask_1, Buff_1) (Mask_2, Buff_2)
- Buff_i is a copy of the shared vars (as locals)
- Mask_i is a copy of the shared vars as Booleans (as locals)

Invariant, for the store-buffer at each time in the simulation:
- Mask_i[var] = 1 iff there is a store for var in the store-buffer that updates the shared memory at round i
- Buff_i[var] contains the last value sent for var among those stores
[Figure: a store-buffer holding (x ← 0) (y ← 1) (z ← 4) (y ← 7) (x ← 0) (x ← 4) (x ← 7) (x ← 3) (x ← 7) (y ← 5), split into the parts flushed at round 0, round 1 and round 2, next to Mask_0/Buff_0, Mask_1/Buff_1, Mask_2/Buff_2]

Simulation
Before simulation: Masks set to False; r_SC ← 0; r_TSO ← 0
Simulation:
- All statements not involving shared vars are executed unchanged
- Write(var, val): Mask_{r_TSO}[var] ← T; Buff_{r_TSO}[var] ← val
- Read(var): let i be the greatest index s.t. i >= r_SC and Mask_i[var] = 1; if such an i exists, return Buff_i[var], else return var
End of round (update shared vars): for all var, if Mask_{r_SC}[var] == 1 then var ← Buff_{r_SC}[var]
[Figure: (Mask_0, Buff_0) (Mask_1, Buff_1) (Mask_2, Buff_2) over round 0, round 1, round 2]
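The Write, Read and end-of-round rules can be sketched for a single thread as follows. This is a minimal illustration under assumptions, not the authors' translation: the class `RoundSimulator` and its method names are hypothetical, the nondeterministic increase of r_TSO is exposed as an explicit `delay()` call, and the start-of-round bookkeeping (r_SC++ and raising r_TSO to r_SC) is folded into `end_round()`.

```python
class RoundSimulator:
    """Simulate one thread's store-buffer with per-round Mask/Buff copies."""

    def __init__(self, n_rounds, shared):
        self.shared = shared          # dict playing the role of SC memory
        self.n_rounds = n_rounds
        # Mask_i[var]: is there a pending store to var flushed at round i?
        self.mask = [dict.fromkeys(shared, False) for _ in range(n_rounds)]
        # Buff_i[var]: last value sent for var among those stores.
        self.buff = [dict.fromkeys(shared) for _ in range(n_rounds)]
        self.r_sc = 0                 # current round
        self.r_tso = 0                # round at which new writes are flushed

    def delay(self):
        """Stand-in for the nondeterministic 'r_TSO++' of the translation."""
        if self.r_tso + 1 >= self.n_rounds:
            raise ValueError("no later round available")
        self.r_tso += 1

    def write(self, var, val):
        self.mask[self.r_tso][var] = True
        self.buff[self.r_tso][var] = val

    def read(self, var):
        # Greatest round index i >= r_SC with a pending store to var.
        for i in range(self.n_rounds - 1, self.r_sc - 1, -1):
            if self.mask[i][var]:
                return self.buff[i][var]
        return self.shared[var]

    def end_round(self):
        # Flush the stores scheduled for the current round to shared memory.
        for var, pending in self.mask[self.r_sc].items():
            if pending:
                self.shared[var] = self.buff[self.r_sc][var]
        # Assumption: move to the next round here rather than in before().
        self.r_sc += 1
        self.r_tso = max(self.r_tso, self.r_sc)
```

For example, writing x twice with a `delay()` in between leaves the first value to be flushed at the end of round 0 and the second at the end of round 1, while the thread's own reads already see the second value.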

Skeleton of the translation
Original program:
  Shared sh_vars;
  Thread_i()
  Begin
    locals l_vars;
    stmt_1;
    stmt_2;
    …
    stmt_n;
  end
Translation:
  added variables: r_TSO, r_SC, sim, Mask_0, Buff_0, …, Mask_k, Buff_k;
  Init(); // initialize Masks to False, r_SC = 0, r_TSO = 0, sim = 0
  each stmt_j becomes: before(); stmt_j; after();
  before() {                  // start round
    if (!sim) {
      lock; sim = 1; r_SC++;
      if (r_TSO < r_SC) r_TSO = r_SC;
    }
    while (*) r_TSO++;        // * = nondeterministic choice
  }
  after() {
    if (*) {                  // end round
      Update_shared(r_SC, Mask, Buff);
      sim = 0; unlock;
    }
  }

Characteristics of the translation
- For fixed k, P_SC is linear in the size of P_TSO
- 2k copies of the shared variables as locals (no store-buffer)
- P_SC and P_TSO are in the same class: no restriction on the programs is imposed
- The reachable shared states are the same in P_SC and P_TSO: a state S is reachable in P_TSO with at most k rounds per thread iff S is reachable in P_SC

Bounding Store Ages
- Observation: when r_SC = 1, (Mask_0, Buff_0) are not used any longer
- Reuse the Mask and Buff variables
- Translation: (Mask_j, Buff_j) are used circularly (modulo k+1)
- k store-ages: unbounded rounds!
- Constraint: each write pair remains in the store-buffer for at most k rounds
(Mask_0, Buff_0) (Mask_1, Buff_1) (Mask_2, Buff_2) (Mask_0, Buff_0) …
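Why k+1 slots suffice can be seen from the index arithmetic. The sketch below assumes one reading of the constraint, namely that a write issued in round r_SC is flushed no later than round r_SC + k; the helper names `slot` and `live_slots` are hypothetical.

```python
def slot(round_index, k):
    """Index of the (Mask, Buff) pair used for a given round, modulo k+1."""
    return round_index % (k + 1)


def live_slots(r_sc, k):
    """Slots of the rounds that may still hold pending writes.

    Assumption: a write stays buffered for at most k rounds, so only
    rounds r_sc, r_sc + 1, ..., r_sc + k can have pending stores.
    """
    return [slot(r, k) for r in range(r_sc, r_sc + k + 1)]
```

With k = 2 this reproduces the pattern on the slide: round 3 reuses the pair of round 0, and at any time the k+1 live rounds occupy distinct slots.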

How can we use this code-to-code translation?

Corollaries: decidability results for TSO reachability
Our code-to-code translation is a linear reduction TSO -> SC, so TSO reachability inherits decidability from SC.
Schedules (k fixed) | Concurrent Boolean programs | Complexity | References
k-store-ages | no recursion | Pspace |
k context-switches | recursion | Exptime | [Qadeer, Rehof, TACAS'05]
k round-robin | recursion; finite # threads or parameterized | Exptime | [Lal, Reps, CAV'08] [La Torre, P., Madhusudan, CAV'09] [La Torre, P., Madhusudan, CAV'10]
k-rounds per thread | recursion, thread-creation | 2-Expspace | [Atig, Bouajjani, Qadeer, TACAS'09]
k-delay bound | recursion, thread-creation | Exptime | [Emmi, Qadeer, Rakamaric, POPL'11]
k-compositional | recursion, thread-creation | Exptime | [Bouajjani, Emmi, P., SAS'11]

Tools for SC -> Tools for TSO (our code-to-code translation as a plug-in)
A convenient way to get new tools for TSO …
SC tools for concurrent programs:
- Bounded model checking: ESBMC (FSE'11), Poirot (by MSR), Storm (CAV'09), …
- Boolean programs: Boom, Boppo, GETAFIX (PLDI'09), jMoped [SPIN'08], …
- CHESS (MSR)
- Sequentialization + sequential tools

Experiments
Mutual exclusion protocols, analyzed with POIROT (by MSR); loop unrolling: 2; D stands for delay bound.
Protocol | No fences (buggy for TSO), D=1 | With fences (correct for TSO), D=1 | With fences (correct for TSO), D=2
Dekker | 7 s | 6 s | 72 s
Lamport | 26 s | 110 s | 1608 s
Peterson | 5 s | 6 s | 47 s
Szymanski | 8 s | 6 s | 978 s
- POIROT: SMT-based bounded model checker for SC programs
- Errors due to TSO discovered in a few seconds!
- POIROT can also be a model checker for TSO!

Conclusions

Conclusions
We have proposed a code-to-code translation from TSO to SC
- it allows existing and future tools designed for SC to be used to analyze programs running under TSO
- under-approximation (error finding): the restrictions imposed on the analyzed runs are useful for finding errors in programs
Beyond TSO? Generic approach?
Thanks!
