Martin Vechev (IBM Research), Michael Kuperstein (Technion), Eran Yahav (Technion). (FMCAD’10, PLDI’11)

Presentation transcript:


Textbook Example: Peterson’s Algorithm
Specification: mutual exclusion over critical section
Process 0:
    while (true) {
        store ent0 = true;
        store turn = 1;
        do {
            load e = ent1;
            load t = turn;
        } while (e == true && t == 1);
        // Critical Section here
        store ent0 = false;
    }
Process 1:
    while (true) {
        store ent1 = true;
        store turn = 0;
        do {
            load e = ent0;
            load t = turn;
        } while (e == true && t == 0);
        // Critical Section here
        store ent1 = false;
    }

Beyond Textbooks: Relaxed Memory Models
Specification: mutual exclusion over critical section
Process 0:
    while (true) {
        store ent0 = true;
        store turn = 1;
        do {
            load e = ent1;
            load t = turn;
        } while (e == true && t == 1);
        // Critical Section here
        store ent0 = false;
    }
Process 1:
    while (true) {
        store ent1 = true;
        store turn = 0;
        do {
            load e = ent0;
            load t = turn;
        } while (e == true && t == 0);
        // Critical Section here
        store ent1 = false;
    }
Under relaxed memory models:
- Re-ordering of operations
- Non-atomic stores

Memory Fences
- Enforce order… at a cost
- Fences are expensive
  - 10s-100s of cycles
  - collateral damage (e.g., prevent compiler opts)
  - example: removing a single fence yields 3x speedup in a work-stealing queue
- Required fences depend on memory model
- Where should I put fences?

Where should I put fences?
"On the one hand, memory barriers are expensive (100s of cycles, maybe more), and should be used only when necessary. On the other, synchronization bugs can be very difficult to track down, so memory barriers should be used liberally, rather than relying on complex platform-specific guarantees about limits to memory instruction reordering."
(Herlihy and Shavit, The Art of Multiprocessor Programming)

May Seem Easy…
Specification: mutual exclusion over critical section
Process 0:
    while (true) {
        store ent0 = true;
        fence;
        store turn = 1;
        fence;
        do {
            load e = ent1;
            load t = turn;
        } while (e == true && t == 1);
        // Critical Section here
        store ent0 = false;
    }
Process 1:
    while (true) {
        store ent1 = true;
        fence;
        store turn = 0;
        fence;
        do {
            load e = ent0;
            load t = turn;
        } while (e == true && t == 0);
        // Critical Section here
        store ent1 = false;
    }
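To make the effect of the fences concrete, here is a minimal sketch of an explicit-state search over one pass of Peterson's protocol. It assumes a TSO-like model with one FIFO store buffer per process; the instruction encoding and the names `program`, `successors`, and `mutual_exclusion_violated` are illustrative and are not taken from the talk or from Fender.

```python
from collections import deque

VARS = ("ent0", "ent1", "turn")  # shared variables, all initially 0

def replace(tup, index, value):
    return tup[:index] + (value,) + tup[index + 1:]

def program(i, fences):
    """One pass of Peterson's protocol for process i (assumed encoding)."""
    other = 1 - i
    code = [("store", "ent%d" % i, 1)]
    if fences:
        code.append(("fence",))
    code.append(("store", "turn", other))
    if fences:
        code.append(("fence",))
    spin_target = len(code)
    code += [
        ("load", 0, "ent%d" % other),  # e := ent_other
        ("load", 1, "turn"),           # t := turn
        ("spin", spin_target, other),  # retry while e == 1 and t == other
        ("cs",),                       # critical section
        ("store", "ent%d" % i, 0),
    ]
    return code

def successors(state, progs):
    pcs, regs, bufs, mem = state
    for p in (0, 1):
        buf = bufs[p]
        if buf:  # the memory system may flush the oldest buffered store
            var, val = buf[0]
            new_mem = tuple(val if v == var else m for v, m in zip(VARS, mem))
            yield pcs, regs, replace(bufs, p, buf[1:]), new_mem
        if pcs[p] == len(progs[p]):
            continue
        ins = progs[p][pcs[p]]
        nxt = replace(pcs, p, pcs[p] + 1)
        if ins[0] == "store":  # stores go to the local buffer first
            yield nxt, regs, replace(bufs, p, buf + ((ins[1], ins[2]),)), mem
        elif ins[0] == "fence":  # a fence can only pass an empty buffer
            if not buf:
                yield nxt, regs, bufs, mem
        elif ins[0] == "load":  # newest own pending store wins, else memory
            pending = [val for var, val in buf if var == ins[2]]
            value = pending[-1] if pending else mem[VARS.index(ins[2])]
            yield nxt, replace(regs, p, replace(regs[p], ins[1], value)), bufs, mem
        elif ins[0] == "spin":
            e, t = regs[p]
            if e == 1 and t == ins[2]:
                yield replace(pcs, p, ins[1]), regs, bufs, mem
            else:
                yield nxt, regs, bufs, mem
        else:  # "cs": leave the critical section
            yield nxt, regs, bufs, mem

def mutual_exclusion_violated(fences):
    """True if a state with both processes at the critical section is reachable."""
    progs = [program(0, fences), program(1, fences)]
    init = ((0, 0), ((0, 0), (0, 0)), ((), ()), (0, 0, 0))
    seen, work = {init}, deque([init])
    while work:
        state = work.popleft()
        pcs = state[0]
        if all(pcs[p] < len(progs[p]) and progs[p][pcs[p]] == ("cs",) for p in (0, 1)):
            return True
        for nxt in successors(state, progs):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False
```

In this model, `mutual_exclusion_violated(False)` finds a violation (both processes read the other's `ent` flag from memory while their own stores are still buffered), and `mutual_exclusion_violated(True)` does not. The search is finite only because the outer `while(true)` loop is dropped, which is a simplification of the slide's program.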

Chase-Lev Work-Stealing Queue
    int take() {
        long b = bottom - 1;
        item_t *q = wsq;
        bottom = b;
        long t = top;
        if (b < t) {
            bottom = t;
            return EMPTY;
        }
        int task = q->ap[b % q->size];
        if (b > t)
            return task;
        if (!CAS(&top, t, t + 1))
            return EMPTY;
        bottom = t + 1;
        return task;
    }
    void push(int task) {
        long b = bottom;
        long t = top;
        item_t *q = wsq;
        if (b - t >= q->size - 1) {
            wsq = expand();
            q = wsq;
        }
        q->ap[b % q->size] = task;
        bottom = b + 1;
    }
    int steal() {
        long t = top;
        long b = bottom;
        item_t *q = wsq;
        if (t >= b)
            return EMPTY;
        int task = q->ap[t % q->size];
        if (!CAS(&top, t, t + 1))
            return ABORT;
        return task;
    }
Specification: no lost items, no phantom items, memory safety

Fender
- Help the programmer place fences
- Find optimal fence placement
- Ideal setting for synthesis
- Restrict non-determinism s.t. program stays within set of safe executions

Goal
[Diagram: Program P, Specification S, and Memory Model M are inputs to FENDER; the output is Program P’ with fences.]
- P’ satisfies the specification S under M

Naïve Approach: Recipe
- Compute reachable states for the program
- Compute constraints on execution that guarantee that all “bad states” are avoided
- Implement the constraints with fences
Bad News [Atig et al., POPL’10]
- Reachability undecidable for RMO
- Non-primitive recursive complexity for TSO/PSO

Store Buffers
[Figure: processes P0 and P1, each with store buffers, on either side of main memory; the buffers and main memory hold ent0, ent1, and turn.]
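The store-buffer picture can be sketched as a small data structure. This is a simplified TSO-style model with one FIFO buffer per process; the class name `BufferedProcess` and its methods are hypothetical and only illustrate the mechanism.

```python
from collections import deque

class BufferedProcess:
    """A process whose stores pass through a FIFO store buffer (assumed model)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: variable name -> value
        self.buffer = deque()  # pending (variable, value) stores, oldest first

    def store(self, var, value):
        # A store is not visible to other processes until it is flushed.
        self.buffer.append((var, value))

    def load(self, var):
        # A process sees its own most recent pending store, else main memory.
        for pending_var, value in reversed(self.buffer):
            if pending_var == var:
                return value
        return self.memory[var]

    def flush_one(self):
        # The oldest pending store reaches main memory.
        var, value = self.buffer.popleft()
        self.memory[var] = value

    def fence(self):
        # A fence completes only once the buffer has drained.
        while self.buffer:
            self.flush_one()

def stale_read_demo():
    memory = {"ent0": False, "ent1": False, "turn": 0}
    p0, p1 = BufferedProcess(memory), BufferedProcess(memory)
    p0.store("ent0", True)
    seen_by_p0 = p0.load("ent0")        # own buffered value
    seen_by_p1 = p1.load("ent0")        # still the old value in memory
    p0.fence()
    seen_after_fence = p1.load("ent0")  # now visible
    return seen_by_p0, seen_by_p1, seen_after_fence
```

`stale_read_demo()` returns `(True, False, True)`: until the buffer drains, P1 does not observe P0's store, which is the window in which Peterson's algorithm can fail.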

Unbounded Store Buffers…
- Even for very simple patterns
- e.g. spin-loops with writes in loop body
    flag := true
    while other_flag = true {
        flag := false
        // Do something
        flag := true
    }
Buffer contents for flag: true, false, true, false, true, false, true
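A small sketch of why the buffer has no a priori bound: if nothing is flushed while the loop spins, every iteration appends two more pending stores. The function name `buffer_after` is made up for this illustration.

```python
def buffer_after(iterations):
    """Pending stores to `flag` if nothing is flushed during the spin loop."""
    buffer = [True]               # flag := true before the loop
    for _ in range(iterations):   # loop body, executed while other_flag stays true
        buffer.append(False)      # flag := false
        buffer.append(True)       # flag := true
    return buffer
```

Three iterations give the seven entries shown on the slide, and in general the buffer holds `2 * iterations + 1` entries, so no fixed buffer length covers all executions.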

What can we do?
- Under-approximate
  - Bound the length of execution buffers [FMCAD’10]
    - Implies a bound on state space
  - Bound context switches (Ahmed’s talk from yesterday)
  - Dynamic synthesis of fences (in progress)
- Over-approximate
  - Sound abstraction of buffer content [PLDI’11]

Abstract Interpretation for RMM (Relaxed Memory Models)
- Main Idea: Bounded over-approximation of unbounded buffers
- Hierarchy of abstract memory models
- Strictly weaker than concrete TSO/PSO
- Maintain partial-coherence

First Attempt: Set Abstraction
- Abstract each store buffer as a set
Process 0:
    while (true) {
        store ent0 = true;
        fence;
        store turn = 1;
        fence;
        do {
            load e = ent1;
            load t = turn;
        } while (e == true && t == 1);
        // Critical Section here
        store ent0 = false;
    }
Process 1:
    while (true) {
        store ent1 = true;
        fence;
        store turn = 0;
        fence;
        do {
            load e = ent0;
            load t = turn;
        } while (e == true && t == 0);
        // Critical Section here
        store ent1 = false;
    }
[Figure: P0 and P1 store buffers for ent0, ent1, and turn on either side of main memory, with buffer contents drawn as sets: { }, {1}, {true}, { }, {false}, {false, true}, { }.]

Second Attempt: record most recent value
Initially X == 0
Process 0:
    X := 1
Process 1:
    while (X != 1) { nop }
    X := 2
    fence
    e := X
    assert e == 2;
[Figure: P0 buffer { }, then { 1 }; main memory X = 0, X = 1, X = 2; P1 buffer { 2 }, then { }; transitions labeled "flush (1st time)" and "flush (2nd time)".]
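One reading of this example, offered as an assumption rather than as the speakers' own explanation: when a buffer is abstracted as a set, a flush cannot safely remove the flushed value, because the set does not record how many copies are pending. The sketch below replays a single trace in both settings; the function `run` and its flag are hypothetical.

```python
def run(set_abstraction):
    """Replay one trace of the two-process example and return the value of e."""
    mem = {"X": 0}
    p0_buf = [1]              # Process 0 executed X := 1
    mem["X"] = p0_buf[0]      # flush (1st time)
    if not set_abstraction:
        p0_buf.pop(0)         # concrete FIFO buffer: the entry is consumed
    # Under the set abstraction the value stays in the buffer, since the
    # set cannot tell whether more copies of 1 are still pending.
    assert mem["X"] == 1      # Process 1 leaves its spin loop
    mem["X"] = 2              # X := 2 followed by fence: the store reaches memory
    if p0_buf:
        mem["X"] = p0_buf[0]  # flush (2nd time), possible only in the abstraction
    e = mem["X"]              # e := X
    return e
```

`run(False)` returns 2, so the assertion holds on the concrete trace, while `run(True)` returns 1, a false alarm introduced by the abstraction. This replays one schedule only and is not an exhaustive analysis.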

Abstract Memory Models - Requirements
- Intra-process coherence: a process should see the most recent value it wrote
- Preserve fence semantics
- Partial inter-process coherence: preserve as much order information as feasible (bounded)
- Enable strong flushes
- Sound
- Simple construction!

Partial Coherence Abstractions
[Figure: the concrete store buffers of P0 and P1 (ent0, ent1, turn) around main memory, next to their abstract counterparts. Each abstract buffer is drawn with three parts: recent value, bounded length k, unordered elements.]
Annotations on the figure:
- Allows precise fence semantics
- Allows precise loads from buffer
- Keeps the analysis precise for “normal” programs
- Fallback for soundness
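The three labeled parts suggest a data structure along the following lines. This is a sketch of one possible representation consistent with the slide, not the paper's exact definition: the class `AbstractBuffer`, the use of `None` as "no value", and the flush rule for the unordered part are all assumptions.

```python
class AbstractBuffer:
    """Abstract store buffer for one (process, variable) pair (assumed design)."""

    def __init__(self, k, head=(), unordered=frozenset(), recent=None):
        self.k = k
        self.head = tuple(head)                # up to k oldest values, in FIFO order
        self.unordered = frozenset(unordered)  # later values; order and count unknown
        self.recent = recent                   # newest value once head has overflowed

    def store(self, value):
        if len(self.head) < self.k and self.recent is None:
            return AbstractBuffer(self.k, self.head + (value,))
        older = self.unordered if self.recent is None else self.unordered | {self.recent}
        return AbstractBuffer(self.k, self.head, older, value)

    def is_empty(self):
        # Emptiness is known exactly, so a fence can be modeled precisely.
        return not self.head and self.recent is None

    def load(self):
        """Value the owning process reads; None means 'read main memory'."""
        if self.recent is not None:
            return self.recent
        return self.head[-1] if self.head else None

    def flush(self):
        """All (value written to memory, successor buffer) outcomes of one flush."""
        if self.head:  # ordered part: a single, exact outcome
            rest = AbstractBuffer(self.k, self.head[1:], self.unordered, self.recent)
            return [(self.head[0], rest)]
        if not self.unordered:
            return [] if self.recent is None else [(self.recent, AbstractBuffer(self.k))]
        outcomes = []
        for value in sorted(self.unordered):
            # Either more copies of `value` remain, or this was the last one.
            outcomes.append((value, AbstractBuffer(self.k, (), self.unordered, self.recent)))
            outcomes.append((value, AbstractBuffer(self.k, (), self.unordered - {value}, self.recent)))
        return outcomes
```

With `k = 1`, storing 1, 2, 3 yields head `(1,)`, unordered `{2}`, recent `3`. The owner loads 3, the first flush deterministically writes 1, and only the following flush branches. Programs that never hold more than k pending stores per variable stay in the ordered part and lose no precision in this sketch.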

Intuition
- Making unbounded number of writes to a memory location without a flush?
- Seems very esoteric
- Your program is suspicious
    flag := true
    while other_flag = true {
        flag := false
        // Do something
        flag := true
    }

Adjusted Recipe
- Compute (abstract) reachable states for the program using partial-coherence abstraction
- Compute constraints on execution that guarantee that all “bad states” are avoided
- Implement the constraints with fences
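The recipe can be illustrated on a much smaller program. The sketch below swaps in a brute-force search: it tries every subset of candidate fence positions and keeps the smallest safe ones, instead of deriving constraints from the state space as the recipe describes. The program is a two-process store-buffering test (each process stores its own flag, then loads the other's), and all names are hypothetical.

```python
from itertools import combinations

def put(tup, index, value):
    return tup[:index] + (value,) + tup[index + 1:]

def bad_state_reachable(fenced):
    """Explore all schedules; the bad state is 'both processes loaded 0'.

    fenced: set of process ids with a fence between their store and load.
    State: (program counters, store still buffered?, memory flags, loaded values).
    """
    init = ((0, 0), (False, False), (0, 0), (None, None))
    seen, stack = {init}, [init]
    while stack:
        pcs, buffered, mem, regs = stack.pop()
        if regs == (0, 0):
            return True
        for p in (0, 1):
            succ = []
            if buffered[p]:  # flush: flag_p = 1 reaches memory
                succ.append((pcs, put(buffered, p, False), put(mem, p, 1), regs))
            if pcs[p] == 0:  # store flag_p = 1 into the local buffer
                succ.append((put(pcs, p, 1), put(buffered, p, True), mem, regs))
            elif pcs[p] == 1 and not (p in fenced and buffered[p]):
                # load the other flag; a fence blocks it until the buffer drains
                succ.append((put(pcs, p, 2), buffered, mem, put(regs, p, mem[1 - p])))
            for state in succ:
                if state not in seen:
                    seen.add(state)
                    stack.append(state)
    return False

def infer_fences():
    """Smallest sets of fenced processes that make the bad state unreachable."""
    for size in range(3):
        safe = [set(c) for c in combinations((0, 1), size)
                if not bad_state_reachable(set(c))]
        if safe:
            return safe
    return []
```

Here `infer_fences()` returns `[{0, 1}]`: a fence in only one process still lets the other read a stale flag, so both are needed. Enumerating subsets is exponential in the number of candidate positions, which is one reason to compute avoidance constraints from the reachable states instead.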

Precision/fences tradeoff
- Different abstractions lead to different fences
- More precise abstraction: potentially fewer fences
- In the paper
  - SC-Recoverability
  - PSO as an abstraction of TSO
  - Partially disjunctive abstraction allows more aggressive merging of buffers, scales better

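Inference Results
- Mostly mutual exclusion primitives
- Different variations of the abstraction
Table columns: Program | FD k=0 | FD k=1 | PD k=0 | PD k=1 | PD k=2
Programs (rows): Sense0, Pet0, Dek0, Lam0, Fast0, Fast1a, Fast1b, Fast1c
[The per-cell result marks did not survive extraction. The only readable entries are timeouts (T/O), one each in the rows for Lam0, Fast0, and Fast1c.]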

Limitations…
- More relaxed memory models
  - Need reasonable concrete semantics first
- What if the program is infinite-space?
  - Or simply large
  - Partial-coherence isn’t enough
  - We want to also use “standard” abstract interpretation

Composing Abstractions
- Trivial for value abstractions
  - Abstract buffers of abstract values
- Challenging for more complex abstractions
  - Heap abstractions
  - Predicate abstractions
  - Buffers for “abstract locations” (?)

Summary
- Partial-coherence abstractions
  - Verification without context bounds
  - but possibly with false alarms
- Automatic fence inference
  - Precision affects quality of results
- In progress
  - Combination of abstractions
  - Scalable (dynamic) fence inference

Invited Questions
1. Do you ever need to use k>1?
2. In your abstraction, do you ever fall into the unordered set?
3. How far can we expect this to scale?
4. Why over-approximation for fence inference?
5. Can you explain how the inference works?