Formalizing Memory Consistency Models for Program Analysis Jason Yue Yang This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC.

1 Formalizing Memory Consistency Models for Program Analysis. Jason Yue Yang. Doctoral Dissertation Defense.

2 Motivation
Multithreaded software is popular, BUT hard to analyze:
- Thread libraries: e.g., Pthreads, Win32, Solaris
- Language-level support for threads: e.g., Java
Memory architectures are becoming more aggressive (figure labels: load/store, data dependence, semaphore, memory fence, load-acquire/store-release, write atomicity).
Central problem: shared memory consistency models
- Need a clear specification of memory ordering rules
- Need an executable version of memory ordering rules
- Need a method to analyze thread executions against the rules

3 What Is a Memory Model?
A memory model defines the legal orderings of memory operations that can be perceived at the user level.
Example (Itanium assembly code, initially a = b = 0). Question: if r1 = 1, can r2 = 0?
- Plain store/load (less restriction): CPU 1 runs st a,1; st b,1; and CPU 2 runs ld r1,b; ld r2,a; observing r2 = 0 is OK.
- Store-release/load-acquire (more restriction): CPU 1 runs st a,1; st.rel b,1; and CPU 2 runs ld.acq r1,b; ld r2,a; r2 = 0 cannot be observed.

4 Classical Memory Models
Sequential Consistency (SC):
- Non-operational view: 1. Common total order; 2. Program order; 3. Read sees the "latest" write
- Operational view: the processors execute as if connected to a single memory through a non-deterministic switch
Other weaker models: Parallel Random Access Memory (PRAM), Coherence, Causal Consistency, Processor Consistency, Release Consistency, Lazy Release Consistency, Location Consistency, and more ...
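The operational view of SC can be made concrete with a small exhaustive search: treat the two threads of the slide 3 litmus test as instruction lists, and let a "switch" repeatedly pick which thread executes its next operation against one shared memory. This is a hypothetical illustration (class SCLitmus and its helpers are not from the dissertation) that assumes all locations start at 0.

```java
import java.util.*;

// Sketch of SC's operational view: one shared memory, and a nondeterministic
// "switch" that picks which thread issues its next operation. SCLitmus and Op are
// hypothetical names; memory locations are assumed to start at 0.
final class SCLitmus {
    record Op(boolean isStore, String var, int value, String reg) {
        static Op st(String var, int v) { return new Op(true, var, v, null); }
        static Op ld(String reg, String var) { return new Op(false, var, 0, reg); }
    }

    // Collects every (r1, r2) pair reachable by some interleaving of t1 and t2.
    static Set<List<Integer>> outcomes(List<Op> t1, List<Op> t2) {
        Set<List<Integer>> result = new HashSet<>();
        explore(t1, t2, 0, 0, new HashMap<>(), new HashMap<>(), result);
        return result;
    }

    private static void explore(List<Op> t1, List<Op> t2, int i, int j,
                                Map<String, Integer> mem, Map<String, Integer> regs,
                                Set<List<Integer>> result) {
        if (i == t1.size() && j == t2.size()) {
            result.add(List.of(regs.getOrDefault("r1", 0), regs.getOrDefault("r2", 0)));
            return;
        }
        if (i < t1.size()) {          // switch connects thread 1 to memory
            Map<String, Integer> m = new HashMap<>(mem), r = new HashMap<>(regs);
            apply(t1.get(i), m, r);
            explore(t1, t2, i + 1, j, m, r, result);
        }
        if (j < t2.size()) {          // switch connects thread 2 to memory
            Map<String, Integer> m = new HashMap<>(mem), r = new HashMap<>(regs);
            apply(t2.get(j), m, r);
            explore(t1, t2, i, j + 1, m, r, result);
        }
    }

    // Executes one operation atomically against the single memory.
    private static void apply(Op op, Map<String, Integer> mem, Map<String, Integer> regs) {
        if (op.isStore()) mem.put(op.var(), op.value());
        else regs.put(op.reg(), mem.getOrDefault(op.var(), 0));
    }

    public static void main(String[] args) {
        List<Op> p1 = List.of(Op.st("a", 1), Op.st("b", 1));       // st a,1; st b,1;
        List<Op> p2 = List.of(Op.ld("r1", "b"), Op.ld("r2", "a"));  // ld r1,b; ld r2,a;
        System.out.println(outcomes(p1, p2));
    }
}
```

Running main prints three outcomes, (0, 0), (0, 1) and (1, 1), in some order. The pair r1 = 1, r2 = 0 never appears: under SC, a load of b that returns 1 comes after the store to b, which in turn comes after the store to a.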

5 Industrial Memory Models
Example: The Intel Itanium® Memory Model
- The Intel application note contains more than 30 pages of semi-formal rules: English plus a large amount of special notation
- Many non-obvious consequences
- Litmus tests are used to illustrate properties
- The litmus tests cannot be executed automatically; they rely on pencil-and-paper reasoning

6 Language-Level Memory Models
Example: The Java Memory Model (JMM)
- Original JMM: Chapter 17 of the Java Language Specification
- Poorly understood
- Flawed: too weak (may introduce security holes) and too strong (prevents common optimizations)
- Currently under revision (JSR-133): extensive discussions for more than 3 years, several replacement proposals, and issues still remain

7 Why Does a Memory Model Matter?
Example: Peterson's Algorithm for Mutual Exclusion
Initially, flag1 = flag2 = false, turn = 0.
Thread 1:
    flag1 = true;
    turn = 2;
    while (turn == 2 && flag2) ;
    // critical section
    flag1 = false;
Thread 2:
    flag2 = true;
    turn = 1;
    while (turn == 1 && flag1) ;
    // critical section
    flag2 = false;
Can both threads enter the critical section simultaneously?
- For sequential consistency: No (the "intended behavior" is guaranteed)
- For many weaker models: Yes (the algorithm would be broken)
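To make the SC requirement concrete in Java terms, here is a hedged sketch (class PetersonLock and method run are hypothetical) of the algorithm with all three shared variables declared volatile. Under the revised Java memory model (JSR-133, Java 5 and later), volatile reads and writes occur in a single total order consistent with each thread's program order, which is the sequentially consistent behavior the algorithm relies on.

```java
// Sketch: Peterson's algorithm with every shared field declared volatile, so that
// accesses to flag1, flag2 and turn behave sequentially consistently under the
// Java 5+ memory model. PetersonLock and run() are hypothetical names.
final class PetersonLock {
    private volatile boolean flag1 = false, flag2 = false;
    private volatile int turn = 0;

    void lock(int id) {
        if (id == 1) {
            flag1 = true;
            turn = 2;
            while (turn == 2 && flag2) Thread.onSpinWait();
        } else {
            flag2 = true;
            turn = 1;
            while (turn == 1 && flag1) Thread.onSpinWait();
        }
    }

    void unlock(int id) {
        if (id == 1) flag1 = false; else flag2 = false;
    }

    // Two threads each increment a plain (non-volatile) counter inside the critical section.
    static int run(int iterations) {
        PetersonLock lock = new PetersonLock();
        int[] counter = {0};
        Thread[] threads = new Thread[2];
        for (int k = 0; k < 2; k++) {
            final int id = k + 1;
            threads[k] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    lock.lock(id);
                    counter[0]++;                 // critical section
                    lock.unlock(id);
                }
            });
            threads[k].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }
}
```

Calling run(10000) is expected to return 20000. With plain (non-volatile) fields that guarantee disappears: compiler or hardware reordering of a flag write with the following reads could let both threads into the critical section, so the count may come out lower.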

8 Do Programmers Really Care?
Another example: Double-Checked Locking for Singleton creation
    class foo {
        private static Helper helper = null;
        public static Helper get() {
            if (helper == null) {                  // only use locking as needed
                synchronized (foo.class) {
                    if (helper == null)            // "double-check" the reference
                        helper = new Helper();
                }
            }
            return helper;
        }
    }

9 Broken Under the Current JMM
The double-checked locking code from slide 8 (lock only as needed, then "double-check" the reference) is broken under the JMM:
- on weak architectures
- with race conditions
- the reference can become "visible" before the constructor completes
So there is no guarantee that Helper is fully constructed!
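For reference, a commonly used repair under the revised JMM (JSR-133, Java 5 and later) declares the field volatile, so the write that publishes the Helper happens-before any read that sees it. The initialization-on-demand holder idiom avoids the explicit check altogether by relying on the JVM's class-initialization locking. Both classes below are hypothetical illustrations, and Helper is a stand-in with one field set by its constructor. The volatile version depends on the revised volatile semantics; it was not guaranteed to work under the original Chapter 17 model.

```java
// Hypothetical stand-in for the slide's Helper class.
final class Helper {
    final int value;
    Helper() { value = 42; }   // state written by the constructor
}

// Double-checked locking repaired with a volatile field (Java 5+ semantics).
final class SafeFoo {
    private static volatile Helper helper = null;

    static Helper get() {
        Helper h = helper;                  // one volatile read on the fast path
        if (h == null) {
            synchronized (SafeFoo.class) {
                h = helper;                 // "double-check" under the lock
                if (h == null) {
                    h = new Helper();
                    helper = h;             // publish via a volatile write
                }
            }
        }
        return h;
    }
}

// Alternative: initialization-on-demand holder idiom; the nested class is
// initialized (under the JVM's class-initialization lock) on first use.
final class HolderFoo {
    private static final class Holder {
        static final Helper INSTANCE = new Helper();
    }

    static Helper get() { return Holder.INSTANCE; }
}
```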

10 Problems with Previous Approaches
- Virtually all industrial weak memory models lack formal specifications
- Those that do have a formal specification on paper cannot be executed
- Those that have a machine-readable formal specification use a "state machine" approach that:
  - employs architecture-specific data structures
  - cannot be decomposed into orthogonal components
  - has not been verified against higher-level rules
- No support for verifying "programmer expectations" in multithreaded software

11 Analysis of Multithreaded Software
[Figure: analysis dimensions: intra-procedural vs. inter-procedural, intra-thread vs. inter-thread, memory-model-insensitive (more scalable) vs. memory-model-sensitive (more precise); "My thesis work" is placed at the memory-model-sensitive end.]

12 Contributions
- Operational Specification Method: operational-style framework UMM. Applications: language-level memory model issues
- Axiomatic Specification Method: non-operational-style framework Nemos. Applications: Intel Itanium Memory Model, Java Memory Model, classical memory models
- Constraint Solving Method: prototype tools based on various solvers (CLP, SAT, QBF); incremental SAT solving; different encodings
- Concurrency Analysis: execution validation, race detection, atomicity verification

13 Operational Approach: UMM (Uniform Memory Model)
1. Supports formal verification
   - Integrates a model checker (Murphi)
   - Inspired by Park & Dill's work on SPARC
2. Employs a generic memory abstraction
   - Eliminates architecture-specific complexities
   - Uniform notation
   - A parameterized method

14 UMM Abstract Machine
[Figure: each thread (Thread i, Thread j) has its own Local Instruction Buffer (LIB i, LIB j); all threads share one Global Instruction Buffer (GIB).]
- Only two layers
- GIB can grow as needed
Key insight: make it easy to configure program order and visibility order

15 General Strategy in UMM
- Enabling mechanism: program order may be relaxed to enable certain interleavings; this is controlled via the bypassing table
- Filtering mechanism: visibility order is constructed from the GIB following the proper ordering requirements; this is enforced in the read selection rules

16 UMM Example: Sequential Consistency
Transition table (event: condition; action):
- read: $\exists i \in LIB_{t(i)}: ready(i) \land op(i) = Read \land (\exists w \in GIB: legalWrite(i, w))$; action: $i.data := data(w);\ LIB_{t(i)} := delete(LIB_{t(i)}, i)$
- write: $\exists i \in LIB_{t(i)}: ready(i) \land op(i) = Write$; action: $GIB := append(GIB, i);\ LIB_{t(i)} := delete(LIB_{t(i)}, i)$
Program order:
$$ready(i) \iff \neg \exists j \in LIB_{t(i)}: pc(j) < pc(i) \land BYPASS[op(j)][op(i)] = No$$
Visibility order:
$$legalWrite(r, w) \iff op(w) = Write \land var(w) = var(r) \land \neg \exists w' \in GIB: (op(w') = Write \land var(w') = var(r) \land time(r) > time(w') \land time(w') > time(w))$$
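The transition table can be animated by exhaustively firing every enabled transition. The Java sketch below is a hypothetical stand-in for the Murphi model (class UmmMachine and its methods are not from the dissertation): each thread's LIB is a list, the GIB is a list of writes, ready(i) consults a configurable BYPASS table, and a read takes the most recent GIB write to its variable, which is what legalWrite selects when the read executes "now". Variables never written are assumed to read as 0.

```java
import java.util.*;

// Hypothetical sketch of the UMM machine for the SC transition table.
final class UmmMachine {
    static final int READ = 0, WRITE = 1;

    // Instruction of thread t at program counter pc; value is used by writes, reg by reads.
    record Instr(int t, int pc, int op, String var, int value, String reg) {}

    private final boolean[][] bypass;   // bypass[op(j)][op(i)]: may i overtake an earlier j?
    private final Set<Map<String, Integer>> finals = new HashSet<>();

    UmmMachine(boolean[][] bypass) { this.bypass = bypass; }

    // ready(i): no earlier instruction j still in the same LIB forbids i from bypassing it.
    private boolean ready(Instr i, List<Instr> lib) {
        for (Instr j : lib)
            if (j.pc() < i.pc() && !bypass[j.op()][i.op()]) return false;
        return true;
    }

    // legalWrite for a read executing now: the most recent GIB write to its variable.
    private static int latestValue(List<Instr> gib, String var) {
        for (int k = gib.size() - 1; k >= 0; k--)
            if (gib.get(k).var().equals(var)) return gib.get(k).value();
        return 0;
    }

    // Fires transitions in every possible order and returns all final register states.
    Set<Map<String, Integer>> run(List<List<Instr>> threads) {
        finals.clear();
        List<List<Instr>> libs = new ArrayList<>();
        for (List<Instr> th : threads) libs.add(new ArrayList<>(th));
        explore(libs, new ArrayList<>(), new TreeMap<>());
        return new HashSet<>(finals);
    }

    private void explore(List<List<Instr>> libs, List<Instr> gib, Map<String, Integer> regs) {
        boolean done = true;
        for (int t = 0; t < libs.size(); t++) {
            for (Instr i : libs.get(t)) {
                if (!ready(i, libs.get(t))) continue;
                done = false;
                List<List<Instr>> libs2 = new ArrayList<>();
                for (List<Instr> l : libs) libs2.add(new ArrayList<>(l));
                libs2.get(t).remove(i);                              // LIB := delete(LIB, i)
                List<Instr> gib2 = new ArrayList<>(gib);
                Map<String, Integer> regs2 = new TreeMap<>(regs);
                if (i.op() == WRITE) gib2.add(i);                    // GIB := append(GIB, i)
                else regs2.put(i.reg(), latestValue(gib, i.var()));  // i.data := data(w)
                explore(libs2, gib2, regs2);
            }
        }
        if (done) finals.add(regs);
    }
}
```

With an all-No table (every entry false), run never produces r1 = 1, r2 = 0 for the slide 3 litmus test. Setting only bypass[WRITE][WRITE] to true (a write may overtake an earlier write) makes that outcome reachable. Only the enabling mechanism differs between the two runs.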

17 Non-Operational Approach: Nemos (Non-operational yet Executable Memory Ordering Specifications)
Desired features and solutions:
- Easy to understand, flexible: declarative (axiomatic)
- Precise: predicate logic
- Compositional, modular: "higher-order" logic
- Executable: make "hidden" rules explicit
Key insights:
(1) Make the rules higher order: pass the order relation down through all the rules; compositional, reusable, scalable, easy to compare
(2) Make all rules explicit: executable using a constraint-programming system

18 Nemos Example: Sequential Consistency
Formal definition of SC (ops is the execution; order is the ordering relation):
$$legal\ ops\ order \triangleq requireProgramOrder\ ops\ order \land requireReadValue\ ops\ order \land requireWeakTotalOrder\ ops\ order \land requireTransitiveOrder\ ops\ order \land requireAsymmetricOrder\ ops\ order$$
- Program order:
$$requireProgramOrder\ ops\ order \triangleq \forall i, j \in ops.\ ((t\ i = t\ j \land pc\ i < pc\ j) \lor (t\ i = t_{init} \land t\ j \neq t_{init})) \Rightarrow order\ i\ j$$
- Common total order (requireWeakTotalOrder, requireTransitiveOrder, requireAsymmetricOrder), e.g.:
$$requireTransitiveOrder\ ops\ order \triangleq \forall i, j, k \in ops.\ (order\ i\ j \land order\ j\ k) \Rightarrow order\ i\ k$$
- Read sees the "latest" write: requireReadValue
The order relation is repeatedly refined by each rule, and the "hidden" rules are made explicit.
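The SC definition can be read as a checker over one concrete candidate relation. The Java sketch below (class NemosSC and its helpers are hypothetical, not the Nemos code) encodes order as a boolean matrix and implements each rule as a separate predicate, so legal is literally their conjunction. It assumes the initializing writes are ops of thread 0 playing the role of t_init, as the Prolog code on slide 24 does.

```java
// Hypothetical sketch: order[i][j] == true means op i is ordered before op j.
final class NemosSC {
    record Op(int t, int pc, boolean write, String var, int value) {}
    static final int T_INIT = 0;

    static boolean requireProgramOrder(Op[] ops, boolean[][] order) {
        for (int i = 0; i < ops.length; i++)
            for (int j = 0; j < ops.length; j++) {
                boolean po = (ops[i].t() == ops[j].t() && ops[i].pc() < ops[j].pc())
                        || (ops[i].t() == T_INIT && ops[j].t() != T_INIT);
                if (po && !order[i][j]) return false;
            }
        return true;
    }

    static boolean requireTransitiveOrder(Op[] ops, boolean[][] order) {
        int n = ops.length;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    if (order[i][j] && order[j][k] && !order[i][k]) return false;
        return true;
    }

    // Asymmetric (hence also irreflexive): never both i before j and j before i.
    static boolean requireAsymmetricOrder(Op[] ops, boolean[][] order) {
        for (int i = 0; i < ops.length; i++)
            for (int j = 0; j < ops.length; j++)
                if (order[i][j] && order[j][i]) return false;
        return true;
    }

    // Weak total: any two distinct ops are ordered one way or the other.
    static boolean requireWeakTotalOrder(Op[] ops, boolean[][] order) {
        for (int i = 0; i < ops.length; i++)
            for (int j = 0; j < ops.length; j++)
                if (i != j && !order[i][j] && !order[j][i]) return false;
        return true;
    }

    // Each read returns the value of a write to the same variable ordered before it,
    // with no other write to that variable ordered in between (the "latest" write).
    static boolean requireReadValue(Op[] ops, boolean[][] order) {
        for (int r = 0; r < ops.length; r++) {
            if (ops[r].write()) continue;
            boolean ok = false;
            for (int w = 0; w < ops.length && !ok; w++) {
                if (!writesTo(ops[w], ops[r].var()) || !order[w][r] || ops[w].value() != ops[r].value()) continue;
                boolean overwritten = false;
                for (int w2 = 0; w2 < ops.length; w2++)
                    if (writesTo(ops[w2], ops[r].var()) && order[w][w2] && order[w2][r]) overwritten = true;
                ok = !overwritten;
            }
            if (!ok) return false;
        }
        return true;
    }

    private static boolean writesTo(Op op, String var) { return op.write() && op.var().equals(var); }

    static boolean legal(Op[] ops, boolean[][] order) {
        return requireProgramOrder(ops, order) && requireReadValue(ops, order)
                && requireWeakTotalOrder(ops, order) && requireTransitiveOrder(ops, order)
                && requireAsymmetricOrder(ops, order);
    }

    // Builds the total order that lists op indices in the given sequence.
    static boolean[][] fromSequence(int n, int... seq) {
        boolean[][] order = new boolean[n][n];
        for (int a = 0; a < seq.length; a++)
            for (int b = a + 1; b < seq.length; b++) order[seq[a]][seq[b]] = true;
        return order;
    }

    // Slide 3 litmus test plus initializing writes; r1 and r2 are the observed read values.
    static Op[] litmus(int r1, int r2) {
        return new Op[] {
            new Op(T_INIT, 0, true, "a", 0), new Op(T_INIT, 1, true, "b", 0),
            new Op(1, 0, true, "a", 1), new Op(1, 1, true, "b", 1),
            new Op(2, 0, false, "b", r1), new Op(2, 1, false, "a", r2)};
    }
}
```

With the sequence 0, 1, ..., 5 as the candidate order, the execution r1 = 1, r2 = 1 is legal, while r1 = 1, r2 = 0 fails requireReadValue because the initial write of a is overwritten before the read. Checking one given order covers the universally quantified rules; asking whether any legal order exists is the existential question of slides 21 and 22.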

19 The Itanium Memory Ordering Rules
$$legal\ ops\ order \triangleq requireLinearOrder\ ops\ order \land requireWriteOperationOrder\ ops\ order \land requirePO\ ops\ order \land requireMemoryDataDependence\ ops\ order \land requireDataFlowDependence\ ops\ order \land requireCoherence\ ops\ order \land requireReadValue\ ops\ order \land requireAtomicWBRelease\ ops\ order \land requireNoUCBypass\ ops\ order$$

20 Specification Hierarchy for Itanium
- requireLinearOrder: Irreflexive, Transitive, Total, Asymmetric
- requireWriteOperationOrder: Local/Remote case, Remote/Remote case
- requireProgramOrder: Acquire Rule, Release Rule, Fence Rule
- requireMemoryDataDependence: MD:RAW, MD:WAR, MD:WAW
- requireDataFlowDependence: DF:RAW, DF:WAR, DF:WAW
- requireCoherence: Local/Local case, Remote/Remote case
- requireReadValue: ValidWr, ValidLocalWr, ValidRemoteWr, ValidDefaultWr, ValidRd
- requireAtomicWBRelease
- requireSequentialUC: RAR Rule, RAW Rule, WAR Rule, WAW Rule
- requireNoUCBypass

21 Execution Validation: How to Make an Axiomatic Specification Executable?
[Figure: the test program and the memory model specification are turned into constraints and handed to a solver (CLP, SAT, or QBF), which answers SAT or UNSAT.]
$$validateExecution\ ops \triangleq \exists\, order.\ legal\ ops\ order$$
- Effective for revealing critical properties
- Effective for verifying common programming patterns

22 Using Constraint Logic Programming (CLP)
- Implementation in FD-Prolog is straightforward
- Universal quantification is handled via enumeration
- Existential quantification is handled via backtracking
- Built-in constraint solver from FD-Prolog: logical variables and finite-domain (FD) variables
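The existential in validateExecution can be discharged by backtracking, as the CLP version does. The sketch below is a plain-Java stand-in for that search, restricted to SC and without finite-domain constraint propagation: it grows a total order one operation at a time, places an operation only after all earlier operations of its thread, requires each read to return the current memory value, and backtracks on failure. Class ScValidator is hypothetical, and locations are assumed to start at 0.

```java
import java.util.*;

// Hypothetical sketch: validateExecution ops = exists order. legal ops order (SC only).
final class ScValidator {
    // A memory op of thread t at program counter pc; for reads, value is the observed value.
    record Op(int t, int pc, boolean write, String var, int value) {}

    // Returns a witness order (indices into ops) or null if no legal SC order exists.
    static List<Integer> validateExecution(List<Op> ops) {
        return search(ops, new ArrayList<>(), new HashMap<>());
    }

    private static List<Integer> search(List<Op> ops, List<Integer> prefix, Map<String, Integer> mem) {
        if (prefix.size() == ops.size()) return new ArrayList<>(prefix);
        for (int k = 0; k < ops.size(); k++) {
            if (prefix.contains(k) || !programOrderReady(ops, prefix, k)) continue;
            Op op = ops.get(k);
            // a read must see the latest write, i.e. the current memory value
            if (!op.write() && mem.getOrDefault(op.var(), 0) != op.value()) continue;
            Map<String, Integer> mem2 = new HashMap<>(mem);
            if (op.write()) mem2.put(op.var(), op.value());
            prefix.add(k);
            List<Integer> found = search(ops, prefix, mem2);
            if (found != null) return found;
            prefix.remove(prefix.size() - 1);   // backtrack
        }
        return null;
    }

    // Op k may be placed next only if every earlier op of its thread is already placed.
    private static boolean programOrderReady(List<Op> ops, List<Integer> prefix, int k) {
        for (int j = 0; j < ops.size(); j++)
            if (ops.get(j).t() == ops.get(k).t() && ops.get(j).pc() < ops.get(k).pc() && !prefix.contains(j))
                return false;
        return true;
    }
}
```

For the slide 3 litmus ops (st a,1; st b,1 in thread 1 and the two loads in thread 2) with r1 = 1, r2 = 1, the search returns the witness [0, 1, 2, 3]; with r1 = 1, r2 = 0 it returns null because no SC order exists.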

23 How to Encode the Ordering Relation?
Encoding: given a test program with N operations, use a 2D precedence matrix M (N × N) with N² constraint variables. Values of entry M_ij:
- 1: i is ordered before j
- 0: i is not ordered before j
- x: value not bound yet
The method:
- Interpret the symbolic execution, imposing constraints on the 2D matrix
- When interpretation finishes, the remaining x values reveal the latitude in the weak order
- Once an x has been bound to 1, a later attempt to set it to 0 triggers backtracking
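The precedence matrix can be sketched as tri-state entries (1, 0, or unbound x) plus a trail that records bindings so they can be undone on backtracking. This is a hypothetical Java stand-in for the FD variables the CLP version gets from FD-Prolog; the eager asymmetry and transitivity propagation inside bind is an assumption about how order rules might be imposed, not the dissertation's code.

```java
import java.util.*;

// Hypothetical sketch of the N x N precedence matrix with 1 / 0 / unbound entries.
final class PrecedenceMatrix {
    static final int UNBOUND = -1;                          // the "x" entries
    private final int[][] m;
    private final Deque<int[]> trail = new ArrayDeque<>();  // undo log for backtracking

    PrecedenceMatrix(int n) {
        m = new int[n][n];
        for (int[] row : m) Arrays.fill(row, UNBOUND);
    }

    int get(int i, int j) { return m[i][j]; }

    // Binds M[i][j] to v and propagates; returns false on conflict (e.g. a 1 being set to 0).
    // After a failed bind, the caller should undo back to a mark taken beforehand.
    boolean bind(int i, int j, int v) {
        if (m[i][j] != UNBOUND) return m[i][j] == v;
        m[i][j] = v;
        trail.push(new int[] {i, j});
        if (v == 1) {
            if (!bind(j, i, 0)) return false;               // asymmetry (and irreflexivity)
            for (int k = 0; k < m.length; k++) {            // transitivity in both directions
                if (m[k][i] == 1 && !bind(k, j, 1)) return false;
                if (m[j][k] == 1 && !bind(i, k, 1)) return false;
            }
        }
        return true;
    }

    int mark() { return trail.size(); }

    // Resets every entry bound since the given mark back to unbound.
    void undo(int mark) {
        while (trail.size() > mark) {
            int[] e = trail.pop();
            m[e[0]][e[1]] = UNBOUND;
        }
    }
}
```

After binding 0 before 1 and 1 before 2, the entry M[0][2] is forced to 1 and M[2][0] to 0; a later bind(0, 2, 0) reports a conflict, which is the point where a search would backtrack to an earlier mark.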

24 Example of Prolog Implementation
Formal specification (e.g., requireProgramOrder):
$$requireProgramOrder\ ops\ order \triangleq \forall i, j \in ops.\ ((t\ i = t\ j \land pc\ i < pc\ j) \lor (t\ i = t_{init} \land t\ j \neq t_{init})) \Rightarrow order\ i\ j$$
SICStus Prolog code (t_init is encoded as thread 0):
    requireProgramOrder(Ops, Order) :-
        for_each_elem(Ops, Order, doProgramOrder).

    elem_prog(doProgramOrder, Ops, Order, I, J) :-
        nth(I, Ops, Oi), nth(J, Ops, Oj),
        p(Oi, T_i), p(Oj, T_j),          % thread of each operation
        pc(Oi, PC_i), pc(Oj, PC_j),
        length(Ops, N),
        matrix_elem(Order, N, I, J, Oij),
        ((T_i #= T_j #/\ PC_i #< PC_j) #\/ (T_i #= 0 #/\ T_j #\= 0)) #=> Oij.

25 Interactive and Incremental Analysis
Itanium test program (initially, a = b = 0):
P1: st a,1; st b,1;
P2: ld r1,b; ld r2,a;
Can r1 = 1 and r2 = 0?
Execution (ops), with each store represented by st_local, st_remote1 and st_remote2 operations:
P1: (1) st_local(a,1); (2) st_remote1(a,1); (3) st_remote2(a,1); (4) st_local(b,1); (5) st_remote1(b,1); (6) st_remote2(b,1);
P2: (7) ld(1,b); (8) ld(0,a);
[Figure: the order matrix before solving, with almost all entries unbound (x), and an instantiated Order satisfying all constraints, shown as an interleaving.]
Result: legal

26 The SAT/QBF Approach
Initially, we "retro-fitted" our Prolog version with SAT-generating code:
- Showed a speed improvement in constraint solving, BUT ...
- Still slow in CNF generation
- Very difficult to debug
So we re-engineered our tool (done by Prof. Ganesh Gopalakrishnan):
- "Stamping out" a finite execution as a QBF formula
- "Stamping out" a finite execution as a CNF formula
- Experimenting with different encoding methods: n×n vs. n log n
- Checkpointing SAT generation
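To give a feel for what "stamping out" an execution as CNF involves, the sketch below emits DIMACS clauses for the linear-order rules over n operations using the n×n encoding, where variable var(i, j) stands for "i is ordered before j". Class OrderCnf and its variable numbering are hypothetical; a real encoding would also need clauses for the read-value rules and the architecture-specific rules.

```java
import java.util.*;

// Hypothetical sketch: linear-order axioms for n ops as CNF in the n x n encoding.
final class OrderCnf {
    final int n;
    final List<int[]> clauses = new ArrayList<>();

    OrderCnf(int n) { this.n = n; }

    int var(int i, int j) { return i * n + j + 1; }   // DIMACS variables start at 1

    OrderCnf addLinearOrderAxioms() {
        for (int i = 0; i < n; i++) {
            clauses.add(new int[] {-var(i, i)});                   // irreflexive
            for (int j = i + 1; j < n; j++) {
                clauses.add(new int[] {-var(i, j), -var(j, i)});   // asymmetric
                clauses.add(new int[] {var(i, j), var(j, i)});     // total
            }
        }
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    if (i != j && j != k && i != k)                // transitive
                        clauses.add(new int[] {-var(i, j), -var(j, k), var(i, k)});
        return this;
    }

    // A fixed ordering fact (e.g. program order i before j) becomes a unit clause.
    OrderCnf require(int i, int j) {
        clauses.add(new int[] {var(i, j)});
        return this;
    }

    String toDimacs() {
        StringBuilder sb = new StringBuilder("p cnf " + n * n + " " + clauses.size() + "\n");
        for (int[] c : clauses) {
            for (int lit : c) sb.append(lit).append(' ');
            sb.append("0\n");
        }
        return sb.toString();
    }
}
```

For n = 3 this produces 15 clauses over 9 variables. Transitivity alone contributes n(n - 1)(n - 2) clauses, so it dominates the instance size as n grows, while fixed ordering facts show up as unit clauses.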

27 Gist of Results
1. SAT seems to be better than QBF
2. The n×n encoding method is better than n log n, despite using more bits: it yields many unit clauses, which is good for SAT solving
3. The checkpointing method does pay off up to 64 tuples
4. We can easily handle 128 operations
5. Latest result: completed an Intel-provided test run (experiment done by Hemanthkumar Sivaraj)
   - the test contains 500 Itanium memory operations
   - we had to suppress the total-order constraint; result: UNSAT
   - takes 10 sec to generate the SAT instance and 0.1 sec to solve it
   - still lots of room for improvement
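A rough count makes the "more bits" remark concrete. It assumes, for illustration only, that the n×n encoding uses one bit per ordered pair of operations and that the n log n encoding gives each operation a ⌈log₂ n⌉-bit position:

$$n = 128:\quad n^2 = 16384 \text{ bits}, \qquad n \lceil \log_2 n \rceil = 128 \cdot 7 = 896 \text{ bits}$$
$$n = 500:\quad n^2 = 250000 \text{ bits}, \qquad n \lceil \log_2 n \rceil = 500 \cdot 9 = 4500 \text{ bits}$$

Under these assumptions the n×n encoding uses roughly 18 times (n = 128) to 55 times (n = 500) as many bits, so its advantage has to come from the clause structure (many unit clauses) rather than from the variable count.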

28 How to Verify Programmer Expectations?
(1) Define both intra-thread and inter-thread semantics as constraints: program semantics + memory model semantics
(2) Model correctness properties (program properties, e.g., race freedom or atomicity) as additional constraints
(3) Reduce the verification problem to a constraint satisfaction problem and solve it automatically
[Figure: the test program, program properties and constraints are fed to a solver, which answers SAT or UNSAT.]

29 Race Detection
What's a data race? Informally: conflicting and concurrent accesses.
Initially, a = b = 0.
Thread 1: r1 = a; if (r1 > 0) b = 1;
Thread 2: r2 = b; if (r2 > 0) a = 1;
Is this program race-free? That is, are the accesses to a (or to b) in the two threads conflicting and concurrent?
Control flow is interwoven with memory consistency requirements, hence the answer depends on the memory model:
- Under SC, this program is race-free
- Under a weaker model, this program might contain races
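The SC claim can be checked by brute force: explore every SC interleaving and ask whether either branch-guarded write can ever execute. If neither can, the only accesses are the two reads, which do not conflict, so there is no race. Class ScRaceCheck is hypothetical; the initial values are parameters so that the search can be shown to find writes when they are reachable.

```java
import java.util.*;

// Hypothetical sketch for the program
//   Thread 1: r1 = a; if (r1 > 0) b = 1;     Thread 2: r2 = b; if (r2 > 0) a = 1;
// Returns true if some SC interleaving executes a guarded write.
final class ScRaceCheck {
    private static final String[] SRC = {"a", "b"};   // variable each thread reads
    private static final String[] DST = {"b", "a"};   // variable each thread may write

    static boolean anyWriteExecutes(int initA, int initB) {
        Map<String, Integer> mem = new HashMap<>(Map.of("a", initA, "b", initB));
        return explore(new int[2], mem, new int[2]);
    }

    // pc[k]: 0 = about to read, 1 = about to evaluate the guarded write, 2 = finished.
    private static boolean explore(int[] pc, Map<String, Integer> mem, int[] r) {
        for (int k = 0; k < 2; k++) {
            if (pc[k] == 2) continue;
            int[] pc2 = pc.clone(), r2 = r.clone();
            Map<String, Integer> mem2 = new HashMap<>(mem);
            boolean wrote = false;
            if (pc[k] == 0) {
                r2[k] = mem2.get(SRC[k]);             // r_k = src
            } else if (r2[k] > 0) {
                mem2.put(DST[k], 1);                  // guarded write runs
                wrote = true;
            }
            pc2[k]++;
            if (wrote || explore(pc2, mem2, r2)) return true;
        }
        return false;
    }
}
```

anyWriteExecutes(0, 0) returns false: under SC each write needs a non-zero read, which needs an earlier write. Starting instead from a = 1 (a hypothetical variation), anyWriteExecutes(1, 0) returns true.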

30 Constraints for Control Flow
- Treat control operations like memory operations: imagine "assigns" and "uses" of "control variables"
- Add an auxiliary control variable c_k for each branch statement k, and convert the if-statement into an auxiliary assignment of c_k, e.g., if (r1 > 0) becomes c1 = r1 > 0
- Every op k has a path predicate ctrExpr; op k is a use of the control variables in ctrExpr, and k is feasible if ctrExpr evaluates to true
- Feasibility of ops is checked when setting the rules
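The control-variable encoding can be sketched for Thread 1 of the race example: the branch becomes the auxiliary assignment c1 = r1 > 0, and the guarded write carries the path predicate c1. Class ControlFlow, the string labels and the use of a Java Predicate for ctrExpr are hypothetical choices for illustration.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch: each op carries a path predicate ctrExpr over control variables.
final class ControlFlow {
    record Op(String label, Predicate<Map<String, Boolean>> ctrExpr) {
        // an op is feasible if its path predicate evaluates to true
        boolean feasible(Map<String, Boolean> ctrl) { return ctrExpr.test(ctrl); }
    }

    // Thread 1: r1 = a; c1 = r1 > 0; [c1] b = 1;  returns the feasible ops for a given r1.
    static List<String> feasibleOps(int r1) {
        Map<String, Boolean> ctrl = new HashMap<>();
        ctrl.put("c1", r1 > 0);                          // if (r1 > 0) becomes c1 = r1 > 0
        List<Op> ops = List.of(
                new Op("r1 = a", c -> true),             // unconditional
                new Op("c1 = r1 > 0", c -> true),        // auxiliary control assignment
                new Op("b = 1", c -> c.get("c1")));      // uses c1 in its ctrExpr
        List<String> out = new ArrayList<>();
        for (Op op : ops) if (op.feasible(ctrl)) out.add(op.label());
        return out;
    }
}
```

With r1 = 0 only the read and the control assignment are feasible; with r1 = 1 the guarded write b = 1 becomes feasible as well.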

31 Data and Control Dependence
Data/control flow can be treated like the global read-value rule, i.e., a read should see the "latest" write:
- Global reads: for all r = x, there exists an x = …
- Local reads: for all x = r, there exists an r = …
- Control reads: for all ops that depend on c, there exists a c = …
$$requireReadValue\ ops\ order \triangleq globalReadValue\ ops\ order \land localReadValue\ ops\ order \land controlReadValue\ ops\ order$$

32 How to Formalize a Data Race?
$$detectDataRace\ ops \triangleq \exists\, scOrder, hbOrder.\ legalSC\ ops\ scOrder \land requireHbOrder\ ops\ hbOrder \land mapConstraints\ ops\ hbOrder\ scOrder \land existDataRace\ ops\ hbOrder$$
$$requireHbOrder\ ops\ hbOrder \triangleq requireProgramOrder\ ops\ hbOrder \land requireSyncOrder\ ops\ hbOrder \land requireTransitiveOrder\ ops\ hbOrder$$
$$existDataRace\ ops\ hbOrder \triangleq \exists\, i, j \in ops.\ conflictingAccess\ i\ j \land \neg(hbOrder\ i\ j) \land \neg(hbOrder\ j\ i)$$
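The happens-before part of the definition can be sketched directly: build hbOrder as the transitive closure of program order plus synchronization edges, then look for a conflicting pair that hbOrder leaves unordered in both directions. Class RaceCheck is hypothetical; it checks one execution with given sync edges (for example, an unlock ordered before a later lock of the same mutex) and omits the search over SC orders (legalSC, mapConstraints) in the full definition.

```java
import java.util.*;

// Hypothetical sketch of requireHbOrder and existDataRace for a single execution.
final class RaceCheck {
    // One operation; sync ops (lock/unlock) create ordering but are not data accesses.
    record Op(int t, int pc, String var, boolean write, boolean sync) {}

    static boolean conflictingAccess(Op x, Op y) {
        return !x.sync() && !y.sync() && x.t() != y.t()
                && x.var().equals(y.var()) && (x.write() || y.write());
    }

    // hbOrder = transitive closure of program order plus the given sync edges {from, to}.
    static boolean[][] hbOrder(List<Op> ops, List<int[]> syncEdges) {
        int n = ops.size();
        boolean[][] hb = new boolean[n][n];
        for (int i = 0; i < n; i++)                       // requireProgramOrder
            for (int j = 0; j < n; j++)
                hb[i][j] = ops.get(i).t() == ops.get(j).t() && ops.get(i).pc() < ops.get(j).pc();
        for (int[] e : syncEdges) hb[e[0]][e[1]] = true;  // requireSyncOrder
        for (int k = 0; k < n; k++)                       // requireTransitiveOrder (Warshall)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (hb[i][k] && hb[k][j]) hb[i][j] = true;
        return hb;
    }

    // existDataRace: some conflicting pair is unordered by hbOrder in both directions.
    static boolean existDataRace(List<Op> ops, List<int[]> syncEdges) {
        boolean[][] hb = hbOrder(ops, syncEdges);
        for (int i = 0; i < ops.size(); i++)
            for (int j = i + 1; j < ops.size(); j++)
                if (conflictingAccess(ops.get(i), ops.get(j)) && !hb[i][j] && !hb[j][i]) return true;
        return false;
    }
}
```

For Thread 1 running x = 1; unlock(m) and Thread 2 running lock(m); read x, the check reports a race when no sync edge is supplied and none once the unlock-to-lock edge is added.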

33 Atomicity Verification
What's atomicity? Informally: a block of code executed atomically.
- Neither a necessary nor a sufficient condition for race freedom
Our approach:
- Annotate the atomic block with AtomicEnter and AtomicExit
- Verify it automatically
- Our definition is generic and can be fine-tuned

34 Constraints for Atomicity
$$verifyAtomicity\ ops \triangleq \exists\, order.\ legalSC\ ops\ order \land existsAtomicityViolation\ ops\ order$$
$$existsAtomicityViolation\ ops\ order \triangleq \exists\, i, j, k \in ops.\ matchedAtomicPair\ i\ j \land (t\ k \neq t\ i) \land \neg(order\ k\ i) \land \neg(order\ j\ k)$$
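For a single SC total order the violation condition becomes simple: in a total order, ¬(order k i) ∧ ¬(order j k) means k falls strictly between the matched AtomicEnter i and AtomicExit j. The sketch below checks one given order (class AtomicityCheck is hypothetical and assumes unnested blocks); the full verifyAtomicity would additionally search over all legal SC orders, e.g. by the backtracking shown for slide 22.

```java
import java.util.*;

// Hypothetical sketch of existsAtomicityViolation for one SC total order.
final class AtomicityCheck {
    enum Kind { ENTER, EXIT, ACCESS }
    record Event(int t, Kind kind) {}

    static boolean existsAtomicityViolation(List<Event> order) {
        for (int i = 0; i < order.size(); i++) {
            if (order.get(i).kind() != Kind.ENTER) continue;
            int t = order.get(i).t();
            // j: the matching AtomicExit of the same thread (blocks assumed unnested);
            // if none is found, the block is treated as open until the end of the order
            int j = i + 1;
            while (j < order.size() && !(order.get(j).t() == t && order.get(j).kind() == Kind.EXIT)) j++;
            for (int k = i + 1; k < j; k++)
                if (order.get(k).t() != t) return true;   // another thread's op interleaved
        }
        return false;
    }
}
```

An order in which Thread 2 accesses memory between Thread 1's AtomicEnter and AtomicExit is reported as a violation; the same events with Thread 2's access moved after the AtomicExit are not.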

35 Conclusion
My thesis addressed the following issues:
- How to make memory ordering rules clear and executable?
- How to analyze thread executions against these rules?
Our methods have been shown to be practical:
- Applied to a wide range of academic memory models as well as real-world models (Itanium, JMM)
- Validation of test cases far exceeded others' both in speed and scale
- Being applied to post-silicon verification in industry
Many "customers" can benefit from our methods: software developers, compiler writers, system designers

36 Publications (grouped on the slide by Operational Specification Method, Axiomatic Specification Method, Constraint Solving Method, and Concurrency Analysis)
- Analyzing the CRF Java Memory Model (APSEC'01)
- Specifying Java Thread Semantics Using a Uniform Memory Model (JGI'02)
- UMM: An Operational Memory Model Specification Framework with Integrated Model Checking Capability (CCPE)
- Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT (CHARME'03)
- Nemos: A Framework for Axiomatic and Executable Specifications of Memory Consistency Models (IPDPS'04)
- A Constraint-Based Approach for Specifying Memory Consistency Models (submitted to TPLP)
- QB or not QB: An Efficient Execution Verification Tool for Memory Orderings (submitted to CAV)
- Rigorous Concurrency Analysis of Multithreaded Programs (submitted to ISSTA)

37 Continuing Research Opportunities
- Scale up our approach even further: give up certain precision; compositional methods; create an assertion language to help abstraction
- Improve solving algorithms: exploit the structural information
- "Memory-model-sensitive" compilers: code synthesis, optimization
- Other application domains: security, embedded systems

Thank You! The dissertation and the prototype tools are available online.