Analyzing the Intel Itanium Memory Ordering Rules using Logic Programming and SAT. Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, Konrad Slind. School of Computing, University of Utah.

Presentation transcript:

Analyzing the Intel Itanium Memory Ordering Rules using Logic Programming and SAT Yue Yang Ganesh Gopalakrishnan Gary Lindstrom Konrad Slind School of Computing University of Utah Work supported in part by NSF Awards CCR and , and SRC Contract

2 What are Memory Ordering Rules? The effects of aggressive hardware optimizations that are visible to a programmer as out-of-order executions: aggressive load/store reorderings; 'bypassing' (read back own store before others); strong orderings only at acquires/releases. [Diagram: CPUs connected to memory.] Example (plain operations): st a,1; st b,2; ld b,2; ld a,0. Example (release/acquire): st c,1; st.rel d,2; ld.acq d,2; ld c,1. "Out of order" usually means "with respect to SC".

3 Why Relaxed Ordering Rules? All modern high-end processors employ relaxed ordering rules Modern multi-threaded languages also follow suit WHY? Aggressive updates are too expensive –CPU / Memory speed mismatch getting progressively worse Enables performance enhancing optimizations at the bus / interconnect level Simplifies directory protocols (less waiting, avoid deadlocks by relaxing message traffic rules,...)

4 Contrast between `strict' and `relaxed' orderings. Strict (e.g., Sequential Consistency): each processor's instructions come according to program order; they execute as if connected to a single serial memory thru a non-deterministic switch. Relaxed (e.g., PRAM): one memory per processor in effect (details omitted); no write-atomicity, only program order obeyed.

5 Contrast between Relaxed Academic and Industrial Models. Academic: Relaxed (e.g., PRAM). Industrial: Relaxed + Strict + Hybrid + ... (e.g., Itanium). See our ICCD'99 paper for a very approximate operational model. Lamport et al. have one in TLA, too...

6 Who depends on Memory Orderings? Compiler / OS developers –many of the proposed high-performance kernels exploit weakness to a high degree People who port existing code-bases –code-bases must port between platforms Implementers of thread-based systems, JVMs,.... –it has to mesh with the language-level memory model as well It is a central issue even in “uniprocessors” in which multiple threads share memory

7 A taxonomy of methods to specify industrial Relaxed Memory Models
Informal: "A Store Release flushes out earlier pended operations. All Store Releases appear to commit in a global total order. They allow Read Bypassing, except for non-Cacheable addresses." Full Intel spec available by searching `251429' under google –A dozen or so litmus tests also given as a supplement, for example:
P1                  P2
st.rel A,1;         st.rel B,1;
ld.acq r1,A; [1]    ld.acq r3,B; [1]
ld r2,B; [0]        ld r4,A; [0]
Formal –Operational –Axiomatic

8 A taxonomy of Formal methods to specify industrial Relaxed Memory Models Operational –Operational models of industrial memory models are complex –Running them inside a standard model-checker is too slow! –Utility for verification is limited –Provides limited insight Axiomatic –Much more precise –Orderings must ideally be expressed thru an ORTHOGONAL set of rules –No such prior axiomatic specs of industrial memory models

9 How to Organize Axiomatic Memory Ordering Specs? Ad-hoc Visibility Order Based

10 Visibility Order Specs
Execution: st A,1; st B,2; ld B [v1]; ld A [v2]
A memory model (spec of Memory Ordering Rules) is a mapping from executions to a set of allowed total orders called visibility orders; it is a 1-to-many mapping:
Strict ordering allowed: st(A,1) st(B,2) ld(B,v1) ld(A,v2)
Relaxed ordering allowed too: ld(A,v2) ld(B,v1) st(B,2) st(A,1)
For "complex" instructions, we generate more visibility events. Execution: st.rel A,1; st B,2; ld.acq B [v1]; ld A [v2]. Visibility events: { st.rel(A,1), st(B,2) seen in P1; st.rel(A,1), st(B,2) seen in P2; ld.acq(B,v1), ld(A,v2) }
After specifying all allowed Visibility Orders, the Load-Value Rule specifies how Loads return their values.
[Diagram: a visibility order over ld(A,?), st(A,1), st(A,1), st(B,2), st(B,2), ld(B,?), annotated with the values 0 (initial memory) and 2.]
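
The mapping from an execution to its allowed visibility orders can be sketched by brute force. The Python below is only an illustration under stated assumptions: it splits the slide's example across two hypothetical processors P1 and P2, treats "strict" as "every processor's program order is preserved", treats "relaxed" as "any total order at all" (far weaker than Itanium), and uses a simplified load-value rule in which a load returns the latest earlier store to its address, or the initial value 0.

```python
from itertools import permutations

# Events are (processor, kind, address, value); loads carry None as value.
# The split into P1 and P2 is an assumption made for this sketch.
P1 = [("P1", "st", "A", 1), ("P1", "st", "B", 2)]
P2 = [("P2", "ld", "B", None), ("P2", "ld", "A", None)]
EVENTS = P1 + P2


def respects_program_order(order):
    """True if each processor's events appear in their program order."""
    for program in (P1, P2):
        positions = [order.index(event) for event in program]
        if positions != sorted(positions):
            return False
    return True


def load_values(order, initial=0):
    """Simplified load-value rule: latest earlier store to the same address."""
    memory, result = {}, {}
    for _proc, kind, addr, value in order:
        if kind == "st":
            memory[addr] = value
        else:
            result[addr] = memory.get(addr, initial)
    return (result["B"], result["A"])  # (v1, v2)


def outcomes(strict):
    """Collect every (v1, v2) reachable through some allowed visibility order."""
    seen = set()
    for order in permutations(EVENTS):
        if strict and not respects_program_order(order):
            continue
        seen.add(load_values(order))
    return seen


strict_outcomes = outcomes(strict=True)
relaxed_outcomes = outcomes(strict=False)
```

Under these assumptions the outcome v1 = 2, v2 = 0 is absent from `strict_outcomes` but present in `relaxed_outcomes`, which is the kind of "out of order with respect to SC" behavior the slides describe.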

11 Our first contribution Developed Axiomatic, Visibility Order based Spec for most of Itanium Orderings ( semaphores will be added in next version ) –Orderings implicit in their document made explicit 3-pages of HOL as opposed to 24 pages of prose + tables –Also developed an executable constraint-Prolog version Can reason using a theorem prover –will attempt claim found in Intel’s manual about causality Written in a generic style - several other memory models specified in the same framework –pre-requisite to formally comparing memory models Comprised of orthogonal sub-rules

12 legalItanium Style of specification legalItanium(ops) = Exists order. ( constraint1 ops order /\ constraint2 ops order /\... ) Can selectively disable constraints and compare results Since the constraints are orthogonal, we can localize errors Visibility Order described by order : visevent -> visevent -> bool We use the “id” of each visevent which is an int; so order : int -> int -> bool
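
The "Exists order" style can be sketched as a search for a witness total order that satisfies a conjunction of independent constraints. The Python below is a minimal illustration, not the authors' HOL or Prolog: `legal` and the three toy constraints are hypothetical names, and the search is plain enumeration of permutations rather than constraint solving.

```python
from itertools import permutations


def legal(ops, constraints):
    """Return a witness total order satisfying every constraint, or None."""
    for perm in permutations(ops):
        position = {op: k for k, op in enumerate(perm)}

        def order(i, j, position=position):
            # order i j holds when i is visible before j in this candidate.
            return position[i] < position[j]

        if all(constraint(ops, order) for constraint in constraints):
            return perm
    return None


# Three toy constraints (hypothetical, for illustration only).
def a_before_b(ops, order):
    return order("a", "b")


def b_before_c(ops, order):
    return order("b", "c")


def c_before_a(ops, order):
    return order("c", "a")


ops = ["a", "b", "c"]
all_rules = legal(ops, [a_before_b, b_before_c, c_before_a])
one_disabled = legal(ops, [a_before_b, b_before_c])
```

With all three rules enabled no order exists; dropping `c_before_a` yields a witness, which mirrors how selectively disabling a constraint helps localize which rule rejects an execution.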

13 legalItanium legalItanium(ops) = Exists order. ( requireLinearOrder ops order /\ requireWriteOperationOrder ops order /\ requireProgramOrder ops order /\ requireMemoryDataDependence ops order /\ requireDataFlowDependence ops order /\ requireCoherence ops order /\ requireReadValue ops order /\ requireAtomicWBRelease ops order /\ requireSequentialUC ops order /\ requireNoUCBypass ops order )

14 requireProgramOrder requireProgramOrder ops order = Forall i,j : ops ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j ) ==> order i j
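
The shape of this rule can be mirrored in Python. This is a sketch under simplifying assumptions: the three `ordered_by_*` predicates below are reduced readings of acquire, release and fence semantics (an acquire orders later operations after it, a release orders earlier operations before it, a fence orders both ways), the `Op` record and the `"mf"` fence kind are names chosen for this example, and the additional conditions on write types that appear in the Prolog rule are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    id: int    # unique id of the visibility event
    proc: int  # issuing processor
    pc: int    # position within that processor's program
    kind: str  # "ld", "st", "ld.acq", "st.rel", or "mf" (fence)


def program_before(i, j):
    return i.proc == j.proc and i.pc < j.pc


def ordered_by_acquire(i, j):
    # Assumption: operations after an acquire load stay after it.
    return program_before(i, j) and i.kind == "ld.acq"


def ordered_by_release(i, j):
    # Assumption: operations before a release store stay before it.
    return program_before(i, j) and j.kind == "st.rel"


def ordered_by_fence(i, j):
    # Assumption: a fence is ordered with respect to both sides.
    return program_before(i, j) and "mf" in (i.kind, j.kind)


def require_program_order(ops, order):
    """Forall i, j: (acquire or release or fence ordering) implies order i j."""
    return all(
        order(i, j)
        for i in ops
        for j in ops
        if ordered_by_acquire(i, j)
        or ordered_by_release(i, j)
        or ordered_by_fence(i, j)
    )


def by_id(i, j):
    return i.id < j.id


def reversed_by_id(i, j):
    return i.id > j.id


release_pair = [Op(0, 0, 0, "st"), Op(1, 0, 1, "st.rel")]
plain_pair = [Op(0, 0, 0, "st"), Op(1, 0, 1, "st")]
```

Here `require_program_order(release_pair, reversed_by_id)` is false because the release store may not become visible before the earlier store, while the same reversed order is accepted for `plain_pair`, where no rule applies.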

15 Where do we use our Formal Spec of Memory Orderings? To help solve one of the nastiest problems encountered during Post-Silicon Validation –An MP system has just been built (boards, fan,...) –How do we certify that it obeys the memory ordering rules? WHY IS POST-SILICON VERIFICATION HARD? Limited observability (forced to observe via "final effects" on programs). Unverified inter-module assumptions examined for the first time at GHz speeds!

16 Typical Post-Si Memory Ordering Verification Approach Manual reasoning of executions generated by random tests –Highly labor intensive: designers have to think through ALL ordering rules at EACH step –No systematic methods to write the tests Ad-hoc tools employed for behavior matching –No Formal Guarantees even on small executions –No insights provided upon failure –Cannot pinpoint onset of divergence from allowed behaviors

17 Our Idealized Approach to a solution (currently under development). BUILD THIS BOX !!
Inputs: An Arbitrary Specification of Memory Ordering Rules in HOL, and An Arbitrary Litmus Test, e.g....
st.rel a,1;          st.rel b,1;
ld.acq r1,a; [V2]    ld.acq r3,b; [V3]
ld r2,b; [0]         ld r4,a; [0]
Outputs: LEGAL! Explanation script + ALL bindings to V2 and V3, or ILLEGAL! explanation script...

18 The first approach presented here
Inputs: Spec of Memory Ordering Rules Coded-up Nicely as a Constraint Logic Program, and An Arbitrary Ground Litmus Test (only ground values allowed), e.g....
st.rel a,1;         st.rel b,1;
ld.acq r1,a; [1]    ld.acq r3,b; [1]
ld r2,b; [0]        ld r4,a; [0]
Outputs: LEGAL! explanation script... or ILLEGAL!

19 The second approach presented here. Inputs: Spec of Memory Ordering Rules Coded-up Nicely as a Constraint Logic Program, and An Arbitrary Ground Litmus Test. The result goes to A SAT checker: SAT! implies LEGAL!, UNSAT! implies ILLEGAL!

20 How does Approach #1 work? Need to know a little bit about Constraint Logic Programs –GnuProlog, Sicstus Prolog, Mozart,... support constraints directly –Available as "free-standing" packages callable from C, Java, Ocaml,...
Example: evens_below_Y( X,Y) :- X is in (0..10), X < Y, (X mod 2) = 0
Annotations on the example: "X is in (0..10)" allocates a constraint-store entry for X with some user-chosen initial range. Called with Y = W, X unbound: imposes X=W-1; imposes constraint (W-1) mod 2 = 0 into constraint store; backtracking triggered if W is later found = 6.
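
The idea of a constraint store whose entries are narrowed until they either settle or fail can be sketched in a few lines of Python. This is an illustration only and not how GNU Prolog or any other solver is implemented: `Domain`, `Failure` and `evens_below_y` are hypothetical names, the domain is an explicit set rather than an interval, and the sketch assumes Y is already bound and reads `X < Y` as a strict upper bound on X.

```python
class Failure(Exception):
    """Raised when a domain becomes empty; a CLP system would backtrack here."""


class Domain:
    """A finite-domain variable: the set of values it may still take."""

    def __init__(self, low, high):
        self.values = set(range(low, high + 1))

    def restrict(self, predicate):
        # Keep only the values consistent with the new constraint.
        self.values = {v for v in self.values if predicate(v)}
        if not self.values:
            raise Failure("domain wiped out")


def evens_below_y(y):
    x = Domain(0, 10)                  # X is in (0..10)
    x.restrict(lambda v: v < y)        # X < Y, with Y bound to y
    x.restrict(lambda v: v % 2 == 0)   # (X mod 2) = 0
    return x.values


def fails(y):
    """True if posting the constraints with this Y empties X's domain."""
    try:
        evens_below_y(y)
    except Failure:
        return True
    return False
```

In this reading, `evens_below_y(6)` leaves X with the candidates 0, 2 and 4, while a bound such as Y = 0 empties the domain and raises `Failure`, the point at which a real solver would backtrack.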

21 How to model requireProgramOrder (e.g.) as a Constraint Logic Program? requireProgramOrder ops order = Forall i,j : ops ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j ) ==> order i j. Allocate a 2D constraint-variable array indexed by i and j; an entry = 1 means i is ordered before j, and an "x" marks an entry that is still unconstrained. Interpret the litmus test, adding constraints to the 2D array. When interpretation finishes, the remaining "x" entries reveal the latitude in the weak order. When an "x" changes to a 1, an attempt to set it to 0 later triggers backtracking.
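
The 2D array of ordering variables can be sketched as a matrix of three-valued cells. The Python below is a simplified stand-in for constraint variables: `OrderMatrix` and `Conflict` are hypothetical names, `None` plays the role of the "x" entries, and a conflicting assignment raises an exception where a CLP system would backtrack.

```python
class Conflict(Exception):
    """Raised on a contradictory assignment; stands in for backtracking."""


class OrderMatrix:
    """cells[i][j] is 1 (i before j), 0 (i not before j), or None ('x')."""

    def __init__(self, n):
        self.n = n
        self.cells = [[None] * n for _ in range(n)]

    def impose(self, i, j, value):
        current = self.cells[i][j]
        if current is not None and current != value:
            raise Conflict(f"entry ({i}, {j}) is already {current}")
        self.cells[i][j] = value

    def unconstrained(self):
        """Off-diagonal entries still marked 'x': the latitude left in the order."""
        return [
            (i, j)
            for i in range(self.n)
            for j in range(self.n)
            if i != j and self.cells[i][j] is None
        ]


matrix = OrderMatrix(3)
matrix.impose(0, 1, 1)  # a rule forces op 0 before op 1
try:
    matrix.impose(0, 1, 0)  # a later rule contradicts it
    conflict_seen = False
except Conflict:
    conflict_seen = True
```

After the single successful `impose`, five of the six off-diagonal entries remain unconstrained, which is the "latitude" the slide refers to.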

22 Our Prolog Code is VERY close to the HOL spec! requireProgramOrder ops order = Forall i,j : ops ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j ) ==> order i j

23 Our Prolog Code is VERY close to the HOL spec!
requireProgramOrder ops order = Forall i,j : ops ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j ) ==> order i j
(   % Rule (ACQ): ACQ>>I
    .....
#\/ % Rule (REL):
    Op_j #= StRel #/\
    ( IsWr_i #==> ( WrType_i #= Local #/\ WrType_j #= Local
                #\/ WrType_i #= Remote #/\ WrType_j #= Remote #/\ WrProc_i #= WrProc_j ) )
    ....
#==> Oij.
IMPOSES CONSTRAINT ON MATRIX ENTRY Oij

24 Idea behind the SAT approach
(   % Rule (ACQ): ACQ>>I
    .....
#\/ % Rule (REL):
    Op_j #= StRel #/\
    ( IsWr_i #==> ( WrType_i #= Local #/\ WrType_j #= Local
                #\/ WrType_i #= Remote #/\ WrType_j #= Remote #/\ WrProc_i #= WrProc_j ) )
    ....
#==>
Emit Boolean Expression here (as opposed to imposing constraint on constraint-store)
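
Emitting a Boolean formula instead of posting constraints can be sketched as follows. The Python below is an illustration under stated assumptions, not the authors' encoding: it introduces one Boolean variable per ordered pair (i, j), emits CNF clauses that force those variables to describe a strict total order, and adds unit clauses for orderings demanded by the rules. The names `var`, `linear_order_clauses` and `satisfiable` are hypothetical, and the exhaustive check stands in for a real SAT solver.

```python
from itertools import product


def var(i, j, n):
    """Positive integer naming the Boolean 'i is ordered before j'."""
    return i * n + j + 1


def linear_order_clauses(n):
    """CNF clauses forcing the variables to encode a strict total order."""
    clauses = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if i < j:
                clauses.append([var(i, j, n), var(j, i, n)])    # totality
                clauses.append([-var(i, j, n), -var(j, i, n)])  # antisymmetry
            for k in range(n):
                if k in (i, j):
                    continue
                # transitivity: (i before j) and (j before k) imply (i before k)
                clauses.append([-var(i, j, n), -var(j, k, n), var(i, k, n)])
    return clauses


def satisfiable(clauses, num_vars):
    """Exhaustive check; a real flow would hand the clauses to a SAT solver."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


n = 3
base = linear_order_clauses(n)
chain = base + [[var(0, 1, n)], [var(1, 2, n)]]   # 0 before 1 before 2
cycle = chain + [[var(2, 0, n)]]                  # additionally 2 before 0
chain_is_legal = satisfiable(chain, n * n)
cycle_is_legal = satisfiable(cycle, n * n)
```

The chain of orderings is satisfiable (SAT, hence LEGAL in the slides' terms), while adding the cyclic requirement makes the instance unsatisfiable (UNSAT, hence ILLEGAL).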

25 What did we learn? A really elegant approach to study Memory Ordering Many bugs in spec caught through finite executions –Formal `paper-and-pencil’ memory ordering specs are very unreliable! Prolog Code may not scale –Prolog Quirks (memory resources scattered in stack, trail-stack, constraint-store,... - execution halts if one exhausted) –Prolog’s search may not be “as smart” as SAT’s (?) SAT generation time dominates –Pretty naive coding and CNF generation –Could scale considerably; for example: FD-solving SAT-gen SAT-vars SAT-clauses SAT-solving 22 s 200s k 0.01s Best long-term approach is the `ideal’ one mentioned earlier –(explain details if there is time)

26 Summary of Key Contributions We provide a formal specification of the entire Itanium memory ordering specification in Higher Order Logic ( barring semaphores that change the `data structures' we need ) –Our Spec (3 pages of HOL) replaces 24 pages of Intel spec –Our Spec is EASIER to understand (said the Charme reviewers!) –We can now prove theorems to increase confidence We present TWO ways to use this HOL spec to check executions obtained from the post-silicon environment –Encode as a Constraint-Logic program that interprets assembly executions and checks conformance with the rules –Encode as a Constraint-Logic program that interprets assembly executions and generates a SAT instance embodying conformance Our tool was given to engineers in Intel's post-Si validation group – highly encouraging feedback obtained

27 Some of the Related Work Classical approaches –Mostly paper-and-pencil specs – Executable specs (Murphi) used to verify critical section codes Spec of the Alpha memory ordering rules in FOL/HOL –Yuan Yu (personal communication) - unpublished –VCs generated for assembly programs and given to ESC prover –Our work is for a modern system (Itanium) and uses SAT TLA+ spec of the Itanium ordering rules –Details are not published –Not amenable to execution (very slow execution speeds) –Impractical for use in checking assembly program executions

28 Questions?

29 Work in progress. Inputs: An Arbitrary Specification of Memory Ordering Rules in HOL, and An Arbitrary Litmus Test (non-ground values allowed). Steps: Generate a QBF formula for the size of the Litmus test; DNF representation of Litmus test ("ROM"); Generate "compact" CNF; QBF Solver. Outputs: LEGAL! Explanation script + ALL bindings to V2 and V3, or ILLEGAL! explanation script... QBF is natural for memory ordering rules