Formal Design and Verification Methods for Shared Memory Systems. Ratan Nalumasu. Dissertation Defense, September 10, 1998.

Presentation transcript:

Slide 2 (Design Complexity): Problems Facing Digital Design. Complexity; longer design time; shorter time to market.

Slide 3 (Design Complexity): Current Debugging Technology. (+) Full model; (-) Partial examination ⇒ no assurance; (-) Weaker properties; (-) Difficult correctness metrics; (-) Full model.

Slide 4 (Introduction to FM): Formal Methods. Formal methods = math-based techniques. Continuous math : engineering = discrete math : digital system design. “It is what the designers want. It’s just challenging to prove.”

Slide 5 (Introduction to FM): Formal Methods based Design. (-) Reduced model; (+) Complete examination; (+) Better assurances (on the reduced model); (+) Stronger property language; (+) Better correctness metrics; (+) Reduced model.

Slide 6 (Introduction to FM): FM Taxonomy. Manual verification techniques: interactive theorem provers. Automatic verification techniques: model checkers. Compilation techniques: refinement rules.

Slide 7 (Theorem Provers): Interactive Theorem Provers. (+) Can deal with infinite state systems; (-) Extensive manual reasoning (proof of a compilation scheme); (+) Good for algorithm verification.

Slide 8 (Model Checking): process p(x) { global G; local L; while (...) { recv ...; send ...; } } process q(x,y) ... System state: (G=0, p.L=0, ...)

Slide 9 (Model Checking): Model Checking Strengths. Automatic. If a property fails, the model checker shows the error trace. Deadlock: how the deadlocked state is reached from the initial state. Assertion: how the violating state is reached from the initial state. Starvation: a loop in which no progress is made.

Slide 10 (Model Checking): Model Checking: Example. Construct the graph of the system and check the property: deadlock at (22). State explosion. Partial order reductions.
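The idea of building the state graph and reporting an error trace can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the dissertation: the two-process lock example, the state encoding `(p, q)`, and the names `find_deadlock`, `successors`, and `is_final` are all assumptions made up for this sketch.

```python
from collections import deque

def find_deadlock(initial, successors, is_final):
    """Breadth-first search of the reachable states.

    Returns the shortest trace from `initial` to a non-final state that
    has no successors (a deadlock), or None if there is no such state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        nexts = successors(state)
        if not nexts and not is_final(state):
            # Rebuild the error trace by following parent links backwards.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in nexts:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical system: P takes lock A then lock B, Q takes B then A.
# Local states: 0 = holds nothing, 1 = holds first lock,
#               2 = holds both locks, 3 = released everything (done).
def successors(state):
    p, q = state
    nexts = []
    if p == 0 and q != 2:            # A is free unless Q holds both locks
        nexts.append((1, q))
    elif p == 1 and q not in (1, 2): # B is free unless Q has taken it
        nexts.append((2, q))
    elif p == 2:                     # P releases both locks
        nexts.append((3, q))
    if q == 0 and p != 2:            # B is free unless P holds both locks
        nexts.append((p, 1))
    elif q == 1 and p not in (1, 2): # A is free unless P has taken it
        nexts.append((p, 2))
    elif q == 2:                     # Q releases both locks
        nexts.append((p, 3))
    return nexts

def is_final(state):
    return state == (3, 3)

trace = find_deadlock((0, 0), successors, is_final)
print(trace)  # [(0, 0), (1, 0), (1, 1)]
```

The returned list plays the role of the error trace: it shows how the initial state reaches the state in which each process waits for the lock held by the other.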

Slide 11 (Refinement Algorithms): Refinement Algorithms. Need to verify only high-level protocols. Domain-specific compilers can generate efficient implementations. Refinement rules for DSM protocols.

Slide 12 (Applied FM): State of the Art of Applied FM. (+) General purpose; (+) Widely applicable techniques; (-) Inefficient algorithms; (-) Inefficient “compilers”; (-) Do not help with domain-specific concerns.

Thesis Statement. Domain-specific formal methods: efficient verification techniques; address domain-specific concerns. Domain: memory. [Figure: CPU and memory]

Overview. Introduction to formal verification; shared memory systems; contributions; conclusions.

Slide 15 (Memory Bottleneck): Memory Bottleneck. Processor speed increases at 55% a year, while memory speed increases at 7% (hence caches). Tendency toward multiprocessors: further imbalance → complex protocols; SMP systems; DSM systems.

Slide 16 (SMP Architecture): Symmetric Multiprocessors. Can scale up to tens of processors. Modern caches have support for such SMP protocols. [Figure: three CPUs, each with a cache ($), and a memory]

Slide 17 (SMP Protocols): SMP Protocol Design. Bus protocols: bus arbitration algorithm; cache invalidation scheme; lack of atomicity on the bus. Bus and CPU interaction: does the CPU have out-of-order execution? Does the bus allow out-of-order completion? Are these decisions visible to software?

Slide 18 (DSM Architecture): Distributed Shared Memory. Each node may be an SMP or a single CPU. [Figure: nodes, each with memory (NODE, MEM), connected by a network]

Slide 19 (DSM Protocols): DSM Protocol Design. Network port arbitration. Coherency maintenance across the network: maintaining distributed state; little atomicity; “ghost” messages; transient states. Are these decisions visible to software?

Slide 20 (Shared Memory Systems): Shared Memory Correctness. Low level: deadlock; forward progress; bus arbitration. Intermediate level: at most one owner of a cache line at a time. High level: abstraction provided to the software.

Slide 21 (Software Interface): Abstraction Provided to Software. Multiprocessor: P1: write(a,new); read(b) and P2: write(b,new); read(a), reordered to P1: read(b); write(a,new) and P2: read(a); write(b,new): not OK under SC. Uniprocessor: P1: write(a,new); read(b), reordered to P1: read(b); write(a,new): OK. Reordering comes from the cache, the compiler, or out-of-order execution. Test model checking.

Overview. Introduction to formal verification; shared memory systems; contributions (mitigating state explosion: partial order reduction algorithm; facilitating high-level design: protocol synthesis algorithm; enhancing applicability: high-level correctness such as SC); conclusions.

Slide 23 (Contributions): Contributions. [Figure: numbered contributions diagram with labels Protocol, PO algorithm, Test model checking, Refinement rules, Efficient implementation]

Contribution #1 Mitigating State Explosion Problem Partial Order Reductions

Slide 25 (PO Reductions): Partial Order Reductions. If two transitions are independent, then explore one of them, postponing the other.
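The effect of postponing an independent transition can be seen on a toy example. The sketch below is an assumption-laden illustration, not any of the algorithms discussed in this talk: it uses two hypothetical processes P and Q, each stepping through local states 0, 1, 2 without touching shared data (so every P step is independent of every Q step), and a fixed "always prefer P" rule in place of a real reduction algorithm.

```python
LAST = 2  # each process moves 0 -> 1 -> 2 and then stops

def reachable(initial, successors):
    """Collect every state reachable from `initial` by depth-first search."""
    seen = {initial}
    stack = [initial]
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def full_successors(state):
    """Explore every enabled transition (all interleavings)."""
    p, q = state
    nexts = []
    if p < LAST:
        nexts.append((p + 1, q))
    if q < LAST:
        nexts.append((p, q + 1))
    return nexts

def reduced_successors(state):
    """Explore P's step and postpone Q's; Q runs once P has finished."""
    p, q = state
    if p < LAST:
        return [(p + 1, q)]
    if q < LAST:
        return [(p, q + 1)]
    return []

full = reachable((0, 0), full_successors)
reduced = reachable((0, 0), reduced_successors)
print(len(full), len(reduced))  # 9 5
```

Both searches reach the same final state (2, 2), but the reduced search visits 5 states instead of 9. Because the toy processes terminate, Q is never postponed forever; a system with loops needs an extra rule to guarantee that.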

Slide 26 (PO Reductions): Ignoring Problem. Select some transitions and postpone others ⇒ but do not postpone forever. [Figure: states S0 and S1 with a postponed transition]

Slide 27 (PO Reductions): Proviso-based Solution. The solutions of Godefroid, Valmari, Holzmann, and Peled are very similar. Proviso: expands the “last” state of the loop completely. [Figure: states S0 and S1 with a postponed transition; the last state is expanded]

Slide 28 (PO Reductions): Problem with Proviso. [Figure: processes P and Q, each with states 0, 1, 2; Q postponed; all 9 states explored]

Slide 29 (PO Reductions): Our Algorithm: 2-phase. [Figure: processes P and Q, each with states 0, 1, 2; only 5 states explored]

Slide 30 (PO Reductions): Performance Comparison (20x).

Contribution #2 Facilitating High-level Design Protocol Refinement

Slide 32 (Refinement Algorithms): Protocol Refinement. PO reductions are not sufficient; theorem provers are ruled out. Compile from a high-level protocol specification: easier to design; easier to verify; can generate an efficient implementation using domain knowledge.

Slide 33 (Refinement Algorithms): Unexpected Messages. P sends a req to Q and waits to recv an ack from Q; some request arrives in the meantime: ??? Always nack ⇒ no forward progress. Always silence ⇒ deadlock.

Slide 34 (Refinement Algorithms): Refinement Procedures. Debug the high-level specification: synchronous communication with no transient states. An automatic refinement procedure transforms it into a detailed implementation: no need to verify the implementation; needs domain-specific knowledge for efficiency.

Slide 35 (Refinement Algorithms): Related Work. Buckley & Silberschatz, 83: for OS environments, not fit for hardware. Gribomont, 90: protocols where synchronous messages can be simply replaced by asynchronous messages.

Slide 36 (Refinement Algorithms): Related Work (contd). Teapot, 96, for DSM systems (Chandra): protocol programming language; “Suspend” construct for transient states; not high-level: Suspend states still specify what to do in a transient state.

Slide 37 (Refinement Algorithms): Context: DSM Protocols. One protocol per cache line. 1 home and n “remote” nodes per line. Home is responsible for maintaining consistency (hub). [Figure: three nodes, each with memory (NODE, MEM), connected by a network]

Slide 38 (Refinement Algorithms): Refinement Rules. [Figure: two message diagrams between Home and Remote, each showing a Req answered by Ack or Nack]

Slide 39 (Refinement Algorithms): Refinement Rules (2). Req1 is ignored by both processes. [Figure: message diagram between Home and Remote with Req1, Req2, and Ack or Nack]

Slide 40 (Refinement Algorithms): Debugging Effort. The protocol compilation scheme has been proved using a theorem prover.

Contribution #3 Enhancing Applicability Shared Memory Model Verification

Slide 42 (Test Model Checking): Relaxing Instruction Orders. P1: write(a,new); read(b) and P2: write(b,new); read(a), relaxed to P1: read(b); write(a,new) and P2: read(a); write(b,new). Under SC.

Slide 43 (Test Model Checking): Verification of HW/SW Interface. SC: the result can be explained by some interleaving of the instructions. Test model checking. [Figure: three CPUs, each with a cache ($), and a memory]
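The definition of SC as "explainable by some interleaving" can be checked by brute force on the two-processor example used earlier in the talk. The sketch below is only an illustration under stated assumptions: memory is modelled as a plain dictionary whose locations start at the value "old", and the helper names `interleavings`, `run`, and `outcomes` are made up for this example.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(schedule):
    """Execute one interleaving against a single atomic memory."""
    memory = {"a": "old", "b": "old"}
    reads = {}
    for proc, op, addr in schedule:
        if op == "write":
            memory[addr] = "new"
        else:
            reads[proc] = memory[addr]
    return (reads["P1"], reads["P2"])

def outcomes(p1, p2):
    """Set of (P1's read, P2's read) results over all interleavings."""
    return {run(s) for s in interleavings(p1, p2)}

P1 = [("P1", "write", "a"), ("P1", "read", "b")]
P2 = [("P2", "write", "b"), ("P2", "read", "a")]

sc_results = outcomes(P1, P2)                   # program order respected
relaxed_results = outcomes(P1[::-1], P2[::-1])  # each read moved before its write

print(("old", "old") in sc_results)       # False
print(("old", "old") in relaxed_results)  # True
```

No interleaving of the original programs lets both processors read "old", so a system that produces that result is not sequentially consistent. Once each read is moved ahead of the write, the result becomes possible.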

Slide 44 (Test Model Checking): Current Verification Techniques. Simulation: must study lengthy executions; must choose non-trivial programs. Formal techniques (next slide).

Slide 45 (Test Model Checking): Related Work. Graf’s lazy caching in ACTL*. Gibbons’ approach → run programs and check if the results are SC. McMillan’s thesis → data abstraction for a test. Hojati → data abstraction in a different context. Undecidability result by Alur et al.

Slide 46 (Test Model Checking): ACTL* for (stronger than) SC. AG(enabled(read(a,d))) ⇒ avail(a,d); AG(avail(a,d) AND EF(enabled(read(a,d)))) ⇒ A[NOT avail(a,d) W AG NOT avail(a,d)]; ...; init ⇒ AG[after(write(a,d)) ⇒ A(NOT enabled(read(a,d)) W avail(a,d))]. Such MODEL-DEPENDENT SPECS do not fit in an iterative industrial frame.

Slide 47 (Test Model Checking): Test Model Checking. Adaptation of simulation to model checking: model checking (full coverage) + testing (“black box approach”). Tests are independent of the model being verified ⇒ manual effort is considerably reduced. Test model checking can be used early in the design cycle.

Slide 48 (Test Model Checking): Results. Defined a shared memory description language: “data is not used for control decisions”; “addresses are symmetric”; can specify HP’s Runway/PA, ... Model checking technique: “small number of addresses is sufficient”. Application to Runway/PA using PV.

Slide 49 (Test Model Checking): Read Order, Write Order. If P1 executes two write instructions, then P2 sees them in the program order of P1. P1: A := 1; A := 2; ...; A := k. P2: X1 := A; X2 := A; X3 := A; ...; Xk := A. Check: X(i) ≤ X(i+1). Many deficiencies.
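The check in this test amounts to asking whether P2's successive reads ever move backwards through P1's writes. A minimal sketch follows, assuming that location A initially holds 0 and that P2's observed values are collected into a list; the function name and the initial value are assumptions made for this illustration.

```python
def reads_respect_write_order(reads, k):
    """Return True if P2's reads are consistent with P1 writing 1, 2, ..., k.

    `reads` lists the values P2 obtained for X1, X2, ... in program order.
    Assumption: A holds 0 before P1's first write.
    """
    # Every value read must be the initial value or one that P1 wrote.
    if any(not 0 <= value <= k for value in reads):
        return False
    # A later read must never return an older write than an earlier read.
    return all(earlier <= later for earlier, later in zip(reads, reads[1:]))

print(reads_respect_write_order([0, 1, 1, 3], k=3))  # True
print(reads_respect_write_order([1, 2, 1], k=3))     # False
```

The second call fails because P2 sees the value 2 and then the older value 1, which means it observed P1's writes out of program order.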

Slide 50 (Test Model Checking): Deficiencies of the Test. Finite k: what if an error occurs for a really large k? Location “A” is never written by P2: what if an error occurs when the ownership changes? Only 1 address: the definitions of RO and WO are not restricted to a single address at a time; how many addresses to consider?

Slide 51 (Test Model Checking): Data abstraction + non-determinism. Unbounded k. [Figure: test automaton with labels rd(1), rd(0), rd(1), wr(0), wr(1); non-deterministic change]

Slide 52 (Test Model Checking): Ownership Changes. Complete 1-address test. [Figure: test automaton with labels rd(1), rd(0), rd(1), wr(0), wr(1) or rd(-) or wr(2)]

Slide 53 (Test Model Checking): 2-address (RO, WO) test. [Figure: test automata with labels wr(1); rd(0); rd(-) OR wr(0); rd(-) OR wr(2); rd(-) OR wr(1); rd(1); rd(A,-) OR rd(B,-) OR wr(A,0) OR wr(B,0); rd(A,-) OR wr(A,1) OR rd(B,-) OR wr(B,1); rd(A,-) OR rd(B,-) OR wr(A,2) OR wr(B,2); rd(B,1); wr(A,1); rd(A,0)]

Slide 54 (Test Model Checking): 2-address (RO, WO) test. [Figure: test automaton with labels rd(A,-) OR rd(B,-) OR wr(A,0) OR wr(B,0); rd(A,-) OR wr(A,1) OR rd(B,-) OR wr(B,1); rd(A,-) OR rd(B,-) OR wr(A,2) OR wr(B,2); rd(B,1); wr(A,1); rd(A,0)]

Slide 55 (Test Model Checking): Complete Test for (RO, WO). Theorem: a system implements (RO, WO) if and only if it has no errors on all 1- and 2-address programs. Complete 1-address and 2-address tests.

Slide 56 (Test Model Checking): Program Order. PO generalizes RO and WO to include orderings between a read followed by a write, and a write followed by a read. [Figure: rd(A), rd(B), wr(A), rd(B) with the orderings RO, RW, WR making up PO]

Slide 57 (Test Model Checking): Write Atomicity. All processors agree on the order of writes (WO imposes the order only if the writes are from the same program). Example writes: wr(A,0), wr(B,1). SC is (PO, WA).
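"All processors agree on the order of writes" can be read as: the write orders seen by the individual processors must fit into one common total order. The sketch below illustrates that reading and is not the test construction used in the dissertation; the input format (a dictionary from processor name to the list of write labels it observed, in order) and the function name are assumptions. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

def common_write_order(observations):
    """Return one total order of writes that every observer agrees with,
    or None if two observers saw some writes in conflicting orders.

    `observations` maps each processor to the writes it saw, in order.
    """
    predecessors = {}
    for seen in observations.values():
        for write in seen:
            predecessors.setdefault(write, set())
        # Each observer contributes "earlier must precede later" constraints.
        for earlier, later in zip(seen, seen[1:]):
            predecessors[later].add(earlier)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        # The constraints contradict each other: no single order exists.
        return None

agree = common_write_order({"P0": ["wr(A,0)", "wr(B,1)"],
                            "P1": ["wr(A,0)", "wr(B,1)"]})
disagree = common_write_order({"P0": ["wr(A,0)", "wr(B,1)"],
                               "P1": ["wr(B,1)", "wr(A,0)"]})
print(agree)     # ['wr(A,0)', 'wr(B,1)']
print(disagree)  # None
```

Checking for a cycle over all observers at once, rather than comparing processors pair by pair, also catches disagreements that only show up when three or more views are combined.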

Slide 58 (Test Model Checking): 1-address SC test. P0: A := 0; rd(A); A := 1; A := 2; rd(A). P1: A := 3; rd(A); A := 4; A := 5; rd(A). Barrier. ORDER: 1, 4 OR 4, 1.

Slide 59 (Test Model Checking): Complete Tests for SC. Theorem: a system with N processors implements SC if and only if it has no errors on n-address programs, n<N. Scheme for N processors: N barriers; data written before, at, and after the barrier are different: 0, 1, 2 for P0 and 3, 4, 5 for P1.

Slide 60 (Test Model Checking): Case Studies. Serial memory (operational semantics of SC). Lazy caching. Runway/PA system model: bus-based design; an aggressive split-transaction protocol; out-of-order completion of transactions on Runway for high performance; in-order completion of instructions in PA for sequential consistency.

Slide 61 (Test Model Checking): Test Model Checking of HP/Runway.

Conclusion. Showed that specializing formal methods for a particular domain (shared memory) leads to efficient verification techniques for the domain and increases the applicability of the formal methods: two-phase algorithm; refinement procedure; memory model verification.

Future Work. Model checking algorithms: better partial order algorithms; tune for test model checking. Protocol synthesis: more optimizations. Test model checking: weaker memory models, other objects; application to other fields.

Slide 64 (Model Checking): Communication States. [Figure labels: (Remote) h!msg, T, h?m, h?m2; (Home) T, r(i)?m, r(j)!m’]

Slide 65 (Model Checking): Debugging Effort. The protocol compilation scheme has been proved using a theorem prover.