Scaling Formal Methods Toward Hierarchical Protocols in Shared Memory Processors Presenters: Ganesh Gopalakrishnan and Xiaofang Chen School of Computing,

Presentation transcript:

Scaling Formal Methods Toward Hierarchical Protocols in Shared Memory Processors Presenters: Ganesh Gopalakrishnan and Xiaofang Chen School of Computing, University of Utah, Salt Lake City, UT {ganesh, GRC CADTS Review, Berkeley, March 18, 2008 Supported by SRC Contract TJ-1318 (Intel Customization)

2 Multicores are the future! Their caches are visibly central… (photo courtesy of Intel Corporation.) > 80% of chips shipped will be multi-core

3 Hierarchical cache coherence protocols will play a major role in multi-core processors. [Figure labels: chip-level protocols; inter-cluster protocols; intra-cluster protocols; dir; mem; …] The state space grows multiplicatively across the hierarchy! Verification will become harder.
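
To make the multiplicative growth concrete, here is a back-of-the-envelope sketch in Python. The per-level state counts are invented for illustration and do not come from the talk. The sum on the last line is an idealized figure that would apply only if each level could be checked fully in isolation, which the real approach can only approximate through abstraction.

```python
from math import prod

def composed_state_bound(component_states):
    """Upper bound on the state space of components composed in parallel."""
    return prod(component_states)

# Hypothetical sizes: one inter-cluster protocol and three intra-cluster protocols.
inter_cluster = 1_000
intra_clusters = [500, 500, 500]

# Monolithic model: every combination of component states may be reachable.
monolithic = composed_state_bound([inter_cluster, *intra_clusters])

# Idealized "one level at a time" cost: each component explored on its own.
one_level_at_a_time = inter_cluster + sum(intra_clusters)

print(monolithic)           # 125000000000
print(one_level_at_a_time)  # 2500
```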

4 Protocol design happens in “the thick of things” (many interfaces, constraints of performance, power, testability). From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.

5 Future Coherence Protocols. Cache coherence protocols that are tuned for the contexts in which they operate can significantly increase performance and reduce power consumption [Liqun Cheng]. Producer-consumer sharing pattern-aware protocol [Cheng et al., HPCA07]: 21% speedup and 15% reduction in network traffic. Interconnect-aware coherence protocols [Cheng et al., ISCA06]: heterogeneous interconnect; improve performance AND reduce power; 11% speedup and 22% wire power savings. Bottom line: protocols are going to get more complex!

6 Main Result #1 : Hierarchical. Developed a way to reduce the verification complexity of hierarchical (CMP) protocols using A/G. [Figure labels: Home Cluster, Remote Cluster 1, Remote Cluster 2; L1 Cache; L2 Cache + Local Dir; RAC; Global Dir; Main Mem; Intra-cluster; Inter-cluster.]

7 Main Result #2 : Refinement Developed way to Verify a Proposed Refinement of ONE unit into its low level (RTL) implementation

10 Main Result #2 : Refinement Developed way to Verify a Proposed Refinement of ONE unit into its low level (RTL) implementation Murphi HMurphi

11 Differences in Modeling: Specs vs. Impls. One step in the high-level model, an atomic guarded command [figure labels: home, remote]. Multiple steps in the low-level model [figure labels: home, router, buf, remote].

12 Our Refinement Check. [Figure: a multi-step Impl transaction takes I to I’; a Spec transition takes Spec(I) to Spec(I’).] I is a reachable Impl state. The guard for the Spec transition must hold. Observable vars changed by either must match.
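
The three conditions on this slide can be sketched as a small executable check. This is a minimal illustration in Python, not the Muv implementation: the state encoding, the `to_spec` mapping, and the home/buf/remote example are all hypothetical, and the starting state is simply assumed to be reachable.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, int]

def refines(impl_state: State,
            impl_steps: List[Callable[[State], State]],
            spec_guard: Callable[[State], bool],
            spec_action: Callable[[State], State],
            to_spec: Callable[[State], State],
            observable: Iterable[str]) -> bool:
    """Check one multi-step Impl transaction against one Spec transition."""
    spec_before = to_spec(impl_state)           # Spec(I)
    if not spec_guard(spec_before):             # guard must hold in Spec(I)
        return False
    impl_after = dict(impl_state)
    for step in impl_steps:                     # run the transaction: I -> I'
        impl_after = step(impl_after)
    spec_after = spec_action(dict(spec_before)) # one atomic Spec step
    mapped = to_spec(impl_after)                # Spec(I')
    return all(mapped[v] == spec_after[v] for v in observable)

# Hypothetical example: the Spec copies home's value to remote in one step;
# the Impl routes it through a buffer in two steps.
def to_spec(s): return {"home": s["home"], "remote": s["remote"]}
def spec_guard(s): return s["remote"] == 0
def spec_action(s): return {**s, "remote": s["home"]}
def home_to_buf(s): return {**s, "buf": s["home"]}
def buf_to_remote(s): return {**s, "remote": s["buf"], "buf": 0}
def lossy_router(s): return {**s, "buf": 0}     # drops the message in flight

start = {"home": 7, "buf": 0, "remote": 0}
good = refines(start, [home_to_buf, buf_to_remote],
               spec_guard, spec_action, to_spec, ["home", "remote"])
bad = refines(start, [home_to_buf, lossy_router, buf_to_remote],
              spec_guard, spec_action, to_spec, ["home", "remote"])
print(good, bad)  # True False
```

The lossy variant fails because the observable variable `remote` no longer matches what the Spec transition would have produced.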

13 Workflow of Our Refinement Check. [Figure labels: Murphi Spec model; Hardware Murphi Impl model; Product model in Hardware Murphi; Muv; Product model in VHDL; Property check.] Check that the implementation meets the specification.

14 Anticipated Future Result Developed way to Verify a Proposed Refinement of the ENTIRE hierarchy

15 Anticipated Future Result Deal with pipelining Sequential Interaction Pipelined Interaction

16 Anticipated Future Result Develop ways to “tease apart” protocols that are “blended in” e.g. for power-down or post-si observability enhancement More protocols….. do they interfere?

17 Basics. PI: Ganesh Gopalakrishnan. Industrial Liaisons: Ching-Tsun Chou (Intel), Steven M. German (IBM), John W. O’Leary (Intel), Jayanta Bhadra (Freescale), Alper Sen (Freescale), Aseem Maheshwari (TI). Primary Student: Xiaofang Chen. Graduation Date: writing PhD dissertation; on the job market. Other Students: Yu Yang (PhD), Guodong Li (PhD), Michael DeLisi (BS/MS). Anticipated Results: Hierarchical: methodology for hierarchical (cache coherence) protocol verification, with emphasis on complexity reduction (was in original SRC proposal). Refinement: methodology for expressing and verifying refinement of higher-level protocol descriptions (not in original SRC proposal).

18 Basics  Deliverables (Papers, Software, Xiaofang’s Dissertation) Hierarchical:  Methodology for Applying A/G Reasoning for Complexity Reduction  Verified Protocol Benchmarks – Inclusive, Non-Inclusive, Snoopy (Large Benchmarks)  Automatic Abstraction Tool in support of A/G Reasoning Refinement:  Muv Language Design (for expressing Designs)  Refinement Checking Theory and Methodology  Complete Muv tool implementation

19 What’s Going On: Accomplishments during the past year. Hierarchical: finishing non-inclusive hierarchical protocol verification; developing and verifying a hierarchical protocol with a snoopy first level.

20 Insert Table of Hier + Snoopy Here

21 What’s Going On  Accomplishments during the past year (contd.) Refinement:  HMurphi was fleshed out in great detail  Most of Muv was implemented (a large portion during IBM T.J. Watson Internship) – joint work with Steven German and Geert Janssen

22 What’s Going On: Future directions. Hierarchical + Refinement: develop ways to verify hierarchies of interacting HMurphi modules. Pipelining. Teasing out protocols supporting non-functional aspects: power-down protocols; protocols to enhance post-silicon observability. Architectural characterization: How do we describe the “ISA” of future multi-core machines? How do we make sure that this ISA has no hidden inconsistencies?

23 What’s Going On  Technology Transfer & Industrial Interactions With Liaisons  Publications FMCAD 06, 07, HLDVT 07, TECHCON 07 (best session paper award), Journal paper (under prep), Dissertation (under prep) Request to IBM for Open-sourcing Muv has been placed

24 Overview of “Hierarchical”  Given a protocol to verify, create a verification model that models a small number of clusters acting on a single cache line Verification Model Inv P Home Remote Global directory

25 2. Exploit Symmetries  Model “home” and the two “remote”s (one remote, in case of symmetry) Verification Model Inv P

26 4. Initial abstraction will be extreme; slowly back off from this extreme… [Abstract models: Inv P1, Inv P2, Inv P3.] P1 fails: diagnose the failure. Bug: report to user. False alarm: diagnose where the guard is overly weak; add a strengthening guard; introduce a lemma to ensure soundness of the strengthening.

27 Overview of Theory Involved

28 3. Create Abstract Models (three models in this example) Inv P Inv P1Inv P2 Inv P3

29 Step 1 of Refinement Inv P1 Inv P2 Inv P3 Inv P1 Inv P2 Inv P3’

30 Step 2 of Refinement Inv P1 Inv P2 Inv P3 Inv P1 Inv P2 Inv P3’ Inv P1 Inv P2’ Inv P3’

31 Final Step of Refinement Inv P1 Inv P2 Inv P3 Inv P1 Inv P2 Inv P3’ Inv P1’ Inv P2’ Inv P3’ Inv P1 Inv P2’ Inv P3’’

32 Detailed Presentation of Refinement Note: Three examples have been presented in full detail at

33 Here, arrange the rest of the slides + the new ones you are making as you feel best. Most of the remaining slides are quite good, so your work need not include any “clean-up” but just delete those already covered…

34 Project Summary: Year 2. Verification of hierarchical cache coherence protocols: non-inclusive multicore benchmark; compositional approach, one level at a time; can reduce the explicit state space by more than 95%. Refinement check (protocol RTL Impls vs. Specs): refinement theory and methodology; compositional approach theory. Publications: FMCAD 2007, HLDVT 2007, TECHCON 2007 (best session paper award).

35 Yearly Summary:  Refinement check: protocol RTL Impls vs. Specs A comprehensive tool path Can find bugs on RTL protocols with realistic features A simple pipelined stack example  Verification of hierarchical cache coherence protocols A snoop multicore protocol benchmark

36 A Simple Snoop Multicore Protocol  Motivation: Snoop protocols commonly used in 1st level of caches Have applied our approach on directory protocols How about snoop protocols?

37 Applying Our Approach  Abstracted protocols  Model checking results

38 Cycle accurate RTL level Refinement Check Spec vs. Impl Specification Abstraction level Model size

39 Differences in Execution: Specs vs. Impls Interleaving in HL Concurrency in LL

40 Our Approach of Refinement Check. Modeling: Specification in Murphi; Implementation in Hardware Murphi. Use transactions in Impl to relate to Spec. Verification: Muv: Hardware Murphi → synthesizable VHDL; Tool: IBM SixthSense and RuleBase.

41 What Are Transactions?  Group a multi-step execution in implementations Spec Impl

42 Outline Project background  Extensions to the tool path  Experimental results  Future work

43 Tool Path. Initial efforts from IBM, by German and Janssen: Hardware Murphi language; Muv: Hardware Murphi → synthesizable VHDL. Our extensions, which enable the refinement check: language extensions; Muv extensions.

44 Basic Features of Hardware Murphi vs Murphi … signal s1, s2 … s1 <= … chooserule rules; end; … firstrule rules; end; … transaction rule-1; rule-2; … end; …

45 Language Extensions to Hardware Murphi (I) --include spec.m correspondence u1[0..7] :: v1[1..8]; u1 :: v2; end;  Directives  Joint variables correspondence

46 Language Extensions to Hardware Murphi (II) transactionset p1:T1; p2:T2 do transaction … end;  Transactionset rule:id guard ==> action; ruleset p1:T1; p2:T2 do rule:id … end;  Rules with IDs

47 Language Extensions to Hardware Murphi (III) >; >; …  Execute a rule by ID var[i] <:= data;  Fine-grained assignments for write-write conflicts

48 How to Annotate an Impl Model with Spec? impl.m: … transaction rule-1 g1 ==> a1; rule-2 g2 ==> a2; end; … spec.m: … rule:id g ==> a; …

49 How to Annotate an Impl Model with Spec? impl.m: --include spec.m correspondence u1 :: v1 ; … end ; … transaction rule-1 g1 ==> a1; >; rule-2 g2 ==> a2; end; … spec.m: … rule:id g ==> a; …

50 The Framework of Muv Hardware Murphi model AST parser AST’ pre- processor AST’’ refinement check analysis VHDL model translator Constant propagation rule:id, > ruleset, transactionset …

51 Our Extensions to Muv  Language extension support  Refinement check assertions generation Ensure exclusive write to a variable Serializability for Spec rules Enableness for Spec rules Joint variables equivalence when inactive  Mostly done with static analysis

52 Refinement Extensions to Muv (I) v := d; for i: s1..s2 do assert (update_bits[i] = false); end; v := d; for i: s1..s2 do update_bits[i] := true; end;  No write-write conflicts
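
The update-bit idea on this slide can be mimicked in a few lines of Python. This is a sketch of the technique, not of Muv's generated VHDL: the `TrackedVar` class is hypothetical, and resetting the update bits at the end of each clock cycle is an assumption, since the slide does not say when they are cleared.

```python
class WriteConflict(Exception):
    """Raised when two writes touch the same bit in one cycle."""

class TrackedVar:
    def __init__(self, width):
        self.bits = [0] * width
        self.update_bits = [False] * width   # one flag per bit of the variable

    def write(self, lo, hi, data_bits):
        # Mirrors: for i: s1..s2 do assert (update_bits[i] = false); end;
        for i in range(lo, hi + 1):
            if self.update_bits[i]:
                raise WriteConflict(f"bit {i} written twice in one cycle")
        # Mirrors: v := d; for i: s1..s2 do update_bits[i] := true; end;
        for offset, i in enumerate(range(lo, hi + 1)):
            self.bits[i] = data_bits[offset]
            self.update_bits[i] = True

    def end_cycle(self):
        # Assumption: flags are cleared once per clock cycle.
        self.update_bits = [False] * len(self.bits)

v = TrackedVar(8)
v.write(0, 3, [1, 1, 1, 1])   # disjoint ranges in the same cycle are fine
v.write(4, 7, [0, 0, 0, 1])
try:
    v.write(2, 5, [0, 0, 0, 0])   # overlaps both earlier writes
    conflict = False
except WriteConflict:
    conflict = True
print(conflict)  # True
```

Tracking individual bits rather than whole variables is what lets two transactions write disjoint slices of the same variable in one cycle without tripping the assertion.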

53 Refinement Extensions to Muv (II): Serializability for specification rules. [Figure: states S0, S1, S’1, S’2 connected by rules t1, t2, t3.] Obtain the read and write sets of variables of each rule; analyze read-write dependencies; check for cycles.

54 Check for Dependency Cycles. [Figure: states S0, S1, S’1, S’2 connected by rules t1, t2, t3. Labels: t3 write v2; t3 read v1; r(v1), w(v3); t2 write v1; t2 read v2; t1.]
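
The read/write-set analysis and cycle check can be sketched in Python as follows. Two things here are assumptions: the edge definition (rule `a` points to rule `b` when `b` reads a variable that `a` writes) is one plausible reading of "read-write dependency", and the labels `r(v1)`, `w(v3)` in the figure are attributed to `t1`.

```python
def dependency_edges(rules):
    """rules maps a rule name to (read_set, write_set).

    Adds an edge a -> b when b reads some variable that a writes.
    """
    edges = {name: set() for name in rules}
    for a, (_, writes_a) in rules.items():
        for b, (reads_b, _) in rules.items():
            if a != b and writes_a & reads_b:
                edges[a].add(b)
    return edges

def has_cycle(edges):
    """Depth-first search with three colours to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in edges}

    def visit(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY:
                return True                  # back edge: cycle found
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in edges)

# Read and write sets taken from the figure labels (t1's sets are assumed).
rules = {
    "t1": ({"v1"}, {"v3"}),
    "t2": ({"v2"}, {"v1"}),
    "t3": ({"v1"}, {"v2"}),
}
print(has_cycle(dependency_edges(rules)))  # True: t2 and t3 depend on each other
```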

55 Refinement Extensions to Muv (III) rule:id guard  action; bool function id_guard() {…} void procedure id_action(…) {…}  Enableness of specification rules >; assert id_guard(); id_action();
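
The enableness check amounts to splitting a rule into a guard function and an action procedure, then asserting the guard at the point where the implementation fires the rule. A minimal Python sketch follows; `compile_rule` and the reply-handling example are hypothetical names chosen for illustration.

```python
def compile_rule(guard, action):
    """Turn a guarded rule into a callable that checks enableness first."""
    def execute(state):
        # Mirrors: assert id_guard(); id_action();
        if not guard(state):
            raise AssertionError("spec rule executed while its guard is false")
        action(state)
    return execute

# Hypothetical spec rule: consume a reply only while a request is pending.
def recv_guard(state):
    return state["pending"]

def recv_action(state):
    state["data"] = state["reply"]
    state["pending"] = False

recv_reply = compile_rule(recv_guard, recv_action)

state = {"pending": True, "reply": 42, "data": 0}
recv_reply(state)              # first firing is enabled
try:
    recv_reply(state)          # second firing for the same reply is not
    fired_twice = True
except AssertionError:
    fired_twice = False
print(state["data"], fired_twice)  # 42 False
```

An implementation that consumes one reply twice would trip this kind of assertion on the second firing.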

56 Refinement Extensions to Muv (IV)  Joint variables equivalence when inactive  For each joint variable v When all transactions that write to v are inactive v must be equivalent in Impl and Spec … Transaction T1 … Transaction T2 … … Assert inactive(T1) & inactive(T2) => v = v’;
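
The inactive-equivalence condition can be written as a small check over snapshots of the two models. This Python sketch assumes a hypothetical encoding: `correspondence` maps each Impl joint variable to its Spec counterpart, `writers` lists the transactions that write each Impl variable, and `active` is the set of transactions currently in flight.

```python
def joint_var_violations(impl, spec, correspondence, writers, active):
    """Return Impl variables that differ from Spec while all writers are inactive."""
    violations = []
    for impl_var, spec_var in correspondence.items():
        if any(t in active for t in writers.get(impl_var, ())):
            continue   # a writer is mid-transaction; the values may differ
        if impl[impl_var] != spec[spec_var]:
            violations.append(impl_var)
    return violations

correspondence = {"u1": "v1"}          # as in: correspondence u1 :: v1
writers = {"u1": ["T1", "T2"]}

impl = {"u1": 5}
spec = {"v1": 9}

# While T1 is active, the mismatch is tolerated.
print(joint_var_violations(impl, spec, correspondence, writers, {"T1"}))  # []
# Once T1 and T2 are both inactive, the same mismatch is a violation.
print(joint_var_violations(impl, spec, correspondence, writers, set()))   # ['u1']
```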

57 Outline Project background Extensions to the tool path  Experimental results  Future work

58 A Driving Protocol Benchmark: S. German and G. Janssen, IBM Research Tech Report 2006. [Figure labels: Local, Home, Remote, Dir, Cache, Mem, Router, Buf.]

59 More Detail of the Cache Example  Hardware Murphi model ~2500 LOC 15 transactionsets  Generated VHDL ~1800 assertions, of which ~1600 are write-write conflicts check assertions Took ~16min with SixthSense for all assertions Took ~13min w/o write-write conflicts check

60 Bugs Found with Refinement Check  Benchmark satisfies cache coherence already  Bugs still found Bug 1: router unit loses messages Bug 2: home unit replies twice for one request Bug 3: cache unit gets updated twice from 1 reply  Refinement check is an automatic way of constructing such checks

61 Model Checking Approaches  Monolithic Straightforward property check  Compositional Divide and conquer

62 Compositional Refinement Check  Reduce the verification complexity  Basic Techniques Abstraction  Removing details to make verification easier Assume guarantee  A simple form of induction which introduces assumptions and justifies them

63 Experimental Results. Configurations: 2 nodes, 2 addresses, SixthSense. [Plot: verification time against datapath width for the monolithic approach and the compositional approach; labels: 1-bit, 10-bit, 30 min, 1-day.]

64 A Simple 2-Stage Pipelined Stack pipelined pushes pipelined pops overlapped pop & push  Push: push data + increase counter  Pop: decrease counter + pop data
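
The two micro-steps of each operation can be illustrated with a small Python model that is compared against an atomic stack specification at transaction boundaries. This sketch is an assumption-laden simplification: it runs the two steps of each operation back to back and does not model the overlapped pop and push case shown on the slide; the class and function names are hypothetical.

```python
class SpecStack:
    """Atomic specification: each operation is a single step."""
    def __init__(self):
        self.items = []
    def push(self, value):
        self.items.append(value)
    def pop(self):
        return self.items.pop()

class TwoStepStack:
    """Implementation view: each operation is split into two steps."""
    def __init__(self, depth):
        self.mem = [None] * depth
        self.counter = 0
    def push_data(self, value):      # push, step 1
        self.mem[self.counter] = value
    def increase_counter(self):      # push, step 2
        self.counter += 1
    def decrease_counter(self):      # pop, step 1
        self.counter -= 1
    def pop_data(self):              # pop, step 2
        return self.mem[self.counter]
    def contents(self):
        return self.mem[:self.counter]

def matches_spec(ops, depth=4):
    """Run ops on both models and compare them after each completed operation."""
    spec, impl = SpecStack(), TwoStepStack(depth)
    for op, value in ops:
        if op == "push":
            spec.push(value)
            impl.push_data(value)
            impl.increase_counter()
        else:
            expected = spec.pop()
            impl.decrease_counter()
            if impl.pop_data() != expected:
                return False
        if impl.contents() != spec.items:
            return False
    return True

ops = [("push", 1), ("push", 2), ("pop", None), ("push", 3), ("pop", None)]
print(matches_spec(ops))  # True
```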

65 Outline Project background Extensions to the tool path Experimental results  Future work

66 Future Work  Muv-like refinement check for interaction modules RTL modules interaction via communication protocols Interfaces involving buffers and pipelining  Refinement of initial RTL protocols Power-down issues Post-silicon validation support Runtime verification support Safe augmentation of verified protocols Cheap re-verification