1
Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors Intel SRC Customization Award 2005-TJ-1318 Ganesh Gopalakrishnan* School of Computing, University of Utah, Salt Lake City, UT * Past work supported in part by SRC Contract 1031.001, NSF Award 0219805 and an equipment grant from Intel Corporation
2
2 Project Personnel
Intel Mentor: Dr. Ching Tsun Chou (Outstanding Mentor ’05)
Students:
- Ritwik Bhattacharya: defended PhD! Dr. Steven German was mentor (Outstanding Mentor ’04). Joining Intel cache coherence protocol team in Portland
- Xiaofang Chen (mentored by Chou): joined this project in June 2005
Visitors:
- Igor Melatti (postdoc from 7/05 to 12/05)
3
3 The problem addressed: Bridge the gap between protocol “reality” and protocol verification capability
* Shared memory multiprocessors employ protocols at several levels:
- Chip-level protocols
- Intra-cluster protocols
- Inter-cluster protocols
* Impossible to formally model all these levels and verify:
- Currently, protocol verification via model-checking is done per level
- Limited to 2-4 nodes
- Parameterized verification shows promise; more work needed
- Protocols at each level are separately verified; cross-cutting issues are only informally captured or verified
* Formal Methods are badly needed to bridge verification gaps
4
4 Two axes of decomposition: Spatial and Functional
SPATIAL: Chip-level protocols, Intra-cluster protocols, Inter-cluster protocols
- Issues: Different kinds of protocols (write-back vs write-thru); changes in size of coherence unit
- Each level is separately modeled and verified
FUNCTIONAL
- Issues: Ordering issues, Forward Progress, Resource conflicts, …
- Each issue above is studied end-to-end
[Diagram: protocol levels with directories (dir) and memories (mem)]
5
5 Cross-cutting Issues in Hierarchical Protocols
* How are coherence, memory ordering, and IO ordering handled at each level of the protocol?
* Which aspects of memory ordering are realized at which levels?
* How to abstractly represent each interacting protocol (A/G)?
[Diagram: chip-level, intra-cluster, and inter-cluster protocols with directory (dir) and memory (mem)]
6
6 Proposed Approach
Many complex issues; we will focus on a few:
- Spatial Decomposition:
  - Verify safety properties of the protocol of interest (POI)
  - Model the interacting environment protocols (EP)
  - Develop an assume / guarantee approach that scales
- Functional Decomposition:
  - Each functional aspect may need its own custom handling
  - e.g. forward progress is guaranteed through a variety of techniques (ticketing, buffer reservation, converging retries)
  - We will focus on a functional aspect amenable to a more general attack, namely memory ordering issues
7
7 Anticipated Primary Result (from Catalog Page) Knowledge of how to verify hierarchical memory consistency protocols and a tool framework for intuitive protocol modeling and scriptable verification
8
8 Re-enacting what has happened so far… 1. Choose good raw material 2. Develop infrastructure 3. Develop driving examples 4. Develop verification solutions
9
9 Outline of Presentation:
1. Overview of work during this project
2. Brief Overview of PAM (Pred Abs for Murphi)
3. Brief Overview of BT-Murphi
4. Brief Overview of Eddy_Murphi
5. Creation of a benchmark Hierarchical Cache Coherence Protocol (60 pgs. of Murphi)
6. A “circular” A/G reasoning method for verifying this protocol for a finite instance
- Could not verify home || remote1 || remote2 directly
- Could verify home || REMOTE1 || REMOTE2 and HOME || remote1 || REMOTE2 plus lemmas, and argue that it is equivalent.
10
10 Re-enacting… 1. Choose good raw material
a. Explicit state enumeration is still the best “core” approach to verify cache coherence protocols
b. Murphi is still the best language and verifier for our purposes
11
11 Re-enacting … 2. Develop infrastructure
a. Developed POeM (Partial Order enabled Murphi). Software released. Ritwik Bhattacharya, SPIN’06 regular paper
b. Developed Predicate Abstraction tool for Murphi using CVC-Lite. Software released. Xiaofang Chen, U of U TR
c. Developed a Bounded Transaction (BT) version of Murphi. Software released. Xiaofang Chen, U of U TR
d. Developed a Parallel and Distributed version of Murphi called Eddy Murphi. Software released. Igor Melatti et al., SPIN’06 regular paper
e. An Assume / Guarantee Method for Verifying Hierarchical Cache Coherence Protocols: soon to be submitted to an FM conference requiring double-blind reviews
12
12 Re-enacting … 3. Develop Driving Examples
a. Modeled an SGI protocol in Murphi (non-hierarchical but large): Xiaofang Chen
b. Currently developing hierarchical protocol with the help of Ching Tsun Chou. First version DONE!
- Nudges from afar towards “realistic” features (avoids IP issues)
- Combination of FLASH (chip level protocol) and DASH (inter-chip protocol)
- Protocol has been developed (Chou’s Utah visit helped immensely!)
- Being debugged using all the Murphi variants we have. DONE!
- Also being debugged through “mock” Assume / Guarantee debugging models. DONE!
- WILL NOTE DOWN bug scenarios and abstractions for use in “real” A/G proof. DONE!
13
13 Re-enacting … 4. Develop Verification Solutions. In progress (Xiaofang’s PhD research)
14
Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors Intel SRC Customization Award 2005-TJ-1318 Xiaofang Chen (PhD expected in 2007) Ganesh Gopalakrishnan* (PI) School of Computing, University of Utah, Salt Lake City, UT * Past work supported in part by SRC Contract 1031.001, NSF Award 0219805 and an equipment grant from Intel Corporation EXTRA FOILS : RECENT RESULTS
15
15 Multiple-Chip Multiprocessor
16
16 Our approach to designing our benchmark Built and verified each level of the protocol separately Combined the protocols and verified the full model
17
17 The Prototype Hierarchical Protocol We Designed
- Two level MESI protocols: Modified (dirty), Exclusive (clean), Shared, Invalid
- L2 inclusive of L1
- Support silent-drop and write-back
- Support backward-invalidation
- Use non-FIFO network ordering
18
18 M-CMP Protocol Details
- An M-CMP protocol was designed by Xiaofang Chen with “nudges towards realism” by our industrial mentor Ching Tsun Chou
- Protocol design took nearly a month
- Explicit state enumeration of just THREE nodes does not finish model checking (typical industrial situation!)
- Cannot verify by separately verifying the clusters in the hierarchical coherence protocol (not obvious, at least)
19
19 Overall Architecture (rough) as modeled for verification
[Diagram: Home Cluster, Remote Cluster 1 and Remote Cluster 2, each with a RAC, an L2 Cache and two L1 Caches; plus Directory and Main Memory]
3000+ lines (60+ pages) of Murphi just to model this picture!
20
20 Protocol Actions: One Example
21
21 Data Structure for one Cluster RAC L2 Cache L1 Cache L1 Cache Intra-cluster details Inter-cluster details
22
22 State Representation in the Hierarchical System
23
23 Verification Approach: Verify this abstraction first…
[Diagram: one cluster kept in full detail (RAC, L2 Cache, two L1 Caches); the other two clusters reduced to RAC and L2 Cache; Directory and Main Memory; clusters labeled Home Cluster, Remote Cluster 1, Remote Cluster 2]
* State representation
* 31,919,219 states
24
24 …Then verify this abstraction next…
[Diagram: one cluster kept in full detail (RAC, L2 Cache, two L1 Caches); the other two clusters reduced to RAC and L2 Cache; Directory and Main Memory; clusters labeled Home Cluster, Remote Cluster 1, Remote Cluster 2]
* State representation
* 78,689,678 states
25
25 “Circular” Assume-Guarantee Reasoning
The two abstracted protocols are coherent => The hierarchical protocol is coherent!
Simulation Proof Idea (paper under preparation):
We can’t do h || r1 || r2 |= coherence, so
Check-1: h || R1 || R2 |= coherence /\ Lemmas-2
and
Check-2: H || r1 || R2 |= coherence /\ Lemmas-1
Lemmas-1: helps create abstractions R1 and R2, but checked in Check-2
Lemmas-2: helps create abstraction H, but checked in Check-1
26
26 Details of our Verification (1a)
[Diagram: the two abstracted models side by side, each with Home Cluster, Remote Cluster 1, Remote Cluster 2, Directory and Main Memory; an Inv message is shown]
?? “I was not expecting an Inv from you”
27
27 Details of our Verification (1b)
[Same diagram]
?? “I was not expecting an Inv from you”
!! “OK, I won’t send Inv unless I am in state ”
28
28 Details of our Verification (1c)
[Same diagram]
?? “I was not expecting an Inv from you”
!! “OK, I won’t send Inv unless I am in state ”
?? “I know state but what is ?”
29
29 Details of our Verification (1d)
[Same diagram]
?? “I was not expecting an Inv from you”
!! “OK, I won’t send Inv unless I am in state ”
?? “I know state but what is ?”
!! “..but this guy knows about and !”
30
30 Details of our Verification (1e)
[Same diagram]
?? “I was not expecting an Inv from you”
!! “OK, I won’t send Inv unless I am in state ”
?? “I know state but what is ?”
!! “..but this guy knows about and !”
!! “So let this guy ensure that no Inv unless and !!”
31
31 Details of our Verification (2)
g ------> a
g1 ------> a1
g2 ------> a2
p /\ g1 ------> a1
g2 ------> a2
Added obligation: g => p
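The guard-strengthening step and its added obligation can be illustrated with a tiny explicit-state search. This is only a sketch under stated assumptions: the toy state `(phase, ready)`, the rule `send_inv`, and the helper names are hypothetical and are not taken from the benchmark protocol, and the obligation is checked here on the same small model, whereas in the circular scheme it would be discharged in the other check.

```python
from collections import deque

def reachable(init, rules):
    """Explicit-state BFS over (guard, action) rules, in the style of Murphi."""
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        for guard, action in rules:
            if guard(state):
                nxt = action(state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

# Hypothetical toy state: (phase, ready)
INIT = (0, False)

def g(state):
    """Original rule guard: the agent is in phase 1."""
    return state[0] == 1

def p(state):
    """Lemma used to strengthen the guard: the agent is marked ready."""
    return state[1]

def send_inv(state):
    """Action a of the rule being strengthened."""
    return (2, state[1])

OTHER_RULES = [
    (lambda s: s[0] == 0, lambda s: (1, True)),   # prepare
    (lambda s: s[0] == 2, lambda s: (0, False)),  # reset
]

ORIGINAL = OTHER_RULES + [(g, send_inv)]                             # g ==> a
STRENGTHENED = OTHER_RULES + [(lambda s: p(s) and g(s), send_inv)]   # p /\ g ==> a

def obligation_holds(init, rules):
    """Added obligation g => p, checked on every reachable state."""
    return all(p(s) for s in reachable(init, rules) if g(s))

if __name__ == "__main__":
    print(obligation_holds(INIT, ORIGINAL))
    print(reachable(INIT, ORIGINAL) == reachable(INIT, STRENGTHENED))
```

When the obligation holds, strengthening the guard removes no reachable behavior, which is why the two rule sets reach the same states in this toy.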
32
32 Near-term Work Planned:
- Build tool support for this “circular” A/G reasoning
- Apply it to other hierarchical cache coherence protocols (e.g. recent publication in ISCA’06 of a ring interconnect)
- Handle selective abstraction AND parametric reasoning
Main feature of our contribution: A/G reasoning without leaving the familiar environment of an explicit state model checker! (as in Chou FMCAD04; our work is directly inspired by their work)
33
33 Concluding Remarks
1. Presented infrastructure for aggressive enumerative model checking (BT, Eddy Murphi)
2. Presented infrastructure for symbolic analysis (PAM)
3. Briefly discussed combination (POeM)
4. Presented Hierarchical Cache Coherence protocol under construction; first version done
5. Presented verification plans
6. Have verified a non-trivial hier. cache coherence protocol using our A/G method
PLUG: Co-organizing, with John O’Leary, TV06 in Seattle (www.cs.utah.edu/tv06) on threads verification
34
34 References:
1. Xiaofang Chen and Ganesh Gopalakrishnan, “Bounded Transaction Model Checking,” CAV 2006 tools paper (submitted)
2. Igor Melatti, Robert Palmer, Geof Sawaya, Yu Yang, Robert M. Kirby, and Ganesh Gopalakrishnan, “Parallel and Distributed Model Checking in Eddy,” SPIN 2006 (published)
3. Ritwik Bhattacharya, Steven German, and Ganesh Gopalakrishnan, “Exploiting Symmetry and Transactions for Partial Order Reduction of Rule Based Specifications,” SPIN 2006 (published)
Related work: Ching-Tsun Chou, Phanindra K. Mannava, and Seungjoon Park, “A Simple Method for Parameterized Verification of Cache Coherence Protocols,” FMCAD 2004: 382-398
35
35 Predicate Abstraction for Murphi (PAM)
36
36 Predicate Abstraction for Murphi (PAM)
Using CVC-Lite as a symbolic simulation library for sequential execution, and as a decision procedure:
- Concrete data structures in Murphi → symbolic data structures in CVC-Lite
- Predicates in Murphi → global Bool variables in abstraction
- Concrete transition relation in Murphi → symbolic transition relation in abstraction
Once the fixed-point is reached in abstraction, the validity of each predicate can be declared.
37
37 Abstraction and Concretization
Let
- C be concrete states
- A be abstract states
- {Φ_1, …, Φ_n} be predicates
Then
- Concrete transition relation Rc: C × C → Bool
- Abstraction α: C → A, α(x) = (Φ_1(x), …, Φ_n(x))
- Concretization γ: A → 2^C, γ(s) = {x | s = (s_1, …, s_n), Φ_i(x) = s_i for all i in [1, …, n]}
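A minimal sketch of the two maps, assuming a small finite concrete domain so that γ can be computed by plain enumeration (PAM itself relies on CVC-Lite as a decision procedure rather than enumeration). The state layout `(p, q)`, the value range 0..3, the two predicates, and the names `alpha` and `gamma` are illustrative assumptions.

```python
from itertools import product

# Hypothetical concrete state space C: pairs (p, q) with values in 0..3.
CONCRETE = list(product(range(4), repeat=2))

# Predicates Φ_1, Φ_2 over a concrete state x = (p, q).
PREDICATES = [
    lambda x: x[0] == x[1],  # Φ_1: p = q
    lambda x: x[1] == 1,     # Φ_2: q = 1
]

def alpha(x):
    """Abstraction α: map a concrete state to its tuple of predicate values."""
    return tuple(phi(x) for phi in PREDICATES)

def gamma(s):
    """Concretization γ: all concrete states whose predicate values equal s."""
    return {x for x in CONCRETE if alpha(x) == s}

if __name__ == "__main__":
    print(alpha((1, 1)))
    print(sorted(gamma((False, True))))
```

Every concrete state lies in the concretization of its own abstraction, i.e. x ∈ γ(α(x)), which is the property the abstract search depends on.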
38
38 Example of Concrete and Abstract Systems
Concrete System (ruleset i:1..3, rule “r1”):
- (init state) p := 0; q := i;
- (transition rule) p := p+1; q := q+1;
- (predicates) p1: (p=q); p2: (q=1)
Abstract System:
- {(0,0)} {(0,0)} {(0,1)} {(0,0)} {(0,0), (0,1)} (init states) (successor states) (fixed-points)
Claim: Predicate p1 is invalid!
39
39 Mapping: Concrete Transition -> Abstract Transition
Rule "test"
  p = 0
==>
  p := p+1;
  q := p;
Endrule;

void Abs_Rule(Expr & abs_rule) {
  Expr guard = vc->eqExpr(vc->recSelectExpr(X, "p"), vc->ratExpr(0));
  Expr Z = X;
  Z = vc->recUpdateExpr(Z, "p", vc->plusExpr(vc->recSelectExpr(Z, "p"), vc->ratExpr(1)));
  Z = vc->recUpdateExpr(Z, "q", vc->recSelectExpr(Z, "p"));
  abs_rule = vc->andExpr(guard, vc->eqExpr(Y, Z));
}
40
40 The Complete Abstraction Transition Relation Rc = ((rule_no = 1) Λ abs_rule_1) V … V ((rule_no = m) Λ abs_rule_m) V ((rule_no = 0) Λ (cur_state= nxt_state) Λ (¬guard_1 Λ … Λ ¬guard_m))
41
41 Optimization
- Decision procedures become slower on larger expressions
- At any time, only one enabled rule can be fired
- So Rc can be split by rule, resulting in smaller and faster satisfiability checks
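The rule-by-rule split can be sketched as an abstract reachability loop that issues one small successor query per rule instead of one query over the whole disjunction. This is a sketch under assumptions: enumeration over a bounded range 0..5 stands in for the CVC-Lite satisfiability checks, and the system is one reading of the slide 38 example (initial states p = 0, q = i for i in 1..3; a single rule p := p+1, q := q+1; predicates p = q and q = 1). All names are hypothetical.

```python
from itertools import product

BOUND = 6  # assumption: only concrete values 0..5 are enumerated
STATES = list(product(range(BOUND), repeat=2))  # concrete states (p, q)
STATE_SET = set(STATES)

def alpha(x):
    """Abstract a concrete state to the predicate values (p = q, q = 1)."""
    p, q = x
    return (p == q, q == 1)

# Each rule is a (guard, action) pair; only rule "r1" in this toy.
RULES = [
    (lambda x: True, lambda x: (x[0] + 1, x[1] + 1)),
]

def abstract_post(s, rule):
    """Abstract successors of s under ONE rule (one small query per rule)."""
    guard, action = rule
    out = set()
    for x in STATES:
        if alpha(x) == s and guard(x):
            y = action(x)
            if y in STATE_SET:  # ignore successors outside the enumerated range
                out.add(alpha(y))
    return out

def abstract_reach(init):
    """Fixed point of the abstract transition relation, explored rule by rule."""
    seen = set(init)
    frontier = list(init)
    while frontier:
        s = frontier.pop()
        for rule in RULES:
            for t in abstract_post(s, rule):
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen

INIT = {alpha((0, i)) for i in range(1, 4)}

if __name__ == "__main__":
    print(sorted(abstract_reach(INIT)))
```

Under these assumptions the fixed point contains no abstract state in which the first predicate is true, which agrees with the fixed-point set {(0,0), (0,1)} listed on slide 38.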
42
42 Experiment Results
PAM            Bakery   AlternatingBit   German    FLASH
Before Optim.  8 sec    240 sec          450 sec   > 24 hours
After Optim.   5 sec    175 sec          150 sec   45 min
Table 1. Performance of PAM on protocols
43
43 Bounded Transaction Model Checking (BT)
44
44 Bounded Transaction Model Checking
Background
- The basic unit of activity in many protocols is a transaction: a complete causal cycle of actions. In cache coherence protocols, a transaction begins with an agent making a request and ends with a reply being supplied
- Traditional BMC cannot guarantee that the right kind and maximal number of interacting transactions are selected under the available resource limits
- Hence, Bounded Transaction Model Checking!
45
45 How Does BT Work?
BT allows only a certain number of transactions, chosen from a set of potentially interfering transactions, to be explored.
46
46 Current Status of BT
- Has been implemented for Murphi
- Is able to obtain the complete set of “pure” transactions in protocols automatically
- Is able to differentiate “read” and “read exclusive” transactions
- Is able to hunt down many bugs much quicker
- May be able to find a way to “finish off” the omitted parts of the search (tried, but in hibernation now…)
47
47 Automatically Determining “pure”Transactions through Concrete Executions (illustration on German prot.) simplify
48
48 Bounding Transactions
Bound the number of rounds of debugging at N
In each round,
- If the current transaction is “read”-type, only M “read exclusive”-type transactions are allowed to be initiated at different stages in the lifetime of the current transaction
- For “read-exclusive”-type transactions, both “read”- and “read-exclusive”-type transactions are allowed to interfere with the current one
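The bounding idea can be sketched as an explicit-state search that counts how many transactions have been initiated along a path and prunes any step that would exceed the bound. This is a deliberately simplified illustration, not the BT implementation: it uses a single bound and does not distinguish “read” from “read-exclusive” transactions or rounds. The toy model (each agent moves I -> P on a request and P -> S on the reply) and all names are hypothetical.

```python
from collections import deque

def successors(state):
    """Yield (starts_transaction, next_state) for a toy request/reply model."""
    for i, status in enumerate(state):
        if status == "I":    # agent issues a request: a new transaction starts
            yield True, state[:i] + ("P",) + state[i + 1:]
        elif status == "P":  # the reply arrives: the transaction completes
            yield False, state[:i] + ("S",) + state[i + 1:]

def bounded_reach(init, max_transactions):
    """States reachable when at most max_transactions are ever initiated."""
    seen = {(init, 0)}
    queue = deque([(init, 0)])
    while queue:
        state, started = queue.popleft()
        for starts, nxt in successors(state):
            count = started + (1 if starts else 0)
            if count > max_transactions:
                continue  # prune: this step would exceed the transaction bound
            if (nxt, count) not in seen:
                seen.add((nxt, count))
                queue.append((nxt, count))
    return {state for state, _ in seen}

if __name__ == "__main__":
    init = ("I", "I", "I")
    for bound in (1, 2, 3):
        print(bound, len(bounded_reach(init, bound)))
```

Raising the bound widens the explored portion of the state space, so small bounds give a fast bug hunt and larger bounds approach the full search.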
49
49 Experiment Results of BT
Node Data BTMurphi BMC
# of states Time (s) Found bug # of states Time (s) Found bug
3 210681220Yes21327527Yes
5 2404924215Yes3112000821No
8 3389749415Yes18360001726No
10 3361344465Yes15070002417No
Node Data BTMurphi BMC
# of states Time (s) Found bug # of states Time (s) Found bug
327985414Yes218700265No
5214044Yes42600046No
8219364Yes16900033No
1024764Yes11600021No
Table 3. BT on a buggy German protocol model
Table 2. BT on a buggy FLASH protocol model
50
50 Parallel AND Distributed Model Checking using Eddy Murphi
51
51 Motivation for Eddy Murphi (not ) $10k/week on Blue Gene (180 GFLOPS) at IBM’s Deep Computing Lab 136,800 GFLOPS Max
52
52 Parallel Model Checking
A parallel AND distributed model checker will exploit multicore processors very well; for example,
- One computational thread and one communication thread per dual-core
- Message passing between dual-cores
Each computation node “owns” a portion of the state space
- Each node locally stores and analyzes its own states
- Newly generated states which do not belong to the current node are sent to the owner node
A standard distributed algorithm may be chosen for termination
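State ownership can be sketched in a single process: a hash of the state picks the owner node, each node keeps its own visited set, and successors are placed in the owner's inbox. This is a sequential simulation under assumptions, not Eddy's MPI code: the CRC32-based `owner` function, the toy successor function, and the "all inboxes empty" termination test are illustrative stand-ins.

```python
import zlib
from collections import deque

def owner(state, n_nodes):
    """Pick the owning node from a stable hash of the state."""
    return zlib.crc32(repr(state).encode()) % n_nodes

def successors(state):
    """Toy model: two counters modulo 4, either one may be incremented."""
    a, b = state
    return [((a + 1) % 4, b), (a, (b + 1) % 4)]

def distributed_reach(init, n_nodes):
    """Simulate n_nodes workers, each storing only the states it owns."""
    visited = [set() for _ in range(n_nodes)]
    inbox = [deque() for _ in range(n_nodes)]
    inbox[owner(init, n_nodes)].append(init)
    # Simplified termination: stop when every inbox is empty.
    while any(inbox):
        for node in range(n_nodes):
            while inbox[node]:
                state = inbox[node].popleft()
                if state in visited[node]:
                    continue
                visited[node].add(state)
                for nxt in successors(state):
                    # "Send" each new state to the node that owns it.
                    inbox[owner(nxt, n_nodes)].append(nxt)
    return visited

if __name__ == "__main__":
    parts = distributed_reach((0, 0), 3)
    print([len(part) for part in parts], sum(len(part) for part in parts))
```

Because ownership is a function of the state alone, the per-node visited sets are disjoint and their union is the full reachable set.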
53
53 Eddy Algorithm
For each node, two threads are used
- Worker thread: analyzes, generates and partitions states. If there are no states to be visited, it sleeps
- Communication thread: repeatedly sends/receives states to/from the other nodes, coalescing states into bigger lines before shipment. It also handles termination
Communication between the two threads
- Via shared memory
- Via mutex and signal primitives
54
54 [Figure: Eddy thread structure]
Worker Thread: take state off consumption queue; expand state (get new set of states); make decision about set of states
Communication Thread: receive and process inbound messages; initiate Isends; check completion of Isends
Data structures: Hash, Consumption Queue, Communication Queue
55
55 The Communication Queue
- There is one communication queue for each node
- Each communication queue has N lines and M states per line
- State additions are made (by the worker thread) only on the one active line
- The other lines may be already full or empty
56
56 The Communication Queue
Summing up, this is the evolution of a line status:
WTBA → Active → WTBS → CBS
Legend:
- WTBA: Waiting to become active
- Active: Active
- WTBS: Waiting to be sent
- CBS: Completed being sent
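The line life cycle can be sketched as a small single-threaded data structure. This is an illustration under assumptions rather than Eddy's implementation: the class and method names are hypothetical, the real queue is shared between two threads and shipped with MPI Isends, and the `recycle` step that returns a CBS line to WTBA is an assumption about how lines are reused.

```python
from enum import Enum

class LineStatus(Enum):
    WTBA = "waiting to become active"
    ACTIVE = "active"
    WTBS = "waiting to be sent"
    CBS = "completed being sent"

class CommQueue:
    """N lines of up to M states each; the worker fills one active line."""

    def __init__(self, n_lines, states_per_line):
        self.capacity = states_per_line
        self.lines = [[] for _ in range(n_lines)]
        self.status = [LineStatus.WTBA] * n_lines
        self.active = None
        self._activate_next()

    def _activate_next(self):
        """Promote the first WTBA line to Active, if there is one."""
        for i, st in enumerate(self.status):
            if st is LineStatus.WTBA:
                self.status[i] = LineStatus.ACTIVE
                self.active = i
                return

    def add_state(self, state):
        """Worker side: returns False when no line can accept the state."""
        if self.active is None:
            self._activate_next()
            if self.active is None:
                return False
        line = self.lines[self.active]
        line.append(state)
        if len(line) == self.capacity:  # a full line waits to be sent
            self.status[self.active] = LineStatus.WTBS
            self.active = None
            self._activate_next()
        return True

    def send_ready_lines(self):
        """Communication side: ship every WTBS line as one batch."""
        batches = []
        for i, st in enumerate(self.status):
            if st is LineStatus.WTBS:
                batches.append(self.lines[i])
                self.lines[i] = []
                self.status[i] = LineStatus.CBS
        return batches

    def recycle(self):
        """Assumed reuse step: sent lines wait to become active again."""
        for i, st in enumerate(self.status):
            if st is LineStatus.CBS:
                self.status[i] = LineStatus.WTBA

if __name__ == "__main__":
    q = CommQueue(n_lines=2, states_per_line=2)
    print([q.add_state(s) for s in range(5)])
    print(q.send_ready_lines())
```

With only two short lines the fifth submission is rejected until the communication side has shipped and recycled a line, which is the situation that tuning the number of lines is meant to avoid.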
57
57 Eddy-Murphi Performance
Tuning of the communication queuing mechanism
- A high number of states per line is required: it is much better to send many states at a time
- The number of lines must not be too small, or the worker will not be able to submit new states
58
58 Eddy-Murphi Performance
- Comparison with previous versions of parallel Murphi: when ported to MPI, old versions of parallel Murphi perform worse than serial Murphi
- Comparison with serial Murphi: almost linear speedup is expected
59
59 Eddy-Murphi Performance
60
60 Eddy-Murphi Performance
61
61 Designing and Verifying a Hierarchical Cache Coherence Protocol
62
62 Multiple-Chip Multiprocessor
63
63 Designing Hierarchical Cache Coherence Protocols
- Keep each level of the protocol loosely coupled
- Build and verify each level of the protocol separately
- Combine the protocols of all levels into the full protocol afterwards
64
64 The Prototype Hierarchical Protocol We Designed
- Two level MESI protocols: Modified (dirty), Exclusive (clean), Shared, Invalid
- L2 inclusive of L1
- Support silent-drop and write-back
- Support backward-invalidation
- Use non-FIFO network ordering
65
65 FIFO Network Ordering was Introduced Because of a Livelock Problem
66
66 To Avoid the Livelock Problem
- Blocking write-back
- Explicit write-back Ackn
- Notify Dir of silent-drop only when the agent is
  - Invalid
  - Not blocked on write-back
  - Without pending requests
67
67 Sample Scenarios
Scen. 1: Agent-1 sends ‘Normal_Nak’ to Dir when it has a request pending
Scen. 2: Agent-1 sends ‘Notify_SD’ to Dir when it is not blocking, is Invalid, and has no pending request
68
68 Old slides
69
69 “Circular” Assume-Guarantee Reasoning
The two abstracted protocols are coherent => The hierarchical protocol is coherent!
Simulation Proof Idea (paper under preparation):
We can’t do h || r1 || r2 |= coherence, so
Check-1: h || R1 || R2 |= coherence /\ (g1 => p1)
and
Check-2: H || r1 || R2 |= coherence /\ (g2 => p2)
p1 helps obtain H by strengthening rule guard g1, changing it to g1 /\ p1 (likewise for p2, g2, R1/R2)
70
70 Original Hierarchical Cache Coherence Protocol
71
71 Keep One CMP, Abstract the Rest of the System
72
72 Summarize Each CMP as One Atomic Agent
73
73 Verifying Hierarchical Cache Coherence Protocols
- Keep the intra-cluster protocol details, while abstracting the rest of the system
- Summarize each cluster as an atomic agent
- Use assume-guarantee reasoning to formally prove that the hierarchical protocol is correct