Post-Silicon Verification under Limited Observability

Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT 84112
Ching Tsun Chou, Intel Corporation, 3600 Juliette Lane, Santa Clara, CA
Supported in part by NSF award CCR 0219805

Why Post-Silicon Verification?
- Why verify the silicon? Isn't formal verification (FV) enough? Not yet:
  - FV cannot yet be applied to entire multiprocessor (MP) systems.
  - MP systems contain several CPUs and several chipsets.
- We cannot verify the silicon exhaustively, so why bother?
  - Formal analysis applied to particular executions can yield far more insight than ad hoc criteria applied to the same executions, e.g., "runtime verification" of software (Havelund, Rosu, Lee, ...).

Why Post-Silicon Verification? (continued)
- Runtime verification can cover more: silicon runs at 1 GHz, whereas simulation runs at 100 Hz.
- With well-designed "stress tests", one often finds out a lot.
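As a rough back-of-the-envelope check using only the two clock rates quoted above (and ignoring test setup and trace-collection overhead), the coverage advantage works out as follows:

```latex
\frac{1\ \text{GHz}}{100\ \text{Hz}} = \frac{10^{9}\ \text{cycles/s}}{10^{2}\ \text{cycles/s}} = 10^{7},
\qquad
1\ \text{s on silicon} \approx 10^{7}\ \text{s of simulation} \approx 116\ \text{days}.
```

That is, one second of stress testing on silicon exercises roughly as many cycles as nearly four months of continuous simulation.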

Where Post-Si Verification Fits in the Hardware Verification Flow
[Figure: the flow from spec to product. Pre-manufacture: specification validation, then design verification. Post-manufacture: testing for fabrication faults, and post-silicon verification, which asks: does the functionality match the designed behavior?]

More Facts about Post-Silicon Verification
- Post-Si verification can target uniprocessor functionality, or determine whether MP orderings are being obeyed, or check whether cache coherence protocols are behaving correctly.
- It directly impacts time to market.
- The industry spends huge amounts of effort in this area.
- There are great opportunities to apply FV.

How Formal Methods Can Enhance Post-Si Verification
- Reduce manual effort
- Help in test-case selection
- Help analyze execution results comprehensively

Overview of the Talk
- How the paradigm for post-Si verification must change
- How limited observability impacts post-Si verification
- The use of constraints
- A paper design for a post-Si verification system based on constraints, drawing on actual experience developing prototypes in an industrial context
- Concluding remarks

Post-Si Verification for Cache Protocol Execution: Present Day
- Assume there is a "front-side bus".
- Record bus transactions in response to test programs.
- Generate detailed cache states from the bus transactions.
- See whether the behavior matches the cache coherence protocol that was supposedly implemented.
[Figure: several CPUs and memory attached to a shared front-side bus]
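Because every bus transaction is recorded in this setting, cache states can be replayed exactly. The sketch below is a minimal illustration, assuming a textbook MSI snooping protocol for a single cache line and a hypothetical trace format in which each bus record also notes which CPU (if any) supplied dirty data; `check_bus_trace` and the record layout are illustrative, not the authors' tooling. The check compares the observed supplier against the one the replayed states predict.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bus record: (requesting CPU, "BusRd" or "BusRdX",
# CPU that supplied dirty data on the bus, or None if memory supplied it).
BusEvent = Tuple[int, str, Optional[int]]


def check_bus_trace(num_cpus: int, trace: List[BusEvent]) -> List[str]:
    """Replay a fully observed bus trace through a textbook MSI protocol
    (one cache line) and report every event whose observed data supplier
    disagrees with the supplier the model predicts."""
    states: Dict[int, str] = {c: "I" for c in range(num_cpus)}
    errors: List[str] = []
    for step, (cpu, kind, supplier) in enumerate(trace):
        # In MSI, only another cache holding the line in M supplies dirty data.
        owners = [c for c, s in states.items() if s == "M" and c != cpu]
        expected = owners[0] if owners else None
        if supplier != expected:
            errors.append(f"step {step}: supplier {supplier}, model expects {expected}")
        # Update the snooping caches, then the requester.
        for other in range(num_cpus):
            if other == cpu:
                continue
            if kind == "BusRd" and states[other] == "M":
                states[other] = "S"   # owner downgrades after supplying data
            elif kind == "BusRdX":
                states[other] = "I"   # every other copy is invalidated
        states[cpu] = "S" if kind == "BusRd" else "M"
    return errors


print(check_bus_trace(3, [(0, "BusRdX", None), (1, "BusRd", 0), (2, "BusRd", None)]))  # []
print(check_bus_trace(3, [(0, "BusRdX", None), (1, "BusRd", None)]))
# ['step 1: supplier None, model expects 0']
```

The second trace is rejected because CPU 0 must hold the line in M after its `BusRdX`, so memory supplying the data at step 1 contradicts the protocol.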

Post-Si Verification for Cache Protocol Execution: Future
- We CANNOT assume there is a "front-side bus".
- We CANNOT record all link traffic.
- We CAN ONLY generate sets of possible cache states.
- How best can one match these against the designed behavior?
[Figure: four CPUs connected by links; some "miss" traffic is visible, some is invisible]

Potential Carry-over of Techniques
Runtime verification of distributed embedded systems:
- Hundreds of processors, FPGAs, SoCs, ... interacting
- We cannot assume the system will work correctly on its own.
- We must detect the onset of crashes, intrusions, ... EARLY.
- We cannot easily observe all the nodes.
- Even when nodes are observable, the information degrades:
  - bandwidth limitations (need to compress / discard)
  - time uncertainties

Back to Our Specific Problem Domain
- Verify the operation of systems at runtime when we can't see all transactions.
- This could also be offline analysis of a partial log of activities.
[Figure: example event sequences "a b x y c d" and "a x c d y b …"]

Possible Outcomes of Post-Si Verification
The observed behavior is:
- "Definitely wrong"
- "Potentially dangerous" (rely on statistics to give this verdict?)
- "Worth noting" (based on past experience and bug logs?)
- ...
- "Totally benign" (not even worth noting)
Caveat: we are partially observing a potentially incorrect system.

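Concrete Example: Coherence Protocol Verification
[Figure: a Requester, the Home, and several Potential Owners. The requester sends req to home; home sends snoop requests (sreq) to the potential owners, which answer with snoop responses (sresp); retries or completion return to the requester; data may be supplied directly by an owner.]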

Packet Encodings and Example Trace File
- All the packets pertaining to a transaction share the same mid and tid.
- The address is not shipped with responses.
Packet formats:
- req / sreq: Pkt_type, mid, tid, sender, dest, addr, data
- resp: Pkt_type, mid, tid, sender, dest, data
A transaction and the various packets it may involve: req, first snoop request, subsequent snoop requests, subsequent snoop responses, data, completion.

The actual trace file is an interleaving of the packets of all active transactions.
[Figure: individual transactions and their possible temporal overlap, and the trace file actually analyzed]
The transactions may or may not pertain to the same address, and many of the events shown may be missing.
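The packet formats and the interleaving above suggest a natural first processing step: demultiplex the trace on the shared (mid, tid) key, and recover each response's address from its transaction's request, since responses carry none. This is a minimal sketch; the field names follow the slide, while the Python types, the `group_transactions` helper, and the convention that an absent field is `None` are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Packet:
    pkt_type: str                 # e.g. "req", "sreq", "resp"
    mid: int
    tid: int
    sender: int
    dest: int
    addr: Optional[int] = None    # carried by req/sreq, absent on responses
    data: Optional[int] = None


def group_transactions(trace: List[Packet]) -> Dict[Tuple[int, int], List[Packet]]:
    """Split an interleaved trace into per-transaction packet lists keyed by
    (mid, tid), filling in each response's address from its transaction."""
    groups: Dict[Tuple[int, int], List[Packet]] = defaultdict(list)
    addr_of: Dict[Tuple[int, int], int] = {}
    for p in trace:
        key = (p.mid, p.tid)
        if p.addr is not None:
            addr_of[key] = p.addr
        groups[key].append(p)
    # Second pass, so the order of packets within the log does not matter.
    for key, pkts in groups.items():
        for p in pkts:
            if p.addr is None:
                p.addr = addr_of.get(key)   # stays None if no req/sreq was seen
    return dict(groups)


trace = [
    Packet("req", 1, 7, sender=0, dest=9, addr=0x40),
    Packet("req", 2, 3, sender=1, dest=9, addr=0x80),
    Packet("resp", 1, 7, sender=9, dest=0, data=5),
    Packet("resp", 3, 1, sender=9, dest=2, data=0),   # its request was not observed
]
groups = group_transactions(trace)
print(hex(groups[(1, 7)][1].addr))  # 0x40
print(groups[(3, 1)][0].addr)       # None
```

A response whose request crossed an invisible link keeps an unknown address, which is one concrete way missing events propagate into later analysis.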

Transaction (Packet) Semantics
[Figure: packets p exchanged between a Requester and several Potential Owners]
- Each packet p can only be issued when the cache line is in certain states.
- After issuing p, the cache-line state often changes.
- After receiving a packet, the cache-line state changes.
- These details are VERY complex, and often need to be extracted from cache protocol tables.
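One way to capture these three rules is as lookup tables, much like the protocol tables they are extracted from. The sketch below assumes a toy MSI-style protocol with made-up packet names (`rd_req`, `wr_req`, `snoop_rd`, `snoop_inv`) and collapses transient states; the real tables are far larger.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical, heavily simplified protocol tables for one cache line
# (transient states are collapsed for brevity).
# ISSUE_GUARD: states in which a requester may issue each packet.
ISSUE_GUARD: Dict[str, FrozenSet[str]] = {
    "rd_req": frozenset({"I"}),
    "wr_req": frozenset({"I", "S"}),
}
# AFTER_ISSUE: the requester's state once the packet has been issued.
AFTER_ISSUE: Dict[Tuple[str, str], str] = {
    ("I", "rd_req"): "S",
    ("I", "wr_req"): "M",
    ("S", "wr_req"): "M",
}
# AFTER_RECEIVE: a potential owner's state after receiving a snoop packet.
AFTER_RECEIVE: Dict[Tuple[str, str], str] = {
    ("M", "snoop_rd"): "S", ("S", "snoop_rd"): "S", ("I", "snoop_rd"): "I",
    ("M", "snoop_inv"): "I", ("S", "snoop_inv"): "I", ("I", "snoop_inv"): "I",
}


def issue(state: str, pkt: str) -> str:
    """Return the requester's next state; raise if pkt is illegal in state."""
    if state not in ISSUE_GUARD.get(pkt, frozenset()):
        raise ValueError(f"{pkt} cannot be issued from state {state}")
    return AFTER_ISSUE[(state, pkt)]


def receive(state: str, pkt: str) -> str:
    """Return a snooped cache's next state after receiving pkt."""
    return AFTER_RECEIVE[(state, pkt)]


print(issue("S", "wr_req"))      # M
print(receive("M", "snoop_rd"))  # S
```

Calling `issue("M", "rd_req")` raises `ValueError`, because in this toy table a read request can only be issued from I; the guard table is exactly the "can only be issued under certain cache-line states" rule.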

Verification Consists of Abstract Interpretation Driven by Transaction History
[Figure: c1, c2, c3, c4 shown at successive points in the transaction history]
Knowing the transaction (packet) semantics, we can compute the set of possible states each cache line can be in after each packet goes by (at least during offline analysis). An error is flagged when an inconsistency is noted in the sets of cache states.
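The set-based propagation described above can be sketched as follows. This assumes a toy MSI-style protocol in which each request snoops every other cache (the protocol in the slides is directory-based, with a home node, and far richer); the tables, packet names, and the `("gap", cache)` marker for possibly unobserved activity are all assumptions for illustration. Each cache starts with every state possible, since the initial state is not observed; an observed packet first intersects the issuer's set with the states from which it may be issued, then maps the survivors through the protocol table.

```python
from typing import Dict, List, Set, Tuple

ALL = {"M", "S", "I"}
# Hypothetical toy tables; a real protocol table is far larger.
GUARD = {"rd_req": {"I"}, "wr_req": {"I", "S"}}   # issuer's allowed states
ISSUE = {("I", "rd_req"): "S", ("I", "wr_req"): "M", ("S", "wr_req"): "M"}
SNOOP = {                                          # effect on every other cache
    "rd_req": {"M": "S", "S": "S", "I": "I"},
    "wr_req": {"M": "I", "S": "I", "I": "I"},
}


def analyze(num_caches: int, trace: List[Tuple[str, int]]) -> List[str]:
    """Propagate sets of possible states per cache and report inconsistencies.

    Trace items are (packet, issuing cache) for observed packets, or
    ("gap", cache) where unobserved activity at that cache may have occurred."""
    possible: Dict[int, Set[str]] = {c: set(ALL) for c in range(num_caches)}
    errors: List[str] = []
    for step, (pkt, src) in enumerate(trace):
        if pkt == "gap":
            possible[src] = set(ALL)          # forget what we knew about src
            continue
        pre = possible[src] & GUARD[pkt]     # states src could have issued pkt from
        if not pre:
            errors.append(
                f"step {step}: cache {src} cannot issue {pkt} from {sorted(possible[src])}"
            )
            pre = set(GUARD[pkt])            # trust the observed packet and continue
        possible[src] = {ISSUE[(s, pkt)] for s in pre}
        for c in range(num_caches):
            if c != src:
                possible[c] = {SNOOP[pkt][s] for s in possible[c]}
    return errors


print(analyze(2, [("rd_req", 1), ("wr_req", 0), ("rd_req", 1)]))  # []
print(analyze(2, [("rd_req", 0), ("rd_req", 0)]))
print(analyze(2, [("rd_req", 0), ("gap", 0), ("rd_req", 0)]))       # []
```

In the second call, a read request from a cache that can only be in S is flagged; inserting a gap (for example, a silent eviction) makes the same history consistent again, which is exactly the ambiguity that limited observability introduces.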

General Approach
Know all the possible communication patterns of the various transactions, and how to record progress along a particular pattern; use constraints to bridge the gap.
[Figure: communication patterns, and the state within a communication pattern]
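A sketch of the bookkeeping this implies, assuming each communication pattern can be written as a linear sequence of packet types (real patterns branch, e.g. on retries and fork/join snoops). The pattern names and sequences are hypothetical, loosely following the req / sreq / sresp / completion flow of the concrete example. Given an observed packet type, `locate` returns every (pattern, position) it could belong to, plus the precursor packets that must have occurred whether or not they were observed.

```python
from typing import Dict, List, Tuple

# Hypothetical communication patterns, each a linear sequence of packet types.
PATTERNS: Dict[str, List[str]] = {
    "read_from_memory": ["req", "sreq", "sresp", "data", "completion"],
    "read_from_owner": ["req", "sreq", "sresp", "direct_data", "completion"],
    "retried": ["req", "retry"],
}


def locate(pkt_type: str) -> List[Tuple[str, int, List[str]]]:
    """Return (pattern, position, precursors) for every place pkt_type can occur."""
    hits = []
    for name, seq in PATTERNS.items():
        for pos, t in enumerate(seq):
            if t == pkt_type:
                hits.append((name, pos, seq[:pos]))
    return hits


print(locate("direct_data"))  # [('read_from_owner', 3, ['req', 'sreq', 'sresp'])]
print(locate("sresp"))        # two candidate patterns: the ambiguity constraints must resolve
```

When a packet type matches several patterns, later observations on the same (mid, tid) have to disambiguate; that gap is what the constraints are meant to bridge.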

How Many of the Packets Can Be Invisible?
At first cut (and based on some practical experience), having one packet missing in any "causal loop" seems tolerable; more than one appears TOO under-constrained.
[Figure: example causal loops, three marked OK and one marked Not OK]
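The "at most one invisible packet per causal loop" heuristic is easy to check mechanically once the loops are known. In this sketch a causal loop is represented, as an assumption, by the list of packet identifiers that make it up (e.g. request, snoop, snoop response, completion), and `observed` is the set of identifiers actually present in the trace; `loop_verdicts` is a hypothetical helper.

```python
from typing import Dict, List, Set


def loop_verdicts(loops: Dict[str, List[str]], observed: Set[str]) -> Dict[str, str]:
    """Classify each causal loop by how many of its packets were not observed."""
    verdicts = {}
    for name, packets in loops.items():
        missing = [p for p in packets if p not in observed]
        if not missing:
            verdicts[name] = "fully observed"
        elif len(missing) == 1:
            verdicts[name] = f"OK, infer {missing[0]}"   # one gap: still constrained
        else:
            verdicts[name] = f"under-constrained, missing {missing}"
    return verdicts


loops = {
    "t1": ["req1", "sreq1", "sresp1", "cmp1"],
    "t2": ["req2", "sreq2", "sresp2", "cmp2"],
}
observed = {"req1", "sreq1", "cmp1", "req2", "cmp2"}
for name, verdict in loop_verdicts(loops, observed).items():
    print(name, verdict)
```

Here loop t1 is missing only its snoop response, which can be inferred from the surrounding packets, while t2 is missing two packets and falls into the "Not OK" category.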

General Statements Pertaining to Invisibility
- In a "fork/join" situation, how many responses can be invisible?
- Generally there are invariants governing the responses (e.g., "at most one supplier of the value").
- If one response is invisible, we can assume it met the invariant, and remember this assumption to cross-check against future behavior.
- If more than one response is invisible, we will have to enlarge the space of assumptions.
- If we do not see a response, we have to delay "closing out" the transaction until another pertinent event involving the same address occurs, or …
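The fork/join reasoning can be made concrete by enumerating assumptions. In the sketch below (an illustration, not the authors' tool), each snoop response either supplies data or does not; visible responses are fixed, invisible ones range over both possibilities, and only assignments satisfying the "at most one supplier" invariant survive. Each additional invisible response doubles the raw assumption space before the invariant prunes it. The function name and the `None`-means-invisible encoding are assumptions.

```python
from itertools import product
from typing import Dict, List, Optional


def consistent_assumptions(responses: Dict[str, Optional[bool]]) -> List[Dict[str, bool]]:
    """responses maps responder -> True (supplied data), False (did not),
    or None (response invisible). Return every completion of the invisible
    responses that satisfies the invariant 'at most one supplier'."""
    hidden = [r for r, v in responses.items() if v is None]
    results = []
    for guess in product([False, True], repeat=len(hidden)):
        full = {r: v for r, v in responses.items() if v is not None}
        full.update(zip(hidden, guess))
        if sum(full.values()) <= 1:   # True counts as 1
            results.append(full)
    return results


print(consistent_assumptions({"c1": True, "c2": None}))  # [{'c1': True, 'c2': False}]
print(len(consistent_assumptions({"c1": None, "c2": None, "c3": False})))  # 3
```

The surviving assignments are the assumptions that must be remembered and cross-checked against later events on the same address.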

Verification of Mutual Exclusion of Resource Usage (Proper Arbitration)
[Figure: transactions Tr 1, Tr 2, Tr 3 and their expected overlap under proper arbitration, with the snoop of Tr 1 marked]
- Check: transaction 1 must "close out" before transactions 2 and 3 are found to make progress.
- Possible idea: assume that the "first snoop request" tells who won the arbitration.
- Problem: what if the first snoop request was on an invisible link?
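The arbitration check and its weak spot can both be sketched. This assumes each transaction on one address has been summarized (hypothetically) as the time of its first snoop request and the time of its close-out, with `None` for an event that was not observed. Following the "possible idea" above, the first snoop marks the arbitration winner; a transaction whose first snoop is invisible cannot be placed in the order at all.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-transaction summary for one address:
# (time of first snoop request, time of close-out); None means not observed.
Span = Tuple[Optional[int], Optional[int]]


def check_arbitration(spans: Dict[str, Span]) -> List[str]:
    """Treat the first snoop request as the arbitration winner's mark and
    check that each winner closes out before the next transaction progresses."""
    issues: List[str] = []
    unknown = sorted(t for t, (snoop, _) in spans.items() if snoop is None)
    if unknown:
        # First snoop went over an invisible link: arbitration order is unknown.
        issues.append(f"arbitration order unknown for {unknown}")
    ordered = sorted(
        ((s, c, t) for t, (s, c) in spans.items() if s is not None),
        key=lambda entry: entry[0],
    )
    for (_, close1, t1), (snoop2, _, t2) in zip(ordered, ordered[1:]):
        if close1 is None:
            issues.append(f"close-out of {t1} not observed before {t2} progressed")
        elif close1 > snoop2:
            issues.append(f"{t1} still open at {snoop2} when {t2} progressed")
    return issues


print(check_arbitration({"Tr1": (10, 20), "Tr2": (25, 30), "Tr3": (31, 40)}))  # []
print(check_arbitration({"Tr1": (10, 20), "Tr2": (15, 30)}))
```

Note that an unobserved close-out is only reported, not treated as a violation, since the close-out itself may have crossed an invisible link.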

Approach Initially Tried
- Wrote a prototype in OCaml to analyze a given cache protocol execution trace.
- For each new packet read, its corresponding communication pattern and state within that pattern were determined.
- For each packet, we obtained WP and SP:
  - WP: weakest precondition (in a sense), the most general set of cache states under which the packet could be generated
  - SP: strongest postcondition (in a sense), the tightest set of states the cache could be in after the packet is sent
- Many transaction types and "conflict situations" made state maintenance and update highly unstructured (about 8 versions of the code were written, each soon becoming ugly).
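Over a finite set Q of cache-line states, the two notions above can be written compactly. The notation is ours rather than the slides': δ(q, p) stands for the protocol-table state of the issuing cache after it sends packet p from state q, and S is the current set of states the analysis considers possible for that cache.

```latex
\mathrm{WP}(p) = \{\, q \in Q \mid p \text{ may be issued when the line is in state } q \,\},
\qquad
\mathrm{SP}(S, p) = \{\, \delta(q, p) \mid q \in S \cap \mathrm{WP}(p) \,\}.
```

For a hypothetical MSI-style write request w with WP(w) = {I, S} and δ(S, w) = M, SP({S, M}, w) = {M}. When S ∩ WP(p) is empty, no state the cache could be in would have allowed p, which is the kind of inconsistency that gets flagged.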

A Conflict Scenario (for Example)
[Figure: the Requester / Home / Potential Owners message flow (req, sreq, sresp, retries or completion, direct supply of data)]
- The requester issues a "flush" packet.
- There is an arbitration conflict at home.
- The packet is sent back for reissue.
- Meanwhile, another request gets past home.
- Home sends the new request to the requester.
- The new request "hijacks" the flush line away!
- The transaction never gets reissued.
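The symptom at the end of this scenario, a retried transaction that is never reissued, can be searched for in a finite log. A minimal sketch, assuming a hypothetical log format of (kind, requester, line address) entries where "retry" means home sent the request back; a reported pair is only a suspect, because the reissue may have crossed an invisible link or the log may simply end too early.

```python
from typing import List, Tuple

# Hypothetical log entries: (kind, requester, line address), with kind in
# {"issue", "retry", "complete"}; "retry" means home sent the request back.
Event = Tuple[str, int, int]


def unreissued_retries(log: List[Event]) -> List[Tuple[int, int]]:
    """Return (requester, address) pairs that were told to retry but were
    neither reissued nor completed by the end of the log."""
    pending = set()
    for kind, req, addr in log:
        key = (req, addr)
        if kind == "retry":
            pending.add(key)
        elif kind in ("issue", "complete"):
            pending.discard(key)   # reissued (or finished): no longer suspect
    return sorted(pending)


hijacked = [("issue", 0, 0x40), ("issue", 1, 0x40), ("retry", 0, 0x40), ("complete", 1, 0x40)]
print(unreissued_retries(hijacked))  # [(0, 64)]
```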

Constraints to the Rescue... but...
- Constraint programming was viewed as a possible solution:
  - It would permit local behavior to be expressed in terms of constraints.
  - Constraint formalisms can "solve" for missing information.
- But traditional constraint frameworks were found inadequate:
  - After an extensive search, we could not find a constraint paradigm that can deal with interacting automata.
- What we need is a method for back-tracing the precursors of observed actions:
  - When multiple observations trace back to the same precursor, we can "vote the precursor up or down".
  - Conditional probabilities of events are involved in guiding the search.

Approach Being Planned for Implementation
- Given a packet, determine its communication pattern and its state within that pattern.
- Trace precursors along the communication pattern until we reach the origin of the transaction (the cache where the access missed and the transaction was issued).
- Determine the cache state for the particular transaction using the WP rule for the packet.

Approach Being Planned for Implementation (continued)
- If the cache state was not previously determined, mark it speculative.
- If the cache state was previously determined and the present WP determines a compatible cache state, convert "speculative" to "committed".
- If the previously determined cache state is contradicted by the present WP, mark the cache state unknown and trigger back-tracing (cancel this precursor computation path and explore another).
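The three marking rules can be sketched as a single update on a precursor node. Here, as assumptions, a cache state is a set of candidate cache-line states, "compatible" means a non-empty intersection, and a contradiction is signalled to the caller by returning `True` so it can abandon the current precursor path; `Mark` and `apply_wp` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Mark:
    states: Optional[FrozenSet[str]] = None   # None: not yet determined
    status: str = "undetermined"              # "speculative", "committed", "unknown"


def apply_wp(mark: Mark, wp_states: FrozenSet[str]) -> bool:
    """Fold a WP-derived state set into a precursor's mark.
    Returns True if the caller should backtrack (abandon this precursor path)."""
    if mark.states is None:
        # Not previously determined: record it, but only speculatively.
        mark.states, mark.status = wp_states, "speculative"
        return False
    common = mark.states & wp_states
    if common:
        # Compatible with the earlier determination: vote it up.
        mark.states, mark.status = common, "committed"
        return False
    # Contradiction: vote it down and ask the caller to explore another path.
    mark.states, mark.status = None, "unknown"
    return True


m = Mark()
print(apply_wp(m, frozenset({"S", "M"})), m.status)  # False speculative
print(apply_wp(m, frozenset({"M"})), m.status)       # False committed
print(apply_wp(m, frozenset({"I"})), m.status)       # True unknown
```

When two precursor computations reach the same node, the first call deposits a speculative state and the second either commits it or forces back-tracing, which mirrors the "vote up or down" idea.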

A cache agent that was a "responder" for one transaction may be the "originator" of another.
[Figure: one agent acting as responder to two different transactions]
The figure shows how two precursor computations may lead back in time to a common node, and how we will have to "vote" on its cache state (red deposits a speculative state; purple votes it up or down).

Why Today's Constraint Approaches Don't Readily Give These Capabilities
- Today's constraint-solving approaches ("CSP") appear to be about "static" situations.
- Various algorithms based on arc consistency and propagators can be found in the literature.
- Temporal concurrent constraint programming is in its infancy. (I also don't know much about these areas; tell me if I'm wrong! But I've not seen very much despite intense literature searches.)
- Constraint solving in the context of coupled reactive processes can have multiple uses.
- Environments such as Comet (van Hentenryck) may offer a powerful way to organize such a constraint-based system.

Constraint Languages Surveyed (and Some Evaluated)
- GNU Prolog, SICStus Prolog, Mozart / Oz, Erlang, FaCile, or even Murphi perhaps?
Reading list (books / papers):
- Stuckey's book on constraint logic programming
- Dechter's book on constraints
- Modeler++ / Localizer++ / Comet
Ultimately, we will roll our own constraint system.

Concluding Remarks
- Limited observability is going to be a central concern in future system verification.
- There are plenty of opportunities for formal methods, constraint-solving methods, and abstract-interpretation methods to work in concert.
- Formal methods communities must talk to other communities, such as the testing and diagnosis communities, to significantly enhance the scope and relevance of what they are doing.