Sparse Coding for Specification Mining and Error Localization
Runtime Verification, September 26, 2012
Wenchao Li, Sanjit A. Seshia
University of California, Berkeley
Assertion-Based Verification
Problem: assertions are created manually
[Figure: verification flow with Tests, Simulator, Circuit/Program, Coverage and Assertions; stimulus is generated to patch coverage holes, and assertions are used to find bugs]
“…typically 20% of specifications pass vacuously during the first formal verification runs of a new hardware design…” [IBM Haifa]
Error Localization
A fatal error is observed, but where did it originate?
Challenges:
– Limited observability
– Long error detection latency
– Transient and hard-to-reproduce bugs
Idea: assertions can provide local observability and correctness checks
Related Work
Specification mining:
– Programs: single-state invariants, pre-/post-conditions, automata learning, alternating patterns
– Circuits: fixed-delay pairs, temporal logic patterns
– Require templates
Error localization:
– Programs: model checking, predicates
– Circuits: instruction footprints, SAT-based, mined-assertion-based
– Require a system model and good observability
– Require templates
Our technique is template-free and does not require a system model
What can you tell by just observing a trace?
[Figure: example traces from hardware, programs (e.g., Obj1.m1() Obj1.m2() Obj1.m1() Obj2.m1()), human interaction/behavior, sensor networks, the cloud and distributed systems]
A Sparse Coding Approach
Key idea: express each subtrace as a Boolean combination of a few “basis subtraces”, which is a (sparsity-constrained) Boolean matrix factorization problem
Sparsity helps uncover the latent structure of the data
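To make “subtrace” and “Boolean combination” concrete, here is a minimal Python sketch. It assumes a trace is a list of per-cycle tuples of 0/1 signal values, and that subtraces are taken with a sliding window of stride 1. That choice is consistent with the 999 subtraces reported for a 1000-cycle trace in the experiments. The helpers `subtraces` and `boolean_or` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple

# Assumed trace format (hypothetical): one tuple of 0/1 signal values per cycle.
Trace = List[Tuple[int, ...]]


def subtraces(trace: Trace, width: int) -> List[Tuple[int, ...]]:
    """Slide a `width`-cycle window over the trace with stride 1 and
    flatten each window into one bit vector."""
    return [
        tuple(bit for cycle in trace[t:t + width] for bit in cycle)
        for t in range(len(trace) - width + 1)
    ]


def boolean_or(vectors: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Element-wise OR of equal-length bit vectors."""
    return tuple(int(any(bits)) for bits in zip(*vectors))


# Two signals observed over four cycles.
trace = [(1, 0), (0, 1), (1, 1), (0, 0)]
subs = subtraces(trace, width=2)
print(subs)  # [(1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 0, 0)]

# The second subtrace is the OR of two (hypothetical) basis subtraces.
print(boolean_or([(0, 1, 0, 0), (0, 0, 1, 1)]) == subs[1])  # True
```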
Contributions and Outline
– A new formalism for discovering structure in a trace
– A definition of the sparsity-constrained Boolean matrix factorization problem and an algorithm for solving it
– Applications to specification mining and error localization:
  – Does not rely on predefined templates
  – Simultaneously performs error localization and explanation
Outline: problem formulation, algorithm, error localization and explanation, results
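Problem Formulation
$X = B \circ S$: the data matrix $X$, whose columns are subtraces, is the Boolean product of a basis matrix $B$ and a coefficient matrix $S$
– Multiplication is interpreted as AND and addition as OR, so $x_{ij} = \bigvee_{l} (b_{il} \wedge s_{lj})$
– The columns of $S$ are sparse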
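The Boolean matrix product can be sketched in a few lines of Python. The names `B`, `S` and `boolean_product` are illustrative assumptions, not notation from the slides. Each column of `B` is a basis subtrace. Each column of `S` selects which basis subtraces are ORed together to form the corresponding column (subtrace) of `X`.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(B: Matrix, S: Matrix) -> Matrix:
    """Boolean matrix product: x_ij = OR over l of (b_il AND s_lj)."""
    inner = len(S)
    assert all(len(row) == inner for row in B), "inner dimensions must agree"
    return [
        [int(any(B[i][l] and S[l][j] for l in range(inner))) for j in range(len(S[0]))]
        for i in range(len(B))
    ]


# Basis matrix B: each column is a 4-bit basis subtrace (3 basis subtraces).
B = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
# Coefficient matrix S: each column has at most two 1s (a sparse column).
S = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
]
X = boolean_product(B, S)
print(X)  # [[1, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 1]]
```

For example, column 1 of `S` selects basis subtraces 1 and 2, so column 1 of `X` is (0,1,0,0) OR (0,0,1,1) = (0,1,1,1).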
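Sparsity-Constrained Boolean Factorization
[Figure: example factorization with sparsity bound C = 2, i.e., each subtrace is a Boolean combination of at most two basis subtraces]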
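One way to state the sparsity constraint formally is shown below. It assumes the dimensions $m$, $n$, $k$ and the names $B$ and $S$, which the slides do not give. It records only the feasibility conditions, not whatever objective the authors optimize over $B$ and $S$.

```latex
\begin{aligned}
&\text{Given } X \in \{0,1\}^{m \times n} \text{ and a sparsity bound } C, \text{ find } B \in \{0,1\}^{m \times k},\ S \in \{0,1\}^{k \times n} \text{ such that} \\
&X = B \circ S, \quad \text{where } x_{ij} = \bigvee_{l=1}^{k} \left( b_{il} \wedge s_{lj} \right), \\
&\textstyle\sum_{l=1}^{k} s_{lj} \le C \quad \text{for every column } j = 1, \dots, n.
\end{aligned}
```

With $C = 2$, every subtrace (column of $X$) is the OR of at most two columns of $B$.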
Algorithm Idea
Observe that the data matrix X can be viewed as the adjacency matrix of a bipartite graph.
Idea: factorization → biclique cover (each biclique ↔ one basis subtrace)
[Figure: bipartite graph with node sides u and v]
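The biclique ↔ basis correspondence can be checked with a small Python sketch. Here a biclique is assumed to be a pair (bit-position rows, subtrace columns) forming an all-ones submatrix of X. The helper names are hypothetical. A biclique contributes one basis column (its rows) and one coefficient row (its columns). Because bicliques contain only 1-entries, a set of bicliques that covers every 1 in X reproduces X exactly under the Boolean product.

```python
from typing import FrozenSet, List, Tuple

Matrix = List[List[int]]
Biclique = Tuple[FrozenSet[int], FrozenSet[int]]  # (bit-position rows, subtrace columns)


def is_biclique(X: Matrix, b: Biclique) -> bool:
    """A biclique of the bipartite graph of X is an all-ones submatrix."""
    rows, cols = b
    return all(X[i][j] == 1 for i in rows for j in cols)


def factors_from_cover(X: Matrix, cover: List[Biclique]) -> Tuple[Matrix, Matrix]:
    """Biclique l becomes basis column B[:, l] (its rows) and coefficient
    row S[l, :] (its columns)."""
    m, n = len(X), len(X[0])
    B = [[int(i in rows) for rows, _ in cover] for i in range(m)]
    S = [[int(j in cols) for j in range(n)] for _, cols in cover]
    return B, S


def boolean_product(B: Matrix, S: Matrix) -> Matrix:
    k = len(S)
    return [[int(any(B[i][l] and S[l][j] for l in range(k)))
             for j in range(len(S[0]))] for i in range(len(B))]


X = [[1, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 1]]
cover = [
    (frozenset({0, 3}), frozenset({0, 2})),
    (frozenset({1}), frozenset({1, 2})),
    (frozenset({2, 3}), frozenset({1})),
]
assert all(is_biclique(X, b) for b in cover)
B, S = factors_from_cover(X, cover)
print(boolean_product(B, S) == X)  # True: the bicliques cover every 1 in X
```

In the sparsity-constrained version, a subtrace column may appear in at most C of the chosen bicliques, which is the same as its column of S having at most C ones.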
Algorithm Overview
– Incrementally generate maximal bicliques:
  – Consensus-based algorithm
  – Extend to a maximal biclique
– Keep track of closeness to the sparsity constraint
– Heuristically optimize for basis sharing
Algorithm Overview
Step 1: Start with the set of v-rooted star bicliques
Step 2: Pick two stars and form their consensus
Step 3: Extend the consensus to a maximal biclique
Step 4: Add the biclique to the cover if possible
Step 5: Update the sparsity constraint at the covered nodes
[Figure: worked example on a small bipartite graph]
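The steps above can be approximated by the following simplified greedy sketch. It is not the authors' algorithm, and several choices are assumptions:

- Stars are rooted at subtrace columns.
- The consensus of two bicliques keeps the intersection of their row sets.
- Step 4 is modeled as “use only columns that still have sparsity budget and gain new coverage.”
- Basis sharing is approximated by picking the candidate that covers the most uncovered edges.

Like any greedy heuristic, it may return `None` even when a valid factorization exists.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Matrix = List[List[int]]
Biclique = Tuple[FrozenSet[int], FrozenSet[int]]  # (bit-position rows, subtrace columns)


def greedy_cover(X: Matrix, C: int) -> Optional[List[Biclique]]:
    """Greedy biclique cover of the 1-entries of X in which every subtrace
    (column) belongs to at most C bicliques; None if the heuristic gets stuck."""
    m, n = len(X), len(X[0])
    # Neighborhood of subtrace node j: the bit positions set in column j.
    nbr = [frozenset(i for i in range(m) if X[i][j]) for j in range(n)]

    def extend(rows: FrozenSet[int]) -> Biclique:
        # Maximal biclique containing `rows`: every column adjacent to all of
        # `rows`, then every row adjacent to all of those columns.
        cols = frozenset(j for j in range(n) if rows <= nbr[j])
        return frozenset.intersection(*(nbr[j] for j in cols)), cols

    # Step 1: star bicliques rooted at each nonzero subtrace column.
    stars = [(nbr[j], frozenset({j})) for j in range(n) if nbr[j]]
    found = {extend(rows) for rows, _ in stars}
    # Steps 2-3: consensus of two stars (shared rows), extended to a maximal biclique.
    for (r1, _), (r2, _) in combinations(stars, 2):
        if r1 & r2:
            found.add(extend(r1 & r2))
    candidates = sorted(found, key=lambda b: (sorted(b[1]), sorted(b[0])))

    uncovered = {(i, j) for j in range(n) for i in nbr[j]}
    budget = [C] * n  # remaining sparsity budget per subtrace
    cover: List[Biclique] = []
    while uncovered:
        best, best_gain = None, 0
        for rows, cols in candidates:
            # Step 4: keep only columns with budget left and something new to cover.
            usable = frozenset(
                j for j in cols
                if budget[j] > 0 and any((i, j) in uncovered for i in rows)
            )
            gain = sum((i, j) in uncovered for j in usable for i in rows)
            if gain > best_gain:  # favor bicliques shared by many subtraces
                best, best_gain = (rows, usable), gain
        if best is None:
            return None
        cover.append(best)
        rows, cols = best
        # Step 5: charge the sparsity budget of every covered subtrace.
        for j in cols:
            budget[j] -= 1
            uncovered -= {(i, j) for i in rows}
    return cover


X = [[1, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 1]]
for rows, cols in greedy_cover(X, C=2):
    print("basis rows", sorted(rows), "used by subtraces", sorted(cols))
```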
An Arbiter Example
A 2-input, 2-output arbiter with a round-robin scheme
[Figure: waveforms of signals p0, p1, q0, q1; sample mined assertions (basis subtraces); plot of the number of subtraces]
Error Localization and Explanation
Example Illustration
Error localization and explanation (arbiter example):
[Figure: all subtraces, the correct subtraces and the space spanned by the learned basis; an error trace, its error subtrace, the error explanation and an alternative error explanation]
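A minimal sketch of localization and explanation follows, under stated assumptions. A subtrace is treated as correct if it lies in the space spanned by the learned basis, that is, if it is the OR of at most C basis subtraces. The first subtrace outside that space is reported as the error location. As a hypothetical explanation rule, each nearest spanned subtrace (by Hamming distance) yields one explanation: the bits that would have to flip. Several nearest subtraces give alternative explanations. The basis and subtraces are toy data, and the function names are illustrative.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Set, Tuple

Bits = Tuple[int, ...]


def or_all(vectors: Sequence[Bits], width: int) -> Bits:
    """Element-wise OR; the empty combination is the all-zero subtrace."""
    out = (0,) * width
    for v in vectors:
        out = tuple(a | b for a, b in zip(out, v))
    return out


def spanned(basis: Sequence[Bits], C: int, width: int) -> Set[Bits]:
    """Every subtrace expressible as the OR of at most C basis subtraces."""
    return {or_all(combo, width)
            for k in range(C + 1) for combo in combinations(basis, k)}


def hamming(a: Bits, b: Bits) -> int:
    return sum(x != y for x, y in zip(a, b))


def localize_and_explain(subs: Sequence[Bits], basis: Sequence[Bits],
                         C: int) -> Optional[Tuple[int, List[List[int]]]]:
    """Index of the first subtrace outside the span, plus the flipped bit
    positions for each nearest spanned subtrace; None if all are explained."""
    space = spanned(basis, C, len(subs[0]))
    for t, s in enumerate(subs):
        if s not in space:
            d = min(hamming(s, v) for v in space)
            nearest = sorted(v for v in space if hamming(s, v) == d)
            return t, [[i for i, (x, y) in enumerate(zip(s, v)) if x != y]
                       for v in nearest]
    return None


basis = [(1, 0, 0, 1), (0, 1, 0, 0), (0, 0, 1, 1)]
# The third subtrace is (0, 1, 1, 1) with bit 0 flipped.
subs = [(1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
print(localize_and_explain(subs, basis, C=2))  # (2, [[0], [1], [2]])
```

Here the true flip (bit 0) is one of three equally close explanations, which is one way alternative error explanations can arise.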
Experimental Results
Chip multiprocessor (CMP) router:
– Observe 14 control signals
– Subtrace width of 2 cycles
– Learn the basis from a single error-free trace of 1000 cycles: seconds to obtain 189 basis subtraces from 93 distinct subtraces
Error localization:
– Inject a single bit flip at a random cycle for each of 99 error traces
– Localize the error to the subtrace (out of 999) where it was injected
Comparisons:
– Baseline approach (1): hash all distinct subtraces; reports an error even before the error is injected, for the 99 traces
– Baseline approach (2): use the unit basis; 0% localization
– Sparse coding: 55.6% localization
[Figure: a CMP router in a NoC]
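Baseline (1) and sparse coding can be contrasted on toy data. This Python sketch uses hypothetical subtraces and a hypothetical learned basis, not the router data. It illustrates one plausible reason hashing fires early. A subtrace that never appeared in the error-free trace, but is still the OR of at most C basis subtraces, is flagged by hashing and accepted by sparse coding.

```python
from itertools import combinations
from typing import Optional, Sequence, Set, Tuple

Bits = Tuple[int, ...]


def or_all(vectors: Sequence[Bits], width: int) -> Bits:
    """Element-wise OR; the empty combination is the all-zero subtrace."""
    out = (0,) * width
    for v in vectors:
        out = tuple(a | b for a, b in zip(out, v))
    return out


def first_flag(subs: Sequence[Bits], accepted: Set[Bits]) -> Optional[int]:
    """Index of the first subtrace outside the accepted set, or None."""
    return next((t for t, s in enumerate(subs) if s not in accepted), None)


# Toy data (hypothetical): distinct subtraces from an error-free trace,
# and a basis assumed to have been learned from them with C = 2.
seen = {(1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 0, 1)}
basis = [(1, 0, 0, 1), (0, 1, 0, 0), (0, 0, 1, 1)]
C = 2
spanned = {or_all(c, 4) for k in range(C + 1) for c in combinations(basis, k)}

# Index 1 is new but explainable; the injected error is at index 3.
test = [(1, 0, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
print("baseline (1), hashing:", first_flag(test, seen))    # 1
print("sparse coding, C = 2:", first_flag(test, spanned))  # 3
```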
Conclusion
– A template-free assertion miner that can discover patterns embedded in digital circuit traces
– Effective error localization and explanation based on mined assertions
– Potential applications to other domains, e.g., programs or distributed systems
THANK YOU