Published by Beatrice Horn. Modified over 9 years ago.
Mining Specifications
Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)
[Figure: (lots of) code → specifications]
Verification: beyond engine-less cars
Recent successes: specification languages, checkers, abstractors.
What's still missing? Specifications. Drivers wanted.
So who formulates specifications?
Programmers? Probably not.
Why they won't: they are too busy; a specification language is yet another language to learn; specifications aren't cool.
Why they shouldn't: they may misunderstand usage rules, and they may not know all of them.
Mining specifications offers convenience and, as in data mining, can discover surprising rules.
Advantages of mining
Mining exploits the massive programmer effort reflected in the code. Programmers have already resolved many problems: incomplete system requirements, incomplete API documentation, and implementation-dependent rules.
Want redundancy without redundant programming? Ask multiple programmers, and vote.
Our output: a specification
[Figure: a finite-state machine over the events x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]
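The specification figure can be read as a finite-state machine over abstract events. The sketch below encodes one plausible reading of it as a deterministic transition table and checks whether an abstracted event sequence is accepted. The state names and the exact transitions (any number of read(y)/write(y) between accept and close(y), and several connections served before close(x)) are assumptions for illustration, not the FSM drawn on the slide.

```python
# Hypothetical DFA for the socket-server specification; the states and
# transitions are an assumed reading of the slide's figure.
SPEC = {
    ("start", "x = socket()"): "created",
    ("created", "bind(x)"): "bound",
    ("bound", "listen(x)"): "listening",
    ("listening", "y = accept(x)"): "connected",
    ("connected", "read(y)"): "connected",   # any number of reads/writes
    ("connected", "write(y)"): "connected",
    ("connected", "close(y)"): "listening",  # ready for the next accept
    ("listening", "close(x)"): "done",
}
ACCEPTING = {"done"}


def accepts(events, spec=SPEC, start="start", accepting=ACCEPTING):
    """Return True if the event sequence ends in an accepting state."""
    state = start
    for event in events:
        state = spec.get((state, event))
        if state is None:  # no transition: the sequence violates the spec
            return False
    return state in accepting


if __name__ == "__main__":
    ok = ["x = socket()", "bind(x)", "listen(x)", "y = accept(x)",
          "read(y)", "write(y)", "close(y)", "close(x)"]
    print(accepts(ok))                      # True
    print(accepts(ok[:-2] + ["close(x)"]))  # False: close(y) is missing
```

Under these assumptions, a server that closes its listening socket while a connection is still open is rejected, because the table has no close(x) edge out of the connected state.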
How do we mine?
Underlying premise: even bad software is debugged enough to show hints of correct behavior.
Maxim: common usage is the correct usage.
Mining = machine learning
Reduce the problem to the well-known problem of learning regular languages.
Obstacles:
1. Bugs in the source code may be learned into the specification.
2. What is "common" behavior?
Solutions:
1. Learn from dynamic behavior.
2. Learn probabilistically: learn from traces into probabilistic FSMs.
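To make "learn probabilistically" concrete: the work feeds scenarios to an off-the-shelf regular-language learner, but the sketch below substitutes a far simpler model, a first-order frequency model in which each state is "the last event seen" and edge probabilities are normalized transition counts over the training strings. It only illustrates how common behavior turns into edge weight; the `<start>`/`<end>` markers and the function name are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def learn_pfsm(scenarios):
    """Estimate edge probabilities of a first-order probabilistic FSM.

    States are 'last event seen'. This is a deliberately simple stand-in
    for a real regular-language learner, used only for illustration.
    """
    counts = defaultdict(Counter)
    for seq in scenarios:
        prev = START
        for event in list(seq) + [END]:
            counts[prev][event] += 1
            prev = event
    pfsm = {}
    for src, outs in counts.items():
        total = sum(outs.values())
        pfsm[src] = {dst: n / total for dst, n in outs.items()}
    return pfsm


if __name__ == "__main__":
    # Nine correct usages and one buggy one that forgets to close.
    scenarios = [["open", "read", "close"]] * 9 + [["open", "read"]]
    print(learn_pfsm(scenarios)["read"])  # {'close': 0.9, '<end>': 0.1}
```

Here nine correct scenarios outvote the one that forgets close, so the edge read → close carries weight 0.9 while the buggy path survives only as a light edge.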
Input: trace(s)
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
…
[Figure: the specification FSM over x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]
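Turning a raw trace like the one above into abstract events can be sketched as follows. The table of which argument positions hold socket handles, the x, y, z, ... naming by first appearance, and the function name are all assumptions for illustration; every non-handle argument is simply dropped.

```python
import re

# Hypothetical table: argument positions that hold socket handles.
HANDLE_ARGS = {"socket": [], "bind": [0], "listen": [0], "accept": [0],
               "read": [0], "write": [0], "close": [0]}
# Matches "ret = fn(args)" or "fn(args)".
CALL = re.compile(r"\s*(?:(\d+)\s*=\s*)?(\w+)\((.*)\)\s*")


def abstract_trace(trace):
    """Turn a ';'-separated raw call trace into symbolic events.

    Handle values are renamed x, y, z, ... in order of first appearance,
    and all non-handle arguments are dropped.
    """
    names = {}

    def name(value):
        if value not in names:
            n = len(names)
            names[value] = "xyzuvw"[n] if n < 6 else f"h{n}"
        return names[value]

    events = []
    for stmt in trace.split(";"):
        if not stmt.strip():
            continue
        m = CALL.fullmatch(stmt)
        if m is None:
            raise ValueError(f"unparsable call: {stmt!r}")
        ret, fn, argstr = m.groups()
        args = [a.strip() for a in argstr.split(",")] if argstr.strip() else []
        # Name argument handles first, so a returned handle gets the next name.
        kept = ", ".join(name(args[i]) for i in HANDLE_ARGS.get(fn, []))
        call = f"{fn}({kept})"
        events.append(f"{name(ret)} = {call}" if ret else call)
    return events


if __name__ == "__main__":
    raw = ("7 = socket(2, 1, 0); bind(7, 0x400120, 16); listen(7, 5); "
           "8 = accept(7, 0x400200, 0x400240); read(8, 0x400320, 255); "
           "write(8, 0x400320, 12); close(8); close(7);")
    print(abstract_trace(raw))
```

On the full slide trace, the second connection (handle 10) would come out as z rather than y; per-connection scenarios like the one in the specification come from a later scenario-extraction step. A real tool would also have to cope with the OS reusing a descriptor number, which this naive renaming would conflate.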
The mining algorithm
dynamic execution (traces) → trace abstraction → usage scenarios (strings) → off-the-shelf RegExp learner → generalized scenarios (probabilistic NFA) → extract heavy core (and approve) → specification (NFA)
The specification drives a dynamic checker: given a dynamic execution to be checked (a trace), it reports OK or bug.
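The "extract heavy core" step admits a simple reading: drop edges whose probability falls below a threshold, keep whatever remains reachable from the start state, and hand the result to a human for approval. The sketch below assumes a dict-of-dicts representation of the probabilistic FSM and a hypothetical threshold of 0.1; it does not renormalize the surviving edges.

```python
def heavy_core(pfsm, start, threshold=0.1):
    """Keep only edges with probability >= threshold that are reachable
    from start. pfsm maps state -> {next_state: probability}.

    A simplified stand-in for heavy-core extraction; threshold is a
    hypothetical tuning parameter.
    """
    core, stack, seen = {}, [start], {start}
    while stack:
        src = stack.pop()
        heavy = {d: p for d, p in pfsm.get(src, {}).items() if p >= threshold}
        if heavy:
            core[src] = heavy
        for dst in heavy:
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return core


if __name__ == "__main__":
    pfsm = {
        "<start>": {"open": 1.0},
        "open": {"read": 0.97, "write": 0.03},
        "read": {"close": 0.95, "<end>": 0.05},
        "write": {"close": 1.0},
        "close": {"<end>": 1.0},
    }
    # The rare write path and the early exit after read are dropped.
    print(heavy_core(pfsm, "<start>"))
```

Pruning by reachability as well as by weight matters: the write state keeps a heavy outgoing edge, but it disappears because the only way into it is a light edge.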
Trace abstraction: 4 challenges
1. Traces interleave useful and useless events; the RegExp learner cannot separate them.
2. Specifications must include both temporal and value-flow constraints; the RegExp learner handles only temporal constraints.
3. Only some arguments of API calls impose "true" dependences; learning value-flow constraints on all arguments is infeasible.
4. Specifications may impose only a partial order; encoding every legal ordering would produce a huge FSM.
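Challenges 1 to 3 suggest extracting scenarios by value flow: start from a seed event and collect the later events that share a dependent argument value with the scenario built so far. The sketch below assumes a hypothetical DEP table mirroring the dependence annotations h(_, ) a(, ) d(, ) b(_, ) e( ) on the following Trace abstraction slide, uses None for an ignored argument, and caps scenario length with an arbitrary max_len; it is an illustration, not the slicing used in this work.

```python
# Hypothetical dependence annotations: argument positions that carry
# "true" dependences, mirroring h(_, ) a(, ) d(, ) b(_, ) e( ).
DEP = {"h": [1], "a": [0, 1], "d": [0, 1], "b": [1], "e": [0]}

EXAMPLE = [
    ("h", (None, 5)), ("c", (10,)), ("a", (4, 5)), ("d", (4, 7)),
    ("b", (None, 5)), ("f", (10,)), ("h", (None, 11)), ("e", (7,)),
    ("f", (None,)), ("d", (None, None)), ("c", (7,)), ("a", (9, 11)),
    ("b", (None, 11)), ("d", (9, None)), ("e", (None,)), ("f", (None,)),
]


def dep_values(name, args):
    """Dependent argument values of one event; None ('_') is ignored."""
    return {args[p] for p in DEP.get(name, []) if args[p] is not None}


def extract_scenarios(trace, seed="h", max_len=10):
    """Slice one scenario per seed event by following shared dependent values."""
    scenarios = []
    for i, (name, args) in enumerate(trace):
        if name != seed:
            continue
        values = dep_values(name, args)
        scenario = [(name, args)]
        for other, oargs in trace[i + 1:]:
            if len(scenario) >= max_len:
                break
            if other == seed:
                continue  # each seed event starts its own scenario
            deps = dep_values(other, oargs)
            if deps & values:
                scenario.append((other, oargs))
                values |= deps
        scenarios.append(scenario)
    return scenarios


if __name__ == "__main__":
    for s in extract_scenarios(EXAMPLE):
        print([name for name, _ in s])
```

Events such as c and f never enter a scenario because they carry no dependent arguments, which is how the slice separates useful from useless events (challenge 1) without learning value flow on every argument (challenge 3).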
Trace abstraction
[Figure: worked example. Dependence annotations: h(_, ) a(, ) d(, ) b(_, ) e( ). Input trace: h(3, 5) c(10) a(4, 5) d(4, 7) b(0, 5) f(10) h(8, 11) e(7) f(50) d(15, 1) c(7) a(9, 11) b(6, 7) d(9, 14) f(20) e(7) … Abstracted trace: h(_, 5) c(10) a(4, 5) d(4, 7) b(_, 5) f(10) h(_, 11) e(7) f(_) d(_, _) c(7) a(9, 11) b(_, 11) d(9, _) e(_) f(_) … Resulting scenarios: h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z); h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z); h(_, X) a(Y, X) b(_, X) d(Y, Z)]
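One way to handle partial orders (challenge 4) and to make equivalent scenarios produce identical strings is to standardize each scenario: reorder it into a canonical topological order of its value-flow dependences, then rename values X, Y, Z, ... by first occurrence. The alphabetical tie-break among independent events and the naming scheme are assumptions; with them, the first scenario of the example comes out as h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z), matching the figure.

```python
import heapq


def standardize(scenario):
    """Canonically reorder and rename a scenario of (name, args) events.

    Events sharing a non-None argument value are treated as dependent
    (the earlier one must stay first); independent events are ordered by
    name. None stands for '_' and is never renamed.
    """
    n = len(scenario)
    preds = [set() for _ in range(n)]
    for j in range(n):
        for i in range(j):
            if (set(scenario[i][1]) & set(scenario[j][1])) - {None}:
                preds[j].add(i)
    # Kahn's topological sort, breaking ties by (event name, position).
    indeg = [len(p) for p in preds]
    ready = [(scenario[k][0], k) for k in range(n) if indeg[k] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, k = heapq.heappop(ready)
        order.append(k)
        for j in range(n):
            if k in preds[j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    heapq.heappush(ready, (scenario[j][0], j))
    # Rename concrete values by first occurrence in the canonical order.
    names, out = {}, []
    for k in order:
        name, args = scenario[k]
        shown = []
        for v in args:
            if v is None:
                shown.append("_")
                continue
            if v not in names:
                m = len(names)
                names[v] = "XYZUVW"[m] if m < 6 else f"V{m}"
            shown.append(names[v])
        out.append(f"{name}({', '.join(shown)})")
    return " ".join(out)


if __name__ == "__main__":
    s = [("h", (None, 5)), ("a", (4, 5)), ("d", (4, 7)),
         ("b", (None, 5)), ("e", (7,))]
    print(standardize(s))  # h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z)
```

Because b and d do not share a value, they are independent, and the tie-break fixes a single order for them; a learner then sees one string instead of every interleaving.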
Preliminary experiments
We attempted to learn and verify two published X Windows rules. As of Friday:
1. A timestamp-passing rule: we learned the rule (compact: 6 states) and found bugs in 2 of 17 programs (ups, e93).
2. SetOwner(x) must be followed by GetSelection(x): we failed to learn the rule (the learning set was small), but found bugs in 2 of 5 programs (xemacs, ups).
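A rule of the second kind is easy to check dynamically once a trace has been abstracted into (call, argument) pairs. The sketch below reports every x for which a SetOwner(x) is never followed by a GetSelection(x); the short call names are the slide's abbreviations, and the event representation and function name are hypothetical, not the checker used in these experiments.

```python
def unfollowed(events, first="SetOwner", then="GetSelection"):
    """Return the values x for which some first(x) has no later then(x).

    events is a list of (call name, argument) pairs from an abstracted trace.
    """
    pending = set()
    for name, x in events:
        if name == first:
            pending.add(x)       # obligation opened
        elif name == then:
            pending.discard(x)   # obligation discharged
    return pending


if __name__ == "__main__":
    trace = [("SetOwner", 1), ("GetSelection", 1), ("SetOwner", 2)]
    print(unfollowed(trace))  # {2}
```

A GetSelection(x) that precedes the SetOwner(x) does not discharge it, since the rule requires the GetSelection to come afterwards.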
Related work
Arithmetic pre/postconditions: Daikon, Houdini. These properties are orthogonal to ours; eventually, we may need to include and learn some arithmetic relationships.
Temporal relationships over calls:
intrusion detection: [Ghosh et al.], [Wagner and Dean]
software processes: [Cook and Wolf]
error checking: [Engler et al., SOSP 2001], which uses lexical and syntactic pattern matching; the user must write templates (e.g., <a> always follows <b>).
Ongoing work
Mechanize the tool.
Find more gold.
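Future work
Give the gold to jewelers: hand mined specifications to verification tools such as SPIN, Vault, Verisoft, SLAM, ESP, …
[Figure: Mining turns code into specifications for these tools, which report bugs; the diagram also shows "inputs" and a "?"]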
Summary
Semi-automatically creating well-formed, non-trivial specifications is an important part of the verification tool chain.
Contributions:
introduced specification mining;
phrased it as probabilistic learning from dynamic traces;
decomposed it into a sequence of subproblems (using an off-the-shelf learner);
developed a dynamic checker;
found bugs.
Discussion
Expressibility: What classes of properties can (or should) we learn? Can we learn more than we can check? Can a single-threaded specification avoid race conditions?
Backup Slides