1
Applying Model Checking To Large Programs Madan Musuvathi Microsoft Research
2
The Model Checking Problem –A system model S –A property P –Check if S satisfies P
3
The Model Checking Problem –A system model S –An environment E –A property P –Check if S in E satisfies P
4
In Previous Lectures –A system model S –An environment E –A property P –Check if S in E satisfies P Assume Given Mighty hard stuff
5
When Applied to Large Systems –A system model S –An environment E –A property P –Check if S in E satisfies P Even this is challenging! Try the simplest thing that works!
6
Model Checking : An Engineer's View Given a system and its environment Expose nondeterminism –Environment nondeterminism : inputs, timers, events –Internal nondeterminism: arising from abstractions Systematically explore all states of the system Do this exploration intelligently If lucky, you find a bug If luckier, you verify the system
7
Explicit State Model Checking Explicitly generate the individual states Systematically explore the state space –State space: Graph that captures all behaviors Model checking == Graph search Generate the state space graph "on-the-fly" –State space is typically much larger than the reachable set of states
8
Guarded Transition System
System = State + Transitions
–Readily models event-driven systems

State { int x; }
Init: { x = 0; }
// Transitions
Trans1() { if (x < 3) x' = x + 1; }
Trans2() { if (x == 3) x' = 0; }
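As a minimal sketch, the guarded transition system on this slide can be written as data: each transition pairs a guard with an action that produces the next state. The names `State`, `Transition`, `enabled`, and `run` are illustrative assumptions, not part of any tool mentioned in the lecture.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class State:
    x: int


@dataclass(frozen=True)
class Transition:
    name: str
    guard: Callable[[State], bool]    # is the transition enabled in this state?
    action: Callable[[State], State]  # computes the next state (x' on the slide)


INIT = State(x=0)

TRANSITIONS = [
    Transition("Trans1", lambda s: s.x < 3, lambda s: State(s.x + 1)),
    Transition("Trans2", lambda s: s.x == 3, lambda s: State(0)),
]


def enabled(state: State) -> List[Transition]:
    """Return the transitions whose guard holds in `state`."""
    return [t for t in TRANSITIONS if t.guard(state)]


def run(steps: int) -> List[int]:
    """Fire the first enabled transition `steps` times and record x."""
    state, trace = INIT, [INIT.x]
    for _ in range(steps):
        choices = enabled(state)
        if not choices:
            break
        state = choices[0].action(state)
        trace.append(state.x)
    return trace
```

In this particular system exactly one guard holds in every reachable state, so `run` follows the only possible behavior. A model checker becomes interesting once several transitions are enabled at the same time.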
9
The Algorithm

Hashtable states_seen;
Queue pending;

insert init_state into states_seen;
insert init_state into pending;
while (pending is not empty) {
    current = pending.remove();
    for each enabled transition T {
        restore_state(current);
        execute transition T;
        successor = save_state();
        if (successor in states_seen) continue;
        insert successor into states_seen;   // so it is explored only once
        check successor for correctness;
        insert successor into pending;
    }
}
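The search loop above can be sketched as a breadth-first graph search that generates the state space on the fly. This is an illustrative version under stated assumptions: states are hashable values, `successors` and `is_correct` are caller-supplied functions, and the parent map (not on the slide) is added only so that a counterexample path can be reported.

```python
from collections import deque


def model_check(init_state, successors, is_correct):
    """Explore reachable states breadth-first.

    Returns (counterexample_path or None, number_of_states_seen).
    """
    parent = {init_state: None}  # doubles as the states_seen table
    pending = deque([init_state])
    if not is_correct(init_state):
        return [init_state], 1
    while pending:
        current = pending.popleft()
        for successor in successors(current):
            if successor in parent:
                continue  # already seen, do not explore again
            parent[successor] = current
            if not is_correct(successor):
                return _path_to(successor, parent), len(parent)
            pending.append(successor)
    return None, len(parent)


def _path_to(state, parent):
    """Rebuild the path from the initial state by following parent links."""
    path = []
    while state is not None:
        path.append(state)
        state = parent[state]
    return path[::-1]


def counter_successors(x):
    """The counter from the guarded-transition slide: count 0..3, then reset."""
    return [x + 1] if x < 3 else [0]
```

With the property "x never exceeds 3" the search terminates after visiting the four reachable states. With the deliberately false property "x is never 3" it returns the shortest path that reaches the violation.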
10
How to write a model checker in an hour Specify the system and the environment as a class –State = member fields –Transitions = member functions –Each member function has a Boolean guard function Capturing state : provide serialization functions –GetState() returns the state in a buffer –SetState() copies the state from a buffer Implement the search algorithm
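A minimal sketch of this recipe, assuming a made-up system (a queue of capacity 2 that can be closed): member fields hold the state, each transition method has a Boolean guard, and `get_state` / `set_state` serialize the fields to and from a byte buffer. The class, the method names, and the `explore` helper are hypothetical.

```python
import struct
from collections import deque


class BoundedQueueModel:
    """Hypothetical system: a queue of capacity 2 that can be closed."""

    CAPACITY = 2
    _LAYOUT = "<i?"  # count (int32), closed (bool)

    def __init__(self):
        self.count = 0       # State = member fields
        self.closed = False

    # Guards: one Boolean function per transition.
    def can_produce(self):
        return not self.closed and self.count < self.CAPACITY

    def can_consume(self):
        return self.count > 0

    def can_close(self):
        return not self.closed

    # Transitions = member functions.
    def produce(self):
        self.count += 1

    def consume(self):
        self.count -= 1

    def close(self):
        self.closed = True

    def transitions(self):
        return [(self.can_produce, self.produce),
                (self.can_consume, self.consume),
                (self.can_close, self.close)]

    def get_state(self):
        """Return the state in a buffer."""
        return struct.pack(self._LAYOUT, self.count, self.closed)

    def set_state(self, buf):
        """Copy the state from a buffer."""
        self.count, self.closed = struct.unpack(self._LAYOUT, buf)


def explore(system):
    """Return the set of reachable state buffers."""
    init = system.get_state()
    states_seen = {init}
    pending = deque([init])
    while pending:
        current = pending.popleft()
        for guard, action in system.transitions():
            system.set_state(current)  # restore before trying each transition
            if not guard():
                continue
            action()
            successor = system.get_state()
            if successor not in states_seen:
                states_seen.add(successor)
                pending.append(successor)
    return states_seen
```

Because the search only talks to the system through guards, actions, and the two serialization functions, the same `explore` loop would work for any class that follows this shape.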
11
State Explosion Problem Simple descriptions result in (very) large state spaces State space reduction techniques –Identify behaviorally equivalent states –Process symmetry reduction –Heap symmetry reduction –Identify behaviorally equivalent transition orderings –Partial-order reduction
12
How to write a model checker in a week Specify the system and the environment as a class –State = member fields –Transitions = member functions –Each member function has a Boolean guard function Capturing state : provide serialization functions –GetState() returns the state in a buffer –SetState() copies the state from a buffer Implement the search algorithm –Implement some state space reduction techniques
13
Practical Challenges Reduce manual intervention –How to specify the system? –What is the environment? Guarantees –Soundness –If the tool terminates without finding a bug (of a certain type), then the program has no bugs –Preciseness –If the tool reports an error, then it is indeed a real error Orthogonal to the difficulty of model checking algorithms
14
Specifying the Model Conventional model checkers require an intermediate description (or "model") –Describes the system at a high level –Throws away implementation details Good for checking designs, rather than implementations –Success stories: hardware, cache-coherence protocols Problems Specifying a model is HARD for large systems –As the system evolves model has to be updated –What you check is not what you run! –Manual errors can miss or introduce errors
15
Automatically Extract the Model
Statically analyze the code to generate a model
–Models usually mimic the implementation

Murphi model (the numbers mark corresponding lines in the two listings):
Rule "PI Local Get (Put)"
1:  Cache.State = Invalid & ! Cache.Wait
2:  & ! DH.Pending
3:  & ! DH.Dirty
==>
Begin
4:  Assert !DH.Local;
5:  DH.Local := true;
6:  CC_Put(Home, Memory);
EndRule;

FLASH implementation:
void PILocalGet(void) {
    //... Boilerplate setup
2   if (!hl.Pending) {
3     if (!hl.Dirty) {
4!      // ASSERT(hl.Local);...
6       PI_SEND(F_DATA, F_FREE, F_SWAP, F_NOWAIT, F_DEC, 1);
5       hl.Local = 1;
16
Automatic Extraction FeaVer : C program -> Promela (SPIN) model –User provided patterns to extract features Bandera: Java -> Bandera model –Sophisticated property-driven slicing techniques –Can throw away unrelated parts, if applicable Problems –Not all primitives are available in the modeling language –Pointers, dynamic object creation, dynamic threads, exceptions –A precise-enough slice could be as large as the program itself
17
Code as the model Directly execute the code Pioneered by Verisoft –State-less model checking Explicit model checkers –Java Path Finder (Java) –CMC (C/C++) State space can be infinite (or very large) –Try exploring as many behaviors as possible –Focus on precision
18
Model Checking == Testing ? Almost! Systematic exploration of nondeterminism –Testing = random walks in the state space –Model checking = systematic graph search Forces the user to expose more nondeterminism –A call to malloc() can fail, a packet can get lost State space reduction techniques identify redundant tests
19
Specifying the System Similar to building a unit-test framework Extract the code to be checked Provide an environment model –Includes entities that the implementation interacts with –Calls to libraries, network, timers, manual input Code + environment is a closed system –An executable that you can run Provide correctness properties
20
Identify the Transitions Transition is a code execution between two nondeterministic choices –Atomic execution of a thread between two schedule points –Execution of an event handler Model checker should get control at these choice points
21
Capturing the State State of the program is captured by global variables, stack, heap, and registers Need a way to capture the state of the environment model
22
Backtracking Physically reset the state to an older version –Java Pathfinder, CMC Go to the initial state and reexecute –Fork a separate process at initial state (Verisoft) –Some systems have a natural 'reset' –Unload and reload a driver –Reformat the disk
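The "go to the initial state and reexecute" option can be sketched as follows. Nothing is saved or restored; instead, the program is rerun from the start with a recorded prefix of choices, and the deepest choice point that still has an untried alternative is advanced. `Chooser`, `explore_all`, and `send_packet` are hypothetical names, the program is assumed to be deterministic apart from its calls to `choose`, and this is not Verisoft's actual interface.

```python
class Chooser:
    """Replays a prefix of choices, then defaults to alternative 0."""

    def __init__(self, prefix):
        self.prefix = list(prefix)
        self.trace = []  # (value chosen, number of alternatives) per choice point

    def choose(self, n):
        i = len(self.trace)
        value = self.prefix[i] if i < len(self.prefix) else 0
        self.trace.append((value, n))
        return value


def explore_all(program):
    """Run `program` once per distinct sequence of nondeterministic choices."""
    results = []
    prefix = []
    while True:
        chooser = Chooser(prefix)
        results.append(program(chooser.choose))  # re-execute from the initial state
        trace = chooser.trace
        # Backtrack: drop choice points whose alternatives are exhausted.
        while trace and trace[-1][0] + 1 >= trace[-1][1]:
            trace.pop()
        if not trace:
            return results
        # Replay the earlier choices and take the next alternative at the last one.
        prefix = [value for value, _ in trace[:-1]] + [trace[-1][0] + 1]


def send_packet(choose):
    """Toy system with two exposed choice points."""
    if choose(2) == 1:  # a call to malloc() can fail
        return "alloc-failed"
    if choose(2) == 1:  # a packet can get lost
        return "lost"
    return "delivered"
```

The trade-off in this sketch is time for simplicity: every backtrack costs a full re-execution, but the checker never needs to capture the program's stack, heap, or registers.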
23
Experience with CMC Three AODV implementations –35 implementation bugs, 1 specification bug Linux TCP –4 bugs, 90% protocol coverage Three Linux filesystems –32 bugs in total –10 serious ones (such as deleting "/")
24
Environment Problem Where to separate the system and the environment Need a faithful abstraction of the environment –Enough nondeterminism to trigger interesting behaviors in the system –Not too much nondeterminism to trigger false behaviors An Example –System: Linux TCP implementation –Environment: Kernel, network (driver + hardware), …
25
Extracting Linux TCP from the Kernel
Conventional wisdom:
–Extract TCP along a minimal, narrow interface
–Minimizes the model state
–Provide a ‘kernel library’
–Implements stubs for all kernel functions TCP requires
Never worked!
–The narrowest interfaces still had ~150 interface functions
–These interfaces are not documented
–Errors in stubs can cause subtle but false errors
–Model checkers are good at finding subtle errors!
–Errors in stubs can also cause real errors to be missed
26
Extracting Linux TCP from the Kernel
Solution (hard learned):
–Extract along well-defined interfaces
–Minimize errors in stub implementations
–These interfaces change infrequently
–Do so even if it stresses model checking
Well-defined interfaces around TCP
–The system call interface (kernel & user processes)
–The hardware abstraction layer (kernel & hardware)
Extracting at these two interfaces
–Forces CMC to run the entire Linux kernel
27
Running the Entire Kernel in CMC Linux kernel has to run in user space –Has been done before (UML : User Mode Linux) CMC needs to handle much larger states –Approximately 300 kilobytes –Incremental states in effect extract TCP-relevant state A larger state space –Restrict the environment to trigger TCP events only Compensated by the ease of environment model generation Approach not possible when model checking with an intermediate description
28
Specifying Properties Assertion in the code –Trigger automatically as we are running the code Heap related errors –Build your own memory allocator –Check for leaks, double-free Purify-style dynamic techniques –Reading uninitialized variables, access after free Checking for resource leaks –Check if you reached the initial state if you should have –Identify idempotent sequences –CreateFile(A) followed by DeleteFile(A)
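The resource-leak check via idempotent sequences can be sketched like this: take a snapshot, run CreateFile(A) followed by DeleteFile(A), and compare against the snapshot. `ToyFileSystem` and its deliberately injectable bug are made up for illustration and do not model any real file system.

```python
class ToyFileSystem:
    """Hypothetical file system that tracks names and allocated inodes."""

    def __init__(self, leak_on_delete=False):
        self.files = {}           # file name -> inode number
        self.used_inodes = set()  # inodes currently allocated
        self.leak_on_delete = leak_on_delete  # injected bug for the demo

    def create_file(self, name):
        inode = 0
        while inode in self.used_inodes:
            inode += 1
        self.used_inodes.add(inode)
        self.files[name] = inode

    def delete_file(self, name):
        inode = self.files.pop(name)
        if not self.leak_on_delete:
            self.used_inodes.discard(inode)

    def snapshot(self):
        """Hashable summary of everything that counts as a resource."""
        return (tuple(sorted(self.files.items())),
                tuple(sorted(self.used_inodes)))


def leaks_on_create_delete(fs, name="A"):
    """True if create followed by delete fails to return to the prior state."""
    before = fs.snapshot()
    fs.create_file(name)
    fs.delete_file(name)
    return fs.snapshot() != before
```

The check needs no specification of what a leak looks like. It only relies on the sequence being idempotent, so any difference from the earlier state is suspicious.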
29
Some properties are hard to specify Real systems have ambiguous / incomplete specifications –TCP congestion control should not use up "too much" network bandwidth –A file system should not lose files –Difficult to check in the presence of crashes Identify properties that are easy to check –A file system is in a bad state if its own fsck() cannot recover from it
30
State Space Reduction Techniques 1. Downscaling 2. Hash Compaction 3. Identifying State Symmetries
31
Downscaling Check smaller versions of the model Example –Run with only 3-4 nodes in the network –Send just 3 data packets Find bugs involving complex interactions in smaller instances Potentially miss bugs present only in larger instances
32
Hash Compaction Compact states in the hash table [Stern, 1995] –Compute a signature for each state –Only store the signature in the hashtable Signature is computed incrementally –Partial signature cached at each page Might miss errors due to collisions Orders of magnitude memory savings –Compact 100 kilobyte state to 4-8 bytes Possible to search ~10 million states
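A minimal sketch of the idea, assuming states are available as byte strings: the visited set keeps only a short signature of each state. BLAKE2b from Python's `hashlib` is used here as a stand-in hash function; it is not the signature scheme from the cited work, and the incremental per-page caching mentioned on the slide is not implemented. `signature` and `CompactedStateSet` are illustrative names.

```python
import hashlib


def signature(state: bytes, size: int = 8) -> bytes:
    """Compress a serialized state to a `size`-byte signature."""
    return hashlib.blake2b(state, digest_size=size).digest()


class CompactedStateSet:
    """Visited set that stores signatures instead of full states."""

    def __init__(self, size: int = 8):
        self._size = size
        self._signatures = set()

    def add_if_new(self, state: bytes) -> bool:
        """Record `state`; return False if its signature was already present.

        Two different states with the same signature are treated as one,
        which is how errors can be missed due to collisions.
        """
        sig = signature(state, self._size)
        if sig in self._signatures:
            return False
        self._signatures.add(sig)
        return True

    def __len__(self):
        return len(self._signatures)
```

For example, a 100 kilobyte state buffer contributes 8 bytes to the set in this sketch, at the cost of a small probability that a genuinely new state is mistaken for one already seen.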
33
State Symmetries
Explore one out of a (large) set of equivalent states
Canonicalize states before hashing
State transformations can be approximate
But, use the original state for further state exploration
Thus, approximations do not generate false errors!
[Figure labels: Current State, Canonical State, Hash Signature, Hash table, Successor States]
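The division of labor described here can be sketched as a search that uses the canonical form only as the hash-table key and keeps the original state on the queue. The example system (two identical processes, each counting from 0 to 2) and the function names are assumptions made for illustration.

```python
from collections import deque


def explore_with_symmetry(init, successors, canonicalize):
    """Return how many states were expanded.

    Only canonical forms go into the visited set. Successors are always
    computed from the original, un-canonicalized state.
    """
    seen = {canonicalize(init)}
    pending = deque([init])
    expanded = 0
    while pending:
        current = pending.popleft()
        expanded += 1
        for successor in successors(current):
            key = canonicalize(successor)
            if key in seen:
                continue  # an equivalent state was already explored
            seen.add(key)
            pending.append(successor)  # original state, not the canonical one
    return expanded


def two_counters(state):
    """Each process may independently advance its own counter up to 2."""
    result = []
    for i, value in enumerate(state):
        if value < 2:
            result.append(state[:i] + (value + 1,) + state[i + 1:])
    return result


def identity(state):
    return state


def sort_processes(state):
    """Process symmetry: the order of identical processes does not matter."""
    return tuple(sorted(state))
```

Without symmetry the search expands all 9 combinations of the two counters. With `sort_processes` as the canonicalizer it expands 6, one per unordered pair.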
34
Heap Canonicalization
Heap objects can be allocated in different order
–Depends on the order events happen
Relocate heap objects to a unique representation
[Figure labels: state1, state2, Canonical Representation]
Essentially: Find a canonical representation for each heap graph
By abstracting the concrete values of pointers
35
Heap Canonicalization Algorithm Basic algorithm [Iosif 01] –Do a deterministic graph traversal of the heap (bfs / dfs) –Relocate objects in the order visited CMC extensions: 1. How to do it incrementally? –Should not traverse the entire heap in every transition 2. How to do it for C objects? –Type information is not available at run time
36
Iosif’s Canonicalization Algorithm Do a deterministic graph traversal of the heap (bfs / dfs) Relocate objects to a canonical location – Determined by the dfs (or bfs) number of the object Hash the resulting heap [Figure: Heap and Canonical Heap]
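A minimal sketch of traversal-based canonicalization, under a simplified heap model chosen for illustration: `heap` maps a concrete address to `(data, fields)`, where `fields` maps a pointer field name to a target address or `None`, and `roots` maps root variable names to addresses. Objects are renumbered by depth-first visit order, so concrete pointer values disappear from the result. The representation and the function name are assumptions, not the data structures of any tool in the lecture.

```python
def canonicalize(roots, heap):
    """Return a hashable form of the heap that is independent of addresses."""
    order = {}  # concrete address -> canonical index (dfs number)

    def visit(addr):
        if addr is None or addr in order:
            return
        order[addr] = len(order)
        _, fields = heap[addr]
        for name in sorted(fields):  # fixed field order keeps the walk deterministic
            visit(fields[name])

    for root in sorted(roots):       # fixed root order, for the same reason
        visit(roots[root])

    # "Relocate" each reachable object to its canonical index.
    objects = [None] * len(order)
    for addr, index in order.items():
        data, fields = heap[addr]
        objects[index] = (
            data,
            tuple((name, order.get(fields[name])) for name in sorted(fields)),
        )
    root_part = tuple((root, order.get(roots[root])) for root in sorted(roots))
    return root_part, tuple(objects)


# The same two-element list, allocated in opposite orders.
heap1 = {100: ("a", {"next": 200}), 200: ("b", {"next": None})}
heap2 = {200: ("a", {"next": 100}), 100: ("b", {"next": None})}
# A list with different contents.
heap3 = {100: ("a", {"next": 200}), 200: ("z", {"next": None})}
```

The recursive walk is fine for small examples like these. Note that it visits every reachable object each time it is called, which is the cost the incremental approach is meant to avoid.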
37
Two Linked List Example [Figure: Heap and Canonical Heap before and after the transition "Insert b", with partial hash values]
38
A Much Larger Example: Linux Kernel [Figure: Heap and Canonical Heap, each with Core OS, Network, and Filesystem regions and an object p. Annotations: "An object insertion here" / "Affects the canonical location of objects here"]
39
Incremental Heap Canonicalization Access Chain : – A path from the root to an object in the heap Bfs Access Chain: – Shortest of all access paths – Break ties lexicographically – Note: Bfs access chain is a shortest path from a global variable Canonical location of an object is a function of its bfs access chain [Figure: example heap rooted at r, showing an access chain of c and the Bfs access chain of c]
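The Bfs access chain can be sketched as a level-by-level search from the root variables that keeps, for each object, the lexicographically smallest chain among its shortest ones. This sketch recomputes all chains from scratch, so it only illustrates the definition and not the incremental algorithm. The heap model (`heap[addr]` maps pointer field names to addresses or `None`) and the function name are assumptions.

```python
def bfs_access_chains(roots, heap):
    """Map each reachable address to its Bfs access chain (a tuple of names)."""
    chains = {}
    frontier = []
    for name in sorted(roots):
        addr = roots[name]
        if addr is not None and addr not in chains:
            chains[addr] = (name,)
            frontier.append(addr)
    while frontier:
        next_level = {}
        for addr in frontier:
            for field in sorted(heap[addr]):
                target = heap[addr][field]
                if target is None or target in chains:
                    continue  # already has a shorter chain
                candidate = chains[addr] + (field,)
                # Among equally short chains, keep the lexicographically smallest.
                if target not in next_level or candidate < next_level[target]:
                    next_level[target] = candidate
        chains.update(next_level)
        frontier = sorted(next_level, key=next_level.get)
    return chains


# Two lists: r -> a -> c and s -> x -> y, linked through the field n.
A, B, C, X, Y = 1, 5, 2, 3, 4
roots = {"r": A, "s": X}
before = {A: {"n": C}, C: {"n": None}, X: {"n": Y}, Y: {"n": None}}
# Transition: insert b between a and c.
after = {A: {"n": B}, B: {"n": C}, C: {"n": None}, X: {"n": Y}, Y: {"n": None}}
```

In this example the insertion changes the chain of c, which now sits one step further from r, but leaves the chains of x and y untouched. If the canonical location depends only on the chain, objects in the second list keep their locations.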
40
Revisiting Two Linked Lists Example [Figure: Heap, Canonical Heap, and Relocation Function Table] r, s are root vars; n is the next field
41
And on the much larger example
Canonical location of p does not change
–Unless its Bfs Access Chain changes
For small changes to the graph
–Shortest path of most objects remains the same
[Figure: Heap and Canonical Heap with Core OS, Network, and Filesystem regions and an object p. Annotation: "Changes here do not affect the canonical location of p"]