Distributed Verification of Multi-threaded C++ Programs
Stefan Edelkamp, joint work with Damian Sulewski and Shahid Jabbar
Motivation: IO-HSF-SPIN
[Figure: external search in IO-HSF-SPIN - the search arrives at the final state, then arrives again at the same final state; the same states appear in both parts; large jumps due to the 2nd heuristic]
2.9 TB of disk; 20 days on 1 node vs. ... days on 3 nodes
Overview
- Software Checking in StEAM
- Externalization
- Virtual Addresses
- Parallelization
Software Checking
Advantages:
+ building a model unnecessary
+ learning a specification language unnecessary
+ checking can be done more often
Disadvantages:
- code has to be executed
- huge number of states
- huge states
StEAM
- can check concurrent C++ programs
- uses a virtual machine for execution
- supports BFS, DFS, Best-First, A*, IDA*
- finds deadlocks, assertion violations, segmentation faults
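To make the bug classes concrete, here is a minimal sketch (our own example, not taken from the talk) of the kind of program StEAM is aimed at: two pthreads acquire two mutexes in opposite order, so some interleavings deadlock.

#include <pthread.h>

pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

void *worker1(void *) {
  pthread_mutex_lock(&a);
  pthread_mutex_lock(&b);   // blocks forever if worker2 holds b and waits for a
  pthread_mutex_unlock(&b);
  pthread_mutex_unlock(&a);
  return 0;
}

void *worker2(void *) {
  pthread_mutex_lock(&b);
  pthread_mutex_lock(&a);   // the symmetric half of the lock-order cycle
  pthread_mutex_unlock(&a);
  pthread_mutex_unlock(&b);
  return 0;
}

int main() {
  pthread_t t1, t2;
  pthread_create(&t1, 0, worker1, 0);
  pthread_create(&t2, 0, worker2, 0);
  pthread_join(t1, 0);
  pthread_join(t2, 0);
}

An explicit-state checker explores the thread interleavings systematically and reports the cycle, whereas plain testing may never hit the bad schedule.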
StEAM - Checking a C++ Program
[Diagram: igcc compiler -> object code -> model checker / virtual machine]

#include <cstdlib>

char globalChar;
int globalBlocksize = 7;

void allocateBlock(int size);

int main() {
  allocateBlock(globalBlocksize);
}

void allocateBlock(int size) {
  void *memBlock;
  memBlock = (void *) malloc(size);
}
StEAM - Interpreting the Object Code
[Figure: the object code of the example program above is loaded into the ICVM virtual machine, whose state consists of Register, Text Section, Data Section, BSS Section, Stack, and Memory Pool]
StEAM - Generating States
[Figure: StEAM snapshots the ICVM machine state (Register, Text Section, Data Section, BSS Section, Stack, Memory Pool) as the initial state; the successor states (State 1, State 2) carry only a subset of the components]
Overview
- Software Checking in StEAM
- Externalization
- Virtual Addresses
- Parallelization
Externalization - Motivation
[Plot: time vs. problem size for internal vs. external memory search]
Externalization - Mini States [EJMRS 06]
- pointer to a state in RAM or on disk
- pointer to the predecessor mini state
- constant size
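A minimal sketch of what such a record could look like in C++ (the field names are ours, not StEAM's):

#include <cstdint>

struct MiniState {
  uint64_t hash;          // constant-size fingerprint of the full state
  MiniState *pred;        // pointer to the predecessor mini state
  void *ram_state;        // the full state in RAM, or NULL once externalized
  uint64_t disk_offset;   // where the full state lives in the disk file
};

Because every field has fixed width, millions of mini states fit in RAM even when the full states are huge and live on disk.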
Externalization - Expanding a State
[Figure: a mini state's full state is fetched from secondary memory into the internal-memory cache for expansion]
Externalization - Flushing the Cache
[Figure: when the internal-memory cache fills up, the full states are written to secondary memory and the mini states keep referencing them on disk]
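A hedged sketch of the flush step (our own simplification, not StEAM's code): cached full states are appended to a state file, the mini states are redirected to the resulting offsets, and the RAM copies are released.

#include <cstdio>
#include <cstdlib>
#include <vector>

struct FullState { size_t size; char *bytes; };  // bytes from malloc

struct MiniState {
  FullState *ram;     // full state in the cache, or NULL once flushed
  long disk_offset;   // file position of the full state after flushing
};

void flush_cache(std::vector<MiniState *> &cache, FILE *state_file) {
  for (size_t i = 0; i < cache.size(); ++i) {
    MiniState *m = cache[i];
    if (!m->ram) continue;               // already externalized
    m->disk_offset = ftell(state_file);  // remember where the state lands
    fwrite(&m->ram->size, sizeof(size_t), 1, state_file);
    fwrite(m->ram->bytes, 1, m->ram->size, state_file);
    free(m->ram->bytes);                 // release the RAM copy ...
    delete m->ram;
    m->ram = NULL;                       // ... the mini state now points to disk
  }
  cache.clear();
}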
Externalization - Collapse Compression
[Figure: the state components (Register, Text Section, Data Section, BSS Section, Stack, Memory Pool) are stored in separate caches / files on disk; a state references its components instead of storing them]
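A sketch of the idea behind collapse compression (in the spirit of SPIN's COLLAPSE mode; the classes are ours): each component is interned once in a table, and a state shrinks to a vector of small indices.

#include <string>
#include <unordered_map>
#include <vector>

struct ComponentTable {
  std::unordered_map<std::string, int> index;  // component bytes -> id
  std::vector<std::string> storage;            // id -> component bytes

  int intern(const std::string &bytes) {
    auto it = index.find(bytes);
    if (it != index.end()) return it->second;  // component seen before: reuse id
    int id = (int)storage.size();
    storage.push_back(bytes);
    index[bytes] = id;
    return id;
  }
};

// A collapsed state: one index per component (register set, text, data,
// BSS, stack, memory pool) instead of the raw bytes.
typedef std::vector<int> CollapsedState;

Identical stacks or memory pools shared by many states are stored only once, which is what makes the per-node memory drop reported in the experiments plausible.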
Overview
- Software Checking in StEAM
- Externalization
- Virtual Addresses
- Parallelization
Virtual Addresses
- programs request memory; the memory assignment is done by the system
- hence real addresses differ between machines, and moving a program between nodes is impossible
- two possible strategies:
  - converting the addresses before executing
  - using virtual addresses
Virtual Addresses - Memory Management
[Figure: the VM memory layout (Stack with stack pointer, Text with program counter, BSS, Data, Memory Pool) is addressed virtually; an AVL tree maps each virtual address y to the real RAM address x together with the block size]
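A hedged sketch of the lookup (the names are ours; std::map stands in for the AVL tree - it is usually a red-black tree, but offers the same ordered search):

#include <cstdint>
#include <map>

struct Block { char *real; uint32_t size; };

std::map<uint32_t, Block> blocks;  // virtual block start -> (real address, size)

// Translate a virtual address that may point into the middle of a block:
// find the greatest block start <= v, then check the block bounds.
char *translate(uint32_t v) {
  std::map<uint32_t, Block>::iterator it = blocks.upper_bound(v);
  if (it == blocks.begin()) return 0;       // below every known block
  --it;                                     // greatest start <= v
  uint32_t offset = v - it->first;
  if (offset >= it->second.size) return 0;  // past the end of the block
  return it->second.real + offset;
}

Because only (virtual, real, size) triples are stored, a state can be shipped to another node and its blocks re-allocated there without touching any pointer inside the program's data.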
Virtual Addresses - Overhead
[Plot: time vs. nodes for real vs. virtual addresses]
Overview
- Software Checking in StEAM
- Externalization
- Virtual Addresses
- Parallelization
Parallelization - Motivation
- distributed (shared) memory: communication via MPI channels / shared RAM
- sending full states is too expensive (if they are not used for expansion)
- exploit the externalization
- dual channel (speedup vs. load balance)
- appropriate state space partitioning
Parallelization – Dual Channel Communication
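Our reading of the dual channel as a sketch (the message layout is an assumption): the fast channel carries only constant-size mini-state messages over MPI, while the bulky full states travel out of band through files on the shared file system.

#include <mpi.h>
#include <cstdint>

struct MiniStateMsg {    // constant size, so cheap to send eagerly
  uint64_t hash;         // fingerprint for duplicate detection
  uint64_t file_offset;  // where the full state sits in the sender's file
  int32_t owner;         // rank that wrote the full state
};

void send_mini_state(const MiniStateMsg &m, int dest) {
  // channel 1 (MPI / InfiniBand): the tiny fixed-size record
  MPI_Send((void *)&m, (int)sizeof m, MPI_BYTE, dest, 0, MPI_COMM_WORLD);
  // channel 2 (files / Ethernet): the receiver fetches the full state at
  // m.file_offset later, and only if it actually expands the state
}

This trades a little latency for a lot of bandwidth: duplicates are filtered using the cheap channel alone.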
Parallelization - Hash Partitioning
- partitioning by hashing the full state
  - problem: successors often end up in a different partition -> high communication overhead
- partitioning by hashing a partial state, e.g. the memory pool
  - problem: too many states map to one hash value -> poor load balancing
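In either variant the owning processor falls out of the hash; a one-line sketch (names are ours):

// Map a state's hash - of the full state, or of a part such as the
// memory pool - to the processor that owns the state.
int owner_of(uint64_t state_hash, int num_procs) {
  return (int)(state_hash % (uint64_t)num_procs);
}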
Parallelization - Incremental Tree Hashing [EM05]
h(s1, ..., sn) = (sum_{i=1..n} si * 3^i) mod 17
h(2) = 2*3 mod 17 = 6
h(3,1) = (3*3 + 1*9) mod 17 = 1
h(1,2) = (1*3 + 2*9) mod 17 = 4
h(2,2,1,2) = (6 + h(2,1,2)*3^1) mod 17 = (6 + 1*3) mod 17 = 9
h(1,2,3,1,2,2,1,2) = (h(1,2) + h(3,1)*3^2 + h(2,2,1,2)*3^(2+2)) mod 17 = (4 + 1*9 + 9*81) mod 17 = 11
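A runnable sketch with the toy parameters from the slide (base 3, modulus 17; a real implementation would use a large prime modulus):

#include <cstdio>
#include <vector>

const long B = 3, M = 17;

long power(long b, long e) {   // b^e mod M
  long r = 1;
  while (e-- > 0) r = r * b % M;
  return r;
}

long hash_seq(const std::vector<long> &s) {  // h(s1..sn) = sum si*3^i mod 17
  long h = 0;
  for (size_t i = 0; i < s.size(); ++i)
    h = (h + s[i] * power(B, (long)i + 1)) % M;  // indices start at 1
  return h;
}

// The hash of a concatenation u.v follows from the parts alone:
// h(u.v) = (h(u) + h(v) * 3^|u|) mod M - no rescan of u or v needed.
long hash_concat(long hu, long len_u, long hv) {
  return (hu + hv * power(B, len_u)) % M;
}

int main() {
  std::vector<long> u = {1, 2}, v = {3, 1}, w = {2, 2, 1, 2};
  long h = hash_concat(hash_concat(hash_seq(u), 2, hash_seq(v)), 4, hash_seq(w));
  printf("h = %ld\n", h);   // prints 11, as on the slide
}

This is what makes the hashing incremental: when an expansion changes only one part of the state vector, the state's hash is patched from the part hashes instead of being recomputed from scratch.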
Parallelization - Search Partitioning
- DFS: horizontal (depth-layer) slices [Holzmann & Bosnacki 2006]
- Best-First, A*: vertical slices
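A hedged sketch of depth slicing (our reading of the DFS variant): worker k owns D consecutive depth layers, and a successor whose depth crosses into the next slice is handed over instead of being expanded locally.

// Which worker owns a given search depth (D = layers per slice)?
int owner_of_depth(int depth, int D) { return depth / D; }

// During expansion on worker `me`: expand locally or hand over?
bool must_hand_over(int succ_depth, int me, int D) {
  return owner_of_depth(succ_depth, D) != me;
}

Vertical slicing for Best-First and A* would partition on the heuristic or (g+h)-value analogously (again, our reading).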
Parallelization - Hardware
- cluster "Vision System" (PBS), Linux SuSE 10.0
- MPI via InfiniBand, files via GBit Ethernet
- 224 nodes (464 procs), < 15 used
- AMD Opteron DP 250 (2.4 GHz)
Experiments: 15-Puzzle (Partial Hash)
[Plots: time and speedup vs. number of nodes]
Experiments - Depth-First Slicing
[Plot: time vs. processors for 200 philosophers]
Top result: 600 philosophers on 6 nodes at 97 KB per state; with collapse compression & distribution, 16 GB in total, i.e. 1.5 GB per node
Experiments - Bath-Tub Effect (50 philosophers, avg.)
[Plot: size of depth layer over time - the bath-tub shape validates Holzmann & Bosnacki]
Experiment - Shared Memory
Bakery (pthread) on 4 Opteron MP 852 (2.6 GHz)
[Plots: time and speedup vs. number of nodes]
Conclusion
- Preceding work: full externalization of states in IO-HSF-SPIN with constant-size RAM (e.g. 1.8 GB RAM; 20 days on 1 proc, 8 days on 4 procs, 2.9 TB disk) [EJ06]; distribution via the (g+h)-value
- Problem: huge & highly dynamic states
- Solution: mini states as constant-size fingerprints of states in RAM, enabling dual-channel communication; combines external and parallel search with memory-pool and best-first slicing partitioning