
1 Advanced course on: Parallel and Distributed Model Checking. Lecture 1 – 19.3.02. Lecturers: Orna Grumberg, Computer Science Dept., Technion; Karen Yorav, Galileo Technologies. TA: Tamir Heyman

2 Topics to be covered: distributed and parallel algorithms for symbolic reachability and counterexample generation; explicit reachability; LTL model checking; slicing BDDs; SAT; ...

3 A Scalable Parallel Algorithm for Reachability Analysis of Very Large Circuits. Tamir Heyman, Danny Geist, Orna Grumberg, and Assaf Schuster. In Conference on Computer Aided Verification (CAV'00)

4 We compute symbolically the set of reachable states on a network of processes with disjoint memory that communicate via message passing. Each process runs a reachability algorithm. At all times the set of computed states is distributed among the processes. Goal: reducing the space requirements (not necessarily the time).

5 Main Idea: the state space is partitioned into slices; each slice is owned by one process; each process runs BFS on its slice; when non-owned states are discovered, they are sent to the process that owns them.

6 Elements of the distributed symbolic algorithm: a slicing algorithm that partitions the set of states among the processes; a load-balancing algorithm that keeps these sets similar in size during the execution; distributed termination detection; a compact BDD representation that can be transferred between processes and allows different variable orders.

7 Sequential BFS

reachable = new = InitialStates
while (new ≠ ∅) {
    next = Image(new)
    new = next \ reachable
    reachable = reachable ∪ new
}
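To make the loop concrete, here is a minimal runnable sketch in Python, with an explicit set of states standing in for the BDD and a caller-supplied image function standing in for the symbolic Image operator (an illustration of the fixpoint only, not of the symbolic implementation):

# Minimal sketch of the sequential BFS fixpoint. A Python set of states
# stands in for the BDD; `image` is a placeholder for the Image operator.
def reachable_states(initial_states, image):
    reachable = set(initial_states)
    new = set(initial_states)
    while new:                      # while (new != emptyset)
        nxt = image(new)            # next = Image(new)
        new = nxt - reachable       # new = next \ reachable
        reachable |= new            # reachable = reachable U new
    return reachable

# Toy usage: states 0..9, transition x -> x+1 (mod 10)
print(reachable_states({0}, lambda s: {(x + 1) % 10 for x in s}))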

8 The Parallel Algorithm. In the initial sequential stage, BFS is performed by a single process until some threshold is reached. The state space is then sliced into k slices. Each process is informed of: the set of window functions W_1,…,W_k; its own slice of reachable; its own slice of new.

9 BFS of process id

Receive from Single: W_1,…,W_k, reachable_id, new_id
while (Parterm(new_id = ∅) ≠ true) {
    next_id = Image(new_id)
    next_id = Exchange(next_id, W_1,…,W_k)
    new_id = next_id \ reachable_id
    reachable_id = reachable_id ∪ new_id
    LoadBalance(reachable_id, W_1,…,W_k)
}

10 Exchange procedure of process id

Exchange(S, W_1,…,W_k)
    Res = S ∧ W_id
    for j ∈ {1,…,id-1, id+1,…,k} do
        Send(j, S ∧ W_j)
    for j ∈ {1,…,id-1, id+1,…,k} do
        Res = Res ∨ Recv(j)
    return Res

11 Parterm procedure of process id for distributed termination detection

Parterm(f)
    Res = f
    for j ∈ {1,…,id-1, id+1,…,k} do
        Send(j, f)
    for j ∈ {1,…,id-1, id+1,…,k} do
        Res = Res ∧ Recv(j)
    return Res
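For illustration, the two procedures can be sketched on top of mpi4py, with explicit (picklable) state sets standing in for BDD slices; windows[j] is a hypothetical predicate playing the role of W_j. This is a sketch under those assumptions, not the paper's implementation:

# Sketch of Exchange and Parterm with mpi4py. sendrecv pairs each send
# with the matching receive, so the pairwise swap cannot deadlock.
from mpi4py import MPI

comm = MPI.COMM_WORLD
me, k = comm.Get_rank(), comm.Get_size()

def exchange(states, windows):
    # Keep the states we own; swap every non-owned subset with its owner.
    res = {s for s in states if windows[me](s)}
    for j in range(k):
        if j == me:
            continue
        for_j = {s for s in states if windows[j](s)}
        res |= comm.sendrecv(for_j, dest=j, source=j)
    return res

def parterm(locally_empty):
    # AND together everyone's bit (new_id == emptyset). The communication
    # is performed even when the local bit is already False, so that every
    # peer's sendrecv finds its partner.
    res = locally_empty
    for j in range(k):
        if j != me:
            other = comm.sendrecv(locally_empty, dest=j, source=j)
            res = res and other
    return res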

12 The parallel stage requires a coordinator for pairing processes for exchange. Each process sends the coordinator the list of processes j for which S ∧ W_j is not empty. The coordinator pairs processes that need to communicate. Several pairs are allowed to communicate in parallel.

13 Important: data is transferred directly between the processes, not via the coordinator.

14 The coordinator can also be used for distributed termination detection: processes send the coordinator the one bit (new_id = ∅); the coordinator informs the processes to stop when all bits are true (sometimes more efficient than the Parterm procedure given above).
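With MPI collectives, the coordinator's one-bit round trip can be expressed as a single reduction; a sketch assuming mpi4py:

# One-bit termination check: each process contributes (new_id == emptyset);
# the logical AND of all bits tells everyone, simultaneously, when to stop.
from mpi4py import MPI

comm = MPI.COMM_WORLD

def all_done(new_states):
    return comm.allreduce(len(new_states) == 0, op=MPI.LAND)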

15 Load Balance The initial slicing distributes the memory requirements equally among the processes. As more states are discovered, the memory requirements might become unbalanced. Therefore, at the end of each step in the computation of the reachable states a load balance procedure is applied.

16 Load Balance (Cont.) Process i with a small slice sends its slice to process j with a large slice. Process j applies the slicing procedure with k=2 and obtains two new balanced slices. Process j sends process i its new slice.

17 The coordinator is also used for pairing processes for load balancing: each process informs the coordinator of the size of its slice; the coordinator pairs processes with small slices with processes with large slices; disjoint pairs perform load balance in parallel.

18 When a pair finishes its load balance procedure, it informs the coordinator of their new window functions. When all processes finish their load balance, the coordinator informs all processes of the new set of window functions.
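A coordinator-side sketch of the pairing step (a hypothetical helper; in practice one would pair only processes whose slice sizes differ by more than some threshold, a refinement omitted here):

# Match processes with small slices to processes with large slices.
# `sizes` maps process id -> slice size (e.g., BDD node count); the
# returned pairs are disjoint, so they can rebalance in parallel.
def pair_for_balance(sizes):
    order = sorted(sizes, key=sizes.get)   # ids from smallest to largest
    pairs = []
    while len(order) >= 2:
        pairs.append((order.pop(0), order.pop(-1)))
    return pairs

print(pair_for_balance({0: 10, 1: 900, 2: 40, 3: 500}))  # [(0, 1), (2, 3)]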

19 The slicing procedure. A window function is a Boolean function that characterizes a subset of the state space. A set of window functions W_1,…,W_k is complete if W_1 ∨ … ∨ W_k = 1. A set of window functions is disjoint if W_i ∧ W_j = 0 for every i ≠ j.

20 The slicing procedure. Input: a set of states S, given as a BDD. Output: a set of complete window functions W_1,…,W_k. The slices for the processes are obtained by S_i = S ∧ W_i.

21 Problems: when slicing a BDD we lose the sharing that makes BDDs a compact representation; and there are too many possible ways to slice a set of states, so some heuristic is necessary. Goal: reducing the slice size and preventing duplication.

22 Choosing a BDD variable for slicing

SelectVar(f, α)
    for each BDD variable v compute f_v = f ∧ v and f_!v = f ∧ !v
    choose the variable with the lowest cost function

23 Cost(f, v, α) = ( α * max(|f_v|, |f_!v|) ) / |f| + ( (1 - α) * (|f_v| + |f_!v|) ) / |f|

max(|f_v|, |f_!v|) / |f| approximately measures the reduction obtained by the slicing. (|f_v| + |f_!v|) / |f| approximately measures the duplication (the number of shared BDD nodes in f_v and f_!v).

24 We use an adaptive α: α=0 takes into account only the duplication; α=1 takes into account only the reduction. Initially the algorithm tries to minimize duplication (α=0) while reducing memory requirements below the threshold δ, i.e., max(|f_v|, |f_!v|) ≤ |f| - δ. If no such slicing exists, α is gradually increased until max(|f_v|, |f_!v|) ≤ |f| - δ is reached.

25 We obtained: the size of each slice is below the threshold, which guarantees a nontrivial partition (no |f_1| ≈ |f| or |f_2| ≈ 0); and there is as little duplication as possible.

26 SelectVar(f, δ)
    α = 0
    BestVar = the variable with minimal Cost(f, v, α)
    while (( max(|f_v|, |f_!v|) > |f| - δ ) ∧ ( α ≤ 1 ))
        α = α + STEP
        BestVar = the variable with minimal Cost(f, v, α)
    return BestVar

We set STEP = min(0.1, 1/k) and δ = |f| / k.
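The cost function of slide 23 and the adaptive search above can be sketched as follows. As a deliberately crude stand-in for BDDs, f is here an explicit set of satisfying assignments, so |·| counts assignments rather than BDD nodes, and the duplication term degenerates to a constant (for real BDDs it measures node sharing):

# Sketch of Cost and the adaptive SelectVar over an explicit-set stand-in.
def cofactor(f, v, b):
    # f restricted to assignments where variable v has value b
    return {t for t in f if t[v] == b}

def cost(f, v, alpha):
    n = len(f)
    n1, n0 = len(cofactor(f, v, 1)), len(cofactor(f, v, 0))
    reduction = max(n1, n0) / n      # size of the larger slice
    duplication = (n1 + n0) / n      # constant here; node sharing for BDDs
    return alpha * reduction + (1 - alpha) * duplication

def select_var(f, num_vars, k):
    step, delta, alpha = min(0.1, 1.0 / k), len(f) / k, 0.0
    best = min(range(num_vars), key=lambda v: cost(f, v, alpha))
    while (max(len(cofactor(f, best, 1)), len(cofactor(f, best, 0)))
           > len(f) - delta) and alpha <= 1.0:
        alpha += step
        best = min(range(num_vars), key=lambda v: cost(f, v, alpha))
    return best

# Toy usage: assignments over 3 boolean variables
f = {(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(select_var(f, num_vars=3, k=2))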

27 Slicing can be optimized: instead of gradually increasing α, it is more efficient to find the best α using binary search. In order to slice into k slices, we slice into 2 (possibly unbalanced) slices, then repeatedly slice the larger slice into 2 until k slices are obtained.

28 Efficient transfer of BDDs. bdd2msg translates a BDD into a message buffer. It reduces the size of sent messages, since pointers are replaced by indices into the buffer (and the buffer size is bounded). It allows different BDD variable orders in the sender and receiver, and applies the restrict operator w.r.t. the slice of the receiver.

29 Efficient transfer of BDDs (Cont.) msg2bdd translates a message into a BDD with the order of the receiver
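A sketch of the pointer-to-index flattening (the reordering to the receiver's variable order and the restrict step are omitted); the Node class and the triple encoding are illustrative, not the actual message format:

# bdd2msg flattens a BDD into a buffer of (var, low, high) triples, where
# child pointers become buffer indices; msg2bdd rebuilds the nodes on the
# receiving side. Terminals are encoded as the integers 0 and 1.
class Node:
    def __init__(self, var, low, high):
        self.var, self.low, self.high = var, low, high

def bdd2msg(root):
    buf, index = [], {}              # index: id(node) -> position in buf
    def visit(n):
        if n in (0, 1):
            return n
        if id(n) not in index:       # post-order: children precede parents
            lo, hi = visit(n.low), visit(n.high)
            index[id(n)] = len(buf)
            buf.append((n.var, lo, hi))
        return ('i', index[id(n)])   # an index reference, not a pointer
    return buf, visit(root)

def msg2bdd(buf, root_ref):
    nodes = []
    def deref(r):
        return r if r in (0, 1) else nodes[r[1]]
    for var, lo, hi in buf:
        nodes.append(Node(var, deref(lo), deref(hi)))
    return deref(root_ref)

# Toy usage: the function x0 AND x1
buf, ref = bdd2msg(Node(0, 0, Node(1, 0, 1)))
print(buf)                           # [(1, 0, 1), (0, 0, ('i', 0))]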

30 Experimental results. On examples for which reachability terminates with one process, adding more processes reduces memory (number of BDD nodes). On examples for which reachability explodes, more processes manage to compute more steps of the reachability algorithm.

31 Communication was not a bottleneck. Time requirements were better in some examples and worse in others: better because the BDDs are smaller; worse because of the added overhead, and because no optimizations for improving time have been applied.