Parallelization of the Petri Net Unfolding Algorithm
K. Heljanko*, V. Khomenko**, and M. Koutny**
*Laboratory for Theoretical Computer Science, Helsinki University of Technology
**Department of Computing Science, University of Newcastle upon Tyne
Motivation
- Partial order semantics of Petri nets
- Alleviate the state space explosion problem
- Efficient model checking algorithms
The ERV unfolding algorithm

Unf ← places from M0
pe ← transitions enabled by M0
cut-off ← ∅
while pe ≠ ∅ do
    extract e ∈ min◁ pe
    if e is a cut-off event then
        cut-off ← cut-off ∪ {e}
    else
        add e and its postset into Unf
        UpdatePotExt(pe, Unf, e)
end while
add cut-off events and their postsets to Unf
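The main loop above can be sketched in Python as follows. This is only a skeleton under stated assumptions, not the authors' implementation: the names `erv_loop`, `order_key`, `is_cutoff` and `update_pot_ext` are hypothetical, the adequate order is assumed to be given as a sortable key, and the Petri-net-specific work (postsets, computing possible extensions) is delegated to caller-supplied callbacks.

```python
import heapq
from itertools import count


def erv_loop(initial_pe, order_key, is_cutoff, update_pot_ext):
    """Skeleton of the sequential main loop.

    initial_pe     -- iterable of initial possible extensions (assumed input)
    order_key      -- hypothetical key function standing in for the adequate order
    is_cutoff      -- hypothetical callback: is_cutoff(event, unf) -> bool
    update_pot_ext -- hypothetical callback returning the new possible extensions
    Returns (events added to the prefix, cut-off events).
    """
    tie = count()  # tie-breaker so the heap never compares events directly
    pe = [(order_key(e), next(tie), e) for e in initial_pe]
    heapq.heapify(pe)
    unf, cutoff = [], []
    while pe:
        _, _, e = heapq.heappop(pe)  # extract a minimal event of pe
        if is_cutoff(e, unf):
            cutoff.append(e)
        else:
            unf.append(e)  # postset handling is omitted in this sketch
            for new in update_pot_ext(unf, e):
                heapq.heappush(pe, (order_key(new), next(tie), new))
    return unf, cutoff
```

The priority queue is where configurations get compared: every push and pop evaluates the order on several pairs of events.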
[Figure sequence: an example Petri net with places P1 to P14 and transitions T1 to T10, and the step-by-step construction of its unfolding prefix.]
Step 1: Unfolding algorithm with slices

while pe ≠ ∅ do
    extract an appropriate non-empty Sl ⊆ pe
    for all e ∈ Sl in any order refining ◁ do
        if e is a cut-off event then
            cut-off ← cut-off ∪ {e}
        else
            add e and its postset into Unf
            UpdatePotExt(pe, Unf, e)
    end for
end while
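A minimal Python sketch of the slicing loop, assuming the slice selection is passed in as a function. All names here (`unfold_with_slices`, `choose_slice`, `order_key`, `is_cutoff`, `update_pot_ext`) are hypothetical and stand in for the net-specific parts, which are not shown on the slide.

```python
def unfold_with_slices(initial_pe, choose_slice, order_key, is_cutoff, update_pot_ext):
    """Skeleton of the slice-based loop.

    choose_slice   -- hypothetical callback returning a non-empty subset of pe
    order_key      -- hypothetical key standing in for an order refining the adequate order
    is_cutoff      -- hypothetical callback: is_cutoff(event, unf) -> bool
    update_pot_ext -- hypothetical callback returning the new possible extensions
    """
    pe = set(initial_pe)
    unf, cutoff = [], []
    while pe:
        sl = set(choose_slice(pe))  # copy, so removing from pe is safe
        if not sl:
            raise ValueError("choose_slice must return a non-empty slice")
        pe -= sl
        # Within the slice, events are still processed in a fixed order.
        for e in sorted(sl, key=order_key):
            if is_cutoff(e, unf):
                cutoff.append(e)
            else:
                unf.append(e)  # postset handling is omitted in this sketch
                pe |= set(update_pot_ext(unf, e))
    return unf, cutoff
```

With a slice of size one that always contains a minimal event, this loop degenerates to the basic algorithm.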
Problem 1
The order in which the events are processed may be inconsistent with ◁!
Can be fixed by imposing the constraint: for every e ∈ Sl and every f ◁ e, f ∉ pe \ Sl and pe does not contain causal predecessors of f.
Correctness
Theorem: Let Pref' and Pref'' be the prefixes of the unfolding of a bounded net system, produced by arbitrary runs of the basic and slicing algorithms respectively. Then Pref' and Pref'' are isomorphic.
Problem 2
How to choose slices to satisfy the imposed condition?
For orders refining McMillan's adequate order (C1 ◁m C2 ⇔ |C1| < |C2|), a good choice is to take Sl = { e ∈ pe : |[e]| = k }, where k = min { |[e]| : e ∈ pe }.
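The slice choice above amounts to taking every pending event whose local configuration has minimal size. A small sketch, assuming events are hashable and that a hypothetical function `size(e)` returns |[e]|:

```python
def choose_slice(pe, size):
    """Split pe into (slice, rest).

    pe   -- set of pending possible extensions
    size -- hypothetical function returning the size of an event's local configuration
    The slice holds all events of minimal size k; the rest stays in the queue.
    """
    if not pe:
        return set(), set()
    k = min(size(e) for e in pe)
    sl = {e for e in pe if size(e) == k}
    return sl, pe - sl
```

For example, with events modelled as `(name, size)` pairs, `choose_slice({("a", 1), ("b", 2), ("c", 1)}, lambda e: e[1])` yields the slice `{("a", 1), ("c", 1)}` and leaves `{("b", 2)}` pending.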
Example
[Figure: the unfolding prefix of the example net.]
Step 2: Parallelisation
The events in a slice can be inserted into the prefix all together, and their possible extensions can be computed in parallel!
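The structure of this step might look as follows in Python. This is a sketch under assumptions: `process_slice` and `pot_ext` are hypothetical names, the prefix is modelled as a plain list that workers only read, and a thread pool is used purely to show the shape of the computation (in CPython, threads will not give a real speedup for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def process_slice(sl, unf, pot_ext, workers=4):
    """Insert a whole slice into the prefix, then compute extensions in parallel.

    sl      -- list of events forming the slice
    unf     -- list modelling the prefix; only read during the parallel phase
    pot_ext -- hypothetical callback: pot_ext(unf, event) -> set of new events
    Returns the union of all newly found possible extensions.
    """
    unf.extend(sl)  # sequential part: add all events of the slice at once
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda e: pot_ext(unf, e), sl))
    new_events = set()
    for found in results:
        new_events |= found  # a set collapses extensions found more than once
    return new_events
```

Merging the per-event results into a set hides duplicated results, but not the duplicated work of computing them.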
Problem 3
The same possible extension can be computed several times!
[Figure: a prefix fragment in which the same possible extensions (instances of T4 and T9) are found from more than one event.]
Restricting the scope
[Figure sequence: restricting the scope for the events of a slice Sl.]
Problem 4
How to get rid of the ordering in the for all loop?

for all e ∈ Sl in any order refining ◁ do
    if e is a cut-off event then
        cut-off ← cut-off ∪ {e}
    else
        UpdatePotExt(pe, Unf, e)
end for

If there are no cut-offs in the slice Sl, then the order in which the events are processed is irrelevant.
Cut-offs “in advance”
One can check the cut-off criterion as soon as a new possible extension is computed.
Advantages:
- No cut-offs in a slice (fixes Problem 4)
- The cut-off criterion is checked in UpdatePotExt(pe, Unf, e), the part of the algorithm that is computed in parallel
The queue of possible extensions
- Can be represented as a sequence Sl_1, Sl_2, Sl_3, …, where Sl_i contains the events whose local configurations have size i
- Inserting an event e into the queue reduces to adding it to the set Sl_|[e]|
- Choosing a slice reduces to detaching the first non-empty set Sl_i from the queue
- No comparisons of configurations are involved!
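One way to realise such a queue is a map from configuration size to a set of events. The sketch below is illustrative only; the class name `SliceQueue` and its methods are hypothetical, and the caller is assumed to supply the size |[e]| when inserting.

```python
from collections import defaultdict


class SliceQueue:
    """Queue of possible extensions, bucketed by local configuration size."""

    def __init__(self):
        self._slices = defaultdict(set)  # size i -> set Sl_i

    def insert(self, event, size):
        """Add an event to the bucket for its local configuration size."""
        self._slices[size].add(event)

    def pop_slice(self):
        """Detach and return the first non-empty bucket (empty set if none)."""
        if not self._slices:
            return set()
        k = min(self._slices)  # compares integers only, never configurations
        return self._slices.pop(k)

    def __bool__(self):
        return bool(self._slices)
```

Buckets are created only on insertion and removed when detached, so every stored bucket is non-empty and the smallest key always identifies the next slice.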
Comparisons of configurations
The total number of comparisons of configurations performed by the parallel algorithm is equal to |E_cut|, i.e. there are no redundant comparisons! In contrast, the ERV unfolding algorithm performs O(|E| log |E|) comparisons.
Experimental results
Processors: 2 3 4
Speedup:
The speedup is real, but not linear due to limited memory bandwidth (“bus contention”).
Test machine: 4 Pentium™ III 500 MHz processors with 512 KB cache, 512 MB of 133 MHz RAM
Conclusions
- The algorithm is faster even on a uniprocessor
- The size of slices is usually large, which allows for good parallelization
- More than 95% of the time is spent in the parallel sections of the algorithm
- Can be efficiently implemented even on distributed memory architectures
- Linear speedup for most of the examples (in theory)
- Limited memory bandwidth (“bus contention”)