Parallelization of the Petri Net Unfolding Algorithm
K. Heljanko*, V. Khomenko**, and M. Koutny**
*Laboratory for Theoretical Computer Science, Helsinki University of Technology
**Department of Computing Science, University of Newcastle upon Tyne

2 Motivation
- Partial order semantics of Petri nets
- Alleviate the state space explosion problem
- Efficient model checking algorithms

3 Unf  places from M 0 pe  transitions enabled by M 0 cut-off   while pe   extract e  min pe if e is a cut-off event then cut-off  cut-off  {e} else add e and its postset into Unf UpdatePotExt(pe, Unf, e) end while add cut-off events and their postsets to Unf The ERV unfolding algorithm

4-16 [Figures: a Petri net with places P1 to P14 and transitions T1 to T10, and its unfolding prefix, which grows slide by slide.]

17 Step 1: Unfolding algorithm with slices
    while pe ≠ ∅
        extract appropriate non-empty Sl ⊆ pe
        for all e ∈ Sl in any order refining ◁ do
            if e is a cut-off event
            then cut-off ← cut-off ∪ {e}
            else add e and its postset into Unf
                 UpdatePotExt(pe, Unf, e)
        end for
    end while

18 Problem 1
The order in which the events are processed may be inconsistent with the adequate order ◁! This can be fixed by imposing the constraint: for every e ∈ Sl and every f ◁ e:
- f ∉ pe \ Sl, and
- pe does not contain causal predecessors of f

19 Correctness
Theorem: Let Pref' and Pref'' be the prefixes of the unfolding of a bounded net system, produced by arbitrary runs of the basic and slicing algorithms respectively. Then Pref' and Pref'' are isomorphic.

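20 Problem 2
How to choose slices to satisfy the imposed condition? For orders refining McMillan's adequate order, C1 ◁m C2 ⇔ |C1| < |C2|, a good choice is to take Sl = { e ∈ pe | |[e]| = k }, where k = min { |[e]| | e ∈ pe }. Here [e] is the local configuration of e, so the slice consists of the pending events whose local configurations are smallest.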
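The slice choice on this slide can be sketched as follows. This is a minimal sketch: `choose_slice` is a hypothetical name, and `size` is an assumed caller-supplied function returning |[e]| for an event e.

```python
def choose_slice(pe, size):
    """Return the events of pe whose local configuration has minimal size.

    pe   -- set of pending possible extensions
    size -- assumed function giving |[e]| for an event e
    """
    if not pe:
        return set()
    k = min(size(e) for e in pe)  # k = min { |[e]| : e in pe }
    return {e for e in pe if size(e) == k}
```

For example, if the pending events e1, e2, e3 have local configurations of sizes 2, 3 and 2, the slice is {e1, e3}.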

21 Example
[Figure: unfolding prefix with events labelled T1 to T10 and conditions labelled P1 to P14.]

22 Step 2: Parallelisation
The events in a slice can be inserted into the prefix all together, and their possible extensions can be computed in parallel!
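A minimal sketch of this step, assuming a hypothetical read-only function `possible_extensions(e, prefix)` that returns the new possible extensions enabled by an event; `process_slice` and `workers` are also illustrative names. It uses a thread pool from the Python standard library, so in standard CPython builds the global interpreter lock means it illustrates the structure of the step rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def process_slice(slice_events, prefix, possible_extensions, workers=4):
    """Insert a whole slice into the prefix, then compute extensions in parallel.

    possible_extensions(e, prefix) is an assumed, read-only function that
    returns the new possible extensions enabled by e.
    """
    prefix = prefix | set(slice_events)  # insert the slice all together
    snapshot = frozenset(prefix)         # workers only read this snapshot
    new_pe = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda e: possible_extensions(e, snapshot), slice_events)
        for found in results:
            new_pe |= set(found)         # the set union discards duplicates
    return prefix, new_pe
```

Because the workers share one immutable snapshot of the prefix, they need no locking; the results are merged only after each worker has finished.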

23 Problem 3
The same possible extensions can be computed several times!
[Figure: unfolding prefix fragment.]

24-27 Restricting the scope
[Figures only; the first is labelled Sl.]

28 Problem 4
How to get rid of the ordering in the for all loop?
    for all e ∈ Sl in any order refining ◁ do
        if e is a cut-off event
        then cut-off ← cut-off ∪ {e}
        else UpdatePotExt(pe, Unf, e)
    end for
If there are no cut-offs in the slice Sl then the order in which the events are processed is irrelevant.

29 Cut-offs “in advance”
One can check the cut-off criterion as soon as a new possible extension is computed.
Advantages:
- No cut-offs in a slice (fixes Problem 4)
- The cut-off criterion is checked in UpdatePotExt(pe, Unf, e), the part of the algorithm which is computed in parallel

30 The queue of possible extensions
- Can be represented as a sequence Sl_1, Sl_2, Sl_3, … where Sl_i contains the events whose local configurations have size i
- Inserting an event e into the queue is reduced to adding it to the set Sl_|[e]|
- Choosing a slice is reduced to detaching the first non-empty set Sl_i from the queue
No comparisons of configurations are involved!
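The queue described on this slide can be sketched as a dictionary of buckets keyed by size. This is a minimal sketch: the class name `SliceQueue` and its methods are hypothetical, and the size |[e]| is assumed to be supplied by the caller.

```python
class SliceQueue:
    """Queue of possible extensions bucketed by local configuration size."""

    def __init__(self):
        self.buckets = {}  # size i -> set Sl_i (only non-empty sets are stored)

    def insert(self, event, size):
        # size is |[event]|, assumed to be computed by the caller
        self.buckets.setdefault(size, set()).add(event)

    def pop_slice(self):
        """Detach and return the first non-empty set Sl_i, or an empty set."""
        if not self.buckets:
            return set()
        i = min(self.buckets)  # compares integer sizes, not configurations
        return self.buckets.pop(i)

    def __bool__(self):
        return bool(self.buckets)
```

Inserting events of sizes 2, 1 and 2 and then calling `pop_slice()` twice yields the size-1 event first and both size-2 events together afterwards. Only integers are ever compared.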

31 Comparisons of configurations
The total number of comparisons of configurations performed by the parallel algorithm is equal to |E_cut|, i.e. there are no redundant comparisons! In contrast, the ERV unfolding algorithm performs O(|E| log |E|) comparisons.

32 Experimental results
Processors: 2, 3, 4
Speedup: [values missing from the transcript]
The speedup is real, but not linear due to limited memory bandwidth (“bus contention”).
Test machine: 4 × Pentium III 500 MHz processors with 512K cache, 512M of 133 MHz RAM

33 Conclusions
- The algorithm is faster even on a uniprocessor
- The size of slices is usually large, which allows for good parallelization
- More than 95% of time is spent in the parallel sections of the algorithm
- Can be efficiently implemented even on distributed memory architectures
- Linear speedup for most of the examples (in theory)
- Drawback: limited memory bandwidth (“bus contention”)