D. Sahoo, Stanford J. Jain, Fujitsu S. Iyer, UT-Austin

Presentation transcript:

A New Reachability Algorithm for Symmetric Multi-processor Architecture
D. Sahoo (Stanford), J. Jain (Fujitsu), S. Iyer (UT-Austin), D. Dill (Stanford)
Formal Equivalence and Assertion-based Verification Workshop 2005

Outline
- Standard Reachability Analysis
- Multithreaded Reachability
- Multithreaded Reachability in SMP Machines
- Engineering Issues
- Results
- Conclusion and Future Work

Related Work: Parallel Reachability Analysis
- Stern and Dill [CAV, 97]
- Stornetta and Brewer [DAC, 96]
- Yang and O'Hallaron [97]
- Heyman, Geist, Grumberg, Schuster [CAV, 00]
- Garavel, Mateescu, Smarandache [SPIN, 01]
- Pixley and Havlicek [03]

Reachability using BDD [Burch et al. : 91]
- Partitioned transition relation: Tr1, …, Tri, …, Trn
- Image computation starting from the initial states I produces R1, R2, …, Ri
- Iterate until the least fixed point is reached
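The least-fixed-point iteration on this slide can be sketched with explicit state sets standing in for BDDs. This is a hedged illustration, not the BDD algorithm of Burch et al. It assumes a disjunctively partitioned relation Tr = Tr1 ∨ … ∨ Trn, so Img(S) is the union of the per-partition images and R(k+1) = R(k) ∪ Img(R(k)). Circuit models more often use a conjunctive partition with early quantification, which this sketch does not model. The names `image` and `reachable` are illustrative.

```python
from typing import Hashable, Iterable, List, Set, Tuple

State = Hashable
# Explicit (s, s') pairs stand in for one BDD partition Tr_j of the transition relation.
Relation = Set[Tuple[State, State]]


def image(states: Set[State], partitions: List[Relation]) -> Set[State]:
    """Img(S) = union over j of {s' | (s, s') in Tr_j, s in S} (disjunctive partition assumed)."""
    result: Set[State] = set()
    for tr in partitions:
        result |= {t for (s, t) in tr if s in states}
    return result


def reachable(init: Iterable[State], partitions: List[Relation]) -> Tuple[Set[State], int]:
    """Iterate R_{k+1} = R_k | Img(R_k) from R_0 = I until nothing new appears.

    Only the frontier (states new in the last step) is imaged. This reaches the
    same fixed point as imaging all of R_k. Returns (reached states, image steps).
    """
    reached: Set[State] = set(init)
    frontier = set(reached)
    steps = 0
    while frontier:
        new_states = image(frontier, partitions) - reached
        reached |= new_states
        frontier = new_states
        steps += 1
    return reached, steps


if __name__ == "__main__":
    tr1 = {(0, 1), (1, 2)}
    tr2 = {(2, 3), (5, 6)}
    print(reachable({0}, [tr1, tr2]))  # ({0, 1, 2, 3}, 4)
```

The step count includes the final image step that finds nothing new, which is how the fixed point is detected.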

Partitioned Reachability using POBDD
POBDD - [Jain : 92], Reachability - [Narayan et al. : 97]
- Initial states: I
- Partition 1 computes its local fixed point, then communicates the states it reached in other partitions: 1 -> 2, 1 -> 3, 1 -> 4
- Partition 2 computes its local fixed point, then communicates: 2 -> 1, 2 -> 3, 2 -> 4
- Similarly repeat for the other partitions (local fixed points 3 and 4)
Improvements: [Iyer et al. : 03] [Sahoo et al. : 04]
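The schedule of computing a local fixed point and then communicating can be sketched sequentially with explicit state sets. Several assumptions here do not come from the slides:
- A hypothetical `window_of(state)` function assigns every state to exactly one partition. Real POBDD windows are Boolean functions, and each partition keeps its own BDD variable order.
- The transition relation is a plain set of pairs.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable
Relation = Set[Tuple[State, State]]


def successors(states: Set[State], relation: Relation) -> Set[State]:
    return {t for (s, t) in relation if s in states}


def local_fixed_point(seed: Set[State], known: Set[State], window: int,
                      relation: Relation, window_of: Callable[[State], int]
                      ) -> Tuple[Set[State], Set[State]]:
    """Explore from `seed` without leaving `window`.

    Returns (newly reached states inside the window, successors owned by other windows).
    """
    local = set(seed)
    frontier = set(seed)
    outgoing: Set[State] = set()
    while frontier:
        img = successors(frontier, relation)
        inside = {t for t in img if window_of(t) == window}
        outgoing |= img - inside
        frontier = inside - local - known
        local |= frontier
    return local, outgoing


def partitioned_reachable(init: Iterable[State], relation: Relation,
                          window_of: Callable[[State], int], num_windows: int) -> Set[State]:
    reached: List[Set[State]] = [set() for _ in range(num_windows)]
    pending: List[Set[State]] = [set() for _ in range(num_windows)]
    for s in init:
        pending[window_of(s)].add(s)
    while any(pending):
        for w in range(num_windows):          # partition 1, then 2, then 3, ...
            seed = pending[w] - reached[w]
            pending[w] = set()
            if not seed:
                continue
            local, outgoing = local_fixed_point(seed, reached[w], w, relation, window_of)
            reached[w] |= local
            for t in outgoing:                # communicate w -> owner of t
                pending[window_of(t)].add(t)
    return set().union(*reached)
```

Each pass of the outer loop follows the slide sequence: local fixed point 1, communicate from 1, local fixed point 2, and so on. The loop ends when no partition has unexplored incoming states.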

Motivation for Multi-threaded Approach
- Scheduling problem
- Increasing availability of powerful SMP machines
- Multi-threading is a way of achieving real parallelism on SMP machines

Multi-threaded Reachability [DAC 05]: Naïve Parallelization
Advantages:
- Parallel speedup
- Catches a bug faster than the sequential version
Problems:
- Not much parallelism
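Here is a hedged sketch of the naïve scheme as we read it from the slide. One thread per partition computes its local fixed point. States are exchanged only after every thread has finished, so each round lasts as long as the slowest partition. This round structure is our assumption, since the slide shows only a timeline. Python threads show the control flow, but CPython's global interpreter lock means this code gets no real parallel speedup.

```python
import threading
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable
Relation = Set[Tuple[State, State]]


def local_fixed_point(seed, known, window, relation, window_of):
    """Reach states inside `window`; also return successors that belong elsewhere."""
    local, frontier, outgoing = set(seed), set(seed), set()
    while frontier:
        img = {t for (s, t) in relation if s in frontier}
        inside = {t for t in img if window_of(t) == window}
        outgoing |= img - inside
        frontier = inside - local - known
        local |= frontier
    return local, outgoing


def naive_parallel_reachable(init: Iterable[State], relation: Relation,
                             window_of: Callable[[State], int],
                             num_windows: int) -> Tuple[Set[State], int]:
    reached: List[Set[State]] = [set() for _ in range(num_windows)]
    pending: List[Set[State]] = [set() for _ in range(num_windows)]
    for s in init:
        pending[window_of(s)].add(s)
    rounds = 0
    while any(pending[w] - reached[w] for w in range(num_windows)):
        outboxes: List[Set[State]] = [set() for _ in range(num_windows)]

        def work(w: int) -> None:
            # Each thread touches only its own reached[w] and outboxes[w].
            local, out = local_fixed_point(pending[w] - reached[w], reached[w],
                                           w, relation, window_of)
            reached[w] |= local
            outboxes[w] = out

        threads = [threading.Thread(target=work, args=(w,)) for w in range(num_windows)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()          # communication waits for the slowest partition
        pending = [set() for _ in range(num_windows)]
        for out in outboxes:  # exchange states only after every local fixed point
            for s in out:
                pending[window_of(s)].add(s)
        rounds += 1
    return set().union(*reached), rounds
```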

Multi-threaded Reachability [DAC 05]: Early Communication
Advantages:
- Parallel speedup
- Finishes the reachability analysis faster
- Catches bugs faster than the naïve version
Problems:
- Parallelism could be better
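Early communication can be sketched as follows. After every image step, each thread posts its out-of-partition successors to the owner's inbox, rather than waiting for its local fixed point. Idle threads pick up incoming states at once. Our assumptions, not described in the slides:
- the inbox and condition-variable design;
- the `outstanding` counter used to detect the global fixed point.

As before, CPython threads illustrate control flow rather than speedup.

```python
import threading
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable
Relation = Set[Tuple[State, State]]


def early_comm_reachable(init: Iterable[State], relation: Relation,
                         window_of: Callable[[State], int], num_windows: int) -> Set[State]:
    cond = threading.Condition()
    inbox: List[Set[State]] = [set() for _ in range(num_windows)]
    reached: List[Set[State]] = [set() for _ in range(num_windows)]  # reached[w]: thread w only
    outstanding = 0  # states sitting in an inbox or in a batch still being explored

    def post(states: Iterable[State]) -> None:
        nonlocal outstanding
        with cond:
            for t in states:
                box = inbox[window_of(t)]
                if t not in box:
                    box.add(t)
                    outstanding += 1
            cond.notify_all()

    def worker(w: int) -> None:
        nonlocal outstanding
        while True:
            with cond:
                while not inbox[w] and outstanding > 0:
                    cond.wait()
                if not inbox[w]:          # outstanding == 0: global fixed point
                    return
                batch, inbox[w] = inbox[w], set()
            frontier = batch - reached[w]
            reached[w] |= frontier
            while frontier:
                img = {t for (s, t) in relation if s in frontier}
                inside = {t for t in img if window_of(t) == w}
                post(img - inside)        # early communication: send right away
                frontier = inside - reached[w]
                reached[w] |= frontier
            with cond:
                outstanding -= len(batch)  # batch fully explored
                if outstanding == 0:
                    cond.notify_all()

    post(init)
    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_windows)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return set().union(*reached)
```

A batch stays counted in `outstanding` until its exploration ends. Any states it posts are counted before the batch is released. So the counter reaches zero only when no inbox holds work and no thread is exploring.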

Multi-threaded Reachability [DAC 05]: Early Communication and Partial Communication
Advantages:
- Parallel speedup
- Finishes the reachability analysis faster
- Catches bugs faster than the previous versions

Reachability in SMP Architecture
- We find the bugs faster!
- Improved parallelism
- Better parallel speedup

Engineering Issues
- Thread-safe BDD library
- Deterministic behavior
- Smart thread scheduling

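Sources of Non-determinism
- Extensive memory-based optimizations
- Pointer comparisons
- Hashing based on memory address
Solutions:
- Deterministic hashing
- Deterministic comparisons
Example: Thread 1 and Thread 2 each execute p = malloc (…). The outcome of key = hash(p) and of if (p > p1) … then depends on the addresses that malloc returns, which can differ from run to run.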
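The deterministic-hashing fix can be sketched by giving every BDD node a serial number when it is created. That number, never the node's address, is then used in unique-table keys and in ordering tests such as `if (p > p1)`. In Python, `id()` plays the role of the malloc'd address. `DeterministicManager`, `mk`, and `ordered` are hypothetical names, not the paper's library. The sketch also assumes each thread owns its own manager, because a serial counter shared by racing threads would itself depend on thread interleaving.

```python
import itertools
from typing import Dict, Optional, Tuple


class Node:
    __slots__ = ("uid", "var", "low", "high")

    def __init__(self, uid: int, var: int,
                 low: Optional["Node"], high: Optional["Node"]) -> None:
        self.uid = uid    # serial number in creation order: same on every run
        self.var = var    # -1 marks a terminal
        self.low = low
        self.high = high


class DeterministicManager:
    """Hypothetical per-thread BDD manager whose unique table never looks at addresses."""

    def __init__(self) -> None:
        self._uids = itertools.count()
        self.zero = Node(next(self._uids), -1, None, None)
        self.one = Node(next(self._uids), -1, None, None)
        self._unique: Dict[Tuple[int, int, int], Node] = {}

    def mk(self, var: int, low: Node, high: Node) -> Node:
        if low is high:                    # reduction rule: the test is redundant
            return low
        key = (var, low.uid, high.uid)     # instead of (var, id(low), id(high))
        node = self._unique.get(key)
        if node is None:
            node = Node(next(self._uids), var, low, high)
            self._unique[key] = node
        return node


def ordered(a: Node, b: Node) -> Tuple[Node, Node]:
    """Deterministic stand-in for an `if (p > p1)` pointer comparison: order by uid."""
    return (a, b) if a.uid <= b.uid else (b, a)
```

Two managers that perform the same sequence of `mk` calls assign the same uids. Every hash bucket and every comparison outcome therefore repeats exactly, wherever the nodes happen to live in memory.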

Sources of Non-determinism
- Thread synchronization
Solutions:
- Synchronization based on a deterministic count:
  - Number of ITE operations
  - Number of Sift operations
Example: Thread 1 computes Image #n while Thread 2 computes Image #n+1.
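The slide's remedy is to synchronize on a deterministic count, such as the number of ITE or sift operations. One way to sketch it: every thread stops at a barrier after each block of `ops_per_epoch` counted operations, then reads its peers' messages in a fixed order. What each thread receives then depends only on operation counts, not on timing. The barrier-per-epoch scheme and all names are assumptions for illustration. The random sleeps simulate scheduling noise.

```python
import random
import threading
import time
from typing import List, Tuple

Message = Tuple[int, int]  # (sender thread, sender's operation count when produced)


def run_count_synchronized(num_threads: int, epochs: int, ops_per_epoch: int,
                           jitter_seed: int) -> List[List[Message]]:
    """Threads meet at a barrier after every `ops_per_epoch` counted operations."""
    barrier = threading.Barrier(num_threads)
    outbox: List[List[Message]] = [[] for _ in range(num_threads)]
    received: List[List[Message]] = [[] for _ in range(num_threads)]
    rng = random.Random(jitter_seed)
    delays = [[rng.random() * 1e-3 for _ in range(epochs * ops_per_epoch)]
              for _ in range(num_threads)]

    def worker(w: int) -> None:
        op_count = 0
        for _ in range(epochs):
            for _ in range(ops_per_epoch):
                time.sleep(delays[w][op_count])   # timing varies with the jitter seed
                op_count += 1                      # stands in for one ITE operation
                outbox[w].append((w, op_count))
            barrier.wait()                         # sync point chosen by count, not by clock
            for src in range(num_threads):         # read peers in a fixed order
                if src != w:
                    received[w].extend(outbox[src])
            barrier.wait()                         # all reads done before anyone clears
            outbox[w].clear()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Runs with different `jitter_seed` values return identical `received` lists. If a thread instead polled its peers whenever it happened to get around to it, the batches it saw would depend on timing.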

Smart Thread Scheduling
- Each processor has its own cache
- A thread is assigned to a processor, and that processor's cache fills up with the thread's memory usage
- After some time, the same thread is assigned to a different processor
- The thread then suffers a large number of unnecessary cache misses when it uses its previously used memory locations
Solution:
- Bind each thread to a processor
- This leads to suboptimal throughput if the number of threads exceeds the number of processors
Example: the thread's data at 0x07ffd0 is in Cache1 (CPU1). After the thread moves to CPU2, a lookup of 0x07ffd0 misses in Cache2.
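Binding each thread to a processor can be sketched in Python on Linux with the standard-library call `os.sched_setaffinity`. Passing pid 0 reaches the Linux system call, which then binds only the calling thread. Two choices here are our assumptions:
- the round-robin `cpu_for_thread` mapping;
- the rule of pinning only when threads do not outnumber CPUs, which reflects the slide's caveat about oversubscription.

This shows only the mechanism. Measuring the cache effect itself would need native code.

```python
import os
import threading
from typing import Callable, List, Sequence


def allowed_cpus() -> List[int]:
    """CPUs this process may run on (Linux), else 0..cpu_count-1."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def cpu_for_thread(thread_index: int, cpus: Sequence[int]) -> int:
    """Round-robin: thread i gets the i-th allowed CPU, wrapping around."""
    ordered = sorted(cpus)
    return ordered[thread_index % len(ordered)]


def pin_current_thread(cpu: int) -> bool:
    """Bind the calling thread to `cpu`; returns False where unsupported or refused."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        # On Linux the underlying system call applies pid 0 to the calling thread.
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return False
    return True


def start_pinned_workers(num_threads: int, work: Callable[[int], None]) -> List[threading.Thread]:
    cpus = allowed_cpus()
    # Assumption of this sketch: pin only when every thread can have its own CPU;
    # otherwise leave placement to the OS scheduler (the case the slide warns about).
    pin = num_threads <= len(cpus)

    def body(index: int) -> None:
        if pin:
            pin_current_thread(cpu_for_thread(index, cpus))
        work(index)

    threads = [threading.Thread(target=body, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    return threads
```

Callers join the returned threads. With more threads than CPUs, round-robin binding would put two threads on one CPU with no way to migrate to an idle one, hence the fallback.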

BDD Performance : CUDD Vs New
BDD statistics after reachability analysis (static order)
Columns: Ckts, P/F, #img, #nodes, then Mem (MB), Cache hits, Cache collision and Time for CUDD and for New
bpb: F 10 1.8M 50M 41.0% 90.4% 18.6 61M 88.2% 26.3
eight: P 47 79K 6.1M 42.9% 26.2% 0.8 7.5M 1.5
fru32: 2 8K 9.2M 34.0% 28.4% 7.9 10.9M 28.9% 8.9
idu32: 1 36K 6.6M 28.8% 5.0% 4.2 7.8M 28.7% 7.7% 4.5
usbphy: 90K 6.4M 37.7% 16.6% 0.7 17.1%

BDD Performance : CUDD Vs New

Performance : Non-deterministic Vs Deterministic
Verification time in sec (T/O = time-out)
Ckts   Non-deterministic   Deterministic
c1     T/O                 227
c2     962                 917
c3     809                 62
c4     903                 161
d1     -                   13
d2     24                  30
d3     84                  100
d4     -                   38
d5     -                   37

Performance: Cache or Parallelism
Verification time in sec
Ckts   Uniprocessor   Sequential in 8-way SMP   Parallel
c1     1570           286                       227
d1     125            -                         13
d2     180            39                        30
d3     295            130                       100
d4     176            60                        38

Results on Industrial Circuits
Columns: Ckt; Vis; Seq POBDD; Parallel; Multi-threaded Approaches, Parallel 8 CPUs: Naïve, Early Comm, Early Comm + Partial Comm; 1 CPU
c1: 371 T/O 286 227
c2: 3346 1789 1564 93 917
c3: 2540 228 62
c4: 2236 2084 1174 161 509
d1: 6 13
d2: 10 11 45 39 30
d3: 15 21 23 100 130
d4: 60 38
d5: 12 16 34 37

Results on Public Benchmarks
Columns: Ckt; Vis; Seq POBDD; Parallel; Multi-threaded Approaches, Parallel 8 CPUs: Naïve, Early Comm, Early Comm + Partial Comm; 1 CPU
spprod: 891 61 53 93 510 440
am2910: T/O 281 122 204 386 356
palu: 273 4 9 8
S1269b-1: 3635 59 72 60
S1269b-5: 2287 55 67
blackjck: 1213 470 340 98 70

Results: Gantt Charts
Real execution traces from our multi-threaded reachability program

Conclusion and Future Work
- Parallelized reachability analysis: multi-threaded reachability
- Better results
- Deterministic behavior
Future Work:
- Improve the parallelism further
- Study cache behavior