1 MPI Verification Ganesh Gopalakrishnan and Robert M. Kirby Students: Yu Yang, Sarvani Vakkalanka, Guodong Li, Subodh Sharma, Anh Vo, Michael DeLisi, Geof Sawaya School of Computing, University of Utah Supported by: Microsoft HPC Institutes, NSF CNS

2 “MPI Verification”, or: how to exhaustively verify MPI programs without the pain of model building, while considering only “relevant interleavings”

3 Computing is at an inflection point (photo courtesy of Intel)

4 Our work pertains to these: MPI programs, MPI libraries, and shared-memory threads based on locks

5 Name of the Game: Progress Through Precision 1. Precision in Understanding 2. Precision in Modeling 3. Precision in Analysis 4. Doing Modeling and Analysis with Low Cost

6 1. Need for Precision in Understanding: The “crooked barrier” quiz P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier Will P1’s Send Match P2’s Receive ?

7 Need for Precision in Understanding: The “crooked barrier” quiz P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier It will ! Here is the animation

8–12 Need for Precision in Understanding: The “crooked barrier” quiz (animation frames; same program text as above)
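For concreteness, here is a minimal C version of the quiz program. It is a sketch, not code from the slides: it assumes exactly three ranks, and a second, draining receive is added at P2 so the run terminates whichever send the wildcard matches.

/* "Crooked barrier" quiz as a runnable sketch; run with exactly 3 ranks,
   e.g. mpirun -np 3 ./crooked_barrier */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, buf = 0, drain = 0;
    MPI_Request req;
    MPI_Status st;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        buf = 100;
        MPI_Isend(&buf, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &req);  /* before the barrier */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, &st);
    } else if (rank == 1) {
        MPI_Barrier(MPI_COMM_WORLD);
        buf = 200;
        MPI_Isend(&buf, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &req);  /* after the barrier */
        MPI_Wait(&req, &st);
    } else if (rank == 2) {
        /* the quiz: can P1's (post-barrier) send match this wildcard receive? */
        MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, &st);
        printf("P2's wildcard receive matched rank %d (value %d)\n", st.MPI_SOURCE, buf);
        /* drain the other send so both senders' MPI_Waits can complete */
        MPI_Recv(&drain, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Barrier(MPI_COMM_WORLD);   /* extra ranks just join the barrier */
    }
    MPI_Finalize();
    return 0;
}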

13 Would you rather explain each conceivable situation in a large API with an elaborate “bee dance” and informal English, or would you rather specify it mathematically and let the user calculate the outcomes? P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier

14 TLA+ Spec of MPI_Wait (Slide 1/2)

15 TLA+ Spec of MPI_Wait (Slide 2/2)

16 Executable Formal Specification can help validate our understanding of MPI… (toolflow: the TLA+ MPI Library Model and the TLA+ Prog. Model are checked with the TLC Model Checker; the MPIC Program Model and MPIC IR, produced through the Visual Studio 2005 Phoenix Compiler, are checked with the MPIC Model Checker; together these form the Verification Environment. References: FMICS 07, PADTAD 07)

17 The Histrionics of FV for HPC (1)

18 The Histrionics of FV for HPC (2)

19 Error-trace Visualization in VisualStudio

20 2. Precision in Modeling: The “Byte-range Locking Protocol” Challenge. Asked to see if a new protocol using MPI 1-sided communication was OK…
lock_acquire(start, end) {
    /* Stage 1 */
 1  val[0] = 1;  /* flag */   val[1] = start;   val[2] = end;
 2  while (1) {
 3      lock_win
 4      place val in win
 5      get values of other processes from win
 6      unlock_win
 7      for all i, if (Pi conflicts with my range)
 8          conflict = 1;
    /* Stage 2 */
 9      if (conflict) {
10          val[0] = 0
11          lock_win
12          place val in win
13          unlock_win
14          MPI_Recv(ANY_SOURCE)
15      }
16      else {
17          /* lock is acquired */
18          break;
19      }
20  } // end while
(each process's slot in the window holds: flag, start, end)
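To make steps 3–6 concrete, here is a hedged C sketch of Stage 1 using MPI one-sided calls. It is an illustration, not the protocol authors' code: homerank (the rank holding the lock window), win, nprocs, myrank, and the three-ints-per-process [flag, start, end] layout are assumptions, and the two MPI_Get calls skip the caller's own slot so that the Put and the Gets in the same epoch never overlap.

#include <mpi.h>

/* Stage 1 of lock_acquire (steps 1-8 above); returns 0 if the lock is
   acquired, 1 if there is a conflict and Stage 2 (back off) is needed. */
static int lock_acquire_stage1(MPI_Win win, int homerank, int myrank, int nprocs,
                               int start, int end, int *others /* 3*nprocs ints */)
{
    int val[3] = { 1, start, end };                      /* 1: flag, start, end   */
    int conflict = 0;

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, win);  /* 3: lock_win           */
    MPI_Put(val, 3, MPI_INT, homerank,
            3 * myrank, 3, MPI_INT, win);                /* 4: place val in win   */
    if (myrank > 0)                                      /* 5: read the others'   */
        MPI_Get(others, 3 * myrank, MPI_INT,             /*    slots before mine  */
                homerank, 0, 3 * myrank, MPI_INT, win);
    if (myrank < nprocs - 1)                             /*    ... and after mine */
        MPI_Get(others + 3 * (myrank + 1), 3 * (nprocs - myrank - 1), MPI_INT,
                homerank, 3 * (myrank + 1), 3 * (nprocs - myrank - 1), MPI_INT, win);
    MPI_Win_unlock(homerank, win);                       /* 6: unlock_win         */

    for (int i = 0; i < nprocs; i++) {                   /* 7: conflict check     */
        if (i == myrank || others[3 * i] == 0)
            continue;                                    /* not competing for lock */
        int s = others[3 * i + 1], e = others[3 * i + 2];
        if (!(end < s || e < start))                     /* byte ranges overlap    */
            conflict = 1;                                /* 8                      */
    }
    return conflict;
}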

21 Precision in Modeling: The “Byte-range Locking Protocol” Challenge
• Studied code
• Wrote Promela Verification Model (a week)
• Applied the SPIN Model Checker
• Found Two Deadlocks Previously Unknown
• Wrote Paper (EuroPVM / MPI 2006) with Thakur and Gropp – won one of the three best-paper awards
• With new insight, Designed a Correct AND Faster Protocol!
• Still, we felt lucky… what if we had missed the error while hand-modeling?
• Also, hand-modeling was NO FUN – how about running the real MPI code “cleverly”?

22 Measurement under Low Contention

23 Measurement under High Contention

24 4. Modeling and Analysis with Reduced Cost… Card Deck 0 and Card Deck 1 (cards 0–5 each). Only the interleavings of the red cards matter, so don't try all 12! / (6! 6!) = 924 riffle-shuffles – instead just try TWO shuffles of the decks !!

25 What works for cards works for MPI (and for PThreads also) !!
P0 (owner of window):      0: MPI_Init  1: MPI_Win_lock  2: MPI_Accumulate  3: MPI_Win_unlock  4: MPI_Barrier  5: MPI_Finalize
P1 (non-owner of window):  0: MPI_Init  1: MPI_Win_lock  2: MPI_Accumulate  3: MPI_Win_unlock  4: MPI_Barrier  5: MPI_Finalize
The lock/accumulate/unlock calls are the dependent operations: 504 interleavings without POR in this example, 2 interleavings with POR !!
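Here is the slide's two-process program as a runnable C sketch (an illustration: the slide omits the window setup, so MPI_Win_create and MPI_Win_free over a single int on every rank are added here as assumptions; rank 0's window is the one targeted).

/* Both ranks lock rank 0's window and accumulate into it; the two
   lock/accumulate/unlock epochs are the dependent operations. Meant
   to be run with 2 ranks, as on the slide. */
#include <mpi.h>

int main(int argc, char **argv) {
    int winbuf = 0, one = 1;
    MPI_Win win;
    MPI_Init(&argc, &argv);                                   /* 0: MPI_Init       */
    MPI_Win_create(&winbuf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);      /* setup (not shown on the slide) */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);              /* 1: MPI_Win_lock   */
    MPI_Accumulate(&one, 1, MPI_INT, 0, 0, 1, MPI_INT,
                   MPI_SUM, win);                             /* 2: MPI_Accumulate */
    MPI_Win_unlock(0, win);                                   /* 3: MPI_Win_unlock */
    MPI_Barrier(MPI_COMM_WORLD);                              /* 4: MPI_Barrier    */
    MPI_Win_free(&win);
    MPI_Finalize();                                           /* 5: MPI_Finalize   */
    return 0;
}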

26 4. Modeling and Analysis with Reduced Cost: The “Byte-range Locking Protocol” Challenge
• Studied code → DID NOT STUDY CODE
• Wrote Promela Verification Model (a week) → NO MODELING
• Applied the SPIN Model Checker → NEW ISP VERIFIER
• Found Two Deadlocks Previously Unknown → FOUND SAME!
• Wrote Paper (EuroPVM / MPI 2007) with Thakur and Gropp – won one of the three best-paper awards → DID NOT WIN
• Still, we felt lucky… what if we had missed the error while hand-modeling → NO NEED TO FEEL LUCKY (NO LOST INTERLEAVING – but also did not foolishly do ALL interleavings)
• Also hand-modeling was NO FUN – how about running the real MPI code “cleverly”? → DIRECT RUNNING WAS FUN

27 3. Precision in Analysis: The “crooked barrier” quiz again … P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier Our cluster NEVER gave us the P0-to-P2 match !!! Elusive interleavings !! They bite you the hardest when you port to a new platform !!

28 3. Precision in Analysis: The “crooked barrier” quiz again … P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier SOLVED!! Using the new POE algorithm: partial order reduction in the presence of out-of-order operations and elusive interleavings

29 Precision in Analysis
• POE works great (all 41 Umpire test-suites run)
• No need to “pad” delay statements to jiggle the schedule and force “the other” interleaving – this is a very brittle trick anyway!
• Prelim version under submission – detailed version for EuroPVM…
• Jitterbug uses this (delay-padding) approach – we don't need it
• Siegel (MPI_SPIN): modeling effort
• Marmot: different coverage guarantees…

30 1-4: Finally! Precision and Low Cost in Modeling and Analysis, taking advantage of MPI semantics (in our heads…) P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier This is how POE does it

31 Discover All Potential Senders by Collecting (but not issuing) operations at runtime… P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( ANY ) MPI_Barrier

32 Rewrite “ANY” to ALL POTENTIAL SENDERS P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( P0 ) MPI_Barrier

33 Rewrite “ANY” to ALL POTENTIAL SENDERS P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( P1 ) MPI_Barrier

34 Recurse over all such configurations ! P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( P1 ) MPI_Barrier

35 If we now have P0-P2 doing this, and P3-5 doing the same computation between themselves, no need to interleave these groups… P0 --- MPI_Isend ( P2 ) MPI_Barrier P1 --- MPI_Barrier MPI_Isend( P2 ) P2 --- MPI_Irecv ( * ) MPI_Barrier P3 --- MPI_Isend ( P5 ) MPI_Barrier P4 --- MPI_Barrier MPI_Isend( P5 ) P5 --- MPI_Irecv ( * ) MPI_Barrier

36 Why is all this worth doing ?

37 MPI is the de-facto standard for programming cluster machines. Our focus: Help Eliminate Concurrency Bugs from HPC Programs. Apply similar techniques for other APIs also (e.g. PThreads, OpenMP). (BlueGene/L – Image courtesy of IBM / LLNL) (Image courtesy of Steve Parker, CSAFE, Utah)

38 The success of MPI (Courtesy of Al Geist, EuroPVM / MPI 2007)

39 The Need for Formal Semantics for MPI – Send – Receive – Send / Receive – Send / Receive / Replace – Broadcast – Barrier – Reduce – Rendezvous mode – Blocking mode – Non-blocking mode – Reliance on system buffering – User-attached buffering – Restarts/Cancels of MPI Operations – Non Wildcard receives – Wildcard receives – Tag matching – Communication spaces An MPI program is an interesting (and legal) combination of elements from these spaces

40 Multi-core – how it affects MPI (Courtesy, Al Geist): The core count rises but the number of pins on a socket is fixed, which accelerates the decrease in the bytes/flops ratio per socket. The bandwidth to memory (per core) decreases, the bandwidth to the interconnect (per core) decreases, and the bandwidth to disk (per core) decreases. MPI library implementations would also change. We need formal semantics for MPI, because we can't imitate any existing implementation…

41 Look for commonly committed mistakes automatically: deadlocks, communication races, resource leaks. We are only after “low hanging” bugs…

42 Deadlock pattern…
P0: s(P1); r(P1);      P1: s(P0); r(P0);
P0: Bcast; Barrier;    P1: Barrier; Bcast;
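The first pattern, written out as a two-rank C program. This is a sketch, not from the slides; MPI_Ssend is used so the head-to-head deadlock is deterministic, whereas with plain MPI_Send it appears only when the library declines to buffer the message, which is exactly what makes the pattern nasty in practice.

/* Both ranks send first and receive second: classic head-to-head deadlock. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, sendbuf = 42, recvbuf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;                                  /* run with exactly 2 ranks */
    MPI_Ssend(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);            /* s(peer) */
    MPI_Recv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);                                         /* r(peer) */
    MPI_Finalize();
    return 0;
}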

43 Communication Race Pattern…
P0: r(*); r(P1);    P1: s(P0);    P2: s(P0);
OK if the wildcard receive r(*) matches P2’s send (r(P1) then matches P1’s send); NOK (deadlock) if r(*) matches P1’s send, leaving r(P1) with no matching send.
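The same pattern as a three-rank C sketch (illustrative, not from the slides). Whether the run completes or deadlocks depends solely on which sender the wildcard receive happens to match.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, a = 0, b = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Status st;
        MPI_Recv(&a, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);  /* r(*)  */
        printf("wildcard matched rank %d\n", st.MPI_SOURCE);
        /* OK if the wildcard matched P2; NOK (this receive never completes)
           if the wildcard already consumed P1's only send */
        MPI_Recv(&b, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); /* r(P1) */
    } else if (rank == 1 || rank == 2) {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);                 /* s(P0) */
    }
    MPI_Finalize();
    return 0;
}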

44 Resource Leak Pattern… P0 --- some_allocation_op(&handle); FORGOTTEN DEALLOC !!
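A concrete instance of the pattern (illustrative; here the forgotten call is MPI_Comm_free, but the same shape applies to requests, datatypes, groups, and windows).

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Comm dup;
    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, &dup);    /* some_allocation_op(&handle) */
    /* ... use dup ... */
    /* MPI_Comm_free(&dup);                   FORGOTTEN DEALLOC !!        */
    MPI_Finalize();
    return 0;
}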

45 Bugs are hidden within huge state-spaces…

46 Partial Order Reduction Illustrated…
• With 3 processes (each with 3 local states), the size of the interleaved state space is 3^3 = 27 states
• Partial-order reduction explores representative sequences from each equivalence class
• Delays the execution of independent transitions
• In this example, it is possible to “get away” with 7 states (one interleaving)

47 A Deadlock Example… (off by one → deadlock)
// Add-up integrals calculated by each process
if (my_rank == 0) {
    total = integral;
    for (source = 0; source < p; source++) {   /* off by one: source should start at 1,
                                                   since rank 0 never sends to itself  */
        MPI_Recv(&integral, 1, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &status);
        total = total + integral;
    }
} else {
    MPI_Send(&integral, 1, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
}
(message matching: p1: to 0, p2: to 0, p3: to 0 versus p0: fr 0, p0: fr 1, p0: fr 2)

48 Organization of ISP: the MPI Program is simplified (Simplifications → Simplified MPI Program) and compiled into an executable; at run time Proc 1 … Proc n trap their MPI calls as PMPI calls and exchange request/permit messages with a scheduler, which drives the actual MPI Library and Runtime.
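The scheduler sits between the program and the MPI library through the standard PMPI profiling interface. The fragment below only sketches that interposition idea; ask_scheduler_for_permission is a hypothetical placeholder, not ISP's real request/permit protocol.

/* A PMPI-based wrapper layer: the application's MPI_ calls land here,
   consult the scheduler, and are then forwarded to the real library
   via the PMPI_ entry points. */
#include <mpi.h>

static void ask_scheduler_for_permission(const char *op) {
    /* hypothetical: report "op" to the central scheduler (e.g. over a
       socket) and block until it replies with "permit" */
    (void)op;
}

int MPI_Barrier(MPI_Comm comm) {
    ask_scheduler_for_permission("MPI_Barrier");
    return PMPI_Barrier(comm);                   /* forward to the real MPI */
}

int MPI_Irecv(void *buf, int count, MPI_Datatype type, int source,
              int tag, MPI_Comm comm, MPI_Request *req) {
    ask_scheduler_for_permission("MPI_Irecv");   /* wildcard receives are the
                                                    interesting case for POE */
    return PMPI_Irecv(buf, count, type, source, tag, comm, req);
}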

49 Summary (have posters for each)
• Formal Semantics for a large subset of MPI 2.0
  – Executable semantics for about 150 MPI 2.0 functions
  – User interactions through the VisualStudio API
• Direct execution of user MPI programs to find issues
  – Downscale code, remove data that does not affect control, etc.
  – New Partial Order Reduction algorithm » explores only relevant interleavings
  – User can insert barriers to contain complexity » a new vector-clock algorithm determines if the barriers are safe
  – Errors detected » deadlocks » communication races » resource leaks
• Direct execution of PThread programs to find issues
  – Adaptation of Dynamic Partial Order Reduction reduces interleavings
  – Parallel implementation – scales linearly

50 Also built a POR explorer for C / Pthreads programs, called “Inspect”: the multithreaded C/C++ program is instrumented (instrumentation plus a thread-library wrapper) and compiled into an executable; thread 1 … thread n then run under a scheduler via request/permit.
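The thread-library wrapper plays the same role for Pthreads that the PMPI layer plays for MPI. A minimal sketch of the idea (the inspect_* names are hypothetical, not Inspect's actual API; the real instrumentation is source-to-source and talks to an external scheduler).

#include <pthread.h>

static void inspect_request_permit(const char *op) {
    (void)op;   /* hypothetical: block here until the scheduler permits "op" */
}

/* The instrumentation rewrites pthread_mutex_lock(&m) in the user program
   into inspect_mutex_lock(&m), and similarly for the other sync calls. */
int inspect_mutex_lock(pthread_mutex_t *m) {
    inspect_request_permit("mutex_lock");
    return pthread_mutex_lock(m);
}

int inspect_mutex_unlock(pthread_mutex_t *m) {
    inspect_request_permit("mutex_unlock");
    return pthread_mutex_unlock(m);
}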

51 Dynamic POR is almost a “must” ! ( Dynamic POR as in Flanagan and Godefroid, POPL 2005)

52 Why Dynamic POR ? a[ j ]++ versus a[ k ]-- : the ample set depends on whether j == k, which can be very difficult to determine statically but can be determined dynamically.
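The same point as a tiny Pthreads program (illustrative): whether the two updates are dependent, and hence whether both orders must be explored, is only known once j and k have their runtime values.

#include <pthread.h>

static int a[10];
static int j, k;   /* set at run time, e.g. read from input */

static void *inc(void *arg) { (void)arg; a[j]++; return NULL; }
static void *dec(void *arg) { (void)arg; a[k]--; return NULL; }

int main(void) {
    j = 3; k = 3;      /* j == k: the accesses conflict (dependent)  */
    /* j = 3; k = 5;      j != k: the accesses are independent       */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, inc, NULL);
    pthread_create(&t2, NULL, dec, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}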

53 Why Dynamic POR ? The notion of action dependence (crucial to POR methods) is a function of the execution

54 Computation of “ample” sets in Static POR versus in DPOR
In static POR, the ample set is determined using “local” criteria. In DPOR (illustrated in the animation): from the current state, consider the next move of the red process; looking back in the execution, find the nearest dependent transition and add the red process to the “Backtrack Set” ({BT}) at that point (blue is already in the “Done” set, {Done}). This builds the ample set incrementally, based on observed dependencies.

55 Putting it all together…
• We target C/C++ PThread programs
• Instrument the given program (largely automated)
• Run the concurrent program “till the end”
• Record interleaving variants while advancing
• When the number of recorded backtrack points reaches a soft limit, spill work to other nodes
• In one larger example, an 11-hour run was finished in 11 minutes using 64 nodes
• A heuristic to avoid recomputations was essential for the speed-up
• First known distributed DPOR

56 A Simple DPOR Example ( {}, {} )
t0: lock(t); unlock(t)
t1: lock(t); unlock(t)
t2: lock(t); unlock(t)
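As a runnable Pthreads program (a sketch; the three worker threads correspond to t0, t1, t2 above). Since all three acquisitions of t are pairwise dependent, DPOR explores at most the 3! = 6 lock-acquisition orders.

#include <pthread.h>

static pthread_mutex_t t = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&t);     /* lock(t)   */
    pthread_mutex_unlock(&t);   /* unlock(t) */
    return NULL;
}

int main(void) {
    pthread_t th[3];
    for (int i = 0; i < 3; i++) pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < 3; i++) pthread_join(th[i], NULL);
    return 0;
}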

57 For this example, these are all the paths explored during DPOR; for other examples, the explored paths will be a proper subset.

58 Idea for parallelization: Explore computations from the backtrack set in other processes. “Embarrassingly Parallel” – it seems so, anyway !

59 We then devised a work-distribution scheme… (diagram: “request unloading”, “idle node id”, “work description”, and “report result” messages exchanged with a load balancer)

60 Speedup on aget

61 Speedup on bbuf

62 Historical Note
• Model checking – proposed in 1981 – 2007 ACM Turing Award for Clarke, Emerson, and Sifakis
• Bug discovery facilitated by the creation of simplified models and by exhaustively checking the models » exploring only relevant interleavings

63 Looking ahead… Plans for one year out…

64 Finish tool implementation for MPI and others…
• Static analysis to reduce some cost
• Inserting barriers (to contain cost) using the new vector-clocking algorithm for MPI
• Demonstrate on meaningful apps (e.g. ParMETIS)
• Plug into MS VisualStudio
• Development of the PThread (“Inspect”) tool with the same capabilities
• Evolving these tools to transactional memory, Microsoft TPL, OpenMP, …

65 Thanks Microsoft! – and Dennis Crain, Shahrokh Mortazavi. In these times of unpredictable NSF funding, the HPC Institute Program made it possible for us to produce a great cadre of Formal Verification Engineers: Robert Palmer (PhD – to join Microsoft soon), Sonjong Hwang (MS), Steve Barrus (BS), Salman Pervez (MS), Yu Yang (PhD), Sarvani Vakkalanka (PhD), Guodong Li (PhD), Subodh Sharma (PhD), Anh Vo (PhD), Michael DeLisi (BS/MS), Geof Sawaya (BS). Supported by: Microsoft HPC Institutes, NSF CNS

66 Extra Slides

67 Looking Further Ahead: Need to clear the “idea log-jam” in multi-core computing… “There isn’t such a thing as Republican clean air or Democratic clean air. We all breathe the same air.” There isn’t such a thing as an architectural-only solution, or a compilers-only solution, to future problems in multi-core computing…

68 Now you see it; now you don’t! On the menace of non-reproducible bugs.
• Deterministic replay must ideally be an option
• User-programmable schedulers are greatly emphasized by expert developers
• Runtime model-checking methods with state-space reduction hold promise in meshing with current practice…