

Effectively Model Checking Real-World Distributed Systems
Junfeng Yang
Joint work with Huayang Guo, Ming Wu, Lidong Zhou, Gang Hu, Lintao Zhang, Heming Cui, Jingyue Wu, Chia-che Tsai, John Gallagher

One-slide Summary
Distributed systems: important, but hard to get right
Model checking: finds serious bugs, but is slow
Dynamic Interface Reduction: a new type of state-space reduction technique, the first in 25 years [DeMeter SOSP '11]
– Exponentially speeds up model checking
– One data point: 34 years → 18 hours
Stable Multithreading: a radically new approach [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13]
– What-you-check-is-what-you-run
– Billions of years → 7 hours

Distributed Systems: Pervasive and Critical

Distributed Systems: Hard to Get Right
No node has a centralized view of the entire system
Code must correctly handle many failures
– Link failures, network partitions
– Message loss, delay, or reordering
– Machine crashes
Worse: at geo scale and beyond, weird failures become more likely
→ Complex protocols, even more complex code, bugs

Model Checking Distributed Systems Implementations
Choices of actions:
– Send message
– Recv message
– Run thread
– Delay message
– Fail link
– Crash machine
– …
Run checkers on states
– E.g., assertions
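The exploration loop above can be sketched in a few lines of Python. This is a toy illustration, not MoDist's actual code; `explore`, `enabled_actions`, and `run_checkers` are hypothetical names for the pieces the slide describes.

```python
def explore(state, enabled_actions, run_checkers, depth=0, max_depth=3):
    """Depth-first search over nondeterministic action choices."""
    run_checkers(state)                      # e.g., run assertions on each state
    if depth == max_depth:
        return
    for action in enabled_actions(state):    # send, recv, fail link, crash, ...
        explore(action(state), enabled_actions, run_checkers,
                depth + 1, max_depth)

# Toy system: one node receives messages {1, 2, 3} in any order and sums them.
visited = []

def enabled_actions(state):
    pending, total = state
    # each still-pending message may be delivered next
    return [lambda s, m=m: (s[0] - {m}, s[1] + m) for m in sorted(pending)]

def run_checkers(state):
    assert state[1] <= 6                     # invariant: total never exceeds 6
    visited.append(state)

explore((frozenset({1, 2, 3}), 0), enabled_actions, run_checkers)
print(len(visited))  # 16 states visited across all delivery orders
```

Even this tiny system visits 16 states because the checker must try every delivery order; real systems add failures and crashes on top, which is where the state explosion below comes from.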

Good Error Detection Results
E.g., [MoDist NSDI '09] [dBug SSV '10]
– Easy: check unmodified, real code in its native environment ("in-situ" [eXplode OSDI '06])
– Comprehensive: check many corner cases
– Deterministic: detected errors can be replayed
MoDist results
– Checked Berkeley DB replication, MPS (a Microsoft production system), PacificA
– Found 35 bugs (10 protocol flaws); protocol flaws found in every system checked
– Transferred to Microsoft product groups

But: the State Explosion Problem
Real-world distributed systems have too many states to explore completely
– Even for conceptually small state spaces
– 3-node MPS: 34 years for MoDist!
Incompleteness → low assurance
Prior model checkers explored many redundant states

This Talk: Two Techniques to Effectively Reduce the State Space
Dynamic Interface Reduction: check components separately to avoid costly global exploration [DeMeter SOSP '11]
– 34 years → 18 hours, a 10^5 reduction
Leverage Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13] to make what-you-check what-you-run (ongoing)

Dynamic Interface Reduction (DIR)
Insight: system builders decompose a system into components with narrow interfaces
– e.g., [Clarke, Long, McMillan 87] [Laster, Grumberg 98]
Distinguish global actions from local actions
Check local actions via a conceptually local fork()
– Example node: // main: n = recv(); total += n; Send(n)  // ckpt: Log(total)

Reduction Analysis
N components, each having M local actions
– w/o DIR: M * M * … * M = M^N
– w/ DIR: M + M + … + M = M * N
Exponential reduction
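A quick sanity check of the arithmetic above. M and N here are illustrative numbers, not measurements from the paper:

```python
M, N = 1000, 3          # M local actions per component, N components
without_dir = M ** N    # global exploration multiplies the choices
with_dir = M * N        # DIR explores each component separately, then adds
print(without_dir)              # 1000000000
print(with_dir)                 # 3000
print(without_dir // with_dir)  # 333333
```

At just three components of a thousand local actions each, the combined exploration is already five orders of magnitude larger, matching the 10^5 reduction reported later.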

Challenge in Implementing DIR
How to automatically compute interfaces from real code w/o causing false positives or missing bugs?
Manual specs: tedious, costly, error-prone
– Required by prior compositional or modular model checking work
Made-up interfaces: difficult-to-diagnose false positives [Guerraoui and Yabandeh, NSDI '11]

Automatically Discover Interfaces by Running Code
Global Explorer: explores global actions
Local Explorers: explore local actions
The explorers exchange message traces
Insight: message traces collectively define the interfaces
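Projecting a global trace onto a component's message trace is easy to sketch. This is a toy rendition; the action-tuple encoding `(node, kind, peer, value)` is my own, not DeMeter's:

```python
# Project a global trace onto one component, keeping only its message
# (interface) actions. Local actions such as Log or state updates are
# dropped: they are not part of the interface.

def project(global_trace, component):
    """Message trace of `component`: its sends and receives, in order."""
    return [a for a in global_trace
            if a[0] == component and a[1] in ("send", "recv")]

trace = [
    ("C", "send", "P", 1), ("P", "recv", "C", 1), ("P", "log",  None, None),
    ("P", "send", "S", 1), ("S", "recv", "P", 1),
]
print(project(trace, "P"))  # [('P', 'recv', 'C', 1), ('P', 'send', 'S', 1)]
```

The projected traces are exactly what each local explorer replays in the walkthrough that follows.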

Example
Client C:
  if (Toss(2) == 0) { Send(P, 1); Send(P, 2); }
  else { Send(P, 1); Send(P, 3); }
Primary P:
  // main: while (n = recv()) { total += n; Send(S, n); }
  // ckpt: Log(total)
Secondary S:
  // main: while (n = recv()) { total += n; }
  // ckpt: Log(total)
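For reference, the example can be run directly in Python. The names follow the slide; collapsing the message plumbing into a single loop is my own simplification:

```python
# Client C tosses a coin and sends two messages to Primary P; P accumulates
# each value and forwards it to Secondary S, which also accumulates it.
# Both P and S log their totals at the end (the "ckpt" action).

def run(toss):
    msgs = [1, 2] if toss == 0 else [1, 3]   # Client C's two branches
    p_total = s_total = 0
    for n in msgs:            # Primary P receives, accumulates, forwards
        p_total += n
        s_total += n          # Secondary S receives the forwarded value
    return p_total, s_total   # the values each node would Log(total)

print(run(0))  # (3, 3)
print(run(1))  # (4, 4)
```

The coin toss is the only local nondeterminism at the client; the two outcomes are what drives the trace exchange in the frames below.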

Global Explorer: Compute Initial Global Trace
Global trace: C.Toss(2)=0, C.Send(P,1), P.Recv(C,1), P.Log, P.total+=1, P.Send(S,1), S.Recv(P,1), S.Log, S.total+=1, C.Send(P,2), P.Recv(C,2), P.total+=2, P.Send(S,2), S.Recv(P,2), S.total+=2

Global Explorer: Project Message Traces
Projected message traces:
– P: P.Recv(C,1), P.Send(S,1), P.Recv(C,2), P.Send(S,2)
– S: S.Recv(P,1), S.Recv(P,2)
– C: C.Send(P,1), C.Send(P,2)

Local Explorers: Explore Local Actions Using Message Traces
– P's message trace: P.Recv(C,1), P.Send(S,1), P.Recv(C,2), P.Send(S,2)
– S's message trace: S.Recv(P,1), S.Recv(P,2)
– C's message trace: C.Send(P,1), C.Send(P,2)

Local Explorer of Primary: Explore Local Trace 1
P's message trace: P.Recv(C,1), P.Send(S,1), P.Recv(C,2), P.Send(S,2)
P's local actions: P.Log, P.total+=1, P.total+=2

Local Explorer of Primary: Explore Local Trace 2
Same message trace; another interleaving of the checkpoint action P.Log with P.total+=1 and P.total+=2

Local Explorer of Primary: Explore Local Trace 3
Same message trace; a third interleaving of P.Log with the total updates

Local Explorer of Client
C's message trace: C.Send(P,1), C.Send(P,2)

Local Explorer of Client (cont.)
C's message trace: C.Send(P,1), C.Send(P,2)
Local action reached: C.Toss(2) = 0

Local Explorer of Client Found New Message Trace
Exploring the other branch, C.Toss(2) = 1, produces C.Send(P,1), C.Send(P,3): a message trace not seen before

Global Explorer: Composition
The new message trace C.Send(P,1), C.Send(P,3) is fed back to the global explorer and composed with the other components' behavior

Global Explorer: New Global Trace
New global trace: C.Toss(2)=1, C.Send(P,1), P.Recv(C,1), P.Log, P.total+=1, P.Send(S,1), S.Recv(P,1), S.Log, S.total+=1, C.Send(P,3)
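The walkthrough above is an instance of DIR's overall loop: the global explorer and the per-component local explorers exchange message traces until no explorer discovers a new one. The sketch below is my own schematic of that fixpoint loop, not DeMeter's code; the toy `global_explore` and `local_explore` stubs replay the client's coin-toss discovery:

```python
def dir_loop(global_explore, local_explore, components):
    """Iterate global and local exploration to a fixpoint on message traces."""
    known = set()
    frontier = {global_explore(None)}              # initial global trace
    while frontier:
        trace = frontier.pop()
        if trace in known:
            continue
        known.add(trace)
        for c in components:                       # local exploration may
            for new_msg_trace in local_explore(c, trace):  # expose new interface
                frontier.add(global_explore(new_msg_trace))  # behavior: recheck
    return known

# Toy instantiation: the client's coin toss exposes a second message trace,
# which the global explorer then turns into a second global trace.
def global_explore(msg_trace):
    return "global:" + (msg_trace or "send(1),send(2)")

def local_explore(component, global_trace):
    if component == "C" and "send(1),send(2)" in global_trace:
        return ["send(1),send(3)"]                 # the Toss(2)==1 branch
    return []

traces = dir_loop(global_explore, local_explore, ["C", "P", "S"])
print(sorted(traces))
```

The loop terminates with exactly two global traces, mirroring the two coin-toss branches in the slides.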

Implementation
7,279 lines of C++
Integrated DIR with:
– MoDist [MoDist NSDI '09]: 757 lines
– MaceMC [MaceMC NSDI '07]: 1,114 lines
– Easy
Orthogonal to partial order reduction, via vector clock tricks
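A sketch of the "vector clock trick" mentioned above, showing the assumed mechanics rather than the paper's code: each node carries a vector clock, and two actions are concurrent (and hence candidates for partial order reduction) iff neither clock dominates the other.

```python
def merge(vc, other):
    """Receive rule: take the element-wise maximum of the two clocks."""
    return tuple(max(a, b) for a, b in zip(vc, other))

def happens_before(vc1, vc2):
    return all(a <= b for a, b in zip(vc1, vc2)) and vc1 != vc2

def concurrent(vc1, vc2):
    return not happens_before(vc1, vc2) and not happens_before(vc2, vc1)

# Node 0 sends at clock (1,0); node 1 merges it into its own clock: (1,1).
# A later independent step on node 0 at (2,0) is concurrent with that recv,
# so the two orders need not both be explored.
send = (1, 0)
recv = merge((0, 1), send)
print(recv)                                      # (1, 1)
print(happens_before(send, recv))                # True
print(concurrent((2, 0), recv))                  # True
```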

Verification/Reduction Results
Benchmarks: MPS (Microsoft production system); BDB (Berkeley DB replication); Chord (Chord implementation in Mace); *-n means n nodes
[Chart: reduction and speedup of DIR-MoDist and DIR-MaceMC on MPS-2, MPS-3, BDB-2, BDB-3, Chord-2, Chord-3]
Results of other benchmarks in [DeMeter SOSP '11]

DIR Summary
Proven sound (introduces no false positives) and complete (introduces no false negatives)
Fully automatic; real, exponential reduction
Works seamlessly w/ existing model checkers
– Integrated into MoDist and MaceMC; easy
Results
– Verified instances of real-world systems
– Empirically observed large reduction: 34 years → 18 hours (10^5) on MPS

This Talk: Two Techniques to Effectively Reduce the State Space
Dynamic Interface Reduction: check components separately to avoid costly global exploration [DeMeter SOSP '11]
– 34 years → 18 hours, a 10^5 reduction
Leverage Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13] to make what-you-check what-you-run (ongoing)

Threads: Difficult to Model Check
Many thread interleavings, or schedules
– To verify, a local explorer must explore all schedules
Wide interfaces between threads
– Any shared-memory load/store is part of the interface
– Tracing loads/stores is costly
– So DIR may not work well
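The "many schedules" point can be made precise: n threads of m steps each admit (n·m)! / (m!)^n interleavings (a multinomial coefficient), which grows explosively.

```python
from math import factorial

def interleavings(n_threads, steps_each):
    """Number of distinct interleavings of n threads of m steps each."""
    return (factorial(n_threads * steps_each)
            // factorial(steps_each) ** n_threads)

print(interleavings(2, 10))   # 184756
print(interleavings(4, 10))   # astronomically larger
```

Even two ten-step threads already have 184,756 schedules; this is why a local explorer that must cover all schedules is intractable without shrinking the set of schedules itself.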

What-You-Check Is What-You-Run
Coverage = C / R
– R: all possible runtime schedules; C: model-checked schedules
Reduction: enlarge C by exploiting equivalence
But equivalence is rare and hard to find!
– DIR took us 2-3 years
Can we increase coverage w/o equivalence? Shrink R w/ Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13]

Stable Multithreading
Reuse well-checked schedules on different inputs
How does it work? See the papers [Tern OSDI '10] [Peregrine SOSP '11] [PLDI '12] [Parrot SOSP '13] [CACM '13]
So much easier that it feels like cheating
[Figure: spectrum of runtime schedules, from nondeterministic to stable to deterministic]
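A toy illustration of reusing one schedule across runs. This is my own sketch of the idea; real systems such as Tern and Parrot memoize schedules of synchronization operations, not arbitrary steps, and enforce them far more efficiently.

```python
import threading

SCHEDULE = ["t1", "t2", "t1", "t2"]   # the memoized, well-checked schedule

class StableScheduler:
    """Serialize thread steps into one fixed total order."""
    def __init__(self, schedule):
        self.schedule, self.pos = schedule, 0
        self.cv = threading.Condition()

    def step(self, name, action):
        with self.cv:
            while self.schedule[self.pos] != name:   # wait for our slot
                self.cv.wait()
            action()                                 # run the step in order
            self.pos += 1
            self.cv.notify_all()

log = []
sched = StableScheduler(SCHEDULE)

def worker(name):
    for _ in range(2):
        sched.step(name, lambda: log.append(name))

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads: t.start()
for t in threads: t.join()
print(log)  # always ['t1', 't2', 't1', 't2'], run after run
```

Because every run follows the same memoized order, a schedule checked once stays checked: what-you-check is what-you-run.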

Conclusion
Dynamic Interface Reduction: check components separately to avoid costly global exploration [DeMeter SOSP '11]
– Automatic, real, exponential reduction
– Proven sound and complete
– 34 years → 18 hours, a 10^5 reduction
Leverage Stable Multithreading [Tern OSDI '10] [Peregrine SOSP '11] to make what-you-check what-you-run (ongoing)

Key Challenge
Make stable multithreading work with real-world distributed systems
– Physical time?
– Message passing?
– Dynamic load balancing?
– Overhead?