Practical Software Model Checking via Dynamic Interface Reduction
Huayang Guo (1,2), Ming Wu (1), Lidong Zhou (1), Gang Hu (1,2), Junfeng Yang (2), Lintao Zhang (1)
(1) Microsoft Research Asia, (2) Columbia University

Presentation transcript:

Practical Software Model Checking via Dynamic Interface Reduction
Huayang Guo (1,2), Ming Wu (1), Lidong Zhou (1), Gang Hu (1,2), Junfeng Yang (2), Lintao Zhang (1)
(1) Microsoft Research Asia, (2) Columbia University

Building reliable distributed systems is hard
- Machine failure
- Message loss
- Message reordering
- Thread interleaving
- Non-determinism leads to tricky bugs
(Figure labels: Crash, Thr1, Thr2, Async I/O)

Implementation-level software model checkers
- MaceMC (NSDI '07), MoDist (NSDI '09)
- Directly check implementations
- No need to construct an abstract model beforehand
(Figure labels: Crash, Thr1, Thr2, Async I/O, State Space Explorer)

State space explosion
- MPS: product-level Paxos
- Never fully explored
- 3 nodes: 34 years for MoDist

Dynamic Interface Reduction (DIR)
- Effective
  - 34 years → 18 hours (fully explored MPS-3)
  - Exponential reduction: 100K : 1 states for MPS and Berkeley DB with replication
- Automatic, no manual effort required
- Provably sound and complete
- Easy to integrate with legacy model checkers (MCs)
  - DeMeter: DIR with MoDist and MaceMC
  - MC-specific modifications: ≤ 1k LOC

Outline
- Insight
- Challenges
- Dynamic Interface Reduction
- Evaluation
- Related work
- Conclusion

Insight
- Distributed systems: componentized
- Local non-determinism is isolated
  - Empirically, 99.9% do not propagate (Berkeley DB)
- Previous work: check components together, |m1| * |m2| * |m3|
- DIR: check components separately, |m1| + |m2| + |m3|
(Figure labels: Thr1, Thr2, Thr3, Thr4, Async I/O, interface behavior, m1, m2, m3)
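As a rough illustration of the gap between the two counts on this slide, the sketch below compares checking components together (product of state counts) with checking them separately (sum of state counts). The per-component sizes are hypothetical numbers chosen for illustration; they are not measurements from the talk.

```python
from math import prod

def states_checked_together(sizes):
    """Joint exploration: every combination of component states, |m1|*|m2|*|m3|."""
    return prod(sizes)

def states_checked_separately(sizes):
    """Separate exploration: each component on its own, |m1|+|m2|+|m3|."""
    return sum(sizes)

# Hypothetical state counts for three components m1, m2, m3.
sizes = [1000, 1000, 1000]

together = states_checked_together(sizes)
separately = states_checked_separately(sizes)
print(together, separately)
```

With these made-up sizes the joint count is a billion states while the separate count is three thousand, which is the kind of gap the slide's product-versus-sum comparison points at.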

Challenges and Solutions
- How to discover/construct the interface behavior of a component?
  - Manually or statically constructing an interface process is impractical for complex software systems
- How to guarantee
  - Completeness: find all bugs
  - Soundness: no false positives
- Our solution:
  - Dynamically discover interface behaviors
  - Combine discovered interface behaviors
  - Track dependencies

DIR Overview
- Global explorer: explores global interface behaviors
- Local explorers (Component1, Component2, Component3): explore local states
(Figure label: interface behavior)

Example

Client:
    if (Choose(2) == 0) {
        Send(P, 1);
        Send(P, 2);
    } else {
        Send(P, 1);
        Send(P, 3);
    }

Primary/Secondary:
    // Main thread
    while (n = Recv()) {
        Lock();
        total += n;        // Sum
        Unlock();
        if (isPrimary)
            Send(S, n);
    }

    // Checkpoint thread
    Lock();
    Log(total);            // Ckpt
    Unlock();

(Figure labels: Client, Primary, Secondary)

Produce initial global trace
(Client (Cli) and Primary/Secondary (Pri/Sec) code as on the Example slide; Ckpt labels the checkpoint thread's Log(total), Sum labels the main thread's total += n.)
Global explorer: produce initial global trace.
    Cli.Choose(2) = 0
    Cli.Send(Pri, 1)
    Pri.Recv(Cli, 1)
    Pri.Ckpt
    Pri.Sum
    Pri.Send(Sec, 1)
    Sec.Recv(Pri, 1)
    Sec.Ckpt
    Sec.Sum
    Cli.Send(Pri, 2)
    Pri.Recv(Cli, 2)
    Pri.Sum
    Pri.Send(Sec, 2)
    Sec.Recv(Pri, 2)
    Sec.Sum

Construct message trace
Global explorer: the Send and Recv statements of the global trace (bold on the slide, marked * here) form the message trace.
      Cli.Choose(2) = 0
    * Cli.Send(Pri, 1)
    * Pri.Recv(Cli, 1)
      Pri.Ckpt
      Pri.Sum
    * Pri.Send(Sec, 1)
    * Sec.Recv(Pri, 1)
      Sec.Ckpt
      Sec.Sum
    * Cli.Send(Pri, 2)
    * Pri.Recv(Cli, 2)
      Pri.Sum
    * Pri.Send(Sec, 2)
    * Sec.Recv(Pri, 2)
      Sec.Sum

Project message trace
Global explorer: project the global message trace to components.
    Client:    Cli.Send(Pri, 1), Cli.Send(Pri, 2)
    Primary:   Pri.Recv(Cli, 1), Pri.Send(Sec, 1), Pri.Recv(Cli, 2), Pri.Send(Sec, 2)
    Secondary: Sec.Recv(Pri, 1), Sec.Recv(Pri, 2)
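The two steps shown so far (keeping only the Send/Recv events of the global trace, then splitting them by component) can be sketched in a few lines. This is a minimal illustration over event strings copied from the slides; the helper names `message_trace` and `project` are hypothetical and are not DeMeter's API.

```python
from collections import defaultdict

# Initial global trace, as listed on the slides.
GLOBAL_TRACE = [
    "Cli.Choose(2) = 0",
    "Cli.Send(Pri, 1)", "Pri.Recv(Cli, 1)", "Pri.Ckpt", "Pri.Sum",
    "Pri.Send(Sec, 1)", "Sec.Recv(Pri, 1)", "Sec.Ckpt", "Sec.Sum",
    "Cli.Send(Pri, 2)", "Pri.Recv(Cli, 2)", "Pri.Sum",
    "Pri.Send(Sec, 2)", "Sec.Recv(Pri, 2)", "Sec.Sum",
]

def component(event):
    """Component name is the text before the first dot, e.g. 'Pri'."""
    return event.split(".", 1)[0]

def is_message(event):
    """Interface events are the Send/Recv events; everything else is local."""
    action = event.split(".", 1)[1]
    return action.startswith(("Send(", "Recv("))

def message_trace(trace):
    """Keep only the interface (message) events, preserving order."""
    return [e for e in trace if is_message(e)]

def project(msg_trace):
    """Split a global message trace into one ordered trace per component."""
    per_component = defaultdict(list)
    for event in msg_trace:
        per_component[component(event)].append(event)
    return dict(per_component)

projections = project(message_trace(GLOBAL_TRACE))
for name, events in projections.items():
    print(name, events)
```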

Local explorer for Primary
Projected message trace from the global explorer:
    Pri.Recv(Cli, 1), Pri.Send(Sec, 1), Pri.Recv(Cli, 2), Pri.Send(Sec, 2)
Local explorer for Primary: explores the local events Pri.Ckpt and Pri.Sum against this message trace.
(Figure: local traces of Primary combining Pri.Ckpt and Pri.Sum with the message trace above)
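To make the idea of local exploration concrete, the toy sketch below enumerates the ways Primary's checkpoint thread can interleave with its main thread and checks that every interleaving has the same interface (message) behavior. It assumes, for illustration only, that each locked section (Sum, Ckpt) is one atomic event and that the main thread runs Recv, Sum, Send per message; the function names are hypothetical and this is not how a real model checker schedules threads.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Assumed per-thread event sequences for Primary, following the example code.
MAIN_THREAD = [
    "Pri.Recv(Cli, 1)", "Pri.Sum", "Pri.Send(Sec, 1)",
    "Pri.Recv(Cli, 2)", "Pri.Sum", "Pri.Send(Sec, 2)",
]
CHECKPOINT_THREAD = ["Pri.Ckpt"]

def interface(trace):
    """Project a local trace onto its Send/Recv events."""
    return [e for e in trace if ".Send(" in e or ".Recv(" in e]

local_traces = list(interleavings(MAIN_THREAD, CHECKPOINT_THREAD))
distinct_interfaces = {tuple(interface(t)) for t in local_traces}
print(len(local_traces), len(distinct_interfaces))
```

In this toy setting all local interleavings collapse to a single interface behavior, so none of them needs to be re-explored in combination with the other components.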

Local explorer for Client
Projected message trace from the global explorer:
    Cli.Send(Pri, 1), Cli.Send(Pri, 2)
Local explorer for Client:
    Cli.Choose(2) = 0: Cli.Send(Pri, 1), Cli.Send(Pri, 2)
    Cli.Choose(2) = 1: Cli.Send(Pri, 1), Cli.Send(Pri, 3)   (branching trace)

Composition
Existing global message trace:
    Cli.Send(Pri, 1), Pri.Recv(Cli, 1), Pri.Send(Sec, 1), Sec.Recv(Pri, 1),
    Cli.Send(Pri, 2), Pri.Recv(Cli, 2), Pri.Send(Sec, 2), Sec.Recv(Pri, 2)
Branching local message trace:
    Cli.Send(Pri, 1), Cli.Send(Pri, 3)
(Figure legend: dependence, ==)

Composition (continued)
New global message trace:
    Cli.Send(Pri, 1), Pri.Recv(Cli, 1), Pri.Send(Sec, 1), Sec.Recv(Pri, 1), Cli.Send(Pri, 3)
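One way to read the composition step is sketched below. It is an assumption-laden toy, not the algorithm from the talk: it finds where the branching local trace diverges from the component's projection, drops the diverged event plus everything assumed to depend on it (later events of an affected component, and the Recv matching a dropped Send), then appends the new local events. The function name `compose` and the dependence rule are hypothetical, and messages are assumed to be uniquely identified by sender, receiver, and value.

```python
import re

EVENT = re.compile(r"(\w+)\.(Send|Recv)\((\w+), (\w+)\)")

def parse(event):
    """Return (component, kind, peer, value) for a Send/Recv event string."""
    return EVENT.fullmatch(event).groups()

def compose(global_trace, comp, branch):
    """Build a new global message trace from an existing one and a branching
    local message trace of component `comp` (toy dependence tracking)."""
    indices = [i for i, e in enumerate(global_trace) if parse(e)[0] == comp]
    local = [global_trace[i] for i in indices]
    k = 0  # length of the common prefix (the events marked == on the slide)
    while k < len(local) and k < len(branch) and local[k] == branch[k]:
        k += 1
    if k == len(local):
        # The branch only extends the existing projection.
        return list(global_trace) + list(branch[k:])
    diverged_at = indices[k]
    tainted = set()        # components with at least one dropped event
    dropped_sends = set()  # (sender, receiver, value) of dropped Send events
    kept = []
    for i, event in enumerate(global_trace):
        c, kind, peer, value = parse(event)
        drop = (
            i == diverged_at
            or c in tainted
            or (kind == "Recv" and (peer, c, value) in dropped_sends)
        )
        if drop:
            tainted.add(c)
            if kind == "Send":
                dropped_sends.add((c, peer, value))
        else:
            kept.append(event)
    return kept + list(branch[k:])

GLOBAL_MSG_TRACE = [
    "Cli.Send(Pri, 1)", "Pri.Recv(Cli, 1)", "Pri.Send(Sec, 1)", "Sec.Recv(Pri, 1)",
    "Cli.Send(Pri, 2)", "Pri.Recv(Cli, 2)", "Pri.Send(Sec, 2)", "Sec.Recv(Pri, 2)",
]
BRANCH = ["Cli.Send(Pri, 1)", "Cli.Send(Pri, 3)"]

new_trace = compose(GLOBAL_MSG_TRACE, "Cli", BRANCH)
print(new_trace)
```

On the slide's example this reproduces the new global message trace shown above; whether the same rule is right for traces with richer dependences is not something this sketch establishes.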

Evaluation: Experiment Setup
- DeMeter-MoDist:
  - MPS, a deployed product implementation of Paxos
  - Berkeley DB (BDB)
- DeMeter-MaceMC:
  - Chord, a peer-to-peer DHT implementation

Evaluation: Effectiveness of Dynamic Interface Reduction
- App-n: n is the number of distributed nodes
- Reduction ratio: |M w/o DIR| / |M w/ DIR|
(Table: Reduction and Speedup per application; columns MPS-2, MPS-3, BDB-2, BDB-3 for DeMeter-MoDist and Chord-2, Chord-3 for DeMeter-MaceMC; scale labels x1000, x100)

Related Work
- Compositional model checking: E. M. Clarke et al. (Symposium on Logic in Computer Science, 1989)
- Partial-order reduction: C. Flanagan and P. Godefroid (POPL '05)
- Model checking network systems: R. Guerraoui and M. Yabandeh (NSDI '11)

Conclusion
- Distributed systems → componentized
  - Local non-determinism does not propagate
- Dynamic interface reduction
  - Effective, automatic, easy
  - Provably sound and complete
- DeMeter: enables DIR for legacy MCs