Cloud Testing Haryadi Gunawi Towards thousands of failures and hundreds of specifications.


1 Cloud Testing
Haryadi Gunawi
Towards thousands of failures and hundreds of specifications

2 Motivation
Real-world data loss
– Facebook (03/09), HSBC (07/09), T-Mobile (10/09)
– 7 out of 10 small-medium firms go out of business
Failures in the cloud
– Crashes, disk and network failures (permanent and transient), corruptions, etc.
– “Millions of opportunities” for failures to occur
Not just single failures, but also combinations of failures

3 Testing Cloud Infrastructure
Big and complex recovery
– HDFS/GFS
  HDFS: 44 out of 600 bug reports/issues pertain to recovery
– Cassandra/Dynamo+BigTable, Zookeeper/Chubby
Existing approaches
– Mostly single failures
– Random multiple failures
How to systematically test cloud infrastructure against failures?

4 Google FS – Write Protocol
[Diagram: Client API, Master, and DataNodes 1–3, with the write protocol shown as steps 1–4.]

5 HDFS Implementation of Write
[Diagram: Client API, Master, and DataNodes 1–3, with many points in the write path marked X.]

6 Failure Service
Goal:
– Exercise combinations of failures systematically
Method:
– Identify a failure point: source location, node id, stack trace, etc.
– Specify max # failures
– Run the workload until all combinations of failures are exhausted (similar to model checking)
Challenges:
– Efficiency: 400+ failed experiments due to 6 bugs
  Single write workload: thousands of experiments
– Coverage: failure coverage, workload coverage, state coverage
– Integrating the failure service into a model checker
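
The method on this slide can be illustrated with a minimal Python sketch. It is not the failure service from the talk: it assumes the failure points are known up front as a fixed list, whereas the slide describes identifying them from source location, node id, and stack trace. The names `failure_plans`, `run_experiments`, `POINTS`, and `toy_write` are hypothetical.

```python
from itertools import combinations

def failure_plans(failure_points, max_failures):
    """Yield every combination of 1..max_failures failure points."""
    for k in range(1, max_failures + 1):
        for combo in combinations(failure_points, k):
            yield combo

def run_experiments(failure_points, max_failures, workload):
    """Run the workload once per failure plan; return the plans that broke it."""
    failed = []
    for plan in failure_plans(failure_points, max_failures):
        # The workload receives the set of failures to inject and
        # returns True if it still behaved correctly.
        if not workload(set(plan)):
            failed.append(plan)
    return failed

# Hypothetical failure points for a three-replica write pipeline.
POINTS = ["datanode1:crash", "datanode2:crash", "datanode3:crash"]

def toy_write(injected):
    """Toy workload: the write survives while at least one replica is alive."""
    return len(injected) < len(POINTS)

if __name__ == "__main__":
    print(len(list(failure_plans(POINTS, 2))), "experiments with max 2 failures")
    print("failing plans with max 3 failures:", run_experiments(POINTS, 3, toy_write))
```

Even this toy version shows the efficiency challenge: the number of experiments grows combinatorially with the number of failure points and the maximum number of failures.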

7 Declarative Checks
How do we conclude bugs?
– Client-observable behaviors
– Write distributed invariant checks
Current approach:
– Checks are written in Java/C++
– A check is more than 50 LOC
  100 checks → 5000 LOC
Our goal/method:
– Encourage developers to write hundreds of specs
  More specs, more bugs (esp. silent bugs)
– Use a relational logic language (Bloom)
  Write specs in Bloom
  Convert runtime events from inter-node protocols, disk I/Os, etc. into Bloom events (i.e., directly verify the implementation against the specs)
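
To give a feel for a relational-style invariant check, here is a small sketch in Python rather than Bloom. It treats runtime events as rows in two tables and expresses one invariant as a query over them. The event types `Ack` and `Stored`, the function `under_replicated`, and the invariant itself are assumptions made for illustration and do not come from the talk.

```python
from collections import namedtuple

# Hypothetical event tables: a block acknowledged to the client,
# and a block stored on a given data node.
Ack = namedtuple("Ack", ["block"])
Stored = namedtuple("Stored", ["block", "node"])

def under_replicated(acks, stored, min_replicas):
    """Return acknowledged blocks stored on fewer than min_replicas nodes."""
    nodes_per_block = {}
    for s in stored:
        nodes_per_block.setdefault(s.block, set()).add(s.node)
    # Relational reading: select acked blocks whose replica count is too low.
    return sorted({
        a.block for a in acks
        if len(nodes_per_block.get(a.block, set())) < min_replicas
    })

if __name__ == "__main__":
    acks = [Ack("b1"), Ack("b2")]
    stored = [Stored("b1", "dn1"), Stored("b1", "dn2"), Stored("b2", "dn1")]
    print(under_replicated(acks, stored, min_replicas=2))
```

The check is a few lines because it only states what must hold over the event tables; collecting the events from the running system is a separate concern.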

8 Conclusion
Towards thousands of failures
– Failure service
Towards hundreds of specifications
– Declarative checks with Bloom

