Published by Brendan Banks. Modified over 9 years ago.
1
Cloud Testing
Haryadi Gunawi
Towards thousands of failures and hundreds of specifications
2
Motivation
Real-world data loss
– Facebook (03/09), HSBC (07/09), T-Mobile (10/09)
– 7 out of 10 small-medium firms go out of business
Failures in the cloud
– Crashes, disk and network failures (permanent and transient), corruptions, etc.
– “Millions of opportunities” for failures to occur
Not just single failures, but also combinations of failures
3
Testing Cloud Infrastructure
Big and complex recovery
– HDFS/GFS
  HDFS: 44 out of 600 bug reports/issues pertain to recovery
– Cassandra/Dynamo+BigTable, Zookeeper/Chubby
Existing approaches
– Mostly single failures
– Random multiple failures
How to systematically test cloud infrastructure against failures?
4
Google FS – Write Protocol
[Diagram: Client API, Master, DataNode 1, DataNode 2, DataNode 3, with protocol steps numbered 1 to 4]
5
HDFS Implementation of Write
[Diagram: Client API, Master, DataNode 1, DataNode 2, DataNode 3, with many points along the write path marked X]
6
Failure Service
Goal:
– Exercise combinations of failures systematically
Method:
– Identify a failure point: source location, node id, stack trace, etc.
– Specify max # failures
– Run the workload until all combinations of failures are exhausted (similar to model checking)
Challenges:
– Efficiency: 400+ fail experiments due to 6 bugs
  Single write workload: thousands of experiments
– Coverage: failure coverage, workload coverage, state coverage
– Integrating failure service into a model checker
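The method on this slide (identify failure points, cap the number of failures, rerun the workload until every combination is exhausted) can be sketched roughly as follows. This is a minimal illustration only, not the failure service itself: the failure points, the `toy_write` workload, and all function names are hypothetical, and a real service would discover new failure points dynamically along recovery paths rather than take a fixed list.

```python
from itertools import combinations

def failure_combinations(failure_points, max_failures):
    """Yield every combination of 1..max_failures failure points."""
    for k in range(1, max_failures + 1):
        yield from combinations(failure_points, k)

def run_experiments(failure_points, max_failures, workload):
    """Run the workload once per failure combination and record the outcome."""
    outcomes = {}
    for combo in failure_combinations(failure_points, max_failures):
        outcomes[combo] = workload(set(combo))
    return outcomes

# Hypothetical failure points: (node id, failure type).
FAILURE_POINTS = [
    ("DataNode 1", "crash"),
    ("DataNode 2", "crash"),
    ("DataNode 3", "crash"),
]

def toy_write(injected):
    """Toy stand-in for a real workload: the write survives
    as long as at least one of the three datanodes is alive."""
    crashed = {node for (node, kind) in injected if kind == "crash"}
    return len(crashed) < 3

outcomes = run_experiments(FAILURE_POINTS, 2, toy_write)
print(len(outcomes))  # 6 experiments: 3 single failures + 3 pairs
```

Even in this toy setting the experiment count grows combinatorially with the number of failure points and the failure cap, which is the efficiency challenge the slide names.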
7
Declarative Checks
How do we conclude bugs?
– Client observable behaviors
– Write distributed invariant checks
Current approach:
– Checks are written in Java/C++
– A check is more than 50 LOC
  100 checks → 5000 LOC
Our goal/method:
– Encourage developers to write hundreds of specs
  More specs, more bugs (esp. silent bugs)
– Use relational logic language (Bloom)
  Write specs in Bloom
  Convert runtime events from inter-node protocols, disk I/Os, etc. into Bloom events (i.e., directly verify the implementation against the specs)
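To give a feel for why a relational spec can be much shorter than a 50-line imperative check, here is a rough sketch of one invariant written as a query over event relations. It is plain Python, not Bloom, and the relation names (`block_complete`, `replica`, `live`) and the replication invariant itself are assumptions chosen for illustration; they are not taken from the talk.

```python
# Hypothetical event relations, as they might be captured at runtime:
#   block_complete: {block_id}        blocks the master considers complete
#   replica:        {(block_id, node)} replicas reported by datanodes
#   live:           {node}            datanodes currently alive

def under_replicated(block_complete, replica, live, min_replicas=3):
    """Return completed blocks that have fewer than min_replicas live replicas."""
    live_replicas = {(b, n) for (b, n) in replica if n in live}
    return {
        b
        for b in block_complete
        if sum(1 for (rb, _) in live_replicas if rb == b) < min_replicas
    }

block_complete = {"b1", "b2"}
replica = {
    ("b1", "dn1"), ("b1", "dn2"), ("b1", "dn3"),
    ("b2", "dn1"), ("b2", "dn2"),
}
live = {"dn1", "dn2", "dn3"}

violations = under_replicated(block_complete, replica, live)
print(violations)  # {'b2'}: only two live replicas
```

The check is a filter and a count over relations, so the whole spec fits in a few lines; the runtime's job is then to turn protocol and disk events into rows of those relations.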
8
Conclusion
Towards thousands of failures
– Failure service
Towards hundreds of specifications
– Declarative checks with Bloom