A data-centric platform for analyzing distributed systems Provenance Maintenance and Querying on Log-structured Databases.


1 A data-centric platform for analyzing distributed systems
Provenance Maintenance and Querying on Log-structured Databases

2 An Example Scenario
An example scenario: network routing
– The route to foo.com has suddenly changed
– Alice wants to understand the exact cause
[Figure: Alice asks "Why did my route to foo.com change?!" Nodes A, B, C, D, E lie between Alice and foo.com, with routes r1 and r2. Possible causes: innocent reason? malicious attack? software bugs?]

3 Anomalies in Distributed Systems
For network routing …
– “YouTube blames Pakistan ISP for global outage” (Feb 2008)
– “A Chinese ISP momentarily hijacks the Internet” (March 2010)
– “Unknown fault darkens Australia’s Internet” (Feb 2012)
… but also for other application scenarios
– Distributed hash table: Eclipse attack
– Cloud computing: misbehaving machines
– Online multi-player gaming: cheating
Goal: To understand and debug behavior of distributed systems

4 A Data-centric Perspective
We assume a general distributed system
– Network consists of nodes (routers, middleboxes, ...)
– The state of a node is a set of tuples (routes, config, ...)
– Idea: Explanation as reasoning about state dependencies
[Figure: nodes A, B, C, D, E between Alice and foo.com, annotated with state tuples: route(A, foo.com), route(B, foo.com), route(C, foo.com), route(A, B), route(A, D), link(A, B), link(A, D), link(B, C), link(C, foo.com)]
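
The data-centric view above can be sketched in a few lines of Python. This is only an illustration: the tuple layout and the two derivation rules (a link gives a route, and a link followed by a route gives a longer route) are assumptions, since the slide does not state the rules the system actually uses.

```python
from typing import Set, Tuple

# A tuple of system state: (relation, source, destination).
# This layout is an assumption made for illustration.
Fact = Tuple[str, str, str]


def derive_routes(links: Set[Fact]) -> Set[Fact]:
    """Derive route tuples from link tuples by iterating to a fixpoint.

    Assumed rules (not taken from the slides):
      route(X, Y) if link(X, Y)
      route(X, Z) if link(X, Y) and route(Y, Z)
    """
    routes: Set[Fact] = {("route", src, dst) for (_, src, dst) in links}
    changed = True
    while changed:
        changed = False
        for (_, src, mid) in links:
            # Iterate over a snapshot because routes grows inside the loop.
            for (_, r_src, dst) in list(routes):
                new = ("route", src, dst)
                if r_src == mid and src != dst and new not in routes:
                    routes.add(new)
                    changed = True
    return routes


links: Set[Fact] = {
    ("link", "A", "B"),
    ("link", "B", "C"),
    ("link", "C", "foo.com"),
}
# The state is the base tuples plus everything derived from them.
state = links | derive_routes(links)
```

Here route(A, foo.com) exists only because link(A, B) and route(B, foo.com) exist, which is the kind of state dependency an explanation has to reason about.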

5 Provenance as Explanations
Provenance for encoding state dependencies
– Explains the derivation of tuples
– Captures the dependencies between tuples as a graph
– Explanation of a tuple is a tree rooted at the tuple
Route r1 disappeared due to a link failure between B and C
[Figure: nodes A, B, C, D, E between Alice and foo.com, annotated with state tuples: route(A, foo.com), route(B, foo.com), route(C, foo.com), route(D, foo.com), route(E, foo.com), link(A, B), link(B, C), link(C, foo.com), link(D, E), link(E, B)]
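
A minimal sketch of a provenance graph and the explanation tree rooted at a tuple. The graph below is hand-written from the tuples shown on the slide; the dictionary representation and the helper names `explain` and `base_tuples` are hypothetical, not part of the proposed system.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, source, destination)

# Provenance graph: each derived tuple maps to the tuples it was derived from.
# Base tuples (links) have no entry. Edges are assumed for illustration.
provenance: Dict[Fact, List[Fact]] = {
    ("route", "C", "foo.com"): [("link", "C", "foo.com")],
    ("route", "B", "foo.com"): [("link", "B", "C"), ("route", "C", "foo.com")],
    ("route", "A", "foo.com"): [("link", "A", "B"), ("route", "B", "foo.com")],
}


def explain(fact: Fact, depth: int = 0) -> List[str]:
    """Return the explanation tree rooted at fact as indented lines."""
    lines = ["  " * depth + f"{fact[0]}({fact[1]}, {fact[2]})"]
    for child in provenance.get(fact, []):
        lines.extend(explain(child, depth + 1))
    return lines


def base_tuples(fact: Fact) -> Set[Fact]:
    """Collect the base tuples (leaves) that fact ultimately depends on."""
    children = provenance.get(fact)
    if not children:
        return {fact}
    leaves: Set[Fact] = set()
    for child in children:
        leaves |= base_tuples(child)
    return leaves


tree = explain(("route", "A", "foo.com"))
leaves = base_tuples(("route", "A", "foo.com"))
```

Because link(B, C) is among the leaves of route(A, foo.com), a failure of that link is a candidate explanation for the route disappearing.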

6 Proposal: Provenance Maintenance
In traditional database systems
– Provenance deltas between adjacent system states
– Logs of all non-deterministic events
– Replay the events to reconstruct provenance
– Problem: storage overhead
In log-structured database systems
– Only maintain logs of events, but not the latest system state
– Natural for provenance support (with no additional cost)
– Example: Hyder [CIDR 2011], built upon SSDs for web services
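
The log-only idea can be sketched as an append-only event log whose replay reconstructs the state at any point. The `Event` and `EventLog` names and the insert/delete event model are assumptions for illustration; they do not describe Hyder or any particular log-structured database.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, source, destination)


@dataclass(frozen=True)
class Event:
    seq: int    # position in the log
    op: str     # "insert" or "delete"
    fact: Fact


class EventLog:
    """Append-only log; no materialized copy of the latest state is kept."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, op: str, fact: Fact) -> Event:
        if op not in ("insert", "delete"):
            raise ValueError(f"unknown op: {op}")
        event = Event(len(self._events), op, fact)
        self._events.append(event)
        return event

    def replay(self, upto: Optional[int] = None) -> Set[Fact]:
        """Rebuild the state after the event at position upto (inclusive).

        With upto=None the whole log is replayed.
        """
        state: Set[Fact] = set()
        for event in self._events:
            if upto is not None and event.seq > upto:
                break
            if event.op == "insert":
                state.add(event.fact)
            else:
                state.discard(event.fact)
        return state


log = EventLog()
log.append("insert", ("link", "A", "B"))
log.append("insert", ("link", "B", "C"))
log.append("delete", ("link", "B", "C"))  # the link failure

before_failure = log.replay(upto=1)
after_failure = log.replay()
```

Since the log already records every change, past states (and hence the provenance of past tuples) can be reconstructed without storing separate provenance deltas.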

7 Proposal: Provenance Querying
An efficient data structure for provenance querying
– Backward pointers to the most recent update to the same state
– Chained pointers for reconstructing a specific state
Optimization of provenance querying
– Naïve approach: reconstruct the complete provenance graph
– Optimization: only reconstruct the provenance as necessary
[Figure: chain of updates route(A,B,5), route(A,B,7), route(A,B,3)]
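
A sketch of the backward-pointer structure, under stated assumptions: each log record stores the position of the previous update to the same key, and a query follows that chain lazily instead of scanning the whole log. The class and method names are hypothetical, and treating route(A,B,5), route(A,B,7), route(A,B,3) as successive updates of one key in that order is a guess based on the figure.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

Key = Tuple[str, str, str]  # e.g. ("route", "A", "B")


@dataclass(frozen=True)
class Record:
    seq: int              # position in the log
    key: Key
    value: int            # third field of the tuple, e.g. 5 in route(A,B,5)
    prev: Optional[int]   # backward pointer: previous update to the same key


class PointerLog:
    def __init__(self) -> None:
        self.records: List[Record] = []
        self.latest: Dict[Key, int] = {}  # key -> position of newest update

    def append(self, key: Key, value: int) -> int:
        seq = len(self.records)
        self.records.append(Record(seq, key, value, self.latest.get(key)))
        self.latest[key] = seq
        return seq

    def history(self, key: Key) -> Iterator[Record]:
        """Lazily follow the pointer chain, newest update first."""
        pos = self.latest.get(key)
        while pos is not None:
            record = self.records[pos]
            yield record
            pos = record.prev

    def value_at(self, key: Key, seq: int) -> Optional[int]:
        """Value of key as of log position seq, or None if not yet written."""
        for record in self.history(key):
            if record.seq <= seq:
                return record.value  # stop early; older records are not read
        return None


plog = PointerLog()
route_ab: Key = ("route", "A", "B")
plog.append(route_ab, 5)
plog.append(("route", "A", "D"), 2)  # unrelated update, skipped by the chain
plog.append(route_ab, 7)
plog.append(route_ab, 3)

chain = [record.value for record in plog.history(route_ab)]
```

The generator in `history` stops as soon as the caller has what it needs, which mirrors the optimization of reconstructing provenance only as necessary rather than building the complete graph.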

8 Project Arrangement
Project plan
– Develop the provenance system on log-structured databases
– Evaluate the provenance system against several applications
  – Performance impact on primary system
  – Performance (latency) of provenance queries
Budget
– Time frame
  – System development: 6 months
  – Performance evaluation: 3 months
– Cost: $75,000

