Presenter: Chi-Hung Lu
Problems
- Distributed applications are hard to validate
- Application state is distributed across many distinct execution environments
- Protocols involve complex interactions among a collection of networked machines
- Failures must be handled, ranging from network problems to crashing nodes
- Intricate sequences of events can trigger complex errors as a result of mishandled corner cases
Approaches
- Logging-based debugging: X-Trace, Bi-directional Distributed BackTracker (BDB), Pip
- Deterministic replay: WiDS, Friday, Jockey
- Model checking: MaceMC
X-Trace (R. Fonseca et al., NSDI '07)
Problem Description
- It is difficult to diagnose the source of a problem in an Internet application
- Current network diagnostic tools focus on only one particular protocol
- Information about the application is not shared among the user, the service, and the network operators
Examples
- traceroute
  - Can locate IP connectivity problems
  - Cannot reveal proxy or DNS failures
- HTTP monitoring suite
  - Can locate application problems
  - Cannot diagnose routing problems
Examples (figures): a request path involving the user, a DNS server, a proxy, and a web server
X-Trace
- An integrated tracing framework
- Records the network paths taken by an application task
- X-Trace is invoked when an application task is initiated
- X-Trace metadata containing a task identifier is inserted into the request
- The metadata is propagated down to lower layers through protocol interfaces
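As a rough illustration of the last two points (not the paper's actual API), a client initiating a task might mint a fresh task identifier and carry it in the request, for example in an HTTP header. The helper names and the wire encoding below are assumptions made for the sketch.

```python
import os
import http.client

def new_xtrace_metadata():
    """Hypothetical X-Trace metadata for a new task: a fresh TaskID plus
    an operation ID for the initiating event."""
    return {"TaskID": os.urandom(8).hex(), "OpID": os.urandom(4).hex()}

def http_get_with_xtrace(host, path):
    # Invoke X-Trace when initiating the task: attach the metadata to the
    # request so lower layers (proxy, DNS, TCP) can propagate it.
    md = new_xtrace_metadata()
    conn = http.client.HTTPConnection(host)
    conn.request("GET", path, headers={
        # Assumed textual encoding of the metadata in an HTTP header.
        "X-Trace": f"{md['TaskID']}/{md['OpID']}"
    })
    return conn.getresponse(), md
```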
Task Tree
- X-Trace tags all network operations resulting from a particular task with the same task identifier
- The task tree is the set of network operations connected with an initial task
- The task tree can be reconstructed offline from the collected trace data (reports)
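A minimal sketch of the offline reconstruction step, assuming each report carries (TaskID, OpID, ParentID); the report schema here is invented for illustration, not the paper's exact format.

```python
from collections import defaultdict

def build_task_tree(reports, task_id):
    """Stitch collected reports back into a task tree.

    Each report is assumed to look like:
        {"TaskID": ..., "OpID": ..., "ParentID": ..., "label": ...}
    All operations sharing the same TaskID belong to one tree; edges run
    from ParentID to OpID.
    """
    children = defaultdict(list)
    roots = []
    for r in reports:
        if r["TaskID"] != task_id:
            continue                      # belongs to a different task
        if r["ParentID"] is None:
            roots.append(r["OpID"])       # the initiating operation
        else:
            children[r["ParentID"]].append(r["OpID"])
    return roots, dict(children)
```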
An example of the task tree: a simple HTTP request through a proxy (figure)
X-Trace Components
- Data
  - X-Trace metadata
  - Network path
  - Task tree
- Report
  - Reconstruct the task tree
Propagation of X-Trace Metadata: the propagation of X-Trace metadata through the task tree (figures)
The X-Trace Metadata
- Flags: bits that specify which of the three optional components are present
- TaskID: a unique integer ID
- TreeInfo: ParentID, OpID, EdgeType
- Destination: the address that X-Trace reports should be sent to
- Options: mechanism to accommodate future extensions
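The fields above could be modelled roughly as follows; sizes and encodings are omitted and the classes are purely illustrative, not the on-the-wire format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeInfo:
    parent_id: str   # operation that caused this one
    op_id: str       # this operation
    edge_type: str   # e.g. "NEXT" (same layer) or "DOWN" (lower layer)

@dataclass
class XTraceMetadata:
    flags: int                          # which optional components are present
    task_id: str                        # unique ID shared by the whole task
    tree_info: Optional[TreeInfo] = None
    destination: Optional[str] = None   # where reports should be sent
    options: bytes = b""                # room for future extensions
```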
Operation of X-Trace Metadata (figures)
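The paper describes propagation in terms of two operations, roughly pushNext() for the next operation in the same layer and pushDown() for the layer below. The sketch below captures the common idea (copy the TaskID, set ParentID to the current OpID, mint a fresh OpID); the dict-based encoding is an assumption carried over from the earlier sketches.

```python
import os

def _fresh_op_id():
    return os.urandom(4).hex()

def propagate(md, edge_type):
    """Derive the metadata carried by a child operation: TaskID is copied,
    ParentID becomes the current OpID, and a fresh OpID is minted."""
    return {
        "TaskID": md["TaskID"],
        "ParentID": md["OpID"],
        "OpID": _fresh_op_id(),
        "EdgeType": edge_type,              # "NEXT" (same layer) or "DOWN"
        "Destination": md.get("Destination"),
    }

def push_next(md):   # next operation in the same layer
    return propagate(md, "NEXT")

def push_down(md):   # hand the metadata down to a lower layer
    return propagate(md, "DOWN")
```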
X-Trace Report Architecture (figures)
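Conceptually, each node that sees the metadata emits a report keyed by the TaskID so the tree can be rebuilt offline. The report fields and the transport below (a UDP send to the address named in the Destination field) are assumptions for illustration, not the paper's reporting protocol.

```python
import json
import socket
import time

def send_report(md, label, fallback=("127.0.0.1", 7831)):
    """Emit a report for one operation in the task tree.

    md is a dict with TaskID/OpID/ParentID (see earlier sketches). The
    report goes to the metadata's Destination ("host:port"), falling back
    to a local collector; the port number here is made up.
    """
    report = {
        "TaskID": md["TaskID"],
        "OpID": md["OpID"],
        "ParentID": md.get("ParentID"),
        "label": label,                  # e.g. "HTTP proxy", "DNS lookup"
        "timestamp": time.time(),
    }
    dest = fallback
    if md.get("Destination"):
        host, port = md["Destination"].split(":")
        dest = (host, int(port))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(json.dumps(report).encode(), dest)
    sock.close()
```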
Usage Scenario (1): Web request and recursive DNS queries (figure)
Usage Scenario (2): A request fault annotated with user input (figure)
Usage Scenario (3): A client and a server communicate over the i3 overlay network
Usage Scenario (3): Internet Indirection Infrastructure (i3) (figures)
Usage Scenario (3): Tree for normal operation (figure)
Usage Scenario (3): The receiver host fails (figure)
Usage Scenario (3): Middlebox process crash (figure)
Usage Scenario (3): The middlebox host fails (figure)
Discussion
- Report loss
- Non-tree request structures
- Partial deployment
- Managing report traffic
- Security considerations
WiDS Checker (X. Liu et al., NSDI '07)
Problem Description
- Log mining is both labor-intensive and fragile
- Latent bugs are often distributed across multiple nodes
- Logs reflect incomplete information about an execution
- Distributed applications are non-deterministic
Goals
- Efficiently verify application properties
- Provide fairly complete information about an execution
- Reproduce buggy runs deterministically and faithfully
Approach
- Log the actual execution of a distributed system
- Apply predicate checking in a centralized simulator over a run driven by testing scripts or replayed from logs
- Output a violation report along with message traces
- An execution is interpreted as a sequence of events, which are dispatched to the corresponding handling routines
Components
- A versatile script language
  - Allows a developer to refine system properties into straightforward assertions
- A checker
  - Inspects the execution for violations
Architecture: components of the WiDS Checker (figure)
Architecture
- Reproduce real runs
  - Log all non-deterministic events, ordered with Lamport's logical clock
- Check user-defined predicates
  - A versatile scripting language specifies the system states being observed and the predicates for invariants and correctness
  - Auxiliary information screens out false alarms for liveness properties
- Trace root causes using a visualization tool
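To make the predicate-checking idea concrete, here is a hedged sketch written as ordinary Python rather than the WiDS script language; the observed variable names ("held_locks", "pending_acks") are invented for illustration.

```python
# `states` maps a node id to that node's observed variables after the
# current replayed event, e.g. {"node1": {"held_locks": ["a"], ...}, ...}.

def at_most_one_holder(states):
    """Safety invariant: no lock is held by two nodes at the same time."""
    holders = {}
    for node, vars_ in states.items():
        for lock_id in vars_.get("held_locks", []):
            if lock_id in holders:
                return False, f"lock {lock_id} held by {holders[lock_id]} and {node}"
            holders[lock_id] = node
    return True, None

def eventually_all_acked(states):
    """Liveness property: every sent message is eventually acknowledged.
    Liveness only has to hold eventually, so a transient False here is not
    necessarily a bug; this is the kind of false alarm the checker screens
    out with auxiliary information."""
    ok = all(not vars_.get("pending_acks") for vars_ in states.values())
    return ok, None
```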
Programming with WiDS
- WiDS APIs are mostly member functions of the WiDSObject class
- The WiDS runtime maintains an event queue to buffer pending events and dispatches them to the corresponding handling routines
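A toy model of that runtime: an event queue buffers pending events and dispatches each one to a handler method on the object. The class and method names mirror the description above, not the actual WiDS C++ API.

```python
import heapq

class WiDSObjectSketch:
    """Stand-in for a WiDSObject: handlers are ordinary methods."""
    def on_message(self, msg): ...
    def on_timer(self, timer_id): ...

class EventQueueSketch:
    """Buffers pending events and dispatches them to handling routines."""
    def __init__(self):
        self._queue = []   # (virtual_time, seq, handler_name, payload)
        self._seq = 0

    def post(self, when, handler_name, payload):
        heapq.heappush(self._queue, (when, self._seq, handler_name, payload))
        self._seq += 1

    def run(self, obj):
        while self._queue:
            _, _, handler_name, payload = heapq.heappop(self._queue)
            getattr(obj, handler_name)(payload)   # dispatch to the handler
```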
Enabling Replay
- Logging
  - Log all WiDS nondeterminism
  - Redirect OS calls and log their results
  - Embed a Lamport clock in each outgoing message
- Checkpoint
  - Supports partial replay
  - Saves the WiDS process context
- Replay
  - Start from the beginning or from a checkpoint
  - Replay events in serialized Lamport order
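A minimal sketch of the logging and ordering idea, assuming a simple Lamport clock: each node stamps outgoing messages, advances its clock on receive, and records the results of redirected OS calls so the replayer can re-execute events in serialized Lamport order. Tie-breaking by node id is an assumption not spelled out in the slides.

```python
class ReplayLogSketch:
    """Toy model of the logging side: Lamport clock + nondeterminism log."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0
        self.log = []          # (clock, kind, value) records written to disk

    def on_send(self, msg):
        self.clock += 1
        msg["lamport"] = self.clock            # clock embedded in the message
        self.log.append((self.clock, "send", msg))
        return msg

    def on_receive(self, msg):
        # Advance past the sender's clock so replay can serialize events.
        self.clock = max(self.clock, msg["lamport"]) + 1
        self.log.append((self.clock, "recv", msg))

    def os_call(self, name, real_result):
        # Redirected OS call: record the result so replay returns the same value.
        self.clock += 1
        self.log.append((self.clock, "os:" + name, real_result))
        return real_result

def replay_order(logs_from_all_nodes):
    """Merge per-node logs and order events by (Lamport clock, node id)."""
    merged = []
    for node_id, log in logs_from_all_nodes.items():
        merged.extend((clock, node_id, kind, value) for clock, kind, value in log)
    return sorted(merged, key=lambda e: (e[0], e[1]))
```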
Checker
- Observe memory state
  - Define states and evaluate predicates
  - Refresh the database for each event
  - Maintain history
  - Re-evaluate modified predicates
- Auxiliary information for violations
  - Liveness properties are only guaranteed to hold eventually
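Putting the pieces together, the checking loop might look like the sketch below: after each replayed event, refresh the observed state, re-evaluate only predicates whose watched variables changed, and record violations with the triggering event. The four callables and the set-based dependency tracking are simplifications standing in for checker internals.

```python
def check_run(events, predicates, apply_event, snapshot_state):
    """Replay events one by one and evaluate predicates after each.

    events        : replayed events in serialized Lamport order
    predicates    : {name: (watched_vars, fn)}, fn(states) -> (ok, detail)
    apply_event   : applies one event, returns the set of changed variables
    snapshot_state: returns {node_id: {var: value}} after the event
    """
    violations = []
    for i, event in enumerate(events):
        changed_vars = apply_event(event)        # refresh database for this event
        states = snapshot_state()
        for name, (watched, fn) in predicates.items():
            if watched & changed_vars:           # re-evaluate only affected predicates
                ok, detail = fn(states)
                if not ok:
                    # Keep auxiliary information so the developer can separate
                    # real violations from transient ones (e.g. liveness).
                    violations.append({"event_index": i, "predicate": name,
                                       "detail": detail, "event": event})
    return violations
```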
Visualization Tools: message flow graph (figure)
Evaluation: benchmark and result summary (figure)
Performance: running time for evaluating predicates (figure)
Logging Overhead: percentage of logging time (figure)
Discussion
- The system is debugged by those who developed it
- Bugs are hunted by those who are intimately familiar with the system