User-level Internet Path Diagnosis. Ratul Mahajan, Neil Spring, David Wetherall and Thomas Anderson. Designed by Yao Zhao
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. L. Lamport
Motivation: Can end users, with no special privileges, identify and pinpoint faults inside the network that degrade the performance of their applications? Why (unprivileged) end users? Operators do not share the users' view of the network. Operators may have no more insight than unprivileged users into problems inside other administrative domains. Users can directly contact the responsible ISP, leading to faster problem resolution. Many techniques are more effective and scalable with fault localization than when blindly trying all possibilities.
Outline: Diagnosis architecture; Diagnosis Tool: Tulip; Evaluation; Recommendations; Conclusion
Problem
An Ideal Trace-based Solution: Routers log packet activity and make these traces available to users. The log at each router is recorded for both input and output interfaces. This is impractical for deployment.
Packet-based Solutions. Complete Embedding: each router along the path records information into each packet that it forwards. Barring two exceptions, the scheme above is equivalent to the path trace. Reduced Embedding: remove the step of embedding the complete input packet in the output packet. Constant Space Embedding: sample TTL. Real Clocks: unsynchronized clocks; finite precision.
New Fields of Packet Header in the Architecture
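The constant-space idea can be illustrated with a minimal Python sketch. It assumes that a probe carries a sample TTL which each router decrements, and that only the router at which it reaches zero writes its timestamp and counter into the packet. The field names (`sample_ttl`, `timestamp`, `counter`), the `Router` class, and the single counter per router are hypothetical simplifications for illustration, not the architecture's actual header layout.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeHeader:
    """Hypothetical constant-size header carried by each packet."""
    sample_ttl: int                    # hops remaining until the sampling router
    timestamp: Optional[float] = None  # filled in by the sampled router only
    counter: Optional[int] = None      # filled in by the sampled router only


class Router:
    """Toy router with a local clock and one packet counter (assumed per flow)."""

    def __init__(self, clock: Callable[[], float]):
        self.clock = clock
        self.flow_counter = 0

    def forward(self, hdr: ProbeHeader) -> ProbeHeader:
        self.flow_counter += 1
        hdr.sample_ttl -= 1
        if hdr.sample_ttl == 0:
            # This router was selected: record local time and counter.
            hdr.timestamp = self.clock()
            hdr.counter = self.flow_counter
        return hdr


def send_along(path, hdr):
    """Pass the header through every router on the path, in order."""
    for router in path:
        hdr = router.forward(hdr)
    return hdr
```

The header stays the same size regardless of path length; sampling a different hop only requires changing the initial `sample_ttl`.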
Outline: Diagnosis architecture; Diagnosis Tool: Tulip; Evaluation; Recommendations; Conclusion
Internet Approximations: out-of-band measurement probes; ICMP timestamp requests to access time at the router; IP identifiers instead of per-flow counters.
Packet Reordering
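A minimal sketch of how IP identifiers might be used to tell forward from reverse reordering, assuming two probes are sent back to back and the router stamps each reply from an incrementing 16-bit IP-ID counter. The function names and the input format are hypothetical, not Tulip's actual interface.

```python
IPID_SPACE = 1 << 16  # the IP identifier is a 16-bit field


def generated_before(id_a: int, id_b: int) -> bool:
    """True if id_a was assigned before id_b, allowing for counter wraparound."""
    return 0 < (id_b - id_a) % IPID_SPACE < IPID_SPACE // 2


def classify_reordering(id1: int, t1: float, id2: int, t2: float):
    """Classify reordering for one probe pair.

    id1, t1: IP-ID and arrival time of the reply to the first probe.
    id2, t2: IP-ID and arrival time of the reply to the second probe.
    Returns (forward_reordered, reverse_reordered).
    """
    if id1 == id2:
        raise ValueError("equal IP-IDs: router does not use an incrementing counter")
    first_generated_first = generated_before(id1, id2)
    first_arrived_first = t1 < t2
    # Forward: the router answered the second probe before the first.
    forward = not first_generated_first
    # Reverse: replies arrived in a different order than they were generated.
    reverse = first_generated_first != first_arrived_first
    return forward, reverse
```

The modular comparison keeps the classification correct when the counter wraps from 65535 to 0 between the two replies.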
Assumptions for Packet Loss: IP-IDs are consecutive 80% of the time from over 90% of the routers. Small packets usually have a low loss rate. In over 60% of the cases when any packet in the triplet was lost, only the data packet was lost. ICMP rate-limiting will not be mistaken for packet loss (1 more check packet).
Packet Loss
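The triplet idea can be sketched as follows, assuming a probe sequence of control, data, control packets, where the router answers each with a reply stamped from a consecutive IP-ID counter. If the data reply is missing, the gap between the two control replies' IP-IDs hints at where the loss happened. The function name and return labels are hypothetical, and the extra check packet for ICMP rate-limiting is omitted.

```python
from typing import Optional

IPID_SPACE = 1 << 16  # the IP identifier is a 16-bit field


def classify_triplet(first_ctrl_id: Optional[int],
                     second_ctrl_id: Optional[int],
                     data_reply_seen: bool) -> str:
    """Infer the loss direction for one control/data/control triplet.

    first_ctrl_id, second_ctrl_id: IP-IDs of the two control replies,
    or None if a control reply never arrived.
    """
    if data_reply_seen:
        return "no_loss"
    if first_ctrl_id is None or second_ctrl_id is None:
        # Without both control replies the IP-ID gap cannot be computed.
        return "unknown"
    gap = (second_ctrl_id - first_ctrl_id) % IPID_SPACE
    if gap == 1:
        # Router never generated a reply: the data probe was lost on the way in.
        return "forward_loss"
    if gap == 2:
        # Router generated a reply that never reached the source.
        return "reverse_loss"
    # Other traffic consumed IP-IDs in between; the sample is ambiguous.
    return "unknown"
```

Samples classified as `unknown` would simply be discarded, which is why the consecutive-IP-ID assumption matters for how many probes are usable.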
Packet Queuing: similar to cing. Two practical problems: ICMP generation time; cable modems and wireless links.
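A minimal sketch of estimating queuing delay from ICMP timestamp replies, assuming the router's clock offset from the source is constant over the measurement (no skew) and that the smallest observed sample corresponds to an empty queue. The function name and input format are hypothetical, and midnight wraparound of the ICMP timestamp is ignored.

```python
def forward_queuing_delays(send_ms, router_ms):
    """Estimate per-probe forward queuing delay in milliseconds.

    send_ms:   local send time of each probe (ms).
    router_ms: timestamp the router reported for each probe (ms).
    Each raw difference includes an unknown clock offset plus propagation
    delay; subtracting the minimum removes both, leaving the variable part.
    """
    if len(send_ms) != len(router_ms) or not send_ms:
        raise ValueError("need equal-length, non-empty sample lists")
    raw = [r - s for s, r in zip(send_ms, router_ms)]
    baseline = min(raw)
    return [sample - baseline for sample in raw]
```

For example, `forward_queuing_delays([0, 100, 200], [5000, 5103, 5210])` treats the first probe as the unqueued baseline and reports 3 ms and 10 ms of extra delay for the other two.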
Tulip: Network load BL/W. Diagnosis time: 10 ~ 30 min per path. Parallel search vs. binary search. Two or more faults?
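The binary-search option can be sketched as below, assuming a single fault on the path and a measurement that reports whether the sub-path from the source to a given hop exhibits the fault. The callback `prefix_is_faulty` is a hypothetical stand-in for a full measurement round to that hop.

```python
from typing import Callable


def first_faulty_hop(num_hops: int, prefix_is_faulty: Callable[[int], bool]) -> int:
    """Return the first hop whose path prefix shows the fault.

    Assumes the end-to-end path (hop num_hops) is known to be faulty and that
    the fault, once visible at hop k, is visible at every later hop.
    """
    lo, hi = 1, num_hops
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix_is_faulty(mid):
            hi = mid       # fault lies at or before mid
        else:
            lo = mid + 1   # fault lies after mid
    return lo
```

Binary search measures roughly log2(num_hops) hops one after another, whereas a parallel search would measure every hop at once, trading extra network load for shorter diagnosis time. With two or more faults the single-fault assumption no longer holds and this sketch would find only the first one.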
Outline: Diagnosis architecture; Diagnosis Tool: Tulip; Evaluation; Recommendations; Conclusion
Methodology: Evaluate applicability. Diagnosis granularity. Three sources: MIT, U Washington and London. Destinations from Skitter. Validation.
Diagnosis granularity (1)
Diagnosis granularity (2)
Validation: IP-IDs and ICMP timestamp vs. end-to-end measurement; Tulip vs. Sting; consistency of Tulip's inferences; consistency between Tulip and Paths
Two facts: Locating Loss and Delay in the Internet; Persistence of Faults
Outline: Diagnosis architecture; Diagnosis Tool: Tulip; Evaluation; Recommendations; Conclusion
Limitations of Tulip: out-of-band measurements; stable routing path; IP-ID counters; limitations of ICMP timestamps
In-band vs. Out-of-band Diagnosis: Priority of protocols; Packet drop; Packet size; Loss rate; Reordering
Other Recommendations: Path Verification; IP Identifiers; Router Timestamps
Related Work. Diagnosis approaches: Magpie, SPIE, NetFlow. Measurement primitives: overlay primitives, IPMP. Measurement tools: PING, Traceroute, pathchar, Sting.
Conclusion. Tulip: a practical tool to diagnose packet reordering, loss and queuing. Diagnosis architecture: in-band, lightweight.
Questions?