Authors: Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica Presenter: Yinzhi Cao
Outline Background Origin X-Trace Vector Flowing Vector God Overhead Usage Scenarios Potential Problems
Background(1) Network Diagnosis Scenarios One (Accessing Website)
Background(2) Scenario Two (Distributed File System)
Background(3) Existing Methods White Box: X-Trace Black Box: WAP5, Sherlock
Comparison of White Box and Black Box
                          White Box   Black Box
Overhead                  Large       Small
Modification to Program   Yes         No
Notification of Program   No          Yes
Accuracy                  High        Low
Origin of X-Trace How to Diagnose a Person? 1. Radioactive Material Implies: We need a vector flowing in our body. 2. X-Ray Detector Implies: We need a collector to monitor activities. 3. Overhead Implies: There is no free lunch.
X-Trace(Vector) Vector: X-Trace Metadata
X-Trace(Flowing Vector) Flowing Vector Vectors alone are of no use. We make them flow, and then we get the information. The following is an entity we want to diagnose.
X-Trace(Flowing Vector) Continued Let Vectors Flow. Two Ways: pushNext() and pushDown()
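The two propagation calls can be sketched roughly as follows. This is a minimal illustration, not the X-Trace implementation: the field names (`task_id`, `parent_id`, `op_id`, `edge`), the `start_task` helper, and the counter-based IDs are all assumptions made for the example. The only thing taken from the slide is that metadata flows in two ways, to the next hop on the same layer and down to the layer below.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_op_counter = itertools.count(1)  # hypothetical ID source, for illustration only


@dataclass(frozen=True)
class Metadata:
    task_id: str              # shared by every operation of one task
    parent_id: Optional[str]  # operation that caused this one (None at the root)
    op_id: str                # identifier of this operation
    edge: str                 # "ROOT", "NEXT" (same layer) or "DOWN" (lower layer)


def _new_op_id() -> str:
    return f"op{next(_op_counter)}"


def start_task(task_id: str) -> Metadata:
    """Create the metadata for the first operation of a task."""
    return Metadata(task_id, None, _new_op_id(), "ROOT")


def push_next(md: Metadata) -> Metadata:
    """Propagate metadata to the next operation on the same layer."""
    return Metadata(md.task_id, md.op_id, _new_op_id(), "NEXT")


def push_down(md: Metadata) -> Metadata:
    """Propagate metadata to the layer below (e.g. HTTP to TCP)."""
    return Metadata(md.task_id, md.op_id, _new_op_id(), "DOWN")


http_client = start_task("task-42")
tcp_conn = push_down(http_client)   # same task, lower layer
http_proxy = push_next(http_client)  # same task, next hop on the same layer
```

Every derived record keeps the task identifier and points back at the operation that produced it, which is what later allows the reports to be joined into one tree.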
X-Trace(Collector) As in diagnosing a person, we need a god (a collector) to gather all the data and reconstruct the trees offline. The question is: how?
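One way the offline reconstruction could work is sketched below. It assumes each report is a small record carrying a task identifier, the reporting operation, and its parent operation; the dictionary keys and the sample operation names are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def build_trees(reports):
    """Group reports by task and map each parent operation to its children."""
    trees = defaultdict(lambda: defaultdict(list))
    for report in reports:
        trees[report["task_id"]][report["parent_id"]].append(report["op_id"])
    return trees


def render(children, node, depth=0):
    """Return the subtree rooted at `node` as indented text lines."""
    lines = ["  " * depth + node]
    for child in sorted(children.get(node, [])):
        lines.extend(render(children, child, depth + 1))
    return lines


# Reports may arrive in any order and interleaved across tasks.
reports = [
    {"task_id": "t1", "parent_id": "http-client", "op_id": "tcp-1"},
    {"task_id": "t1", "parent_id": None, "op_id": "http-client"},
    {"task_id": "t2", "parent_id": None, "op_id": "dns-query"},
    {"task_id": "t1", "parent_id": "http-client", "op_id": "http-proxy"},
]
trees = build_trees(reports)
tree_t1 = render(trees["t1"], "http-client")
```

Because the join uses only the identifiers inside each report, the collector does not need the reports to arrive in order or over the same channel as the traced traffic.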
X-Trace(Overhead) Modification of Existing Program
X-Trace(Overhead) Continued Influence on Current Network Flow 1. The metadata is very small, so it adds little extra traffic to the network. 2. Reports are sent over a separate channel, so they do not compete with the current network flow.
Usage Scenarios of X-Trace(1) Web Request and Recursive DNS queries
Usage Scenarios of X-Trace(2) A Web Hosting Site
Usage Scenarios of X-Trace(3) An Overlay Network
Potential Problems Mentioned by the Authors Report Loss Managing Report Traffic Non-Tree Request Structures Partial Deployment Security Considerations
We have examined the White Box approach. Now let's turn to another approach, which may be less accurate but may also incur less overhead. First, we need some models.
Authors: Victor Bahl, Ranveer Chandra, Albert Greenberg, Srikanth Kandula, David A. Maltz, Ming Zhang Presenter: Yinzhi Cao
Outline Models Node Model Network Model Relationship Model How to use Our Model Algorithm Efficiency Evaluation
Models The main idea of this paper is to build a model of the network and use that model for diagnosis. The model has three levels: Node, Network and Relationship.
Node Model A node has three states: down, up and troubled.
Network Model Graph What’s more? Inference Graph.
Relationship Model(1) Noisy-Max
Backup Slides 1 First, we use the model below. The circle means that with probability x the output equals the input, and with probability 1-x the output is up. We use an unordered pair {x,y} to represent node status: {1,1} = {1} is up, {0,1} is troubled, and {0,0} = {0} is down.
Backup Slides 2 So the status of Child can be represented as follows. Status(Child) = |Status(Parent) ⊗ Status(Parent)|, where ⊗ means outer product, and we define |(x,y)| = xy.
Relationship Model(2) Selector
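The slide gives only the name of this relationship, so the following is a sketch under an assumed reading: the child picks exactly one parent, with a probability given by the edge weight, and takes on that parent's state. Under that assumption the child's state distribution is a weighted mixture of the parents' distributions. The function name and the tuple layout `(p_up, p_troubled, p_down)` are illustrative choices.

```python
def selector(parents, weights):
    """Mix parent state distributions by (normalized) selection weights.

    parents: list of (p_up, p_troubled, p_down) tuples, one per parent
    weights: list of non-negative selection weights, one per parent
    """
    total = sum(weights)
    return tuple(
        sum(w / total * dist[state] for dist, w in zip(parents, weights))
        for state in range(3)
    )


# A load-balanced pair: one server certainly up, the other certainly down.
child = selector([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 0.5])
```

With equal weights the child is up half the time and down half the time, which is what a client behind an even load balancer would experience in this assumed model.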
Relationship Model(3) Failover
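Backup Slides 3 We use the definition from before. Status(Parent1)={x1,x2}, Status(Parent2)={y1,y2}. Status(Child)={(x1+x2)x1+not(x1+x2)y1, (x1+x2)x2+not(x1+x2)y2}. Here + means OR and * means AND; the * is omitted. In words: the child follows Parent1 unless Parent1 is down, in which case it follows Parent2.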
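The failover formula can be checked mechanically. The sketch below assumes that + is read as logical OR and juxtaposition as logical AND, and it uses the pair encoding from the backup slides ({1,1} up, {0,1} troubled, {0,0} down). The function and constant names are illustrative.

```python
UP, TROUBLED, DOWN = (1, 1), (0, 1), (0, 0)


def failover(primary, secondary):
    """Child status under failover, using the pair encoding of node status."""
    x1, x2 = primary
    y1, y2 = secondary
    alive = x1 | x2  # 1 unless the primary is down, i.e. (x1 + x2)
    c1 = (alive & x1) | ((1 - alive) & y1)
    c2 = (alive & x2) | ((1 - alive) & y2)
    return tuple(sorted((c1, c2)))  # pairs are unordered, so normalize


# Enumerate the full truth table over both parents.
table = {(p, s): failover(p, s) for p in (UP, TROUBLED, DOWN)
         for s in (UP, TROUBLED, DOWN)}
```

Under this reading, the secondary only matters when the primary is down; a troubled primary is still used and the child is troubled.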
How to Use Model? Fault Localization on the Inference Graph
Algorithm Efficiency(1) Calculations inside Inference Graph (noisy-max relationship) Reduce time complexity from O(3^n) to O(n)
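To see why the reduction from exponential to linear time is possible, here is a sketch under the noisy-max reading given in the backup slides: each parent's state passes through its edge with probability d and is replaced by up otherwise, and the child takes the worst state that gets through. The function names and the tuple layout are assumptions for the example. The brute-force version enumerates all 3^n parent assignments; the linear version uses the independence of the edges to compute the same result with one product per state.

```python
from itertools import product
from math import prod

UP, TROUBLED, DOWN = 0, 1, 2


def noisy_max_linear(parents, weights):
    """O(n): parents[j] = (p_up, p_troubled, p_down), weights[j] = d_j."""
    # Child is up iff no parent passes a troubled or down state through.
    p_up = prod(1 - d * (pt + pd) for (_, pt, pd), d in zip(parents, weights))
    # Child is not down iff no parent passes a down state through.
    p_not_down = prod(1 - d * pd for (_, _, pd), d in zip(parents, weights))
    return (p_up, p_not_down - p_up, 1 - p_not_down)


def noisy_max_brute(parents, weights):
    """O(3^n): sum over every joint assignment of parent states."""
    p_up = p_not_down = 0.0
    for states in product((UP, TROUBLED, DOWN), repeat=len(parents)):
        p_assign = prod(dist[s] for dist, s in zip(parents, states))
        up_given = prod(1 - d for s, d in zip(states, weights) if s != UP)
        not_down_given = prod(1 - d for s, d in zip(states, weights) if s == DOWN)
        p_up += p_assign * up_given
        p_not_down += p_assign * not_down_given
    return (p_up, p_not_down - p_up, 1 - p_not_down)


parents = [(0.9, 0.07, 0.03), (0.6, 0.3, 0.1), (0.99, 0.0, 0.01)]
weights = [0.8, 0.5, 1.0]
fast = noisy_max_linear(parents, weights)
slow = noisy_max_brute(parents, weights)
```

Both functions should agree up to floating-point rounding; only the amount of work differs.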
Algorithm Efficiency(2) Comparison of Multiple Inputs and Observations Two Methods to Use 1. Examine the assignments with high probability and ignore those with low probability. 2. Dynamic programming (reduce redundant computation).
Algorithm Efficiency(3) The authors make two observations that underlie these two methods. 1. It is very likely that at any point in time only a few root-cause nodes are troubled or down. 2. Since a root cause is assigned to be up in most assignment vectors, the evaluation of an assignment vector only requires re-evaluation of states at the descendants of root-cause nodes that are not up.
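The first observation suggests restricting the search to assignment vectors in which at most a few root causes are not up. The sketch below enumerates exactly those vectors; the function name, the `max_faults` parameter, and the string state labels are assumptions for illustration, not the authors' code.

```python
from itertools import combinations, product


def candidate_assignments(root_causes, max_faults):
    """Yield every assignment with at most `max_faults` root causes not up."""
    for k in range(max_faults + 1):
        for faulty in combinations(root_causes, k):
            for states in product(("troubled", "down"), repeat=k):
                assignment = dict.fromkeys(root_causes, "up")
                assignment.update(zip(faulty, states))
                yield assignment


nodes = ["a", "b", "c", "d"]
at_most_one = list(candidate_assignments(nodes, 1))
at_most_two = list(candidate_assignments(nodes, 2))
```

For four root causes there are 3^4 = 81 assignment vectors in total, but only 9 with at most one fault and 33 with at most two, so the pruned search grows polynomially in the number of nodes for a fixed fault bound.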
Evaluation Inference Graph Established
Accuracy Compared with others
Time to Localize Faults
Impact of Errors in Inference Graph
Open Issues The Node Model is very simple: it has only three states. Can we have a continuous version of it? Can we bring a stochastic-process concept such as a Markov chain into this model?