What Is a Fault in an Overlay Network, and How Can We Tolerate Faults?

What is a fault?
Question: Is it a fault if a message from A fails to reach B?
–A can reach C.
–C can reach B.
–But A cannot reach B.
[Figure: nodes A, B, and C]
Question: Suppose we find an object, but too late?
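
The A, B, C situation above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the `LINKS` set, `can_reach_directly`, and `find_relay` are hypothetical names, and "reachability" is modeled as a fixed set of directed links rather than a real network.

```python
# Hypothetical model: a pair (x, y) means x can deliver a message directly to y.
LINKS = {("A", "C"), ("C", "B")}  # note: there is no ("A", "B") link


def can_reach_directly(src, dst, links=LINKS):
    """True if src can deliver to dst without help."""
    return (src, dst) in links


def find_relay(src, dst, nodes, links=LINKS):
    """Return one node that can forward src's message to dst, or None."""
    for relay in sorted(nodes):
        if relay in (src, dst):
            continue
        if (src, relay) in links and (relay, dst) in links:
            return relay
    return None


direct = can_reach_directly("A", "B")          # A cannot reach B directly
relay = find_relay("A", "B", {"A", "B", "C"})  # but C can forward the message
```

Whether the missing A-to-B link counts as a fault depends on the definition chosen: the direct delivery fails, yet the overlay can still deliver the message through C.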

How Do Faults Happen?
Accidents
–Uniformly at random
–Correlated failures: how?
Malice
–A few rotten apples
–Big organization

Growing Faults
Insertion requires good information.
–Garbage in, garbage out
A faulty node can cause many errors.
–Silently drops all messages routed through it
Need resilience to misbehavior as well as to deletion.

Replication Helps
Objects have multiple independent roots.
May have to wait too long: objects have multiple connected roots.
–Backup roots
Get information from multiple sources, and check it.
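
One way to read "get information from multiple sources, and check it" is a simple agreement check across replica roots. The sketch below is an assumption, not the method from the slides: `lookup_with_replicas`, the `quorum` parameter, and the three toy root functions are hypothetical, and each root is modeled as a plain Python callable.

```python
from collections import Counter


def lookup_with_replicas(key, roots, quorum):
    """Ask every root for key; accept a value only if at least quorum roots agree."""
    answers = []
    for root in roots:
        try:
            answers.append(root(key))
        except Exception:
            continue  # a crashed or silent root simply contributes no answer
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes >= quorum else None


# Toy roots (hypothetical): two honest, one returning garbage, one that never replies.
def honest_root(key):
    return f"object-for-{key}"


def lying_root(key):
    return "garbage"


def dead_root(key):
    raise TimeoutError("no reply")


result = lookup_with_replicas(
    "k1", [honest_root, honest_root, lying_root, dead_root], quorum=2
)
```

With two honest roots agreeing, the lookup succeeds despite one misbehaving root and one dead root. With only one honest and one lying root, no value reaches the quorum and the lookup gives up, which is one possible answer to "when do we give up?".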

Questions
What is a fault?
What sort of faults can we handle?
When do we give up?
How can we deal with partial faults?
How can we detect misbehavior from misconfiguration or malice?
What techniques and ideas help?