What Are Faults in an Overlay Network and How Can We Tolerate Them?

What is a fault?
Question: Is it a fault if a message from A fails to reach B?
  – A can reach C.
  – C can reach B.
  – But A cannot reach B.
[Figure: nodes A, B, and C]
Question: Suppose we find an object, but too late?
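The A, B, C situation above can be sketched in a few lines. This is only an illustration under assumed names: the `links` table, `can_send_directly`, and `find_route` are hypothetical and not part of the original slides. The table records which direct sends currently work, and a breadth-first search looks for a detour through other nodes when the direct send fails.

```python
from collections import deque

# Hypothetical snapshot of working direct links, matching the slide:
# A can reach C, C can reach B, but A cannot reach B directly.
links = {
    "A": {"C"},
    "C": {"B"},
    "B": set(),
}

def can_send_directly(links, src, dst):
    """True if src currently has a working direct link to dst."""
    return dst in links.get(src, set())

def find_route(links, src, dst):
    """Breadth-first search over working links.

    Returns the list of nodes from src to dst, or None if no route exists.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(links.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

direct = can_send_directly(links, "A", "B")   # False: the direct send fails
detour = find_route(links, "A", "B")          # ["A", "C", "B"]: relay through C
```

Whether the failed direct send counts as a fault then depends on whether the overlay is expected to find and use the detour.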

How Do Faults Happen?
Accidents
  – Uniformly at random
  – Correlated failures: how?
Malice
  – A few rotten apples
  – Big organization

Growing Faults
Insertion requires good information.
  – Garbage in, garbage out
A faulty node can cause many errors.
  – Silently drops all messages routed through it
=> Need resilience to misbehavior as well as to deletion
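To see how one silently dropping node can cause many errors, consider a toy model. Everything here is an assumption made for illustration, not something the slides specify: the overlay is a ring of `n` nodes, messages travel clockwise only, and a faulty node discards whatever is routed through it. The function names are hypothetical.

```python
def clockwise_path(src, dst, n):
    """Nodes visited going clockwise from src to dst on a ring of n nodes."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + 1) % n)
    return path

def delivered(src, dst, n, faulty):
    """A message is lost if any intermediate hop is a faulty node."""
    intermediates = clockwise_path(src, dst, n)[1:-1]
    return not any(hop in faulty for hop in intermediates)

def count_lost(n, faulty):
    """Count lost messages over all ordered pairs of distinct non-faulty nodes."""
    pairs = [(s, d) for s in range(n) for d in range(n)
             if s != d and s not in faulty and d not in faulty]
    lost = sum(1 for s, d in pairs if not delivered(s, d, n, faulty))
    return lost, len(pairs)

lost, total = count_lost(8, {3})  # one faulty node out of eight
```

In this toy ring, a single faulty node out of eight loses 21 of the 42 messages exchanged between healthy nodes. The exact fraction is specific to the assumed topology and routing rule.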

Replication Helps
Objects have multiple independent roots.
May have to wait too long; objects have multiple connected roots.
  – Backup roots
Get information from multiple sources, and check it.
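One way to read "get information from multiple sources, and check it" is a simple vote across replica roots. The sketch below assumes each root is a callable that returns an answer or raises on failure, and that an answer is accepted only when at least `quorum` roots agree. The function name, the quorum rule, and the sample roots are hypothetical and chosen for illustration, not taken from the slides.

```python
from collections import Counter

def lookup_with_replicas(roots, key, quorum):
    """Ask every root for key and accept the answer only if enough roots agree.

    roots:  callables standing in for replica roots (an assumption of this sketch)
    quorum: minimum number of matching answers required
    Returns the agreed answer, or None if no answer reaches the quorum.
    """
    answers = []
    for root in roots:
        try:
            answers.append(root(key))
        except Exception:
            continue  # a crashed or unreachable root simply contributes no vote
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes >= quorum else None

# Hypothetical roots: two honest, one returning a wrong location, one unreachable.
def honest_root(key):
    return "node-17"

def lying_root(key):
    return "node-99"

def dead_root(key):
    raise TimeoutError("root unreachable")

result = lookup_with_replicas(
    [honest_root, lying_root, honest_root, dead_root], "obj", quorum=2
)
```

A plain vote like this tolerates a minority of wrong or missing answers, but it costs extra messages per lookup and does not help if faulty roots outnumber honest ones.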

Questions
What is a fault?
What sort of faults can we handle?
When do we give up?
How can we deal with partial faults?
How can we detect misbehavior, whether from misconfiguration or malice?
What techniques and ideas help?