Epidemic Techniques Algorithms and Implementations.

Agenda
  – Consistency issues
  – Epidemic algorithms
  – Astrolabe
  – Conclusion

Databases replicated at many sites need to maintain consistency
Relaxed consistency problem:
  – Database is changed at one site
  – Change must propagate to all other sites
  – All copies must eventually agree
  – Copies should be mostly current
Important factors
  – Propagation time
  – Network traffic (ideally proportional to size of the update × number of servers)

Epidemic algorithms help spread updates and maintain consistency
Epidemic terminology
  – A site with an update it is willing to share is infective
  – A site which has yet to receive an update is susceptible
  – A site with an update it is no longer willing to share is removed
[Figure: site states Removed, Infective, Susceptible]

Xerox has three algorithms to create database consistency
Direct mail
  – Updates are mailed from the originating site to all other sites
Anti-entropy (epidemic)
  – All sites regularly choose other sites and exchange database contents
Rumor mongering (epidemic)
  – Updates become “hot rumors” which are periodically sent to other sites until most of the sites contacted have already seen the update

Direct mail is almost, but not completely, reliable
Queues
  – Updates are queued to prevent delays
  – Queue located in stable storage
Failures
  – Queues overflow
  – Destinations inaccessible for long periods of time
  – Source lacks accurate knowledge of all other sites
Traffic
  – n messages per update
  – Each message traverses all links from source to destination
  – Traffic proportional to the number of sites × distance between sites
[Figure: update queue feeding Server 1 … Server n]

Anti-entropy is reliable but costly
Site A chooses site B at random
The databases are compared
  – Pull: A gets database from B
  – Push: A sends database to B
  – Push-pull: A and B exchange databases
When anti-entropy is used as a backup, pull or push-pull is preferable
[Figure: direction of exchange between A and B for pull, push, and push-pull]
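
The three exchange modes can be sketched in a few lines. This is only an illustration under assumptions not stated on the slide: each database is a Python dict mapping a key to a (value, timestamp) pair, the newer timestamp wins, and the names `merge_into`, `anti_entropy`, and `run_round` are hypothetical.

```python
import random

def merge_into(dst, src):
    """Copy every entry of src that dst lacks or holds in an older version."""
    for key, (value, ts) in src.items():
        if key not in dst or dst[key][1] < ts:
            dst[key] = (value, ts)

def anti_entropy(a, b, mode="push-pull"):
    """Resolve differences between databases a and b; a initiated the exchange."""
    if mode in ("push", "push-pull"):
        merge_into(b, a)   # push: A's newer entries go to B
    if mode in ("pull", "push-pull"):
        merge_into(a, b)   # pull: B's newer entries go to A

def run_round(sites, mode="push-pull", rng=random):
    """Every site picks one other site uniformly at random and exchanges with it."""
    for i, db in enumerate(sites):
        j = rng.choice([k for k in range(len(sites)) if k != i])
        anti_entropy(db, sites[j], mode)
```

Calling `run_round` repeatedly on a list of site databases models the periodic exchanges; a real implementation would avoid shipping whole databases, which is what the checksum optimization addresses.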

Checksums can be used with anti-entropy to improve performance
Comparing databases is expensive
A recent update list can be kept
  – Recent updates are exchanged
  – Updates applied
  – Checksums of database contents exchanged
  – Databases compared only if checksums disagree
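
A rough sketch of that sequence follows. It assumes the same dict-of-(value, timestamp) representation as before, uses SHA-256 as a stand-in checksum, and treats `window` (how far back the recent update list reaches) as a free parameter; none of these choices come from the slides.

```python
import hashlib

def checksum(db):
    """Digest of a database of key -> (value, timestamp), independent of insertion order."""
    h = hashlib.sha256()
    for key in sorted(db):
        h.update(repr((key, db[key])).encode())
    return h.hexdigest()

def newer_wins(dst, key, entry):
    """Store entry under key unless dst already has an equally new or newer one."""
    if key not in dst or dst[key][1] < entry[1]:
        dst[key] = entry

def exchange(a, b, now, window):
    """Push-pull exchange that tries the cheap path first.

    Returns True if a full database comparison was needed.
    """
    # Step 1: swap only the updates younger than `window` and apply them.
    recent_a = {k: e for k, e in a.items() if now - e[1] <= window}
    recent_b = {k: e for k, e in b.items() if now - e[1] <= window}
    for k, e in recent_a.items():
        newer_wins(b, k, e)
    for k, e in recent_b.items():
        newer_wins(a, k, e)
    # Step 2: compare checksums; do the full comparison only on disagreement.
    if checksum(a) == checksum(b):
        return False
    for k, e in list(a.items()):
        newer_wins(b, k, e)
    for k, e in list(b.items()):
        newer_wins(a, k, e)
    return True
```

The window has to be long enough for an update to reach most sites; otherwise the checksums rarely agree and the expensive path runs on nearly every exchange.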

Rumor mongering is less costly but can be inconsistent
n individuals initially susceptible
Rumor planted, making A infective
A contacts others at random to share the rumor
Everyone who hears the rumor becomes infective
When A unnecessarily contacts someone (a site that already knows the rumor), A becomes inactive (removed) with probability 1/k
Increasing k ensures almost everyone will hear the rumor

Some variations on rumor mongering exist
Blind vs. feedback
  – With feedback, the sender can tell when a recipient has already heard a rumor
  – Blind stops spreading the rumor with probability 1/k regardless of whether the recipient has already heard the rumor
Counter vs. coin
  – Coin loses interest with probability 1/k
  – Counter loses interest after k unnecessary contacts
Simulations indicate that counter and feedback used in combination have the least delay
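
The variants are easy to compare with a small simulation. The sketch below is a simplified model, not the simulation behind the slides: it assumes synchronous cycles, push only, a rumor planted at site 0, and n of at least 2. The function name and the `residue` measure (fraction of sites that never hear the rumor) are choices made for this example.

```python
import random

def spread_rumor(n, k, feedback=True, counter=True, rng=random):
    """Simulate push rumor mongering among n sites and return the residue."""
    SUSCEPTIBLE, INFECTIVE, REMOVED = 0, 1, 2
    state = [SUSCEPTIBLE] * n
    strikes = [0] * n         # contacts that count toward losing interest
    state[0] = INFECTIVE      # the rumor is planted at site 0
    infective = [0]
    while infective:
        next_infective = []
        for site in infective:
            target = rng.randrange(n - 1)
            if target >= site:
                target += 1   # uniform choice among the other n - 1 sites
            already_knew = state[target] != SUSCEPTIBLE
            if not already_knew:
                state[target] = INFECTIVE
                next_infective.append(target)
            # Feedback reacts only to unnecessary contacts; blind reacts to every contact.
            if already_knew or not feedback:
                if counter:
                    strikes[site] += 1
                    lose_interest = strikes[site] >= k
                else:
                    lose_interest = rng.random() < 1.0 / k
                if lose_interest:
                    state[site] = REMOVED
                    continue
            next_infective.append(site)
        infective = next_infective
    return state.count(SUSCEPTIBLE) / n
```

Running it for several values of k with each combination of `feedback` and `counter` shows the trade-off between residue and the number of messages sent.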

Pull performs better when updates are frequent
Push vs. pull
  – Up until now, we have assumed that updates are pushed
  – When a database has a high rate of rumor injection, pull is more likely to find non-empty rumor lists
  – When the database is mostly quiescent, push will cease to introduce traffic
  – Choice is based on the rate of updates
  – Connection limits help push but hinder pull

Rumor mongering is a better choice when using anti-entropy as backup
What happens when anti-entropy detects inconsistency?
  – Nothing: anti-entropy makes the databases consistent. OK when only a few sites were missed
  – Update redistributed: better in the event of a complete failure
Worst case: distribution reached half the sites
[Figure labels: Direct mail, Rumor mongering]

Deletion is more complicated than simply removing a file
In anti-entropy and rumor mongering, an absent file will be replaced by an old version from another site
Solution
  – File replaced with death certificate
  – Death certificates spread, removing old copies of deleted items
  – When and how do death certificates get deleted?
[Figure: File A replaced by Death Certificate A]
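
The resurrection problem and its fix can be shown directly. This sketch assumes the same key to (value, timestamp) representation used for anti-entropy and marks a death certificate with a sentinel value; `DEAD`, `merge`, `delete`, and `lookup` are illustrative names.

```python
DEAD = object()  # sentinel value marking a death certificate

def merge(dst, src):
    """Newest timestamp wins; a death certificate is just another entry."""
    for key, (value, ts) in src.items():
        if key not in dst or dst[key][1] < ts:
            dst[key] = (value, ts)

def delete(db, key, now):
    """Replace the item with a death certificate instead of dropping it."""
    db[key] = (DEAD, now)

def lookup(db, key):
    """Deleted items are invisible to readers."""
    entry = db.get(key)
    return None if entry is None or entry[0] is DEAD else entry[0]
```

Removing the key outright and then merging with a stale replica brings the item back; calling `delete` leaves a newer entry behind, so the merge removes the stale copy instead.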

Death certificates become dormant but can be resurrected
Death certificates are stamped with two timestamps, T1 and T2
When T1 is reached, most servers delete the certificate
Servers on the death certificate’s retention site list keep a dormant copy
Dormant copies are discarded when T1+T2 is reached
Dormant death certificates are resurrected if an obsolete copy of the data is encountered
[Figure: certificate with T1 = 1:00, T2 = 2:00, retention list A, B, D]
  – Current time 12:00: death certificate
  – Current time 1:00: dormant certificate kept on A, B, and D
  – Current time 2:00: all certificates deleted
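
One way to read the retention rule is as a small state function. The sketch treats T1 and T2 as durations measured from the certificate's timestamp, which follows the T1+T2 rule in the text; that reading, and the function names, are assumptions of this example.

```python
def certificate_state(site, age, t1, t2, retention_sites):
    """Return 'active', 'dormant', or 'discarded' for a death certificate.

    age is the time elapsed since the certificate's timestamp.
    """
    if age < t1:
        return "active"            # every site still holds the certificate
    if site in retention_sites and age < t1 + t2:
        return "dormant"           # only retention sites keep a copy
    return "discarded"

def on_obsolete_copy(state):
    """A dormant certificate wakes up when it meets an obsolete copy of the item."""
    return "active" if state == "dormant" else state
```

Keeping dormant copies on only a few sites trades storage for a small chance that an obsolete item reappears where no retention site notices it.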

Timestamps are tricky when reactivating a death certificate
Setting the timestamp forward to the current clock value reactivates the death certificate
Problem: legitimate updates made between the death certificate’s original timestamp and the current time will be erased erroneously
An activation timestamp must be added to prevent the deletion of changes more current than the death certificate
[Figure: certificate with retention list A, B, D]
  – Current time 12:00: death certificate; activation time 1:00, T1 = 1:00, T2 = 2:00
  – Current time 1:00: dormant certificate kept on A, B, and D; activation time 1:00, T1 = 1:00, T2 = 2:00
  – Current time 2:00: certificate reactivated; activation time 1:00, T1 = 3:00, T2 = 4:00
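
The two-timestamp idea can be sketched with a small record type. The field names `deleted_at` and `activated_at` are hypothetical; the point is that reactivation moves only the activation timestamp, while the ordinary timestamp still decides which copies the certificate cancels.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DeathCertificate:
    key: str
    deleted_at: float      # ordinary timestamp: when the item was deleted
    activated_at: float    # activation timestamp: restarts the retention clocks

def reactivate(cert, now):
    """Wake a dormant certificate without touching its ordinary timestamp."""
    return replace(cert, activated_at=now)

def cancels(cert, item_timestamp):
    """A certificate removes only copies older than the deletion itself."""
    return item_timestamp < cert.deleted_at
```

With a single timestamp set forward to the current time, an update made after the deletion but before the reactivation would compare as older than the certificate and be erased.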

Distance between nodes can affect traffic overhead
Updates cost less to send when the source and destination are close
Assume a worst-case linear network
Nearest-neighbor selection results in high convergence time
  – Traffic per link per cycle would be O(1)
  – O(n) cycles would be needed
Uniform random connections result in high traffic overhead
  – Average connection distance of O(n)
  – Convergence O(log n)
  – Traffic per link per cycle is O(n)
Nonuniform distribution reduces traffic and has acceptable convergence time
[Figure: linear network of sites 1, 2, …, n]

Spatial distribution can improve traffic in anti-entropy
Each site builds a list of sites sorted by distance
An anti-entropy exchange partner is selected from the list with probability given by some function such as $f(i) = i^{-a}$, where i is the position in the list
Spatial distribution significantly reduces traffic on critical links
Convergence time is not significantly worse with a more strongly nonuniform spatial distribution
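
Partner selection with this weighting takes only a few lines. The sketch assumes the list is already sorted nearest first and that weights are normalized implicitly; `pick_partner` is an illustrative name.

```python
import random

def pick_partner(sites_by_distance, a, rng=random):
    """Pick a partner so that the i-th nearest site has weight i**(-a)."""
    weights = [i ** -a for i in range(1, len(sites_by_distance) + 1)]
    return rng.choices(sites_by_distance, weights=weights, k=1)[0]
```

Setting `a = 0` gives the uniform choice, and larger values of `a` concentrate exchanges on nearby sites.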

Push and pull rumors are more sensitive to spatial distribution
There is a high probability that S and T will choose each other
If an update is introduced at S or T, it will be pushed to the other
The rumor will eventually die without reaching all other nodes
[Figure: sites S and T, and sites U1, U2, …, Um]

Xerox chose to implement a randomized anti-entropy algorithm
Anti-entropy guarantees consistency
A well-chosen spatial distribution reduced link traffic by a factor of 4 and critical link traffic by a factor of 30
Xerox experienced improvement in consistency and network traffic overhead with the implementation

Astrolabe provides fast, dynamic management of large stores of information
DNS
  – A directory service
  – Organizes machines into domains
  – Associates attributes with each domain
  – Designed to map domain names to IP addresses and mail servers
  – Changes rare
  – Updates are slow to propagate
Astrolabe
  – An information management service
  – Organizes resources into a hierarchy of zones, like domains
  – Attributes associated with each zone
  – Zones not bound to specific servers
  – Attributes can be very dynamic
  – Updates propagate quickly

Astrolabe can be used in p2p systems to cache large objects
Problem
  – Infeasible to keep large objects on a central database and copy on every access
  – Load time and network load too high
Solution
  – Store copies on different hosts
  – Use Astrolabe to find a nearby, fresh copy

Astrolabe strives to satisfy four basic principles
Scalability through hierarchy
  – Maintains consistent overhead
Flexibility through mobile code
  – SQL queries allow different applications to communicate
Robustness through a randomized peer-to-peer protocol
  – Communicate by running a process on each host
  – Epidemic protocol used
Security through certificates
  – Digital signatures used to allow or deny access to data, operations, etc.

Zone hierarchy makes Astrolabe scalable
A zone is
  – A host or a set of non-overlapping zones (no hosts in common)
Tree structure
  – Leaves are hosts
  – Each zone (except root) has a local zone identifier
  – Each zone has an attribute list (MIB)
  – Attributes are generated by aggregation functions, which summarize the children’s attributes
  – Leaf zones have writable virtual child zones used to populate attributes for that zone
[Figure: tree of zones and hosts, each with a MIB]

Aggregate functions are used to query the tree
Aggregate functions summarize and are bounded in size
Aggregate functions are programmable
Code embedded in time-stamped aggregate function certificates (AFCs)
AFCs stored as attributes in MIBs
For every zone an agent is in, it scans hosts looking for children’s attributes, then aggregates results
Zones learn about other zones through gossip protocol
Applications invoke Astrolabe through calls to library
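
Bottom-up aggregation over the zone tree can be sketched as a recursion. The slides say aggregation code is carried in AFCs; the plain Python function here is only a stand-in for it, and the tree encoding (a host is a dict of attributes, a zone is a list of children) and the attribute names `host`, `load`, and `nhosts` are assumptions of this example.

```python
def aggregate(zone, fn):
    """Compute the MIB of a zone bottom-up.

    A zone is either a host, given as a dict of attributes, or a list of child zones.
    """
    if isinstance(zone, dict):
        return zone
    return fn([aggregate(child, fn) for child in zone])

def min_load(child_mibs):
    """Example aggregation function: least-loaded host and host count below a zone."""
    best = min(child_mibs, key=lambda mib: mib["load"])
    return {
        "host": best["host"],
        "load": best["load"],
        "nhosts": sum(mib.get("nhosts", 1) for mib in child_mibs),
    }
```

The result has the same size at every level regardless of how many hosts sit below, which is the bounded-size property the slide refers to.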

Agents on each host maintain a database of the zone hierarchy
Astrolabe agent runs on each host
Each agent stores a subset of MIBs in the Astrolabe tree
  – A copy of root MIB
  – A copy of all MIBs of the root’s children
For each level, a list of child zones (and attributes) is kept, along with which child represents its own zone
[Figure: an agent’s per-level tables with entries Asia, Europe, USA; Cornell, MIT; pc1, pc2, pc3, pc4; system, inventory, monitor; and a “self” marker at each level]

Gossiping is an epidemic protocol used to propagate information
Periodically, an agent selects a zone in which to gossip
Agent picks some child at random (other than its own) within that zone to gossip with
Agent sends chosen child the id, rep, and issued attributes of all MIBs of all children at that level and up to the root
Recipient can then tell which entries are out of date
Updates are passed back and forth
Note: timestamps can be compared only if the attribute is issued by the same rep
[Figure: zone Cornell with hosts pc1, pc2, pc3, and pc4, each with inventory, monitor, and system entries]
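
A simplified sketch of one gossip exchange follows. It assumes each agent keeps its MIBs in a dict keyed by (zone id, rep) with an `issued` field, so timestamps are only ever compared between MIBs from the same rep; the wire format and function names are invented for illustration.

```python
def digest(mibs):
    """Summary sent first: (zone id, rep) -> issued time, without attribute values."""
    return {key: mib["issued"] for key, mib in mibs.items()}

def stale_or_missing(local, remote_digest):
    """Keys for which the remote side holds a newer MIB than the local one."""
    return [key for key, issued in remote_digest.items()
            if key not in local or local[key]["issued"] < issued]

def gossip(a, b):
    """One exchange; both sides end up with the newest MIB per (zone, rep)."""
    for key in stale_or_missing(a, digest(b)):
        a[key] = b[key]
    for key in stale_or_missing(b, digest(a)):
        b[key] = a[key]
```

Sending the digest first keeps the common case cheap: full MIBs travel only for entries the recipient is missing or holds in an older version.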

Astrolabe allows members to be added or removed
Member removal
  – Each MIB knows which rep (agent) created it and when it was last updated
  – When an agent has not seen an update for some zone from a rep for time T_fail, the MIB is removed
  – When the last MIB for a zone is removed, the zone is also removed
Member integration
  – IP multicast sets up initial contact
  – When two trees join, each tree multicasts a gossip message at a fixed rate
  – Broadcasting gossip on the local LAN is also used
  – Astrolabe agents maintain a set of relatives who should be contacted on occasion
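
The removal rule can be sketched as a periodic sweep. As in the gossip example, MIBs are assumed to live in a dict keyed by (zone id, rep) with an `issued` time; `expire` and `t_fail` are illustrative names.

```python
def expire(mibs, now, t_fail):
    """Drop MIBs not refreshed within t_fail; return the zones that vanished."""
    before = {zone for zone, _rep in mibs}
    for key in [k for k, mib in mibs.items() if now - mib["issued"] > t_fail]:
        del mibs[key]
    after = {zone for zone, _rep in mibs}
    return before - after
```

A zone disappears only when no rep has refreshed it recently, so a single failed agent does not remove a zone that other agents still represent.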

Certificates are used to guarantee security
Each zone is allowed to override the security requirements of its parent zone
  – Control zone creation, gossip rate, failure detection time-outs, introducing new AFCs, etc.
Each zone has a Certificate Authority (CA) which issues certificates for that zone
  – Zone certificate: binds zoneID to its public key
  – MIB certificate: gossiped with zone certificate to propagate data between hosts
  – Aggregate function certificate (AFC): contains code and other info for aggregation functions. An agent will only install AFCs issued by ancestor zones or by one of their clients.
  – Client certificate: authenticates a client. For scalability, Astrolabe agents do not maintain a client database. If an ancestor signs the client certificate with its CA key, the client is trusted.

An AFC is introduced into the system through the virtual children
AFCs can be introduced by adding an attribute to the virtual child zone
The agent will automatically evaluate the attribute
AFCs can propagate by copying into the parent MIBs until they reach the root
Adoption is used to propagate back down the tree
  – Agents scan ancestors for new attributes
  – New AFCs automatically copied
For garbage collection, an expiration time can be specified
[Figure: zone Cornell with hosts pc1 to pc4, each with inventory, monitor, and system entries, plus a new entry on pc4]

An AFC must meet certain security requirements to propagate
AFC must be signed by an ancestor zone, or a client of that zone
  – A client must have permission to propagate
AFC cannot have expired
The name of the AFC attribute and the category attribute must match
  – Prevents a malicious client from introducing an AFC for a purpose other than advertised
[Figure: zone Cornell with hosts pc1 to pc4, each with inventory, monitor, and system entries, plus a new entry on pc4]
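
The three checks can be written as a single predicate. This sketch represents an AFC as a plain dict with hypothetical fields `signer`, `expires`, and `category`, and it assumes signature verification has already happened elsewhere; it only shows how the conditions combine.

```python
def afc_accepted(afc, attribute_name, now, ancestor_zones, propagating_clients):
    """Apply the three propagation checks to an AFC given as a plain dict."""
    signer = afc["signer"]
    # Signed by an ancestor zone, or by a client that is allowed to propagate.
    signed_ok = signer in ancestor_zones or signer in propagating_clients
    not_expired = now < afc["expires"]
    # The attribute name under which the AFC is stored must match its category.
    name_matches = attribute_name == afc["category"]
    return signed_ok and not_expired and name_matches
```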

Experiments demonstrate Astrolabe’s scalability
Branch factor increases
  – A higher branching factor leads to larger messages and more traffic
  – Astrolabe remains scalable even with a high branch factor
Loss rates
  – A higher loss rate does not seriously affect scalability
  – Due to the randomization algorithm

Scalability through hierarchy
  – Zones enable scalability
Flexibility through mobile code
  – AFCs can be generated by one agent and then propagated throughout the system to learn the attributes on a variety of hosts
Robustness through a randomized p2p protocol
  – Zones select other zones at random and propagate MIB of least common ancestor
  – Guarantees changes will eventually reach the entire system
Security through certificates
  – Certificates authenticate every level of communication
Conclusion: Astrolabe is a scalable, robust system which allows changes to propagate quickly and guarantees eventual consistency

Backup

Astrolabe improves upon several previous systems