Slide 1: Gossip Algorithms and Emergent Shape
Ken Birman
Gossip-Based Networking Workshop, Leiden, December 2006

Slide 2: On Gossip and Shape
Why is gossip interesting?
Scalability: protocols are highly symmetric, although latency often forces us to bias them.
Powerful convergence properties, especially in support of epidemics.
New forms of consistency: probabilistic, but this is often adequate.

Slide 3: Consistency
Captures the intuition that if A and B compare their states, no contradiction is evident.
In systems with "logical" consistency, we say things like "A's history and B's are closed under causality, and A is a prefix of B."
With probabilistic systems we seek a rapidly decreasing probability (as time elapses) that A knows "x" but B doesn't: probabilistic convergent consistency.

Slide 4: Exponential convergence
A subclass of convergence behaviors.
Not all gossip protocols offer exponential convergence, but epidemic protocols do have this property, and many gossip protocols implement epidemics.

Slide 5: Value of exponential convergence
An exponentially convergent protocol overwhelms mishaps and even attacks.
Requires that new information reach relevant nodes with at most O(log N) delay.
Can talk of "probability 1.0" outcomes.
Even model simplifications (such as an idealized network) are washed away; predictions are rarely "off" by more than a constant.
A rarity: a theory relevant to practice!
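The log(N) claim follows from the standard push-epidemic analysis; a minimal sketch of that reasoning (standard textbook material, not taken from the slides): let $x_t$ be the expected fraction of the $N$ nodes that know a new piece of information after round $t$, where each informed node pushes to one peer chosen uniformly at random. Then
$$1 - x_{t+1} \;\approx\; (1 - x_t)\,e^{-x_t},$$
so while $x_t$ is small the informed fraction roughly doubles each round ($x_{t+1} \approx 2x_t$), covering a constant fraction of the system after about $\log_2 N$ rounds, and once $x_t$ is close to 1 the uninformed fraction shrinks by a factor of about $e$ per round. That is the exponential convergence the slide refers to, with full coverage in $O(\log N)$ rounds with high probability.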

Slide 6: Convergent consistency
To illustrate our point, contrast Cornell's Kelips system with MIT's Chord.
Kelips is convergent; Chord isn't.

Slide 7: Kelips (Linga, Gupta, Birman)
Take a collection of "nodes."

Slide 8: Kelips
Affinity groups: peer membership through consistent hashing.
Nodes are mapped into roughly √N affinity groups, so each group has about √N members.

Slide 9: Kelips
Affinity groups: peer membership through consistent hashing; groups are numbered 1 ... √N, each with about √N members.
Each node keeps an "affinity group view": a small table of other members of its own group, with columns id, heartbeat, and RTT (ms).
For example, node 110 knows about other members of its group, such as 230 and 30.

Slide 10: Kelips
In addition to its affinity group view, each node keeps "contact" pointers into the other affinity groups: a table mapping group number to contact node.
For example, node 202 is a "contact" for node 110 in group 2.

Slide 11: Kelips
A gossip protocol replicates data cheaply.
Each node also holds "resource tuples": a table mapping a resource name to the node that holds it (for example, cnn.com → 110).
"cnn.com" maps to group 2, so node 110 tells group 2 to "route" inquiries about cnn.com to it.
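To make slides 8 through 11 concrete, here is a minimal sketch of the Kelips data structures and the one-hop lookup path in Python. It assumes SHA-1 hashing into roughly √N groups; the class and field names are illustrative, not the actual Kelips implementation.

import hashlib
import math

def kelips_group(key: str, n_groups: int) -> int:
    """Map a node id or resource name to an affinity group by consistent hashing."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % n_groups

class KelipsNode:
    """Hypothetical in-memory state for one Kelips node (illustrative only)."""
    def __init__(self, node_id: str, n_total_nodes: int):
        self.node_id = node_id
        self.n_groups = max(1, round(math.sqrt(n_total_nodes)))  # ~sqrt(N) groups
        self.my_group = kelips_group(node_id, self.n_groups)
        self.group_view = {}       # member id -> {"hbeat": ..., "rtt_ms": ...} for own group
        self.contacts = {}         # group number -> contact node id
        self.resource_tuples = {}  # resource name -> node id that holds it

    def lookup(self, resource: str):
        """One-hop lookup: hash the resource to a group, then ask our contact there."""
        g = kelips_group(resource, self.n_groups)
        if g == self.my_group:
            return self.resource_tuples.get(resource)  # answer locally
        contact = self.contacts.get(g)
        return ("ask", contact, resource)              # a real system would send an RPC here

In the real protocol these tables are populated and repaired by the background gossip described on the next slide; here they are just dictionaries.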

Slide 12: How it works
Kelips is entirely gossip based!
Gossip about membership.
Gossip to replicate and repair data.
Gossip about "last heard from" times, used to discard failed nodes.
The gossip "channel" uses fixed bandwidth: a fixed rate and packets of limited size.

Slide 13: How it works
Heuristic: periodically ping contacts to check liveness and RTT, and swap so-so contacts for better ones.
(Figure: node 175 is a contact for node 102 in some affinity group, at RTT 235 ms; from the gossip data stream, node 102 notices that node 19, at RTT 6 ms, looks like a much better contact for that group.)
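A standalone sketch of this contact-refresh heuristic, assuming contacts is a dict from group number to (peer_id, rtt_ms) and ping_rtt is a hypothetical probe function; the real Kelips code differs.

import random

def refresh_contacts(contacts, candidates_by_group, ping_rtt, max_probes=3):
    """Keep the lowest-RTT live contact for each affinity group.

    contacts: dict mapping group number -> (peer_id, rtt_ms).
    candidates_by_group: dict mapping group number -> peer ids learned from gossip.
    ping_rtt(peer): returns a measured RTT in ms, or None if the peer does not respond.
    """
    for group, (current, _) in list(contacts.items()):
        rtt = ping_rtt(current)
        best, best_rtt = (current, rtt) if rtt is not None else (None, float("inf"))

        # Probe a few random candidates from the gossip stream for this group.
        candidates = candidates_by_group.get(group, [])
        for peer in random.sample(candidates, min(max_probes, len(candidates))):
            rtt = ping_rtt(peer)
            if rtt is not None and rtt < best_rtt:
                best, best_rtt = peer, rtt   # e.g. swap a 235 ms contact for a 6 ms one

        if best is not None:
            contacts[group] = (best, best_rtt)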

Slide 14: Work in progress
Prakash Linga is extending Kelips to support multi-dimensional indexing, range queries, and self-rebalancing.
Kelips has a limited incoming "info rate"; behavior when that limit is continuously exceeded is not well understood, and we will also study this phenomenon.

Slide 15: Replication makes it robust
Kelips should work even during disruptive episodes; after all, tuples are replicated to √N nodes.
Querying k nodes concurrently overcomes isolated crashes and also reduces the risk that very recent data could be missed.
We often overlook the importance of showing that systems work while recovering from a disruption.

Slide 16: Chord (MIT group)
The McDonald's of DHTs.
A data structure mapped to a network: a ring of nodes (hashed ids), superimposed binary lookup trees, and other cached "hints" for fast lookups.
Chord is not convergently consistent.
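For readers who have not seen Chord, here is a minimal sketch of the ring and finger-table lookup that the next slide depicts: a simplified single-process model, not the MIT implementation, and the constants and helper names are my own.

import hashlib

M = 16                      # bits in the identifier space; the ring has 2**M positions
RING = 2 ** M

def chord_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def between(x, a, b):
    """True if x lies in the open ring interval (a, b), wrapping around zero."""
    if a < b:
        return a < x < b
    return x > a or x < b

class ChordNode:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        # fingers[i] should point at the first node whose id >= self.id + 2**i;
        # a real deployment fills and repairs these with a stabilization protocol.
        self.fingers = [self] * M

    def find_successor(self, key):
        """Route toward the node responsible for key, roughly halving the distance per hop."""
        if key == self.successor.id or between(key, self.id, self.successor.id):
            return self.successor
        for f in reversed(self.fingers):          # closest preceding finger
            if between(f.id, self.id, key):
                return f.find_successor(key)
        return self.successor.find_successor(key)

The point of the contrast with Kelips is not the lookup rule itself but what happens when the fingers and successors are corrupted by churn or partition, as the next slides show.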

Slide 17: Chord picture
(Figure: the Chord ring, with finger links and a cached link drawn between nodes.)

Slide 18: Chord picture
(Figure: the same ring during a transient network partition separating the European nodes from the USA nodes.)

Slide 19: ... so, who cares?
Chord lookups can fail, and Chord suffers from high overheads when nodes churn.
Loads surge just when things are already disrupted, quite often because of those loads, and we can't predict how long Chord might remain disrupted once it gets that way.
Worst case scenario: Chord can become inconsistent and stay that way.
The fine print: the scenario you have been shown is of low probability. In all likelihood, Chord would repair itself after any partitioning failure that might really arise. Caveat emptor and all that.

Slide 20: Saved by gossip!
Epidemic gossip: a remedy for what ails Chord! Cf. Epichord (Liskov), Bamboo.
Key insight: gossip-based DHTs, if correctly designed, are self-stabilizing!

Slide 21: Connection to self-stabilization
Self-stabilization theory: describe a system and a desired property; assume a failure in which the code remains correct but node states are corrupted; the proof obligation is to show that the property is reestablished within bounded time.
But this doesn't bound the badness while a transient disruption is occurring.

Slide 22: Beyond self-stabilization
Tardos poses a related problem: consider the behavior of the system while an endless sequence of disruptive events occurs, so the system never reaches a quiescent state. Under what conditions will it still behave correctly?
Results take the form "if the disruptions satisfy some condition, then the correctness property is continuously satisfied."
Hypothesis: with convergent consistency we may be able to develop a proof framework for systems that are continuously safe.

Slide 23: Let's look at a second example
The Astrolabe system uses a different emergent data structure: a tree.
Nodes are given an initial location; each knows its "leaf domain."
Inner nodes are elected using gossip and aggregation.

Slide 24: Astrolabe
Intended as help for applications adrift in a sea of information.
Structure emerges from a randomized gossip protocol.
This approach is robust and scalable even under stress that cripples traditional systems.
Developed at RNS, Cornell, by Robbert van Renesse, with many others helping.
Today used extensively within Amazon.com.

Slide 25: Astrolabe is a flexible monitoring overlay
(Figure: swift.cs.cornell.edu and cardinal.cs.cornell.edu each hold a table with one row per node (swift, falcon, cardinal) and columns Name, Time, Load, Weblogic?, SMTP?, Word Version.)
Periodically, pull data from the monitored systems.

Slide 26: Astrolabe in a single domain
Each node owns a single tuple, like a management information base (MIB).
Nodes discover one another through a simple broadcast scheme ("anyone out there?") and gossip about membership.
Nodes also keep replicas of one another's rows.
Periodically (uniformly at random), merge your state with someone else's.

Slide 27: State Merge: core of the Astrolabe epidemic
(Figure: swift.cs.cornell.edu and cardinal.cs.cornell.edu each hold the same three-row table, with columns Name, Time, Load, Weblogic?, SMTP?, Word Version, and are about to gossip.)

Slide 28: State Merge: core of the Astrolabe epidemic
(Figure: during the gossip exchange, swift sends cardinal its own row and cardinal sends swift its own row.)

Slide 29: State Merge: core of the Astrolabe epidemic
(Figure: after the merge, both nodes hold the freshest version of each row.)
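A minimal sketch of the merge step these three slides animate; the field names are illustrative, and the real Astrolabe merge also handles membership changes and aggregation.

def merge_states(local: dict, remote: dict) -> dict:
    """Merge two Astrolabe-style domain snapshots.

    Each snapshot maps a node name to its row, e.g.
    {"swift": {"time": 2011, "load": 2.0, "weblogic": False, "smtp": True, "word_version": 6.2}, ...}
    (the values here are made up). For every node, keep whichever row carries the
    newer timestamp, so repeated pairwise merges drive all replicas toward the
    freshest data.
    """
    merged = dict(local)
    for name, row in remote.items():
        if name not in merged or row["time"] > merged[name]["time"]:
            merged[name] = row
    return merged

# In the protocol, swift and cardinal would each call merge_states with the
# other's snapshot and both adopt the result.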

Slide 30: Observations
The merge protocol has constant cost: one message sent and one received (on average) per unit time.
The data changes slowly, so there is no need to run it quickly; we usually run it every five seconds or so.
Information spreads in O(log N) time, but this assumes bounded region size; in Astrolabe, regions are limited to a bounded number of rows.

Slide 31: Big systems
A big system could have many regions; it looks like a pile of spreadsheets.
A node only replicates data from its neighbors within its own region.

Slide 32: Scaling up... and up...
With a stack of domains, we don't want every system to "see" every domain; the cost would be huge.
So instead, we'll see a summary.
(Figure: many copies of the per-domain spreadsheet, one per monitored domain.)

Slide 33: Astrolabe builds a hierarchy using a P2P protocol that "assembles the puzzle" without any servers
(Figure: two leaf domains, San Francisco with rows swift, falcon, cardinal and New Jersey with rows gazelle, zebra, gnu, each a table with columns Name, Load, Weblogic?, SMTP?, Word Version; above them an inner domain table with one summary row per region (SF, NJ, Paris) and columns Name, Avg Load, WL contact, SMTP contact.)
An SQL query "summarizes" the data, and the dynamically changing query output is visible system-wide.
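The summary rows are produced by aggregation queries over the child tables. Astrolabe installs these as small SQL snippets; the sketch below only mimics one such query in Python, and the column and region names are illustrative.

def summarize_region(region_name, rows):
    """Compute one Astrolabe-style summary row for a leaf domain.

    rows is a list of dicts such as
    {"name": "swift", "load": 2.0, "weblogic": True, "smtp": True}
    (values made up). This mimics an aggregation query along the lines of
    SELECT AVG(load), ... FROM leaf_domain, producing the row that the
    elected representatives gossip at the next level up.
    """
    wl_hosts = [r["name"] for r in rows if r.get("weblogic")]
    smtp_hosts = [r["name"] for r in rows if r.get("smtp")]
    return {
        "name": region_name,
        "avg_load": sum(r["load"] for r in rows) / len(rows),
        "wl_contact": wl_hosts[0] if wl_hosts else None,
        "smtp_contact": smtp_hosts[0] if smtp_hosts else None,
    }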

Slide 34: Large scale: "fake" regions
These are computed by queries that summarize a whole region as a single row, and are gossiped in a read-only manner within a leaf region.
But who runs the gossip? Each region elects "k" members to run gossip at the next level up; we can play with the selection criteria and with "k."

Slide 35: Hierarchy is virtual... data is replicated
(Figure: the San Francisco and New Jersey leaf tables beneath the shared inner domain table.)
The yellow leaf node "sees" its neighbors and the domains on the path to the root.
Falcon runs the level-2 epidemic for its region because it has the lowest load; gnu runs the level-2 epidemic for its region for the same reason.

Slide 36: Hierarchy is virtual... data is replicated
(Figure: the same hierarchy, viewed from a node in the other leaf domain.)
The green node sees a different leaf domain but has a consistent view of the inner domain.

Slide 37: Worst case load?
A small number of nodes end up participating in O(log_fanout(N)) epidemics; here the fanout is something like 50.
In each epidemic, a message is sent and received roughly every 5 seconds.
We limit message size, so even during periods of turbulence no message can become huge.
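To put numbers on this, a back-of-the-envelope calculation (the one-million-node figure is an illustrative choice, not from the slides): with fanout $f = 50$ and $N = 10^6$ nodes, the hierarchy has roughly
$$\log_f N \;=\; \frac{\ln N}{\ln f} \;=\; \frac{\ln 10^6}{\ln 50} \;\approx\; 3.5$$
levels, so even a node elected as a representative at every level participates in only three or four concurrent epidemics, each costing about one bounded-size message sent and received per 5-second gossip round.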

Slide 38: Self-stabilization?
Like Kelips, it seems that Astrolabe is convergently consistent and self-stabilizing, and would "ride out" a large class of possible failures.
But Astrolabe would be disrupted by incorrect aggregation (Byzantine faults) or by the correlated failure of all representatives of some portion of the tree.

Slide 39: Focus on emergent shape
Kelips: nodes start with an a priori assignment to affinity groups and end up with a superimposed pointer structure.
Astrolabe: nodes start with a priori leaf-domain assignments and build the tree.
What other data structures can be constructed with emergent protocols?

Slide 40: Emergent shape
We know a lot about a related question: given a connected graph and a cost function, with nodes of bounded degree, use a gossip protocol to swap links until some desired graph emerges.
Another related question: given a gossip overlay, improve it by selecting "better" links (usually, lower RTT).
Example: the "Anthill" framework of Alberto Montresor, Ozalp Babaoglu, Hein Meling and Francesco Russo.

Slide 41: Example of an open problem
Given a description of a data structure (for example, a balanced tree), design a gossip protocol such that the system will rapidly converge towards that structure even if disrupted.
Do it with bounded per-node message rates and sizes (network load is less important).
Use aggregation to test tree quality?

Slide 42: Van Renesse's dreadful aggregation tree
(Figure: an aggregation tree over leaves A through P, with inner levels run by A, C, E, G, I, K, M, O, then by B, F, J, N, then by D, L.)
An event e occurs at H; G gossips with H and learns e; but P learns about e only O(N) time units later!

Slide 43: What went wrong?
In Robbert's horrendous tree, each node has an equal amount of "work to do," but the information-space diameter is larger!
Astrolabe benefits from "instant" knowledge because the epidemic at each level is run by someone elected from the level below.

Slide 44: Insight: two kinds of shape
We've focused on the aggregation tree, but in fact we should also think about the information-flow tree.

Slide 45: Information space perspective
Bad aggregation graph: diameter O(n); information flows along the chain H – G – E – F – B – A – C – D – L – K – I – J – N – M – O – P.
Astrolabe version: diameter O(log(n)); information flows up from the leaf pairs A – B, C – D, E – F, G – H, I – J, K – L, M – N, O – P.

Slide 46: Gossip and bias
Often useful to "bias" gossip, particularly if some links are fast and others are very slow.
(Figure: two clusters of nodes, A, B, C, D, E, F and X, Y, Z, joined by one slow link; with uniform peer selection, roughly half the gossip will cross this link!)
Demers shows how to adjust the probabilities to even out the load; Xiao later showed that one must also fine-tune the gossip rate.
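A minimal sketch of one way to bias target selection so that only a small fraction of gossip rounds cross the slow link; the exact weighting Demers et al. derive is more refined, and same_site, cross_site_prob and the 0.1 default are illustrative assumptions.

import random

def pick_gossip_target(me, peers, same_site, cross_site_prob=0.1):
    """Pick a gossip partner, crossing the slow inter-site link only occasionally.

    peers is the full membership list, same_site(a, b) says whether two nodes
    share a site, and cross_site_prob caps the fraction of rounds that use the
    slow link (0.1 is an arbitrary illustrative value).
    """
    local = [p for p in peers if p != me and same_site(me, p)]
    remote = [p for p in peers if p != me and not same_site(me, p)]
    if remote and (not local or random.random() < cross_site_prob):
        return random.choice(remote)   # rare cross-site round keeps the sites connected
    return random.choice(local)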

Slide 47: Gossip and bias
Idea: "shaped" gossip probabilities.
Gravitational Gossip (Jenkins): groups carry multicast event streams reporting the evolution of a continuous function (like electric power loads).
Some nodes want to watch "closely" while others only need part of the data.

Slide 48: Gravitational Gossip
(Figure: concentric rings of subscribers, with nodes a, b and c at different distances. Inner ring: nodes want 100% of the traffic. Middle ring: nodes want 75% of the traffic. Outer ring: nodes want 20% of the traffic.)
Jenkins: when a gossips to b, it includes information about topic t in a way weighted by b's level of interest in topic t.
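A minimal sketch of that weighting rule; the data structures and the per-topic interest levels are illustrative, and Jenkins' actual protocol also tunes gossip rates per ring.

import random

def build_gossip_payload(pending_updates, receiver_interest):
    """Select which updates to include when gossiping to one receiver.

    pending_updates maps topic -> list of recent events; receiver_interest
    maps topic -> desired fraction of the traffic (1.0, 0.75, 0.2, ...).
    Each event on topic t is included with probability equal to the receiver's
    interest in t, so low-interest nodes receive a thinned-out stream.
    """
    payload = {}
    for topic, events in pending_updates.items():
        w = receiver_interest.get(topic, 0.0)
        chosen = [e for e in events if random.random() < w]
        if chosen:
            payload[topic] = chosen
    return payload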

Slide 49: Gravitational Gossip
(Figure-only slide.)

Slide 50: How does bias impact the information-flow graph?
Earlier, all links were the same. Now, some links carry less information and may have longer delays.
(Bounded-capacity messages: "similar" to long links?)
Question: model biased information-flow graphs and explore the implications.

Slide 51: Questions about bias
When does biasing of gossip target selection break the analytic results?
Example: Alves and Hopcroft show that with too small a fanout, gossip epidemics can die out, logically partitioning a system.
Question: can we relate this question to flooding on an expander graph?

Slide 52: Can an overlay learn its shape?
This is the unifying link across the many examples we've cited, and it is needed in real-world overlays when they adapt to the underlying topology, for example to pick "super-peers."
Hints of a theory of adaptive tuning? Related to "sketches" in databases.

Slide 53: Can an overlay learn its shape?
Suppose we wanted to use gossip to implement decentralized balanced overlay trees (nodes "know" their locations).
Build some sort of overlay, randomly select initial upper-level nodes, then jiggle until they are nicely spaced.
This is an instance of the "facility location" problem.

Slide 54: Emergent landmarks
(Figure-only slide.)

Slide 55: Emergent landmarks
(Figure: one region with too many landmarks and another with too few.)

Slide 56: Emergent landmarks
(Figure-only slide.)

Slide 57: Emergent landmarks
(Figure-only slide.)

Slide 58: Emergent landmarks
(Figure: one landmark per cluster, placed at the centroid.)

Slide 59: 2-D scenario: gossip variant of Euclidean k-medians (a popular STOC topic)
Nodes know their position on a line and can gossip.
In log(N) time, seek k equally spaced nodes; later, after a disruption, reconverge in log(N) time.
NB: the facility placement problem is just the d-dimensional k-centroids problem.
(Figure: an example with K=3.)
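Before looking for a gossip protocol, it helps to state the centralized 1-D target that slide 61 suggests starting from. A minimal sketch follows, where "equally spaced" is read as one landmark per k-th quantile of the node positions; that interpretation and the helper names are mine, not spelled out on the slide.

def centralized_landmarks(positions, k):
    """Pick k nodes spread evenly along a line of node positions.

    Sort the positions and take the node nearest the centre of each of k
    equal-population segments; a gossip protocol would have to approximate
    this choice using only local exchanges and bounded-size messages.
    """
    pts = sorted(positions)
    n = len(pts)
    landmarks = []
    for i in range(k):
        target_rank = int((i + 0.5) * n / k)   # middle of the i-th segment
        landmarks.append(pts[min(target_rank, n - 1)])
    return landmarks

# Example: 10 nodes on a line, k = 3 roughly equally spaced landmarks.
print(centralized_landmarks([1, 2, 3, 5, 8, 13, 21, 34, 55, 89], 3))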

Slide 60: Why is this hard?
After k rounds, each node has only communicated with 2k other nodes.
In log(N) time, aggregate information has time to transit the whole graph, but with bounded bandwidth we don't have time to send all the data.

Slide 61: Emergent shape problem
Can a gossip system learn its own shape in a convergently consistent way?
Start with centralized solutions drawn from the optimization community. Then seek decentralized gossip variants that achieve convergently consistent approximations, (re)converge in log(N) rounds, and use bounded bandwidth.

Slide 62: ... more questions (optional content, time permitting)
Notice that Astrolabe forces participants to agree on what the aggregation hierarchy should contain; in effect, we need to "share interest" in the aggregation hierarchy.
This allows us to bound the size of messages (expected constant) and the rate (expected constant per epidemic).

Slide 63: The question (optional content, time permitting)
Could we design a gossip-based system for "self-centered" state monitoring?
Each node poses a query, Astrolabe style, on the state of the system; we dynamically construct an overlay for each of these queries; the "system" is the union of these overlays.

Slide 64: Self-centered monitoring (optional content, time permitting)
(Figure-only slide.)

Slide 65: Self-centered queries... (optional content, time permitting)
Offhand, this looks like a bad idea: if everyone has an independent query, and everyone is i.i.d. in all the obvious ways, then everyone must invest work proportional to the number of nodes monitored by each query.
If O(n) queries each touch O(n) nodes, per-node workload grows as O(n) and the global load as O(n^2).

Slide 66: Aggregation (optional content, time permitting)
... but in practice, it seems unlikely that queries would look this way. More plausible is something Zipf-like: a few queries look at the broad state of the system, most look at relatively few nodes, and a small set of aggregates might be shared by the majority of queries.
Assuming this is so, can one build a scalable gossip overlay / monitoring infrastructure?

Slide 67: Summary of possible topics
Consistency models; necessary conditions for convergent consistency; self-stabilization; implications of bias; emergent structures; self-centered aggregation; bandwidth-limited systems and "sketches"; using aggregation to fine-tune a shape with constant costs.