Scalable Self-Repairing Publish/Subscribe
Robbert van Renesse, Ken Birman, Werner Vogels
Cornell University

Background
ISIS, Horus, Ensemble systems
  – Strong properties (for replicated data)
  – Adaptive (changing network/app behavior)
Problems…
  – as fast as slowest receiver
  – “Jim Gray effect”
  – no IP Multicast

New Direction
Probabilistically Strong Guarantees
  – Randomized protocols
Compartmentalization
No reliance on IP multicast, clock sync
Auto-configuration, self-repair
→ JBI

Three Main Components
Astrolabe
  – Aggregation Service
SelectCast
  – Dissemination Service
Bimodal Multicast
  – End-to-end reliability

Aggregation
The ability to summarize information from distributed sources.
Also known as data fusion in sensor networks.
The basis for scalability!
A standard service in databases. Why not in distributed systems?

Examples
Barrier Synchronization
Voting
Resource Location
Multicast Routing

Astrolabe
Astrolabe takes continuous snapshots of the global state of a distributed system, and aggregates this information in user-specified ways.

Four Design Principles
Scalability through Hierarchy
Flexibility through Mobile SQL
Robustness through p2p Gossip
Security through Certificates

DNS-like Domain Hierarchy
Attribute list
Domains identified by path names

MIB
Each domain has an attribute list called a “MIB” (management information base).
The MIBs of internal domains are generated by aggregating their child domains’ MIBs.

Domain Table
No servers for any domain: a MIB is replicated on all hosts in its domain!
Each host maintains not only the MIBs of its own domains, but also those of its sibling domains.
Sibling MIBs are organized in “domain tables”.

Domain Table Example
[Table: columns ID, CONTACTS, ISSUED, NMEMBERS, MIN(LOAD); four domain rows whose cell values are not recoverable from the transcript]

Aggregation
[Figure: two leaf tables, Domain1 (hosts swift, falcon, cardinal) and Domain2 (hosts gazelle, zebra, gnu), each with columns id, Load, Weblogic?, SMTP?, Word Version, …; they feed a parent table with columns id, Min Load, WL contact, SMTP contact and one row per domain. Cell values are not recoverable from the transcript.]
SQL query “summarizes” data
Dynamically changing query output is visible domain-wide (like spreadsheet)

Example queries
  – SELECT SUM(nmembers) AS nmembers
  – SELECT MAX(depth) + 1 AS depth
  – SELECT MIN(minl) AS minl (minimum load)
  – …
Aggregation functions are gossiped along with everything else.
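A minimal sketch of what the three example queries compute, assuming each child domain's MIB can be modeled as a plain Python dict. The function name `aggregate`, the dict layout, and the sample values are illustrative assumptions, not part of Astrolabe's actual interface.

```python
# Hypothetical model: a MIB is a dict of attribute name -> value.

def aggregate(child_mibs):
    """Build a parent domain's MIB from the MIBs of its child domains.

    Mirrors the example queries:
      SELECT SUM(nmembers) AS nmembers
      SELECT MAX(depth) + 1 AS depth
      SELECT MIN(minl) AS minl
    """
    return {
        "nmembers": sum(m["nmembers"] for m in child_mibs),
        "depth": max(m["depth"] for m in child_mibs) + 1,
        "minl": min(m["minl"] for m in child_mibs),
    }

# Two child domains with made-up values (not taken from the slides).
children = [
    {"nmembers": 3, "depth": 1, "minl": 1.5},
    {"nmembers": 3, "depth": 1, "minl": 0.3},
]
parent = aggregate(children)
print(parent)
```

Because each query only combines the children's already-summarized values, the same function can be applied again at every level of the hierarchy.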

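Aggregation
[Figure: two leaf tables, Domain1 (hosts swift, falcon, cardinal) and Domain2 (hosts gazelle, zebra, gnu), each with columns Name, Load, Weblogic?, SMTP?, Word Version, …; they feed a parent table with columns Name, Avg Load, WL contact, SMTP contact and rows SF, NJ, Paris. Cell values are not recoverable from the transcript.]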

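Aggregation
[Figure: two leaf tables, Domain1 (hosts swift, falcon, cardinal) and Domain2 (hosts gazelle, zebra, gnu), each with columns Name, Load, Weblogic?, SMTP?, Word Version, …; they feed a parent table with columns Name, Avg Load, WL contact, SMTP contact and rows SF, NJ, Paris. Cell values are not recoverable from the transcript.]
O(log n) info per host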

Other Examples
1. Which are the three lowest loaded hosts?
2. Which domains contain hosts with an out-of-date virus database?
3. Do >30% of hosts measure elevated radiation?
4. Which domains contain subscribers interested in some topic?
5. Where is the nearest logging server?
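The first question above can be answered hierarchically: each domain keeps only its three lowest-loaded hosts, and a parent merges its children's short lists. The sketch below assumes that representation; the function name `lowest_loaded`, the `(load, host)` tuple layout, and the load values are illustrative, with host names borrowed from the aggregation slides.

```python
import heapq

def lowest_loaded(child_lists, k=3):
    """Merge per-child lists of (load, host) pairs and keep the k smallest."""
    merged = (pair for child in child_lists for pair in child)
    return heapq.nsmallest(k, merged)

# Made-up loads; the real values are not in the transcript.
domain1 = [(0.4, "swift"), (1.2, "falcon"), (3.0, "cardinal")]
domain2 = [(0.1, "gazelle"), (0.9, "zebra"), (2.2, "gnu")]
top3 = lowest_loaded([domain1, domain2])
print(top3)
```

Each level forwards at most `k` entries upward, so the amount of state per domain stays constant no matter how many hosts sit below it.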

Epidemic or Gossip Protocols
Used to keep domain tables up-to-date
Randomized communication between (nearby) hosts:
  – Fast (latency grows O(log n))
  – Hard to stop (robust even in the face of Denial-of-Service attacks)
  – Probabilistically Real-Time guarantees on latency (based on epidemiological analysis)
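The logarithmic latency claim can be explored with a small simulation. This is a simplified push-gossip model, assuming every informed host contacts one uniformly random host per round; it ignores the locality ("nearby" hosts) and hierarchy that the slides describe, and `gossip_rounds` is a hypothetical name.

```python
import random

def gossip_rounds(n, seed=0):
    """Count rounds until an update known to one host reaches all n hosts."""
    rng = random.Random(seed)
    informed = {0}  # host 0 starts with the update
    rounds = 0
    while len(informed) < n:
        # Each informed host pushes the update to one random host.
        targets = {rng.randrange(n) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds

for n in (16, 256, 4096):
    print(n, gossip_rounds(n))
```

Since the informed set can at most double per round, at least log2(n) rounds are needed; in this model the observed counts typically stay within a small multiple of that.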

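How it works…
[Figure: the domain tables of two hosts, each with columns ID, CONTACTS, ISSUED, NMEMBERS, MIN(LOAD), connected by arrows labeled “gossip” and “SQL”; one table lists three domains, the other lists domA, domB, domC, domD. Cell values are not recoverable from the transcript.]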

SelectCast
Disseminate messages through Astrolabe hierarchy
(Application-level) Routers selected through domain aggregation:
  SELECT FIRST(3, routers) AS routers, MIN(minload) AS minload ORDER BY minload
Exploit heterogeneity, don’t hide it!
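One plausible reading of the router-selection query, sketched under assumptions: child rows are ordered by `minload`, their `routers` lists are concatenated in that order, and the first three entries become the parent's routers. The exact semantics of `FIRST` in Astrolabe's SQL dialect are not given in the slides, and the row layout and router names here are hypothetical.

```python
def select_routers(children, k=3):
    """Pick up to k routers for a domain, preferring lightly loaded children."""
    ordered = sorted(children, key=lambda row: row["minload"])
    routers = [r for row in ordered for r in row["routers"]][:k]
    return {"routers": routers, "minload": ordered[0]["minload"]}

# Made-up child rows for illustration.
children = [
    {"routers": ["a1", "a2"], "minload": 0.9},
    {"routers": ["b1"], "minload": 0.2},
    {"routers": ["c1", "c2"], "minload": 0.5},
]
selected = select_routers(children)
print(selected)
```

Ordering by load before selecting is what lets the hierarchy favor capable hosts as forwarders instead of treating all hosts alike.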

Multicast Tree

Fault Masking

Filtering (Pub/Sub)
SQL condition on each message
For example:
  – MIN(version) < 3
  – MAX(radiation) > 300
  – OR(subject) // BLOOM FILTERS
  – TRUE
Generalization of topic-based publishing
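The `OR(subject)` condition can be pictured as follows: each host encodes its subscriptions as a Bloom-filter bit mask, domains aggregate masks with bitwise OR, and a message is forwarded into a domain only if all bits for its subject are set. This is a sketch under assumptions; the filter width, hash count, hashing scheme, and function names are arbitrary choices made for illustration.

```python
import hashlib

BITS = 64    # filter width (assumed)
HASHES = 2   # hash functions per topic (assumed)

def topic_bits(topic):
    """Bit mask with HASHES positions set for the given topic."""
    mask = 0
    for i in range(HASHES):
        digest = hashlib.sha256(f"{i}:{topic}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return mask

def host_filter(topics):
    """Bloom filter covering all topics a host subscribes to."""
    mask = 0
    for topic in topics:
        mask |= topic_bits(topic)
    return mask

def domain_filter(child_filters):
    """Aggregate child filters with bitwise OR, as OR(subject) would."""
    mask = 0
    for f in child_filters:
        mask |= f
    return mask

def may_have_subscriber(domain_mask, topic):
    """True if the domain might contain a subscriber (false positives possible)."""
    bits = topic_bits(topic)
    return (domain_mask & bits) == bits

domain = domain_filter([host_filter(["radiation"]), host_filter(["virus-db"])])
print(may_have_subscriber(domain, "radiation"))
```

A Bloom filter never produces false negatives, so no subscriber is missed; occasional false positives only cost an unnecessary forward.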

Filtering Example

Scalability
Latency, memory use, CPU load, and load on network links all grow as O(log N) and are independent of the update rate.
Highly robust to omission and crash failures.
Confirmed by analysis, simulation, and experiment.
O(1) lookup for most useful queries.

Emulab topology (U. Utah)

Experiments

Real vs. Simulation
[Figure panels: The real thing; Simulation]

Membership
Domain failure detected when its attributes are no longer being updated.
Domains discovered (and partitions repaired) through
  – gossip
  – occasional broadcast and multicast
  – configuration
Special precautions for domains separated by firewalls and NAT boxes
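A sketch of staleness-based failure detection as described above, under assumptions: each row carries an `issued` version from its source, a host records its own local time whenever a newer version arrives, and a row that has not been refreshed within a timeout is dropped. The class name `DomainTable`, the timeout value, and the domain path names are hypothetical.

```python
class DomainTable:
    """Tracks sibling domains and expires those that stop being updated."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.rows = {}  # domain name -> (issued, local time of last refresh)

    def merge(self, name, issued, local_now):
        """Accept a gossiped row only if it is newer than the one we hold."""
        current = self.rows.get(name)
        if current is None or issued > current[0]:
            self.rows[name] = (issued, local_now)

    def expire(self, local_now):
        """Remove and return domains whose rows have gone stale."""
        dead = [name for name, (_, seen) in self.rows.items()
                if local_now - seen > self.timeout]
        for name in dead:
            del self.rows[name]
        return dead

table = DomainTable(timeout=10)
table.merge("/usa", issued=5, local_now=100)
table.merge("/asia", issued=7, local_now=100)
table.merge("/usa", issued=6, local_now=108)  # /usa is still being updated
print(table.expire(local_now=111))
```

Measuring staleness against the local clock keeps the check independent of clock synchronization between hosts.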

Security
Integrated PKI
  – integrity, no confidentiality
  – prevents “Sybil” Attacks
Remove outliers
  – Summarize in a robust way
Compartmentalize
  – Exploit domain hierarchy

Bimodal Multicast
Probabilistic end-to-end reliability
Uses IP Multicast or SelectCast for initial dissemination
Runs a background gossip protocol to do repairs of message loss
Performance improves with scale
  – share buffering load
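The background repair step can be sketched as a pairwise exchange: two hosts compare which message ids they hold and each fills in what it is missing. This is a deliberately simplified model; the function name `gossip_repair` and the dict-based buffers are assumptions, and the actual protocol's digests, retransmission requests, and buffer garbage collection are omitted.

```python
def gossip_repair(a, b):
    """Reconcile two message buffers (msg_id -> payload) in place.

    Returns the number of messages that had to be retransmitted.
    """
    missing_in_a = b.keys() - a.keys()
    missing_in_b = a.keys() - b.keys()
    for msg_id in missing_in_a:
        a[msg_id] = b[msg_id]
    for msg_id in missing_in_b:
        b[msg_id] = a[msg_id]
    return len(missing_in_a) + len(missing_in_b)

# Host A lost message 3 and host B lost message 1 during initial dissemination.
host_a = {1: "m1", 2: "m2"}
host_b = {2: "m2", 3: "m3"}
repaired = gossip_repair(host_a, host_b)
print(repaired, sorted(host_a), sorted(host_b))
```

Because any peer holding a message can repair any other, buffering is spread across the group rather than concentrated at the sender.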

Work in Progress
Evaluate Scalability and Performance
  – emulation, simulation, deployment
Improve support for low power apps
  – self configuration
Improve expressiveness
  – pattern matching