Scalable Group Communication for the Internet Idit Keidar MIT Lab for Computer Science Theory of Distributed Systems Group.

Slides:



Advertisements
Similar presentations
Distributed Processing, Client/Server and Clusters
Advertisements

High Speed Total Order for SAN infrastructure Tal Anker, Danny Dolev, Gregory Greenman, Ilya Shnaiderman School of Engineering and Computer Science The.
Replication. Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
Data-Centric Reconfiguration with Network-Attached Disks Alex Shraer (Technion) Joint work with: J.P. Martin, D. Malkhi, M. K. Aguilera (MSR) I. Keidar.
Lab 2 Group Communication Andreas Larsson
Distributed Processing, Client/Server, and Clusters
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
A Dependable Auction System: Architecture and an Implementation Framework
Chapter 16 Client/Server Computing Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,
Virtual Synchrony Jared Cantwell. Review Multicast Causal and total ordering Consistent Cuts Synchronized clocks Impossibility of consensus Distributed.
Algorithm for Virtually Synchronous Group Communication Idit Keidar, Roger Khazan MIT Lab for Computer Science Theory of Distributed Systems Group.
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
Transis Efficient Message Ordering in Dynamic Networks PODC 1996 talk slides Idit Keidar and Danny Dolev The Hebrew University Transis Project.
2/18/2004 Challenges in Building Internet Services February 18, 2004.
Software Engineering and Middleware: a Roadmap by Wolfgang Emmerich Ebru Dincel Sahitya Gupta.
Group Communication Phuong Hoai Ha & Yi Zhang Introduction to Lab. assignments March 24 th, 2004.
Filterfresh Fault-tolerant Java Servers Through Active Replication Arash Baratloo
1 On-Demand Multicast Routing and Its Applications.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
Scalable Group Communication for the Internet Idit Keidar MIT Lab for Computer Science Theory of Distributed Systems Group The main part of this talk is.
Abstractions for Fault-Tolerant Distributed Computing Idit Keidar MIT LCS.
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Grids and Grid Technologies for Wide-Area Distributed Computing Mark Baker, Rajkumar Buyya and Domenico Laforenza.
Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman.
1 Idit Keidar MIT Lab for Computer Science Theory of Distributed Systems Group Paradigms for Building Distributed Systems: Performance Measurements and.
1 A Framework for Highly Available Services Based on Group Communication Alan Fekete Idit Keidar University of Sidney MIT.
Optimistic Virtual Synchrony Jeremy Sussman - IBM T.J.Watson Idit Keidar – MIT LCS Keith Marzullo – UCSD CS Dept.
Transis 1 Fault Tolerant Video-On-Demand Services Tal Anker, Danny Dolev, Idit Keidar, The Transis Project.
Lab 1 Bulletin Board System Farnaz Moradi Based on slides by Andreas Larsson 2012.
Evaluating the Running Time of a Communication Round over the Internet Omar Bakr Idit Keidar MIT MIT/Technion PODC 2002.
Version 4.0. Objectives Describe how networks impact our daily lives. Describe the role of data networking in the human network. Identify the key components.
1 System Models. 2 Outline Introduction Architectural models Fundamental models Guideline.
ARMADA Middleware and Communication Services T. ABDELZAHER, M. BJORKLUND, S. DAWSON, W.-C. FENG, F. JAHANIAN, S. JOHNSON, P. MARRON, A. MEHRA, T. MITTON,
TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM L. E. Moser, P. M. Melliar Smith, D. A. Agarwal, B. K. Budhia C. A. Lingley-Papadopoulos University.
SPREAD TOOLKIT High performance messaging middleware Presented by Sayantam Dey Vipin Mehta.
Lab 2 Group Communication Farnaz Moradi Based on slides by Andreas Larsson 2012.
Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of Computer Science McGill University, Montreal.
Objectives Functionalities and services Architecture and software technologies Potential Applications –Link to research problems.
Farnaz Moradi Based on slides by Andreas Larsson 2013.
Farnaz Moradi Based on slides by Andreas Larsson 2013.
Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
1 MMORPG Servers. 2 MMORPGs Features Avatar Avatar Levels Levels RPG Elements RPG Elements Mission Mission Chatting Chatting Society & Community Society.
Dealing with open groups The view of a process is its current knowledge of the membership. It is important that all processes have identical views. Inconsistent.
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
November NC state university Group Communication Specifications Gregory V Chockler, Idit Keidar, Roman Vitenberg Presented by – Jyothish S Varma.
December 4, 2002 CDS&N Lab., ICU Dukyun Nam The implementation of video distribution application using mobile group communication ICE 798 Wireless Mobile.
CHAPTER 7 CLUSTERING SERVERS. CLUSTERING TYPES There are 2 types of clustering ; Server clusters Network Load Balancing (NLB) The difference between the.
Introduction to Networking
1 Communication and Data Management in Dynamic Distributed Systems Nancy Lynch MIT June 20, 2002 …
PROCESS RESILIENCE By Ravalika Pola. outline: Process Resilience  Design Issues  Failure Masking and Replication  Agreement in Faulty Systems  Failure.
Fault Tolerance (2). Topics r Reliable Group Communication.
1 Reliable Group Communication: a Mathematical Approach Nancy Lynch Theory of Distributed Systems MIT LCS Kansai chapter, IEEE July 7, 2000 GC …
NTT - MIT Research Collaboration — Bi-Annual Report, July 1—December 31, 1999 MIT : Cooperative Computing in Dynamic Environments Nancy Lynch, Idit.
1 Compositional Design and Analysis of Timing-Based Distributed Algorithms Nancy Lynch Theory of Distributed Systems MIT Third MURI Workshop Washington,
Chapter 8 Fault Tolerance. Outline Introductions –Concepts –Failure models –Redundancy Process resilience –Groups and failure masking –Distributed agreement.
Context-Aware Middleware for Resource Management in the Wireless Internet US Lab 신현정.
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.
Replication & Fault Tolerance CONARD JAMES B. FARAON
Algorithm for Virtually Synchronous Group Communication
Network Load Balancing
Replication Middleware for Cloud Based Storage Service
CLUSTER COMPUTING.
Active replication for fault tolerance
Group Service in CORBA Xing Gang Supervisor: Prof. Michael R. Lyu
In-network computation
Presentation transcript:

Scalable Group Communication for the Internet Idit Keidar MIT Lab for Computer Science Theory of Distributed Systems Group

Modern Distributed Applications (in WANs) Highly available servers –Web –Video-on-Demand Collaborative computing –Shared white-board, shared editor, etc. –Military command and control –On-line strategy games Stock market

Important Issues in Building Distributed Applications Consistency of view –Same picture of game, same shared file Fault tolerance, high availability Performance –Conflicts with consistency? Scalability –Topology - WAN, long unpredictable delays –Number of participants

Generic Primitives - Middleware, “Building Blocks” E.g., total order, group communication Abstract away difficulties, e.g., –Total order - a basis for replication –Mask failures Important issues: –Well specified semantics - complete –Performance

Research Approach Rigorous modeling, specification, proofs, performance analysis Implementation and performance tuning Services  Applications Specific examples  General observations

G Send(G) Group Communication Group abstraction - a group of processes is one logical entity Dynamic Groups (join, leave, crash) Systems: Ensemble, Horus, ISIS, Newtop, Psync, Sphynx, Relacs, RMP, Totem, Transis

Example: Highly Available VoD [ Anker, Dolev, Keidar ICDCS1999] Dynamic set of servers Clients talk to “abstract” service Server can crash, client shouldn’t know

VoD Service: Exploiting Group Communication Group abstraction for connection establishment and transparent migration (with simple clients) Membership services detect conditions for migration - fault tolerance and load balancing Reliable group multicast among servers for consistently sharing information Reliable messages for control Server: ~2500 C++ lines –All fault tolerance logic at server

A Scalable Architecture for Group Membership in WANs [ Anker, Chockler, Dolev, Keidar] Dedicated distributed membership servers “divide and conquer” –Servers involved only in membership changes –Members communicate with each other directly (implement “virtual synchrony”) Two levels of membership –Notification Service NSView - “who is around” –Agreed membership views

Architecture NSView: "Who is around" failure/join/leave Agreed View: Members set and identifier Notification Service (NS) Membership {A,B,C,D,E},7 Notification Service (NS) Membership {A,B,C,D,E},7

Moshe: A Group Membership Algorithm for WANs Idit Keidar, Jeremy Sussman Keith Marzullo, Danny Dolev ICDCS 2000

Membership in WAN: the Challenge Message latency is large and unpredictable Frequent message loss è Time-out failure detection is inaccurate è We use a notification service (NS) for WANs è Number of communication rounds matters è Algorithms may change views frequently è View changes require communication for state transfer, which is costly in WAN

Moshe Designed for WANs from the ground up –Previous systems emerged from LAN Avoids delivery of “obsolete” views –Views that are known to be changing –Not always terminating (but NS is) Runs in a single round (“typically”)

Experimenting with Moshe Run over the Internet –In the US: MIT, Cornell (CU), UCSD –In Taiwan: NTU –In Israel: HUJI Run for 10 days (~10,000 times) in one configuration, 2.5 days in another 10 clients at each location –continuously join/leave 10 groups

Moshe Features Avoiding obsolete views A single round –98% of the time in one configuration –99.8% of the time in another Using a notification service for WANs –Good abstraction –Flexibility to configure multiple ways –Future work: configure more ways Scalable “divide and conquer” architecture

Retrospective: Role of Theory Specification –Possible to implement –Useful for applications (composable) Specification can be met in one round “typically” (unlike Consensus) Correctness proof exposes subtleties –Need to avoid live-lock –Two types of detection mechanisms needed

Future Work: The QoS Challenge Some distributed applications require QoS –Guaranteed available bandwidth –Bounded delay, bounded jitter Membership algorithm terminates in one round under certain circumstances –Can we leverage on that to guarantee QoS under certain assumptions? Can other primitives guarantee QoS?