Slingshot: Time-Critical Multicast for Clustered Applications
Mahesh Balakrishnan, Stefan Pleisch, Ken Birman
Cornell University

The Contemporary Datacenter
  Building-wide super-clusters: 1000s of commodity blade-servers
  Typically used as commercial website back-ends: Amazon, etc.
  Software paradigms: SOA, Eventing, Publish/Subscribe…
  … many-to-many communication: Multicast!

Multicast in the Datacenter
  IP Multicast available: adding reliability to it is a well-researched technology…
  Scalability dimensions
    Number of receivers
    Number of senders?
    Number of groups?
  Metrics
    Throughput
    Timeliness?

Time-Critical Applications
  … dealing in perishable data: stock quotes, location updates
  … willing to trade complete reliability for timeliness
  … requiring tunable reliability/timeliness/overhead tradeoffs
  Probabilistic guarantee of timeliness?
    For x% overhead, y% of lost packets are recovered in time t.
    The remainder can optionally be recovered in time t’.

Design Space
  Reactive vs. Proactive
  Reactive: loss discovery
    ACK
    Sender-based sequencing: if the multicast rate in a group is constant, the inter-multicast time at any sender goes up linearly with the number of senders
    Gossip: scalable
  Proactive: FEC, tunable
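The sender-count effect on this slide can be checked with a little arithmetic. The following is a minimal sketch, assuming the group's aggregate rate is split evenly across senders; the function name is illustrative and does not come from the slides.

```python
def inter_multicast_time_ms(group_rate_pps: float, num_senders: int) -> float:
    """Average gap between consecutive multicasts from one sender, in ms.

    Assumption: the fixed group rate is shared equally by all senders.
    """
    per_sender_rate_pps = group_rate_pps / num_senders
    return 1000.0 / per_sender_rate_pps


# With the group rate held at 1000 packets/s, the per-sender gap grows
# linearly with the number of senders.
for n in (1, 8, 64):
    print(n, inter_multicast_time_ms(1000, n))
```

Under sender-based sequencing a receiver notices a gap only when the next packet from the same sender arrives, so this per-sender interval bounds how quickly a loss can be discovered.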

Slingshot Overview
  Receiver-based FEC: senders send initially via unreliable IP Multicast
  Phase 1: receivers repair losses by proactively sending each other FEC repair packets
  Phase 2: remaining losses are recovered from the sender
  Each receiver builds an error correction (XOR) packet from the last r packets it received and sends it to c randomly selected receivers
  Rate-of-fire parameter (r, c): allows tuning of the overhead-timeliness tradeoff
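One way to read the rate-of-fire parameter: a receiver sends c repair packets for every r data packets it receives, so its repair traffic is roughly c/r of its incoming data traffic. The sketch below only encodes that ratio. It is an interpretation of the slide, not a formula stated on it, and the (r, c) values used are hypothetical.

```python
def repair_overhead(r: int, c: int) -> float:
    """Repair packets sent per data packet received under rate-of-fire (r, c).

    Assumption: one repair packet is built per r received data packets and
    one copy goes to each of c receivers.
    """
    if r <= 0 or c < 0:
        raise ValueError("r must be positive and c non-negative")
    return c / r


def repair_send_rate_pps(incoming_data_pps: float, r: int, c: int) -> float:
    """Repair packets per second a single receiver sends."""
    return incoming_data_pps * repair_overhead(r, c)


# Hypothetical setting: (r, c) = (8, 2) with 1000 data packets/s arriving.
print(repair_overhead(8, 2))             # fraction of extra traffic
print(repair_send_rate_pps(1000, 8, 2))  # repair packets per second
```

Raising c or lowering r adds overhead but gives each lost packet more, and earlier, chances of being covered by a repair packet.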

Protocol Details 0
  Two packet types:
    Data packet:
      Packet ID (Sender, SeqNo)
      Application payload
    Repair packet:
      XOR of data packets
      List of data packet IDs: (sender1,seqno1), (sender2,seqno2)…
  Application MTU: 1024, less than the network MTU
  Terminology: data packets are "included" in a repair packet
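The two packet types can be sketched as plain records. This is a minimal illustration rather than the authors' wire format: the class and helper names are hypothetical, the application MTU is assumed to be in bytes, and payloads are assumed to be zero-padded to that size before XORing.

```python
from dataclasses import dataclass
from typing import List, Tuple

PacketID = Tuple[str, int]   # (sender, seqno), as on the slide
APP_MTU = 1024               # from the slide; unit assumed to be bytes


@dataclass(frozen=True)
class DataPacket:
    packet_id: PacketID
    payload: bytes


@dataclass(frozen=True)
class RepairPacket:
    xor: bytes                # XOR of the included data payloads
    included: List[PacketID]  # IDs of the included data packets


def pad(payload: bytes) -> bytes:
    """Zero-pad a payload to APP_MTU (an assumption for equal-length XOR)."""
    return payload.ljust(APP_MTU, b"\x00")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: List[DataPacket]) -> RepairPacket:
    """Build a repair packet that includes every packet in `packets`."""
    acc = bytes(APP_MTU)
    for p in packets:
        acc = xor_bytes(acc, pad(p.payload))
    return RepairPacket(acc, [p.packet_id for p in packets])
```

Keeping the application MTU below the network MTU leaves room for the ID list, so a repair packet still fits in a single network packet.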

Protocol Details 1
  Data structures:
    Data buffer: received data packets
    Repair bin: pointers to the last < r data packets
  Arrival of data packet dp at a receiver:
    dp is added to the data buffer
    A pointer to dp (&dp) is added to the repair bin
    If the repair bin size equals r, a repair packet rp is created from its contents and the repair bin is cleared
    rp is dispatched to c random receivers
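The data-packet arrival steps can be sketched as a small receiver class. This is an illustration under stated assumptions: the `Receiver` class, the `send` callback, and the peer list are hypothetical, payloads are assumed to have equal length, and the repair bin stores packet IDs in place of pointers.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

PacketID = Tuple[str, int]  # (sender, seqno)


class Receiver:
    def __init__(self, r: int, c: int, peers: List[str],
                 send: Callable[[str, List[PacketID], bytes], None],
                 rng: Optional[random.Random] = None) -> None:
        self.r, self.c = r, c
        self.peers = list(peers)
        self.send = send                      # hypothetical transport callback
        self.rng = rng or random.Random()
        self.data_buffer: Dict[PacketID, bytes] = {}
        self.repair_bin: List[PacketID] = []  # IDs stand in for pointers

    def on_data(self, pid: PacketID, payload: bytes) -> None:
        # Store the packet and reference it from the repair bin.
        self.data_buffer[pid] = payload
        self.repair_bin.append(pid)
        if len(self.repair_bin) == self.r:
            ids = list(self.repair_bin)
            xor = bytes(len(payload))         # assumes equal-length payloads
            for i in ids:
                xor = bytes(a ^ b for a, b in zip(xor, self.data_buffer[i]))
            self.repair_bin.clear()
            # Dispatch the repair packet to c distinct random receivers.
            for peer in self.rng.sample(self.peers, self.c):
                self.send(peer, ids, xor)
```

A usage example would pass a real socket send as `send`; here any callable that accepts `(peer, ids, xor)` works.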

Protocol Details 2
  Arrival of repair packet rp at a receiver:
    If the number of missing included data packets is
      0: rp is discarded
      1: the missing packet is recovered by XORing rp with the other r-1 data packets
      >1: rp is stored in a special buffer, in case future data packet arrivals and recoveries make it usable
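The three cases for an arriving repair packet can be sketched as one function. This is a minimal illustration, not the authors' code: the function and argument names are hypothetical, payloads are assumed to have equal length, and re-examining stored repair packets after a later arrival or recovery is left out for brevity.

```python
from typing import Dict, List, Optional, Tuple

PacketID = Tuple[str, int]  # (sender, seqno)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def on_repair(data_buffer: Dict[PacketID, bytes],
              pending: List[Tuple[List[PacketID], bytes]],
              ids: List[PacketID],
              xor: bytes) -> Optional[PacketID]:
    """Handle a repair packet; return the ID of a recovered packet, if any."""
    missing = [pid for pid in ids if pid not in data_buffer]
    if len(missing) == 0:
        return None                      # nothing to repair: discard
    if len(missing) > 1:
        pending.append((ids, xor))       # keep it in case it becomes usable
        return None
    # Exactly one included packet is missing: XOR out the ones we hold.
    recovered = xor
    for pid in ids:
        if pid in data_buffer:
            recovered = xor_bytes(recovered, data_buffer[pid])
    data_buffer[missing[0]] = recovered
    return missing[0]
```

A fuller version would rescan `pending` after each recovery or data arrival, since either event can reduce a stored repair packet to exactly one missing entry.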

Evaluation Setup
  64-node rack-style cluster at Cornell
  Loss rate fixed at 1%: packets dropped at end buffers
  All nodes send and receive
  Inter-node latencies = microseconds
  Group data rate: 1000 packets per second
  Each node multicasts about 15.6 packets per second (1000 / 64), i.e. one packet every 64 milliseconds

Slingshot Tunability
  For 27% overhead, 93.5% of lost packets are recovered at an average of 3.5 milliseconds
  Figure: example tradeoff points between overhead, timeliness, and reliability (overhead and recovered packets plotted on the left y-axis, recovery time on the right)

Slingshot vs SRM
  Slingshot recovers 93% in 10 ms, 97% in 25 ms
  SRM: fastest packet recovery is 2.2 seconds; 93% in 4.85 seconds, 97% in 5.1 seconds
  Slingshot is 2-3 orders of magnitude faster
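The "2-3 orders of magnitude" figure follows from the recovery times quoted on this slide. A short check, using only those numbers (the helper name is illustrative):

```python
import math


def orders_of_magnitude(slow_seconds: float, fast_seconds: float) -> float:
    """Base-10 logarithm of the speedup."""
    return math.log10(slow_seconds / fast_seconds)


# 93% of lost packets: SRM 4.85 s vs Slingshot 10 ms (a 485x gap)
print(round(orders_of_magnitude(4.85, 0.010), 2))
# 97% of lost packets: SRM 5.1 s vs Slingshot 25 ms (a 204x gap)
print(round(orders_of_magnitude(5.1, 0.025), 2))
```

Both ratios fall between 100x and 1000x, which matches the slide's summary.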

Slingshot Scalability: Group Size
  Gossip-style scalability: insensitive to scale beyond a certain size
  Simulation results: [figure]

Conclusion
  Slingshot provides a tunable, probabilistic guarantee of timeliness
  Outperforms SRM by 2 orders of magnitude in a 64-node system
  Insensitive to the number of senders
  Future work:
    Achieve scalability in other dimensions (number of groups)
    Build a time-critical middleware layer that uses Slingshot as a generic primitive