A Framework for Highly-Available Real-Time Internet Services Bhaskaran Raman, EECS, U.C.Berkeley.

Problem Statement
Real-time services with long-lived sessions
Need to provide continued service in the face of failures
[Figure: clients connected across the Internet to a content server through real-time services such as a video smoothing proxy and a transcoding proxy]

Problem Statement
Goals:
1. Quick recovery
2. Scalability
[Figure: a monitored client session with a video-on-demand server; a network path failure or service failure triggers fail-over to a service replica]

Our Approach
Service infrastructure
– Computer clusters deployed at several points on the Internet
– For service replication & path monitoring
– (path = path of the data stream in a client session)

Summary of Related Work
Fail-over within a single cluster
– Active Services (e.g., video transcoding)
– TACC (web proxy)
– Only service failure is handled
Web mirror selection
– E.g., SPAND
– Does not handle failure during a session
Fail-over for network paths
– Internet route recovery: not quick enough
– ATM, telephone networks, MPLS: no mechanism for the wide-area Internet
No existing architecture addresses network path failure during a real-time session

Research Challenges
Wide-area monitoring
– Feasibility: how quickly and reliably can failures be detected?
– Efficiency: per-session monitoring would impose significant overhead
Architecture
– Who monitors whom?
– How are services replicated?
– What is the mechanism for fail-over?

Is Wide-Area Monitoring Feasible?
Monitor path liveness using a keep-alive heartbeat
[Figure: two timelines; a real failure is detected when no heartbeat arrives within the timeout period, while a false positive occurs when the timeout fires even though the path is alive]
There is a trade-off between time-to-detection and the rate of false positives
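The timeout detector sketched on this slide fits in a few lines. This is an illustrative sketch, not code from the system; the class name and API are assumptions:

```python
import time

class HeartbeatMonitor:
    """Minimal timeout-based liveness detector (illustrative sketch).

    The peer's path is declared dead when no heartbeat has arrived
    within `timeout_s`.  A short timeout detects real failures quickly
    but turns transient loss bursts or RTT spikes into false positives;
    a long timeout does the reverse -- the trade-off on this slide."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_hb = time.monotonic()

    def on_heartbeat(self):
        """Called whenever a keep-alive heartbeat arrives."""
        self.last_hb = time.monotonic()

    def peer_alive(self):
        """True as long as the last heartbeat is within the timeout."""
        return time.monotonic() - self.last_hb <= self.timeout_s
```

In a deployment, `on_heartbeat` would be driven by UDP receives and `peer_alive` polled by the fail-over routine.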

Is Wide-Area Monitoring Feasible?
False positives are due to:
– Simultaneous losses
– Sudden increases in RTT
Related studies:
– Internet RTT study, Acharya & Saltz, UMD 1996: RTT spikes are isolated
– TCP RTO study, Allman & Paxson, SIGCOMM 1999: significant RTT increases are quite transient
Our experiments:
– Ping data from ping servers
– UDP heartbeats between Internet hosts

Ping measurements
Ping servers
– 12 geographically distributed servers chosen
– Approximation of a keep-alive stream
Count the number of loss runs with > 4 simultaneous losses
– Could be an actual failure or just intermittent losses
– With 1-second HBs and a timeout after losing 4 HBs, this count gives an upper bound on the number of false positives
[Figure: a Berkeley host contacts each ping server over HTTP; the ping server probes an Internet host with ICMP]
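The counting rule above can be sketched as follows; the function name and the True/False trace encoding are assumptions for illustration:

```python
def count_long_loss_runs(received, run_len=5):
    """Count maximal runs of at least `run_len` consecutive lost
    heartbeats (True = heartbeat received, False = lost).  With
    1-second heartbeats, runs of > 4 losses upper-bound the false
    positives that a 4-missed-heartbeat timeout would have produced."""
    count = run = 0
    for ok in received:
        if ok:
            run = 0
        else:
            run += 1
            if run == run_len:   # count each qualifying run exactly once
                count += 1
    return count

# Two loss bursts; only the 6-loss burst is long enough to fire the timeout:
trace = [True] * 5 + [False] * 3 + [True] * 4 + [False] * 6 + [True] * 2
print(count_long_loss_runs(trace))   # -> 1
```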

Ping measurements

Ping server                  Ping host             Total time  > 4 misses
cgi.shellnet.co.uk           hnets.uia.ac.be       15:14:14    12
hnets.uia.ac.be              sites.inka.de         20:55:58    29
cgi.shellnet.co.uk           hnets.uia.ac.be       14:53:14    0
d18183f47.rochester.rr.com   www.his.com           17:30:
zeus.lyceum.com              www.atmos.albany.edu  20:1:

UDP-based keep-alive stream
Geographically distributed hosts:
– Berkeley, Stanford, UIUC, TU-Berlin, UNSW
UDP heartbeat every 300 ms
Measure gaps between receipt of successive heartbeats
False positive:
– No heartbeat received for > 2 seconds, but received before 30 seconds
Failure:
– No HB for > 30 seconds

UDP-based keep-alive stream

HB destination  HB source   Total time  Num. false positives
Berkeley        UNSW        130:48:45   135
UNSW            Berkeley    130:51:45   9
Berkeley        TU-Berlin   130:49:46   27
TU-Berlin       Berkeley    130:50:11   174
TU-Berlin       UNSW        130:48:11   218
UNSW            TU-Berlin   130:46:38   24
Berkeley        Stanford    124:21:55   258
Stanford        Berkeley    124:21:19   2
Stanford        UIUC        89:53:17    4
UIUC            Stanford    76:39:10    74
Berkeley        UIUC        89:54:11    6
UIUC            Berkeley    76:39:40    3
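The false-positive/failure classification behind these measurements can be sketched from heartbeat arrival times; the thresholds (2 s and 30 s) are from the slides, while the function name is illustrative:

```python
def classify_gaps(arrival_times_s, fp_threshold_s=2.0, failure_threshold_s=30.0):
    """Classify inter-heartbeat gaps: a gap over 2 s but at most 30 s is
    a false positive (the detector would have fired, yet heartbeats
    resumed); a gap over 30 s counts as a real failure."""
    false_positives = failures = 0
    for prev, cur in zip(arrival_times_s, arrival_times_s[1:]):
        gap = cur - prev
        if gap > failure_threshold_s:
            failures += 1
        elif gap > fp_threshold_s:
            false_positives += 1
    return false_positives, failures

# 300 ms heartbeats with a 5 s outage (false positive) and a 40 s outage (failure):
times = [0.3 * i for i in range(10)]                     # steady heartbeats
times += [times[-1] + 5.0 + 0.3 * i for i in range(10)]  # resumes after a 5 s gap
times += [times[-1] + 40.0]                              # long outage
print(classify_gaps(times))   # -> (1, 1)
```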

What does this mean?
With a failure detection scheme using a 2-second timeout:
– False positives can be as low as once a day
– For many pairs of Internet hosts
In comparison, BGP route recovery:
– Takes > 30 seconds
– Can take up to tens of minutes (Labovitz & Ahuja, SIGCOMM 2000)
– Is worse with multi-homing (an increasing trend)

Architectural Requirements
Efficiency:
– Per-session monitoring → too much overhead
– End-to-end monitoring → more latency → monitoring less effective
Need aggregation
– Client-side aggregation: using a SPAND-like server
– Server-side aggregation: clusters
– But not all clients have the same server, and vice versa
A service infrastructure addresses this
– Several service clusters on the Internet

Architecture
Service cluster: a compute cluster capable of running services
Overlay topology:
– Nodes = service clusters
– Links = monitoring channels (keep-alive streams) between clusters
Source → client traffic is routed via monitored paths
– Could go through an intermediate service
Local recovery using a backup path
[Figure: service clusters overlaid on the Internet, with keep-alive streams between clusters and a source–client session]

Architecture
Monitoring within a cluster for process/machine failure
Monitoring across clusters for network path failure
Peering of service clusters
– to serve as backups for one another
– or to monitor the path between them

Architecture: Advantages
Two-level monitoring →
– Process/machine failures are still handled within the cluster
– Common failure cases do not require wide-area mechanisms
Aggregation of monitoring across clusters →
– Efficiency
The model works for cascaded services as well
[Figure: source → S1 → S2 → client]

Architecture: Potential Criticism
Potential criticism:
– Does not handle resource reservation
Response:
– A related issue, but possibly orthogonal
– Aggregated/hierarchical reservation schemes exist (e.g., the Clearing House)
– Even if reservation is solved (or is not needed), we still need to address failures

Architecture: Issues (1) Routing
Given the overlay topology, we need a routing algorithm to go from source to destination via service(s)
Also needed: (a) WA-SDS, (b) the closest service cluster to a given host
[Figure: source and client connected across the Internet through the overlay]

Architecture: Issues (2) Topology
How many service clusters? (nodes)
How many monitored paths? (links)

Ideas on Routing
BGP:
– Border routers
– Peering sessions with heartbeats
– IP routes, destination-based
Service infrastructure:
– Service clusters
– Peering sessions with heartbeats
– Routes across clusters, based on destination and intermediate service(s)
Similarities with BGP

Ideas on Routing
How is routing in the overlay topology different from BGP?
The overlay topology:
– Has more freedom than the physical topology
– Allows constraints on the graph to be imposed more easily
– For example, can have local recovery
[Figure: neighboring clusters S (Berkeley) and S' (San Francisco)]

Ideas on Routing
BGP exchanges a lot of information
– It increases with the number of APs: O(100,000)
Service clusters need to exchange very little information
The problems of (a) service discovery and (b) knowing the nearest service cluster are decoupled from routing
[Figure: source reaching a client through access points AP1, AP2, AP3]

Ideas on Routing
BGP routers do not have per-session state
But service clusters can maintain per-session state
– Local recovery can be pre-determined for each session
Finally, we probably don't need as many nodes as there are BGP routers
– So we can have a more aggressive routing algorithm
It is feasible for fail-over in the overlay topology to be quicker than Internet route recovery with BGP
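The per-session state with pre-determined local recovery described above might look like the sketch below; all names (clusters, sessions, methods) are hypothetical:

```python
class SessionRouter:
    """Sketch of a service cluster's per-session routing table.

    Unlike a BGP router, the cluster keeps state per session, including
    a pre-computed backup next hop, so fail-over is a local table
    update rather than a global route re-convergence."""

    def __init__(self):
        self.sessions = {}   # session id -> {"primary": ..., "backup": ...}

    def install(self, session_id, primary, backup):
        self.sessions[session_id] = {"primary": primary, "backup": backup}

    def next_hop(self, session_id):
        return self.sessions[session_id]["primary"]

    def on_peer_failure(self, failed_cluster):
        """Local recovery: every session routed through the failed
        cluster switches to its pre-determined backup."""
        for state in self.sessions.values():
            if state["primary"] == failed_cluster:
                state["primary"] = state["backup"]

router = SessionRouter()
router.install("sess-1", primary="SF-cluster", backup="LA-cluster")
router.on_peer_failure("SF-cluster")    # e.g. the keep-alive timeout fired
print(router.next_hop("sess-1"))        # -> LA-cluster
```

Because the backup is chosen before the failure, the switch takes one dictionary update per affected session, independent of network-wide convergence.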

Ideas on topology
Decision criteria:
– Additional latency for the client session: a sparse topology → more latency
– Monitoring overhead: many monitored paths → more overhead, but the additional flexibility might reduce end-to-end latency

Ideas on topology
Number of nodes:
– Claim: an upper bound is one per AS
– This ensures there is a service cluster "close" to every client
– The topology will be close to the Internet backbone topology
Number of monitoring channels:
– Number of ASes: ~7000 as of 1999
– Monitoring overhead: ~100 bytes/sec per channel
– A service cluster can easily sustain ~1000 peering sessions
– A dense overlay topology is possible (to minimize additional latency)
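A quick sanity check of the overhead claim, using only the numbers on this slide (~100 bytes/sec per monitoring channel, ~1000 peering sessions per cluster):

```python
# Back-of-the-envelope check of the slide's monitoring-overhead claim.
per_channel_Bps = 100          # ~100 bytes/sec per monitored path
channels = 1000                # ~1000 peering sessions per service cluster
total_Bps = channels * per_channel_Bps
total_mbps = total_Bps * 8 / 1e6
print(f"{total_Bps} B/s total, about {total_mbps:.1f} Mbit/s per cluster")
```

At well under 1 Mbit/s of keep-alive traffic per cluster, a dense overlay does look affordable for a compute cluster with a wide-area link.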

Implementation + Performance
[Figure: a service cluster and a peer cluster, each with a manager node, exchanging session information plus a monitoring heartbeat]

Implementation + Performance
PCM → GSM codec service
– Overhead of a "hot" backup service: 1 idle process
– Service re-instantiation time: ~100 ms
End-to-end recovery over the wide area has three components:
– Detection time: O(2 sec)
– Communication with the replica: one RTT, O(100 ms)
– Replica activation: ~100 ms
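Summing the three components above gives the rough recovery budget; all figures are the slide's own order-of-magnitude estimates, not new measurements:

```python
# End-to-end recovery budget from the slide's three components.
detection_ms      = 2000   # keep-alive timeout fires: O(2 sec)
notify_replica_ms = 100    # one wide-area RTT to reach the replica: O(100 ms)
reactivation_ms   = 100    # re-instantiating the service process: ~100 ms
total_ms = detection_ms + notify_replica_ms + reactivation_ms
print(total_ms / 1000, "seconds end-to-end")   # -> 2.2 seconds end-to-end
```

Detection dominates the budget, which is why the timeout/false-positive trade-off earlier in the talk matters most.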

Implementation + Performance
Overhead of monitoring:
– Keep-alive heartbeat, one per 300 ms in our implementation
– O(100 bytes/sec)
Overhead of a false positive in failure detection:
– Session transfer
– One message exchange across adjacent clusters: a few hundred bytes

Summary
Real-time applications with long-lived sessions
– No support exists for path-failure recovery (unlike, say, the PSTN)
A service infrastructure can provide this support
– Wide-area monitoring for path liveness: O(2 sec) failure detection with a low rate of false positives
– A peering and replication model for quick fail-over
Interesting issues:
– Nature of the overlay topology
– Algorithms for routing and fail-over

Questions
What are the factors in deciding the topology?
Strategies for handling failure near the client/source
– Currently deployed mechanisms for RIP/OSPF and ATM/MPLS failure recovery
– What is the time to recover?
Applications: video/audio streaming
– More? Games? Proxies for games on hand-held devices?
(Presentation running under VMware under Linux)