Availability and Performance in Wide-Area Service Composition Bhaskaran Raman EECS, U.C.Berkeley July 2002

Problem Statement
10% of wide-area Internet paths have only 95% availability

Problem Statement (Continued)
Poor availability of wide-area (inter-domain) Internet paths
BGP recovery can take several tens of seconds

Why does it matter?
Streaming applications
–Real-time
Session-oriented applications
–Client sessions lasting several minutes to hours
Composed applications

Service Composition: Motivation
[Figure: two example service-level paths composed across providers: a text-to-speech path from a repository through a text-to-speech service (Provider Q) to a cellular phone (Provider R), and a video path from a video-on-demand server (Provider A) through a transcoder (Provider B) to a thin client]
Other examples: ICEBERG, IETF OPES’00

Solution Approach: Alternate Services and Alternate Paths

Goals, Assumptions and Non-goals
Goals:
–Availability: detect and handle failures quickly
–Performance: choose a good set of service instances
–Scalability: Internet-scale operation
Operational model:
–Service providers deploy different services at various network locations
–Next-generation portals compose services
–Code is NOT mobile (service providers are mutually untrusting)
We do not address service-interface issues
We assume that service instances have no persistent state
–Not very restrictive [OPES’00]

Related Work
Other efforts have addressed:
–Semantics and interface definitions: OPES (IETF), COTS (Stanford)
–Fault-tolerant composition within a single cluster: TACC (Berkeley)
–Performance-constrained choice of a service, but not for composed services: SPAND (Berkeley), Harvest (Colorado), Tapestry/CAN (Berkeley), RON (MIT)
None address wide-area network performance or failure issues for long-lived composed sessions

Outline
Architecture for robust service composition
–Failure detection on wide-area Internet paths
Evaluation of effectiveness and overheads
–Scaling
–Algorithms for load balancing
–Wide-area experiments demonstrating availability, using a composed text-to-speech application

Requirements to achieve the goals
Failure detection / liveness tracking
–Server and network failures
Performance information collection
–Load and network characteristics
Service location
Global information is required
–A hop-by-hop approach will not work

Design challenges
Scalability despite the need for global information
–Information about all service instances, and the network paths in between, must be known
Quick failure detection and recovery
–Internet dynamics → intermittent congestion

Failure detection: a trade-off
What is a “failure” on an Internet path?
–Outage periods occur with varying durations
Path liveness is monitored with keep-alive heartbeats; a failure is detected when no heartbeat arrives within a timeout period
[Figure: two timelines, one showing a real failure detected by timeout, the other a false positive, i.e., a failure detected incorrectly → unnecessary recovery overhead]
There is a trade-off between time-to-detection and the rate of false positives
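To make the mechanism concrete, here is a minimal sketch of timeout-based liveness detection over UDP heartbeats. This is an illustration, not the talk's implementation; the constants and names (HEARTBEAT_INTERVAL, TIMEOUT, monitor_peer) are assumptions.

```python
import socket
import time

HEARTBEAT_INTERVAL = 0.3  # 300 ms between keep-alives, as in the traces below
TIMEOUT = 1.9             # ~1.8-2 s, the knee identified later in the talk

def monitor_peer(sock: socket.socket) -> None:
    """Declare the path dead if no heartbeat arrives within TIMEOUT."""
    sock.settimeout(TIMEOUT)
    last_seen = time.monotonic()
    while True:
        try:
            sock.recv(64)  # any heartbeat resets the silence timer
            last_seen = time.monotonic()
        except socket.timeout:
            gap = time.monotonic() - last_seen
            print(f"suspected path failure after {gap:.1f}s of silence")
            return  # hand off to recovery (alternate-path setup)
```

A shorter TIMEOUT detects real outages faster but flags more transient congestion episodes as failures, which is exactly the trade-off the slide describes.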

Is “quick” failure detection possible?
Study of outage periods using traces
–12 pairs of hosts: Berkeley, Stanford, UIUC, CMU, TU-Berlin, UNSW
–Some trans-oceanic links, some within the US (including Internet2 links)
–Periodic UDP heartbeat, every 300 ms
–Measure the “gaps” between receive times: these are the outage periods
–Plot the CDF of gap durations
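A minimal sketch of this analysis, assuming the trace is simply a list of heartbeat receive times (the actual trace format is not given in the talk):

```python
def outage_gaps(recv_times: list[float], interval: float = 0.3) -> list[float]:
    """Gaps between consecutive heartbeat arrivals that exceed the
    300 ms sending interval; these approximate outage durations."""
    times = sorted(recv_times)
    return [b - a for a, b in zip(times, times[1:]) if b - a > interval]

def cdf(samples: list[float]) -> list[tuple[float, float]]:
    """(value, cumulative fraction) pairs, ready for plotting."""
    s = sorted(samples)
    if not s:
        return []
    return [(x, (i + 1) / len(s)) for i, x in enumerate(s)]
```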

CDF of gap durations
[Figure: CDF of heartbeat gap durations, with an annotation marking the ideal case for failure detection]

CDF of gap durations (continued)
Failure detection is close to the ideal case
For a timeout of about 1.8-2 sec
–The false-positive rate is about 50%
Is this bad? It depends on:
–The effect on the application
–The effect on system stability, and the absolute rate of occurrence

Rate of occurrence of outages
[Figure: rate of outage occurrence as a function of the timeout used for failure detection]

Towards an Architecture
Service execution platforms
–For providers to deploy services
–First-party or third-party service platforms
An overlay network of such execution platforms
–Collects performance information
–Exploits redundancy in Internet paths

Architecture
[Figure: the architecture in three planes: a hardware platform of service clusters on the Internet, a logical platform of peering relations forming the overlay network, and an application plane of composed services between source and destination]
Service cluster: a compute cluster capable of running services
Peering: exchange of performance information
Overlay size: how many nodes?
–For comparison, Akamai runs O(10,000) nodes
Cluster ⇒ process/machine failures are handled within the cluster

Key Design Points
Overlay size:
–Could grow much more slowly than the number of services or clients
–How many nodes? For comparison, Akamai runs O(10,000) cache servers for Internet-wide operation
The overlay network is virtual-circuit based (see the sketch below):
–“Switching state” at each node, e.g., the source/destination of the RTP stream in a transcoder
–Failure information need not propagate for recovery
The problem of service location is separated from that of performance and liveness
Cluster ⇒ process/machine failures are handled within the cluster
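A minimal sketch of what per-session switching state might look like; the field names are assumptions, since the talk gives only the transcoder example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SwitchingState:
    """Per-session forwarding state held at one overlay node. Because
    the state is purely local, a node on an alternate path can be set
    up by signaling alone; failure news need not flood the overlay."""
    session_id: int
    upstream: str            # where this session's data arrives from
    downstream: str          # next hop on the service-level path
    service: Optional[str]   # service run here ("transcoder"), or None

# e.g., a transcoder node forwarding an RTP stream:
entry = SwitchingState(42, upstream="10.0.0.1",
                       downstream="10.0.1.7", service="transcoder")
```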

Software Architecture
Functionalities at the cluster manager, organized as a layered stack:
–Service-Composition Layer: service-level path creation, maintenance, and recovery
–Link-State Layer: link-state propagation
–Peer-Peer Layer: at-least-once UDP, performance measurement, liveness detection
Alongside the stack: finding the overlay entry/exit, and locating service replicas

Layers of Functionality
Why link-state?
–Need full graph information
–Also gives quick propagation of failure information
–But what are the link-state flood overheads?
Service-Composition layer:
–Algorithm for service composition: a modified version of Dijkstra’s, to accommodate the constraints of a service-level path (see the sketch below); metrics: an additive latency metric, and a load-balancing metric
–Computational overheads?
–Signaling for path creation and recovery, downstream to upstream
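The talk does not spell out the modification, but a standard way to make Dijkstra's algorithm respect a required sequence of services is to search over (node, stage) states, i.e., one layered copy of the graph per service already applied. The sketch below takes that approach under stated assumptions (data-structure shapes, zero cost for executing a service at a node); it is not necessarily the authors' exact algorithm.

```python
import heapq

def composed_path(graph, hosts, src, dst, services):
    """Cheapest path from src to dst that passes through the given
    service types in order. graph: {node: [(nbr, latency), ...]};
    hosts: {node: set of service types it runs}.
    Search state is (node, stage), stage = # services applied so far."""
    k = len(services)
    dist = {(src, 0): 0.0}
    pq = [(0.0, src, 0)]
    while pq:
        d, u, stage = heapq.heappop(pq)
        if (u, stage) == (dst, k):
            return d
        if d > dist.get((u, stage), float("inf")):
            continue  # stale heap entry
        moves = [((v, stage), d + w) for v, w in graph.get(u, [])]
        if stage < k and services[stage] in hosts.get(u, ()):
            moves.append(((u, stage + 1), d))  # run next service here
        for (v, s), nd in moves:
            if nd < dist.get((v, s), float("inf")):
                dist[(v, s)] = nd
                heapq.heappush(pq, (nd, v, s))
    return None  # no feasible service-level path
```

The state space has k+1 copies of each node, so the running time is O(k·E·log N), matching the bound quoted on the next slide.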

Link-State Overheads
Link-state floods:
–Two floods for each failure
–For a 1,000-node graph, estimate #edges = 10,000
–Failures (outages > 1.8 sec): O(once an hour) per link in the worst case
–That is only about 6 floods/second in the entire network
Graph computation:
–O(k·E·log N) computation time, where k = number of services composed
–For a 6,510-node network, this takes 50 ms
–A huge overhead, but path caching helps
–Memory: a few MB
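The flood-rate figure follows from a back-of-the-envelope calculation, assuming the worst-case once-an-hour failure rate applies to every edge independently:

$$10{,}000\ \text{edges} \times \frac{1\ \text{failure}}{\text{edge}\cdot\text{hour}} \times \frac{2\ \text{floods}}{\text{failure}} = \frac{20{,}000\ \text{floods}}{3600\ \text{s}} \approx 5.6\ \text{floods/second}$$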

Evaluation: Scaling
Scaling bottleneck:
–Simultaneous recovery of all client sessions on a failed overlay link
Parameter:
–Load: the number of client sessions with a single overlay node as their exit node
Metric:
–Average time-to-recovery over all paths that failed and were recovered

Evaluation: Emulation Testbed
Idea: use the real implementation, but emulate the wide-area network behavior (NistNET)
Opportunity: the Millennium cluster
[Figure: the application and library running on four nodes, with an emulator between them applying per-pair rules, e.g., rules for the 1→2, 1→3, 3→4, and 4→3 paths]

Scaling Evaluation Setup
20-node overlay network
–Created over a 6,510-node physical network
–Physical network generated using GT-ITM; latency variation according to [Acharya & Saltz 1995]
Load per cluster manager (CM)
–Varied from 25 to 500
Paths set up using the latency metric
12 different runs
–Deterministic failure of the link carrying the maximum number of client paths
–The worst case for a single-link failure

Average Time-to-Recovery vs. Load

CDF of recovery times of all failed paths

Path creation: the load-balancing metric
So far we have used a latency metric
–In combination with the modified Dijkstra’s algorithm
–Not good for balancing load
How can load be balanced across service instances?
–During path creation and path recovery
From the QoS routing literature:
–Sum(1/available-bandwidth) for bandwidth balancing
Applying this to server load balancing (see the sketch below):
–Metric: Sum(1/(max_load − curr_load))
Study its interaction with:
–The link-state update interval
–Failure recovery
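A minimal sketch of this metric as a weight function; the names (load_weight, path_cost) and the epsilon guard against division by zero are assumptions:

```python
def load_weight(max_load: float, curr_load: float, eps: float = 1e-6) -> float:
    """1/(residual capacity): near-idle instances are cheap, and the
    cost blows up as an instance approaches saturation."""
    return 1.0 / max(max_load - curr_load, eps)

def path_cost(loads: list[tuple[float, float]]) -> float:
    """Sum the weight over the (max_load, curr_load) pairs of the
    service instances on a candidate service-level path."""
    return sum(load_weight(m, c) for m, c in loads)
```

Residual server capacity plays the same role here that available bandwidth plays in the QoS-routing version of the metric.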

Load variation across replicas

Dealing with load variation
Decreasing the link-state update interval:
–Means more messages
–Could lead to instability
Instead, use the path-setup messages themselves to update load, all along the path (see the sketch below)
Each node that sees the path-setup message:
–Adds its load information to the message
–Records all the load information collected so far
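A minimal sketch of this piggybacking; the message and field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class PathSetupMessage:
    """Path-setup signaling message that piggybacks load reports."""
    session_id: int
    load_reports: dict = field(default_factory=dict)  # node_id -> load

def on_path_setup(node_id: str, curr_load: float,
                  msg: PathSetupMessage, local_view: dict) -> None:
    # Record every load report gathered so far along the path...
    local_view.update(msg.load_reports)
    # ...then append our own load before forwarding the message.
    msg.load_reports[node_id] = curr_load
```

This refreshes load information exactly where and when paths are being placed, without shortening the global link-state update interval.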

Load variation with piggy-back

Load-balancing: effect on path length

Fixing the long-path effect
Charge no-op (forwarding-only) hops a discounted version of the load weight, so that extra overlay hops are no longer free:

$$\text{cost} = \sum_{\text{service hops}} \frac{1}{\textit{max\_load} - \textit{curr\_load}} \;+\; \sum_{\text{no-op hops}} \frac{0.1}{\textit{max\_load} - \textit{curr\_load}}$$
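Continuing the earlier weight-function sketch, the fix is a one-line change (the 0.1 factor is from the slide; everything else is an assumption):

```python
def hop_weight(max_load: float, curr_load: float,
               runs_service: bool, eps: float = 1e-6) -> float:
    """Full load weight for hops that execute a service; the same
    weight discounted by 0.1 for no-op (forwarding-only) hops, so
    extra hops still cost something but no longer dominate."""
    factor = 1.0 if runs_service else 0.1
    return factor / max(max_load - curr_load, eps)
```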

Fixing the long-path effect (continued)

Wide-Area experiments: setup
8 nodes:
–Berkeley, Stanford, UCSD, CMU
–A cable-modem host (Berkeley) and a DSL host (San Francisco)
–UNSW (Australia), TU-Berlin (Germany)
Text-to-speech composed sessions
–Half with destinations at Berkeley, half at CMU
–Half with the recovery algorithm enabled, the other half disabled
–4 paths in the system at any time
–Session duration: 2 min 30 sec
–Run for 4 days
Metric: loss rate, measured over 5-sec intervals (see the sketch below)
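A minimal sketch of the loss-rate computation, assuming a trace of (send_time, sequence-number) pairs plus the set of sequence numbers that arrived (the actual trace format is not given in the talk):

```python
def loss_rates(packets: list[tuple[float, int]], received: set[int],
               window: float = 5.0) -> list[float]:
    """Fraction of packets lost in each 5-second send-time window."""
    if not packets:
        return []
    t0 = packets[0][0]
    buckets: dict[int, list[int]] = {}
    for t, seq in packets:
        buckets.setdefault(int((t - t0) // window), []).append(seq)
    return [sum(s not in received for s in seqs) / len(seqs)
            for _, seqs in sorted(buckets.items())]
```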

Loss-rate for a pair of paths

CDF of loss rates of all failed paths

CDF of gaps seen at client

Improvement in Availability
[Two tables: availability (%) per day, without recovery vs. with recovery, one for a client at Berkeley and one for a client at CMU; the per-day numeric values are not preserved in this transcript]

Split of recovery time
The text-to-speech application has two possible places of failure
[Figure: the composed session, text source → text-to-audio service → end client, with Leg-1 and Leg-2 marked; a request-response protocol carries the data (text, or RTP audio), keep-alives refresh soft state, and application soft state supports restart on failure]

Split of Recovery Time (continued)
Recovery time consists of:
–Failure detection time
–Signaling time to set up the alternate path
–State restoration time
Experiment using the text-to-speech application, under emulation:
–Recovery time = 3,300 ms
–1,800 ms for failure detection
–700 ms for signaling
–450 ms for state restoration
–The new text-to-speech engine has to re-process the current sentence

Summary
Wide-area Internet paths have poor availability
–This causes availability problems for composed sessions
An architecture based on an overlay network of service clusters
Failure detection is feasible in about 2 sec
The software architecture scales with the number of clients
Wide-area experiments show an improvement in availability