Wide-Area Service Composition: Evaluation of Availability and Scalability
Bhaskaran Raman, SAHARA, EECS, U.C. Berkeley

Presentation transcript:

1 Wide-Area Service Composition: Evaluation of Availability and Scalability
Bhaskaran Raman, SAHARA, EECS, U.C. Berkeley
[Diagram: composed service paths across providers: an email repository through Provider Q's text-to-audio service and Provider R to a cellular phone; Provider A's video-on-demand server through Provider B's transcoder to a thin client]

2 Problem Statement and Goals
Goals:
– Performance: choose a good set of service instances
– Availability: detect and handle failures quickly
– Scalability: Internet-scale operation
Problem statement:
– A composed path can stretch across multiple service providers and multiple network domains
– Inter-domain Internet paths have poor availability [Labovitz'99] and poor time-to-recovery [Labovitz'00]
– Take advantage of service replicas
Related work:
– TACC: composition within a cluster
– Web-server choice: SPAND, Harvest
– Routing around failures: Tapestry, RON
We address wide-area network performance and failure issues for long-lived composed sessions.
[Diagram: Provider A's video-on-demand server, through Provider B's transcoder, to a thin client]

3 Is "quick" failure detection possible?
What is a "failure" on an Internet path?
– Outage periods occur with varying durations
Study of outage periods using traces:
– 12 pairs of hosts: Berkeley, Stanford, UIUC, UNSW (Australia), TU-Berlin (Germany)
– Results could be skewed by the Internet2 backbone?
– Periodic UDP heart-beat, every 300 ms
– Study the "gaps" between receive times
Results:
– A short outage (1.2-1.8 sec) often implies a long outage (> 30 sec); sometimes this holds over 50% of the time
– False positives are rare: O(once an hour) at most
– Similar results in a ping-based study using ping servers
– Take-away: it is okay to react to short outage periods by switching the service-level path
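The gap-based detection rule on this slide can be sketched in a few lines. This is a minimal illustration under assumed names and a millisecond time representation, not the study's actual measurement tooling:

```python
# Heartbeats arrive every 300 ms; a receive-side gap longer than the
# timeout is treated as an outage (the slides use roughly 1.2-1.8 sec).
HEARTBEAT_INTERVAL_MS = 300
FAILURE_TIMEOUT_MS = 1800

def detect_outages(recv_times_ms, timeout_ms=FAILURE_TIMEOUT_MS):
    """Return (gap_start, gap_length) for every inter-arrival gap > timeout."""
    outages = []
    for prev, cur in zip(recv_times_ms, recv_times_ms[1:]):
        gap = cur - prev
        if gap > timeout_ms:
            outages.append((prev, gap))
    return outages
```

In the traces, a detected gap that a later heartbeat closes well before 30 seconds would count as a false positive; the measurements above suggest such cases are rare enough to act on every detection.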

4 UDP-based keep-alive stream

| HB destination | HB source | Total time | Num. false positives / Num. failures |
|---|---|---|---|
| Berkeley | UNSW | 130:48:45 | 13555 |
| UNSW | Berkeley | 130:51:45 | 98 |
| Berkeley | TU-Berlin | 130:49:46 | 278 |
| TU-Berlin | Berkeley | 130:50:11 | 1748 |
| TU-Berlin | UNSW | 130:48:11 | 2187 |
| UNSW | TU-Berlin | 130:46:38 | 245 |
| Berkeley | Stanford | 124:21:55 | 2587 |
| Stanford | Berkeley | 124:21:19 | 26 |
| Stanford | UIUC | 89:53:17 | 41 |
| UIUC | Stanford | 76:39:10 | 741 |
| Berkeley | UIUC | 89:54:11 | 65 |
| UIUC | Berkeley | 76:39:40 | 35 |

Acknowledgements: Mary Baker, Mema Roussopoulos, Jayant Mysore, Roberto Barnes, Venkatesh Pranesh, Vijaykumar Krishnaswamy, Holger Karl, Yun-Shen Chang, Sebastien Ardon, Binh Thai

5 Architecture
– Application plane: composed services
– Logical platform: peering relations, overlay network
– Hardware platform: service clusters
Service cluster: a compute cluster capable of running services.
Peering: clusters exchange performance information across the Internet.
Functionalities at the cluster-manager:
– Finding overlay entry/exit
– Location of service replicas
– Service-level path creation, maintenance, and recovery
– Link-state propagation
– At-least-once UDP
– Performance measurement
– Liveness detection
[Diagram: source and destination connected through the overlay network of peered service clusters]
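The "at-least-once UDP" functionality listed above amounts to retransmitting until an acknowledgment arrives. A minimal sketch, assuming a caller-supplied `send_and_wait` callback that returns True when an ack comes back within one timeout; the names are illustrative, not the actual cluster-manager API:

```python
def at_least_once(payload, send_and_wait, max_tries=5):
    """Retransmit payload until an ack is observed; return the attempt count.

    send_and_wait(payload) -> bool: True iff an ack arrived within the timeout.
    """
    for attempt in range(1, max_tries + 1):
        if send_and_wait(payload):
            return attempt
    raise TimeoutError(f"no ack for {payload!r} after {max_tries} tries")
```

Because duplicates are possible under retransmission, the receiver side of such a scheme must treat repeated messages idempotently (e.g., by sequence number), which fits the soft-state refreshes used elsewhere in the design.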

6 Evaluation
What is the effect of the recovery mechanism on the application?
– Text-to-speech application; two possible places of failure (leg-1 and leg-2)
– 20-node overlay network, one service instance for each service
– Deterministic failure for 10 sec during the session
– Metric: gap between arrivals of successive audio packets at the client
What is the scaling bottleneck?
– Parameter: number of client sessions across peering clusters, a measure of instantaneous load when the failure occurs
– 5,000 client sessions in the 20-node overlay network
– Deterministic failure of 12 different links (12 data points in the graph)
– Metric: average time-to-recovery
[Diagram: text source → text-to-audio service (leg-1) → end-client (leg-2); request-response protocol, data (text or RTP audio), keep-alive soft-state refresh, application soft-state for restart on failure; failure points 1 and 2 marked on legs 1 and 2]
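"Instantaneous load" here is the number of client paths crossing a link at the moment that link fails. A hypothetical helper showing the computation; the path representation is an assumption, not taken from the slides:

```python
def instantaneous_load(paths, failed_link):
    """Count client paths whose hop sequence traverses failed_link.

    paths: list of node sequences, e.g. ["src", "cm1", "cm2", "dst"].
    failed_link: a directed hop (a, b).
    """
    return sum(
        1 for path in paths
        if failed_link in zip(path, path[1:])  # consecutive-hop pairs
    )
```

Every path counted this way must be re-routed when the link fails, which is why this quantity is the natural x-axis for the time-to-recovery experiment.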

7 Recovery of Application Session: CDF of gaps > 100 ms
– Leg-1 failure: recovery time 822 ms (quicker than leg-2 due to the buffer at the text-to-audio service)
– Leg-2 failure: recovery time 2,963 ms
– Without recovery: 10,000 ms (the full failure duration)
– Jump at 350-400 ms: due to synchronous text-to-audio processing (an implementation artefact)

8 Average Time-to-Recovery vs. Instantaneous Load
– Two services in each path, two replicas per service
– Each data point is a separate run of the end-to-end recovery algorithm
– High variance due to varying path length
– At a load of 1,480 paths on the failed link, the average path recovery time is 614 ms

9 Results: Discussion
Recovery after failure (leg-2): 2,963 ms ≈ 1,800 + O(700) + O(450)
– 1,800 ms: timeout to conclude failure
– 700 ms: signaling to set up the alternate path
– 450 ms: recovery of application soft-state (re-process the current sentence)
Without the recovery algorithm, recovery takes as long as the failure duration.
O(3 sec) recovery:
– Can be completely masked with buffering
– For interactive apps, still much better than without recovery
Quick recovery is possible because failure information does not have to propagate across the network.
The 12th data point (instantaneous load of 1,480) stresses the emulator's limits:
– 1,480 translates to about 700 simultaneous paths per cluster-manager
– In comparison, our text-to-speech implementation supports O(15) clients per machine
Other scaling limits? Link-state floods? Graph computation?
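The leg-2 breakdown is simple arithmetic over the three components above; the dictionary keys below are descriptive labels I have chosen for the slide's numbers, and the O() terms are approximate:

```python
# Leg-2 recovery budget from the measurements (signaling and soft-state
# components are O() estimates, so the sum is close to, not exactly,
# the measured 2,963 ms).
RECOVERY_BUDGET_MS = {
    "timeout_to_conclude_failure": 1800,
    "signaling_for_alternate_path": 700,   # O(700)
    "app_soft_state_restore": 450,         # O(450)
}
total_ms = sum(RECOVERY_BUDGET_MS.values())  # ~2950 ms
```

The budget makes the dominant cost visible: the detection timeout alone is over 60% of the total, which is why the earlier heartbeat study on how aggressively one can time out matters so much.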

10 Summary
Service composition: flexible service creation.
We address performance, availability, and scalability.
– Initial analysis: failure detection -- meaningful to time out in O(1.2-1.8 sec)
– Design: overlay network of service clusters
– Evaluation, results so far:
– Good recovery time for real-time applications: O(3 sec)
– Good scalability -- minimal additional provisioning for cluster managers
Ongoing work:
– Overlay topology issues: how many nodes, peering
– Stability issues
Feedback, questions?
Presentation made using VMware.

11 Emulation Testbed
– Each node runs the application and library; the emulator applies a rule for each directed link (1 → 2, 1 → 3, 3 → 4, 4 → 3)
– Operational limits of the emulator: 20,000 pkts/sec, for up to 500-byte packets, on a 1.5 GHz Pentium-4
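The per-link rules in such a testbed can be modeled as functions that either drop a packet or assign it a delivery latency. A toy sketch with made-up parameters; the real emulator's rule format is not shown on the slide:

```python
import random

def make_link_rule(latency_ms, loss_prob, rng=None):
    """Return rule(packet) -> delivery latency in ms, or None if dropped."""
    rng = rng or random.Random()
    def rule(packet):
        if rng.random() < loss_prob:
            return None  # packet dropped on this emulated link
        return latency_ms
    return rule

# One rule per directed link, mirroring the slide's 1->2, 1->3, 3->4, 4->3.
rules = {
    (1, 2): make_link_rule(20, 0.0),
    (1, 3): make_link_rule(50, 0.01),
    (3, 4): make_link_rule(10, 0.0),
    (4, 3): make_link_rule(10, 0.0),
}
```

Keeping rules per directed link lets the two directions of a path see asymmetric latency and loss, which matches how the testbed is drawn with separate rules for 3 → 4 and 4 → 3.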

