
1 Problem Definition
Data path
  – Created by the Automatic Path Creation (APC) component
  – Service: program with well-defined interface
  – Operator: stateless service instance
  – Path: composition of operators
Control signaling
  – Done by Call-Agent (CA)
  – Soft-state protocol (H. Wang et al., Infocom 2000)
[Figure: ICEBERG call session. Data path examples with operator formats MPEG-3, PCM, GSM and PCM, GSM]
Problem: monitoring and fail-over
Challenges
  – Real-time streams
  – Scaling
  – Allow service composition
  – Optimal operator placement
  – Wide-area latency issues
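The definitions on this slide (operator as a stateless service instance, path as a composition of operators) can be sketched as below. This is a minimal illustration, not the APC implementation: the `Operator` class, `compose_path`, the placeholder transforms, and the MPEG-3 to PCM to GSM ordering are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Operator:
    """Hypothetical stateless operator: output depends only on the input chunk."""
    name: str
    in_format: str
    out_format: str
    transform: Callable[[bytes], bytes]


def compose_path(operators: List[Operator]) -> Callable[[bytes], bytes]:
    """Check that adjacent operators agree on formats, then chain them."""
    for upstream, downstream in zip(operators, operators[1:]):
        if upstream.out_format != downstream.in_format:
            raise ValueError(
                f"{upstream.name} emits {upstream.out_format}, "
                f"but {downstream.name} expects {downstream.in_format}"
            )

    def run(chunk: bytes) -> bytes:
        for op in operators:
            chunk = op.transform(chunk)
        return chunk

    return run


# Placeholder transforms that only tag the data; real operators would transcode.
mp3_to_pcm = Operator("mp3-decoder", "MPEG-3", "PCM", lambda b: b"pcm:" + b)
pcm_to_gsm = Operator("gsm-encoder", "PCM", "GSM", lambda b: b"gsm:" + b)

path = compose_path([mp3_to_pcm, pcm_to_gsm])
result = path(b"frame")
```

Because the operators hold no state, a failed instance can in principle be swapped for another instance of the same kind without replaying earlier data, which is what makes fail-over resemble a handoff.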

2 Related Work
Transcoding platforms for client adaptation
  – Active Services (E. Amir), TACC (A. Fox)
    o AS: composition not considered
    o TACC: no long-lived sessions
    o What happens on cluster failure?
    o Optimal operator placement not addressed
Fault-tolerant networks
  – Telephone, ATM, supercomputing: link-level failure detection
    o Will not work for application-level operation on the Internet
Distributed computing platforms
  – Fault-tolerant computing, state recovery protocols
  – Monitoring and load-balancing (Remulac, Network Weather Service)
    o We do not require strict consistency (assumption)
    o Fail-over in our case: almost like handoffs

3 Single-cluster vs. Wide-Area paths
Single-cluster-based approach
  – Path contained within a single cluster running APC
  – Appealing since we can reuse a lot of mechanisms from TACC/AS
  – Cluster-wide manager(s) to monitor and restore operators (workers/servents)
Wide-area approach
  – Multiple clusters of operators
  – Allow composition across clusters
[Figure: single-cluster path (source, destination, manager, cluster) and wide-area path (source, destination, cluster 1, cluster 2)]

4 Comparing the two approaches
Single-cluster-based approach
  + Tight control possible
  + Quick fail-over
  + Communication between two adjacent operators is easier
  – Entire path fails on cluster failure
  – May result in non-optimal network resource usage
  – Difficulties with proprietary operators, special hardware-based operators
Wide-area approach
  – Need a cross-cluster monitoring mechanism
  – Wide-area latency issues
  + Can handle cluster failure or cut-off
  + Allows better resource management (optimal operator placement)
  + Orthogonality in fault-tolerance: geographical, across ISPs
  + More flexible in terms of deployment

5 Monitoring and Fail-over in Wide-area Service Composition
Bhaskaran Raman, Z. Morley Mao
ICEBERG, EECS, U.C. Berkeley
[Figure: Service 1, Service 2, Service 3]

6 Wide-Area Approach
Monitoring the path
  – Centralized: lack of tight control, sacrifices quick recovery
  – Distributed: complications (how to do it?)
Hierarchical approach to fault-tolerance
  – Within cluster:
    » Handle failure within cluster if possible
    » Cluster managers to do this
    » Can be done quickly because of tight control within cluster
  – Across clusters:
    » Separate cluster failure detection mechanism
    » Need monitoring infrastructure
This is appropriate since:
  – Pr(process-failure) > Pr(machine-failure) > Pr(cluster-failure)
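The hierarchical approach on this slide can be sketched with soft-state heartbeats at two levels: a cluster manager that repairs operator failures locally, and a separate cross-cluster detector with a longer timeout. The class names, the timeout values, and the spare-machine model below are assumptions for illustration and do not describe the actual ICEBERG code.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Soft-state liveness table: a member is alive only while it keeps refreshing."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, member: str, now: float) -> None:
        self.last_seen[member] = now

    def expired(self, now: float) -> List[str]:
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout)


class ClusterManager:
    """Hypothetical manager: handles failures inside the cluster when it can."""

    def __init__(self, name: str, spares: List[str], timeout: float = 1.0) -> None:
        self.name = name
        self.spares = list(spares)
        self.monitor = HeartbeatMonitor(timeout)

    def handle_failures(self, now: float) -> List[str]:
        """Restart expired operators on spare machines; return those left unrecovered."""
        unrecovered = []
        for op in self.monitor.expired(now):
            del self.monitor.last_seen[op]
            if self.spares:
                self.spares.pop()            # consume one spare machine
                self.monitor.beat(op, now)   # restarted operator counts as alive
            else:
                unrecovered.append(op)       # must be escalated across clusters
        return unrecovered


manager = ClusterManager("cluster-1", spares=["m2"], timeout=1.0)
manager.monitor.beat("gsm-encoder", now=0.0)
manager.monitor.beat("mp3-decoder", now=0.0)
manager.monitor.beat("mp3-decoder", now=2.0)

local_round = manager.handle_failures(now=2.5)   # one failure, one spare
later_round = manager.handle_failures(now=5.0)   # no spares left

# Cross-cluster detection uses a longer timeout to tolerate wide-area latency.
peers = HeartbeatMonitor(timeout=10.0)
peers.beat("cluster-2", now=0.0)
suspected = peers.expired(now=12.0)
```

The short in-cluster timeout and the longer wide-area timeout mirror the ordering of failure probabilities on the slide: the most frequent failures are caught by the fastest, most local mechanism.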

7 Wide-Area Approach (Continued)
Network of clusters
Control paths for cluster monitoring
  – Replicated across manager machines in the cluster-pair
  – Replicated in the wide-area
  – To handle machine and network failures
Aggregated control paths for scaling
Fits in well with ICEBERG model of iPOP clusters
[Figure legend: cluster manager, cluster, control path]
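One possible reading of "aggregated control paths" is that all data paths crossing the same pair of clusters share a single monitoring channel, so monitoring cost grows with the number of cluster pairs rather than the number of paths. The sketch below follows that assumption; the function name and the path and cluster identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def aggregate_control_paths(
    data_paths: Dict[str, List[str]]
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each pair of adjacent clusters to the data paths that cross it."""
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for path_id, clusters in data_paths.items():
        for a, b in zip(clusters, clusters[1:]):
            if a != b:
                groups[frozenset((a, b))].add(path_id)
    return dict(groups)


# Each data path is listed as the sequence of clusters it traverses.
data_paths = {
    "p1": ["c1", "c2"],
    "p2": ["c1", "c2", "c3"],
    "p3": ["c2", "c3"],
}

control = aggregate_control_paths(data_paths)

# If the c1-c2 control path reports a failure, these data paths need fail-over.
affected = control[frozenset(("c1", "c2"))]
```

Here three data paths are covered by two control paths; a failure reported on one control path identifies every data path that needs attention.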

8 Status and Plans
Finished the first prototype using Ninja 1.5 (iSpace)
Supports the following applications:
  – Listening to MPEG-3 songs from a Jukebox using a cell-phone
  – Communication between a VAT and a cell-phone
  – Retrieving emails using a cell-phone
Failure recovery model:
  – Partial path repair when possible
    » Detects process-level failure
Proposed evaluation mechanisms
  – Wide-area test-bed: between Berkeley and TU-Berlin
  – Trace-driven simulation of failure recovery algorithms
  – Evaluation criteria:
    » Path construction latency
    » Failure-recovery time
    » Scaling in number of paths, control mechanism overhead
    » Ease of service composition
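Partial path repair can be sketched as replacing only the failed operator instances while keeping the healthy hops of the path. The data model below (hops as kind and instance pairs, a replica table keyed by operator kind) is an assumption for the example, not the prototype's actual structure.

```python
from typing import Dict, List, Optional, Set, Tuple

Hop = Tuple[str, str]  # (operator kind, instance id), both hypothetical labels


def repair_path(
    path: List[Hop], failed: Set[str], replicas: Dict[str, List[str]]
) -> Optional[List[Hop]]:
    """Swap failed instances for healthy replicas of the same kind.

    Returns None when some failed hop has no healthy replica, in which
    case the caller would have to construct a new path from scratch.
    """
    repaired: List[Hop] = []
    for kind, instance in path:
        if instance not in failed:
            repaired.append((kind, instance))
            continue
        healthy = [r for r in replicas.get(kind, []) if r not in failed]
        if not healthy:
            return None
        repaired.append((kind, healthy[0]))
    return repaired


path = [("mp3-decoder", "c1/m1"), ("gsm-encoder", "c1/m2")]
replicas = {"gsm-encoder": ["c1/m2", "c2/m7"]}

repaired = repair_path(path, failed={"c1/m2"}, replicas=replicas)
unrepairable = repair_path(path, failed={"c1/m1"}, replicas=replicas)
```

Only the failed hop changes; the upstream decoder keeps running, which is the case where repair should be faster than full path reconstruction.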

9 Summary
Monitoring/fail-over infrastructure crucial for data paths
Hierarchical monitoring to align with failure probabilities
Application to other replicated services?
  – Having loose consistency semantics
  – E.g., Internet video servers, OceanStore storage servers, Web-cache servers
Interesting research issues:
  – Cross-cluster monitoring
  – Aggregated, hierarchical monitoring
  – Shadow data paths
  – Replicated control paths
  – Feasibility of real-time operator recovery
  – Optimal operator placement

