Cluster-Based Scalable Network Services
Authors: Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier
Presenter: Kang Cao
Overview
  • Introduction
  • Cluster-Based Scalable Service Architecture
  • Service Implementation
  • Measurements
  • Discussion
  • Conclusion
Introduction
  • Goal
  • Advantages of clusters
  • Challenges of cluster computing
  • BASE semantics
Goal
  • Scalability
    – Keep the same per-user cost as load increases
  • Availability
    – Run 24 hours a day, 7 days a week
  • Cost effectiveness
Advantages
  • Scalability
    – Clusters are well suited to Internet service workloads
    – Incremental scalability
  • High availability
  • Commodity building blocks
    – Cheap commodity PCs
    – Services can be deployed quickly and cheaply
Challenges
  • Administration
  • Component vs. system replication
  • Partial failures
  • Shared state
BASE Semantics
  • An alternative to ACID (atomicity, consistency, isolation, durability)
  • Stale data
  • Soft state
  • Approximate answers
Cluster-Based Scalable Service Architecture
  • Layered architecture
  • Separate network services from their implementation
  • Stateless workers
Cluster-Based Scalable Service Architecture
  • SNS
  • TACC
  • Service
Scalable Network Service (SNS)
  • Incremental and absolute scalability
  • Worker load balancing and overflow management
  • Front-end availability, fault-tolerance mechanisms
  • System monitoring and logging
SNS
  [Figure: SNS architecture diagram. Labels: SNS Manager, Front End, MS, Worker Driver, Worker, $, Internal Network, Internet]
Load Balancing
  • Centralized load balancing
  • Easy to implement
How to Handle Bursts
  • An overflow pool of machines is available
  • The manager can spawn workers on overflow machines on demand
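The burst-handling idea can be sketched as follows. This is a minimal illustration, not the actual manager: the class name `Manager`, the `max_queue` threshold, and the rule "spawn another worker when every worker's queue exceeds the threshold" are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Manager:
    """Toy manager: places workers on dedicated nodes first, then on overflow nodes."""

    dedicated: list[str]
    overflow: list[str]
    max_queue: int = 10  # hypothetical per-worker queue threshold
    workers: dict[str, int] = field(default_factory=dict)  # node -> queue length

    def spawn(self) -> str | None:
        """Start a worker on the first free node, preferring dedicated nodes."""
        for node in self.dedicated + self.overflow:
            if node not in self.workers:
                self.workers[node] = 0
                return node
        return None  # every node, including the overflow pool, is already in use

    def report(self, node: str, queue_len: int) -> None:
        """Record a load report; spawn a worker if all workers are over the threshold."""
        self.workers[node] = queue_len
        if all(q > self.max_queue for q in self.workers.values()):
            self.spawn()

    def overflow_in_use(self) -> list[str]:
        """Overflow nodes currently running a worker."""
        return [n for n in self.overflow if n in self.workers]
```

If `overflow_in_use()` is non-empty for long stretches, that is a signal that the dedicated pool is too small.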
Scalability
  • Components are replicated
  • The additional resources required are a linear function of the increase in offered load
  • Partition functionality between the front end and the workers
  • Keep workers as simple as possible
Fault Tolerance and Availability
  • Process-peer fault tolerance
  • Soft state
  • Timeouts as an additional fault-tolerance mechanism
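One way to picture soft state combined with timeouts is a table that is refreshed only by periodic beacons and silently forgets entries that stop arriving. The sketch below is an assumption-laden illustration: `SoftStateTable`, `beacon`, `live`, and the `ttl` parameter are hypothetical names, not part of the system described here.

```python
import time


class SoftStateTable:
    """Tracks workers purely from periodic beacons; entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock    # injectable so the table can be tested without sleeping
        self.last_seen = {}   # worker id -> time of the most recent beacon

    def beacon(self, worker_id):
        """Record that a worker announced itself just now."""
        self.last_seen[worker_id] = self.clock()

    def live(self):
        """Drop workers whose beacons have timed out and return the rest, sorted."""
        now = self.clock()
        self.last_seen = {
            w: t for w, t in self.last_seen.items() if now - t <= self.ttl
        }
        return sorted(self.last_seen)
```

Because the table is rebuilt entirely from beacons, a restarted manager needs no recovery step: it starts empty and refills as beacons arrive.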
TACC: Transformation, Aggregation, Caching, Customization
  • API for composition of stateless data-transformation and content-aggregation modules
  • Uniform caching of original, post-aggregation, and post-transformation data
  • Transparent access to the customization database
TACC: A Programming Model for Internet Services
  • Transformation
  • Aggregation
  • Caching
  • Customization
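The TACC model can be illustrated with a small sketch that chains stateless workers, passes each one the user's profile, and caches both original and transformed data. It covers transformation, caching, and customization only (aggregation is left out), and every name in it (`compose`, `TaccService`, `handle`) is hypothetical rather than the real TACC API.

```python
def compose(*workers):
    """Chain stateless workers; each takes (data, profile) and returns new data."""
    def pipeline(data, profile):
        for worker in workers:
            data = worker(data, profile)
        return data
    return pipeline


class TaccService:
    """Toy service: fetch, transform per user profile, and cache both stages."""

    def __init__(self, fetch, pipeline, profiles):
        self.fetch = fetch        # callable: url -> original bytes
        self.pipeline = pipeline  # composed stateless workers
        self.profiles = profiles  # stand-in for the customization database
        self.cache = {}           # one cache for original and transformed data

    def handle(self, user, url):
        profile = self.profiles.get(user, {})
        out_key = ("out", url, tuple(sorted(profile.items())))
        if out_key not in self.cache:
            orig_key = ("orig", url)
            if orig_key not in self.cache:
                self.cache[orig_key] = self.fetch(url)
            self.cache[out_key] = self.pipeline(self.cache[orig_key], profile)
        return self.cache[out_key]
```

Because the workers keep no state of their own, any replica can serve any request, and a cached result can be reused by every user whose profile matches.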
Service Implementation
  • Workers that present a human interface to what the TACC modules do, including device-specific presentation
  • User interface to control the service
  • Most of a service can be built at the service and TACC layers
Example: TranSend
  [Figure: TranSend deployment diagram. Labels: modem pool, switch, workstations, Internet]
TranSend
  • Front ends
  • Load-balancing manager
  • User profile database
  • Cache nodes
  • Datatype-specific distillers
  • Graphical monitor
Load Balancing Manager
  • Client-side JavaScript support balances load across multiple front ends
  • Centralized manager for internal load balancing
Load Balancing
  • Components register with the manager
  • A front end asks the manager for a worker when it has a task
  • The manager locates a worker for the front end
  • The manager may create a new distiller
  • Workers report their load to the manager
Load Balancing
  • The manager broadcasts load information periodically
  • Front ends cache this information
  • Front ends use the cached information to dispatch requests to workers
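The register, report, broadcast, and dispatch steps can be sketched as below. This is only an illustration: the class and method names are hypothetical, and "pick the least-loaded worker" is one simple policy chosen for the example, not necessarily the policy the system uses.

```python
class LoadManager:
    """Toy centralized manager that collects worker load reports."""

    def __init__(self):
        self.loads = {}  # worker name -> last reported load

    def register(self, worker):
        self.loads[worker] = 0

    def report(self, worker, load):
        self.loads[worker] = load

    def broadcast(self, front_ends):
        """Send the current load table to every front end."""
        for fe in front_ends:
            fe.on_broadcast(self.loads)


class FrontEnd:
    """Toy front end that dispatches using cached, possibly stale, load hints."""

    def __init__(self):
        self.hints = {}

    def on_broadcast(self, loads):
        self.hints = dict(loads)  # keep a private copy of the hints

    def dispatch(self):
        if not self.hints:
            raise RuntimeError("no workers known yet")
        worker = min(self.hints, key=self.hints.get)
        self.hints[worker] += 1  # local estimate until the next broadcast
        return worker
```

The front end never contacts the manager on the request path; it works from hints that may be stale, which is acceptable under BASE semantics.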
Fault Tolerance and Crash Recovery
  • Using BASE semantics simplifies crash recovery
  • The manager reports worker failures to the front end
  • The manager detects and restarts a crashed front end
  • The front end detects and restarts a crashed manager
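The mutual-restart relationship between manager and front end can be sketched as a single monitoring round in which each live peer restarts the other if it has crashed. The `Process` class and `watch` function are hypothetical stand-ins; a real system would detect crashes through missed beacons or broken connections rather than a flag.

```python
class Process:
    """Stand-in for a manager or front-end process."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.restarts = 0

    def crash(self):
        self.alive = False

    def restart(self):
        self.alive = True
        self.restarts += 1


def watch(peer_a, peer_b):
    """One monitoring round: each live peer restarts the other if it is down.

    Returns the names of the processes restarted in this round.
    """
    restarted = []
    for watcher, watched in ((peer_a, peer_b), (peer_b, peer_a)):
        if watcher.alive and not watched.alive:
            watched.restart()
            restarted.append(watched.name)
    return restarted
```

If both peers are down at the same time, neither can restart the other, so this scheme alone does not cover simultaneous failures.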
Performance: Load balancing
Performance: Load balancing
Conclusions
  • Layered architecture for cluster-based scalable network services
  • The architecture is reusable
  • Cluster-based value-added network services will become an important Internet-service paradigm
Performance: Scalability
Question 1
  • Why are cluster-based network services well suited to Internet services?
Answer
  • The workload is highly parallel (many independent simultaneous users)
  • The grain size typically corresponds to at most a few CPU seconds on a commodity PC
Question 2
  • Why does the cluster-based network service use BASE semantics?
Answer
  • BASE semantics allow us to handle partial failures in clusters with less complexity and cost.
Question 3
  • What should be done when the overflow machines are being recruited unusually often?
Answer
  • It is time to add new machines.
Question 4
  • Does a front-end crash lose any information? If so, what kind of information is lost?
Answer
  • In-flight user requests are lost; the user needs to handle the timeout and resend the request.
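The "handle the timeout and resend" behavior on the user's side can be sketched as a small retry loop. The function name `send_with_retry` and its parameters are assumptions for illustration; `send` stands for whatever call submits the request and raises `TimeoutError` when no reply arrives.

```python
def send_with_retry(send, request, attempts=3):
    """Resend `request` after each timeout, giving up after `attempts` tries."""
    last_error = None
    for _ in range(attempts):
        try:
            return send(request)
        except TimeoutError as exc:
            last_error = exc  # the request was lost, e.g. the front end crashed
    raise TimeoutError(f"no reply after {attempts} attempts") from last_error
```

Resending is safe here because a lost request leaves no hard state behind on the front end.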