Topics: ACID vs BASE · Starfish Availability · TACC Model · TranSend Measurements · SNS Architecture

Extensible Cluster-Based Network Services
Armando Fox, Steven Gribble, Yatin Chawathe, Eric Brewer, Paul Gauthier
University of California, Berkeley / Inktomi Corporation
Presenter: Ashish Gupta (Advanced Operating Systems)

Motivation
Proliferation of network-based services. Two critical issues must be addressed by Internet services:
- System scalability: incremental and linear scalability
- Availability and fault tolerance: 24x7 operation
Clusters of workstations meet these requirements.

Commodity PCs as the unit of scaling
- Good cost/performance
- Incremental scalability
- "Embarrassingly parallel" workloads map well onto workstations
- Redundancy of clusters masks transient failures

Contribution of this work
Isolate the common requirements of cluster-based Internet apps into a reusable substrate: the Scalable Network Services (SNS) framework.
Goal: complete separation of *ility concerns from application logic
- Legacy code encapsulation
- Insulate programmers from nasty engineering

Contribution of this work
- An architecture for SNS, exploiting the strengths of cluster computing
- Separation of the content of network services from their implementation
- Encapsulation of low-level functions in a lower layer
- An example of a new service
- A programming model to go with the architecture

The SNS architecture
[Diagram: clients → front ends → interconnect → workers, caches, and user profile database, with a load-balancing/fault-tolerance manager and a GUI administration interface]
Workers and front ends: all control decisions for satisfying user requests are localized in the front ends (which workers to invoke, access to the profile database, notifying the end user, etc.). Workers are simple and stateless.
- Behaviour of the service is defined entirely at the front end
- Analogy: processes in a Unix pipeline: ls -l | grep .pl | wc
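The pipeline analogy can be sketched in a few lines: workers are pure functions of their input, and the front end holds all control flow. This is a hypothetical illustration; the function names below are not from the paper.

```python
# Sketch of the front-end/worker split: workers are stateless functions,
# composed by the front end like stages in a Unix pipeline.
def grep(lines, pattern):
    """Stateless worker: keep only lines containing pattern."""
    return [ln for ln in lines if pattern in ln]

def wc(lines):
    """Stateless worker: count lines."""
    return len(lines)

def front_end(request_lines):
    # All control decisions live here; workers can be restarted or
    # replicated freely because they keep no state between requests.
    return wc(grep(request_lines, ".pl"))

print(front_end(["a.pl", "b.txt", "c.pl"]))  # -> 2
```

Because each stage is stateless, any worker instance can serve any request, which is what makes replication and restart-based recovery trivial.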

The SNS architecture
[SNS architecture diagram]
Front ends:
- The user interface to the SNS
- Queue requests for service
- Can maintain state for many simultaneous outstanding requests

The SNS architecture
[SNS architecture diagram]
User profile database: allows mass customization of request processing.

The SNS architecture
[SNS architecture diagram]
Workers: caches and service-specific modules.
- Multiple instantiation possible
- Each performs a specific task; not responsible for load balancing or fault tolerance

The SNS architecture
[SNS architecture diagram]
Administrative interface:
- Tracking and visualization of the system's behaviour
- Administrative actions

The SNS architecture
[SNS architecture diagram]
Manager:
- Collects load information from the workers
- Balances load across workers
- Spawns additional workers on increased load or faults


Separating content from implementation: a layered software model
- Service layer: service-specific code
- TACC layer: transformation, aggregation, caching, customization
- SNS layer: scalable network service support
SNS provides scalability, load balancing, fault tolerance, and high availability.

The SNS Layer
Scalability:
- Replicate well-encapsulated components
- Prolonged bursts: the notion of an overflow pool
Load balancing:
- Centralized: simple to implement and predictable
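The combination of centralized balancing and an overflow pool can be sketched as follows. This is a toy illustration with invented names and an assumed queue threshold; the paper does not give this code.

```python
# Sketch: centralized load balancing with an overflow pool for bursts.
# Regular workers absorb steady load; overflow machines are recruited
# only when every regular worker is saturated.
QUEUE_LIMIT = 10  # assumed per-worker queue threshold

def pick_worker(regular, overflow, queue_len):
    """Return the least-loaded regular worker, or recruit from overflow."""
    best = min(regular, key=lambda w: queue_len[w])
    if queue_len[best] < QUEUE_LIMIT:
        return best
    # Prolonged burst: all regular workers are saturated.
    return min(overflow, key=lambda w: queue_len[w])

q = {"w1": 10, "w2": 12, "o1": 0}
print(pick_worker(["w1", "w2"], ["o1"], q))  # -> 'o1'
```

Note that if the overflow pool is recruited unusually often, that is the signal (per the Q&A at the end of these slides) that it is time to add regular machines.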

The SNS Layer
Soft state for fault tolerance and availability:
- Process peers watch each other
- Because there is no hard state, "recovery" == "restart"
Load balancing, hot updates, and migration are "easy":
- Shoot down a worker, and it will recover
- Upgrade == install new software, shoot down the old
- Mostly graceful degradation
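The peer-watching discipline reduces to a small watchdog loop: since all state is soft, restarting a silent peer is always safe. A minimal sketch, with an assumed beacon timeout (the paper does not specify one):

```python
# Sketch: process peers watch each other via periodic beacons.
# With no hard state, "recovery" is just "restart".
TIMEOUT = 3.0  # assumed: seconds without a beacon before a peer is presumed dead

def dead_peers(last_beacon, now):
    """Return peers whose last beacon is older than TIMEOUT."""
    return [p for p, t in last_beacon.items() if now - t > TIMEOUT]

def watchdog_step(last_beacon, now, restart):
    for peer in dead_peers(last_beacon, now):
        restart(peer)            # shoot it down / respawn; soft state rebuilds
        last_beacon[peer] = now  # treat the restart as a fresh beacon

restarted = []
beacons = {"worker-1": 0.0, "worker-2": 9.0}
watchdog_step(beacons, 10.0, restarted.append)
print(restarted)  # -> ['worker-1']
```

The same mechanism doubles as the upgrade path: install new software, shoot down the old instance, and the watchdog's restart brings up the new one.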

"Starfish" availability: LB death
The front end detects the failure via a broken pipe/timeout and restarts the LB.
[Diagram: cluster with front end, workers, and the load-balancer/fault-tolerance (LB/FT) node]

"Starfish" availability: LB death
[Diagram: cluster with front end, workers, and the restarted LB/FT node]
- The new LB announces itself (multicast), is contacted by the workers, and gradually rebuilds its load tables
- If a partition heals, the extra LBs commit suicide
- Front ends operate using cached LB info during the failure

Question: how do we build the services in the higher layers?
The TACC model: a model for structuring services
- Transformation: an operation on a single data object that changes its content
- Aggregation: collecting data from several sources and collating it
- Caching: storing/re-computing is easier than moving data across the Internet; can also store post-transformation (or post-aggregation) content
- Customization: per user, for content generation; per device, for data delivery and content "packaging"

The TACC model: a model for structuring services
- A programming model based on composable building blocks
- Many existing services fit well within the TACC model

A meta-search engine in TACC
- Uses existing services to create a new service
- Took 2.5 hours to write using the TACC framework
[Diagram: Metasearch front end with a web UI, querying search sources on the Internet]
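A meta-search service is essentially aggregation over per-source transformations. A toy sketch of that composition (the normalization and merge rules below are invented for illustration):

```python
# Sketch of a TACC-style meta-search: transform each backend's raw
# results into a common form, then aggregate (merge, de-duplicate).
def transform(raw_results):
    """Transformation: normalize one source's results (here: lowercase URLs)."""
    return [r.lower() for r in raw_results]

def aggregate(result_lists):
    """Aggregation: merge results from all sources, dropping duplicates."""
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            if r not in seen:
                seen.add(r)
                merged.append(r)
    return merged

sources = [["http://A/1", "http://B/2"], ["http://b/2", "http://C/3"]]
print(aggregate([transform(s) for s in sources]))
# -> ['http://a/1', 'http://b/2', 'http://c/3']
```

The speed of development claimed on the slide follows from this shape: the service author writes only the transform and aggregate bodies, while SNS supplies replication, balancing, and recovery.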

An example service: TranSend

Datatype-specific distillation
Lossy compression that preserves semantic content:
- Tailor content to each client
- Reduce end-to-end latency when the link is slow
- Meaningful presentation for a range of clients
[Figure: distillation examples, showing 65x and 6.8x size reductions]
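The idea is easiest to see for text: drop detail, keep the gist. A crude sketch of a text "distiller" (purely illustrative, not the paper's distiller, and the truncation rule is an assumption):

```python
# Sketch: a text "distiller" that lossily shrinks a document while
# keeping its semantic content, e.g. for a slow link or a small screen.
def distill_text(paragraphs, max_chars=40):
    """Keep each paragraph's leading text, up to max_chars characters."""
    out = []
    for p in paragraphs:
        out.append(p if len(p) <= max_chars else p[:max_chars] + "...")
    return out

doc = ["A short paragraph.",
       "A much longer paragraph whose tail detail can be dropped for a slow link."]
for p in distill_text(doc):
    print(p)
```

A real distiller is datatype-specific: an image distiller would downsample pixels, a PostScript distiller would extract the text, and so on; the common property is that the output is smaller but still meaningful.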

TranSend SNS components
- Workers = distillers here
- Simple restart mechanism for fault tolerance
- Each distiller took 5-6 hours to write
- SNS fault tolerance removes worries about occasional bugs/crashes

Measurements
- Request generation: a high-performance HTTP request playback engine
- Burstiness: handled by the overflow pool
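A request playback engine replays a recorded trace against the service while preserving the trace's inter-arrival times, so bursts are reproduced faithfully. A minimal single-threaded sketch (the trace format and `send` callback are assumptions, not the paper's tool):

```python
import time

# Sketch: replay a trace of (offset_seconds, url) pairs, preserving the
# original inter-arrival times (scaled by `speedup`) to reproduce burstiness.
def playback(trace, send, speedup=1.0):
    """Replay requests; `send` is the function that issues one request."""
    start = time.monotonic()
    for offset, url in trace:
        delay = offset / speedup - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        send(url)

sent = []
playback([(0.0, "/a"), (0.01, "/b")], sent.append, speedup=10.0)
print(sent)  # -> ['/a', '/b']
```

A production generator would issue requests concurrently and record latencies, but the timing-faithful replay loop is the core idea.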

Load balancing
- Metric: queue length at the distillers
- When load reaches a threshold, the manager spawns a new distiller
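The manager's spawning policy can be sketched as a control loop over distiller queue lengths. The threshold value here is an assumption for illustration:

```python
# Sketch: the manager watches distiller queue lengths and spawns a new
# distiller when the average queue length crosses a threshold.
SPAWN_THRESHOLD = 8  # assumed average queue length that triggers a spawn

def manager_step(queue_lengths, spawn):
    """queue_lengths: {distiller_name: queued requests}. Returns True if spawned."""
    avg = sum(queue_lengths.values()) / len(queue_lengths)
    if avg > SPAWN_THRESHOLD:
        spawn()  # launch one more distiller (e.g. on an overflow node)
        return True
    return False

spawned = []
print(manager_step({"d1": 12, "d2": 10}, lambda: spawned.append("d3")))  # -> True
print(spawned)  # -> ['d3']
```

Queue length is a good metric here precisely because the workers are stateless: any new distiller can immediately absorb queued work.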

Scalability
Strategy: begin with a minimal instance; increase offered load until saturation; add more resources to eliminate the saturation.
Observations:
- Nearly perfect linear growth
- 1 distiller ~ 23 requests/sec; front end ~ 70 requests/sec
- Ultimate bottleneck: the shared components of the system (the manager and the SAN)
- The SAN could be the bottleneck for communication-intensive workloads (e.g. 10 Mb/s Ethernet); a topic for future research

Conclusion
- A layered architecture for cluster-based scalable network services
- Service authors are shielded from the software complexity of automatic scaling, high availability, and failure management
- New services are built as compositions of stateless workers
- A useful paradigm for deploying new Internet services

ACID vs BASE semantics
An approximate answer delivered quickly is more useful than the exact answer delivered slowly.

ACID                                         | BASE
Strong consistency: data precise or not OK   | Weak consistency: stale data OK
Availability? (not always)                   | Availability first
Focus on "commit"                            | Best effort
Guarantees accurate answers                  | Approximate answers OK
Difficult evolution                          | Easy evolution
Conservative (pessimistic)                   | Aggressive (optimistic)
                                             | Simpler, faster (?)

ACID vs BASE semantics
A search engine as a database:
- One big table
- Unknown but large growth
- Must be truly highly available
An approximate answer delivered quickly is more useful than the exact answer delivered slowly.

Database research is about ACID: Atomicity, Consistency, Isolation, Durability.
A DBMS would be too slow here, so choose availability over consistency. Graceful degradation: it is OK to temporarily lose small random subsets of data due to faults.
Replace ACID's guarantees with availability, graceful degradation, and performance: BASE
- Basically Available
- Soft State
- Eventual Consistency

Why BASE?
Idea: focus on looser semantics rather than ACID semantics.
- ACID => data unavailable rather than available but inconsistent
- BASE => data available, but possibly stale, inconsistent, or approximate
Real systems use BOTH semantics.
Claim: BASE can lead to simpler systems and better performance.
- Performance: caching, and avoidance of communication and some locks (ACID requires strict locking and communication with replicas for every write, and for any read done without locks)
- Simpler: soft state leads to easy recovery and interchangeable components
BASE fits clusters well because of partial failure.
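The caching benefit of BASE can be sketched as a read path that prefers a possibly-stale cached answer over failing when the authoritative store is unreachable. The class, callback, and TTL below are invented for illustration:

```python
import time

# Sketch: a BASE-style read path. If the authoritative backend is
# unreachable, serve the stale cached answer rather than no answer.
TTL = 5.0  # assumed: seconds after which a cached answer counts as stale

class BaseCache:
    def __init__(self, fetch):
        self.fetch = fetch   # authoritative source (may be slow or partitioned)
        self.store = {}      # key -> (value, timestamp)

    def read(self, key, now=None):
        now = time.monotonic() if now is None else now
        cached = self.store.get(key)
        try:
            value = self.fetch(key)        # try for fresh, consistent data
            self.store[key] = (value, now)
            return value
        except Exception:
            if cached is not None:
                return cached[0]  # stale but available: the BASE tradeoff
            raise                 # no answer at all

def flaky_fetch(key):
    raise IOError("backend partitioned")

c = BaseCache(flaky_fetch)
c.store["x"] = ("old answer", 0.0)
print(c.read("x"))  # -> 'old answer'
```

Under ACID semantics the same partition would make the read block or fail; under BASE the client gets a fast, possibly stale answer, which is exactly the tradeoff the slide advocates for search-style services.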

More BASE...
- Reduces the complexity of service implementation: trades consistency for simplicity
  - Fault tolerance
  - Availability
- Opportunities for better performance optimizations in the SNS framework
- ACID requires durable, consistent state across partial failures; this is relaxed in the BASE model
- Example: HotBot

Thank You

Backup Slides

Question 1: Why are cluster-based network services well suited to Internet services?

Answer: The workloads are highly parallel (many independent simultaneous users), and the task grain size typically corresponds to at most a few CPU seconds on a commodity PC.

Question 2: Why does the cluster-based network service use BASE semantics?

Answer: BASE semantics allow us to handle partial failures in clusters with less complexity and cost.

Question 3: When the overflow machines are being recruited unusually often, what should be done?

Answer: It is time to add new machines.

Question 4: Does a front-end crash lose any information? If so, what kind of information is lost?

Answer: In-flight user requests are lost; the user must handle the timeout and resend the request.

Clustering and Internet workloads
Internet vs. "traditional" workloads:
- e.g. database workloads (TPC benchmarks)
- e.g. traditional scientific codes (matrix multiply, simulated annealing and related simulations, etc.)
Some characteristic differences:
- Read-mostly
- Quality of service (best-effort vs. guarantees)
- Task granularity
"Embarrassingly parallel"... why?
- HTTP is stateless, with short-lived requests
- The Web's architecture has already forced app designers to work around this! (not obvious in 1990)

Meeting the cluster challenges
- Software & programming models
- Partial failure and application semantics
- System administration
Two case studies to contrast programming models:
- GLUnix goal: support "all" traditional Unix apps, providing a single system image
- SNS/TACC goal: a simple programming model for Internet services (caching, transformation, etc.), with good robustness and easy administration