Systems Issues for Scalable, Fault Tolerant Internet Services

Presentation transcript:

Systems Issues for Scalable, Fault Tolerant Internet Services
Yatin Chawathe, Eric Brewer
To appear in Middleware ’98
http://www.cs.berkeley.edu/~yatin/papers/sns-crc.ps

Motivation
- Proliferation of network-based services
- Two critical issues must be addressed by Internet services:
  - System scalability: incremental and linear scalability
  - Availability and fault tolerance: 24x7 operation

A Reusable SNS Framework
- Clusters of workstations are ideal for Internet services [FGC+97]
- But clusters are difficult to manage:
  - To ensure linear scalability, the service must distribute load across the cluster
  - The service must grow the cluster with increasing load
  - Partial failures within a cluster complicate fault management
- Isolate the common requirements of cluster-based Internet applications into a reusable substrate: the Scalable Network Services (SNS) framework

Architecture
[Figure: architecture diagram. Labels: Outside World, Worker, Worker Driver, Internal Network, SNS Manager.]

Workers
- Workers are grouped into classes; within a class, workers are identical
- Workers can receive tasks from the outside world or from other workers
- Workers have a simple serial interface for tasks
  - The originator sends a task to the consumer by specifying the class and inputs for the task
  - Tasks are atomic and restartable
- Worker Drivers present a narrow interface between the SNS substrate and the worker application
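
The worker interface described above can be sketched roughly as follows. This is an illustrative sketch only, not the SNS API: the names `Task`, `WorkerDriver`, `enqueue`, `queue_length`, and `run_next` are hypothetical, and the worker application is modeled as a plain function that handles one task at a time.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Task:
    """A task names a worker class and carries inputs (hypothetical type)."""
    worker_class: str  # tasks are addressed to a class, not to one worker
    inputs: bytes


class WorkerDriver:
    """Hypothetical narrow interface between the substrate and one worker."""

    def __init__(self, worker_class: str, handler: Callable[[bytes], bytes]):
        self.worker_class = worker_class
        self._handler = handler       # the worker application itself
        self._queue: List[Task] = []  # pending tasks, oldest first

    def enqueue(self, task: Task) -> None:
        """Accept a task addressed to this driver's worker class."""
        if task.worker_class != self.worker_class:
            raise ValueError("task addressed to a different worker class")
        self._queue.append(task)

    def queue_length(self) -> int:
        """Number of tasks waiting; usable as a simple load metric."""
        return len(self._queue)

    def run_next(self) -> bytes:
        """Process the oldest queued task serially and return its result."""
        task = self._queue.pop(0)
        return self._handler(task.inputs)
```

Because a task is only a class name plus inputs, any worker of that class can serve it, which is what lets the substrate restart a task elsewhere after a failure.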

Centralized SNS Manager
- The SNS Manager is intentionally centralized
  - Centralization makes it easier to reason about and implement the various policies
  - “All” we need to do is ensure the fault tolerance of the manager and make sure it is not a performance bottleneck
- Three key functions:
  - Resource location
  - Load balancing and scalability
  - Fault tolerance

Resource Location
[Figure: resource location diagram. Labels: SNS Manager, Multicast Beacons, Worker, Worker Driver, Persistent Connection, and the messages Register, Find, and Found.]

Load Balancing
- Load measurement and reporting
  - Each worker examines incoming requests and estimates the “load” that they would generate
  - Simplest load metric: queue length at the workers
  - Workers periodically report their current load to the SNS Manager
- The SNS Manager maintains load history and aggregates load reports from all workers
- Load reports are piggybacked on manager beacons to the rest of the system

Load Balancing
- Each worker makes local load balancing decisions
- Use lottery scheduling: the number of tickets is inversely proportional to worker load
- Stale load reports can cause oscillations
  - Use a correction factor based on the number of requests sent since the last load report
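
A minimal sketch of lottery-based selection with a staleness correction is shown below. The slides do not give the exact formulas, so two details are assumptions: tickets are computed as 1 / (1 + estimated load), and the correction simply adds one unit of load per request sent since the last report. The class and method names are hypothetical.

```python
import random
from typing import Dict, Optional


class LotteryBalancer:
    """Picks a worker by lottery; tickets shrink as estimated load grows."""

    def __init__(self, rng: Optional[random.Random] = None):
        self._reported: Dict[str, float] = {}  # last reported load per worker
        self._sent_since: Dict[str, int] = {}  # requests sent since that report
        self._rng = rng or random.Random()

    def on_load_report(self, worker: str, load: float) -> None:
        """Record a fresh load report and reset the correction counter."""
        self._reported[worker] = load
        self._sent_since[worker] = 0

    def estimated_load(self, worker: str) -> float:
        # Assumed correction: each request sent since the last report
        # counts as one extra unit of load on that worker.
        return self._reported[worker] + self._sent_since[worker]

    def tickets(self, worker: str) -> float:
        # Assumed formula: inverse of load, with +1 to avoid dividing by zero.
        return 1.0 / (1.0 + self.estimated_load(worker))

    def pick(self) -> str:
        """Draw one worker with probability proportional to its tickets."""
        if not self._reported:
            raise RuntimeError("no load reports received yet")
        workers = list(self._reported)
        weights = [self.tickets(w) for w in workers]
        chosen = self._rng.choices(workers, weights=weights, k=1)[0]
        self._sent_since[chosen] += 1
        return chosen
```

Without the correction, every sender would keep favoring the worker that looked idle in the last report until the next report arrived, which is the oscillation the slide warns about.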

Auto-launch for Scalability
- Worker replication to handle short traffic bursts
  - Multiple workers handle requests in parallel
  - If load on a class of workers gets too high, the SNS Manager launches a new one
- Overflow pool for long bursts
  - Non-dedicated set of machines (e.g. users’ desktop machines)
  - When all dedicated nodes are exhausted, harness an overflow node; release it after the burst subsides
  - Useful for incremental scalability
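
The launch policy above might look something like the following sketch. The slides do not say how "too high" is measured, so the use of the average load across a class and a fixed threshold are assumptions, as is the function name `choose_launch_node`.

```python
from typing import Optional, Sequence


def choose_launch_node(class_loads: Sequence[float],
                       threshold: float,
                       free_dedicated: Sequence[str],
                       free_overflow: Sequence[str]) -> Optional[str]:
    """Return the node on which to launch a new worker, or None.

    class_loads holds the latest reported load of each worker in one class.
    """
    if not class_loads:
        return None  # nothing reported yet, so no basis for a decision
    average = sum(class_loads) / len(class_loads)
    if average <= threshold:
        return None  # the class is coping; do not launch
    if free_dedicated:
        return free_dedicated[0]  # prefer dedicated cluster nodes
    if free_overflow:
        return free_overflow[0]   # dedicated nodes exhausted: use overflow
    return None  # no capacity left anywhere


# Example: the class is overloaded and no dedicated node is free,
# so an overflow node (hypothetical name) is chosen.
node = choose_launch_node([12, 14], 10, [], ["desktop-7"])
```

A real policy would also need the reverse step, releasing overflow nodes once the load has stayed low for some time, which is omitted here.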

Fault Tolerance
- Starfish fault tolerance
  - “Peer” monitoring, as opposed to primary/secondary fault tolerance
- Two mechanisms:
  - Timeouts and retries
  - Preemptive detection and component restart
- Reliance on soft state simplifies crash recovery
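
The timeout-and-retry mechanism can be illustrated with a short sketch. It assumes tasks are restartable, so resending the same inputs to another worker is safe. The `send` callable, its signature, and the round-robin choice of the next worker are hypothetical stand-ins for the real transport and selection logic.

```python
from typing import Callable, Optional, Sequence


def dispatch_with_retry(inputs: bytes,
                        workers: Sequence[str],
                        send: Callable[[str, bytes, float], bytes],
                        timeout_s: float = 1.0,
                        max_attempts: int = 3) -> bytes:
    """Send a restartable task, retrying on another worker after a timeout.

    send(worker, inputs, timeout_s) is assumed to return the result or
    raise TimeoutError if the worker does not answer in time.
    """
    if not workers:
        raise ValueError("no workers available")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Assumed policy: rotate through the known workers on each retry.
        worker = workers[attempt % len(workers)]
        try:
            return send(worker, inputs, timeout_s)
        except TimeoutError as exc:
            last_error = exc  # treat the worker as failed and try the next
    raise RuntimeError("all attempts timed out") from last_error
```

Because the sender holds no state about the failed worker beyond the timeout, recovery needs no coordination, which matches the soft-state approach noted in the slide.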

Fault Tolerance
[Figure: fault tolerance diagram. Labels: Worker, Worker Driver, SNS Manager, and the messages AmRestarting and ReRegister.]

Example Applications
- TranSend: Web proxy for on-the-fly content distillation
- Wingman: the world’s only graphical web browser for the 3COM PalmPilot
- TopGun Mediaboard: PDA groupware; shared electronic whiteboard for the 3COM PalmPilot
- MARS: MBone archive server

Evaluation
[Figures: evaluation plots. The only surviving annotations mark when Worker 2, Worker 3, and Workers 4 & 5 were started.]

Summary
- Reusable architecture substrate for building Internet service applications
- Application developers program their services to a well-defined, narrow interface
- SNS takes care of resource location, spawning, load balancing, and fault tolerance
- A number of interesting applications have been built on top of the SNS substrate
- Next step: SNSv2 NINJA