G22.3250-001, Robert Grimm, New York University: Scalable Network Services

Altogether Now: The Three Questions
- What is the problem?
- What is new or different or notable?
- What are the contributions and limitations?

Implementing Global Services: Clusters to the Rescue?
- Advantages
  - Scalability
    - Internet workloads tend to be highly parallel
    - Clusters can be more powerful than the biggest iron
    - Clusters can also be expanded incrementally
  - Availability
    - Clusters have "natural" redundancy due to the independence of nodes
    - There are no central points of failure in clusters anymore (?)
    - Clusters can be upgraded piecemeal (hot upgrade)
  - Commodity building blocks
    - Clusters of PCs have better cost/performance than big iron
    - PCs are also much easier to upgrade and order

Implementing Global Services: Clusters to the Rescue? (cont.)
- Challenges
  - Administration
    - Maintaining lots of PCs vs. one big box can be daunting
  - Component vs. system replication
    - To achieve incremental scalability, we need component-level instead of system-level replication
  - Partial failures
    - Clusters need to handle partial failures while big iron does not
    - Do we believe this?
  - Shared state
    - Managing state across the cluster is harder than for big iron, which can rely on distributed shared memory

BASE Semantics
- Trading between consistency, performance, and availability
- One extreme: ACID
  - Perfect consistency, higher overheads, limited availability
- An alternative point: BASE (see the sketch below)
  - Basically available
    - Can tolerate stale data
  - Soft state to improve performance
  - Eventual consistency
    - Can provide approximate answers
- The reality: different components with different semantics
  - Focus on services with mostly BASE data
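To make the soft-state/stale-data idea concrete, here is a minimal Python sketch of a BASE-style cache read. It is illustrative only; the class and method names are assumptions, not part of the paper's system.

```python
# Minimal sketch of BASE-style soft state (illustrative; names are assumptions).
# Reads prefer fresh data, but a stale cached value is returned instead of an
# error when regeneration fails, and the cache is simply rebuilt over time.
import time


class SoftStateCache:
    def __init__(self, ttl=30.0):
        self.ttl = ttl                          # seconds before an entry counts as stale
        self.entries = {}                       # key -> (value, time stored)

    def put(self, key, value):
        self.entries[key] = (value, time.monotonic())

    def get(self, key, regenerate):
        """regenerate: callable that recomputes the value from the origin."""
        entry = self.entries.get(key)
        fresh = entry is not None and time.monotonic() - entry[1] < self.ttl
        if fresh:
            return entry[0]
        try:
            value = regenerate(key)             # may be slow or fail
        except Exception:
            if entry is not None:
                return entry[0]                 # basically available: serve stale data
            raise                               # nothing cached, so the error propagates
        self.put(key, value)                    # eventually consistent: refreshed now
        return value
```

An ACID store would block or fail rather than return stale data; the BASE trade accepts staleness to keep the service available.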

Overview of Architecture
- Front ends accept requests and include service logic (see the sketch below)
- Workers perform the actual operations
- Caches store content before and after processing
- User profile DB stores user preferences (ACID!)
- Manager performs load balancing
- Graphical monitor supports administrators
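The request path implied by these components can be sketched as follows. This is an illustration under assumed names (pick_worker, a callable worker stub, a dict-like cache), not the paper's actual interfaces.

```python
# Illustrative sketch of the request path (assumed names, not the paper's code):
# the front end holds the service logic, consults the ACID profile store, checks
# the soft-state cache, and dispatches work to a worker chosen by the manager.
class FrontEnd:
    def __init__(self, manager, cache, profiles, workers):
        self.manager = manager       # central load-balancing manager (via its stub)
        self.cache = cache           # soft-state cache of processed content
        self.profiles = profiles     # user profile DB: the one ACID component
        self.workers = workers       # worker id -> callable worker stub

    def handle(self, user, url, fetch):
        prefs = self.profiles.get(user, {})          # customization key-value pairs
        key = (url, tuple(sorted(prefs.items())))
        if key in self.cache:                        # hit: skip the worker entirely
            return self.cache[key]
        worker_id = self.manager.pick_worker()       # hint-based, load-balanced choice
        result = self.workers[worker_id](fetch(url), prefs)
        self.cache[key] = result                     # store the processed content
        return result
```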

Software Structured in Three Layers
- SNS (Scalable Network Service)
  - To be reused across clusters
- TACC (Transformation, Aggregation, Caching, Customization)
  - Easily extensible through building blocks
- Service
  - Application-specific (the sketch below illustrates the split)
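One way to picture the layering is as interfaces. The sketch below is a rough illustration with invented class names: the SNS layer owns the reusable plumbing, the TACC layer defines the worker building-block interface, and the service layer supplies only application code.

```python
# Rough illustration of the three layers (invented names, heavily elided).
class SNSWorkerStub:
    """SNS layer: reusable plumbing (threads, restarts, load reports), elided here."""
    def run(self, worker, data, args):
        return worker.process(data, args)


class TACCWorker:
    """TACC layer: the building-block interface every worker implements."""
    def process(self, data, args):
        raise NotImplementedError


class HtmlDistiller(TACCWorker):
    """Service layer: application-specific logic, e.g. a toy HTML 'distiller'."""
    def process(self, data, args):
        return data.replace("<img", "<!-- img")      # placeholder transformation
```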

The SNS Support Layer
- Components replicated for availability and scalability
  - Instances are independent from each other, launched on demand
  - Exceptions: network, resource manager, user profile database
- Control decisions isolated in front ends
  - Workers are simple and stateless (profile data part of input)
- Load balancing enabled by central manager (see the sketch below)
  - Load data collected from workers and distributed to FEs
  - Centralization limits complexity
- Overflow pool helps handle bursts
  - User workstations within an organization (is this realistic?)
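A minimal sketch of the centralized load-balancing loop, assuming invented names (report_load, update_load_hints): workers report their load to the manager, and the manager periodically pushes the aggregated table to the front ends as hints.

```python
# Minimal sketch (assumed names, not the paper's implementation) of the central
# manager: collect load reports from workers, periodically distribute the table
# to front ends as hints. distribute_hints is meant to run in its own thread.
import threading
import time


class Manager:
    def __init__(self, front_ends, period=1.0):
        self.loads = {}                  # worker id -> last reported queue length
        self.front_ends = front_ends     # front ends that receive load hints
        self.period = period             # how often hints are pushed (seconds)
        self.lock = threading.Lock()

    def report_load(self, worker_id, queue_length):
        """Called by worker stubs to report their current load."""
        with self.lock:
            self.loads[worker_id] = queue_length

    def distribute_hints(self):
        """Push a snapshot of the load table to every front end, forever."""
        while True:
            with self.lock:
                snapshot = dict(self.loads)
            for fe in self.front_ends:
                fe.update_load_hints(snapshot)   # hints may be slightly stale (BASE)
            time.sleep(self.period)
```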

The SNS Support Layer (cont.)
- Timeouts detect failures; peers restart services
  - Cached state used to initialize the restarted service
  - Soft state rebuilt over time
- Stubs hide complexity from front ends and workers (see the sketch below)
  - Manager stubs select workers
    - Including load balancing decisions based on hints
  - Worker stubs isolate workers from the complexities of the system
    - Including hiding multi-threaded operation
    - Also report back load information
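The worker-stub idea can be sketched as below (illustrative only; method names are assumptions): the stub hides the thread pool from the stateless worker function and reports its queue length back to the manager.

```python
# Illustrative worker stub (assumed names): hides multi-threading from the worker
# code and reports load back to the manager after every enqueue and completion.
import queue
import threading


class WorkerStub:
    def __init__(self, worker_fn, manager, worker_id, num_threads=4):
        self.worker_fn = worker_fn       # the actual (single-threaded) worker logic
        self.manager = manager
        self.worker_id = worker_id
        self.tasks = queue.Queue()
        for _ in range(num_threads):
            threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, request, reply_cb):
        """Called by a front end: enqueue work and report the new load."""
        self.tasks.put((request, reply_cb))
        self.manager.report_load(self.worker_id, self.tasks.qsize())

    def _loop(self):
        while True:
            request, reply_cb = self.tasks.get()
            try:
                reply_cb(self.worker_fn(request))    # run the stateless worker
            finally:
                self.manager.report_load(self.worker_id, self.tasks.qsize())
```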

The TACC Programming Model
- Transformation
  - Changes the content of a single resource
- Aggregation
  - Collects data from several sources
- Caching
  - Avoids lengthy accesses to origin servers
  - Avoids repeatedly performing transformations/aggregations
- Customization
  - Based on key-value pairs automatically provided with the request
- Key goal: compose workers just like Unix pipes (see the sketch below)
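The "compose workers like Unix pipes" goal can be made concrete with a small sketch; the worker signature (data, customization args) and the "max-length" key are assumptions for illustration, not the paper's API.

```python
# Sketch of composing TACC-style workers like a Unix pipeline (illustrative).
def compose(*workers):
    """Build a pipeline: the output of one worker is the input of the next."""
    def pipeline(data, args):
        for worker in workers:
            data = worker(data, args)
        return data
    return pipeline


def strip_whitespace(text, args):
    """Toy transformation worker: collapse runs of whitespace."""
    return " ".join(text.split())


def truncate(text, args):
    """Toy transformation worker: honor a per-user customization key."""
    limit = int(args.get("max-length", 80))
    return text[:limit]


distill_text = compose(strip_whitespace, truncate)
print(distill_text("  lots   of   text   from   an   origin   server  ",
                   {"max-length": "12"}))          # -> "lots of text"
```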

TranSend Implementation
- Web distillation proxy for Berkeley's modem pool
- Front ends process HTTP requests
  - Look up user preferences
  - Build a pipeline of distillers
- Centralized manager tracks distillers
  - Announces itself on an IP multicast group
  - Collects load information as a running estimate
  - Performs lottery scheduling between instances (see the sketch below)
    - Actually, the scheduling is performed by the manager stub
  - Spawns new instances of distillers
    - May utilize the overflow pool during bursts
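A minimal sketch of load-weighted lottery scheduling as performed in the manager stub; the inverse-load ticket assignment here is an assumption for illustration, not the paper's exact formula.

```python
# Sketch of lottery scheduling over distiller instances (illustrative weighting).
import random


def pick_instance(load_hints):
    """load_hints: dict mapping instance id -> estimated queue length."""
    if not load_hints:
        raise ValueError("no distiller instances available")
    # Less-loaded instances get more tickets and therefore win more often.
    tickets = {inst: 1.0 / (1.0 + load) for inst, load in load_hints.items()}
    draw = random.uniform(0, sum(tickets.values()))
    for inst, share in tickets.items():
        draw -= share
        if draw <= 0:
            return inst
    return inst                       # fallback for floating-point rounding


print(pick_instance({"distiller-1": 0, "distiller-2": 4, "distiller-3": 10}))
```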

TranSend Implementation (cont.)
- Fault tolerance and recovery provided by peers (see the sketch below)
  - Based on broken connections, timeouts, and beacons
    - For the former, distillers register with the manager
  - Manager (re)starts distillers and front ends
  - Front ends restart the manager
- Overall scalability and availability aided by BASE
  - Load balancing data is (slightly) stale
    - Timeouts allow recovery from incorrect choices
  - Content is soft state and can be recovered from the cache or origin server
  - Answers may be approximate (on system overload)
    - Use cached data instead of recomputing it
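The peer-recovery mechanism can be sketched as a beacon watcher (names are invented; restart_fn stands in for whatever actually respawns a process): if nothing has been heard from a registered process within the timeout, a peer restarts it and lets its soft state be rebuilt.

```python
# Illustrative beacon watcher (invented names): a peer restarts any monitored
# process that has gone silent for longer than the timeout.
import time


class PeerWatcher:
    def __init__(self, restart_fn, timeout=5.0):
        self.restart_fn = restart_fn     # respawns the process; implementation elided
        self.timeout = timeout
        self.last_beacon = {}            # process id -> time of last beacon

    def beacon(self, proc_id):
        """Called whenever a beacon (or any message) arrives from proc_id."""
        self.last_beacon[proc_id] = time.monotonic()

    def check(self):
        """Run periodically; restart anything that has gone silent."""
        now = time.monotonic()
        for proc_id, seen in list(self.last_beacon.items()):
            if now - seen > self.timeout:
                self.restart_fn(proc_id)           # soft state is rebuilt afterwards
                self.last_beacon[proc_id] = now    # give the restart time to take effect
```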

Comparison between TranSend and HotBot

TranSend Evaluation
- Based on traces from the modem pool
  - 25,000 users
  - 600 modems (14.4K and 28.8K)
  - 20 million HTTP requests (over about 1.5 months)
- Distribution of content sizes
  - Most content accessed is small
  - But the average byte is part of large content
  - Therefore, distillation makes sense
- Evaluation of TranSend based on a trace playback engine and an isolated cluster

Burstiness of Requests
- Three different time scales
  - 24 hours
  - 3.5 hours
  - 3.5 minutes
- How to manage the overflow pool? (see the sketch below)
  - Select a utilization level for the worker pool
  - Or, select a percentage of time for the overflow pool
  - Are these the right strategies?
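The two candidate policies on this slide can be written down as one combined check. This is a speculative sketch with made-up thresholds and names, not something the paper specifies.

```python
# Speculative sketch of the two overflow-pool policies: recruit overflow machines
# when worker-pool utilization crosses a chosen level, but only while the time
# spent using overflow stays within a chosen budget.
def should_use_overflow(utilization, overflow_seconds, elapsed_seconds,
                        util_threshold=0.9, overflow_budget=0.05):
    """utilization: fraction of dedicated worker capacity in use right now.
    overflow_seconds / elapsed_seconds: fraction of time overflow was used so far."""
    over_budget = overflow_seconds > overflow_budget * elapsed_seconds
    return utilization > util_threshold and not over_budget


print(should_use_overflow(utilization=0.95, overflow_seconds=10, elapsed_seconds=3600))
```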

Distiller Performance
- Not surprisingly, distillation time grows with input size: roughly 8 ms per KB
  - For example, distilling a 100 KB image would take on the order of 0.8 s
- However, there is large variation around this trend

Cache Partition Performance
- Summary of results
  - The average cache hit takes 27 ms, 15 ms of which is spent in TCP
    - What's wrong with this picture?
  - 95% of all cache hits take less than 100 ms to service
  - The miss penalty varies widely: from 100 ms to 100 s
- Some observations
  - The cache hit rate grows with cache size, but plateaus for a fixed population size (why?)
  - The cache hit rate also grows with population size, even for a fixed-size cache (why?)
  - What do we conclude for front-end capacity?

Self-Tuning and Load Balancing
- Spawning threshold H, stability period D

Self-Tuning and Load Balancing: The Gory Details
- What is the trade-off for D? (see the sketch below)
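One plausible reading of H and D, sketched under assumed semantics rather than as the paper's exact algorithm: spawn a new worker only when the load has stayed above H for at least D, so short bursts do not trigger spawning. A larger D damps oscillation but reacts more slowly to sustained overload, which is the trade-off the slide asks about.

```python
# Sketch of the spawn decision under assumed semantics for H and D: spawn only
# when the reported load has stayed above the threshold H for the full stability
# period D (in seconds).
import time


class SpawnPolicy:
    def __init__(self, threshold_h, stability_d):
        self.threshold_h = threshold_h     # load level that indicates overload
        self.stability_d = stability_d     # how long the overload must persist
        self.over_since = None             # when load first exceeded H

    def should_spawn(self, current_load):
        now = time.monotonic()
        if current_load <= self.threshold_h:
            self.over_since = None         # load dropped below H: reset the clock
            return False
        if self.over_since is None:
            self.over_since = now          # overload just started
        return now - self.over_since >= self.stability_d
```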

Scalability under Load
- What are the most important bottlenecks?
- How can we overcome them?

What Do You Think?