Scalable Internet Services Cluster Lessons and Architecture Design for Scalable Services BR01, TACC, Porcupine, SEDA and Capriccio.

Slides:

Advertisements

Similar presentations

Capriccio: Scalable Threads for Internet Services Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer University of California at Berkeley.

Advertisements

Extensibility, Safety and Performance in the SPIN Operating System Presented by Allen Kerr.

Chess Review May 8, 2003 Berkeley, CA Compiler Support for Multithreaded Software Jeremy ConditRob von Behren Feng ZhouEric Brewer George Necula.

Introduction CSCI 444/544 Operating Systems Fall 2008.

1 SEDA: An Architecture for Well- Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University.

CS533 Concepts of Operating Systems Jonathan Walpole.

1 Capriccio: Scalable Threads for Internet Services Matthew Phillips.

SEDA: An Architecture for Well- Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University.

Why Events Are A Bad Idea (for high-concurrency servers) By Rob von Behren, Jeremy Condit and Eric Brewer.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of.

Capriccio: Scalable Threads for Internet Services ( by Behren, Condit, Zhou, Necula, Brewer ) Presented by Alex Sherman and Sarita Bafna.

Contiki A Lightweight and Flexible Operating System for Tiny Networked Sensors Presented by: Jeremy Schiff.

Capriccio: Scalable Threads for Internet Services (von Behren) Kenneth Chiu.

Capriccio: Scalable Threads for Internet Services Rob von Behren, Jeremy Condit, Feng Zhou, George Necula and Eric Brewer University of California at Berkeley.

Capriccio: Scalable Threads For Internet Services Authors: Rob von Behren, Jeremy Condit, Feng Zhou, George C. Necula, Eric Brewer Presentation by: Will.

Capriccio: Scalable Threads for Internet Services Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer University of California at Berkeley.

Threads 1 CS502 Spring 2006 Threads CS-502 Spring 2006.

Why Events Are a Bad Idea (for high-concurrency servers) Author: Rob von Behren, Jeremy Condit and Eric Brewer Presenter: Wenhao Xu Other References: [1]

CS-3013 & CS-502, Summer 2006 Memory Management1 CS-3013 & CS-502 Summer 2006.

Operational Analysis L. Grewe. Operational Analysis Relationships that do not require any assumptions about the distribution of service times or inter-arrival.

Microkernels: Mach and L4

Why Events Are A Bad Idea (for high-concurrency servers) Rob von Behren, Jeremy Condit and Eric Brewer Computer Science Division, University of California.

CS 3013 & CS 502 Summer 2006 Threads1 CS-3013 & CS-502 Summer 2006.

Why Events Are A Bad Idea (for high-concurrency servers) Rob von Behren, Jeremy Condit and Eric Brewer University of California at Berkeley

Replay Debugging for Distributed Systems Dennis Geels, Gautam Altekar, Ion Stoica, Scott Shenker.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.

Why Events Are A Bad Idea (for high-concurrency servers) Rob von Behren, Jeremy Condit and Eric Brewer Presented by: Hisham Benotman CS533 - Concepts of.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services by, Matt Welsh, David Culler, and Eric Brewer Computer Science Division University.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

9/14/2015B.Ramamurthy1 Operating Systems : Overview Bina Ramamurthy CSE421/521.

Servers: Concurrency and Performance Jeff Chase Duke University.

Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?

LiNK: An Operating System Architecture for Network Processors Steve Muir, Jonathan Smith Princeton University, University of Pennsylvania

Extensibility, Safety and Performance in the SPIN Operating System Ashwini Kulkarni Operating Systems Winter 2006.

Threads, Thread management & Resource Management.

Eric Keller, Evan Green Princeton University PRESTO /22/08 Virtualizing the Data Plane Through Source Code Merging.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Presented by Changdae Kim and Jaeung Han OFFENCE.

Operating Systems David Goldschmidt, Ph.D. Computer Science The College of Saint Rose CIS 432.

Capriccio: Scalable Threads for Internet Services Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer University of California at Berkeley.

Distributed Information Systems. Motivation ● To understand the problems that Web services try to solve it is helpful to understand how distributed information.

Lecture 5: Threads process as a unit of scheduling and a unit of resource allocation processes vs. threads what to program with threads why use threads.

5204 – Operating Systems Threads vs. Events. 2 CS 5204 – Operating Systems Forms of task management serial preemptivecooperative (yield) (interrupt)

Department of Computer Science and Software Engineering

Full and Para Virtualization

CSE 60641: Operating Systems Next topic: CPU (Process/threads/scheduling, synchronization and deadlocks) –Why threads are a bad idea (for most purposes).

6.894: Distributed Operating System Engineering Lecturers: Frans Kaashoek Robert Morris

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of.

Capriccio: Scalable Threads for Internet Service

By: Rob von Behren, Jeremy Condit and Eric Brewer 2003 Presenter: Farnoosh MoshirFatemi Jan

SEDA An architecture for Well-Conditioned, scalable Internet Services Matt Welsh, David Culler, and Eric Brewer University of California, Berkeley Symposium.

Holistic Systems Programming Qualifying Exam Presentation UC Berkeley, Computer Science Division Rob von Behren June 21, 2004.

1 Why Events Are A Bad Idea (for high-concurrency servers) By Rob von Behren, Jeremy Condit and Eric Brewer (May 2003) CS533 – Spring 2006 – DONG, QIN.

An Efficient Threading Model to Boost Server Performance Anupam Chanda.

CS533 Concepts of Operating Systems Jonathan Walpole.

Paper Review of Why Events Are A Bad Idea (for high-concurrency servers) Rob von Behren, Jeremy Condit and Eric Brewer By Anandhi Sundaram.

1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.

Capriccio:Scalable Threads for Internet Services

SEDA: An Architecture for Scalable, Well-Conditioned Internet Services

Capriccio : Scalable Threads for Internet Services

Threads vs. Events SEDA – An Event Model 5204 – Operating Systems.

Why Events Are A Bad Idea (for high-concurrency servers)

Processes and Threads Processes and their scheduling

William Stallings Computer Organization and Architecture

Presenter: Godmar Back

Capriccio – A Thread Model

Capriccio: Scalable Threads for Internet Services

Why Events Are a Bad Idea (for high concurrency servers)

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

CS 5204 Operating Systems Lecture 5

Presentation transcript:

Scalable Internet Services Cluster Lessons and Architecture Design for Scalable Services BR01, TACC, Porcupine, SEDA and Capriccio

Ben Y. Outline Overview of cluster services lessons from giant-scale services (BR01) SEDA (staged event-driven architecture) Capriccio

Ben Y. Scalable Servers Clustered services natural platform for large web services search engines, DB servers, transactional servers Key benefit low cost of computing, COTS vs. SMP incremental scalability load balance traffic/requests across servers Extension from single server model reliable/fast communication, but partitioned data

Ben Y. Goals Failure transparency hot-swapping components w/o loss of avail homogeneous functionality and/or replication Load balancing partition data / requests for max service rate need to colocate requests w/ associated data Scalability aggregate performance should scale w/# of servers

Ben Y. Two Different Models Read-mostly data web servers, DB servers, search engines (query) replicate across servers + (RR DNS / redirector) IP Network (WAN) client Round Robin DNS

Ben Y. Two Different Models … Read-write model mail servers, e-commerce sites, hosted services small(er) replication factor for stronger consistency IP Network (WAN) client Load Redirector

Ben Y. Key Architecture Challenges Providing high availability availability across component failures Handling flash crowds / peak load need support for massive concurrency Other challenges upgradability: maintaining availability and minimal cost during upgrades in S/W, H/W, functionality error diagnosis: fast isolation of failures / performance degradation

Ben Y. Nuggets Definition uptime = (MTBF – MTTR)/MTBF yield = queries completed / queries offered harvest = data available / complete data MTTR at least as important at MTBF much easier to tune and quantify DQ principle data/query x queries/second  constant physical bottlenecks limit overall throughput

Ben Y. Staged Event-driven Architecture SEDA (SOSP’05)

Ben Y. Break… Come back in 5 mins more on threads vs. events…

Ben Y. Tapestry Software Architecture SEDA event-driven framework Java Virtual Machine Dynamic Tap. distance map core router application programming interface applications Patchwork network

Ben Y. Impact of Correlated Events web / application servers independent requests maximize individual throughput Network ? ? ? ? ? ? ? A B C correlated requests: A+B+C  D e.g. online continuous queries, sensor aggregation, p2p control layer, streaming data mining event handler ++=

Ben Y. Capriccio User-level light-weight threads (SOSP03) Argument threads are the natural programming model current problems result of implementation not fundamental flaw Approach aim for massive scalability compiler assistance linked stacks, block graph scheduling

Ben Y. The Price of Concurrency Why is concurrency hard? Race conditions Code complexity Scalability (no O(n) operations) Scheduling & resource sensitivity Inevitable overload Performance vs. Programmability No good solution Performance Ease of Programming Threads Events Ideal

Ben Y. The Answer: Better Threads Goals Simple programming model Good tools & infrastructure Languages, compilers, debuggers, etc. Good performance Claims Threads are preferable to events User-Level threads are key

Ben Y. “But Events Are Better!” Recent arguments for events Lower runtime overhead Better live state management Inexpensive synchronization More flexible control flow Better scheduling and locality All true but… Lauer & Needham duality argument Criticisms of specific threads packages No inherent problem with threads!

Ben Y. Criticism: Runtime Overhead Criticism: Threads don’t perform well for high concurrency Response Avoid O(n) operations Minimize context switch overhead Simple scalability test Slightly modified GNU Pth Thread-per-task vs. single thread Same performance!

Ben Y. Criticism: Synchronization Criticism: Thread synchronization is heavyweight Response Cooperative multitasking works for threads, too! Also presents same problems Starvation & fairness Multiprocessors Unexpected blocking (page faults, etc.) Both regimes need help Compiler / language support for concurrency Better OS primitives

Ben Y. Criticism: Scheduling Criticism: Thread schedulers are too generic Can’t use application-specific information Response 2D scheduling: task & program location Threads schedule based on task only Events schedule by location (e.g. SEDA) Allows batching Allows prediction for SRCT Threads can use 2D, too! Runtime system tracks current location Call graph allows prediction Task Program Location Threads Events

Ben Y. The Proof’s in the Pudding User-level threads package Subset of pthreads Intercept blocking system calls No O(n) operations Support > 100K threads 5000 lines of C code Simple web server: Knot 700 lines of C code Similar performance Linear increase, then steady Drop-off due to poll() overhead KnotC (Favor Connections) KnotA (Favor Accept) Haboob Concurrent Clients Mbits / second

Ben Y. Arguments For Threads More natural programming model Control flow is more apparent Exception handling is easier State management is automatic Better fit with current tools & hardware Better existing infrastructure

Ben Y. Why Threads: control Flow Events obscure control flow For programmers and tools ThreadsEvents thread_main(int sock) { struct session s; accept_conn(sock, &s); read_request(&s); pin_cache(&s); write_response(&s); unpin(&s); } pin_cache(struct session *s) { pin(&s); if( !in_cache(&s) ) read_file(&s); } AcceptHandler(event e) { struct session *s = new_session(e); RequestHandler.enqueue(s); } RequestHandler(struct session *s) { …; CacheHandler.enqueue(s); } CacheHandler(struct session *s) { pin(s); if( !in_cache(s) ) ReadFileHandler.enqueue(s); else ResponseHandler.enqueue(s); }... ExitHandler(struct session *s) { …; unpin(&s); free_session(s); } Accept Conn. Write Response Read File Read Request Pin Cache Web Server Exit

Ben Y. ThreadsEvents thread_main(int sock) { struct session s; accept_conn(sock, &s); read_request(&s); pin_cache(&s); write_response(&s); unpin(&s); } pin_cache(struct session *s) { pin(&s); if( !in_cache(&s) ) read_file(&s); } CacheHandler(struct session *s) { pin(s); if( !in_cache(s) ) ReadFileHandler.enqueue(s); else ResponseHandler.enqueue(s); } RequestHandler(struct session *s) { …; CacheHandler.enqueue(s); }... ExitHandler(struct session *s) { …; unpin(&s); free_session(s); } AcceptHandler(event e) { struct session *s = new_session(e); RequestHandler.enqueue(s); } Events obscure control flow For programmers and tools Accept Conn. Write Response Read File Read Request Pin Cache Web Server Exit Why Threads: control Flow

Ben Y. Why Threads: Exceptions Exceptions complicate control flow Harder to understand program flow Cause bugs in cleanup code Accept Conn. Write Response Read File Read Request Pin Cache Web Server Exit ThreadsEvents thread_main(int sock) { struct session s; accept_conn(sock, &s); if( !read_request(&s) ) return; pin_cache(&s); write_response(&s); unpin(&s); } pin_cache(struct session *s) { pin(&s); if( !in_cache(&s) ) read_file(&s); } CacheHandler(struct session *s) { pin(s); if( !in_cache(s) ) ReadFileHandler.enqueue(s); else ResponseHandler.enqueue(s); } RequestHandler(struct session *s) { …; if( error ) return; CacheHandler.enqueue(s); }... ExitHandler(struct session *s) { …; unpin(&s); free_session(s); } AcceptHandler(event e) { struct session *s = new_session(e); RequestHandler.enqueue(s); }

Ben Y. Why Threads: State Management ThreadsEvents thread_main(int sock) { struct session s; accept_conn(sock, &s); if( !read_request(&s) ) return; pin_cache(&s); write_response(&s); unpin(&s); } pin_cache(struct session *s) { pin(&s); if( !in_cache(&s) ) read_file(&s); } CacheHandler(struct session *s) { pin(s); if( !in_cache(s) ) ReadFileHandler.enqueue(s); else ResponseHandler.enqueue(s); } RequestHandler(struct session *s) { …; if( error ) return; CacheHandler.enqueue(s); }... ExitHandler(struct session *s) { …; unpin(&s); free_session(s); } AcceptHandler(event e) { struct session *s = new_session(e); RequestHandler.enqueue(s); } Accept Conn. Write Response Read File Read Request Pin Cache Web Server Exit Events require manual state management Hard to know when to free Use GC or risk bugs

Ben Y. Why Threads: Existing Infrastructure Lots of infrastructure for threads Debuggers Languages & compilers Consequences More amenable to analysis Less effort to get working systems

Ben Y. Building Better Threads Goals Simplify the programming model Thread per concurrent activity Scalability (100K+ threads) Support existing APIs and tools Automate application-specific customization Mechanisms User-level threads Plumbing: avoid O(n) operations Compile-time analysis Run-time analysis

Ben Y. Case for User-Level Threads Decouple programming model and OS Kernel threads Abstract hardware Expose device concurrency User-level threads Provide clean programming model Expose logical concurrency Benefits of user-level threads Control over concurrency model! Independent innovation Enables static analysis Enables application-specific tuning Threads App OS User

Ben Y. Case for User-Level Threads Threads OS User App Decouple programming model and OS Kernel threads Abstract hardware Expose device concurrency User-level threads Provide clean programming model Expose logical concurrency Benefits of user-level threads Control over concurrency model! Independent innovation Enables static analysis Enables application-specific tuning Similar argument to the design of overlay networks

Ben Y. Capriccio Internals Cooperative user-level threads Fast context switches Lightweight synchronization Kernel Mechanisms Asynchronous I/O (Linux) Efficiency Avoid O(n) operations Fast, flexible scheduling

Ben Y. Safety: Linked Stacks The problem: fixed stacks Overflow vs. wasted space LinuxThreads: 2MB/stack Limits thread numbers The solution: linked stacks Allocate space as needed Compiler analysis Add runtime checkpoints Guarantee enough space until next check Fixed Stacks Linked Stack waste overflow

Ben Y. Linked Stacks: Algorithm Parameters MaxPath MinChunk Steps Break cycles Trace back chkpts limit MaxPath length Special Cases Function pointers External calls Use large stack MaxPath = 8

Ben Y. Linked Stacks: Algorithm MaxPath = 8 Parameters MaxPath MinChunk Steps Break cycles Trace back chkpts limit MaxPath length Special Cases Function pointers External calls Use large stack

Ben Y. Linked Stacks: Algorithm MaxPath = 8 Parameters MaxPath MinChunk Steps Break cycles Trace back chkpts limit MaxPath length Special Cases Function pointers External calls Use large stack

Ben Y. Linked Stacks: Algorithm MaxPath = 8 Parameters MaxPath MinChunk Steps Break cycles Trace back chkpts limit MaxPath length Special Cases Function pointers External calls Use large stack

Ben Y. Linked Stacks: Algorithm MaxPath = 8 Parameters MaxPath MinChunk Steps Break cycles Trace back chkpts limit MaxPath length Special Cases Function pointers External calls Use large stack

Ben Y. Linked Stacks: Algorithm MaxPath = 8 Parameters MaxPath MinChunk Steps Break cycles Trace back chkpts limit MaxPath length Special Cases Function pointers External calls Use large stack

Ben Y. Special Cases Function pointers categorize f* by # and type of arguments “guess” which func will/can be called External functions users annotate trusted stack bounds on libs or (re)use a small # of large stack chunks Result use/reuse stack chunks much like VM can efficiently share stack chunks memory-touch benchmark, factor of 3 reduction in paging cost

Ben Y. Scheduling: Blocking Graph Lessons from event systems Break app into stages Schedule based on stage priorities Allows SRCT scheduling, finding bottlenecks, etc. Capriccio does this for threads Deduce stage with stack traces at blocking points Prioritize based on runtime information Accept Write Read Open Web Server Close

Ben Y. Resource-Aware Scheduling Track resources used along BG edges Memory, file descriptors, CPU Predict future from the past Algorithm Increase use when underutilized Decrease use near saturation Advantages Operate near the knee w/o thrashing Automatic admission control Accept Write Read Open Web Server Close

Ben Y. Pitfalls What is the max amt of resource? depends on workload e.g.: disk thrashing depends on sequential or random seeks use early signs of thrashing to indicate max capacity Detecting thrashing only estimate using “productivity/overhead” productivity from guessing (threads created, files opened/closed)

Ben Y. Thread Performance CapriccioCapriccio-notraceLinuxThreadsNPTL Thread Creation Context Switch Uncontested mutex lock Slightly slower thread creation Faster context switches Even with stack traces! Much faster mutexes Time of thread operations (microseconds)

Ben Y. Runtime Overhead Tested Apache Stack linking 78% slowdown for null call 3-4% overall Resource statistics 2% (on all the time) 0.1% (with sampling) Stack traces 8% overhead

Ben Y. Microbenchmark: Producer / Consumer

Ben Y. Web Server Performance

Ben Y. Example of “Great Systems Paper” observe higher level issue threads vs. event programming abstraction use previous work (duality) to identify problem why are threads not as efficient as events? good systems design call graph analysis for linked stacks resource aware scheduling good execution full solid implementation analysis leading to full understanding of detailed issues cross-area approach (help from PL research)

Ben Y. Acknowledgements Many slides “borrowed” from the respective talks / papers: Capriccio (Rob von Behren) SEDA (Matt Welsh) Brewer01: “Lessons…”