Capriccio: Scalable Threads for Internet Services


Introduction: Internet services have ever-increasing scalability demands. Current hardware is meeting these demands, but software has lagged behind. Recent approaches are event-based: pipeline stages of events.

Drawbacks of Events: Event systems hide the control flow, which makes them difficult to understand and debug. They eventually evolved into call-and-return event pairs: programmers need to match related events and to save and restore state. Capriccio: instead of adopting the event-based model, fix the thread-based model.

Goals of Capriccio: Support for the existing thread API, with few changes to existing applications. Scalability to thousands of threads (one thread per execution). Flexibility to address application-specific needs. [Figure: ease of programming vs. performance for threads, events, and the ideal]

Thread Design Principles: Kernel-level threads are for true concurrency. User-level threads provide a clean programming model with useful invariants and semantics. Decouple user-level threads from kernel-level threads; the result is more portable.

Capriccio Thread Package: All thread operations are O(1). Linked stacks address the problem of stack allocation for large numbers of threads, using a combination of compile-time and run-time analysis. Resource-aware scheduler.

Thread Design and Scalability: POSIX API, backward compatible.

User-Level Threads: Advantages (+): performance, flexibility. Disadvantages (-): complex preemption, bad interaction with the kernel scheduler.

Flexibility: Decoupling user and kernel threads allows faster innovation. New kernel thread features can be used without changing application code. The scheduler can be tailored for applications. Lightweight.

Performance: Reduced overhead of thread synchronization. Even with preemptive threading, synchronization needs no kernel crossing. More efficient memory management at user level.

Disadvantages: Blocking calls must be replaced with nonblocking ones to keep hold of the CPU, which adds translation overhead. Problems with multiple processors: synchronization becomes more expensive.

Context Switches: Built on top of Edgar Toernig’s coroutine library. Context switches are fast when threads yield voluntarily.

I/O: Capriccio intercepts blocking I/O calls and uses epoll for asynchronous I/O.

Scheduling: The scheduler works very much like an event-driven application, but the events are hidden from programmers.
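
The scheduling idea can be illustrated with Python generators standing in for user-level threads. This is a minimal sketch under stated assumptions, not Capriccio's implementation: `worker`, `run`, and the `"io"` yield value are hypothetical names, and the real package switches contexts with a coroutine library rather than generators. Each thread is written as straight-line code, while the scheduler loop plays the role of the hidden event loop.

```python
from collections import deque

def worker(name, steps, log):
    """A 'thread' written as straight-line code; each yield marks a blocking call."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield "io"  # hypothetical marker: the thread would block here

def run(threads):
    """Round-robin scheduler: the hidden event loop that resumes ready threads."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # resume until the thread's next blocking point
            ready.append(thread)  # treat the I/O as complete and requeue
        except StopIteration:
            pass                  # thread finished

log = []
run([worker("a", 2, log), worker("b", 2, log)])
print(log)
```

The two workers interleave at their yield points even though neither contains any explicit event-handling code.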

Synchronization: Supports cooperative threading on single-CPU machines, where synchronization requires only Boolean checks.
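
Why a Boolean check is enough can be seen in a small sketch. It assumes cooperative scheduling on one CPU, so a thread can lose the processor only at an explicit yield and never in the middle of `try_lock`. The class name `CoopMutex` and its methods are hypothetical, chosen for illustration.

```python
class CoopMutex:
    """Lock for cooperatively scheduled threads on a single CPU.

    No atomic instruction or kernel call is needed, because no other
    thread can run between the test and the assignment in try_lock.
    """

    def __init__(self):
        self.locked = False

    def try_lock(self):
        if self.locked:
            return False   # caller would yield and retry later
        self.locked = True
        return True

    def unlock(self):
        self.locked = False

mutex = CoopMutex()
print(mutex.try_lock(), mutex.try_lock())
```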

Threading Microbenchmarks: SMP with two 2.4 GHz Xeon processors, 1 GB memory, two 10K RPM SCSI Ultra II hard drives, Linux 2.5.70. Compared Capriccio, LinuxThreads, and Native POSIX Threads for Linux (NPTL).

Latencies of Thread Primitives (in microseconds)

                          Capriccio   LinuxThreads   NPTL
Thread creation           21.5        -              17.7
Thread context switch     0.24        0.71           0.65
Uncontended mutex lock    0.04        0.14           0.15

Thread Scalability: Producer-consumer microbenchmark. LinuxThreads begins to degrade after 20 threads, and NPTL degrades after 100. Capriccio scales to 32K producers and consumers (64K threads total).

I/O Performance: Network performance was measured by token passing among pipes, which simulates the effect of slow client links. Capriccio shows 10% overhead compared to epoll and is twice as fast as both LinuxThreads and NPTL with more than 1000 threads. Disk I/O is comparable to kernel threads.

Linked Stack Management: LinuxThreads allocates 2 MB per stack, so 1 GB of VM holds only 500 threads. [Figure: fixed stacks]

Linked Stack Management: But most threads consume only a few KB of stack space at a given time. Dynamic stack allocation can significantly reduce the amount of VM needed. [Figure: linked stack]
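
The arithmetic behind these two slides is worked through below. The 2 MB stack and 1 GB budget come from the slides. The 8 KB figure is an assumption standing in for "a few KB", chosen only to show the order of magnitude.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

vm_budget = 1 * GB
fixed_stack = 2 * MB    # per-thread stack size quoted for LinuxThreads
small_stack = 8 * KB    # assumption: stands in for "a few KB" per thread

fixed_threads = vm_budget // fixed_stack
small_threads = vm_budget // small_stack
print(fixed_threads, small_threads)
```

With fixed 2 MB stacks the budget covers 512 threads, which matches the slide's round figure of 500. At the assumed 8 KB per thread the same budget covers 131072 threads.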

Compiler Analysis and Linked Stacks: Whole-program analysis based on the call graph. Problematic for recursion, and static estimation may be too conservative.

Compiler Analysis and Linked Stacks: Grow and shrink the stack on demand. Insert checkpoints that determine whether more stack must be allocated before the next checkpoint is reached. The result is noncontiguous stacks.
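
A toy model of what a checkpoint does at run time is sketched below. It is an illustration under assumptions, not the actual runtime: `LinkedStack`, its methods, and the chunk sizes are hypothetical, and frames are tracked as byte counts rather than real memory.

```python
class LinkedStack:
    """Toy model of a noncontiguous stack made of linked chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size      # default size of a new chunk (bytes)
        self.chunks = [[chunk_size, 0]]   # each chunk: [capacity, bytes used]

    def checkpoint(self, needed):
        """Ensure `needed` bytes are free before the next checkpoint.

        Returns True if a new chunk had to be linked.
        """
        capacity, used = self.chunks[-1]
        if capacity - used >= needed:
            return False
        self.chunks.append([max(needed, self.chunk_size), 0])
        return True

    def push(self, size):
        """Account for a frame of `size` bytes on the current chunk."""
        self.chunks[-1][1] += size

    def pop(self, size):
        """Release a frame; unlink the chunk once it is empty."""
        self.chunks[-1][1] -= size
        if self.chunks[-1][1] == 0 and len(self.chunks) > 1:
            self.chunks.pop()

stack = LinkedStack(4096)
stack.checkpoint(1024)          # fits in the first chunk
stack.push(1024)
linked = stack.checkpoint(4000) # only 3072 bytes left, so a chunk is linked
print(linked, len(stack.chunks))
```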

Placing Checkpoints: Place one checkpoint in every cycle in the call graph. Bound the stack size between checkpoints by the deepest call path.
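
One simple way to satisfy both rules is sketched below. It assumes the call graph is a dictionary of callee lists and that each function's frame size is known. Putting a checkpoint on every depth-first-search back edge guarantees at least one checkpoint per cycle, and the remaining graph is acyclic, so the deepest path can be bounded. The function names, the example graph, and the frame sizes are all hypothetical.

```python
def place_checkpoints(graph, root):
    """Return the call edges that get a checkpoint (the DFS back edges)."""
    checkpoints, state = set(), {}   # state: 1 = on DFS stack, 2 = finished

    def dfs(fn):
        state[fn] = 1
        for callee in graph.get(fn, []):
            if state.get(callee) == 1:       # back edge closes a cycle
                checkpoints.add((fn, callee))
            elif callee not in state:
                dfs(callee)
        state[fn] = 2

    dfs(root)
    return checkpoints

def deepest_path(graph, frame, checkpoints, fn, memo=None):
    """Largest stack use reachable from `fn` without crossing a checkpoint."""
    memo = {} if memo is None else memo
    if fn not in memo:
        below = [deepest_path(graph, frame, checkpoints, callee, memo)
                 for callee in graph.get(fn, [])
                 if (fn, callee) not in checkpoints]
        memo[fn] = frame[fn] + max(below, default=0)
    return memo[fn]

# Hypothetical call graph with one cycle: serve -> handle -> serve.
graph = {"main": ["parse", "serve"], "parse": [],
         "serve": ["handle"], "handle": ["serve"]}
frame = {"main": 64, "parse": 128, "serve": 32, "handle": 256}

cps = place_checkpoints(graph, "main")
bound = deepest_path(graph, frame, cps, "main")
print(cps, bound)
```

In this example the only checkpoint lands on the call from `handle` back to `serve`, and the deepest checkpoint-free path is main, serve, handle, for 64 + 32 + 256 = 352 bytes.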

Dealing with Special Cases: Function pointers. The procedure being called is not known at compile time, but a potential set of procedures can be found.

Dealing with Special Cases: External functions. Allow programmers to annotate external library functions with trusted stack bounds. Allow larger stack chunks to be linked for external functions.

Tuning the Algorithm: Stack space can be wasted through internal and external fragmentation. Tradeoffs: number of stack linkings vs. external fragmentation.

Memory Benefits: Tuning can be application-specific. No preallocation of large stacks. Reduced memory requirement when running a large number of threads. Better paging behavior, since stacks are used in LIFO order.

Case Study: Apache 2.0.44. Maximum stack allocation chunk: 2 KB. Apache under SPECweb99: overall slowdown is about 3%. Dynamic allocation: 0.1%. Link to large chunks for external functions: 0.5%. Stack removal: 10%.

Resource-Aware Scheduling: Advantages of event-based scheduling: it is tailored for applications through event handlers. Events provide two important pieces of information for scheduling: whether a process is close to completion, and whether the system is overloaded.

Resource-Aware Scheduling: The thread-based approach views an application as a sequence of stages separated by blocking calls, analogous to an event-based scheduler.

Blocking Graph: Node: a location in the program that blocked. Edge: connects two nodes if they were consecutive blocking points. Generated at runtime.
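
A minimal sketch of building such a graph at run time follows. It assumes each blocking point can be identified by a string label. The `BlockingGraph` class, the `on_block` hook, and the labels are hypothetical names used for illustration.

```python
from collections import defaultdict

class BlockingGraph:
    """Nodes are blocking locations; edges join consecutive blocking points."""

    def __init__(self):
        self.edges = defaultdict(int)   # (from_node, to_node) -> times taken
        self.last = {}                  # thread id -> last blocking node

    def on_block(self, thread_id, location):
        """Record that `thread_id` blocked at `location`."""
        prev = self.last.get(thread_id)
        if prev is not None:
            self.edges[(prev, location)] += 1
        self.last[thread_id] = location

graph = BlockingGraph()
for thread_id in (1, 2):
    for location in ("accept", "read", "write", "read"):
        graph.on_block(thread_id, location)
print(dict(graph.edges))
```

Two threads following the same path produce a shared graph, so the edge counts summarize how the whole application moves between blocking points.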

Resource-Aware Scheduling: 1. Keep track of resource utilization. 2. Annotate each node with the resources used on its outgoing edges. 3. Dynamically prioritize nodes, preferring nodes that release resources.
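
The three steps can be sketched as a single scheduling decision. This is an illustration under assumptions rather than the actual policy: `pick_next`, the 0.8 threshold, and the per-node usage numbers are hypothetical, and only one resource is considered.

```python
def pick_next(ready, node_usage, utilization, limit=0.8):
    """Choose the next thread to run.

    ready:       list of (thread_id, node) pairs, in arrival order
    node_usage:  node -> average change in the resource after running from it
                 (negative means the node tends to release the resource)
    utilization: current fraction of the resource in use
    limit:       assumed threshold above which the resource counts as scarce
    """
    if utilization < limit:
        return ready[0]                       # plenty left: plain FIFO
    # Resource is scarce: prefer the node that releases the most.
    return min(ready, key=lambda item: node_usage.get(item[1], 0))

ready = [(1, "accept"), (2, "read"), (3, "close")]
node_usage = {"accept": +1, "read": 0, "close": -1}   # file descriptors per run

print(pick_next(ready, node_usage, utilization=0.5))
print(pick_next(ready, node_usage, utilization=0.95))
```

When descriptors are plentiful the thread at `accept` runs first. Near the limit, the thread at `close` is preferred because running it frees a descriptor.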

Resources: CPU, memory (malloc), file descriptors (open, close).

Pitfalls: It is tricky to determine the maximum capacity of a resource. Thrashing depends on the workload: a disk can handle more requests when they are sequential rather than random. Resources interact (VM vs. disk). Applications may manage memory themselves.

Yield Profiling: User-level threads are problematic if a thread fails to yield. Such threads are easy to detect, since their running times are orders of magnitude larger. Yield profiling identifies places where programs fail to yield sufficiently often.
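
The detection rule can be sketched as a comparison against the typical run time between yields. The function name, the cutoff of 100 times the median, and the sample data are assumptions made for illustration.

```python
def find_slow_yields(samples, factor=100):
    """Flag locations whose time between yields dwarfs the typical case.

    samples: list of (location, microseconds_between_yields)
    factor:  assumed cutoff, as a multiple of the median run time
    """
    times = sorted(t for _, t in samples)
    median = times[len(times) // 2]
    return sorted({loc for loc, t in samples if t > factor * median})

# Hypothetical measurements: one loop runs far longer than the rest.
samples = [("read", 5), ("write", 7), ("parse", 6),
           ("crc_loop", 9000), ("read", 4)]
print(find_slow_yields(samples))
```

Here the median is 6 microseconds, so only `crc_loop` at 9000 microseconds exceeds the cutoff of 600.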

Web Server Performance: 4x500 MHz Pentium server, 2 GB memory, Intel e1000 Gigabit Ethernet card, Linux 2.4.20. Workload: requests for 3.2 GB of static file data.

Web Server Performance: Request frequencies match those of SPECweb99. Each client connects to the server repeatedly and issues a series of five requests, separated by 20 ms pauses. Apache’s performance improved by 15% with Capriccio.

Resource-Aware Admission Control: Consumer-producer application. The producer loops, adding memory to a pool and randomly touching pages. The consumer loops, removing memory from the pool and freeing it. A fast producer may run out of virtual address space.

Resource-Aware Admission Control: Touching pages too quickly will cause thrashing. Capriccio can quickly detect the overload conditions and limit the number of producers.

Programming Models for High Concurrency: Event: application-specific optimization. Thread: efficient thread runtimes.

User-Level Threads: Capriccio is unique in its blocking graph and resource-aware scheduling. It targets a large number of blocking threads and is POSIX compliant.

Application-Specific Optimization: Most approaches require programmers to tailor their application to manage resources, using nonstandard APIs that are less portable.

Stack Management: No garbage collection.

Future Work: Multi-CPU machines. Profiling tools for system tuning.