Holistic Systems Programming
Qualifying Exam Presentation, UC Berkeley, Computer Science Division
Rob von Behren, June 21, 2004


Server Applications
High concurrency: Internet servers and frameworks (web servers, e-commerce, etc.); transaction processing databases; many independent tasks.
Workload: high performance; unpredictable load spikes.
Operate "near the knee"; avoid thrashing!
[Figure: performance vs. load (concurrent tasks), showing the ideal curve, the peak (some resource at max), and overload (some resource thrashing).]

The Price of Concurrency
What makes concurrency hard? Race conditions; code complexity; scalability (no O(n) operations); scheduling and resource sensitivity; inevitable overload.
Performance vs. programmability: no current system solves this; need tools to manage complexity.
[Figure: performance vs. ease of programming, with points for threads, events, and the ideal.]

Insight: Many Standard Idioms
Memory management; object pools; generic containers (lists, trees, etc.); ownership transfer; timeouts; error handling; protocol processing; locking; component interaction APIs.

Insight: Wrong System Boundaries
OS: assumes adversarial applications; provides "one size fits all" service; hides performance details (implicit performance API).
Runtime: no application knowledge; generic, thin shim between app and OS.
Language / compiler: throws away programmer idioms; assumes infinite stacks.

Related Work
Application control and extensible kernels: ACPM, U-Net, scheduler activations, SPIN, Exokernel. Problem: hard to use.
OS-level adaptation: simple scheduling tweaks; disk access pattern prediction. Problem: too general, ignores app structure.
Language: race detection (Flanagan & Qadeer, nesC); first-class concurrency (Java, ML). Problem: ignores the OS and runtime.

Solution: Holistic System Programming
OS: flexible and lightweight; greater user-level control (declarative API).
Integrated runtime and compiler: control concurrency model and scheduling; improve performance; provide guarantees; manage resources; analysis, prediction, and adaptation.
Programming environment: simple, expressive programming interface.

Capriccio: Better Threads
Goals: simplify the programming model (thread per concurrent activity); scalability (100K+ threads); support existing APIs and tools; automate application-specific customization.
Mechanisms: user-level threads; plumbing (avoid O(n) operations); compile-time analysis; run-time analysis.

Capriccio Internals
Cooperative user-level threads: fast context switches; lightweight synchronization.
Kernel mechanisms: asynchronous I/O (Linux).
Efficiency: avoid O(n) operations; fast, flexible scheduling.
Scales to over 100,000 threads.
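As a rough illustration of cooperative user-level threading (not Capriccio's actual implementation, which is written in C), the sketch below models each thread as a Python generator that gives up control only at an explicit yield. The names worker and run are hypothetical. The ready queue is a deque, so each scheduling step is O(1), in the spirit of the "avoid O(n) operations" point above.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative thread: runs until it voluntarily yields."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # stand-in for a blocking point: hand control to the scheduler

def run(threads):
    """Round-robin scheduler; queue operations are O(1) per switch."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # "context switch" into the thread
            ready.append(t)  # still runnable: requeue at the tail
        except StopIteration:
            pass             # thread finished: drop it

log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Because a thread is never preempted between yields, code between two yield points runs atomically with respect to other threads on a single processor, which is one reason synchronization can be lightweight in this model.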

Thread Performance
[Figure: time of thread operations (microseconds) for thread creation, context switch, and uncontested mutex lock, comparing Capriccio, Capriccio-notrace, LinuxThreads, and NPTL.]
Slightly slower thread creation. Faster context switches, even with stack traces! Much faster mutexes.

The Blocking Graph
Lessons from event systems: break the app into stages; schedule based on stage priorities; this allows SRCT scheduling, finding bottlenecks, etc.
Capriccio does this for threads: deduce the stage with stack traces at blocking points; prioritize based on runtime information.
[Figure: blocking graph of a web server with nodes Accept, Write, Read, Open, and Close.]
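One way to picture how a blocking graph could be deduced from stack traces is sketched below. This is a simplified model under assumptions, not Capriccio's code: the class BlockingGraph, its block method, and the example call stacks are all hypothetical. Each distinct stack trace seen at a blocking point becomes a node, and an edge is recorded whenever a thread moves from one blocking point to the next.

```python
from collections import defaultdict

class BlockingGraph:
    def __init__(self):
        self.edges = defaultdict(int)  # (src node, dst node) -> times traversed
        self.last = {}                 # thread id -> last blocking point

    def block(self, tid, stack):
        """Record that thread `tid` blocked with call stack `stack`."""
        node = tuple(stack)            # the stack trace identifies the node
        prev = self.last.get(tid)
        if prev is not None:
            self.edges[(prev, node)] += 1
        self.last[tid] = node
        return node

g = BlockingGraph()
for tid in (1, 2):  # two threads following the same code path
    g.block(tid, ["main", "accept"])
    g.block(tid, ["main", "handle", "read"])
    g.block(tid, ["main", "handle", "write"])
```

Both threads traverse the same two edges, so the graph captures the application's stage structure without the programmer declaring stages explicitly.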

Resource-Aware Scheduling
Track resources along blocking graph (BG) edges: memory, file descriptors, CPU. Predict the future from the past.
Algorithm: increase use when underutilized; decrease use near saturation.
Advantages: operate near the knee without thrashing; automatic admission control.
[Figure: blocking graph of a web server with nodes Accept, Write, Read, Open, and Close.]
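The increase/decrease rule can be sketched as a small feedback controller. The function name adjust_limit, the thresholds low and high, and the step size are illustrative assumptions, not values from the slides; halving near saturation is borrowed from AIMD-style congestion control rather than taken from Capriccio.

```python
def adjust_limit(limit, utilization, low=0.5, high=0.9, step=1):
    """Return a new concurrency limit given resource utilization in [0, 1].

    low, high, and step are illustrative values, not from the slides.
    """
    if utilization >= high:
        return max(1, limit // 2)  # near saturation: back off (assumed halving)
    if utilization < low:
        return limit + step        # underutilized: admit more work
    return limit                   # in between: hold steady

print(adjust_limit(10, 0.2), adjust_limit(10, 0.7), adjust_limit(10, 0.95))
```

Applied per resource (memory, file descriptors, CPU), a rule of this shape acts as admission control: work that would push a resource past its knee is held back instead of being allowed to cause thrashing.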

Apache Blocking Graph

Runtime Overhead
Tested: Apache.
Stack linking: 78% slowdown for a null call; 3-4% overall.
Resource statistics: 2% (on all the time); 0.1% (with sampling).
Stack traces: 8% overhead.

Web Server Performance
Simple web server, Knot: 700 lines of C code; a few days to write.
Network-bound test: SPEC Web, 100 MB workload. KnotA: favor accept(). KnotC: favor connections.
Results: low overhead, high throughput; overload in Haboob and KnotA, due to poll() overhead; scheduling prevents overload in KnotC.
[Figure: bandwidth (Mb/s) vs. concurrent clients for KnotC (favor connections), KnotA (favor accept), and Haboob.]

Web Server Performance
Disk-bound test: SPEC Web, 3 GB workload; Apache with and without Capriccio.
Results: Knot and Haboob were similar; roughly 15% improvement for Apache with Capriccio; no impact from scheduling (needs further investigation).

Timeline: Capriccio Core
Goals: multiprocessor support; fast thread-local memory allocator; better resource tracking; additional scheduling options.
Status: multiprocessor support (mostly) working; a recently finished rewrite should make most other features easier to add.
Deadline: August 2004.

Timeline: Kernel Work
Goals: unified interface for disk and network I/O; support for simple atomic sections.
Status: initial design of kernel API complete; basic support for many things exists in Linux 2.6; prototype of some kernel features done (Feng Zhou).
Deadline: December 2004.

Timeline: Compiler Tools
Goals: linked stacks (improve and integrate Jeremy Condit's work on this); stack fingerprints; limited thread-local state analysis.
Status: use CIL for program transformations and as the analysis framework; others working on related things (Jeremy Condit & Bill McCloskey).
Deadline: March 2005.

Conclusion
Holistic approach to systems programming: examine the entire stack; powerful, adaptive runtime; exploit program structure.
Promising initial progress: Capriccio threading system (+ SOSP paper); blocking graph; resource-aware scheduling.
Future work: kernel interface; simple compile-time tools.

Microbenchmark: Buffer Cache

Microbenchmark: Disk I/O

Microbenchmark: Producer / Consumer

Microbenchmark: Pipe Test

The Case for User-Level Threads
Decouple programming model and OS.
Kernel threads: abstract hardware; expose device concurrency.
User-level threads: provide clean programming model; expose logical concurrency.
Benefits of user-level threads: control over concurrency model! Independent innovation; enables static analysis; enables application-specific tuning.
[Figure: layering of app, user-level threads, and OS.]


Safety: Linked Stacks
The problem, fixed stacks: overflow vs. wasted space; limits thread numbers.
The solution, linked stacks: allocate space as needed. Compiler analysis: add runtime checkpoints; guarantee enough space until the next check.
[Figure: fixed stacks (showing overflow and waste) vs. a linked stack.]
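A toy model of the runtime side of linked stacks is sketched below, assuming a stack built from separately allocated chunks. The class LinkedStack and its methods are hypothetical names, sizes are in arbitrary bytes, and returning from a call (unlinking a chunk) is left out for brevity. The checkpoint guarantees that the next stretch of code has enough room, linking a new chunk when the current one is too full.

```python
class LinkedStack:
    def __init__(self, min_chunk=16):
        self.min_chunk = min_chunk
        self.chunks = [[min_chunk, 0]]  # [capacity, bytes used] per chunk

    def checkpoint(self, need):
        """Guarantee `need` bytes are free before the next checkpoint."""
        cap, used = self.chunks[-1]
        if cap - used < need:
            # Not enough room: link a fresh chunk instead of overflowing.
            self.chunks.append([max(need, self.min_chunk), 0])

    def push_frame(self, size):
        chunk = self.chunks[-1]
        if chunk[1] + size > chunk[0]:
            raise OverflowError("frame does not fit; missing checkpoint?")
        chunk[1] += size

s = LinkedStack(min_chunk=16)
s.checkpoint(10); s.push_frame(10)  # fits in the first chunk
s.checkpoint(10); s.push_frame(10)  # only 6 bytes free: a new chunk is linked
print(s.chunks)
```

The 6 bytes left unused in the first chunk are the price of never overflowing; compared with a large fixed stack per thread, the waste is bounded by the chunk size rather than by the worst-case stack depth.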

Linked Stacks: Algorithm
Parameters: MaxPath; MinChunk.
Steps: break cycles; trace back.
Special cases: function pointers; external calls; use large stack.
[Figure: worked example with MaxPath = 8.]
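The two steps can be read as a pass over the program's call graph. The sketch below is one plausible interpretation, not the actual compiler pass: the function place_checkpoints, the example call graph, and the frame sizes are all made up for illustration, and MinChunk and the special cases are not modeled. Step 1 places a checkpoint on every back edge found by depth-first search, which breaks cycles. Step 2 traces back from the leaves and places a checkpoint on any call edge where the stack needed since the last checkpoint would exceed MaxPath.

```python
def place_checkpoints(graph, frame, max_path, root):
    """graph: {function: [callees]}; frame: {function: frame size}.

    Returns the set of (caller, callee) edges that receive a checkpoint.
    """
    checkpoints = set()

    # Step 1: break cycles by checkpointing every DFS back edge.
    state = {}  # function -> "active" (on the DFS stack) or "done"
    def dfs(f):
        state[f] = "active"
        for g in graph.get(f, []):
            if state.get(g) == "active":
                checkpoints.add((f, g))
            elif g not in state:
                dfs(g)
        state[f] = "done"
    dfs(root)

    # Step 2: trace back from the leaves, bounding uncheckpointed paths.
    longest = {}  # function -> longest checkpoint-free path starting at it
    def path(f):
        if f in longest:
            return longest[f]
        best = frame[f]
        for g in graph.get(f, []):
            if (f, g) in checkpoints:
                continue
            total = frame[f] + path(g)
            if total > max_path:
                checkpoints.add((f, g))  # too deep: check before the call
            else:
                best = max(best, total)
        longest[f] = best
        return best
    path(root)
    return checkpoints

graph = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["a"]}
frame = {"main": 3, "a": 2, "b": 4, "c": 3}
cps = place_checkpoints(graph, frame, max_path=8, root="main")
print(sorted(cps))
```

In this example the recursive call from c back to a gets a checkpoint in step 1, and the call from main to b gets one in step 2 because main, b, and c together would need 10 units, more than MaxPath = 8.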
