Holistic Systems Programming Qualifying Exam Presentation UC Berkeley, Computer Science Division Rob von Behren June 21, 2004.


1 Holistic Systems Programming. Qualifying Exam Presentation, UC Berkeley, Computer Science Division. Rob von Behren (jrvb@cs.berkeley.edu), June 21, 2004.

2 Server Applications. High concurrency: Internet servers & frameworks (web servers, e-commerce, etc.); transaction processing databases; many independent tasks. Workload: high performance; unpredictable load spikes; operate “near the knee”; avoid thrashing! [Figure: performance vs. load (concurrent tasks), showing the ideal curve, a peak where some resource is at max, and an overload region where some resource is thrashing.]

3 The Price of Concurrency. What makes concurrency hard? Race conditions; code complexity; scalability (no O(n) operations); scheduling & resource sensitivity; inevitable overload. Performance vs. programmability: no current system solves both; need tools to manage complexity. [Figure: performance vs. ease of programming, locating threads, events, and the ideal.]

4 Insight: Many Standard Idioms. Memory management; object pools; generic containers (lists, trees, etc.); ownership transfer; timeouts; error handling; protocol processing; locking; component interaction APIs.

5 Insight: Wrong System Boundaries. OS: assumes adversarial applications; provides “one size fits all” service; hides performance details; implicit performance API. Runtime: no application knowledge; generic, thin shim between app and OS. Language / compiler: throws away programmer idioms; assumes infinite stacks.

6 Related Work. Application control & extensible kernels: ACPM, U-Net, scheduler activations; SPIN, Exokernel. Problem: hard to use. OS-level adaptation: simple scheduling tweaks; disk access pattern prediction. Problem: too general, ignores app structure. Language: race detection (Flanagan & Qadeer, nesC); first-class concurrency (Java, ML). Problem: ignores the OS and runtime.

7 Solution: Holistic System Programming. OS: flexible and lightweight; greater user-level control (declarative API). Integrated runtime & compiler: control concurrency model & scheduling; improve performance; provide guarantees; manage resources; analysis, prediction and adaptation. Programming environment: simple, expressive programming interface.

8 Capriccio: Better Threads. Goals: simplify the programming model (thread per concurrent activity); scalability (100K+ threads); support existing APIs and tools; automate application-specific customization. Mechanisms: user-level threads; plumbing (avoid O(n) operations); compile-time analysis; run-time analysis.

9 Capriccio Internals. Cooperative user-level threads: fast context switches; lightweight synchronization. Kernel mechanisms: asynchronous I/O (Linux). Efficiency: avoid O(n) operations; fast, flexible scheduling. Scales to over 100,000 threads.
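
The cooperative user-level threading idea on this slide can be illustrated with a minimal sketch, assuming Python generators as stand-ins for threads. The names `worker` and `run` are hypothetical and are not Capriccio APIs (Capriccio itself is a C library); the point is only that a context switch reduces to an O(1) queue operation at a voluntary yield point.

```python
from collections import deque

def worker(name, steps, log):
    """A 'thread' body: do one unit of work, then give up the CPU voluntarily."""
    for i in range(steps):
        log.append((name, i))
        yield  # cooperative yield point (stands in for a blocking call)

def run(threads):
    """Round-robin scheduler; each switch is an O(1) popleft plus append."""
    ready = deque(threads)
    switches = 0
    while ready:
        t = ready.popleft()
        try:
            next(t)          # resume the thread until its next yield
        except StopIteration:
            continue         # thread finished; do not requeue it
        ready.append(t)
        switches += 1
    return switches

log = []
switches = run([worker("a", 2, log), worker("b", 3, log)])
print(log, switches)
```

Because threads only lose the CPU at explicit yield points, no locking is needed around `log` in this single-processor sketch.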

10 Thread Performance. Time of thread operations (microseconds):
Thread creation: Capriccio and Capriccio-notrace 21.5; LinuxThreads 37.5; NPTL 17.7.
Context switch: Capriccio 0.56; Capriccio-notrace 0.24; LinuxThreads 0.71; NPTL 0.65.
Uncontested mutex lock: Capriccio and Capriccio-notrace 0.04; LinuxThreads 0.14; NPTL 0.15.
Summary: slightly slower thread creation; faster context switches, even with stack traces; much faster mutexes.

11 The Blocking Graph. Lessons from event systems: break app into stages; schedule based on stage priorities; allows SRCT scheduling, finding bottlenecks, etc. Capriccio does this for threads: deduce stage with stack traces at blocking points; prioritize based on runtime information. [Figure: web server blocking graph with nodes Accept, Write, Read, Open, and Close.]
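
As a rough illustration of deducing stages from stack traces at blocking points, the sketch below treats each distinct call stack seen at a blocking call as a node and records an edge between consecutive blocking points of the same thread. The class `BlockingGraph` and its methods are hypothetical names for this example, and the per-edge CPU average is just one possible statistic; this is not Capriccio's actual data structure.

```python
from collections import defaultdict

class BlockingGraph:
    def __init__(self):
        # (src node, dst node) -> running totals for that edge
        self.edges = defaultdict(lambda: {"count": 0, "cpu": 0.0})
        self.last = {}  # thread id -> node of its most recent blocking point

    def block(self, tid, stack, cpu_since_last):
        """Record that thread `tid` blocked with call stack `stack`."""
        node = tuple(stack)  # the stack trace identifies the stage
        prev = self.last.get(tid)
        if prev is not None:
            edge = self.edges[(prev, node)]
            edge["count"] += 1
            edge["cpu"] += cpu_since_last
        self.last[tid] = node
        return node

    def mean_cpu(self, src, dst):
        """Average CPU time observed on the edge src -> dst (0.0 if unseen)."""
        edge = self.edges.get((tuple(src), tuple(dst)))
        return edge["cpu"] / edge["count"] if edge else 0.0

g = BlockingGraph()
for tid, cpu in ((1, 2.0), (2, 4.0)):
    g.block(tid, ["main", "accept"], 0.0)
    g.block(tid, ["main", "handle", "read"], cpu)
mean = g.mean_cpu(["main", "accept"], ["main", "handle", "read"])
print(mean)
```

Both threads follow the same accept-to-read edge here, so the graph learns one edge with an averaged cost, which a scheduler could then use to predict what a thread at `accept` is likely to do next.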

12 Resource-Aware Scheduling. Track resources along blocking graph (BG) edges: memory, file descriptors, CPU; predict future from the past. Algorithm: increase use when underutilized; decrease use near saturation. Advantages: operate near the knee without thrashing; automatic admission control. [Figure: web server blocking graph with nodes Accept, Write, Read, Open, and Close.]
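
The "increase when underutilized, decrease near saturation" rule can be sketched as a simple admission limit controller. This is an additive-increase, multiplicative-decrease policy chosen for illustration; the slide does not say which policy Capriccio uses, and the function name `adjust_limit` and all thresholds (`low`, `high`, `step`, `backoff`) are assumptions.

```python
def adjust_limit(limit, utilization, low=0.7, high=0.9, step=8, backoff=0.5):
    """Return a new cap on admitted tasks given resource utilization in [0, 1].

    Hypothetical thresholds: below `low` the resource is underutilized,
    at or above `high` it is close to saturation.
    """
    if utilization >= high:
        # Near saturation: back off quickly, but always admit at least one task.
        return max(1, int(limit * backoff))
    if utilization < low:
        # Underutilized: admit a few more tasks.
        return limit + step
    # In between: we are near the knee, so hold steady.
    return limit

grown = adjust_limit(100, 0.50)
shrunk = adjust_limit(100, 0.95)
steady = adjust_limit(100, 0.80)
print(grown, shrunk, steady)
```

Holding the limit steady in the middle band is what keeps the system near the knee of the load curve instead of oscillating into overload.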

13 Apache Blocking Graph

14 Runtime Overhead. Tested with Apache 2.0.44. Stack linking: 78% slowdown for null call; 3-4% overall. Resource statistics: 2% (on all the time); 0.1% (with sampling). Stack traces: 8% overhead.

15 Web Server Performance. Simple web server, Knot: 700 lines of C code; a few days to write. Network-bound test: SPECweb, 100 MB workload; KnotA favors accept(); KnotC favors connections. Results: low overhead gives high throughput; overload in Haboob and KnotA, due to poll() overhead; scheduling prevents overload in KnotC. [Figure: bandwidth (Mb/s, 0 to 900) vs. concurrent clients (1 to 16384) for KnotC (favor connections), KnotA (favor accept), and Haboob.]

16 Web Server Performance. Disk-bound test: SPECweb, 3 GB workload; Apache with and without Capriccio. Results: Knot and Haboob were similar; roughly 15% improvement for Apache with Capriccio; no impact from scheduling (needs further investigation).

17 Timeline: Capriccio Core. Goals: multiprocessor support; fast thread-local memory allocator; better resource tracking; additional scheduling options. Status: multiprocessor support (mostly) working; the recently finished rewrite should make most other features easier to add. Deadline: August 2004.

18 Timeline: Kernel Work. Goals: unified interface for disk & network I/O; support for simple atomic sections. Status: initial design of kernel API complete; basic support for many things exists in Linux 2.6; prototype of some kernel features done (Feng Zhou). Deadline: December 2004.

19 Timeline: Compiler Tools. Goals: linked stacks (improve/integrate Jeremy Condit's work on this); stack fingerprints; limited thread-local state analysis. Status: use CIL for program transformations and as the analysis framework; others working on related things (Jeremy Condit & Bill McCloskey). Deadline: March 2005.

20 Conclusion. Holistic approach to systems programming: examine entire stack; powerful, adaptive runtime; exploit program structure. Promising initial progress: Capriccio threading system (+SOSP paper); blocking graph; resource-aware scheduling. Future work: kernel interface; simple compile-time tools.

21 Microbenchmark: Buffer Cache

22 Microbenchmark: Disk I/O

23 Microbenchmark: Producer / Consumer

24 Microbenchmark: Pipe Test

25 The Case for User-Level Threads. Decouple programming model and OS. Kernel threads: abstract hardware; expose device concurrency. User-level threads: provide clean programming model; expose logical concurrency. Benefits of user-level threads: control over concurrency model; independent innovation; enables static analysis; enables application-specific tuning. [Figure: App, Threads, and OS layers, with the user-level boundary marked.]

27 Safety: Linked Stacks. The problem, fixed stacks: overflow vs. wasted space; limits thread numbers. The solution, linked stacks: allocate space as needed; compiler analysis adds runtime checkpoints that guarantee enough space until the next check. [Figure: fixed stacks (showing overflow and waste) vs. a linked stack.]
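
What a runtime checkpoint does can be sketched as follows, under simplifying assumptions: the stack is a list of chunks, each checkpoint is told how many bytes are needed before the next checkpoint, and a new chunk is linked only when the current one cannot cover that amount. The class `LinkedStack`, its methods, and the sizes used are hypothetical; the real mechanism is compiler-inserted C code that switches the stack pointer.

```python
class LinkedStack:
    def __init__(self, min_chunk):
        self.min_chunk = min_chunk  # smallest chunk worth allocating
        self.chunks = []            # each chunk is [size, used]

    def checkpoint(self, needed):
        """Guarantee `needed` free bytes; return True if a chunk was linked."""
        if self.chunks:
            size, used = self.chunks[-1]
            if size - used >= needed:
                return False  # current chunk is enough until the next check
        self.chunks.append([max(self.min_chunk, needed), 0])
        return True

    def push_frame(self, size):
        """Consume space in the current chunk (a checkpoint must come first)."""
        self.chunks[-1][1] += size

s = LinkedStack(min_chunk=16)
first = s.checkpoint(8)    # no chunk yet, so one is allocated
s.push_frame(8)
second = s.checkpoint(8)   # 8 bytes still free, nothing to do
s.push_frame(8)
third = s.checkpoint(4)    # chunk is full, link a new one
print(first, second, third, len(s.chunks))
```

Space grows only on demand, which is how linked stacks avoid both the overflow and the wasted space of fixed-size stacks.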

28 Linked Stacks: Algorithm. Parameters: MaxPath; MinChunk. Steps: break cycles; trace back. Special cases: function pointers; external calls (use large stack). [Figure: weighted call graph, annotated step by step with checkpoints, MaxPath = 8.]
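
The two steps named on this slide can be sketched on a small call graph. This is a simplified reading, not the exact Capriccio algorithm: node weights are frame sizes, a checkpoint is placed on every back edge to break cycles, and a bottom-up pass then adds a checkpoint on any call edge whose longest checkpoint-free path would exceed MaxPath. The function name, the example graph, and its weights are hypothetical; the sketch also assumes no single frame is larger than MaxPath and ignores MinChunk, function pointers, and external calls.

```python
def place_checkpoints(frame, calls, root, max_path):
    """Return the set of call edges (caller, callee) that get a checkpoint."""
    checkpoints = set()

    # Step 1: break cycles by checkpointing every back edge found by DFS.
    state = {}
    def dfs(u):
        state[u] = "active"
        for v in calls.get(u, []):
            if state.get(v) == "active":
                checkpoints.add((u, v))
            elif v not in state:
                dfs(v)
        state[u] = "done"
    dfs(root)

    # Step 2: trace back from the leaves, tracking the longest path (in bytes)
    # from each function to the next checkpoint.
    need = {}
    def longest(u):
        if u in need:
            return need[u]
        total = frame[u]
        for v in calls.get(u, []):
            if (u, v) in checkpoints:
                continue  # path ends at an existing checkpoint
            path = frame[u] + longest(v)
            if path > max_path:
                checkpoints.add((u, v))  # too long: check again at this call
            else:
                total = max(total, path)
        need[u] = total
        return total
    longest(root)
    return checkpoints

frame = {"main": 3, "a": 4, "b": 2, "c": 6, "d": 3}
calls = {"main": ["a", "b"], "a": ["c"], "b": ["d"], "d": ["b"]}
result = place_checkpoints(frame, calls, "main", max_path=8)
print(sorted(result))
```

In this example the recursive call from `d` back to `b` gets a checkpoint in step 1, and the call from `a` to `c` gets one in step 2 because 4 + 6 exceeds the MaxPath of 8.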


