University of Massachusetts Amherst, Department of Computer Science. Computer Systems Principles: Concurrency Patterns. Emery Berger and Mark Corner, University of Massachusetts Amherst.

Web Server
• Client (browser)
  – Requests HTML, images
• Server
  – Caches requests
  – Sends to client
[Figure: client and web server]

Possible Implementation

    while (true) {
        wait for connection;
        read from socket & parse URL;
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }

Possible Implementation (each step marked with the resource it uses)

    while (true) {
        wait for connection;                  // net
        read from socket & parse URL;         // cpu
        look up URL contents in cache;        // cpu
        if (!in cache) {
            fetch from disk / execute CGI;    // disk
            put in cache;                     // cpu
        }
        send data to client;                  // net
    }

Problem: Concurrency
• Sequential is fine until:
  – More clients
  – Bigger server: multicores, multiprocessors
• Goals:
  – Hide latency of I/O: don't keep clients waiting
  – Improve throughput: serve up more pages
[Figure: clients and web server]

Building Concurrent Apps
• Patterns / architectures
  – Thread pools
  – Producer-consumer
  – "Bag of tasks"
  – Worker threads (work stealing)
• Goals:
  – Minimize latency
  – Maximize parallelism
  – Keep programs simple to program & maintain

Thread Pools
• Thread creation is relatively expensive
• Instead: use a pool of threads
  – When a new task arrives, get a thread from the pool to work on it; block if the pool is empty
  – Faster with many tasks
  – Limits max threads (thus resources)
  – (ThreadPoolExecutor class in Java)
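
The thread-pool idea can be sketched in Java roughly as follows. This is a minimal illustration, not the course's code: the class name ThreadPoolSketch, the method runTasks, and the squaring task are made up for the example. It uses Executors.newFixedThreadPool, which returns a ThreadPoolExecutor with a fixed number of threads; note that with this factory, tasks submitted while all threads are busy wait in an internal queue rather than blocking the submitter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadPoolSketch {
    // Runs taskCount small tasks on poolSize reusable threads and
    // returns the sum of their results (hypothetical workload: squaring).
    static int runTasks(int taskCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                // Threads are created once by the pool and reused per task.
                results.add(pool.submit(() -> id * id));
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get(); // waits until that task has finished
            }
            return sum;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10, 3));
    }
}
```

The pool size caps how many threads (and how much memory and scheduling overhead) the server can consume, which is the resource limit the slide mentions.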

Producer-Consumer
• Can get pipeline parallelism:
  – One thread (producer) does work (e.g., I/O)
  – and hands it off to another thread (consumer)
[Figure: producer and consumer]

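Producer-Consumer
• Can get pipeline parallelism:
  – One thread (producer) does work (e.g., I/O)
  – and hands it off to another thread (consumer)
• LinkedBlockingQueue connects the two: blocks on put() if full and on take() if empty. (Java's untimed poll() returns null on an empty queue instead of blocking; the poll() calls in the pseudocode on these slides stand for a blocking remove.)
[Figure: producer, LinkedBlockingQueue, consumer]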

Producer-Consumer Web Server
• Use 2 threads: producer & consumer
  – queue.put(x) and x = queue.poll();

Producer:

    while (true) {
        do something…
        queue.put (x);
    }

Consumer:

    while (true) {
        x = queue.poll();
        do something…
    }

Sequential server to split:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }

Producer-Consumer Web Server
• Pair of threads: one reads, one writes

Producer:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        queue.put (URL);
    }

Consumer:

    while (true) {
        URL = queue.poll();
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }
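
The two-thread split above might look like this in Java. It is a sketch under stated assumptions: the network and cache are replaced by plain strings, the names ProducerConsumerSketch, serve, and the STOP marker are invented for the example, and the queue is given a small capacity only so that put() can actually block. The consumer uses take(), the blocking remove on LinkedBlockingQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ProducerConsumerSketch {
    // Hypothetical end-of-stream marker so the consumer can stop.
    private static final String STOP = "__STOP__";

    // Feeds urls through a bounded queue from a producer thread to a
    // consumer thread and returns the consumer's responses in order.
    static List<String> serve(List<String> urls) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(2);
        List<String> responses = new ArrayList<>(); // touched only by the consumer

        Thread producer = new Thread(() -> {
            try {
                for (String url : urls) {
                    queue.put(url);       // blocks while the queue is full
                }
                queue.put(STOP);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String url = queue.take(); // blocks while the queue is empty
                    if (url.equals(STOP)) {
                        break;
                    }
                    responses.add("contents of " + url); // stands in for lookup + send
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join, responses is safe to read here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return responses;
    }
}
```

Because there is a single consumer and the queue is FIFO, responses come out in request order, which matches the slide's point that this pattern suits cases where order matters.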

Producer-Consumer Web Server
• More parallelism: optimizes the common case (cache hit)

Thread feeding queue 1:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        queue1.put (URL);
    }

Thread between queue 1 and queue 2:

    while (true) {
        URL = queue1.poll();
        look up URL contents in cache;
        if (!in cache) {
            queue2.put (URL);
            continue;    // miss: hand off and move on to the next URL
        }
        send data to client;
    }

Thread draining queue 2:

    while (true) {
        URL = queue2.poll();
        fetch from disk / execute CGI;
        put in cache;
        send data to client;
    }

When to Use Producer-Consumer
• Works well for pairs of threads
  – Best if producer & consumer are symmetric: they proceed at roughly the same rate
  – Order of operations matters
• Not as good for
  – Many threads
  – Order doesn't matter
  – Different rates of progress

Producer-Consumer Web Server
• Should balance load across threads

Thread feeding queue 1:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        queue1.put (URL);
    }

Thread between queue 1 and queue 2:

    while (true) {
        URL = queue1.poll();
        look up URL contents in cache;
        if (!in cache) {
            queue2.put (URL);
            continue;    // miss: the queue 2 thread will send the data
        }
        send data to client;
    }

Thread draining queue 2:

    while (true) {
        URL = queue2.poll();
        fetch from disk / execute CGI;
        put in cache;
        send data to client;
    }
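
A rough Java rendering of the three-thread, two-queue pipeline is given here for illustration. Everything outside the java.util classes is an assumption of the sketch: the class name TwoQueuePipelineSketch, the "hit:"/"miss:" strings that stand in for sending data to a client, and the STOP marker used to shut the stages down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class TwoQueuePipelineSketch {
    private static final String STOP = "__STOP__"; // hypothetical shutdown marker

    private static void put(BlockingQueue<String> q, String s) {
        try { q.put(s); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    private static String take(BlockingQueue<String> q) {
        try { return q.take(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    // Returns the sorted list of simulated responses ("hit:<url>" or "miss:<url>").
    static List<String> serve(List<String> urls, Map<String, String> initialCache) {
        BlockingQueue<String> queue1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> queue2 = new LinkedBlockingQueue<>();
        Map<String, String> cache = new ConcurrentHashMap<>(initialCache);
        Queue<String> sent = new ConcurrentLinkedQueue<>();

        Thread reader = new Thread(() -> {
            for (String url : urls) put(queue1, url);
            put(queue1, STOP);
        });
        Thread lookup = new Thread(() -> {
            while (true) {
                String url = take(queue1);
                if (url.equals(STOP)) { put(queue2, STOP); return; }
                if (!cache.containsKey(url)) {
                    put(queue2, url);   // miss: hand off to the slow stage
                    continue;
                }
                sent.add("hit:" + url); // hit: answer immediately
            }
        });
        Thread disk = new Thread(() -> {
            while (true) {
                String url = take(queue2);
                if (url.equals(STOP)) return;
                cache.put(url, "contents of " + url); // stands in for disk / CGI
                sent.add("miss:" + url);
            }
        });

        Thread[] stages = {reader, lookup, disk};
        for (Thread t : stages) t.start();
        try {
            for (Thread t : stages) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        List<String> out = new ArrayList<>(sent);
        Collections.sort(out); // hits and misses finish in no fixed relative order
        return out;
    }
}
```

Cache hits never wait behind a slow disk fetch, but the stages can still be unbalanced: if most requests miss, the single disk thread becomes the bottleneck, which is the load-balancing concern the slide raises.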

Bag of Tasks
• Collection of mostly independent tasks
[Figure: workers]

Bag of Tasks
• Collection of mostly independent tasks
• Bag could also be a LinkedBlockingQueue (put, poll)
[Figure: addWork and workers]

Exercise: Restructure into BOT
• Re-structure this into bag of tasks:
  – addWork & worker threads
  – t = bag.poll() or bag.put(t)

    while (true) {
        wait for connection;
        read from socket & parse URL;
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }

Exercise: Restructure into BOT
• Re-structure this into bag of tasks:
  – addWork & worker
  – t = bag.poll() or bag.put(t)

addWork:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        t.URL = URL;
        t.sock = socket;
        bag.put (t);
    }

Worker:

    while (true) {
        t = bag.poll();
        look up t.URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client via t.sock;
    }

Bag of Tasks Web Server
• Re-structure this into bag of tasks:
  – t = bag.poll() or bag.put(t)

addWork:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        bag.put (URL);
    }

worker:

    while (true) {
        URL = bag.poll();
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }

[Figure: addWork feeding a bag shared by several workers]
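
As an illustration, the bag-of-tasks structure could be written in Java along these lines. The names BagOfTasksSketch, Task, and serve are hypothetical, an integer stands in for the client socket, and a shared STOP task (one per worker) is an assumed way to end the demo; the slides do not specify shutdown. The bag is a LinkedBlockingQueue shared by all workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class BagOfTasksSketch {
    static final class Task {
        final String url;
        final int sock; // stands in for the client's socket
        Task(String url, int sock) { this.url = url; this.sock = sock; }
    }

    private static final Task STOP = new Task("", -1); // hypothetical shutdown task

    // Serves every url with `workers` identical worker threads and
    // returns the set of simulated "<sock>:<contents>" responses.
    static Set<String> serve(List<String> urls, int workers) {
        BlockingQueue<Task> bag = new LinkedBlockingQueue<>();
        Map<String, String> cache = new ConcurrentHashMap<>();
        Set<String> sent = ConcurrentHashMap.newKeySet();

        List<Thread> pool = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Task task = bag.take();   // any worker may grab any task
                        if (task == STOP) return;
                        // Look up in the cache; on a miss, "fetch" and cache it.
                        String body = cache.computeIfAbsent(task.url, u -> "contents of " + u);
                        sent.add(task.sock + ":" + body);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }

        try {
            // addWork: one task per incoming request.
            for (int i = 0; i < urls.size(); i++) bag.put(new Task(urls.get(i), i));
            for (int w = 0; w < workers; w++) bag.put(STOP);
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sent;
    }
}
```

Adding workers requires no change to the task code, but every put and take goes through the one shared queue.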

Bag of Tasks vs. Prod/Consumer
• Exploits more parallelism
• Even with coarse-grained threads
  – Don't have to break up tasks too finely
• What does task size affect?
  – Possibly latency… smaller might be better
• Easy to change or add new functionality
• But: one major performance problem…

What's the Problem?
• Contention: single lock on the structure
  – Bottleneck to scalability
[Figure: addWork and workers sharing one bag]

Work Queues
• Each thread has its own work queue (deque)
  – No single point of contention
• Threads are now generic "executors"
  – Tasks (balls): blue = parse, yellow = connect…
[Figure: executors, each with its own queue]

Work Queues
• Each thread has its own work queue
  – No single point of contention
• Now what?
[Figure: executors, each with its own queue]

Work Stealing
• When a thread runs out of work, it steals work from the top of a randomly chosen other thread's deque
• Load balancing with strong guarantees: for well-structured parallel computations, randomized work stealing is provably within a constant factor of the best possible schedule
[Figure: workers stealing from each other's deques]
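
The stealing rule can be sketched in Java as below. This is a deliberately simple illustration, not a tuned scheduler: the class WorkStealingSketch and method run are invented, all tasks start on worker 0's deque so that stealing is needed, and the choice that owners take from the bottom (pollLast) while thieves take from the top (pollFirst) is an assumption consistent with the slide's "steal from top".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

final class WorkStealingSketch {
    // Runs taskCount trivial tasks, all initially on worker 0's deque,
    // and returns how many tasks each worker ended up executing.
    static int[] run(int taskCount, int workers) {
        List<ConcurrentLinkedDeque<Runnable>> deques = new ArrayList<>();
        for (int i = 0; i < workers; i++) deques.add(new ConcurrentLinkedDeque<>());

        AtomicInteger remaining = new AtomicInteger(taskCount);
        for (int i = 0; i < taskCount; i++) {
            deques.get(0).addLast(remaining::decrementAndGet); // each task marks itself done
        }

        int[] executed = new int[workers]; // slot i is written only by worker i
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            final int me = i;
            Thread t = new Thread(() -> {
                while (remaining.get() > 0) {
                    Runnable task = deques.get(me).pollLast();   // own work first
                    if (task == null) {
                        int victim = ThreadLocalRandom.current().nextInt(workers);
                        if (victim != me) {
                            task = deques.get(victim).pollFirst(); // steal from the top
                        }
                    }
                    if (task != null) {
                        task.run();
                        executed[me]++;
                    } else {
                        Thread.yield(); // nothing found this round; try again
                    }
                }
            });
            t.start();
            threads.add(t);
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return executed;
    }
}
```

In practice one would more likely reach for the JDK's own work-stealing executors (ForkJoinPool, or Executors.newWorkStealingPool()) than hand-roll the deques; the sketch only shows the mechanism.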

Work Stealing Web Server
• Re-structure: readURL, lookUp, addToCache, output
  – myQueue.put(new readURL (url))

    while (true) {
        wait for connection;
        read from socket & parse URL;
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }

• readURL, lookUp, addToCache, output

    class Work {
    public:
        virtual void run() = 0;   // each task type overrides run()
        virtual ~Work() {}
    };

    class readURL : public Work {
    public:
        void run() {…}
        readURL (socket s) {…}
    };

Sequential server being restructured:

    while (true) {
        wait for connection;
        read from socket & parse URL;
        look up URL contents in cache;
        if (!in cache) {
            fetch from disk / execute CGI;
            put in cache;
        }
        send data to client;
    }

[Figure: task types readURL, lookUp, addToCache, output, and a worker]

    class readURL : public Work {
    public:
        void run() {
            // read from socket; f = requested file
            myQueue.put (new lookUp(_s, f));
        }
        readURL (socket s) { _s = s; }
    private:
        socket _s;
    };

    class lookUp : public Work {
    public:
        void run() {
            // look in cache for file _f; on a hit, cont = cached contents
            if (!found)
                myQueue.put (new addToCache(_s, _f));
            else
                myQueue.put (new Output(_s, cont));
        }
        lookUp (socket s, string f) { _s = s; _f = f; }
    private:
        socket _s;
        string _f;
    };

    class addToCache : public Work {
    public:
        void run() {
            // fetch file _f from disk into cont
            // add file to cache (hashmap)
            myQueue.put (new Output(_s, cont));
        }
        addToCache (socket s, string f) { _s = s; _f = f; }
    private:
        socket _s;
        string _f;
    };

Work Stealing Web Server
• Re-structure: readURL, lookUp, addToCache, output
  – myQueue.put(new readURL (url))

    readURL(url) {
        wait for connection;
        read from socket & parse URL;
        myQueue.put (new lookUp (URL));
    }

Work Stealing Web Server
• Re-structure: readURL, lookUp, addToCache, output
  – myQueue.put(new readURL (url))

    readURL(url) {
        wait for connection;
        read from socket & parse URL;
        myQueue.put (new lookUp (URL));
    }

    lookUp(url) {
        look up URL contents in cache;
        if (!in cache) {
            myQueue.put (new addToCache (URL));
        } else {
            myQueue.put (new output(contents));
        }
    }

    addToCache(URL) {
        fetch from disk / execute CGI;
        put in cache;
        myQueue.put (new output(contents));
    }
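
To show how the four task types chain through a queue, here is a small single-threaded Java sketch. It is an assumption-laden illustration: the slides use C++-style classes, whereas this uses a Java Work interface; the names TaskChainSketch, serve, and log are invented; the socket is replaced by a request string; and there is only one executor draining its own queue, so no stealing occurs. In the full design each executor would own such a queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TaskChainSketch {
    interface Work { void run(); }

    private final Deque<Work> myQueue = new ArrayDeque<>();
    private final Map<String, String> cache = new HashMap<>();
    private final List<String> log = new ArrayList<>(); // records which tasks ran

    final class ReadURL implements Work {
        private final String request;
        ReadURL(String request) { this.request = request; }
        public void run() {
            log.add("readURL");
            String url = request.trim(); // stands in for reading and parsing
            myQueue.addLast(new LookUp(url));
        }
    }

    final class LookUp implements Work {
        private final String url;
        LookUp(String url) { this.url = url; }
        public void run() {
            log.add("lookUp");
            String contents = cache.get(url);
            if (contents == null) myQueue.addLast(new AddToCache(url));
            else myQueue.addLast(new Output(contents));
        }
    }

    final class AddToCache implements Work {
        private final String url;
        AddToCache(String url) { this.url = url; }
        public void run() {
            log.add("addToCache");
            String contents = "contents of " + url; // stands in for disk / CGI
            cache.put(url, contents);
            myQueue.addLast(new Output(contents));
        }
    }

    final class Output implements Work {
        private final String contents;
        Output(String contents) { this.contents = contents; }
        public void run() { log.add("output"); } // would send contents to the client
    }

    // Handles one request and returns the names of the tasks that ran, in order.
    List<String> serve(String request) {
        log.clear();
        myQueue.addLast(new ReadURL(request));
        while (!myQueue.isEmpty()) {
            myQueue.pollFirst().run(); // each task may enqueue its successor
        }
        return new ArrayList<>(log);
    }
}
```

Serving the same URL twice on one instance runs addToCache only the first time, which mirrors the cache-miss and cache-hit branches of lookUp.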

Work Stealing
• Works great for heterogeneous tasks
  – Convert addWork and worker into units of work (different colors)
• Flexible: can easily re-define tasks
  – Coarse, fine-grained, anything in between
• Automatic load balancing
• Separates thread logic from functionality
• Popular model for structuring servers

The End