Anthony D. Joseph and Ion Stoica

Slides:

Advertisements

Similar presentations

Race Conditions. Isolated & Non-Isolated Processes Isolated: Do not share state with other processes –The output of process is unaffected by run of other.

Advertisements

CS162 Operating Systems and Systems Programming Lecture 5 Cooperating Threads February 1, 2006 Prof. Anthony D. Joseph

CS162 Operating Systems and Systems Programming Lecture 5 Cooperating Threads September 14, 2009 Prof. John Kubiatowicz

CS162 Operating Systems and Systems Programming Lecture 5 Cooperating Threads September 14, 2005 Prof. John Kubiatowicz

Threads CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University

Object Oriented Analysis & Design SDL Threads. Contents 2  Processes  Thread Concepts  Creating threads  Critical sections  Synchronizing threads.

Threads Many software packages are multi-threaded Web browser: one thread display images, another thread retrieves data from the network Word processor:

CSC139 Operating Systems Lecture 6 Synchronization Adapted from Prof. John Kubiatowicz's lecture notes for CS162 Copyright.

COSC 3407: Operating Systems Lecture 6: Synchronization.

Operating Systems Lecture 2 Processes and Threads Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of.

Recall: Three I/O Methods Synchronous: Wait for I/O operation to complete. Asynchronous: Post I/O request and switch to other work. DMA (Direct Memory.

COSC 3407: Operating Systems Lecture 5: Independent Vs. Cooperating Threads.

Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,

CS 162 Discussion Section Week 2. Who am I? Haoyuan (HY) Li Office Hours: 1pm-3pm

CS 162 Discussion Section Week 2. Who am I? Wesley Chow Office Hours: 12pm-2pm 411 Soda Does Monday 1-3 work for everyone?

CS162 Operating Systems and Systems Programming Lecture 3 Concurrency and Thread Dispatching September 7 th, 2011 Anthony D. Joseph and Ion Stoica

CS 162 Discussion Section Week 2 (9/16 – 9/20). Who am I? Kevin Klues Office Hours:

1 Computer Systems II Introduction to Processes. 2 First Two Major Computer System Evolution Steps Led to the idea of multiprogramming (multiple concurrent.

CS162 Operating Systems and Systems Programming Lecture 2 Synchronization January 26 th, 2011 Ion Stoica

Goals for Today How do we provide multiprogramming? What are Processes? How are they related to Threads and Address Spaces? Note: Some slides and/or pictures.

CS162 Operating Systems and Systems Programming Lecture 4 Synchronization, Atomic operations, Locks September 10, 2012 Ion Stoica

CPS110: Implementing threads on a uni-processor Landon Cox January 29, 2008.

CS 162 Discussion Section Week 2. Who am I? Prashanth Mohan Office Hours: 11-12pm Tu W at.

Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 4: Threads.

Multiprogramming. Readings r Chapter 2.1 of the textbook.

Introduction to Operating Systems Concepts

Computer System Structures Interrupts

February 10, 2016 Prof. David Culler

Threads & Multithreading

Processes and threads.

Process concept.

CS 6560: Operating Systems Design

CSCI 511 Operating Systems Ch3, 4 (Part C) Cooperating Threads

CS162 Operating Systems and Systems Programming Lecture 6 Synchronization September 17, 2007 Prof. John Kubiatowicz

Advanced OS Concepts (For OCR)

CSCS 511 Operating Systems Ch3, 4 (Part B) Further Understanding Threads Dr. Frank Li Skipped sections 3.5 & 3.6

Lecture Topics: 11/1 Processes Process Management

CS399 New Beginnings Jonathan Walpole.

CS162 Operating Systems and Systems Programming Lecture 6 Synchronization September 20, 2010 Prof. John Kubiatowicz

Intro to Processes CSSE 332 Operating Systems

CSCI 511 Operating Systems Chapter 5 (Part A) Process Synchronization

CS162 Operating Systems and Systems Programming Lecture 4 Cooperating Threads February 4, 2008 Prof. Anthony D. Joseph

CS162 Operating Systems and Systems Programming Lecture 5 Cooperating Threads September 15, 2010 Prof. John Kubiatowicz

February 2, 2009 Ion Stoica CS162 Operating Systems and Systems Programming Lecture 5 Cooperating Threads February.

CS162 Operating Systems and Systems Programming Lecture 3 Concurrency and Thread Dispatching January 30, 2013 Anthony D. Joseph

Chapter 4: Threads.

CS162 Operating Systems and Systems Programming Lecture 5 Synchronization February 6, 2008 Prof. Anthony D. Joseph

Anthony D. Joseph and John Canny

CSCI 511 Operating Systems Ch3, 4 (Part B) Further Understanding Threads Dr. Frank Li Skipped sections 3.5 & 3.6

CSCI 511 Operating Systems Chapter 5 (Part A) Process Synchronization

CSCI 511 Operating Systems Ch3, 4 (Part C) Cooperating Threads

More examples How many processes does this piece of code create?

January 28, 2010 Ion Stoica CS162 Operating Systems and Systems Programming Lecture 4 Thread Dispatching January.

February 4, 2010 Ion Stoica CS162 Operating Systems and Systems Programming Lecture 6 Synchronization February 4,

Process & its States Lecture 5.

Operating System Concepts

Process Description and Control

CS162 Operating Systems and Systems Programming Lecture 5 Cooperating Threads September 15, 2008 Prof. John Kubiatowicz

Lecture Topics: 11/1 General Operating System Concepts Processes

Process Description and Control

CS162 Operating Systems and Systems Programming Lecture 6 Synchronization February 6, 2006 Prof. Anthony D. Joseph

Processes and Process Management

Operating Systems: A Modern Perspective, Chapter 6

Chapter 2 Processes and Threads 2.1 Processes 2.2 Threads

CS510 Operating System Foundations

System calls….. C-program->POSIX call

Outline Chapter 2 (cont) Chapter 3: Processes Virtual machines

Chapter 3: Process Concept

CS703 – Advanced Operating Systems

EECE.4810/EECE.5730 Operating Systems

Presentation transcript:

Anthony D. Joseph and Ion Stoica CS162 Operating Systems and Systems Programming Lecture 3 Concurrency and Thread Dispatching January 25, 2012 Anthony D. Joseph and Ion Stoica http://inst.eecs.berkeley.edu/~cs162

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; addrX: . addrY: . addrU: . Stack holds function arguments, return address Permits recursive execution Crucial to modern languages addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: Stack Pointer . Stack Growth addrY: . addrU: . Stack holds function arguments, return address Permits recursive execution Crucial to modern languages addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: Stack Pointer . Stack Growth addrY: . addrU: . Stack holds function arguments, return address Permits recursive execution Crucial to modern languages addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY Stack Pointer addrY: . Stack Growth addrU: . Stack holds function arguments, return address Permits recursive execution Crucial to modern languages addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY C: ret=addrU addrY: Stack Pointer . Stack Growth addrU: . Stack holds function arguments, return address Permits recursive execution Crucial to modern languages addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY C: ret=addrU addrY: . A: tmp=2 ret=addrV Stack Pointer Stack Growth addrU: . Stack holds function arguments, return address Permits recursive execution Crucial to modern languages addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY C: ret=addrU addrY: . A: tmp=2 ret=addrV Stack Pointer Stack Growth addrU: . Output: addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY C: ret=addrU addrY: . A: tmp=2 ret=addrV Stack Pointer Stack Growth addrU: . Output: 2 addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY C: ret=addrU addrY: Stack Pointer . Stack Growth addrU: . Output: 2 addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: . B: ret=addrY Stack Pointer addrY: . Stack Growth addrU: . Output: 2 addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; A: tmp=1 ret=addrZ addrX: Stack Pointer . Stack Growth addrY: . addrU: . Output: 2 1 addrV: . addrZ:

Review: Execution Stack Example A(int tmp) { if (tmp<2) B(); printf(tmp); } B() { C(); C() { A(2); A(1); exit; addrX: . addrY: . addrU: . Output: 2 1 addrV: . addrZ:

Goals for Today Thread Dispatching Cooperating Threads Concurrency examples Need for synchronization Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Many slides generated from lecture notes by Kubiatowicz.

Single-Threaded Example Imagine the following C program: main() { ComputePI(“pi.txt”); PrintClassList(“clist.text”); } What is the behavior here? Program would never print out class list Why? ComputePI would never finish

Use of Threads Version of program with Threads: main() { CreateThread(ComputePI(“pi.txt”)); CreateThread(PrintClassList(“clist.text”)); } What does “CreateThread” do? Start independent thread running given procedure What is the behavior here? Now, you would actually see the class list This should behave as if there are two separate CPUs CPU1 CPU2 Time

Memory Footprint of Two-Thread Example If we stopped this program and examined it with a debugger, we would see Two sets of CPU registers Two sets of Stacks Questions: How do we position stacks relative to each other? What maximum size should we choose for the stacks? What happens if threads violate this? How might you catch violations? Code Global Data Heap Stack 1 Stack 2 Address Space

Per Thread State Each Thread has a Thread Control Block (TCB) Execution State: CPU registers, program counter, pointer to stack Scheduling info: State, priority, CPU time Various Pointers (for implementing scheduling queues) Pointer to enclosing process? (PCB)? Etc (add stuff as you find a need) OS Keeps track of TCBs in protected memory In Array, or Linked List, or …

Lifecycle of a Thread (or Process) As a thread executes, it changes state: new: The thread is being created ready: The thread is waiting to run running: Instructions are being executed waiting: Thread waiting for some event to occur terminated: The thread has finished execution “Active” threads are represented by their TCBs TCBs organized into queues based on their state

Ready Queue And Various I/O Device Queues Thread not running  TCB is in some scheduler queue Separate queue for each device/signal/condition Each queue can have a different scheduler policy Head Tail Ready Queue Tape Unit 0 Disk Unit 2 Ether Netwk 0 Other State TCB9 Link Registers TCB6 TCB16 Other State TCB2 Link Registers TCB3 Other State TCB8 Link Registers

Dispatch Loop Conceptually, the dispatching loop of the operating system looks as follows: Loop { RunThread(); ChooseNextThread(); SaveStateOfCPU(curTCB); LoadStateOfCPU(newTCB); } This is an infinite loop One could argue that this is all that the OS does Emergency crash of operating system called “panic()”

Running a thread Consider first portion: RunThread() How do I run a thread? Load its state (registers, PC, stack pointer) into CPU Load environment (virtual memory space, etc) Jump to the PC How does the dispatcher get control back? Internal events: thread returns control voluntarily External events: thread gets preempted

Review: Yielding through Internal Events Blocking on I/O The act of requesting I/O implicitly yields the CPU Waiting on a “signal” from other thread Thread asks to wait and thus yields the CPU Thread executes a yield() Thread volunteers to give up CPU computePI() { while(TRUE) { ComputeNextDigit(); yield(); } Note that yield() must be called by programmer frequently enough!

Review: Stack for Yielding Thread ComputePI Stack growth yield run_new_thread kernel_yield Trap to OS switch How do we run a new thread? run_new_thread() { newThread = PickNewThread(); switch(curThread, newThread); ThreadHouseKeeping(); /* deallocates finished threads */ } Finished thread not killed right away. Why? Move them in “exit/terminated” state ThreadHouseKeeping() deallocates finished threads

Review: Stack for Yielding Thread ComputePI Stack growth yield run_new_thread kernel_yield Trap to OS switch How do we run a new thread? run_new_thread() { newThread = PickNewThread(); switch(curThread, newThread); ThreadHouseKeeping(); /* deallocates finished threads */ } How does dispatcher switch to a new thread? Save anything next thread may trash: PC, regs, stack Maintain isolation for each thread

Review: Two Thread Yield Example Consider the following code blocks: proc A() { B(); } proc B() { while(TRUE) { yield(); Suppose we have 2 threads: Threads S and T Thread S Stack growth A B(while) yield run_new_thread switch Thread T A B(while) yield run_new_thread switch

Detour: Interrupt Controller Interrupt Mask Priority Encoder IntID CPU Interrupt Int Disable Timer NMI Control Software Interrupt Network Interrupts invoked with interrupt lines from devices Interrupt controller chooses interrupt request to honor Mask enables/disables interrupts Priority encoder picks highest enabled interrupt Software Interrupt Set/Cleared by Software Interrupt identity specified with ID line CPU can disable all interrupts with internal flag Non-maskable interrupt line (NMI) can’t be disabled

Review: Preemptive Multithreading Use the timer interrupt to force scheduling decisions Timer Interrupt routine: TimerInterrupt() { DoPeriodicHouseKeeping(); run_new_thread(); } This is often called preemptive multithreading, since threads are preempted for better scheduling Solves problem of user who doesn’t insert yield(); Some Routine run_new_thread TimerInterrupt Interrupt switch Stack growth

Announcements We are using Piazza instead of the newsgroup Got to http://www.piazza.com/berkeley/spring2012/cs162 Make an account and join Berkeley, CS 162 Please ask questions on Piazza instead of emailing TAs Section assignments posted on Piazza Attend new sections THIS week Email cs162@cory if you don’t have a group! Suggestions for in-class question technology? Email cs162@cory with ideas Question for the break: Discuss best practices for managing a home computer (think about: things breaking, viruses, earthquakes, …)

5min Break

Why allow cooperating threads? People cooperate; computers help/enhance people’s lives, so computers must cooperate By analogy, the non-reproducibility/non-determinism of people is a notable problem for “carefully laid plans” Advantage 1: Share resources One computer, many users One bank balance, many ATMs What if ATMs were only updated at night? Embedded systems (robot control: coordinate arm & hand) Advantage 2: Speedup Overlap I/O and computation Multiprocessors – chop up program into parallel pieces Advantage 3: Modularity Chop large problem up into simpler pieces To compile, for instance, gcc calls cpp | cc1 | cc2 | as | ld Makes system easier to extend

Threaded Web Server Multithreaded version: serverLoop() { connection = AcceptCon(); ThreadCreate(ServiceWebPage(),connection); } Advantages of threaded version: Can share file caches kept in memory, results of CGI scripts, other things Threads are much cheaper to create than processes, so this has a lower per-request overhead What if too many requests come in at once?

Thread Pools Problem with previous version: Unbounded Threads When web-site becomes too popular – throughput sinks Instead, allocate a bounded “pool” of threads, representing the maximum level of multiprogramming Master Thread Thread Pool queue slave(queue) { while(TRUE) { con=Dequeue(queue); if (con==null) sleepOn(queue); else ServiceWebPage(con); } } master() { allocThreads(slave,queue); while(TRUE) { con=AcceptCon(); Enqueue(queue,con); wakeUp(queue); } }

ATM Bank Server ATM server problem: Service a set of requests Do so without corrupting database Don’t hand out too much money

ATM bank server example Suppose we wanted to implement a server process to handle requests from an ATM network: BankServer() { while (TRUE) { ReceiveRequest(&op, &acctId, &amount); ProcessRequest(op, acctId, amount); } } ProcessRequest(op, acctId, amount) { if (op == deposit) Deposit(acctId, amount); else if … } Deposit(acctId, amount) { acct = GetAccount(acctId); /* may use disk I/O */ acct->balance += amount; StoreAccount(acct); /* Involves disk I/O */ } How could we speed this up? More than one request being processed at once Event driven (overlap computation and I/O) Multiple threads (multi-proc, or overlap comp and I/O)

Event Driven Version of ATM server Suppose we only had one CPU Still like to overlap I/O with computation Without threads, we would have to rewrite in event-driven style Example BankServer() { while(TRUE) { event = WaitForNextEvent(); if (event == ATMRequest) StartOnRequest(); else if (event == AcctAvail) ContinueRequest(); else if (event == AcctStored) FinishRequest(); } } What if we missed a blocking I/O step? What if we have to split code into hundreds of pieces which could be blocking? This technique is used for graphical programming

Can Threads Make This Easier? Threads yield overlapped I/O and computation without “deconstructing” code into non-blocking fragments One thread per request Requests proceeds to completion, blocking as required: Deposit(acctId, amount) { acct = GetAccount(actId); /* May use disk I/O */ acct->balance += amount; StoreAccount(acct); /* Involves disk I/O */ } Unfortunately, shared state can get corrupted: Thread 1 Thread 2 load r1, acct->balance load r1, acct->balance add r1, amount2 store r1, acct->balance add r1, amount1 store r1, acct->balance

Problem is at the lowest level Most of the time, threads are working on separate data, so scheduling doesn’t matter: Thread A Thread B x = 1; y = 2; However, What about (Initially, y = 12): x = y+1; y = y*2; What are the possible values of x? x = 1; x = y+1; y = 2; y = y*2 X could be (13, 5, 3) x=13

Problem is at the lowest level Most of the time, threads are working on separate data, so scheduling doesn’t matter: Thread A Thread B x = 1; y = 2; However, What about (Initially, y = 12): x = y+1; y = y*2; What are the possible values of x? y = 2; y = y*2; x = 1; x = y+1; X could be (13, 5, 3) x=5

Problem is at the lowest level Most of the time, threads are working on separate data, so scheduling doesn’t matter: Thread A Thread B x = 1; y = 2; However, What about (Initially, y = 12): x = y+1; y = y*2; What are the possible values of x? y = 2; x = 1; x = y+1; y= y*2; X could be (13, 5, 3) x=3

Atomic Operations To understand a concurrent program, we need to know what the underlying indivisible operations are! Atomic Operation: an operation that always runs to completion or not at all It is indivisible: it cannot be stopped in the middle and state cannot be modified by someone else in the middle Fundamental building block – if no atomic operations, then have no way for threads to work together On most machines, memory references and assignments (i.e. loads and stores) of words are atomic Many instructions are not atomic Double-precision floating point store often not atomic VAX and IBM 360 had an instruction to copy a whole array

Correctness Requirements Threaded programs must work for all interleavings of thread instruction sequences Cooperating threads inherently non-deterministic and non-reproducible Really hard to debug unless carefully designed! Example: Therac-25 Machine for radiation therapy Software control of electron accelerator and electron beam/ Xray production Software control of dosage Software errors caused the death of several patients A series of race conditions on shared variables and poor software design “They determined that data entry speed during editing was the key factor in producing the error condition: If the prescription data was edited at a fast pace, the overdose occurred.”

Space Shuttle Example Original Space Shuttle launch aborted 20 minutes before scheduled launch Shuttle has five computers: Four run the “Primary Avionics Software System” (PASS) Asynchronous and real-time Runs all of the control systems Results synchronized and compared 440 times per second The Fifth computer is the “Backup Flight System” (BFS) Stays synchronized in case it is needed Written by completely different team than PASS Countdown aborted because BFS disagreed with PASS A 1/67 chance that PASS was out of sync one cycle Bug due to modifications in initialization code of PASS A delayed init request placed into timer queue As a result, timer queue not empty at expected time to force use of hardware clock Bug not found during extensive simulation PASS BFS Very hard to get concurrency correct Heisenbugs are hard to find Late state design changes are hard to handle/text

Summary Concurrent threads are a very useful abstraction Allow transparent overlapping of computation and I/O Allow use of parallel processing when available Concurrent threads introduce problems when accessing shared data Programs must be insensitive to arbitrary interleavings Without careful design, shared variables can become completely inconsistent Important concept: Atomic Operations An operation that runs to completion or not at all These are the primitives on which to construct various synchronization primitives