Review: threads and synchronization

Review: threads and synchronization
Landon Cox January 13/18, 2017

Intro to processes Remember, for any area of OS, ask Physical reality?
What interface does the hardware provide? What interface does the OS provide? Physical reality? Single computer (CPUs + memory) Execute instructions from many programs What applications see? Each app thinks it has its own CPU + memory

Hardware, OS interfaces
Applications Job 1 Job 2 Job 3 CPU, Mem CPU, Mem CPU, Mem OS Memory CPUs Hardware

What is a process? Informal Formal A program in execution
Running code + things it can read/write Process ≠ program Formal ≥ 1 threads in their own address space (soon threads will share an address space)

Parts of a process Thread Address space
Sequence of executing instructions Active: does things Address space Data the process uses as it runs Passive: acted upon by threads

Play analogy Process is like a play performance
Program is like the play’s script What are the threads? Threads What is the address space? Address space

What is in the address space?
Program code Instructions, also called “text” Data segment Global variables, static variables Heap (where “new” memory comes from) Stack Where local variables are stored

Review of the stack Each stack frame contains a function’s
Local variables Parameters Return address Saved values of calling function’s registers The stack enables recursion

Example stack A C B A main tmp=0 RA=0x8048347 const=0 RA=0x8048354
void C () { A (0); } void B () { C (); void A (int tmp){ if (tmp) B (); int main () { A (1); return 0; 0x 0x 0x 0x804838c Code Memory Stack SP 0xfffffff tmp=0 RA=0x A SP const=0 RA=0x C SP RA=0x B … SP tmp=1 RA=0x804838c A SP const1=1 const2=0 main 0x0

The stack and recursion
void A (int bnd){ if (bnd) A (bnd-1); } int main () { A (3); return 0; 0x 0x804838c Code Memory Stack SP 0xfffffff bnd=0 RA=0x A SP bnd=1 RA=0x A SP bnd=2 RA=0x A … … SP bnd=3 RA=0x804838c A SP const1=3 const2=0 main How can recursion go wrong? Can overflow the stack … Keep adding frame after frame 0x0

What is missing? What state isn’t in the address space?
Registers Program counter (PC) General purpose registers Review architecture for more details

Multiple threads in an addr space
Several actors on a single set Sometimes they interact (speak, dance) Sometimes they are apart (different scenes)

Private vs global thread state
What state is private to each thread? PC (where actor is in his/her script) Stack, SP (actor’s mindset) What state is shared? Global variables, heap (props on set) Code (like the script)

Concurrency Concurrency Primary topics
Having multiple threads active at a time Concurrent != parallel Thread is the unit of concurrency Primary topics How threads cooperate on a single task How multiple threads can share the CPU

Address spaces Address space Primary topics
Unit of “state partitioning” Primary topics Many addr spaces sharing physical memory Efficiency Safety (protection)

Cooperating threads Memory CPU Thread A Thread B Thread C
Assume each thread has its own CPU We will relax this assumption later CPUs run at unpredictable speeds Can be stalled for any number of reasons Source of non-determinism Memory CPU Thread A Thread B Thread C

Non-determinism and ordering
Thread A Thread B Thread C Global ordering Time Why do we care about the global ordering? Might have dependencies between events Different orderings can produce different results Why is this ordering unpredictable? Can’t predict how fast processors will run

Non-determinism example 1
Thread A: cout << “ABC”; Thread B: cout << “123”; Possible outputs? “A1BC23”, “ABC123”, … Impossible outputs? Why? “321CBA”, “B12C3A”, … What is shared between threads? Screen, maybe the output buffer

y=10; Thread A: int x = y+1; Thread B: y = y*2; Possible results? A goes first: x = 11 and y = 20 B goes first: y = 20 and x = 21 What is shared between threads? Variable y

Thread A: x = 1; Thread B: x = 2; Possible results? B goes first: x = 1 A goes first: x = 2 Is x = 3 possible?

Example 3, continued What if “x = <int>;” is implemented as
x := x & 0 x := x | <int> Consider this schedule Thread A: x := x & 0 Thread B: x := x & 0 Thread B: x := x | 1 Thread A: x := x | 2

Atomic operations Must know what operations are atomic Atomic
before we can reason about cooperation Atomic Indivisible Happens without interruption Between start and end of atomic action No events from other threads can occur

Review of examples Print example (ABC, 123)
What did we assume was atomic? What if “print” is atomic? What if printing a char was not atomic? Arithmetic example (x=y+1, y=y*2)

Atomicity in practice On most machines
Memory assignment/reference is atomic E.g.: a=1, a=b Many other instructions are not atomic E.g.: double-precision floating point store (often involves two memory operations)

Virtual/physical interfaces
Applications SW atomic operations OS If you don’t have atomic operations, you can’t make one. HW atomic operations Hardware

Constraining concurrency
Synchronization Controlling thread interleavings Some events are independent No shared state Relative order of these events don’t matter Other events are dependent Output of one can be input to another Their order can affect program results

Goals of synchronization
All interleavings must give correct result Correct = works no matter how fast threads run Constrain program as little as possible Why? Constraints slow program down Constraints create complexity

Raising the level of abstraction
Locks Also called mutexes Provide mutual exclusion Prevent threads from entering a critical section Lock operations Lock (aka Lock::acquire) Unlock (aka Lock::release)

Lock operations Lock: wait until lock is free, then acquire it
This is a busy-waiting implementation Unlock: atomic lock = 0 do { if (lock is free) { lock = 1 break } } while (1) Must be atomic with respect to other threads calling this code

Elements of locking Key idea Threads are either running or blocked
The lock is initially free Threads acquire lock before an action Threads release lock when action completes Lock() must wait if someone else has lock Key idea All synchronization involves waiting Threads are either running or blocked

Example: thread-safe queue
enqueue () { lock (qLock) // ptr is private // head is shared new_element = new node(); if (head == NULL) { head = new_element; } else { node *ptr; // find queue tail for (ptr=head; ptr->next!=NULL; ptr=ptr->next){} ptr->next=new_element; } new_element->next=0; unlock(qLock); dequeue () { lock (qLock); element=NULL; if (head != NULL) { // if queue non-empty if (head->next!=0) { // remove head element=head->next; head->next= head->next->next; } else { element = head; head = NULL; } unlock (qLock); return element; What can go wrong?

Thread-safe queue Can enqueue unlock anywhere? Must leave shared data
No Must leave shared data In a consistent/sane state Data invariant “consistent/sane state” “always” true enqueue () { lock (qLock) // ptr is private // head is shared new_element = new node(); if (head == NULL) { head = new_element; } else { node *ptr; // find queue tail for (ptr=head; ptr->next!=NULL; ptr=ptr->next){} ptr->next=new_element; } unlock(qLock); // safe? new_element->next=0;

Invariants What are the queue invariants?
Each node appears once (from head to null) Enqueue results in prior list + new element Dequeue removes exactly one element Can invariants ever be false? Must be Otherwise you could never change states

More on invariants So when is the invariant broken?
Can only be broken while lock is held And only by thread holding the lock

BROKEN INVARIANT (CLOSE AND LOCK DOOR)

INVARIANT RESTORED (UNLOCK DOOR)

More on invariants So when is the invariant broken?
Can only be broken while lock is held And only by thread holding the lock Really a “public” invariant The data’s state in when the lock is free Like having your house tidy before guests arrive Hold lock whenever accessing shared data

More on invariants What about reading shared data?
Still must hold lock Else another thread could break invariant (Thread A prints Q as Thread B enqueues)

Intro to ordering constraints
Say you want dequeue to wait while the queue is empty Can we just busy-wait? No! Still holding lock dequeue () { lock (qLock); element=NULL; while (head==NULL) {} // remove head element=head->next; head->next=NULL; unlock (qLock); return element; }

Release lock before spinning?
dequeue () { lock (qLock); element=NULL; unlock (qLock); while (head==NULL) {} // remove head element=head->next; head->next=NULL; return element; } What can go wrong? Head might be NULL when we try to remove entry

One more try Does it work? Why? Downside? Seems ok Shared state is
dequeue () { lock (qLock); element=NULL; while (head==NULL) { unlock (qLock); } // remove head element=head->next; head->next=NULL; return element; Does it work? Seems ok Why? Shared state is protected Downside? Busy-waiting Wasteful

Ideal solution Would like dequeueing thread to “sleep”
Pause execution, add self to “waiting list” Enqueuer can resume when Q is non-empty Problem: what to do with the lock? Why can’t dequeueing thread sleep with lock? Enqueuer would never be able to add

Release the lock before sleep?
enqueue () { acquire lock find tail of queue add new element if (dequeuer waiting){ remove from wait list wake up dequeuer } release lock dequeue () { acquire lock … if (queue empty) { release lock add self to wait list sleep } Does this work?

enqueue () { acquire lock find tail of queue add new element if (dequeuer waiting){ remove from wait list wake up dequeuer } release lock dequeue () { acquire lock … if (queue empty) { release lock add self to wait list sleep } 1 2 3 Thread can sleep forever

enqueue () { acquire lock find tail of queue add new element if (dequeuer waiting){ remove from wait list wake up dequeuer } release lock dequeue () { acquire lock … if (queue empty) { add self to wait list release lock sleep }

enqueue () { acquire lock find tail of queue add new element if (dequeuer waiting){ remove from wait list wake up dequeuer } release lock dequeue () { acquire lock … if (queue empty) { add self to wait list release lock sleep } 1 2 3 Problem: missed wake-up Note: this can be fixed, but it’s messy

Two types of synchronization
As before we need to raise the level of abstraction Mutual exclusion One thread doing something at a time Use locks Ordering constraints Describe “before-after” relationships One thread waits for another Use monitors: a lock + its condition variable

Locks and condition variables
Let threads sleep inside a critical section Internal atomic actions (for now, by definition) CV State = queue of waiting threads + one lock // begin atomic release lock put thread on wait queue go to sleep // end atomic

Condition variable operations
Lock always held wait (lock){ release lock put thread on wait queue go to sleep // after wake up acquire lock } Atomic Lock always held signal (){ wakeup one waiter (if any) } Lock usually held Atomic broadcast (){ wakeup all waiters (if any) } Lock usually held Atomic

CVs and invariants Ok to leave invariants violated before wait?
No: wait can release the lock Larger rule about returning from wait Lock may have changed hands State can change between wait entry and return Don’t make assumptions about shared state

Multi-threaded queue enqueue () { acquire lock find tail of queue add new element signal (lock, CV) release lock } dequeue () { acquire lock if (queue empty) { wait (lock, CV) } remove item from queue release lock return removed item What if “queue empty” takes more than one instruction? Any problems with the “if” statement in dequeue?

Multi-threaded queue enqueue () { dequeue () { acquire lock
find tail of queue add new element signal (lock, CV) release lock } dequeue () { acquire lock if (queue empty) { // begin atomic wait release lock add wait list, sleep // end atomic wait re-acquire lock } remove item from queue return removed item

Multi-threaded queue 1 2 4 3 enqueue () { dequeue () { acquire lock
find tail of queue add new element signal (lock, CV) release lock } dequeue () { acquire lock if (queue empty) { // begin atomic wait release lock add wait list, sleep // end atomic wait re-acquire lock } remove item from queue return removed item 1 2 4 dequeue () { acquire lock … return removed item } 3

Multi-threaded queue How to solve? enqueue () { dequeue () {
acquire lock find tail of queue add new element signal (lock, CV) release lock } dequeue () { acquire lock if (queue empty) { // begin atomic wait release lock add wait list, sleep // end atomic wait re-acquire lock } remove item from queue return removed item How to solve?

Multi-threaded queue Solve with a while loop (“loop before you leap”)
The “condition” in condition variable enqueue () { acquire lock find tail of queue add new element signal (lock, CV) release lock } dequeue () { acquire lock while (queue empty) { wait (lock, CV) } remove item from queue release lock return removed item Solve with a while loop (“loop before you leap”)

Recap and looking ahead
Applications Threads, synchronization primitives OS Atomic Load-Store, Interrupt enable- disable, Atomic Test-Set Hardware

Threads that aren’t running
What is a non-running thread? thread=“sequence of executing instructions” non-running thread=“paused execution” Must save thread’s private state To re-run, re-load private state Want thread to start where it left off

Private vs global thread state
What state is private to each thread? PC (where actor is in his/her script) Stack, SP (actor’s mindset) What state is shared? Global variables, heap (props on set) Code (like lines of a play)

Thread control block (TCB)
What needs to access threads’ private data? The CPU This info is stored in the PC, SP, other registers The OS needs pointers to non-running threads’ data Thread control block (TCB) Container for non-running threads’ private data Values of PC, code, SP, stack, registers

Thread control block TCB1 TCB2 TCB3 Thread 1 running Ready queue PC SP
Address Space TCB1 PC SP registers TCB2 PC SP registers TCB3 PC SP registers Ready queue Code Code Code Stack Stack Stack Thread 1 running CPU PC SP registers

Thread control block TCB2 TCB3 Thread 1 running Ready queue PC SP
Address Space TCB2 PC SP registers TCB3 PC SP registers Ready queue Code Stack Stack Stack Thread 1 running CPU PC SP registers

Thread states Running Ready Blocked Currently using the CPU
Ready to run, but waiting for the CPU Blocked Stuck in lock (), wait (), or down ()

Switching threads What needs to happen to switch threads?
Thread returns control to OS For example, via the “yield” call OS chooses next thread to run OS saves state of current thread To its thread control block OS loads context of next thread From its thread control block Run the next thread On Linux swapcontext

1. Thread returns control to OS
How does the thread system get control? Voluntary internal events Thread might block inside lock or wait Thread might call into kernel for service (read a file) Thread might call yield Are internal events enough?

1. Thread returns control to OS
Involuntary external events (events not initiated by the thread) Hardware interrupts Transfer control directly to OS interrupt handlers From your architecture course CPU checks for interrupts while executing Jumps to OS code with interrupt mask set Interrupts lead to pre-emption (a forced yield) Common interrupt: timer interrupt

2. Choosing the next thread
If no ready threads, just spin Modern CPUs execute a “halt” instruction Loop switches to thread if one is ready Many ways to prioritize ready threads Huge literature on scheduling algorithms

3. Saving state of current thread
What needs to be saved? Registers, PC, SP What makes this tricky? Self-referential sequence of actions Need registers to save state But you’re trying to save all the registers Saving the PC is particularly tricky

Saving the PC Why won’t this work? Instruction address
Returning thread will execute instruction at 100 And just re-execute the switch Really want to save address 102 Instruction address 100 store PC in TCB 101 switch to next thread

4. OS loads the next thread
Where is the next thread’s state/context? Thread control block (in memory) How to load the registers? Use load instructions to grab from memory How to load the stack? Stack is already in memory, load SP

5. OS runs the next thread How to resume thread’s execution?
Jump to the saved PC On whose stack are these steps running? or Who jumps to the saved PC? The thread that called yield (or was interrupted or called lock/wait) How does this thread run again? Some other thread must switch to it

Why use locks? disable interrupts while (1){}
If we have disable-enable, why do we need locks? Program could bracket critical sections with disable-enable Might not be able to give control back to thread library Can’t have multiple locks (over-constrains concurrency) disable interrupts while (1){}

Why use locks? How do we know if disabling interrupts is safe?
Need hardware support CPU has to know if running code is trusted (i.e, is the OS) Example of why we need the kernel Other things that user programs shouldn’t do? Manipulate page tables Reboot machine Communicate directly with hardware Will cover in upcoming memory review

Lock implementation #1 Kernel implementation Disable interrupts + busy-waiting lock () { disable interrupts while (value != FREE) { enable interrupts } value = BUSY unlock () { disable interrupts value = FREE enable interrupts } Why is it ok for lock code to disable interrupts? It’s in the trusted kernel (we have to trust something).

Lock implementation #1 Do we need to disable interrupts in unlock?
Kernel implementation Disable interrupts + busy-waiting lock () { disable interrupts while (value != FREE) { enable interrupts } value = BUSY unlock () { disable interrupts value = FREE enable interrupts } Do we need to disable interrupts in unlock? Only if “value = FREE” is multiple instructions (safer)

Lock implementation #1 Why enable-disable in lock loop body?
Kernel implementation Disable interrupts + busy-waiting lock () { disable interrupts while (value != FREE) { enable interrupts } value = BUSY unlock () { disable interrupts value = FREE enable interrupts } Why enable-disable in lock loop body? Otherwise, no one else will run (including unlockers)

Using read-modify-write instructions
Disabling interrupts Ok for uni-processor, breaks on multi-processor Why? Could use atomic load-store to make a lock Inefficient, lots of busy-waiting Hardware people to the rescue!

Using read-modify-write instructions
Most modern processor architectures Provide an atomic read-modify-write instruction Atomically Read value from memory into register Write new value to memory Implementation details Lock memory location at the memory controller

Test&set on most architectures
Slightly different on x86 (Exchange) Atomically swaps value between register and memory test&set (X) { tmp = X X = 1 return (tmp) } Set: sets location to 1 Test: retruns old value

Lock implementation #2 Use test&set Initially, value = 0
while (test&set(value) == 1) { } unlock () { value = 0 } What happens if value = 1? What happens if value = 0?

Locks and busy-waiting
All implementations have used busy-waiting Wastes CPU cycles To reduce busy-waiting, integrate Lock implementation Thread dispatcher data structures

Lock implementation #3 Interrupt disable, no busy-waiting lock () {
disable interrupts if (value == FREE) { value = BUSY // lock acquire } else { add thread to queue of threads waiting for lock switch to next ready thread // don’t add to ready queue } enable interrupts unlock () { disable interrupts value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } enable interrupts

Lock implementation #3 Who gets the lock after someone calls unlock?
disable interrupts if (value == FREE) { value = BUSY // lock acquire } else { add thread to queue of threads waiting for lock switch to next ready thread // don’t add to ready queue } enable interrupts This is called a “hand-off” lock. unlock () { disable interrupts value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } enable interrupts Who gets the lock after someone calls unlock?

Lock implementation #3 This is called a “hand-off” lock.
disable interrupts if (value == FREE) { value = BUSY // lock acquire } else { add thread to queue of threads waiting for lock switch to next ready thread // don’t add to ready queue } enable interrupts This is called a “hand-off” lock. unlock () { disable interrupts value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } enable interrupts Who might get the lock if it weren’t handed-off directly? (i.e., if value weren’t set BUSY in unlock)

Lock implementation #3 This is called a “hand-off” lock.
disable interrupts if (value == FREE) { value = BUSY // lock acquire } else { add thread to queue of threads waiting for lock switch to next ready thread // don’t add to ready queue } enable interrupts This is called a “hand-off” lock. unlock () { disable interrupts value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } enable interrupts What kind of ordering of lock acquisition guarantees does the hand-off lock provide? Fumble lock?

Lock implementation #3 What does this mean? Are we saving the PC?
disable interrupts if (value == FREE) { value = BUSY // lock acquire } else { add thread to queue of threads waiting for lock switch to next ready thread // don’t add to ready queue } enable interrupts This is called a “hand-off” lock. unlock () { disable interrupts value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } enable interrupts What does this mean? Are we saving the PC?

Lock implementation #3 No, just adding a pointer to the TCB/context.
disable interrupts if (value == FREE) { value = BUSY // lock acquire } else { lockqueue.push(&current_thread->ucontext); swapcontext(&current_thread->ucontext, &new_thread->ucontext)); } enable interrupts This is called a “hand-off” lock. unlock () { disable interrupts value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } enable interrupts No, just adding a pointer to the TCB/context.

Thread A Thread B B holds lock yield () { disable interrupts …
switch (B->A) enable interrupts } // exit thread library <user code> lock () { switch (A->B) back from switch (B->A) } // exit yield unlock () // moves A to ready queue back from switch (A->B) } // exit lock B holds lock

Lock implementation #4 Test&set, minimal busy-waiting lock () {
while (test&set (guard)) {} // like interrupt disable if (value == FREE) { value = BUSY } else { put on queue of threads waiting for lock switch to another thread // don’t add to ready queue } guard = 0 // like interrupt enable unlock () { while (test&set (guard)) {} // like interrupt disable value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } guard = 0 // like interrupt enable

Lock implementation #4 Why is this better than t&s-only lock implementation? lock () { while (test&set (guard)) {} // like interrupt disable if (value == FREE) { value = BUSY } else { put on queue of threads waiting for lock switch to another thread // don’t add to ready queue } guard = 0 // like interrupt enable Only busy-wait while another thread is in lock or unlock unlock () { while (test&set (guard)) {} // like interrupt disable value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } guard = 0 // like interrupt enable Before, we busy-waited while lock was held

Lock implementation #4 What is the switch invariant?
Threads promise to call switch with guard set to 1. lock () { while (test&set (guard)) {} // like interrupt disable if (value == FREE) { value = BUSY } else { put on queue of threads waiting for lock switch to another thread // don’t add to ready queue } guard = 0 // like interrupt enable unlock () { while (test&set (guard)) {} // like interrupt disable value = FREE if anyone on queue of threads waiting for lock { take waiting thread off queue, put on ready queue value = BUSY } guard = 0 // like interrupt enable

Summary of implementing locks
Synchronization code needs atomicity Three options Atomic load-store Lots of busy-waiting Interrupt disable-enable No busy-waiting Breaks on a multi-processor machine Atomic test-set Minimal busy-waiting Works on multi-processor machines

Upcoming Next! After that Questions?
Review of memory and address spaces After that We’ll start reading papers Start programming Project 1 Questions?

Review: threads and synchronization

Similar presentations

Presentation on theme: "Review: threads and synchronization"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Review: threads and synchronization

Similar presentations

Presentation on theme: "Review: threads and synchronization"— Presentation transcript:

Similar presentations

About project

Feedback