Synchronization II Hakim Weatherspoon CS 3410, Spring 2013


Synchronization II Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University P&H Chapter 2.11 and 5.8

Goals for Today
- Synchronization: threads and processes; critical sections, race conditions, and mutexes
- Atomic instructions: HW support for synchronization
- Using synchronization primitives to build concurrency-safe data structures
- Language-level synchronization

Next Goal Understand the challenges of taking advantage of parallel processors and resources, i.e., the challenges of parallel programming!

Programming with Threads
We need threads to exploit multiple processing units, to provide interactive applications, to parallelize for multicore, and to write servers that handle many clients.
Problem: threads are hard even for experienced programmers. Behavior can depend on subtle timing differences, and bugs may be impossible to reproduce.
Needed: synchronization of threads.

Programming with Threads
Concurrency poses challenges for:
- Correctness: threads accessing shared memory should not interfere with each other
- Liveness: threads should not get stuck; they should make forward progress
- Efficiency: the program should make good use of available computing resources (e.g., processors)
- Fairness: resources should be apportioned fairly between threads

Two threads, one counter
Example: web servers use concurrency. Multiple threads handle client requests in parallel, with some shared state, e.g. hit counts: each thread increments a shared counter to track the number of hits.

hits = hits + 1;   compiles to roughly:

LW    R0, addr(hits)
ADDIU R0, R0, 1
SW    R0, addr(hits)

What happens when two threads execute this concurrently?

Two threads, one counter
Possible result: lost update!

hits = 0
T1: LW (reads 0)               T2: LW (reads 0)
T1: ADDIU/SW: hits = 0 + 1
                               T2: ADDIU/SW: hits = 0 + 1
hits = 1

This is a timing-dependent failure: a race condition. Very hard to reproduce, and therefore very difficult to debug.

Race conditions
Def: a timing-dependent error involving access to shared state. Whether a race condition happens depends on how the threads are scheduled, i.e., on who wins the "race" between the instruction that updates the state and the instruction that accesses it.
Challenges of race conditions:
- Races are intermittent and may occur rarely; being timing-dependent means small changes can hide the bug
- A program is correct only if all possible schedules are safe, and the number of possible schedule permutations is huge
- Need to imagine an adversary who switches contexts at the worst possible time

Takeaway
We need a parallel abstraction like threads to take advantage of parallel resources like multicore. Writing parallel programs is hard to get right! We need to prevent data races: timing-dependent updates that result in errors in programs.

Next Goal How to prevent data races and write correct parallel programs?

Critical sections
To eliminate races: use critical sections that only one thread can be in at a time. Contending threads must wait to enter.

T1: CSEnter(); critical section; CSExit();
T2: CSEnter(); # wait until T1 exits; critical section; CSExit();

Mutexes
Critical sections are typically associated with mutual exclusion locks (mutexes). Only one thread can hold a given mutex at a time:
- Acquire (lock) the mutex on entry to the critical section, or block if another thread already holds it
- Release (unlock) the mutex on exit, allowing one waiting thread (if any) to acquire it and proceed

pthread_mutex_init(&m, NULL);

T1:                              T2:
pthread_mutex_lock(&m);          pthread_mutex_lock(&m);  # waits
hits = hits + 1;                 hits = hits + 1;
pthread_mutex_unlock(&m);        pthread_mutex_unlock(&m);

Takeaway
We need a parallel abstraction like threads to take advantage of parallel resources like multicore. Writing parallel programs is hard to get right! We need to prevent data races: timing-dependent updates that result in errors in programs. Critical sections prevent data races and let us write parallel-safe programs. A mutex (mutual exclusion lock) can be used to implement critical sections, and is often implemented via a lock abstraction.

Next Goal
How do we implement mutex locks? What are the hardware primitives? Then, we can use these mutex locks to implement critical sections, and use critical sections to write parallel-safe programs.

Mutexes
Q: How do we implement critical sections in code?
A: Lots of approaches... e.g., a mutual exclusion lock (mutex):
lock(m): wait till it becomes free, then lock it
unlock(m): unlock it

safe_increment() {
  pthread_mutex_lock(&m);
  hits = hits + 1;
  pthread_mutex_unlock(&m);
}

Synchronization in MIPS
Load linked: LL rt, offset(rs)
Store conditional: SC rt, offset(rs)
- Succeeds if the location has not changed since the LL: returns 1 in rt
- Fails if the location has changed: returns 0 in rt
Any time a processor intervenes and modifies the value in memory between the LL and SC instructions, the SC returns 0 in $t0, causing the code to try again. (P&H Chapter 2: Instructions: Language of the Computer)

Synchronization in MIPS
Example: an atomic incrementor.

try: LL    $t0, 0($s0)
     ADDIU $t0, $t0, 1
     SC    $t0, 0($s0)
     BEQZ  $t0, try

If threads A and B both reach the SC, only one store conditional succeeds; the other thread's $t0 becomes 0 and BEQZ sends it back to try, so exactly one increment is applied per successful pass.

Mutex from LL and SC
Linked load / store conditional.

m = 0; // m = 0 means the lock is free; m = 1 means the lock is held

mutex_lock(int *m) {
  while (test_and_set(m)) {}
}

int test_and_set(int *m) {
  old = *m;   // must happen atomically: LL
  *m = 1;     //                         SC
  return old;
}

LL is just like LW; SC is just like SW, but overwrites its source register with 1 ("success") or 0 ("fail").

Mutex from LL and SC
Linked load / store conditional. test_and_set in MIPS:

int test_and_set(int *m) {
  try:  LI   $t0, 1
        LL   $t1, 0($a0)
        SC   $t0, 0($a0)
        BEQZ $t0, try      # SC failed: retry
        MOVE $v0, $t1      # return the old value
}

LL is just like LW; SC is just like SW, but overwrites its source register with 1 ("success") or 0 ("fail").

Mutex from LL and SC
Linked load / store conditional. Folding test_and_set into the lock, and testing the old value before attempting the store:

mutex_lock(int *m) {
  test_and_set:
    LI   $t0, 1
    LL   $t1, 0($a0)
    BNEZ $t1, test_and_set   # lock already held: retry
    SC   $t0, 0($a0)
    BEQZ $t0, test_and_set   # SC failed: retry
}

mutex_unlock(int *m) {
  *m = 0;
}

LL is just like LW; SC is just like SW, but overwrites its source register with 1 ("success") or 0 ("fail").

Mutex from LL and SC
This is called a spin lock, a.k.a. spin waiting: a waiting thread loops (spins) until the lock becomes free.

mutex_lock(int *m) {
  test_and_set:
    LI   $t0, 1
    LL   $t1, 0($a0)
    BNEZ $t1, test_and_set
    SC   $t0, 0($a0)
    BEQZ $t0, test_and_set
}

mutex_unlock(int *m) {
  SW $zero, 0($a0)   # *m = 0
}

Mutex from LL and SC
m = 0; both threads call mutex_lock(m):

Step  Thread A            Thread B
1     try: LI $t0, 1      try: LI $t0, 1
2     LL $t1, 0($a0)      LL $t1, 0($a0)     # both read m = 0
3     BNEZ $t1, try       BNEZ $t1, try      # both fall through
4     SC $t0, 0($a0)      SC $t0, 0($a0)     # only one SC succeeds
5     BEQZ $t0, try       BEQZ $t0, try      # the loser retries; the winner enters the critical section

Mutex from LL and SC
When the lock is already held (M[$a0] = 1), the LL loads 1 into $t1 and BNEZ branches back to try: the thread spins, repeating these steps until the holder releases the lock by storing 0.

Step  Thread A (lock held by another thread)
1     try: LI $t0, 1
2     LL $t1, 0($a0)     # reads 1
3     BNEZ $t1, try      # loop back; spin until m becomes 0

Alternative Atomic Instructions
Other atomic hardware primitives:
- test-and-set (x86)
- atomic increment (x86)
- bus lock prefix (x86)
- compare-and-exchange (x86, ARM deprecated)
- linked load / store conditional (MIPS, ARM, PowerPC, DEC Alpha, ...)
Hardware implementation: lock the bus while the read-modify-write cycle happens; very expensive.

Synchronization
Synchronization techniques:
- Clever code: must work despite an adversarial scheduler/interrupts; used by hackers (also: noobs)
- Disable interrupts: used by exception handlers, the scheduler, device drivers, ...
- Disable preemption: dangerous for user code, but okay for some kernel code
- Mutual exclusion locks (mutex): general purpose, except for some interrupt-related cases; usable by device drivers if interrupts always go to the same core

Takeaway
We need a parallel abstraction like threads to take advantage of parallel resources like multicore. Writing parallel programs is hard to get right! We need to prevent data races: timing-dependent updates that result in errors in programs. Critical sections prevent data races and let us write parallel-safe programs. A mutex (mutual exclusion lock) can be used to implement critical sections, often via a lock abstraction. We need synchronization primitives such as the LL and SC (load linked and store conditional) instructions to efficiently implement parallel and correct programs.

Next Goal How do we use synchronization primitives to build concurrency-safe data structures?

Attempt#1: Producer/Consumer
Access to shared data must be synchronized. Goal: enforce data structure invariants.

// invariant: data is in A[h .. t-1]
// n is the buffer size (here, 100)
char A[100];
int h = 0, t = 0;

// producer: add to list tail
void put(char c) {
  A[t] = c;
  t = (t+1) % n;
}

// consumer: take from list head
char get() {
  while (h == t) { }   // spin while empty
  char c = A[h];
  h = (h+1) % n;
  return c;
}

Error: a thread could miss an update to t or h due to the lack of synchronization. This implementation breaks the invariant: we must only produce if the buffer is not full and only consume if it is not empty. We need to synchronize access to the shared data.

Attempt#2: Protecting an invariant
Rule of thumb: all accesses and updates that can affect the invariant become critical sections.

// invariant (protected by mutex m): data is in A[h .. t-1]
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
char A[100];
int h = 0, t = 0;

// producer: add to list tail
void put(char c) {
  pthread_mutex_lock(&m);
  A[t] = c;
  t = (t+1) % n;
  pthread_mutex_unlock(&m);
}

// consumer: take from list head
char get() {
  pthread_mutex_lock(&m);
  while (h == t) { }   // BUG: busy-waiting while holding the lock
  char c = A[h];
  h = (h+1) % n;
  pthread_mutex_unlock(&m);
  return c;
}

BUG: the consumer can't wait while holding the lock. While it spins on (h == t), the producer can never acquire the mutex to add data. What to do?

Guidelines for successful mutexing
Insufficient locking can cause races. Skimping on mutexes? Just say no!
Poorly designed locking can cause deadlock:

P1: lock(m1); lock(m2);
P2: lock(m2); lock(m1);   // circular wait

- Know why you are using mutexes!
- Acquire locks in a consistent order to avoid cycles
- Use lock/unlock like braces (match them lexically): lock(&m); ...; unlock(&m);
  Watch out for return, goto, and function calls! Watch out for exception/error conditions!

Attempt#3: Beyond mutexes
Writers must check for a full buffer, and readers must check for an empty buffer. Ideal: don't busy-wait; go to sleep instead.

// check before acquiring the lock: the empty condition
// may no longer hold once we are in the critical section
char get() {
  while (h == t) { }
  acquire(L);
  char c = A[h];
  h = (h+1) % n;
  release(L);
  return c;
}

// check while holding the lock: spins forever, since no
// other thread can acquire L to update h or t
char get() {
  acquire(L);
  while (h == t) { }
  char c = A[h];
  h = (h+1) % n;
  release(L);
  return c;
}

Dilemma: we have to check the condition while holding the lock, but we cannot wait while holding the lock.

Attempt#4: Beyond mutexes
Writers must check for a full buffer, and readers must check for an empty buffer. Ideal: don't busy-wait; go to sleep instead.

char get() {
  do {
    acquire(L);
    empty = (h == t);
    if (!empty) {
      c = A[h];
      h = (h+1) % n;
    }
    release(L);
  } while (empty);
  return c;
}

This works, but it is wasteful: the thread will spin-wait, repeatedly acquiring and releasing the lock.

Language-level Synchronization

Condition variables
Use a [Hoare] condition variable to wait for a condition to become true (without holding the lock!):
- wait(m, c): atomically release m and sleep, waiting for condition c; wake up holding m sometime after c was signaled
- signal(c): wake up one thread waiting on c
- broadcast(c): wake up all threads waiting on c
POSIX (e.g., Linux): pthread_cond_wait, pthread_cond_signal, pthread_cond_broadcast

Attempt#5: Using a condition variable
wait(m, c): release m, sleep until c, wake up holding m
signal(c): wake up one thread waiting on c

cond_t *not_full = ...;
cond_t *not_empty = ...;
mutex_t *m = ...;

void put(char c) {
  lock(m);
  while ((t+1) % n == h)   // full
    wait(m, not_full);
  A[t] = c;
  t = (t+1) % n;
  unlock(m);
  signal(not_empty);
}

char get() {
  lock(m);
  while (t == h)           // empty
    wait(m, not_empty);
  char c = A[h];
  h = (h+1) % n;
  unlock(m);
  signal(not_full);
  return c;
}

Monitors
A monitor is a concurrency-safe data structure with:
- one mutex
- some condition variables
- some operations
All operations on the monitor acquire/release the mutex, so there is one thread in the monitor at a time. The ring buffer was a monitor. Java, C#, etc. have built-in support for monitors.

Java concurrency
Java objects can be monitors: the "synchronized" keyword locks/releases the mutex, and each object has one (!) built-in condition variable:
- o.wait() = wait(o, o)
- o.notify() = signal(o)
- o.notifyAll() = broadcast(o)
Note that in Java, wait() and notify() must be called while holding the object's monitor (inside a synchronized block), and the monitor is reacquired before a signaled thread returns from wait().

More synchronization mechanisms
Lots of synchronization variations (all implementable with mutexes and condition variables):
- Reader/writer locks: any number of threads can hold a read lock, but only one thread can hold the writer lock
- Semaphores: N threads can hold the lock at the same time
- Message-passing, sockets, queues, ring buffers, ...: transfer data and synchronize

Summary
Hardware primitives (test-and-set, LL/SC, barrier, ...) are used to build synchronization primitives (mutex, semaphore, ...), which are used to build language constructs (monitors, signals, ...).

Administrivia
Project3 due next week, Monday, April 22nd.
Games night: Friday, April 26th, 5-7pm in B17 Upson. Come, eat, drink, have fun and be merry!
Prelim3 is next week, Thursday, April 25th, 7:30pm in Phillips 101 and Upson B17. Old prelims are online in CMS.
Prelim review sessions: Monday, April 22, 6-8pm in B17 Upson Hall; Tuesday, April 23, 6-8pm in 101 Phillips Hall.
Project4 (the final project) is out next week. Demos: May 14 and 15. You will not be able to use slip days.

Administrivia
Next three weeks:
- Week 12 (Apr 15): Project3 design doc due and HW4 due
- Week 13 (Apr 22): Project3 due and Prelim3
- Week 14 (Apr 29): Project4 handout
Final project for class:
- Week 15 (May 6): Project4 design doc due
- Week 16 (May 13): Project4 due by May 15th