W4118 Operating Systems Instructor: Junfeng Yang.


Last lecture: synchronization in Linux  Low-level atomic operations:  Memory barrier avoids compile-time, or runtime instruction re-ordering  Atomic operations memory bus lock, read-modify-write ops  Interrupt/softirq disabling/enabling Local, global  Spin locks general, read/write  High-level synchronization primitives:  Semaphores general, read/write  Mutex  Completion

Last lecture: synchronization in Linux  Synchronization strategy  Avoid synch  Use atomic operations or spin locks  Use semaphores, mutexes, or completions if need to sleep  Optimize for common case  “Fast-path” is typically a few instructions  Examples: spin_lock when lock available down() when semaphore available Replace semaphore with mutex driven by hardware change, usage pattern

Today: concurrency errors  Goals  learn concurrency error patterns, so you can avoid them in your code  How to detection concurrency errors  Concurrency error patterns  Concurrency error detection

Concurrency error classification
 Deadlock: a situation wherein two or more processes can never proceed because each is waiting for another to do something
   Key: circular wait
 Race condition: a timing-dependent error involving shared state
   Data race: concurrent accesses to a shared variable, where at least one access is a write
   Atomicity bugs: code does not enforce the atomicity the programmer intended for a group of memory accesses
   Order bugs: code does not enforce the order the programmer intended for a group of memory accesses
 If you had to choose between deadlocks and race conditions, which would you choose?
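To make the data race definition concrete, here is a minimal sketch (names like `run_counter_demo` are illustrative, not from the slides): two threads increment a shared counter. With the mutex held around each increment the final count is deterministic; deleting the lock/unlock pair turns each `counter++` into an unsynchronized read-modify-write, i.e., a data race.

```c
#include <pthread.h>

#define ITERS 100000
static long counter = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&m);     /* remove this pair to create a data race */
        counter++;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final count. */
long run_counter_demo(void) {
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With the lock, the result is always 2 * ITERS; without it, lost updates make the result unpredictable.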

Synchronization is hard. Why?
 Too many possible thread schedules
   Exponential in program size
 Synchronization is global; not easy to divide and conquer
   Synchronization cross-cuts abstraction boundaries
   Local correctness may not yield global correctness, i.e., properly synchronized modules don't compose
 We'll see a few error examples next

Example 1: good + bad  bad
 deposit() is properly synchronized; withdraw() is not
 Result: race
 Synchronization is global: your function can be made incorrect by someone else's bug
 Debugging: hard

    void withdraw(int *balance) {
        --*balance;                 /* no lock: races with deposit() */
    }

    void deposit(int *balance) {
        lock();
        ++*balance;
        unlock();
    }

Example 2: good + good  bad
 Compose single-account operations into operations on two accounts
 deposit(), withdraw(), and balance() are properly synchronized
 sum() and transfer()? Race
 Synchronization doesn't compose

    int balance(Account *acnt) {
        int b;
        lock(acnt->guard);
        b = acnt->balance;
        unlock(acnt->guard);
        return b;
    }

    void withdraw(Account *acnt) {
        lock(acnt->guard);
        --acnt->balance;
        unlock(acnt->guard);
    }

    void deposit(Account *acnt) {
        lock(acnt->guard);
        ++acnt->balance;
        unlock(acnt->guard);
    }

    /* Neither function holds both locks at once, so another
       thread can interleave between the two calls. */
    int sum(Account *a1, Account *a2) {
        return balance(a1) + balance(a2);
    }

    void transfer(Account *a1, Account *a2) {
        withdraw(a1);
        deposit(a2);
    }

Example 3: good + good  deadlock
 2nd attempt: use locks in sum()
 One sum() call: correct
 Two concurrent sum() calls? Deadlock

    int sum(Account *a1, Account *a2) {
        int s;
        lock(a1->guard);
        lock(a2->guard);
        s = a1->balance;
        s += a2->balance;
        unlock(a2->guard);
        unlock(a1->guard);
        return s;
    }

    T1: sum(a1, a2)
    T2: sum(a2, a1)    /* acquires the same two locks in the opposite order */
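A sketch of one standard fix for this deadlock: impose a global lock order by always acquiring the lower-addressed account's lock first, regardless of argument order (`Account` and `demo_sum` are illustrative names, not from the slides).

```c
#include <pthread.h>

typedef struct Account {
    pthread_mutex_t guard;
    int balance;
} Account;

int sum(Account *a1, Account *a2) {
    Account *first  = (a1 < a2) ? a1 : a2;   /* fixed global order by address */
    Account *second = (a1 < a2) ? a2 : a1;
    int s;
    pthread_mutex_lock(&first->guard);
    pthread_mutex_lock(&second->guard);
    s = a1->balance + a2->balance;
    pthread_mutex_unlock(&second->guard);
    pthread_mutex_unlock(&first->guard);
    return s;
}

/* Both argument orders now acquire the locks identically, so
   T1: sum(a1, a2) and T2: sum(a2, a1) can no longer deadlock. */
int demo_sum(void) {
    static Account x = { PTHREAD_MUTEX_INITIALIZER, 3 };
    static Account y = { PTHREAD_MUTEX_INITIALIZER, 4 };
    return sum(&x, &y) + sum(&y, &x);
}
```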

Example 4: not just locks  monitors don't compose
 Usually bad to hold a lock (in this case, the monitor lock) across an abstraction boundary

    Monitor M1 {
        cond_t cv;
        wait()   { wait(cv); }     // releases M1's monitor lock
        signal() { signal(cv); }
    };

    Monitor M2 {
        foo() { M1.wait(); }       // blocks while still holding M2's monitor lock
        bar() { M1.signal(); }
    };

    T1: M2.foo();
    T2: M2.bar();                  // can never enter M2 once T1 blocks inside foo()

Today: concurrency errors  Concurrency error patterns  Deadlock  Race Data race Atomicity Order  Concurrency error detection  Deadlock detection  Race detecttion

Deadlock detection
 Root cause of deadlock: circular wait
 Detecting a deadlock that has occurred is easy: the system halts, so you can attach a debugger and see the wait cycle
 Can we detect potential deadlocks before we run into them?

Resource allocation graph
 Nodes
   Locks (resources)
   Threads (processes)
 Edges
   Assignment edge: lock -> thread; removed on unlock()
   Request edge: thread -> lock; converted to an assignment edge when lock() returns
 Detection: cycle  deadlock
 Problem: still detects a deadlock only when it occurs
 Resource allocation graph for the Example 3 deadlock: T1 holds a1->guard and requests a2->guard; T2 holds a2->guard and requests a1->guard  cycle

Detecting potential deadlocks
 Can deduce the lock order: the order in which locks are acquired
   For each lock acquired, record its order relative to the locks already held
 Cycle in the lock order  potential deadlock

    T1: sum(a1, a2)         // locks held
        lock(a1->guard)     // {}
        lock(a2->guard)     // {a1->guard}

    T2: sum(a2, a1)         // locks held
        lock(a2->guard)     // {}
        lock(a1->guard)     // {a2->guard}

 a1->guard is ordered before a2->guard in T1, and after it in T2: cycle  potential deadlock!
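The ordering check above can be sketched in a few lines (a hypothetical toy, in the spirit of the Linux kernel's lockdep but not its actual code): on every acquire, record an edge from each held lock to the new lock, and flag a potential deadlock if an edge would close a cycle.

```c
/* Lock-order tracker: locks are small integer ids, edges form a DAG. */
#define MAXLOCKS 8
static int edge[MAXLOCKS][MAXLOCKS];   /* edge[a][b]: a was held when b was taken */

/* DFS over recorded order edges; the graph stays acyclic, so this terminates. */
static int reaches(int from, int to) {
    if (from == to)
        return 1;
    for (int i = 0; i < MAXLOCKS; i++)
        if (edge[from][i] && reaches(i, to))
            return 1;
    return 0;
}

/* Called with one already-held lock and the lock being acquired.
   Returns 1 (potential deadlock) if adding held->taken would close a cycle. */
int record_acquire(int held, int taken) {
    if (reaches(taken, held))
        return 1;                 /* taken already ordered before held: cycle */
    edge[held][taken] = 1;
    return 0;
}
```

Running the Example 3 schedule through it: T1 records a1-before-a2, then T2's attempt to record a2-before-a1 is reported, before any actual deadlock occurs.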

Today: concurrency errors  Concurrency error patterns  Deadlock  Race Data race Atomicity Order  Concurrency error detection  Deadlock detection  Race detection

Race detection
 We will only look at Eraser, a data race detector
   Techniques exist to detect atomicity and order bugs as well
 Two approaches to data race detection
   Happens-before
   Lockset (Eraser's algorithm)

Happens-before
 Event A happens-before event B if
   B follows A in the same thread, or
   A is in T1, B is in T2, and there is a synchronization event C such that C is after A in T1 and before B in T2
 Happens-before defines a natural partial order on events

Happens-before based race detection
 Tools before Eraser were based on happens-before
 Sketch
   Monitor all data accesses and synchronization operations
   Watch for:
     an access of v in thread T1
     an access of v in thread T2
     no synchronization operation between the two accesses
     at least one of the accesses is a write

Problems with happens-before
 Problem 1: expensive
   Requires, per thread, a list of accesses to shared data and a list of synchronization operations
 Problem 2: false negatives
   Happens-before looks for actual data races (a moment in time when multiple threads access shared data without synchronization)
   It ignores programmer intention; the synchronization operation between the accesses may just happen to be there

    T1: ++y;
        lock(m);
        unlock(m);

    T2: lock(m);
        unlock(m);
        ++y;

 In this schedule the unrelated lock m happens to order the two unprotected accesses to y, so a happens-before detector stays silent even though y is racy

Eraser: a different approach
 Idea: check invariants
   Violation of an invariant  likely data race
 Invariant: the locking discipline
   Assumption: accesses to shared variables are protected by locks (realistic?)
   Every access is protected by at least one lock
   Any access unprotected by a lock  an error
 Problem: how do we find out which lock protects a variable?
   The linkage between locks and variables is undeclared

Lockset algorithm v1: infer the locks
 Intuition: the protecting lock must be one of the locks held at the time of access
 C(v): the set of candidate locks for protecting v
   Initialize C(v) to the set of all locks
   On each access to v by thread t, refine C(v): C(v) = C(v) ∩ locks_held(t)
   If C(v) = {}, report an error
 Question: is locks_held(t) per thread?
 Sounds good! But …
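A minimal sketch of the v1 refinement rule, representing locksets as bitmasks (bit i set = lock i held); the names are illustrative, not Eraser's actual code, and a real tool keeps one C(v) per shared variable rather than a single global.

```c
typedef unsigned int lockset_t;

static lockset_t Cv = ~0u;    /* C(v): starts as the set of all locks */

/* On each access to v by thread t: C(v) = C(v) ∩ locks_held(t).
   Returns 1 if C(v) became empty, i.e., no lock consistently protects v. */
int lockset_refine(lockset_t locks_held) {
    Cv &= locks_held;
    return Cv == 0;
}
```

Two accesses under different locks are enough to empty C(v) and trigger a report.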

Problems with lockset v1
 Initialization
   When shared data is first created and initialized, no lock is held
 Read-shared data
   Some shared data is only read (once initialized)
 Read-write locks
   We saw them last week
   Locks can be held in either write mode or read mode

Initialization
 When shared data is first created, only one thread can see it
   Locking is unnecessary with only one thread
 Solution: do not refine C(v) until the creator thread finishes initialization and makes the shared data accessible to other threads
 How do we know when initialization is done?
   We don't …
   Approximate it as the moment a second thread accesses the shared data

Read-shared data
 Some data is only read (once initialized)
   Locking is unnecessary for read-only data
 Solution: refine C(v) on reads, but don't report warnings
 Question: why refine C(v) on reads at all?
   To catch the case where C(v) is already {} from shared reads when a thread later writes to v

State transitions
 Each shared data value (memory location) is in one of four states: Virgin, Exclusive, Shared, and Shared-Modified
   Virgin  Exclusive: on a write by the first thread
   Exclusive  Shared: on a read by a new thread (refine C(v), no check)
   Exclusive  Shared-Modified: on a write by a new thread (refine C(v) and check)
   Shared  Shared-Modified: on a write (refine C(v) and check)
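The transitions above can be sketched as a small state machine (names like `track_access` are illustrative; the lockset refinement itself is elided here, this only tracks the state):

```c
typedef enum { VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED } vstate_t;

typedef struct {
    vstate_t state;
    int owner;               /* first thread to touch the variable */
} varinfo_t;

void track_access(varinfo_t *v, int tid, int is_write) {
    switch (v->state) {
    case VIRGIN:                       /* first access: creator thread */
        v->state = EXCLUSIVE;
        v->owner = tid;
        break;
    case EXCLUSIVE:                    /* initialization, no refinement... */
        if (tid == v->owner)
            break;                     /* ...until a second thread appears */
        v->state = is_write ? SHARED_MODIFIED : SHARED;
        break;
    case SHARED:                       /* refine C(v), but no check */
        if (is_write)
            v->state = SHARED_MODIFIED;
        break;
    case SHARED_MODIFIED:              /* refine C(v) and check for {} */
        break;
    }
}
```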

Read-write locks
 Read-write locks allow a single writer or multiple readers
 Locks can be held in read mode or write mode
   read_lock(m); read v; read_unlock(m);
   write_lock(m); write v; write_unlock(m);
 Locking discipline
   A lock may be held in either mode (read or write) for a read access
   A lock must be held in write mode for a write access
   A write access with the lock held only in read mode  error

Handling read-write locks
 Idea: distinguish read and write accesses when refining the lockset
 On each read of v by thread t (same as before)
   C(v) = C(v) ∩ locks_held(t)
   If C(v) = {}, report an error
 On each write of v by thread t
   C(v) = C(v) ∩ write_locks_held(t)
   If C(v) = {}, report an error
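Extending the bitmask lockset sketch to read-write locks: reads refine with all locks held in any mode, while writes refine only with locks held in write mode (illustrative names, not Eraser's code; a real tool keeps C(v) per variable).

```c
typedef unsigned int rw_lockset_t;

static rw_lockset_t Cv_rw = ~0u;   /* C(v), initially the set of all locks */

/* Read access: any mode counts, so refine with locks_held(t). */
int rw_refine_read(rw_lockset_t locks_held) {
    Cv_rw &= locks_held;
    return Cv_rw == 0;
}

/* Write access: refine with write_locks_held(t) only, so a lock held
   merely in read mode does not count as protecting the write. */
int rw_refine_write(rw_lockset_t write_locks_held) {
    Cv_rw &= write_locks_held;
    return Cv_rw == 0;
}
```

A write under a lock held only in read mode empties C(v) and is reported, exactly the discipline violation the previous slide describes.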

Implementation
 How to monitor variable accesses?
   Binary instrumentation
 How to represent the state?
   For each memory word, keep a shadow word
   First two bits: which state the word is in
 How to represent the lockset?
   The remaining 30 bits: a lockset index
   A table maps each lockset index to a set of locks
   Assumption: there are not many distinct locksets
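The shadow-word encoding can be sketched with a little bit-packing (these helper names are illustrative, not Eraser's): 2 state bits in the top of a 32-bit word, a 30-bit lockset index in the rest.

```c
#include <stdint.h>

typedef uint32_t shadow_t;

/* Pack a 2-bit state and a 30-bit lockset-table index into one shadow word. */
shadow_t shadow_make(unsigned state, unsigned lockset_idx) {
    return (shadow_t)((state & 0x3u) << 30) | (lockset_idx & 0x3FFFFFFFu);
}

unsigned shadow_state(shadow_t s)   { return s >> 30; }
unsigned shadow_lockset(shadow_t s) { return s & 0x3FFFFFFFu; }
```

Because many variables share the same candidate lockset, the 30-bit index into a table of distinct locksets is far cheaper than storing a full set per word.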

Results
 Eraser works
   Finds bugs in mature software
   Though it has many limitations; the major one: benign races (intended races)
 However, it is slow
   10-30x slowdown
   Instrumenting each memory access is costly
   Can be made faster with static analysis and smarter instrumentation
 The lockset algorithm is influential and used by many tools
   E.g., Helgrind (a race detection tool in Valgrind)

Benign race examples
 Double-checked locking
   Faster if v is often 0
   Doesn't work with compiler/hardware reordering

    if (v) {            // racy, unprotected read of v
        lock(m);
        if (v)
            …;
        unlock(m);
    }

 Statistical counters
   ++nrequests;        // counts that are off by a few are acceptable
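Since the slide notes that the plain double-checked pattern breaks under reordering, here is a sketch of a variant made correct with C11 acquire/release atomics (names like `get_value` and the initialized value 42 are illustrative assumptions): the release store publishes the initialized data, and the acquire load on the fast path ensures a reader that sees the flag also sees the data.

```c
#include <stdatomic.h>
#include <pthread.h>

static _Atomic int initialized = 0;
static pthread_mutex_t init_m = PTHREAD_MUTEX_INITIALIZER;
static int value;               /* the lazily initialized data */

int get_value(void) {
    /* Fast path: acquire load pairs with the release store below. */
    if (atomic_load_explicit(&initialized, memory_order_acquire) == 0) {
        pthread_mutex_lock(&init_m);
        if (atomic_load_explicit(&initialized, memory_order_relaxed) == 0) {
            value = 42;         /* expensive initialization stands in here */
            atomic_store_explicit(&initialized, 1, memory_order_release);
        }
        pthread_mutex_unlock(&init_m);
    }
    return value;
}
```

Note that a lockset tool would still flag the unprotected fast-path read; the race on the flag is now intentional and well-defined, which is exactly why such reports are classed as benign.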

Automatic software error detection
 Static analysis: inspect the code/binary without actually running it
   E.g., gcc does some simple static analysis: $ gcc -Wall
 Dynamic analysis: actually run the software
   E.g., testing: $ run-test
 Static vs. dynamic
   Static has better coverage, since the compiler sees all code
   Dynamic is more precise, since it can see actual values
 Which one would you use for concurrency errors?