
Presentation transcript:

About Me: I'm a software engineer at Cloudera and a committer on HDFS. Previously I worked on the Ceph distributed filesystem and the Linux kernel, among other things.

Project Motivation: Parallel programming is becoming more and more important. New CPUs are expanding “out, not up”: more cores, same MHz. Solid-state drives are removing the I/O bottlenecks to parallelism.

Motivation: Parallel programming is hard. Race conditions often don't show up in testing. Mutex locks and unlocks can be buried deep inside library code, which makes inspection more difficult.

C/C++ Challenges: For performance reasons, many behaviors are left undefined in pthreads. Examples: attempting to unlock a mutex you have not locked; destroying a mutex that is held by another thread; using a condition variable with two distinct mutexes from two different threads.

C/C++ Challenges: The pthreads library offers no introspection features. For example, you can't write assert(mutex is locked).

Design Challenges: Performance can be benchmarked. Correctness can be tested, but tests may not reveal all problems, and behavior is not deterministic.

Deadlock: Deadlock occurs when two or more threads each require resources that the other thread(s) hold. Example: Thread 1 takes Lock A, then Lock B; Thread 2 takes Lock B, then Lock A.

Visualizing Deadlock: One way to think of deadlock is as a dependency cycle. [Figure: Thread 1, Thread 2, and Thread 3 in a dependency cycle.]

Preventing Deadlock, Idea #1: Use static analysis. Problem: determining whether an arbitrary program will deadlock is equivalent to the halting problem, so it is undecidable.

Preventing Deadlock, Idea #1: Despite their limitations, static analysis tools can still be helpful in some cases. Example: the Linux kernel's “sparse” tool allows programmers to make annotations, such as __must_hold: the specified lock is held on function entry and exit.
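As a rough illustration of the annotation style, the sketch below defines __must_hold the way the kernel headers have done for sparse: a context attribute that only exists when sparse defines __CHECKER__, and an empty macro for an ordinary compiler. The names counter_lock, counter, bump_counter_locked, and bump_counter are hypothetical. Sparse is mostly run on kernel code and the pthreads calls here carry no annotations, so this shows the syntax only, not a working userspace check.

```c
#include <pthread.h>

/* Only sparse defines __CHECKER__; an ordinary compiler sees an empty macro. */
#ifdef __CHECKER__
# define __must_hold(x) __attribute__((context(x, 1, 1)))
#else
# define __must_hold(x)
#endif

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Annotated helper: counter_lock is held on entry and still held on exit. */
static void bump_counter_locked(void) __must_hold(&counter_lock)
{
    counter++;
}

/* Takes the lock, calls the annotated helper, and returns the new value. */
static int bump_counter(void)
{
    int value;

    pthread_mutex_lock(&counter_lock);
    bump_counter_locked();
    value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```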

Preventing Deadlock, Idea #2: Use pthread_mutex_trylock cleverly. Wait with a timeout, and release some resources if the timeout expires. Problem: this requires clever programming. Another problem: timeouts slow down our application. This solution is useful in some cases, but it's not very general.
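A minimal sketch of the trylock idea, assuming just two mutexes and using a bounded retry count in place of a wall-clock timeout. The function name lock_both_with_backoff and the max_tries parameter are hypothetical, not something from the talk.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Lock both mutexes without ever blocking while one of them is held.
 * Returns 0 with both locked, or an errno-style code with neither locked.
 * max_tries is a hypothetical tuning knob standing in for a timeout.
 */
static int lock_both_with_backoff(pthread_mutex_t *a, pthread_mutex_t *b,
                                  int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int rc = pthread_mutex_lock(a);
        if (rc != 0)
            return rc;
        rc = pthread_mutex_trylock(b);
        if (rc == 0)
            return 0;               /* got both */
        pthread_mutex_unlock(a);    /* back off: release what we hold */
        if (rc != EBUSY)
            return rc;              /* a real error, not contention */
        sched_yield();              /* let the other thread make progress */
    }
    return EBUSY;                   /* gave up after max_tries attempts */
}
```

Every caller has to handle the EBUSY case and retry or unwind, which is the "clever programming" cost the slide mentions.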

Preventing Deadlock, Idea #3: Don't prevent deadlock. Just detect it when it happens and restart the system. Some distributed databases take this approach, using a deadlock detector thread. Not very general.

Preventing Deadlock, Idea #4: Absolute ordering. If you ever take mutex B after A, never take mutex A after B.

Preventing Deadlock, Idea #4: Absolute ordering. This is easier than using pthread_mutex_trylock, and it doesn't involve timeouts. Formalized as CERT recommendation CON35-C: Avoid deadlock by locking in predefined order.
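One possible way to get a predefined order, sketched under the assumption that the two mutexes are distinct objects: always lock the one at the lower address first. Address ordering is a choice made for this example; the talk only asks that some fixed order exists. The helper names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Return whichever of the two mutexes comes first in the global order. */
static pthread_mutex_t *first_in_order(pthread_mutex_t *x, pthread_mutex_t *y)
{
    return ((uintptr_t)x < (uintptr_t)y) ? x : y;
}

/* Lock two distinct mutexes, always taking the lower address first. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_t *first = first_in_order(x, y);
    pthread_mutex_t *second = (first == x) ? y : x;

    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
}

/* Unlock order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}
```

With this helper, lock_pair(&a, &b) in one thread and lock_pair(&b, &a) in another acquire the mutexes in the same order, so the A-then-B versus B-then-A cycle cannot form.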

Problems: It's difficult to maintain correct mutex ordering. There are a lot of mutexes in most programs, and even one cycle could cause a deadlock. You need to analyze all mutexes to be safe, even those found in library code.

Locksmith: Locksmith detects mutex ordering violations at runtime. It also detects many cases of undefined behavior. Previous implementations of the idea: Ceph and the Linux kernel.

Locksmith: A possible deadlock will not actually occur on most runs. Locksmith makes the potential for deadlock visible, even in cases where the deadlock itself is rare. It complains and gives stack traces for mutex ordering issues or bad behavior.

Locksmith: Locksmith is implemented as a shared library which overrides the pthreads functions. [Figure: layered stack; using LD_PRELOAD, Locksmith sits between the Application and Pthreads.]
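The interposition mechanism can be sketched roughly as follows. This is not Locksmith's code: it is a minimal wrapper, assuming glibc on Linux, that defines pthread_mutex_lock itself, finds the real function with dlsym(RTLD_NEXT, ...), and counts calls before forwarding them. The counter and the sketch_lock_calls accessor are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

typedef int (*lock_fn)(pthread_mutex_t *);

static lock_fn real_lock;                       /* the next pthread_mutex_lock */
static pthread_once_t resolve_once = PTHREAD_ONCE_INIT;
static atomic_ulong lock_calls;                 /* lock calls intercepted so far */

static void resolve_real_lock(void)
{
    real_lock = (lock_fn)dlsym(RTLD_NEXT, "pthread_mutex_lock");
}

/* Same name and signature as the real function, so the application's
 * calls resolve here first when this library is preloaded. */
int pthread_mutex_lock(pthread_mutex_t *mutex)
{
    pthread_once(&resolve_once, resolve_real_lock);
    if (!real_lock)
        return ENOSYS;                          /* could not find the real one */
    atomic_fetch_add(&lock_calls, 1);           /* a checker would inspect state here */
    return real_lock(mutex);
}

/* Hypothetical accessor so the effect of the wrapper can be observed. */
unsigned long sketch_lock_calls(void)
{
    return atomic_load(&lock_calls);
}
```

Built as a shared library (for example with gcc -shared -fPIC, plus -ldl on older glibc) and named in LD_PRELOAD, a wrapper like this sees the application's lock calls without the application being recompiled.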

Locksmith: Being implemented on top of pthreads gives Locksmith very good portability. Programs don't need to be recompiled to make use of Locksmith. User-defined mutexes are possible.

Enforcing Mutex Ordering: For every pair of mutexes, if we ever take A before B, we should never take B before A. Essentially, each “lock” operation that a thread does creates an edge in a graph from each mutex it holds to the mutex it is taking.

Deadlock, revisited: Thread 1 takes Lock A, then Lock B; Thread 2 takes Lock B, then Lock A. The resulting graph has an edge from A to B and an edge from B to A. A cycle means a possible deadlock.
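The graph idea can be sketched with a small adjacency matrix and a depth-first search. This is an illustration only, not Locksmith's data structure: mutexes are represented by small integer ids, MAX_LOCKS is a hypothetical fixed capacity, and the sketch is not itself thread-safe.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16                      /* hypothetical fixed capacity */

/* edge[a][b] is true if some thread held lock a while taking lock b. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is there a path from src to dst along recorded edges? */
static bool reachable(int src, int dst, bool *seen)
{
    if (src == dst)
        return true;
    seen[src] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (edge[src][next] && !seen[next] && reachable(next, dst, seen))
            return true;
    return false;
}

/*
 * Record that `taking` is being locked while `held` is already held.
 * Returns true if the new edge closes a cycle, i.e. a possible deadlock.
 */
static bool record_lock_order(int held, int taking)
{
    bool seen[MAX_LOCKS];
    bool cycle;

    memset(seen, 0, sizeof(seen));
    cycle = reachable(taking, held, seen);  /* does taking already lead to held? */
    edge[held][taking] = true;
    return cycle;
}
```

A violation is reported the first time the reverse order is observed, even if the two threads never actually collide on that run.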

Features: logging. Can log to many different sinks via LKSMITH_LOG: stderr or stdout, syslog, a file, or program callbacks.

Features: ignore patterns. Can ignore mutexes locked inside certain frames, via LKSMITH_IGNORED_FRAMES and LKSMITH_IGNORED_FRAME_PATTERNS. Useful for working around known bugs.

Error checking: In addition to doing its own checks, Locksmith enables error-checking mutexes whenever possible. It checks for taking sleeping locks (mutexes) while holding a spin lock, and performs many other checks.

Usable in Many Environments: Initialized on first use of pthreads. Can be used with the pthreads static initializers PTHREAD_COND_INITIALIZER and PTHREAD_MUTEX_INITIALIZER. Usable within C++ global constructors.

Alternatives: jcarder is a great tool for Java, but it is not available for C/C++. There is also a userspace version of the Linux kernel's lockdep (http://lwn.net/Articles/536363); it requires an init call and is licensed under GPL2 (not LGPL).

Alternatives: Use PTHREAD_MUTEX_ERRORCHECK. This is always available, since it's part of pthreads. This detects some undefined behavior, but has basically no race detection. Locksmith enables this.

Alternatives: Race Detectors. These tools build a graph of “happens-before” relationships based on synchronization operations. They warn when they notice both a read and a write operation to the same memory location from more than one thread without such a “happens-before” relationship. Popular tools: Google Thread Sanitizer (TSAN), Helgrind, and DRD.
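One common way to represent happens-before is with vector clocks; the talk does not say how any particular tool does it, so treat the following as a generic sketch with hypothetical names (vclock, happens_before, is_race) and a fixed NTHREADS. It assumes the two accesses being compared come from different threads and touch the same memory location.

```c
#include <stdbool.h>

#define NTHREADS 2              /* hypothetical fixed thread count */

/* One logical clock entry per thread, snapshotted at each memory access. */
typedef struct {
    unsigned t[NTHREADS];
} vclock;

/* a happens-before b iff a <= b in every component and a != b. */
static bool happens_before(const vclock *a, const vclock *b)
{
    bool strictly_less = false;

    for (int i = 0; i < NTHREADS; i++) {
        if (a->t[i] > b->t[i])
            return false;
        if (a->t[i] < b->t[i])
            strictly_less = true;
    }
    return strictly_less;
}

/* Two accesses to one location race if at least one is a write and
 * neither access is ordered before the other. */
static bool is_race(const vclock *a, bool a_is_write,
                    const vclock *b, bool b_is_write)
{
    if (!a_is_write && !b_is_write)
        return false;
    return !happens_before(a, b) && !happens_before(b, a);
}
```

A synchronization operation such as an unlock followed by a lock of the same mutex is what merges one thread's clock into another's, which is how an ordered pair of accesses stops being reported.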

Race Detectors, Advantages: Can find races that don't involve locks at all, such as “double checked locking” (en.wikipedia.org/wiki/Double-checked_locking). TSAN is integrated with clang.

Race Detectors, Disadvantages: DRD and TSAN don't seem to detect lock ordering violations (at least in the versions I looked at). They can use tremendous amounts of memory (although TSAN has a “fast mode” which may help).

Race Detectors, Disadvantages: DRD, TSAN, and Helgrind are implemented as valgrind tools, but you can't use valgrind in some environments, like JNI.

Future Directions: Speed up the implementation. Hash commonly occurring stack traces (to avoid symbol lookup, etc.). Support rwlocks and some other constructs. Integrate with a race detector?

References: https://github.com/cmccabe/lksmith ; http://data-race-test.googlecode.com/files/ThreadSanitizer.pdf ; http://www.cs.cmu.edu/~nbeckman/papers/race_detection_survey.pdf