2
About Me
I'm a software engineer at Cloudera and a committer on HDFS.
Previously I worked on the Ceph distributed filesystem and the Linux kernel, among other things.
3
Project Motivation
Parallel programming is becoming more and more important.
New CPUs are expanding “out, not up”: more cores, same MHz.
Solid-state drives are removing the I/O bottlenecks to parallelism.
4
Motivation
Parallel programming is hard.
Race conditions often don't show up in testing.
Mutex locks and unlocks can be buried deep inside library code, which makes inspection more difficult.
5
C/C++ Challenges
For performance reasons, many behaviors are left undefined in pthreads:
Attempting to unlock a mutex you have not locked.
Destroying a mutex that is held by another thread.
Using a condition variable with two distinct mutexes from two different threads.
6
C/C++ Challenges
The pthreads library offers no introspection features.
You can't write assert(mutex is locked).
7
Design Challenges
Performance: can benchmark.
Correctness: can test, but tests may not reveal all problems, and behavior is not deterministic.
8
Deadlock
Deadlock occurs when two or more threads each require resources that another of the threads holds.
Thread 1: Lock A, then Lock B
Thread 2: Lock B, then Lock A
9
Visualizing Deadlock
One way to think of deadlock is as a dependency cycle.
[Diagram: Thread 1, Thread 2, Thread 3]
10
Preventing Deadlock, Idea #1
Use static analysis.
Problem: determining whether an arbitrary program will deadlock is equivalent to the halting problem, so it is undecidable.
11
Preventing Deadlock, Idea #1
Despite their limitations, static analysis tools can still be helpful in some cases.
Example: the Linux kernel's “sparse” tool allows programmers to make annotations.
__must_hold: the specified lock is held on function entry and exit.
12
Preventing Deadlock, Idea #2
Use pthread_mutex_trylock cleverly.
Wait with a timeout. Release some resources if the timeout expires.
Problem: requires clever programming.
Another problem: timeouts slow down our application.
This solution is useful in some cases, but it's not very general.
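As a rough illustration of Idea #2, here is a minimal sketch that takes one mutex, tries the second with pthread_mutex_trylock, and releases the first if the second is busy. The function name lock_both_with_backoff and the max_tries parameter are hypothetical, and a bounded retry loop stands in for a real timeout.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: acquire a, then try to acquire b without blocking.
 * If b is busy, release a (so whoever holds b can make progress) and retry.
 * Returns 0 with both mutexes held, EBUSY after max_tries failed attempts
 * (with neither mutex held), or another pthreads error code.
 */
int lock_both_with_backoff(pthread_mutex_t *a, pthread_mutex_t *b,
                           int max_tries)
{
    assert(a != b && max_tries > 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int rc = pthread_mutex_lock(a);
        if (rc != 0)
            return rc;

        rc = pthread_mutex_trylock(b);
        if (rc == 0)
            return 0;               /* both a and b are now held */

        pthread_mutex_unlock(a);    /* back off: give up what we hold */
        if (rc != EBUSY)
            return rc;              /* a real error, not contention */

        sched_yield();              /* let the other thread run */
    }
    return EBUSY;                   /* gave up; caller decides what to do */
}
```

The caller has to handle the EBUSY case, which is where the "clever programming" cost shows up.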
13
Preventing Deadlock, Idea #3
Don't prevent deadlock. Just detect it when it happens and restart the system.
Some distributed databases take this approach, using a deadlock detector thread.
Not very general.
14
Preventing Deadlock, Idea #4
Absolute ordering
If you ever take mutex B after A, never take mutex A after B.
15
Preventing Deadlock, Idea #4
Absolute ordering
This is easier than using pthread_mutex_trylock, and it doesn't involve timeouts.
Formalized as CERT C rule CON35-C: Avoid deadlock by locking in a predefined order.
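One common way to get an absolute order for a pair of mutexes is to sort them by address before locking. The sketch below assumes that convention; lock_pair, unlock_pair, and demo_opposite_order are hypothetical names, and this is only one possible ordering scheme.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Lock two distinct mutexes in address order, whatever order they are passed in. */
int lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    assert(x != y);
    pthread_mutex_t *first  = (uintptr_t)x < (uintptr_t)y ? x : y;
    pthread_mutex_t *second = (first == x) ? y : x;

    int rc = pthread_mutex_lock(first);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_lock(second);
    if (rc != 0)
        pthread_mutex_unlock(first);
    return rc;
}

/* Unlock order does not matter for deadlock avoidance. */
void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}

struct pair_args {
    pthread_mutex_t *x, *y;
    long *counter;
    int iterations;
};

static void *pair_worker(void *p)
{
    struct pair_args *args = p;
    for (int i = 0; i < args->iterations; i++) {
        if (lock_pair(args->x, args->y) != 0)
            return (void *)1;
        (*args->counter)++;         /* protected by both mutexes */
        unlock_pair(args->x, args->y);
    }
    return NULL;
}

/*
 * Two threads request the mutexes in opposite orders (A,B and B,A).
 * Because lock_pair sorts them, no cycle can form.
 * Returns the final counter value, or -1 if a thread could not be created.
 */
long demo_opposite_order(int iterations)
{
    pthread_mutex_t a, b;
    long counter = 0;
    pthread_t t1, t2;

    pthread_mutex_init(&a, NULL);
    pthread_mutex_init(&b, NULL);
    struct pair_args args1 = { &a, &b, &counter, iterations };
    struct pair_args args2 = { &b, &a, &counter, iterations };

    if (pthread_create(&t1, NULL, pair_worker, &args1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, pair_worker, &args2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&a);
    pthread_mutex_destroy(&b);
    return counter;
}
```

This works only where the code taking the locks sees both mutexes at once. It does not help when the two acquisitions are far apart in the call stack.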
16
Problems
It's difficult to maintain correct mutex ordering.
There are a lot of mutexes in most programs.
Even one cycle could cause a deadlock.
You need to analyze all mutexes to be safe, even those found in library code.
17
Locksmith
Locksmith detects mutex ordering violations at runtime.
It also detects many cases of undefined behavior.
Previous implementations: Ceph, Linux kernel.
18
Locksmith
Most potential deadlocks do not actually occur on any given run.
Locksmith makes the potential for deadlock visible, even when the deadlock itself would rarely occur.
It complains and gives stack traces for mutex ordering issues or bad behavior.
19
Locksmith
Locksmith is implemented as a shared library which overrides the pthreads functions, using LD_PRELOAD.
[Diagram: Application, Locksmith, Pthreads]
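The general interposition technique can be sketched as follows. This is not Locksmith's code: it is a minimal illustration, assuming glibc and dlsym(RTLD_NEXT, ...), of a wrapper that counts calls to pthread_mutex_lock before forwarding to the real function. The counter and demo_lock_calls are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for RTLD_NEXT on glibc */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*lock_fn)(pthread_mutex_t *);

static _Atomic(lock_fn) real_lock;   /* the next pthread_mutex_lock in lookup order */
static atomic_ulong lock_calls;      /* hypothetical bookkeeping */

/* Same name and signature as the real function, so the dynamic linker
 * resolves the application's calls to this definition first. */
int pthread_mutex_lock(pthread_mutex_t *mutex)
{
    lock_fn fn = atomic_load(&real_lock);
    if (fn == NULL) {
        /* Resolved lazily on first use. */
        fn = (lock_fn)dlsym(RTLD_NEXT, "pthread_mutex_lock");
        assert(fn != NULL);      /* a real tool would report this more gracefully */
        atomic_store(&real_lock, fn);
    }
    atomic_fetch_add(&lock_calls, 1);   /* a checker would do its work here */
    return fn(mutex);
}

/* Hypothetical accessor for the demo counter. */
unsigned long demo_lock_calls(void)
{
    return atomic_load(&lock_calls);
}
```

Built as a shared object (for example `cc -shared -fPIC -o libdemo.so demo.c -ldl`, where the file names are placeholders), it can be loaded into an unmodified program with `LD_PRELOAD=./libdemo.so ./app`.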
20
Locksmith
Being implemented on top of pthreads gives Locksmith very good portability.
Programs don't need to be recompiled to make use of Locksmith.
User-defined mutexes are possible.
21
Enforcing Mutex Ordering
For every pair of mutexes A and B: if we ever take A before B, we should never take B before A.
Essentially, each “lock” operation that a thread does creates an edge in a graph from each mutex it holds to the mutex it is taking.
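The graph idea above can be sketched in a few lines. This is an illustrative model, not Locksmith's data structure: mutexes are represented by small integer ids, the bound MAX_LOCKS and the function order_graph_acquire are hypothetical, and the graph is not protected against concurrent access the way a real checker's would need to be.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 16    /* hypothetical fixed bound for the sketch */

/* edge[a][b] is true if some thread took lock b while holding lock a. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: can we get from `from` to `to` along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++) {
        if (edge[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record that a thread holding the locks held[0..nheld-1] is now taking
 * `taking`. Adds an edge from each held lock to `taking`, and returns true
 * if any of those edges closes a cycle, i.e. a lock ordering violation.
 */
bool order_graph_acquire(const int *held, int nheld, int taking)
{
    bool violation = false;

    assert(taking >= 0 && taking < MAX_LOCKS);
    for (int i = 0; i < nheld; i++) {
        bool seen[MAX_LOCKS] = { false };

        assert(held[i] >= 0 && held[i] < MAX_LOCKS);
        /* If held[i] is already reachable from `taking`, then the new
         * edge held[i] -> taking completes a cycle. */
        if (reachable(taking, held[i], seen))
            violation = true;
        edge[held[i]][taking] = true;
    }
    return violation;
}
```

The violation is reported the first time the inverted order is observed, whether or not the two threads ever actually collide.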
22
Deadlock, revisited
Thread 1: Lock A, then Lock B
Thread 2: Lock B, then Lock A
A cycle means a possible deadlock.
[Diagram: graph with nodes A and B]
23
Features: logging
Can log to many different sinks via LKSMITH_LOG:
stderr or stdout
syslog
File
Program callbacks
24
Features: ignore patterns
Can ignore mutexes locked inside certain frames:
LKSMITH_IGNORED_FRAMES
LKSMITH_IGNORED_FRAME_PATTERNS
Useful for working around known bugs.
25
Error checking
In addition to doing its own checks, Locksmith enables error-checking mutexes whenever possible.
Checks for taking sleeping locks (mutexes) while holding a spin lock.
And many other checks...
26
Usable in Many Environments
Initialized on first use of pthreads.
Can be used with pthreads static initializers:
PTHREAD_COND_INITIALIZER
PTHREAD_MUTEX_INITIALIZER
Usable within C++ global constructors.
27
Alternatives
jcarder: a great tool, for Java. Not available for C/C++.
Userspace version of Linux kernel lockdep: requires an init call; GPL2 (not LGPL).
28
Alternatives
Use PTHREAD_MUTEX_ERRORCHECK.
This is always available, since it's part of pthreads.
This detects some undefined behavior, but has basically no race detection.
Locksmith enables this.
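For reference, here is a small sketch of creating an error-checking mutex with the standard pthreads attribute calls. The helper name errorcheck_mutex_init is hypothetical. With this mutex type, POSIX specifies that unlocking a mutex the caller does not hold fails with EPERM, and relocking a mutex the caller already holds fails with EDEADLK, instead of being undefined.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialize *m as an error-checking mutex. */
int errorcheck_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   /* the mutex keeps its own copy of the type */
    return rc;
}
```

A caller still has to check the return value of every lock and unlock call for this to catch anything, and it says nothing about lock ordering between different mutexes.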
29
Alternatives: Race Detectors
These tools build a graph of “happens-before” relationships based on synchronization operations.
They warn when they notice both a read and a write operation to the same memory location from more than one thread without such a “happens-before” relationship.
Popular versions:
Google Thread Sanitizer (TSAN)
Helgrind
DRD
30
Race Detectors Advantages
Can find races that don't involve locks at all, such as “double checked locking”: en.wikipedia.org/wiki/Double-checked_locking
TSAN is integrated with clang.
31
Race Detectors Disadvantages
DRD and TSAN don't seem to detect lock ordering violations (at least in the versions I looked at).
Can use tremendous amounts of memory (although TSAN has a “fast mode” which may help).
32
Race Detectors Disadvantages
DRD, TSAN, and Helgrind are implemented as valgrind tools, but you can't use valgrind in some environments, like JNI.
33
Future Directions
Speed up implementation.
Hash commonly occurring stack traces (to avoid symbol lookup, etc).
Support rwlocks and some other constructs.
Integrate with race detector?
34
References
https://github.com/cmccabe/lksmith
rvey.pdf