Diagnosing and Fixing Concurrency Bugs
Guoliang Jin, University of Wisconsin–Madison
Thanks for the introduction. Good morning, everyone. My name is Guoliang Jin. It is my pleasure to be here today and present my work on “Diagnosing and Fixing Concurrency Bugs”.

We need reliable software
People’s daily lives now depend on reliable software.
Software companies spend significant resources on debugging: more than 50% of effort goes to finding and fixing bugs, at a cost of around $300 billion per year.
The general issue I am looking at is software reliability. Nowadays software is everywhere: it is in our smartphones, it is in the power transmission system, and it controls our vehicles. Software companies try hard to ensure software reliability. According to a 2013 Cambridge University study, developers spend more than 50% of their programming time finding and fixing bugs, and the global cost of debugging software has risen to $312 billion annually.

Concurrency bugs hurt
It is an increasingly parallel world, and concurrency bugs have a history.
We are now in an increasingly parallel world. Intel has just announced its eight-core desktop processor, and today even smartphones can have eight cores. We also have the cloud, which makes the whole ecosystem more complicated. However, it is not easy to develop reliable software for these platforms, because these systems are inherently concurrent and prone to concurrency bugs. Historically, concurrency bugs have caused several disasters, such as the radiation overdoses of the 1980s, the 2003 Northeast blackout, and most recently the Facebook IPO delays. These cases were all costly and very harmful. The concurrency-bug problem will only become more severe in the current multi-core era.

Multi-threaded program
Concurrent programs under the shared-memory model: programs execute multiple interacting threads in parallel; threads communicate via shared memory; shared-memory accesses should be well synchronized.
[Figure: a multicore chip running thread1 to thread4 on core1 to core4, each core with its own cache, all connected to shared memory.]
The remainder of the talk is based on this multi-threaded program model for multi-core systems. To take advantage of more than one core, a program has multiple threads, each of which can be executed by an individual core. Memory is still shared among the cores, so different threads can access the same data. Since memory is shared, we need to add synchronization to the program to ensure correctness. This is an important but difficult topic, and undergraduate classes usually spend weeks covering it.

An example of a concurrency bug: the interleaving space
Thread 1: if (ptr != NULL) { ptr->field = 1; }
Thread 2: ptr = NULL;
Despite all the effort spent on teaching and learning, we still make synchronization mistakes, and they lead to concurrency bugs. Here is an example simplified from real-world multi-threaded software. We have a shared variable ptr. The programmer is being careful here: Thread 1 first checks whether ptr is null and accesses the field inside it only if it is not. Thread 2 sets ptr to null. Without proper synchronization, there can still be a segmentation fault: in the bad interleaving, the assignment in Thread 2 executes between the two accesses in Thread 1. This is not the only possible interleaving, however. There are two others, in which the assignment executes either before the check or after the write, and neither of them fails. For real-world software the interleaving space is far larger, which makes the bad interleavings difficult to find. Previous research has therefore focused on finding these bad interleavings.
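The three interleavings of this example can be replayed deterministically. The following is only an illustrative sketch, not part of any tool in the talk: the names struct obj, enum step, and interleaving_crashes are hypothetical, and the two threads are simulated on a single thread so that the order of the three shared-memory steps is fixed by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj { int field; };

/* The three shared-memory steps in the example. */
enum step { T1_CHECK, T1_WRITE, T2_NULLIFY };

/* Replays one interleaving on a single thread and reports whether
 * Thread 1 would dereference a null pointer. */
bool interleaving_crashes(const enum step order[3])
{
    struct obj storage = { 0 };
    struct obj *ptr = &storage;      /* the shared variable */
    bool check_passed = false;       /* outcome of Thread 1's if */

    assert(order != NULL);
    for (int i = 0; i < 3; i++) {
        switch (order[i]) {
        case T1_CHECK:
            check_passed = (ptr != NULL);
            break;
        case T1_WRITE:
            if (!check_passed)
                break;               /* branch not taken, no access */
            if (ptr == NULL)
                return true;         /* ptr->field would fault here */
            ptr->field = 1;
            break;
        case T2_NULLIFY:
            ptr = NULL;
            break;
        }
    }
    return false;
}
```

Only the order check, nullify, write is reported as crashing; the other two orders complete normally.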

Bug fixing
Software quality does not improve until bugs are fixed. Manual concurrency-bug fixing is time-consuming (73 days on average) and error-prone (39% of patches are buggy in their first release).
CFix: automated concurrency-bug fixing [PLDI’11*, OSDI’12]. A program behaves correctly if bad interleavings do not occur, so concurrency bugs can be fixed by disabling bad interleavings.
*SIGPLAN: “one of the first papers to attack the problem of automated bug fixing”
However, finding a bug is not the end of the story, because software quality does not improve until the bug is actually fixed. Right now, bug fixing is done manually, and studies have shown that manual concurrency-bug fixing is time-consuming and error-prone. One study shows that it takes 73 days on average to correctly fix a concurrency bug, while another shows that 39% of concurrency-bug patches are incorrect. Developers need better tool support for bug fixing as well. My work provides tools that fix concurrency bugs automatically. The key observation is that concurrency bugs are more amenable to automated fixing than sequential bugs, because the program behaves correctly as long as the bad interleavings do not occur. As a result, we can fix concurrency bugs by using synchronization operations to disable the bad interleavings reported by various bug-finding tools. We have published our results at PLDI and OSDI. The PLDI paper received a SIGPLAN nomination for publication in Communications of the ACM, with the comment that it is one of the first papers to attack the problem of automated bug fixing. I will discuss this work in detail later.

The interleaving space (again)
[Figure: the huge interleaving space with several regions of bad interleavings. Some are marked as disabled; the remaining ones lead to production-run failures.]
With tool support for automated fixing, developers get end-to-end support for concurrency bugs during development: they can run bug-finding and bug-fixing tools together, first finding bad interleavings and then fixing the bugs. The real world is less perfect. Software may contain multiple concurrency bugs, each of which needs particular interleavings to trigger, and no tool can guarantee to find all bad interleavings. The missed ones eventually lead to production-run failures, and some of them led to the real-world problems I showed earlier.

Failure diagnosis
Failures still happen in production runs, and the reason behind a failure needs to be understood. Tools dealing with production runs demand low overhead, and the diagnostic information needs to be informative.
Production-run concurrency-bug failure diagnosis: design new monitoring schemes and sampling strategies.
CCI: a pure software solution [OOPSLA’10]
PBI, LXR: hardware-assisted solutions [ASPLOS’13 & 14]
We should definitely keep improving our bug-finding and bug-fixing tools. We should also help developers understand the bugs that manifest at users’ sites and lead to failures, and that is exactly the goal of failure diagnosis tools. Since failure diagnosis tools deal with production runs, their performance impact has to be small; otherwise no one is going to use them. The state of the art in failure diagnosis has not looked at failures caused by concurrency bugs, and its designs cannot explain such failures. We design a set of monitoring schemes that collect information able to pinpoint the thread-interleaving abnormalities of common concurrency bugs. We also design different sampling strategies that suit different types of predicates and help keep the run-time overhead low. With these monitoring schemes and sampling strategies, we apply statistical debugging to diagnose software failures effectively. We have a pure software solution and two hardware-assisted solutions.

My work on concurrency bugs
Bug detection and software testing: ConSeq [ASPLOS’11]
Automated concurrency-bug fixing: CFix [PLDI’11*, OSDI’12] (*received a SIGPLAN CACM nomination)
Production-run failure diagnosis: CCI/PBI/LXR [OOPSLA’10, ASPLOS’13 & 14]
Overall, my research on concurrency bugs has been about building comprehensive end-to-end tool support. I collaborated with my colleagues on one bug-finding tool. My own research has focused on how to fix concurrency bugs automatically, which greatly extends the capability of automated concurrency-bug tools. To understand the bugs that manifest during production runs, I developed several failure diagnosis tools with different tradeoffs. In the future, I will also combine production-run failure diagnosis and automated bug fixing to create a self-healing system for concurrency bugs.

My work on performance bugs
Performance bugs are implementation mistakes that cause inefficiency.
A characteristic study and bug detection [PLDI’12]: a first-of-its-kind study with 109 real-world performance bugs; found 332 previously unknown problems.
An interactive patch validation tool [CAV’13]: simplifies the reasoning about functionality equivalence.
I have also looked at performance bugs, which are implementation mistakes that cause inefficiency. Performance bugs hurt user experience and waste resources, and they are becoming more problematic as we enter an energy-constrained era. I have worked on a real-world bug study, which is one of the first comprehensive studies of performance bugs. We believe our study can guide future work on performance-bug avoidance, detection, testing, and fixing. Following the study, I conducted bug detection and found hundreds of new potential performance problems. To help with performance-bug fixing, we also have a patch validation tool. We can talk offline if you would like to learn more about any of these.

Outline
Motivation and Overview
CFix: Automated Concurrency-Bug Fixing
CCI/PBI/LXR: Multi-threaded Program Failure Diagnosis
Future Work and Conclusion
Next, I will focus on my work on automated concurrency-bug fixing. After that, I will briefly talk about my work on production-run failure diagnosis. Finally, I will discuss my future work and conclude.

Automated fixing is difficult
Input: a bug description (symptom, triggering condition, …). Output: a patch with good correctness, performance, and simplicity.
What is the correct behavior? Usually this requires developers’ knowledge.
How to get the correct behavior? Correct the program states under bug-triggering inputs, and do not change the program states under other inputs.
In general, automated fixing is difficult. When programmers fix a bug manually, they usually start from a bug description, which could include the failure symptom, the bug-triggering condition, or something else. After some work, a patch is generated. Ideally, we want the patch to be correct, fast, and simple. To automate this process, a tool first needs to infer the correct program behavior under the bug-triggering input. This usually requires developers’ involvement, which is difficult to replace with automated tools. Even if the correct behavior can be determined somehow, an automated fixing tool needs to generate a patch that corrects the program states under bug-triggering inputs and does not change the program under other inputs. This part is not trivial either.

Automated concurrency-bug fixing?
Input: a bug description (symptom, triggering condition, …). Output: a patch with good correctness, performance, and simplicity.
What is the correct behavior? The program state is already correct as long as the buggy interleaving does not occur.
How to get the correct behavior? We only need to disable failure-inducing interleavings, and we can leverage well-defined synchronization operations.
Concurrency bugs are different. Even given a bug-triggering input, they cause software to fail only under certain failure-inducing interleavings. As a result, the program behaves correctly as long as the buggy interleaving does not occur. Such bugs can therefore be fixed by systematically disabling failure-inducing interleavings with the well-defined synchronization operations in thread libraries, the same operations that developers use to synchronize their programs.

Description: interleavings that lead to software failure
Atomicity violation detectors (pattern p, r, c) [Park ASPLOS’09, Flanagan POPL’04, Lu ASPLOS’06, Chew EuroSys’10]
Order violation detectors (pattern A, B) [Zhang ASPLOS’10, Lucia MICRO’09, Yu ISCA’09, Gao ASPLOS’11]
Data race detectors (pattern I1, I2) [Sen PLDI’08, Savage TOCS’97, Yu SOSP’05, Erickson OSDI’10, Kasikci ASPLOS’10]
Abnormal data flow detectors (pattern Wb, R, Wg) [Zhang ASPLOS’11, Shi OOPSLA’10]
How to get a general solution that generates good patches (correctness, performance, simplicity)?
Since we are going to fix bugs by disabling bad interleavings, we first need to know what the bad interleavings are. For this purpose, we leverage existing tools to report them. Many different detectors have been proposed, each focusing on a different interleaving pattern. To show that our approach is general, we choose all four representative types of non-deadlock concurrency-bug detectors. We exclude deadlocks because they have very different properties from other concurrency bugs. Note that we do not require reports from all four detectors; we fix the bug based on reports from each individual tool. We rely on reports about which interleavings cause failures: the software fails when an instruction r executes between two others, p and c; or whenever an instruction B executes before an instruction A; or whenever an instruction I1 executes immediately before an instruction I2; or when there is an abnormal write-read data dependency; and it succeeds otherwise. In the remainder of the talk, I will illustrate our technique using bad interleavings reported by an atomicity-violation detector. With all these different types of bad interleavings, the challenge becomes concrete: we need a general solution that can handle all these types of bug reports, and we also want patches that are good with respect to correctness, performance, and simplicity.

CFix
Input: bug reports (interleavings that lead to software failure) and source code. Output: a patch with good correctness, performance, and simplicity.
[Figure: the CFix pipeline. Bug reports and source code enter fix-strategy design, which yields mutual-exclusion and order strategies; synchronization enforcement produces one patched binary per strategy; patch testing and selection keeps one selected binary per bug report; patch merging produces a merged binary; run-time support yields the final patched binary.]
We address the challenge and automate the whole process as follows. First, for every bug report, we design fix strategies that disable the buggy interleaving through two basic types of synchronization: mutual exclusion and order. With these two primitives we have a general solution that works with different bug detectors, and we rely on testing to figure out which fix strategy works best. Second, we design static analysis and code transformation routines that enforce mutual exclusion and order synchronization while maintaining correctness, performance, and simplicity. After this step, multiple patched binaries may have been generated, one per fix strategy. Third, CFix patch testing selects the best patch, pruning incorrect, slow, or complicated patches, including those caused by a wrong fix strategy. Here we leverage existing testing techniques and introduce heuristics that help prune incorrect patches quickly. After repeating these steps for every bug report, CFix uses static analysis to merge related patches for better patch quality, and uses low-overhead run-time support to help patch refinement. The final output is a patched binary.

Contributions
Show the feasibility of automated fixing for non-deadlock concurrency bugs.
Techniques that enforce mutual exclusion and order relationships.
CFix: a framework that assembles a set of techniques to automate the whole bug-fixing process (fix-strategy design, synchronization enforcement, patch testing and selection, patch merging, run-time support).
Overall, this work makes the following contributions. First, we show that it is indeed feasible to fix non-deadlock concurrency bugs automatically. Second, we build static analysis and code transformation tools that enforce the mutual exclusion and order relationships that are useful for automated concurrency-bug fixing. Last, we build CFix, a framework that assembles a set of bug-detection, static-analysis, and testing techniques to automatically fix a wide variety of non-deadlock concurrency bugs.

CFix: fix-strategy design
Challenges: huge variety of bugs.
CFix starts with failure-inducing interleavings reported by different bug detectors and fixes bugs by disabling these interleavings. Since we have different types of interleaving patterns, we build a unified solution by first defining two types of synchronization relationships as our fixing primitives, namely the mutual exclusion and order relationships.

Two types of synchronization relationships
Mutual exclusion and order relationship. Why these two? They are basic relationships that can be achieved by typical synchronization operations, and the choice is based on a study of real-world concurrency-bug characteristics.
Mutual exclusion prevents two instructions from being interleaved by a third instruction, and the order relationship forces one instruction to execute after another. We pick mutual exclusion and order because they are the basic relationships that can be achieved by typical synchronization operations, and because they are the most useful for fixing real-world concurrency bugs: a majority of non-deadlock concurrency bugs are atomicity violations or order violations.

Fix-strategy for atomicity-violation detectors
Example 1
Thread 1: if (ptr != NULL) { ptr->field = 1; }
Thread 2: ptr = NULL;
Now let’s see how fix strategies for bug detectors are designed with the two types of synchronization primitives. I will illustrate this with examples from atomicity-violation detectors. A natural fix strategy for this type of bug report is to enforce mutual exclusion. For example, an atomicity-violation detector will report that the program crashes if the null assignment executes between the if check and the dereference. Creating two mutually exclusive critical sections prevents that interleaving and fixes the bug.
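As a rough illustration of what such a mutual-exclusion patch looks like, here is a hand-written sketch using POSIX threads. It is not CFix output: the lock name ptr_lock and the functions thread1, thread2, and run_patched_example are hypothetical, and the lock placement was done by hand for this three-statement example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct obj { int field; };

static struct obj shared_obj;
static struct obj *ptr = &shared_obj;
/* Hypothetical lock introduced by the patch. */
static pthread_mutex_t ptr_lock = PTHREAD_MUTEX_INITIALIZER;

static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ptr_lock);   /* check and write form one critical section */
    if (ptr != NULL)
        ptr->field = 1;
    pthread_mutex_unlock(&ptr_lock);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ptr_lock);   /* the assignment can no longer run in between */
    ptr = NULL;
    pthread_mutex_unlock(&ptr_lock);
    return NULL;
}

/* Runs both threads once; returns 0 if they completed without a fault. */
int run_patched_example(void)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, thread1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    assert(shared_obj.field == 0 || shared_obj.field == 1);  /* written at most once */
    return ptr == NULL ? 0 : -1;
}
```

Whichever thread acquires the lock first, the check and the write in Thread 1 now execute without the assignment in between.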

Fix-strategy for atomicity-violation detectors
Example 2
Thread 1: two statements that both dereference ptr, such as ptr->field = 1;
Thread 2: ptr = NULL;
However, there are other ways to disable the failure-inducing interleaving reported by an atomicity-violation detector. Order enforcement, such as forcing the red node to execute before the yellow node or forcing the green node to execute before the red node (that is, r before p, or c before r), also makes the failure-inducing interleaving disappear, so these belong to our fix strategies too. In fact, some bug reports generated by atomicity-violation detectors do require order enforcement. This bug is similar to the previous one, except that both statements in Thread 1 now dereference the pointer. Even if we enforce mutual exclusion, the program still fails when the null assignment executes before the two dereferences. To fix this bug completely, we need to force the null assignment to execute after the two dereferences.

CFix: fix-strategy design
Challenges: inaccurate root cause; huge variety of bugs.
Solution: a combination of mutual exclusion and order relationship enforcement.
From this we can see that a bug report may not reflect the root cause. Our solution is to design a set of fix strategies that combine mutual exclusion and order relationship enforcement, and to rely on testing to pick the best patch.

Fix-strategies
Detectors and the interleaving patterns they report: AV detector (p, r, c); OV detector (A, B); race detector (I1, I2); DU detector (Wb, R, Wg).
In total, we have three different fix strategies for reports from atomicity-violation detectors, as we have discussed. Two of them enforce order, and one enforces mutual exclusion. I will skip the details of the strategies for the other bug detectors here. Note that we usually provide multiple fix strategies when the detector may report inaccurate root causes, and we rely on our patch testing stage to figure out the best fix strategy.

CFix: synchronization enforcement
Challenges: correctness, performance, and simplicity.
Solution: mutual exclusion enforcement with AFix [PLDI’11]; order relationship enforcement with OFix [OSDI’12].
Now let’s focus on the synchronization enforcement stage. In this stage, CFix enforces the specified synchronization through static analysis and code transformation, with correctness, performance, and simplicity all considered. We call our tool for mutual exclusion enforcement AFix, and our tool for order relationship enforcement OFix.

Mutual exclusion relationship
Input: three statements (p, c, r) with contexts.
Goal: make the code region from p to c mutually exclusive with r.
Thread 1: if (ptr != NULL) { ptr->field = 1; } (the check is p, the write is c)
Thread 2: ptr = NULL; (this is r)
Let’s first look at mutual exclusion enforcement. Our tool AFix takes three statements as input, each with its corresponding calling context. AFix then makes the code region from p to c mutually exclusive with r. Since we want to put p and c into one critical section, they have to be in the same thread. Their contexts may differ, in that p and c could be in different functions, but the contexts must have a non-empty common prefix. As for r, it is possible that r is in the same thread as p and c.

Mutual exclusion enforcement: AFix
Approach: locks.
Principles: correctly paired lock acquisition and release operations; small critical sections.
To achieve that, AFix uses locks to create critical sections, which I guess is not surprising at all. We first put p and c into one critical section, and then put r into another. For correctness, we need to pair lock and unlock operations correctly, so that p and c are inside one critical section without introducing new problems. For performance, we need to keep the critical sections small to maximize parallelism.

Put p and c into a critical section: naïve
A naïve solution: add lock on the edges reaching p, and add unlock on the edges leaving c.
Potential new bugs: it could lock without unlock, unlock without lock, and so on.
To put p and c into a critical section, a naïve solution adds a lock before p and an unlock after c. Depending on the control flow between p and c, this can lock without unlocking, unlock without locking, and cause many other problems.

Put p and c into a critical section: AFix
Assume p and c are in the same function f.
Step 1: find the protected nodes in the critical section.
Step 2: add a lock operation on each edge from an unprotected node to a protected node, and an unlock operation on each edge from a protected node to an unprotected node.
This avoids the potential bugs mentioned above.
Now let’s look at AFix’s solution. We assume for now that p and c are in the same function f, and we will extend the algorithm later to the case where they are not. We analyze the control flow graph of f to find the set of nodes that lie on any path from the p node to the c node. In this example, we have two paths and three nodes. These nodes are the protected nodes of the critical section. Based on this set, we add a lock acquisition on every edge from an unprotected node to a protected node, and a lock release on every edge from a protected node to an unprotected node. With these, we ensure that p and c are always inside one critical section while avoiding the potential bugs mentioned previously.
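The two steps can be sketched over a small control flow graph. This is an illustrative reconstruction under simplifying assumptions, not AFix’s actual implementation: the adjacency-matrix struct cfg, the MAX_NODES bound, and the functions reach and place_locks are all hypothetical, and a node counts as protected when it is reachable from p and can itself reach c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

struct cfg {
    int n;                               /* number of nodes */
    bool edge[MAX_NODES][MAX_NODES];     /* edge[u][v]: u -> v */
};

enum action { NONE, LOCK, UNLOCK };

/* Marks every node reachable from start, following edges forward or backward. */
static void reach(const struct cfg *g, int start, bool forward,
                  bool seen[MAX_NODES])
{
    int stack[MAX_NODES], top = 0;

    memset(seen, 0, MAX_NODES * sizeof seen[0]);
    seen[start] = true;
    stack[top++] = start;                /* each node is pushed at most once */
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < g->n; v++) {
            bool e = forward ? g->edge[u][v] : g->edge[v][u];
            if (e && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
}

/* Step 1: protected = on some path from p to c.
 * Step 2: LOCK on edges entering the protected set, UNLOCK on edges leaving it. */
void place_locks(const struct cfg *g, int p, int c,
                 enum action act[MAX_NODES][MAX_NODES])
{
    bool from_p[MAX_NODES], to_c[MAX_NODES];

    assert(p >= 0 && p < g->n && c >= 0 && c < g->n);
    reach(g, p, true, from_p);
    reach(g, c, false, to_c);
    for (int u = 0; u < g->n; u++) {
        for (int v = 0; v < g->n; v++) {
            bool pu = from_p[u] && to_c[u];
            bool pv = from_p[v] && to_c[v];

            act[u][v] = NONE;
            if (!g->edge[u][v])
                continue;
            if (!pu && pv)
                act[u][v] = LOCK;
            else if (pu && !pv)
                act[u][v] = UNLOCK;
        }
    }
}
```

On a graph where p also has an early-exit successor that never reaches c, the edge to that successor receives an unlock, which is exactly the case where the naïve placement would lock without unlocking.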

Subtle details
Adjusting p and c when they are in different functions. Observation: people put lock and unlock in one function. Find the longest common prefix of p’s and c’s stack traces, and adjust p and c accordingly.
Putting r into a critical section: do nothing if r can be reached from the p–c critical section.
Lock type: a lock with timeout if the critical section has blocking operations; a reentrant lock if recursion is possible within the critical section.
There are also details about how to put r into a critical section in certain special cases, which I am going to skip. We also need to decide what kind of lock to use: where a new deadlock is possible we use a lock with timeout, and where reentrance is possible we use a reentrant lock. Now let’s extend the algorithm to the case where p and c are in different functions. The solution we take is inspired by the observation that people usually put lock and unlock in the same function. To achieve this, we leverage the calling contexts of p and c. We first find the longest common prefix of the two calling contexts; since p and c come from the same thread, it always exists. In this example from MySQL, there is a function newlog, which accesses the same variable through close and open. The two accesses have to be atomic with respect to some r access. The developers failed to ensure that, and the atomicity-violation detector we use reports a triple with the corresponding call stacks. Here the longest common prefix ends with newlog. We then adjust p and c to be the instructions in newlog that call the next function in each call stack, so that they are now in the same function and the algorithm described previously can be used again.
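The adjustment step can be sketched as a longest-common-prefix computation over two call stacks. This is only an illustration with assumed names: common_prefix_len and adjusted_callee are hypothetical, stacks are written outermost frame first, frames are compared by function name only (a real tool would presumably compare call sites), and the "main" frame in the usage note is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading frames shared by the two call stacks
 * (outermost frame first). */
size_t common_prefix_len(const char *const *p_stack, size_t p_len,
                         const char *const *c_stack, size_t c_len)
{
    size_t i = 0;

    assert(p_stack != NULL && c_stack != NULL);
    while (i < p_len && i < c_len && strcmp(p_stack[i], c_stack[i]) == 0)
        i++;
    return i;
}

/* The adjusted statement is the call, made from the last common frame,
 * to the first frame that differs. Returns that callee's name, or NULL
 * if the original statement already sits in the common function. */
const char *adjusted_callee(const char *const *stack, size_t len,
                            size_t prefix)
{
    return prefix < len ? stack[prefix] : NULL;
}
```

With p’s stack written as main, newlog, close and c’s stack as main, newlog, open, the common prefix has length 2 and ends with newlog, so the adjusted p and c are the calls to close and open inside newlog.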

Order relationship
Input: two statements (A, B) with contexts.
There could be multiple instances of A in one thread, multiple threads that could execute A, or no instance of A during the whole execution.
Goal: make A execute before B.
Now let’s look at order enforcement. Our tool OFix takes two statements, A and B, as input, each with its corresponding calling context. We are going to make B wait for A, but when should we allow B to execute? One thread could execute A multiple times, and multiple threads could execute A. It is also possible that there is no instance of A during the whole execution.

Order relationship: two sub-types
allA-B: all instances A1 … An execute before B. Typical case: A uses a resource and B destroys it.
firstA-B: at least one instance of A executes before B. Typical case: A initializes a value and B reads it.
So we divide the order relationship further into two sub-types. allA-B makes all instances of A execute before B, which is good for problems where A uses some resource and B destroys it. firstA-B makes one instance of A execute before B if there is any A, which is good for cases where A initializes a certain value and B reads that value. We use either or both of them to fix a bug, depending on the fix strategy.

OFix allA-B enforcement
Approach: condition variable and flag. Insert signal operations in A-threads; insert a wait operation before B.
Principles: an A-thread signals exactly once, when it will not execute any more A; an A-thread signals as soon as possible; B proceeds when every A-thread has signaled.
To enforce allA-B, we use condition variables together with flags. At a high level, we insert a wait operation before B and signal operations in A-threads, that is, threads that can execute A or create threads that execute A. For correctness, our static analysis guarantees that each A-thread signals exactly once, when it has no chance of executing another instance of A. For performance, it signals as soon as possible. B proceeds when every A-thread has signaled. Next I will describe how to find the location for the signal operation in one thread, how to count the number of threads that will signal, and what is in our signal operation.

OFix allA-B enforcement: A side
How to identify the last A instance in one thread: . . .; for (. . .) . . . ; // A
Each thread that executes A signals exactly once, as soon as it can execute no more A.
First, how do we place the signal operation inside a thread? In this simple code snippet, A is inside a for loop and could execute zero or more times. We decide the location of the signal operation based on the control flow graph of the function. OFix statically analyzes the control flow graph and finds the set of nodes that can reach A. Now look at this edge, which crosses from a red node inside the reaching-node set to a blue node outside the set. We place a signal operation on it, because once the execution crosses such an edge, it can no longer execute A. With this, OFix guarantees that the thread executes exactly one signal operation, as soon as it can execute no more A.
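The placement rule can be sketched as a backward reachability computation. This is an illustrative reconstruction, not OFix’s code: struct cfg, MAX_NODES, and place_signals are hypothetical names, the graph is a plain adjacency matrix, and only a single function is analyzed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

struct cfg {
    int n;                               /* number of nodes */
    bool edge[MAX_NODES][MAX_NODES];     /* edge[u][v]: u -> v */
};

/* Marks signal_on[u][v] for every edge that leaves the set of nodes
 * from which A is still reachable. */
void place_signals(const struct cfg *g, int a,
                   bool signal_on[MAX_NODES][MAX_NODES])
{
    bool reaches_a[MAX_NODES];
    int stack[MAX_NODES], top = 0;

    assert(a >= 0 && a < g->n);
    memset(reaches_a, 0, sizeof reaches_a);
    reaches_a[a] = true;                 /* A trivially reaches itself */
    stack[top++] = a;
    while (top > 0) {                    /* walk edges backward from A */
        int v = stack[--top];
        for (int u = 0; u < g->n; u++) {
            if (g->edge[u][v] && !reaches_a[u]) {
                reaches_a[u] = true;
                stack[top++] = u;
            }
        }
    }
    for (int u = 0; u < g->n; u++)
        for (int v = 0; v < g->n; v++)
            signal_on[u][v] = g->edge[u][v] && reaches_a[u] && !reaches_a[v];
}
```

For a loop whose body is A, the only marked edge is the loop exit, so the signal fires once whether the loop ran zero times or many.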

OFix allA-B enforcement: A side
How to identify the last thread that executes A:
void main() { for (. . .) thread_create(thr_main); . . .; }
void thr_main() { for (. . .) . . . ; // A . . .; }
A counter tracks the threads that are yet to signal. It starts at 1 and is incremented on each thread_create.
void ofix_signal() { mutex_lock(L); counter--; if (counter == 0) cond_broadcast(con); mutex_unlock(L); }
Another challenge is that the number of threads that can execute A may be statically undecidable, as in this example, where main creates child A-threads in a loop and each child A-thread executes A in a loop. So far, our static analysis ensures that every child A-thread signals exactly once, when it will no longer execute A. However, we do not know which child A-thread is the last to execute A. So when can we allow B to execute? OFix addresses this challenge in two steps. First, it transforms the main thread so that main signals exactly once, when it finishes creating child A-threads. The signal placement algorithm is the same as the one for child A-threads. Second, OFix introduces a counter that counts how many threads are yet to signal in each run. The counter is initialized to 1, which accounts for the single signal of the main thread. It is increased by one for each successful creation of an A-thread, since each child A-thread will signal exactly once. The signal operations decrease the counter value. Consequently, it is safe to execute B only when the counter reaches zero. The signal operation also does a broadcast to wake up blocked B threads, if there are any.

OFix allA-B enforcement: B side
B is safe to execute only when the counter is 0.
Give up if OFix knows that it introduces a new deadlock; otherwise use a timed wait operation to mask potential deadlocks.
void ofix_wait() { mutex_lock(L); if (counter != 0) cond_timedwait(con, L, t); mutex_unlock(L); }
With all these signal operations added on the A side, it is safe for B to execute when the counter reaches 0, which means every A-thread has signaled. So OFix simply adds code immediately before B that checks the counter value and, if the counter is not zero, blocks the thread until it is woken up. Note that extra order enforcement could lead to new deadlocks. Sometimes our static analysis is able to tell, and OFix gives up the corresponding fix strategy. In all other cases, OFix allows the wait operation to time out. With the timed wait operation, liveness is ensured by turning potential undetected deadlocks into timeouts. We provide further support for understanding whether there is indeed a deadlock behind a run-time wait timeout.
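The counter protocol on both sides can be sketched with POSIX threads. This is a hedged illustration rather than OFix’s generated code: ofix_create_a_thread, a_thread, run_all_a_before_b, and a_done are hypothetical names, and it makes two choices that are assumptions of this sketch: the wait re-checks the counter in a loop to tolerate spurious wakeups, and the counter is incremented just before pthread_create (and undone on failure) so that a fast child cannot drive it to zero early. The state is static, so the demo is meant to run once per process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static int counter = 1;      /* threads yet to signal; 1 stands for the parent */
static atomic_int a_done;    /* demo only: how many A instances have run */

void ofix_signal(void)
{
    pthread_mutex_lock(&L);
    assert(counter > 0);             /* each thread signals exactly once */
    if (--counter == 0)
        pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* Counts the child before it can run, then creates it. */
int ofix_create_a_thread(pthread_t *t, void *(*fn)(void *))
{
    int rc;

    pthread_mutex_lock(&L);
    counter++;
    pthread_mutex_unlock(&L);
    rc = pthread_create(t, NULL, fn, NULL);
    if (rc != 0)
        ofix_signal();               /* creation failed: undo the count */
    return rc;
}

/* Returns true once every A-thread has signaled, false on timeout. */
bool ofix_wait(int timeout_sec)
{
    struct timespec deadline;
    bool done;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (counter != 0 &&
           pthread_cond_timedwait(&con, &L, &deadline) == 0)
        ;                            /* loop guards against spurious wakeups */
    done = (counter == 0);
    pthread_mutex_unlock(&L);
    return done;
}

static void *a_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++)
        atomic_fetch_add(&a_done, 1);    /* A */
    ofix_signal();                       /* no more A in this thread */
    return NULL;
}

/* Creates four A-threads, then runs B; returns the number of A
 * instances B observes, or -1 if the wait timed out. */
int run_all_a_before_b(void)
{
    pthread_t t[4];
    int created = 0, seen;

    for (int i = 0; i < 4; i++)
        if (ofix_create_a_thread(&t[created], a_thread) == 0)
            created++;
    ofix_signal();                       /* parent creates no more A-threads */
    seen = ofix_wait(5) ? atomic_load(&a_done) : -1;   /* B */
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return seen;
}
```

When all four threads are created, B can only proceed after each has finished its three A instances and signaled, so it observes all twelve.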

OFix firstA-B
Basic enforcement: signal after A, wait before B.
When A may not execute: add a safety net of signal operations with the allA-B algorithm.
That ends our discussion of allA-B. Now let’s look at firstA-B, where B is allowed to execute once at least one A has executed. To achieve this, a condition variable and a flag are used in the most classic way. OFix first inserts a signal operation immediately after A and a wait operation immediately before B. The signal operation sets a flag and wakes up blocked threads if the flag has not been set yet, and the wait operation checks the flag to decide whether to proceed or to wait to be woken up. However, there is a trap here: A may not execute at all during the whole program execution. Then the signal after it is also skipped, which may make the wait operation time out. OFix therefore adds a safety net of signal operations that allow B to proceed if the whole program execution does not contain A at all. That algorithm is similar to allA-B order enforcement and is skipped here.
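The basic firstA-B enforcement, a flag guarded by a mutex and condition variable, can be sketched as follows. This is an illustration under assumptions, not OFix’s generated code: ofix_first_signal, ofix_first_wait, a_thread, run_first_a_before_b, and the variable value are hypothetical names, the wait uses a loop to tolerate spurious wakeups, and the safety net for executions without any A is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static bool a_happened;   /* the flag: has some instance of A executed? */
static int value;         /* demo only: initialized by A, read by B */

/* Inserted immediately after A. Only the first call has any effect. */
void ofix_first_signal(void)
{
    pthread_mutex_lock(&L);
    if (!a_happened) {
        a_happened = true;
        pthread_cond_broadcast(&con);
    }
    pthread_mutex_unlock(&L);
}

/* Inserted immediately before B. Returns false if the wait timed out. */
bool ofix_first_wait(int timeout_sec)
{
    struct timespec deadline;
    bool ok;

    assert(timeout_sec > 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (!a_happened &&
           pthread_cond_timedwait(&con, &L, &deadline) == 0)
        ;                             /* loop guards against spurious wakeups */
    ok = a_happened;
    pthread_mutex_unlock(&L);
    return ok;
}

static void *a_thread(void *arg)
{
    (void)arg;
    value = 42;                       /* A: initialization */
    ofix_first_signal();
    return NULL;
}

/* Runs A in a child thread and B in the caller; returns the value B
 * reads, or -1 on thread-creation failure or timeout. */
int run_first_a_before_b(void)
{
    pthread_t t;
    int seen;

    if (pthread_create(&t, NULL, a_thread, NULL) != 0)
        return -1;
    seen = ofix_first_wait(5) ? value : -1;   /* B: read */
    pthread_join(t, NULL);
    return seen;
}
```

Because the flag is set under the same mutex that the wait holds when it checks the flag, B reads value only after A’s initialization is visible.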

CFix: patch testing & selection
Pipeline stages: Fix-Strategy Design, Synchronization Enforcement, Patch Testing & Selection, Patch Merging, Run-time Support. Challenge: multi-threaded software testing. Solution: CFix-patch-oriented testing.
After the synchronization enforcement stage, we have several candidate patched binaries, one for each fix strategy. As we discussed earlier in our fix-strategy design, they are not equally good, so next we test these binaries to efficiently pick the best patch.

Patch testing principles
Prune incorrect patches: patches causing failures due to wrong fix strategies, etc. Prune slow patches. Prune complicated patches. Not exhaustive testing, but patch-oriented testing. Leverage existing testing techniques, with extra heuristics.
We are not doing traditional multi-threaded software testing here; rather, we are doing CFix-patch-oriented testing to select the best patch. Our goal is to prune incorrect patches, especially those with wrong fix strategies. From the remaining ones, we pick the patch with the best performance and simplicity. Note that we do not use exhaustive interleaving testing to find incorrect patches, which would be impractically time-consuming. We leverage the existing testing facilities in our front-ends, and we also add extra heuristics to quickly prune incorrect patches.

Run once without external perturbation
Reject if there is a timeout or failure. Patches fixing the wrong root cause make software fail deterministically.
Thread 1: ptr->field = 1; Thread 2: ptr = NULL;
Our first heuristic is that we simply run the patched software once without external perturbation. If there is a timeout or failure, the patch is rejected. This is helpful because many patches with wrong fix strategies that do not fix the root cause make the software time out or fail deterministically. In this example, which we discussed earlier, one order enforcement that disables the failure-inducing interleaving forces the null assignment to happen before the two dereferences. It actually makes the program crash deterministically, and it will be pruned.

Implicit bad patch
A failure in patch_b implies a failure in patch_a if patch_a is less restrictive than patch_b. Helpful to prune patch_a. Traditional testing may not find the failure in patch_a. Figure: a is the mutual exclusion patch; b and c are order relationship patches.
Another heuristic allows us to identify an incorrect patch without even running it, when a failure in one patch implies a failure in another patch. For example, we have three different fix strategies to disable a failure-inducing interleaving reported by an atomicity violation detector. If one of the order enforcement patches fails, the mutual exclusion patch is rejected as well. The reason is that any interleaving allowed by an order relationship patch is also feasible under the mutual exclusion patch, so the failure-inducing interleaving encountered by the former can still occur under the mutual exclusion patch. Traditional testing may not be able to make the mutual exclusion patch fail in a few runs, yet we can successfully reject the patch without running the patched binary. Of course, CFix testing still does not guarantee to identify all incorrect patches, just like all patch testing processes in practice.
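The implication rule above can be sketched as a small pruning pass. This is an illustration only, not CFix's implementation: the patch identifiers, the less_restrictive table, and the function prune_implied are hypothetical names, and the table simply encodes the slide's example, where the mutual exclusion patch a allows every interleaving that the order patches b and c allow.

```c
#include <stdbool.h>

enum { NUM_PATCHES = 3 };
enum patch_id { PATCH_A = 0, PATCH_B = 1, PATCH_C = 2 };

/* less_restrictive[x][y] is true when patch x allows every interleaving
 * that patch y allows. Assumed from the slide: a (mutual exclusion) is
 * less restrictive than b and c (order relationships). */
static const bool less_restrictive[NUM_PATCHES][NUM_PATCHES] = {
    [PATCH_A] = { [PATCH_B] = true, [PATCH_C] = true },
};

/* failed[y] records whether testing patch y produced a failure.
 * rejected[x] is set for every patch that failed itself or that is
 * less restrictive than some failed patch. */
void prune_implied(const bool failed[NUM_PATCHES], bool rejected[NUM_PATCHES]) {
    for (int x = 0; x < NUM_PATCHES; x++)
        rejected[x] = failed[x];
    for (int x = 0; x < NUM_PATCHES; x++)
        for (int y = 0; y < NUM_PATCHES; y++)
            if (failed[y] && less_restrictive[x][y])
                rejected[x] = true; /* pruned without running patch x */
}
```

If only patch b fails in testing, the pass rejects both b and a, while c stays a candidate.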

CFix: patch merging
Pipeline stages: Fix-Strategy Design, Synchronization Enforcement, Patch Testing & Selection, Patch Merging, Run-time Support. Challenge: one single programming mistake usually leads to multiple bug reports. Solution: heuristics to merge patches.
After selecting the best patch for each bug report, we are not done yet. In practice, one synchronization mistake usually leads to multiple bug reports, and they should be fixed together.

An example with multiple reports
void buf_write() { int tmp = buf_len + str_len; if (tmp > MAX) return; memcpy(buf[buf_len], str, str_len); buf_len = tmp; }
Annotations on the code: p1, c1, p2, r1, c2, r2. Too many lock/unlock operations. Potential new deadlocks. May hurt performance and simplicity.
In this piece of code, which can be executed concurrently by multiple threads, we have two atomicity violations to fix. If we handle them separately, we are going to add these lock and unlock operations, which is too many: we are adding seven lock/unlock operations to these five lines of code. It also has a potential deadlock, and it may hurt performance and simplicity. In order to further improve patch quality, we are going to merge patches for related bugs. We have cases for related patches and redundant patches, and I am going to present some of them.

Related patch: a case of AFix
Merge if p, c, or r is in some other patch’s critical sections.
Before merging: lock(L1) p1 lock(L2) p2 c1 unlock(L1) c2 unlock(L2); lock(L1) r1 unlock(L1); lock(L2) r2 unlock(L2)
After merging: lock(L1) p1 p2 c1 c2 unlock(L1); lock(L1) r1 unlock(L1); lock(L1) r2 unlock(L1)
Here is a case of related patches in AFix. We have created critical sections separately, given two p-c-r triples, and we notice here that p2 is in the critical section of p1-c1, so we are going to merge the patches. To merge the patches, we simply merge the two protected node sets and add lock operations based on the new set. We then use the same lock variable for all critical sections.

The merged patch for the example
void buf_write() { int tmp = buf_len + str_len; if (tmp > MAX) { return; } memcpy(buf[buf_len], str, str_len); buf_len = tmp; }
Annotations on the code: p1; c1, p2; c2, r1, r2.
Here is the result of patch merging for the previous example. We only have one lock operation and two unlock operations now. This merged patch improves correctness and performance, as it does not introduce new deadlocks. It also improves simplicity, as the number of synchronization operations is reduced.
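The slide text does not show where the merged lock operations sit, so the following is only a plausible reconstruction with one lock and two unlocks, one on each exit path. Several details are assumptions made to get a self-contained example: the lock name L1, the buffer size, passing str and str_len as parameters, the int return value, and taking the address with &buf[buf_len].

```c
#include <pthread.h>
#include <string.h>

enum { MAX = 16 }; /* assumed capacity */

static char buf[MAX];
static int buf_len = 0;
static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER; /* the single merged lock */

/* Appends str to buf. Returns 1 on success, 0 if the data does not fit. */
int buf_write(const char *str, int str_len) {
    pthread_mutex_lock(&L1); /* one lock covers both reads and both writes */
    int tmp = buf_len + str_len;
    if (tmp > MAX) {
        pthread_mutex_unlock(&L1); /* unlock on the early-return path */
        return 0;
    }
    memcpy(&buf[buf_len], str, (size_t)str_len);
    buf_len = tmp;
    pthread_mutex_unlock(&L1); /* unlock on the normal path */
    return 1;
}
```

Because a single lock protects the whole function body, no thread ever holds one lock while waiting for another, which is why the merged form cannot introduce the nested-lock deadlock of the separate patches.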

CFix: run-time support
Pipeline stages: Fix-Strategy Design, Synchronization Enforcement, Patch Testing & Selection, Patch Merging, Run-time Support. To understand whether there is a deadlock underlying a timeout. Low overhead, and suitable for production runs. (Speaker note: mention the timed operation in enforcement and testing.)
A CFix patch is not error-free even after all these steps. There could be a timeout in our inserted synchronization caused by a masked deadlock. We add low-overhead run-time monitoring to the program to help understand whether there is indeed a deadlock underlying a timeout. In fact, our run-time has near-zero performance impact if there is no timeout. Due to the time limit, I am not going to discuss this further.

Evaluation methodology
Applications: PBZIP2, x264, FFT, HTTrack, Mozilla-1, transmission, ZSNES, Apache, MySQL-1, MySQL-2, Mozilla-2, Cherokee, Mozilla-3. Detectors: AV, OV, RA, DU.
Next, let’s see how it works. CFix aims to automate the fixing of a wide variety of concurrency bugs. To understand how effective CFix is on real-world concurrency bugs, we selected 13 real-world concurrency bugs that have been used to evaluate bug detectors. To understand whether CFix is able to work with different detectors, we used four different representative bug detectors, as mentioned earlier.

Evaluation result
[Table: rows are the 13 applications (PBZIP2, x264, FFT, HTTrack, Mozilla-1, transmission, ZSNES, Apache, MySQL-1, MySQL-2, Mozilla-2, Cherokee, Mozilla-3); columns are the AV, OV, RA, and DU detectors, each entry a check, a cross, or blank, plus # of Ops, with values including 5, 7, 2, 3, and 9.]
We apply the bug detectors to each bug case, and CFix patches the program based on the bug reports from each detector independently. These four columns show the overall patch quality. A blank entry means the bug detector does not report anything. A check means that CFix fixes the root cause of the bug case based on bug reports from that detector, with an overhead of less than 1%. You can see that most non-blank entries are checked. The patches for each bug case based on bug reports from different bug detectors are generally identical or have only trivial differences. A cross means that CFix fails to fix the bug, and we only have one such case. In that case, CFix rejected all patches. Although the problem is not fixed, CFix does not make things worse.
Manual inspection of CFix patches also shows good simplicity: the numbers of added synchronization operations are small, benefiting from our simplicity optimizations and merging. Except for 7 and 9 on two bug cases, all others are no more than 5. The synchronization operations added by CFix are all necessary under our fix strategy. Note that before our simplification optimization and merging, some of the patches had more than one hundred synchronization operations.
Among these 13 cases, the top 7 require order enforcement, and the bottom 6 require mutual exclusion enforcement. Other than the second detector, the other three detectors generate bug reports that do not reflect the root cause, as highlighted, and CFix is still able to correctly fix the root cause based on these imperfect bug reports.

Comparison with manual patches
CFix patches have similar correctness and performance. Manual patches integrate better with existing code.
[Table: rows are the 13 applications (PBZIP2, x264, FFT, HTTrack, Mozilla-1, transmission, ZSNES, Apache, MySQL-1, MySQL-2, Mozilla-2, Cherokee, Mozilla-3). Manual Patch column entries: Order with pthread_join (three cases), N/A, Order with lock, Move before pthread_create (two cases), New lock in structure, Existing lock and variable, Existing lock (two cases), Make the variable local, Customized synchronization.]
We also compared manual patches with our patches. On correctness, CFix patches are doing great: they enforce the same type of synchronization semantics as the manual patches. On performance, the overhead of CFix patches is similar to that of manual patches. However, manual patches integrate better with existing code. For example, in cases where we enforce a firstA-B order relationship, the manual patch may change the order of certain statements; in cases where we enforce mutual exclusion with a new lock variable, the manual patch may leverage an existing one. This is something I am currently working on, and hopefully I can finish it before graduation.

CFix summary
CFix uses some heuristics, with good results in practice: a combination of mutual exclusion and order enforcement; use testing to select the best patch; fix the root cause without requiring detectors to report it; small overhead and good simplicity. Concurrency bugs are feasible to fix automatically, by removing bad interleavings. Must be careful in the details.
Hopefully I have convinced you, through our design and evaluation of CFix, that it is indeed feasible to fix concurrency bugs automatically.

Outline
Motivation and Overview. CFix: Automated Concurrency-Bug Fixing. CCI/PBI/LXR: Multi-threaded Program Failure Diagnosis. Future Work and Conclusion.
Next, I will briefly talk about my work on production-run failure diagnosis.

Production run failure diagnosis
Performance impact has to be small for production runs: existing concurrency bug detectors are not suitable. Diagnosis needs to be informative for concurrency bugs: existing failure diagnosis tools, e.g., CBI, are not helpful. Cooperative concurrency-bug failure diagnosis: follow the high-level philosophy of statistical debugging; design monitoring schemes that reflect interleaving abnormality; design a sampling strategy for each new monitoring scheme.
Software does fail in production runs. On Windows, users have the option to send an error report back to Microsoft. With such run-time information, developers or automated tools can try to figure out which part of the software caused the specific failure. To collect such information from production runs, the performance impact has to be small. This rules out almost all existing bug-finding tools from being used directly for failure diagnosis, since their overhead is usually more than a 10X slowdown. State-of-the-art production failure diagnosis tools leverage the fact that real-world software is usually deployed very widely, and spread the overhead of information collection among all these deployment sites. This is called cooperative failure diagnosis. Previous work on cooperative failure diagnosis has not looked at concurrency bugs, and its monitoring scheme design cannot explain failures caused by concurrency bugs. My work follows the same high-level approach and provides a set of monitoring schemes that can explain a wide variety of concurrency-bug failures. For each predicate scheme, there is also a sampling strategy to lower the monitoring overhead at each deployment site.

CCI overview [OOPSLA’10]
Three different types of predicates. Each predicate has its supporting sampling strategy. A good predictor is true in most failure runs and false in most correct runs. Figure: Program Source, Compiler (Predicates, Sampler), Counts and success/failure outcome, Statistical Debugging, Predictors.
I have worked on one pure software solution and two hardware-assisted solutions. This is the overview of CCI, which is our pure software solution. Given the source code, CCI statically instruments the program to collect certain run-time predicates that might help explain software failures, such as whether the previous access to the same memory location came from the same thread, whether a branch takes the true branch or the false branch, whether a function returns 0 or not, etc. To lower the run-time overhead, CCI uses sampling in its predicate evaluation and collection. During production runs, the instrumented software sends back feedback that includes the predicate profile and whether the run failed. CCI’s statistical debugging identifies those predicates that have a high statistical correlation with failures as failure predictors. Intuitively, a good predictor is a predicate that is true in most failure runs and false in most correct runs. This completes the failure diagnosis. If you are familiar with the previous work CBI, cooperative bug isolation, you may notice the same architecture. But CBI does not work on concurrency-bug failures, because it is designed mainly for sequential programs. CCI’s contribution here is the predicate design and the sampling scheme design.
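The intuition that a good predictor is true in most failure runs and false in most correct runs can be made concrete with a toy score. This sketch is not the ranking metric used by CCI or CBI, and it ignores sampling; the struct run_report, the function predictor_score, and the score itself (fraction of failing runs where the predicate was true minus the same fraction for correct runs) are assumptions chosen for illustration.

```c
#include <stddef.h>

/* One feedback report for a single predicate (hypothetical layout). */
struct run_report {
    int failed;    /* 1 if the run failed, 0 if it succeeded */
    int pred_true; /* 1 if the predicate was observed true in this run */
};

/* Returns a score in [-1, 1]: close to 1 when the predicate is true in most
 * failing runs and false in most correct runs. Returns 0 if either group
 * of runs is empty. */
double predictor_score(const struct run_report *runs, size_t n) {
    size_t fail = 0, ok = 0, fail_true = 0, ok_true = 0;

    for (size_t i = 0; i < n; i++) {
        if (runs[i].failed) {
            fail++;
            fail_true += (size_t)(runs[i].pred_true != 0);
        } else {
            ok++;
            ok_true += (size_t)(runs[i].pred_true != 0);
        }
    }
    if (fail == 0 || ok == 0)
        return 0.0;
    return (double)fail_true / (double)fail - (double)ok_true / (double)ok;
}
```

Ranking all instrumented predicates by such a score and reporting the top ones is the general shape of statistical debugging; a predicate that is true in every run, failing or not, scores 0 and is not reported.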

PBI/LXR overview [ASPLOS’13, 14]
PBI: performance counter based. LXR: last execution record based. Hardware assisted. A good predictor is true in most failure runs and false in most correct runs. Figure: Binary, Counts and success/failure outcome, Statistical Debugging, Predictors.
PBI and LXR have a similar structure, but differ in predicate and sampling strategy design. They both work directly on compiled binaries and leverage hardware support. In particular, PBI uses performance counters and LXR uses the last execution record.

Future work
Better tools (bug finding, bug fixing, failure diagnosis): improve patch quality and bug-finding capability; integrate failure diagnosis with automated fixing. Broader scope: performance and security issues caused by concurrency; automated programming assignment fixing. Emerging platforms: mobile systems and cloud.
Going forward, I will keep working on providing better tool support for programmers. In the short term, I plan to improve the quality of the tool chain for concurrency-bug handling. In particular, on bug fixing, I will target patch quality improvement to make CFix patches even closer to manual patches. I plan to first conduct a fixing-oriented bug study focusing on how bugs are fixed, and use the results to guide fixing-tool improvement. On bug finding, the goal of finding more bugs will remain. I will leverage my experience with bug-fixing tools to increase the scope of bugs that can be found and then fixed. Finally, on our failure diagnosis tools: in the current CFix evaluation, failure diagnosis is not used as a bug-understanding tool, because the information gathered during production runs is limited. I plan to work on how to gather amplified information offline based on the limited information collected online, so that fixing tools can then fix the bugs based on the amplified information. Eventually I would like to combine failure diagnosis tools and bug-fixing tools to get a self-healing system for concurrency bugs. I expect that the interplay among the various tools can improve each individual tool, and the toolchain as a whole.
Going beyond correctness issues caused by concurrency, I will also look into performance and security issues caused by concurrency. Our performance bug study already confirms that quite a few performance bugs are caused by over-synchronization, and I will take that as the starting point. I am also very interested in building tools for automated programming assignment fixing, partly because of my TA experience.
I expect multi-core machines are here to stay, so the need for tools on concurrency will remain. There are other new platforms now. I plan to look into mobile systems and cloud systems, and work on tools that can help developers write correct and efficient software.

Questions
Bug detection and software testing: ConSeq [ASPLOS’11]. Automated concurrency-bug fixing: CFix [PLDI’11*, OSDI’12] (*received a SIGPLAN CACM nomination). Production-run failure diagnosis: CCI/PBI/LXR [OOPSLA’10, ASPLOS’13 & 14].
To wrap up: with that, I am happy to take questions.

p and c adjustment
p and c adjustment when they are in different functions. Observation: people put lock and unlock in one function. Find the longest common prefix of p’s and c’s stack traces (it always exists). Adjust p and c accordingly.
Before adjustment: void newlog() { … close(); open(); } void close() { … p: log = CLOSE; } void open() { c: log = OPEN; }
Call stacks: p: close() newlog() …; c: open() newlog() …
After adjustment: void newlog() { … p: close(); c: open(); }
Now let’s extend the algorithm to the case where p and c are in different functions. The solution we take is inspired by the observation that people usually put lock and unlock in the same function. To achieve this, we study the calling contexts of p and c. We first find the longest common prefix of both calling contexts; since p and c come from the same thread, it always exists. In this example from MySQL, there is a function newlog, which accesses the same variable through close and open. The two accesses have to be atomic with respect to some r access. The developers failed to ensure that, and CTrigger reports a triple with the corresponding call stacks. Here the longest common prefix ends with newlog. Then we adjust p and c to be the instructions that call the next function in each call stack, so that they are in the same function now and the algorithm described previously can be used again.
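The prefix computation above can be sketched as follows. It is a simplification under stated assumptions: frames are compared by function name only (a real implementation would compare call sites), stacks are listed outermost caller first, and the frame named "caller" is a hypothetical stand-in for the elided outer frames. The function name common_prefix_len is also hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the number of leading frames shared by the call stacks of p and c.
 * Both stacks are ordered from the outermost caller to the innermost frame. */
size_t common_prefix_len(const char *const *p_stack, size_t p_n,
                         const char *const *c_stack, size_t c_n) {
    size_t i = 0;
    while (i < p_n && i < c_n && strcmp(p_stack[i], c_stack[i]) == 0)
        i++;
    return i;
}
```

With the stacks caller, newlog, close for p and caller, newlog, open for c, the shared prefix has length 2 and ends with newlog. The adjusted p and c are then the call sites inside newlog that call frame 2 of each stack, namely close() and open().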

Put r into a critical section
Can we reach r while in the p–c critical section? Yes: do nothing. No: lock acquisition before r, lock release after r.
Case 1: fpc() { lock(L1) p ... r … c unlock(L1) }
Case 2: fpc() { lock(L1) p ... foo() {…r} … c unlock(L1) }, where r’s call stack is … fpc foo … r
The second step is to put r into a critical section. It is possible that we can reach r from the p-c critical section. If this is the case, we are done here. Note that there are two ways to reach r from the p-c critical section. Case 1 is that the r instruction is directly in the p-c critical section. Case 2 is that the function containing the p-c critical section is part of r’s call stack, and the call to the next function in the stack is inside the p-c critical section. Otherwise, we add a lock before r and an unlock after r.

Select or introduce a lock
Use the same lock for the critical sections. Lock type: lock with timeout if the critical section has potential blocking operations; reentrant lock if recursion is possible within the critical section; otherwise, normal lock. Lock instance: global lock instances are easy to reuse; in practice, we always need a new lock instance.
Now we have one or two critical sections, and we need to decide which lock to use. It is obvious that we should use the same lock if there are two critical sections. For the lock, two properties need to be decided. The first is the lock type: when the critical section has no potential blocking operations, we know there is no danger of a new deadlock; otherwise, we use a lock with timeout. We also check whether the lock operations could be called recursively, and use a reentrant lock if recursion is possible. In all other cases, we just use the normal pthread lock type. The second is the lock instance. We hope we can reuse some existing lock when possible, but only global lock instances are easy to reuse. We check whether a reusable global lock exists; however, we found none in our benchmarks.

CCI/PBI/LXR summary
CCI/PBI/LXR is capable of and suitable for diagnosing many production-run failures caused by concurrency bugs. CCI overhead is small on most evaluated applications, and PBI/LXR can achieve even smaller overhead. LXR further improves diagnosis latency.
We have evaluated all these tools, and they are capable of and suitable for production-run failure diagnosis. CCI’s overhead is good for many applications, and PBI/LXR further improve the performance aspect because of hardware support. LXR further addresses the long diagnosis latency problem of CCI and PBI, as it does not require sampling.

Redundant patch: a case of OFix
Merge if one allA-B relationship is enforced by the other.
while (1) { mutex_lock(L); // A1 if (. . .) { OFixSignal1; OFixSignal3; mutex_unlock(L); // A2 OFixSignal2; OFixSignal_m; return; } . . . mutex_unlock(L); // A3 }
Before merging: A1 OFixSignal1; A2 OFixSignal2; . . . A3 OFixSignal3. After merging: A1; A2 OFixSignal_m; . . . A3.
Here is a case of a redundant patch in OFix. In order relationship enforcement, it is common that one order is already enforced once another order is enforced. In this example, we have three A operations which all share the same B operation. Separately, we are going to add signal operations before and after A2. After merging, only the signal operation after A2 is kept.
