Automated Atomicity-Violation Fixing

Automated Atomicity-Violation Fixing
Jeremy, Bo, Tanima, Lingmei

Problem
Fixing concurrency bugs is challenging:
- Identifying the root cause
- Error-prone patches
- Serialization bottlenecks
Many ideas exist for detection, but few for fixing.

Related Work
Identification
- Force interleavings to identify problems
Bug fixing
- Concurrent program synthesis
- Infer synchronization based on user input
- Steer execution to avoid failure
- Hardware watchpoints

Contributions
- Implement AFix, a tool that fixes single-variable atomicity violations detected by CTrigger
- Improve code readability
- Fix bugs without large performance degradation
- Minimal deadlock risk

Atomicity Violation: Example
void buf_write(…){
  int tmp = buf_len+str_len;
  if(tmp > MAX)
    return;
  memcpy(buf[buf_len+1], str, str_len);
  buf_len=tmp;
}

Sequential execution of buf_write (buffer cells numbered 1 to 7)
- Initial state: buffer = B O, buf_len = 2, MAX = 7
- Input: str = JOE, str_len = 3
- tmp = buf_len + str_len = 5, which does not exceed MAX
- memcpy copies JOE into cells 3 to 5: buffer = B O J O E
- buf_len = tmp = 5

Interleaved execution: two threads call buf_write concurrently
- Initial state: buffer = B O, buf_len = 2, MAX = 7
- Thread 1 (str = JOE, str_len = 3) computes tmp = 5 and passes the check
- Thread 2 (str = SAM, str_len = 3) runs to completion: tmp = 5, buffer = B O S A M, buf_len = 5
- Thread 1 resumes with its stale tmp = 5 while buf_len = 5: memcpy starts at cell 6 and needs three cells, so it writes past MAX = 7: ERROR
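The failing schedule can be replayed deterministically in a single thread. The sketch below is only an illustration, not code from the slides: it assumes buf_write is split into a hypothetical check half and commit half (`buf_write_check`, `buf_write_commit`), and it pads the buffer with spare bytes so the overflow can be observed safely.

```c
#include <assert.h>
#include <string.h>

#define MAX 7

/* Shared state as on the slides: cells 1..MAX are usable, cell 0 is unused.
   The 8 spare bytes are an assumption of this sketch so that the overflow
   lands in valid memory and can be inspected. */
char buf[MAX + 1 + 8];
int buf_len;

/* First half of buf_write: read buf_len (access p) and run the bounds check.
   Returns the new length, or -1 if the string does not fit. */
int buf_write_check(int str_len) {
    int tmp = buf_len + str_len;
    return (tmp > MAX) ? -1 : tmp;
}

/* Second half: read buf_len again (access c), copy, publish the new length.
   Returns 1 if the copy ran past cell MAX. */
int buf_write_commit(const char *str, int str_len, int tmp) {
    int overflow = (buf_len + str_len > MAX);
    memcpy(&buf[buf_len + 1], str, (size_t)str_len);
    buf_len = tmp;
    return overflow;
}

/* Replays the schedule: thread 1 checks, thread 2 runs fully, thread 1 commits. */
int replay_bad_interleaving(void) {
    memset(buf, 0, sizeof buf);
    memcpy(&buf[1], "BO", 2);
    buf_len = 2;

    int tmp1 = buf_write_check(3);            /* thread 1: tmp = 5 */
    assert(tmp1 == 5);
    int tmp2 = buf_write_check(3);            /* thread 2: tmp = 5 */
    buf_write_commit("SAM", 3, tmp2);         /* thread 2: buf_len = 5 */
    return buf_write_commit("JOE", 3, tmp1);  /* thread 1: stale tmp, copy starts at cell 6 */
}
```

Calling `replay_bad_interleaving()` reports the overflow, and `buf_len` ends at 5 even though eight cells were written.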

Second interleaving: the remote thread runs between memcpy and the buf_len update
- Thread 1 (str = JOE, str_len = 3) has copied JOE into the buffer (B O J O E) with tmp = 5, but buf_len is still 2
- Thread 2 (str = HI, str_len = 2) runs to completion: tmp = 4, HI is copied, buf_len = 4
- Thread 1 resumes and sets buf_len = tmp = 5
- Final state shown on the slide: buffer cells B O J E H I, buf_len = 5, MAX = 7

Background: CTrigger
- In-house tool to detect potential single-variable atomicity violations
- Identifies potential violations via static analysis
- Prunes infeasible interleavings
- Ranks possible violations based on probability of occurrence
- Attempts to expose bugs via delays at specific points

Background: CTrigger Terminology
(p, c, r) triple: preceding, current, remote
- Two consecutive accesses (p, c) to the same variable in the same thread are interleaved by a remote access (r) from another thread
- This causes the execution to differ from a serial execution

Background: CTrigger

Background: CTrigger Modifications and Limitations
Modifications
- Output all possible combinations for c instructions
- Return the complete call stack for p, c, and r
Limitations
- Exposed bugs should not always use separate patches
- May not expose the root problem

Naïve Fix
Acquire a lock before p and before r; release it after c and after r.

Naïve Fix (first placement)
void buf_write(…){
  Lock
  int tmp = buf_len+str_len;
  if(tmp > MAX)
    return;
  memcpy(buf[buf_len+1], str, str_len);
  Unlock
  buf_len=tmp;
}

Naïve Fix (second placement)
void buf_write(…){
  int tmp = buf_len+str_len;
  if(tmp > MAX)
    return;
  Lock
  memcpy(buf[buf_len+1], str, str_len);
  buf_len=tmp;
  Unlock
}
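One hazard of the first naïve placement is that the early return leaves the function with the lock still held. The sketch below illustrates this under assumptions of my own: a pthread mutex named `L`, a concrete signature for buf_write, and a hypothetical probe `lock_is_held` built on `pthread_mutex_trylock`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define MAX 7

char buf[MAX + 1];   /* cells 1..MAX are usable, cell 0 is unused */
int buf_len = 2;
pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;

/* Naive patch: lock before p, unlock after c, ignoring other control-flow paths. */
void buf_write_naive(const char *str, int str_len) {
    assert(str != NULL && str_len >= 0);
    pthread_mutex_lock(&L);
    int tmp = buf_len + str_len;                      /* p: reads buf_len */
    if (tmp > MAX)
        return;                                       /* leaves with L still held */
    memcpy(&buf[buf_len + 1], str, (size_t)str_len);  /* c: reads buf_len */
    pthread_mutex_unlock(&L);
    buf_len = tmp;
}

/* Returns 1 if L is currently held. Probes with trylock and releases on success. */
int lock_is_held(void) {
    int rc = pthread_mutex_trylock(&L);
    if (rc == 0) {
        pthread_mutex_unlock(&L);
        return 0;
    }
    return rc == EBUSY;
}
```

After a write that fits, `lock_is_held()` returns 0. After a write that fails the bounds check, it returns 1, and the next caller of `buf_write_naive` would block.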

Contributions & Methodology

Fixing Bug Report (30s) Thanks, Jeremy. Now I am going to introduce how AFix fixes atomicity violations.

Fixing Strategy
- Fix one bug report: a single atomicity violation
- Fix multiple bug reports: multiple bugs reported by bug detectors
(1') First, I'll talk about how AFix fixes a single atomicity violation detected by CTrigger. Then we will discuss how to fix multiple bug reports. As Jeremy mentioned, a single CTrigger bug report describes a failure that may occur when an instruction r non-serializably interleaves two other instructions p and c. To fix this bug, we need to change the code so that the code region from p to c is mutually exclusive with r.

How AFix changes the code
(1'30) AFix makes this change in four steps:
1. Put p and c into a critical region. p and c must be inside one critical region under all possible control flows, without introducing new bugs. Potential new bugs include double lock, double unlock, unlock without lock, and deadlock.
2. Put r into another critical region: add a lock-acquisition operation before r and a lock-release operation after r.
3. Make the two regions mutually exclusive. We will also discuss hidden traps and how to avoid them.
4. Select or introduce a lock to protect the p-c critical region.
Diagram: critical region 1 (p to c) and critical region 2 (r), mutually exclusive, without introducing new bugs.

Single-Function Operation
Condition 1: p and c are in the same function; assume the function is not recursive.
(4') Abstract each instruction as a node in a directed control-flow graph (CFG). The protected nodes are the CFG nodes that lie on any path that starts at p and ends at c without touching p or c in between. (This resembles black-box testing, where only the beginning and the end can be manipulated. There the two ends are inputs and outputs; here they are the lock on edges from unprotected to protected nodes and the unlock on edges in the opposite direction.) Node degree is used below to state the search constraints. There are three steps (the slide shows three example paths):
(1) Search forward from p, depth-first or breadth-first. On each search path, in-degree(c) ≤ 1 and out-degree(c) = 0, so c is passed at most once and is treated as a terminal when it is reachable. Result: P = {p, c, m, q}.
(2) Search backward from c. The degree of c no longer matters. On each search path, in-degree(p) = 0 and out-degree(p) ≤ 1. Result: C = {p, c, m, n, q}.
(3) Intersect the two sets to get the set of protected nodes: P ∩ C = {p, c, m, q} ∩ {p, c, m, n, q} = {p, c, m, q}.
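The three steps can be sketched as two bounded graph searches followed by an intersection. The transcript does not preserve the slide's CFG edges, so the edge set below is a hypothetical one chosen only because it reproduces the sets P and C quoted above; the function names are likewise illustrative.

```c
#include <assert.h>
#include <stdbool.h>

enum { NODE_P, NODE_Q, NODE_M, NODE_C, NODE_N, NUM_NODES };

/* Hypothetical CFG (an assumption of this sketch). edge[a][b] means a -> b.
   p -> q, p -> m, q -> m, m -> c, c -> n, n -> m */
const bool edge[NUM_NODES][NUM_NODES] = {
    [NODE_P] = { [NODE_Q] = true, [NODE_M] = true },
    [NODE_Q] = { [NODE_M] = true },
    [NODE_M] = { [NODE_C] = true },
    [NODE_C] = { [NODE_N] = true },
    [NODE_N] = { [NODE_M] = true },
};

/* Depth-first search from `start` that records `stop` but never expands it.
   forward = true follows a -> b edges; forward = false follows them in reverse. */
void search(int start, int stop, bool forward, bool seen[NUM_NODES]) {
    int stack[NUM_NODES], top = 0;
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int cur = stack[--top];
        if (cur == stop)
            continue;  /* terminal node: recorded but not expanded */
        for (int next = 0; next < NUM_NODES; next++) {
            bool linked = forward ? edge[cur][next] : edge[next][cur];
            if (linked && !seen[next]) {
                seen[next] = true;
                stack[top++] = next;
            }
        }
    }
}

/* protected_set[i] is true when node i is in both P (forward from p)
   and C (backward from c). */
void protected_nodes(int p, int c, bool protected_set[NUM_NODES]) {
    bool fwd[NUM_NODES] = { false }, bwd[NUM_NODES] = { false };
    assert(p != c);
    search(p, c, true, fwd);
    search(c, p, false, bwd);
    for (int i = 0; i < NUM_NODES; i++)
        protected_set[i] = fwd[i] && bwd[i];
}
```

With this edge set, `protected_nodes(NODE_P, NODE_C, s)` marks p, q, m, and c, and leaves n unmarked because n is only reachable after c.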

unreleased lock / potential deadlock
(1') Let's look at some examples AFix can handle. In the left graph, taking the right branch leaves the lock unreleased, which may cause a deadlock later. After AFix's fix, the grey nodes are protected and the others are unprotected. All we need to do here is insert an unlock on the edge from a protected node to an unprotected node (from p to a), which adds an unlock in the else branch. (AFix inserts a lock-acquisition operation on each edge that crosses from an unprotected node to a protected node, and a lock-release operation on each edge that crosses from a protected node to an unprotected node.)
Code on the slide: if(gPtr){ puts(gPtr); unlock; } else{ unlock; ... }
Rule: insert unlock on edges from protected nodes to unprotected nodes.

double lock and unlock without lock
(1') Another example covers double lock and unlock without lock. In the first graph, taking the right branch and going around the a-p loop may lock twice; taking the left branch directly may end with an unlock without a preceding lock. After AFix's fix, all three nodes are protected, so we only need a lock at the loop entrance and an unlock at the exit.
Before: while(...){ lock; ptr = aPtr; } puts(ptr); unlock;
After: lock; while(...){ ptr = aPtr; } puts(ptr); unlock;
Rule: insert lock on edges from unprotected nodes to protected nodes.
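The edge rule from the last two slides can be written as a small classifier. This is a sketch under assumptions: the node names and edge list model the while-loop example loosely (entry, loop head a, p, c, exit) and are not taken from the slide's figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ENTRY, LOOP_A, NODE_P, NODE_C, EXIT, NUM_NODES };

typedef struct { int from, to; } edge_t;
typedef enum { NO_OP, INSERT_LOCK, INSERT_UNLOCK } patch_op_t;

/* Assumed CFG for: while(...){ ptr = aPtr; } puts(ptr); */
const edge_t cfg[] = {
    { ENTRY,  LOOP_A },  /* enter the loop head */
    { LOOP_A, NODE_P },  /* loop body: ptr = aPtr */
    { NODE_P, LOOP_A },  /* back edge */
    { LOOP_A, NODE_C },  /* loop exit: puts(ptr) */
    { NODE_C, EXIT   },
};
const size_t cfg_len = sizeof cfg / sizeof cfg[0];

/* Protected nodes: the loop head, p, and c. */
const bool is_protected[NUM_NODES] = {
    [LOOP_A] = true, [NODE_P] = true, [NODE_C] = true,
};

/* Lock on unprotected -> protected edges, unlock on protected -> unprotected edges. */
patch_op_t op_for_edge(edge_t e) {
    assert(e.from >= 0 && e.from < NUM_NODES && e.to >= 0 && e.to < NUM_NODES);
    if (!is_protected[e.from] && is_protected[e.to])
        return INSERT_LOCK;
    if (is_protected[e.from] && !is_protected[e.to])
        return INSERT_UNLOCK;
    return NO_OP;
}

/* Counts how many edges of the CFG receive the given operation. */
size_t count_ops(patch_op_t wanted) {
    size_t n = 0;
    for (size_t i = 0; i < cfg_len; i++)
        if (op_for_edge(cfg[i]) == wanted)
            n++;
    return n;
}
```

In this model exactly one lock lands on the loop entrance and one unlock on the exit; the back edge stays untouched, so iterating the loop cannot lock twice.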

Deadlock Analysis and Avoidance
(1'30)
1. AFix statically analyzes each critical region to determine whether it includes any potentially blocking operations.
2. If the analysis finds no potentially blocking operations within the critical region, there is no risk of deadlock and the lock is acquired with pthread_mutex_lock. If it finds potentially blocking operations (lock acquisitions, ad-hoc spin loops), the lock is acquired with pthread_mutex_timedlock instead.
3. If the lock cannot be acquired within the maximum delay, the acquisition times out, which has some drawbacks that we discuss later. If the lock is acquired, the program continues.
(To identify ad-hoc spin loops, AFix checks whether there is a loop inside the critical region and whether heap or global variables are accessed inside the loop.)
Flowchart: AFix static analysis -> potentially blocking operations? No: pthread_mutex_lock. Yes: pthread_mutex_timedlock -> lock acquired within the maximum delay? Yes: continue program. No: time out.
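A timed acquisition with the POSIX call `pthread_mutex_timedlock` might look like the sketch below. The helper names `lock_with_timeout` and `attempt_lock`, the `attempt_t` struct, and the millisecond parameter are assumptions for illustration; the slides do not show how AFix emits this code or which maximum delay it uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Tries to take `m` for at most `max_delay_ms` milliseconds.
   Returns true if the lock was acquired, false on time-out or error. */
bool lock_with_timeout(pthread_mutex_t *m, long max_delay_ms) {
    struct timespec deadline;
    assert(m != NULL && max_delay_ms >= 0);
    /* pthread_mutex_timedlock expects an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += max_delay_ms / 1000;
    deadline.tv_nsec += (max_delay_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline) == 0;
}

typedef struct {
    pthread_mutex_t *m;
    long max_delay_ms;
    bool acquired;
} attempt_t;

/* Thread body: enter the critical region only if the lock arrives in time. */
void *attempt_lock(void *arg) {
    attempt_t *a = arg;
    a->acquired = lock_with_timeout(a->m, a->max_delay_ms);
    if (a->acquired) {
        /* critical region would run here */
        pthread_mutex_unlock(a->m);
    }
    /* On time-out the thread continues without the lock,
       so atomicity is no longer guaranteed on this path. */
    return NULL;
}
```

Running `attempt_lock` in a second thread while the first thread holds the mutex makes the attempt time out; running it again after the mutex is released makes it succeed.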

Single-Function Operation
Condition 2: the function is recursive.
(1') For recursive functions, AFix uses reentrant locks. AFix implements a reentrant lock by associating a mutex.count counter and a mutex.owner thread ID with each reentrant lock mutex:
- mutex.owner records the thread ID of the lock's current owner
- mutex.count records the current nesting level in the owner thread
We no longer need different locks; the same reentrant lock can be used throughout the recursion. A reentrant lock avoids double lock.
Before: void foo(){ lock; ptr = aPtr; if (...) foo(); puts(ptr); unlock; }
After: void foo(){ reentrant_lock; ptr = aPtr; if (...) foo(); puts(ptr); reentrant_unlock; }
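One way to build a reentrant lock from an owner field and a nesting counter is sketched below. It is an assumption-laden illustration rather than AFix's generated code: the guard mutex, the condition variable, and the names `reentrant_mutex_t`, `REENTRANT_MUTEX_INIT`, and `max_nesting` are all introduced here.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t guard;     /* protects owner and count */
    pthread_cond_t  released;  /* signalled when count drops to 0 */
    pthread_t       owner;     /* meaningful only while count > 0 */
    int             count;     /* nesting level in the owner thread */
} reentrant_mutex_t;

#define REENTRANT_MUTEX_INIT \
    { .guard = PTHREAD_MUTEX_INITIALIZER, .released = PTHREAD_COND_INITIALIZER, .count = 0 }

void reentrant_lock(reentrant_mutex_t *m) {
    pthread_t self = pthread_self();
    pthread_mutex_lock(&m->guard);
    if (m->count > 0 && pthread_equal(m->owner, self)) {
        m->count++;                        /* nested acquisition by the owner */
    } else {
        while (m->count > 0)               /* wait until the current owner leaves */
            pthread_cond_wait(&m->released, &m->guard);
        m->owner = self;
        m->count = 1;
    }
    pthread_mutex_unlock(&m->guard);
}

void reentrant_unlock(reentrant_mutex_t *m) {
    pthread_mutex_lock(&m->guard);
    assert(m->count > 0 && pthread_equal(m->owner, pthread_self()));
    if (--m->count == 0)
        pthread_cond_signal(&m->released);
    pthread_mutex_unlock(&m->guard);
}

reentrant_mutex_t L = REENTRANT_MUTEX_INIT;
int max_nesting;  /* deepest nesting level observed, for illustration only */

/* Recursive function shaped like foo() on the slide. */
void foo(int depth) {
    reentrant_lock(&L);
    if (L.count > max_nesting)  /* safe: only the owner changes count while it is held */
        max_nesting = L.count;
    if (depth > 0)
        foo(depth - 1);
    reentrant_unlock(&L);
}
```

Calling `foo(3)` nests four acquisitions in one thread without blocking, and the counter returns to 0 once the outermost call unlocks.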

Multiple-Function Operation
When p and c come from different functions
(1') Having covered the single-function operation, we briefly discuss the multiple-function operation, where p and c come from different functions. In the example, newlog() can recognize close() and close() can recognize insert(), but there is no atomicity-violation detection between newlog() and insert(). The solution is simple: substitute p and c with the corresponding call nodes in the innermost function f on the common prefix of the two call-stack chains. The function newlog() acts like a wrapper and can be ignored in the log.
Problem: no detection?
Solution: substitute p and c with the real call nodes in the innermost function.

Harmonizing two critical regions: mutual exclusion
Remember the third step of how AFix changes the code? Any two critical regions must be mutually exclusive. If r is subsumed within the p-c region, its lock is redundant and should be removed. In the example, the critical region around the code on line 8 should clearly be removed. (The two critical regions are harmonized so that they cooperate without introducing new bugs.)
Rule: if any of the calls in f (the function containing p and c) that lead to r are themselves inside the p-c critical region, then the lock operation of r is redundant and should be removed.
Diagram: p ... (r) ... c, and a separate r.

Assessing AFix patches
Advantages
- Do not introduce new bugs
- Avoid unnecessary performance degradation: a critical region always ends immediately when there is no hope of reaching c
Disadvantages
- Could cause temporary circular wait
- A lock inside may time out and then no longer guarantees atomicity
- p-c pair protection does not work when there is more than one instance of p or c
Note the second disadvantage, which we mentioned in the deadlock analysis. If we find potentially blocking operations and use pthread_mutex_timedlock, the acquisition may time out. Although this cannot cause a deadlock, the patch then loses its atomicity guarantee. The third disadvantage rarely occurs in practice.

Fixing Strategy
- Fix one bug report: a single atomicity violation
- Fix multiple bug reports: multiple bugs reported by bug detectors
Now we briefly discuss multiple bug reports.

Refine single-bug patches into the final patch 9/11/2018 (30s)
AFix first designs a patch for each bug independently. Before applying these patches to the software, AFix considers all patches together, either removing redundant patches or merging them.

Fix multiple bug reports
(30s) There are two situations when considering all patches together: one patch is subsumed by another, or two patches have overlapping critical regions. (Bug detectors often report multiple bugs that should be fixed by one patch.)

Remove redundant patches
Condition 1: patch 2 is subsumed by patch 1 if and only if CFG(patch 2) ⊆ CFG(patch 1).
(1') The first situation is handled by removing redundant patches. Recall that, to make two critical regions mutually exclusive, we discussed how to remove a redundant critical region. That case removed the r region from within the p-c region; here we remove one p-c region because of another. Strictly, we should consider the triples (p, c, r) and (p', c', r'); however, since r is extremely short in this region, AFix chooses to discard the subsumed p-c critical region in this situation. Under AFix's lock policy, critical region patch 2 is subsumed by critical region patch 1 if and only if the set of CFG nodes in patch 2's critical region is a subset of those in patch 1's.
Diagram: patch 1 and patch 2 over CFG nodes p, q, m, c, n.

Remove redundant patches
Example: now we can delete patch 2 (lock(L2) and unlock(L2)).

Merging critical regions
Condition 2: two patches have overlapping critical regions. (30s)
Step 1: update the positions of the lock and unlock operations, putting unlock on exit edges and lock on entry edges. This removes all redundant lock and unlock operations among the merged patches.
Step 2: unite the lock variables. AFix arbitrarily chooses one lock variable and uses it in every lock and unlock operation performed by the merged patch.

Merging critical regions
Example (30s)
- Original patches: lock(L1) p1 lock(L2) p2 c1 unlock(L1) c2 unlock(L2)
- Step 1, update the positions of lock and unlock (delete old, add new): lock() p1 p2 c1 c2 unlock()
- Step 2, unite the lock variables: lock(L) p1 p2 c1 c2 unlock(L)

Experiments

Run-time Monitoring and Feedback
- Developers might split a big critical section to reduce lock contention and improve performance
- Additional run-time information helps refine the patches
- Deadlocks caused by AFix patches can manifest as lock time-outs
- AFix implements two deadlock detection algorithms
- AFix also monitors performance by measuring the wait time and the number of time-outs

Deadlock Detection: First Scheme
- Suitable for in-house patch testing
- Maintains a resource graph from the beginning of the execution and looks for cycles when an AFix-added lock times out
- Overhead can be very high if lock and unlock operations are very dense
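Looking for a cycle after a time-out can be sketched on a simplified resource graph. The representation below is an assumption of this example, not the one AFix uses: each lock records at most one holder, each thread records at most one lock it is blocked on, and the names `resource_graph_t`, `graph_init`, and `in_deadlock_cycle` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8
#define MAX_LOCKS   8
#define NONE        (-1)

typedef struct {
    int holder[MAX_LOCKS];         /* thread holding each lock, or NONE */
    int waiting_for[MAX_THREADS];  /* lock each thread is blocked on, or NONE */
} resource_graph_t;

void graph_init(resource_graph_t *g) {
    for (int i = 0; i < MAX_LOCKS; i++)
        g->holder[i] = NONE;
    for (int i = 0; i < MAX_THREADS; i++)
        g->waiting_for[i] = NONE;
}

/* Called when `thread` times out on a lock. Follows the chain
   thread -> lock it waits for -> holder of that lock -> ...
   and reports whether the chain leads back to `thread`. */
bool in_deadlock_cycle(const resource_graph_t *g, int thread) {
    assert(thread >= 0 && thread < MAX_THREADS);
    int cur = thread;
    for (int steps = 0; steps < MAX_THREADS; steps++) {
        int lock = g->waiting_for[cur];
        if (lock == NONE)
            return false;          /* cur is runnable, so the chain ends */
        int next = g->holder[lock];
        if (next == NONE)
            return false;          /* the lock is free */
        if (next == thread)
            return true;           /* back at the start: circular wait */
        cur = next;
    }
    return false;                  /* a cycle exists but does not include `thread` */
}
```

With thread 0 holding lock 0 and waiting for lock 1, and thread 1 holding lock 1 and waiting for lock 0, the check reports a cycle for either thread. Keeping `holder` and `waiting_for` current on every lock and unlock is what makes this scheme costly when such operations are dense.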

Deadlock Detection: Second Scheme
- Suitable for production-run deployment
- AFix starts monitoring and analysis only after a lock times out
- AFix recovers the resource graph after the time-out by looking for lock() and unlock() operations since the time-out
- AFix relies on the developers to refine the patches

Experiments
Goal: evaluate the effectiveness of AFix in generating patches for concurrency bugs (atomicity violations on a single shared variable)
Patch quality is assessed from three aspects:
- Correctness
- Performance
- Code readability

Methodology
- Implementation: using LLVM
- Platform: Intel Xeon machine with 8 cores and Red Hat Linux with kernel
- Workload: eight real-world bugs from six open-source applications (FFT, PBZIP2, Apache, MySQL, Mozilla, Cherokee)

Methodology
AFix generated several versions of the programs:
- Buggy version: Original
- Patched versions: Naïve, Manual, Unmerged, Merged

Overall Results
Patches generated with merging are highly competitive with manual patches; for mysql2 and mozilla2 they are even better.
Result categories:
- Fixed, no new bugs, small performance degradation
- Incomplete, does not hurt performance
- New bugs, no significant failure reduction, high performance degradation

Correctness: Methodology
For each bug, the same noise injection is used for all versions of the program:
- Same program region
- Same sleep probability
- Same sleep length
Sleep probability and length differ slightly between bugs. Each version is executed 100 times.

Correctness: Results
- Actual: reasons other than atomicity violation
- Non-deterministic deadlock (Fig. 7c): merges six critical sections
- Naïve: inter-procedural locks for mysql2, mozilla1

Correctness
Applied CTrigger to the patched versions and manually checked the patches:
- All the bugs were fixed
- No new bugs

Performance: Execution time
Methodology: compare execution time across the different versions
- Pbzip: the bug was the parent thread destroying a shared object before the worker was done; CTrigger wrongly predicted it as an atomicity violation, which put the entire worker inside a critical region
- Pbzip: the unmerged version is slower than the merged version because it suffers from deadlock time-outs

Performance: Patch generation time
- Methodology: time to complete the static analyses
- Result: no more than one second to analyze and develop patches

Readability
- Methodology: case studies, manual inspection
- Result: the merging technique improves patch readability and maintainability. For the Cherokee bug, merging removed 3 global lock variables, 5 lock operations, and 6 unlock operations

Pros？

Pros
- Well organized
- One bug -> multiple bugs
- One function -> multiple functions
- Nice examples: figures and footnotes
- Real-world bugs
- Considers factors other than correctness: code readability, performance, deadlocks
- Tests patches with the detector and with random testing (100 times)
- Nice explanations for experiments
- Post-time-out deadlock-detection algorithm
- Provides feedback

Cons？

Cons
- Not completely automatic (spin locks, etc.)
- Eliminates potential concurrency
- Implemented and tested based on CTrigger at the same time
- Test set is small
- Uses averages for results without error bars or confidence intervals
- Lacks an explanation for the failure-rate increase between the original and unmerged patches for FFT

Next Step？

Next Steps
- Study CTrigger: figure out the cause of incorrect reports; identify the root cause from side effects
- Deal with p-c pairs that are not consecutive
- Extend AFix with other synchronization mechanisms
- Refine the merging policy
