Detecting and Eliminating Potential Violation of Sequential Consistency for Concurrent C/C++ Programs
Duan Yuelu, Feng Xiaobing, Pen-chung Yew

Outline
- Motivation
- Approach & Implementation
- Results
- Related Work
- Conclusion

Motivation
- Programmers develop “low-lock” code for better performance
  - locks are expensive
  - data races are deliberately employed
  - such code requires the sequential consistency (SC) model
- Such code might fail under relaxed consistency (RC) models
  - e.g. Double Checked Locking (DCL) for a lazily initialized singleton

Example 1 (a): Lazily initialized singleton

    Object::Object() { this->field = 100; }

    // Works only for a single thread.
    Object* Object::getInstance() {
        if (!_instance)
            _instance = new Object();
        return _instance;
    }

    // Works for multiple threads, but is expensive.
    Object* Object::getInstance() {
        lock(l);
        if (!_instance)
            _instance = new Object();
        unlock(l);
        return _instance;
    }

    void Object::useInstance() {
        Object* ins = Object::getInstance();
        int f = ins->getField();
    }

(b): Double Checked Locking for a lazily initialized singleton

    Object* Object::getInstance() {
        if (!_instance) {
            lock(l);
            if (!_instance)
                _instance = new Object();
            unlock(l);
        }
        return _instance;
    }

If the architecture is SC, this works correctly and performs better than the locked version in (a). But what happens on RC models that allow write-write reordering?

A possible execution interleaving… correct!

Initializer Thread (T1):

    Object* Object::getInstance() {
        if (!_instance) {
            lock(l);
            if (!_instance) {
                temp = malloc(..);
                A1: temp->field = 100;
                A2: _instance = temp;
            }
            unlock(l);
        }
        return _instance;
    }

Reader Thread (T2):

    B1: if (!_instance) {…}
    …
    B2: read _instance->field;

Data races are employed here, since these accesses are improperly synchronized.

But what if the two writes are reordered?

Initializer Thread (T1), with A2 made visible before A1:

    Object* Object::getInstance() {
        if (!_instance) {
            lock(l);
            if (!_instance) {
                temp = malloc(..);
                A2: _instance = temp;
                A1: temp->field = 100;
            }
    …

Reader Thread (T2):

    B1: if (!_instance) {…}
    …
    B2: read _instance->field;

If B1 and B2 execute between A2 and A1, T2 gets the uninitialized value of _instance->field. This violates sequential consistency.
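The claim on this slide can be checked by brute force. The following is only an illustrative sketch, not part of the original talk: it models the four labelled operations on two integer cells and enumerates every interleaving of T1 and T2. The function name `readerCanSeeUninitialized` and the integer encoding of `_instance` (0 for null, 1 for set) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if some interleaving of T1 and T2 lets the reader observe
// _instance as set (at B1) while field is still uninitialized (B2 reads 0).
// t1 lists the order in which T1's two writes become visible:
// "A1" is the write to field, "A2" is the write to _instance.
bool readerCanSeeUninitialized(const std::vector<std::string>& t1) {
    // 'I' runs the next initializer operation, 'R' the next reader operation.
    std::string schedule = "IIRR";  // sorted, so next_permutation visits all
    do {
        int field = 0, instance = 0;  // shared memory, initially zero
        bool sawInstance = false;
        int fieldRead = -1;
        std::size_t i = 0, r = 0;
        for (char who : schedule) {
            if (who == 'I') {
                if (t1[i] == "A1") field = 100; else instance = 1;
                ++i;
            } else if (r++ == 0) {
                sawInstance = (instance != 0);  // B1
            } else {
                fieldRead = field;              // B2
            }
        }
        if (sawInstance && fieldRead == 0) return true;
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return false;
}
```

Called with the program order `{"A1", "A2"}` no interleaving exposes the uninitialized field, while the reordered `{"A2", "A1"}` admits the schedule A2, B1, B2, A1 shown on the slide.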

Bug pattern: Potential Violation of Sequential Consistency (PVSC), so named because these defects might cause an SC violation.
How can PVSC bugs be detected and eliminated?
- Basically, we combine Shasha/Snir’s conflict graph and delay set theory with an existing data race detection scheme.

Our scheme
(1) Construct the race graph
(2) Find cycles in it
  - A cycle in the race graph corresponds to a PVSC bug
(3) Compute the delay set
(4) Insert memory ordering fences

Constructing Race Graph
For all the instructions executed in a particular execution of a program P:
- Add a program order edge between instructions in each thread.
- Add a race edge for each data race.
[Figure: Thread 1 executes wr a, wr b; Thread 2 executes rd b, rd a. Legend: race edge, program order edge.]
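The two construction rules can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Access`, `RaceGraph` and `buildRaceGraph` are hypothetical, the trace is assumed to list accesses in observed order, and the race pairs are assumed to come from an external race detector as index pairs into that trace.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One dynamic memory access observed in the execution.
struct Access {
    int thread;    // id of the executing thread (non-negative)
    char var;      // shared variable accessed, e.g. 'a'
    bool isWrite;  // true for a write, false for a read
};

struct RaceGraph {
    std::vector<Access> nodes;
    std::vector<std::pair<int, int>> programOrder;  // directed: first -> second
    std::vector<std::pair<int, int>> races;         // undirected pairs
};

// Builds the graph from a trace and the race pairs reported by a detector.
RaceGraph buildRaceGraph(const std::vector<Access>& trace,
                         const std::vector<std::pair<int, int>>& racePairs) {
    RaceGraph g;
    g.nodes = trace;
    std::vector<int> lastInThread;  // last node seen per thread, -1 if none
    for (int i = 0; i < static_cast<int>(trace.size()); ++i) {
        int t = trace[i].thread;
        assert(t >= 0);
        if (t >= static_cast<int>(lastInThread.size()))
            lastInThread.resize(t + 1, -1);
        if (lastInThread[t] != -1)
            g.programOrder.push_back({lastInThread[t], i});  // program order edge
        lastInThread[t] = i;
    }
    for (const auto& r : racePairs) {
        // A race joins two accesses issued by different threads.
        assert(trace[r.first].thread != trace[r.second].thread);
        g.races.push_back(r);  // race edge
    }
    return g;
}
```

For the figure on this slide, the trace would be wr a, wr b in Thread 1 followed by rd b, rd a in Thread 2, with race pairs (wr a, rd a) and (wr b, rd b).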

Example 1. Race Graph for DCL
Nodes: A: wr a, B: wr b, C: rd b, D: rd a

Thread 1:

    …
    lock(l);
    if (!_instance) {
        temp = malloc(..);
        temp->field = 100;    // A: wr a
        _instance = temp;     // B: wr b
    }
    unlock(l);
    }

Thread 2:

    if (!_instance) {…}       // C: rd b
    …
    read _instance->field;    // D: rd a

Find cycles in race graph
Theorem 1. A cycle in the race graph corresponds to a PVSC bug.
- Proof: If a cycle is found in the race graph, then it is possible to get a non-sequentially-consistent execution by letting the race order be consistent with the cycle. E.g., we can get a non-SC execution E={B->C, D->A} from the cycle A->B->C->D->A in the previous example.
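One way to look for such cycles is a reachability check, sketched below. It rests on an assumption that is not spelled out on the slide: race edges may be crossed in either direction, and only cycles that contain at least one program order edge count (otherwise every race edge would form a trivial two-node cycle). The names `Edge`, `reaches` and `hasRaceCycle` are hypothetical.

```cpp
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Returns true if `to` is reachable from `from` in the adjacency list.
bool reaches(const std::vector<std::vector<int>>& adj, int from, int to) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> stack{from};
    seen[from] = true;
    while (!stack.empty()) {
        int u = stack.back();
        stack.pop_back();
        if (u == to) return true;
        for (int v : adj[u])
            if (!seen[v]) { seen[v] = true; stack.push_back(v); }
    }
    return false;
}

// n nodes; program order edges are directed, race edges are bidirectional.
// A program order edge u -> v lies on a cycle exactly when v can reach u.
bool hasRaceCycle(int n, const std::vector<Edge>& programOrder,
                  const std::vector<Edge>& races) {
    std::vector<std::vector<int>> adj(n);
    for (const Edge& e : programOrder) adj[e.first].push_back(e.second);
    for (const Edge& e : races) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    for (const Edge& e : programOrder)
        if (reaches(adj, e.second, e.first)) return true;
    return false;
}
```

With the DCL nodes numbered A=0, B=1, C=2, D=3, program order edges (0,1) and (2,3) and race edges (0,3) and (1,2) give the cycle A->B->C->D->A, so the check reports a PVSC.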

Compute delay set
Delay lemma: Any execution should be consistent with a delay set D. [Shasha/Snir]
Theorem 2. Let D be the delay set that contains all the program order edges of the race cycles in the race graph. Then D enforces sequential consistency for the executions that generate D.
- Proof: Omitted

Insert memory ordering fences
A fence instruction delays the issue of an instruction until all previous instructions have completed.
Insert a fence for each delay in D. Then D is enforced, and the detected PVSC is eliminated.
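The insertion step can be illustrated on one thread's instruction list. This is a hedged sketch rather than the tool's actual pass: `Delay` and `insertFences` are hypothetical names, each delay is assumed to be a pair of positions (u, v) in the same thread with u before v, and the fence is placed immediately before v, which is one valid position between u and v. The string `Fence();` follows the notation of the DCL example in these slides.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// A delay (u, v): the access at position u must complete before the access
// at position v is issued. Both positions index the same thread's code.
using Delay = std::pair<int, int>;

// Returns a copy of `code` with one fence placed before the later access of
// every delay. Delays that share a later access share a single fence.
std::vector<std::string> insertFences(const std::vector<std::string>& code,
                                      const std::vector<Delay>& delays) {
    std::set<int> fenceBefore;
    for (const Delay& d : delays) fenceBefore.insert(d.second);

    std::vector<std::string> out;
    for (int i = 0; i < static_cast<int>(code.size()); ++i) {
        if (fenceBefore.count(i)) out.push_back("Fence();");
        out.push_back(code[i]);
    }
    return out;
}
```

For the initializer thread of DCL, the single delay between `temp->field = 100;` and `_instance = temp;` yields a fence between the two writes.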

Examples for the above 3 steps…
[Figure: one thread executes wr a, wr b; the other thread executes rd a, rd b.]
Fig. 1: No cycles, no PVSC, no fence is needed. (This implies that any execution on RC is sequentially consistent, thus we don’t need fences.)

Initially a = b = 0
Thread 1: A: a = 1
Thread 2: B: if (a)  C: b = 1
Thread 3: D: if (b)  E: R1 = a
Fig. 2: Contains a cycle A->B->C->D->E->A, hence a PVSC. It is possible to get the execution {A->B, C->D, E->A}, which violates SC and results in {a=1, b=1, R1=0}. The program order edges on the cycle are B->C and D->E, so if we insert fences between B and C and between D and E, the PVSC is eliminated.

Fig. 3: Corrected version of DCL for a lazily initialized singleton.

    Object* getInstance() {
        Object *tmp = _instance;
        Fence();
        if (!tmp) {
            lock(l);
            tmp = _instance;
            if (!tmp) {
                tmp = new Object();
                Fence();
                _instance = tmp;
            }
            unlock(l);
        }
        return tmp;
    }
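In standard C++11 the same structure can be written with `std::atomic` and `std::atomic_thread_fence`. The sketch below is one possible translation, not code from the talk: it assumes the two `Fence()` calls can be mapped to an acquire fence on the reader path and a release fence on the initializer path, and the member names `instance_`, `lock_` and `field_` are chosen for the example.

```cpp
#include <atomic>
#include <mutex>

class Object {
public:
    static Object* getInstance() {
        Object* tmp = instance_.load(std::memory_order_relaxed);
        // Orders the load above before later reads of the object's fields.
        std::atomic_thread_fence(std::memory_order_acquire);
        if (tmp == nullptr) {
            std::lock_guard<std::mutex> guard(lock_);
            tmp = instance_.load(std::memory_order_relaxed);
            if (tmp == nullptr) {
                tmp = new Object();
                // Orders the constructor's writes before the pointer store.
                std::atomic_thread_fence(std::memory_order_release);
                instance_.store(tmp, std::memory_order_relaxed);
            }
        }
        return tmp;
    }

    int getField() const { return field_; }

private:
    Object() : field_(100) {}

    int field_;
    static std::atomic<Object*> instance_;
    static std::mutex lock_;
};

std::atomic<Object*> Object::instance_{nullptr};
std::mutex Object::lock_;
```

A caller uses it as `int f = Object::getInstance()->getField();`. The singleton is never deleted in this sketch, which keeps the example short.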

Optimization
To handle real-world applications with
- Long execution time
- Many threads
we convert the race graph into a PC race graph:
- Combine nodes with the same PC into one node. The graph contains N nodes, where N equals the number of race access instructions.
- Apply an SCC algorithm to the PC race graph. Each SCC corresponds to a PVSC bug.
This can introduce false negatives.
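The merge-then-SCC step might look like the sketch below. It is an illustration under assumptions, not the authors' code: Kosaraju's algorithm stands in for whichever SCC algorithm the tool uses, race edges are treated as bidirectional, and the names `Edge` and `sccOfPc` are hypothetical. Since two PCs joined only by a race edge always share a component, a caller would presumably report a component only when some program order edge has both endpoints inside it.

```cpp
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// pcOf[i] is the PC of dynamic access i; edges refer to dynamic accesses.
// Returns, for every distinct PC, the id of its strongly connected component.
std::map<int, int> sccOfPc(const std::vector<int>& pcOf,
                           const std::vector<Edge>& programOrder,
                           const std::vector<Edge>& races) {
    std::map<int, int> id;  // PC -> dense node index
    for (int pc : pcOf) id.emplace(pc, static_cast<int>(id.size()));
    int n = static_cast<int>(id.size());

    std::vector<std::vector<int>> fwd(n), rev(n);
    auto add = [&](int u, int v) {
        int a = id[pcOf[u]], b = id[pcOf[v]];
        if (a == b) return;  // both accesses were merged into one node
        fwd[a].push_back(b);
        rev[b].push_back(a);
    };
    for (const Edge& e : programOrder) add(e.first, e.second);
    for (const Edge& e : races) { add(e.first, e.second); add(e.second, e.first); }

    // Pass 1: record nodes in order of DFS completion on the forward graph.
    std::vector<int> order, comp(n, -1);
    std::vector<bool> seen(n, false);
    std::function<void(int)> visit = [&](int u) {
        seen[u] = true;
        for (int v : fwd[u]) if (!seen[v]) visit(v);
        order.push_back(u);
    };
    for (int u = 0; u < n; ++u) if (!seen[u]) visit(u);

    // Pass 2: sweep the reversed graph in decreasing completion order.
    std::function<void(int, int)> assign = [&](int u, int c) {
        comp[u] = c;
        for (int v : rev[u]) if (comp[v] == -1) assign(v, c);
    };
    int count = 0;
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        if (comp[*it] == -1) assign(*it, count++);

    std::map<int, int> result;  // PC -> component id
    for (const auto& kv : id) result[kv.first] = comp[kv.second];
    return result;
}
```

If two reader threads execute the same DCL read instructions, their accesses collapse onto the same two PC nodes, so the graph stays at four nodes regardless of how many readers ran.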

Results
- Detected PVSC bugs
- Performance loss after fence insertion
- Cost of PVSC detection over race detection

Part of detected bugs
- MySQL 5.0.x, sql/slave.c, handle_slave_io(): Assertion in slave shutdown. mi->slave_running=0 could become visible to other threads before the cleanup is completed, which triggers an assertion during slave shutdown.
- httpd 2.2.x, modules/cache/mod_cache.c, cache_store_content(): store_header() might become visible to other threads before store_body(), so mod_cache might serve old content even though new content has been fetched.
- httpd 2.2.x, prefork/prefork.c, ap_mpm_run(): restart_pending = shutdown_pending = 0; might become visible to child threads after set_signals(), so if httpd receives SIGTERM while child processes are being spawned, the signal is ignored.

Performance loss on SPLASH-2
[Figure 10: Performance on Intel Itanium SMP]

Cost over data race detection
[Figure 13: Cost of PVSC detection over different race detection algorithms]

Related Work
- Compiler analysis: Conservative for C/C++ programs; inserts many redundant fences, which hurts performance severely. [K.Yelick@ucb, S.Midkiff@purdue]
- Verification: Enumerates all possible executions that fit an RC model. Does not scale to large applications. [S.Burckhardt@msr]
- Data race detection: Does not address the problem of SC violation. [many]
- Other concurrency bugs: Atomicity [AVIO, yyzhou], Correlation [MUVI, yyzhou]; these do not consider the PVSC problem.

Conclusion
An effective and efficient scheme to detect Potential Violation of Sequential Consistency for concurrent C/C++ programs.
- Easy to port to mature data race detection tools.
- Retains performance after PVSC elimination.
- Scalable and low-cost.
Current limitations
- Dynamic data race detection limitations: false positives and false negatives.
  - Can be addressed with progress in data race detection
- Loop

Thanks! Suggestions?
