RaceMob: Crowd-sourced Data Race Detection
Baris Kasikci, Cristian Zamfir and George Candea, EPFL, Switzerland
To appear in the Symposium on Operating Systems Principles (SOSP), November 2013
Presented by: Abhay Rao Bhadriraju
Introduction: Data Races – the perfect “corner case”
Rarely affect users: they occur in low-probability thread interleavings and may not have visibly harmful effects
But when they do, the effects are catastrophic
Compiler optimizations can turn benign races into harmful ones
Isolating and fixing a race takes a large amount of time
Data race detectors can help locate data races quickly
Introduction: Dynamic Data Race Detectors
Instrument binaries to monitor memory accesses and synchronization operations at runtime
Pros – few false positives
Cons – false negatives, high runtime overhead (30x for Google ThreadSanitizer), and missed races in executions they never witness (covering them would require a large number of executions)
Static Data Race Detectors:
Establish relationships between accesses to identify potentially concurrent accesses
Happens-before based, lockset based
Pros – fast and scalable
Cons – large number of false positives
Background: Challenges for data race detectors
Runtime overhead: dynamic detectors monitor each read/write access and synchronization operation
Sampling-based detectors try to reduce this, but introduce false negatives (e.g., PACER)
Thread-escape analysis helps, but overhead is still high (e.g., Goldilocks)
False negatives: dynamic detectors can only detect races in executions they witness
They may infer incorrect happens-before relationships from a particular interleaving
False positives: static detectors like RELAY generate a large number of false positives (84%)
Background: Hybrid detectors
3 main sources of false positives in static detectors:
not knowing whether program contexts are multithreaded
handling lock/unlock primitives but not other primitives, e.g., barriers, semaphores, or wait/notify constructs
memory access aliasing
One way to cope is unsound filtering, which introduces false negatives (e.g., RacerX)
Hybrid detectors:
Combine happens-before and lockset algorithms to infer data races
Generate false positives due to imprecise lockset analysis
Cannot explore the consequences of races (e.g., TSAN)
RaceMob – Overview
Combines static analysis (for coverage) with low-overhead dynamic detection (for precision)
A 2-phase static–dynamic data race detector
Low overhead: “on-demand” dynamic data race validation and “always-on” detection
Uses a “hive” of computers to:
conduct static analysis
interact with the instrumentation in binaries deployed at production user sites
A crowd-sourcing framework that taps real user executions to detect races
Design – Overview
Phase 1 – Static Analysis
Phase 2 – Dynamic Validation
Phase 1 – Static Analysis
Can use any static data race detector; RaceMob uses RELAY (lockset-based)
Bottom-up analysis on the control-flow graph
Computes function summaries for variable accesses and locks held
A pair of accesses is flagged when there are >= 2 accesses, at least 1 is a write, and no common lock is held (i.e., the lockset intersection is empty) – a minimal sketch of this criterion follows after this slide
RELAY is complete as long as the program has no assembly and no pointer arithmetic (both can be accounted for)
RaceMob instruments the suspected accesses and all synchronization operations
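A minimal sketch of the lockset criterion described above (the struct and field names are illustrative, not RELAY's actual data structures): two accesses to the same location form a race candidate when at least one is a write and the intersection of the locksets held at the two accesses is empty.

```cpp
#include <set>
#include <string>

// Illustrative summary of one memory access, as a lockset-based
// analysis might record it (hypothetical representation).
struct Access {
    std::string location;             // variable or abstract memory location
    bool isWrite;                     // true if this access writes the location
    std::set<std::string> locksHeld;  // locks held on every path to this access
};

// A pair of accesses is a race candidate iff they touch the same
// location, at least one writes it, and no common lock guards both
// (i.e., the intersection of their locksets is empty).
bool isRaceCandidate(const Access& a, const Access& b) {
    if (a.location != b.location) return false;
    if (!a.isWrite && !b.isWrite) return false;     // need at least one write
    for (const auto& lock : a.locksHeld)
        if (b.locksHeld.count(lock)) return false;  // common lock found
    return true;                                    // empty lockset intersection
}
```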
Phase 2 – Dynamic Validation
Instrumented binaries are downloaded and run in production by users
The instrumentation is commanded by the hive to perform “on-demand” data race detection
Validation is uniformly distributed: the hive sends each user one candidate race and one desired order of accesses
The desired order of accesses is either primary or alternate
3 stages:
Dynamic context inference
On-demand data race detection
Schedule steering
The hive promotes a candidate race through states as it passes each stage
Phase 2 – Dynamic Validation
Stage 1 – Dynamic context inference:
For candidate racing accesses T1:r1:a1 and T2:r2:a2, check at runtime whether the accesses alias (a1 = a2) and come from different threads (T1 != T2); a minimal sketch of this check follows after this slide
Motivation: many false positives fall into these categories (same thread, or non-aliasing accesses)
Stage 2 – On-demand data race detection:
If the candidate is promoted, try to establish a thread-level happens-before relationship between the accesses
Either a relationship is found (“NoRace”) or the 2nd access occurs without one (“Race”)
Mitigates the drawback of sampling-based detection: synchronization operations are tracked only when necessary
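A sketch of what dynamic context inference boils down to at runtime (the struct and function names are hypothetical, not RaceMob's source): before any heavier machinery is engaged, the instrumentation simply checks that the two statically reported accesses really alias and really come from different threads.

```cpp
#include <pthread.h>

// One dynamically observed access at an instrumented program point
// (a hypothetical representation).
struct ObservedAccess {
    pthread_t thread;   // thread that executed the access (T1 / T2)
    const void* addr;   // address actually touched (a1 / a2)
};

// Dynamic context inference: promote the candidate only if the two
// accesses alias (a1 == a2) and were made by different threads
// (T1 != T2). Many static false positives fail one of these checks.
bool passesDynamicContextInference(const ObservedAccess& first,
                                   const ObservedAccess& second) {
    return first.addr == second.addr &&
           !pthread_equal(first.thread, second.thread);
}
```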
Phase 2 – Dynamic Validation
Synchronization operations are tracked to look for a happens-before relationship
Uses a dynamic vector-clock algorithm (maintains a clock for each thread and each synchronization object); see the sketch after this slide
The algorithm need only run until the first synchronization operation that provides exclusion between r1 and r2
Stage 3 – Schedule steering:
Tries to enforce the order requested by the hive using a wait() whose timeout increases each time the order is violated
Can be turned off by the user
Schedule steering was previously used to detect failures due to known races
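A minimal vector-clock sketch of the happens-before tracking described above, assuming locks are the only synchronization construct (RaceMob handles more; all identifiers here are illustrative). Each thread and each lock carries a clock: release copies the releasing thread's knowledge into the lock, acquire merges the lock's knowledge into the acquirer, and access r1 happens-before access r2 iff r1's timestamp is component-wise <= the clock of r2's thread at the time of r2.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

using VClock = std::vector<unsigned>;  // one component per thread

// Merge 'src' into 'dst' component-wise: dst := max(dst, src).
static void join(VClock& dst, const VClock& src) {
    for (size_t i = 0; i < dst.size(); ++i)
        dst[i] = std::max(dst[i], src[i]);
}

struct VectorClocks {
    std::vector<VClock> threadClock;          // one clock per thread
    std::map<const void*, VClock> lockClock;  // one clock per lock object

    explicit VectorClocks(size_t nThreads)
        : threadClock(nThreads, VClock(nThreads, 0)) {
        for (size_t t = 0; t < nThreads; ++t)
            threadClock[t][t] = 1;            // each thread starts its own epoch
    }

    // On unlock: the lock remembers everything the thread has seen,
    // then the thread advances its own component.
    void release(size_t tid, const void* lock) {
        auto it = lockClock.find(lock);
        if (it == lockClock.end())
            lockClock[lock] = threadClock[tid];
        else
            join(it->second, threadClock[tid]);
        ++threadClock[tid][tid];
    }

    // On lock: the thread learns everything the lock remembers.
    void acquire(size_t tid, const void* lock) {
        auto it = lockClock.find(lock);
        if (it != lockClock.end())
            join(threadClock[tid], it->second);
    }

    // Timestamp of an access = snapshot of the accessing thread's clock.
    VClock snapshot(size_t tid) const { return threadClock[tid]; }
};

// r1 happens-before r2 iff r1's timestamp is component-wise <= r2's.
bool happensBefore(const VClock& r1, const VClock& r2) {
    for (size_t i = 0; i < r1.size(); ++i)
        if (r1[i] > r2[i]) return false;
    return true;
}
```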
Crowd-sourcing Validation
Motivation: access to real user executions exposes input-dependent races
Overhead is distributed across users; per-user overhead is minimal
Low overhead: small timeout for schedule steering
Assignment policy (a sketch follows after this slide):
Initial: random assignment (1 candidate per user)
A candidate is distributed to additional users if a time bound is exceeded, and reshuffled after multiple timeouts
Time spent on a race is used as the detection-workload metric
Other policies: assign by severity, distribute the same candidate to all users
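A sketch of the hive-side assignment policy as this slide describes it; the structures and the single-user escalation step are my guesses at an implementation, not RaceMob's code.

```cpp
#include <chrono>
#include <random>
#include <vector>

// Hypothetical hive-side bookkeeping for one candidate race.
struct Candidate {
    int id;
    std::vector<int> assignedUsers;               // users validating it
    std::chrono::steady_clock::time_point start;  // when validation began
};

// Initial policy: each candidate goes to one randomly chosen user.
void assignInitially(std::vector<Candidate>& cands, int numUsers,
                     std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, numUsers - 1);
    auto now = std::chrono::steady_clock::now();
    for (auto& c : cands) {
        c.assignedUsers = {pick(rng)};
        c.start = now;
    }
}

// Escalation: a candidate whose validation has exceeded the time
// bound is handed to an additional randomly chosen user. Time spent
// on the race serves as the detection-workload metric.
void escalateIfOverdue(Candidate& c, int numUsers, std::mt19937& rng,
                       std::chrono::seconds bound) {
    auto spent = std::chrono::steady_clock::now() - c.start;
    if (spent > bound) {
        std::uniform_int_distribution<int> pick(0, numUsers - 1);
        c.assignedUsers.push_back(pick(rng));
    }
}
```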
Verdict on Data Races
True race: no happens-before in the primary or alternate execution – definitive
Likely false positive: at least 1 timeout and at least 1 NoRace; can be tested ad infinitum
Unknown: no results received from validation – no timeouts, no NoRace
NotSeen: the user program terminates without completing its validation task
Verdicts can also incorporate user feedback
A sketch of this classification follows below.
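The verdict rules on this slide can be summarized as a small classification function (a sketch of the slide's rules with illustrative names, not RaceMob's source):

```cpp
// Aggregated validation results for one candidate race
// (field names are illustrative).
struct ValidationResults {
    bool raceObserved;  // a "Race" report: 2nd access with no happens-before
    int numNoRace;      // "NoRace" reports: a happens-before was established
    int numTimeouts;    // schedule-steering timeouts
};

enum class Verdict { TrueRace, LikelyFalsePositive, Unknown };

Verdict classify(const ValidationResults& r) {
    // Definitive: no happens-before was found in the primary or
    // alternate order, so the race truly occurred.
    if (r.raceObserved) return Verdict::TrueRace;
    // At least one timeout and at least one NoRace: likely a false
    // positive, though it can be tested ad infinitum.
    if (r.numTimeouts >= 1 && r.numNoRace >= 1)
        return Verdict::LikelyFalsePositive;
    // No results received, or no timeouts and no NoRace: unknown.
    return Verdict::Unknown;
}
```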
Implementation
Low contention with the application
Static analysis: RELAY (other detectors possible)
Instrumentation: LLVM
Optimization: empty-body loops whose entry condition may race are reported directly rather than dynamically validated
Evaluation
Goals: effectiveness, efficiency, comparison to the state of the art, scalability as the number of threads goes up
Notable applications: Apache server, Memcached, SQLite, Aget
Setup: 1,754 simulated user sites
Evaluating for false negatives:
Phase 1 – manual checking for known sources of false negatives in RELAY
Phase 2 – schedule-steering timeouts
Evaluation
Input-dependent races: e.g., 2 in Aget
Races caught via schedule steering: Memcached and pfscan (1 each)
Unknown verdicts: accesses not encountered at runtime (the enclosing functions never ran)
Efficiency
Phase 1 overheads (offline): 3 minutes to 1 hour (Apache, SQLite)
Phase 2 overheads: determined by how frequently the instrumentation code executes
Instrumentation overhead: small – can be always-on
Detection (DCI) overhead: also small – can be on for all executions
Removing instrumentation from empty loop bodies yields a significant reduction
Evaluation
Main comparison: RaceMob vs. TSAN (actively maintained, freely available)
RELAY: static, TSAN: dynamic, RaceMob: hybrid
TSAN configuration: dynamic detector with hybrid mode disabled (no static analysis, so fewer false positives), no schedule steering, no crowdsourcing (no access to real executions)
TSAN is “given the benefit of all executions”: #TSAN executions = #RaceMob crowd-sourced executions
Evaluation: Concurrency Testing – Schedule Steering
RaceFuzzer: a hybrid detector with a random scheduler
Comparison: RaceMob vs. TSAN in hybrid mode plus a RaceFuzzer-style random scheduler
Setup: detect races in 4 benchmarks with 3 known inputs
Existing tools use a similar approach (e.g., RaceFuzzer), but their high overhead makes them unsuitable for production
Evaluation – Scalability
Application workload varied (e.g., concurrent file requests to the Apache server)
Runtime overhead remains low as the number of threads increases
Overhead peaks when #threads = #cores
Discussion
May make some users’ lives harder (failures in production) – could be mitigated with rewards for users
RaceMob augments in-house testing by providing real user executions
The race detector’s search space is reduced to code exercised in real executions
Per-user vs. aggregate overhead trade-off:
Data race bugs may have an exorbitant cost if left undetected/unpatched
Users may lose data due to detection – an additional per-user cost
Limitations
RaceMob has to be aware of all synchronization constructs
Malicious users and false reports (mitigated by cross-verification)
Privacy: execution information is sent to the hive
The application effectively becomes remotely controllable by the developer
Related Work
Static and dynamic data race detection
Hybrid data race detection and thread-escape analysis
Sampling-based race detection
Schedule steering to verify failures due to known races
Cooperative bug isolation – collects information about runs to determine the causes of failures, e.g., Windows Error Reporting, CCI
Opinions
Skeptical about production users’ willingness to be “guinea pigs” and allow induced failures (though they can turn off schedule steering)
Combines existing techniques effectively
Extensive experiments and performance comparisons
Detects real data races in production code – valuable information
Detection performance better than the state of the art, with an emphasis on always-on detection
Questions?
Thank You