RaceMob : Crowd-sourced Data Race Detection

Presentation transcript:

RaceMob : Crowd-sourced Data Race Detection Baris Kasikci, Cristian Zamfir and George Candea EPFL, Switzerland To appear in the Symposium on Operating Systems Principles (SOSP), November 2013 Presented by : Abhay Rao Bhadriraju

Introduction: Data races are the perfect “corner case”. They rarely affect users: they occur in low-probability thread interleavings and may not have visibly harmful effects. But when they do, the effects are catastrophic. Compiler optimizations can turn benign races into harmful ones. Isolating and fixing a race takes a large amount of time. Data race detectors can help locate data races quickly.

Introduction: Dynamic data race detectors instrument binaries to monitor memory accesses and synchronization operations at runtime. Pros: few false positives. Cons: false negatives, because they miss races not seen in the observed executions (a large number of executions is needed), and high overhead (30x for Google ThreadSanitizer). Static data race detectors establish relationships between accesses to isolate concurrent accesses. Approaches: happens-before based, lockset based. Pros: fast and scalable. Cons: a large number of false positives.

Background: Challenges for data race detectors. Runtime overhead: dynamic detectors monitor each read/write access and synchronization operation. Sampling-based detectors (e.g. PACER) try to reduce this overhead but introduce false negatives; with thread escape analysis (e.g. Goldilocks) the overhead is still high. False negatives: dynamic detectors can only detect races in the executions they witness, and may infer incorrect happens-before relationships from a particular interleaving. False positives: static detectors like RELAY generate a large number of false positives (84%).

Background: There are 3 main sources of false positives in static detectors: (1) they cannot always tell whether program contexts are multithreaded or not; (2) they handle lock/unlock primitives but not other primitives, e.g. barriers, semaphores, or wait/notify constructs; (3) memory access aliasing. One method to cope is unsound filtering, which introduces false negatives (e.g. RacerX). Hybrid detectors (e.g. TSAN) combine happens-before and lockset algorithms to infer data races. They generate false positives due to imprecise lockset analysis and cannot explore the consequences of races.

RaceMob Overview: Combines static analysis (accuracy) and low-overhead dynamic detection (precision) in a 2-phase static-dynamic data race detector. Low overhead comes from “on-demand” dynamic data race validation and “always-on” detection. RaceMob uses a “hive” of computers to conduct the static analysis and to interact with the instrumentation in binaries deployed at production users. It is a crowd-sourcing framework that taps real user executions to detect races.

Design Overview: Phase 1 is static analysis. Phase 2 is dynamic validation.

Phase 1, Static Analysis: RaceMob can use any static data race detector; the authors used RELAY, which is lockset based. RELAY works bottom-up on the control flow graph and computes function summaries for variable accesses and locks. A candidate race is flagged when there are at least 2 accesses, at least 1 of them is a write, and the accesses hold no common lock (i.e. the lockset is empty). RELAY is complete provided the program uses no assembly and no pointer arithmetic (these can be accounted for). RaceMob instruments the suspected accesses and all synchronization operations.
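
The flagging rule above can be illustrated with a minimal Python sketch. It is not RELAY's implementation: the `Access` record, the `candidate_races` function, and the sample source locations are hypothetical names chosen for illustration, and the locksets are assumed to have been computed already by a static analysis.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    location: str        # hypothetical source location label
    variable: str        # shared variable being accessed
    is_write: bool       # True for a write, False for a read
    lockset: frozenset   # locks assumed to be held at this access


def candidate_races(accesses):
    """Return access pairs on the same variable with >= 1 write and no common lock."""
    candidates = []
    for a, b in combinations(accesses, 2):
        if a.variable != b.variable:
            continue                      # different variables cannot race
        if not (a.is_write or b.is_write):
            continue                      # two reads never race
        if a.lockset & b.lockset:
            continue                      # a common lock protects both accesses
        candidates.append((a, b))
    return candidates


accesses = [
    Access("worker.c:10", "counter", True, frozenset({"m"})),
    Access("worker.c:42", "counter", False, frozenset()),
    Access("stats.c:7", "counter", False, frozenset({"m"})),
]
flagged = candidate_races(accesses)
```

In this example only the pair (`worker.c:10`, `worker.c:42`) is flagged: the write at line 10 and the read in `stats.c` share lock `m`, and the two reads cannot race with each other.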

Phase 2, Dynamic Validation: Instrumented binaries are downloaded and run in production by users. The hive commands the instrumentation to perform “on-demand” data race detection. Validation is distributed uniformly: the hive sends each user 1 candidate race and 1 desired order of accesses, either primary or alternate. Validation has 3 stages: dynamic context inference, on-demand data race detection, and schedule steering. The hive promotes a candidate race through states as it passes each stage.

Phase 2, Dynamic Validation: Stage 1, dynamic context inference. For candidate racing accesses T1:r1:a1 and T2:r2:a2 (thread T, racing instruction r, accessed address a), check whether both accesses touch the same address (a1 = a2?) and come from different threads (T1 != T2?). Motivation: many false positives fail one of these checks. Stage 2, on-demand data race detection. If the candidate is promoted, try to establish a thread-level happens-before relationship between the accesses. Either a relationship is found (“NoRace”) or the second access occurs without one (“Race”). This mitigates the drawback of sampling-based detection: synchronization operations are tracked only if necessary.
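
The two questions asked by dynamic context inference can be written as a tiny Python check. This is only a sketch: the function name `passes_context_inference` and the `(thread_id, address)` tuple layout are assumptions made for illustration, not RaceMob's instrumentation interface.

```python
def passes_context_inference(first, second):
    """Check a1 == a2 and T1 != T2 for two observed accesses.

    Each argument is a (thread_id, address) pair recorded at runtime
    (hypothetical layout, chosen for this sketch).
    """
    t1, a1 = first
    t2, a2 = second
    same_address = a1 == a2        # do the accesses really alias?
    different_threads = t1 != t2   # is the context really multithreaded?
    return same_address and different_threads


promoted = passes_context_inference((1, 0x7F00), (2, 0x7F00))
same_thread = passes_context_inference((1, 0x7F00), (1, 0x7F00))
no_alias = passes_context_inference((1, 0x7F00), (2, 0x7F08))
```

Only the first pair would be promoted to the next stage; the other two correspond to the single-threaded-context and aliasing false positives mentioned in the slides.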

Phase 2, Dynamic Validation: Synchronization operations are tracked to establish a happens-before relationship, using a dynamic vector-clock algorithm (a clock is maintained for each thread and each synchronization operation). The algorithm needs to run only until the first synchronization operation that provides exclusion between r1 and r2. Stage 3, schedule steering. It tries to enforce the order provided by the hive using wait() with a timeout that increases every time the order is violated. The user can turn it off. Schedule steering was previously used to detect failures due to known races.
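
A minimal Python sketch of a vector-clock happens-before check follows. It shows the general technique, not RaceMob's code: the `VectorClock` class, the `ordered` and `validate` helpers, and the single-lock scenario are all assumptions made for illustration.

```python
class VectorClock:
    """Maps thread ids to logical timestamps."""

    def __init__(self, clocks=None):
        self.clocks = dict(clocks or {})

    def tick(self, tid):
        # A thread advances its own component on each event.
        self.clocks[tid] = self.clocks.get(tid, 0) + 1

    def join(self, other):
        # Component-wise maximum, used when synchronization transfers knowledge.
        for tid, value in other.clocks.items():
            self.clocks[tid] = max(self.clocks.get(tid, 0), value)

    def snapshot(self):
        return VectorClock(self.clocks)

    def leq(self, other):
        # self <= other in every component means self happened before (or equals) other.
        return all(v <= other.clocks.get(t, 0) for t, v in self.clocks.items())


def ordered(a, b):
    """True if the two events are related by happens-before in either direction."""
    return a.leq(b) or b.leq(a)


def validate(with_lock):
    """Replay two accesses, optionally separated by a lock release/acquire."""
    t1, t2, lock = VectorClock(), VectorClock(), VectorClock()
    t1.tick("T1")
    first = t1.snapshot()          # T1 performs the first access
    if with_lock:
        lock.join(t1)              # T1 releases the lock
        t1.tick("T1")
        t2.join(lock)              # T2 acquires the same lock
    t2.tick("T2")
    second = t2.snapshot()         # T2 performs the second access
    return "NoRace" if ordered(first, second) else "Race"


with_sync = validate(with_lock=True)
without_sync = validate(with_lock=False)
```

With the release/acquire pair the first access's clock is contained in the second's, so the verdict is "NoRace"; without it the clocks are incomparable and the accesses are concurrent.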

Crowd-sourcing Validation: Motivation: access to real user executions, which expose input-dependent races. The overhead is distributed, so per-user overhead is minimal. A small timeout for schedule steering also keeps overhead low. Assignment policy: initially, random assignment (1 candidate per user). A candidate is distributed to additional users if its time bound is exceeded, and candidates are reshuffled after multiple timeouts. The time spent on a race serves as the detection workload metric. Other policies: assign by severity, or distribute the same candidate to all users.
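
The initial assignment policy can be sketched in Python as below. This is an illustrative assumption about how such a policy might look, not the hive's actual scheduler: `initial_assignment`, `needs_more_users`, the user and candidate labels, and the 10-second bound are all hypothetical.

```python
import random


def initial_assignment(candidates, users, rng):
    """Give each user at most one randomly chosen candidate race."""
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    # zip stops at the shorter sequence, so no user gets more than one candidate.
    return dict(zip(users, shuffled))


def needs_more_users(time_spent, time_bound):
    """Candidates whose validation time exceeded the bound (workload metric)."""
    return [race for race, spent in time_spent.items() if spent > time_bound]


assignment = initial_assignment(["r1", "r2", "r3"], ["u1", "u2"], random.Random(0))
overdue = needs_more_users({"r1": 5.0, "r2": 30.0}, time_bound=10.0)
```

Here `overdue` contains only `r2`, which would then be handed to additional users.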

Verdict on Data Races: True Race: no happens-before relationship in the primary or alternate executions; this verdict is definitive. Likely False Positive: at least 1 Timeout, at least 1 NoRace; the candidate can be tested ad infinitum. Unknown: no results received from validation (no Timeouts, no NoRace). NotSeen: the user program terminates without completing the task. User feedback.
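
One way to read the verdict rules is as a function over the reports the hive has received for a candidate. The Python sketch below is an interpretation under stated assumptions: the `verdict` function is hypothetical, a single "Race" report is assumed to take precedence, and a Timeout or a NoRace alone is assumed to be enough for "Likely False Positive".

```python
def verdict(reports):
    """Classify a candidate from the list of per-user report strings.

    Assumed report values: "Race", "NoRace", "Timeout", "NotSeen".
    """
    if "Race" in reports:
        return "True Race"               # definitive: the race was observed
    if "Timeout" in reports or "NoRace" in reports:
        return "Likely False Positive"   # may be retested indefinitely
    return "Unknown"                     # nothing conclusive received yet


observed = verdict(["NoRace", "Race"])
unconfirmed = verdict(["Timeout", "NotSeen"])
pending = verdict(["NotSeen"])
```

Under these assumptions a candidate stays "Unknown" while users only report NotSeen or nothing at all.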

Implementation: Low contention with the application. Static analysis: RELAY, etc. Instrumentation: LLVM. Optimization: empty-body loops whose entry condition is a potential race are reported directly.

Evaluation: Criteria: effectiveness, efficiency, comparison to the state of the art, and scalability as the number of threads goes up. Notable applications: Apache Server, Memcached, SQLite, Aget. Setup: 1754 simulated user sites. Evaluating for false negatives: in Phase 1, manual checking for known sources of false negatives in RELAY; in Phase 2, schedule steering timeouts.

Evaluation: Input-dependent races: e.g. 2 in Aget. Schedule steering: memcached and pfscan (1 each). Unknown: candidates not encountered at runtime because the functions never ran.

Efficiency: Phase 1 overheads: offline, 3 minutes to 1 hour (Apache, SQLite). Phase 2 overheads: depend on how frequently the instrumentation code executes. Instrumentation: small, always on. Detection (DCI): also small, can be on for all executions. Removing instrumentation from empty loop bodies has a significant effect.

Evaluation: Main comparison: RaceMob vs. TSAN (actively maintained, freely available). RELAY is static, TSAN is dynamic, RaceMob is hybrid. TSAN was run as a dynamic detector with hybrid mode disabled (no static analysis, so fewer false positives), with no schedule steering and no crowdsourcing (no access to real executions). TSAN was “given the benefit of all executions”: #TSAN executions = #RaceMob crowd-sourced executions.

Evaluation: Concurrency testing with schedule steering. RaceFuzzer is a hybrid detector with a random scheduler. Comparison: RaceMob vs. TSAN (hybrid mode + RaceFuzzer's random scheduler). Setup: detect races in 4 benchmarks for 3 known inputs. Existing tools such as RaceFuzzer use a similar approach, but their high overhead makes them unsuitable for production.

Evaluation (Scalability): The application workload was varied (e.g. concurrent file requests to the Apache server). Runtime overhead remains low as the number of threads increases. Overhead peaks when #threads = #cores.

Discussion: RaceMob may make some users' lives harder (production failures); this may be mitigated with rewards for users. RaceMob augments in-house testing by providing real user executions. The search space for the race detector is reduced to code that is used in real executions. There is a trade-off between per-user and aggregate overhead. Data race bugs may have an exorbitant cost if left undetected or unpatched. Users may lose data due to detection, which is an additional per-user cost.

Limitations: RaceMob has to be aware of all synchronization constructs. Malicious users may send false reports (mitigated by cross-verification). Privacy: execution information is sent to the hive. The application can be remotely controlled by the developer.

Related Work: Static and dynamic data race detection. Hybrid data race detection and thread escape analysis. Sampling-based race detection. Schedule steering to verify failures due to known races. Cooperative bug isolation, which collects information about runs to determine the causes of failures (e.g. Windows Error Reporting, CCI).

Opinions: Skeptical about production users' willingness to allow failures and be “guinea pigs” (though they can turn off schedule steering). Combines existing techniques effectively. Extensive experiments and performance comparisons. Detects real data races in production code, which is valuable information. Detection performance is better than the state of the art, with an emphasis on always-on detection.

Questions ?

Thank You