Lazy Diagnosis of In-Production Concurrency Bugs

Presentation transcript:

Lazy Diagnosis of In-Production Concurrency Bugs
Baris Kasikci, Weidong Cui, Xinyang Ge, Ben Niu

Why Does In-Production Bug Diagnosis Matter?
- Potential to fix bugs that impact users
- Short release cycles make in-house testing challenging
- Release cycles can be as frequent as a few times a day [1]
[1] https://code.facebook.com/posts/270314900139291/rapid-release-at-massive-scale

Concurrency Bug Diagnosis
[Figure: atomicity violation between two threads over time. Thread 1: if (*x) { y = *x; } Thread 2: free(x); x = NULL;]
Concurrency bug diagnosis requires knowing the order of key events (e.g., memory accesses)
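
A minimal C++ sketch of this atomicity violation (the shared pointer x, the check-then-use in one thread, and the free/NULL-out in the other come from the slide; the threading harness, initialization, and the value 42 are illustrative):

// Sketch only: Thread 1's check of *x and its use of *x are not atomic with
// respect to Thread 2's free/NULL-out, so the interleaving on the slide can
// dereference freed (or NULL) memory and crash.
#include <cstdlib>
#include <thread>

int *x = (int *)std::malloc(sizeof(int));
int y;

void thread1() {
    if (*x) {        // Thread 2 may run free(x); x = NULL; right after this check
        y = *x;      // potential use-after-free / NULL dereference
    }
}

void thread2() {
    std::free(x);
    x = nullptr;
}

int main() {
    *x = 42;
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
    return 0;
}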

Challenges of Concurrency Bug Diagnosis
- Diagnosis requires reproducing bugs [PBI, ASPLOS'13] [Gist, SOSP'15]
- Practitioners report that they can fix reproducible bugs [PLATEAU'14]
- It may not be possible to reproduce in-production concurrency bugs
  - Inputs for reproducing bugs may not be available
  - Exposing bugs in production may incur high overhead [RaceMob, SOSP'13]

Record/Replay
- Tracing fine-grained interleavings incurs high overhead
- State-of-the-art record/replay has 28% overhead [DoublePlay, ASPLOS'11]
[Figure: atomicity violation timeline showing the gaps ΔT1 and ΔT2 between the racing accesses of two threads]
In theory, ΔT can be on the order of a nanosecond

Coarse Interleaving Hypothesis
- Study with 54 bugs in 13 systems
- Smallest ΔT is 91 microseconds, roughly 10^5 times larger than the ~1 ns possible in theory
[Figure: atomicity violation timeline with the gaps ΔT1 and ΔT2 between the racing accesses of two threads]
A lightweight, coarse-grained time tracking mechanism can help infer ordering
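
A minimal sketch of the intuition: if two accesses are farther apart than the timer's resolution, even coarse timestamps give their true order. The granularity value and the inferOrder helper below are illustrative assumptions, not part of the system:

#include <cstdint>
#include <cstdio>

constexpr uint64_t kGranularityNs = 50'000;  // assumed: tens of microseconds

enum class Order { FirstBeforeSecond, SecondBeforeFirst, Unknown };

// Compare two coarse timestamps; refuse to order events that are closer
// together than the timer can resolve.
Order inferOrder(uint64_t t1Ns, uint64_t t2Ns) {
    uint64_t delta = (t1Ns > t2Ns) ? t1Ns - t2Ns : t2Ns - t1Ns;
    if (delta < kGranularityNs) return Order::Unknown;
    return (t1Ns < t2Ns) ? Order::FirstBeforeSecond : Order::SecondBeforeFirst;
}

int main() {
    // Accesses 91 us apart (the smallest gap observed in the study) are
    // comfortably resolvable at this granularity.
    std::printf("%d\n", static_cast<int>(inferOrder(1'000'000, 1'091'000)));
}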

Lazy Diagnosis
- Leverages the coarse interleaving hypothesis
- Hybrid dynamic/static root cause diagnosis technique
Snorlax: Lazy Diagnosis prototype
- Fully accurate concurrency bug diagnosis (11 bugs in 7 systems)
- Low overhead (always below 2%)

Outline Usage model Design Evaluation

Current Bug Diagnosis Model Root cause diagnosis

Lazy Diagnosis Usage Model
[Diagram: root cause diagnosis now takes the root cause plus a control-flow trace and timing info]
- Control flow trace speeds up static analysis
- Coarse-grained timing information helps determine ordering
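
A hypothetical shape for one entry of such a control-flow + timing trace (the struct and field names below are assumptions for illustration, not Snorlax's actual format):

#include <cstdint>
#include <vector>

// Hypothetical trace record: which control-flow element executed, on which
// thread, and a coarse timestamp used only to order far-apart events.
struct TraceEvent {
    uint32_t threadId;       // thread that executed the block
    uint64_t blockId;        // basic-block / instruction identifier
    uint64_t coarseTimeNs;   // coarse timestamp, tens-of-microseconds granularity
};

int main() {
    std::vector<TraceEvent> trace;
    trace.push_back({1, 0x42, 1'000'000});  // illustrative entry
    return 0;
}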

Outline Usage model Design Evaluation

Lazy Diagnosis pipeline: Hybrid Points-to Analysis → Type-based Ranking → Bug Pattern Computation → Statistical Diagnosis

Hybrid Points-to Analysis
Finds instructions with operands pointing to the same location as the failing instruction's operand
[Figure: the failing instruction load %Queue*, %fifo leading to the FAILURE (CRASH), and candidate instructions store i32* %21, %bufSize and store %Queue* %1, %q]

Hybrid Points-To Analysis
- Uses the control flow traces to limit the scope of static analysis
- Runs fast, scales to large programs (e.g., httpd, MySQL)
- Lazy: control flow traces trigger the analysis
- Interprocedural: bug patterns may span multiple functions
- Flow-insensitive: discards execution order of instructions for scalability
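
A rough sketch of this filtering step, with the may-alias relation approximated by points-to set identity. The Instr layout, the candidates() helper, and the function names in main are illustrative assumptions, not the actual implementation:

#include <cstdio>
#include <set>
#include <string>
#include <vector>

struct Instr {
    std::string function;   // enclosing function
    std::string text;       // e.g., "store %Queue* %1, %q"
    int pointsToSetId;      // abstract objects the pointer operand may refer to
};

// Lazy, interprocedural, flow-insensitive filter: keep instructions that are
// (a) in functions appearing in the control-flow trace and (b) may touch the
// same location as the failing instruction's operand.
std::vector<Instr> candidates(const Instr &failing,
                              const std::vector<Instr> &program,
                              const std::set<std::string> &tracedFunctions) {
    std::vector<Instr> out;
    for (const Instr &i : program)
        if (tracedFunctions.count(i.function) &&
            i.pointsToSetId == failing.pointsToSetId)   // crude may-alias test
            out.push_back(i);
    return out;
}

int main() {
    Instr failing{"consume", "load %Queue*, %fifo", /*pointsToSetId=*/7};
    std::vector<Instr> program = {
        // A conservative analysis may keep both stores; Type-based Ranking
        // (next stage) disambiguates between them.
        {"produce", "store %Queue* %1, %q", 7},
        {"resize",  "store i32* %21, %bufSize", 7},
        {"logging", "store i8* %5, %msg", 3},   // filtered out: not traced, no alias
    };
    for (const Instr &i : candidates(failing, program, {"produce", "resize"}))
        std::printf("%s\n", i.text.c_str());
}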

Lazy Diagnosis pipeline (next stage: Type-based Ranking)

Type-Based Ranking
Highly ranks instructions operating on types that match the failing instruction's operand type
[Figure: for the failing load %Queue*, %fifo, store %Queue* %1, %q is ranked above store i32* %21, %bufSize because its operand type (%Queue*) matches]
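
A minimal sketch of such a ranking, assuming each candidate carries the (LLVM-style) type of its pointer operand as a string. The Candidate struct and the rankByType helper are illustrative, not the actual implementation:

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct Candidate {
    std::string text;         // e.g., "store %Queue* %1, %q"
    std::string operandType;  // e.g., "%Queue*" or "i32*"
};

// Stable sort: candidates whose operand type matches the failing instruction's
// operand type come first; prior order is otherwise preserved.
void rankByType(std::vector<Candidate> &cands, const std::string &failingType) {
    std::stable_sort(cands.begin(), cands.end(),
        [&](const Candidate &a, const Candidate &b) {
            return (a.operandType == failingType) > (b.operandType == failingType);
        });
}

int main() {
    std::vector<Candidate> cands = {
        {"store i32* %21, %bufSize", "i32*"},
        {"store %Queue* %1, %q", "%Queue*"},
    };
    rankByType(cands, "%Queue*");   // failing instruction: load %Queue*, %fifo
    for (const Candidate &c : cands)
        std::printf("%s\n", c.text.c_str());  // the %Queue* store is ranked first
}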

Lazy Diagnosis pipeline (next stage: Bug Pattern Computation)

Bug Pattern Computation
[Figure: two candidate bug patterns, Bug Pattern I and Bug Pattern II, each an inter-thread ordering of the ranked instructions (load %Queue*, %fifo, store %Queue* %1, %q, store i32* %21, %bufSize) across Thread 1 and Thread 2 leading to the FAILURE]

Bug Pattern Computation
- Leverages the coarse interleaving hypothesis to establish instruction orders
- Our implementation uses timing packets in Intel Processor Trace (granularity of a few tens of microseconds)
- We measured the smallest ΔT between key events as 91 microseconds
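
A simplified sketch of turning coarse timestamps into ordered patterns. The Access and Pattern structs, the granularity constant, and buildPatterns are assumptions used for illustration:

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Access {
    uint32_t threadId;
    std::string instr;      // e.g., "store %Queue* %1, %q"
    uint64_t coarseTimeNs;  // from a timing packet, tens-of-microseconds granularity
};

struct Pattern { std::string first, second; };  // "first" executed before "second"

// Emit ordered pairs of accesses from different threads whose coarse timestamps
// are far enough apart for the order to be trustworthy (the coarse interleaving
// hypothesis says key accesses usually are).
std::vector<Pattern> buildPatterns(const std::vector<Access> &accesses,
                                   uint64_t granularityNs) {
    std::vector<Pattern> patterns;
    for (const Access &a : accesses)
        for (const Access &b : accesses)
            if (a.threadId != b.threadId &&
                b.coarseTimeNs > a.coarseTimeNs &&
                b.coarseTimeNs - a.coarseTimeNs > granularityNs)
                patterns.push_back({a.instr, b.instr});
    return patterns;
}

int main() {
    std::vector<Access> accesses = {
        {2, "store %Queue* %1, %q", 1'000'000},
        {1, "load %Queue*, %fifo",  1'091'000},   // 91 us later
    };
    for (const Pattern &p : buildPatterns(accesses, 50'000))
        std::printf("%s -> %s\n", p.first.c_str(), p.second.c_str());
}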

Lazy Diagnosis pipeline (next stage: Statistical Diagnosis)

Statistical Diagnosis
[Figure: the same pair of accesses (store %Queue* %1, %q in one thread, load %Queue*, %fifo in another) observed across many executions, some ending in FAILURE (CRASH) and some in SUCCESS]
Statistical identification of failure-predicting patterns
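
One simple way to score patterns statistically, keeping those that correlate with failures. This is a plain precision-style score; the actual metric used by Snorlax may differ, and the pattern identifiers in main are hypothetical:

#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Observation { std::string patternId; bool failed; };

// For each pattern: fraction of the runs containing it that ended in failure.
// Patterns with a high score are reported as failure-predicting.
std::map<std::string, double> scorePatterns(const std::vector<Observation> &runs) {
    std::map<std::string, int> seen, failedWith;
    for (const Observation &o : runs) {
        ++seen[o.patternId];
        if (o.failed) ++failedWith[o.patternId];
    }
    std::map<std::string, double> score;
    for (const auto &kv : seen)
        score[kv.first] = double(failedWith[kv.first]) / kv.second;
    return score;
}

int main() {
    // "store-before-load" appears only in failing runs; "load-before-store"
    // appears only in successful runs, so only the former predicts failure.
    std::vector<Observation> runs = {
        {"store-before-load", true}, {"store-before-load", true},
        {"load-before-store", false}, {"load-before-store", false},
        {"load-before-store", false}, {"load-before-store", false},
    };
    for (const auto &kv : scorePatterns(runs))
        std::printf("%s: %.2f\n", kv.first.c_str(), kv.second);
}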

Outline Usage model Design Evaluation

Evaluation of Snorlax
- Is Snorlax effective?
- Is Snorlax accurate?
- Is Snorlax efficient?
- How does Snorlax compare to its competition?

Experimental Setup
- Real-world C/C++ programs
- 11 concurrency bugs
- Workloads from the programs' test cases and test cases by other researchers

Snorlax's Effectiveness
- Snorlax correctly identified the root causes of 11 bugs (determined after manual investigation of developer fixes)
- A single failure recurrence is enough for root cause diagnosis
- In practice, for concurrency bugs, "event orders" = "root cause"
Snorlax can effectively diagnose concurrency bugs

Snorlax's Accuracy
[Chart: accuracy contribution of each Lazy Diagnosis stage]
All stages of Lazy Diagnosis are necessary for full accuracy

Snorlax's Efficiency
[Chart: percentage overhead per program, e.g., 0.97%]
Snorlax has low runtime performance overhead (always below 2%)

Snorlax vs. Gist
[Chart: percentage overhead of Snorlax and Gist as the number of application threads increases; values shown: 39%, 3%, 1.9%, 0.9%]
Snorlax scales better than Gist with the increasing number of application threads

Lazy Diagnosis
- Leverages the coarse interleaving hypothesis
- Hybrid dynamic/static root cause diagnosis technique
Snorlax: Lazy Diagnosis prototype
- Fully accurate concurrency bug diagnosis (11 bugs in 7 systems)
- Low overhead (always below 2%)
- Scales well with the increasing number of threads
Michigan is hiring!