Multicore Acceleration of Priority-Based Schedulers for Concurrency Bug Detection Santosh Nagarakatte, Sebastian Burckhardt, Milo Martin, Madan Musuvathi.

Slides:



Advertisements
Similar presentations
Lazy Asynchronous I/O For Event-Driven Servers Khaled Elmeleegy, Anupam Chanda and Alan L. Cox Department of Computer Science Rice University, Houston,
Advertisements

Shared-Memory Model and Threads Intel Software College Introduction to Parallel Programming – Part 2.
Using Matrices in Real Life
Chapter 1 The Study of Body Function Image PowerPoint
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
Author: Julia Richards and R. Scott Hawley
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 3 CPUs.
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
1 Term 2, 2004, Lecture 6, TransactionsMarian Ursu, Department of Computing, Goldsmiths College Transactions 3.
1 Processes and Threads Creation and Termination States Usage Implementations.
Chapter 1 Introduction Copyright © Operating Systems, by Dhananjay Dhamdhere Copyright © Introduction Abstract Views of an Operating System.
Solve Multi-step Equations
HardBound: Architectural Support for Spatial Safety of the C Programming Language Joe Devietti *, Colin Blundell, Milo Martin, Steve Zdancewic * University.
Re-examining Instruction Reuse in Pre-execution Approaches By Sonya R. Wolff Prof. Ronald D. Barnes June 5, 2011.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
Addison Wesley is an imprint of © 2010 Pearson Addison-Wesley. All rights reserved. Chapter 10 Arrays and Tile Mapping Starting Out with Games & Graphics.
Configuration management
Randomized Algorithms Randomized Algorithms CS648 1.
Outline Introduction Assumptions and notations
Multicore Programming Skip list Tutorial 10 CS Spring 2010.
@ Carnegie Mellon Databases Data-oriented Transaction Execution VLDB 2010 Ippokratis Pandis Ryan Johnson Nikos Hardavellas Anastasia Ailamaki Carnegie.
Page Replacement Algorithms
Online Algorithm Huaping Wang Apr.21
Chapter 10: Virtual Memory
Software Testing and Quality Assurance
VOORBLAD.
Making Time-stepped Applications Tick in the Cloud Tao Zou, Guozhang Wang, Marcos Vaz Salles*, David Bindel, Alan Demers, Johannes Gehrke, Walker White.
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
BIOLOGY AUGUST 2013 OPENING ASSIGNMENTS. AUGUST 7, 2013  Question goes here!
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Lecture plan Transaction processing Concurrency control
© 2012 National Heart Foundation of Australia. Slide 2.
Adding Up In Chunks.
Lecture plan Outline of DB design process Entity-relationship model
1 Processes and Threads Chapter Processes 2.2 Threads 2.3 Interprocess communication 2.4 Classical IPC problems 2.5 Scheduling.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Executional Architecture
© 2004, D. J. Foreman 1 Scheduling & Dispatching.
25 seconds left…...
H to shape fully developed personality to shape fully developed personality for successful application in life for successful.
Håkan Sundell, Chalmers University of Technology 1 Evaluating the performance of wait-free snapshots in real-time systems Björn Allvin.
Januar MDMDFSSMDMDFSSS
Analyzing Genes and Genomes
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Copyright © 2010 Pearson Addison-Wesley. All rights reserved. Chapter 15 2 k Factorial Experiments and Fractions.
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: WWW: Copyright 2003.
The University of Adelaide, School of Computer Science
Iterative Context Bounding for Systematic Testing of Multithreaded Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
ESE Einführung in Software Engineering X. CHAPTER Prof. O. Nierstrasz Wintersemester 2005 / 2006.
ESE Einführung in Software Engineering X. CHAPTER Prof. O. Nierstrasz Wintersemester 2005 / 2006.
OORPT Object-Oriented Reengineering Patterns and Techniques X. CHAPTER Prof. O. Nierstrasz.
CP — Concurrent Programming X. CHAPTER Prof. O. Nierstrasz Wintersemester 2005 / 2006.
12. eToys. © O. Nierstrasz PS — eToys 12.2 Denotational Semantics Overview:  … References:  …
Computational Sprinting on a Hardware/Software Testbed Arun Raghavan *, Laurel Emurian *, Lei Shao #, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas.
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Presentation transcript:

Multicore Acceleration of Priority-Based Schedulers for Concurrency Bug Detection Santosh Nagarakatte, Sebastian Burckhardt, Milo Martin, Madan Musuvathi University of Pennsylvania Microsoft Research

2NeedlePoint – Santosh Nagarakatte – PLDI 2012 This work licensed under the Creative Commons Attribution-Share Alike 3.0 United States License You are free: to Share to copy, distribute, display, and perform the work to Remix to make derivative works Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to: Any of the above conditions can be waived if you get permission from the copyright holder. Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights.

3 Overview Context: Testing multithreaded programs Many policies proposed to explore interleavings NeedlePoint: Interleaving Exploration Tool Test and evaluate real world code with policies Separate mechanisms from policy, infers blocking Probabilistic Concurrency Testing (PCT) works well Detected bugs at least once in 1000 executions Unfortunately, it runs one thread at a time Serious bottleneck with real code Parallel Probabilistic Concurrency Testing (PPCT) Runs multiple threads Effective as PCT in theory and practice NeedlePoint – Santosh Nagarakatte – PLDI 2012

4 Testing Multithreaded Programs Two aspects of the testing problem Detection of bugs Use a test harness Use pattern detectors like data race detectors Driving & exploring the interleaving Explore all scheduling choices at each point NeedlePoint – Santosh Nagarakatte – PLDI 2012 Focus of this talk

5NeedlePoint – Santosh Nagarakatte – PLDI 2012 Space of Interleavings (p = 0, bal = 10, q = 0) (p = 5, bal = 1000, q = 7) T1: w(y) q = 7 t = bal; bal = t- y T2:d(x) bal += x p = 5

6NeedlePoint – Santosh Nagarakatte – PLDI 2012 Space of Interleavings (p = 0, bal = 10, q = 0) (p = 5, bal = 1000, q = 7) T1: w(y) q = 7 t = bal; bal = t- y T2:d(x) bal += x p = 5

7NeedlePoint – Santosh Nagarakatte – PLDI 2012 Space of Interleavings (p = 0, bal = 10, q = 0) (p = 5, bal = 1000, q = 7) T1: w(y) q = 7 t = bal; bal = t- y T2:d(x) bal += x p = 5

8NeedlePoint – Santosh Nagarakatte – PLDI 2012 Space of Interleavings (p = 0, bal = 10, q = 0) Numerous interleavings in the same program (p = 5, bal = 1000, q = 7) T1: w(y) q = 7 t = bal; bal = t- y T2:d(x) bal += x p = 5

9 Interleaving Exploration Policies Best-effort exploration Random sleep: actively introduce delays AtomFuzzer: delays to trigger atomicity violations PreemptionAlways: a preemption at every point Exploration with coverage guarantees Preemption bounding: introduce bounded number of preemptions PCT: schedule threads according to priorities NeedlePoint – Santosh Nagarakatte – PLDI 2012

10NeedlePoint – Santosh Nagarakatte – PLDI 2012 Test and understand the interleaving exploration schemes with real world code

11 NEEDLEPOINT NeedlePoint – Santosh Nagarakatte – PLDI 2012

12 NeedlePoint Framework Unified framework for controlled scheduling Separation of concerns between mechanisms & policy Three mechanisms for controlled scheduling 1.Introduction of callbacks to the scheduler Real programs use a wide range of synchronization 2.Support for identifying blocked threads To provide coverage, need to explore the interleaving when the synchronization blocks 3.Support for starvation freedom Spin loops & busy wait synchronization livelocks NeedlePoint – Santosh Nagarakatte – PLDI 2012 Modeling each synchronization is error-prone & real programs define ad-hoc synchronization

13 Calls to the NeedlePoint Framework at Schedule Points NeedlePoint – Santosh Nagarakatte – PLDI 2012 Lock (l); bal += x; Unlock(l); Lock (l); bal += x; Unlock(l); Lock (l); t = bal; Unlock(l); Lock (l); t = bal; Unlock(l); while(true){ schedulingPolicy(); if(state[tid] == Running) break; else yield(); } while(true){ schedulingPolicy(); if(state[tid] == Running) break; else yield(); } Synchronization library calls, atomic operations, shared memory accesses TIDState 1Running 2Ready TIDState 1Running

14 TIDState 1Running 2Ready NeedlePoint – Santosh Nagarakatte – PLDI 2012 Inferring Blocked Threads … Lock (l); bal += x; Unlock(l); … Lock (l); bal += x; Unlock(l); Lock (l); t = bal; Unlock(l); …. Lock (l); t = bal; Unlock(l); …. while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[tid]++; if(timeout[id] > T) state[rid] = BLOCKED } reset_timeout_counters() while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[tid]++; if(timeout[id] > T) state[rid] = BLOCKED } reset_timeout_counters() while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[id] ++; if(timeout[id] > T) state[rtid] = BLOCKED } reset_timeout_counters(); while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[id] ++; if(timeout[id] > T) state[rtid] = BLOCKED } reset_timeout_counters(); Infers whether the thread is blocked How to identify which threads block?

15 TIDState 1Running 2Ready TIDState 1Ready 2Running NeedlePoint – Santosh Nagarakatte – PLDI 2012 Inferring Blocked Threads … Lock (l); bal += x; Unlock(l); … Lock (l); bal += x; Unlock(l); Lock (l); t = bal; Unlock(l); …. Lock (l); t = bal; Unlock(l); …. 0 0 Timeout Clock while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[tid]++; if(timeout[id] > T) state[rid] = BLOCKED } reset_timeout_counters(); while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[tid]++; if(timeout[id] > T) state[rid] = BLOCKED } reset_timeout_counters(); while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[id] ++; if(timeout[id] > T) state[rtid] = BLOCKED } reset_timeout_counters(); while(true){ schedulingPolicy(); if(state[tid] == Running) break; else{ yield(); timeout[id] ++; if(timeout[id] > T) state[rtid] = BLOCKED } reset_timeout_counters(); TIDState 1Running 2Blocked Allows the lock to execute, acquire operation blocks

16 Evaluation of Techniques NeedlePoint – Santosh Nagarakatte – PLDI 2012 Executions with bug triggered PCT detects the bug at least once in 1000 executions

17 Challenges with PCT PCT was effective in detecting all bugs However, one thread at a time is a bottleneck Cant exploit multi-core Interacts poorly with NeedlePoints blocking inference Result: Unable to test highly parallel applications More than 64X slowdown with Chrome browser NeedlePoint – Santosh Nagarakatte – PLDI 2012 Can we run multiple threads with PCT and maintain its effectiveness in theory & practice?

18 PCT BACKGROUND NeedlePoint – Santosh Nagarakatte – PLDI 2012

19 Probabilistic Concurrency Testing (PCT) [ASPLOS 2010] Key Idea: Make a few choices to assign priorities Schedule the highest priority thread that is not blocked Bug depth Number of ordering constraints an interleaving need to exhibit to trigger the bug More constraints means more things have to go just right to trigger the bug. NeedlePoint – Santosh Nagarakatte – PLDI 2012 PCT explores the program to trigger bugs of particular depth

20 Bug Depth Illustration NeedlePoint – Santosh Nagarakatte – PLDI 2012 … p = &a; … p->f ++; … Bug of depth 1: Ordering Violation Bug of depth 2: Atomicity Violation Lock(I) t = bal Unlock(I) Lock(I) bal += x; Unlock(I) Lock(I) bal = t – y Unlock(I);

21 PCT Algorithm for Bug Depth d At the beginning of the execution Assign unique priorities to the threads Reserve d lower priorities Choose d-1 priority change points in the execution Schedule the threads according to priorities Run the highest priority non-blocked thread At a change point, lower the priorities to the reserved priorities NeedlePoint – Santosh Nagarakatte – PLDI 2012

22 PCT for a Bug of Depth 2 NeedlePoint – Santosh Nagarakatte – PLDI 2012 (p = 0, bal = 10, q = 0) (p = 5, bal = 1000, q = 7) Pri = 3 Pri = 2 Change Point: Step#2 Change Point: Priority of deposit thread changed to 1 1 T1: w(y) q = 7 t = bal; bal = t- y T2: d (x) bal += x p = 5

23NeedlePoint – Santosh Nagarakatte – PLDI 2012 The PCT Probabilistic Bounds Given a program with n threads (~tens) k steps (~millions) a bug of depth d (1,2) Each run PCT triggers a bug with a probability (p) of at least (this is a worst-case guarantee)

24 PARALLEL PCT NeedlePoint – Santosh Nagarakatte – PLDI 2012

25 Parallel PCT (PPCT) Key Idea: Serialize only d threads The d lower priority threads are important to trigger the bug Maintain two sets: higher priority set (HS) & lower priority set (LS) Insert priority d into LS and others into HS PPCT operates in 2 phases PCT mode: run the highest priority non-blocked thread Parallel mode: run any unblocked thread with priority > d NeedlePoint – Santosh Nagarakatte – PLDI 2012 Key difference between PCT and PPCT is the number of choices it provides the adversary PCT mode Parallel mode All threads with priority > d blocked Yes No PPCT provides same bounds as PCT

26 PARALLEL PCT EVALUATION NeedlePoint – Santosh Nagarakatte – PLDI 2012

27 Bug Detection with PPCT using NeedlePoint NeedlePoint – Santosh Nagarakatte – PLDI 2012 PPCT detects bug as effectively as PCT in practice Executions with bug triggered

28 Slowdown with PCT & PPCT NeedlePoint – Santosh Nagarakatte – PLDI 2012 PPCT reduces the overhead from 34X to 6X on a 8-core machine Slowdown over native execution (times) PPCT reduces the overhead even on a single core Avoids poor interaction with blocking inference

29 Conclusions NeedlePoint framework Separation of mechanisms and exploration policy Inferring blocked threads Proposed Parallel Probabilistic Concurrency Testing For bugs of depth d, runs all but d threads Provides same coverage guarantees as PCT Significant speedups both on single & multiple cores NeedlePoint – Santosh Nagarakatte – PLDI 2012

30NeedlePoint – Santosh Nagarakatte – PLDI Subscribe to