Download presentation
Presentation is loading. Please wait.
Published byRoland Wood Modified over 9 years ago
1
Parallelizing Data Race Detection Benjamin Wester Facebook David Devecsery, Peter Chen, Jason Flinn, Satish Narayanasamy University of Michigan
2
Data Races Benjamin Wester - ASPLOS '132 TIME t1t1 t2t2 lock( l ) x = 1 unlock( l ) x = 3 lock( l ) x = 2 unlock( l )
3
Race Detection Benjamin Wester - ASPLOS '13 Normal run time TIME With race detector 3 Instrumentation + Analysis Detection is slow! 8x–200x Managed vs. binary Granularity Completeness Debug info gathered Matches app concurrency
4
Race Detection Benjamin Wester - ASPLOS '13 Normal run time TIME With race detector 4 Parallel race detector Detection is slow! 8x–200x Managed vs. binary Granularity Completeness Debug info gathered Matches app concurrency
5
How to use parallel hardware? Scale program? – Performance tied to app Parallel algorithm? – Lots of fine-grained dependencies Uniparallelism – Converts execution into parallel pipeline – Works for instrumentation Benjamin Wester - ASPLOS '135 [Wallace CGO’07, Nightingale ASPLOS’08]
6
Uniparallelism Benjamin Wester - ASPLOS '13 TIME 6
7
Uniparallelism Benjamin Wester - ASPLOS '13 TIME 7 Epoch-parallel Execution Epoch-sequential Execution
8
Uniparallelism Scales well: Epochs are independent execution units More efficient: Synchronization Benjamin Wester - ASPLOS '138 TIME
9
Add Detector… Scales well: Epochs are independent execution units More efficient: Synchronization & Lock elision Race detection depends on state! Benjamin Wester - ASPLOS '139 TIME Here!
10
Architecture Benjamin Wester - ASPLOS '1310 Epoch-sequential Epoch-parallel Commit
11
Moving Work Identify meaningful fast subset of analysis state – Low frequency, lightweight, self-contained – Run in Epoch-Sequential phase – Predicted and usable in parallel phase Symbolic execution – Replace missing state with symbols – Defer some computation to commit phase – Commit: replace symbols with concrete values Benjamin Wester - ASPLOS '1311
12
Architecture Benjamin Wester - ASPLOS '1312 Epoch-sequential Epoch-parallel Commit Instrument and predict fast subset of analysis state Perform deferred work on concrete values Full instrumentation Symbolic execution
13
Fast Slow State: Vector clocks – e.g. ⟨0, 1, 0⟩ – for each: – Thread – Lock – Variable x 2: last read(s), last write Example computation: Check( read_x ≤ thread_i ); write_x = thread_i Transitive reduction Use knowledge of happens-before relationship Parallel FastTrack Benjamin Wester - ASPLOS '1313
14
Fast Slow Parallel Eraser State: – Set of locks held by thread (lockset) – Lockset for each variable – Position in state machine for each variable Example computation: If state_x == SHARED then ls_x = ls_x ∩ {L, M} Lockset factorization Factor out common behavior Benjamin Wester - ASPLOS '1314
15
Parallel Performance Efficiency Scalability 2 worker threads as baseline 8-CPU Platform Benchmarks: SPLASH-2 (water, lu, ocean, fft, radix) Parallel app: pbzip2 Benjamin Wester - ASPLOS '1315
16
Efficiency Benjamin Wester - ASPLOS '1316 Exactly 2 cores
17
Efficiency Benjamin Wester - ASPLOS '1317 Median 13% faster Median 8% faster Exactly 2 cores
18
Scaling Parallel FastTrack Benjamin Wester - ASPLOS '1318 2 Worker Threads
19
Scaling Parallel FastTrack Benjamin Wester - ASPLOS '1319 Median 4.4x with 8 cores 2 Worker Threads
20
Scaling Parallel Eraser Benjamin Wester - ASPLOS '1320 Median 3.3x with 8 cores 2 Worker Threads
21
Conclusion 3-Phase Uniparallel architecture – Parallel FastTrack – Parallel Eraser Parallel algorithms: – Are more efficient – Scale better Benjamin Wester - ASPLOS '1321
22
Questions? Benjamin Wester - ASPLOS '1322
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.