1
Incremental Parallel and Distributed Systems. Pramod Bhatotia, MPI-SWS & Saarland University. April 2015.
2
Akshat, Alexander, Björn, Pedro, Rodrigo, Umut, Ekin, Rafael, Flavio
3
Common workflow: rerun the same application with evolving input, as in scientific computing, reactive systems, and data analytics. A small input change should require only a small amount of work to update the output. Goal: efficient execution of applications across successive runs.
4
Existing approaches: (1) re-compute from scratch: easy to design, but inefficient. (2) Design application-specific algorithms for incremental computation: efficient, but hard to design. The goal: automatic + efficient.
5
My thesis research: to enable transparent, practical, and efficient incremental computation in real-world parallel and distributed systems.
6
My projects @ MPI-SWS:
- Incoop: incremental batch processing [HotCloud’11, SoCC’11]
- Shredder: incremental storage [FAST’12]
- Slider: incremental stream processing [Middleware’14, Hadoop Summit’15]
- Conductor*: incremental job deployment [NSDI’12, LADIS’10, PODC’10]
- iThreads: incremental multithreading [ASPLOS’15]
(* 2nd author)
7
Incremental Multithreading. iThreads@mpi-sws.org. Joint work with Pedro Fonseca & Björn Brandenburg (MPI-SWS), Rodrigo Rodrigues (Nova University of Lisbon), and Umut Acar (CMU).
8
Auto “incrementalization” is a well-studied topic in the PL community, with compiler- and language-based approaches. However, these approaches target sequential programs!
9
Parallelism: there are proposals for parallel incremental computation, e.g., Hammer et al. [DAMP’07] and Burckhardt et al. [OOPSLA’11]. Limitations: they require a new language with special data types and restrict the programming model to strict fork-join. Existing multi-threaded programs are not supported!
10
Design goals:
1. Transparency: target unmodified pthreads-based programs.
2. Practicality: support the full range of synchronization primitives.
3. Efficiency: design a parallel infrastructure for incremental computation.
11
iThreads usage:
$ LD_PRELOAD="iThreads.so"
$ ./myProgram (initial run)
$ echo " " >> changes.txt
$ ./myProgram (incremental run)
Speedups w.r.t. pthreads: up to 8X.
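How such an LD_PRELOAD library hooks into a program is not spelled out on the slides, but the generic mechanism is to export functions with the same names as the pthreads functions to be intercepted and forward to the real implementations via dlsym(RTLD_NEXT, ...). The sketch below illustrates only that general technique; record_boundary() is a hypothetical placeholder, not part of iThreads.

/* Minimal sketch of LD_PRELOAD-style interposition on pthreads calls.
 * Build: gcc -shared -fPIC -o sketch.so interpose.c -ldl
 * Run:   LD_PRELOAD=./sketch.so ./myProgram
 * record_boundary() stands in for whatever bookkeeping a real
 * incremental-computation runtime would do at a synchronization point. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>

static void record_boundary(const char *where) {
    fprintf(stderr, "sync point: %s (thread %lu)\n",
            where, (unsigned long)pthread_self());
}

int pthread_mutex_lock(pthread_mutex_t *m) {
    static int (*real_lock)(pthread_mutex_t *);
    if (!real_lock)   /* look up the real implementation on first use */
        real_lock = (int (*)(pthread_mutex_t *))dlsym(RTLD_NEXT, "pthread_mutex_lock");
    record_boundary("pthread_mutex_lock");
    return real_lock(m);
}

int pthread_mutex_unlock(pthread_mutex_t *m) {
    static int (*real_unlock)(pthread_mutex_t *);
    if (!real_unlock)
        real_unlock = (int (*)(pthread_mutex_t *))dlsym(RTLD_NEXT, "pthread_mutex_unlock");
    record_boundary("pthread_mutex_unlock");
    return real_unlock(m);
}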
12
Outline: motivation, design, evaluation.
13
Behind the scenes: step #1, divide the computation into sub-computations; step #2, build the dependence graph; step #3, perform change propagation. The first two steps take place during the initial run; change propagation runs during the incremental run.
14
A simple example (shared variables: x, y, and z):
thread-1:
  lock(); z = ++y; unlock();
thread-2:
  lock(); x++; unlock();
  local_var++;
  lock(); y = 2*x + z; unlock();
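For reference, here is a compilable pthreads version of this example. The initial values of x, y, and z, the mutex protecting the shared variables, and the main() driver are assumptions added to make it runnable; the slide shows only the two thread bodies.

/* Compilable version of the example (sketch; initial values and the
 * driver are assumptions not given on the slide).
 * Build: gcc -pthread -o example example.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int x = 0, y = 0, z = 0;               /* shared variables */

static void *thread1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);  z = ++y;        pthread_mutex_unlock(&m);
    return NULL;
}

static void *thread2(void *arg) {
    int local_var = 0;                        /* thread-local, not shared */
    (void)arg;
    pthread_mutex_lock(&m);  x++;            pthread_mutex_unlock(&m);
    local_var++;
    pthread_mutex_lock(&m);  y = 2*x + z;    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("x=%d y=%d z=%d\n", x, y, z);      /* final values depend on the schedule */
    return 0;
}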
15
Step #1 (divide): how do we divide a computation into sub-computations?
16
What is a sub-computation in the example above? The entire thread? That is too coarse-grained: a small change implies recomputing the entire thread. A single instruction? That is too fine-grained: it requires tracking individual load/store instructions.
17
Sub-computation granularity: tracking single instructions is expensive, and treating an entire thread as one unit is too coarse-grained. Instead, a sub-computation is a sequence of instructions between pthreads synchronization points, with the Release Consistency (RC) memory model used to define this granularity.
18
Applied to the example: thread-1 forms a single sub-computation, while thread-2 is divided into three sub-computations at its synchronization points: one around x++, one around local_var++, and one around y = 2*x + z.
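To make the division concrete, the sketch below annotates thread-2 from the compilable example above (it reuses that example's globals). The exact cut points are an assumption; the slides only say that pthreads synchronization points delimit sub-computations.

/* Where sub-computation boundaries would fall in thread-2, assuming each
 * synchronization call ends the current sub-computation.
 * Reuses m, x, y, z from the compilable example above. */
void *thread2_annotated(void *arg) {
    int local_var = 0;
    (void)arg;
    pthread_mutex_lock(&m);     /* sub-computation 1: Read={x}, Write={x} */
    x++;
    pthread_mutex_unlock(&m);   /* boundary: synchronization point */
    local_var++;                /* sub-computation 2: Read/Write={local_var} */
    pthread_mutex_lock(&m);     /* boundary: synchronization point */
    y = 2*x + z;                /* sub-computation 3: Read={x,z}, Write={y} */
    pthread_mutex_unlock(&m);
    return NULL;
}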
19
Step #2 (build): how do we build the dependence graph?
20
Example, changed schedule: each sub-computation is recorded together with its read and write sets (thread-1: Read={y}, Write={y,z}; thread-2: Read={x}, Write={x}; Read={local_var}, Write={local_var}; Read={x,z}, Write={y}). Under a different schedule these sub-computations interleave differently and read different values, so (1) the happens-before dependencies between sub-computations are recorded.
21
Example, same schedule: the same sub-computations and read/write sets as above, this time with the incremental run following the same schedule as the recorded one.
22
Example, changed input: if the input changes (y becomes y’), the sub-computation with Read={y}, Write={y,z} must be re-executed, and since it writes z, the sub-computation with Read={x,z}, Write={y} is affected too. Hence (2) the data dependencies between sub-computations are recorded.
23
Dependence graph:
- Vertices: sub-computations.
- Edges: the happens-before order between sub-computations. Intra-thread, sub-computations are totally ordered by execution order; inter-thread, they are partially ordered by synchronization primitives.
- Data dependency between sub-computations: present if they can be ordered by happens-before and the earlier sub-computation writes data that the later one reads.
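A possible concrete representation of this graph is sketched below. The field names, the bitmask encoding of read/write sets, and the reachability-based happens-before test are illustrative assumptions, not iThreads' actual data structures.

/* Sketch of a dependence-graph representation (illustrative only). */
#include <stdbool.h>
#include <stddef.h>

typedef unsigned var_set;              /* bitmask over the shared variables */

typedef struct subcomp {
    int thread_id;
    int seq;                           /* position within its thread (total order) */
    var_set read_set, write_set;       /* recorded reads/writes of shared state */
    struct subcomp **preds;            /* immediate happens-before predecessors:
                                          the previous sub-computation in the same
                                          thread, plus synchronization edges */
    size_t n_preds;
} subcomp;

/* a happens-before b: reachability over the recorded edges (graph is acyclic). */
static bool happens_before(const subcomp *a, const subcomp *b) {
    for (size_t i = 0; i < b->n_preds; i++)
        if (b->preds[i] == a || happens_before(a, b->preds[i]))
            return true;
    return false;
}

/* Data dependency: a happens-before b and a writes something that b reads. */
static bool data_dep(const subcomp *a, const subcomp *b) {
    return happens_before(a, b) && (a->write_set & b->read_set) != 0;
}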
24
Step #3 (perform): how do we perform change propagation?
25
Change propagation:
Dirty set = {changed input}.
For each sub-computation in a thread, visited in the recorded happens-before order:
- If its read set intersects the dirty set: re-compute the sub-computation and add its write set to the dirty set.
- Otherwise: skip its execution and apply the memoized effects of the sub-computation.
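A minimal sketch of that loop follows, assuming the bitmask encoding of read/write sets used in the graph sketch above and assuming memoized effects can be replayed by a per-sub-computation callback; the names rerun() and apply_memoized() are placeholders, not iThreads' API.

/* Sketch of change propagation over sub-computations recorded in
 * happens-before order. rerun() re-executes a sub-computation;
 * apply_memoized() replays its recorded writes. */
#include <stddef.h>

typedef unsigned var_set;              /* bitmask over the shared variables */

typedef struct {
    var_set read_set, write_set;
    void (*rerun)(void);               /* re-execute the sub-computation */
    void (*apply_memoized)(void);      /* replay its memoized effects */
} recorded_subcomp;

static void propagate(recorded_subcomp *subs, size_t n, var_set changed_input)
{
    var_set dirty = changed_input;     /* dirty set starts as the changed input */

    /* Visit sub-computations in the recorded happens-before order. */
    for (size_t i = 0; i < n; i++) {
        if (subs[i].read_set & dirty) {
            subs[i].rerun();               /* reads dirty data: re-compute */
            dirty |= subs[i].write_set;    /* its outputs may now differ */
        } else {
            subs[i].apply_memoized();      /* unaffected: reuse the old result */
        }
    }
}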
26
Outline: motivation, design, evaluation.
27
Evaluating iThreads:
1. Speedups for the incremental run.
2. Overheads for the initial run.
Implementation: a 32-bit dynamically linkable shared library for Linux. Platform: a 12-core Intel Xeon machine running Linux.
28
1. Speedups for the incremental run: up to 8X vs. pthreads.
29
Memoization overhead: the space overheads relative to input size are significant, because iThreads writes a lot of intermediate state.
30
2. Overheads for the initial run: runtime overheads vs. pthreads.
31
Summary: a case for parallel incremental computation. iThreads is incremental multithreading that is transparent (targets unmodified programs), practical (supports the full range of synchronization primitives), and efficient (employs parallelism for change propagation). Usage: a dynamically linkable shared library.
32
Incremental systems: transparent + practical + efficient. bhatotia@mpi-sws.org. Thank you all!