Incremental Parallel and Distributed Systems Pramod Bhatotia MPI-SWS & Saarland University April 2015.


1 Incremental Parallel and Distributed Systems Pramod Bhatotia MPI-SWS & Saarland University April 2015

2 Akshat, Alexander, Björn, Pedro, Rodrigo, Umut, Ekin, Rafael, Flavio

3 Common workflow: rerun the same application with evolving input (scientific computing, reactive systems, data analytics). Small input change → small work to update the output. Goal: efficient execution of apps in successive runs.

4 Existing approaches: (1) re-compute from scratch: (+) easy to design, (-) inefficient; (2) design application-specific algorithms for incremental computation: (+) efficient, (-) hard to design. Goal: automatic + efficient.

5 My thesis research: to enable transparent, practical, and efficient incremental computation in real-world parallel and distributed systems.

6 My projects @ MPI-SWS: Incoop, incremental batch processing [HotCloud’11, SoCC’11]; Shredder, incremental storage [FAST’12]; Slider, incremental stream processing [Middleware’14, Hadoop Summit’15]; Conductor*, incremental job deployment [NSDI’12, LADIS’10, PODC’10]; iThreads, incremental multithreading [ASPLOS’15]. (* 2nd author)

7 Incremental Multithreading (iThreads@mpi-sws.org). Joint work with: Pedro Fonseca and Björn Brandenburg (MPI-SWS), Rodrigo Rodrigues (Nova University of Lisbon), and Umut Acar (CMU).

8 Auto-“incrementalization”: a well-studied topic in the PL community, with compiler- and language-based approaches. But they target sequential programs!

9 Parallelism: there are proposals for parallel incremental computation, e.g., Hammer et al. [DAMP’07] and Burckhardt et al. [OOPSLA’11]. Limitations: they require a new language with special data types, and they restrict the programming model to strict fork-join. Existing multi-threaded programs are not supported!

10 Design goals: 1. Transparency: target unmodified pthreads-based programs. 2. Practicality: support the full range of synchronization primitives. 3. Efficiency: design a parallel infrastructure for incremental computation.

11 iThreads: $ LD_PRELOAD="iThreads.so" $ ./myProgram (initial run); $ echo " " >> changes.txt; $ ./myProgram (incremental run). Speedups w.r.t. pthreads: up to 8X.

12 Outline: motivation, design, evaluation.

13 Behind the scenes. Initial run: step 1, divide the computation into sub-computations; step 2, build the dependence graph. Incremental run: step 3, perform change propagation.

14 A simple example (shared variables: x, y, and z). Thread-1: lock(); z = ++y; unlock(); then lock(); x++; unlock(). Thread-2: local_var++; then lock(); y = 2*x + z; unlock().
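The slide's two-thread example can be sketched in runnable form. This is a minimal Python rendering (the talk targets pthreads/C; Python's `threading.Lock` stands in for `lock()`/`unlock()`), with the threads run one after the other so the result is deterministic for illustration:

```python
import threading

# Shared variables from the slide's example; local_var is thread-local.
x, y, z = 0, 0, 0
lock = threading.Lock()

def thread_1():
    global x, y, z
    with lock:          # sub-computation: reads y, writes y and z
        y += 1
        z = y
    with lock:          # sub-computation: reads and writes x
        x += 1

def thread_2():
    global y
    local_var = 0
    local_var += 1      # sub-computation: touches only thread-local state
    with lock:          # sub-computation: reads x and z, writes y
        y = 2 * x + z

t1 = threading.Thread(target=thread_1)
t2 = threading.Thread(target=thread_2)
t1.start(); t1.join()   # forced schedule: thread-1 entirely before thread-2
t2.start(); t2.join()
print(x, y, z)  # with this schedule: 1 3 1
```

Under a different interleaving the final values of y and z would differ, which is exactly why the later slides record the schedule and the per-sub-computation read/write sets.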

15 Step 1 (divide: computation → sub-computations). How do we divide a computation into sub-computations?

16 Sub-computations (example from slide 14). Entire thread? “Coarse-grained”: a small change implies recomputing the entire thread. Single instruction? “Fine-grained”: requires tracking individual load/store instructions.

17 Sub-computation granularity: between the extremes of a single instruction (expensive tracking) and an entire thread (too coarse-grained). A sub-computation is a sequence of instructions between pthreads synchronization points; the Release Consistency (RC) memory model is used to define the granularity of a sub-computation.
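The granularity rule above can be sketched as a small function that splits one thread's instruction trace at synchronization points. The event encoding (`"sync"` vs. `"op"` tuples) is an assumption for illustration, not iThreads' actual representation:

```python
# Hypothetical sketch: split a thread's trace into sub-computations at
# synchronization points, per the slide's definition.

def split_into_subcomputations(trace):
    """trace: list of ("sync", name) or ("op", text) events.
    Each sub-computation is the run of ops between sync events."""
    subs, current = [], []
    for kind, payload in trace:
        if kind == "sync":
            if current:          # close the current sub-computation
                subs.append(current)
                current = []
        else:
            current.append(payload)
    if current:
        subs.append(current)
    return subs

# Thread-1 from the slide-14 example:
trace = [
    ("sync", "lock"), ("op", "z = ++y"), ("sync", "unlock"),
    ("sync", "lock"), ("op", "x++"), ("sync", "unlock"),
]
print(split_into_subcomputations(trace))  # [['z = ++y'], ['x++']]
```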

18 Sub-computations (example from slide 14): thread-1 splits into sub-computations at its lock()/unlock() boundaries; thread-2 likewise splits into its own sub-computations.

19 Step 2 (build: dependence graph). How do we build the dependence graph?

20 Example: changed schedule (shared variables: x, y, and z). Recorded read/write sets: thread-1: Read={y} Write={y,z}; Read={x} Write={x}; thread-2: Read={local_var} Write={local_var}; Read={x,z} Write={y}. A different schedule reorders the sub-computations. (1): Record happens-before dependencies.

21 Example: same schedule (shared variables: x, y, and z). With the same schedule, the recorded read/write sets are unchanged: thread-1: Read={y} Write={y,z}; Read={x} Write={x}; thread-2: Read={local_var} Write={local_var}; Read={x,z} Write={y}.

22 Example: changed input (shared variables: x, y, and z). With a different input y’, the sub-computations whose read sets contain the changed data must re-run: Read={y} Write={y,z}, and then, because z changes, Read={x,z} Write={y}. (2): Record data dependencies.

23 Dependence graph. Vertices: sub-computations. Edges: happens-before order between sub-computations; intra-thread: totally ordered based on execution order; inter-thread: partially ordered based on synchronization primitives. Data dependency between sub-computations: if they can be ordered by happens-before, and the antecedent writes data that the subsequent sub-computation reads.
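The data-dependency rule on this slide can be sketched directly from the recorded read/write sets. A minimal version, assuming sub-computations are given as tuples in a happens-before-consistent order (names like `t1.a` are illustrative, not from the source):

```python
# Hedged sketch: derive data-dependency edges from recorded read/write
# sets, per the slide's definition.

def dependence_edges(subs):
    """subs: list of (name, read_set, write_set) in happens-before order.
    Returns edges (a, b) where antecedent a writes data that b reads."""
    edges = []
    for i, (a, _, write_a) in enumerate(subs):
        for b, read_b, _ in subs[i + 1:]:
            if write_a & read_b:   # a's writes feed b's reads
                edges.append((a, b))
    return edges

# Read/write sets recorded for the slide-14 example:
subs = [
    ("t1.a", {"y"}, {"y", "z"}),             # z = ++y
    ("t1.b", {"x"}, {"x"}),                  # x++
    ("t2.a", {"local_var"}, {"local_var"}),  # local_var++
    ("t2.b", {"x", "z"}, {"y"}),             # y = 2*x + z
]
print(dependence_edges(subs))  # [('t1.a', 't2.b'), ('t1.b', 't2.b')]
```

Both of thread-1's sub-computations feed the final one in thread-2 (via z and x respectively), matching the slide-22 observation that a change to y propagates to y = 2*x + z.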

24 Step 3 (perform: change propagation). How do we perform change propagation?

25 Change propagation. Dirty set ← {changed input}. For each sub-computation in a thread, checked in the recorded happens-before order: if (read set ∩ dirty set ≠ ∅), re-compute the sub-computation and add its write set to the dirty set; else, skip execution of the sub-computation and apply its memoized effects.
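The change-propagation loop above can be sketched as follows, again assuming sub-computations carry their recorded read/write sets (the tuple encoding and names are illustrative):

```python
# Sketch of the change-propagation loop: dirty data flows forward
# through the recorded happens-before order.

def change_propagate(subs, changed_input):
    """subs: list of (name, read_set, write_set) in happens-before order.
    Returns the names of sub-computations that must be re-computed."""
    dirty = set(changed_input)
    recomputed = []
    for name, reads, writes in subs:
        if reads & dirty:             # read set intersects dirty set
            recomputed.append(name)   # re-compute this sub-computation
            dirty |= writes           # its outputs become dirty too
        # else: skip it and apply its memoized effects
    return recomputed

subs = [
    ("t1.a", {"y"}, {"y", "z"}),             # z = ++y
    ("t1.b", {"x"}, {"x"}),                  # x++
    ("t2.a", {"local_var"}, {"local_var"}),  # local_var++
    ("t2.b", {"x", "z"}, {"y"}),             # y = 2*x + z
]
print(change_propagate(subs, {"y"}))  # ['t1.a', 't2.b']
```

Changing y dirties z (written by the first sub-computation), which in turn forces y = 2*x + z to re-run; x++ and local_var++ are skipped and their memoized effects reused, which is where the incremental-run speedup comes from.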

26 Outline: motivation, design, evaluation.

27 Evaluating iThreads: 1. speedups for the incremental run; 2. overheads for the initial run. Implementation (for Linux): a 32-bit dynamically linkable shared library. Platform: evaluated on a 12-core Intel Xeon running Linux.

28 1. Speedup for the incremental run (figure: speedups vs. pthreads): up to 8X.

29 Memoization overhead (figure: space overheads w.r.t. input size): iThreads writes a lot of intermediate state.

30 2. Overheads for the initial run (figure: runtime overheads vs. pthreads).

31 Summary: a case for parallel incremental computation. iThreads, incremental multithreading: transparent (targets unmodified programs), practical (supports the full range of sync primitives), efficient (employs parallelism for change propagation). Usage: a dynamically linkable shared library.

32 Incremental systems: transparent + practical + efficient. bhatotia@mpi-sws.org. Thank you all!

