Slider: Incremental Sliding Window Analytics. Pramod Bhatotia (MPI-SWS), Umut Acar (CMU), Flavio Junqueira (MSR Cambridge), Rodrigo Rodrigues (NOVA University of Lisbon).



1 Slider: Incremental Sliding Window Analytics. Pramod Bhatotia (MPI-SWS), Umut Acar (CMU), Flavio Junqueira (MSR Cambridge), Rodrigo Rodrigues (NOVA University of Lisbon). Middleware 2014.

2 Data analytics systems. Raw data (e.g. Web crawl) -> Data analytics system (e.g. computing PageRank) -> Information (e.g. search). Example systems: Spark, Naiad, Storm, S4, Hadoop.

3 Design requirements. Recent trends: streaming data + sliding window + incremental updates. Together: incremental sliding window analytics for data streams.

4 State-of-the-art: Stream processing. Classification based on programming model. Trigger-based systems (e.g. Storm, S4, Naiad): nodes receive input records and update mutable state. Batch-based systems (e.g. D-Streams): the stream is divided into batches (Batch #1, Batch #2, ..., Batch #n), and each single batch is processed from input to output by Map (M) and Reduce (R) tasks.

5 Trade-offs for incremental updates. Trigger-based systems: (+) efficient, (-) hard to design (require dynamic algorithms). Batch-based systems: (-) inefficient (re-compute from scratch), (+) easy to design. Slider is positioned against this trade-off.

6 Goals. 1. Retain the advantages/simplicity of the batch-based approach. 2. Achieve the efficiency of incremental processing for sliding window analytics.

7 Outline: Motivation; Basic design; Slider design; Evaluation.

8 Our approach. Take an unmodified data-parallel application written assuming unchanging data. Automatically adapt it for incremental sliding window analytics.

9 Behind the scenes. Step #1: divide the computation into sub-computations. Step #2: build the dependence graph. Step #3: perform change propagation. We follow this high-level approach for batch-based stream processing.

10 Batch-based sliding window analytics. Step #1: Divide the computation into Map and Reduce tasks. Step #2: Build the dependence graph, i.e. the data-flow graph of MapReduce. [Figure: a window over the stream feeds Map (M) tasks, whose outputs feed Reduce (R) tasks.]

11 Step #3: Change propagation. Replace Reduce tasks with contraction trees. [Figure: stream buckets B1 to B5 with Map tasks M1 to M5; when the window slides, B1 (M1) is removed and B5 (M5) is added; Reduce tasks R1, R2, R3 are replaced by contraction trees #1, #2, #3.]
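
The three steps can be illustrated with a minimal Python sketch. This is not Slider's implementation: `map_task`, `reduce_task`, `run_window`, and the word-count workload are hypothetical names chosen for illustration. Map outputs are memoized per bucket, so sliding the window by one bucket runs Map only for the new bucket, while the Reduce side is still recomputed in full.

```python
from collections import Counter

map_calls = 0   # counts how often the (hypothetical) Map task really executes
_memo = {}      # bucket id -> memoized Map output

def map_task(bucket_id, records):
    """Word-count Map over one bucket, memoized by bucket id."""
    global map_calls
    if bucket_id not in _memo:
        map_calls += 1
        _memo[bucket_id] = Counter(records)
    return _memo[bucket_id]

def reduce_task(map_outputs):
    """Plain Reduce: merges every Map output in the window from scratch."""
    total = Counter()
    for out in map_outputs:
        total += out
    return total

def run_window(window):
    """window: list of (bucket_id, records) pairs currently in the window."""
    return reduce_task(map_task(b, r) for b, r in window)

# Buckets B1..B5; odd buckets hold ["a", "b"], even buckets hold ["a"].
stream = {i: ["a", "b"] if i % 2 else ["a"] for i in range(1, 6)}

first = run_window([(i, stream[i]) for i in (1, 2, 3, 4)])
calls_after_first = map_calls    # all four Map tasks ran
second = run_window([(i, stream[i]) for i in (2, 3, 4, 5)])
calls_after_second = map_calls   # only the Map task for B5 ran in addition
```

The Reduce step here still touches every Map output in the window, which is the cost the contraction tree is meant to reduce.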

12 Outline: Motivation; Basic design; Slider design (Contraction tree, Self-adjusting contraction tree, Split processing); Evaluation.

13 Contraction tree. What: breaks down the work done by a Reduce task to allow fine-grained change propagation. How: leverages Combiners at the Reducer site.

14 “Zoom in” with a single Reducer. [Figure: stream buckets B1 to B5 with Map tasks M1 to M5; when the window slides, B1 (M1) is removed and B5 (M5) is added; the Reduce task R is replaced by a contraction tree.]

15 Example of contraction tree. The Reduce task becomes a tree of combiners over the Map outputs.
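
A minimal sketch of the idea, assuming an associative Combiner (integer addition here, chosen only for illustration; `combine` and `contraction_tree` are hypothetical names). The Map outputs are combined pairwise, level by level, and the root equals what a flat Reduce would compute.

```python
from functools import reduce

def combine(a, b):
    # Hypothetical Combiner: must be associative; here, integer addition.
    return a + b

def contraction_tree(values):
    """Combine values pairwise, level by level; return (root, levels)."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # An unpaired last element is carried up unchanged.
        nxt = [combine(prev[i], prev[i + 1]) if i + 1 < len(prev) else prev[i]
               for i in range(0, len(prev), 2)]
        levels.append(nxt)
    return levels[-1][0], levels

map_outputs = [3, 1, 4, 1, 5]
root, levels = contraction_tree(map_outputs)   # levels: leaves first, root last
flat = reduce(combine, map_outputs)            # what a single Reduce would compute
```

Each internal node is a sub-computation whose result can be memoized, which is what allows change propagation at a finer grain than the whole Reduce task.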

16 “Zoom in” with a single Reducer. [Figure: stream buckets B1 to B5 with Map tasks M1 to M5; when the window slides, B1 (M1) is removed and B5 (M5) is added; the Reduce task R is replaced by a contraction tree.]

17 Basic design w/ contraction tree. [Figure: stream buckets B1 to B5 with Map tasks M1 to M5; when the window slides, M1 is removed and M5 is added; the contraction tree shows the path affected by M1 and the path affected by M5.]

18 Limitation of the contraction tree. Naïve grouping of Combiner tasks may lead to sub-optimal reuse of the memoized results. This motivates the self-adjusting contraction tree.
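
One way such a loss of reuse can arise is sketched below, under the assumption that Combiner tasks are grouped by position in the window and memoized by the bucket ids they cover (both are assumptions made for this illustration, as are the names `combine_memo` and `reduce_window`). Sliding the window by one bucket shifts every pairing, so no memoized combiner result matches.

```python
memo = {}            # tuple of bucket ids covered -> memoized combiner result
combiner_runs = 0    # number of combiner executions that were not memo hits

def combine_memo(left, right):
    """left/right: (ids, value) pairs. Memoize on the bucket ids covered."""
    global combiner_runs
    key = left[0] + right[0]
    if key not in memo:
        combiner_runs += 1
        memo[key] = left[1] + right[1]   # hypothetical Combiner: addition
    return key, memo[key]

def reduce_window(window):
    """window: list of (bucket_id, value); neighbours are paired by position."""
    level = [((b,), v) for b, v in window]
    while len(level) > 1:
        level = [combine_memo(level[i], level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
    return level[0][1]

values = {b: b * 10 for b in range(1, 7)}

r1 = reduce_window([(b, values[b]) for b in (1, 2, 3, 4)])
runs_first = combiner_runs
r2 = reduce_window([(b, values[b]) for b in (2, 3, 4, 5)])   # slide by one
runs_second = combiner_runs - runs_first
r3 = reduce_window([(b, values[b]) for b in (3, 4, 5, 6)])   # slide by one more
runs_third = combiner_runs - runs_first - runs_second
```

In this sketch the second window reuses nothing, because its pairs (2,3) and (4,5) were never computed, while the third window only reuses the pair (3,4) from the first run.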

19 Outline: Motivation; Basic design; Slider design (Contraction tree, Self-adjusting contraction tree, Split processing); Evaluation.

20 Self-adjusting contraction tree. The tree should have low depth (implies short path length for re-computation). Key ingredients: (1) balanced tree: sublinear updates w.r.t. window size; (2) self-adjusting capability after change propagation.
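
As a rough worked example of why low depth matters, assume a balanced binary tree over $N$ buckets and count Combiner invocations (the symbols $N$, $C_{\text{scratch}}$, and $C_{\text{path}}$ are introduced here for illustration and do not appear in the talk):

```latex
\begin{align*}
C_{\text{scratch}}(N) &= N - 1
  && \text{(recombine all $N$ buckets from scratch)} \\
C_{\text{path}}(N) &\le \lceil \log_2 N \rceil
  && \text{(recompute only the leaf-to-root path of one changed bucket)} \\
N = 1024:&\quad C_{\text{scratch}} = 1023, \qquad C_{\text{path}} \le 10
\end{align*}
```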

21 Self-adjusting contraction tree(s). Different modes of operation: general case, fixed-width, append-only. The fixed-width mode is highlighted.

22 Fixed-width window slides

23 Rotating contraction tree

24 Rotating contraction tree. [Figure: bucket B4 is updated; only the update path for bucket 4 is recomputed, and memoized results are reused for the rest of the tree.]
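
A minimal sketch of a rotating tree for fixed-width windows, not taken from Slider's code: `RotatingTree`, its fields, and the power-of-two restriction are assumptions made for brevity. The new bucket overwrites the slot of the oldest one, and only the nodes on that leaf's path to the root are recomputed. Because slots are reused in rotation, the leaf order no longer matches stream order, so this sketch also assumes the Combiner is commutative.

```python
class RotatingTree:
    """Fixed-width window of n buckets (n a power of two) over a combiner."""

    def __init__(self, buckets, combine):
        n = len(buckets)
        assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two width"
        self.n, self.combine = n, combine
        # Heap layout: root at index 1, leaves at indices n .. 2n-1.
        self.nodes = [None] * n + list(buckets)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = combine(self.nodes[2 * i], self.nodes[2 * i + 1])
        self.oldest = 0       # leaf slot currently holding the oldest bucket
        self.recomputed = 0   # combiner calls made by the most recent slide

    def slide(self, new_bucket):
        """Overwrite the oldest bucket; recompute only its path to the root."""
        i = self.n + self.oldest
        self.nodes[i] = new_bucket
        self.recomputed = 0
        i //= 2
        while i >= 1:
            self.nodes[i] = self.combine(self.nodes[2 * i], self.nodes[2 * i + 1])
            self.recomputed += 1
            i //= 2
        self.oldest = (self.oldest + 1) % self.n
        return self.nodes[1]


tree = RotatingTree([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)
initial = tree.nodes[1]          # sum of buckets 1..8
after_one = tree.slide(9)        # window is now 2..9
path_len = tree.recomputed       # combiner calls for this slide
after_two = tree.slide(10)       # window is now 3..10
```

With eight buckets, each slide recomputes three internal nodes; the other four keep their memoized values.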

25 Outline: Motivation; Basic design; Slider design (Contraction tree, Self-adjusting contraction tree, Split processing); Evaluation.

26 Split processing. Change propagation is split into background pre-processing and foreground processing.

27 Change propagation for bucket #4. [Figure: the update path for bucket 4 is recomputed; memoized results are reused.]

28 Split processing for bucket #4. [Figure: the update for bucket 4 is divided into background pre-processing and foreground processing.]
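
One possible reading of the split, sketched under stated assumptions: the tree uses a heap layout with a power-of-two number of buckets, the Combiner is associative and commutative, and the function names `build`, `background`, and `foreground` are hypothetical. Before the new bucket arrives, the background step pre-combines every sibling along the path of the slot that will be replaced; when the bucket arrives, the foreground step needs a single Combiner call to produce the window result.

```python
from operator import add

def build(buckets, combine):
    """Heap-layout tree: root at index 1, leaves at n .. 2n-1 (n a power of two)."""
    n = len(buckets)
    nodes = [None] * n + list(buckets)
    for i in range(n - 1, 0, -1):
        nodes[i] = combine(nodes[2 * i], nodes[2 * i + 1])
    return nodes

def background(nodes, n, slot, combine):
    """Pre-combine every sibling on the path from leaf `slot` up to the root.

    The result covers all buckets except the one stored in `slot`.
    """
    i = n + slot
    rest = None
    while i > 1:
        sibling = nodes[i ^ 1]   # the other child of the same parent
        rest = sibling if rest is None else combine(rest, sibling)
        i //= 2
    return rest

def foreground(rest, new_bucket, combine):
    """Combine the newly arrived bucket with the pre-computed remainder."""
    return new_bucket if rest is None else combine(rest, new_bucket)

nodes = build([1, 2, 3, 4], add)
rest = background(nodes, 4, 0, add)   # buckets 2, 3, 4 combined ahead of time
result = foreground(rest, 5, add)     # window 2..5 once bucket 5 arrives
```

In this sketch the tree nodes on the update path would still need refreshing afterwards, which could also be deferred to the background.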

29 Outline: Motivation; Basic design; Slider design; Evaluation.

30 Evaluating Slider. Goal: determine how Slider works in practice. 1. What are the performance benefits? 2. How effective is split processing? 3. What is the overhead for the initial run? Also: case studies @ MPI-SWS. More results in the paper.

31 Q1: Performance gains. Speedup of up to 3.8x w.r.t. the basic contraction tree.

32 Q2: Split processing. Foreground processing is faster by 30% on average.

33 Q3: Performance overheads. Overheads of 2% to 38% for the initial run.

34 Case studies. Online social networks [IMC’11]: information propagation in Twitter. Networked systems [NSDI’10]: Glasnost, detecting traffic shaping. Hybrid CDNs [NSDI’12]: reliable client accounting. Details in the paper.

35 Information propagation in Twitter. Speedup > 13x for a ~5% window change.

36 Summary. Slider enables incremental sliding window analytics, transparently and efficiently. Slider's design includes: self-adjusting contraction trees for sub-linear updates; split processing for background pre-processing; multi-level trees for general data-flow programs (not covered in the talk).

37 Incremental Sliding Window Analytics: Transparent + Efficient. Thanks! bhatotia@mpi-sws.org

