NOVA University of Lisbon

1 Slider: Incremental Sliding Window Analytics
Pramod Bhatotia (MPI-SWS), Umut Acar (CMU), Flavio Junqueira (MSR Cambridge), Rodrigo Rodrigues (NOVA University of Lisbon)
Hadoop Summit 2015

2 Data analytics systems
Raw data (e.g., a Web crawl) -> Data analytics system (e.g., computing PageRank) -> Information (e.g., search)
Example systems: Spark, Naiad, Storm, S4, Hadoop

3 Incremental sliding window analytics for data stream
System goals: streaming data, incremental updates, recent trends, sliding window
Sliding window + incremental updates: incremental sliding window analytics for data streams

4 State-of-the-art: Stream processing
Classification based on programming model:
Trigger-based systems (e.g., Storm, S4, Naiad): input records flow through nodes (node 1, node 2, node 3) that hold mutable state
Batch-based systems (e.g., D-Streams): the stream is split into batches (Batch #1, Batch #2, ..., Batch #n), and each single batch runs as Input -> M -> R -> Output

5 Trade-offs for incremental updates
Trigger-based systems (require dynamic algorithms): (+) efficient, (-) hard to design
Batch-based systems (re-compute from scratch): (-) inefficient, (+) easy to design
Slider: efficient and easy to design

6 Goals
Retain the advantages and simplicity of the batch-based approach
Achieve the efficiency of incremental processing for sliding window analytics

7 Outline
Motivation
Basic design
Slider design
Evaluation

8 Our approach
Take an unmodified data-parallel application written assuming unchanging data
Automatically adapt it for incremental sliding window analytics

9 Behind the scenes
Step #1: divide the computation into sub-computations
Step #2: build the dependence graph
Step #3: perform change propagation
We follow this high-level approach for batch-based stream processing

10 Batch-based sliding window analytics
Step #1: Divide the computation into Map (M) and Reduce (R) tasks over the stream
Step #2: Build the dependence graph, which is the data-flow graph of MapReduce
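The two steps above can be sketched in a few lines. This is a minimal illustration, not Slider's implementation: the names `map_bucket` and `MemoizedWindow` are hypothetical, word counting stands in for the application, and the sketch assumes a bucket id uniquely identifies a bucket's contents. Each bucket becomes one Map sub-computation, and the mapping from bucket to Map output plays the role of the dependence graph, so only buckets that are new to the window are mapped again.

```python
from collections import Counter


def map_bucket(records):
    """Map task (hypothetical example): count words in one bucket."""
    return Counter(word for line in records for word in line.split())


class MemoizedWindow:
    """Sliding window that reuses Map outputs of buckets that stay in the window."""

    def __init__(self):
        self.memo = {}      # bucket id -> memoized Map output
        self.map_runs = 0   # number of Map tasks actually executed

    def slide(self, window):
        """window: dict of bucket id -> list of records now in the window."""
        # Drop memoized results for buckets that left the window.
        for bucket_id in list(self.memo):
            if bucket_id not in window:
                del self.memo[bucket_id]
        # Run Map only for buckets that are new to the window.
        for bucket_id, records in window.items():
            if bucket_id not in self.memo:
                self.memo[bucket_id] = map_bucket(records)
                self.map_runs += 1
        # Reduce is still recomputed from all Map outputs in this sketch.
        total = Counter()
        for output in self.memo.values():
            total += output
        return total
```

Note that the Reduce side is still recomputed in full here; making that part incremental is what the contraction tree addresses.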

11 Step#3: Change propagation
[Figure: the window slides over stream buckets B1 to B5; one bucket is removed and one is added, affecting Map tasks M1 and M5. Reduce tasks R1, R2, R3 are replaced with contraction trees #1, #2, #3.]

12 Outline
Motivation
Basic design
Slider design: contraction tree, self-adjusting contraction tree, split processing
Evaluation

13 Contraction tree
What: Breaks down the work done by a Reduce task to allow fine-grained change propagation
How: Leverages Combiners at the Reducer site
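A minimal sketch of the idea, under stated assumptions: `combine` and `ContractionTree` are hypothetical names, integer addition stands in for an associative Combiner, and a bucket id is assumed to identify a Map output uniquely. The Reduce work is split into a balanced tree of Combiner calls, and each internal node is memoized by the bucket ids beneath it, so a changed leaf only re-runs the Combiners on its path to the root.

```python
def combine(a, b):
    """Combiner (assumed associative); integer addition for illustration."""
    return a + b


class ContractionTree:
    """Balanced tree of Combiner calls with memoized internal nodes."""

    def __init__(self):
        self.memo = {}           # tuple of bucket ids -> combined value
        self.combiner_runs = 0   # Combiner calls actually executed

    def reduce(self, leaves):
        """leaves: non-empty list of (bucket_id, map_output), in window order."""
        if len(leaves) == 1:
            return leaves[0][1]
        key = tuple(bucket_id for bucket_id, _ in leaves)
        if key not in self.memo:
            mid = len(leaves) // 2
            left = self.reduce(leaves[:mid])
            right = self.reduce(leaves[mid:])
            self.memo[key] = combine(left, right)
            self.combiner_runs += 1
        return self.memo[key]
```

For example, after reducing buckets 1 to 4, replacing bucket 4 with a new bucket 5 reuses the memoized node for buckets (1, 2) and re-runs only two Combiners. If the whole window shifts by one position instead, every grouping changes and nothing is reused in this sketch.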

14 "Zoom in" with a single Reducer
[Figure: the window slides over stream buckets B1 to B5; one bucket is removed and one is added, with Map tasks M1 to M5. The single Reduce task R is replaced by a contraction tree.]

15 Example of contraction tree
[Figure: a Reduce task expressed as a tree of Combiners over the Map outputs.]

17 Basic design w/ contraction tree
[Figure: the window slides over stream buckets B1 to B5 with Map tasks M1 to M5; the contraction tree highlights the path affected by M1 and the path affected by M5.]

18 Limitation of the contraction tree
Naïve grouping of Combiner tasks may lead to sub-optimal reuse of the memoized results.
This motivates the self-adjusting contraction tree.

19 Outline
Motivation
Basic design
Slider design: contraction tree, self-adjusting contraction tree, split processing
Evaluation

20 Self-adjusting contraction tree
The tree should have low depth (implies short path length for re-computation)
Key ingredients:
Balanced tree: sublinear updates w.r.t. window size
Self-adjusting capability after change propagation

21 Self-adjusting contraction tree(s)
Different modes of operation: general case, fixed-width, append-only

22 Fixed-width window slides

23 Rotating contraction tree

24 Rotating contraction tree
[Figure: update path for bucket 4 (B4). Memoized results are reused.]
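One way to picture the rotating tree for fixed-width slides is the sketch below. It is an illustration under assumptions, not the authors' code: `RotatingTree` is a hypothetical name, the window size is assumed to be a power of two, and because rotation changes the order of the leaves, the Combiner is assumed to be commutative as well as associative. The leaves form a circular buffer; each slide overwrites the oldest slot and recomputes only that slot's path to the root, so the number of Combiner calls per slide is logarithmic in the window size.

```python
class RotatingTree:
    """Complete binary tree over a fixed number of buckets (a power of two)."""

    def __init__(self, initial, combine):
        n = len(initial)
        assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two window"
        self.n = n
        self.combine = combine
        # Array layout: root at index 1, leaves at indices n .. 2n-1.
        self.nodes = [None] * n + list(initial)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = combine(self.nodes[2 * i], self.nodes[2 * i + 1])
        self.oldest = 0          # leaf slot holding the oldest bucket
        self.combiner_runs = 0   # Combiner calls made by slide()

    def slide(self, new_bucket):
        """Overwrite the oldest bucket and recompute only its path to the root."""
        i = self.n + self.oldest
        self.nodes[i] = new_bucket
        i //= 2
        while i >= 1:
            # Sibling subtrees are untouched, so their memoized values are reused.
            self.nodes[i] = self.combine(self.nodes[2 * i], self.nodes[2 * i + 1])
            self.combiner_runs += 1
            i //= 2
        self.oldest = (self.oldest + 1) % self.n
        return self.nodes[1]
```

With a window of four buckets, each call to `slide` runs two Combiners, regardless of which slot is being replaced.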

25 Outline
Motivation
Basic design
Slider design: contraction tree, self-adjusting contraction tree, split processing
Evaluation

26 Split processing
Change propagation is split into background pre-processing and foreground processing

27 Change propagation for bucket #4
Update path for bucket 4. Memoized results are reused.

28 Split processing for bucket #4
Background pre-processing, then foreground processing
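The split can be illustrated with a deliberately simplified sketch. The class `SplitWindow` and its methods are hypothetical, the window is a flat list rather than a contraction tree, and it is assumed to hold at least two buckets. The point is only the division of labor: everything that does not depend on the incoming bucket is combined ahead of time in the background, so the foreground step needs a single Combiner call once the new bucket arrives.

```python
from functools import reduce


class SplitWindow:
    """Fixed-width window whose update is split into background and foreground."""

    def __init__(self, buckets, combine):
        self.buckets = list(buckets)   # oldest bucket first
        self.combine = combine         # assumed associative
        self.partial = None            # result of background pre-processing

    def background(self):
        """Before new data arrives: combine the buckets that will remain."""
        remaining = self.buckets[1:]   # the oldest bucket is about to leave
        self.partial = reduce(self.combine, remaining)

    def foreground(self, new_bucket):
        """When the new bucket arrives: one Combiner call finishes the update."""
        assert self.partial is not None, "run background() first"
        self.buckets = self.buckets[1:] + [new_bucket]
        result = self.combine(self.partial, new_bucket)
        self.partial = None
        return result
```

In this flat version the background step is linear in the window size; applying the same split along an update path of a balanced tree would keep it logarithmic.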

29 Outline
Motivation
Basic design
Slider design
Evaluation

30 Evaluating Slider
Goal: Determine how Slider works in practice
What are the performance benefits?
How effective is split processing?
What is the overhead for the initial run?
Case studies and more results are in the paper

31 Question 1: Performance gains
Speedup up to 3.8X w.r.t. basic contraction tree

32 Question 2: Split processing
Foreground processing is faster by 30% on avg.

33 Question 3: Performance overheads
Overheads 2% to 38% for the initial run

34 Case studies
Online Social Networks [IMC’11]: Information propagation in Twitter
Networked Systems [NSDI’10]: Glasnost: Detecting traffic shaping
Hybrid CDNs [NSDI’12]: Reliable client accounting
Details in the paper of …

35 Information propagation in Twitter
Speedup > 13X for ~5% window change

36 Summary
Slider enables incremental sliding window analytics, transparently and efficiently
Slider design includes:
Self-adjusting contraction trees for sub-linear updates
Split processing for background pre-processing
Multi-level trees for general data-flow programs (not covered in the talk)

37 Thanks!
Incremental Sliding Window Analytics: Transparent + Efficient
More details in the paper published at ACM/USENIX Middleware 2014

