Published by Damon Young. Modified over 9 years ago.
1
Bounds for Overlapping Interval Join on MapReduce
Foto N. Afrati (1), Shlomi Dolev (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) National Technical University of Athens, Greece
(2) Ben-Gurion University of the Negev, Israel
(3) Stanford University, USA
2nd Algorithms and Systems for MapReduce and Beyond (BeyondMR), Brussels, Belgium (27 March 2015)
2
Outline
  Introduction
  Goal of Mapping Schema and Our Contribution
  Unit-Length and Equally-Spaced Intervals
  Variable-Length and Equally-Spaced Intervals
  Conclusion
3
Outline
  Introduction
    – Interval and Overlapping Intervals
    – Interval Join
    – Reducer capacity and Mapping Schema
  Goal of Mapping Schema and Our Contribution
  Unit-Length and Equally-Spaced Intervals
  Variable-Length and Equally-Spaced Intervals
  Conclusion
4
Introduction
[Figure: an interval, illustrated by a talk]
5
Introduction
Overlapping Intervals
– Two intervals, say interval i and interval j, are called overlapping intervals if their intersection is nonempty
[Figure: non-overlapping intervals and overlapping intervals i and j; example with a talk and a coffee break, time marks 10am, 10:30am, 10:35am, 11am]
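The definition above can be checked mechanically. The following is a minimal sketch in Python, assuming closed intervals given as (start, end) pairs with start <= end. The function name `overlaps`, the minutes-since-midnight encoding, and the third interval are illustrative assumptions; the first two intervals are one plausible reading of the time marks in the figure.

```python
def overlaps(i, j):
    """Return True if the closed intervals i and j share at least one point."""
    (i_start, i_end), (j_start, j_end) = i, j
    # The intersection is [max of the starts, min of the ends]; it is
    # nonempty exactly when its left end does not exceed its right end.
    return max(i_start, j_start) <= min(i_end, j_end)


# Times encoded as minutes since midnight (an assumed encoding).
first = (10 * 60, 10 * 60 + 35)      # 10:00 to 10:35
second = (10 * 60 + 30, 11 * 60)     # 10:30 to 11:00
later = (11 * 60 + 15, 12 * 60)      # 11:15 to 12:00, hypothetical

print(overlaps(first, second))  # the two intervals share 10:30 to 10:35
print(overlaps(first, later))   # no common point
```

Whether two intervals that only touch at an endpoint count as overlapping depends on the endpoint convention; this sketch treats them as overlapping.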
6
Introduction
Overlapping Interval Join: an example

Employee (EmpID, Name, Duration)
  U    1-Apr – 1-June
  V    1-May – 1-July
  W    1-Apr – 1-July
  X    1-Mar – 1-June
  Y    1-Mar – 1-Aug

Project (Phase, Duration)
  Requirement Analysis (RA)    1-Mar – 1-May
  Design (D)                   1-Apr – 1-June
  Coding (C)                   1-May – 1-Aug

[Figure: timeline from 1-Mar to 1-Aug showing the project phases RA, D, and C and the employee intervals]

Find all the employees that are involved in the RA phase of the project
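The example query can be sketched as a naive nested-loop interval join. This is a single-machine illustration, not the MapReduce algorithm of the talk. The year 2015, the function names, and the `closed` flag are assumptions added for illustration; the slide gives only day and month and does not state an endpoint convention.

```python
from datetime import date

YEAR = 2015  # assumed; the slide gives only day and month

employees = {
    "U": (date(YEAR, 4, 1), date(YEAR, 6, 1)),
    "V": (date(YEAR, 5, 1), date(YEAR, 7, 1)),
    "W": (date(YEAR, 4, 1), date(YEAR, 7, 1)),
    "X": (date(YEAR, 3, 1), date(YEAR, 6, 1)),
    "Y": (date(YEAR, 3, 1), date(YEAR, 8, 1)),
}
phases = {
    "RA": (date(YEAR, 3, 1), date(YEAR, 5, 1)),
    "D": (date(YEAR, 4, 1), date(YEAR, 6, 1)),
    "C": (date(YEAR, 5, 1), date(YEAR, 8, 1)),
}


def overlaps(a, b, closed=True):
    """Overlap test; with closed=False, touching endpoints do not count."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return low <= high if closed else low < high


def interval_join(left, right, closed=True):
    """Return all (left key, right key) pairs whose intervals overlap."""
    return [
        (lk, rk)
        for lk, lv in left.items()
        for rk, rv in right.items()
        if overlaps(lv, rv, closed)
    ]


ra_closed = [e for e, p in interval_join(employees, phases) if p == "RA"]
ra_open = [e for e, p in interval_join(employees, phases, closed=False) if p == "RA"]
print(ra_closed)
print(ra_open)
```

Employee V starts on 1-May, the day the RA phase ends, so V appears in the result only when touching endpoints are counted as an overlap.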
7
Introduction
Reducer capacity
– An upper bound on the total number of intervals that are assigned to the reducer
– Example: the reducer capacity may be the size of the main memory of the processors on which the reducers run
Communication cost
– The total amount of data to be transferred from the map phase to the reduce phase
– There is a tradeoff between the reducer capacity and the communication cost
8
Mapping schema for interval join
An assignment of the set of intervals to some given reducers that satisfies two constraints:
– Respect the reducer capacity: the total number of intervals assigned to a reducer must be less than or equal to the reducer capacity
– Assignment of inputs: for every output, the two corresponding overlapping intervals must be assigned to at least one reducer in common
[Figure: intervals I1, I2, and I3 assigned to reducers]
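The two constraints can be expressed as a small checker. The sketch below assumes closed intervals, unique interval names across both relations, and reducers modeled as sets of names; the function names and the sample data are hypothetical. `replication_rate` uses the usual definition (total number of assigned copies divided by the number of inputs), which the slides mention but do not spell out.

```python
from itertools import product


def overlaps(a, b):
    """Closed-interval overlap test."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def is_valid_schema(xs, ys, reducers, q):
    """xs, ys: name -> (start, end); reducers: list of sets of names; q: capacity."""
    # Constraint 1: no reducer holds more than q intervals.
    if any(len(r) > q for r in reducers):
        return False
    # Constraint 2: every overlapping (x, y) pair meets at some reducer.
    for (xn, xi), (yn, yi) in product(xs.items(), ys.items()):
        if overlaps(xi, yi) and not any(xn in r and yn in r for r in reducers):
            return False
    return True


def replication_rate(reducers, num_inputs):
    """Average number of reducers to which an input is sent."""
    return sum(len(r) for r in reducers) / num_inputs


# Hypothetical data: x1 and x2 both overlap y1; y2 overlaps nothing.
xs = {"x1": (0, 2), "x2": (3, 5)}
ys = {"y1": (1, 4), "y2": (6, 7)}
schema = [{"x1", "y1"}, {"x2", "y1"}, {"y2"}]

print(is_valid_schema(xs, ys, schema, q=2))
print(replication_rate(schema, num_inputs=4))
```

In this example y1 has to be sent to two reducers because the capacity q = 2 prevents x1, x2, and y1 from sharing one reducer, which illustrates the tradeoff between reducer capacity and communication.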
9
State-of-the-Art
B. Chawda, H. Gupta, S. Negi, T.A. Faruquie, L.V. Subramaniam, and M.K. Mohania, “Processing Interval Joins On Map-Reduce,” EDBT, 2014.
– MapReduce-based 2-way and multiway interval join algorithms for overlapping intervals
– Does not take the reducer capacity into account
– No analysis of a lower bound on the replication of individual intervals
– No analysis of the replication rate of the algorithms offered therein
11
Goal of Mapping Schema
Interval join problem
– Assign all the intervals that share at least one common point of time to at least one reducer in common, so that the outputs can be found
12
Our Contribution
An algorithm for variable-length intervals that can start at any time
– Before this, we consider two simple cases and provide an algorithm for each:
    Unit-length and equally-spaced intervals
    Variable-length and equally-spaced intervals
All the algorithms achieve an upper bound on the replication rate that almost matches the lower bound
14
Unit-Length and Equally-Spaced Intervals
[Figure: relations X and Y on a time axis from 0 to 2.25 with ticks every 0.25]
n = 9 and k = 2.25, so spacing = 0.25
15
Unit-Length and Equally-Spaced Intervals: Algorithm
– Divide the time range from 0 to k into equal-sized partitions of length w (say P partitions are created)
– Arrange P reducers
– Assign all intervals of X that exist in a partition p_i to the i-th reducer
– Assign all intervals of Y that have their starting point or ending point in partition p_i to the i-th reducer
[Figure: relations X and Y on a time axis from 0 to 2.25, divided into partitions 1 to 5; n = 9 and k = 2.25]
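The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: intervals are closed, partition i covers [i*w, (i+1)*w) with 0-based indices, an interval of X "exists" in a partition when it intersects it, and partitions are created as far as the last ending point. The names `partition_of` and `assign` and the choice w = 0.5 are hypothetical. Exact fractions are used to avoid floating-point boundary errors.

```python
from collections import defaultdict
from fractions import Fraction
from math import floor


def partition_of(t, w):
    """Index of the partition [i*w, (i+1)*w) that contains time t."""
    return floor(t / w)


def assign(xs, ys, w):
    """Map partition index -> {"X": names, "Y": names} for that reducer."""
    reducers = defaultdict(lambda: {"X": set(), "Y": set()})
    # An X interval goes to every partition it intersects.
    for name, (start, end) in xs.items():
        for i in range(partition_of(start, w), partition_of(end, w) + 1):
            reducers[i]["X"].add(name)
    # A Y interval goes only to the partitions holding its two endpoints.
    for name, (start, end) in ys.items():
        for i in {partition_of(start, w), partition_of(end, w)}:
            reducers[i]["Y"].add(name)
    return reducers


# Example from the slides: n = 9 unit-length intervals, spacing 0.25.
n, spacing = 9, Fraction(1, 4)
xs = {f"x{i}": (i * spacing, i * spacing + 1) for i in range(n)}
ys = {f"y{i}": (i * spacing, i * spacing + 1) for i in range(n)}
reducers = assign(xs, ys, w=Fraction(1, 2))  # w = 0.5 is an assumed value


def meet(x, y):
    """True if x and y are assigned to at least one reducer in common."""
    return any(x in r["X"] and y in r["Y"] for r in reducers.values())


# Overlapping pairs that no reducer sees together (expected to be empty).
missed = [
    (x, y)
    for x, a in xs.items()
    for y, b in ys.items()
    if max(a[0], b[0]) <= min(a[1], b[1]) and not meet(x, y)
]
print(missed)
```

The sketch relies on both relations having the same interval length: if x and y overlap, then the starting point or the ending point of y lies inside x, so the partition that holds that point receives both intervals.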
16
Unit-Length and Equally-Spaced Intervals
17
Unit-Length and Equally-Spaced Intervals
Does the algorithm work?
– Count 1: How many intervals of Y overlap with an interval of X in a partition of length w? The spacing is k/n, so n/k intervals start per unit of time, and at most 2wn/k intervals of Y can overlap with an interval of X
– Count 2: How many intervals can have starting points after the start of x_i and before the end of x_i?
    Intervals of X after the starting point of x_i = wn/k
    Intervals of X before the starting point of x_i = n/k
– Count 3: Do not forget to count x_i itself and the identical interval of Y, i.e., y_i
[Figure: relations X and Y on a time axis from 0 to 2.25, divided into partitions 1 to 5; n = 9 and k = 2.25]
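The counts can be checked numerically on the n = 9, k = 2.25 example. The sketch below reads them as per-reducer loads, which is one interpretation of the slide: at most 2wn/k intervals of Y and at most wn/k + n/k intervals of X per partition. It assumes closed unit-length intervals, spacing k/n = 0.25, half-open partitions [p*w, (p+1)*w), and a hypothetical w = 0.5 that is a multiple of the spacing.

```python
from fractions import Fraction
from math import floor

n, k = 9, Fraction(9, 4)
spacing = k / n              # 1/4, as in the example
density = n / k              # 4 intervals start per unit of time
w = Fraction(1, 2)           # assumed partition length

starts = [i * spacing for i in range(n)]   # each interval is [s, s + 1]


def idx(t):
    """Index of the partition [p*w, (p+1)*w) that contains time t."""
    return floor(t / w)


num_partitions = idx(starts[-1] + 1) + 1

# X intervals intersecting partition p.
x_load = [
    sum(1 for s in starts if idx(s) <= p <= idx(s + 1))
    for p in range(num_partitions)
]
# Y intervals whose starting point or ending point falls in partition p.
y_load = [
    sum(1 for s in starts if p in (idx(s), idx(s + 1)))
    for p in range(num_partitions)
]

x_bound = w * density + density   # wn/k + n/k
y_bound = 2 * w * density         # 2wn/k

print(max(x_load), x_bound)
print(max(y_load), y_bound)
```

When w is not a multiple of the spacing, a partition can hold one interval more than these expressions suggest, so the bounds should be read as approximate in that case.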
18
Unit-Length and Equally-Spaced Intervals
20
Variable-Length and Equally-Spaced Intervals
Two types of intervals
– Big and small intervals
– Different length intervals
21
Variable-Length and Equally-Spaced Intervals
Big and small intervals
– All the intervals of X are of length l_min
– All the intervals of Y are of length l_max
– The previous algorithm will work here too
– Note that an interval of X will be replicated to several reducers, while an interval of Y will be replicated to at most two reducers
[Figure: relations X and Y on a time axis with ticks at 0.7, 1.4, 2.1, 2.8, 3.5, 4.2; n = 6 and spacing = 0.7]
22
Variable-Length and Equally-Spaced Intervals
Variable-length intervals: a general case
– All the restrictions on the length of an interval and on the spacing between two intervals are removed
– Intervals can begin at some time greater than or equal to 0 and end by time T
– S: the total length of the intervals in one relation
[Figure: relations X and Y on a time axis marked 0, s, s+1, s+2, s+3, T]
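As a small illustration of the quantities defined above, the sketch below computes S for one relation and checks that every interval lies within [0, T]. The relation `X`, the value of `T`, and the function names are hypothetical.

```python
def total_length(relation):
    """S: the sum of the lengths of all intervals in one relation."""
    return sum(end - start for start, end in relation)


def within_range(relation, T):
    """True if every interval begins at or after 0 and ends by time T."""
    return all(0 <= start <= end <= T for start, end in relation)


# Hypothetical relation with intervals of different lengths.
X = [(0, 3), (2, 2.5), (4, 10)]
T = 10

S = total_length(X)
print(S, within_range(X, T))
```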
23
Variable-Length and Equally-Spaced Intervals
[Figure: relations X and Y on a time axis marked 0, s, s+1, s+2, s+3, T]
24
Variable-Length and Equally-Spaced Intervals
25
Variable-Length and Equally-Spaced Intervals
26
Variable-Length and Equally-Spaced Intervals
Average length/how much length a reducer can hold
27
Variable-Length and Equally-Spaced Intervals
29
Conclusion
Proofs of the lower and upper bounds on the replication rate are given in the paper
30
Foto Afrati (1), Shlomi Dolev (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) School of Electrical and Computing Engineering, National Technical University of Athens, Greece, afrati@softlab.ece.ntua.gr
(2) Department of Computer Science, Ben-Gurion University of the Negev, Israel, {dolev,sharmas}@cs.bgu.ac.il
(3) Department of Computer Science, Stanford University, USA, ullman@cs.stanford.edu
Presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html