Bounds for Overlapping Interval Join on MapReduce
Foto N. Afrati¹, Shlomi Dolev², Shantanu Sharma², and Jeffrey D. Ullman³
¹ National Technical University of Athens, Greece
² Ben-Gurion University of the Negev, Israel
³ Stanford University, USA
2nd Algorithms and Systems for MapReduce and Beyond (BeyondMR), Brussels, Belgium (27 March 2015)
Outline
Introduction
– Interval and Overlapping Intervals
– Interval Join
– Reducer capacity and Mapping Schema
Goal of Mapping Schema and Our Contribution
Unit-Length and Equally-Spaced Intervals
Variable-Length and Equally-Spaced Intervals
Conclusion
Introduction
[Figure: an interval labeled "Talk"]
Introduction
Overlapping Intervals
– Two intervals, say interval i and interval j, are called overlapping intervals if their intersection is nonempty
[Figure: non-overlapping intervals and overlapping intervals i and j; a timeline example with "Talk" and "Coffee break" marked at 10am, 10:30am, 10:35am, and 11am]
Introduction
Overlapping Interval Join: an example

Employee
EmpID/Name    Duration
U             1-Apr – 1-June
V             1-May – 1-July
W             1-Apr – 1-July
X             1-Mar – 1-June
Y             1-Mar – 1-Aug

Project
Phase                        Duration
Requirement Analysis (RA)    1-Mar – 1-May
Design (D)                   1-Apr – 1-June
Coding (C)                   1-May – 1-Aug

[Figure: the Project phases (RA, D, C) and the Employee durations drawn on a timeline from 1-Mar to 1-Aug]

Find all the employees who are involved in the RA phase of the project
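The example query can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the month-number encoding (1-Mar as 3, 1-Aug as 8), the names `EMPLOYEES`, `PHASES`, `overlaps`, and `employees_in_phase`, and the `closed` flag are all hypothetical. The slide does not say whether two intervals that only touch at an endpoint count as overlapping (V starts on 1-May, the day RA ends), so the sketch exposes both readings.

```python
# Durations from the example tables, encoded as (start_month, end_month).
# The encoding is an assumption: 1-Mar -> 3, 1-Apr -> 4, ..., 1-Aug -> 8.
EMPLOYEES = {"U": (4, 6), "V": (5, 7), "W": (4, 7), "X": (3, 6), "Y": (3, 8)}
PHASES = {"RA": (3, 5), "D": (4, 6), "C": (5, 8)}


def overlaps(a, b, closed=True):
    """Return True if intervals a and b have a nonempty intersection.

    With closed=True the endpoints belong to the intervals, so intervals
    that merely touch are treated as overlapping.
    """
    if closed:
        return a[0] <= b[1] and b[0] <= a[1]
    return a[0] < b[1] and b[0] < a[1]


def employees_in_phase(phase, closed=True):
    """Names of the employees whose duration overlaps the given phase."""
    p = PHASES[phase]
    return sorted(name for name, d in EMPLOYEES.items() if overlaps(d, p, closed))
```

Under the closed-interval reading, `employees_in_phase("RA")` returns all five employees; with `closed=False`, V drops out because V and RA share only the single point 1-May.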
Introduction
Reducer capacity
– An upper bound on the total number of intervals that are assigned to the reducer
– Example: the reducer capacity may be the size of the main memory of the processors on which reducers run
Communication cost
– Total amount of data to be transferred from the map phase to the reduce phase
– There is a tradeoff between the reducer capacity and the communication cost
Mapping schema for interval join
An assignment of the set of intervals to some given reducers, such that the following hold
– Respect the reducer capacity: the total number of intervals assigned to a reducer must be less than or equal to the reducer capacity
– Assignment of inputs: for every output, every two corresponding overlapping intervals must be assigned to at least one reducer in common
[Figure: intervals I1, I2, and I3 assigned to reducers]
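The two conditions of a mapping schema can be checked mechanically. The following is a minimal sketch, assuming closed intervals given as `(start, end)` pairs, interval ids that are distinct across the two relations, and reducers represented as sets of interval ids; the function names `overlaps`, `is_valid_mapping_schema`, and `communication_cost` are hypothetical and do not come from the slides.

```python
from itertools import product


def overlaps(a, b):
    """Closed intervals (start, end) overlap if they share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid_mapping_schema(X, Y, reducers, q):
    """Check the two mapping-schema conditions.

    X, Y: dicts mapping an interval id to (start, end).
    reducers: list of sets of interval ids, one set per reducer.
    q: reducer capacity (maximum number of intervals per reducer).
    """
    # Condition 1: no reducer holds more than q intervals.
    if any(len(r) > q for r in reducers):
        return False
    # Condition 2: every overlapping (x, y) pair shares at least one reducer.
    for (xi, x), (yi, y) in product(X.items(), Y.items()):
        if overlaps(x, y) and not any(xi in r and yi in r for r in reducers):
            return False
    return True


def communication_cost(reducers):
    """Number of interval copies sent from the map phase to the reduce phase."""
    return sum(len(r) for r in reducers)
```

For instance, with `X = {"x1": (0, 1), "x2": (2, 3)}` and `Y = {"y1": (0.5, 2.5)}`, the assignment `[{"x1", "y1"}, {"x2", "y1"}]` is valid for `q = 2` and costs four interval copies, because `y1` has to be sent to both reducers.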
State-of-the-Art
B. Chawda, H. Gupta, S. Negi, T.A. Faruquie, L.V. Subramaniam, and M.K. Mohania, “Processing Interval Joins On Map-Reduce,” EDBT
– MapReduce-based 2-way and multiway interval join algorithms for overlapping intervals
– Does not consider the reducer capacity
– No analysis of a lower bound on replication of individual intervals
– No analysis of the replication rate of the algorithms offered therein
Goal of Mapping Schema
Interval join problem
– Assign all the intervals that share at least one common point of time to at least one reducer in common, so that the outputs can be found
Our Contribution
An algorithm for variable-length intervals that can start at any time
– Before this, we consider two simple cases and provide an algorithm for each:
  Unit-length and equally-spaced intervals
  Variable-length and equally-spaced intervals
For all the algorithms, the upper bound on the replication rate almost matches the lower bound
Unit-Length and Equally-Spaced Intervals
[Figure: relations X and Y of unit-length, equally-spaced intervals; n = 9 and k = 2.25, so spacing = k/n = 0.25]
Unit-Length and Equally-Spaced Intervals: Algorithm
– Divide the time range from 0 to k into equal-sized partitions of length w (say P partitions are created)
– Arrange P reducers
– Assign all intervals of X that exist in a partition p_i to the i-th reducer
– Assign all intervals of Y that have their starting or ending point in partition p_i to the i-th reducer
[Figure: the intervals of X and Y (n = 9) with the time range divided into numbered partitions]
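The assignment steps can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes closed unit-length intervals identified by their starting points, half-open partitions [i·w, (i+1)·w), and as many reducers as are needed to reach the last ending point. The names `assign` and `covers_all_pairs` and the choice w = 1/2 are hypothetical. `Fraction` is used only to keep the partition arithmetic exact.

```python
from fractions import Fraction
from math import floor


def assign(x_starts, y_starts, w):
    """Map partition index i -> (X starts, Y starts) sent to the i-th reducer.

    Intervals have unit length, so an interval starting at t is [t, t + 1].
    Partition i covers the half-open time range [i*w, (i+1)*w).
    """
    reducers = {}
    for a in x_starts:
        # An X interval goes to every partition it intersects.
        for i in range(floor(a / w), floor((a + 1) / w) + 1):
            reducers.setdefault(i, (set(), set()))[0].add(a)
    for b in y_starts:
        # A Y interval goes only to the partitions of its two endpoints.
        for i in {floor(b / w), floor((b + 1) / w)}:
            reducers.setdefault(i, (set(), set()))[1].add(b)
    return reducers


def covers_all_pairs(x_starts, y_starts, reducers):
    """True if every overlapping (x, y) pair meets at some reducer."""
    return all(
        any(a in xs and b in ys for xs, ys in reducers.values())
        for a in x_starts
        for b in y_starts
        if abs(a - b) <= 1  # closed unit-length intervals overlap
    )


# Setting of the slides' figure: n = 9 intervals per relation, spacing 0.25.
starts = [Fraction(j, 4) for j in range(9)]
reducers = assign(starts, starts, Fraction(1, 2))
```

The assignment is sufficient because an overlapping interval of Y must have its starting point or its ending point inside the interval of X, and X has been sent to every partition it touches. `covers_all_pairs(starts, starts, reducers)` checks this by brute force for the example.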
Unit-Length and Equally-Spaced Intervals
Does the algorithm work?
– Count 1: How many intervals of Y overlap with an interval of X in a partition of length w?
  Spacing is k/n, so n/k intervals start per unit of time, and at most 2wn/k intervals of Y can overlap with an interval of X
– Count 2: How many intervals can have starting points after the starting point of x_i and before the ending point of x_i?
  Intervals of X after the starting point of x_i = wn/k
  Intervals of X before the starting point of x_i = n/k
– Count 3: Do not forget to count x_i itself and an identical interval of Y, i.e., y_i
[Figure: the intervals of X and Y (n = 9) with the time range divided into numbered partitions]
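The counts can be checked by brute force on the slides' example (n = 9, k = 2.25). The sketch below is illustrative only and rests on assumptions not stated in the slides: w = 1/2, seven half-open partitions covering the time range 0 to 3.5, and the replication rate taken as the average number of reducers an interval is sent to. The name `reducer_loads` is hypothetical.

```python
from fractions import Fraction


def reducer_loads(n, k, w, num_partitions):
    """Count (X intervals, Y intervals) sent to each partition's reducer."""
    spacing = k / n
    starts = [j * spacing for j in range(n)]
    loads = []
    for i in range(num_partitions):
        lo, hi = i * w, (i + 1) * w  # partition covers [lo, hi)
        # X intervals [a, a + 1] that intersect the partition.
        x_count = sum(1 for a in starts if a < hi and a + 1 >= lo)
        # Y intervals whose starting or ending point falls in the partition.
        y_count = sum(1 for b in starts if lo <= b < hi or lo <= b + 1 < hi)
        loads.append((x_count, y_count))
    return loads


n, k, w = 9, Fraction(9, 4), Fraction(1, 2)
loads = reducer_loads(n, k, w, num_partitions=7)
max_y = max(y for _, y in loads)                # largest Y count at one reducer
max_load = max(x + y for x, y in loads)         # capacity this run would need
replication_rate = Fraction(sum(x + y for x, y in loads), 2 * n)
```

In this run the largest number of Y intervals at a single reducer is 4, which equals 2wn/k for these parameters. The busiest reducer receives 10 intervals, and each interval is sent to 2.5 reducers on average.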
Variable-Length and Equally-Spaced Intervals
Two types of intervals
– Big and small intervals
– Different length intervals
Variable-Length and Equally-Spaced Intervals
Big and small intervals
– All the intervals of X are of length l_min
– All the intervals of Y are of length l_max
– The previous algorithm will work here too
– Note that an interval of X will be replicated to several reducers, while an interval of Y will be replicated to at most two reducers
[Figure: relations X and Y; n = 6 and spacing = 0.7]
Variable-Length and Equally-Spaced Intervals
Variable-length intervals: A general case
– All restrictions on the length of an interval and the spacing between two intervals are removed
– Intervals can begin at some time greater than or equal to 0 and end by time T
– S: the total length of intervals in one relation
[Figure: relations X and Y on a timeline from 0 to T, with marks at s, s+1, s+2, and s+3]
Variable-Length and Equally-Spaced Intervals
Average length/how much length a reducer can hold
Conclusion
Proofs of lower and upper bounds on the replication rate are given in the paper
Foto Afrati¹, Shlomi Dolev², Shantanu Sharma², and Jeffrey D. Ullman³
¹ School of Electrical and Computing Engineering, National Technical University of Athens, Greece
² Department of Computer Science, Ben-Gurion University of the Negev, Israel
³ Department of Computer Science, Stanford University, USA
Presentation is available at