Bounds for Overlapping Interval Join on MapReduce Foto N. Afrati 1, Shlomi Dolev 2, Shantanu Sharma 2, and Jeffrey D. Ullman 3 1 National Technical University.

Presentation transcript:

Bounds for Overlapping Interval Join on MapReduce
Foto N. Afrati¹, Shlomi Dolev², Shantanu Sharma², and Jeffrey D. Ullman³
¹ National Technical University of Athens, Greece
² Ben-Gurion University of the Negev, Israel
³ Stanford University, USA
2nd Algorithms and Systems for MapReduce and Beyond (BeyondMR), Brussels, Belgium (27 March 2015)

Outline
Introduction
– Interval and Overlapping Intervals
– Interval Join
– Reducer capacity and Mapping Schema
Goal of Mapping Schema and Our Contribution
Unit-Length and Equally-Spaced Intervals
Variable-Length and Equally-Spaced Intervals
Conclusion

Introduction
[Figure: a single interval, labelled "Talk"]

Introduction
Overlapping Intervals
– Two intervals, say interval i and interval j, are called overlapping intervals if their intersection is nonempty
[Figure: non-overlapping intervals i and j; overlapping intervals, e.g. Talk (10am to 10:35am) and Coffee break (10:30am to 11am)]
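The definition above can be checked with a single comparison. The sketch below is illustrative only: the function name `overlaps` is hypothetical, and it assumes closed intervals, so two intervals that merely touch at an endpoint count as overlapping.

```python
from datetime import time


def overlaps(start_i, end_i, start_j, end_j):
    """Return True if the closed intervals [start_i, end_i] and
    [start_j, end_j] have a nonempty intersection (assumption: closed ends)."""
    return start_i <= end_j and start_j <= end_i


# The two intervals from the slide's figure.
talk = (time(10, 0), time(10, 35))
coffee_break = (time(10, 30), time(11, 0))

print(overlaps(*talk, *coffee_break))  # True: both cover 10:30am to 10:35am
```

The same test works for any comparable endpoints (numbers, dates, times), which is why the later sketches reuse it with plain numbers.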

Introduction
Overlapping Interval Join: an example

Employee (EmpID, Name, Duration)
U    1-Apr – 1-June
V    1-May – 1-July
W    1-Apr – 1-July
X    1-Mar – 1-June
Y    1-Mar – 1-Aug

Project (Phase, Duration)
Requirement Analysis (RA)    1-Mar – 1-May
Design (D)                   1-Apr – 1-June
Coding (C)                   1-May – 1-Aug

[Figure: timeline from 1-Mar to 1-Aug showing the Project phases RA, D, C and the Employee intervals]

Find all the employees that are involved in the RA phase of the project
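The query in this example can be answered on a single machine with a nested-loop interval join. The following is a minimal sketch rather than the authors' method: the names `EMPLOYEES`, `PHASES`, `interval_join` and `employees_in_phase` are hypothetical, months are encoded as integers (Mar = 3, ..., Aug = 8) because every date is the first of a month, and the treatment of intervals that only touch at an endpoint is an assumption exposed through the `closed` flag.

```python
# Durations from the slide, encoded as (start_month, end_month).
EMPLOYEES = {"U": (4, 6), "V": (5, 7), "W": (4, 7), "X": (3, 6), "Y": (3, 8)}
PHASES = {"RA": (3, 5), "D": (4, 6), "C": (5, 8)}


def overlaps(a, b, closed=True):
    """Overlap test; with closed=False, intervals that only touch do not overlap."""
    if closed:
        return a[0] <= b[1] and b[0] <= a[1]
    return a[0] < b[1] and b[0] < a[1]


def interval_join(left, right, closed=True):
    """Nested-loop join: every (left key, right key) pair whose intervals overlap."""
    return [(l, r)
            for l, l_iv in left.items()
            for r, r_iv in right.items()
            if overlaps(l_iv, r_iv, closed)]


def employees_in_phase(phase, closed=True):
    """Employees whose duration overlaps the given project phase."""
    return [e for e, p in interval_join(EMPLOYEES, PHASES, closed) if p == phase]


print(employees_in_phase("RA"))                # ['U', 'V', 'W', 'X', 'Y']
print(employees_in_phase("RA", closed=False))  # ['U', 'W', 'X', 'Y']
```

Employee V starts on 1-May, the day RA ends, so V appears in the answer only under the closed-interval assumption.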

Introduction
Reducer capacity
– An upper bound on the total number of intervals that are assigned to the reducer
– Example: the reducer capacity may be the size of the main memory of the processors on which the reducers run
Communication cost
– The total amount of data to be transferred from the map phase to the reduce phase
– There is a tradeoff between the reducer capacity and the communication cost

Mapping schema for interval join
An assignment of the set of intervals to some given reducers, such that
– Respect the reducer capacity: the total number of intervals assigned to a reducer must be less than or equal to the reducer capacity
– Assignment of inputs: for every output, the two corresponding overlapping intervals must be assigned to at least one reducer in common
[Figure: intervals I1, I2, I3 assigned to reducers]
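The two conditions of a mapping schema can be verified mechanically for a small instance. The sketch below is an illustration under stated assumptions: the function names are hypothetical, interval names are assumed unique across the two relations, intervals are treated as closed, and communication cost is counted in intervals (each interval taken to have unit size).

```python
def overlaps(a, b):
    """Closed-interval overlap test (assumption: touching endpoints overlap)."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid_mapping_schema(x, y, reducers, capacity):
    """x, y: dicts mapping interval name -> (start, end).
    reducers: list of sets of interval names assigned to each reducer."""
    # Condition 1: no reducer holds more intervals than the reducer capacity.
    if any(len(r) > capacity for r in reducers):
        return False
    # Condition 2: every overlapping (x, y) pair meets at some common reducer.
    for xn, xi in x.items():
        for yn, yi in y.items():
            if overlaps(xi, yi) and not any(xn in r and yn in r for r in reducers):
                return False
    return True


def communication_cost(reducers):
    """Number of interval copies sent from the map phase to the reduce phase."""
    return sum(len(r) for r in reducers)


x = {"x1": (0, 1), "x2": (2, 3)}
y = {"y1": (0.5, 2.5)}
schema = [{"x1", "y1"}, {"x2", "y1"}]

print(is_valid_mapping_schema(x, y, schema, capacity=2))  # True
print(communication_cost(schema))                         # 4
```

With capacity 3 a single reducer holding all three intervals would also be valid and would cost only 3, which illustrates the tradeoff between reducer capacity and communication cost.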

State-of-the-Art
B. Chawda, H. Gupta, S. Negi, T.A. Faruquie, L.V. Subramaniam, and M.K. Mohania, “Processing Interval Joins On Map-Reduce,” EDBT
– MapReduce-based 2-way and multiway interval join algorithms for overlapping intervals
– Does not take the reducer capacity into account
– No analysis of a lower bound on the replication of individual intervals
– No analysis of the replication rate of the algorithms offered therein

Goal of Mapping Schema
Interval join problem
– Assign all the intervals that share at least one common point of time to at least one reducer in common, so that the outputs can be found

Our Contribution
An algorithm for variable-length intervals that can start at any time
– Before this, we consider two simple cases and provide an algorithm for each:
  Unit-length and equally-spaced intervals
  Variable-length and equally-spaced intervals
For all the algorithms, the upper bound on the replication rate almost matches the lower bound

Unit-Length and Equally-Spaced Intervals
[Figure: relations X and Y with n = 9 and k = 2.25, so spacing = 0.25]

Unit-Length and Equally-Spaced Intervals: Algorithm
– Divide the time-range from 0 to k into equal-sized partitions of length w (say P partitions are created)
– Arrange P reducers
– Assign all intervals of X that exist in a partition p_i to the i-th reducer
– Assign all intervals of Y that have their starting or ending point in partition p_i to the i-th reducer
[Figure: relations X and Y (n = 9) with the time range divided into partitions]
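The four steps above can be sketched in a few lines. This is a single-machine illustration, not the authors' implementation: the function names are hypothetical, partition i is taken to be the half-open range [i*w, (i+1)*w), "exists in a partition" is read as "intersects the partition", and the partitioned range is extended to the last ending point so that every interval falls into some partition.

```python
def unit_intervals(n, k):
    """n unit-length intervals whose starting points are k/n apart, from 0."""
    return [(i * k / n, i * k / n + 1.0) for i in range(n)]


def assign(x_intervals, y_intervals, w):
    """Return one {"X": [...], "Y": [...]} bucket of interval indices per reducer."""
    horizon = max(end for _, end in x_intervals + y_intervals)
    num_partitions = int(horizon // w) + 1          # assumption: cover all endpoints
    reducers = [{"X": [], "Y": []} for _ in range(num_partitions)]
    # An X interval goes to every partition it intersects.
    for idx, (start, end) in enumerate(x_intervals):
        for p in range(int(start // w), int(end // w) + 1):
            reducers[p]["X"].append(idx)
    # A Y interval goes only to the partitions holding its start or end point.
    for idx, (start, end) in enumerate(y_intervals):
        for p in {int(start // w), int(end // w)}:
            reducers[p]["Y"].append(idx)
    return reducers


xs = unit_intervals(9, 2.25)   # n = 9, k = 2.25, spacing 0.25 as on the slide
ys = unit_intervals(9, 2.25)
reducers = assign(xs, ys, w=0.5)
for i, r in enumerate(reducers):
    print(i, r)
```

Because every interval has unit length, an overlapping Y interval always has its start or its end inside the X interval, so the two meet in the partition containing that point; each Y interval is sent to at most two reducers.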

Unit-Length and Equally-Spaced Intervals
Does the algorithm work?
– Count 1: How many intervals of Y overlap with an interval of X in a partition of length w? The spacing is k/n (that is, n/k intervals begin per unit of time), so at most 2wn/k intervals of Y can overlap with an interval of X
– Count 2: How many intervals can have their starting points after the start of x_i and before the end of x_i?
  Intervals of X after the starting point of x_i = wn/k
  Intervals of X before the starting point of x_i = n/k
– Count 3: Do not forget to count x_i itself and the identical interval of Y, i.e. y_i
[Figure: relations X and Y (n = 9) with the time range divided into partitions]

Variable-Length and Equally-Spaced Intervals
Two types of intervals
– Big and small intervals
– Different length intervals

Variable-Length and Equally-Spaced Intervals
Big and small intervals
– All the intervals of X are of length l_min
– All the intervals of Y are of length l_max
– The previous algorithm will work here too
– Note that an interval of X will be replicated to several reducers, while an interval of Y will be replicated to at most two reducers
[Figure: relations X and Y with n = 6 and spacing = 0.7]

Variable-Length and Equally-Spaced Intervals
Variable-length intervals: a general case
– All restrictions on the length of an interval and on the spacing between two intervals are removed
– Intervals can begin at some time greater than or equal to 0 and end by time T
– S: the total length of the intervals in one relation
[Figure: relations X and Y on a timeline marked 0, s, s+1, s+2, s+3, T]

Variable-Length and Equally-Spaced Intervals
Average length / how much length a reducer can hold

Conclusion
Proofs of the lower and upper bounds on the replication rate are given in the paper

Foto Afrati¹, Shlomi Dolev², Shantanu Sharma², and Jeffrey D. Ullman³
¹ School of Electrical and Computing Engineering, National Technical University of Athens, Greece
² Department of Computer Science, Ben-Gurion University of the Negev, Israel
³ Department of Computer Science, Stanford University, USA
Presentation is available at