“Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters”, published in SIGMOD '07 by Yahoo!. Presentation by Senthil Nathan N, IIT Bombay.


1 “Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters”
Published in SIGMOD '07 by Yahoo!. Senthil Nathan N, IIT Bombay

2 Overview
MapReduce; Hadoop & MapReduce; HadoopDB & MapReduce; MapReduceMerge; Implementations of Relational Operators

3 Overview (current section: MapReduce)
MapReduce; Hadoop & MapReduce; HadoopDB & MapReduce; MapReduceMerge; Implementations of Relational Operators

4 MapReduce
[Figure: word-count example. The input document is divided into splits 1 to 4 (labelled “100 lines of the given document”), which are processed by Mapper 1 and Mapper 2. A partition step routes words starting with A to H, I to M, and N to Z to Reducers 1, 2, and 3.]

5 MapReduce – Example (Map)
map(“/path/wikipedia.org”, “to be or as”)
The output of the map function is (potentially many) key/value pairs. In our case, it emits (word, “1”) once per word in the document:
(“to”, “1”) (“be”, “1”) (“or”, “1”) (“to”, “1”) (“be”, “1”) (“or”, “1”)

6 MapReduce – Example (Reduce)
Each reducer reads the key/value pairs from one partition, then sorts and groups them by key. In our case, it computes the sum of the values for each key.
Partition 1: “be, 1” “as, 1” “be, 1” “be, 1” “as, 1”
Partition 2: “or, 1” “to, 1” “or, 1” “to, 1” “or, 1”
Reducer 1: “be, [1,1]” “as, [1,1]” → “[be 2, as 2]”
Reducer 2: “or, [1,1]” “to, [1,1]” → “[or 2, to 2]”
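The map, partition, and reduce steps of the word-count example can be sketched in a few lines of Python. This is a single-process illustration, not Hadoop code: the function names, the in-memory partition lists, and the input text (chosen here so that every word occurs twice) are assumptions made for the sketch.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """Emit (word, 1) once per word in the document."""
    return [(word, 1) for word in text.split()]

def partition(word):
    """Pick a reducer from the first letter: A-H -> 0, I-M -> 1, N-Z -> 2."""
    first = word[0].upper()
    if first <= "H":
        return 0
    if first <= "M":
        return 1
    return 2

def reduce_fn(word, counts):
    """Sum the list of ones collected for a word."""
    return word, sum(counts)

def word_count(documents, num_reducers=3):
    # One dict per reducer; grouping by key stands in for the sort/group step.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for name, text in documents.items():
        for word, one in map_fn(name, text):
            partitions[partition(word)][word].append(one)
    result = {}
    for part in partitions:
        for word in sorted(part):
            key, total = reduce_fn(word, part[word])
            result[key] = total
    return result

# Hypothetical input, not taken from the slides.
counts = word_count({"/path/wikipedia.org": "to be or as to be or as"})
print(counts)
```

With this input, “as” and “be” land on the first reducer and “or” and “to” on the third, mirroring the letter-range partitioning in the word-count figure.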

7

8 MapReduce Schematic

9 Overview (current section: Hadoop & MapReduce)
MapReduce; Hadoop & MapReduce; HadoopDB & MapReduce; MapReduceMerge; Implementations of Relational Operators

10 Hadoop & MapReduce
[Figure: Hadoop architecture. A MapReduce job runs on Hadoop core, which consists of HDFS and the MapReduce framework. The master node hosts the Name Node (HDFS) and the Job Tracker (MapReduce framework); each of four worker nodes hosts a Data Node and a Task Tracker. An Input Format Implementation box is also shown.]

11 Hadoop & MapReduce
[Figure: the same Hadoop architecture with Hive added. A SQL query is passed to Hive, which turns it into a MapReduce job for Hadoop core (HDFS and the MapReduce framework). The master node hosts the Name Node and the Job Tracker; each of four worker nodes hosts a Data Node and a Task Tracker. An Input Format Implementation box is also shown.]

12 Relational Data Processing using MapReduce
map(key emp_id, value bonus)
reduce(key emp_id, value bonus_list)
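The two signatures above can be exercised with a small in-memory sketch. The slide does not say what reduce computes; summing each employee's bonuses is an assumption here, as are the driver function `run` and the sample records.

```python
from collections import defaultdict

def map_fn(emp_id, bonus):
    """Pass each bonus record through, keyed by employee id."""
    yield emp_id, bonus

def reduce_fn(emp_id, bonus_list):
    """Assumed aggregate: total bonus per employee."""
    return emp_id, sum(bonus_list)

def run(records):
    # Group map output by key, then call reduce once per key.
    groups = defaultdict(list)
    for emp_id, bonus in records:
        for key, value in map_fn(emp_id, bonus):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))

# Hypothetical bonus records: (emp_id, bonus)
bonus_records = [("e1", 100), ("e2", 50), ("e1", 25)]
totals = run(bonus_records)
print(totals)
```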

13 Overview (current section: HadoopDB & MapReduce)
MapReduce; Hadoop & MapReduce; HadoopDB & MapReduce; MapReduceMerge; Implementations of Relational Operators

14 HadoopDB & MapReduce

15 Overview (current section: MapReduceMerge)
MapReduce; Hadoop & MapReduce; HadoopDB & MapReduce; MapReduceMerge; Implementations of Relational Operators

16 Why extend MapReduce?
MapReduce is well suited to unstructured data but not to structured data (RDBMS).
Joins are inefficient in MapReduce.
Strict two-phase workflow: Map is the first phase, Reduce the second.

17 Extension to MapReduce
Add a merge phase to the MapReduce algorithm.
This allows efficient processing of multiple datasets (RDBMS).
Merge is made up of several components: Merge Function, Processor Function, Partition Selector, Configurable Iterator.

18 Components of Merge
Merger → same principle as map and reduce
Processor → processes data from one source
Partition Selector → selects the data that should go to the merger
Configurable Iterator → determines how to iterate through each list as the merging is done
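To make the four components concrete, here is one way they might fit together in a single-process sketch. Every function name, signature, and policy below (same-index partition selection, sorting as the processor, a match-on-equal-key iterator) is an assumption made for illustration and is not the interface defined in the paper.

```python
def partition_selector(left_index, right_indexes):
    """Choose which right-side reducer outputs this merger reads.
    Assumed policy: only the partition with the same index."""
    return [i for i in right_indexes if i == left_index]

def processor(records):
    """Pre-process data from one source; here it only sorts by key."""
    return sorted(records, key=lambda kv: kv[0])

def merger(left_record, right_record):
    """Combine one record from each source into an output record."""
    (key, left_value), (_, right_value) = left_record, right_record
    return key, (left_value, right_value)

def match_iterator(left, right):
    """Iterator policy: walk both sorted lists, pairing equal keys.
    Assumes each key appears at most once per list, as in reducer output."""
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            yield left[i], right[j]
            i += 1
            j += 1

def merge_phase(left_parts, right_parts):
    output = []
    for li, left in enumerate(left_parts):
        for ri in partition_selector(li, range(len(right_parts))):
            pairs = match_iterator(processor(left), processor(right_parts[ri]))
            output.extend(merger(l, r) for l, r in pairs)
    return output

# Hypothetical reducer outputs from two MapReduce lineages, two partitions each.
left_parts = [[("a", 1), ("b", 2)], [("x", 9)]]
right_parts = [[("b", 20), ("c", 30)], [("x", 90), ("y", 80)]]
merged = merge_phase(left_parts, right_parts)
print(merged)
```

Swapping `match_iterator` for a different traversal (for example a nested loop) changes the kind of merge performed without touching the other three pieces, which is the point of making the iterator configurable.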

19 Implementation Overview

20 MapReduceMerge Example
Join the employee and department tables and compute bonuses.
[Tables: Employee, Department]

21 Example: Map(employee dataset)
[Figure; surviving label: B $150]

22 Example: Map(department dataset)
[Figure; surviving label: B $150]

23 Example: Reduce(employee dataset)
[Figure; surviving label: B $150]

24 Example: Reduce(department dataset)

25 Example: Merge(employee, department)
[Figure; surviving label: B $150]
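Since the tables for this example did not survive in the transcript, the sketch below reruns the same Map, Reduce, Merge sequence on made-up data. The record layouts (employee: emp_id, dept_id, bonus; department: dept_id, bonus adjustment), the rule "final bonus = summed bonus × department adjustment", and all names are assumptions. For brevity the merge step uses a dictionary lookup on dept_id rather than iterating over two sorted lists.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Tiny in-memory MapReduce: group map output by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]

def map_employee(record):
    emp_id, dept_id, bonus = record
    yield (dept_id, emp_id), bonus

def reduce_employee(key, bonuses):
    return key, sum(bonuses)           # total bonus per employee

def map_department(record):
    dept_id, adjustment = record
    yield dept_id, adjustment

def reduce_department(dept_id, adjustments):
    return dept_id, adjustments[0]     # one adjustment per department

def merge(employee_out, department_out):
    """Join the two reduced outputs on dept_id and apply the adjustment."""
    adjustment_by_dept = dict(department_out)
    return {
        emp_id: total * adjustment_by_dept[dept_id]
        for (dept_id, emp_id), total in employee_out
        if dept_id in adjustment_by_dept
    }

# Hypothetical tables, not the ones shown on the slides.
employees = [("e1", "A", 100), ("e1", "A", 50), ("e2", "B", 150)]
departments = [("A", 2), ("B", 1)]

employee_out = map_reduce(employees, map_employee, reduce_employee)
department_out = map_reduce(departments, map_department, reduce_department)
final_bonus = merge(employee_out, department_out)
print(final_bonus)
```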

26 Implementing Relational Algebra Operations
Projection; Aggregation; Selection; Set operations (Union, Intersection, Difference); Cartesian Product; Rename; Join

27 Projection
All we have to do is emit a subset of the data passed in. A mapper alone can do this:
map(key emp_id, value emp_info) { emit(emp_id); }
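The projection mapper translates almost directly into Python. The sample records and the list comprehension that stands in for the framework are assumptions for the sketch.

```python
def project_map(emp_id, emp_info):
    """Emit only the projected attribute and drop the rest of the record."""
    yield emp_id

# Hypothetical employee records keyed by emp_id.
employees = {
    "e1": {"dept_id": "A", "bonus": 100},
    "e2": {"dept_id": "B", "bonus": 150},
}
projected = [out for emp_id, info in employees.items()
             for out in project_map(emp_id, info)]
print(projected)
```

No reducer is needed unless duplicate elimination is required after the projection.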

28 Aggregation
By choosing appropriate keys, the “group by” and aggregate SQL operators can be implemented in MapReduce. Reducers sort and group the intermediate key/value pairs by key.
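As an illustration of choosing the key to get “group by”, the sketch below uses dept_id as the map output key and computes a count and a sum per group, roughly what `SELECT dept_id, COUNT(*), SUM(bonus) ... GROUP BY dept_id` would return. The record layout, function names, and data are assumptions.

```python
from collections import defaultdict

def map_fn(emp_id, emp_info):
    """The map output key is the GROUP BY column."""
    yield emp_info["dept_id"], emp_info["bonus"]

def reduce_fn(dept_id, bonuses):
    """Aggregate all values that share a key."""
    return dept_id, {"count": len(bonuses), "total": sum(bonuses)}

def group_by_aggregate(employees):
    groups = defaultdict(list)
    for emp_id, info in employees.items():
        for key, value in map_fn(emp_id, info):
            groups[key].append(value)
    return dict(reduce_fn(key, groups[key]) for key in sorted(groups))

# Hypothetical employee records.
employees = {
    "e1": {"dept_id": "A", "bonus": 100},
    "e2": {"dept_id": "B", "bonus": 150},
    "e3": {"dept_id": "A", "bonus": 75},
}
per_department = group_by_aggregate(employees)
print(per_department)
```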

29 Aggregation

30 Selection
If the selection condition involves only the attributes of one data source, it can be implemented in mappers:
map(key emp_id, value emp_info) { if (emp_info.dept_id == "A") emit(emp_id, emp_info); }
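A runnable version of the selection mapper might look like the following. The `wanted_dept` parameter and the sample records are assumptions added so the filter can be tried out.

```python
def select_map(emp_id, emp_info, wanted_dept="A"):
    """Emit the record only when it satisfies the selection condition."""
    if emp_info["dept_id"] == wanted_dept:
        yield emp_id, emp_info

# Hypothetical employee records.
employees = {
    "e1": {"dept_id": "A", "bonus": 100},
    "e2": {"dept_id": "B", "bonus": 150},
    "e3": {"dept_id": "A", "bonus": 75},
}
selected = [pair for emp_id, info in employees.items()
            for pair in select_map(emp_id, info)]
selected_ids = [emp_id for emp_id, _ in selected]
print(selected_ids)
```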

31 Selection
If the condition is on aggregates or on a group of values contained in one data source, it can be implemented in reducers. If it involves attributes or aggregates from both data sources, it is implemented in mergers.
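A condition on an aggregate (similar to SQL's HAVING) has to wait until the reducer has seen the whole group. The sketch below keeps only departments whose total bonus reaches a threshold; the threshold, names, and data are assumptions.

```python
from collections import defaultdict

def map_fn(emp_id, emp_info):
    yield emp_info["dept_id"], emp_info["bonus"]

def reduce_fn(dept_id, bonuses, min_total):
    """Compute the aggregate, then apply the selection condition to it."""
    total = sum(bonuses)
    if total >= min_total:
        yield dept_id, total

def departments_with_total_at_least(employees, min_total):
    groups = defaultdict(list)
    for emp_id, info in employees.items():
        for key, value in map_fn(emp_id, info):
            groups[key].append(value)
    out = []
    for dept_id in sorted(groups):
        out.extend(reduce_fn(dept_id, groups[dept_id], min_total))
    return out

# Hypothetical employee records.
employees = {
    "e1": {"dept_id": "A", "bonus": 100},
    "e2": {"dept_id": "B", "bonus": 150},
    "e3": {"dept_id": "A", "bonus": 75},
}
big_departments = departments_with_total_at_least(employees, 160)
print(big_departments)
```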

32 Set Union
Let each of the two MapReduces emit a sorted list of unique elements. Mergers then iterate simultaneously over the two lists: if one value is lesser, store it and increment its iterator; if the two are equal, store one of the two and increment both iterators. When one list is exhausted, store the remaining elements of the other.
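The union merge described above is the classic two-pointer walk over sorted lists. A minimal sketch, assuming both inputs are already sorted and free of duplicates (the function name and sample lists are illustrative):

```python
def merge_union(left, right):
    """Union of two sorted lists of unique elements."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])      # lesser value is on the left
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])     # lesser value is on the right
            j += 1
        else:
            out.append(left[i])      # equal: keep one, advance both
            i += 1
            j += 1
    out.extend(left[i:])             # at most one of these is non-empty
    out.extend(right[j:])
    return out

union = merge_union([1, 3, 5], [2, 3, 6, 7])
print(union)
```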

33 Set Intersection
Let each of the two MapReduces emit a sorted list of unique elements. Mergers then iterate simultaneously over the two lists: if one value is lesser, increment its iterator without storing anything; if the two are equal, store one of the two and increment both iterators.
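Intersection uses the same walk but stores a value only when both lists agree. A minimal sketch under the same assumptions (sorted, duplicate-free inputs; illustrative names and data):

```python
def merge_intersection(left, right):
    """Intersection of two sorted lists of unique elements."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1                   # skip the lesser value
        elif right[j] < left[i]:
            j += 1
        else:
            out.append(left[i])      # present in both lists
            i += 1
            j += 1
    return out

common = merge_intersection([1, 3, 5, 7], [3, 4, 7, 9])
print(common)
```

Once either list runs out, nothing further can be in both, so no tail handling is needed.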

34 Sort-Merge Join
Map: partition records into key ranges according to the values of the attributes on which you are sorting, aiming for an even distribution of values across reducers.
Reduce: sort the data within each key range.
Merge: join the sorted data for each key range.

35 Conclusion
Map-Reduce-Merge adds the ability to execute arbitrary relational algebra queries.
Next steps: develop a SQL-like interface and a query optimizer.

36 References
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters, SIGMOD '07
MapReduce: Simplified Data Processing on Large Clusters, OSDI '04
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, VLDB '09
Hive: A Warehousing Solution over a MapReduce Framework, VLDB '09

