“Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters”
Published in SIGMOD '07 by Yahoo!
Senthil Nathan N, IIT Bombay


Overview
MapReduce
Hadoop & MapReduce
HadoopDB & MapReduce
Map-Reduce-Merge
Implementations of Relational Operators

MapReduce
[Figure: word count over a 100-line document. The input is divided into four splits (split 1 to split 4) that are processed by mappers (Mapper 1 and Mapper 2 shown). A partition step routes words to Reducer 1, Reducer 2 and Reducer 3 by first letter (A to H, I to M, N to Z).]
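The partition step in the figure can be sketched as a range partitioner on each word's first letter. This is a minimal single-machine sketch; the helper names `partition` and `route`, and the choice to send words that do not start with A to M (digits, punctuation) to the last reducer, are assumptions for illustration rather than Hadoop's API.

```python
from collections import defaultdict


def partition(word: str) -> int:
    """Hypothetical range partitioner: reducer 0 gets A-H, 1 gets I-M, 2 gets N-Z.

    Assumes a non-empty word; anything not starting with A-M (e.g. digits)
    falls through to reducer 2.
    """
    first = word[0].upper()
    if "A" <= first <= "H":
        return 0
    if "I" <= first <= "M":
        return 1
    return 2


def route(pairs):
    """Group (word, count) pairs from all mappers into per-reducer partitions."""
    partitions = defaultdict(list)
    for word, count in pairs:
        partitions[partition(word)].append((word, count))
    return dict(partitions)


if __name__ == "__main__":
    map_output = [("to", 1), ("be", 1), ("or", 1), ("as", 1), ("is", 1)]
    for reducer, pairs in sorted(route(map_output).items()):
        print(reducer, pairs)
```

Because every occurrence of a word has the same first letter, all of its pairs land in the same partition, which is what lets each reducer produce final counts independently.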

MapReduce – Example (map)
map(“/path/wikipedia.org”, “to be or as”)
The output of the map function is (potentially many) key/value pairs. In our case, it outputs (word, “1”) once per word in the document: (“to”, “1”), (“be”, “1”), (“or”, “1”), (“as”, “1”).
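A minimal sketch of this map function, assuming words are separated by whitespace. The slide emits the string “1”; the sketch emits the integer 1 so a reducer can sum the values directly. The name `word_count_map` is hypothetical.

```python
from typing import Iterator, Tuple


def word_count_map(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) once per whitespace-separated word in the document.

    doc_name is the input key (the document path); word counting ignores it.
    """
    for word in contents.split():
        yield word, 1


print(list(word_count_map("/path/wikipedia.org", "to be or as")))
# [('to', 1), ('be', 1), ('or', 1), ('as', 1)]
```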

MapReduce – Example (Reduce)
Each reducer reads the key/value pairs from one partition, then sorts and groups them by key. In our case, the reduce function computes the sum of the values for each word.
Partition 1 receives (“be”, 1), (“as”, 1), (“be”, 1), (“as”, 1); partition 2 receives (“or”, 1), (“to”, 1), (“or”, 1), (“to”, 1).
Reducer 1: (“be”, [1, 1]), (“as”, [1, 1]) → [be 2, as 2]
Reducer 2: (“or”, [1, 1]), (“to”, [1, 1]) → [or 2, to 2]
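The sort, group and sum steps of one reducer can be sketched with the standard library. This is an illustrative single-partition sketch (the name `reduce_partition` is hypothetical), not Hadoop's reducer interface.

```python
from itertools import groupby
from operator import itemgetter


def reduce_partition(pairs):
    """Sort one partition by key, group equal keys, and sum each group's counts."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(ordered, key=itemgetter(0))
    ]


partition_1 = [("be", 1), ("as", 1), ("be", 1), ("as", 1)]
print(reduce_partition(partition_1))  # [('as', 2), ('be', 2)]
```

Since the reducer sorts by key, “as” comes out before “be”; the counts match the slide. `groupby` only merges adjacent equal keys, which is why the sort must come first.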

MapReduce Schematic

Overview: Hadoop & MapReduce

Hadoop & MapReduce
[Figure: Hadoop architecture. A MapReduce job runs on Hadoop core, which consists of HDFS and the MapReduce framework. The master node runs the NameNode and the JobTracker; each of the four worker nodes shown runs a TaskTracker and a DataNode. An Input Format Implementation component is also shown.]

Hadoop & MapReduce with Hive
[Figure: the same architecture with Hive on top. A SQL query is submitted to Hive, which turns it into a MapReduce job running on Hadoop core (HDFS and the MapReduce framework). The master node runs the NameNode and the JobTracker; each worker node runs a TaskTracker and a DataNode. An Input Format Implementation component is also shown.]

Relational Data Processing using MapReduce
map(key emp_id, value bonus)
reduce(key emp_id, value bonus_list)
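The two signatures above can be exercised end to end on one machine. This sketch assumes the reduce step computes each employee's total bonus (the slide does not say which aggregate is used), and the `run_job` driver that simulates the shuffle is hypothetical.

```python
from collections import defaultdict


def bonus_map(emp_id, bonus):
    # Key each record by employee so all of one employee's bonuses reach the same reducer.
    yield emp_id, bonus


def bonus_reduce(emp_id, bonus_list):
    # Assumed aggregate: the employee's total bonus.
    yield emp_id, sum(bonus_list)


def run_job(records):
    """Simulate map, shuffle (group by key) and reduce on one machine."""
    shuffled = defaultdict(list)
    for emp_id, bonus in records:
        for key, value in bonus_map(emp_id, bonus):
            shuffled[key].append(value)
    result = {}
    for key in sorted(shuffled):
        for emp_id, total in bonus_reduce(key, shuffled[key]):
            result[emp_id] = total
    return result


# Hypothetical input: (emp_id, bonus) records.
print(run_job([(1, 100), (2, 50), (1, 25)]))  # {1: 125, 2: 50}
```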

Overview: HadoopDB & MapReduce

HadoopDB & MapReduce

Overview: Map-Reduce-Merge

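Why extend MapReduce?
It is well suited to unstructured data but not to structured (relational, RDBMS) data.
Joins are inefficient in MapReduce, because a job is designed around a single homogeneous dataset; combining two datasets requires workarounds such as tagging each record with its source.
It has a strict two-phase workflow: Map is the first phase, Reduce the second.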

Extension to MapReduce
Add a merge phase to the MapReduce algorithm, allowing efficient processing of multiple datasets (e.g., relational tables). The merge phase is made up of several components: the merge function, the processor function, the partition selector, and the configurable iterator.

Components of Merge
Merger: a user-defined function on the same principle as map and reduce, applied to data from two sources.
Processor: processes the data from one source.
Partition selector: selects the data that should go to each merger.
Configurable iterator: determines how to iterate through each list as the merging is done.
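The four components can be wired together as a small single-machine sketch. All names here are hypothetical, and two choices are assumptions rather than the paper's fixed behavior: a one-to-one partition selector (merger i reads partition i from each source) and an iterator that matches equal keys in sort-merge style. The slides present these as pluggable pieces, so other selectors and iterators are possible.

```python
from typing import Callable, List, Tuple

# A reducer output partition: (key, values) pairs, sorted by key.
Partition = List[Tuple[str, list]]


def one_to_one_selector(merger_id: int, left: List[Partition], right: List[Partition]):
    """Hypothetical partition selector: merger i reads partition i from each source."""
    return left[merger_id], right[merger_id]


def sort_merge_iterator(left: Partition, right: Partition,
                        left_proc: Callable, right_proc: Callable,
                        merger: Callable) -> list:
    """Hypothetical configurable iterator that pairs up equal keys.

    Advances whichever side has the smaller key; on equal keys, runs each
    source's processor over its values and hands both results to the merger.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, lvals = left[i]
        rkey, rvals = right[j]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            out.extend(merger(lkey, left_proc(lvals), right_proc(rvals)))
            i += 1
            j += 1
    return out


def merge_phase(n_mergers, left_parts, right_parts, left_proc, right_proc, merger):
    """Run every merger on the partitions its selector picks."""
    results = []
    for m in range(n_mergers):
        left, right = one_to_one_selector(m, left_parts, right_parts)
        results.extend(sort_merge_iterator(left, right, left_proc, right_proc, merger))
    return results


# Usage: one merger; identity processors; the merger emits a cross product per key.
left_parts = [[("a", [1, 2]), ("c", [3])]]
right_parts = [[("a", [10]), ("b", [20])]]
pairs = merge_phase(1, left_parts, right_parts, list, list,
                    lambda k, lv, rv: [(k, x, y) for x in lv for y in rv])
print(pairs)  # [('a', 1, 10), ('a', 2, 10)]
```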

Implementation Overview

Map-Reduce-Merge Example
Join the employee and department tables and compute bonuses.
[Tables: Employee, Department]

Example : Map(employee dataset)

Example : Map(department dataset)

Example : Reduce(employee dataset)

Example: Reduce(department dataset)

Example: Merge(employee, department)
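The Employee/Department tables above survive only as slide titles, so the following is a hedged reconstruction of the four phases, not the slides' data. The column layouts, the multiplicative bonus adjustment per department, and every record are hypothetical. Both datasets are mapped on dept_id, each reducer groups and sorts its output by dept_id, and the merger walks the two sorted outputs together to join them.

```python
from collections import defaultdict

# Hypothetical input tables.
employees = [("e1", "A", 100.0), ("e2", "B", 200.0), ("e3", "A", 50.0)]  # (emp_id, dept_id, bonus)
departments = [("A", 1.1), ("B", 0.9)]                                   # (dept_id, bonus_adjustment)


def map_employee(emp_id, dept_id, bonus):
    # Key by the join attribute so both datasets can be matched on dept_id.
    yield dept_id, (emp_id, bonus)


def map_department(dept_id, adjustment):
    yield dept_id, adjustment


def reduce_sorted(mapped):
    """Shuffle + reduce: group values by key and return (key, values) sorted by key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return sorted(groups.items())


def merge(emp_groups, dept_groups):
    """Join the two sorted reducer outputs on dept_id and compute adjusted bonuses."""
    out, i, j = [], 0, 0
    while i < len(emp_groups) and j < len(dept_groups):
        (ekey, emps), (dkey, adjs) = emp_groups[i], dept_groups[j]
        if ekey < dkey:
            i += 1
        elif ekey > dkey:
            j += 1
        else:
            adjustment = adjs[0]  # assumes exactly one record per department
            out.extend((emp_id, round(bonus * adjustment, 2)) for emp_id, bonus in emps)
            i += 1
            j += 1
    return out


emp_out = reduce_sorted(kv for row in employees for kv in map_employee(*row))
dept_out = reduce_sorted(kv for row in departments for kv in map_department(*row))
print(merge(emp_out, dept_out))  # [('e1', 110.0), ('e3', 55.0), ('e2', 180.0)]
```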

Implementing Relational Algebra Operations
Projection, Aggregation, Selection, Set operations (Union, Intersection, Difference), Cartesian Product, Rename, Join

Projection
All we have to do is emit a subset of the data passed in. A mapper alone can do this:
map(key emp_id, value emp_info) {
    emit(emp_id);
}
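A runnable variant of the projection mapper, assuming each record is a dict of attributes. The `fields` parameter is a hypothetical generalization: the slide's version, which keeps only emp_id, corresponds to an empty field list.

```python
def project_map(emp_id, emp_info, fields=("name",)):
    """Projection mapper: emit the key plus only the requested attributes."""
    yield emp_id, {field: emp_info[field] for field in fields}


# Hypothetical record.
record = {"name": "Asha", "dept_id": "A", "bonus": 100}
print(list(project_map(7, record)))             # [(7, {'name': 'Asha'})]
print(list(project_map(7, record, fields=())))  # [(7, {})], i.e. only the key survives
```

No reducer is needed: each record is projected independently, so the job can run map-only.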

Aggregation
By choosing appropriate keys, we can implement “group by” and aggregate SQL operators in MapReduce: the reducer sorts and groups the intermediate key/value pairs by key, so each reduce call receives one group.
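As a sketch of this idea, the query SELECT dept_id, COUNT(*), MAX(bonus) FROM employees GROUP BY dept_id maps onto MapReduce by keying on dept_id. The table layout and data are hypothetical, and the in-memory `group_by` driver stands in for the framework's shuffle.

```python
from collections import defaultdict


def agg_map(emp_id, emp_info):
    # Choosing dept_id as the key makes the shuffle perform GROUP BY dept_id.
    yield emp_info["dept_id"], emp_info["bonus"]


def agg_reduce(dept_id, bonuses):
    # One reduce call per group: COUNT(*) and MAX(bonus) for this department.
    yield dept_id, len(bonuses), max(bonuses)


def group_by(table):
    groups = defaultdict(list)
    for emp_id, info in table.items():
        for key, value in agg_map(emp_id, info):
            groups[key].append(value)
    # Reducers see keys in sorted order.
    return [row for key in sorted(groups) for row in agg_reduce(key, groups[key])]


employees = {
    1: {"dept_id": "B", "bonus": 40},
    2: {"dept_id": "A", "bonus": 70},
    3: {"dept_id": "B", "bonus": 90},
}
print(group_by(employees))  # [('A', 1, 70), ('B', 2, 90)]
```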

Aggregation

Selection
If the selection condition involves only the attributes of one data source, it can be implemented in mappers:
map(key emp_id, value emp_info) {
    if (emp_info.dept_id == "A") emit(emp_id, emp_info);
}
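A runnable version of this mapper, assuming records are dicts with a dept_id attribute. The `dept` parameter is a hypothetical way to make the constant "A" configurable.

```python
def select_map(emp_id, emp_info, dept="A"):
    """Selection mapper: emit the record only if it satisfies the predicate."""
    if emp_info["dept_id"] == dept:
        yield emp_id, emp_info


employees = {1: {"dept_id": "A"}, 2: {"dept_id": "B"}, 3: {"dept_id": "A"}}
selected = [pair for emp_id, info in employees.items() for pair in select_map(emp_id, info)]
print([emp_id for emp_id, _ in selected])  # [1, 3]
```

The predicate needs only the current record, so each mapper can filter its split without seeing any other data.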

Selection
If the condition is on aggregates or on a group of values contained in one data source, it can be implemented in reducers. If it involves attributes or aggregates from both data sources, implement it in mergers.
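The two remaining cases can be sketched as follows. The threshold, the per-department “bonus cap” attribute, and all data are hypothetical. The reducer version is a HAVING-style filter that needs the whole group; the merger version compares an employee attribute with a department attribute, so it needs both sources at once.

```python
from collections import defaultdict


def having_reduce(dept_id, bonuses, threshold=100):
    # Selection on an aggregate (SQL HAVING): needs the whole group, so it runs in the reducer.
    total = sum(bonuses)
    if total > threshold:
        yield dept_id, total


def select_merge(dept_id, emp_bonuses, dept_caps):
    # Selection across both sources: compare employee data with department data.
    cap = dept_caps[0]  # assumes one department record per key
    for emp_id, bonus in emp_bonuses:
        if bonus > cap:
            yield emp_id, bonus


groups = defaultdict(list)
for dept_id, bonus in [("A", 70), ("B", 40), ("A", 60)]:
    groups[dept_id].append(bonus)
print([row for d in sorted(groups) for row in having_reduce(d, groups[d])])  # [('A', 130)]
print(list(select_merge("A", [("e1", 70), ("e2", 30)], [50])))  # [('e1', 70)]
```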

Set Union
Let each of the two MapReduces emit a sorted list of unique elements. The merger then iterates over both lists simultaneously: if one value is lesser, store it and increment its iterator; if the two are equal, store one of them and increment both iterators. When one list is exhausted, store the remaining values of the other.
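The merger's loop for union is the classic two-pointer merge. A minimal sketch, assuming both inputs are already sorted and free of duplicates, as the MapReduce outputs described above would be:

```python
def merge_union(left, right):
    """Union of two sorted lists of unique elements via simultaneous iteration."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:  # equal: store one copy, advance both
            out.append(left[i])
            i += 1
            j += 1
    # One list is exhausted; the rest of the other is already sorted and unique.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


print(merge_union([1, 3, 5, 7], [2, 3, 6]))  # [1, 2, 3, 5, 6, 7]
```

Each element is visited once, so the merge runs in linear time in the combined list length.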

Set Intersection
Let each of the two MapReduces emit a sorted list of unique elements. The merger then iterates over both lists simultaneously: if one value is lesser, increment its iterator without storing the value; if the two are equal, store one of them and increment both iterators.
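The same two-pointer walk yields the intersection. This sketch makes the same assumption of sorted, duplicate-free inputs; unlike union, leftovers are discarded because they cannot have a partner in the exhausted list.

```python
def merge_intersection(left, right):
    """Intersection of two sorted lists of unique elements via simultaneous iteration."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1  # the lesser value cannot appear in the other list; skip it
        elif right[j] < left[i]:
            j += 1
        else:  # equal: store one copy, advance both
            out.append(left[i])
            i += 1
            j += 1
    return out  # leftovers in either list have no partner


print(merge_intersection([1, 3, 5, 7], [2, 3, 6, 7]))  # [3, 7]
```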

Sort-Merge Join
Map: partition records into key ranges according to the values of the attributes being joined (sorted) on, aiming for an even distribution of records across reducers.
Reduce: sort the data within each key range.
Merge: join the sorted data for each key range.
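The three steps can be sketched end to end on one machine. The range boundaries are hand-picked here for illustration (in practice they would be chosen so ranges hold similar amounts of data, e.g. from a sample of keys), and the function names and data are hypothetical. Because both datasets use the same boundaries, bucket b of one side only has to be joined with bucket b of the other.

```python
from bisect import bisect_right


def range_partition(records, boundaries):
    """Map side: send each (key, value) to the bucket whose key range contains it.

    With sorted boundaries [b0, b1, ...], bucket 0 holds keys < b0,
    bucket 1 holds b0 <= key < b1, and the last bucket holds the rest.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key, value in records:
        buckets[bisect_right(boundaries, key)].append((key, value))
    return buckets


def reduce_sort(bucket):
    # Reduce side: sort each key range by key (stable, so equal keys keep input order).
    return sorted(bucket, key=lambda kv: kv[0])


def merge_join(left, right):
    """Merge side: join two key-sorted lists on equal keys (handles duplicate keys)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rkey:
                j_end += 1
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    out.append((lkey, lval, rval))
            i, j = i_end, j_end
    return out


def sort_merge_join(left_records, right_records, boundaries):
    left_buckets = range_partition(left_records, boundaries)
    right_buckets = range_partition(right_records, boundaries)
    result = []
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join(reduce_sort(lb), reduce_sort(rb)))
    return result


emps = [(12, "e1"), (3, "e2"), (25, "e3"), (12, "e4")]  # (dept_id, emp)
depts = [(3, "Sales"), (12, "R&D"), (30, "Ops")]         # (dept_id, name)
print(sort_merge_join(emps, depts, boundaries=[10, 20]))
# [(3, 'e2', 'Sales'), (12, 'e1', 'R&D'), (12, 'e4', 'R&D')]
```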

Conclusion
Map-Reduce-Merge adds to MapReduce the ability to execute arbitrary relational algebra queries.
Next steps: develop an SQL-like interface and a query optimizer.

References
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. SIGMOD '07.
MapReduce: Simplified Data Processing on Large Clusters. OSDI '04.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. VLDB '09.
Hive: A Warehousing Solution over a MapReduce Framework. VLDB '09.
