Based on Lin and Dyer’s text: Chapter 3. Figure 2.6.


• A programmer has no control over:
  ◦ Where a mapper or reducer runs (i.e., on which node in the cluster).
  ◦ When a mapper or reducer begins or finishes.
  ◦ Which input key-value pairs are processed by a specific mapper.
  ◦ Which intermediate key-value pairs are processed by a specific reducer.

• The programmer does have the ability to:
  ◦ Construct complex data types as keys and values for storage, processing, and communication.
  ◦ Specify and execute initialization code before map and/or reduce, and termination code after map and/or reduce.
  ◦ Preserve state across multiple keys in the mapper and/or the reducer.
  ◦ Control the sort order of intermediate keys.
  ◦ Control the partitioning of the key space, and thus the set of keys a particular reducer will process.
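The last two abilities, controlling the sort order of intermediate keys and the partitioning of the key space, can be illustrated with a small in-memory simulation. This is a minimal sketch, not Hadoop code: `partition`, `sort_key`, and `shuffle` are hypothetical names, and routing keys by their first character is an assumption chosen only for illustration.

```python
from collections import defaultdict


def partition(key, num_reducers):
    """Hypothetical partitioner: route a key by its first character."""
    return ord(key[0]) % num_reducers


def sort_key(key):
    """Hypothetical sort order: case-insensitive, ties broken by the raw key."""
    return (key.lower(), key)


def shuffle(pairs, num_reducers):
    """Group intermediate (key, value) pairs per reducer, keys in sorted order."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its own keys in the order defined by sort_key.
    return [
        [(k, bucket[k]) for k in sorted(bucket, key=sort_key)]
        for bucket in buckets
    ]


intermediate = [("banana", 1), ("Apple", 1), ("apple", 1), ("banana", 1)]
reducer_inputs = shuffle(intermediate, num_reducers=2)
```

Changing `partition` changes which reducer receives a key; changing `sort_key` changes the order in which that reducer encounters its keys.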

• Address these issues without creating a scalability bottleneck:
  ◦ The gold standard that MapReduce aims for is linear scalability.
  ◦ Storing and manipulating state can hinder scalability.
• How to improve performance?
  ◦ Make the map and reduce functions efficient.
  ◦ Make the transfer of intermediate data efficient.
  ◦ Aggregate intermediate data; this is an important operation for efficiency.
  ◦ Shrink the intermediate key space.
  ◦ What else can we do?
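One way to make the transfer of intermediate data cheaper is to aggregate it on the map side before it crosses the network. The following is a minimal sketch of that idea for word counting, run in memory rather than on a cluster. `map_words` and `combine` are hypothetical names, and the sample document is made up for illustration.

```python
from collections import Counter


def map_words(doc):
    """Naive mapper: emit one (term, 1) pair per token."""
    return [(term, 1) for term in doc.split()]


def combine(pairs):
    """Local aggregation: sum the values of pairs that share a key."""
    totals = Counter()
    for term, count in pairs:
        totals[term] += count
    return sorted(totals.items())


doc = "to be or not to be"
raw_pairs = map_words(doc)           # one pair per token
combined_pairs = combine(raw_pairs)  # one pair per distinct term
```

The number of pairs drops from one per token to one per distinct term, so the saving grows with how often terms repeat in a mapper's input.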

• che/hadoop/mapreduce/Mapper.html
• che/hadoop/mapred/package-summary.html
• map-reduce-api

class Mapper
  method Map(docid a, doc d)
    H ← new AssociativeArray
    for all term t ∈ doc d do
      H{t} ← H{t} + 1    // Tally counts for entire document
    for all term t ∈ H do
      Emit(term t, count H{t})
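The per-document tally translates almost line for line into Python. This is a sketch under stated assumptions rather than Hadoop code: documents are plain strings tokenized on whitespace, and `emit` is a hypothetical callback standing in for the framework's Emit.

```python
from collections import defaultdict


class Mapper:
    def __init__(self, emit):
        # emit is a hypothetical stand-in for the framework's Emit call.
        self.emit = emit

    def map(self, docid, doc):
        counts = defaultdict(int)      # H: a fresh associative array per document
        for term in doc.split():       # assumption: whitespace tokenization
            counts[term] += 1          # tally counts for the entire document
        for term, count in counts.items():
            self.emit(term, count)     # one pair per distinct term in this document


emitted = []
mapper = Mapper(lambda term, count: emitted.append((term, count)))
mapper.map("d1", "a b a")
mapper.map("d2", "a c")
```

Because the associative array is created inside `map`, a term that occurs in both documents is still emitted once per document.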

class Mapper
  method Initialize
    H ← new AssociativeArray
  method Map(docid a, doc d)
    for all term t ∈ doc d do
      H{t} ← H{t} + 1    // Tally counts across documents
  method Close
    for all term t ∈ H do
      Emit(term t, count H{t})
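The version that preserves state across documents can be sketched the same way. As before, this is an illustrative in-memory model, not Hadoop code: `CombiningMapper` and the `emit` callback are hypothetical names, and the driver lines at the bottom play the role of the framework calling Initialize, Map, and Close.

```python
from collections import defaultdict


class CombiningMapper:
    def __init__(self, emit):
        # emit is a hypothetical stand-in for the framework's Emit call.
        self.emit = emit
        self.counts = None

    def initialize(self):
        # Runs once, before any input is processed.
        self.counts = defaultdict(int)

    def map(self, docid, doc):
        for term in doc.split():       # assumption: whitespace tokenization
            self.counts[term] += 1     # tally counts across documents

    def close(self):
        # Runs once, after all input has been processed.
        for term, count in self.counts.items():
            self.emit(term, count)


emitted = []
mapper = CombiningMapper(lambda term, count: emitted.append((term, count)))
mapper.initialize()
mapper.map("d1", "a b a")
mapper.map("d2", "a c")
mapper.close()
```

Nothing is emitted until `close`, so each distinct term produces a single pair per mapper. The cost is that the associative array must hold every distinct term the mapper sees, which is the kind of stored state that can hinder scalability.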