MapReduce.

Slides:



Advertisements
Similar presentations
Dan Bassett, Jonathan Canfield December 13, 2011.
Advertisements

 Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware  Created by Doug Cutting and.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
Jimmy Lin The iSchool University of Maryland Wednesday, April 15, 2009
Design Patterns for Efficient Graph Algorithms in MapReduce Jimmy Lin and Michael Schatz University of Maryland Tuesday, June 29, 2010 This work is licensed.
Parallel K-Means Clustering Based on MapReduce The Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences Weizhong Zhao, Huifang.
Lecture 2 – MapReduce CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed under the Creative.
The hybird approach to programming clusters of multi-core architetures.
Copyright © 2012 Cleversafe, Inc. All rights reserved. 1 Combining the Power of Hadoop with Object-Based Dispersed Storage.
Hadoop Team: Role of Hadoop in the IDEAL Project ●Jose Cadena ●Chengyuan Wen ●Mengsu Chen CS5604 Spring 2015 Instructor: Dr. Edward Fox.
U.S. Department of the Interior U.S. Geological Survey David V. Hill, Information Dynamics, Contractor to USGS/EROS 12/08/2011 Satellite Image Processing.
By: Jeffrey Dean & Sanjay Ghemawat Presented by: Warunika Ranaweera Supervised by: Dr. Nalin Ranasinghe.
Venkatram Ramanathan 1. Motivation Evolution of Multi-Core Machines and the challenges Background: MapReduce and FREERIDE Co-clustering on FREERIDE Experimental.
Cloud Distributed Computing Environment Content of this lecture is primarily from the book “Hadoop, The Definite Guide 2/e)
Map Reduce and Hadoop S. Sudarshan, IIT Bombay
Zois Vasileios Α. Μ :4183 University of Patras Department of Computer Engineering & Informatics Diploma Thesis.
MapReduce and Hadoop 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 2: MapReduce and Hadoop Mining Massive.
COMP 2903 A34s – Google and the Wisdom of Clouds Danny Silver JSOCS, Acadia University.
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
Software Engineering for Business Information Systems (sebis) Department of Informatics Technische Universität München, Germany wwwmatthes.in.tum.de Data-Parallel.
HadoopDB project An Architetural hybrid of MapReduce and DBMS Technologies for Analytical Workloads Anssi Salohalla.
Hadoop Basics -Venkat Cherukupalli. What is Hadoop? Open Source Distributed processing Large data sets across clusters Commodity, shared-nothing servers.
W HAT IS H ADOOP ? Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
f ACT s  Data intensive applications with Petabytes of data  Web pages billion web pages x 20KB = 400+ terabytes  One computer can read
CPS216: Advanced Database Systems (Data-intensive Computing Systems) Introduction to MapReduce and Hadoop Shivnath Babu.
Apache Mahout. Mahout Introduction Machine Learning Clustering K-means Canopy Clustering Fuzzy K-Means Conclusion.
Database Applications (15-415) Part II- Hadoop Lecture 26, April 21, 2015 Mohammad Hammoud.
Apache Hadoop Daniel Lust, Anthony Taliercio. What is Apache Hadoop? Allows applications to utilize thousands of nodes while exchanging thousands of terabytes.
Hadoop implementation of MapReduce computational model Ján Vaňo.
MapReduce and NoSQL CMSC 461 Michael Wilson. Big data  The term big data has become fairly popular as of late  There is a need to store vast quantities.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
MapReduce Computer Engineering Department Distributed Systems Course Assoc. Prof. Dr. Ahmet Sayar Kocaeli University - Fall 2015.
Massive Semantic Web data compression with MapReduce Jacopo Urbani, Jason Maassen, Henri Bal Vrije Universiteit, Amsterdam HPDC ( High Performance Distributed.
C-Store: MapReduce Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 22, 2009.
HADOOP Carson Gallimore, Chris Zingraf, Jonathan Light.
MapReduce & Hadoop IT332 Distributed Systems. Outline  MapReduce  Hadoop  Cloudera Hadoop  Tutorial 2.
HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
MapReduce. Google and MapReduce Google searches billions of web pages very, very quickly How? It uses a technique called “MapReduce” to distribute the.
Cloud Distributed Computing Environment Hadoop. Hadoop is an open-source software system that provides a distributed computing environment on cloud (data.
Load Rebalancing for Distributed File Systems in Clouds.
Shared Nothing Architecture Allen Archer. What is Shared Nothing architecture? It is a distributed architecture in which each node is independent and.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P8-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 8: hadoop and Tera/Peta byte graphs.
Next Generation of Apache Hadoop MapReduce Owen
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
INTRODUCTION TO HADOOP. OUTLINE  What is Hadoop  The core of Hadoop  Structure of Hadoop Distributed File System  Structure of MapReduce Framework.
Matrix Multiplication in Hadoop
BIG DATA/ Hadoop Interview Questions.
Implementation of Classifier Tool in Twister Magesh khanna Vadivelu Shivaraman Janakiraman.
MapReduce Compiler RHadoop
Hadoop Aakash Kag What Why How 1.
Large-scale file systems and Map-Reduce
Map Reduce.
Introduction to MapReduce and Hadoop
Hadoop Clusters Tess Fulkerson.
MapReduce Computing Paradigm Basics Fall 2013 Elke A. Rundensteiner
Database Applications (15-415) Hadoop Lecture 26, April 19, 2016
Distributed Systems CS
ITIS 1210 Introduction to Web-Based Information Systems
The Basics of Apache Hadoop
Hadoop Basics.
CS110: Discussion about Spark
Parallel Applications And Tools For Cloud Computing Environments
Lecture 16 (Intro to MapReduce and Hadoop)
CS 345A Data Mining MapReduce This presentation has been altered.
Distributed Systems CS
MapReduce: Simplified Data Processing on Large Clusters
Presentation transcript:

MapReduce

Google and MapReduce Google searches billions of web pages very, very quickly How? It uses a technique called “MapReduce” to distribute the work across a large number of computers, then combine the results This has made MapReduce a very popular approach Hadoop is an open source implementation of MapReduce Unless you work for Google, you will probably use Hadoop

How it works List(a, b, c, …).map(x => f(x)) gives List(f(a), f(b), f(c),…) List(a, b, c, …).reduce((x, y) => x  y) gives a  b  c … where  is some binary operator

Another view http://www.cnblogs.com/sharpxiajun/p/3151395.html (in Japanese)

ForkJoin How does ForkJoin differ from MapReduce? Answers from stackoverflow: ForkJoin recursively partitions a task into several subtasks, on a single machine. Takes advantage of multiple cores MapReduce only does one big split, with no communication between the parts until the reduce step. Massively scalable. Java fork/join starts quickly and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. MapReduce takes tens of seconds to start up, but scales well for much larger inputs (>100MB) on a compute cluster.

The End