Jeffrey D. Ullman Stanford University

Chunking, Replication, Distribution on Racks

- Datasets can be very large.
  - Tens to hundreds of terabytes.
  - Cannot process on a single server.
- Standard architecture emerging:
  - Cluster of commodity Linux nodes (compute nodes).
  - Gigabit Ethernet interconnect.
- How to organize computations on this architecture?
  - Mask issues such as hardware failure.

[Diagram: cluster architecture. Each rack contains compute nodes (CPU, memory, disk) connected by a switch; roughly 1 Gbps between any pair of nodes in a rack, with a 2-10 Gbps backbone between racks.]

- First-order problem: if nodes can fail, how can we store data persistently?
- Answer: a Distributed File System.
  - Provides a global file namespace.
  - Examples: Google GFS, Colossus; Hadoop HDFS.
- Typical usage pattern:
  - Huge files.
  - Data is rarely updated in place.
  - Reads and appends are common.

- Chunk servers.
  - File is split into contiguous chunks, typically 64 MB.
  - Each chunk is replicated (usually 2x or 3x).
  - Try to keep replicas in different racks.
  - Alternative: erasure coding.
- Master node for a file.
  - Stores metadata: the locations of all chunks.
  - Possibly replicated.
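
As a rough illustration (a sketch only, not GFS or HDFS code; the names Chunk, MasterMetadata, and CHUNK_SIZE are invented for this example), the master's metadata can be pictured as a mapping from each file to its ordered chunk list, and from each chunk to the nodes holding its replicas:

    from dataclasses import dataclass, field

    CHUNK_SIZE = 64 * 1024 * 1024          # typical 64 MB chunk size mentioned above

    @dataclass
    class Chunk:
        chunk_id: str
        replicas: list = field(default_factory=list)   # nodes holding copies, ideally on different racks

    @dataclass
    class MasterMetadata:
        files: dict = field(default_factory=dict)      # file name -> ordered list of Chunk objects

        def locate(self, filename, byte_offset):
            """Return the chunk (and hence the replica locations) covering a byte offset of a file."""
            return self.files[filename][byte_offset // CHUNK_SIZE]

    meta = MasterMetadata()
    meta.files["crawl.log"] = [Chunk("c0", ["node3", "node7", "node11"]),
                               Chunk("c1", ["node2", "node5", "node9"])]
    print(meta.locate("crawl.log", 100 * 1024 * 1024).replicas)   # offset falls in the second chunk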

- Compute nodes are organized into racks.
- Intra-rack connection is typically gigabit speed.
- Inter-rack connection is faster by a small factor.

[Diagram: racks of compute nodes storing file chunks.]

3-way replication of files, with copies on different racks.

MapReduce, Key-Value Stores, SQL Implementations

- Distributed File System.
- MapReduce, e.g., Hadoop.
- Object Store (key-value store), e.g., BigTable, HBase, Cassandra.
- SQL implementations, e.g., PIG (relational algebra), HIVE.

- MapReduce (Google) and its open-source (Apache) equivalent Hadoop.
- Important specialized parallel-computing tool.
- Copes with compute-node failures.
  - Avoids restart of the entire job.

- BigTable (Google), HBase, Cassandra (Apache), Dynamo (Amazon).
- Each row is a key plus values over a flexible set of columns.
  - Each column component can be a set of values.
- Example: the structure of the Web.
  - The key is a URL.
  - One column is a set of URLs: those linked to by the page represented by the key.
  - A second column is the set of URLs linking to the key.
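
As a purely illustrative sketch of such a row (the column names links_out and links_in are invented for this example; this is not the BigTable or HBase API):

    # One row of a hypothetical wide-column (key-value) store describing Web structure.
    row_key = "http://example.com/index.html"

    row = {
        "links_out": {                       # URLs that this page links to
            "http://example.com/about.html",
            "http://other.org/",
        },
        "links_in": {                        # URLs of pages that link to this page
            "http://referrer.net/post",
        },
    }

    store = {row_key: row}                   # whole store: key -> flexible set of columns
    print(sorted(store[row_key]["links_out"]))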

- PIG: Yahoo! implementation of relational algebra.
  - Translates to a sequence of map-reduce operations, using Hadoop.
- Hive: open-source (Apache) implementation of a restricted SQL, called QL, over Hadoop.

- Sawzall: Google implementation of parallel select + aggregation, but using C++.
- Dremel (Google): real restricted SQL, column-oriented store.
- F1 (Google): row-oriented, conventional, but massive scale.
- Scope: Microsoft implementation of restricted SQL.

Formal Definition; Implementation; Fault-Tolerance; Examples: Word-Count, Join

- Input: a set of key/value pairs.
- The user supplies two functions:
  - map(k, v) -> set(k1, v1)
  - reduce(k1, list(v1)) -> set(v2)
- Technically, the input consists of key-value pairs of some type, but usually only the value is important.
- (k1, v1) is an intermediate key/value pair.
- The output is the set of (k1, v2) pairs.

- MapReduce job =
  - Map function (inputs -> key-value pairs) +
  - Reduce function (key and list of values -> outputs).
- Map and Reduce Tasks apply the Map or Reduce function to (typically) many of their inputs.
  - Unit of parallelism.

- The Map tasks generate key-value pairs.
  - Each takes one or more chunks of input from the distributed file system.
- The system takes all the key-value pairs from all the Map tasks and sorts them by key.
- Then it forms key-(list-of-associated-values) pairs and passes each key-(value-list) pair to one of the Reduce tasks.
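
To make this data flow concrete, here is a minimal single-machine sketch of the same pipeline. It is an illustration only (plain Python, nothing distributed, and not how Hadoop is implemented): apply the user's Map function to every input, sort the intermediate pairs by key, group them, and hand each key with its value list to the user's Reduce function.

    from itertools import groupby
    from operator import itemgetter

    def run_mapreduce(inputs, map_fn, reduce_fn):
        """Toy, in-memory version of the Map -> sort-by-key -> Reduce data flow."""
        # Map phase: each (key, value) input may produce any number of intermediate pairs.
        intermediate = []
        for key, value in inputs:
            intermediate.extend(map_fn(key, value))

        # Shuffle/sort phase: bring together all values that share a key.
        intermediate.sort(key=itemgetter(0))

        # Reduce phase: one call per key, with the list of associated values.
        outputs = []
        for k, group in groupby(intermediate, key=itemgetter(0)):
            outputs.extend(reduce_fn(k, [v for _, v in group]))
        return outputs

The word-count and matrix-vector functions shown later in this section can be passed directly to this helper.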

[Diagram: input from the DFS flows into Map tasks, which emit "key"-value pairs; these are routed to Reduce tasks, whose output goes to the DFS.]

- We have a large file of documents, which are sequences of words.
- Count the number of times each distinct word appears in the file.

    map(key, value):
        // key: document name; value: text of the document
        FOR (each word w in value)
            emit(w, 1);

    reduce(key, value-list):
        // key: a word; value-list: an iterator over the counts collected for that word
        result = 0;
        FOR (each count v in value-list)
            result += v;
        emit(key, result);
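
For comparison, here is a directly runnable Python version of the same pair of functions. It is only a sketch: the regex tokenization and the run_mapreduce helper from the earlier sketch are assumptions of the example, not part of the original slides.

    import re

    def wordcount_map(doc_name, text):
        """key: document name; value: document text -> (word, 1) pairs."""
        for word in re.findall(r"\w+", text.lower()):
            yield (word, 1)

    def wordcount_reduce(word, counts):
        """key: a word; counts: the list of 1s collected for that word -> (word, total)."""
        yield (word, sum(counts))

    # Assumes run_mapreduce from the earlier sketch is in scope.
    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog and the fox")]
    print(run_mapreduce(docs, wordcount_map, wordcount_reduce))
    # [('and', 1), ('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]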

[Diagram: execution overview. The user program forks a Master and Worker processes; the Master assigns Map and Reduce tasks to Workers. Map workers read chunks of the input data (Chunk 0, Chunk 1, Chunk 2) and write intermediate results to local disk; Reduce workers perform remote reads and sorts, then write Output File 0 and Output File 1.]

- Input and final output are stored in the distributed file system.
  - The scheduler tries to schedule Map tasks "close" to the physical storage location of the input data, preferably at the same node.
- Intermediate results are stored on the local file storage of the Map and Reduce workers.

- The Master maintains each task's status: idle, active, or completed.
- Idle tasks get scheduled as workers become available.
- When a Map task completes, it sends the Master the location and sizes of its intermediate files, one for each Reduce task.
  - The Master pushes the locations of these intermediate files to the Reduce tasks.
- The Master pings workers periodically to detect failures.
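
A loose sketch of this bookkeeping (illustrative only; the names Status and Master are invented, and this is not Hadoop's actual coordinator code):

    from enum import Enum

    class Status(Enum):
        IDLE = "idle"
        ACTIVE = "active"
        COMPLETED = "completed"

    class Master:
        """Toy coordinator: tracks task status and intermediate-file locations."""
        def __init__(self, task_ids):
            self.status = {t: Status.IDLE for t in task_ids}   # task id -> status
            self.intermediate = {}                             # map task id -> per-reducer file locations

        def assign_next_task(self):
            """Hand an idle task to a worker that just became available."""
            for task, st in self.status.items():
                if st is Status.IDLE:
                    self.status[task] = Status.ACTIVE
                    return task
            return None                                        # nothing left to schedule

        def map_task_completed(self, task, file_locations):
            """A Map task reports the locations/sizes of its intermediate files (one per Reduce task)."""
            self.status[task] = Status.COMPLETED
            self.intermediate[task] = file_locations           # later pushed to the Reduce tasks

        def worker_failed(self, task):
            """A worker stopped answering pings: mark its task idle so it gets re-executed."""
            self.status[task] = Status.IDLE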

- Rule of thumb: use several times more Map tasks and Reduce tasks than the number of compute nodes available.
  - This minimizes the skew caused by different tasks taking different amounts of time.
- One DFS chunk per Map task is common.

- Often a Map task will produce many pairs of the form (k,v1), (k,v2), … for the same key k.
  - E.g., popular words in Word Count.
- Can save communication time by applying the Reduce function to values with the same key at the Map task.
  - Called a combiner.
- Works only if the Reduce function is commutative and associative.
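
As a small, hedged illustration (plain Python, not Hadoop's Combiner interface), a word-count combiner just pre-sums the counts produced by a single Map task before anything is sent across the network:

    from collections import defaultdict

    def combine(map_output):
        """Apply the word-count Reduce logic locally: sum counts per word on one Map task.
        Valid here because addition is commutative and associative."""
        partial = defaultdict(int)
        for word, count in map_output:
            partial[word] += count
        return list(partial.items())

    # One Map task's raw output for the text "the fox saw the dog":
    raw = [("the", 1), ("fox", 1), ("saw", 1), ("the", 1), ("dog", 1)]
    print(combine(raw))   # [('the', 2), ('fox', 1), ('saw', 1), ('dog', 1)]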

- We need to ensure that records with the same intermediate key end up at the same Reduce task.
- The system uses a default partition function, e.g., hash(key) mod R, if there are R Reduce tasks.
- Sometimes it is useful to override this.
  - Example: hash(hostname(URL)) mod R ensures that URLs from the same host end up at the same Reduce task and therefore appear together in the output.
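
A rough sketch of such partition functions in Python (illustrative only; Hadoop's actual Partitioner interface is Java and is not shown here). The choice of R = 4 and of crc32 as the hash are assumptions of the example:

    from urllib.parse import urlparse
    import zlib

    R = 4  # number of Reduce tasks, chosen arbitrarily for this example

    def default_partition(key):
        # Default scheme: hash(key) mod R (crc32 used so the result is stable across runs).
        return zlib.crc32(key.encode()) % R

    def host_partition(url):
        # Custom scheme: hash(hostname(URL)) mod R, so all URLs of one host go to the same reducer.
        return zlib.crc32(urlparse(url).netloc.encode()) % R

    print(host_partition("http://example.com/a"), host_partition("http://example.com/b"))
    # prints the same Reduce-task number twice, since both URLs share the host example.com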

- MapReduce is designed to deal with compute nodes failing to execute a task.
  - It re-executes failed tasks, not whole jobs.
- Failure modes:
  1. Compute-node failure (e.g., a disk crash).
  2. Rack communication failure.
  3. Software failures, e.g., a task requires Java version n but the node has version n-1.

1. Matrix-matrix and matrix-vector multiplication (a small sketch of the latter follows this list).
   - One step of the PageRank iteration was the original application.
2. Relational-algebra operations.
   - We'll do an example of the join.
3. Many other "embarrassingly parallel" operations.
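
For item 1, here is a minimal sketch of matrix-vector multiplication x = Mv in the MapReduce style. It is an illustration only: it assumes the vector v is small enough to be known to every Map task, represents M as sparse (row, column, value) triples, and reuses the toy run_mapreduce helper sketched earlier.

    # Matrix-vector multiplication x = M v in map-reduce style (sketch).
    v = [1.0, 2.0, 0.5]          # assumed known to every Map task

    def mv_map(key, entry):
        """key is ignored; entry is a nonzero matrix element (i, j, m_ij)."""
        i, j, m_ij = entry
        yield (i, m_ij * v[j])   # partial product destined for row i

    def mv_reduce(i, products):
        """Sum the partial products for row i to get component x_i."""
        yield (i, sum(products))

    # Sparse matrix M given as (row, col, value) triples; assumes run_mapreduce is in scope.
    M = [(None, (0, 0, 2.0)), (None, (0, 2, 4.0)), (None, (1, 1, 3.0))]
    print(run_mapreduce(M, mv_map, mv_reduce))   # [(0, 4.0), (1, 6.0)]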

- Map-Reduce job =
  - Map function (inputs -> key-value pairs) +
  - Reduce function (key and list of values -> outputs).
- Map and Reduce Tasks apply the Map or Reduce function to (typically) many of their inputs.
  - Unit of parallelism.
- Mapper = application of the Map function to a single input.
- Reducer = application of the Reduce function to a single key-(list of values) pair.

- The join of R(A,B) with S(B,C) is the set of tuples (a,b,c) such that (a,b) is in R and (b,c) is in S.
- Mappers need to send R(a,b) and S(b,c) to the same reducer, so they can be joined there.
- Mapper output: key = B-value, value = relation and other component (A or C).
- Example:
  - R(1,2) -> (2, (R,1))
  - S(2,3) -> (2, (S,3))
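
A small illustrative mapper implementing this scheme (a sketch; passing the relation name in as a plain string is an assumption of the example, not part of the slides):

    def join_map(relation, tup):
        """Tag each tuple with its relation and key it by the join attribute B.
        relation is "R" (schema (A, B)) or "S" (schema (B, C))."""
        if relation == "R":
            a, b = tup
            yield (b, ("R", a))
        else:
            b, c = tup
            yield (b, ("S", c))

    print(list(join_map("R", (1, 2))))   # [(2, ('R', 1))]
    print(list(join_map("S", (2, 3))))   # [(2, ('S', 3))]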

[Diagram: the mapper for R(1,2) emits (2, (R,1)); the mapper for R(4,2) emits (2, (R,4)); the mapper for S(2,3) emits (2, (S,3)); the mapper for S(5,6) emits (5, (S,6)).]

- There is a reducer for each key.
- Every key-value pair generated by any mapper is sent to the reducer for its key.

[Diagram: the pairs (2, (R,1)), (2, (R,4)), and (2, (S,3)) are routed to the reducer for B = 2; the pair (5, (S,6)) is routed to the reducer for B = 5.]

- The input to each reducer is organized by the system into a pair:
  - The key.
  - The list of values associated with that key.

[Diagram: the reducer for B = 2 receives (2, [(R,1), (R,4), (S,3)]); the reducer for B = 5 receives (5, [(S,6)]).]

- Given key b and a list of values that are each either (R, a_i) or (S, c_j), output each triple (a_i, b, c_j).
  - Thus, the number of outputs made by a reducer is the product of the number of R's on the list and the number of S's on the list.
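
A matching illustrative reducer (again only a sketch): it takes the cross product of the R-components and S-components that share the join value b.

    def join_reduce(b, values):
        """values is the list of tagged pairs for join-key b, e.g. [('R', 1), ('R', 4), ('S', 3)]."""
        r_values = [a for tag, a in values if tag == "R"]
        s_values = [c for tag, c in values if tag == "S"]
        # Output count = (#R on the list) * (#S on the list), as noted above.
        for a in r_values:
            for c in s_values:
                yield (a, b, c)

    print(list(join_reduce(2, [("R", 1), ("R", 4), ("S", 3)])))   # [(1, 2, 3), (4, 2, 3)]
    print(list(join_reduce(5, [("S", 6)])))                       # [] -- no matching R-tuple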

[Diagram: the reducer for B = 2 turns (2, [(R,1), (R,4), (S,3)]) into the output tuples (1,2,3) and (4,2,3); the reducer for B = 5, with only (S,6) on its list, produces no output.]