VLDB, August 2012 (to appear) Avi Shinnar, David Cunningham, Ben Herta, Vijay Saraswat.


• Hadoop MapReduce (HMR) engine
  ◦ Has had a transformational effect on the practice of Big Data computing
  ◦ Built on HDFS (a resilient distributed filesystem)
  ◦ Data is automatically partitioned across nodes and operations are applied in parallel
• Its remarkable properties
  ◦ Simple
  ◦ Widely applicable
  ◦ Parallelizable
  ◦ Scalable
  ◦ Resilient
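For orientation, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce (HMR) API that the rest of these slides refer to; the class and field names (TokenMapper, SumReducer) are illustrative, not taken from the paper.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emit (word, 1) for every token in an input line.
    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts shuffled to each word.
    class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            context.write(word, new IntWritable(sum));
        }
    }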

• Design point
  ◦ Offline, long-lived, resilient computations
• HMR API and engine
  ◦ Support only single-job execution
  ◦ Incur I/O and (de-)serialization costs
  ◦ Mappers and reducers for each job are started in new JVMs (JVMs typically have high startup cost)
  ◦ An out-of-core shuffle implementation is used
• These choices have a substantial effect on performance
• But we need interactive analytics

• M3R (Main Memory MapReduce)
  ◦ A new implementation of the HMR API
• M3R/Hadoop
  ◦ Implementation of the HMR API in managed X10
  ◦ Existing Hadoop applications just work
  ◦ Reuses HDFS (and some other parts of Hadoop)
  ◦ In-memory: the problem size must fit in cluster RAM
  ◦ Not resilient: if any node goes down, the job fails
  ◦ But considerably faster (closer to HPC speeds)
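To make "existing Hadoop applications just work" concrete: the driver below is ordinary Hadoop code (reusing the illustrative word-count classes sketched earlier); the slides' claim is that a program like this can be pointed at M3R without modification. Nothing in it is specific to M3R, and the input/output paths simply come from the command line.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Standard Hadoop driver: configure the job, point it at input/output paths, run it.
    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }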

• X10: a type-safe, object-oriented, multi-threaded, multi-node, garbage-collected programming language
• X10 is built on two fundamental notions: places and asynchrony
• Place
  ◦ Also called a process; it supplies memory and worker threads
  ◦ A collection of resident mutable data objects and the activities that operate on that data
• Asynchrony
  ◦ Asynchrony is used within a place and for communication across places

• Reducing disk I/O
• Reducing network communication
• Reducing serialization/deserialization cost
• M3R affords significant benefits for job pipelines

• HMR engine execution flow
• M3R engine execution flow
• Evaluation
• Conclusions
• Future work

[Figure: HMR engine execution flow. Input (InputFormat / RecordReader / InputSplit) reads from the file system (HDFS) and feeds Map (Mapper); the shuffle, backed by the file system on both sides, carries map output to Reduce (Reducer); Output (OutputFormat / RecordWriter / OutputCommitter) writes back to the file system. The annotations mark the cost at each step: network and disk I/O plus deserialization on input, serialization and disk I/O around the shuffle, and network and disk I/O on output. How can we eliminate these I/Os? M3R.]

• The general flow of M3R is similar to that of the HMR engine
• An M3R instance is associated with a fixed set of JVMs
• Significant benefits come from avoiding network, file I/O, and (de-)serialization costs, especially for job pipelines
• The key mechanisms:
  ◦ Input/output cache
  ◦ Co-location
  ◦ Partition stability
  ◦ De-duplication

• M3R introduces an in-memory key/value cache
  ◦ On input, key/value pairs are cached in memory before being passed to the mapper
  ◦ On output, key/value pairs are cached before being serialized and written to disk
• Later jobs can obtain the required key/value sequence directly from the cache
  ◦ Because the data is already in memory, there are no attendant (de)serialization costs and no disk or network I/O
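The following is a rough, hypothetical sketch of the caching idea only (not M3R's actual data structures): keep the already-deserialized key/value pairs of each input split resident in the JVM, keyed by split, so that a later job reading the same split skips both the HDFS read and the deserialization.

    import java.util.AbstractMap.SimpleImmutableEntry;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical in-memory key/value cache, keyed by input split.
    class KeyValueCache<K, V> {
        private final Map<String, List<Map.Entry<K, V>>> bySplit = new ConcurrentHashMap<>();

        // Record a pair as it flows from the RecordReader to the mapper.
        void record(String splitId, K key, V value) {
            bySplit.computeIfAbsent(splitId, s -> new CopyOnWriteArrayList<>())
                   .add(new SimpleImmutableEntry<>(key, value));
        }

        // A later job asks for the same split: if it is cached, the pairs come
        // back already deserialized, with no disk or network I/O.
        List<Map.Entry<K, V>> lookup(String splitId) {
            return bySplit.get(splitId);
        }
    }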

[Figure: M3R engine execution flow with the in-memory cache inserted between the file system (HDFS) and the Input/Output stages, and with no file system backing the two sides of the shuffle. For a single job this eliminates the disk I/O around the shuffle; for job pipelines it also eliminates network and disk I/O and (de)serialization costs.]

• Shuffle is the stage that moves data from the output of the map tasks to the input of the reduce tasks
• Most map tasks and reduce tasks run on different nodes
• A reduce task must therefore pull map output from other nodes across the network (network I/O)
• Goals of the shuffle:
  ◦ Pull the data completely from the map side to the reduce side
  ◦ Minimize unnecessary bandwidth consumption when pulling data across nodes
  ◦ Reduce the impact of disk I/O on task execution
• The main opportunities for optimization are reducing the amount of data pulled and using memory rather than disk wherever possible

• Co-location
  ◦ Multiple mappers and reducers are started in each place
  ◦ Some of the data a mapper sends is destined for a reducer running in the same JVM
  ◦ For such data, the M3R engine guarantees that no network or disk I/O is involved

• We cannot entirely avoid the time and space overhead of (de)serialization in the shuffle
  ◦ The nodes still need to communicate
• But we can reduce the amount that needs to be communicated

[Figure: six mappers feeding six reducers. Through the shuffle, the mappers send data to various reducers.]

• M3R provides a partition stability guarantee
  ◦ The mapping from partitions to places is deterministic
  ◦ This allows job sequences to use a consistent partitioner to route data locally
  ◦ The reducer associated with a given partition number always runs at the same place
  ◦ Same place => same memory, so existing data structures can be reused
  ◦ This avoids a significant amount of communication

[Figure: the same six mappers and six reducers, with the Partitioner deciding where each pair goes: int partitionNumber = getPartition(key, value);]
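The getPartition call in the figure is the hook for this. The sketch below is essentially Hadoop's default hash partitioner, written out to make the "consistent partitioner" assumption explicit; under M3R's partition stability, using it for every job in a sequence keeps each key at the same place across jobs (the class name is illustrative).

    import org.apache.hadoop.mapreduce.Partitioner;

    // A deterministic partitioner: the same key always maps to the same
    // partition number, and under M3R's guarantee that partition is always
    // reduced at the same place, so data already resident there can be reused.
    class StableHashPartitioner<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numPartitions) {
            // Mask off the sign bit before taking the modulus.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }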

• M3R co-locates reducers and de-duplicates the data sent to them
  ◦ Duplicate keys and duplicate values are coalesced, and only one copy is sent
  ◦ On deserialization at the destination, the duplicates become aliases to that single copy
  ◦ This also works if multiple mappers at a single place send the same data
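As a hypothetical illustration of the coalescing idea (not M3R's serializer), the sketch below writes each distinct value in full only once and sends a small back-reference for later occurrences, so the receiver can alias the already-deserialized copy. It relies on the values having sensible equals()/hashCode().

    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.util.HashMap;
    import java.util.Map;

    // De-duplicating writer: full copy the first time, back-reference afterwards.
    class DedupWriter {
        private final Map<Object, Integer> seen = new HashMap<>();
        private final ObjectOutputStream out;

        DedupWriter(OutputStream raw) throws IOException {
            this.out = new ObjectOutputStream(raw);
        }

        void write(Object value) throws IOException {
            Integer id = seen.get(value);
            if (id == null) {
                seen.put(value, seen.size());
                out.writeBoolean(true);    // full copy follows
                out.writeObject(value);
            } else {
                out.writeBoolean(false);   // back-reference to an earlier copy
                out.writeInt(id);          // the receiver aliases that copy
            }
        }
    }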

[Figures: three further slides showing the six mappers sending data to the six reducers through the shuffle, stepping through how repeated keys and values are coalesced before being sent.]

[Figure: a two-job pipeline on the HMR engine. Job 1 reads G and V from the file system (HDFS) through Input (G) and Input (V), applies Map/Pass (G) and Map/Bcast (V), shuffles, and combines them in Reducer (*), writing the intermediate Output V#. Job 2 reads V# back through Input (V#), applies Map/Pass (V#), shuffles, and accumulates in Reducer (+), writing the final Output V'.]

[Figure: the same pipeline on M3R, with the cache sitting between the file system (HDFS) and the inputs/outputs. The intermediate V# is served from the cache rather than HDFS, G does not have to be communicated again ("Do not communicate G"), and the second job's shuffle does no communication.]
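A job pipeline like this is typically driven by a loop that feeds each job's output path into the next job's input path. The sketch below shows such a driver in plain Hadoop style (paths, iteration count, and the omitted mapper/reducer classes are illustrative). On stock HMR every iteration round-trips through HDFS; under M3R it is the cache and partition stability that remove that traffic, with no change to the driver.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Iterative pipeline driver: each iteration's output is the next input.
    public class IterateDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path vector = new Path(args[0]);               // initial vector V
            for (int i = 0; i < 10; i++) {
                Job job = Job.getInstance(conf, "multiply-" + i);
                job.setJarByClass(IterateDriver.class);
                // job.setMapperClass(...); job.setReducerClass(...);
                //   <- the application's multiply/accumulate steps would go here
                Path next = new Path(args[1] + "/v" + i);
                FileInputFormat.addInputPath(job, vector);
                FileOutputFormat.setOutputPath(job, next);
                if (!job.waitForCompletion(true)) System.exit(1);
                vector = next;                             // feed output into the next job
            }
        }
    }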

• 20-node cluster of IBM LS-22 blades connected by Gigabit Ethernet
• Each node has two quad-core 2.3 GHz AMD Opteron processors and 16 GB of memory, and runs Red Hat Enterprise Linux 6.2
• The JVM used is IBM J
• When running M3R on this cluster, we used one process per host with 8 worker threads to exploit the 8 cores

[Chart annotations: with no partition stability and no cache, every iteration takes the same amount of time; performance changes drastically according to the amount of remote shuffling.]

• Sacrifice resilience and out-of-core execution; gain performance
• Used X10 to build a fast map/reduce engine
  ◦ Used X10 features to implement the distributed cache
  ◦ Avoids serialization, disk, and network I/O costs
• 50x faster for Hadoop applications designed for M3R