100062108 李智宇、100062116 林威宏、100062220 施閔耀

Presentation transcript:

李智宇、林威宏、施閔耀

+ Outline
- Introduction
- Architecture of Hadoop
- HDFS
- MapReduce
- Comparison
- Why Hadoop
- Conclusion

+ What is Hadoop?
- Open-source software framework
- Processes and stores big data
- Easy to use and implement, economical, flexible
- Runs on lots of nodes (servers)
- Written in Java
- Free license
- Created by Doug Cutting and Mike Cafarella

+ Advantages of Interpreted Language
- Cross-platform (e.g. Windows, Ubuntu, Mac OS X)
- Smaller executable program size
- Easier to modify during both development and execution

+ Architecture of Hadoop

+ Hadoop in Enterprise
The Dell representation of the Hadoop ecosystem.

+ Hadoop in Enterprise

+ Who is using Hadoop?
- More than half of the Fortune 50 use Hadoop

+ HDFS (Hadoop Distributed File System)
- Client: the user of the file system
- Name node: manages and stores the metadata and the namespace of files
- Data node: stores the files
- Each data node periodically sends its status to the name node
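The periodic status report can be pictured with a small sketch. This is only an illustration, not Hadoop's code: the `NameNode` class, its method names, and the 30-second timeout are all assumptions made for the example.

```python
class NameNode:
    """Toy name node that tracks when each data node last reported in."""

    def __init__(self, timeout_s=30.0):
        # timeout_s is an assumed value; the slides do not give one.
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, node_id, now):
        # A data node reports its status; remember the time of the report.
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        # A node counts as alive if it reported within the timeout window.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


nn = NameNode(timeout_s=30.0)
nn.heartbeat("dn1", now=0.0)
nn.heartbeat("dn2", now=20.0)
print(nn.live_nodes(now=40.0))
```

A data node that stops reporting simply drops out of `live_nodes`, which is how the name node in this sketch would notice a failed node.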

+ HDFS: Writing data in HDFS
- Each file is divided into blocks (64 MB or 128 MB each), and each block is stored as three copies on different data nodes.
- The client asks the name node for a list of data nodes sorted by distance and sends the data to the nearest one; that data node then forwards the data to the remaining nodes.
- When the operation above is done, the data node sends “done” to the name node.
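The write path above can be sketched in a few lines. The function names, the node names, and the distance values are hypothetical; the sketch only shows block splitting and picking the three nearest data nodes, not the real HDFS protocol.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def choose_pipeline(distances, replicas=3):
    """Pick the nearest data nodes; distances maps node name to distance."""
    ranked = sorted(distances, key=lambda node: (distances[node], node))
    return ranked[:replicas]


blocks = split_into_blocks(200 * MB)
pipeline = choose_pipeline({"dn1": 4, "dn2": 0, "dn3": 2, "dn4": 6})
print(len(blocks), pipeline)
```

In this sketch the client would send each block to `pipeline[0]`, which forwards it to the remaining nodes in the list.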

+ HDFS: Reading data in HDFS
- The client sends the filename to the name node, which returns a list of the file's blocks with their data nodes sorted by distance.
- The client uses the list to get the file from the data nodes.

+ HDFS: failure
- Node failure
- Communication failure
- Data corruption

+ HDFS: handle failure
- Handling write failure: the name node skips any data node that does not return an ACK.
- Handling read failure: recall that when reading a file, the client gets a list of the data nodes that contain the file, so if one data node fails it can read from another node on the list.
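Read-failure handling can be sketched as a simple fallback loop. `read_block` and the `fetch` callback are hypothetical names for this example; the real client is more involved.

```python
def read_block(replicas, fetch):
    """Try each data node in order and return the first successful read.

    replicas: node names, nearest first (as returned by the name node).
    fetch: callable that reads the block from one node or raises IOError.
    """
    errors = []
    for node in replicas:
        try:
            return fetch(node)
        except IOError as exc:
            # This node failed; remember why and move on to the next one.
            errors.append((node, str(exc)))
    raise IOError(f"all replicas failed: {errors}")


def fake_fetch(node):
    # Stand-in for a network read: pretend dn1 is down.
    if node == "dn1":
        raise IOError("connection refused")
    return b"block data from " + node.encode()


print(read_block(["dn1", "dn2", "dn3"], fake_fetch))
```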

+ HDFS: handle failure
- Handling node failure: the name node finds out which data the failed node held, copies that data from the other nodes that hold it, and stores the copies on other data nodes.
- Note that HDFS cannot guarantee that at least one copy of the data is still alive.
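The recovery step can be sketched as a planning function. The data layout (a dict from block name to the set of nodes holding it) and the function name are assumptions for illustration only.

```python
def plan_rereplication(block_locations, failed_node, live_nodes, replicas=3):
    """Return (block, source, target) copy steps after failed_node is lost.

    A step of (block, None, None) means no surviving copy exists.
    """
    plan = []
    for block in sorted(block_locations):
        if failed_node not in block_locations[block]:
            continue  # this block was not stored on the failed node
        holders = block_locations[block] - {failed_node}
        if not holders:
            plan.append((block, None, None))  # every copy is gone
            continue
        source = min(holders)
        candidates = sorted(n for n in live_nodes
                            if n not in holders and n != failed_node)
        missing = replicas - len(holders)
        for target in candidates[:missing]:
            plan.append((block, source, target))
    return plan


locations = {
    "b1": {"dn1", "dn2", "dn3"},
    "b2": {"dn2", "dn3", "dn4"},
    "b3": {"dn1"},
}
print(plan_rereplication(locations, "dn1", ["dn2", "dn3", "dn4"]))
```

Block `b3` in the example had its only copy on the failed node, which is the case the note above warns about.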

+ MapReduce
- Similar to divide-and-conquer
- First, use “Map” to divide tasks.
- Second, use “Shuffle” to “transfer the data from the mapper nodes to a reducer’s node and decompress if needed.”
- Third, use “Reduce” to “execute the user-defined reduce function to produce the final output data.”
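The three phases can be illustrated with a word count that runs in a single process. This is a sketch of the idea only, not the Hadoop API; all function names are made up for the example.

```python
from collections import defaultdict


def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    # Shuffle: group all values that share a key, as if routing each key
    # to the reducer responsible for it.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: apply the user-defined function (here, sum) to each group.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))


print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

On a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; the structure of the computation is the same.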

+ MapReduce-Map

+ MapReduce-Shuffle

+ MapReduce-Reduce

+ MapReduce

+ Comparison

+ Comparison

+ Why Hadoop? (technically)
Comparison of Grep task results with Vertica and DBMS-X

+ Why Hadoop? (technically)
- Simple structure vs. optimization
- Transaction time not minimized
- Lower performance with the same number of nodes
- No compelling reason to choose Hadoop

+ Why Hadoop? (commercially)

+ Why Hadoop? (commercially)
- Cheap (buy more servers to beat a DBMS)
- Flexible (both in design and deployment)
- Easier to design
- Easier to scale up
- Can be combined with other systems to achieve better performance

+ Conclusion
- Hadoop is much easier for users to implement and more economical.
- MapReduce advocates should study the techniques used in parallel DBMSs.
- Hybrid systems are also popular.
- As its performance improves, we believe Hadoop will lead the trend in big data computing.


+ Reference
- A Comparison of Approaches to Large-Scale Data Analysis, by Sam Madden