Hadoop Introduction. Wang Xiaobo, 2011-12-8. TeleNav Confidential.


Hadoop Introduction Wang Xiaobo

Outline
- Install Hadoop
- HDFS
- MapReduce
- WordCount
- Analyzing
- Compile image data

Install Hadoop
- Download and unzip Hadoop
- Install JDK 1.6 or higher
- SSH key authentication (master/slaves)
- Config
  - hadoop-env.sh: export JAVA_HOME=/usr/local/jdk1.6.0_16
  - core-site.xml / hdfs-site.xml / mapred-site.xml
- Startup/Shutdown
  - sh start-all.sh
  - sh stop-all.sh

Install Hadoop
- Monitor
- Hadoop shell commands
  - hadoop dfs -ls
  - hadoop jar ../hadoop examples.jar wordcount input/ output/

HDFS

- Single namenode
- Block storage (64 MB blocks)
- Replication
- Designed for big files
- Not suitable for low-latency applications
- Not suitable for large numbers of small files: 150 million files need 32 GB of namenode memory
- Single writer per file
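The block-size and small-file points can be checked with a little arithmetic. The following is a minimal sketch in plain Java, not Hadoop code: the class and method names are hypothetical, the replication factor of 3 in the usage is an assumed value (the slide gives none), and "32G" is read as 32 GiB.

```java
// Back-of-the-envelope HDFS sizing. Only the 64 MB block size and the
// "150 million files need 32G" figure come from the slide; the rest is assumed.
final class HdfsSizing {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB block, as on the slide

    // Number of blocks the namenode must track for a file of the given size.
    static long blocksFor(long fileBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Raw bytes stored across the cluster for a given replication factor.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    // Namenode memory per file implied by the slide's figure.
    static long impliedBytesPerFile() {
        long memory = 32L * 1024 * 1024 * 1024; // assumption: "32G" means 32 GiB
        return memory / 150_000_000L;           // integer division, rounds down
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println("blocks for a 1 GiB file: " + blocksFor(oneGiB));
        System.out.println("blocks for a 1 KiB file: " + blocksFor(1024));
        // Replication factor 3 is an assumed example value.
        System.out.println("raw bytes for 1 GiB at replication 3: " + rawBytes(oneGiB, 3));
        System.out.println("implied namenode bytes per file: " + impliedBytesPerFile());
    }
}
```

Under these assumptions a tiny file still costs the namenode one block entry, the same as a file of up to 64 MB, which is why many small files exhaust namenode memory long before they fill the disks.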

MapReduce

- InputFormat
- InputSplit
- RecordReader
- Combiner: same as a Reducer, but runs locally on the map machine
- Partitioner: controls the load of each reducer; the default distribution is even
- Reducer
- RecordWriter
- OutputFormat
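The Combiner and Partitioner roles can be illustrated without a cluster. This is a rough sketch in plain Java, not the Hadoop API: the class and method names are hypothetical, and the partition arithmetic (non-negative hash modulo the reducer count) mirrors what Hadoop's default hash partitioner does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a combiner and a hash partitioner (hypothetical names).
final class ShuffleSketch {
    // Partitioner: pick a reducer from the key's hash, masked to be non-negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner: pre-sum the (key, 1) pairs of one map task on the map machine,
    // so fewer pairs have to cross the network.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> partial = new LinkedHashMap<String, Integer>();
        for (String key : mapOutputKeys) {
            partial.merge(key, 1, Integer::sum);
        }
        return partial;
    }

    // Route each combined pair to the bucket of the reducer chosen by partition().
    static List<Map<String, Integer>> route(Map<String, Integer> combined, int numReducers) {
        List<Map<String, Integer>> buckets = new ArrayList<Map<String, Integer>>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new LinkedHashMap<String, Integer>());
        }
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            buckets.get(partition(e.getKey(), numReducers)).put(e.getKey(), e.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("the", "hadoop", "the", "is", "the");
        Map<String, Integer> combined = combine(keys);
        System.out.println("after combine: " + combined);
        System.out.println("reducer buckets: " + route(combined, 2));
    }
}
```

Because the same key always hashes to the same reducer, every partial count for a word ends up in one place, and the load is even only to the extent that the keys hash evenly.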

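WordCount
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    // Driver: the main method of the WordCount class
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; what remains is <input path> <output path>
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "word count");            // set a user-defined job name
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);        // set the Mapper class for the job
        job.setCombinerClass(IntSumReducer.class);        // set the Combiner class for the job
        job.setReducerClass(IntSumReducer.class);         // set the Reducer class for the job
        job.setOutputKeyClass(Text.class);                // set the key class of the job output
        job.setOutputValueClass(IntWritable.class);       // set the value class of the job output
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // set the job input path
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // set the job output path
        System.exit(job.waitForCompletion(true) ? 0 : 1); // run the job
    }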

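WordCount
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Mapper: nested inside the WordCount class; emits (word, 1) for every token
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the input line on whitespace
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }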

WordCount
- Input: the Apache Hadoop software library is a framework that allows for the…
- Map: …
- Reducer
- Output
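The intermediate data on this slide did not survive in the transcript, so the following is a stand-in sketch rather than the original figure. It runs the map, shuffle, and reduce steps in plain Java on the visible part of the input line (the text after the ellipsis is unknown and left out). The class and method names are hypothetical and no Hadoop classes are used.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory walk-through of word count: map -> shuffle -> reduce (hypothetical names).
final class WordCountSketch {
    // Map: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group all values that share a key, with keys in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<Integer>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: sum the grouped values of each key.
    static Map<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(String line) {
        return reduce(shuffle(map(line)));
    }

    public static void main(String[] args) {
        String input = "the Apache Hadoop software library is a framework that allows for the";
        System.out.println("map output: " + map(input));
        System.out.println("reduce output: " + wordCount(input));
    }
}
```

On this input the map step emits twelve pairs, and the reduce step reports a count of 2 for "the" and 1 for each of the other ten words.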


Use Hadoop to compile image data
- Old compiler

Use Hadoop to compile image data

- data.prepare.job
- write.to.txd.job
- traffic.job
- write.traffic.to.txd.job
- collision.detection.job0
- write.to.label.job
- collision.detection.job5
- collision.detection.job1
- collision.detection.job3
- write.to.largelabel.job
- collision.detection.job6
- write.to.dpoi.job
- collision.detection.job4

Use Hadoop to compile image data
- Compile time reduced from 5 days to 5 hours

Q&A
Thanks!