Hadoop Installation and Setup on Ubuntu


Hadoop 2.2.0 Installation and Setup on Ubuntu 12.04.3
CT Yang
Department of Computer Science, Tunghai University

Hadoop Document
http://hadoop.apache.org/docs/r2.2.0/
http://en.wikipedia.org/wiki/Apache_Hadoop
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
2019/1/13

Other Hadoop-related projects at Apache
Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health (such as heatmaps) and for viewing MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
ZooKeeper™: A high-performance coordination service for distributed applications.

OS: Ubuntu 12.04.3 LTS
MyHadoop-master 192.168.159.50
MyHadoop-node01 192.168.159.51
MyHadoop-node02 192.168.159.52

Edit the hosts file
    sudo vim /etc/hosts
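The slide does not show the entries themselves. Assuming the three addresses listed on the previous slide, the lines added on every machine might look like the sketch below. HOSTS_FILE is a hypothetical variable that defaults to a scratch file so the snippet can be tried safely; point it at /etc/hosts (as root) to apply the entries for real.

```shell
#!/bin/sh
# Sketch: append the cluster's name-to-address mappings to a hosts file.
# HOSTS_FILE is a hypothetical variable; it defaults to a scratch file so the
# script can be run without touching /etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

cat >> "$HOSTS_FILE" <<'EOF'
192.168.159.50 MyHadoop-master
192.168.159.51 MyHadoop-node01
192.168.159.52 MyHadoop-node02
EOF

echo "wrote host entries to $HOSTS_FILE"
```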

Edit the hostname
    sudo vim /etc/hostname
    sudo service hostname start
Log in again.

Install the Java JDK
    sudo apt-get -y install openjdk-7-jdk
    sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/jdk

Add the hadoop user
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hduser
    sudo adduser hduser sudo

Set up passwordless SSH login
    ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
    cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
    scp -r ~/.ssh MyHadoop-node01:~/
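Once the keys are in place, it may be worth confirming that login really works without a prompt before starting Hadoop. The following is a minimal sketch, not part of the original slides; check_login is a hypothetical helper name, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch: confirm that key-based login works without a password prompt.
# check_login is a hypothetical helper name, not part of OpenSSH.
check_login() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout keeps an unreachable host from hanging the check.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: passwordless login OK"
    else
        echo "$host: passwordless login FAILED"
        return 1
    fi
}
```

Usage, run as hduser on the master: check_login MyHadoop-node01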

Download Hadoop
    cd ~
    wget http://ftp.twaren.net/Unix/Web/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
    tar zxf hadoop-2.2.0.tar.gz
    mv hadoop-2.2.0 hadoop

Add environment variables
    vim .bashrc
    export JAVA_HOME=/usr/lib/jvm/jdk/
    export HADOOP_INSTALL=/home/hduser/hadoop
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
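After reloading the shell, a quick check can show whether any of these variables failed to take effect. This is only a sketch; check_hadoop_env is a hypothetical helper name, and the variable list is simply the one from the slide above.

```shell
#!/bin/sh
# Sketch: report which of the variables exported in .bashrc are still unset.
# check_hadoop_env is a hypothetical helper name.
check_hadoop_env() {
    missing=0
    for name in JAVA_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_hadoop_env prints one line per unset variable and returns non-zero if any are missing.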

Configure Hadoop
    cd hadoop/etc/hadoop
    vim hadoop-env.sh
Change the export JAVA_HOME line so that it points at the JDK (here /usr/lib/jvm/jdk/).
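The same edit can be scripted instead of made in vim. The sketch below assumes the file contains a line beginning with export JAVA_HOME= and rewrites only that line; set_java_home is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: point the JAVA_HOME line of hadoop-env.sh at the JDK symlink.
# set_java_home is a hypothetical helper name.
set_java_home() {
    env_file="$1"
    java_home="$2"
    # Rewrite only the line that starts with "export JAVA_HOME="; write to a
    # temporary copy first so a failed sed does not truncate the original.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$env_file.new" &&
        mv "$env_file.new" "$env_file"
}
```

Usage: set_java_home ~/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/jdk/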

Configure Hadoop (cont.)
    vim core-site.xml
    <property>
      <name>fs.default.name</name>
      <value>hdfs://MyHadoop-master:9000</value>
    </property>

Configure Hadoop (cont.)
    vim yarn-site.xml
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>MyHadoop-master</value>
    </property>

Configure Hadoop (cont.)
    cp mapred-site.xml.template mapred-site.xml
    vim mapred-site.xml
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

Configure Hadoop (cont.)
    mkdir -p ~/mydata/hdfs/namenode
    mkdir -p ~/mydata/hdfs/datanode
    vim hdfs-site.xml
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/home/hduser/mydata/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/home/hduser/mydata/hdfs/datanode</value>
    </property>
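The slides show only the property elements. In each site file they sit inside a single configuration root element. As a sketch of what the four finished files could look like, the script below writes them with the values from the slides. CONF_DIR is a hypothetical variable that defaults to a scratch directory; set it to the hadoop/etc/hadoop directory to write the real files.

```shell
#!/bin/sh
# Sketch: write the four site files with the properties from the slides,
# each wrapped in the <configuration> root element that Hadoop expects.
# CONF_DIR is a hypothetical variable; it defaults to a scratch directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MyHadoop-master:9000</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>MyHadoop-master</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hduser/mydata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hduser/mydata/hdfs/datanode</value>
  </property>
</configuration>
EOF

echo "wrote site files to $CONF_DIR"
```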

Configure Hadoop (cont.)
    vim slaves
    MyHadoop-node01
    MyHadoop-node02

Copy Hadoop to all nodes
    scp -r /home/hduser/hadoop MyHadoop-node01:/home/hduser
Repeat for MyHadoop-node02.
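With more than a couple of workers, the copy can be driven from the slaves file rather than typed per host. This is a sketch under that assumption; copy_to_workers and the DRY_RUN switch are hypothetical names, and by default the function only prints the scp commands it would run.

```shell
#!/bin/sh
# Sketch: copy the Hadoop directory to every host listed in the slaves file.
# copy_to_workers is a hypothetical helper; with DRY_RUN=1 (the default here)
# it only prints the scp commands instead of running them.
copy_to_workers() {
    slaves_file="$1"
    src_dir="$2"
    dest_dir="$3"
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines in the slaves file.
        [ -n "$host" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -r $src_dir $host:$dest_dir"
        else
            # Redirect stdin so scp cannot consume the rest of the host list.
            scp -r "$src_dir" "$host:$dest_dir" < /dev/null
        fi
    done < "$slaves_file"
}
```

Usage: copy_to_workers ~/hadoop/etc/hadoop/slaves /home/hduser/hadoop /home/hduser, then rerun with DRY_RUN=0 set in the environment once the printed commands look right.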

Format HDFS
    hdfs namenode -format

Start Hadoop
    start-all.sh

Use jps to list the running Java processes
    jps
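The slide does not say what the output should contain. Assuming the layout configured earlier, the master would be expected to run NameNode, SecondaryNameNode and ResourceManager, and each worker DataNode and NodeManager. The sketch below checks jps output against such a list; missing_daemons is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the expected daemons that do not appear in jps output.
# missing_daemons is a hypothetical helper; it reads jps output on stdin
# and takes the expected daemon names as arguments.
missing_daemons() {
    jps_output=$(cat)
    status=0
    for daemon in "$@"; do
        # jps prints "<pid> <main class name>", so compare the second field
        # exactly; this keeps NameNode from matching SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            status=1
        fi
    done
    return "$status"
}
```

Usage on the master: jps | missing_daemons NameNode SecondaryNameNode ResourceManager. On a worker: jps | missing_daemons DataNode NodeManager.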

Hadoop monitoring web page
    MyHadoop-master:8088

Example program
    cd /home/hduser/hadoop
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
The pi example takes the number of map tasks (2) and the number of samples per map (5).

Stop the Hadoop services
    stop-all.sh

Default XML settings
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml