Hadoop install.


Software and Hadoop versions used in this install
Virtual machine software: VMware Player
Operating system: Ubuntu (ubuntu-12.04.3-desktop-amd64)
Java version: JDK 7
Hadoop version: Hadoop 1.1.2

VM setup and environment configuration

P.S. Remember to turn Num Lock on.

Install Java
(Red text indicates commands; purple text indicates text to add to files.)

Open a terminal window with Ctrl + Alt + T.

Install Java JDK 7:
a. Download the Java JDK (https://www.dropbox.com/s/h6bw3tibft3gs17/jdk-7u21-linux-x64.tar.gz)
b. Unzip the file:
cd Downloads
tar -xvf jdk-7u21-linux-x64.tar.gz
c. Now move the JDK 7 directory to /usr/lib:
sudo mkdir -p /usr/lib/jvm
sudo mv ./jdk1.7.0_21 /usr/lib/jvm/jdk1.7.0
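Before extracting in step (b), the tarball can be checked for corruption by listing its contents without unpacking anything. A minimal sketch (the archive name follows the download above; a truncated or corrupt download makes tar exit with an error here):

```shell
# List the archive contents without extracting; a corrupt download fails here.
# The top-level directory shown is the jdk1.7.0_21 directory moved in step (c).
tar -tzf jdk-7u21-linux-x64.tar.gz | head -5
```

If the listing scrolls past without errors, the extraction in step (b) should succeed.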

d. Now run:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0/bin/javaws" 1

e. Correct the file ownership and the permissions of the executables:
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chmod a+x /usr/bin/javaws
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
f. Check the version of your new JDK 7 installation:
java -version

Install Hadoop
(Red text indicates commands; purple text indicates text to add to files.)

11. Install the SSH server:
sudo apt-get install openssh-client
sudo apt-get install openssh-server
12. Configure SSH (if the hduser account does not exist yet, create it first with sudo addgroup hadoop and sudo adduser --ingroup hadoop hduser):
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
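The point of step 12 is that an RSA key with an empty passphrase, appended to authorized_keys, lets ssh localhost log in without a password. The same mechanics can be sketched against a scratch directory (the path is illustrative; the real setup uses $HOME/.ssh as above):

```shell
# Generate an RSA keypair with an empty passphrase into a scratch directory,
# then mark the public half as authorized, mirroring step 12 above.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$KEYDIR/id_rsa" -q
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"   # sshd rejects group/world-writable key files
ls "$KEYDIR"
```

The chmod matters in practice: sshd silently ignores an authorized_keys file with loose permissions, which shows up as an unexpected password prompt.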

1. Download Apache Hadoop 1.1.2 (https://www.dropbox.com/s/znonl6ia1259by3/hadoop-1.1.2.tar.gz) and store it in the Downloads folder
2. Unzip the file (open up the terminal window):
cd Downloads
sudo tar xzf hadoop-1.1.2.tar.gz
cd /usr/local
sudo mv /home/hduser/Downloads/hadoop-1.1.2 hadoop
sudo addgroup hadoop
sudo chown -R hduser:hadoop hadoop
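A quick way to confirm the chown took effect before continuing (a sketch; the %U:%G format is GNU coreutils stat, which ships with Ubuntu):

```shell
# Show owner:group of the Hadoop install directory; expect hduser:hadoop.
stat -c '%U:%G' /usr/local/hadoop
```

If this still prints root:root, later steps that write under the Hadoop tree as hduser will fail with permission errors.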

3. Open your .bashrc file in an editor (press Alt + F2 for the run-command prompt):
gksudo gedit .bashrc

4. Add the following lines to the bottom of the file:

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export PIG_HOME=/usr/local/pig
export PIG_CLASSPATH=/usr/local/hadoop/conf

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0/

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$PIG_HOME/bin
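The lzohead function above is just a shell pipeline with a safety cap: head -1000 stops output after the first thousand lines so a huge file never floods the terminal. The same pattern can be sketched without Hadoop, using a hypothetical local file:

```shell
# Local analogue of lzohead's tail end: cap output at the first 1000 lines.
peek () {
    cat "$1" | head -1000
}

seq 2500 > /tmp/bigfile.txt   # hypothetical 2500-line file
peek /tmp/bigfile.txt | wc -l # caps the output at 1000 lines
```

In lzohead the cat is replaced by hadoop fs -cat plus lzop -dc, but the capping logic is identical.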

5. Save the .bashrc file and close it
6. Run:
gksudo gedit /usr/local/hadoop/conf/hadoop-env.sh
7. Add the following lines:
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0/
8. Save and close the file

9. In the terminal window, create a directory and set the required ownership and permissions:
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
10. Run:
gksudo gedit /usr/local/hadoop/conf/core-site.xml
11. Add the following between the <configuration> … </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

12. Save and close the file
13. Run:
gksudo gedit /usr/local/hadoop/conf/mapred-site.xml
14. Add the following between the <configuration> … </configuration> tags:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

15. Save and close the file
16. Run:
gksudo gedit /usr/local/hadoop/conf/hdfs-site.xml
17. Add the following between the <configuration> … </configuration> tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
</property>

18. Format the HDFS:
/usr/local/hadoop/bin/hadoop namenode -format
19. Press the Start button and type Startup Applications
20. Add an application with the following command:
/usr/local/hadoop/bin/start-all.sh
21. Restart Ubuntu and log in
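After start-all.sh has run, a pseudo-distributed Hadoop 1.x node should show five daemons in the output of jps (which ships with the JDK). A sketch of a check, written to read the process list from a file so the logic itself can be exercised on canned output:

```shell
# Verify the expected Hadoop 1.x daemons appear in jps-style output.
check_daemons () {
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        grep -q "$d" "$1" || { echo "missing: $d"; return 1; }
    done
    echo "all daemons running"
}

# Typical use after startup:
#   jps > /tmp/jps.out && check_daemons /tmp/jps.out
```

A missing DataNode after a re-format is a common symptom of a stale /app/hadoop/tmp; clearing that directory and re-formatting usually resolves it.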

DEMO
(Red text indicates commands; purple text indicates text to add to files.)

1. Get the Gettysburg Address and store it in your Downloads folder (https://www.dropbox.com/s/w6yvyg1p57sf6sh/gettysburg.txt)
2. Copy the Gettysburg Address from the name node to HDFS:
cd Downloads
hadoop fs -put gettysburg.txt /user/hduser/getty/gettysburg.txt
3. Check that the Gettysburg Address is in HDFS:
hadoop fs -ls /user/hduser/getty/

4. Delete the Gettysburg Address from your name node:
rm gettysburg.txt
5. Download a jar file that contains a WordCount program into the Downloads folder (https://www.dropbox.com/s/gp6t7616wsypkdo/chiu-wordcount2.jar)
6. Execute the WordCount program on the Gettysburg Address (the following command is one line):
hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gettysburg.txt /user/hduser/getty/out

7. Check the MapReduce results:
hadoop fs -cat /user/hduser/getty/out/part-r-00000
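The job's output can be sanity-checked against a plain shell pipeline run on a local copy of the input: it computes the same word frequencies as WordCount. A rough sketch (it splits on whitespace only, so punctuation handling may differ from the jar's tokenizer):

```shell
# Count word frequencies in a local copy of the input, most frequent first.
tr -s '[:space:]' '\n' < gettysburg.txt | sort | uniq -c | sort -rn | head
```

The top few counts from this pipeline should roughly match the largest values in part-r-00000; large discrepancies suggest the job ran on the wrong input path.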