Hadoop 2.2.0 Installation and Setup on Ubuntu 12.04.3
CT Yang
Department of Computer Science, Tunghai University
Hadoop Documentation
    http://hadoop.apache.org/docs/r2.2.0/
    http://en.wikipedia.org/wiki/Apache_Hadoop
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
2019/1/13
Other Hadoop-related projects at Apache
Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health (such as heatmaps) and for viewing MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
ZooKeeper™: A high-performance coordination service for distributed applications.
OS: Ubuntu 12.04.3 LTS
    MyHadoop-master    192.168.159.50
    MyHadoop-node01    192.168.159.51
    MyHadoop-node02    192.168.159.52
Edit hosts
    sudo vim /etc/hosts
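The slide does not show what goes into /etc/hosts. A minimal sketch, assuming each machine should resolve all three names from the host list to the addresses given there; the function name hosts_entries is made up for this illustration.

```shell
#!/bin/sh
# hosts_entries (hypothetical helper): print the /etc/hosts lines for the
# three machines in the host list. It only prints; it does not edit any file.
hosts_entries() {
    cat <<'EOF'
192.168.159.50 MyHadoop-master
192.168.159.51 MyHadoop-node01
192.168.159.52 MyHadoop-node02
EOF
}

# Show the lines so they can be checked before they are added to /etc/hosts.
hosts_entries
```

To apply them, the output could be appended on each machine with hosts_entries | sudo tee -a /etc/hosts, or pasted in by hand with the vim command above.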
Edit hostname
    sudo vim /etc/hostname
    sudo service hostname start
Log in again for the new hostname to take effect.
Install the Java JDK
    sudo apt-get -y install openjdk-7-jdk
    sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/jdk
Add a hadoop user
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hduser
    sudo adduser hduser sudo
Set up passwordless SSH login
    ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
    cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
    scp -r ~/.ssh MyHadoop-node01:~/
Download Hadoop
    cd ~
    wget http://ftp.twaren.net/Unix/Web/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
    tar zxf hadoop-2.2.0.tar.gz
    mv hadoop-2.2.0 hadoop
Add environment variables
    vim .bashrc
Append the following lines:
    export JAVA_HOME=/usr/lib/jvm/jdk/
    export HADOOP_INSTALL=/home/hduser/hadoop
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
Configure Hadoop
    cd hadoop/etc/hadoop
    vim hadoop-env.sh
Change the export JAVA_HOME line so that it points at the JDK (the symlink created earlier is /usr/lib/jvm/jdk).
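The same change to hadoop-env.sh can be scripted instead of made in an editor. This is a sketch only: set_java_home is a hypothetical helper, and it assumes the file contains a line beginning with export JAVA_HOME= and that /usr/lib/jvm/jdk is the wanted value.

```shell
#!/bin/sh
# set_java_home FILE [JDK_DIR] (hypothetical helper): rewrite the line that
# starts with "export JAVA_HOME=" in FILE. JDK_DIR defaults to the
# /usr/lib/jvm/jdk symlink from the Java installation step (an assumption).
set_java_home() {
    file=$1
    jdk_dir=${2:-/usr/lib/jvm/jdk}
    tmp=$(mktemp) || return 1
    # Only lines starting with "export JAVA_HOME=" are replaced; every other
    # line is copied unchanged. Writing back with cat keeps FILE's permissions.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A call might look like set_java_home ~/hadoop/etc/hadoop/hadoop-env.sh. If the file has no matching line, nothing is changed, so it is worth checking the result with grep JAVA_HOME afterwards.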
Configure Hadoop (cont.)
    vim core-site.xml
Add inside the <configuration> element:
    <property>
      <name>fs.default.name</name>
      <value>hdfs://MyHadoop-master:9000</value>
    </property>
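The slide shows only the property itself. For orientation, here is a sketch of what the whole core-site.xml could look like once the property sits inside the <configuration> root element. The helper name write_core_site is hypothetical, and it overwrites any existing core-site.xml in the target directory.

```shell
#!/bin/sh
# write_core_site DIR (hypothetical helper): write a complete core-site.xml
# into DIR, replacing any file of that name that is already there.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MyHadoop-master:9000</value>
  </property>
</configuration>
EOF
}
```

It could be called as write_core_site ~/hadoop/etc/hadoop. Hadoop 2.x marks fs.default.name as deprecated in favor of fs.defaultFS; the sketch keeps the key used on the slide.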
Configure Hadoop (cont.)
    vim yarn-site.xml
Add inside the <configuration> element:
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>MyHadoop-master</value>
    </property>
Configure Hadoop (cont.)
    cp mapred-site.xml.template mapred-site.xml
    vim mapred-site.xml
Add inside the <configuration> element:
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
Configure Hadoop (cont.)
    mkdir -p ~/mydata/hdfs/namenode
    mkdir -p ~/mydata/hdfs/datanode
    vim hdfs-site.xml
Add inside the <configuration> element:
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/home/hduser/mydata/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/home/hduser/mydata/hdfs/datanode</value>
    </property>
Configure Hadoop (cont.)
    vim slaves
List one worker hostname per line:
    MyHadoop-node01
    MyHadoop-node02
Copy Hadoop to all nodes
    scp -r /home/hduser/hadoop MyHadoop-node01:/home/hduser
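The slide shows the copy for MyHadoop-node01 only, while the title refers to all nodes. A small sketch of how the same command could be generated for each worker; the NODES list and the copy_hadoop_cmds name are assumptions for this example, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Worker hostnames, assumed to match the entries in the slaves file.
NODES="MyHadoop-node01 MyHadoop-node02"

# copy_hadoop_cmds (hypothetical helper): print one scp command per worker.
# Nothing is copied here; the output is meant to be reviewed first.
copy_hadoop_cmds() {
    for node in $NODES; do
        echo "scp -r /home/hduser/hadoop $node:/home/hduser"
    done
}

copy_hadoop_cmds
```

Once the printed commands look right, they could be run by piping the output to a shell, for example copy_hadoop_cmds | sh, provided passwordless SSH to each node is already working.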
Format HDFS
    hdfs namenode -format
Start Hadoop
    start-all.sh
Use jps to list the running Java processes
    jps
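The jps output can also be checked by script. The sketch below assumes the default layout for this configuration: NameNode, SecondaryNameNode and ResourceManager on the master, DataNode and NodeManager on each worker. The function name missing_daemons and the role names are made up for the example.

```shell
#!/bin/sh
# missing_daemons ROLE (hypothetical helper): read jps output on stdin and
# print every expected daemon that is not listed. ROLE is "master" or "worker".
missing_daemons() {
    case $1 in
        master) expected="NameNode SecondaryNameNode ResourceManager" ;;
        worker) expected="DataNode NodeManager" ;;
        *) echo "usage: missing_daemons master|worker" >&2; return 2 ;;
    esac
    # jps prints "<pid> <class name>"; keep only the class names.
    seen=$(awk '{print $2}')
    for daemon in $expected; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$seen" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

On the master this could be used as jps | missing_daemons master, and on a worker as jps | missing_daemons worker. Empty output means every expected daemon was found.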
Hadoop monitoring web page
    http://MyHadoop-master:8088
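Example program
    cd /home/hduser/hadoop
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
The arguments to pi are the number of map tasks (2) and the number of samples per map (5).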
Stop the Hadoop services
    stop-all.sh
Default values for the XML configuration files
    http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
    http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
    http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
    http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml