Hadoop Demo
Presented by: Imranul Hoque

Topics
Hadoop running modes
  – Stand alone
  – Pseudo distributed
  – Cluster
Running MapReduce jobs
Status/logs
Sample MapReduce code

Required Software
Hadoop (release )
  – /hadoop tar.gz
Java Development Kit (jdk 1.6.0_01)
  –
Ant (ant 1.7.1)
  – -ant bin.tar.gz

Setup
NameNode: sherpa01
JobTracker: sherpa02
DataNode/TaskTracker: sherpa05, sherpa06

Assumptions
ssh must be installed and sshd must be running
Shared home directory (NFS) across all nodes in the cluster (makes life easier)

Steps
Install JDK, ant
Passphraseless ssh
Compiling Hadoop
Setting up config parameters
Starting up Hadoop
Running jobs
Job status

Passphraseless ssh (Source to Destination)
1. On Source, generate a private-public key pair
2. The keys are stored in ~/.ssh/id_dsa (private) and ~/.ssh/id_dsa.pub (public)
3. Send the public key to Destination
4. On Destination, add the public key to the authorized key list, ~/.ssh/authorized_keys

Passphraseless ssh (2)
With NFS (shared home directory):
1. ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys (four times)
3. Modify hostname in authorized_keys: sherpa01, sherpa02, sherpa05, sherpa06
Add “StrictHostKeyChecking no” in /etc/ssh/ssh_config to turn off prompt

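The key-installation step can be sketched as a small shell helper. This is a minimal sketch, assuming the shared NFS home directory from the Assumptions slide. The function name authorize_key and its two arguments are hypothetical, not part of ssh. It appends a key only if that exact line is not already present, so it is safe to re-run. Recent OpenSSH releases disable DSA keys by default, so on a newer system the key type in the ssh-keygen call may need to change.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to an authorized_keys file
# unless that exact key line is already listed there.
authorize_key() {
    pub_file=$1    # e.g. ~/.ssh/id_dsa.pub
    auth_file=$2   # e.g. ~/.ssh/authorized_keys

    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"

    # -F: fixed strings, -x: whole-line match, -f: read patterns from the key file
    if ! grep -qxF -f "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi

    # Keep the file private to its owner; sshd's StrictModes check
    # refuses key files that other users can write to.
    chmod 600 "$auth_file"
}

# Usage (left commented out so the sketch has no side effects):
#   ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Because the home directory is shared, running the helper once on any node makes the key visible on every node.
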
Setting the PATH
JAVA_HOME=/usr/java/jdk1.6.0_01
ANT_HOME=~/ant
PATH=/usr/java/jdk1.6.0_01/bin:$PATH
PATH=~/ant/bin:$PATH

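In a shell startup file these settings would normally be exported, since a plain assignment is not inherited by child processes such as ant or the Hadoop scripts. A minimal sketch, assuming the install locations shown on the slide:

```shell
# Install locations are taken from the slide; adjust to the local layout.
export JAVA_HOME=/usr/java/jdk1.6.0_01
export ANT_HOME="$HOME/ant"

# Put both bin directories ahead of whatever is already on the PATH.
export PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$PATH"
```
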
Installing and Configuring Hadoop
Extract
Build (ant)
Modify conf/hadoop-env.sh:
  – export JAVA_HOME=/usr/java/jdk1.6.0_01
Inform Hadoop of the Masters and Slaves
  – conf/masters
  – conf/slaves
Modify conf/hadoop-site.xml

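The three config files might look roughly like the ones generated below. This is a sketch under stated assumptions, not the configuration used in the demo. The function write_conf is hypothetical. The property names fs.default.name and mapred.job.tracker are the ones used by Hadoop releases of this era. The ports 9000 and 9001 and the hdfs:// URI form follow the Hadoop quickstart examples and are assumptions here. The contents of conf/masters are also an assumption: in these releases that file names the host(s) for the secondary NameNode.

```shell
#!/bin/sh
# Hypothetical helper: write masters, slaves and hadoop-site.xml for the
# demo cluster into the directory given as the first argument.
write_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir"

    # Assumption: run the secondary NameNode on the NameNode host.
    printf '%s\n' sherpa01 > "$conf_dir/masters"

    # One DataNode/TaskTracker host per line (from the Setup slide).
    printf '%s\n' sherpa05 sherpa06 > "$conf_dir/slaves"

    # Ports 9000/9001 are assumptions, not values from the slides.
    cat > "$conf_dir/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://sherpa01:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>sherpa02:9001</value>
  </property>
</configuration>
EOF
}

# Usage (commented out so a real conf/ directory is not overwritten):
#   write_conf ./conf
```
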
Rack Awareness
Set the property topology.script.file.name to conf/fakedns.sh
In fakedns.sh:
  – echo /rack_id

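A slightly fuller topology script can be sketched as follows. Hadoop may pass several hosts in one call and expects one rack path per argument, so the sketch loops over its arguments instead of echoing a single rack. The host-to-rack mapping and the function names rack_of and resolve_racks are hypothetical. Hadoop may also pass IP addresses rather than host names, in which case the mapping would need entries for those as well.

```shell
#!/bin/sh
# Hypothetical mapping from host name to rack path.
rack_of() {
    case "$1" in
        sherpa05*) echo /rack1 ;;
        sherpa06*) echo /rack2 ;;
        *)         echo /default-rack ;;   # fallback for unknown hosts
    esac
}

# Print one rack path per argument, in the order the arguments were given.
resolve_racks() {
    for host in "$@"; do
        rack_of "$host"
    done
}

# When installed as conf/fakedns.sh, resolve whatever Hadoop passes in.
resolve_racks "$@"
```
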
Starting Hadoop
Format NameNode FS (sherpa01):
  – bin/hadoop namenode -format
From NameNode (sherpa01):
  – bin/start-dfs.sh
From JobTracker (sherpa02):
  – bin/start-mapred.sh

Running MapReduce
Copy data to HDFS
  – bin/hadoop dfs -copyFromLocal ~/data gutenberg
Run MapReduce
  – bin/hadoop jar hadoop examples.jar wordcount -r 6 gutenberg gutenberg-output
Some HDFS commands
  – copyToLocal, cat, cp, rm, du, ls, etc.

Job/Node Status
NameNode:
  –
DataNode:
  –
Also look at the logs:
  – logs/

WordCount.java
src/examples/org/apache/hadoop/examples/WordCount.java
  – Map function
  – Reduce function
  – Driver function

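The roles of the three functions can be illustrated with a local shell pipeline. This is an analogy only, not the code in WordCount.java: the function names are hypothetical and sort stands in for the shuffle phase that Hadoop runs between map and reduce. Words are split on whitespace.

```shell
#!/bin/sh
# Map: split the input into words and emit "word<TAB>1" for each occurrence.
map() {
    tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# Reduce: sum the counts for each key. Input must already be sorted by key,
# so all records for one word arrive together.
reduce() {
    awk -F '\t' '
        $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
        { sum += $2 }
        END { if (NR > 0) print key "\t" sum }
    '
}

# Driver: wire the stages together; sort plays the part of the shuffle.
wordcount() {
    map | LC_ALL=C sort | reduce
}

# Usage (commented out; reads text on stdin, writes "word<TAB>count" lines):
#   cat ~/data/* | wordcount
```
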
Shutdown
From NameNode (sherpa01):
  – bin/stop-dfs.sh
From JobTracker (sherpa02):
  – bin/stop-mapred.sh

Conclusion
For more details:
  –
  –