Working with Hadoop
Requirements
– Virtual machine software: VMware or VirtualBox
– Virtual machine image: download from Cloudera (whose team includes leaders in the field, among them the "father of Hadoop")
Start the Virtual Machine
Inside the Virtual Machine
– CentOS 6.4
– JDK
– Hadoop
– Eclipse (Juno)
Basics of HDFS (routine)
With the terminal:
– hadoop
– hadoop version
– hadoop jar
– hadoop fs …
– hadoop fs -ls : list the files in an HDFS directory (the user's home directory by default)
– hadoop fs -put / -get / -mkdir / -rmdir ...
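The routine commands above can be strung together into a short session. This is only a sketch: the directory /user/cloudera/demo and the file notes.txt are hypothetical names chosen for illustration, and the `run` helper is an addition that prints each command instead of executing it when `hadoop` is not installed, so the script can be read through on any machine.

```shell
# Sketch of a routine HDFS session.
# /user/cloudera/demo and notes.txt are hypothetical names for illustration.
HDFS_DIR=/user/cloudera/demo

# Run a command if hadoop is installed, otherwise only print it (dry run).
run() {
  if command -v hadoop >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

# Work in a scratch directory so no existing local file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "sample line" > notes.txt

run hadoop version                                       # show the installed version
run hadoop fs -mkdir -p "$HDFS_DIR"                      # create a directory in HDFS
run hadoop fs -put notes.txt "$HDFS_DIR"                 # local -> HDFS
run hadoop fs -ls "$HDFS_DIR"                            # list the directory
run hadoop fs -get "$HDFS_DIR/notes.txt" notes_copy.txt  # HDFS -> local
run hadoop fs -rm "$HDFS_DIR/notes.txt"                  # delete the file
run hadoop fs -rmdir "$HDFS_DIR"                         # remove the now-empty directory
```

Note that -rmdir only removes empty directories, which is why the file is deleted first.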
Copy Files from Windows to VM
WinSCP (see demo at bin\scp_ssh\winscp575)
– Protocol: SCP
– Hostname: the VM's IP address (get it from ifconfig in the terminal)
– Username/Password: cloudera/cloudera
Copy Files from VM (CentOS) to HDFS
– hadoop fs -put localfiles /user/cloudera
Copy Files from Windows to HDFS
– Via HUE services
– Using the HUE web interface on port 8888 (file manager)
Hadoop Administration
WordCount Example in Hadoop
#1: Via the guidelines on the Cloudera website
#2: Directly in Eclipse (preferred)
WordCount on the Cloudera Website
– Tutorial: /hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1.html
– Source code: downloaded from the Cloudera website
– Source code details and explanations: /hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1_source.html
WordCount on the Cloudera Website
Create the directories in HDFS
– $ hadoop fs -mkdir /user/cloudera
– $ hadoop fs -chown cloudera /user/cloudera
– $ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input
Create the sample text
– 1: Directly in CentOS
  $ echo "Hadoop is an elephant" > file0
  $ echo "Hadoop is as yellow as can be" > file1
  $ echo "Oh what a yellow fellow is Hadoop" > file2
  And then copy to HDFS
  $ hadoop fs -put file* /user/cloudera/wordcount/input
– 2: Create in Windows and copy to HDFS via HUE
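To know what the WordCount job should report for these three sample files, the same count can be computed locally with standard Unix tools. This is a sanity check only, not the MapReduce job itself; it assumes words are separated by whitespace and compared case-sensitively, and the scratch directory and the file name local_counts.txt are choices made here for illustration.

```shell
# Local sanity check for the WordCount sample input (not a Hadoop job).
# Work in a scratch directory so existing files are not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "Hadoop is an elephant" > file0
echo "Hadoop is as yellow as can be" > file1
echo "Oh what a yellow fellow is Hadoop" > file2

# Put one word per line, count each distinct word, and print
# "word<TAB>count" sorted in byte order.
cat file0 file1 file2 \
  | tr -s '[:space:]' '\n' \
  | awk 'NF { count[$0]++ } END { for (w in count) printf "%s\t%d\n", w, count[w] }' \
  | LC_ALL=C sort > local_counts.txt

cat local_counts.txt
```

For this input the result has 12 distinct words: Hadoop and is appear 3 times, as and yellow appear 2 times, and every other word appears once.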
WordCount on the Cloudera Website
– Compilation error
WordCount Example in Hadoop
#1: Via the guidelines on the Cloudera website
#2: Directly in Eclipse (preferred)
WordCount in the Eclipse Environment
– mapreduce-example-in-hadoop
– single-node-cluster-in-ubuntu bit/
(Some parts are different for the Cloudera VM)
Update the source code (from the website)
Adding JAR Files to the Project
– /usr/lib/hadoop
– /usr/lib/hadoop/lib
– /usr/lib/hadoop-mapreduce
– /usr/lib/hadoop-mapreduce/lib
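Before adding the JARs in Eclipse, it can help to confirm from the terminal that these directories exist and contain JAR files. The sketch below assumes the four directories are absolute paths under the file system root; the function name `report_jar_dirs` is made up for this example.

```shell
# Directories named on the slide, written as absolute paths (an assumption).
HADOOP_JAR_DIRS="/usr/lib/hadoop /usr/lib/hadoop/lib /usr/lib/hadoop-mapreduce /usr/lib/hadoop-mapreduce/lib"

# Print one line per directory: the path and how many .jar files sit
# directly inside it, or "missing" when the directory does not exist.
report_jar_dirs() {
  for dir in $HADOOP_JAR_DIRS; do
    if [ -d "$dir" ]; then
      count=$(find "$dir" -maxdepth 1 -name '*.jar' | wc -l | tr -d ' ')
      echo "$dir: $count jar(s)"
    else
      echo "$dir: missing"
    fi
  done
}

report_jar_dirs
```

A directory reported as missing usually means the commands are being run outside the virtual machine or the Hadoop installation lives elsewhere.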
Run Configuration: Run > Run Configurations
File > Export
Update properties in the JAR file
Prepare for the Run
– Make the HDFS directory
– Copy the sample input to HDFS (via HUE)
Run the example (from the folder containing the .jar file). Make sure to remove the output folder first: the job fails if the output directory already exists.
View the result
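The run and view steps can be sketched as a short terminal session. Several names here are assumptions, not taken from the slides: the JAR name wordcount.jar, the main class WordCount, and the output path /user/cloudera/wordcount/output. If the exported JAR already names its main class in its properties, the class argument would be dropped. As a convenience, the `run` helper only prints the commands when `hadoop` is not installed.

```shell
# Hypothetical names: replace with whatever was exported from Eclipse.
JAR=wordcount.jar
MAIN_CLASS=WordCount
INPUT=/user/cloudera/wordcount/input
OUTPUT=/user/cloudera/wordcount/output   # assumed output location

# Run a command if hadoop is installed, otherwise only print it (dry run).
run() {
  if command -v hadoop >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

# Remove any previous output folder; the job will not start if it exists.
run hadoop fs -rm -r "$OUTPUT"

# Launch the job from the folder that contains the JAR file.
run hadoop jar "$JAR" "$MAIN_CLASS" "$INPUT" "$OUTPUT"

# View the result: list the output folder, then print the part files.
# The glob is quoted so that HDFS, not the local shell, expands it.
run hadoop fs -ls "$OUTPUT"
run hadoop fs -cat "$OUTPUT/part-*"
```

The same output files can also be opened through the HUE file manager instead of -cat.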
Other sources Very nice