Hadoop
Joshua Nester, Garrison Vaughan, Calvin Sauerbier, Jonathan Pingilley, and Adam Albertson
Operating System and Network Configuration
The first thing we did was install Ubuntu 10.04 LTS on every machine. Once all of the nodes were up and running Ubuntu, we connected each of them to the switch.
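A minimal sketch of the per-node network setup this implies, since later slides rely on each node having a static IP; the interface name and all addresses below are placeholders, not values from the original cluster:

```sh
# Write a static-IP config for this node (Ubuntu 10.04 style)
# eth0, 192.168.1.11, and the gateway are illustrative placeholders
cat <<'EOF' | sudo tee /etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.11
    netmask 255.255.255.0
    gateway 192.168.1.1
EOF

# Apply the new configuration
sudo /etc/init.d/networking restart
```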
Basic Configuration - Java
After we had all of the machines connected to the switch, we installed the packages Hadoop needed. The Ubuntu installation did not come with Java, so we installed a JDK on each machine and configured the PATH variable so each machine could find the Java binary.
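A sketch of that per-machine Java setup, assuming the OpenJDK 6 package that Ubuntu 10.04 shipped (the slides do not name the exact JDK, so the package and install path are assumptions):

```sh
# Install a JDK (assumed here: OpenJDK 6 from the Ubuntu 10.04 repos)
sudo apt-get update
sudo apt-get install -y openjdk-6-jdk

# Point JAVA_HOME at the JDK and put its binaries on PATH,
# appending to ~/.bashrc so the setting persists across logins
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

java -version   # sanity check: the JVM should now be discoverable
```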
Basic Configuration - Hadoop
After getting the Java Development Kit installed, we installed the Hadoop files and set up the HADOOP_HOME and PATH variables. We then had to create a Hadoop user account and group on each node and transfer ownership of the Hadoop files to that new user.
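A sketch of those steps on one node, assuming a Hadoop 0.20-era tarball unpacked under /usr/local; the version number and paths are illustrative:

```sh
# Unpack the Hadoop distribution (version and target path are assumptions)
sudo tar -xzf hadoop-0.20.2.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-0.20.2 /usr/local/hadoop

# Create a dedicated hadoop group and user account on this node
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop

# Hand ownership of the Hadoop files to the new account
sudo chown -R hadoop:hadoop /usr/local/hadoop

# Export HADOOP_HOME and extend PATH for the hadoop user
echo 'export HADOOP_HOME=/usr/local/hadoop' | sudo tee -a /home/hadoop/.bashrc
echo 'export PATH=$HADOOP_HOME/bin:$PATH' | sudo tee -a /home/hadoop/.bashrc
```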
Basic Configuration - SSH
After setting up the Hadoop accounts on each node, we had to set up the authorized_keys files so the master node could shell into the Hadoop accounts on the other nodes without a password.
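A sketch of the passwordless-SSH setup, run as the hadoop user on the master; the slave hostnames are placeholders:

```sh
# On the master: generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorize the master's key for its own hadoop account too,
# since the start-up scripts also shell into localhost
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Append the public key to authorized_keys on each slave
# (slave1, slave2, slave3 are placeholder hostnames)
for node in slave1 slave2 slave3; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$node
done

ssh hadoop@slave1 exit   # should log in without a password prompt
```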
File System Configuration
On each node, we had to configure the XML files used for the distributed file system configuration. After setting up the DFS configuration, we had to format the namenode on the master. Once all configuration was done, we ran the distributed file system and MapReduce start-up scripts, which brought the datanodes and tasktrackers up on all of the slaves.
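A sketch of the per-node XML configuration and the cluster start-up, assuming Hadoop 0.20-style config files and a master whose hostname is master; the hostnames, ports, and replication factor are illustrative:

```sh
# conf/core-site.xml: tell every node where the namenode lives
cat > $HADOOP_HOME/conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# conf/hdfs-site.xml: block replication across the datanodes
cat > $HADOOP_HOME/conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF

# conf/mapred-site.xml: tell every node where the jobtracker lives
cat > $HADOOP_HOME/conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
EOF

# On the master only: format the namenode, then start the daemons
hadoop namenode -format
start-dfs.sh      # namenode on the master, datanodes on the slaves
start-mapred.sh   # jobtracker on the master, tasktrackers on the slaves
```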
Test Run
For our test run, we gave Hadoop seven different books to process with the word-counting program provided with the installation. The first time we ran the test, the cluster successfully mapped all of the work but failed to reduce. The problem turned out to be caused by an error in the /etc/hosts configuration.
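A sketch of how that test would have been launched with the example jar bundled in the distribution; the local book directory and HDFS paths are placeholders:

```sh
# Copy the seven books into HDFS (local path is a placeholder)
hadoop fs -mkdir /user/hadoop/books
hadoop fs -put ~/books/*.txt /user/hadoop/books

# Run the bundled word-count example against them
hadoop jar $HADOOP_HOME/hadoop-*-examples.jar wordcount \
    /user/hadoop/books /user/hadoop/books-output
```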
Test Run
When the node running the reducer went to fetch the output of its own maps, it resolved its own hostname to reach the tasktracker it was running alongside. What we had not realized was that each node was resolving its own hostname through an entry the Ubuntu installer had placed in /etc/hosts, which pointed to 127.0.1.1 (nodeName-desktop). We changed that entry on each node to the node's static IP address, which resolved the fetch failures we were seeing on the map outputs.
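The fix, sketched for a single node; the hostname and static address below are placeholders for each node's actual values:

```sh
# Before: the installer-generated loopback alias each node used for itself
#   127.0.1.1    node1-desktop
# After: point the hostname at the node's static address instead
#   192.168.1.11 node1-desktop
sudo sed -i 's/^127\.0\.1\.1\s\+node1-desktop/192.168.1.11 node1-desktop/' /etc/hosts
```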
Test Run
Once the problem was resolved, our Hadoop cluster successfully counted the occurrences of each word in the input files.
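The per-word counts could then be read back out of HDFS, for example:

```sh
# List the reducer output and print the first few word counts
# (the part-file name varies by Hadoop version, e.g. part-00000)
hadoop fs -ls /user/hadoop/books-output
hadoop fs -cat /user/hadoop/books-output/part-r-00000 | head
```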