Hadoop
Joshua Nester, Garrison Vaughan, Calvin Sauerbier, Jonathan Pingilley, and Adam Albertson

Operating System and Network Configuration  The first thing we did was install Ubuntu 10.04 LTS on every machine.  After all of the nodes were up and running Ubuntu, we connected all of them to the switch.
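The static addresses referenced later in the /etc/hosts fix could be set up along these lines on Ubuntu 10.04; the interface name and addresses below are placeholders, not our actual values.

    # /etc/network/interfaces (interface name and addresses are illustrative)
    auto eth0
    iface eth0 inet static
        address 192.168.1.101
        netmask 255.255.255.0
        gateway 192.168.1.1

    # apply the change
    sudo /etc/init.d/networking restart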

Basic Configuration - Java  After we got all of the machines connected to the switch, we had to install some of the packages we needed for Hadoop.  The Ubuntu installation did not come with Java, so we installed Java on each machine and then configured the PATH variable on each machine so it could locate the Java binaries.
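The steps were roughly the following; the package name and install path shown are typical for Ubuntu 10.04 rather than necessarily our exact choices.

    # install a JDK (package name assumed; Sun Java 6 was another common choice at the time)
    sudo apt-get update
    sudo apt-get install openjdk-6-jdk

    # point the shell at the Java binaries (path assumed; append to ~/.bashrc)
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
    export PATH=$PATH:$JAVA_HOME/bin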

Basic Configuration - Hadoop  After getting the Java Development Kit installed, we installed the Hadoop files, set the HADOOP_HOME variable, and added it to the PATH.  We then had to create a Hadoop user account and group on each node and change the ownership of the Hadoop files over to that new user.
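Roughly, the setup looked like this on each node; the Hadoop version, install directory, and account name are placeholders rather than our exact choices.

    # unpack the Hadoop release (version and location assumed)
    sudo tar -xzf hadoop-0.20.2.tar.gz -C /usr/local
    sudo mv /usr/local/hadoop-0.20.2 /usr/local/hadoop

    # create a dedicated group and user, then hand the files over to them
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hadoop
    sudo chown -R hadoop:hadoop /usr/local/hadoop

    # environment variables (append to the hadoop user's ~/.bashrc)
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin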

Basic Configuration - SSH  After setting up the Hadoop accounts on each node, we had to add the master node's public key to the authorized_keys file of the Hadoop account on each node, so the master could shell into the Hadoop accounts on the other nodes without a password.
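In outline (the key type, empty passphrase, and slave hostname below are illustrative):

    # generate a passwordless key for the hadoop account on the master
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

    # authorize it locally and on every slave (repeat for each slave hostname)
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    ssh-copy-id hadoop@slave1

    # verify that a passwordless login works
    ssh hadoop@slave1 hostname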

File System Configuration  On each node, we had to edit the XML files that hold the distributed file system configuration.  After setting up the DFS configuration, we had to format the namenode (on the master node).  Once all configuration was done, we ran the startup scripts for the distributed file system and MapReduce, which brought up the datanodes and tasktrackers on all of the slaves.
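A minimal configuration along these lines would cover it on a 0.20-era release; the hostname master, the ports, and the replication factor are placeholders, not our exact settings.

    # conf/core-site.xml: where the namenode lives (hostname assumed)
    #   <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
    # conf/hdfs-site.xml: how many copies of each block to keep
    #   <property><name>dfs.replication</name><value>3</value></property>
    # conf/mapred-site.xml: where the jobtracker lives
    #   <property><name>mapred.job.tracker</name><value>master:9001</value></property>

    # format HDFS once on the master, then bring up the daemons
    hadoop namenode -format
    start-dfs.sh      # namenode on the master, datanodes on the slaves
    start-mapred.sh   # jobtracker on the master, tasktrackers on the slaves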

Test Run  For our test run, we gave Hadoop seven different books to run against the word counting program provided with the installation.  The first time we ran the test, the cluster successfully mapped all of the work, but failed to reduce.  The problem ended up being caused by an error in the /etc/hosts configuration.

Test Run  When the node running the reducer went to look for the output of its own maps, it would reference its own IP address to communicate with the task tracker it was running.  What we did not realize was that the nodes were referencing themselves using an entry in /etc/hosts that was setup by the Ubuntu installation which pointed to (nodeName-desktop)  We changed the IP of this entry, on each node, to that specific node’s static IP address. This resolved the fetch failure issue we were having with the maps.

Test Run  Once the problem was resolved, our Hadoop cluster successfully counted the occurrence of each word in the input files.