1 Apache Hadoop, by Daniel Lust and Anthony Taliercio

2 What is Apache Hadoop? It allows applications to use thousands of nodes and exchange thousands of terabytes of data to complete a task. It supports distributed applications under a free, open-source license (the Apache License 2.0). It is used by many popular companies, such as Facebook, Twitter, eBay, IBM, Apple, Microsoft, Hewlett-Packard, and many others.

3 Continued… Written in Java. Scales well: it can be used with thousands of nodes, or with just a few nodes and inexpensive hardware. A small Hadoop cluster consists of two major parts: a single master node and multiple worker nodes. The master node runs four daemons: the JobTracker, TaskTracker, NameNode, and DataNode. A worker node, also known as a slave node, can run both a DataNode and a TaskTracker, or just one of the two.
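
To make the master/worker split above concrete, here is a hypothetical Java sketch that models each machine as a set of daemons and checks the small-cluster layout the slide describes. The `Daemon`, `Node`, and `ClusterLayout` names are made up for this illustration and are not part of Hadoop. As a simplifying assumption, a node counts as the master if it runs the JobTracker and the NameNode; the sketch does not require the master to also run a TaskTracker and DataNode.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// The four daemon types named on the slide (Hadoop 1.x names).
enum Daemon { JOB_TRACKER, TASK_TRACKER, NAME_NODE, DATA_NODE }

// Hypothetical description of one machine and the daemons it runs.
record Node(String host, Set<Daemon> daemons) {}

final class ClusterLayout {
    // Daemons a worker (slave) node may run: both, or just one of the two.
    private static final Set<Daemon> WORKER_DAEMONS =
            EnumSet.of(Daemon.DATA_NODE, Daemon.TASK_TRACKER);

    // Assumption for this sketch: the master is the node running JobTracker and NameNode.
    static boolean isMaster(Node n) {
        return n.daemons().contains(Daemon.JOB_TRACKER)
                && n.daemons().contains(Daemon.NAME_NODE);
    }

    // Exactly one master, and every other node runs a non-empty
    // subset of {DataNode, TaskTracker}.
    static boolean isValid(List<Node> nodes) {
        long masters = nodes.stream().filter(ClusterLayout::isMaster).count();
        if (masters != 1) return false;
        for (Node n : nodes) {
            if (isMaster(n)) continue;
            if (n.daemons().isEmpty() || !WORKER_DAEMONS.containsAll(n.daemons())) {
                return false;
            }
        }
        return true;
    }
}
```

A master running all four daemons plus workers running `{DATA_NODE, TASK_TRACKER}` or only `{DATA_NODE}` passes the check; a worker running a JobTracker does not.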

4 Overview of Hadoop: Hadoop uses what is called HDFS, the Hadoop Distributed File System. HDFS splits files into blocks and stores redundant copies of each block on different nodes across the cluster. This redundancy protects against data loss when a node fails.
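
The splitting and redundancy described above can be sketched as follows. This is a simplified, hypothetical Java model, not HDFS code: it cuts a file into fixed-size blocks and places each block on `replication` distinct DataNodes using plain round-robin, whereas real HDFS uses a rack-aware placement policy. The 64 MB block size and replication factor of 3 used in the example are the Hadoop 1.x defaults; the class and node names are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class HdfsPlacement {
    // One block of a file and the DataNodes holding a replica of it.
    record Block(int index, long offset, long length, List<String> replicas) {}

    // Splits a file of fileSize bytes into blocks of at most blockSize bytes and
    // assigns each block to `replication` distinct DataNodes, round-robin.
    // Simplification: real HDFS placement is rack-aware.
    static List<Block> place(long fileSize, long blockSize, int replication,
                             List<String> dataNodes) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        if (replication < 1 || replication > dataNodes.size()) {
            throw new IllegalArgumentException("need 1 <= replication <= number of DataNodes");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize, index++) {
            long length = Math.min(blockSize, fileSize - offset);
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Consecutive nodes are distinct because replication <= dataNodes.size().
                replicas.add(dataNodes.get((index + r) % dataNodes.size()));
            }
            blocks.add(new Block(index, offset, length, replicas));
        }
        return blocks;
    }
}
```

For example, a 200 MB file with 64 MB blocks becomes four blocks (the last one 8 MB), each stored on three different DataNodes, so losing any single node still leaves at least two copies of every block.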


6 MapReduce: A framework developed by Google to process massive amounts of unstructured data in parallel across a distributed cluster of processors. Hadoop MapReduce is an open-source implementation of the same model.
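
The MapReduce model itself fits in a few lines. The following is a minimal, single-machine Java sketch (not Hadoop's API) of its three phases: a map function turns each input record into key/value pairs, a shuffle groups the pairs by key, and a reduce function combines each key's values. Parallel streams stand in for the cluster; `MiniMapReduce` and `Pair` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

final class MiniMapReduce {
    // A key/value pair emitted by the map phase.
    record Pair<K, V>(K key, V value) {}

    // Map each input in parallel, group intermediate pairs by key (the "shuffle"),
    // then reduce each key's list of values to a single result.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Pair<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        Map<K, List<V>> grouped = inputs.parallelStream()
                .flatMap(in -> mapper.apply(in).stream())
                .collect(Collectors.groupingBy(Pair::key, TreeMap::new,
                        Collectors.mapping(Pair::value, Collectors.toList())));
        Map<K, R> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, reducer.apply(k, vs)));
        return out;
    }
}
```

For instance, mapping the numbers 1 to 5 to `("even", n)` or `("odd", n)` and reducing with a sum yields `{even=6, odd=9}`. In Hadoop the same roles are played by user-written map and reduce functions, with the framework handling the shuffle across machines.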

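7 MapReduce: It offers a clean abstraction for data-analysis tasks. The JobTracker splits each job into tasks, preferably scheduling them on nodes that hold the relevant HDFS blocks, and tracks which tasks have completed so that no work is unnecessarily repeated. If a task fails on one node, it can be reassigned to a different node to complete.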
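
The "reassign on failure, never repeat finished work" idea can be illustrated with a toy scheduler. This hypothetical Java sketch is not the JobTracker's algorithm: it assumes failures are known up front as a set of failed nodes, picks a starting node round-robin, and moves a task to the next node whenever its node has failed. Real Hadoop additionally weighs data locality and limits the number of retry attempts.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TaskScheduler {
    // Returns task -> node that completed it. Each task completes exactly once;
    // a task whose node is in failedNodes is retried on the next node in the list.
    static Map<String, String> schedule(List<String> tasks, List<String> nodes,
                                        Set<String> failedNodes) {
        Map<String, String> completedBy = new LinkedHashMap<>();
        for (int t = 0; t < tasks.size(); t++) {
            String task = tasks.get(t);
            if (completedBy.containsKey(task)) continue; // already done: don't repeat it
            for (int attempt = 0; attempt < nodes.size(); attempt++) {
                String node = nodes.get((t + attempt) % nodes.size());
                if (!failedNodes.contains(node)) {
                    completedBy.put(task, node);
                    break;
                }
                // This node failed: fall through and reassign to the next node.
            }
            if (!completedBy.containsKey(task)) {
                throw new IllegalStateException("no healthy node for " + task);
            }
        }
        return completedBy;
    }
}
```

With nodes `n1`, `n2`, `n3` and `n2` failed, task `t1` (which would have gone to `n2`) ends up on `n3`, while the other tasks run where they were first assigned.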

8 Running Hadoop: Hadoop is first run on the master computer, which starts various processes, including the TaskTracker, JobTracker, DataNode, SecondaryNameNode, and NameNode. The master also connects through SSH to the other slave computers to start a DataNode and a TaskTracker on each.
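
The startup sequence above amounts to a plan mapping each host to the daemons it runs. The sketch below is a hypothetical Java illustration, not Hadoop's start scripts: it assumes the slave hostnames come from a text file with one hostname per line (similar in spirit to the `conf/slaves` file read by the Hadoop 1.x scripts), that `#` starts a comment, and that blank lines are ignored. `StartupPlan` and `build` are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StartupPlan {
    // Daemons the slide lists as starting on the master in this setup.
    static final List<String> MASTER_DAEMONS =
            List.of("NameNode", "SecondaryNameNode", "JobTracker", "DataNode", "TaskTracker");
    // Daemons started over SSH on each slave.
    static final List<String> SLAVE_DAEMONS = List.of("DataNode", "TaskTracker");

    // Builds host -> daemons from the master hostname and the text of a slaves
    // list (assumed: one hostname per line, '#' comments, blank lines ignored).
    static Map<String, List<String>> build(String master, String slavesFile) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put(master, MASTER_DAEMONS);
        for (String line : slavesFile.split("\\R")) {
            String host = line.replaceAll("#.*", "").trim();
            if (host.isEmpty() || host.equals(master)) continue; // master already covered
            plan.put(host, SLAVE_DAEMONS);
        }
        return plan;
    }
}
```

Keeping the plan separate from the actual SSH step makes it easy to check which daemons should be running on each machine before troubleshooting a node that did not come up.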

9 Running Hadoop: We used Hadoop to do a word count on six different books. HDFS copied the books across the nodes of the cluster, and a pre-written program ran a word count on them. Each node returned its results, using its DataNode process to save them to HDFS. When a node failed, the job was reissued to another node.
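
A word count like the one above can be sketched in plain Java to show the map and reduce steps. This is a hypothetical, single-machine illustration, not the pre-written program we ran and not Hadoop's Mapper/Reducer API: each book is treated as one map task that counts its own words (run on parallel threads instead of cluster nodes), and the reduce step sums the per-book counts. The tokenizing rule (letters, digits, and apostrophes; case-insensitive) is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordCount {
    // Assumed tokenizer: anything other than letters, digits, or apostrophes separates words.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}']+");

    // Map phase for one book: tokenize and count words locally.
    static Map<String, Long> mapBook(String text) {
        return NON_WORD.splitAsStream(text.toLowerCase(Locale.ROOT))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    // Reduce phase: sum the per-book counts for each word.
    static Map<String, Long> count(List<String> books) {
        Map<String, Long> total = new TreeMap<>();
        books.parallelStream()
                .map(WordCount::mapBook)
                // forEachOrdered runs the merges one at a time, so the TreeMap is safe.
                .forEachOrdered(partial -> partial.forEach((w, n) -> total.merge(w, n, Long::sum)));
        return total;
    }
}
```

Counting within each book before merging is similar in spirit to a Hadoop combiner: it shrinks the data sent to the reduce step.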

10 Example Output of Job Processes

11 Word Count Output

12 Tested on 1-3 Nodes: with 1 node, the job completed in 00:01:45; with 2 nodes, in 00:01:28; with 3 nodes, in 00:01:00.
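
One way to read these timings is as speedup relative to the single-node run, T1 / Tn, and parallel efficiency, speedup divided by the number of nodes. The small Java helper below is a sketch for that arithmetic; the class and method names are made up, and it assumes times are given as `HH:MM:SS` strings as on the slide.

```java
final class Speedup {
    // Parses "HH:MM:SS" into seconds.
    static int toSeconds(String hms) {
        String[] p = hms.split(":");
        return Integer.parseInt(p[0]) * 3600 + Integer.parseInt(p[1]) * 60 + Integer.parseInt(p[2]);
    }

    // Speedup relative to the single-node run: T1 / Tn.
    static double speedup(String oneNode, String nNodes) {
        return (double) toSeconds(oneNode) / toSeconds(nNodes);
    }

    // Parallel efficiency: speedup divided by the number of nodes.
    static double efficiency(String oneNode, String nNodes, int n) {
        return speedup(oneNode, nNodes) / n;
    }
}
```

With the measured times (105 s, 88 s, and 60 s), this gives a speedup of about 1.19 with two nodes and 1.75 with three, a parallel efficiency of roughly 0.6 in both cases.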

13 Conclusion: Our guide covered what you need to get started with Apache Hadoop. However, you may run into many problems along the way; troubleshooting was a large part of our project.
