Apache Hadoop Daniel Lust, Anthony Taliercio. What is Apache Hadoop? Allows applications to utilize thousands of nodes while exchanging thousands of terabytes.


What is Apache Hadoop?
- Lets applications use thousands of nodes and exchange thousands of terabytes of data to complete a task
- Supports distributed applications under a free license
- Used by many well-known companies, such as Facebook, Twitter, eBay, IBM, Apple, Microsoft, Hewlett-Packard, and many others

Continued…
- Written in Java
- Scales well: can be used with thousands of nodes, or with just a few nodes and inexpensive hardware
- A typical Hadoop cluster consists of two major parts: a single master node and multiple worker nodes
- The master node runs four processes: the JobTracker, TaskTracker, NameNode, and DataNode
- A worker node, also known as a slave node, can act as both a DataNode and a TaskTracker, or as just one of the two

Overview of Hadoop
- Hadoop uses the Hadoop Distributed File System (HDFS)
- HDFS splits files into blocks and stores them redundantly across the nodes in a cluster
- The redundancy protects against data loss if a node fails
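
The idea of splitting a file and storing each piece on more than one node can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names, the tiny block size, and the round-robin placement are assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule chosen for simplicity; it is not the
    policy HDFS actually uses.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have at least one copy after a node fails."""
    return {
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    }
```

With three nodes and two copies of each block, calling `surviving_blocks` for any single failed node returns every block, which is the point of the redundancy.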

MapReduce
- A programming model introduced by Google for processing massive amounts of unstructured data in parallel across a distributed cluster of processors
- Hadoop provides an open-source implementation of it

MapReduce
- Offers a clean abstraction between the data analysis tasks and the organization of the jobs that run them
- Jobs are scheduled by the JobTracker, so no jobs are unnecessarily repeated
- If a task fails on one node, it can be handed to a different node to complete
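
Handing a failed task to another node can be sketched as a simple retry loop. This is a simplified illustration, not how the JobTracker is implemented: `run_with_reassignment` and the `run_task` callback are hypothetical names, and a failure is modeled as a `RuntimeError`.

```python
def run_with_reassignment(tasks, nodes, run_task):
    """Run every task once, moving it to the next node if the current one fails.

    `run_task(node, task)` is a caller-supplied function (an assumption of
    this sketch) that returns a result or raises RuntimeError on failure.
    """
    results = {}
    for index, task in enumerate(tasks):
        last_error = None
        for attempt in range(len(nodes)):
            # Start at a different node per task, then walk around the ring.
            node = nodes[(index + attempt) % len(nodes)]
            try:
                results[task] = (node, run_task(node, task))
                break  # success: this task is not repeated anywhere else
            except RuntimeError as err:
                last_error = err
        else:
            # Every node was tried and none succeeded.
            raise RuntimeError(f"task {task!r} failed on every node") from last_error
    return results
```

Each task is recorded exactly once, on the first node that completes it, so a successful task is never run again and a failed one is retried elsewhere.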

Running Hadoop
- Hadoop is first started on the master computer
- Various processes are started, including: TaskTracker, JobTracker, DataNode, SecondaryNameNode, NameNode
- The master also connects through SSH to the slave computers to start a DataNode and a TaskTracker on each

Running Hadoop
- We used Hadoop to run a word count on six different books
- The books were copied into HDFS, which distributed them across the nodes of the cluster, and a pre-written program counted the words
- Each node returned its data, using the DataNode process to save its results
- When a node failed, its job was issued to another node
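
A word count is the usual first MapReduce example, and its three phases (map, shuffle, reduce) can be shown in plain Python. This sketch runs in a single process and is not the pre-written program used in the project; the function names and the punctuation-stripping rule are assumptions made for illustration.

```python
from collections import defaultdict


def map_words(text):
    """Map phase: emit a (word, 1) pair for every word in the text."""
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of document strings."""
    pairs = (pair for doc in documents for pair in map_words(doc))
    return reduce_counts(shuffle(pairs))
```

On a cluster, the map calls would run on the nodes that hold each piece of the books and the reduce calls would combine their output; here everything happens in one loop so the data flow is easy to follow.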

Example Output of Job Processes

Word count Output

Tested on 1-3 Nodes
- 1 node: job completion 00:01:45
- 2 nodes: job completion 00:01:28
- 3 nodes: job completion 00:01:00
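
The timings above can be turned into speedup figures relative to the single-node run. The short script below does that arithmetic; the helper names and the "efficiency" column (speedup divided by node count) are additions for illustration, not measurements from the project.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup_table(timings):
    """Return (nodes, seconds, speedup, efficiency) rows.

    Speedup is measured against the 1-node run; efficiency is
    speedup divided by the number of nodes.
    """
    base = to_seconds(timings[1])
    rows = []
    for nodes in sorted(timings):
        elapsed = to_seconds(timings[nodes])
        speedup = base / elapsed
        rows.append((nodes, elapsed, speedup, speedup / nodes))
    return rows


# Job completion times reported on the slide.
TIMINGS = {1: "00:01:45", 2: "00:01:28", 3: "00:01:00"}

for nodes, elapsed, speedup, efficiency in speedup_table(TIMINGS):
    print(f"{nodes} node(s): {elapsed:3d} s  speedup {speedup:.2f}x  efficiency {efficiency:.2f}")
```

The runs took 105, 88, and 60 seconds, so the three-node run was 1.75 times faster than the single-node run. The gain is less than proportional to the node count, which is expected for such a short job where startup and coordination take up a noticeable share of the time.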

Conclusion
- Our guide covered everything you need to get started with Apache Hadoop
- However, you may run into many problems along the way
- Troubleshooting was a large part of our project
