Published by Sabastian Rickman. Modified over 10 years ago.

Dan Bassett, Jonathan Canfield
December 13, 2011

What is Hadoop?
- Allows for the distributed processing of large data sets across clusters of computers
- Open-source project written in Java
- Actively supported
- Inspired by Google's published MapReduce and Google File System (GFS) papers

What’s the big deal?
Changes the economics and dynamics of large-scale computing:
- Scalable
- Cost effective
- Flexible
- Fault tolerant

Commercially supported
- InfoSphere BigInsights
- Silicon Graphics CloudRack
- EMC Greenplum
- Google App Engine
- Oracle Big Data Appliance
- Cloudera CDH, Professional Services
- Microsoft Windows Server, SQL Server

Who Uses Hadoop?

Prominent Users
- Facebook: claims to have the largest Hadoop cluster in the world, at 30 PB
- Yahoo!: claims to have the world’s largest Hadoop production application
- eBay: 5.3 PB on a 532-node cluster
- New York Times: processed 4 TB of image data into 11 million PDFs at a cost of about $240
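
As a rough back-of-the-envelope reading of the eBay and New York Times figures, the sketch below derives average storage per node and cost per PDF. It assumes decimal units (1 PB = 1000 TB) and an even spread of data across nodes; the slide states neither, and the function names are illustrative only.

```python
# Figures quoted on the slide. Decimal units and an even
# distribution across nodes are assumptions, not stated facts.
EBAY_STORAGE_PB = 5.3
EBAY_NODES = 532
NYT_COST_USD = 240
NYT_PDFS = 11_000_000
NYT_INPUT_TB = 4


def per_node_tb(total_pb, nodes):
    """Average storage per node in TB, assuming 1 PB = 1000 TB."""
    return total_pb * 1000 / nodes


def cost_per_unit(total_cost, units):
    """Total cost divided evenly over the given number of units."""
    return total_cost / units


if __name__ == "__main__":
    print(f"eBay: about {per_node_tb(EBAY_STORAGE_PB, EBAY_NODES):.1f} TB per node")
    print(f"NYT: about ${cost_per_unit(NYT_COST_USD, NYT_INPUT_TB):.0f} per TB of input")
    print(f"NYT: about ${cost_per_unit(NYT_COST_USD, NYT_PDFS):.6f} per PDF")
```

Under these assumptions the eBay cluster averages roughly 10 TB per node, and the New York Times job works out to a small fraction of a cent per PDF.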

How Does It Work?

Architecture
- Hadoop Common
- Hadoop Distributed File System (HDFS)
- MapReduce Engine

File System (HDFS)
- One big file system from many nodes
- Fault-tolerant
- Runs on low-cost commodity hardware
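
To make the fault-tolerance idea concrete, here is a minimal Python sketch of a file being cut into blocks, with each block copied to several nodes. It is a toy model, not HDFS code: the round-robin placement is a simplifying assumption (real HDFS placement is rack-aware), and the block size, node names, and function names are all hypothetical.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an assumption made for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 10, block_size=4)
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    print(placement)
    print(readable_after_failure(placement, {"node1", "node2"}))
```

With three replicas per block, the file in this toy model stays readable after any two nodes fail, which is the property that lets a cluster be built from inexpensive machines.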

MapReduce Engine
- Splits input data
- Assigns work to nodes
- Processes the work in parallel
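
The split, assign, and process-in-parallel flow can be sketched in plain Python. This is a single-machine illustration under stated assumptions, not Hadoop's API: a thread pool stands in for cluster nodes, word counting is chosen as the example task (it is not necessarily the one shown in the slides), and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_splits):
    """Divide the input lines into one chunk per worker."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key across every mapper's output."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines, num_workers=3):
    splits = split_input(lines, num_workers)
    # The thread pool is a stand-in for worker nodes in a cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

Because each chunk is mapped independently, the map calls need no coordination; only the shuffle step has to see the output of every mapper before reduction begins.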

MapReduce Illustration
[Figure slides: MapReduce Step 1 through Step 6, then the complete illustration. Images not available.]

Resources
- Project Home: http://hadoop.apache.org/
- Wikipedia: http://en.wikipedia.org/wiki/Apache_Hadoop
- IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/