Introduction to Apache Hadoop
Zibo Wang

Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries for data-intensive computing using simple single map-reduce interface and its own distributed file system called HDFS.  Started by Doug Cutting and Mike Cazfarella.  Written in JAVA

Introduction  The use of Hadoop  Compute  Storage  Database  The advantages of Hadoop  Scalable Algorithms  Log Management  Extract-Transform-Load (ETL) Platform

Map-Reduce  Introduced by Google  A simple and powerful interface that enables automatic parallelization and distribution of large-scale computation.  Two major functions  Map  Reduce  Nodes and trackers

MapReduce

Hadoop Distributed File System (HDFS)  It has large block size (default 64mb) for storage to compensate for seek time to network bandwidth. So very large files for storage are ideal.  Streaming data access. Write once and read many times architecture. Since files are large time to read is significant parameter than seek to first record.  Commodity hardware. It is designed to run on commodity hardware which may fail. HDFS is capable of handling it.

HDFS Architecture  Filesystem Metadata  Framework of write  Framework of read

Prominent Users of Hadoop  Yahoo!  More than 10,000 core Linux cluster  Open scource  Facebook  30 PB data  Amazon  Amazon Elastic Compute Cloud  Amazon Simple Storage Service

Thank you!
