Download presentation
Presentation is loading. Please wait.
Published byAlicia Cook Modified over 9 years ago
1
Introduction to Apache Hadoop Zibo Wang
2
Introduction What is Apache Hadoop? Apache Hadoop is a software framework which provides open source libraries for data-intensive computing using simple single map-reduce interface and its own distributed file system called HDFS. Started by Doug Cutting and Mike Cazfarella. Written in JAVA
3
Introduction The use of Hadoop Compute Storage Database The advantages of Hadoop Scalable Algorithms Log Management Extract-Transform-Load (ETL) Platform
4
Map-Reduce Introduced by Google A simple and powerful interface that enables automatic parallelization and distribution of large-scale computation. Two major functions Map Reduce Nodes and trackers
5
Map-Reduce
6
Hadoop Distributed File System (HDFS) It has large block size (default 64mb) for storage to compensate for seek time to network bandwidth. So very large files for storage are ideal. Streaming data access. Write once and read many times architecture. Since files are large time to read is significant parameter than seek to first record. Commodity hardware. It is designed to run on commodity hardware which may fail. HDFS is capable of handling it.
7
HDFS Architecture Filesystem Metadata Framework of write Framework of read
8
Prominent Users of Hadoop Yahoo! More than 10,000 core Linux cluster Open scource Facebook 30 PB data Amazon Amazon Elastic Compute Cloud Amazon Simple Storage Service
9
Thank you!
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.