HDFS Hadoop Distributed File System

Name: HDFS Hadoop Distributed File System
Uploaded: 2017-08-21T10:43:36+00:00
Duration: PT5M49S
Channel: Leonard Owens
Description: HDFS Hadoop Distributed File System

HDFS Hadoop Distributed File System
柯懷貿, 王建鑫, 彭偉慶

Outline
Introduction
HDFS: How it works
Pros and Cons
Conclusion
柯懷貿

Introduction to HDFS
Hadoop Distributed File System
Cloud computing: Java; processing PB-level data; distributed computing environment
Hadoop: MapReduce, HDFS, HBase
Distributed file system: allows files to be shared via the internet; write-once-read-many; restricting access; replication and fault tolerance; mapping between logical objects and physical objects
Doug Cutting established the Nutch project
File system for the Hadoop framework
Remote Procedure Call
Master/Slave
Yahoo! ran a 10,000-core Hadoop cluster in 2008
柯懷貿

MapReduce
柯懷貿

HBase
NoSQL
Uses several servers to store PB-level data
柯懷貿

HDFS
Distributed, scalable, and portable
File replication (default: 3)
Read efficiency
柯懷貿

王建鑫

HDFS major roles
Client (user): reads data from and writes data to the file system
Name node (master): oversees and coordinates the data storage function; receives instructions from the client
Data node (slave): stores data and runs computations; receives instructions from the name node
王建鑫

王建鑫

Rack Awareness 王建鑫

王建鑫

HDFS fault tolerance
Node failure: a data node or the name node is dead
Communication failure: data cannot be sent or retrieved
Data corruption: data is corrupted while being sent over the network, or corrupted on the hard disks
Write failure: the data node about to be written to is dead
Read failure: the data node about to be read from is dead
王建鑫

王建鑫

Detecting network failure
Whenever data is sent, the receiver replies with an ACK
If the ACK is not received (after several retries), the sender assumes that the host is dead or the network has failed
A checksum is also sent along with the transmitted data, so corruption during transfer can be detected
王建鑫
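
The ACK-and-checksum idea on this slide can be sketched in a few lines of Python. This is only an illustration, not HDFS code: `make_packet`, `is_intact`, `send_with_ack`, and the receiver functions are hypothetical names, the retry limit of 3 is an assumption (the slide only says "several retries"), and `zlib.crc32` stands in for whatever checksum the real system uses.

```python
import zlib

MAX_RETRIES = 3  # assumed value; the slide only says "several retries"


def make_packet(payload):
    """Pair the payload with a CRC32 checksum before sending it."""
    return payload, zlib.crc32(payload)


def is_intact(payload, checksum):
    """Recompute the checksum on the receiving side and compare."""
    return zlib.crc32(payload) == checksum


def send_with_ack(packet, deliver, max_retries=MAX_RETRIES):
    """Try to deliver a packet; `deliver` returns True when an ACK comes back.

    Returns False if no ACK arrives after `max_retries` attempts, in which
    case the sender treats the host as dead or the network as failed.
    """
    for _ in range(max_retries):
        if deliver(packet):
            return True
    return False


def healthy_receiver(packet):
    """Hypothetical receiver that always answers with an ACK."""
    return True


def dead_receiver(packet):
    """Hypothetical receiver that never answers."""
    return False
```

A sender would call `send_with_ack(make_packet(data), receiver)` and mark the receiver as failed when it returns `False`; the receiver would call `is_intact` on what it got and ask for the data again when the check fails.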

Handling write/read failure
The client writes the block in smaller data units (usually 64 KB) called packets
Each data node replies with an ACK for each packet to confirm that it received the packet
If the client does not get ACKs from some nodes, those nodes are detected as dead
The client then adjusts the pipeline to skip the dead node (then?)
Handling read failure: just read from another node
王建鑫
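
As a rough sketch of the write path described above, the following Python simulation splits a block into 64 KB packets and drops any data node that fails to ACK. It is a simplification under stated assumptions: the `DataNode` class and `write_block` function are hypothetical, and the client here talks to every node directly, whereas a real pipeline forwards packets from node to node.

```python
PACKET_SIZE = 64 * 1024  # 64 KB, the packet size given on the slide


def split_into_packets(block, size=PACKET_SIZE):
    """Cut a block (bytes) into packets of at most `size` bytes."""
    return [block[i:i + size] for i in range(0, len(block), size)]


class DataNode:
    """Hypothetical stand-in for a data node that stores packets in memory."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.packets = []

    def receive(self, packet):
        """Store the packet and return True (the ACK); a dead node returns False."""
        if not self.alive:
            return False
        self.packets.append(packet)
        return True


def write_block(block, pipeline):
    """Send every packet to the nodes still in the pipeline.

    A node that does not ACK a packet is treated as dead and skipped for
    the rest of the write. Returns the nodes that received the whole block.
    """
    live = list(pipeline)
    for packet in split_into_packets(block):
        live = [node for node in live if node.receive(packet)]
    return live
```

With a pipeline of `dn1`, `dn2`, `dn3` where `dn2` is dead, `write_block` returns only `dn1` and `dn3`, so the block ends up with two replicas instead of three. Restoring the missing replica is the name node's job, covered on the next slide.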

Handling write failure (cont'd)
The name node maintains two tables:
List of blocks: blockA in dn1, dn2, dn8; blockB in dn3, dn7, dn9; …
List of data nodes: dn1 has blockA, blockD; dn2 has blockE, blockG; …
The name node checks the list of blocks to see whether any block is not properly replicated
If so, it asks other data nodes to copy the block from data nodes that hold a replica
王建鑫
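
The two tables and the re-replication check can be sketched with plain dictionaries. This is a minimal illustration, not the name node's actual data structures: `build_node_table` and `plan_re_replication` are hypothetical names, the target replication of 3 comes from the default mentioned earlier in the deck, and choosing the first available node as the copy target is an assumption (real placement also considers racks and load).

```python
REPLICATION = 3  # default replication factor mentioned earlier in the deck


def build_node_table(block_table):
    """Derive the 'list of data nodes' table from the 'list of blocks' table."""
    node_table = {}
    for block, nodes in block_table.items():
        for node in nodes:
            node_table.setdefault(node, set()).add(block)
    return node_table


def plan_re_replication(block_table, live_nodes, replication=REPLICATION):
    """Return copy orders as (block, source_node, target_node) tuples.

    A block is under-replicated when fewer than `replication` live nodes
    hold it. Targets are picked in name order, which is an assumption
    made only to keep the sketch deterministic.
    """
    orders = []
    for block, nodes in sorted(block_table.items()):
        holders = sorted(n for n in nodes if n in live_nodes)
        missing = replication - len(holders)
        if missing <= 0 or not holders:
            continue  # fully replicated, or no live replica left to copy from
        candidates = sorted(n for n in live_nodes if n not in holders)
        for target in candidates[:missing]:
            orders.append((block, holders[0], target))
    return orders
```

Using the slide's example with `dn8` dead, `plan_re_replication` produces one order for blockA (copy from `dn1` to `dn3`) and nothing for blockB, which still has three live replicas.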

Pros
Very large files: a single file can be hundreds of MB, GB, TB, or PB in size
Streaming data access: write-once, read-many; efficient at reading a whole dataset
Commodity hardware: high reliability and availability without requiring expensive, highly reliable hardware
彭偉慶

Cons 彭偉慶

Conclusion
HDFS is an Apache Hadoop subproject
It is highly fault-tolerant and designed to be deployed on low-cost hardware
It offers high throughput, but not low latency
彭偉慶
