1
HDFS Deep Dive
Berlin Buzzwords 2010
Jay Booth
2
HDFS in a slide
- One NameNode, N DataNodes
- Files are split into blocks
- Client talks to the NameNode to allocate blocks (writing) or to find the blocks for a given file (reading)
- Client then streams to/from DataNodes directly to place blocks (writing) or fetch blocks (reading)
- Replication is built in: the client write pipeline places each block on multiple DataNodes simultaneously; reads come from the closest replica
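To make that flow concrete, here is a minimal client-side sketch using the standard Hadoop FileSystem API from the 0.20 era; the NameNode URI and path are hypothetical.

```java
// Minimal sketch of the client-side view: the NameNode brokers metadata,
// the DataNodes carry the bytes. URI and path are made up for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name points at the single NameNode (0.20-era config key)
        conf.set("fs.default.name", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write: behind the scenes the client asks the NameNode for block
        // allocations, then streams bytes to a pipeline of DataNodes.
        Path p = new Path("/tmp/hello.txt");
        FSDataOutputStream out = fs.create(p);
        out.write("hello, hdfs".getBytes("UTF-8"));
        out.close();

        // Read: the client asks the NameNode for block locations, then
        // streams from the closest replica.
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[32];
        int n = in.read(buf);
        in.close();
        System.out.println(new String(buf, 0, n, "UTF-8"));
    }
}
```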
3
HDFS in another slide
[Architecture diagram] Credit: wiki.apache.org; original author Dhruba Borthakur, I think
4
The NameNode
- Maintains the fsimage on disk and the FSNameSystem in memory
- Exposes RPC methods for creating files, allocating additional blocks to a file, deleting files, listing directories, and locating the blocks of an existing file
- The FSNameSystem maintains state in memory while flushing edits to the edit log; the SecondaryNameNode periodically checkpoints the edit log into a new fsimage
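A simplified sketch of what that RPC surface looks like; the names loosely follow Hadoop's ClientProtocol, but the signatures and the two helper types are abbreviated stand-ins, not the real interface.

```java
// Simplified sketch of the NameNode RPC methods listed above.
// Method names loosely mirror Hadoop's ClientProtocol; signatures
// and helper types are deliberately abbreviated.
public interface NameNodeProtocolSketch {
    // Create a new file entry (inode) in the namespace.
    void create(String src, boolean overwrite, short replication, long blockSize);

    // Allocate an additional block for an open file; returns the chosen
    // block plus the DataNodes that should hold its replicas.
    LocatedBlockSketch addBlock(String src, String clientName);

    // Mark the file complete; the NameNode commits final block sizes.
    boolean complete(String src, String clientName);

    // Namespace operations.
    boolean delete(String src);
    FileStatusSketch[] getListing(String src);

    // Locate the blocks (and replica locations) backing an existing file.
    LocatedBlockSketch[] getBlockLocations(String src, long offset, long length);
}

// Hypothetical stand-ins for the real LocatedBlock / FileStatus types.
class LocatedBlockSketch { long blockId; String[] dataNodeHosts; }
class FileStatusSketch { String path; long length; }
```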
5
NameNode Architecture
6
Write Workflow
- Client registers the new file with the NameNode
- NameNode allocates a new inode for the file
- Client registers new blocks with the NameNode
  – Receives block locations for each block
  – Streams each block through the DataNode chain
- Client closes the file
- NameNode commits all blocks' final sizes to the edit log
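A schematic of those steps in terms of the simplified interface sketched earlier; streamToPipeline is a hypothetical stand-in for the real packet-level pipeline code, not the actual DFSClient.

```java
// Schematic write workflow: create -> addBlock loop -> complete.
class WriteWorkflowSketch {
    void writeFile(NameNodeProtocolSketch nn, String src,
                   byte[][] blocks, String clientName) {
        // 1. Register the new file; the NameNode allocates an inode.
        nn.create(src, true, (short) 3, 64L * 1024 * 1024);

        for (byte[] blockData : blocks) {
            // 2. Ask the NameNode for the next block and its replica locations.
            LocatedBlockSketch blk = nn.addBlock(src, clientName);

            // 3. Stream the block through the DataNode chain: the client
            //    sends to the first DataNode, which mirrors to the second,
            //    and so on down the pipeline.
            streamToPipeline(blk, blockData);
        }

        // 4. Close the file; the NameNode commits final block sizes
        //    to the edit log.
        nn.complete(src, clientName);
    }

    void streamToPipeline(LocatedBlockSketch blk, byte[] data) {
        // Elided: open a socket to blk.dataNodeHosts[0] and write packets;
        // each DataNode forwards to the next in the chain and acks back.
    }
}
```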
7
Write workflow
8
State after writing
[Diagram] Credit: wiki.apache.org; original author Dhruba Borthakur, I think
9
0.20.2-append
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append
- HDFS-200
  – Makes sync() work properly
  – Better behavior if the client closes a file before all replicas check in with the NN – no generation-stamp musical chairs
  – Correct client behavior when reading from a file that may be being appended to
- HDFS-101, HDFS-826
  – Better behavior when a DN dies in the write pipeline
  – 101 was released with 0.20.2; 826 is in 0.20.2-append
- HDFS-278
  – Explodes appropriately when the NN hangs during a write
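A minimal sketch of what HDFS-200's working sync() buys you on branch-0.20-append; the method and path names here are illustrative, but sync() on FSDataOutputStream is the real 0.20-era flush call.

```java
// Sketch: flush written bytes to the DataNode pipeline so concurrent
// readers can see them without waiting for close().
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class SyncSketch {
    void writeRecord(FileSystem fs, Path log, byte[] record)
            throws java.io.IOException {
        FSDataOutputStream out = fs.create(log);
        out.write(record);
        // On branch-0.20-append, sync() pushes the data through the
        // DataNode pipeline; a concurrent reader can now see it even
        // though the file is still open.
        out.sync();
        // ... keep writing; close() later commits the final block length.
        out.close();
    }
}
```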
10
Trouble!
- NameNode SPOF – see HADOOP-4539
- NameNode OOM – roughly 200 bytes per block adds up (100 million blocks is on the order of 20 GB of heap for block metadata alone)
- Replication storm scenario (can happen from a switch failure or loss of a node in an almost-full cluster)
- Unbalanced disks on DataNodes can be an issue
- Most DataNode issues during MapReduce will fix themselves through task failure/relaunch
11
Reads in a slide
- Client locates a DataNode, opens a socket, says hi
- DataNode allocates a thread to stream block contents from the opened position
- Client reads the stream in as it processes
- Great for streaming entire blocks
- Positioned (random) reads cut across the grain
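A quick sketch contrasting the two read styles using the standard FSDataInputStream API; the path and offset are arbitrary.

```java
// Streaming read vs. positioned (random) read against the same file.
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReadStylesSketch {
    void readStyles(FileSystem fs, Path p) throws java.io.IOException {
        byte[] buf = new byte[4096];

        // Streaming read: one socket, one DataNode thread streaming the
        // block from the opened position; great for scanning whole blocks.
        FSDataInputStream in = fs.open(p);
        while (in.read(buf) > 0) { /* process sequentially */ }

        // Positioned read: random offsets cut across the grain; each
        // pread may need a fresh DataNode connection under the hood.
        in.read(123456L, buf, 0, buf.length);
        in.close();
    }
}
```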
12
DFS Client : seek + read
13
DFS DataNode : read
14
DFS read inefficiencies
- Client has to open a new socket connection for each read – bad for random reads
- Server opens/closes 2 files (block, meta) and a Selector per read, as well as allocating a new buffer
- More IO ops than necessary – a Selector per thread, 64kb max packet size
- No optimizations for repeated reads of the same file
- Lots of files open – 4 per concurrent read (client socket, 2 file handles, selector)
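Schematically, the per-request cost looks something like the following; this is a rough reconstruction of the resource pattern being criticized, not the actual DataNode code.

```java
// Rough reconstruction of the 0.20 read path's per-request resources:
// one thread, two file handles, one Selector, one fresh buffer per read.
import java.io.RandomAccessFile;
import java.net.Socket;
import java.nio.channels.Selector;

class ThreadPerReadSketch {
    void serveRead(Socket client, String blockFile, String metaFile) {
        new Thread(() -> {
            try (RandomAccessFile block = new RandomAccessFile(blockFile, "r");
                 RandomAccessFile meta = new RandomAccessFile(metaFile, "r");
                 Selector sel = Selector.open()) {
                byte[] packet = new byte[64 * 1024]; // 64kb max packet size
                // Elided: register the client channel with sel and send
                // packets until the requested range is exhausted.
                // Four descriptors stay open for the whole read: the
                // client socket, two file handles, and the selector.
            } catch (Exception e) { /* elided */ }
        }).start();
    }
}
```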
15
HDFS-918
- Attempts to address some of these inefficiencies
- Instead of thread-per-read, all reads are multiplexed and dispatched to a threadpool when the client socket is writable
- File handles are pooled – since we're read-only, they can be shared across threads
- One Selector for the whole VM
- Buffers are thread-local and re-used
- Attempts to transfer 512kb per select()
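A rough sketch of the shape of that change (not the actual patch): one Selector for the whole VM, writability-driven dispatch to a pool, thread-local reusable buffers, and pooled read-only file channels.

```java
// Sketch of a multiplexed read path: one Selector, dispatch to a pool
// only when the client socket is writable, thread-local buffers.
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MultiplexedReaderSketch implements Runnable {
    private final Selector selector = Selector.open();   // one per VM
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private static final ThreadLocal<ByteBuffer> BUF =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(512 * 1024));

    MultiplexedReaderSketch() throws java.io.IOException {}

    void register(SocketChannel client) throws java.io.IOException {
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_WRITE);
    }

    public void run() {
        while (true) {
            try {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isWritable()) {
                        // Client can accept data: pause interest and hand
                        // the connection to the pool for one transfer pass.
                        key.interestOps(0);
                        pool.submit(() -> sendChunk(key));
                    }
                }
            } catch (java.io.IOException e) { /* elided */ }
        }
    }

    private void sendChunk(SelectionKey key) {
        ByteBuffer buf = BUF.get();   // thread-local, re-used across reads
        buf.clear();
        // Elided: fill buf (up to 512kb) from a pooled, shared read-only
        // FileChannel and write to (SocketChannel) key.channel(); then
        // re-register OP_WRITE and wake the selector if more remains.
    }
}
```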
16
HDFS-918
17
HDFS-918 status
- Showed approximately a 1-1.5ms gain for random reads and a 5% gain for streaming
  – Not particularly robust benchmarks; needs confirmation from elsewhere
- HBase exposed a resource leak; I think I fixed it, but haven't tested extensively
- BlockChannelPool was a point of contention until re-implemented using a read/write lock
- Larger transferTos are effective
- Patches available for 0.20.2 and trunk as of a month ago
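A hypothetical sketch of that read/write-lock pattern: lookups take the shared read lock (cheap, fully concurrent), and only a cache miss takes the exclusive write lock, which removes the single-lock contention point.

```java
// Sketch of a channel pool guarded by a read/write lock. Channels are
// opened read-only, so they can be safely shared across reader threads.
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BlockChannelPoolSketch {
    private final Map<Long, FileChannel> channels = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    FileChannel get(long blockId, String path) throws java.io.IOException {
        lock.readLock().lock();                // common case: shared access
        try {
            FileChannel ch = channels.get(blockId);
            if (ch != null) return ch;
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();               // miss: open and cache
        try {
            FileChannel ch = channels.get(blockId);  // re-check under write lock
            if (ch == null) {
                ch = new FileInputStream(path).getChannel();  // read-only
                channels.put(blockId, ch);
            }
            return ch;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```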
18
Thanks!
Plug – www.proclivitysystems.com – we analyze consumer behavior and optimize your marketing
Recently ported core engines from PL/SQL to Hadoop, reducing runtime by a factor of 20.