1 HDFS Deep Dive Berlin Buzzwords 2010 Jay Booth

2 HDFS in a slide
– One NameNode, N DataNodes
– Files are split into blocks
– Client talks to the NameNode to allocate blocks or to find the blocks of an existing file
– Client then streams to/from DataNodes to place blocks (writing) or fetch blocks (reading)
– Replication is inherent: the client writes through multiple replicas simultaneously and reads from the closest replica
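The block-splitting point above can be sketched in a few lines. This is an illustrative calculation only (names like `BlockSplit` are made up), assuming the classic 64 MB default block size (`dfs.block.size`); it is not the actual HDFS code.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplit {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // classic 64 MB default

    /** Returns the length of each block for a file of the given size. */
    static List<Long> blockLengths(long fileSize) {
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            long len = Math.min(BLOCK_SIZE, remaining); // last block may be short
            blocks.add(len);
            remaining -= len;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 150 MB file becomes two full 64 MB blocks plus a 22 MB tail.
        List<Long> blocks = blockLengths(150L * 1024 * 1024);
        System.out.println(blocks.size() + " blocks, last = " + blocks.get(blocks.size() - 1));
    }
}
```

The client asks the NameNode for one block at a time while writing, so only the final block can be shorter than the configured block size.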

3 HDFS in another slide
Credit: wiki.apache.org; original author Dhruba, I think

4 The NameNode
– Maintains fsimage on disk and FSNameSystem in memory
– Exposes RPC methods for creating files, allocating additional blocks to a file, deleting files, listing directories, and locating the blocks of an existing file
– FSNameSystem maintains state in memory while flushing edits to the edit log and to the secondary NameNode
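The memory/disk split above can be sketched as follows. This is a deliberately tiny model, not the real FSNameSystem: the namespace is just a map from path to block IDs, and every mutation is appended to an edit log before in-memory state changes (the op names are illustrative).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniNameNode {
    private final Map<String, List<Long>> namespace = new HashMap<>(); // in-memory "FSNameSystem"
    private final List<String> editLog = new ArrayList<>();            // flushed edits
    private long nextBlockId = 0;

    /** Create an empty file entry, logging the edit first. */
    public void createFile(String path) {
        editLog.add("OP_ADD " + path);           // durable edit first...
        namespace.put(path, new ArrayList<>());  // ...then mutate memory
    }

    /** Allocate an additional block to an existing file. */
    public long addBlock(String path) {
        long id = nextBlockId++;
        editLog.add("OP_ADD_BLOCK " + path + " blk_" + id);
        namespace.get(path).add(id);
        return id;
    }

    public List<Long> getBlocks(String path) { return namespace.get(path); }

    public List<String> edits() { return editLog; }
}
```

Log-before-mutate is the key invariant: after a crash, replaying the edit log over the last fsimage checkpoint reconstructs the in-memory namespace.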

5 NameNode Architecture

6 Write Workflow
– Client registers a new file with the NameNode
– NameNode allocates a new inode for the file
– Client registers new blocks with the NameNode
  – Receives block locations for each block
  – Streams each block through the DataNodes in a chain
– Client closes the file
– NameNode commits all blocks' final sizes to the edit log
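"Streams each block through the DataNodes in a chain" can be sketched like this: the client sends each packet to the first DataNode only, and each node stores the packet and forwards it downstream. This is a toy simulation (the class names are made up), not the actual pipeline protocol, which also involves acks flowing back upstream.

```java
import java.util.ArrayList;
import java.util.List;

public class Pipeline {
    static class DataNode {
        final String name;
        final List<byte[]> stored = new ArrayList<>(); // stands in for the local block file
        DataNode next;                                 // downstream node, or null at chain end
        DataNode(String name) { this.name = name; }

        void receive(byte[] packet) {
            stored.add(packet);                        // "write to disk"
            if (next != null) next.receive(packet);    // forward down the chain
        }
    }

    /** Link the given nodes into a chain and return the pipeline head. */
    static DataNode chain(DataNode... nodes) {
        for (int i = 0; i + 1 < nodes.length; i++) nodes[i].next = nodes[i + 1];
        return nodes[0];
    }
}
```

The point of the chain shape is that the client only pays the upload bandwidth for one copy; the DataNodes fan the data out among themselves.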

7 Write workflow

8 State after writing
Credit: wiki.apache.org; original author Dhruba, I think

9 0.20.2-append
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append
HDFS-200
– Makes sync() work properly
– Better behavior if the client closes the file before all replicas check in with the NN – no generation-stamp musical chairs
– Correct client behavior when reading from a file that may be being appended to
HDFS-101, HDFS-826
– Better behavior when a DN dies in the write pipeline
– 101 was released with 0.20.2; 826 is in 0.20.2-append
HDFS-278
– Explodes appropriately when the NN hangs during a write

10 Trouble!
– NameNode SPOF – see HADOOP-4539
– NameNode OOM – 200 bytes per block adds up
– Replication storm scenario (can happen from a switch failure or loss of a node in an almost-full cluster)
– Unbalanced disks on DataNodes can be an issue
– Most DataNode issues during MapReduce will fix themselves through task failure/relaunch
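The "200 bytes per block adds up" point is worth a back-of-envelope check. The 200-byte figure is the slide's own estimate (real per-block overhead varies by version and configuration); the arithmetic below just shows how block count translates into NameNode heap.

```java
public class NameNodeHeap {
    static final long BYTES_PER_BLOCK = 200; // the slide's rough per-block estimate

    /** Approximate NameNode heap, in bytes, needed to track n blocks. */
    static long heapForBlocks(long blocks) {
        return blocks * BYTES_PER_BLOCK;
    }

    public static void main(String[] args) {
        // 100 million blocks -> roughly 20 GB of heap for block metadata alone,
        // which is why lots of small files (many blocks) hurt the NameNode.
        System.out.println(heapForBlocks(100_000_000L) + " bytes");
    }
}
```

This is also why the small-files problem is really a NameNode problem: a million 1 KB files cost the NameNode as much metadata as a million full-size blocks.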

11 Reads in a slide
– Locates DataNode, opens socket, says hi
– DataNode allocates a thread to stream block contents from the opened position
– Client reads in as it processes
– Great for streaming entire blocks
– Positioned reads cut across the grain
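The streaming-vs-positioned distinction can be shown with plain `java.nio`: a streaming read advances a shared channel position, while a positioned read (pread-style) supplies its own offset and leaves the position alone. A local temp file stands in for a DataNode block file here; the `demo` method is illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class Reads {
    /** One streaming read, then one positioned read; returns the channel position. */
    static long demo(Path blockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            ch.read(buf);        // streaming read: advances position 0 -> 4
            buf.clear();
            ch.read(buf, 6);     // positioned read at offset 6: position untouched
            return ch.position(); // still 4
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("block", ".dat");
        Files.write(p, "abcdefghij".getBytes());
        System.out.println("position after reads: " + demo(p));
        Files.delete(p);
    }
}
```

A sequential stream maps naturally onto "allocate a thread and push bytes"; a positioned read wants to jump to an arbitrary offset, which is what the slide means by cutting across the grain.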

12 DFS Client : seek + read

13 DFS DataNode : read

14 DFS read inefficiencies
– Client has to open a new socket connection for each read – bad for random reads
– Server opens/closes 2 files (block, meta) and a Selector per read, as well as allocating a new buffer
– More IO ops than necessary – Selector per thread, 64 KB max packet size
– No optimizations for repeated reads of the same file
– Lots of files open – 4 per concurrent read (client socket, 2 file handles, selector)

15 HDFS-918
– Attempts to address some inefficiencies
– Instead of thread-per-read, all reads are multiplexed and dispatched to a threadpool when the client is writable
– File handles are pooled – since we're read-only, they can be shared across threads
– One selector for the whole VM
– Buffers are thread-local and re-used
– Attempts to transfer 512 KB per select()
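The "512 KB per select()" idea can be sketched with `FileChannel.transferTo()`, which can use the kernel's zero-copy path (sendfile) when the target is a socket: hand the kernel up to 512 KB per pass instead of copying small packets through user-space buffers. Everything here is illustrative (a `ByteArrayOutputStream` stands in for the client socket, and `ChunkedTransfer` is not the HDFS-918 code).

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedTransfer {
    static final long CHUNK = 512 * 1024; // 512 KB per select() pass, per the slide

    /** Transfer a whole file to the target in CHUNK-sized transferTo calls. */
    static long send(Path blockFile, WritableByteChannel target) throws IOException {
        long sent = 0;
        try (FileChannel src = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            long size = src.size();
            while (sent < size) {
                // transferTo may send fewer bytes than requested; loop until done
                sent += src.transferTo(sent, Math.min(CHUNK, size - sent), target);
            }
        }
        return sent;
    }
}
```

In the real patch the loop would yield back to the selector between passes, so one slow client can't monopolize a worker thread.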

16 HDFS-918

17 HDFS-918 status
– Showed approximately a 1-1.5 ms gain for random reads and a 5% gain for streaming
  – Not particularly robust benchmarks; needs confirmation from elsewhere
– HBase exposed a resource leak; I think I fixed it but haven't tested extensively
– BlockChannelPool was a point of contention until re-implemented using a read/write lock
– Larger transferTos are effective
– Patches available for 0.20.2 and trunk as of 1 month ago

18 Thanks!
Plug – www.proclivitysystems.com – We analyze consumer behavior and optimize your marketing.
Recently ported core engines from PL/SQL to Hadoop; reduced runtime by a factor of 20.

