1
HDFS Deep Dive
Berlin Buzzwords 2010
Jay Booth
2
HDFS in a slide
- One NameNode, N DataNodes
- Files are split into blocks
- Client talks to the NameNode to allocate blocks (writing) or to find the blocks for a given file (reading)
- Client then streams to/from DataNodes directly to place blocks (writing) or fetch blocks (reading)
- Replication is built in: the client write pipeline places each block on multiple DataNodes simultaneously; reads come from the closest replica
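To make that flow concrete, here is a minimal client-side sketch using the standard Hadoop FileSystem API from the 0.20 era; the NameNode URI and path are hypothetical.

```java
// Minimal sketch of the client-side view: the NameNode brokers metadata,
// the DataNodes carry the bytes. URI and path are made up for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name points at the single NameNode (0.20-era config key)
        conf.set("fs.default.name", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write: behind the scenes the client asks the NameNode for block
        // allocations, then streams bytes to a pipeline of DataNodes.
        Path p = new Path("/tmp/hello.txt");
        FSDataOutputStream out = fs.create(p);
        out.write("hello, hdfs".getBytes("UTF-8"));
        out.close();

        // Read: the client asks the NameNode for block locations, then
        // streams from the closest replica.
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[32];
        int n = in.read(buf);
        in.close();
        System.out.println(new String(buf, 0, n, "UTF-8"));
    }
}
```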
3
HDFS in another slide
[Architecture diagram] Credit: wiki.apache.org; original author Dhruba Borthakur, I think
4
The NameNode
- Maintains the fsimage on disk and the FSNameSystem in memory
- Exposes RPC methods for creating files, allocating additional blocks to a file, deleting files, listing directories, and locating the blocks of an existing file
- The FSNameSystem maintains state in memory while flushing edits to the edit log; the SecondaryNameNode periodically checkpoints the edit log into a new fsimage
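A simplified sketch of what that RPC surface looks like; the names loosely follow Hadoop's ClientProtocol, but the signatures and the two helper types are abbreviated stand-ins, not the real interface.

```java
// Simplified sketch of the NameNode RPC methods listed above.
// Method names loosely mirror Hadoop's ClientProtocol; signatures
// and helper types are deliberately abbreviated.
public interface NameNodeProtocolSketch {
    // Create a new file entry (inode) in the namespace.
    void create(String src, boolean overwrite, short replication, long blockSize);

    // Allocate an additional block for an open file; returns the chosen
    // block plus the DataNodes that should hold its replicas.
    LocatedBlockSketch addBlock(String src, String clientName);

    // Mark the file complete; the NameNode commits final block sizes.
    boolean complete(String src, String clientName);

    // Namespace operations.
    boolean delete(String src);
    FileStatusSketch[] getListing(String src);

    // Locate the blocks (and replica locations) backing an existing file.
    LocatedBlockSketch[] getBlockLocations(String src, long offset, long length);
}

// Hypothetical stand-ins for the real LocatedBlock / FileStatus types.
class LocatedBlockSketch { long blockId; String[] dataNodeHosts; }
class FileStatusSketch { String path; long length; }
```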
5
NameNode Architecture
6
Write Workflow
- Client registers the new file with the NameNode
- NameNode allocates a new inode for the file
- Client registers new blocks with the NameNode
  – Receives block locations for each block
  – Streams each block through the DataNode chain
- Client closes the file
- NameNode commits all blocks' final sizes to the edit log
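A schematic of those steps in terms of the simplified interface sketched earlier; streamToPipeline is a hypothetical stand-in for the real packet-level pipeline code, not the actual DFSClient.

```java
// Schematic write workflow: create -> addBlock loop -> complete.
class WriteWorkflowSketch {
    void writeFile(NameNodeProtocolSketch nn, String src,
                   byte[][] blocks, String clientName) {
        // 1. Register the new file; the NameNode allocates an inode.
        nn.create(src, true, (short) 3, 64L * 1024 * 1024);

        for (byte[] blockData : blocks) {
            // 2. Ask the NameNode for the next block and its replica locations.
            LocatedBlockSketch blk = nn.addBlock(src, clientName);

            // 3. Stream the block through the DataNode chain: the client
            //    sends to the first DataNode, which mirrors to the second,
            //    and so on down the pipeline.
            streamToPipeline(blk, blockData);
        }

        // 4. Close the file; the NameNode commits final block sizes
        //    to the edit log.
        nn.complete(src, clientName);
    }

    void streamToPipeline(LocatedBlockSketch blk, byte[] data) {
        // Elided: open a socket to blk.dataNodeHosts[0] and write packets;
        // each DataNode forwards to the next in the chain and acks back.
    }
}
```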
7
Write workflow
8
State after writing
[Diagram] Credit: wiki.apache.org; original author Dhruba Borthakur, I think
9
0.20.2-append
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append
- HDFS-200
  – Makes sync() work properly
  – Better behavior if the client closes a file before all replicas check in with the NN – no generation-stamp musical chairs
  – Correct client behavior when reading from a file that may be being appended to
- HDFS-101, HDFS-826
  – Better behavior when a DN dies in the write pipeline
  – 101 was released with 0.20.2; 826 is in 0.20.2-append
- HDFS-278
  – Explodes appropriately when the NN hangs during a write
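A minimal sketch of what HDFS-200's working sync() buys you on branch-0.20-append; the method and path names here are illustrative, but sync() on FSDataOutputStream is the real 0.20-era flush call.

```java
// Sketch: flush written bytes to the DataNode pipeline so concurrent
// readers can see them without waiting for close().
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class SyncSketch {
    void writeRecord(FileSystem fs, Path log, byte[] record)
            throws java.io.IOException {
        FSDataOutputStream out = fs.create(log);
        out.write(record);
        // On branch-0.20-append, sync() pushes the data through the
        // DataNode pipeline; a concurrent reader can now see it even
        // though the file is still open.
        out.sync();
        // ... keep writing; close() later commits the final block length.
        out.close();
    }
}
```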
10
Trouble!
- NameNode SPOF – see HADOOP-4539
- NameNode OOM – roughly 200 bytes per block adds up (100 million blocks is on the order of 20 GB of heap for block metadata alone)
- Replication storm scenario (can happen from a switch failure or loss of a node in an almost-full cluster)
- Unbalanced disks on DataNodes can be an issue
- Most DataNode issues during MapReduce will fix themselves through task failure/relaunch
11
Reads in a slide
- Client locates a DataNode, opens a socket, says hi
- DataNode allocates a thread to stream block contents from the opened position
- Client reads the stream in as it processes
- Great for streaming entire blocks
- Positioned (random) reads cut across the grain
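A quick sketch contrasting the two read styles using the standard FSDataInputStream API; the path and offset are arbitrary.

```java
// Streaming read vs. positioned (random) read against the same file.
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReadStylesSketch {
    void readStyles(FileSystem fs, Path p) throws java.io.IOException {
        byte[] buf = new byte[4096];

        // Streaming read: one socket, one DataNode thread streaming the
        // block from the opened position; great for scanning whole blocks.
        FSDataInputStream in = fs.open(p);
        while (in.read(buf) > 0) { /* process sequentially */ }

        // Positioned read: random offsets cut across the grain; each
        // pread may need a fresh DataNode connection under the hood.
        in.read(123456L, buf, 0, buf.length);
        in.close();
    }
}
```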
12
DFS Client : seek + read
13
DFS DataNode : read
14
DFS read inefficiencies
- Client has to open a new socket connection for each read – bad for random reads
- Server opens/closes 2 files (block, meta) and a Selector per read, as well as allocating a new buffer
- More IO ops than necessary – a Selector per thread, 64kb max packet size
- No optimizations for repeated reads of the same file
- Lots of files open – 4 per concurrent read (client socket, 2 file handles, selector)
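Schematically, the per-request cost looks something like the following; this is a rough reconstruction of the resource pattern being criticized, not the actual DataNode code.

```java
// Rough reconstruction of the 0.20 read path's per-request resources:
// one thread, two file handles, one Selector, one fresh buffer per read.
import java.io.RandomAccessFile;
import java.net.Socket;
import java.nio.channels.Selector;

class ThreadPerReadSketch {
    void serveRead(Socket client, String blockFile, String metaFile) {
        new Thread(() -> {
            try (RandomAccessFile block = new RandomAccessFile(blockFile, "r");
                 RandomAccessFile meta = new RandomAccessFile(metaFile, "r");
                 Selector sel = Selector.open()) {
                byte[] packet = new byte[64 * 1024]; // 64kb max packet size
                // Elided: register the client channel with sel and send
                // packets until the requested range is exhausted.
                // Four descriptors stay open for the whole read: the
                // client socket, two file handles, and the selector.
            } catch (Exception e) { /* elided */ }
        }).start();
    }
}
```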
15
HDFS-918
- Attempts to address some of these inefficiencies
- Instead of thread-per-read, all reads are multiplexed and dispatched to a threadpool when the client socket is writable
- File handles are pooled – since we're read-only, they can be shared across threads
- One Selector for the whole VM
- Buffers are thread-local and re-used
- Attempts to transfer 512kb per select()
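A rough sketch of the shape of that change (not the actual patch): one Selector for the whole VM, writability-driven dispatch to a pool, thread-local reusable buffers, and pooled read-only file channels.

```java
// Sketch of a multiplexed read path: one Selector, dispatch to a pool
// only when the client socket is writable, thread-local buffers.
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MultiplexedReaderSketch implements Runnable {
    private final Selector selector = Selector.open();   // one per VM
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private static final ThreadLocal<ByteBuffer> BUF =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(512 * 1024));

    MultiplexedReaderSketch() throws java.io.IOException {}

    void register(SocketChannel client) throws java.io.IOException {
        client.configureBlocking(false);
        client.register(selector, SelectionKey.OP_WRITE);
    }

    public void run() {
        while (true) {
            try {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isWritable()) {
                        // Client can accept data: pause interest and hand
                        // the connection to the pool for one transfer pass.
                        key.interestOps(0);
                        pool.submit(() -> sendChunk(key));
                    }
                }
            } catch (java.io.IOException e) { /* elided */ }
        }
    }

    private void sendChunk(SelectionKey key) {
        ByteBuffer buf = BUF.get();   // thread-local, re-used across reads
        buf.clear();
        // Elided: fill buf (up to 512kb) from a pooled, shared read-only
        // FileChannel and write to (SocketChannel) key.channel(); then
        // re-register OP_WRITE and wake the selector if more remains.
    }
}
```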
16
HDFS-918
17
HDFS-918 status
- Showed approximately a 1-1.5ms gain for random reads and a 5% gain for streaming
  – Not particularly robust benchmarks; needs confirmation from elsewhere
- HBase exposed a resource leak; I think I fixed it, but haven't tested extensively
- BlockChannelPool was a point of contention until re-implemented using a read/write lock
- Larger transferTos are effective
- Patches available for 0.20.2 and trunk as of a month ago
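A hypothetical sketch of that read/write-lock pattern: lookups take the shared read lock (cheap, fully concurrent), and only a cache miss takes the exclusive write lock, which removes the single-lock contention point.

```java
// Sketch of a channel pool guarded by a read/write lock. Channels are
// opened read-only, so they can be safely shared across reader threads.
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BlockChannelPoolSketch {
    private final Map<Long, FileChannel> channels = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    FileChannel get(long blockId, String path) throws java.io.IOException {
        lock.readLock().lock();                // common case: shared access
        try {
            FileChannel ch = channels.get(blockId);
            if (ch != null) return ch;
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();               // miss: open and cache
        try {
            FileChannel ch = channels.get(blockId);  // re-check under write lock
            if (ch == null) {
                ch = new FileInputStream(path).getChannel();  // read-only
                channels.put(blockId, ch);
            }
            return ch;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```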
18
Thanks!
Plug – www.proclivitysystems.com – we analyze consumer behavior and optimize your marketing
Recently ported core engines from PL/SQL to Hadoop, reducing runtime by a factor of 20.