1
15-440: Hadoop Distributed File System
Allison Naaktgeboren
[Lolcat image: “Wut u mean? I iz loadin a HA-doop fileh. Ur doin' it rong kitteh”]
2
Announcements
- Go vote!
- Interpretive dances happen only after lecture
- Office hour change: Mon 6:30-9:30, Tues 6-7:30
- Exams are graded
3
Hadoop Core at 30,000 ft
4
Back to the MapReduce Model
Recall that:
- map (in_key, in_value) -> (inter_key, inter_value) list
- combine (inter_key, inter_value list) -> (inter_key, inter_value)
- reduce (inter_key, inter_value list) -> (out_key, out_value)
What resource are we most constrained by? “Oceans of Data, Skinny pipes”
How many types of data will the file system care about? How long will we need each kind? What is the common case for each?
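Below is a minimal plain-Java sketch of these three signatures, using word count as the running example. It deliberately avoids the real Hadoop API, so all names are illustrative; the combiner is modeled as the same fold as the reducer, which is the common case.

```java
import java.util.*;

// Plain-Java simulation of the map/combine/reduce signatures above,
// using word count as the example (no Hadoop dependency).
public class MapReduceSketch {
    // map(in_key, in_value) -> list of (inter_key, inter_value)
    static List<Map.Entry<String, Integer>> map(String docId, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : text.split("\\s+"))
            out.add(Map.entry(word, 1));
        return out;
    }

    // combine/reduce(inter_key, inter_value list) -> (inter_key, inter_value).
    // The combiner runs this same fold locally on each mapper's output,
    // shrinking the data before it crosses the skinny pipes.
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return Map.entry(word, sum);
    }

    public static void main(String[] args) {
        // Shuffle: group intermediate pairs by key, then reduce each group.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : map("doc1", "the cat sat on the mat"))
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        for (Map.Entry<String, List<Integer>> g : groups.entrySet())
            System.out.println(reduce(g.getKey(), g.getValue()));
    }
}
```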
6
What would a MR Filesystem need?
General use case: large files
- Mostly appends to the end, long sequential reads, few deletes
- Appends might be concurrent
Scalability
- Adding (or losing) machines should be relatively painless
- Nodes work on nearby data: minimize moving data between machines
- Bandwidth is our limiting resource; remember how much data there is
Failure (handling) is common
- Yeah, yeah, we know, we took 213, we know hardware sucks
- No, really: failure (handling) is common (constant)
- Disks, processors, whole nodes, racks, and datacenters
7
Addressing Those Concerns
- Sequential reads and appends need to be fast; deletes can be painful
- “Hot plug” machines
  - Add or lose machines while the system is running jobs
  - The system should auto-detect the change
- HDFS should distribute data somewhat evenly
  - So that all workers have a reasonable amount of data to chew on
  - While coordinating with the JobTracker (job master)
- Data replication should be spread out. Why? What type of problems could arise?
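One answer to the "spread out" question is rack-awareness. The toy sketch below captures the spirit of HDFS's default placement (first replica on the writer's node, second on a node in a different rack to survive a rack failure, third on another node in that same remote rack to save cross-rack bandwidth); the class and method names are hypothetical, not real HDFS code.

```java
import java.util.*;

// Toy rack-aware replica placement, in the spirit of HDFS's default policy.
public class PlacementSketch {
    record Node(String name, String rack) {}

    static List<Node> chooseReplicas(Node writer, List<Node> cluster) {
        List<Node> chosen = new ArrayList<>(List.of(writer)); // replica 1: writer's node
        // Replica 2: any node on a different rack than the writer.
        Node remote = cluster.stream()
                .filter(n -> !n.rack().equals(writer.rack()))
                .findFirst().orElseThrow();
        chosen.add(remote);
        // Replica 3: a different node on that same remote rack.
        Node peer = cluster.stream()
                .filter(n -> n.rack().equals(remote.rack()) && !n.equals(remote))
                .findFirst().orElseThrow();
        chosen.add(peer);
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("a1", "rackA"), new Node("a2", "rackA"),
                new Node("b1", "rackB"), new Node("b2", "rackB"));
        System.out.println(chooseReplicas(cluster.get(0), cluster));
    }
}
```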
8
Moving into the Details
Nodes in HDFS:
- NameNode (master), like the GFS master
- DataNodes (slaves), like GFS chunkservers
NB: Hadoop and HDFS are closely paired
- “Careful use of jargon defines the true expert”
- “Worker node A” and “data node 1” are frequently the same machine
Two types of masters:
- JobTracker (Hadoop job master)
- NameNode (file system master), which is what I mean by 'master' for the rest of the lecture
9
Your Data Goes In...
- Files are divided into chunks of 64 MB
- The mapping between filename and chunks goes to the master
- Each chunk is replicated and sent off to DataNodes
  - By default, 3 replicas
  - The master determines which DataNodes receive them
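As a concrete picture of the division step, here is a small sketch that carves a file length into 64 MB chunk descriptors; chunkify and its layout are illustrative, not HDFS internals.

```java
// Minimal sketch of carving a file into fixed-size chunks before
// replication; 64 MB matches the default chunk size named above.
public class ChunkSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64 MB

    // Returns (offset, length) pairs, one per chunk; only the last
    // chunk may be shorter than 64 MB.
    static long[][] chunkify(long fileSize) {
        int n = (int) ((fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE); // ceiling division
        long[][] chunks = new long[n][2];
        for (int i = 0; i < n; i++) {
            chunks[i][0] = i * CHUNK_SIZE;
            chunks[i][1] = Math.min(CHUNK_SIZE, fileSize - chunks[i][0]);
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (long[] c : chunkify(200L * 1024 * 1024)) // a 200 MB file -> 4 chunks
            System.out.println("offset=" + c[0] + " len=" + c[1]);
    }
}
```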
10
What the Clients Do
Where the data starts.
- On file creation, creates a separate file with a checksum
  - When data is fetched back from a DataNode, the checksum is computed again
- Caches file data
  - Avoids bothering the master too often
- When a client has one chunk's worth of data
  - Contacts the master; the master sends back the names of the DataNodes to send it to
  - ONLY sends it to the first one
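A minimal sketch of that checksum round trip, using java.util.zip.CRC32 as a stand-in for whatever checksum the real client stores in that separate file:

```java
import java.util.zip.CRC32;

// Compute a checksum when a chunk is written, recompute on read, and
// compare to detect corruption. CRC32 stands in for the real mechanism.
public class ChecksumSketch {
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] chunk = "some chunk bytes".getBytes();
        long stored = checksum(chunk);       // written alongside the data

        // ...later, after fetching the chunk back from a DataNode...
        boolean ok = checksum(chunk) == stored;
        System.out.println(ok ? "chunk verified" : "corruption detected");
    }
}
```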
11
What the DataNodes Do
- Heartbeat to the master
- Opens, closes, or replicates a chunk if requested by the master
- During replication, sends the data on to the next DataNode in the chain
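The heartbeat itself is just a periodic liveness report. A minimal sketch, with reportToMaster as a hypothetical stand-in for the real RPC; the 3-second period matches HDFS's classic default, but treat it as an assumption here:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A DataNode's heartbeat loop: report liveness to the master on a fixed period.
public class HeartbeatSketch {
    static void reportToMaster(String nodeId) {
        System.out.println(nodeId + ": heartbeat sent"); // placeholder for the RPC
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> reportToMaster("datanode-1"),
                0, 3, TimeUnit.SECONDS);
        // In a real DataNode this runs for the life of the process; the
        // master marks the node dead if heartbeats stop arriving.
    }
}
```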
12
What the Namespace Node Does
System metadata!
- Holds the name -> ID mapping
- Chunk replica locations
- Transaction log (EditLog) and checkpoint image (FSImage)
It is responsible for coherency
- Uses the logs atomically
- Addresses the concurrent-writes issue
It is checkpointed
- Similar to AFS volume snapshots
- Will pull the last consistent log upon restart
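A minimal sketch of the EditLog/FSImage interplay, assuming in-memory lists in place of the on-disk files: mutations append to the log, a checkpoint folds the log into a fresh image, and recovery is image plus replay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// EditLog/FSImage sketch: the live namespace is FSImage + EditLog replay.
public class EditLogSketch {
    Map<String, String> fsImage = new HashMap<>(); // state as of last checkpoint
    List<String[]> editLog = new ArrayList<>();    // mutations since checkpoint

    // Every mutation is appended to the EditLog; this is what makes
    // recovery possible after a crash.
    synchronized void create(String name, String chunkId) {
        editLog.add(new String[]{name, chunkId});
    }

    // Checkpoint: fold the EditLog into a fresh FSImage, then truncate it.
    synchronized void checkpoint() {
        for (String[] op : editLog) fsImage.put(op[0], op[1]);
        editLog.clear();
    }

    // Restart: state = FSImage plus replay of surviving EditLog entries.
    Map<String, String> recover() {
        Map<String, String> state = new HashMap<>(fsImage);
        for (String[] op : editLog) state.put(op[0], op[1]);
        return state;
    }

    public static void main(String[] args) {
        EditLogSketch nn = new EditLogSketch();
        nn.create("/logs/a", "chunk-17");
        nn.checkpoint();
        nn.create("/logs/b", "chunk-18"); // not yet checkpointed
        System.out.println(nn.recover()); // both files survive a restart
    }
}
```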
13
What the Namespace Node Does
- Listens for heartbeats; listens for client requests
  - If no heartbeat arrives, marks a node as dead; its data is deregistered
- It selects DataNodes
  - Decides which nodes get which chunks
  - Signals creating, opening, closing
- Deletes
  - Orders a move to /trash
  - Starts a delete timer
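On the receiving side, dead-node detection can be as simple as timestamps plus a sweep. A sketch, with an arbitrary 30-second timeout (not HDFS's real value):

```java
import java.util.HashMap;
import java.util.Map;

// The master remembers each node's last heartbeat time and periodically
// sweeps for nodes that have been silent longer than a timeout.
public class DeadNodeSketch {
    static final long TIMEOUT_MS = 30_000; // illustrative, not HDFS's setting
    final Map<String, Long> lastHeartbeat = new HashMap<>();

    void onHeartbeat(String nodeId) {
        lastHeartbeat.put(nodeId, System.currentTimeMillis());
    }

    // Any node silent past the timeout is declared dead; its replicas
    // would then be deregistered and re-replicated elsewhere.
    void sweep() {
        long now = System.currentTimeMillis();
        lastHeartbeat.entrySet().removeIf(e -> {
            boolean dead = now - e.getValue() > TIMEOUT_MS;
            if (dead) System.out.println(e.getKey() + " marked dead");
            return dead;
        });
    }

    public static void main(String[] args) {
        DeadNodeSketch master = new DeadNodeSketch();
        master.onHeartbeat("datanode-1");
        master.sweep(); // datanode-1 is still fresh, so it stays alive
    }
}
```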
14
All Together Now!
15
Additional Resources
- Hadoop wiki
- YouTube → “Hadoop” → Google developer videos (1-3 will be helpful)
- Google University
  - Includes the UW course, the other UW course, and a couple of others
  - Use at your own risk
- “The Google File System” paper is rather readable, as research papers go