Published by Kristin Brister. Modified over 9 years ago.
1
Hadoop Namenode High Availability
August 2008
Requirements and Procedures
2
Requirements
Two nodes to satisfy availability requirements.
High availability for internal components of each node.
  o Disk redundancy
  o Network redundancy
Redundant network architecture.
Heartbeat mechanism between the two nodes.
Replication of namenode metadata.
Automatic fail over with no human action required.
3
Internal Components
Disks
  o 2x 300 GB 15k RPM SAS.
  o Hardware RAID 1 mirroring.
  o SMART monitoring.
Network
  o Dual 1Gbps on-board NICs.
  o Linux bonding with LACP.
4
Redundant Network Architecture
Linux bonding
  o See bonding.txt from Linux kernel docs.
  o LACP, aka 802.3ad, aka mode=4. (http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol)
  o Must be supported by your switches.
  o Throughput advantage: observed at 1.76Gb/s.
  o Tolerates the failure of either NIC, unlike a single heartbeat connection over a crossover cable.
Switching infrastructure and physical segregation.
  o See diagram
5
Network Diagram
6
Heartbeat Between Nodes
Provided by "heartbeat" package. (http://www.linux-ha.org/)
Manage multiple resources:
  o Virtual IP address
  o DRBD Disk
  o Hadoop processes
/etc/ha.d/haresources example:
  cw-grid101.contextweb.prod IPaddr::10.10.5.59
  cw-grid101.contextweb.prod drbddisk::r0
  cw-grid101.contextweb.prod Filesystem::/dev/drbd0::/hadoop::ext3::defaults
  cw-grid101.contextweb.prod hadoop
Heartbeat uses bond0 network interface. (* Not approved)
3 second timeout for "deadtime".
Created LSB compliant hadoop init script.
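The haresources entries above name the resources Heartbeat manages, with "::" separating a resource script from its arguments. The following is a minimal Python sketch, not part of the original deck, that parses those entries and lists the resulting start and stop order. The names Resource and parse_haresources are hypothetical. Heartbeat acquires the resources in a group from left to right and releases them in reverse; the sketch assumes the same ordering applies across the four lines shown on the slide.

```python
from typing import List, NamedTuple


class Resource(NamedTuple):
    script: str      # resource script name, e.g. "IPaddr"
    args: List[str]  # arguments that followed the "::" separators


def parse_haresources(text: str) -> List[Resource]:
    """Return the resources in the order they appear in the file."""
    resources = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        _node, *specs = line.split()  # first field is the preferred node
        for spec in specs:
            script, *args = spec.split("::")
            resources.append(Resource(script, args))
    return resources


# Entries copied from the slide.
HARESOURCES = """\
cw-grid101.contextweb.prod IPaddr::10.10.5.59
cw-grid101.contextweb.prod drbddisk::r0
cw-grid101.contextweb.prod Filesystem::/dev/drbd0::/hadoop::ext3::defaults
cw-grid101.contextweb.prod hadoop
"""

start_order = [r.script for r in parse_haresources(HARESOURCES)]
stop_order = list(reversed(start_order))  # assumed: release in reverse order
print("start:", start_order)
print("stop: ", stop_order)
```

Read this way, the start order (virtual IP, DRBD, file system, Hadoop) lines up with the fail over sequence described later in the deck.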
7
Replication of Namenode Metadata
DRBD Replication. (http://www.drbd.org/)
8
/etc/drbd.conf example:
  global { usage-count no; }
  resource r0 {
    protocol C;
    syncer { rate 110M; } # approximately 50% of total available
    startup { wfc-timeout 0; degr-wfc-timeout 120; }
    on cw-grid101.contextweb.prod {
      device /dev/drbd0;
      disk /dev/sda4;
      address 10.10.5.60:7788;
      meta-disk internal;
    }
    on cw-grid102.contextweb.prod {
      device /dev/drbd0;
      disk /dev/sda4;
      address 10.10.5.61:7788;
      meta-disk internal;
    }
  }
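The comment on the syncer rate says 110M is roughly 50% of what is available, but the deck does not say what "total available" refers to. One plausible reading, offered here only as an assumption, is the 1.76Gb/s throughput observed on the bonded link. The short Python sketch below works through that arithmetic; syncer_rate_mbytes is a hypothetical helper, and it uses decimal megabytes, ignoring any difference from the binary units DRBD may apply to the "M" suffix.

```python
# Assumption: "total available" is the bonded link's observed throughput.
OBSERVED_GBIT_PER_S = 1.76  # figure from the Redundant Network Architecture slide
SYNC_SHARE = 0.5            # "approximately 50%" from the drbd.conf comment


def syncer_rate_mbytes(gbit_per_s: float, share: float) -> float:
    """Convert a link speed in Gb/s into a resync budget in MB/s."""
    bytes_per_s = gbit_per_s * 1e9 / 8  # bits to bytes
    return bytes_per_s * share / 1e6    # keep the given share, express in MB/s


rate = syncer_rate_mbytes(OBSERVED_GBIT_PER_S, SYNC_SHARE)
print(f"syncer rate ~ {rate:.0f}M")
```

Under that assumption the result comes out at about 110 MB/s, which is consistent with the configured value and leaves the other half of the link for normal traffic during a resync.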
9
Fail Over
Order of Events
  o Virtual IP fails over.
  o DRBD system switches primary node. (/proc/drbd status)
  o File system fsck and mount at /hadoop.
  o Hadoop started via LSB compliant init script.
End to end fail over time approximately 15 seconds.
Optionally, original master is rebooted to help avoid split-brain.
10
DRBD Status
Updating
  # cat /proc/drbd
  version: 8.2.6 (api:88/proto:86-88)
  GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-02 10:04:55
   0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
      ns:18440304 nr:0 dw:27072452 dr:18511901 al:11746 bm:12767 lo:14 pe:12 ua:246 ap:1 oos:84438604
      [==>.................] sync'ed: 18.0% (82459/100465)M
      finish: 0:14:31 speed: 96,904 (77,472) K/sec
Synchronized
  # cat /proc/drbd
  version: 8.2.6 (api:88/proto:86-88)
  GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-02 10:04:55
   0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
      ns:102901512 nr:0 dw:27140024 dr:102898169 al:11781 bm:17923 lo:0 pe:0 ua:0 ap:0 oos:0
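The two /proc/drbd listings above differ in the fields a monitoring check would care about: the connection state (cs), the disk states (ds), and the out-of-sync counter (oos). The Python sketch below, which is not from the original deck, shows one way those fields could be read. The function names are hypothetical, the regular expressions assume the DRBD 8.2-style layout shown on the slide, and the sample strings are abbreviated copies of the slide output.

```python
import re
from typing import Dict, Optional

# Matches the device line, e.g. "cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate"
STATE_RE = re.compile(
    r"cs:(\S+)\s+st:([^/\s]+)/([^/\s]+)\s+ds:([^/\s]+)/([^/\s]+)"
)


def parse_drbd_status(text: str) -> Dict[str, str]:
    """Extract connection state, roles and disk states from /proc/drbd text."""
    match = STATE_RE.search(text)
    if match is None:
        raise ValueError("no DRBD device status line found")
    keys = ("cs", "local_role", "peer_role", "local_disk", "peer_disk")
    return dict(zip(keys, match.groups()))


def is_fully_synced(status: Dict[str, str]) -> bool:
    """True when both sides are connected and hold up-to-date data."""
    return (status["cs"] == "Connected"
            and status["local_disk"] == "UpToDate"
            and status["peer_disk"] == "UpToDate")


def seconds_remaining(text: str) -> Optional[int]:
    """Estimate resync time as oos (KiB) divided by current speed (K/sec)."""
    oos = re.search(r"oos:(\d+)", text)
    speed = re.search(r"speed:\s*([\d,]+)", text)
    if oos is None or speed is None:
        return None  # no resync in progress, or an unexpected format
    return int(oos.group(1)) // int(speed.group(1).replace(",", ""))


UPDATING = """\
 0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:18440304 nr:0 dw:27072452 dr:18511901 oos:84438604
    [==>.................] sync'ed: 18.0% (82459/100465)M
    finish: 0:14:31 speed: 96,904 (77,472) K/sec
"""
SYNCHRONIZED = """\
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:102901512 nr:0 dw:27140024 dr:102898169 oos:0
"""

for name, sample in (("updating", UPDATING), ("synchronized", SYNCHRONIZED)):
    status = parse_drbd_status(sample)
    print(name, status["cs"], is_fully_synced(status), seconds_remaining(sample))
```

For the "Updating" sample the estimate is 871 seconds, or 14 minutes 31 seconds, which agrees with the finish: 0:14:31 value DRBD itself reports.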