Published by Susan Stafford. Modified over 8 years ago.
1
Building a Multiuser Hadoop Testbed with DRBL
Jazz Wang (Yao-Tsung Wang) jazz@nchc.org.tw
2
Programmer vs. System Admin.
Source: http://www.funnyjunksite.com/wp-content/uploads/2007/08/programmer.jpg
Source: http://www.sysadminday.com/images/people/136-3697.JPG
3
Agenda
● PART 1:
– What is Cluster Computing?
– How to deploy a PC cluster?
● PART 2:
– What are DRBL and Clonezilla?
– Can DRBL help to deploy Hadoop?
● PART 3:
– Live Demo of DRBL Live and Clonezilla Live
4
PART 1: PC Cluster 101
5
At first, we have a "4 + 1" PC cluster: 4 compute nodes plus 1 manage node that runs the scheduler. The number of compute nodes had better be 2^n.
6
Then we connect the 5 PCs with a Gigabit Ethernet (GbE) switch (10/100/1000 Mbps) and add 1 NIC to the manage node for the WAN.
7
The 4 compute nodes communicate via the LAN switch. For security, only the manage node has Internet (WAN) access.
8
Basic System Setup for Cluster (Compute Nodes)
● Base system: Boot Loader, Linux Kernel, Kernel Module, GNU Libc
● Bash, Perl, GCC, SSHD
● Messaging: MPICH
● Account Mgmt.: NIS / YP
9
On the manage node, we also need to install a scheduler and a Network File System (NFS) to share files with the compute nodes.
● Same as the compute nodes: Boot Loader, Linux Kernel, Kernel Module, GNU Libc, Bash, Perl, GCC, SSHD, MPICH (Messaging), NIS / YP (Account Mgmt.)
● Extra: OpenPBS (Job Mgmt.), NFS (File Sharing)
10
Research topics about PC Cluster
● Cluster Computing
● System Architecture
● Parallel Computing
● Parallel Algorithms and Applications
● Process Architecture
● Network Architecture
● Storage Architecture
● System-level Middleware
● Shared Memory Programming
● Distributed Memory Programming
● Application-level Middleware Programming
Ref: Cluster Computing in the Classroom: Topics, Guidelines, and Experiences http://www.gridbus.org/papers/CC-Edu.pdf
11
Challenges of Cluster Computing
● Hardware
– Ethernet Speed / PC Density
– Power / Cooling / Heat
– Network and Storage Architecture
● Software
– Job Scheduler (Cluster level)
– Account Management
– File Sharing / Package Management
● Limitation
– Shared Memory
– Global Memory Management
12
Common Method to Deploy a Cluster
1. Set up one template machine
2. Clone it to multiple machines
3. Configure settings
4. Install the job scheduler
5. Run benchmarks
13
Challenges of the Common Method
● How to upgrade software?
● How to add a new user account?
● How to keep configuration synchronized?
● How to share user data?
14
How to deploy 4000+ nodes?
15
Advanced Methods to Deploy a Cluster
● SSI (Single System Image)
– Multiple PCs as a single computing resource
– Image-based
  ● homogeneous
  ● ex. SystemImager, OSCAR, Kadeploy
– Package-based
  ● heterogeneous
  ● easy to update and modify packages
  ● ex. FAI, DRBL
● Other deploy tools
– Rocks: RPM only
– cfengine: configuration engine
16
Comparison of Cluster Deploy Tools
17
PART 2-1: Hadoop Deployment Tool
18
Source: Deploying Hadoop with SmartFrog http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
19
Source: Deploying Hadoop with SmartFrog http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
20
Source: Deploying Hadoop with SmartFrog http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
21
Source: Deploying Hadoop with SmartFrog http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
22
PART 2-2: DRBL (企鵝龍) and Clonezilla (再生龍)
23
What is DRBL?
● Diskless Remote Boot in Linux
● Network is cheap, man-hours are expensive.
● In short, DRBL is...
– Use a network cable to replace the SATA cable
– All student PCs are connected to one single server
(Diagram: one server plus diskless PCs and diskful PCs. Source: http://www.mren.com.tw)
24
What is Clonezilla?
● Clone + zilla = Clonezilla (再生龍)
● Open source alternative to Norton Ghost
● Supports Windows, Linux and Mac
● Modes: Disk to Disk, Disk to Image, Image to N Disks
25
PART 2-3: How does DRBL work?
26
1st, we install the base system of GNU/Linux on the management node. You can choose Red Hat, Fedora, CentOS, Mandriva, Ubuntu, Debian, ...
(Diagram: Boot Loader, Linux Kernel, Kernel Module, GNU Libc)
27
2nd, we install the DRBL package and configure the machine as a DRBL server. Many services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS (YP) server, ...
● Network Booting: DHCPD, TFTPD, NFS
● Account Mgmt.: NIS / YP
● Bash, Perl, SSHD
● Base system: Boot Loader, Linux Kernel, Kernel Module, GNU Libc
The DRBL server is built on existing open source software, and the hacking continues!
28
After running "drblsrv -i" and "drblpush -i", there will be pxelinux, vmlinuz-pxe and initrd-pxe in TFTPROOT, and different configuration files (e.g. hostname) for each compute node in NFSROOT.
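To make the role of the files in TFTPROOT concrete, here is a minimal sketch of what a PXELINUX menu entry that boots a diskless node over NFS could look like. It is not the file that DRBL itself generates: the label name, the NFS server address and the NFS root path are hypothetical values chosen for illustration. Only the file names vmlinuz-pxe and initrd-pxe come from the slide.

```python
def render_pxelinux_entry(kernel="vmlinuz-pxe", initrd="initrd-pxe",
                          nfs_server="192.168.200.254",   # hypothetical address
                          nfs_root="/tftpboot/node_root"):  # hypothetical path
    """Render a minimal PXELINUX config entry for an NFS-root boot.

    `kernel`, `append` and `label` are standard PXELINUX directives;
    `root=/dev/nfs`, `nfsroot=` and `ip=dhcp` are standard Linux kernel
    parameters. The concrete values are assumptions, not DRBL's own output.
    """
    lines = [
        "default drbl",
        "label drbl",
        f"  kernel {kernel}",
        f"  append initrd={initrd} root=/dev/nfs "
        f"nfsroot={nfs_server}:{nfs_root} ip=dhcp",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_pxelinux_entry(), end="")
```

The point of the sketch is only that the boot loader needs to know which kernel and initrd to fetch, and the kernel needs to know where its root file system lives on the network.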
29
3rd, we enable the PXE function in the BIOS configuration.
30
While booting, PXE requests an IP address from DHCPD.
31
While booting, PXE requests an IP address from DHCPD. The four compute nodes receive IP 1, IP 2, IP 3 and IP 4.
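As a rough illustration of the DHCP step, the sketch below renders ISC dhcpd-style host entries that pin each node to a fixed address and point it at a TFTP server and boot file. The host names, MAC addresses, IP addresses and the boot file name "pxelinux.0" are all assumptions for the example. DRBL writes its own DHCP configuration, which may look different.

```python
def dhcpd_host_entries(nodes, tftp_server, boot_file="pxelinux.0"):
    """Render ISC dhcpd `host` declarations for PXE clients.

    nodes: iterable of (hostname, mac, ip) tuples (all hypothetical here).
    `hardware ethernet`, `fixed-address`, `next-server` and `filename`
    are standard ISC dhcpd statements.
    """
    blocks = []
    for name, mac, ip in nodes:
        blocks.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"  next-server {tftp_server};\n"
            f'  filename "{boot_file}";\n'
            f"}}"
        )
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    demo_nodes = [
        ("node1", "00:11:22:33:44:01", "192.168.200.1"),
        ("node2", "00:11:22:33:44:02", "192.168.200.2"),
    ]
    print(dhcpd_host_entries(demo_nodes, "192.168.200.254"), end="")
```

Fixed addresses are one possible design choice here: they let the server prepare per-node configuration in advance, keyed by IP address.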
32
After PXE gets its IP address, it downloads the boot files (pxelinux, vmlinuz-pxe, initrd-pxe) from TFTPD.
33
(Diagram: TFTPD serves pxelinux, vmlinuz-pxe and initrd-pxe; each of the four nodes, IP 1 to IP 4, now holds its own copy of pxelinux, vmlinuz and initrd.)
34
After downloading the boot files, scripts in initrd-pxe configure NFSROOT for each compute node.
35
(Diagram: through NFS, each node, IP 1 to IP 4, receives its own configuration files, e.g. hostname: Config. 1, Config. 2, Config. 3, Config. 4.)
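The boot sequence walked through on the preceding slides (DHCP, then TFTP, then NFSROOT) can be summarized as a small simulation. This is a toy model under stated assumptions, not DRBL code: the `Node` class, the `boot` function and the dictionary-based "servers" are hypothetical stand-ins for DHCPD, TFTPD and NFS.

```python
from dataclasses import dataclass, field

# File names as shown on the slides for TFTPROOT.
BOOT_FILES = ("pxelinux", "vmlinuz-pxe", "initrd-pxe")


@dataclass
class Node:
    mac: str
    ip: str = ""
    files: tuple = ()
    config: str = ""
    log: list = field(default_factory=list)


def boot(node, dhcp_leases, nfs_configs):
    """Walk one node through the three stages described on the slides."""
    # Stage 1: PXE asks DHCPD for an IP address (modelled as a MAC lookup).
    node.ip = dhcp_leases[node.mac]
    node.log.append("dhcp")
    # Stage 2: PXE downloads the boot files from TFTPD.
    node.files = BOOT_FILES
    node.log.append("tftp")
    # Stage 3: scripts in the initrd pick the per-node config from NFSROOT.
    node.config = nfs_configs[node.ip]
    node.log.append("nfsroot")
    return node


if __name__ == "__main__":
    leases = {"mac-1": "IP 1", "mac-2": "IP 2"}
    configs = {"IP 1": "Config. 1", "IP 2": "Config. 2"}
    for mac in leases:
        n = boot(Node(mac), leases, configs)
        print(n.mac, n.ip, n.config, n.log)
```

Each stage depends on the previous one: without an address there is no TFTP download, and without the initrd there is nothing to mount the NFS root.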
36
Applications and services (e.g. Bash, Perl, SSHD) are also deployed from the DRBL server to each compute node via NFS.
37
With the help of NIS and YP, you can log in to each compute node from an SSH client with the same ID and password stored on the DRBL server.
38
PART 2-1: When DRBL Meets Hadoop
39
Deploy Hadoop Using DRBL
● Under development. Needs packaging.
● drbl-hadoop
– Mounts the local hard disk for HDFS
– svn co http://trac.nchc.org.tw/pub/grid/drbl-hadoop
● hadoop-register
– Website and SSH applet
– svn co http://trac.nchc.org.tw/pub/cloud/hadoop-register
40
About hadoop.nchc.org.tw
● DRBL Server x 1 (hadoop) with more space for /home and /tftpboot
● DRBL Client x 19 (hadoop101~hadoop119)
● Uses Cloudera Hadoop Debian packages
● Uses drbl-hadoop and Cloudera's init.d scripts to deploy Hadoop
● Uses hadoop-register to host the web service and SSH applet
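As a small worked example of the naming scheme above, the following sketch expands the range hadoop101~hadoop119 into the 19 client host names. The function name and its parameters are hypothetical; only the prefix and the range come from the slide.

```python
def client_hostnames(prefix="hadoop", first=101, last=119):
    """Expand an inclusive numeric range into host names."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


if __name__ == "__main__":
    names = client_hostnames()
    print(len(names), names[0], names[-1])
```

Such a list is handy as input for loops that run a command on every client.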
41
Lessons Learned
● The Cloudera Hadoop packages use init.d scripts to start/stop...
– name node, data node, job tracker, task tracker
● Create 500 users in advance:
– Use the DRBL built-in command /opt/drbl/sbin/drbl-useradd
● Set up the default HDFS home directory
– Use a for loop to run "hadoop fs -mkdir tmp" for each user
● Set up permissions of the user HDFS folders
– Use a for loop to run "hadoop dfs -chown $(id) /usr/$(id)"
● HDFS uses the space of /var/lib/hadoop/cache/hadoop/dfs
● MapReduce uses the space of /var/lib/hadoop/cache/hadoop/mapred
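The two "for loop" items above can be sketched as a dry run that only builds the command strings and does not execute anything. Several details are assumptions rather than the author's exact procedure: the user names (user001, user002, ...) are made up, the home root is taken to be /user (the slide writes /usr/), the user name is used where the slide writes $(id), and the -R flag on chown is an addition so that the tmp directory is covered too. Creating the system accounts with drbl-useradd is not shown.

```python
def demo_users(count, prefix="user"):
    """Hypothetical account names: user001, user002, ..."""
    return [f"{prefix}{n:03d}" for n in range(1, count + 1)]


def hdfs_setup_commands(users, home_root="/user"):
    """Build (but do not run) HDFS setup commands for each user.

    Uses the `hadoop fs -mkdir` and `hadoop fs -chown` shell commands;
    `home_root` is an assumption, see the note above.
    """
    commands = []
    for user in users:
        home = f"{home_root}/{user}"
        commands.append(f"hadoop fs -mkdir {home}")
        commands.append(f"hadoop fs -mkdir {home}/tmp")
        commands.append(f"hadoop fs -chown -R {user} {home}")
    return commands


if __name__ == "__main__":
    for line in hdfs_setup_commands(demo_users(3)):
        print(line)
```

Printing the commands first makes it easy to review them before piping them to a shell as the HDFS superuser.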
42
Conclusion
● Thanks to Cloudera for providing the Hadoop-related packages.
● Benefits
– DRBL saves you time and money. It makes your life easier.
– It is developed by Taiwanese developers, so it is easy to communicate with them.
– Using network booting could reduce power consumption.
● Weakness
– DRBL-Hadoop is currently only good for building an experimental Hadoop cluster.
– If you are looking for a production-ready product, you may want to try SmartFrog.
43
PART 2-2: Live Demo
44
Demo Network Topology
● Physical Network Device (Wireless DHCP Server)
● VMNet 8 (DHCP Server)
● VMNet 1 (Host-Only)
● Address ranges shown: 140.110.88.*, 172.21.253.*, 192.168.200.*
● Running DRBL Live CD: eth0 = NAT, eth1 = Host-only
● Diskless Clients
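When debugging a topology like this one, it helps to check quickly which of the three address ranges a given IP falls into. The sketch below does that with Python's standard ipaddress module. Reading each "*" range as a /24 network is an assumption, and the sketch deliberately does not say which range belongs to which VMNet, since the transcript does not preserve that mapping.

```python
import ipaddress

# The three ranges from the slide, each read as a /24 (an assumption).
NETWORKS = {
    "140.110.88.*": ipaddress.ip_network("140.110.88.0/24"),
    "172.21.253.*": ipaddress.ip_network("172.21.253.0/24"),
    "192.168.200.*": ipaddress.ip_network("192.168.200.0/24"),
}


def which_network(address):
    """Return the slide's label for the range containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for label, network in NETWORKS.items():
        if ip in network:
            return label
    return None


if __name__ == "__main__":
    for addr in ("192.168.200.7", "172.21.253.128", "10.0.0.1"):
        print(addr, "->", which_network(addr))
```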
45
Questions?
Jazz Wang (Yao-Tsung Wang) jazz@nchc.org.tw