DataNode Disk Space Allocation (Software Container Hosts)
DataNode Disk Space Allocation (1)

$ ssh dna1
$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
rootfs       19G  5.6G   12G    32%  /
none         19G  5.6G   12G    32%  /
::
$ hdfs dfsadmin -report
::
Name: :50010 (dna1)
Hostname: dna1
Rack: /17/10
Decommission Status : Normal
Configured Capacity: (18.32 GB)
DFS Used: (60 KB)
Non DFS Used: (6.49 GB)
DFS Remaining: (11.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 64.56%
::
DataNode Disk Space Allocation (2)

$ sudo nano /opt/conf/A/hdfs-site.xml
::
dfs.datanode.du.reserved
$ stophdfs a
$ starthdfs a
$ ssh nna hdfs dfsadmin -report
::
Name: :50010 (dna1)
Hostname: dna1
Rack: /17/10
Decommission Status : Normal
Configured Capacity: (10.32 GB)
DFS Used: (32 KB)
Non DFS Used: (6.50 GB)
DFS Remaining: (3.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 37.08%
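The drop in configured capacity (18.32 GB down to 10.32 GB) comes from the dfs.datanode.du.reserved property edited above. A minimal hdfs-site.xml fragment might look like the sketch below; the slide elides the actual value, so the 8 GB figure here is only an assumption inferred from the before/after reports:

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- bytes reserved per volume for non-DFS use; 8 GB is an assumed value -->
  <value>8589934592</value>
</property>
```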
Managing YARN Compute Resources (Software Container Hosts)
Managing YARN Compute Resources

$ sudo nano /opt/conf/A/yarn-site.xml
yarn.nodemanager.resource.memory-mb 1024
yarn.nodemanager.resource.cpu-vcores 1
$ curl
{"clusterMetrics":{"appsSubmitted":0,"appsCompleted":0,
"appsPending":0,"appsRunning":0,"appsFailed":0,"appsKilled":0,
"reservedMB":0,"availableMB":2048,"allocatedMB":0,"reservedVirtualCores":0,
"availableVirtualCores":2,"allocatedVirtualCores":0,"containersAllocated":0,
"containersReserved":0,"containersPending":0,"totalMB":2048,"totalVirtualCores":2,
"totalNodes":2,"lostNodes":0,"unhealthyNodes":0,"decommissionedNodes":0,
"rebootedNodes":0,"activeNodes":2}}
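The two NodeManager resource limits edited above can be written out as a yarn-site.xml fragment (values taken from the slide):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>1</value>
</property>
```

With two active NodeManagers each offering 1024 MB and 1 vcore, the cluster metrics above report totalMB 2048 and totalVirtualCores 2, as expected.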
Managing YARN Compute Resources: Setting MapReduce Job Memory Requirements

$ cat /opt/conf/A/mapred-site.xml
yarn.app.mapreduce.am.resource.mb 512
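Only the ApplicationMaster memory property appears on the slide; as a hedged sketch, the mapred-site.xml fragment behind it would look like:

```xml
<property>
  <!-- memory requested for each MapReduce ApplicationMaster container -->
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
```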
Adding a Node Manager (Software Container Hosts)
Adding a Node Manager

$ sudo nano /opt/hosts-0.2
::
nma3   # node manager
::
$ dkcreate a
::
Warning: Permanently added 'nma3, ' (ECDSA) to the list of known hosts.
nma3 created
::
$ dkstart a.yarn
rma Running
nma1 Running
nma2 Running
nma3 starting
java version "1.7.0_79"
Scala compiler version  Copyright , LAMP/EPFL
$ startyarn a
Adding a Node Manager

$ ssh rma yarn node -list -all
15/08/19 00:29:26 INFO client.RMProxy: Connecting to ResourceManager at rma/ :8032
Total Nodes:3
Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
nma1:55970   RUNNING     nma1:
nma2:44551   RUNNING     nma2:
nma3:35229   RUNNING     nma3:
$ curl
{"clusterMetrics":{"appsSubmitted":0,"appsCompleted":0,"appsPending":0,"appsRunning":0,"appsFailed":0,"appsKilled":0,"reservedMB":0,"availableMB":3072,"allocatedMB":0,"reservedVirtualCores":0,"availableVirtualCores":3,"allocatedVirtualCores":0,"containersAllocated":0,"containersReserved":0,"containersPending":0,"totalMB":3072,"totalVirtualCores":3,"totalNodes":3,"lostNodes":0,"unhealthyNodes":0,"decommissionedNodes":0,"rebootedNodes":0,"activeNodes":3}}
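The curl output is the ResourceManager's REST cluster-metrics response (the slide elides the URL; on stock Hadoop this is typically the /ws/v1/cluster/metrics endpoint on the RM web port, but treat that path as an assumption). A short Python sketch of reading the headline figures from such a response:

```python
import json

# Trimmed sample of the clusterMetrics JSON shown above.
response = ('{"clusterMetrics":{"availableMB":3072,"totalMB":3072,'
            '"availableVirtualCores":3,"totalVirtualCores":3,"activeNodes":3}}')

m = json.loads(response)["clusterMetrics"]
# Three NodeManagers, each offering 1024 MB and 1 vcore per yarn-site.xml.
print(m["totalMB"], m["totalVirtualCores"], m["activeNodes"])  # -> 3072 3 3
```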
Configuring the YARN Node Manager Whitelist (Software Container Hosts)
Configuring the YARN Node Manager Whitelist

$ sudo nano /opt/conf/A/yarn.allow
nma1
nma2
$ sudo nano /opt/conf/A/yarn-site.xml
::
yarn.resourcemanager.nodes.include-path /opt/conf/A/yarn.allow
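The include-path setting above, written out as a yarn-site.xml fragment:

```xml
<property>
  <!-- only hosts listed in this file may register as NodeManagers -->
  <name>yarn.resourcemanager.nodes.include-path</name>
  <value>/opt/conf/A/yarn.allow</value>
</property>
```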
Enabling the Node Manager Whitelist

$ stopyarn a
$ startyarn a
$ ssh rma yarn node -list -all
15/08/19 00:45:53 INFO client.RMProxy: Connecting to ResourceManager at rma/ :8032
Total Nodes:2
Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
nma2:34726   RUNNING     nma2:
nma1:48948   RUNNING     nma1:8042   0
Modifying the Node Manager Whitelist

$ sudo nano /opt/conf/A/yarn.allow
nma1
nma2
nma3
$ ssh nma3 yarn-daemon.sh start nodemanager
$ yarn rmadmin -refreshNodes
$ yarn node -list -all
15/08/19 00:52:35 INFO client.RMProxy: Connecting to ResourceManager at rma/ :8032
15/08/19 00:52:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:3
Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
nma3:37007   RUNNING     nma3:
nma2:34726   RUNNING     nma2:
nma1:48948   RUNNING     nma1:8042   0
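Note that -refreshNodes only admits hosts whose NodeManager is actually running, which is why the slide starts nma3's NodeManager first. A small hypothetical helper (the function name is my own, not from the slides) for spotting whitelisted hosts that are not yet reporting:

```python
def missing_from_cluster(allow_hosts, reported_node_ids):
    """Return whitelisted hosts with no running NodeManager.

    reported_node_ids are entries like 'nma1:48948' from `yarn node -list`.
    """
    running = {node_id.split(":")[0] for node_id in reported_node_ids}
    return sorted(h for h in allow_hosts if h not in running)

# After adding nma3 to yarn.allow but before starting its NodeManager:
print(missing_from_cluster(
    ["nma1", "nma2", "nma3"],
    ["nma1:48948", "nma2:34726"],
))  # -> ['nma3']
```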
Pig Analysis Tool (Software Container Hosts)
Downloading the Data

$ ssh
$ wget
:44: Resolving tobala.net (tobala.net)
Connecting to tobala.net (tobala.net)| |:80... connected.
HTTP request sent, awaiting response... OK
::
:44:11 (67.6 KB/s) - 'hypermarket.csv' saved [15944/15944]
$ hdfs dfs -put hypermarket.csv hypermarket.csv
Starting Pig

$ pig
grunt> clear;
grunt> a = load '/data/hypermarket.csv' using PigStorage(',');
:03:35,233 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump a;
grunt> b = foreach a generate $0,$1,$2,$3,$4,$5;
grunt> dump b;
grunt> c = order b by $0 ASC;
grunt> dump c;
grunt> d = filter c by $0 != '會員編號';
grunt> store d into 'customer.csv' using PigStorage(',');
Running customer.pig

$ nano customer.pig
a = load 'hypermarket.csv' using PigStorage(',');
b = foreach a generate $0,$1,$2,$3,$4,$5;
c = order b by $0 ASC;
d = filter c by $0 != '會員編號';
store d into 'customer.csv' using PigStorage(',');
$ pig -f customer.pig
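For reference, the script's dataflow is: keep the first six columns, sort by column 0, and drop the '會員編號' (member-ID) header row. The same transformation can be mimicked locally in Python; this is an illustrative sketch, not part of the lab:

```python
def customer_transform(rows):
    """Mirror customer.pig: project six columns, order by column 0,
    and filter out the '會員編號' (member-ID) header row."""
    projected = [r[:6] for r in rows]           # foreach ... generate $0..$5
    ordered = sorted(projected, key=lambda r: r[0])  # order ... by $0 ASC
    return [r for r in ordered if r[0] != "會員編號"]  # filter ... by $0 !=

# Tiny made-up sample in the shape of hypermarket.csv:
print(customer_transform([["會員編號", "name"], ["2", "b"], ["1", "a"]]))
# -> [['1', 'a'], ['2', 'b']]
```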
HDFS Balancer (Software Container Hosts)
Reference: 1. Hadoop HDFS Balancer Explained
Uploading a File to HDFS

$ ssh cla01
$ dd if=/dev/zero of=foo.bar bs=1M count=
 records in
 records out
 bytes (1.6 GB) copied, s, 8.8 MB/s
$ hdfs dfs -put foo.bar /tmp
Configuring HDFS Balancer Bandwidth

$ sudo nano /opt/conf/A/hdfs-site.xml
::
dfs.datanode.balance.bandwidthPerSec
Starting the HDFS Balancer (1)

$ hdfs dfsadmin -report
Configured Capacity: (5.17 GB)
::
Name: :50010 (dna1)
::
Configured Capacity: ( MB)
DFS Used: ( MB)
Non DFS Used: 0 (0 B)
DFS Remaining: (35.84 MB)
DFS Used%: 93.36%
DFS Remaining%: 6.64%
::
Name: :50010 (dna2)
::
Configured Capacity: (2.32 GB)
DFS Used: ( MB)
Non DFS Used: 0 (0 B)
DFS Remaining: (2.06 GB)
DFS Used%: 11.31%
DFS Remaining%: 88.69%
::
* dna1's DFS Used% above is 93.36%, which means its storage space is almost full.
Starting the HDFS Balancer (2)

$ hdfs balancer -threshold 30
("-threshold": percentage of disk capacity)
::
15/07/10 20:17:38 INFO balancer.Balancer: 1 over-utilized: [ :50010:DISK]
15/07/10 20:17:38 INFO balancer.Balancer: 0 underutilized: []
15/07/10 20:17:38 INFO balancer.Balancer: Need to move  MB to make the cluster balanced.
15/07/10 20:17:38 INFO balancer.Balancer: Decided to move  MB bytes from :50010:DISK to :50010:DISK
15/07/10 20:17:38 INFO balancer.Balancer: Will move  MB in this iteration
2015/7/10 08:17 PM   B   MB   MB
::
15/07/10 20:17:48 INFO balancer.Balancer: 1 over-utilized: [ :50010:DISK]
15/07/10 20:17:48 INFO balancer.Balancer: 0 underutilized: []
15/07/10 20:17:48 INFO balancer.Balancer: Need to move  MB to make the cluster balanced.
15/07/10 20:17:48 INFO balancer.Balancer: Decided to move  MB bytes from :50010:DISK to :50010:DISK
::
* The elided address is dna1's IP address.
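A simplified model of the -threshold test: a node counts as over-utilized when its utilization exceeds the cluster average by more than the threshold, and under-utilized when it falls below the average by more than the threshold. This per-node sketch is only an approximation; the real balancer weights the average by each node's capacity, which is why the log above can report an empty underutilized list where this model would not:

```python
def classify_nodes(used_pct_by_node, threshold):
    """Approximate the balancer's over/under-utilized classification.

    used_pct_by_node maps hostname -> DFS Used% ; threshold is the
    -threshold argument, in percentage points around the cluster average.
    """
    avg = sum(used_pct_by_node.values()) / len(used_pct_by_node)
    over = [n for n, p in used_pct_by_node.items() if p > avg + threshold]
    under = [n for n, p in used_pct_by_node.items() if p < avg - threshold]
    return over, under

# Figures from the dfsadmin report above (dna1 93.36%, dna2 11.31%):
print(classify_nodes({"dna1": 93.36, "dna2": 11.31}, threshold=30))
# -> (['dna1'], ['dna2'])
```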
HDFS Federation (Software Container Hosts)
Reference: 1. HDFS Federation Configuration
Creating a Second HDFS File System (All Container Hosts)

$ sudo nano /opt/hosts-0.2
$ dkcreate b
::
clb1 created : dsb01 dsb02
nnb created : dsb01 dsb02
dnb1 created
dnb2 created
rmb created
nmb1 created
nmb2 created
Starting the Second HDFS (All Container Hosts)

$ dkstart b.hdfs
nnb starting
java version "1.7.0_79"
Scala compiler version  Copyright , LAMP/EPFL
dnb1 starting
java version "1.7.0_79"
Scala compiler version  Copyright , LAMP/EPFL
dnb2 starting
java version "1.7.0_79"
Scala compiler version  Copyright , LAMP/EPFL
$ formathdfs b
myring format (yes/no) yes
nnb format ok
nnb clean sn
dnb1 clean dn
dnb2 clean dn
Configuring the Second HDFS File System

$ sudo nano /opt/conf/B/hdfs-site.xml
::
fs.defaultFS hdfs://nnb:8020
fs.default.name hdfs://nnb:8020
Configuring the Second HDFS File System

$ sudo nano /opt/conf/B/hdfs-site.xml
::
dfs.nameservices hdfs1,hdfs2
dfs.namenode.rpc-address.hdfs1 nna:8020
dfs.namenode.rpc-address.hdfs2 nnb:8020
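Written out as an hdfs-site.xml fragment, the federation properties above declare both nameservices and point each at its NameNode's RPC address:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>hdfs1,hdfs2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hdfs1</name>
  <value>nna:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hdfs2</name>
  <value>nnb:8020</value>
</property>
```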
Starting the HDFS File System

$ starthdfs b
starting namenode, logging to /tmp/hadoop-bigred-namenode-nnb.out
starting secondarynamenode, logging to /tmp/hadoop-bigred-secondarynamenode-nnb.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dnb1.out
starting datanode, logging to /tmp/hadoop-bigred-datanode-dnb2.out
$ ssh nnb
$ hdfs dfsadmin -report
::
Live datanodes (2):
Name: :50010 (dnb1)
Hostname: dnb1
::
Name: :50010 (dnb2)
Hostname: dnb2
::
HDFS Federation (Software Container Hosts)
Configuring the First HDFS File System

$ sudo nano /opt/conf/A/hdfs-site.xml
::
dfs.nameservices hdfs1,hdfs2
dfs.namenode.rpc-address.hdfs1 nna:8020
dfs.namenode.rpc-address.hdfs2 nnb:8020
Reformatting the First HDFS: Stopping the NameNode

$ hadoop-daemon.sh stop namenode
stopping namenode
$ hdfs namenode -format -clusterID myring
::
15/07/10 16:40:34 INFO namenode.NNConf: XAttrs enabled? true
15/07/10 16:40:34 INFO namenode.NNConf: Maximum size of an xattr:
Re-format filesystem in Storage Directory /home/pi/nn ? (Y or N) y
::
15/07/10 16:41:34 INFO util.ExitUtil: Exiting with status 0
15/07/10 16:41:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nna/
************************************************************/
Clearing the DataNode Data Directories

$ ssh dna1
password:
$ hadoop-daemon.sh stop datanode
stopping datanode
$ rm -r dn
$ exit
$ ssh dna2
password:
$ hadoop-daemon.sh stop datanode
stopping datanode
$ rm -r dn
$ exit
$ ssh dna3
password:
$ hadoop-daemon.sh stop datanode
stopping datanode
$ rm -r dn
$ exit
Starting the First HDFS File System

$ hadoop-daemon.sh start namenode
starting namenode, logging to /tmp/hadoop-pi-namenode-nna.out
$ ssh dna1
password:
$ hadoop-daemon.sh start datanode
starting datanode, logging to /tmp/hadoop-pi-datanode-dna1.out
$ exit
$ ssh dna2
password:
$ hadoop-daemon.sh start datanode
starting datanode, logging to /tmp/hadoop-pi-datanode-dna2.out
$ exit
$ ssh dna3
password:
$ hadoop-daemon.sh start datanode
starting datanode, logging to /tmp/hadoop-pi-datanode-dna3.out
$ exit
HDFS Federation (Software Container Hosts)
Reference: 1. HDFS Federation Configuration
Configuring the Hadoop Client for the First HDFS

$ sudo nano /opt/conf/A/core-site.xml
::
fs.default.name viewfs://myring/
fs.viewfs.mounttable.myring.link./hdfs1 hdfs://nna:8020
fs.viewfs.mounttable.myring.link./hdfs2 hdfs://nnb:8020
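Written out as core-site.xml properties, the viewfs mount table above would look roughly like this (note that fs.default.name is the deprecated alias of fs.defaultFS, as the Pig session's log message pointed out):

```xml
<property>
  <name>fs.default.name</name>
  <value>viewfs://myring/</value>
</property>
<property>
  <!-- /hdfs1 in the client namespace maps to the first NameNode -->
  <name>fs.viewfs.mounttable.myring.link./hdfs1</name>
  <value>hdfs://nna:8020</value>
</property>
<property>
  <!-- /hdfs2 maps to the second NameNode -->
  <name>fs.viewfs.mounttable.myring.link./hdfs2</name>
  <value>hdfs://nnb:8020</value>
</property>
```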
Using the Federated HDFS File System

$ ssh cl01
$ hdfs dfs -ls /
Found 2 items
-r-xr-xr-x   - bigred bigred  :38 /hdfs1
-r-xr-xr-x   - bigred bigred  :38 /hdfs2
$ hdfs dfs -ls -d /hdfs1
drwxr-xr-x   - bigred supergroup  :00 /hdfs1
$ hdfs dfs -mkdir /hdfs1/abc
$ hdfs dfs -mkdir /hdfs2/xyz
mkdir: Permission denied: user=bigred, access=WRITE, inode="/":pi:supergroup:drwxr-xr-x
Configuring the Hadoop Client for the Second HDFS

$ sudo nano /opt/pi/hadoop-2.6.0/etc/hadoop/core-site.xml
::
fs.default.name viewfs://myring/
fs.viewfs.mounttable.myring.link./hdfs1 hdfs://nna:8020
fs.viewfs.mounttable.myring.link./hdfs2 hdfs://nnb:8020
Using the Federated HDFS File System

$ ssh cla01
$ hdfs dfs -ls /
Found 2 items
-r-xr-xr-x   - bigred bigred  :38 /hdfs1
-r-xr-xr-x   - bigred bigred  :38 /hdfs2
$ hdfs dfs -ls /hdfs1
Found 1 items
drwxr-xr-x   - bigred supergroup  :41 /hdfs1/abc
$ hdfs dfs -mkdir /hdfs2/xyz