Cloud Computing project NSYSU Sec. 2 Demo
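NSYSU EE IT_LAB
Parse & Index
Parse: Extract the text characters from the fetched documents, then filter and process the text.
Index: Sort the text characters in order and build the links that relate each character to the documents it appears in.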
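The Parse and Index steps can be illustrated with a minimal sketch. It assumes word-level tokens, a crude regular-expression tag filter, and two made-up documents (`doc1`, `doc2`); none of these come from the project itself, which may tokenize and filter differently.

```python
import re
from collections import defaultdict

def parse(raw_html):
    """Parse: strip tags, lowercase, and keep only alphanumeric tokens."""
    text = re.sub(r"<[^>]+>", " ", raw_html)   # crude tag filter (assumption)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Index: map each token, in sorted order, to the documents containing it."""
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for token in parse(raw):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in sorted(index.items())}

# Hypothetical sample documents, for illustration only.
docs = {
    "doc1": "<p>Hadoop stores data on HDFS</p>",
    "doc2": "<p>The crawler stores data</p>",
}
index = build_index(docs)
print(index["data"])   # documents that contain "data"
print(index["hdfs"])   # documents that contain "hdfs"
```

Looking up a token in `index` returns the documents linked to it, which is the relationship the Index step builds.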
Flowchart
1. Seed URLs
2. Run the crawl command as a Hadoop job
3. Assign the job's fragments to each TaskTracker; fetch the web data (map & reduce)
4. Store the content to the output dir. on HDFS: URL DB, Doc. data, Fetch log
5. Parse documents; create index file (map & reduce)
6. Index(es)
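The flow can be simulated locally as a rough sketch of the map, shuffle, and reduce phases. It is not Hadoop code: `FAKE_WEB`, the `example` URLs, and all function names are hypothetical, no network access happens, and fetching and parsing are merged into a single map step for brevity.

```python
from collections import defaultdict

# Hypothetical stand-in for the web: URL -> page text (no real fetching).
FAKE_WEB = {
    "http://a.example/": "cloud computing demo",
    "http://b.example/": "cloud index demo",
    "http://c.example/": "hadoop cluster",
}

def split_into_fragments(urls, n_workers):
    """Deal the seed URLs round-robin into one fragment per worker."""
    return [urls[i::n_workers] for i in range(n_workers)]

def map_fetch_and_parse(fragment):
    """Map step: 'fetch' each URL and emit (word, url) pairs."""
    for url in fragment:
        for word in FAKE_WEB.get(url, "").split():
            yield word, url

def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_to_index(grouped):
    """Reduce step: collapse each word's URLs into a sorted, unique list."""
    return {word: sorted(set(urls)) for word, urls in grouped.items()}

seeds = sorted(FAKE_WEB)
fragments = split_into_fragments(seeds, n_workers=2)
pairs = [p for frag in fragments for p in map_fetch_and_parse(frag)]
index = reduce_to_index(shuffle(pairs))
print(fragments)
print(index["cloud"])
```

In a real cluster each fragment would be processed by a different TaskTracker and the intermediate data would live on HDFS; here everything runs in one process.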
Architecture
Machine 01: master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02: slave1 (x.x.x.2): Datanode, TaskTracker
Machine 03: slave2 (x.x.x.3): Datanode, TaskTracker
Other diagram labels: administrator, user, Job
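The role layout of the three machines can be written down as plain data. This is only an illustrative model, not a Hadoop configuration file; the `CLUSTER` dictionary and `hosts_running` helper are hypothetical names, and the addresses are kept elided as on the slide.

```python
# Daemons per host as listed on the Architecture slide.
# "x.x.x.N" is the slide's own elided address, left as-is.
CLUSTER = {
    "master": {"address": "x.x.x.1",
               "daemons": ["Namenode", "JobTracker", "Datanode", "TaskTracker"]},
    "slave1": {"address": "x.x.x.2", "daemons": ["Datanode", "TaskTracker"]},
    "slave2": {"address": "x.x.x.3", "daemons": ["Datanode", "TaskTracker"]},
}

def hosts_running(daemon, cluster=CLUSTER):
    """Return the sorted host names that run the given daemon."""
    return sorted(host for host, info in cluster.items()
                  if daemon in info["daemons"])

workers = hosts_running("TaskTracker")      # hosts that execute job fragments
coordinators = hosts_running("JobTracker")  # hosts that schedule jobs
print(workers, coordinators)
```

The model makes one point of the diagram explicit: the master also runs a Datanode and a TaskTracker, so it does worker duty in addition to coordinating.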
Hadoop cluster – 1 node
Machine 01: master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Hadoop cluster – 2 nodes
Machine 01: master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02: slave1 (x.x.x.2): Datanode, TaskTracker
Crawler input
Crawler output
Output of doc.
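Time comparison
Performance comparison (text analysis and index file creation):

                 Single host     Two hosts
Time required    50 min 14 s     24 min 26 s

Comparison of the time needed to analyze the web documents and build the index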
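The two timings can be turned into a speedup figure with a little arithmetic. The sketch below uses only the two measurements from the slide; "efficiency" here means speedup divided by the number of hosts, a definition chosen for this example.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds

single = to_seconds(50, 14)   # one host: 3014 s
double = to_seconds(24, 26)   # two hosts: 1466 s

speedup = single / double     # how many times faster two hosts are
efficiency = speedup / 2      # speedup per host (assumed definition)
saved = single - double       # seconds saved by adding the second host

print(f"speedup = {speedup:.2f}x, efficiency = {efficiency:.0%}")
# prints: speedup = 2.06x, efficiency = 103%
```

A speedup of about 2.06 on two hosts is slightly better than linear. This is a single pair of measurements, so the small excess over 2 may simply reflect run-to-run variation.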
Thanks for your attention!!