1 Cloud Computing project NSYSU Sec. 2 Demo

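2 NSYSU EE IT_LAB Parse & Index
- Parse: Extract the text from the fetched documents, then filter and process it.
- Index: Arrange the extracted text in order and build the links between each term and the documents that contain it.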
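The two steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's code: the stop-word list, the tokenizing rule, and the sample documents are all assumptions, since the slides do not say which filters were applied.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the slides do not name the actual filters.
STOP_WORDS = {"a", "an", "the", "and", "of", "to"}

def parse(text):
    """Parse step: extract lowercase word tokens and filter out stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Index step: map each term to the sorted ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in parse(text):
            postings[term].add(doc_id)
    # Terms are emitted in sorted order, mirroring "arrange in order" on the slide.
    return {term: sorted(postings[term]) for term in sorted(postings)}

# Hypothetical sample documents standing in for fetched pages.
docs = {
    "doc1": "The crawler fetches the web page",
    "doc2": "Hadoop stores the page on HDFS",
}
index = build_index(docs)
```

Looking up `index["page"]` returns both document ids, which is the term-to-document link the Index bullet describes.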

3 Flowchart
1. Seed URLs.
2. Run the crawl command as a Hadoop job.
3. Assign the job's fragments to each TaskTracker; go fetch the web data.
4. Store the content in the output directory on HDFS: URL DB, document data, fetch log. (Map & reduce)
5. Parse the documents; create the index file. (Map & reduce)
6. Index(es).
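To make the "Map & reduce" label on the indexing stage concrete, here is a small single-process simulation of the map, shuffle, and reduce phases. It is a sketch under stated assumptions: the function names, the whitespace tokenizer, and the sample fragments are hypothetical, and a real Hadoop job would distribute this work across TaskTrackers rather than run it in one loop.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Mapper: emit one (term, doc_id) pair per word in the document."""
    return [(term, doc_id) for term in text.lower().split()]

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped

def reduce_phase(term, doc_ids):
    """Reducer: collapse one term's doc ids into a sorted, de-duplicated list."""
    return term, sorted(set(doc_ids))

# Hypothetical fetched documents, split into two fragments as if each
# fragment had been assigned to a different TaskTracker.
fragments = [
    {"page1": "hadoop crawl job"},
    {"page2": "crawl index job", "page3": "index file"},
]

mapped = chain.from_iterable(
    map_phase(doc_id, text)
    for fragment in fragments
    for doc_id, text in fragment.items()
)
index = dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())
```

Because each mapper call depends only on its own document, the fragments can be processed independently, which is what lets the job be spread over several machines.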

4 Architecture
Machine 01: master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02: slave1 (x.x.x.2): Datanode, TaskTracker
Machine 03: slave2 (x.x.x.3): Datanode, TaskTracker
Administrator: http://x.x.x.1:50070, http://x.x.x.1:50030
User: Job
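The layout on this slide can be written down as a small data structure, which makes it easy to check which daemons run where. This is an illustrative sketch only: the dictionary and helper names are hypothetical, the addresses stay masked as on the slide, and pairing port 50070 with the Namenode and 50030 with the JobTracker assumes the classic Hadoop web UI defaults.

```python
# Daemon placement copied from the architecture slide (addresses masked as shown).
CLUSTER = {
    "master": {"ip": "x.x.x.1",
               "daemons": ["Namenode", "JobTracker", "Datanode", "TaskTracker"]},
    "slave1": {"ip": "x.x.x.2", "daemons": ["Datanode", "TaskTracker"]},
    "slave2": {"ip": "x.x.x.3", "daemons": ["Datanode", "TaskTracker"]},
}

# Assumption: the two ports on the slide follow the classic Hadoop defaults.
WEB_UI_PORTS = {"Namenode": 50070, "JobTracker": 50030}

def hosts_running(daemon):
    """Return the host names that run the given daemon."""
    return [host for host, info in CLUSTER.items() if daemon in info["daemons"]]

def admin_urls():
    """Build the web UI URLs for every daemon that has a known UI port."""
    return [
        f"http://{info['ip']}:{WEB_UI_PORTS[daemon]}"
        for info in CLUSTER.values()
        for daemon in info["daemons"]
        if daemon in WEB_UI_PORTS
    ]
```

Here `hosts_running("TaskTracker")` lists all three machines, while `admin_urls()` yields only the two master URLs shown on the slide.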

5 Hadoop cluster – 1 node
Machine 01: master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker

6 Hadoop cluster – 2 nodes
Machine 01: master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02: slave1 (x.x.x.2): Datanode, TaskTracker

7 Crawler input

8 Crawler output
- Output of doc.

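9 Time comparison
Performance comparison (text analysis and index file creation):

                Single host     Two hosts
Time required   50 min 14 s     24 min 26 s

Comparison of the time needed for web document analysis and index creation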
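The two timings on this slide can be turned into a speedup figure with simple arithmetic. The snippet below is a sketch: the helper and variable names are hypothetical, and "efficiency" here just means speedup divided by the number of hosts.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds reading to total seconds."""
    return minutes * 60 + seconds

single_host = to_seconds(50, 14)  # 3014 s on one host
two_hosts = to_seconds(24, 26)    # 1466 s on two hosts

speedup = single_host / two_hosts  # about 2.06
efficiency = speedup / 2           # speedup per host, about 1.03
```

With these figures, adding the second host cuts the parse and index time roughly in half.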

10 Thanks for your attention!!

