A Method for Focused Crawling Using Combination of Link Structure and Content Similarity SeyedMohsen (Mohsen) Jamali


1 A Method for Focused Crawling Using Combination of Link Structure and Content Similarity SeyedMohsen (Mohsen) Jamali m_jamali@ce.sharif.edu

2 Introduction
The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines.
A focused crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics.
A focused crawler entails a very small investment in hardware and network resources and yet achieves respectable coverage at a rapid rate.

3 Crawler Architecture

4 Crawler Architecture (cont)

5 Content Similarity Measure

6 Evaluations
1. Precision
2. Recall
We ran the algorithm twice: once starting from a good hub for the topic, and once starting from a general page.
We compared both results with those of a standard BFS crawler.
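The two evaluation measures named on this slide can be sketched in code. The slides do not show how they were computed, so the snippet below uses the standard set-based definitions: precision is the fraction of crawled pages that are relevant, and recall is the fraction of relevant pages that were crawled. The function name `precision_recall` and the example page identifiers are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple


def precision_recall(crawled: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a crawl.

    Assumption: `relevant` is a known ground-truth set of on-topic pages.
    The slides do not say how relevance was judged in the experiments.
    """
    hits = len(crawled & relevant)  # relevant pages that were actually crawled
    precision = hits / len(crawled) if crawled else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 pages crawled, 3 pages truly relevant, 2 in common.
crawled_pages = {"a", "b", "c", "d"}
relevant_pages = {"a", "b", "e"}
p, r = precision_recall(crawled_pages, relevant_pages)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under these definitions, a focused crawler would be expected to show higher precision than a BFS crawler started from the same seed, since BFS follows links regardless of topic.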

7 Experimental Results

8 Experimental Results (cont)

9

10 TCP: Total Crawled Pages, RPC: Related Pages' Count
RCT: Relative Crawling Time, AHR: Average Harvest Rate
AHR: the mean of harvest rates in each segment
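As a rough illustration of the AHR definition above, the following sketch splits a crawl into fixed-size segments, computes the harvest rate of each, and averages them. It assumes the common definition of harvest rate (the fraction of crawled pages in a segment that are relevant) and a fixed segment size; the slides specify neither, and the names `harvest_rates`, `average_harvest_rate`, and `segment_size` are hypothetical.

```python
from typing import List, Sequence


def harvest_rates(relevance: Sequence[bool], segment_size: int) -> List[float]:
    """Harvest rate per segment of the crawl.

    `relevance[i]` is True if the i-th crawled page was judged relevant.
    Assumption: segments are consecutive runs of `segment_size` pages;
    the last segment may be shorter.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rates = []
    for start in range(0, len(relevance), segment_size):
        segment = relevance[start:start + segment_size]
        rates.append(sum(segment) / len(segment))  # relevant / crawled in segment
    return rates


def average_harvest_rate(relevance: Sequence[bool], segment_size: int) -> float:
    """AHR: the mean of the per-segment harvest rates (0.0 for an empty crawl)."""
    rates = harvest_rates(relevance, segment_size)
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical crawl of 8 pages, split into segments of 4 pages each.
crawl = [True, True, False, False, True, False, False, False]
print(harvest_rates(crawl, 4))
print(average_harvest_rate(crawl, 4))
```

In this picture, TCP corresponds to `len(crawl)` and RPC to `sum(crawl)`.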

11 Experimental Results (cont)

12 THE END

