A Method for Focused Crawling Using Combination of Link Structure and Content Similarity
SeyedMohsen (Mohsen) Jamali
m_jamali@ce.sharif.edu
Introduction
The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines.
A focused crawler selectively seeks out pages that are relevant to a pre-defined set of topics.
A focused crawler requires only a very small investment in hardware and network resources, yet achieves respectable coverage at a rapid rate.
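The idea of selectively seeking out on-topic pages can be illustrated with a minimal best-first crawl over a toy in-memory graph. This is only a sketch under stated assumptions, not the method presented in these slides: the keyword-overlap score, the function name `focused_crawl`, and the example pages are all hypothetical, and each link simply inherits the relevance of the page that points to it.

```python
import heapq


def focused_crawl(graph, texts, seed, topic_terms, budget):
    """Best-first crawl over a toy in-memory graph (illustrative only).

    graph:       dict mapping page id -> list of linked page ids
    texts:       dict mapping page id -> page text
    topic_terms: set of lowercase topic keywords
    budget:      maximum number of pages to fetch
    """
    def score(page):
        # Hypothetical relevance: fraction of topic keywords found in the page.
        words = set(texts.get(page, "").lower().split())
        return len(words & topic_terms) / len(topic_terms) if topic_terms else 0.0

    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        parent_score = score(page)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page that points to it.
                heapq.heappush(frontier, (-parent_score, link))
    return visited


# Hypothetical toy web: one on-topic branch and one off-topic branch.
graph = {"seed": ["sports", "crawl1"], "crawl1": ["crawl2"], "sports": ["news"]}
texts = {
    "seed": "web crawler index",
    "crawl1": "focused web crawler",
    "crawl2": "crawler frontier queue",
    "sports": "football scores",
    "news": "daily headlines",
}
order = focused_crawl(graph, texts, "seed", {"web", "crawler"}, budget=4)
```

With a budget of four pages, the sketch fetches `crawl2` (reached through an on-topic page) and never reaches `news` (reached through an off-topic page), whereas a breadth-first crawler would treat both second-level pages alike.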
Crawler Architecture
Crawler Architecture (cont)
Content Similarity Measure
Evaluations
1. Precision
2. Recall
We ran the algorithm twice: once starting from a good hub for the topic and once starting from a general page.
We compared both results with those of a standard BFS crawler.
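The two measures can be made concrete with a small sketch. It assumes the standard set-based definitions (precision is the share of crawled pages that are relevant, recall is the share of relevant pages that were crawled); the slides do not spell out the formulas, and the function name and page identifiers are hypothetical.

```python
def precision_recall(crawled, relevant):
    """Return (precision, recall) for one crawl.

    crawled:  iterable of page ids the crawler fetched
    relevant: iterable of page ids judged on-topic (assumed ground truth)
    """
    crawled, relevant = set(crawled), set(relevant)
    hits = len(crawled & relevant)
    precision = hits / len(crawled) if crawled else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical crawl: 4 pages fetched, 3 of them on-topic, 6 on-topic pages exist.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "c", "x", "y", "z"])
```

Running the same function on the hub-seeded crawl, the general-page crawl, and the BFS baseline would give directly comparable numbers, provided all three share one relevance judgment.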
Experimental Results
Experimental Results (cont)
TCP: Total Crawled Pages
RPC: Related Pages' Count
RCT: Relative Crawling Time
AHR: Average Harvest Rate, the mean of the harvest rates in each segment
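A short sketch of how TCP, RPC, and AHR could be computed from a crawl log. It assumes that a segment is a fixed-size block of consecutively crawled pages and that a segment's harvest rate is the fraction of its pages that are related; neither detail is given in the slides, and the function name and sample data are hypothetical.

```python
def average_harvest_rate(relevance_flags, segment_size):
    """Mean of per-segment harvest rates.

    relevance_flags: list of booleans in crawl order, True if the page is related
    segment_size:    pages per segment (assumed value; not given in the slides)
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rates = []
    for start in range(0, len(relevance_flags), segment_size):
        segment = relevance_flags[start:start + segment_size]
        rates.append(sum(segment) / len(segment))  # harvest rate of this segment
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical crawl log: one flag per crawled page, in crawl order.
flags = [True, True, False, True, False, False, False, True]
tcp = len(flags)                       # Total Crawled Pages
rpc = sum(flags)                       # Related Pages' Count
ahr = average_harvest_rate(flags, 4)   # mean of the two segment rates
```

Averaging per segment rather than over the whole crawl makes it visible when the harvest rate decays as the crawl moves away from the seed.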
Experimental Results (cont)
THE END