A Method for Focused Crawling Using Combination of Link Structure and Content Similarity SeyedMohsen (Mohsen) Jamali

m_jamali@ce.sharif.edu

Introduction
The rapid growth of the world-wide web poses unprecedented scaling challenges for general-purpose crawlers and search engines.
A focused crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics.
A focused crawler requires only a very small investment in hardware and network resources, yet achieves respectable coverage at a rapid rate.
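The selective strategy above is commonly realized as a best-first crawl: a priority queue (the frontier) orders unvisited URLs by estimated relevance, so promising links are fetched first. The following is a generic sketch, not the specific method of these slides. The in-memory `out_links` graph stands in for fetching a page and extracting its links, and the `relevance` callback stands in for a topic classifier; both names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Set


def focused_crawl(
    seeds: List[str],
    out_links: Dict[str, List[str]],
    relevance: Callable[[str], float],
    max_pages: int,
) -> List[str]:
    """Best-first crawl over an in-memory link graph (hypothetical sketch)."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    crawled: List[str] = []
    while frontier and len(crawled) < max_pages:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
        score = relevance(url)  # assumed topic relevance of the fetched page
        for link in out_links.get(url, []):
            if link not in seen:
                seen.add(link)
                # A child inherits its parent's relevance as priority, so links
                # found on on-topic pages are explored before links from off-topic ones.
                heapq.heappush(frontier, (-score, link))
    return crawled
```

For example, with a seed linking to a relevant page `a` (score 0.9) and an irrelevant page `b` (score 0.1), a budget of four pages reaches `a1` (linked from `a`) but not `b1` (linked from `b`). Children inherit the parent's score because a page's own content is unknown until it has been fetched.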

Crawler Architecture

Crawler Architecture (cont)

Content Similarity Measure
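One widely used content similarity measure is the cosine similarity between term-frequency vectors of a page and a topic description. The sketch below assumes that measure with simple lower-cased word tokens (no stemming, stop-word removal, or TF-IDF weighting). The `combined_score` helper is a hypothetical linear blend of a content score with a link-structure score, with an assumed weight `alpha`; it only illustrates what "combining" the two signals could mean and is not the formula used in this work.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector of lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of a and b."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def combined_score(content_sim: float, link_score: float, alpha: float = 0.5) -> float:
    """Hypothetical weighted blend of a content score and a link-structure score."""
    return alpha * content_sim + (1 - alpha) * link_score
```

For instance, "focused web crawler" and "web crawler" share two terms, giving a similarity of 2 / sqrt(6), about 0.82.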

Evaluations
1. Precision
2. Recall
We ran the algorithm twice: once starting from a good hub page for the topic and once starting from a general page.
We compared both results with a standard BFS crawler.
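For a crawl, precision and recall are usually taken as the fraction of crawled pages that are relevant and the fraction of a reference set of relevant pages that the crawl reached. The sketch below assumes such a reference set is available as a set of URLs; the function name is hypothetical. The same function can be applied to the focused crawl and to the BFS baseline for comparison.

```python
from typing import Iterable, Set, Tuple


def crawl_precision_recall(crawled: Iterable[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a crawl against an assumed reference set of relevant URLs."""
    crawled_set = set(crawled)
    hits = len(crawled_set & relevant)  # relevant pages the crawl actually fetched
    precision = hits / len(crawled_set) if crawled_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```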

Experimental Results

Experimental Results (cont)

TCP: Total Crawled Pages
RPC: Related Pages' Count
RCT: Relative Crawling Time
AHR: Average Harvest Rate, the mean of the per-segment harvest rates
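Given the crawl order and a relevance label for each crawled page, TCP and RPC are simple counts, and AHR follows from the per-segment harvest rates (related pages divided by crawled pages within each segment). The sketch below assumes fixed-size consecutive segments; `segment_size` and the `is_related` labels are hypothetical inputs, and the final segment may be shorter than the others.

```python
from typing import List, Sequence


def segment_harvest_rates(is_related: Sequence[bool], segment_size: int) -> List[float]:
    """Harvest rate (related pages / crawled pages) of each consecutive segment."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rates = []
    for start in range(0, len(is_related), segment_size):
        segment = is_related[start:start + segment_size]
        rates.append(sum(segment) / len(segment))  # last segment may be shorter
    return rates


def average_harvest_rate(is_related: Sequence[bool], segment_size: int) -> float:
    """AHR: the mean of the per-segment harvest rates."""
    rates = segment_harvest_rates(is_related, segment_size)
    return sum(rates) / len(rates) if rates else 0.0
```

With this representation, TCP is `len(is_related)` and RPC is `sum(is_related)`.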

Experimental Results (cont)

THE END
