Augmenting Focused Crawling using Search Engine Queries Wang Xuan 10th Nov 2006.

1 Augmenting Focused Crawling using Search Engine Queries Wang Xuan 10th Nov 2006

2 What is focused crawling? Crawling vs. focused crawling (diagram labels: seed page, target page)

3 Crawling methods
Web search algorithms:
– Breadth-first (used in standard crawling)
– Best-first (used in focused crawling)
– Both are local-search strategies
Web analysis algorithms:
– Content-based web analysis: page text, title, URL, page layout
– Link-based web analysis: hard to apply while crawling, because the search graph is not yet completely known
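To make the difference between the two search strategies on slide 3 concrete, here is a minimal sketch. The toy link graph, the relevance scores, and the function names are hypothetical and chosen only for illustration; the slides do not say how the crawler scores URLs.

```python
import heapq
from collections import deque


def breadth_first(graph, seed, limit):
    """Visit pages in FIFO order, as a standard crawler would."""
    frontier, seen, order = deque([seed]), {seed}, []
    while frontier and len(order) < limit:
        url = frontier.popleft()
        order.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


def best_first(graph, seed, limit, score):
    """Always expand the highest-scoring URL currently in the frontier."""
    # heapq is a min-heap, so scores are negated; the counter breaks
    # ties in insertion order.
    counter = 0
    frontier, seen, order = [(-score(seed), counter, seed)], {seed}, []
    while frontier and len(order) < limit:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                counter += 1
                heapq.heappush(frontier, (-score(link), counter, link))
    return order


# Hypothetical link graph and relevance scores (assumptions, not from the slides).
GRAPH = {"seed": ["a", "b"], "a": ["c"], "b": ["target"], "c": [], "target": []}
SCORES = {"seed": 0.5, "a": 0.1, "b": 0.9, "c": 0.1, "target": 1.0}

bfs_order = breadth_first(GRAPH, "seed", 5)
best_order = best_first(GRAPH, "seed", 5, SCORES.get)
```

On this toy graph the breadth-first crawl reaches the target page last, while the best-first crawl reaches it third. Both only ever choose among URLs already in the frontier, which is why the slide calls them local-search strategies.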

4 Related work
– Naïve Bayes crawler: the relevance score is the cosine similarity between page and topic
– IBM focused crawler: introduces a distiller to find topic hubs
– CORA crawler: assigns a Q-value according to the number of target pages in the neighborhood
– Context focused crawler: introduces a link hierarchy
– Automatic Publication Data Gatherer: classifies the web page without the page
– PaSE: locates publications using a search engine
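The cosine-similarity relevance score mentioned on slide 4 can be sketched as follows. The slide does not specify tokenization or term weighting, so the whitespace tokenizer and raw term counts used here are assumptions, and `cosine_similarity` is an illustrative name rather than a function from any of the cited crawlers.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-count vectors."""
    # Assumption: lowercase whitespace tokens with raw counts (no TF-IDF).
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

A page that shares no terms with the topic description scores 0, and a page with the same term distribution scores 1 (up to floating-point error), so the value can be used directly as a frontier priority.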

5 General framework
Components: repository, page fetch unit, URL filter, URL extractor, frontier, classifier, feature extractor, term extraction module
Note: highly dependent on the seed pages

6 Baseline system

7 Three stages of the crawling

8 Framework for upgraded system

9 Diagram labels: target URL, search engine, more pages, term extraction

10

11
                          Baseline system   Upgraded system
Publication pages found   45                117
Precision                 3.21%             8.36%
Recall                    26.63%            69.23%
F1                        0.057             0.149
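As a quick consistency check on the slide 11 figures, the F1 values can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The function and variable names below are illustrative, not from the presentation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on slide 11, converted from percentages.
baseline_f1 = f1_score(0.0321, 0.2663)
upgraded_f1 = f1_score(0.0836, 0.6923)
```

Rounded to three decimals, these reproduce the reported 0.057 and 0.149. The recall figures also agree with each other: 45 / 0.2663 and 117 / 0.6923 both imply roughly 169 publication pages in total.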

