Published by Samantha Casey. Modified over 8 years ago.
1
Autumn 2011
Web Information Retrieval (Web IR)
Handout #11: FICA: A Fast Intelligent Crawling Algorithm
Ali Mohammad Zareh Bidoki
ECE Department, Yazd University
alizareh@yaduni.ac.ir
2
Web Crawling
Search engines do not index the entire Web.
Therefore, a crawler has to focus on the most valuable and appealing pages.
To do this, a better crawling criterion is required:
– FICA
3
Breadth-First Crawling
[Figure: example web graph with nodes p, q, r, s, t, u, v, w, x, y, z]
BFS advantages
Why is it an acceptable algorithm?
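As a point of reference for the rest of the handout, breadth-first crawling can be sketched as a first-in, first-out frontier. The link structure below is hypothetical (the edges of the slide's figure are not preserved in this transcript), and `bfs_crawl_order` is an illustrative name, not something taken from the slides.

```python
from collections import deque


def bfs_crawl_order(graph, seed):
    """Return pages in the order a breadth-first crawler would visit them."""
    order = []
    seen = {seed}
    frontier = deque([seed])  # FIFO queue: oldest discovered page first
    while frontier:
        page = frontier.popleft()
        order.append(page)
        # graph.get stands in for downloading the page and extracting links.
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical edges over the node names used on the slide.
example_graph = {
    "p": ["q", "r", "s", "t"],
    "s": ["u", "v", "w"],
    "t": ["y", "z"],
    "z": ["x"],
}
order = bfs_crawl_order(example_graph, "p")
print(order)
```

Pages are visited level by level, so every page one click from the seed is fetched before any page two clicks away.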
4
Logarithmic Distance Crawling
[Figure: example web graph with nodes p, q, r, s, t, u, v, w, x, y, z]
d_pt = log 4 ≈ 0.60
d_pz = log 4 + log 2 ≈ 0.90
d_pv = log 4 + log 3 ≈ 1.08
When i points to j, then: [formula not preserved in this transcript]
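The worked values on this slide are consistent with base-10 logarithms and with each link i → j costing the logarithm of the out-degree of i. That reading is an assumption, since the slide's own formula did not survive extraction. Under it, a minimal sketch that reproduces the three distances up to rounding could look like this. The graph edges and the name `path_distance` are hypothetical.

```python
import math


def path_distance(graph, path):
    """Sum log10(out-degree) of every page on the path except the last one.

    Assumption: a link i -> j costs log10(out-degree of i).
    """
    total = 0.0
    for page, nxt in zip(path, path[1:]):
        links = graph[page]
        if nxt not in links:
            raise ValueError(f"{page} does not link to {nxt}")
        total += math.log10(len(links))
    return total


# Hypothetical edges chosen so that p has 4 out-links, t has 2 and s has 3.
graph = {
    "p": ["q", "r", "s", "t"],
    "s": ["u", "v", "w"],
    "t": ["y", "z"],
}
d_pt = path_distance(graph, ["p", "t"])       # log 4
d_pz = path_distance(graph, ["p", "t", "z"])  # log 4 + log 2
d_pv = path_distance(graph, ["p", "s", "v"])  # log 4 + log 3
print(round(d_pt, 2), round(d_pz, 2), round(d_pv, 2))
```

Under this reading, a page reached through pages with few out-links ends up closer to the seed than one reached through link-heavy pages.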
5
FICA
Intelligent surfer model
It is based on reinforcement learning
6
FICA (On-line)
[Figure: crawler architecture. Components: seeds (URL1, URL2, …), priority queue, FICA scheduler, downloader, Web, repository (text and metadata). Items passed between components: URLs, web pages.]
Distance is used as the priority value
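The scheduling idea on this slide, a priority queue keyed by distance, can be sketched with a binary heap. This is a simplified illustration, not the FICA scheduler itself: it assumes a link i → j adds log10(out-degree of i) to the distance and keeps the smallest distance seen so far, whereas the reinforcement-learning update that FICA uses is not given in these slides. The in-memory `graph` stands in for the downloader, and all names are hypothetical.

```python
import heapq
import math


def distance_ordered_crawl(graph, seeds):
    """Visit pages in order of increasing distance from the seeds."""
    best = {seed: 0.0 for seed in seeds}
    queue = [(0.0, seed) for seed in seeds]
    heapq.heapify(queue)
    done = set()
    downloaded = []
    while queue:
        dist, url = heapq.heappop(queue)  # smallest distance first
        if url in done:
            continue  # stale entry left behind by an earlier, worse distance
        done.add(url)
        downloaded.append(url)
        links = graph.get(url, [])  # stands in for fetching and parsing
        if not links:
            continue
        step = math.log10(len(links))  # assumed link cost
        for link in links:
            candidate = dist + step
            if link not in done and candidate < best.get(link, math.inf):
                best[link] = candidate
                heapq.heappush(queue, (candidate, link))
    return downloaded


# Hypothetical edges over the node names used in the earlier slides.
graph = {
    "p": ["q", "r", "s", "t"],
    "s": ["u", "v", "w"],
    "t": ["y", "z"],
}
order = distance_ordered_crawl(graph, ["p"])
print(order)
```

On this toy graph, y and z are fetched before u, v and w, because their parent t has fewer out-links than s. Ties are broken alphabetically by URL, which is an artifact of the tuple ordering in the heap rather than anything the slides specify.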
7
Comparison with Others
[Figure: crawler architecture used by the other algorithms. Components: seeds (URL1, URL2, …), downloader, Web, repository, partial ranking algorithm. Items passed between components: URLs and links.]
8
Experimental Results
The experiment was run on a UK web graph containing 18 million web pages.
We chose PageRank as the ideal ranking mechanism.
9
FICA Properties
Its time complexity is O(E log V)
– Complexity of Partial PageRank is [expression not preserved in this transcript]
FICA outperforms the other algorithms in discovering highly important pages
It requires little memory for computation
It is online and adaptive
10
FICA as a Ranking Algorithm
Algorithm          Kendall's Tau
Partial PageRank   0.18
Back Link          0.09
Breadth-first      0.11
OPIC               0.62
FICA               0.61
We used Kendall's tau metric to measure the correlation between two rank lists.
The ideal ranking is PageRank.
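To make the metric in the table concrete, here is a small sketch of Kendall's tau for two rankings of the same items. It implements the tie-free tau-a variant; which variant the experiments used is not stated in the slides, so that choice is an assumption, and the page names and rank values are made up for illustration.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items (no ties assumed)."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order x and y the same way.
        agreement = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical rank positions (1 = most important).
ideal_rank = {"p": 1, "q": 2, "r": 3, "s": 4}
crawler_rank = {"p": 1, "q": 3, "r": 2, "s": 4}
tau = kendall_tau(ideal_rank, crawler_rank)
print(tau)
```

A value of 1 means the two lists agree on every pair of pages, -1 means they disagree on every pair, and values near 0 mean little correlation.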
11
Dynamic Version of FICA