1
An Adaptive Crawler for Locating Hidden-Web Entry Points
Problem: Several million pages on the Internet are hidden in the deep web.
How can existing methods for discovering topic-specific deep-web sources from the surface web be improved using learning techniques?
Coverage and/or efficiency of topic-specific deep-web discovery can be greatly increased through informed search, but tunnel vision can limit coverage.
2
ACHE (Adaptive Crawler for Hidden Web Entries) builds off FFC (Form-Focused Crawler)
[Diagram labels: FFC, ACHE, Evaluation, Optimisation & Exploration]
1. N priority queues, based on estimated distance, weighted by likelihood (link classifier)
2. Benefit as topic likelihood (page classifier)
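The N-queue frontier in point 1 can be sketched roughly as follows. This is a minimal illustration, not the FFC or ACHE implementation: the class name `MultiLevelFrontier`, the `push`/`pop` methods, and the `level_probs` argument are all hypothetical, and the policy shown (file each link under its most likely distance level, then always crawl from the nearest non-empty level) is an assumption about how "estimated distance, weighted by likelihood" is applied.

```python
import heapq
from itertools import count


class MultiLevelFrontier:
    """Hypothetical crawl frontier: one priority queue per estimated distance level."""

    def __init__(self, n_levels):
        # Queue i holds links believed to be i hops away from a searchable form.
        self._queues = [[] for _ in range(n_levels)]
        self._tie = count()  # insertion counter so equal scores pop in FIFO order

    def push(self, url, level_probs):
        # level_probs[i] is an assumed link-classifier estimate that `url`
        # lies i hops from a form page; there must be one value per level.
        if len(level_probs) != len(self._queues):
            raise ValueError("expected one probability per level")
        level = max(range(len(level_probs)), key=level_probs.__getitem__)
        # heapq is a min-heap, so the likelihood is negated to pop the best first.
        heapq.heappush(self._queues[level], (-level_probs[level], next(self._tie), url))

    def pop(self):
        # Prefer the nearest level; within a level, the most likely link.
        for queue in self._queues:
            if queue:
                _, _, url = heapq.heappop(queue)
                return url
        return None  # frontier is empty


frontier = MultiLevelFrontier(n_levels=3)
frontier.push("a", [0.2, 0.7, 0.1])    # most likely one hop away
frontier.push("b", [0.6, 0.3, 0.1])    # most likely zero hops away
frontier.push("c", [0.9, 0.05, 0.05])  # zero hops away, higher likelihood
order = [frontier.pop() for _ in range(4)]
```

With these inputs, the two level-0 links are returned before the level-1 link, and the more likely of the two comes first. A crawler following point 2 would then score each fetched page with a page classifier; that step is not shown here.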
3
Evaluation
ACHE can significantly outperform FFC for some topics where classification may be biased (e.g. bookstores).
It does not introduce significant overhead, despite the extra components.
What about heterogeneous databases?
What about limited search criteria in forms?
4
CSE494 Links
A huge portion of the web is “deep” and difficult to search.
Several applications are interested in topic-specific mining of the deep web.
FFC uses link neighbourhood text for distance classification and uses traditional classification techniques for staying on topic.
Has an unfortunate name. (e.g. NBC, Shingles)