Inf 141 Information Retrieval Winter 2008
Websphinx & Webgraph
Assignment 3
See the course webpage for specifications.
Due Friday, Feb 8th.
Work in groups of 2-3 people.
Register by email with the subject: Inf 141 Team Registration
Train your group: each member must be able to run your architecture on their own for Assignment 04.
Quiz next Wednesday.
Websphinx
www.cs.cmu.edu/~rcm/websphinx/
To write your own crawler, extend the Crawler class and override shouldVisit() and visit().
visit(): each page is passed to the crawler's visit() method for user-defined processing.
shouldVisit(Link l): callback that tests whether a link should be traversed. The default returns true for all links; override it for other behavior.
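The two callbacks can be illustrated without the library itself. The sketch below is a simplified stand-in, not WebSphinx: CrawlerSketch, SameHostCrawler, demoWeb() and the in-memory map of pages are all hypothetical names introduced here, and links are plain strings rather than Link objects. It only shows how a default-true shouldVisit() and a user-defined visit() might fit into a crawl loop.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for the callback shape described above; this is NOT websphinx.Crawler.
abstract class CrawlerSketch {
    // Hypothetical in-memory "web" (URL -> outgoing links), so no network access is needed.
    private final Map<String, List<String>> web;

    CrawlerSketch(Map<String, List<String>> web) {
        this.web = web;
    }

    // Default: traverse every link. Subclasses override this to restrict the crawl.
    boolean shouldVisit(String link) {
        return true;
    }

    // User-defined processing for each page that is reached.
    abstract void visit(String url, List<String> outLinks);

    // Breadth-first crawl starting from the seed URLs.
    void run(String... seeds) {
        Deque<String> frontier = new ArrayDeque<String>();
        Set<String> seen = new HashSet<String>();
        for (String seed : seeds) {
            if (seen.add(seed)) frontier.add(seed);
        }
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            List<String> outLinks = web.getOrDefault(url, Collections.<String>emptyList());
            visit(url, outLinks);
            for (String link : outLinks) {
                // Only enqueue links the callback accepts and that were not seen before.
                if (shouldVisit(link) && seen.add(link)) frontier.add(link);
            }
        }
    }
}

// Example subclass: stays on one host and records the pages it visits.
class SameHostCrawler extends CrawlerSketch {
    final List<String> visited = new ArrayList<String>();
    private final String host;

    SameHostCrawler(Map<String, List<String>> web, String host) {
        super(web);
        this.host = host;
    }

    @Override
    boolean shouldVisit(String link) {
        return host.equals(URI.create(link).getHost());
    }

    @Override
    void visit(String url, List<String> outLinks) {
        visited.add(url);
    }

    // Tiny made-up link structure used only for the demonstration.
    static Map<String, List<String>> demoWeb() {
        Map<String, List<String>> web = new HashMap<String, List<String>>();
        web.put("http://a.example/", Arrays.asList("http://a.example/x", "http://b.example/"));
        web.put("http://a.example/x", Arrays.asList("http://a.example/"));
        web.put("http://b.example/", Arrays.asList("http://b.example/y"));
        return web;
    }

    public static void main(String[] args) {
        SameHostCrawler crawler = new SameHostCrawler(demoWeb(), "a.example");
        crawler.run("http://a.example/");
        System.out.println(crawler.visited);
    }
}
```

In this sketch the seeds are always visited and shouldVisit() filters only the links discovered afterwards; whether the real library treats its roots the same way should be checked against its documentation.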
Websphinx
Create an array consisting of your seed set of links.
Look at the Link class:
  Links to a webpage
  Make a link from a string URL
  Make a link from a start tag and end tag
Look at the Page class:
  Mainly supports automatically parsed HTML pages
  Parsing produces a list of tags, words, an HTML parse tree, and links
  Can make pages
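To make the ideas of a seed array and a parsed page concrete, here is a rough sketch that uses only the Java standard library. PageSketch and seeds() are hypothetical names, java.net.URI stands in for the Link class, and the regular expressions are a crude substitute for a real HTML parser that would only cope with simple, well-formed markup.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for a parsed page; this is NOT the WebSphinx Page class.
class PageSketch {
    // Matches double-quoted href attributes of anchor tags (an assumption about the input).
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    // Matches any tag, so that tags can be stripped before splitting the text into words.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    final List<URI> links = new ArrayList<URI>();
    final List<String> words = new ArrayList<String>();

    PageSketch(URI base, String html) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Resolve relative hrefs against the URL of the page itself.
            links.add(base.resolve(m.group(1)));
        }
        String text = TAG.matcher(html).replaceAll(" ").trim();
        if (!text.isEmpty()) {
            for (String w : text.split("\\s+")) words.add(w);
        }
    }

    // Builds the seed set as an array of links made from string URLs.
    static URI[] seeds(String... urls) {
        URI[] out = new URI[urls.length];
        for (int i = 0; i < urls.length; i++) out[i] = URI.create(urls[i]);
        return out;
    }

    public static void main(String[] args) {
        URI[] seedSet = seeds("http://example.com/", "http://www.cs.cmu.edu/~rcm/websphinx/");
        String html = "<html><body><p>Hello web</p><a href=\"about.html\">About us</a></body></html>";
        PageSketch page = new PageSketch(seedSet[0], html);
        System.out.println(page.links);
        System.out.println(page.words);
    }
}
```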
Webgraph
Webgraph is a framework for studying the web graph.
Use the ArrayListMutableGraph class:
  Mutable graph class based on IntArrayList
  ArrayListMutableGraph(ImmutableGraph g) creates a new mutable graph by copying a given immutable graph
View the ImmutableGraph class.
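The idea of a mutable graph built by copying a read-only one can be sketched in plain Java. Nothing below comes from the Webgraph library: ReadOnlyGraph and MutableGraphSketch are hypothetical stand-ins, and ordinary ArrayList<Integer> successor lists replace IntArrayList. The point is only that the copy can be changed without affecting the graph it was copied from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only view of a graph; a stand-in, NOT Webgraph's ImmutableGraph.
interface ReadOnlyGraph {
    int numNodes();
    List<Integer> successors(int node);
}

// Stand-in mutable graph backed by one successor list per node.
class MutableGraphSketch implements ReadOnlyGraph {
    private final List<List<Integer>> succ = new ArrayList<List<Integer>>();

    // Creates a graph with the given number of nodes and no arcs.
    MutableGraphSketch(int numNodes) {
        for (int i = 0; i < numNodes; i++) succ.add(new ArrayList<Integer>());
    }

    // Copy constructor: duplicates every successor list of the given graph.
    MutableGraphSketch(ReadOnlyGraph g) {
        this(g.numNodes());
        for (int x = 0; x < g.numNodes(); x++) succ.get(x).addAll(g.successors(x));
    }

    public int numNodes() {
        return succ.size();
    }

    public List<Integer> successors(int node) {
        return Collections.unmodifiableList(succ.get(node));
    }

    // Adds the arc x -> y unless it is already present.
    void addArc(int x, int y) {
        List<Integer> out = succ.get(x);
        if (!out.contains(y)) out.add(y);
    }

    int outdegree(int x) {
        return succ.get(x).size();
    }

    long numArcs() {
        long total = 0;
        for (List<Integer> out : succ) total += out.size();
        return total;
    }

    public static void main(String[] args) {
        MutableGraphSketch original = new MutableGraphSketch(3);
        original.addArc(0, 1);
        original.addArc(0, 2);
        original.addArc(1, 2);

        MutableGraphSketch copy = new MutableGraphSketch(original);
        copy.addArc(2, 0); // changes the copy only

        System.out.println("original arcs: " + original.numArcs());
        System.out.println("copy arcs: " + copy.numArcs());
    }
}
```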
Questions?