Inf 141 Information Retrieval Winter 2008

1 Websphinx & Webgraph

2 Assignment 3
See the course webpage for specifications. Due Friday, Feb 8th.
Work in groups of 2-3 people; register your team with the subject line "Inf 141 Team Registration".
Train your group: each member of your group must be able to run your architecture on their own for Assignment 04.
Quiz next Wednesday.

3 Assignment 3

4 Websphinx
www.cs.cmu.edu/~rcm/websphinx/
To write your own crawler, extend class Crawler and override shouldVisit() and visit().
visit(): each page the crawler retrieves is passed to visit() for user-defined processing.
shouldVisit(Link l): callback that tests whether a link should be traversed. The default returns true for all links; override it for other behaviors.
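The override pattern on this slide can be sketched without the Websphinx jar. The classes below are hypothetical stand-ins (SketchCrawler, SiteCrawler are not Websphinx names): the base class "fetches" pages from an in-memory map instead of the network, calls visit() for every page it reaches, and follows only the links that shouldVisit() accepts. Treat it as an illustration of the callback shape under those assumptions, not as Websphinx's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for websphinx.Crawler: the "web" is an in-memory
// map from a page URL to the URLs it links to, so no network is needed.
abstract class SketchCrawler {
    private final Map<String, List<String>> web;
    private final List<String> roots = new ArrayList<>();

    SketchCrawler(Map<String, List<String>> web) { this.web = web; }

    void addRoot(String url) { roots.add(url); }

    // Same default as described on the slide: traverse every link.
    boolean shouldVisit(String link) { return true; }

    // User-defined processing, called once per retrieved page.
    abstract void visit(String page);

    // Breadth-first crawl: roots are always visited; other pages are
    // enqueued only if not seen before and shouldVisit() accepts them.
    void run() {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> frontier = new ArrayDeque<>(roots);
        while (!frontier.isEmpty()) {
            String page = frontier.removeFirst();
            visit(page);
            for (String link : web.getOrDefault(page, List.of())) {
                if (!seen.contains(link) && shouldVisit(link)) {
                    seen.add(link);
                    frontier.addLast(link);
                }
            }
        }
    }
}

// Example subclass: stay under one URL prefix and record visited pages.
class SiteCrawler extends SketchCrawler {
    private final String prefix;
    final List<String> visited = new ArrayList<>();

    SiteCrawler(Map<String, List<String>> web, String prefix) {
        super(web);
        this.prefix = prefix;
    }

    @Override
    boolean shouldVisit(String link) { return link.startsWith(prefix); }

    @Override
    void visit(String page) { visited.add(page); }
}
```

With Websphinx itself the shape is the same: subclass its Crawler class, override shouldVisit(Link) and visit(), and hand it your seed links; check the Websphinx API documentation for the exact method names for setting roots and starting the crawl.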

5 Websphinx
Create an array consisting of your seed set of links.
Look at the Link class: a Link points to a web page. You can make a link from a string URL, or from a start tag and end tag.
Look at the Page class: it mainly supports automatically parsed HTML pages. Parsing produces a list of tags, a list of words, an HTML parse tree, and a list of links. You can also construct pages yourself.
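The two steps on this slide (build a seed array from string URLs, then get a page's links) can be sketched with only the JDK. Here java.net.URI stands in for Websphinx's Link class, and a naive regex stands in for Websphinx's HTML parser; the class name SeedsAndLinks and its methods are hypothetical. The regex only handles double-quoted href attributes, which is an assumption a real parser would not make.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: java.net.URI plays the role of a Link, and the
// regex is a deliberately naive stand-in for real HTML parsing.
class SeedsAndLinks {
    // Matches href="..." (double quotes only), case-insensitively.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Step 1: the seed array, one entry per string URL.
    // URI.create throws IllegalArgumentException on a malformed URL.
    static URI[] seeds(String... urls) {
        URI[] result = new URI[urls.length];
        for (int i = 0; i < urls.length; i++) {
            result[i] = URI.create(urls[i]);
        }
        return result;
    }

    // Step 2: extract href targets and resolve relative ones against the page URL.
    static List<URI> links(URI base, String html) {
        List<URI> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.add(base.resolve(m.group(1)));
        }
        return out;
    }
}
```

Resolving against the page's own URL matters because crawled pages often use relative links such as b.html; Websphinx's Page class does its own parsing and link extraction, so in the assignment you would use its list of links rather than a regex.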

6 Webgraph
Webgraph is a framework for studying the web graph.
Use the ArrayListMutableGraph class: a mutable graph class based on IntArrayList.
The constructor ArrayListMutableGraph(ImmutableGraph g) creates a new mutable graph by copying a given immutable graph.
See also the ImmutableGraph class.
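The idea behind a list-based mutable graph can be sketched in plain Java. SketchMutableGraph below is a hypothetical stand-in, not WebGraph's class: nodes are numbered 0..n-1, each node keeps a growable list of successor IDs (JDK lists here, where the real class uses IntArrayList), one constructor copies a fixed adjacency array (playing the role of the ImmutableGraph), and immutableSnapshot() freezes the current state. All method names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable, list-based directed graph.
class SketchMutableGraph {
    // successors.get(u) holds the targets of all arcs leaving node u.
    private final List<List<Integer>> successors = new ArrayList<>();

    SketchMutableGraph() {}

    // Copy constructor: builds a mutable graph from an immutable
    // adjacency array, where immutable[u] lists u's successors.
    SketchMutableGraph(int[][] immutable) {
        addNodes(immutable.length);
        for (int u = 0; u < immutable.length; u++) {
            for (int v : immutable[u]) addArc(u, v);
        }
    }

    // Appends `count` new nodes with no outgoing arcs.
    void addNodes(int count) {
        for (int i = 0; i < count; i++) successors.add(new ArrayList<>());
    }

    // Adds the arc from -> to. An unknown `from` raises
    // IndexOutOfBoundsException from List.get; an unknown `to` is rejected here.
    void addArc(int from, int to) {
        if (to < 0 || to >= successors.size()) {
            throw new IllegalArgumentException("no node " + to);
        }
        successors.get(from).add(to);
    }

    int numNodes() { return successors.size(); }

    int outdegree(int node) { return successors.get(node).size(); }

    // Copies the current arcs into a sorted adjacency array; later
    // changes to this graph do not affect the returned snapshot.
    int[][] immutableSnapshot() {
        int[][] out = new int[successors.size()][];
        for (int u = 0; u < out.length; u++) {
            out[u] = successors.get(u).stream().mapToInt(Integer::intValue).sorted().toArray();
        }
        return out;
    }
}
```

For a crawler, each visited URL would be mapped to a node ID and each followed link recorded with addArc; with the real library, check the WebGraph Javadoc for ArrayListMutableGraph and ImmutableGraph for the exact methods.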

7 Questions?

