CourseCrawler
Matt Berntsen, Don Frehulfer, Evan Kaiser
General Purpose
Tool that can be used to find definitions for terms and acronyms
Crawls through a set list of glossary web sites and indexes all terms and definitions
Provides a mechanism for users to rank the definitions
Search Functionality
1. User visits main page
2. User enters and submits query
3. Resulting terms/definitions are generated and formatted into a table, sorted by rank
4. Users may follow links to pages or enter new searches
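The search steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Definition` record, the `search` and `format_table` helpers, the exact-match lookup, and the "higher rank first" ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    # Hypothetical record layout; the real schema is not given in the slides.
    term: str
    text: str
    source_url: str
    rank: int  # assumed: user-assigned score, higher is better


def search(entries, query):
    """Return entries whose term matches the query, best-ranked first."""
    needle = query.strip().lower()
    matches = [e for e in entries if e.term.lower() == needle]
    return sorted(matches, key=lambda e: e.rank, reverse=True)


def format_table(results):
    """Render the results as a plain-text table, one definition per row."""
    lines = [f"{'Rank':>4}  {'Term':<8}  Definition (source)"]
    for e in results:
        lines.append(f"{e.rank:>4}  {e.term:<8}  {e.text} ({e.source_url})")
    return "\n".join(lines)
```

A real front end would emit an HTML table with links to the source pages; the plain-text table here only shows the lookup and the sort by rank.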
Administrative Functionality
1. Client logs in to Administrative page
2. Existing sites are listed with corresponding site names
3. User clicks to add/delete/edit entries
4. User logs out
Crawler Functionality
1. User sets crawling interval
2. Crawler wakes up after the specified time
3. Crawler runs by traversing the websites on the user-provided list
4. Crawler updates the definition database
5. Crawler then sleeps for the specified time
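The sleep/crawl/update cycle above could look something like this. It is a sketch under assumptions: `run_crawler`, `crawl_site`, and `update_database` are hypothetical names, and the `cycles` and `sleep` parameters exist only so the loop can be stopped and exercised without real waiting.

```python
import time


def run_crawler(sites, crawl_site, update_database, interval_seconds,
                cycles=None, sleep=time.sleep):
    """Sleep for the interval, crawl every listed site, update the database, repeat.

    crawl_site(url) and update_database(url, definitions) are hypothetical
    callbacks supplied by the caller. cycles=None means run forever.
    """
    completed = 0
    while cycles is None or completed < cycles:
        sleep(interval_seconds)  # crawler wakes up after the specified time
        for url in sites:  # traverse the user-provided list
            try:
                definitions = crawl_site(url)
            except Exception as err:
                # Assumed policy: one failing site does not stop the run.
                print(f"skipping {url}: {err}")
                continue
            update_database(url, definitions)  # update the definition database
        completed += 1
```

In a server process this loop would typically run in its own thread so that searches stay responsive while a crawl is in progress.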
Program Structure (Simplified)
Webserver frontend: handles search and administrative functionality
Crawler: crawls through pages, harvesting terms and their corresponding definitions
Database backend: allows efficient storage and retrieval of large amounts of data
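To make the crawler and database roles concrete, here is one possible sketch of harvesting term/definition pairs and storing them. The slides do not say how glossary pages are parsed or which database is used, so everything here is an assumption: glossaries marked up as `<dt>`/`<dd>` pairs with explicit closing tags, an SQLite store, and the names `GlossaryParser`, `store`, and the `definitions` table.

```python
import sqlite3
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collect (term, definition) pairs from <dt>/<dd> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None     # "dt" or "dd" while inside one, else None
        self._term = ""      # most recent term seen
        self._buffer = []    # text chunks of the current element

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # nested markup such as </b>, or a tag we do not track
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "dt":
            self._term = text
        elif self._term:
            self.pairs.append((self._term, text))
        self._tag = None


def store(conn, site_url, pairs):
    """Insert new definitions and refresh existing ones, keeping user ranks."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS definitions ("
        "term TEXT, definition TEXT, site_url TEXT, "
        "rank INTEGER DEFAULT 0, UNIQUE(term, site_url))"
    )
    for term, definition in pairs:
        conn.execute(
            "INSERT OR IGNORE INTO definitions (term, definition, site_url) "
            "VALUES (?, ?, ?)",
            (term, definition, site_url),
        )
        conn.execute(
            "UPDATE definitions SET definition = ? "
            "WHERE term = ? AND site_url = ?",
            (definition, term, site_url),
        )
    conn.commit()
```

The insert-then-update pair is a deliberate choice in this sketch: a re-crawl refreshes the definition text without resetting a rank that users have already assigned.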
Program Structure (Graphical)
[Diagram components: INTERNET, DB, Database Interface, Crawler/Parser (Thread), HTML_ENGINE (Package), CourseWebServer]
Program Structure (UML)
Demonstration
Please wait while we fire up Mozilla.
Random IE Suxx0rs message