Crawler-Based Search Engine
By Ryan Caplet, Morris Wright and Bryan Chapman
Background
► Crawler-based search engine
A script/bot that browses the web in a methodical, automated manner (Wikipedia, “web crawler”)
The bot starts with seeds (a small list of URLs), visits them, and uses the links it finds to build a bigger list of sites to visit. And so on…
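The seed-and-expand idea can be sketched as a breadth-first crawl over a queue of URLs. This is only an illustration, not the project's crawler: the `crawl` function, the `fetch_links` callback, and the example URLs are all hypothetical, and an in-memory dictionary stands in for real page fetching so the sketch runs without network access.

```python
from collections import deque

def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl: visit the seeds, then the links they reveal, and so on."""
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so nothing is queued twice
    visited = []              # URLs in the order they were crawled
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# A tiny in-memory "web" (hypothetical URLs) replaces real HTTP requests.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": [],
    "http://d.example/": ["http://c.example/"],
}

def fake_fetch_links(url):
    """Return the outgoing links of a page in the fake web."""
    return FAKE_WEB.get(url, [])

order = crawl(["http://a.example/"], fake_fetch_links)
print(order)
```

A real crawler would replace `fake_fetch_links` with a function that downloads the page and extracts its links; the `max_pages` cap keeps the list from growing without bound.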
Motivation
► We are all interested in how a search engine works.
► Building one ourselves gives us more experience with various programming languages and tools.
Initial Priorities
► Set up server
► Set up database
► Make both fully functional
► Set up indexer
► Make indexer work with the web page
► Ranking
Projected Team Member Breakdown
► Bryan Chapman
The Crawler
Analyzing Files
► Ryan Caplet
Search Functions
Test Functions
► Morris Wright
UI Development
Database Management
Web Server Account Manager
Development Environment
► Use of Linux and Apache Web Server
► A possible place for development is the UCONN ECS web server
► Use of MySQL
Programming Languages
► PHP
For web page programming
► Perl or Python
Possibly for other scripting needs
► HTML
For displaying web pages
► Structured Query Language (SQL)
Interaction with the database
Database Management - Projected
► Four Fields
ID
Title
URL
Keywords
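A minimal sketch of what a table with these four fields might look like. Several things here are assumptions rather than the project's design: the table name `pages`, the column types and constraints, storing keywords as one text column, and the sample row. Python's built-in `sqlite3` module is used as a stand-in so the example runs anywhere; the project itself plans to use MySQL, whose column types differ slightly.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the planned MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding the four projected fields.
conn.execute("""
    CREATE TABLE pages (
        id       INTEGER PRIMARY KEY,   -- assigned automatically on insert
        title    TEXT NOT NULL,
        url      TEXT NOT NULL UNIQUE,  -- assumption: each URL is stored once
        keywords TEXT                   -- assumption: one comma-separated string
    )
""")

# Insert an illustrative row (made-up page) and read it back.
conn.execute(
    "INSERT INTO pages (title, url, keywords) VALUES (?, ?, ?)",
    ("Example Page", "http://www.example.edu/", "example, sample, demo"),
)
conn.commit()

row = conn.execute("SELECT id, title, url, keywords FROM pages").fetchone()
print(row)
```

Keeping keywords in a single column is the simplest reading of the slide; a separate keyword table would make ranking queries easier but goes beyond the four fields listed.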
Projected Security Concerns
► Prevent Injections
► Make sure search queries match what is in the database
► Filter through webpage tags
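Two of these concerns can be illustrated with a short sketch, assuming the injections in question are SQL injections through the search box and that filtering tags means reducing a page to its visible text. The `search` and `strip_tags` helpers, the `pages` table, and the sample data are hypothetical; `sqlite3` again stands in for MySQL, and the project's PHP code would use that language's own prepared statements instead.

```python
import sqlite3
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text and ignores tags, scripts, and styles."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def strip_tags(html):
    """Return the text content of an HTML fragment with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def search(conn, term):
    """Look up pages whose keywords contain the search term."""
    # The ? placeholder passes the term as data, so it is never parsed as SQL.
    pattern = "%" + term + "%"
    return conn.execute(
        "SELECT url FROM pages WHERE keywords LIKE ?", (pattern,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, url TEXT, keywords TEXT)")
conn.execute(
    "INSERT INTO pages (title, url, keywords) VALUES (?, ?, ?)",
    ("Home", "http://www.example.edu/", "campus news"),
)

hits = search(conn, "news")            # ordinary query
attack = search(conn, "' OR '1'='1")   # injection attempt, treated as plain text
text = strip_tags("<p>Hello <b>world</b><script>alert(1)</script></p>")
```

Because the search term never becomes part of the SQL text, the injection attempt simply matches nothing. Note that `%` and `_` inside a term still act as `LIKE` wildcards in this sketch.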
Basic Use
► Our basic scope is to search the UCONN network for pages that match a search query
► URLs that are crawled will be added to an SQL database.
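Keeping the crawl inside one network could be done by checking each URL's host before it is queued. The sketch below is illustrative only: the `in_scope` helper is hypothetical, and using `uconn.edu` as the allowed domain is an assumption about what "the UCONN network" means here.

```python
from urllib.parse import urlparse

ALLOWED_DOMAIN = "uconn.edu"  # assumption: the scope is this domain and its subdomains

def in_scope(url, domain=ALLOWED_DOMAIN):
    """Return True if the URL is an http(s) link on the domain or a subdomain of it."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Compare whole host labels so look-alike hosts such as
    # "uconn.edu.evil.example" are rejected.
    return host == domain or host.endswith("." + domain)

candidates = [
    "http://www.engr.uconn.edu/page",
    "http://uconn.edu.evil.example/",
    "mailto:someone@uconn.edu",
]
allowed = [u for u in candidates if in_scope(u)]
```

A crawler would apply this check to every discovered link and store only the URLs that pass.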
Test Plans
► Test plans for this project will be…
Keeping rendering consistent across different OSs/browsers
Checking that search queries match what is in the database
Conclusion
And that is it!