Metasearch engine for Austrian research information
Marek Andričík
Vienna University of Technology
Outline: Search engines, Metasearch engines, Prototype
Search engines
- General engines: big, well-known engines that index “everything” (e.g. Google, AltaVista).
  - Proprietary, closed-source software.
  - Incompatible query languages.
- Specialized engines: smaller, topic-based or restricted to one area.
  - Lower hardware requirements.
  - Many open-source solutions available.
- Growth: the first engine (1994) indexed hundreds of thousands of documents; three years later, tens of millions; today, billions.
- Query volume has grown from thousands to hundreds of millions of queries per day.
Search engine problems
Search results can be unsatisfactory when:
- the engine does not know about the document;
- the document has changed and has not been re-indexed;
- the document is not directly accessible, only through a special (usually web) interface.
With several competing engines, it becomes more likely that at least one of them has indexed a particular document that the others have missed.
Metasearch engines
- Metasearch engines appeared one year after the first search engines.
- A metasearch engine has no index of its own and does not access other engines' indexes directly; instead, it sends queries to those engines.
- What metasearch can and cannot solve:
  - It will not find any new documents.
  - It will not help with tracking document changes.
  - It can easily reach documents behind proprietary interfaces.
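Metasearch engine problems
- Search engines differ in their query languages.
- The primary query must therefore be transformed into a set of secondary queries, one per engine.
- A metasearch engine can either:
  - define a common, simplified query grammar, or
  - simplify the primary query for each engine.
- In the second case, the results can differ from what the original query would return.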
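
The second option, simplifying the primary query per engine, can be illustrated with a minimal Python sketch. It assumes a flat query made of terms, "quoted phrases", trailing-asterisk wildcards and the NOT operator (no parentheses). The capability profiles `full` and `basic` and the function `to_secondary` are hypothetical; they do not reproduce the prototype's feature table or its Perl code. Unsupported features are dropped or degraded, which is exactly why the results can differ.

```python
import re

# Hypothetical capability profiles, for illustration only.
ENGINES = {
    "full": {"NOT", "phrase", "wildcard"},
    "basic": set(),
}

# A token is either a "quoted phrase" or a run of non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def to_secondary(primary: str, caps: set[str]) -> str:
    """Simplify a flat primary query to the features one engine supports."""
    out: list[str] = []
    skip_operand = False
    for tok in TOKEN_RE.findall(primary):
        if skip_operand:
            # Drop the operand of an unsupported NOT; the query gets broader.
            skip_operand = False
            continue
        if tok == "NOT" and "NOT" not in caps:
            skip_operand = True
            continue
        is_phrase = len(tok) >= 2 and tok.startswith('"') and tok.endswith('"')
        if is_phrase and "phrase" not in caps:
            # Degrade the phrase to its individual words.
            out.extend(tok[1:-1].split())
            continue
        if tok.endswith("*") and "wildcard" not in caps:
            # Strip the truncation asterisk and search for the bare stem.
            tok = tok.rstrip("*")
            if not tok:
                continue
        out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    query = 'research "data mining" NOT patent*'
    for name, caps in ENGINES.items():
        print(name, "->", to_secondary(query, caps))
```

For the query `research "data mining" NOT patent*`, the `basic` profile yields `research data mining`: a broader query that an engine without phrase, wildcard or NOT support can still execute.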
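Prototype
For each query:
1. The primary query is transformed into a set of secondary queries.
2. The secondary queries are submitted in parallel.
3. The results are parsed serially.
4. The results are sorted by ranking.
5. The final list is shown.
The prototype is implemented as a CGI program written in Perl.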
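
The five steps can be sketched in Python (the prototype itself is a Perl CGI program). This is an illustration under assumptions: engines are plain callables standing in for HTTP requests, each returns hypothetical `url<TAB>score` lines, and `metasearch`, `parse` and the `timeout` parameter are names introduced here. Engines that fail or miss the timeout are simply left out, and the final sort naively compares scores across engines.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

# Hypothetical engine interface: secondary query in, raw result text out.
Engine = Callable[[str], str]
Hit = tuple[str, float]  # (url, score)


def parse(raw: str) -> list[Hit]:
    """Parse hypothetical 'url<TAB>score' lines into (url, score) pairs."""
    hits: list[Hit] = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        url, score = line.split("\t")
        hits.append((url, float(score)))
    return hits


def metasearch(primary: str,
               engines: dict[str, Engine],
               translate: Callable[[str, str], str],
               timeout: float = 5.0) -> list[Hit]:
    # Step 1: primary query -> one secondary query per engine.
    secondary = {name: translate(primary, name) for name in engines}

    # Step 2: submit all secondary queries in parallel.
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = [pool.submit(engine, secondary[name])
               for name, engine in engines.items()]
    done, _ = wait(futures, timeout=timeout)
    # Do not block on engines that missed the timeout; their results are ignored.
    pool.shutdown(wait=False, cancel_futures=True)

    # Step 3: parse the finished responses one after another.
    results: list[Hit] = []
    for future in done:
        if future.exception() is None:  # skip engines that raised an error
            results.extend(parse(future.result()))

    # Step 4: sort by score, best first (naive: assumes comparable scores).
    results.sort(key=lambda hit: hit[1], reverse=True)
    # Step 5: the caller displays this final list.
    return results


if __name__ == "__main__":
    demo_engines = {
        "a": lambda q: "http://a.example/1\t0.9\nhttp://a.example/2\t0.4",
        "b": lambda q: "http://b.example/1\t0.7",
    }
    for url, score in metasearch("research", demo_engines, lambda q, name: q):
        print(f"{score:.1f} {url}")
```

Threads suit this step because each engine call mostly waits on the network; parsing afterwards in a single loop mirrors the serial parsing described above.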
Table of features
The table records:
- Boolean capabilities (AND, OR, NOT, parentheses, phrase support, asterisk wildcard);
- covered categories (Persons, Institutions, Projects, Results).
It covers 5 search engines: our own mnoGoSearch search engine, Dissertationsdatenbank, AURIS, Cordis, DEPATISnet.
Ranking, customization
- Little information is available about how engines rank documents; they usually do not show a numerical ranking.
- The prototype:
  - preserves the order of already sorted partial results;
  - prefers links whose title matches the query;
  - sorts using the overall ranking number of every search engine.
- Users can have their own login and customize several parameters (ranking, sorting, timeouts, history, language).
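
One possible way to combine the three ranking rules is sketched below in Python. The scoring formula (engine weight divided by 1-based position) and the names `Hit`, `merge` and `weights` are hypothetical, not the prototype's actual formula. Because a positive weight divided by a growing position always decreases, hits from one engine keep their original relative order within each group. Title matches are placed ahead of all other hits, so in this sketch the title preference can override an engine's own order.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    title: str
    engine: str
    position: int  # 0-based rank in that engine's already sorted result list


def merge(partial: dict[str, list[tuple[str, str]]],
          weights: dict[str, float],
          query: str) -> list[Hit]:
    """Merge per-engine (url, title) lists into one ranked list.

    `weights` is a hypothetical overall ranking number per engine (must be > 0).
    """
    terms = [term.lower() for term in query.split()]
    hits = [Hit(url, title, engine, pos)
            for engine, results in partial.items()
            for pos, (url, title) in enumerate(results)]

    def title_match(hit: Hit) -> bool:
        return any(term in hit.title.lower() for term in terms)

    def score(hit: Hit) -> float:
        # Decreases with position, so each engine's own order is kept
        # inside the title-match and non-match groups.
        return weights.get(hit.engine, 1.0) / (hit.position + 1)

    # Title matches first (False sorts before True), then highest score.
    return sorted(hits, key=lambda hit: (not title_match(hit), -score(hit)))


if __name__ == "__main__":
    partial = {
        "A": [("http://a/1", "Annual report"), ("http://a/2", "Quantum research group")],
        "B": [("http://b/1", "Research projects in Vienna"), ("http://b/2", "Contact")],
    }
    for hit in merge(partial, {"A": 2.0, "B": 1.5}, "research"):
        print(hit.engine, hit.url, hit.title)
```

With these weights and the query `research`, the two title matches (`http://b/1`, then `http://a/2`) come first, followed by `http://a/1` and `http://b/2`.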
Conclusion
The prototype is still a work in progress, but it already offers usable functionality.