Metasearch engine for Austrian research information Marek Andričík Vienna University of Technology Search engines Metasearch engines Prototype.

Search engines
General, big, well-known search engines: they index “everything” (Google, AltaVista…). Proprietary, not open-source software. Incompatibilities in query languages.
Specialized, smaller engines: topic-based or area-restricted. Smaller hardware requirements. Many open-source solutions available.
Growth: the first engine, in 1994, indexed hundreds of thousands of documents; three years later, tens of millions; today, billions.
Queries: from thousands to hundreds of millions per day.

Search engine problems
Search engine results can be unsatisfactory when:
The engine does not know about the document.
The document has changed and was not re-indexed.
The document is not directly accessible, only through a special (usually web) interface.
The existence of several competing engines raises the chance that one engine has already indexed a particular document while the others have not.

Metasearch engines
Appeared one year after search engines did.
A metasearch engine has no index of its own, nor does it use the indexes of other search engines; it queries those engines through their search interfaces.
What metasearch can and cannot solve:
It will not find any new document.
It will not help with tracking changes.
It can easily access documents behind proprietary interfaces.

Metasearch engine problems
Query languages of search engines differ, so the primary query must be transformed into a set of secondary queries.
A metasearch engine can either:
Define a common simplified grammar.
Simplify the primary query.
In the second case, search results can differ.
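The transformation of a primary query into secondary queries could look roughly like the following Python sketch. It is only an illustration: the prototype itself is written in Perl, and the engine names, the `EngineSyntax` fields, and the syntaxes shown are hypothetical, not those of any real engine. The primary query is assumed to be a list of terms or phrases joined by an implicit AND.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSyntax:
    """Hypothetical description of one engine's query syntax."""
    and_op: str            # text placed between terms that must all match
    supports_phrase: bool  # whether the engine understands "quoted phrases"


# Illustrative engines only; not the syntax of any real search engine.
ENGINES = {
    "engine_a": EngineSyntax(and_op=" AND ", supports_phrase=True),
    "engine_b": EngineSyntax(and_op=" ", supports_phrase=False),
}


def to_secondary(terms, syntax):
    """Translate a primary query (terms/phrases, implicit AND) for one engine."""
    parts = []
    for term in terms:
        if " " in term and syntax.supports_phrase:
            parts.append(f'"{term}"')
        elif " " in term:
            # Simplification: the phrase degrades to its individual words,
            # so this engine may return different results than the others.
            parts.extend(term.split())
        else:
            parts.append(term)
    return syntax.and_op.join(parts)


primary = ["metasearch", "research information"]
secondary = {name: to_secondary(primary, s) for name, s in ENGINES.items()}
print(secondary)
```

For the engine without phrase support, the query is simplified rather than translated exactly, which is the case where search results can differ.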

Prototype
For each query:
The primary query is transformed into a set of secondary queries.
The secondary queries are submitted in parallel.
The results are parsed serially.
The results are sorted according to ranking.
The final list is shown.
The prototype is a CGI program written in Perl.
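The per-query steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Perl prototype: the two engines are stand-in functions that return canned text instead of making network requests, the response format (one title per line) is invented, and the position within each engine's list is used as a placeholder for the ranking.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in engines (assumption): each returns one result title per line.
def fake_engine_a(query):
    return f"A1 {query}\nA2 {query}"


def fake_engine_b(query):
    return f"B1 {query}"


ENGINES = {"engine_a": fake_engine_a, "engine_b": fake_engine_b}


def parse(engine, raw):
    """Turn a raw response into (engine, position, title) records."""
    return [(engine, pos, line)
            for pos, line in enumerate(raw.splitlines(), start=1)]


def metasearch(secondary_queries, timeout=5.0):
    # Step 1: submit all secondary queries in parallel.
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        futures = {name: pool.submit(ENGINES[name], query)
                   for name, query in secondary_queries.items()}
        raw = {}
        for name, future in futures.items():
            try:
                raw[name] = future.result(timeout=timeout)
            except Exception:
                raw[name] = ""  # a failed or slow engine contributes nothing

    # Step 2: parse the responses serially, one engine after another.
    hits = []
    for name in secondary_queries:
        hits.extend(parse(name, raw[name]))

    # Step 3: sort; position stands in for the ranking in this sketch.
    hits.sort(key=lambda hit: (hit[1], hit[0]))
    return hits


results = metasearch({"engine_a": "x", "engine_b": "x"})
for engine, position, title in results:
    print(position, engine, title)
```

Submitting in parallel means the total waiting time is bounded by the slowest engine (or the timeout) rather than by the sum of all response times.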

Table of features
Contains:
Boolean capabilities (AND, OR, NOT, parentheses, phrase support, asterisk).
Covered categories (Persons, Institutions, Projects, Results).
Five search engines: our mnoGoSearch search engine, Dissertationsdatenbank, AURIS, Cordis, DEPATISnet.

Ranking, customization
Not much information is available about document ranking; engines usually do not show a numerical ranking.
What the prototype does:
Preserves already sorted partial results.
Prefers links with a match in the title.
Sorts using the overall ranking number of every search engine.
Users can have their own login and customize several parameters (ranking, sorting, timeouts, history, or language).
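One possible reading of these three rules is sketched below in Python. The slides do not give the actual formula, so the sort key here is an assumption: results are interleaved by their position in each engine's own list (which keeps every partial result in its original order), and among results at the same position a title match wins first, then the engine's overall ranking number. The `Hit` type, the `ENGINE_RANK` values, and the engine names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    engine: str
    position: int  # 1-based place in the engine's own, already sorted list
    title: str


# Hypothetical overall ranking number per engine; lower is preferred.
ENGINE_RANK = {"engine_a": 1, "engine_b": 2}


def merge(hits, query_terms):
    """Merge per-engine result lists into one final list."""
    terms = [t.lower() for t in query_terms]

    def key(hit):
        title_match = any(t in hit.title.lower() for t in terms)
        # Position first, so each engine's internal order is never broken;
        # then title matches (False sorts before True); then engine rank.
        return (hit.position, not title_match, ENGINE_RANK[hit.engine])

    return sorted(hits, key=key)


hits = [
    Hit("engine_a", 1, "Annual report"),
    Hit("engine_a", 2, "Metasearch prototype"),
    Hit("engine_b", 1, "Metasearch for research"),
    Hit("engine_b", 2, "Other"),
]
merged = merge(hits, ["metasearch"])
for hit in merged:
    print(hit.position, hit.engine, hit.title)
```

Because engines usually expose no numerical score, the sketch relies only on information that is always available: the order of each list, the titles, and a per-engine weight that a user could customize.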

Conclusion
The prototype is still a work in progress, but it already offers usable functionality.