Metasearch Technologies: Definitions, Issues, Reference Applications William H. Mischo & Mary C. Schlembach Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign Session 2441: Federated Searching ASEE 2004 National Conference June 22, 2004
Outline Distributed, heterogeneous repositories require federation and linking. Metasearch definitions. Technologies. UIUC RFP process. Issues and trends. Expanding use of metasearch. Custom reference and search applications.
Distributed Information Environment We live in a world of multiple, heterogeneous information resources. –OPACs: local, regional, national shared. –Locally mounted and remote A & I Services. –Discrete publisher and vendor full-text repositories. –E-Resource registries: Serials Solutions, TDNet. –OAI search services (OAIster, NSDL) and preprint servers. –Web search engines. –Vertical publisher and vendor portals (ARL Portal, DOE Information Bridge, Elsevier Scirus & Scopus, EI Village, BioMed Central, Public Library of Science). Surface Web and Hidden Web. –Institutional Repositories (DSpace). –Instructional (course) management systems (WebCT, Blackboard). David Seaman: ‘We don’t shelve by publisher, so why do we expect users to search by publisher?’
Metasearch as a Solution Distributed, heterogeneous resources and repositories require federation and linking. Terminology: Metasearch, parallel search, federated search, broadcast search, cross-database search, simultaneous search, search portal. Defined by allowing search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at once. NISO Metasearch Initiative: emerging standards and best practices.
Value of Metasearch (Pro) To recommend and rank specific information resources for users; to facilitate search over multiple information resources. “Metasearching and other means of unifying search across heterogeneous products…most significant trend.” Deployment of algorithmic searches that mimic the behavior of a reference librarian. Integration of E-resources, Local Link Resolvers, and Metasearch.
Value of Metasearch (Caveats) Metasearch vendor: “metasearch does not provide the robust search functionality of native interfaces.” Thesauri browse, ing large (non-displayed) sets. LJ editorial 4/1/04, “Do we want or need metasearching?” Content, Limiting, full-text links, User Education, Boolean, Thesauri. NISO Workshop: Local Link resolution a mission-critical application; metasearch not yet.
Nomenclature Metasearch is also used to refer to systems that search several Web search engines, each built over previously crawled content, such as Google, AlltheWeb, AltaVista. Examples: EZ2Find, Vivisimo, Dogpile, Kartoo. In our world, it refers to systems that work over the distributed information environment dominated by bibliographic resources.
Federated vs. Broadcast Searching Federated: heterogeneous information resources are imported or “harvested” (sometimes using the OAI-PMH protocol) into a local, central site, and the normalized records are placed into a homogeneous database system for search and discovery. EI Village, ISI Web of Knowledge, OAIster, Grainger OAI Search Service, NSDL.
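The harvest-then-search pattern described above might be sketched as follows. This is a minimal illustration, not the design of any system named on the slide: the `normalize` field mapping, the `CentralIndex` class, and the `example.org` endpoint are all hypothetical, and no network request is made. Only the `verb`, `metadataPrefix`, and `set` request arguments come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (the harvest step)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def normalize(record, source):
    """Map a source-specific record onto one local schema (hypothetical fields)."""
    title = record.get("title") or record.get("TI") or ""
    creators = record.get("creator") or record.get("AU") or []
    if isinstance(creators, str):
        creators = [creators]
    return {"source": source, "title": title.strip(), "creators": creators}


class CentralIndex:
    """Homogeneous local store: every harvested record is searched in one place."""

    def __init__(self):
        self.records = []
        self.postings = {}  # title word -> set of record positions

    def add(self, rec):
        pos = len(self.records)
        self.records.append(rec)
        for word in rec["title"].lower().split():
            self.postings.setdefault(word, set()).add(pos)

    def search(self, query):
        """Return records whose titles contain every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.records[i] for i in sorted(hits)]


# Two sources with different native field names end up in one index.
index = CentralIndex()
index.add(normalize({"title": "Federated search in libraries", "creator": "Doe, J."}, "repo_a"))
index.add(normalize({"TI": "Broadcast search latency", "AU": ["Roe, R."]}, "repo_b"))
```

Because normalization happens once at harvest time, the search step never has to know which repository a record came from.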
Federated vs. Broadcast Searching Broadcast: user search arguments are sent in parallel (all at the same time) to remote, distributed systems, and the search results are collected, normalized, and displayed to the user. MetaLib, EnCompass for Resource Access, WebFeat. The two approaches are not mutually exclusive: a broadcast search can run over federated systems.
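A broadcast search of this kind could be sketched with a thread pool, as below. The three connector functions are hypothetical stand-ins that return canned data; a real connector would speak Z39.50, HTTP, or a vendor API. The field names in `normalize` are likewise assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical connectors, each returning records in its own native shape.
def search_catalog(query):
    return [{"title": f"{query} handbook", "id": "cat-1"}]


def search_index(query):
    return [{"TI": f"Advances in {query}", "AN": "idx-9"}]


def failing_target(query):
    raise TimeoutError("target did not respond")


def normalize(source, raw):
    """Map differing native field names onto one display schema."""
    return {
        "source": source,
        "title": raw.get("title") or raw.get("TI"),
        "id": raw.get("id") or raw.get("AN"),
    }


def broadcast_search(query, targets, timeout=10.0):
    """Send the query to every target at once and collect normalized results.

    A failing target is recorded in `errors` instead of aborting the search.
    """
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in targets.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results.extend(normalize(name, r) for r in future.result())
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors


TARGETS = {"catalog": search_catalog, "index": search_index, "slow": failing_target}
results, errors = broadcast_search("metasearch", TARGETS)
```

Results arrive in completion order, so a display layer can either show them as they come in or sort the merged list afterwards.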
Broadcast Search Basic Technologies Z39.50. HTTP “screen-scraping.” XML gateways and Web Services. Proprietary APIs.
Metasearch Implementations Ex Libris MetaLib. Endeavor EnCompass. Innovative Interfaces MetaFind. MuseGlobal MuseSearch. EI Village; ISI Web of Knowledge. WebFeat. California Digital Library SearchLight system. Fretwell-Downing. Locals (NCSU, Grainger Library, Los Alamos).
Retrieval Issues Pass-through to native interface at point of search departure. Coupling of metasearch records with Local Link Resolvers. –Providing OpenURL-enabled links to full-text, other services. Merging and De-Duplication. –Partial de-duping of sequentially retrieved sets. Pulling over existing full-text links from vendor systems.
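Two of the retrieval issues above, OpenURL linking and partial de-duplication, might look like this in outline. The resolver address is a placeholder, the key names follow the OpenURL 0.1 convention, and the matching rule in `match_key` (normalized title plus year plus start page) is only one plausible heuristic, not a rule taken from the slides.

```python
import re
from urllib.parse import urlencode


def openurl(resolver_base, rec):
    """Build an OpenURL 0.1 style link from whatever citation fields exist."""
    keys = ("genre", "issn", "date", "volume", "issue", "spage", "atitle")
    params = {k: rec[k] for k in keys if rec.get(k)}
    return resolver_base + "?" + urlencode(params)


def match_key(rec):
    """Heuristic duplicate key (an assumption): normalized title, year, start page."""
    title = re.sub(r"[^a-z0-9 ]", "", rec.get("atitle", "").lower())
    title = " ".join(title.split())
    return (title, str(rec.get("date", "")), str(rec.get("spage", "")))


def merge_batch(seen, merged, batch):
    """De-dupe one newly retrieved set against everything merged so far.

    Only records already fetched are compared, which is why de-duplication
    is partial: duplicates in sets not yet retrieved cannot be seen.
    """
    for rec in batch:
        key = match_key(rec)
        if key not in seen:
            seen.add(key)
            merged.append(rec)
    return merged


first_set = [{"atitle": "Metasearch: An Overview", "date": "2004", "spage": "12"}]
second_set = [
    {"atitle": "Metasearch - an overview", "date": "2004", "spage": "12"},
    {"atitle": "Link Resolvers in Practice", "date": "2003", "spage": "7"},
]
seen, merged = set(), []
merge_batch(seen, merged, first_set)
merge_batch(seen, merged, second_set)
```

Keeping the first-seen copy means the order in which targets respond determines which vendor's record is displayed, a design choice a production system may want to make explicit.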
Technology Issues Consortium-based implementations. Search Statistics (COUNTER compliance). Vendor concerns with supporting multiple metasearch sessions – throwing a logoff to kill a session. Search query standards – SRW/SRU, XQuery, OpenURL, one-step URL-launch searches.
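As an illustration of a one-step URL-launch search, an SRU `searchRetrieve` request can be expressed entirely as a URL. The sketch below assumes SRU version 1.1 and a target that supports the Dublin Core `dc.title` index; the base URL is a placeholder and nothing is sent over the network.

```python
from urllib.parse import urlencode


def title_query(title):
    """Build a CQL title query, escaping backslashes and double quotes."""
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'dc.title = "{escaped}"'


def sru_search_url(base_url, cql_query, start=1, maximum=10):
    """Build an SRU searchRetrieve URL; the search launches when it is fetched."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, for illustration only.
url = sru_search_url("http://sru.example.org/catalog", title_query("metasearch"))
```

Because the whole request is a URL, the same string can be bookmarked, embedded in a page, or issued by a metasearch connector without a session login.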
Future and Custom Applications Time of rapid development and growth in Metasearch applications. Expect continuing evolution. Metasearch technology is fairly easy to implement locally over selected resources. Focusing on apps that allow custom Best-Match and algorithmic searching that mimics a reference librarian.
Our Approach User interface and discovery systems that emphasize function or needs-based approaches to retrieval. Reference and Known-item. Metasearch technologies that offer additional opportunities beyond simultaneous search of discrete A & I Services. Performing multiple searches within individual resources to determine “Best-Match” search results. Combined with selected simultaneous search of other resources.
UIUC Examples Conference (Paper) Search: –Multiple searches within OPAC for held conference proceedings + EI Village for specific paper and OCLC Conference Papers. Failed conference search presents similar journal articles. Journal Finder: –Searches e-resource registry (based on TDNet), local serial databases, and two different OPAC searches for holdings. Searches CrossRef for DOI full-text link. Used in training reference staff and assisting in patron point-of-need services.
Features Performing multiple searches within a specific resource in order to arrive at the optimum result set. Interpret the user-entered search argument and then route the query to selected resources: ACM, IEEE. Takes user-entered title search string and checks against an abbreviation database at the title and word level. Stop words in OPAC. Search results presented as they are returned or having the aggregate results interpreted and presented with accompanying explanations.
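The title-handling features above could be sketched roughly as follows. Every table here is a tiny hypothetical sample, not the contents of the actual UIUC abbreviation database, and the fallback to an "OPAC" target is an assumption added so that every query has somewhere to go.

```python
# Hypothetical sample tables, for illustration only.
ABBREVIATIONS_TITLE = {"jacs": "journal of the american chemical society"}
ABBREVIATIONS_WORD = {
    "j": "journal",
    "proc": "proceedings",
    "trans": "transactions",
    "int": "international",
    "conf": "conference",
}
STOP_WORDS = {"the", "of", "on", "and", "a", "an", "in", "for"}
ROUTES = {"ieee": "IEEE", "acm": "ACM"}


def expand_title(user_title):
    """Check the whole title first, then fall back to word-by-word expansion."""
    cleaned = " ".join(user_title.lower().replace(".", " ").split())
    if cleaned in ABBREVIATIONS_TITLE:
        return ABBREVIATIONS_TITLE[cleaned]
    return " ".join(ABBREVIATIONS_WORD.get(w, w) for w in cleaned.split())


def opac_terms(title):
    """Drop stop words that an OPAC title search would ignore or reject."""
    return [w for w in title.split() if w not in STOP_WORDS]


def route(title):
    """Pick targets from words in the interpreted title; default to the OPAC."""
    words = set(title.split())
    return [target for token, target in ROUTES.items() if token in words] or ["OPAC"]


expanded = expand_title("Proc. IEEE Int. Conf. on Robotics")
terms = opac_terms(expanded)
targets = route(expanded)
```

Checking the full title before individual words lets a well-known acronym resolve in one step, while the word-level pass still handles abbreviations the title table has never seen.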