A centre of expertise in digital information management

Is Metasearching Really Better Searching?
STM Innovations Seminar, London, Friday 2 December 2005
Pete Johnston, Research Officer, UKOLN, University of Bath

Is Metasearching Better Searching?
What is metasearch?
Making metasearch work
  – The NISO Metasearch Initiative
Metasearch today
  – Metasearch and Google
  – Metasearch and "social bookmarking"

What is metasearch?
"Metasearch, parallel search, federated search, broadcast search, cross-database search, search portal are a familiar part of the information community's vocabulary. They speak to the need for search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at one time."
Source: NISO MetaSearch initiative

The search problem
User wants to find, access, and use items made available by multiple content providers
Content providers make their collections available through their own separate presentation services
User interacts with multiple services in succession, e.g.
  – Query Resource Discovery Network (RDN) for Web resources
  – Query Zetoc for journal articles
  – etc

[Diagram: The search problem. Label: Web Sites]

The search problem
User has to
  – Discover different services
  – Manage different authentication/access requirements
  – Use different user interfaces for search
  – Interpret different result sets (different metadata)
  – Manipulate different result sets (human-readable HTML, but difficult to merge, reuse)
May still not have access to (appropriate copy of) resource

The metasearch solution
The provision of "metasearch" services that
  – enable user to search across the metadata databases of multiple content providers from a single interface
  – manage multiple result sets and present to user
  – manage authentication/access
  – (etc!)
Seamless (to the user) discovery of and access to heterogeneous, distributed resources!

Approaches to metasearch (1): cross-searching
Metasearch service accepts user query
Sends query to multiple content provider search targets
Receives responses from targets
Presents result sets to user
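
The four cross-searching steps can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `search_rdn` and `search_zetoc` are hypothetical in-memory stand-ins for remote search targets (a real service would talk to them over Z39.50, SRW or SRU), and the titles are invented sample data.

```python
from concurrent.futures import ThreadPoolExecutor


def search_rdn(query):
    # Hypothetical stand-in for a remote target holding Web resource records.
    catalogue = ["Metasearch overview (Web resource)", "Digital library portals"]
    return [title for title in catalogue if query.lower() in title.lower()]


def search_zetoc(query):
    # Hypothetical stand-in for a remote target holding journal article records.
    catalogue = ["Metasearch in practice (journal article)", "Harvesting with OAI-PMH"]
    return [title for title in catalogue if query.lower() in title.lower()]


def cross_search(query, targets):
    """Send the query to every target in parallel and collect each result set."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # Submit all queries first so the targets are searched concurrently.
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        # Wait for every response; the slowest target sets the overall response time.
        return {name: future.result() for name, future in futures.items()}


targets = {"RDN": search_rdn, "Zetoc": search_zetoc}
results = cross_search("metasearch", targets)
for name, hits in results.items():
    print(name, hits)
```

The sketch keeps one result set per target rather than merging them, since merging depends on the metadata agreements discussed later in the talk.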

[Diagram: Metasearch: Cross-search. Labels: Web Site; Search Targets; Z39.50, SRW, SRU, etc]

Approaches to metasearch (2): harvesting
Metasearch service periodically gathers metadata records from content provider repositories into local database
Metasearch service accepts user query
Executes query on local database
Presents result sets to user
Some harvesting services may also harvest/index copy of resource
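
As a rough illustration of the harvesting approach, the Python sketch below builds an OAI-PMH `ListRecords` request URL, parses a response into a local index, and runs the user query against that index. The repository address, identifiers and titles are invented, and the response is an abbreviated sample embedded in the code rather than the output of a real repository; a production harvester would also handle resumption tokens, errors and incremental harvests.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to gather records."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


# Abbreviated, invented sample of a ListRecords response (not real repository output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Metasearch and harvesting</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Collection description</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text):
    """Parse a ListRecords response into a local index of identifier -> title."""
    root = ET.fromstring(xml_text)
    index = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        index[identifier] = record.findtext(".//dc:title", namespaces=NS)
    return index


def search(index, query):
    """Execute the user query on the local index, not on the remote repository."""
    return [ident for ident, title in index.items() if query.lower() in title.lower()]


request_url = list_records_url("http://repository.example.org/oai")
local_index = harvest(SAMPLE_RESPONSE)
print(request_url)
print(search(local_index, "harvest"))
```

Because the query runs only against `local_index`, results are fast but reflect the repository as it was at the last harvest.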

[Diagram: Metasearch: Harvester. Labels: Web Site; Repositories; OAI-PMH]

Cross-searching & harvesting
Metasearch service may use both in combination!
Cross-search
  – Latest results returned
  – Content provider controls searches available
  – May slow overall performance
Harvesting
  – Better performance for user query
  – Options for normalisation etc by harvester
  – Only as up-to-date as last harvest

A hospitable climate for metasearch?
Metasearch service depends on access to metadata
Web Services
  – Standards for providing machine interfaces to applications on Web
  – Based on HTTP and XML
  – SOAP (messaging protocol), WSDL (service description), WS-* (!!)
  – WS not just for search!
  – Service-oriented approaches, modular applications
  – Google and Amazon provide Web Services
"Web 2.0"
  – "The Web as platform"
  – Recombining data and services from multiple sources

The problems with metasearch
User requires/expects resources from increasing range of content providers
What if content provider doesn't implement standard search/harvest interface?
Some proprietary APIs, "XML Gateways"
  – Scalability
Some "screen-scraping"
  – Parsing of HTML pages to obtain metadata
  – Rights issues
  – Scalability, volatility
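
To make the volatility point concrete, here is a small Python sketch of screen-scraping using only the standard library. The page markup, the `title` class name and the record titles are all hypothetical; the point is only that the scraper depends on presentation details the content provider is free to change.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collects the text of every <span class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "span" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())


def scrape_titles(html):
    scraper = TitleScraper()
    scraper.feed(html)
    return scraper.titles


# Hypothetical results page as the content provider serves it today.
PAGE_V1 = (
    '<ul><li><span class="title">Metasearch overview</span></li>'
    '<li><span class="title">OAI-PMH primer</span></li></ul>'
)
# The same page after a purely cosmetic redesign renames the CSS class.
PAGE_V2 = PAGE_V1.replace('class="title"', 'class="result-heading"')

print(scrape_titles(PAGE_V1))
print(scrape_titles(PAGE_V2))
```

The second call finds nothing even though the records are unchanged, which is the fragility a standard search or harvest interface is meant to avoid.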

The problems with metasearch
Metasearch services work, but…
For service provider
  – complex, laborious
  – fragile, susceptible to change by content provider
  – duplication of effort by service providers
For content provider
  – concerns over efficiency
  – concerns over access management
  – rights, branding, results presentation/ranking

Making metasearch work

Making metasearch work
Effective metasearch requires agreements between content providers and service providers
  – Transport protocol(s)
  – Query language(s): syntax and semantics
  – Metadata schemas: syntax and semantics
  – Metadata quality: presence of values, formats of literals etc
  – Intellectual property rights issues: how metadata records and resources are presented, used
  – Authorisation / authentication
  – Disclosure / discovery of collections and services
Source: Andy Powell, "Metasearching: an overview", Presentation to BCS EPSG Seminar, July 2004

The NISO Metasearch Initiative
Response to concerns of librarians, systems vendors, content providers
Aims to enable
  – metasearch service providers to offer more effective and responsive services
  – content providers to deliver enhanced content and protect their intellectual property
  – libraries to deliver services that distinguish their services from Google and other free web services
Source: NISO MetaSearch initiative

Task Group 1: Access Management
Conducted survey of authentication methods in use
Developed use cases for authentication in metasearch context
Ranked methods by ability to satisfy needs of use cases
Recommends either:
  – IP-Authentication with a Proxy Server, or
  – Username/Password authentication
Liaison with Shibboleth community

Task Group 2: Collection Description
Metasearch service needs information about targets available for search/harvest
  – Discover collections of potential interest
  – Obtain sufficient information to identify a collection
  – Select one or more collections from amongst a number of discovered collections
  – Discover the services that provide access to the collection
  – Select a service with which to interact
  – Interact with service
Collection description
Service description

[Diagram labels: Metasearch 1; Metasearch 2; Collection/Service Knowledge Base 1; Collection/Service Knowledge Base 2; Shared Collection/Service Registry]

Task Group 2: Collection Description
Collection Description Specification
  – Metadata schema for collection-level description
  – Closely aligned with DCMI Collection Description Application Profile
  – Title, Subject, Size, Language, Item Type, Owner, Collector, Audience, Rights etc
  – Whole/Part relationships
  – Collection/Catalogue relationships
  – Collection/Service relationships

Task Group 2: Collection Description
Information Retrieval Service Description Specification
  – Describe those digital services that provide access to collections
  – ZeeRex
      – Indicates protocol used
      – Describes access point(s) for service
      – Describes authentication/authorization requirements
      – Lists operations/queries supported

Task Group 3: Search/Retrieve
Result Set Metadata
  – Metadata schema to describe result set and record within result set
  – To support ranking, branding etc
Citation Metadata
  – Metadata schema for citation components (based on subset of OpenURL)
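
For readers unfamiliar with OpenURL, the Python sketch below shows the kind of citation components it carries, using the key/encoded-value form of OpenURL 1.0 (Z39.88-2004) for a journal article. It illustrates OpenURL itself, not the Task Group's citation schema, whose exact element subset is not given in the talk. The resolver address and the citation values are invented.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def journal_article_openurl(resolver_url, atitle, jtitle, volume, spage, date):
    """Encode the components of a journal article citation as an OpenURL."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,   # article title
        "rft.jtitle": jtitle,   # journal title
        "rft.volume": volume,
        "rft.spage": spage,     # start page
        "rft.date": date,
    }
    return resolver_url + "?" + urlencode(params)


# Hypothetical resolver and invented citation, for illustration only.
openurl = journal_article_openurl(
    "http://resolver.example.org/openurl",
    atitle="An example article",
    jtitle="Journal of Examples",
    volume="12",
    spage="34",
    date="2005",
)
citation = parse_qs(urlsplit(openurl).query)
print(citation["rft.jtitle"])
```

Carrying the citation as separate components, rather than as a display string, is what lets a service resolve it to an appropriate copy for a given user.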

Task Group 3: Search/Retrieve
NISO XML Gateway
  – Based on SRU ("non-conformant subset")
  – Query encoded in URI, transmitted in HTTP GET, response as XML document
  – Three levels of implementation
      – Level 0: Any query grammar
      – Level 1: Provide description record for database
      – Level 2: Support CQL
  – Liaison with A9 OpenSearch
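
A minimal Python sketch of "query encoded in URI, transmitted in HTTP GET" follows. The parameter names are those of an SRU 1.1 `searchRetrieve` request; since the gateway is described only as a subset of SRU, treating its requests as identical is an assumption. The gateway address and the `dc.title` index in the CQL query are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, cql_query, maximum_records=10):
    """Encode a CQL query as the URI for an HTTP GET searchRetrieve request."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes spaces, quotes and "=" inside the query value.
    return base_url + "?" + urlencode(params)


# Hypothetical gateway address; no request is actually sent in this sketch.
url = sru_search_url("http://gateway.example.org/sru", 'dc.title = "metasearch"')
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["query"])
```

A client would issue an HTTP GET for `url` and parse the XML document returned; decoding the query string back, as done here, is just a check that the encoding is lossless.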

Metasearch today

Metasearch and Google
Google
  – Harvests full-text of Web pages by following links
  – Makes indexes available for search
  – Result ranking based on number of links to page
Index coverage limited to "visible Web"
  – Problems with
      – Authentication controls
      – Non-persistent URIs
      – Non-textual resources
Even if indexed, low ranking if few links
No fielded searching

Metasearch and Google
"Success is as much about what you don't search as what you do"
Selection is important
Relevance of results not determined only by links, citations
  – e.g. often useful/vital to select/filter by audience, purpose of resource
Source: Roy Tennant, "Is Metasearch Dead?"

Metasearch and Google
Google interest in indexing "hidden Web"
  – Collaborations with repository providers, OCLC etc
  – Google Scholar
Google interest in metadata-based approach?
  – Google Base
Google and Metasearch as complementary approaches to discovery

Metasearch and "Social bookmarking"
[Screenshot: del.icio.us]

Metasearch and "Social bookmarking"
[Screenshot: Connotea. Callout: Bibliographic metadata added to item by Connotea]

Metasearch and "Social Bookmarking"
Simple user-generated metadata
  – Typically description plus "tags"
  – Capture user perceptions of resources
  – Some services adding richer metadata
Social: merging of personal collections
  – Bookmarking services as discovery services
Connotea as "community-driven recommendation system" (Lund et al)
Metadata available via RSS or simple API
  – Can metasearch services use/integrate metadata from bookmarking services?
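
One way to picture the integration question is the Python sketch below, which reads bookmark titles, links and tags out of an RSS feed and filters them by tag. The feed is an invented RSS 2.0 sample that uses `category` elements for tags; the feeds actually published by del.icio.us or Connotea may use a different RSS version and different elements, so the element names here are assumptions.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 feed standing in for a bookmarking service's output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Bookmarks (hypothetical feed)</title>
    <item>
      <title>NISO Metasearch Initiative</title>
      <link>http://example.org/niso-metasearch</link>
      <category>metasearch</category>
      <category>standards</category>
    </item>
    <item>
      <title>OAI-PMH specification</title>
      <link>http://example.org/oai-pmh</link>
      <category>harvesting</category>
      <category>standards</category>
    </item>
  </channel>
</rss>"""


def bookmarks_from_rss(xml_text):
    """Turn each feed item into a simple metadata record with its tags."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "tags": [category.text for category in item.findall("category")],
        }
        for item in root.iter("item")
    ]


def with_tag(bookmarks, tag):
    """Return the titles of bookmarks that carry the given user-assigned tag."""
    return [bookmark["title"] for bookmark in bookmarks if tag in bookmark["tags"]]


bookmarks = bookmarks_from_rss(SAMPLE_FEED)
print(with_tag(bookmarks, "standards"))
```

Records in this shape could sit alongside harvested metadata in a local index, with tags treated as an uncontrolled subject field.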

Is Metasearching Better Searching?
Technical components for metasearch available
User expectations of coverage mean metasearch is a cross-domain problem
However, quality of metasearch dependent on
  – metadata quality
  – metadata consistency
  – …across multiple providers
Metasearch can complement other approaches
Metasearch as "enabler"
  – supporting construction of many different services