Distributing the Indexing and Retrieval of Information
Winston Bourne
IRNLP

Introduction
- Need for Distributed IR
- How indices help
- Controlling a distributed search
- Using meta-data
- Distributed IR solutions
- Meta-search engines

Need for Distributed IR
- Large, ever-growing pool of data
- Finding required data
- Relevance and quality of results
- Distributing IR allows scaling of searches

How Indices Help
- Index resources by some method
- Indices are far smaller than the data pool
- Allow multiple agents to search quickly (consider a library)
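
To make the index idea concrete, here is a minimal sketch of an inverted index, one common indexing method; the slides do not say which method is meant. The document ids, sample texts, and function names are assumptions made for illustration.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini "library" of three resources.
documents = {
    "doc1": "distributed information retrieval",
    "doc2": "library card catalogue",
    "doc3": "distributed library index",
}
index = build_index(documents)
hits = search(index, "distributed library")
```

A searching agent only touches the term-to-id mapping, never the documents themselves, which is why several agents can query it quickly.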

Controlling a Distributed Search
- Prevent duplicate results
- Use domain-specific agents
- Identify and track queries
- Self-organizing networks, using hypertext
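
The points on tracking queries and preventing duplicates can be sketched as follows. This is a toy model, not the mechanism of any system named in these slides: the `Agent` class, its fields, and the sample records are all hypothetical. Each query carries a unique id so that an agent can refuse to process it twice, and the final result list is de-duplicated.

```python
import uuid

class Agent:
    """A hypothetical search agent holding a few local records."""

    def __init__(self, name, records):
        self.name = name
        self.records = records      # list of (resource_id, text) pairs
        self.neighbours = []        # other agents this one forwards to
        self.seen_queries = set()   # query ids already handled here

    def handle(self, query_id, term):
        # Drop a query this agent has already processed, so cycles in
        # the network do not cause endless forwarding.
        if query_id in self.seen_queries:
            return []
        self.seen_queries.add(query_id)
        hits = [rid for rid, text in self.records if term in text.lower()]
        for neighbour in self.neighbours:
            hits.extend(neighbour.handle(query_id, term))
        return hits

def distributed_search(entry_agent, term):
    query_id = uuid.uuid4().hex     # unique id that tracks this query
    hits = entry_agent.handle(query_id, term.lower())
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(hits))

a = Agent("a", [("r1", "Protein folding data"), ("r2", "Weather data")])
b = Agent("b", [("r2", "Weather data"), ("r3", "Ocean weather buoys")])
a.neighbours.append(b)
b.neighbours.append(a)              # deliberate cycle between the agents
results = distributed_search(a, "weather")
```

Resource `r2` is held by both agents but appears once in `results`, and the cycle between `a` and `b` terminates because each agent remembers the query id.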

Using Meta-Data
- Meta-data is data about data
- Create a description of a resource
- Convert the query into meta-data
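
One way to picture this: describe each resource with a small record, turn the query into a partial record of the same shape, and compare the two. The field names (`title`, `creator`, `subject`), the `field:value` query syntax, and the sample records below are assumptions for illustration only.

```python
def describe(title, creator, subjects):
    """Build a small meta-data record for a resource (illustrative fields)."""
    return {
        "title": title.lower(),
        "creator": creator.lower(),
        "subject": {s.lower() for s in subjects},
    }

def query_to_metadata(query):
    """Turn 'field:value' tokens into a partial meta-data record."""
    wanted = {}
    for token in query.split():
        field, _, value = token.partition(":")
        if value:
            wanted[field.lower()] = value.lower()
    return wanted

def matches(record, wanted):
    """True if every requested field is present and contains the value.

    For string fields this is a substring test; for set-valued fields
    it is a membership test.
    """
    for field, value in wanted.items():
        stored = record.get(field)
        if stored is None or value not in stored:
            return False
    return True

# Hypothetical resource descriptions.
records = {
    "r1": describe("Gatherer user manual", "Jones", ["indexing", "gatherer"]),
    "r2": describe("Introduction to meta-data", "Smith", ["meta-data"]),
}
wanted = query_to_metadata("creator:jones subject:indexing")
found = [rid for rid, record in records.items() if matches(record, wanted)]
```

The search compares descriptions with descriptions, so the resources themselves never need to be opened.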

Distributed IR Solutions
- Emerge: aimed specifically at scientific data
- Harvest: general-purpose distributed IR, with a choice of topology
- CHIC-Pilot project: a demonstration distributed IR architecture, converging many standards and protocols

Standards and Protocols
- Dublin Core: generic set of meta-data descriptors
- RDF: used by Emerge, XML-based, inheritance hierarchy
- SOIF: used by Harvest, simple text-based format
- Centroids: remove redundant terms to further compress indices
- Z39.50: ANSI/NISO standard (also adopted as ISO 23950), widely used by established institutions
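
The centroid idea can be sketched in a few lines, assuming a centroid is simply the set of unique terms a server holds for each attribute. The server names, attributes, and records are hypothetical, and this is not the wire format of any particular protocol.

```python
from collections import defaultdict

def build_centroid(records):
    """Collapse a server's records into one set of unique terms per attribute."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            centroid[attribute].update(value.lower().split())
    return dict(centroid)

def servers_worth_querying(centroids, attribute, term):
    """Pick the servers whose centroid lists the term under that attribute."""
    term = term.lower()
    return sorted(
        name for name, centroid in centroids.items()
        if term in centroid.get(attribute, set())
    )

# Hypothetical servers, each holding a few meta-data records.
server_records = {
    "physics": [
        {"title": "Particle data", "subject": "physics data"},
        {"title": "Particle tracks", "subject": "physics"},
    ],
    "biology": [
        {"title": "Genome data", "subject": "biology data"},
    ],
}
centroids = {name: build_centroid(recs) for name, recs in server_records.items()}
targets = servers_worth_querying(centroids, "title", "particle")
```

The term "particle" occurs in two physics records but is stored once in the centroid. An index server holding only the centroids can still tell that a title search for "particle" needs to go to `physics` alone.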

Meta-Search Engines
- Everyday demonstration of distributed IR
- Use a single interface to query many conventional search engines
- Collate duplicate results
- Convert the query into a form compatible with each target engine
- Breadth of search at the expense of depth
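
Two of these steps, converting the query for each target and collating duplicates, can be sketched as below. The two query syntaxes, the engine names, and the canned result lists are invented for illustration; no real search engine API is called, and the URL normalisation is deliberately crude.

```python
from urllib.parse import urlsplit

def to_engine_query(terms, syntax):
    """Rewrite a list of terms in a (hypothetical) engine-specific syntax."""
    if syntax == "plus":
        return " ".join("+" + term for term in terms)
    if syntax == "and":
        return " AND ".join(terms)
    raise ValueError("unknown syntax: " + syntax)

def normalise(url):
    """Reduce a URL to host plus path so near-identical links compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def collate(result_lists):
    """Merge ranked lists, keeping one entry per normalised URL."""
    merged = {}
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(
                normalise(url), {"url": url, "engines": [], "best_rank": rank}
            )
            entry["engines"].append(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Results reported by more engines come first, then by best rank.
    return sorted(
        merged.values(), key=lambda e: (-len(e["engines"]), e["best_rank"])
    )

# Canned results standing in for two engines' answers to the same query.
result_lists = {
    "engine_a": ["http://www.example.org/ir/", "http://example.org/harvest"],
    "engine_b": ["http://example.org/ir", "http://example.net/z3950"],
}
collated = collate(result_lists)
```

Ranking by how many engines agree is only one possible policy; it reflects the breadth-over-depth trade-off, since the meta-search engine sees only each engine's top results.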