When worlds collide: Metasearching meets central indexes
Mike Taylor, Index Data



Search
[Diagram label: Data]
Problem solved!
??

Metasearch
[Diagram labels: Searching, Magic box, Data]
Examples: 360 Search, EHIS (EBSCO), MetaLib, Pazpar2 (open source)
A.K.A. federated search
A.K.A. broadcast search
A.K.A. distributed search
?

Back to the sad searcher
[Diagram label: Data]
??

Central index
[Diagram labels: Harvesting, Fat database, Data]
Examples: Summon, WorldCat, Primo Central, MasterKey
A.K.A. local index
A.K.A. vertical search
A.K.A. discovery services
?

We need a controlled vocabulary!
Metasearch = Federated search = Distributed search = Broadcast search
Central index = Local index = Discovery services = Vertical search (if you ever heard anything so dumb)

Which approach is better?
Central indexing compared with metasearching:
- requires harvesting infrastructure
- requires lots of local storage
- requires co-operation from services to be harvested
- does not have access to all searchable data
- will always be somewhat out of date
- is faster at search time (or SHOULD be)
- allows data to be normalised (e.g. dates extracted)
- allows for better relevance ranking
- can provide pre-baked facets
- may have access to some data that is not searchable
Let's do both!

Integrated Search
[Diagram labels: Searching, Magic box, Data; Harvesting, Fat database, Data; !]

Metasearch hides the complexity
Nine tenths under the surface
What you see looks beautiful
[Diagram labels: Searching, Magic box, Data]

Problems that need solving
A. Problems with pure metasearching
B. How those problems change when you add a central index

Problems with metasearching
Examples based on Index Data's suite:
- Pazpar2 is a free metasearching engine with a stupid name
- MasterKey is a non-open suite that wraps it
- MasterKey is only one way to use Pazpar2
- Also integrated into other vendors' UIs

Problems with metasearching #1: No data server at all!
- Data is often only in a user-facing Web UI
- Must be made available via a standard protocol
- Option 1: build a gateway in Perl
- Option 2: MasterKey Connect (non-open)

Problems with metasearching #2: data server is crap^H^H^H^Hsuboptimal
- Catalogs searchable using ANSI/NISO Z39.50
- Support is very nominal in some cases
- IRSpy probes behaviour
- MasterKey target profiles describe behaviour

Problems with metasearching #3: Data servers don't support relevance
- Pazpar2 does its own relevance ranking (part of merging/deduplication)
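
One way to picture engine-side merging, deduplication and ranking is the minimal Python sketch below. It is an illustration only, not Pazpar2's actual algorithm: the merge key (a normalised title), the scoring rule (share of title words that match the query), the function names and the sample records are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def merge_key(record):
    """Hypothetical merge key: lower-cased title with punctuation removed."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return " ".join(title.split())


def merge_and_rank(records, query):
    """Cluster records sharing a merge key, then rank clusters by a toy score."""
    clusters = defaultdict(list)
    for record in records:
        clusters[merge_key(record)].append(record)

    terms = query.lower().split()

    def score(item):
        key, _members = item
        words = key.split()
        hits = sum(words.count(term) for term in terms)
        return hits / len(words) if words else 0.0

    ranked = sorted(clusters.items(), key=score, reverse=True)
    # Report each cluster's key and how many source records it absorbed.
    return [(key, len(members)) for key, members in ranked]


# Made-up records, as if returned by two different targets.
records = [
    {"title": "The C Programming Language", "source": "catalog A"},
    {"title": "The C programming language.", "source": "catalog B"},
    {"title": "Unix", "source": "catalog A"},
    {"title": "Programming in Go", "source": "catalog B"},
]

result = merge_and_rank(records, "programming")
for key, size in result:
    print(size, key)
```

The two spellings of the same title collapse into a single cluster of two records, which is the deduplication step; ranking then works on clusters rather than on raw records.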

Problems with metasearching #4: Data servers don't return facets
- Pazpar2 calculates its own facets
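
Calculating facets on the engine side amounts to counting field values across the records that were actually fetched. The Python sketch below shows the idea under stated assumptions: the record layout, the field names and the `compute_facets` helper are hypothetical and do not reflect Pazpar2's internals.

```python
from collections import Counter


def compute_facets(records, fields):
    """Count how often each value occurs in each facet field."""
    facets = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            values = record.get(field, [])
            if isinstance(values, str):
                values = [values]  # treat a single value as a one-item list
            facets[field].update(values)
    return facets


# Made-up records, as if fetched from several targets.
records = [
    {"author": ["Kernighan", "Pike"], "date": "1984"},
    {"author": ["Kernighan", "Ritchie"], "date": "1978"},
    {"author": "Pike", "date": "1984"},
    {"author": ["Thompson"]},
]

facets = compute_facets(records, ["author", "date"])
for field, counts in facets.items():
    print(field, counts.most_common())
```

Because the counts come only from fetched records, they describe a sample of each target's hits rather than the full result set.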

There is a lot of magic in the magic box
- Searching
- Sorting
- Merging
- Deduplication
- Relevance
- Facet generation
- Time travel...
[Diagram labels: Magic box (Pazpar2), Data]
Remember, our engine is free:

What happens when we add a central index?
[Diagram labels: Searching, Magic box, Data; Harvesting, Fat database, Data; !]

Problems with integrated search #1: No data server at all!
- Data is often only in a user-facing Web UI
- You can't harvest Google
- You just can't

Problems with integrated search #2: data server is crap^H^H^H^Hsuboptimal
- Repositories harvestable using OAI-PMH (an even worse name than Pazpar2)
- Support is very nominal in some cases
- OAI-PMH client must be very tolerant
- Extensive data-cleaning is usually required
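
As a rough illustration of what "tolerant" and "data-cleaning" can mean in practice, the Python sketch below parses a small OAI-PMH ListRecords response with Dublin Core metadata, skips records it cannot use, tidies whitespace and extracts a year from a messy date string. The sample response, the `clean_records` helper and the cleaning rules are assumptions made up for this example; a real harvester would also handle resumption tokens, malformed XML and much more.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response: one good record, one deleted record, one with no title.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>  The Unix   Programming Environment </dc:title>
          <dc:date>c1984.</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
    <record>
      <header><identifier>oai:example.org:3</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:date>no date</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def clean_records(xml_text):
    """Yield cleaned dicts, skipping records that have no title."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        date = record.findtext(".//dc:date", default="", namespaces=NS)
        title = " ".join(title.split())  # collapse stray whitespace
        if not title:
            continue  # deleted or title-less record: skip rather than fail
        # Pull a plausible four-digit year out of free-text dates like "c1984."
        year = re.search(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)", date)
        yield {
            "id": identifier.strip(),
            "title": title,
            "year": int(year.group()) if year else None,
        }


records = list(clean_records(SAMPLE))
print(records)
```

Skipping unusable records instead of raising keeps one bad record from aborting a whole harvest, at the cost of silently dropping data, so a real harvester would probably log what it skipped.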

Problems with integrated search #3: Central index does support relevance
- Returned records carry relevance scores
- Must be merged with records scored by engine
- Requires score normalisation into same range
- Existing ordering may be used in merge
[Diagram labels: Unranked #1, Unranked #2, Sort, Ranked #1, Ranked #2, Solr, Merged]
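
The score-normalisation and merge steps might look something like the Python sketch below. It assumes min-max scaling into the range 0..1, which is only one possible normalisation and not necessarily the one the engine uses; the function names, scores and titles are made up. The merge relies on each source already being sorted by descending score, so the existing ordering is reused rather than re-sorting everything.

```python
import heapq


def normalise(records):
    """Rescale (score, title) pairs into 0..1 by min-max within one source."""
    if not records:
        return []
    scores = [score for score, _ in records]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return [((score - lo) / span, title) for score, title in records]


def merge_ranked(*sources):
    """Merge lists that are already sorted by descending score."""
    return list(heapq.merge(*sources, key=lambda rec: rec[0], reverse=True))


# Made-up results: the central index uses one score scale, the engine another.
central_index = [(12.4, "Unix internals"), (8.1, "C primer"), (2.0, "Awk notes")]
engine_scored = [(0.93, "The Unix shell"), (0.40, "Go basics"), (0.10, "Plan 9")]

merged = merge_ranked(normalise(central_index), normalise(engine_scored))
for score, title in merged:
    print(f"{score:.2f}  {title}")
```

Min-max scaling forces every source's best hit to 1.0 and its worst to 0.0, which may overrate a source whose best hit is actually weak; that trade-off is one reason normalisation is listed here as a problem rather than a solved detail.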

Problems with integrated search #4: Central index does return facets
Lists of field values with occurrence counts:
Author: Kernighan 27, Pike 13, Ritchie 7, Thompson 4
Title: C 7, Unix 35, Programming 16
Date

Lists are returned or calculated for each server:
Server 1 (central index) (all facets from 2000 hits): Cat 68, Dinosaur 162, Fish 145, Frog 19
Server 2 (metasearch) (1000 hits, 100 records): Cat 7, Dog 10, Dinosaur 87, Fish 23

Metasearched counts normalised by total hit-count:
Server 1 (central index) (all facets from 2000 hits): Cat 68, Dinosaur 162, Fish 145, Frog 19
Server 2 (metasearch) (normalised to 1000 hits): Cat 70, Dog 100, Dinosaur 870, Fish 230

Facet lists are merged:
Servers 1+2 (integrated) (as though for all records in result sets): Cat 68+70 = 138, Dog 0+100 = 100, Dinosaur 162+870 = 1032, Fish 145+230 = 375, Frog 19+0 = 19

Fringe benefit: facet-count normalisation is also useful when doing pure metasearching.
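
The facet arithmetic on these slides can be reproduced with a short Python sketch. The numbers are the ones from the slides; the function names and the simple linear scaling by total hits divided by records fetched are assumptions about how the normalisation is done.

```python
from collections import Counter


def normalise_counts(sampled_counts, total_hits, records_fetched):
    """Scale facet counts from a sample of records up to the full hit count."""
    scale = total_hits / records_fetched
    return {value: round(count * scale) for value, count in sampled_counts.items()}


def merge_facets(*facet_lists):
    """Add up the counts for each facet value across servers."""
    merged = Counter()
    for facets in facet_lists:
        merged.update(facets)  # Counter.update adds counts for mapping input
    return dict(merged)


# Server 1 (central index): facets cover all 2000 hits, so no scaling is needed.
central = {"Cat": 68, "Dinosaur": 162, "Fish": 145, "Frog": 19}
# Server 2 (metasearch): facets computed from 100 fetched records out of 1000 hits.
metasearch_sample = {"Cat": 7, "Dog": 10, "Dinosaur": 87, "Fish": 23}

metasearch = normalise_counts(metasearch_sample, total_hits=1000, records_fetched=100)
integrated = merge_facets(central, metasearch)
print(metasearch)
print(integrated)
```

Scaling assumes the fetched records are representative of the whole result set, so the merged counts are estimates rather than exact totals.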

Summary of search issues
Issue             Metasearch solution                   Central index solution
No data server    Build gateways, MasterKey Connect     ---
Bad data server   Probe capabilities, profile targets   Tolerant harvester, data-cleaning
Relevance scores  Magic engine                          Normalise scores, ingest from server
Facets            Magic engine                          Normalise counts, ingest from server

[Diagram labels: Searching, Magic box, Data; Harvesting, Fat database, Data]
