OCLC Online Computer Library Center Interoperability Standards & Searching Multiple Repositories Ralph LeVan/OCLC Ray Denenberg/Library of Congress.

Presentation transcript:


The Problem
– How do I provide a common interface for my users?
– How do I combine results from multiple sources?

How do I provide a common interface for my users?
– How do I convert my queries into the Content Provider’s (CP’s) queries?
– How do I ask for 10 records?
– How do I ask for more records?
– How do I interpret their response?

How do I convert my queries into the CP’s queries?
My user said: “author=twain and title=huck finn”
– Google expects: +twain +"huck finn"
– Z39.50: twain/1=1003;4=2 "huck finn"/1=4;4=1 and
– Lucene: creator:twain and titlePhrase:"huck finn"
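A minimal sketch of the translation problem on this slide, assuming the user's query has already been parsed into (index, term) pairs. The internal query form, the `LUCENE_FIELDS` mapping and the function names are illustrative assumptions; only the target syntaxes come from the slide.

```python
# Hypothetical internal form of the user's query: (index, term) pairs joined by AND.
user_query = [("author", "twain"), ("title", "huck finn")]

# Assumed mapping from our index names to the Lucene field names shown on the slide.
LUCENE_FIELDS = {"author": "creator", "title": "titlePhrase"}


def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def to_google(query):
    # No fielded search here: every term is simply marked as required with '+'.
    return " ".join("+" + quote(term) for _, term in query)


def to_lucene(query):
    # field:term clauses joined by the Boolean operator used on the slide.
    return " and ".join(
        f"{LUCENE_FIELDS[index]}:{quote(term)}" for index, term in query
    )


print(to_google(user_query))
print(to_lucene(user_query))
```

Every additional content provider needs another function like these, which is the maintenance burden the rest of the talk tries to remove.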

How do I ask for 10 records?
– Amazon: won’t let you
– RedLightGreen: MAXRECORDS=n
– British Library: records=n

How do I ask for more records?
– Amazon: page=n
– RedLightGreen: STARTINDEX=n
– British Library: start=n
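The per-provider paging parameters on the last two slides can be captured in a lookup table. This is only a sketch: the table layout and the `paging_params` helper are assumptions, and the parameter names are the ones quoted on the slides.

```python
from urllib.parse import urlencode

# Parameter names as listed on the slides; None means the provider offers no such control.
PAGING = {
    "Amazon": {"count": None, "start": "page"},
    "RedLightGreen": {"count": "MAXRECORDS", "start": "STARTINDEX"},
    "British Library": {"count": "records", "start": "start"},
}


def paging_params(provider, start, count):
    """Build the paging part of a query string for one provider.

    `start` is passed through unchanged, so the caller must supply whatever
    unit the provider expects (a page number for Amazon, a record index
    for the others).
    """
    names = PAGING[provider]
    params = {}
    if names["count"] is not None:
        params[names["count"]] = count
    if names["start"] is not None:
        params[names["start"]] = start
    return urlencode(params)


print(paging_params("RedLightGreen", 11, 10))
print(paging_params("Amazon", 2, 10))
```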

How do I interpret their response?
– How many records did I retrieve?
– Did something go wrong?
– How do I convert the CP’s records into something my users will recognize?

How many records did I retrieve?
– Amazon: Books (334)
– RedLightGreen: Viewing: 1-10 of 239 results
– British Library: 190
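Because each provider reports its hit count in a different place and format, a client ends up with one pattern per provider. The sketch below assumes the count has to be pulled out of display text like the strings on this slide; the regular expressions and the `hit_count` helper are illustrative, not any provider's documented format.

```python
import re

# One hand-written pattern per provider, based only on the sample strings on the slide.
PATTERNS = {
    "Amazon": re.compile(r"Books \((\d+)\)"),
    "RedLightGreen": re.compile(r"of (\d+) results"),
}


def hit_count(provider, text):
    """Return the total hit count found in `text`, or None if the pattern fails."""
    match = PATTERNS[provider].search(text)
    return int(match.group(1)) if match else None


print(hit_count("Amazon", "Books (334)"))
print(hit_count("RedLightGreen", "Viewing: 1-10 of 239 results"))
```

A `None` result is the typical failure mode when a site changes its page layout.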

Did Something Go Wrong?
– RedLightGreen: “We didn't find any matches for dog and.”
– British Library:
  – “Nothing found due to an error”
  – “Too many hits. Refine your request.”

How do I convert the records? Amazon: Thud! (Discworld, Book 32) by Terry Pratchett ( Hardcover - Sep 13, 2005) Books: See all 334 items Buy new : $24.95 $15.72 Used & new from $3.76 Usually ships in 24 hours Excerpt from page 2 : "... Terry Pratchett "Most of the news is... " See more references to pratchett in this book. Surprise me! See a random page in this book.

Converting Records Cont. RedLightGreen: Hogfather, by Terry Pratchett 3 editions published between 1996 and 1998 in English. Primary Subject: Discworld Imaginary Place - Fiction 2.

Converting Records Cont. British Library: Thud! / Terry Pratchett. doc- set&doc_number= &l_base=BLL01& from=A9OpenSearch Pratchett, Terry. ; London : Doubleday, ISBN (hbk.) : £ (Added : )

How do I combine results from multiple sources?
Things you might want the server to do for you:
– Common Record Format
– Common Sort Order
– Common Rank Order
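When the servers do not provide a common record format or sort order, the client has to impose them itself. The following is a sketch under stated assumptions: the provider names, the field names in `FIELD_MAPS` and the choice of sorting by title are all hypothetical.

```python
# Hypothetical per-provider field names; real providers will differ.
FIELD_MAPS = {
    "providerA": {"title": "Title", "creator": "Author"},
    "providerB": {"title": "dc_title", "creator": "dc_creator"},
}


def to_common(provider, raw):
    """Rename one provider's fields to the common record format."""
    return {
        common: raw.get(source, "")
        for common, source in FIELD_MAPS[provider].items()
    }


def merge(results_by_provider):
    """Convert every record to the common format, then apply a single sort order."""
    merged = [
        to_common(provider, raw)
        for provider, records in results_by_provider.items()
        for raw in records
    ]
    return sorted(merged, key=lambda record: record["title"].casefold())


results = {
    "providerA": [{"Title": "Thud!", "Author": "Pratchett, Terry"}],
    "providerB": [{"dc_title": "Hogfather", "dc_creator": "Terry Pratchett"}],
}
for record in merge(results):
    print(record)
```

A common rank order is harder to impose this way, since relevance scores from different providers are generally not comparable.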

Functional Matrix
– Request Record Starting Point
– Request Number of Records
– Request Record Schema
– Defined Query Grammar
– Specify Sort Order
– Specify Ranking Order
– Diagnostic Messages
– XML Response
– Record Count In Response
– Records In Known Schema

The Old Solutions
– Screen Scraping
– Private APIs
– Z39.50

Screen Scraping
– A query has to be generated and embedded in a CP-specific URL
– Code has to be written to examine the HTML returned by a CP
– Prone to breakage
  – Web sites change formatting frequently
– Every site is unique
  – Separate code to be maintained for every site

Private APIs
– Often only a slight improvement over screen scraping
– Provides documentation on how to construct the URL
– Might provide documentation on how to construct the query
– Might guarantee a stable response format
– Still requires unique code for each site

Z39.50
– Guarantees a standard request and response
– But…
  – Not HTTP or HTML
    – Binary encoding over raw TCP/IP
  – Complicated
    – 11 services
    – 7 extended services
  – Easy to be compliant and not interoperable
  – Unfriendly
    – The response to a protocol error was to drop the connection

Why Use A Standard API?
– Defined requests and responses
– Reusable code across sites
– Open Source code

The New Solutions
– OpenSearch 1.1
– MXG
  – Levels 0-2
– SRU

OpenSearch 1.1
From Wikipedia:
– OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication. It is a way for search engines to publish their search results in a standard and accessible format.

OpenSearch 1.1 (cont.)
Defines a Description Record with information about the CP
– ShortName and LongName
– Description
– Tags
– URL template
Example:

OpenSearch 1.1 (cont.)
URL Template
– Server indicates how to specify OpenSearch request parameters
– Parameters not specified in the template are unavailable
– The only mandatory parameter is {searchTerms}

OpenSearch 1.1 (cont.)
Request Parameters
– {searchTerms}
– {count}
– {startIndex}
– {startPage}
– {language}
– {outputEncoding}
– {inputEncoding}
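A client turns a URL template into a request by substituting these parameters. The sketch below assumes the OpenSearch 1.1 convention that a trailing `?` inside the braces marks a parameter as optional; the template URL and the `fill_template` helper are hypothetical.

```python
import re
from urllib.parse import quote


def fill_template(template, **values):
    """Replace {name} and {name?} placeholders in an OpenSearch URL template."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter {name}")

    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


# Hypothetical template, as it might appear in a provider's description record.
TEMPLATE = "http://example.org/search?q={searchTerms}&n={count?}&start={startIndex?}"

print(fill_template(TEMPLATE, searchTerms="huck finn", count=10))
```

Because parameters missing from the template are unavailable, a client can also inspect the template to learn what a provider supports before sending anything.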

OpenSearch 1.1 (cont.)
Uses RSS 2.0 with a few extra elements for the response
– RSS defines the title, description and link elements
– OpenSearch adds the totalResults, startIndex, itemsPerPage, link and Query elements
bin/OSxml1.cgi/?q=levan&format=rss
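Reading such a response takes only a few lines with a standard XML parser. In this sketch the sample document is invented for illustration and is not the output of the CGI script mentioned on the slide; the namespace URI is the one used by OpenSearch 1.1.

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenSearch 1.1 response elements.
OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Invented sample response, for illustration only.
SAMPLE = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Sample results</title>
    <opensearch:totalResults>190</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>Thud!</title><link>http://example.org/1</link></item>
  </channel>
</rss>"""


def summarize(rss_text):
    """Extract the OpenSearch paging elements and the item titles."""
    channel = ET.fromstring(rss_text).find("channel")

    def number(name):
        return int(channel.findtext(f"os:{name}", namespaces=OS_NS))

    return {
        "totalResults": number("totalResults"),
        "startIndex": number("startIndex"),
        "itemsPerPage": number("itemsPerPage"),
        "titles": [item.findtext("title") for item in channel.findall("item")],
    }


print(summarize(SAMPLE))
```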

Functional Matrix: OS 1.1
Request Record Starting Point   ●
Request Number of Records       ○
Request Record Schema           -
Defined Query Grammar           -
Specify Sort Order              -
Specify Ranking Order           -
Diagnostic Messages             -
XML Response                    ○
Record Count In Response        ○
Records In Known Schema         ○
Key: ●==Full Support ○==Limited Support -==No Support

Cool Feature The RSS mechanism in OpenSearch provides the ability to have persistent and periodic queries!

NISO MetaSearch XML Gateway (MXG)
– MXG has been designed to provide a low implementation barrier to content providers that want to make their databases available to metasearch engines.
– Interoperability across content providers was explicitly not a goal of MXG.

MXG Levels of Support
– Level 0: Requests are simple URLs using any query grammar and responses are XML records
– Level 1: Adds a description record for the database
– Level 2: Supports a limited subset of a standard query grammar: CQL

MXG Request
– Version (mandatory)
– Query (mandatory)
– StartRecord
– MaximumRecords
s?version=1.1&query="levan"&startRecord=1&maximumRecords=10
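Building such a request is plain URL construction. In this sketch the base URL and the `mxg_request` helper are hypothetical; the parameter names and the version value are the ones shown on the slide.

```python
from urllib.parse import urlencode


def mxg_request(base_url, query, start_record=None, maximum_records=None, version="1.1"):
    """Build an MXG request URL; version and query are mandatory, the rest optional."""
    params = {"version": version, "query": query}
    if start_record is not None:
        params["startRecord"] = start_record
    if maximum_records is not None:
        params["maximumRecords"] = maximum_records
    # urlencode percent-encodes the query, including the double quotes.
    return base_url + "?" + urlencode(params)


# http://example.org/search is a placeholder, not a real MXG endpoint.
print(mxg_request("http://example.org/search", '"levan"', 1, 10))
```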

MXG Response <searchRetrieveResponse xmlns=" … "stuff"

MXG Response Records info:srw/schema/1/dc-v1.1 xml … 1

MXG Response recordData rrl1234 Dog and Cat

MXG Error Messages info:srw/diagnostic/1/51 66ntqk list.html
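Because errors come back as structured XML rather than free text, a client can detect them without scraping. The sketch below assumes the SRW 1.1 response and diagnostic namespaces; the sample document is invented, reusing only the diagnostic URI from the slide, and the `details` value is a placeholder.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from SRW/SRU 1.1.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "diag": "http://www.loc.gov/zing/srw/diagnostic/",
}

# Invented sample response; only the diagnostic URI is taken from the slide.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>0</numberOfRecords>
  <diagnostics>
    <diagnostic xmlns="http://www.loc.gov/zing/srw/diagnostic/">
      <uri>info:srw/diagnostic/1/51</uri>
      <details>placeholder detail</details>
    </diagnostic>
  </diagnostics>
</searchRetrieveResponse>"""


def diagnostics(response_text):
    """Return (uri, details) pairs for every diagnostic in the response."""
    root = ET.fromstring(response_text)
    return [
        (
            diag.findtext("diag:uri", namespaces=NS),
            diag.findtext("diag:details", namespaces=NS),
        )
        for diag in root.iterfind(".//diag:diagnostic", NS)
    ]


print(diagnostics(SAMPLE))
```

An empty list means the provider reported no errors, which is the check that screen scraping could only approximate by matching message text.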

Functional Matrix: MXG Level 0
Request Record Starting Point   ●
Request Number of Records       ●
Request Record Schema           ○
Defined Query Grammar           -
Specify Sort Order              -
Specify Ranking Order           -
Diagnostic Messages             ●
XML Response                    ●
Record Count In Response        ●
Records In Known Schema         ●
Key: ●==Full Support ○==Limited Support -==No Support

MXG Level 1
– Adds a description record for the database

Functional Matrix: MXG Level 1
Request Record Starting Point   ●
Request Number of Records       ●
Request Record Schema           ●
Defined Query Grammar           -
Specify Sort Order              -
Specify Ranking Order           -
Diagnostic Messages             ●
XML Response                    ●
Record Count In Response        ●
Records In Known Schema         ●
Key: ●==Full Support ○==Limited Support -==No Support

MXG Level 2
– Supports a limited subset of a standard query grammar: CQL
– Supports indexes and Booleans
ery=dc.author=levan&maximumRecords=1

Functional Matrix: MXG Level 2
Request Record Starting Point   ●
Request Number of Records       ●
Request Record Schema           ●
Defined Query Grammar           ○
Specify Sort Order              -
Specify Ranking Order           -
Diagnostic Messages             ●
XML Response                    ●
Record Count In Response        ●
Records In Known Schema         ●
Key: ●==Full Support ○==Limited Support -==No Support

SRU
MXG Level 2 Plus:
– Full Query Grammar (CQL)
– Full Sort Specification

CQL: Common Query Language
– Loosely based on CCL Search
– Boolean & Proximity Operators
– Index Sets & Indexes
– String Indexes vs. Keyword Indexes
– Truncation Characters '*', '#' & '?'
– Relations: '=', all, any, exact, within
– Example: dc.title="harry potter" or bib1.isbn= x
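A small sketch of how a client might assemble CQL like the example above. The `cql_clause` and `cql_or` helpers are hypothetical; the sketch assumes CQL's rule that a term containing spaces is enclosed in double quotes, with embedded double quotes escaped by a backslash.

```python
def cql_clause(index, term, relation="="):
    """Build one CQL search clause, always quoting the term."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def cql_or(*clauses):
    """Join clauses with the Boolean operator 'or'."""
    return " or ".join(clauses)


query = cql_or(
    cql_clause("dc.title", "harry potter"),
    cql_clause("dc.creator", "twain clemens", relation="any"),
)
print(query)
```

Always quoting the term keeps the helper simple; CQL also allows single-word terms to be written unquoted.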

Sort
sortKeys parameter with the following comma-separated values specified:
– Xpath (path to the element to be sorted on)
– Schema (that the XPath comes from)
– Ascending (value is 1==true or 0==false, default==true)
– CaseSensitive (value is 1==true or 0==false, default==false)
– missingValue (values are omit, abort, highValue or lowValue, default==highValue)
e.g. &sortKeys=title,onix,0
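The comma-separated layout above can be produced by a small helper. This is a sketch: the `sort_key` function is hypothetical, and it assumes trailing values may be omitted when they keep their defaults, as the slide's own example does.

```python
def sort_key(xpath, schema, ascending=True, case_sensitive=None, missing_value=None):
    """Build one sortKeys value: xpath,schema,ascending[,caseSensitive[,missingValue]]."""
    parts = [xpath, schema, "1" if ascending else "0"]
    # Later values are positional, so caseSensitive must be written out
    # whenever missingValue is given.
    if case_sensitive is not None or missing_value is not None:
        parts.append("1" if case_sensitive else "0")
    if missing_value is not None:
        parts.append(missing_value)
    return ",".join(parts)


# Reproduces the slide's example: sort on title from the onix schema, descending.
print("&sortKeys=" + sort_key("title", "onix", ascending=False))
```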

Functional Matrix: SRU
Request Record Starting Point   ●
Request Number of Records       ●
Request Record Schema           ●
Defined Query Grammar           ●
Specify Sort Order              ●
Specify Ranking Order           ○
Diagnostic Messages             ●
XML Response                    ●
Record Count In Response        ●
Records In Known Schema         ●
Key: ●==Full Support ○==Limited Support -==No Support

Cool Feature
Combining SRU response data and echoed data with JavaScript and stylesheets allows for thin, browser-based clients
s?version=1.1&query="levan"&startRecord=1&maximumRecords=10

Functional Matrix
                                OS 1.1  MXG L0  MXG L1  MXG L2  SRU
Request Record Starting Point   ●       ●       ●       ●       ●
Request Number of Records       ○       ●       ●       ●       ●
Request Record Schema           -       ○       ●       ●       ●
Defined Query Grammar           -       -       -       ○       ●
Specify Sort Order              -       -       -       -       ●
Specify Ranking Order           -       -       -       -       ○
Diagnostic Messages             -       ●       ●       ●       ●
XML Response                    ○       ●       ●       ●       ●
Record Count In Response        ○       ●       ●       ●       ●
Records In Known Schema         ○       ●       ●       ●       ●
Key: ●==Full Support ○==Limited Support -==No Support