Extracting XML from Unicorn with OAI and SRU


Extracting XML from Unicorn with OAI and SRU
European Unicorn User Group Conference, Glasgow Caledonian University, September 7th & 8th, 2006
Benoit PAUWELS, Université Libre de Bruxelles (ULB), Brussels

Agenda
Introduction: Unicorn interfaces
Part 1: An OAI frontend for Unicorn
Part 2: An SRU frontend for Unicorn
Short description of the OAI and SRU protocols
Overview of technical implementation
Use cases and demos

Introduction
OAI and SRU are ‘open’ protocols that permit the exchange of metadata between information systems.
Well-known Unicorn interfaces:
  Unicorn API server
  Unicorn Webcat/iBistro/iLink server
  Unicorn Z39.50 server
All of them follow a request/response model.

Unicorn interfaces: API server
[Diagram: a client system (SirsiDynix Character client, C Workflows client, Java Themes client) sends an API request over a TCP/IP socket to the Unicorn server, which holds the catalogue database (records and indexes) and returns an API response made of API datacodes/values.]
Communication protocol: TCP/IP socket
Information exchange protocol: proprietary SirsiDynix API requests/responses
Returned record structure: proprietary SirsiDynix format (data codes and values)

Unicorn interfaces: iLink
[Diagram: any Web browser on the client system sends an iLink request (URL) over HTTP to the Web server and iLink on the Unicorn server, which holds the catalogue database (records and indexes) and returns an HTML page.]
Communication protocol: HTTP
Information exchange protocol: URL requests / HTML responses
Returned record structure: HTML

Unicorn interfaces: Z39.50
[Diagram: any Z39.50 client on the client system sends a Z39.50 request to the Z39.50 server on the Unicorn server, which holds the catalogue database (records and indexes) and returns a Z39.50 response containing MARC21 records.]
Communication protocol: Z39.50 specific
Information exchange protocol: Z39.50 specific
Returned record structure: typically MARC21

Unicorn interfaces
API: proprietary, so low level of interoperability
HTML: record data not well structured, so low level of reusability
Z39.50: protocol specific
  more difficult to implement (steep learning curve)
  Z39.50 is stateful
Difficult to integrate into today’s web services environments, which call for:
  communication: use HTTP
  information exchange: use open protocols (like OAI and SRU)
  record data structure: use XML (according to a well-defined XML Schema)

2 new Unicorn interfaces: HTTP / Open / XML
OAI-PMH: Open Archives Initiative – Protocol for Metadata Harvesting
SRU: Search and Retrieve via URL

OAI-PMH: the protocol
[Diagram: the service provider sends HTTP-embedded OAI requests to the data provider, where a Web server with an OAI frontend sits in front of the document archive; the data provider returns HTTP-embedded OAI responses.]

OAI-PMH: the protocol
‘Harvester collects metadata from archives’
Stateless protocol: a sequence of OAI requests/responses over HTTP
Just harvesting -- NOT searching

OAI-PMH: the protocol
OAI requests
HTTP GET|POST requests
Syntax:
  BASE URL: host + port + path of the OAI request handler
  key=value pairs
Examples:
  http://www.cible.ulb.ac.be:80/cgi-bin/OAI20/catalog?verb=Identify
  http://www.biomedcentral.com/oai/1.1/bmcoai.asp?verb=GetRecord&identifier=oai:bmc:1471-2105-1-1&metadataPrefix=oai_dc
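
The request syntax above (base URL plus key=value pairs) can be sketched in a few lines of Python. This is only an illustration: the helper name oai_request_url is an assumption, not part of the frontend described in these slides. Note that URL encoding turns the colons of the identifier into %3A.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Join the base URL and the key=value pairs into one OAI-PMH GET URL."""
    # The verb comes first for readability; argument order is not significant.
    pairs = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(pairs)


BASE_URL = "http://www.cible.ulb.ac.be:80/cgi-bin/OAI20/catalog"

identify_url = oai_request_url(BASE_URL, "Identify")
get_record_url = oai_request_url(
    BASE_URL,
    "GetRecord",
    identifier="oai:ulbcat:245000",
    metadataPrefix="oai_dc",
)
```

The same helper covers all six verbs, since they differ only in the verb value and in the arguments that accompany it.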

OAI-PMH: the protocol
OAI responses
XML-encoded byte streams containing the records
Record = triplet:
  header (unique OAI identifier)
  metadata
  about
Metadata schemes:
  XML Schema
  minimum: unqualified Dublin Core
  community specific
Example of a record (catkey 450000 from the ULB catalogue): oai_dc, marc21, umods
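
To make the header / metadata / about triplet concrete, here is a minimal Python sketch that reads those three parts out of a GetRecord response. The sample response is invented for illustration (it is not the ULB record mentioned above), and parse_record is a hypothetical helper name; only the namespace URIs are taken from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample data, shaped like an OAI-PMH 2.0 GetRecord response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example:1</identifier>
        <datestamp>2006-08-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_record(xml_text):
    """Return the identifier, datestamp and titles of the first record."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    metadata = record.find("oai:metadata", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "titles": [e.text for e in metadata.iterfind(".//dc:title", NS)],
        # 'about' containers are optional and may be absent.
        "about_count": len(record.findall("oai:about", NS)),
    }


parsed = parse_record(SAMPLE_RESPONSE)
```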

OAI-PMH: the protocol
Simple: 6 OAI requests/responses (arguments in square brackets are optional)
Identify
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=Identify
ListMetadataFormats [identifier]
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListMetadataFormats
ListSets
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListSets
GetRecord identifier, metadataPrefix
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:245000&metadataPrefix=marc21

OAI-PMH: the protocol
Simple: 6 OAI requests/responses
ListRecords metadataPrefix, [from, until, set]
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=mhld21&set=elper
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=marc21&from=2006-08-01
ListIdentifiers metadataPrefix, [from, until, set]
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListIdentifiers&metadataPrefix=oai_dc
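
A harvester typically issues ListIdentifiers or ListRecords in a loop. The sketch below assumes the resumptionToken flow control of OAI-PMH 2.0, which these slides do not cover: when a list is incomplete, the response carries a resumptionToken that the next request sends back as its only argument besides the verb. The function names and the injectable fetch parameter are assumptions made so the loop can be tried without a live server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default transport: plain HTTP GET returning the response body."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix, fetch=http_fetch, **selective):
    """Yield every identifier of a ListIdentifiers harvest, page by page."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    params.update(selective)  # optional: from, until, set
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:
            return  # a missing or empty token marks the last page
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

Because from is a Python keyword, selective arguments are easiest to pass as a dictionary, for example harvest_identifiers(base, "oai_dc", **{"from": "2006-08-01"}).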

OAI frontend for Unicorn
Implementation of the data provider functionality (2001)
http://www.openarchives.org/tools/tools.html: pick a template and interface with Unicorn through the Unicorn database tools
Our choice: Object Oriented Perl frontend (H. Suleman – Virginia Tech)

OAI frontend for Unicorn
[Diagram: an HTTP-embedded OAI request reaches the HTTP server on the Unicorn server; CGI starts the OAI C wrapper, which forks OAI.pl in the ‘sirsi’ environment; OAI.pl calls the appropriate OAI request handler, retrieves metadata from the Unicorn database and formats it in XML; the result goes back as an HTTP-embedded OAI response.]

OAI frontend for Unicorn
Example: implementation of the GetRecord request
http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:245000&metadataPrefix=oai_dc
1. Get metadata from Unicorn for catkey 245000
     $record = `echo $catkey | catalogdump -of | filtermarc -iALL -od -Ds`;
     @dates = split('\|', `echo $catkey | selcatalog -iK -opr`);
2. Convert ANSEL character set into ISO-LATIN-1
3. Map from MARC to oai_dc
4. Format into XML
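
Steps 3 and 4 can be illustrated with a short Python sketch. It is not the Perl code of the frontend: the input is assumed to be an already decoded list of (MARC tag, value) pairs rather than real catalogdump output, the three-entry mapping table is an illustrative subset of the usual MARC to Dublin Core crosswalk, and the character set conversion of step 2 is left out.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative subset only; a production mapping covers many more tags.
MARC_TO_DC = {"100": "creator", "245": "title", "650": "subject"}


def marc_to_oai_dc(fields):
    """Map (tag, value) pairs to an oai_dc XML string (steps 3 and 4)."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for tag, value in fields:
        dc_name = MARC_TO_DC.get(tag)
        if dc_name is not None:  # unmapped tags are silently skipped
            element = ET.SubElement(root, "{%s}%s" % (DC_NS, dc_name))
            element.text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample fields, for illustration only.
example_xml = marc_to_oai_dc(
    [("100", "Doe, Jane"), ("245", "An invented title"), ("999", "local data")]
)
```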

OAI frontend for Unicorn
Example: implementation of the ‘set’ parameter of the ListRecords request
http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc&set=elper
Precompile each set as a file of catkeys
  name of file: « name of set_catkeys », for example:
    einstein_albert_catkeys
    elper_catkeys
    sd_catkeys
    all_catkeys
  through periodic execution of the « mkoaisets » custom report
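
Resolving a set name to its precompiled file could look like the Python sketch below. The directory argument, the one-catkey-per-line layout and the function name are assumptions; the slides only give the file naming rule. The name check is an extra precaution, since the set value arrives in a URL.

```python
import re
from pathlib import Path

SET_NAME = re.compile(r"[A-Za-z0-9_]+")


def read_set_catkeys(sets_dir, set_name):
    """Return the catkeys stored in '<set_name>_catkeys' under sets_dir."""
    # Reject anything that could escape the directory (e.g. '../x').
    if not SET_NAME.fullmatch(set_name):
        raise ValueError("invalid set name: %r" % set_name)
    path = Path(sets_dir) / (set_name + "_catkeys")
    with open(path, encoding="ascii") as handle:
        # Assumed layout: one catkey per line, blank lines ignored.
        return [line.strip() for line in handle if line.strip()]
```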

OAI frontend for Unicorn
Example: implementation of the ‘from/until’ parameters of the ListRecords request
http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc&from=2006-08-01&until=2006-08-31
BRS index on creation/modification date?
Every Unicorn record that gets created or modified is ‘touched’ in the ‘textedit’ and ‘browsedit’ directories
Custom report ‘cadutext’:
  saves catkeys to <ud>/Savedkeys/adutext/rptid
  adds line ‘rptid|date|status’ to <ud>/Lastruns/cadutext
Example: « from=2006-08-01&until=2006-08-31 »
  obtain report ids for all runs of cadutext after 2006-08-01 and before 2006-08-31 from the file <ud>/Lastruns/cadutext
  for each of these report ids: obtain catkeys from <ud>/Savedkeys/adutext/rptid and save them to a randomnumber_catkeys file
  sort and uniq the randomnumber_catkeys file
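
The from/until selection described above can be sketched in Python as follows. Several points are assumptions: the date column of the Lastruns file is taken to be in YYYY-MM-DD form, the bounds are treated as inclusive, and the saved key files are represented by a dictionary from report id to catkeys so that the logic can be shown without the Unicorn directory layout.

```python
def report_ids_in_range(lastruns_lines, from_date, until_date):
    """Pick the report ids whose run date falls within [from, until]."""
    selected = []
    for line in lastruns_lines:
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # skip lines that are not 'rptid|date|status'
        rptid, run_date, _status = parts
        # ISO dates compare correctly as plain strings.
        if from_date <= run_date <= until_date:
            selected.append(rptid)
    return selected


def catkeys_for_range(lastruns_lines, saved_keys, from_date, until_date):
    """Merge the catkeys of all selected runs: the 'sort and uniq' step."""
    keys = set()
    for rptid in report_ids_in_range(lastruns_lines, from_date, until_date):
        keys.update(saved_keys.get(rptid, []))
    return sorted(keys, key=int)


# Invented sample data standing in for Lastruns/cadutext and Savedkeys/adutext.
runs = ["r1|2006-07-30|OK", "r2|2006-08-02|OK", "r3|2006-08-20|OK"]
saved = {"r1": ["3"], "r2": ["10", "5"], "r3": ["5", "7"]}
august = catkeys_for_range(runs, saved, "2006-08-01", "2006-08-31")
```

A record modified in several runs appears once in the result, which is what the sort and uniq step achieves in the original shell-based approach.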

OAI frontend for Unicorn
Limitations of implementation:
  ListRecords/ListIdentifiers:
    The from and until parameters are not permitted if the set parameter is given on the request
    The from and until parameters are permitted if the set parameter is not given on the request, but their values should fall within a certain date range (currently set arbitrarily to ‘today - 2 months’ and ‘today’)
  Deleted records
Complete source code and documentation available on the API Repository (http://sirsiapi.org)
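
The two ListRecords/ListIdentifiers restrictions amount to a small argument check, sketched here in Python. The function and exception names are hypothetical, and ‘2 months’ is approximated as 60 days, which is an assumption; the exception name echoes the badArgument error code of OAI-PMH.

```python
from datetime import date, timedelta


class BadArgument(ValueError):
    """Raised when the selective arguments break a frontend restriction."""


def check_selective_arguments(set_name=None, from_date=None, until_date=None,
                              today=None, window_days=60):
    """Return True when the argument combination is acceptable."""
    if set_name is not None and (from_date is not None or until_date is not None):
        raise BadArgument("from/until cannot be combined with set")
    today = today or date.today()
    earliest = today - timedelta(days=window_days)  # stand-in for 'today - 2 months'
    for value in (from_date, until_date):
        if value is not None and not (earliest <= value <= today):
            raise BadArgument("%s is outside the permitted date range" % value)
    return True
```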

OAI frontend - use cases @ ULB
Use case 1: Vlink - OpenURL resolver system
Joint project with Vrije Universiteit Brussel (VUB)
[Diagram: sources (ULB iLink, JSTOR, ISI Web of Science, Elsevier ScienceDirect, OVID WebSpirs) send an OpenURL to Vlink, which consults the Vlink knowledge base and returns an HTML page of extended services.]

OAI frontend - use cases @ ULB
Use case 1: Vlink - OpenURL resolver system
OpenURL sent from iLink:
  http://bibdev.vub.ac.be/cgi-bin/openurlulb?sid=ULB:Webcat&id=oai:ulbcat:617924
This OpenURL does not contain enough metadata for the specific item, so Vlink fetches back to Unicorn through an OAI GetRecord request to obtain a full MARC21 bibliographic description:
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:617924&metadataPrefix=marc21
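
The fetch-back step can be sketched as a URL rewrite: take the OAI identifier from the id parameter of the OpenURL and build the GetRecord request from it. How Vlink actually does this is not shown in the slides; the function name and the error handling below are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OAI_BASE_URL = "http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog"


def get_record_url_for_openurl(openurl, metadata_prefix="marc21"):
    """Turn an OpenURL carrying an OAI identifier into a GetRecord URL."""
    query = parse_qs(urlsplit(openurl).query)
    identifiers = query.get("id", [])
    if not identifiers or not identifiers[0].startswith("oai:"):
        raise ValueError("OpenURL carries no OAI identifier in its id parameter")
    return OAI_BASE_URL + "?" + urlencode({
        "verb": "GetRecord",
        "identifier": identifiers[0],
        "metadataPrefix": metadata_prefix,
    })


fetch_back_url = get_record_url_for_openurl(
    "http://bibdev.vub.ac.be/cgi-bin/openurlulb?sid=ULB:Webcat&id=oai:ulbcat:617924"
)
```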

OAI frontend - use cases @ ULB
Use case 1: Vlink - OpenURL resolver system
Feed the Vlink Knowledge Base through OAI harvesting
[Diagram: Vlink harvests from Unicorn over OAI-PMH into the Vlink Knowledge Base.]
  http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=mhld21&set=elper

OAI frontend - use cases @ ULB
Use case 2: Unicat - Virtual Union Catalog of Belgium
[Diagram. Data providers: university library, public, museum and other catalogs (Unicorn, Aleph, VIRTUA, VUBIS), reached through OAI. Central repository: Unicat Harvester, Union OAI Archive, Unicat Indexer, search/browse indexes, Unicat WWW Gateway. The end user receives HTML; SRU is also shown as an interface.]

SRU: the protocol
[Diagram: the client system sends an SRU request over HTTP to the Web server and SRU frontend on the Unicorn server, which holds the catalogue database (records and indexes) and returns an SRU response in XML.]
Communication protocol: HTTP
Information exchange protocol: SRU
Returned record structure: XML

SRU: the protocol
‘Client searches and retrieves metadata records from an archive’
Stateless protocol: a sequence of SRU requests/responses over HTTP
Search and retrieve (in contrast to OAI: harvesting)

SRU: the protocol
SRU requests
HTTP GET requests
Syntax:
  BASE URL: host + port + path of the SRU request handler
  key=value pairs
3 possible requests (operations):
  explain
    describes the facilities available at an SRU server
    used by clients to self-configure
    the returned explain record is in XML and follows the ZeeRex Schema
    Example: http://z3950.loc.gov:7090/voyager?version=1.1&operation=explain
  scan
    allows the client to request a range of the available terms at a given point within a list of indexed terms
    enables clients to present an ordered list of values and, if supported, how many hits there would be for a search on each term
  searchRetrieve

SRU: the protocol
searchRetrieve operation
searchRetrieve (principal) parameters:
  version: version of the request; current protocol version: 1.1
  query: query expressed in CQL
  startRecord: position within the sequence of matched records of the first record to be returned
  maximumRecords: number of records requested to be returned
  recordSchema: schema requested for the records to be returned
  stylesheet: URL of an XML stylesheet. The client requests that the server simply return this URL in the response.
CQL
« Traditionally, query languages have fallen into two camps: Powerful, expressive languages, not easily readable nor writable by non-experts (e.g. SQL, PQF, and XQuery); or simple and intuitive languages not powerful enough to express complex concepts (e.g. CCL and google). CQL tries to combine simplicity and intuitiveness of expression for simple, every day queries, with the richness of more expressive languages to accommodate complex concepts when necessary. » (http://www.loc.gov/standards/sru/cql)
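
The parameters listed above map directly onto a URL. The Python sketch below assembles a searchRetrieve request and percent-encodes the CQL query, so that spaces and double quotes survive transport; the function name and its defaults are assumptions chosen for illustration.

```python
from urllib.parse import quote, urlencode


def search_retrieve_url(base_url, query, start_record=1, maximum_records=10,
                        record_schema=None, stylesheet=None, version="1.1"):
    """Build an SRU searchRetrieve URL from the principal parameters."""
    params = [
        ("version", version),
        ("operation", "searchRetrieve"),
        ("query", query),
        ("startRecord", str(start_record)),
        ("maximumRecords", str(maximum_records)),
    ]
    if record_schema:
        params.append(("recordSchema", record_schema))
    if stylesheet:
        params.append(("stylesheet", stylesheet))
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return base_url + "?" + urlencode(params, quote_via=quote)


einstein_url = search_retrieve_url(
    "http://bib49.ulb.ac.be:9000/Cible", 'title all "einstein albert"'
)
```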

SRU: the protocol
searchRetrieve operation
Examples of CQL queries:
  dinosaur
  title = "complete dinosaur"
  title exact "the complete dinosaur"
  dinosaur not reptile
  dinosaur and bird or dinobird
  publicationYear < 1980
  title all "complete dinosaur"
    title contains all of the words ‘complete’ and ‘dinosaur’
  title any "dinosaur bird reptile"
    title contains any of the words ‘dinosaur’, ‘bird’, or ‘reptile’
  ribs prox/distance<=5 chevrons
    a more specific proximity query: ‘ribs’ within 5 words of ‘chevrons’

SRU: the protocol
searchRetrieve operation -- examples
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&query=author=einstein
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=author=einstein
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=author=einstein&recordSchema=dc
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=author all "einstein albert"
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleCanevas.xsl
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
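
The examples above all ask for the first ten records. To walk through a larger result set, a client can read numberOfRecords from the response and step startRecord forward, as in this Python sketch. The sample response is invented and reduced to the two elements used here; the namespace is the one defined for SRU 1.1 responses, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"

# Invented, heavily shortened searchRetrieve response.
SAMPLE_RESPONSE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>25</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def number_of_records(xml_text):
    """Read the total hit count from a searchRetrieve response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("{%s}numberOfRecords" % SRW_NS))


def page_starts(total, maximum_records=10):
    """List the startRecord value of each page (record positions start at 1)."""
    return list(range(1, total + 1, maximum_records))


starts = page_starts(number_of_records(SAMPLE_RESPONSE))
```

With 25 hits and pages of 10, the client would issue three requests, with startRecord set to 1, 11 and 21.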

SRU frontend for Unicorn
[Diagram: the client system sends an SRU request over HTTP to a Web server with an SRU frontend on the Unicorn server, in front of the catalogue database (records and indexes), and receives an SRU response in XML.]

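SRU frontend for Unicorn
[Diagram: the client system sends an SRU request over HTTP to the Web server of an SRU/Z39.50 gateway; the gateway forwards it as a Z39.50 request to the Z39.50 frontend on the Unicorn server, in front of the catalogue database (records and indexes); the Z39.50 response comes back through the gateway as an SRU response in XML.]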

SRU frontend for Unicorn
SRU/Z39.50 Gateway: YAZ Proxy (Index Data)
Implemented at ULB: 7/2006 (2 days)
config.xml:
  <target name="cible" default="1">
    <url>bib7.ulb.ac.be:2200</url>
    <xi:include href="explain.xml"/>
    <cql2rpn>pqf.properties</cql2rpn>
  </target>
  <target name="slavko" default="1">
    <url>velma.library.mun.ca:2200</url>
    <xi:include href="explain.slavko.xml"/>
    <cql2rpn>pqf.slavko.properties</cql2rpn>
  </target>
explain.xml: ZeeRex XML record returned in response to the ‘explain’ operation
pqf.properties: specifies the mapping of various CQL indexes, relations, etc. into Type-1 query attributes
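
To give a feel for what the CQL to Type-1 mapping does, here is a toy Python sketch that turns a single index/term pair into a PQF string with a Bib-1 use attribute. It is not YAZ Proxy code and does not reflect the contents of the ULB pqf.properties file, which the slides do not show; the real gateway parses full CQL and also maps relations, structure and position. The table uses the standard Bib-1 use attributes 4 (title), 1003 (author) and 1016 (any), and the index names are assumptions.

```python
# Hypothetical index names mapped to standard Bib-1 use attributes.
INDEX_TO_USE_ATTRIBUTE = {
    "title": 4,
    "author": 1003,
    "cql.serverChoice": 1016,
}


def simple_cql_to_pqf(index, term):
    """Translate one 'index = term' clause into a PQF query string."""
    try:
        use = INDEX_TO_USE_ATTRIBUTE[index]
    except KeyError:
        raise ValueError("unsupported index: %r" % index)
    # Escape backslashes and double quotes inside the quoted term.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '@attr 1=%d "%s"' % (use, escaped)


pqf_query = simple_cql_to_pqf("title", "einstein albert")
```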

SRU frontend for Unicorn
YAZ Proxy
http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
http://bib49.ulb.ac.be:9000/Slavko?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl

SRU frontend: use case @ ULB
Seamless integration of catalog searches in the CMS Typo3
Example:
  HTML page containing a biography of the famous Belgian historian Henri Pirenne
  frame pointing to the following URL:
    http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=pirenne%20and%20epub-dnu-*&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
Project:
  Unicorn contains descriptions of databases, websites, etc. with local thematic classification codes in field 653
  create thematic websites within our CMS, containing frames that list the available databases per theme