The OAI Protocol for Metadata Harvesting Van de Sompel, Herbert Los Alamos National Laboratory – Research Library.



The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance. “…the joint impact of these and future initiatives can be substantially higher when interoperability between them [e-print archives] can be established…” Paul Ginsparg, Rick Luce & Herbert Van de Sompel

wine / request for funding: Luce, Van de Sompel, Ginsparg

federated services
[Figure: federated services across A&I, image, FTXT, OPAC, and e-print collections]

metadata harvesting via OAI-PMH
[Figure: a harvester collects metadata from A&I, image, OPAC, e-print, and FTXT collections]

federated services via OAI-PMH
[Figure: metadata (Author, Title, Abstract, Identifier) harvested from A&I, image, FTXT, e-print, and OPAC collections]

            Santa Fe convention       OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
nature      experimental              experimental              stable
verbs       Dienst                    OAI-PMH                   OAI-PMH
requests    HTTP GET/POST             HTTP GET/POST             HTTP GET/POST
responses   XML                       XML                       XML
transport   HTTP                      HTTP                      HTTP
metadata    OAMS                      unqualified Dublin Core   unqualified Dublin Core
about       eprints                   document like objects     resources
model       metadata harvesting       metadata harvesting       metadata harvesting

the OAI-PMH
[Figure: the harvester of a service provider sends requests to the repository of a data provider, which sends replies]

Core concepts in OAI-PMH
– low-barrier interoperability
– data-provider & service-provider model
– metadata harvesting model
– OAI-PMH: HTTP based
– reply: XML Schema, self contained
– Dublin Core: shared metadata format, and parallel, community-specific metadata formats
– acceptable use: community specific, oai-rights

OAI-PMH data model
– resource
– item: entry point to all records pertaining to the resource (OAI-PMH identifier, OAI-PMH sets)
– records: metadata pertaining to the resource, e.g. Dublin Core metadata, MARCXML metadata (OAI-PMH identifier, metadataPrefix, datestamp)

OAI-PMH harvesting tools
Supporting protocol requests:
– Identify
– ListMetadataFormats
– ListSets
[Figure: service provider (harvester) and data provider (repository)]

supporting protocol requests
Request: ListMetadataFormats
Reply: ListMetadataFormats / Time / Request
  REPEAT
    Format prefix
    Format XML schema
  /REPEAT
[Figure: service provider (harvester) and data provider (repository)]

Supporting: Identify
Purpose
– Return general information about the archive and its policies (e.g., datestamp granularity)
Parameters
– None
Sample URL –

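Identify (reply, element values only)
– repositoryName: Library of Congress
– earliestDatestamp: …T00:00:00Z
– deletedRecord: transient
– granularity: YYYY-MM-DDThh:mm:ssZ
– compression: deflate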
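
A minimal sketch of how a harvester might read an Identify reply, assuming the standard OAI-PMH 2.0 namespace. The sample document is hypothetical: only the repository name, deletedRecord, granularity, and compression values come from the slide, while the base URL, dates, and protocol version are placeholders, and a real reply carries further elements such as adminEmail.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical sample reply; example.org and the dates are placeholders.
SAMPLE_IDENTIFY = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T08:55:46Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Library of Congress</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2000-01-01T00:00:00Z</earliestDatestamp>
    <deletedRecord>transient</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
    <compression>deflate</compression>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_bytes):
    """Return the children of <Identify> as {element name: [values]}."""
    root = ET.fromstring(xml_bytes)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an Identify reply")
    info = {}
    for child in identify:
        name = child.tag.split("}", 1)[-1]  # strip the namespace part
        info.setdefault(name, []).append((child.text or "").strip())
    return info


info = parse_identify(SAMPLE_IDENTIFY)
print(info["granularity"], info["deletedRecord"])
```

Values are kept in lists because some Identify elements (compression, for example) may repeat.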

harvesting granularity
– mandatory support of YYYY-MM-DD
– optional support of YYYY-MM-DDThh:mm:ssZ
– other granularities considered, but ultimately rejected
– granularity of from and until must be the same
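
The granularity rules can be checked on the harvester side before a request is sent. The following is a sketch under stated assumptions: the function names are hypothetical, and the patterns only check the shape of a datestamp, not whether it is a real calendar date.

```python
import re

DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"

PATTERNS = {
    DAY: re.compile(r"\d{4}-\d{2}-\d{2}"),
    SECONDS: re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
}


def granularity_of(stamp):
    """Name the granularity a datestamp is written in."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(stamp):
            return name
    raise ValueError(f"unsupported datestamp: {stamp!r}")


def check_range(from_, until, repository_granularity):
    """Validate from/until against the granularity the repository announced."""
    used = {granularity_of(s) for s in (from_, until) if s is not None}
    if len(used) > 1:
        raise ValueError("from and until must use the same granularity")
    if SECONDS in used and repository_granularity == DAY:
        raise ValueError("repository only supports day granularity")
    return True


print(check_range("2002-01-01", "2002-02-01", DAY))
```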

Supporting: ListMetadataFormats
Purpose
– List metadata formats supported by the repository as well as their schema locations and namespaces
Parameters
– identifier – for a specific record (O)
Sample URL –

Supporting: ListSets
Purpose
– Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)
Parameters
– None, apart from resumptionToken – flow control mechanism (X)
Sample URL –

OAI-PMH harvesting tools
Supporting protocol requests:
– Identify
– ListMetadataFormats
– ListSets
Harvesting protocol requests:
– ListRecords
– ListIdentifiers
– GetRecord
[Figure: service provider (harvester) and data provider (repository)]

OAI-PMH harvesting tools
[Figure: service provider (harvester) and data provider (repository); harvesting by Datestamp, Identifier, Set; reply: Records]

harvesting requests
Request: ListRecords
  * metadataPrefix=dc
  * from=a
  * until=b
  * set=klm
Reply: ListRecords / Time / Request
  REPEAT
    Identifier
    Datestamp
    Metadata
  /REPEAT
[Figure: service provider (harvester) and data provider (repository)]

Harvesting: GetRecord
Purpose
– Returns the metadata (specific format) for a single item in the form of a record
Parameters
– identifier – unique id for item (R)
– metadataPrefix – identifier of metadata format for the record (R)
Sample URL
– http://memory.loc.gov/cgi-bin/oai2_0?verb=GetRecord&identifier=oai:lcoa1.loc.gov:loc.gdc/lhbcb.00835&metadataPrefix=oai_dc

Harvesting: ListRecords
Purpose
– Retrieves the metadata (specific format) for multiple items in the form of records
Parameters
– from – start datestamp (O)
– until – end datestamp (O)
– set – set to harvest from (O)
– resumptionToken – flow control mechanism (X)
– metadataPrefix – metadata format (R)
Sample URL
– http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&metadataPrefix=oai_dc

Harvesting: ListIdentifiers
Purpose
– List headers for all items corresponding to the specified parameters
Parameters
– from – start datestamp (O)
– until – end datestamp (O)
– set – set to harvest from (O)
– metadataPrefix – metadata format to list identifiers for (R)
– resumptionToken – flow control mechanism (X)
Sample URL
– http://memory.loc.gov/cgi-bin/oai2_0?verb=ListIdentifiers&metadataPrefix=oai_dc
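
The required (R), optional (O), and exclusive (X) markings on the request slides can be turned into a small request builder. This is only a sketch: the RULES table and build_url are hypothetical names, the table follows the OAI-PMH 2.0 argument rules as I understand them (including resumptionToken for ListSets), and the base URL is the one from the sample URLs above.

```python
from urllib.parse import urlencode

# verb: (required, optional, exclusive) argument names
RULES = {
    "Identify": ((), (), ()),
    "ListMetadataFormats": ((), ("identifier",), ()),
    "ListSets": ((), (), ("resumptionToken",)),
    "GetRecord": (("identifier", "metadataPrefix"), (), ()),
    "ListIdentifiers": (("metadataPrefix",), ("from", "until", "set"), ("resumptionToken",)),
    "ListRecords": (("metadataPrefix",), ("from", "until", "set"), ("resumptionToken",)),
}


def build_url(base_url, verb, **args):
    """Build a GET URL, enforcing the R/O/X rules for the given verb."""
    if verb not in RULES:
        raise ValueError(f"unknown verb: {verb}")
    required, optional, exclusive = RULES[verb]
    # 'from' is a Python keyword, so callers pass from_ instead.
    args = {k.rstrip("_"): v for k, v in args.items() if v is not None}
    if any(k in exclusive for k in args):
        if len(args) != 1:
            raise ValueError("an exclusive argument must be the only argument")
    else:
        missing = [k for k in required if k not in args]
        unknown = [k for k in args if k not in required + optional]
        if missing or unknown:
            raise ValueError(f"missing={missing} unknown={unknown}")
    return base_url + "?" + urlencode({"verb": verb, **args})


BASE = "http://memory.loc.gov/cgi-bin/oai2_0"
print(build_url(BASE, "ListRecords", metadataPrefix="oai_dc"))
```

Note that urlencode percent-encodes characters such as ":" and "/" in identifiers, so a GetRecord URL built this way looks different from the hand-written sample but decodes to the same arguments.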

header contains set membership of item
header (element values only)
– identifier: oai:arXiv:cs/…
– setSpec: cs
– setSpec: math
…..

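ListIdentifiers returns headers
ListIdentifiers (reply, element values only)
– responseDate: …T08:55:46Z
– header: identifier oai:arXiv:hep-th/…; setSpec physic:hep
– header: identifier oai:arXiv:hep-th/…; setSpec physic:hep; setSpec physic:exp
……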
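
A sketch of reading headers, including set membership, out of a ListIdentifiers reply. The sample document is hypothetical: the arXiv identifiers and dates are truncated in the transcript, so the numbers and dates below are placeholders.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
NS = {"oai": OAI}

# Hypothetical reply; identifier numbers and dates are placeholders.
SAMPLE_HEADERS = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T08:55:46Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/0000001</identifier>
      <datestamp>2002-05-01</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:arXiv:hep-th/0000002</identifier>
      <datestamp>2002-05-02</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_bytes):
    """Return one dict per <header> element found in the reply."""
    root = ET.fromstring(xml_bytes)
    headers = []
    for header in root.iter(f"{{{OAI}}}header"):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
            "deleted": header.get("status") == "deleted",
        })
    return headers


for h in parse_headers(SAMPLE_HEADERS):
    print(h["identifier"], h["sets"], h["deleted"])
```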

OAI-PMH identifiers
– Not (necessarily) the identifier of the resource
– Each item must have a globally unique identifier
– Identifiers must follow the rules for valid URIs
– Example:
  – oai: :
  – oai:etd.vt.edu:etd
– Each identifier must resolve to a single item and always to the same item
  – Can’t reuse OAI item identifiers

OAI-PMH datestamps
– Needed for every OAI record to support incremental harvesting
– Must be updated whenever a record is added, modified, or deleted, so that changes are correctly propagated to harvesters
  – Also for dynamically generated metadata formats
– Different from dates within the metadata: the OAI datestamp is used only for harvesting
– Can be either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (must be expressed in UTC)
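
One way a harvester might use datestamps for incremental harvesting is sketched below. Both functions are hypothetical: next_from derives the from argument of the next harvest from the responseDate of the previous one, and apply_changes updates a local identifier-to-datestamp map from harvested headers (dicts with identifier, datestamp, and an optional deleted flag). Comparing datestamps as strings assumes they all share one granularity.

```python
from datetime import datetime


def next_from(previous_response_date, granularity):
    """Turn the previous responseDate into a 'from' value for the next harvest."""
    moment = datetime.strptime(previous_response_date, "%Y-%m-%dT%H:%M:%SZ")
    fmt = "%Y-%m-%d" if granularity == "YYYY-MM-DD" else "%Y-%m-%dT%H:%M:%SZ"
    return moment.strftime(fmt)


def apply_changes(local_copy, headers):
    """Apply additions, modifications, and deletions to a local copy."""
    for header in headers:
        known = local_copy.get(header["identifier"])
        if known is not None and known >= header["datestamp"]:
            continue  # local copy is already up to date
        if header.get("deleted"):
            local_copy.pop(header["identifier"], None)
        else:
            local_copy[header["identifier"]] = header["datestamp"]
    return local_copy


print(next_from("2002-06-01T08:55:46Z", "YYYY-MM-DD"))
```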

OAI-PMH request format
– requests must be submitted using the GET or POST methods of HTTP
– repositories must support both methods
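
The same arguments can be sent either way. The sketch below only constructs the request objects with Python's urllib and does not send them; make_request is a hypothetical helper, and the form-encoded content type for POST is stated here as my reading of the protocol document.

```python
from urllib.parse import urlencode
from urllib.request import Request


def make_request(base_url, params, method="GET"):
    """Build (but do not send) an OAI-PMH request as GET or POST."""
    query = urlencode(params)
    if method == "GET":
        # Arguments travel in the query string.
        return Request(base_url + "?" + query)
    if method == "POST":
        # Arguments travel in the body as form-encoded key=value pairs.
        return Request(
            base_url,
            data=query.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    raise ValueError("OAI-PMH requests use GET or POST")


params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
get_req = make_request("http://memory.loc.gov/cgi-bin/oai2_0", params)
post_req = make_request("http://memory.loc.gov/cgi-bin/oai2_0", params, "POST")
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Passing either object to urllib.request.urlopen would perform the actual request.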

OAI-PMH response format
– formatted as HTTP responses
– content type must be text/xml
– status codes (distinguished from OAI-PMH errors), e.g. 302 (redirect), 503 (service not available)

OAI-PMH response format
response format: well formed XML:
– XML declaration (<?xml version="1.0" encoding="UTF-8" ?>)
– OAI-PMH root element
– three child elements:
  1. responseDate (UTC datetime)
  2. request (request that generated this response)
  3. a) error (in case of an error or exception condition), or
     b) element with the name of the OAI-PMH request
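
The envelope described above can be unpacked with a few lines of standard-library code. This is a sketch: read_envelope is a hypothetical name, and the two sample documents use placeholder dates and an example.org base URL.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical replies; dates and URLs are placeholders.
OK_REPLY = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T08:55:46Z</responseDate>
  <request verb="ListSets">http://example.org/oai</request>
  <ListSets><set><setSpec>cs</setSpec><setName>cs</setName></set></ListSets>
</OAI-PMH>"""

ERROR_REPLY = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T08:55:46Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


def read_envelope(xml_bytes):
    """Split a reply into responseDate, request, and error or verb payload."""
    root = ET.fromstring(xml_bytes)
    if root.tag != NS + "OAI-PMH":
        raise ValueError("root element must be OAI-PMH")
    children = list(root)
    if (len(children) < 3 or children[0].tag != NS + "responseDate"
            or children[1].tag != NS + "request"):
        raise ValueError("expected responseDate, request, then a payload")
    payload = children[2:]
    errors = [(c.get("code"), c.text) for c in payload if c.tag == NS + "error"]
    return {
        "responseDate": children[0].text,
        "request": dict(children[1].attrib),
        "errors": errors,
        "verb": None if errors else payload[0].tag[len(NS):],
    }


print(read_envelope(OK_REPLY)["verb"])
print(read_envelope(ERROR_REPLY)["errors"])
```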

OAI-PMH response, no errors
(element values only)
– responseDate: …T08:55:46Z
– header: identifier oai:arXiv:cs/…; setSpec cs; setSpec math
…..

OAI-PMH response, error
(element values only)
– responseDate: …T08:55:46Z
– error: ShowMe is not a valid OAI-PMH verb

OAI-PMH errors
– repositories must indicate OAI-PMH errors
– inclusion of one or more error elements
– defined errors:
  – badArgument
  – badResumptionToken
  – badVerb
  – cannotDisseminateFormat
  – idDoesNotExist
  – noRecordsMatch
  – noMetadataFormats
  – noSetHierarchy
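
On the harvester side, error elements can be mapped to exceptions. The sketch below is one possible policy, not something the protocol prescribes: OAIError and check_errors are hypothetical names, and treating noRecordsMatch as an empty result rather than a failure is a design choice made here for incremental harvesting.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class OAIError(Exception):
    """Raised when a reply carries an OAI-PMH error element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_errors(xml_bytes):
    """Return the parsed root, None for an empty result, or raise OAIError."""
    root = ET.fromstring(xml_bytes)
    errors = [(e.get("code"), (e.text or "").strip())
              for e in root.findall("oai:error", NS)]
    if not errors:
        return root
    if all(code == "noRecordsMatch" for code, _ in errors):
        return None  # nothing changed in the requested range
    code, message = errors[0]
    raise OAIError(code, message)


# Hypothetical reply with a placeholder date and URL.
BAD_VERB = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T08:55:46Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""

try:
    check_errors(BAD_VERB)
except OAIError as exc:
    print(exc.code, exc)
```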

Flow control
– flow control on two protocol levels:
  – HTTP (503, retry-after)
  – OAI-PMH, resumptionToken
– the HTTP “retry-after” mechanism can be used to delay requests of clients
– resumptionTokens are used to return parts (incomplete lists) of the result
– the client receives a resumptionToken which can be used to issue another request, in order to receive further parts of the result
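
The HTTP level of flow control could be handled roughly as follows. This is a sketch with hypothetical helpers: send stands for any callable that performs one HTTP request and returns (status, headers, body), and sleep is injectable so the loop can be exercised without waiting.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, now=None):
    """Seconds to wait for a Retry-After value (delay in seconds or HTTP date)."""
    value = retry_after.strip()
    if value.isdigit():
        return int(value)
    now = now or datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)
    return max(0, int((when - now).total_seconds()))


def fetch_with_retry(send, sleep=time.sleep, max_attempts=3):
    """Call send() until it stops answering 503, honouring Retry-After."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 503:
            return status, body
        sleep(retry_delay(headers.get("Retry-After", "0")))
    raise RuntimeError("repository still unavailable")


print(retry_delay("120"))
```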

Flow control
– four of the request types return a list of entries
– three of them may reply ‘large’ lists
– OAI-PMH supports partitioning
– response decision on partitioning: repository
– response to a request includes:
  – incomplete list
  – resumption token + expiration date, size of complete list, cursor (optional)
– new request with same request type:
  – resumption token as parameter
  – all other parameters omitted!

Flow control
Harvester (Service Provider) and Repository (Data Provider):
Harvester: “want to have all your records”
  archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
Repository: “have 267, but give you only 100”
  100 records + resumptionToken “anyID1”
Harvester: “want more of this”
  archive.org/oai?verb=ListRecords&resumptionToken=anyID1
Repository: “have 267, give you another 100”
  100 records + resumptionToken “anyID2”
Harvester: “want more of this”
  archive.org/oai?verb=ListRecords&resumptionToken=anyID2
Repository: “have 267, give you my last 67”
  67 records + resumptionToken “”
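
The exchange above amounts to a loop that keeps requesting until the repository returns an empty resumptionToken. A minimal sketch, assuming a hypothetical fetch callable that takes the request arguments as a dict and returns the reply as bytes:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, verb, **args):
    """Yield every list entry, following resumptionTokens until the list ends."""
    params = {"verb": verb, **args}
    while True:
        root = ET.fromstring(fetch(params))
        container = root.find(NS + verb)
        if container is None:
            raise ValueError(f"reply has no {verb} element")
        token = None
        for child in container:
            if child.tag == NS + "resumptionToken":
                token = (child.text or "").strip()
            else:
                yield child
        if not token:
            return  # missing or empty token: the list is complete
        # Follow-up request: same verb, the token, and nothing else.
        params = {"verb": verb, "resumptionToken": token}
```

A caller would iterate over harvest(fetch, "ListRecords", metadataPrefix="oai_dc") and never see the partitioning.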

record-level “about” container
– provenance container to facilitate tracing of harvesting history
– example values: oai:r1:plog/…, …T13:00:02Z, oai_dc, …T12:01:30Z, … … …

record-level “about” container
– rights container to express rights pertaining to metadata
– W3C XML schema defines format for package to be included in container
[Figure: record structure: id, datestamp, sets; metadata: DC, MARCXML, …; about: provenance, branding etc.]

repository-level “description” container
– friends container to facilitate dynamic discovery of repositories

more info
– Protocol document
– Validation tool
– Repository and harvesting tools
– Registries of public OAI-PMH repositories

The OAI-PMH protocol is a low-barrier interoperability specification for the recurrent exchange of metadata between systems. Things become really cool when allowing flexibility regarding the interpretation of metadata. Indeed: in OAI-PMH, metadata is XML-formatted data pertaining to the resource.