OAI Protocol for Metadata Harvesting
Tim Brody
Intelligence, Agents, Multimedia Group, University of Southampton
OpCit – BCS Metadata Meeting, London, 29th May 2002
(Many slides borrowed from Michael L. Nelson)

OAI 2.0
Public, stable
Not released yet … (but very close)
–Beta released mid-May
–Public release scheduled: 1st June
2.0 implementations in the pipeline
–British Library, Cornell Univ, Ex Libris, my.OAI, Humboldt Univ, InQuirion Pty Ltd, Library of Congress, NASA, OCLC, Old Dominion Univ, U. of Illinois, U. of Southampton, UCLA, Johns Hopkins U., Indiana U., NYU, UKOLN, Virginia Tech

Open Archives Initiative
The protocol is openly documented, and metadata is exposed to at least some peer group (note: rights management can still apply!)
Archive is defined as a collection of stuff, not the archivist's definition of archive. "Repository" is the term used in most OAI documents.
OAI is happening at break-neck speed...

Metadata Harvesting
Move away from distributed searching
Extract metadata from various sources
Build services on local copies of metadata
–Resources remain at remote repositories
[Figure: a user query ("search for cfd applications") is answered from a local copy of metadata that was harvested offline from several nodes. Each node is independently maintained; all searching, browsing, etc. is performed on the local metadata; individual nodes can still support direct user interaction.]

Metadata Harvesting
Repositories (archives etc.) = low implementation cost
Services = higher implementation cost
Similar to web search model
–DP9 gateway makes it exactly the same

aspect | Santa Fe convention | OAI-PMH v.1.0/1.1 | OAI-PMH v.2.0
about | eprints | document-like objects | resources
metadata | OAMS | unqualified Dublin Core | unqualified Dublin Core
transport | HTTP | HTTP | HTTP
responses | XML | XML | XML
requests | HTTP GET/POST | HTTP GET/POST | HTTP GET/POST
verbs | Dienst | OAI-PMH | OAI-PMH
nature | experimental | experimental | stable
model | metadata harvesting | metadata harvesting | metadata harvesting

OAI-PMH v.2.0 [06/2002]
Goal: recurrent exchange of metadata about resources between systems
Input:
–OAI-PMH v.1.0 [01/01 – 09/02]
–feedback on OAI-implementers
–deliberations by OAI-tech [09/01 -]
–alpha test group of OAI-PMH v.2.0 [03/02 -]

OAI-PMH v.2.0 [06/2002]
–low-barrier interoperability specification
–metadata harvesting model: data provider / service provider
–metadata about resources
–autonomous protocol
–distinction between protocol and periphery
–community-specific extensions
–HTTP based
–XML responses
–unqualified Dublin Core
–stable (1.0 characterized as experimental)

OAI Data Model: Resources / Items / Records
resource: David (the example)
item: all available metadata about David
records: Dublin Core metadata, MARC metadata, SPECTRUM metadata
item = identifier
record = identifier + metadata format + datestamp

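The item/record relationship on this slide can be sketched as two small Python classes. This is only an illustration, not something the protocol defines: the class names `Item` and `Record`, the `add_record` helper, and the identifier `oai:example.org:david` are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp (plus the metadata itself)."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str


@dataclass
class Item:
    """item = identifier; it groups all records that describe one resource."""
    identifier: str
    records: Dict[str, Record] = field(default_factory=dict)

    def add_record(self, metadata_prefix: str, datestamp: str, metadata: str) -> Record:
        # Every record of an item shares the item's identifier and
        # differs by metadata format.
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Hypothetical item with two metadata formats.
item = Item("oai:example.org:david")
item.add_record("oai_dc", "2002-05-29", "<dc>...</dc>")
item.add_record("marc", "2002-05-29", "<marc>...</marc>")
print(sorted(item.records))
```

Keying the records by metadata format mirrors the slide's point that one identifier can be disseminated in several formats.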
Overview of OAI Verbs
Verb | Function
Identify | description of archive
ListMetadataFormats | metadata formats supported by archive
ListSets | sets defined by archive
ListIdentifiers | OAI unique ids contained in archive
ListRecords | listing of N records
GetRecord | listing of a single record
Identify, ListMetadataFormats and ListSets return archival metadata; ListIdentifiers, ListRecords and GetRecord are the harvesting verbs.
Most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)

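Since requests are plain HTTP GET/POST with a `verb` parameter plus arguments, a request URL can be assembled with the standard library. The sketch below is a minimal illustration; the helper name `build_request` and the base URL `http://example.org/oai` are hypothetical, and arguments are sorted only to make the output predictable.

```python
from urllib.parse import urlencode

# The six verbs listed on the slide.
VERBS = ("Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord")


def build_request(base_url, verb, args=None):
    """Return a GET URL for one OAI-PMH request.

    `args` is a dict because `from` is a reserved word in Python and
    cannot be passed as a keyword argument.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = [("verb", verb)] + sorted((args or {}).items())
    return base_url + "?" + urlencode(params)


# Hypothetical repository base URL.
print(build_request("http://example.org/oai", "ListRecords",
                    {"metadataPrefix": "oai_dc", "from": "2002-01-01"}))
```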
Identify
v1.x:
–Arguments: none
–Errors: none
v2.0:
–Arguments: none
–Errors: badArgument

ListMetadataFormats
v1.x:
–Arguments: identifier (OPTIONAL)
–Errors: id does not exist
v2.0:
–Arguments: identifier (OPTIONAL)
–Errors: badArgument, noMetadataFormats, idDoesNotExist

ListSets
v1.x:
–Arguments: resumptionToken (EXCLUSIVE)
–Errors: no set hierarchy
v2.0:
–Arguments: resumptionToken (EXCLUSIVE)
–Errors: badArgument, badResumptionToken, noSetHierarchy

ListIdentifiers
v1.x:
–Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE)
–Errors: no records match
v2.0:
–Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
–Errors: badArgument, cannotDisseminateFormat, badResumptionToken, noSetHierarchy, noRecordsMatch

ListRecords
v1.x:
–Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
–Errors: no records match, metadata format cannot be disseminated
v2.0:
–Arguments: from (OPTIONAL), until (OPTIONAL), set (OPTIONAL), resumptionToken (EXCLUSIVE), metadataPrefix (REQUIRED)
–Errors: noRecordsMatch, cannotDisseminateFormat, badResumptionToken, noSetHierarchy, badArgument

GetRecord
v1.x:
–Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
–Errors: id does not exist, metadata format cannot be disseminated
v2.0:
–Arguments: identifier (REQUIRED), metadataPrefix (REQUIRED)
–Errors: badArgument, cannotDisseminateFormat, idDoesNotExist

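The REQUIRED / OPTIONAL / EXCLUSIVE rules for the v2.0 arguments on the preceding verb slides lend themselves to a table-driven check. The following is a rough sketch, not a reference implementation: the names `ARGUMENTS` and `check_request` are hypothetical, and `badVerb` is the v2.0 error code for an unknown verb, taken from the specification rather than from these slides. Only argument names are checked; errors such as `idDoesNotExist` or `noRecordsMatch` depend on repository contents and are out of scope here.

```python
# verb -> (required arguments, optional arguments, exclusive argument)
ARGUMENTS = {
    "Identify": (set(), set(), None),
    "ListMetadataFormats": (set(), {"identifier"}, None),
    "ListSets": (set(), set(), "resumptionToken"),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), None),
}


def check_request(verb, args):
    """Return None if the argument names are acceptable, else an error code."""
    if verb not in ARGUMENTS:
        return "badVerb"
    required, optional, exclusive = ARGUMENTS[verb]
    names = set(args)
    # An EXCLUSIVE argument must be the only argument besides the verb.
    if exclusive is not None and exclusive in names:
        return None if names == {exclusive} else "badArgument"
    # Every REQUIRED argument must be present.
    if not required <= names:
        return "badArgument"
    # Anything that is neither required nor optional is illegal.
    if names - required - optional:
        return "badArgument"
    return None


print(check_request("ListRecords", {"from": "2002-01-01"}))
```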
Response: no errors
[XML response listing lost in extraction; surviving fragments: T08:55:46Z, oai:arXiv:cs/, cs, math, …]

Response: with error
[XML response listing lost in extraction; surviving fragments: T08:55:46Z and the error message "ShowMe is not a valid OAI-PMH verb"]

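A harvester has to detect such an error response before treating the payload as data. The sketch below assumes the OAI-PMH 2.0 response layout (an `error` element with a `code` attribute in the `http://www.openarchives.org/OAI/2.0/` namespace). The sample document is invented for illustration: its date, the `request` URL and the function name `extract_errors` are assumptions, not the content of the original slide.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample; only the error message text comes from the slide.
SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-05-29T08:55:46Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


def extract_errors(xml_text):
    """Return a list of (code, message) pairs; empty if there are no errors."""
    root = ET.fromstring(xml_text)
    return [(err.get("code"), (err.text or "").strip())
            for err in root.findall("oai:error", NS)]


for code, message in extract_errors(SAMPLE_ERROR):
    print(code, "-", message)
```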
resumptionToken Flow-Control
Idempotency of resumptionToken: return the same incomplete list when the resumptionToken is re-issued
–while no changes occur in the repository: strict
–while changes occur in the repository: all items with unchanged datestamp
New attributes for the resumptionToken:
–expirationDate
–completeListSize
–cursor

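On the harvester side, flow control comes down to a loop that re-issues the request with the returned resumptionToken until none is left. The sketch below assumes the OAI-PMH 2.0 response layout, in which the last page carries an empty `resumptionToken` element or none at all. The function name `harvest_identifiers` and the `fetch` callable (which takes a dict of request parameters and returns the response XML as a string) are hypothetical; passing `fetch` in keeps the example free of network code.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix):
    """Yield every identifier from a ListIdentifiers harvest.

    `fetch(params)` is a caller-supplied function that performs one
    request and returns the response body as a string.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # complete list received
        # resumptionToken is EXCLUSIVE: send it as the only argument.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": token.text.strip()}
```

A production harvester would also want to read `expirationDate`, `completeListSize` and `cursor` from the token element, for example to report progress or to restart before a token expires.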
Adoption
Evolution:
–from talking about OAI-PMH
–to talking about projects that use OAI-PMH
–to talking about projects and failing to mention they use OAI-PMH
=> OAI-PMH becomes part of the infrastructure

Data Providers (a.k.a. repositories)
–49 registered repositories [11/2001]
–65 registered repositories [03/2002]
–77 registered repositories [05/2002]
–5+ million records
–many unregistered repositories
–private implementations (e.g. RDN)

Service Providers
–Arc: cross-searching of registered repositories
–CiteBase: research literature search + citation ranking
–OLAC: cross-searching of Language Archive Community repositories

Service Providers
–Scirus: scientific search engine [Elsevier]
–my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.]
–Growing interest from web search engines

OAI-PMH tools
–Repository Explorer: interactive exploration of repositories [Virginia Tech]
–eprints.org: generic OAI-PMH compliant repository software [U of Southampton]
–ALCME: repository and harvester software [OCLC]
–APIs, others