herbert van de sompel & carl lagoze
the OAI Protocol for Metadata Harvesting: an update
  Herbert Van de Sompel, Los Alamos National Laboratory – Research Library
  Carl Lagoze, Cornell University – Computer Science
outline
  origins & evolution of OAI-PMH
  process leading to OAI-PMH v.2.0
  what's new in OAI-PMH v.2.0?
  what's next?
evolution towards OAI-PMH v.2.0
  Santa Fe Convention [02/2000]
  OAI-PMH 1.0 [01/2001]
  OAI-PMH 2.0 [06/2002]
            Santa Fe convention    OAI-PMH v.1.0/1.1          OAI-PMH v.2.0
nature      experimental           experimental               stable
verbs       Dienst                 OAI-PMH                    OAI-PMH
requests    HTTP GET/POST          HTTP GET/POST              HTTP GET/POST
responses   XML                    XML                        XML
transport   HTTP                   HTTP                       HTTP
metadata    OAMS                   unqualified Dublin Core    unqualified Dublin Core
about       eprints                document-like objects      resources
model       metadata harvesting    metadata harvesting        metadata harvesting
Santa Fe Convention [02/2000]
  goal: optimize discovery of e-prints
  input:
  the UPS prototype
  RePEc data provider / service provider model
  Dienst protocol
  deliberations at Santa Fe meeting [10/99]
Santa Fe Convention [02/2000]
  low-barrier interoperability specification
  metadata harvesting model: data provider / service provider
  focus on eprints (e.g. OAMS format)
  Dienst subset
  HTTP based
  XML responses
  experimental
OAI-PMH v.1.0 [01/2001]
  goal: optimize discovery of document-like objects
  input:
  SFC
  DLF meetings on metadata harvesting
  deliberations at Cornell meeting [09/00]
  alpha test group of OAI-PMH v.1.0
OAI-PMH v.1.0 [01/2001]
  low-barrier interoperability specification
  metadata harvesting model: data provider / service provider
  focus on document-like objects
  autonomous protocol
  HTTP based
  XML responses
  unqualified Dublin Core
  experimental: months
OAI-PMH v.2.0 [06/2002]
  goal: recurrent exchange of metadata about resources between systems
  input:
  OAI-PMH v.1.0
  feedback on OAI-implementers
  deliberations by OAI-tech [09/01 -]
  alpha test group of OAI-PMH v.2.0 [03/02 -]
OAI-PMH v.2.0 [06/2002]
  low-barrier interoperability specification
  metadata harvesting model: data provider / service provider
  metadata about resources
  autonomous protocol
  HTTP based
  XML responses
  unqualified Dublin Core
  stable
process leading to OAI-PMH v.2.0
  creation of OAI-tech
  pre-alpha phase
  alpha phase
  beta phase
creation of OAI-tech [06/01]
  created for 1 year period
  charge:
  review functionality and nature of OAI-PMH v.1.0
  investigate extensions
  release stable version of OAI-PMH by 05/02
  determine need for infrastructure to support broad adoption of the protocol
  communication: listserv, SourceForge, conference calls
OAI-tech
  US representatives: Thomas Krichel (Long Island U), Jeff Young (OCLC), Tim Cole (U of Illinois at Urbana Champaign), Hussein Suleman (Virginia Tech), Simeon Warner (Cornell U), Michael Nelson (NASA), Caroline Arms (LoC), Muhammad Zubair (Old Dominion U), Steven Bird (U Penn.)
  European representatives: Andy Powell (Bath U. & UKOLN), Mogens Sandfaer (DTV), Thomas Baron (CERN), Les Carr (U of Southampton)
pre-alpha phase [09/01 – 02/02]
  review process by OAI-tech:
  identification of issues
  conference call to filter/combine issues
  white paper per issue
  on-line discussion per white paper
  proposal for resolution of issue by OAI-exec
  discussion of proposal & closure of issue
  conference call to resolve open issues
pre-alpha phase [02/02]
  creation of revised protocol document
  in-person meeting: Lagoze, Van de Sompel, Nelson, Warner
  autonomous decisions
  internal vetting of protocol document
alpha phase [02/02 – 05/02]
  alpha-1 release to OAI-tech March 1st 2002
  OAI-tech extended with alpha testers
  discussions/implementations by OAI-tech
  ongoing revision of protocol document
OAI-PMH 2.0 alpha testers (1/2)
  The British Library
  Cornell U. -- NSDL project & e-print arXiv
  Ex Libris
  FS Consulting Inc -- harvester for my.OAI
  Humboldt-Universität zu Berlin
  InQuirion Pty Ltd, RMIT University
  Library of Congress
  NASA
  OCLC
OAI-PMH 2.0 alpha testers (2/2)
  Old Dominion U. -- ARC, DP9
  U. of Illinois at Urbana-Champaign
  U. of Southampton -- OAIA, CiteBase, eprints.org
  UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection
  UKOLN, U. of Bath -- RDN
  Virginia Tech -- repository explorer
beta phase [05/02]
  beta release on May 1st 2002 to:
  registered data providers and service providers
  interested parties
  fine tuning of protocol document
  preparation for the release of 2.0 conformant tools by alpha testers
what's new in OAI-PMH v.2.0?
  quick recap
  general changes to improve solidity of protocol
  corrections
  new functionality
OAI-PMH [diagram]
  harvester (service provider) sends Requests to repository (data provider)
  repository (data provider) returns Replies to harvester (service provider)
  6 OAI-PMH request types
OAI-PMH requests [diagram: harvester (service provider) and repository (data provider)]
  supporting protocol requests: Identify, ListMetadataFormats, ListSets
  harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord
Records [diagram: harvester (service provider) and repository (data provider)]
  Identifier
  Datestamp
  Set
general changes
  clear distinction between protocol and periphery
  fixed protocol document
  extensible implementation guidelines: e.g. sample metadata formats, description containers, about containers
  allows for OAI guidelines and community guidelines
general changes
  clear separation of OAI-PMH and HTTP
  OAI-PMH error handling:
  all OK at HTTP level? => 200 OK
  something wrong at OAI-PMH level? => OAI-PMH error (e.g. badVerb)
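The two-level check can be sketched from the harvester's side. This is a minimal illustration, not a reference implementation: the sample document is hypothetical and abbreviated, the `OAIError` class and `check_response` function are names invented here, and the namespace and the `error` element with a `code` attribute are assumed from the published OAI-PMH v.2.0 schema.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of OAI-PMH v.2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, abbreviated response body carrying a protocol-level error.
SAMPLE_ERROR = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>'
    "</OAI-PMH>"
)


class OAIError(Exception):
    """Protocol-level error reported inside a 200 OK response."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_response(http_status, body):
    """Return the parsed response, raising on HTTP or OAI-PMH failure."""
    # Level 1: HTTP. Anything other than 200 is a transport problem.
    if http_status != 200:
        raise RuntimeError(f"HTTP-level failure: {http_status}")
    # Level 2: OAI-PMH. A 200 response may still carry an error element.
    root = ET.fromstring(body)
    error = root.find(f"{OAI_NS}error")
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root
```

Calling `check_response(200, SAMPLE_ERROR)` raises `OAIError` with code `badVerb`, even though the HTTP exchange itself succeeded.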
general changes
  notion of item has become prominent: resource / item / record
  metadata can be disseminated from item
  item == identifier
  record == identifier, datestamp, metadataPrefix
general changes
  better definitions of harvester, repository, item, unique identifier, record, datestamp, set
  oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core
  usage of must, must not etc. as in RFC 2119
  wording on response compression
general changes
  all protocol responses can be validated with a single XML Schema:
  easier for data providers
  no redundancy in type definitions
  SOAP-ready
  clean for error handling
response, no errors [XML example; markup not preserved]
  T08:55:46Z  oai:arXiv:cs/  cs  math  …..
response with error [XML example; markup not preserved]
  T08:55:46Z  ShowMe is not a valid OAI-PMH verb
corrections
  all dates/times are UTC, encoded in ISO8601, Z-notation
  T20:30:00.00Z
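As an illustration of the UTC, Z-notation rule, here is a minimal sketch of how a repository might normalize a local timestamp before emitting it. The function name `to_oai_datestamp` is invented for this example, and it emits whole seconds (the YYYY-MM-DDThh:mm:ssZ form) rather than fractional seconds.

```python
from datetime import datetime, timedelta, timezone


def to_oai_datestamp(dt):
    """Convert a timezone-aware datetime to UTC in ISO 8601 Z-notation."""
    # A naive datetime carries no offset, so it cannot be safely converted.
    if dt.tzinfo is None:
        raise ValueError("naive datetime: timezone unknown")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: 10:30 local time at UTC+2 is 08:30 UTC.
local = datetime(2002, 6, 1, 10, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_oai_datestamp(local))  # 2002-06-01T08:30:00Z
```

Rejecting naive datetimes is a design choice of this sketch: silently assuming local time is a common source of off-by-hours harvesting gaps.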
corrections
  idempotency of resumptionToken (rT): return same incomplete list when rT is reissued
  while no changes occur in the repo: strict
  while changes occur in the repo: all items with unchanged datestamp
  expirationDate attribute for rT
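Idempotency is what makes a simple retry safe for a harvester: after a dropped connection, the same resumptionToken can be reissued and the same incomplete list is expected back. The sketch below assumes a hypothetical `fetch_page(token)` callable that returns a pair `(items, next_token)`, with `None` as the token of the first request. It is not part of any particular library.

```python
def harvest(fetch_page, max_retries=3):
    """Collect all pages of a list request, reissuing a token on failure.

    fetch_page is a hypothetical callable: fetch_page(token) returns
    (items, next_token); token is None for the initial request.
    """
    items = []
    token = None
    while True:
        for attempt in range(max_retries):
            try:
                # Reissuing the same token is safe because it is idempotent.
                page, next_token = fetch_page(token)
                break
            except IOError:
                if attempt == max_retries - 1:
                    raise
        items.extend(page)
        if not next_token:
            return items
        token = next_token
```

If the repository changes during the harvest, only items with an unchanged datestamp are guaranteed to reappear, so a production harvester would likely also de-duplicate by identifier and respect the token's expirationDate.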
new functionality
  harvesting granularity
  mandatory support of YYYY-MM-DD
  optional support of YYYY-MM-DDThh:mm:ssZ
  granularity of from and until must be the same
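The same-granularity rule for from and until lends itself to a small check before a request is sent. This is a sketch under stated assumptions: the function names are invented here, and the patterns test only the shape of each value, not whether it is a real calendar date.

```python
import re

# Shape-only patterns for the two granularities named on the slide.
DAY = re.compile(r"\d{4}-\d{2}-\d{2}")
SECONDS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def granularity(stamp):
    """Return the granularity label of a datestamp, or raise ValueError."""
    if DAY.fullmatch(stamp):
        return "YYYY-MM-DD"
    if SECONDS.fullmatch(stamp):
        return "YYYY-MM-DDThh:mm:ssZ"
    raise ValueError(f"unsupported datestamp format: {stamp!r}")


def check_range(from_=None, until=None):
    """Ensure from and until, when both are given, share one granularity."""
    found = {granularity(s) for s in (from_, until) if s is not None}
    if len(found) > 1:
        raise ValueError("from and until must have the same granularity")
    return found.pop() if found else None
```

For example, `check_range("2002-01-01", "2002-06-30")` returns `"YYYY-MM-DD"`, while mixing a day-level from with a seconds-level until raises `ValueError`.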
new functionality
  Identify more expressive [XML example; markup not preserved]
  Library of Congress  transient  T00:00:00Z  YYYY-MM-DDThh:mm:ssZ  deflate
new functionality
  header contains set membership of item [XML example; markup not preserved]
  oai:arXiv:cs/  cs  math  …..
new functionality
  ListIdentifiers returns headers [XML example; markup not preserved]
  T08:55:46Z
  oai:arXiv:hep-th/  physic:hep
  oai:arXiv:hep-th/  physic:hep  physic:exp
  ……
new functionality
  ListIdentifiers mandates metadataPrefix as argument
  verb=ListIdentifiers
  &metadataPrefix=olac
  &from=
  &until=
  &set=Perseus:collection:PersInfo
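A request builder can enforce the mandatory argument before anything goes over the wire. In this sketch the function name and the base URL `http://example.org/oai` are hypothetical; only the argument names (verb, metadataPrefix, from, until, set) come from the slide.

```python
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix,
                         from_=None, until=None, set_spec=None):
    """Build a ListIdentifiers GET URL; metadataPrefix is mandatory."""
    if not metadata_prefix:
        raise ValueError("metadataPrefix is required for ListIdentifiers")
    params = [("verb", "ListIdentifiers"),
              ("metadataPrefix", metadata_prefix)]
    # Optional arguments are included only when a value is supplied.
    for name, value in (("from", from_), ("until", until), ("set", set_spec)):
        if value is not None:
            params.append((name, value))
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL; the colons in the set value are percent-encoded.
print(list_identifiers_url("http://example.org/oai", "olac",
                           set_spec="Perseus:collection:PersInfo"))
```

Note that `urlencode` percent-encodes the colons in the set value (`%3A`), which is the form the value takes inside a query string.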
new functionality
  character set for metadataPrefix and setSpec extended to URL-safe characters
  A-Z a-z 0-9 _ ! ' $ ( ) + - . *
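The character list can be turned into a validator. This is a sketch that takes the list exactly as shown on the slide, which may differ from the wording of the final specification. It also assumes that a setSpec joins such segments with colons to express hierarchy, as in Perseus:collection:PersInfo. The function names are invented here.

```python
import re

# Characters as listed on the slide: A-Z a-z 0-9 _ ! ' $ ( ) + - . *
_SEGMENT = r"[A-Za-z0-9_!'$()+\-.*]+"

METADATA_PREFIX_RE = re.compile(_SEGMENT)
# Assumption: setSpec is one or more segments separated by ':'.
SET_SPEC_RE = re.compile(rf"{_SEGMENT}(?::{_SEGMENT})*")


def is_valid_metadata_prefix(value):
    """True if every character of value is in the slide's character list."""
    return METADATA_PREFIX_RE.fullmatch(value) is not None


def is_valid_set_spec(value):
    """True if value is a colon-separated sequence of valid segments."""
    return SET_SPEC_RE.fullmatch(value) is not None
```

With these definitions, `oai_dc` and `Perseus:collection:PersInfo` are accepted, while values containing spaces, slashes, or empty segments are rejected.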
in the periphery
  introduction of provenance container to facilitate tracing of harvesting history [XML example; markup not preserved]
  oai:r1:plog/  T13:00:02Z  oai_dc  T12:01:30Z  … … …
in the periphery
  introduction of friends container to facilitate discovery of repositories
in the periphery
  revision of oai-identifier
  guidelines for collection-level and set-level metadata
future
  the OAI-PMH
  communities
  adoption
the OAI-PMH
  release of OAI-PMH v.2.0 [06/2002]
  no backwards compatibility with v.1.0/1.1
  stable
  migration process for registered repos
  ? formal standardization ?
  ? SOAP version ~ web services framework [SOAP, WSDL, UDDI] ?
communities
  proliferation of community-specific add-ons for:
  collection & set level metadata
  expressive metadata formats (e.g. qualified DC XML Schema)
  shared set-structures
  machine readable rights (about the metadata)
adoption
  evolution from talking about OAI-PMH
  to talking about projects that use OAI-PMH
  to talking about projects and failing to mention they use OAI-PMH
  => OAI-PMH becomes part of the infrastructure
I just wanted to report what I consider an OAI success. I discovered that RLG had harvested records for two of the American Memory collections I had made available and integrated them into their Cultural Materials Initiative service without the need for a single or phone call. They reported that it was working very well for them. [Caroline Arms, Library of Congress]
indicators of adoption of OAI-PMH
  data providers
  service providers
  tools
  structural support
data providers
  49 registered repositories [11/2001]
  65 registered repositories [03/2002]
  5+ million records
  many unregistered repositories
service providers
  Arc: cross-searching of registered repositories [Old Dominion U]
  OLAC: cross-searching of Language Archive Community repositories
service providers
  Scirus scientific search engine [Elsevier]
  my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.]
  growing interest from web search engines
OAI-PMH tools
  Repository Explorer: interactive exploration of repositories [Virginia Tech]
  eprints.org: generic OAI-PMH compliant repository software [U of Southampton]
  ALCME repository and harvester software [OCLC]
exploration
  Kepler [Old Dominion U]
  your personal OAI data provider: Kepler archivelet
  the Kepler service provider harvests from archivelets that register
  archivelet downloadable
exploration
  DP9 [Old Dominion U]
  provides entry page to repositories for web-crawlers
  provides bookmarkable URL for OAI record
  provides resolution of OAI identifier into metadata
  software downloadable
structural support
  CNI & DLF support the day-to-day operation of the OAI Executive
structural support
  Metadata Harvesting Initiative of the Mellon Foundation
  NSF funded NSDL
  UK FAIR call for proposals to support disclosure of institutional assets (papers, learning materials, etc.)
  several EC projects exploring/supporting usage of OAI-PMH: TEL, Leaf, Cyclades, OA Forum, Figaro