New Digital Library Possibilities Using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Michael L. Nelson Old Dominion University.

Presentation transcript:

New Digital Library Possibilities Using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Michael L. Nelson
Old Dominion University, Norfolk, Virginia, USA
International Conference on Scientific Electronic Publishing in Developing Countries
Valparaiso, Chile, October 2, 2002
Several Slides Also from Van de Sompel & Warner

Random Thoughts
1. Thanks to the Organizing Committee for inviting me
2. I wish I had paid attention in my high school Spanish classes…
3. Publishers & Editors: if you want increased coverage, exposure and readership, you must “do” OAI…

Outline
OAI-PMH history and technical highlights
–a full technical review is out of the scope of this presentation
Example data provider use
Example service provider uses
Implications for authors and editors
Looking to the future

Open Archives Initiative The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!) Archive defined as a “collection of stuff” -- not the archivist’s definition of “archive”. “Repository” used in most OAI documents. OAI is happening at break-neck speed...

The Rise and Fall of Distributed Searching
wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice
–Davis & Lagoze, JASIS 51(3), pp
–Powell & French, Proc 5th ACM DL, pp
distributed searching of N nodes still viable, but only for small values of N
–NCSTRL: N > 100; bad
–NTRS/NIX: N <= 20; ok (but could be better)

The Rise and Fall of Distributed Searching Other problems of distributed searching (from STARTS) –source-metadata problem how do you know which nodes to search? –query-language problem syntax varies and drifts over time between the various nodes –rank-merging problem how do you meaningfully merge multiple result sets? Temptations: –centralize all functions “everything will be done at X” –standardize on a single product “everyone will use system Y”

Santa Fe Convention [02/2000] goal: optimize discovery of e-prints input: the UPS prototype RePEc /SODA “data provider / service provider model” Dienst protocol deliberations at Santa Fe meeting [10/99]

Data and Service Providers
Data Providers
–publishing into an archive
–providing methods for metadata “harvesting”
–also provide non-technical context for sharing information
Service Providers
–harvest metadata from providers
–implement user interface to data
Self-describing archives
–much of the learning about the constituent UPS archives occurred out of band…
Even if these are done by the same DL, these are distinct roles

Metadata Harvesting
Move away from distributed searching
Extract metadata from various sources
Build services on local copies of metadata
–data remains at remote repositories
[Diagram: a user searches for “cfd applications” against a local copy of metadata that was harvested offline from each remote node]
–each node independently maintained
–all searching, browsing, etc. performed on the local copy of the metadata
–individual nodes can still support direct user interaction

low-barrier interoperability specification metadata harvesting model: data provider / service provider focus on document-like objects autonomous protocol HTTP based XML responses unqualified Dublin Core experimental: months OAI-PMH v.1.0 [01/2001]

             Santa Fe convention    OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
nature       experimental           experimental              stable
verbs        Dienst                 OAI-PMH                   OAI-PMH
requests     HTTP GET/POST          HTTP GET/POST             HTTP GET/POST
responses    XML                    XML                       XML
transport    HTTP                   HTTP                      HTTP
metadata     OAMS                   unqualified Dublin Core   unqualified Dublin Core
about        eprints                document like objects     resources
model        metadata harvesting    metadata harvesting       metadata harvesting

OAI-PMH 2.0 Good news: OAI-PMH is still Six Verbs + Dublin Core Incremental improvements –single XML schema –ambiguities removed –more expressive options –cleaner separation of roles & responsibilities Bad news: not backwards compatible with 1.1

Dublin Core Dublin Core Metadata Initiative – –from , recognizing the need for simple, interoperable metadata for resource discovery –good overview of metadata & DC: –15 elements (qualifiers possible)

OAI Mechanics
Request is encoded in HTTP
Response is encoded in XML
XML Schemas for the responses are defined in the OAI-PMH document

Overview of OAI-PMH Verbs
Verb                  Function
Identify              description of archive
ListMetadataFormats   metadata formats supported by archive
ListSets              sets defined by archive
ListIdentifiers       OAI unique ids contained in archive
ListRecords           listing of N records
GetRecord             listing of a single record
Identify, ListMetadataFormats and ListSets return metadata about the repository; ListIdentifiers, ListRecords and GetRecord are the harvesting verbs
most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
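
Since a request is just a verb plus arguments encoded in HTTP, a harvester can assemble one with ordinary URL tooling. The following is a minimal sketch, not part of the original slides: the function name `build_request` and the base URL `http://example.org/oai` are made up for illustration, and only the Python standard library is used.

```python
from urllib.parse import urlencode

# The six verbs listed on the slide above.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, arguments=None):
    """Return an HTTP GET URL for an OAI-PMH request.

    `arguments` is a dict because some argument names (e.g. "from")
    are reserved words in Python and cannot be keyword arguments.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments or {})
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical repository URL, used only for illustration.
    print(build_request("http://example.org/oai", "ListRecords",
                        {"from": "2002-01-01", "metadataPrefix": "oai_dc"}))
```

The same arguments could equally be sent as an HTTP POST body; GET is used here only because it is easy to read.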

protocol vs periphery clear distinction between protocol and periphery fixed protocol document extensible implementation guidelines: e.g. sample metadata formats, description containers, about containers allows for OAI guidelines and community guidelines

OAI-PMH vs HTTP
clear separation of OAI-PMH and HTTP
OAI-PMH error handling
–all OK at HTTP level? => 200 OK
–something wrong at OAI-PMH level? => OAI-PMH error (e.g. badVerb)
HTTP codes 302, 503, etc. still available to implementers, but no longer represent OAI-PMH events
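
One way a harvester might act on this separation is to check the HTTP status first and only then look inside the XML body for protocol-level errors. The sketch below is illustrative rather than definitive: `oai_errors`, `classify_response` and the sample document (including its date and URL) are invented here; the namespace is the one used by OAI-PMH 2.0 responses.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample response; date and URL are made up.
SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-10-02T12:00:00Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


def oai_errors(xml_text):
    """Return (code, message) pairs for every <error> in a response."""
    root = ET.fromstring(xml_text)
    return [(e.get("code"), (e.text or "").strip())
            for e in root.findall(OAI_NS + "error")]


def classify_response(http_status, xml_text):
    """Separate HTTP-level problems from OAI-PMH-level problems."""
    if http_status != 200:
        return ("http", str(http_status))   # e.g. 302 or 503
    errors = oai_errors(xml_text)
    if errors:
        return ("oai-pmh", errors[0][0])    # e.g. badVerb
    return ("ok", None)


if __name__ == "__main__":
    print(classify_response(200, SAMPLE_ERROR))
```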

resource – item – record
resource: David (the thing being described)
item: all available metadata about David
records: Dublin Core metadata, MARC metadata, SPECTRUM metadata
item = identifier
record = identifier + metadata format + datestamp
set-membership is item-level property
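
The item/record distinction can be made concrete with two small data classes. This is only a sketch of the data model described on the slide, not code from any OAI toolkit: the class and field names, the identifier `oai:example.org:david` and the metadata strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # record = identifier + metadata format + datestamp
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str


@dataclass
class Item:
    # item = identifier; set membership is an item-level property
    identifier: str
    set_specs: list = field(default_factory=list)
    records: dict = field(default_factory=dict)

    def add_record(self, metadata_prefix, datestamp, metadata):
        """Attach one metadata format's view of this item."""
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


if __name__ == "__main__":
    item = Item("oai:example.org:david", set_specs=["sculpture"])
    item.add_record("oai_dc", "2002-10-02", "<dc>...</dc>")
    item.add_record("marc", "2002-10-02", "<marc>...</marc>")
    print(sorted(item.records))
```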

other general changes better definitions of harvester, repository, item, unique identifier, record, set, selective harvesting oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core usage of must, must not etc. as in RFC2119 wording on response compression

other general changes all protocol responses can be validated with a single XML Schema easier for data providers no redundancy in type definitions SOAP-ready clean for error handling

response no errors
T08:55:46Z oai:arXiv:cs/ cs math …..
note no HTTP encoding of the OAI-PMH request

response with error
T08:55:46Z ShowMe is not a valid OAI-PMH verb
with errors, only the correct attributes are echoed in the request element

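resumptionToken
scenario: harvesting 2770 records in 3 separate 1000 record “chunks”
harvester -> repository (RDBMS): ListRecords
repository -> harvester: first chunk of records, resumptionToken=AXad31
harvester -> repository: ListRecords, resumptionToken=AXad31
repository -> harvester: next chunk of records, resumptionToken=pQ22-x
harvester -> repository: ListRecords, resumptionToken=pQ22-x
repository -> harvester: remaining records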
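
The flow-control scenario above boils down to a loop that keeps reissuing the request with the last token until no token comes back. The sketch below simulates it against an in-memory stand-in for a repository so that it runs without a network. `make_fake_repository` and `harvest` are hypothetical names, and the numeric tokens are an arbitrary choice made for this example; real tokens are opaque strings chosen by the repository.

```python
def make_fake_repository(total, chunk_size):
    """Return a list_records(token) function over an in-memory list.

    Stand-in for a real repository: the token is simply the offset of
    the next chunk, encoded as a string.
    """
    records = [f"record-{n}" for n in range(1, total + 1)]

    def list_records(resumption_token=None):
        start = int(resumption_token) if resumption_token else 0
        end = start + chunk_size
        next_token = str(end) if end < total else None
        return records[start:end], next_token

    return list_records


def harvest(list_records):
    """Follow resumption tokens until the list is complete."""
    harvested, requests, token = [], 0, None
    while True:
        chunk, token = list_records(token)
        requests += 1
        harvested.extend(chunk)
        if token is None:          # no token: this was the last chunk
            break
    return harvested, requests


if __name__ == "__main__":
    harvested, requests = harvest(make_fake_repository(2770, 1000))
    print(len(harvested), "records in", requests, "requests")
```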

resumptionToken
idempotency of resumptionToken: return same incomplete list when rT is reissued
–while no changes occur in the repo: strict
–while changes occur in the repo: all items with unchanged datestamp
new, optional attributes for the resumptionToken:
–expirationDate
–completeListSize
–cursor
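
A harvester can use the three optional attributes to report progress or to decide when a token is no longer worth reissuing. As a rough sketch (the function name and the sample fragment, including its date, are invented for illustration), they can be read like this:

```python
import xml.etree.ElementTree as ET


def parse_resumption_token(xml_fragment):
    """Read the token text and its optional attributes."""
    element = ET.fromstring(xml_fragment)
    size = element.get("completeListSize")
    cursor = element.get("cursor")
    return {
        # An empty element marks the end of the list.
        "token": (element.text or "").strip() or None,
        "expirationDate": element.get("expirationDate"),
        "completeListSize": int(size) if size is not None else None,
        "cursor": int(cursor) if cursor is not None else None,
    }


if __name__ == "__main__":
    sample = ('<resumptionToken expirationDate="2002-10-03T00:00:00Z" '
              'completeListSize="2770" cursor="1000">pQ22-x</resumptionToken>')
    print(parse_resumption_token(sample))
```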

harvesting granularity
mandatory support of YYYY-MM-DD
optional support of YYYY-MM-DDThh:mm:ssZ
–other granularities considered, but ultimately rejected
granularity of from and until must be the same
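
The granularity rules lend themselves to a small check on the repository side. The sketch below assumes that a violation is reported as `badArgument`; the function names and the `supports_seconds` flag are hypothetical, and the patterns only check the shape of the datestamp, not whether it is a real calendar date.

```python
import re

DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
SECONDS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def granularity(datestamp):
    """Return the granularity name of a datestamp, or None if malformed."""
    if DAY.match(datestamp):
        return "YYYY-MM-DD"
    if SECONDS.match(datestamp):
        return "YYYY-MM-DDThh:mm:ssZ"
    return None


def check_range(from_=None, until=None, supports_seconds=False):
    """Return None if the range is acceptable, else 'badArgument'."""
    found = []
    for value in (from_, until):
        if value is None:
            continue
        g = granularity(value)
        if g is None:
            return "badArgument"            # not one of the two forms
        if g == "YYYY-MM-DDThh:mm:ssZ" and not supports_seconds:
            return "badArgument"            # finer than the repository offers
        found.append(g)
    if len(found) == 2 and found[0] != found[1]:
        return "badArgument"                # from and until must match
    return None


if __name__ == "__main__":
    print(check_range("2002-01-01", "2002-10-02T00:00:00Z", True))
```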

Identify more expressive Identify Library of Congress transient T00:00:00Z YYYY-MM-DDThh:mm:ssZ deflate

header contains set membership of item header oai:arXiv:cs/ cs math ….. eliminates the need for the “double harvest” 1.x required to get all records and all set information
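
Because set membership travels in the header, one pass over the headers is enough to learn which sets each item belongs to. A minimal sketch of reading a header follows; the sample XML is hand-written for illustration, with a made-up identifier and datestamp, and `parse_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample; identifier and datestamp are invented.
SAMPLE_HEADER = """<header xmlns="http://www.openarchives.org/OAI/2.0/">
  <identifier>oai:example.org:1234</identifier>
  <datestamp>2002-10-02</datestamp>
  <setSpec>cs</setSpec>
  <setSpec>math</setSpec>
</header>"""


def parse_header(xml_text):
    """Extract identifier, datestamp and set membership from a header."""
    header = ET.fromstring(xml_text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
    }


if __name__ == "__main__":
    print(parse_header(SAMPLE_HEADER))
```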

ListIdentifiers returns headers ListIdentifiers T08:55:46Z oai:arXiv:hep-th/ physic:hep oai:arXiv:hep-th/ physic:hep physic:exp ……

introduction of provenance container to facilitate tracing of harvesting history provenance oai:r1:plog/ T13:00:02Z oai_dc T12:01:30Z … … …

introduction of friends container to facilitate discovery of repositories friends

NASA example (1) A light weight, DP-centric method to communicate the existence of “others”

…</friends/ harvester Identify NASA example (2)

introduction of branding container for DPs to suggest rendering & association hints <branding xmlns=" xmlns:xsi=" xsi:schemaLocation=" MySite(tm) <metadataRendering metadataNamespace=" mimeType="text/xsl"> <metadataRendering metadataNamespace=" mimeType="text/css"> branding

revision of oai-identifier <oai-identifier xmlns=" identifier" xmlns:xsi=" xsi:schemaLocation=" identifier oai oai-stuff.foo.org : oai:oai-stuff.foo.org:5324 oai-identifier domain based repository names
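
The identifier shown on this slide, `oai:oai-stuff.foo.org:5324`, has three colon-separated parts: the scheme, the domain-based repository name, and a local identifier. A small sketch of splitting it (the function name is hypothetical, and the validation is deliberately loose):

```python
def parse_oai_identifier(identifier):
    """Split an oai-identifier into (scheme, repository, local id)."""
    # Split at most twice so that colons inside the local id survive.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    scheme, repository, local_id = parts
    return scheme, repository, local_id


if __name__ == "__main__":
    print(parse_oai_identifier("oai:oai-stuff.foo.org:5324"))
```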

SOAP implementation Result set filtering Multiple / “best” metadata GetRecord -> GetRecords Machine readable rights management XML format for “mini-archives” did not make it into OAI-PMH v.2.0

So What Does OAI-PMH Mean for Your Digital Library?
Resources on DL projects are typically spent in 2 areas:
–creating & maintaining the collection (data provider)
–developing access services for the collection: searching, browsing, etc. (service provider)
OAI-PMH allows for specialization based on resources / interest

NACA Report 1345 as seen through its native DL

NACA Report 1345 as seen through MAGiC

NACA Report 1345 as seen through Scirus (Elsevier)

NACA Report 1345 as seen through my.OAI (FS Consulting)

Scientific Communication With only some exceptions, which interface is used for discovery is not as important as the fact that discovery occurred in the first place… –“control” of the discovered objects is not “lost” by data providers however, higher level mirroring services can be built on top of OAI (cf. NACA & ARC mirroring between NASA LaRC and MAGiC) The real power of OAI-PMH derives as much from what it does not do as what it actually does

What Does OAI-PMH Mean for Authors?
On the surface, absolutely nothing!
–the ideal OAI deployment should be absolutely invisible to normal DL operations
–uninterested users should not even notice or care
Indirectly, they should enjoy the benefits of the critical mass of current and developing DL tools & systems
–personal, institutional data providers
–proliferation of targeted, value-added service providers

What Does OAI-PMH Mean For Editors?
Absolutely everything…
The decoupling of SPs and DPs will have significant and profound implications for scientific and technical information exchange
–OAI-PMH is actually just one component in a larger engineering effort for scholarly communication (e.g. OpenURL)
Service and resource integration will be the focus of journals, professional societies, universities, etc.
–OAI-PMH will be as basic and core a technology for scientific publishing as HTTP & XML

Field of Dreams It should be easy to be a data provider, even if it makes more work for the service provider. –if enough data providers exist, the service providers will come (DPs >> SPs) Open-source / freely available tools –“drop-in” data providers: industrial strength: personal size: –tools to make your existing DL a data provider: also: OAI-implementers mailing list / mail archive! –service providers: Arc:

OAI Observation: Front-End Only
No input/registry mechanism
–OAI harvesting protocol is always a front-end for something else: filesystem, Dienst, RDBMS, LDAP, etc.
–convenient for pre-existing DLs, but does not address “new” DLs (e.g., “we want to do OAI”)
Bounds the scope of OAI
–responsibilities and domain of OAI are still being discussed
–tension between functionality and simplicity

OAI Observation: No T&C Possible to use multiple OAI servers in a DMZ-like configuration… Public OAI Server Private OAI Server Source database OAI requests from trusted hosts OAI requests from arbitrary hosts could even use a separate copy of the database…

OAI Observation: No T&C
Possible to use OAI harvesting protocol in closed, restricted systems
[Diagram: four DLs, OAI 1, OAI 2, OAI 3 and OAI 4; all OAI requests originate from these 4 DLs]

Metadata
–Q: “Which format should I use?” A: any/all of them…
–lowest common denominator: unqualified Dublin Core
–Again, little known about actual behavior
will DC actually be useful? or too lossy?
will communities create/adopt specific formats?
will native (presumably richer) formats be harvested? we very much want this to happen... “The Return of MARC” ?!

The Future: Community Building
Ultimately, protocols and metadata formats are not what make a difference
Rather, it is the critical mass afforded by a common set of utilities (cf. HTTP, Dublin Core, XML)
The best current example: The Open Language Archives Community
OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends

Backup Slides

Detailed Review of the OAI-PMH 2.0 Verbs

Identify Arguments –none Errors –none Arguments –none Errors –badArgument

ListMetadataFormats Arguments –identifier (OPTIONAL) Errors –id does not exist Arguments –identifier (OPTIONAL) Errors –badArgument –noMetadataFormats –idDoesNotExist

ListSets Arguments –resumptionToken (EXCLUSIVE) Errors –no set hierarchy Arguments –resumptionToken (EXCLUSIVE) Errors –badArgument –badResumptionToken –noSetHierarchy

ListIdentifiers Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) Errors –no records match Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) –metadataPrefix (REQUIRED) Errors –badArgument –cannotDisseminateFormat –badResumptionToken –noSetHierarchy –noRecordsMatch

ListRecords Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) –metadataPrefix (REQUIRED) Errors –no records match –metadata format cannot be disseminated Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) –metadataPrefix (REQUIRED) Errors –noRecordsMatch –cannotDisseminateFormat –badResumptionToken –noSetHierarchy –badArgument

GetRecord Arguments –identifier (REQUIRED) –metadataPrefix (REQUIRED) Errors –id does not exist –metadata format cannot be disseminated Arguments –identifier (REQUIRED) –metadataPrefix (REQUIRED) Errors –badArgument –cannotDisseminateFormat –idDoesNotExist

Argument Summary
                      metadataPrefix  from      until     set       resumptionToken  identifier
Identify              -               -         -         -         -                -
ListMetadataFormats   -               -         -         -         -                optional
ListSets              -               -         -         -         exclusive        -
ListIdentifiers       required        optional  optional  optional  exclusive        -
ListRecords           required        optional  optional  optional  exclusive        -
GetRecord             required        -         -         -         -                required
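
The argument rules from the verb slides can be written down as a lookup table and checked mechanically. The sketch below is one possible reading, under the assumption that an "exclusive" argument must appear alone and that any violation is reported as `badArgument`; the names `ARGUMENTS` and `check_arguments` are made up for this example.

```python
ARGUMENTS = {
    "Identify": {},
    "ListMetadataFormats": {"identifier": "optional"},
    "ListSets": {"resumptionToken": "exclusive"},
    "ListIdentifiers": {
        "metadataPrefix": "required", "from": "optional", "until": "optional",
        "set": "optional", "resumptionToken": "exclusive",
    },
    "ListRecords": {
        "metadataPrefix": "required", "from": "optional", "until": "optional",
        "set": "optional", "resumptionToken": "exclusive",
    },
    "GetRecord": {"identifier": "required", "metadataPrefix": "required"},
}


def check_arguments(verb, arguments):
    """Return None if the request is well formed, else an error code."""
    if verb not in ARGUMENTS:
        return "badVerb"
    rules = ARGUMENTS[verb]
    if any(name not in rules for name in arguments):
        return "badArgument"                 # unknown argument for this verb
    if any(rules[name] == "exclusive" for name in arguments):
        # An exclusive argument may not be combined with any other.
        return None if len(arguments) == 1 else "badArgument"
    missing = [n for n, rule in rules.items()
               if rule == "required" and n not in arguments]
    return "badArgument" if missing else None


if __name__ == "__main__":
    print(check_arguments("ListRecords", {"resumptionToken": "AXad31"}))
    print(check_arguments("GetRecord", {"identifier": "oai:example.org:1"}))
```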

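Error Summary
Identify: BA
ListMetadataFormats: BA, NMF, IDDNE
ListSets: BA, BRT, NSH
ListIdentifiers: BA, BRT, CDF, NRM, NSH
ListRecords: BA, BRT, CDF, NRM, NSH
GetRecord: BA, CDF, IDDNE
(BA = badArgument, BRT = badResumptionToken, CDF = cannotDisseminateFormat, IDDNE = idDoesNotExist, NMF = noMetadataFormats, NRM = noRecordsMatch, NSH = noSetHierarchy)
Generate badVerb on any input not matching the 6 defined verbs
this is an inversion of the table in section 3.6 of the OAI-PMH specification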
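
The same summary can be kept as a small table in code, for instance to assert in a test suite that a repository never returns an error code its verb is not supposed to produce. This is a sketch only; `ERRORS_BY_VERB` and `allowed_errors` are hypothetical names, and the codes are the ones listed on the verb slides above.

```python
ERRORS_BY_VERB = {
    "Identify": {"badArgument"},
    "ListMetadataFormats": {"badArgument", "noMetadataFormats", "idDoesNotExist"},
    "ListSets": {"badArgument", "badResumptionToken", "noSetHierarchy"},
    "ListIdentifiers": {"badArgument", "badResumptionToken",
                        "cannotDisseminateFormat", "noRecordsMatch",
                        "noSetHierarchy"},
    "ListRecords": {"badArgument", "badResumptionToken",
                    "cannotDisseminateFormat", "noRecordsMatch",
                    "noSetHierarchy"},
    "GetRecord": {"badArgument", "cannotDisseminateFormat", "idDoesNotExist"},
}


def allowed_errors(verb):
    """Return the error codes a verb may produce.

    Anything that is not one of the six verbs can only yield badVerb.
    """
    return ERRORS_BY_VERB.get(verb, {"badVerb"})


if __name__ == "__main__":
    print(sorted(allowed_errors("GetRecord")))
    print(sorted(allowed_errors("ShowMe")))
```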