Getting a Leg Up on OAI for the NSDL


Naomi Dushay, NSDL Core Integration, Cornell University

What is OAI?
- Open Archives Initiative … Protocol for Metadata Harvesting (OAI-PMH)
- Intended as an easy way to share metadata over the Internet
- A “pull” model of exchange: harvesters request metadata from repositories

OAI Harvesting
[Diagram: a service uses an OAI harvester, which sends OAI queries to several OAI repositories and receives OAI responses containing metadata; another service then uses the harvested metadata.]

How Does OAI Work?
- The OAI protocol runs on top of HTTP
- Requests for data are encapsulated in URLs: http://[someOAIBaseURL]?verb=[oai verb]{other arguments as needed}
- Responses are XML documents
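A minimal sketch of that request/response cycle, using only the Python standard library. The function names and the base URL http://example.org/oai are hypothetical; the arguments are passed as a dict because `from` is a Python keyword.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def build_oai_request(base_url, verb, arguments=None):
    """Return an OAI-PMH GET request URL: baseURL?verb=...&other=arguments."""
    params = {"verb": verb}
    # A dict is used because "from" cannot be written as a keyword argument.
    params.update(arguments or {})
    return base_url + "?" + urlencode(params)


def send_oai_request(url):
    """Send the GET request and parse the XML response document."""
    with urlopen(url) as response:
        return ET.fromstring(response.read())
```

For example, `build_oai_request("http://example.org/oai", "ListRecords", {"metadataPrefix": "oai_dc", "from": "2005-01-01"})` produces `http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2005-01-01`.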

Required Know-How
- HTTP: sending XML responses to HTTP GET and POST requests
  - Web server
- XML: namespaces (URIs and prefixes), XML Schema validity
  - XML schema validator(s)
  - Possibly XML schema creation
- Metadata: it depends on your situation

OAI and the NSDL
- Metadata Repository (“union catalog”)
  - “normalized” metadata with Qualified Dublin Core as its base, to improve:
    - services (e.g. search results or UI display)
    - metadata quality, when possible
    - predictability of data for re-harvesting services
  - automated harvest/expose model, with OAI at each end

OAI in the NSDL Infrastructure
[Diagram: your collection’s metadata is exposed by your collection’s OAI server, which is harvested by the NSDL Metadata Repository (MR) and by other OAI services; the NSDL MR OAI server exposes your collection’s metadata, scrubbed & normalized, to the NSDL Search Service, the NSDL Archive Service, and http://nsdl.org.]

Automated MR ingest process
[Diagram: NSDL Collection Registration → validation of your collection’s OAI server → OAI harvest of “raw” or “native” metadata → validation → normalization → normalized metadata stored in the Metadata Repository and exposed by the NSDL MR OAI server. On validation failures: notify the collection of problems; processing may need to halt.]

OAI-PMH: Key points
- OAI-PMH requests are embedded in HTTP
  - it’s a web request/response service, not a flat file
- XML, not HTML
- Multiple metadata formats are allowed
  - OAI ≠ simple DC only!
  - Each metadata format MUST have a valid XML schema

Metadata Formats and Schemas

| Format | XML namespace | XML Schema location | OAI metadataPrefix |
| --- | --- | --- | --- |
| Simple Dublin Core, OAI flavor | http://www.openarchives.org/OAI/2.0/oai_dc/ | http://www.openarchives.org/OAI/2.0/oai_dc.xsd | oai_dc |
| Qualified Dublin Core, latest NSDL flavor | http://ns.nsdl.org/nsdl_dc_v1.02/ | http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.02.xsd | (as you like; we use “nsdl_dc”) |
| Your format | (an appropriate URI) | (URL for an XML schema) | (as you like) |

MR ingest requires a compliant OAI 2.0 server
- Correct implementation of OAI-PMH: correct responses to all queries
- Every OAI response must be (deeply) XML schema valid
- Proper encoding in proper places:
  - XML encoding
  - URL encoding
  - UTF-8 encoding
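The three encodings apply in different places: URL encoding for request arguments, XML encoding for element content, and UTF-8 for the response bytes. The sketch below uses hypothetical helper names; the last call in the usage note shows how escaping twice produces the “double XML encoding” problem.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def url_encode_argument(value):
    """Percent-encode an argument value for a request URL (":" and "/" included)."""
    return quote(value, safe="")


def xml_encode_text(value):
    """Escape &, < and > for XML element content; apply exactly once."""
    return escape(value)


def utf8_payload(xml_text):
    """Serialize a response body as UTF-8 bytes, as OAI-PMH requires."""
    return xml_text.encode("utf-8")
```

Here `url_encode_argument("oai:example.org:math/123")` gives `oai%3Aexample.org%3Amath%2F123`, and `xml_encode_text("Fish & Chips")` gives `Fish &amp; Chips`, while `xml_encode_text(xml_encode_text("&"))` gives the double-encoded `&amp;amp;`.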

OAI 2.0 – Identify
- baseURL
- email address(es)
- protocol version
- description for OAI identifier syntax, especially if adhering to the oai-identifier syntax described in the Implementation Guidelines
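The oai-identifier syntax has the form oai:<repository identifier>:<local identifier>, where the repository identifier is a domain name. The regular expression below approximates that syntax as described in the OAI-PMH 2.0 Implementation Guidelines; treat it as a sketch and check it against the Guidelines before relying on it. The identifiers used in the usage note are hypothetical.

```python
import re

# Approximation of the oai-identifier syntax:
#   oai:<repositoryIdentifier>:<localIdentifier>
OAI_IDENTIFIER = re.compile(
    r"oai:"
    r"[a-zA-Z][a-zA-Z0-9\-]*(?:\.[a-zA-Z][a-zA-Z0-9\-]*)+"  # domain-name repository identifier
    r":"
    r"[a-zA-Z0-9\-_.!~*'();/?:@&=+$,%]+"  # local identifier
)


def is_oai_identifier(identifier):
    """True if the whole string matches the approximate oai-identifier syntax."""
    return OAI_IDENTIFIER.fullmatch(identifier) is not None
```

With this pattern, `oai:nsdl.example.org:record-42` is accepted, while `record-42` (no scheme) and `oai:localhost:1` (repository identifier is not a dotted domain name) are rejected.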

OAI 2.0 – ListMetadataFormats
- correct XML namespace for each format
- a valid XML schema for each format
  - the schema’s targetNamespace MUST match the XML namespace above
- super easy out: use oai_dc
- easy out: use nsdl_dc
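One way to check the targetNamespace rule is to read the namespace each format declares in the ListMetadataFormats response and compare it, character for character, with the schema’s targetNamespace attribute. A minimal sketch with hypothetical function names; fetching the schema document is left out.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def declared_formats(list_metadata_formats_xml):
    """Map each metadataPrefix to its metadataNamespace from a ListMetadataFormats response."""
    root = ET.fromstring(list_metadata_formats_xml)
    return {
        fmt.findtext(OAI_NS + "metadataPrefix"): fmt.findtext(OAI_NS + "metadataNamespace")
        for fmt in root.iter(OAI_NS + "metadataFormat")
    }


def target_namespace_matches(declared_namespace, schema_xml):
    """True if the XML schema's targetNamespace equals the declared namespace exactly."""
    return ET.fromstring(schema_xml).get("targetNamespace") == declared_namespace
```

The comparison is exact on purpose: `http://ns.nsdl.org/nsdl_dc_v1.02/` and `http://ns.nsdl.org/nsdl_dc_v1.02` (no trailing slash) are different namespaces.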

OAI 2.0 – ListSets
- super easy out: if all your metadata is NSDL-relevant, don’t use sets for our sake
- if you want the NSDL to harvest only SOME of your OAI server’s metadata, then use sets
  - we will harvest only the sets you specify … but our default is to harvest all of them
- super easy setSpec strings: use only alphanumeric characters
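A small sketch of the two set rules above, with hypothetical function names: a check for the “super easy” alphanumeric setSpec, and a helper that turns the sets a collection asked for into one `set` argument per harvest run, where `None` stands for a harvest with no `set` argument (the default, everything).

```python
def is_simple_setspec(set_spec):
    """True if the setSpec uses only ASCII letters and digits."""
    return set_spec.isascii() and set_spec.isalnum()


def set_arguments(requested_sets=None):
    """One 'set' argument per ListRecords run; None means no set argument."""
    if not requested_sets:
        return [None]  # default: harvest all of the server's metadata
    return list(requested_sets)
```

Hierarchical setSpecs such as `math:algebra` are legal in OAI-PMH, but they fail this stricter “super easy” check.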

OAI 2.0 – ListRecords
- Every metadata record served must (deeply) validate against its indicated XML schema
- If used, resumptionTokens must be implemented properly
  - resumptionToken is an exclusive argument
  - the last response has an empty resumptionToken
- Selective harvesting must work properly
  - “from” and “until” arguments must limit the results appropriately
  - “set” arguments must limit the results appropriately, if implemented
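The harvester side of these rules can be sketched as a loop: the first request carries metadataPrefix plus any selective-harvesting arguments, each later request carries only the verb and the resumptionToken, and the loop stops at a missing or empty token. The function names are hypothetical; `fetch` is injectable so the loop can be exercised without a network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    with urlopen(url) as response:
        return response.read()


def list_records(base_url, metadata_prefix, selective=None, fetch=http_get):
    """Yield every <record> from ListRecords, following resumptionTokens.

    selective: optional dict such as {"from": ..., "until": ..., "set": ...}
    fetch: function taking a URL and returning the response bytes
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    params.update(selective or {})
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI_NS + "record")
        token = root.find(OAI_NS + "ListRecords/" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty token: this was the last response
        # resumptionToken is exclusive: send only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A server that repeats metadataPrefix or from alongside the token, or never sends the final empty token, will break harvesters written this way.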

Common Points of Confusion - 1
About the metadata vs. about the resource
- identifiers: OAI vs. DC
  - record/header/identifier vs. record/metadata/../dc:identifier
- dates: OAI vs. DC
  - record/header/datestamp vs. record/metadata/../dc:date
- OAI about containers are about the metadata
  - rights: OAI about vs. DC
  - record/about/../(dc:rights?) vs. record/metadata/../dc:rights
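The difference is visible when the two pairs are pulled out of one record: the header describes the metadata record, while the Dublin Core inside metadata describes the resource. A minimal sketch; the function name and the sample values in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def describe(record_xml):
    record = ET.fromstring(record_xml)
    return {
        # About the metadata record: the OAI header
        "oai_identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "oai_datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
        # About the resource: Dublin Core inside <metadata>
        "dc_identifier": record.findtext(".//dc:identifier", namespaces=NS),
        "dc_date": record.findtext(".//dc:date", namespaces=NS),
    }
```

For a lesson published in 1998 whose metadata was last changed on 2005-11-10, dc:date would be 1998 while the OAI datestamp would be 2005-11-10; likewise dc:identifier would be the lesson’s URL, not the OAI identifier.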

OAI identifiers
- Must uniquely identify individual metadata records at your site, for OAI harvest and OAI reharvest
- Must stay the same for your metadata records
  - when the metadata is updated, the OAI identifier stays unchanged

Common Points of Confusion - 2
Date format confusion
- OAI dates must be encoded as ISO 8601 and must be in UTC (≈ GMT)
  - OAI-PMH allows YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ
- DC date encoding: “Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.”
Where dates appear:
- <responseDate> (all OAI-PMH responses): the time when the OAI server responds to a request
  - OAI-PMH says: ‘must be the time and date of the response in UTC. This is encoded using the "Complete date plus hours, minutes, and seconds" variant of ISO8601. This format is YYYY-MM-DDThh:mm:ssZ.’
- <datestamp> (OAI-PMH <record>/<header>)
- “from” and “until” arguments in OAI requests
- <dc:date>
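A sketch of the OAI side of these rules, with hypothetical function names: a format check for the two allowed granularities, and a conversion from a local, timezone-aware time to the UTC seconds form used by responseDate. The check tests the shape only; it does not reject impossible dates such as 2005-13-45.

```python
import re
from datetime import datetime, timezone

DAY = re.compile(r"\d{4}-\d{2}-\d{2}")
SECONDS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def is_oai_datestamp(value):
    """True for YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (format only)."""
    return bool(DAY.fullmatch(value) or SECONDS.fullmatch(value))


def to_oai_datestamp(moment):
    """Convert a timezone-aware datetime to YYYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def response_date():
    # <responseDate> always uses the full seconds form, in UTC.
    return to_oai_datestamp(datetime.now(timezone.utc))
```

For example, 9:30 a.m. on 2005-11-10 at UTC−5 becomes `2005-11-10T14:30:00Z`; a value like `2005-11-10T14:30:00` (no trailing Z) fails the check.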

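When a Collection Deletes Records
- If deletions are not indicated in the OAI server (Identify deletedRecord: no):
  - the MR’s incremental harvest never shows the update; the MR copy is never deleted!
- If deletions are indicated in the OAI server, but only transiently (deletedRecord: transient):
  - reharvested soon enough: the MR finds the delete
  - not reharvested soon enough: the MR’s incremental harvest never shows the update; the MR copy is never deleted!
- If deletions are indicated in the OAI server and persistent (deletedRecord: persistent):
  - the MR finds the delete on incremental harvest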
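On the harvester side, a delete is visible only as a header with status="deleted" in an incremental harvest. A minimal sketch, assuming the local copy is a plain dict from OAI identifier to record (a hypothetical stand-in for the MR tables):

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def apply_incremental(store, records):
    """Update a local copy (dict: OAI identifier -> record) from harvested <record> elements."""
    for record in records:
        header = record.find(OAI_NS + "header")
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            # Seen only while the server still reports the delete.
            store.pop(identifier, None)
        else:
            store[identifier] = record  # new or updated record
    return store
```

If the server forgets the delete before the next incremental harvest, no such header ever arrives, and the local copy keeps the stale record indefinitely.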

In an ideal world, we’d like:
- nsdl_dc
  - information, example records, etc. are in the NSDL Metadata Primer
- persistent deleted records
- OAI identifier syntax, per the OAI Implementation Guidelines

How do we normalize metadata?
- Perform “safe” transforms to “smarten up” metadata
- XSL stylesheets: from your XML metadata to our normalized XML metadata
- Principles:
  - Do no harm (don’t lose information)
  - Add information, when possible
    - Indicate schemes for valid values
  - Remove meaningless text
    - “…”, “not available”, “-”
    - Empty elements
  - Correct wrong information
    - “text/pdf” → “application/pdf”
  - Remove characters that impede functionality or display
    - Encoding fixes (e.g. “&”, double XML encodings, bad UTF-8 …)
  - Scrub URLs
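The NSDL does this with XSL stylesheets; the Python function below is only a stand-in that mimics a few of the listed principles on single element values, so the rules can be seen in isolation. The function name, the placeholder list, and the double-encoding heuristic are assumptions, not the NSDL’s actual rules.

```python
import re

# Placeholder values that carry no information (compared case-insensitively).
MEANINGLESS = {"", "...", "…", "-", "not available"}
# Known-wrong MIME types and their corrections.
FORMAT_FIXES = {"text/pdf": "application/pdf"}


def normalize_value(element, value):
    """Return a cleaned value, or None if the element should be dropped."""
    text = re.sub(r"\s+", " ", value).strip()  # collapse whitespace runs
    if text.lower() in MEANINGLESS:
        return None  # meaningless text and empty elements are removed
    if element == "format":
        text = FORMAT_FIXES.get(text.lower(), text)
    if "&amp;" in text:
        # Heuristic: after XML parsing, a literal "&amp;" suggests double encoding.
        text = text.replace("&amp;", "&")
    return text  # anything else passes through unchanged ("do no harm")
```

Unknown values pass through untouched, which keeps the transform on the “do no harm” side: it only removes or rewrites values it positively recognizes.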

Automated MR Ingest process
- Your collection info and harvesting info are registered
- OAI validation: can we run our harvester on your OAI server? (see handouts)
- OAI harvest of your metadata (nsdl_dc if available; oai_dc if not; other formats soon)
- XML schema validation of all of your metadata
- UTF-8 encoding validation (bad UTF-8 characters changed into harmless ones)
- Normalized nsdl_dc created
- Your metadata, “raw” and normalized, is loaded into the MR tables and made available to the NSDL’s MR OAI server
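The UTF-8 step can be sketched with Python’s built-in decoder: try a strict decode first, and if it fails, decode again replacing each invalid sequence with U+FFFD. Using U+FFFD as the “harmless” character is an assumption; the function name is hypothetical.

```python
def repair_utf8(raw):
    """Decode bytes as UTF-8; return (text, True) if invalid sequences were replaced."""
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Each invalid byte sequence becomes U+FFFD REPLACEMENT CHARACTER.
        return raw.decode("utf-8", errors="replace"), True
```

The flag lets the ingest process report the problem back to the collection instead of silently repairing it.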

Deleted Records – Our Solution
“Full reharvest”:
- Mark all the site’s records in the MR “deleted”
- Harvest all metadata records for the collection
- As we ingest each newly retrieved record into the MR, if it overwrites an old record, “un-delete” it
- Records that are not reharvested stay marked “deleted”
Trade-offs:
- Expensive: network bandwidth, processing time
- Okay for small collections (under ~15,000 records)
- Okay for metadata that changes infrequently
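The full-reharvest steps can be sketched against an in-memory table. The dict layout (OAI identifier mapped to collection, record, and a deleted flag) is a hypothetical stand-in for the MR tables, and `harvest` is any iterable of (identifier, record) pairs, such as the output of a complete ListRecords run.

```python
def full_reharvest(repository, collection, harvest):
    """Detect deletions by re-ingesting a collection's complete record set.

    repository: dict of OAI identifier -> {"collection", "record", "deleted"}
    harvest: iterable of (identifier, record) pairs for the whole collection
    """
    # 1. Mark every record from this collection as deleted.
    for entry in repository.values():
        if entry["collection"] == collection:
            entry["deleted"] = True
    # 2. Ingest everything again; overwriting an old record "un-deletes" it.
    for identifier, record in harvest:
        repository[identifier] = {"collection": collection, "record": record, "deleted": False}
    # Whatever was not harvested again stays marked deleted.
    return repository
```

Every record in the collection crosses the network and is reprocessed on each run, which is why the approach suits only small or slowly changing collections.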