OAI-PMH for Resource Harvesting Tutorial OAI4, October 20th 2005, CERN, Geneva, Switzerland OAI-PMH for Resource Harvesting Herbert Van de Sompel Digital.

Similar presentations
Data Publishing Service Indiana University Stacy Kowalczyk April 9, 2010.

RESEARCH LIBRARY Content Packaging for Complex Objects MPEG – 21 1 February 2007 Frances Knudson Repository Team Los Alamos National Laboratory Research.
UKOLN is supported by: JISC Information Environment update Repositories and Preservation Programme meeting, October 24-25, 2006 Rachel Heery UKOLN
October 28, 2003 Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Copying Archives Project Group Members: Mushashu Lumpa Ngoni Munyaradzi.
The Open Archives Initiative DRIADE Workshop, Durham NC, May 16-17, 2007 Michael L. Nelson The Open Archives Initiative Michael L. Nelson Computer Science,
Y.T. a brief history of the OAI 0 Source: Herbert van de Sompel.
ELPUB 2006 June Bansko Bulgaria Automated Building of OAI Compliant Repository from Legacy Collection Kurt Maly Department of Computer.
Depositing e-material to The National Library of Sweden.
Object Re-Use and Exchange Mellon Retreat, Nassau Inn, Princeton, NJ, March Herbert Van de Sompel, Carl Lagoze The OAI Object Re-Use & Exchange.
UKOLN is supported by: A non-technical introduction to: OAI-ORE ( Defining Image Access project meeting.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
What’s New from the OAI Herbert Van de Sompel Michael Nelson Simeon Warner Carl Lagoze CERN workshop on Innovations.
OAI Standards for Sheet Music Meeting March 28-29, 2002 Basic OAI Principals How They Apply to Sheet Music Presenter: Curtis Fornadley, Senior Programmer/Analyst.
The Open Archives Initiative Simeon Warner (Cornell University) Open Archives seminar “Facilitating Free and Efficient Scientific.
Institutional Repositories Tools for scholarship Mary Westell University of Calgary AMTEC Conference May 26, 2005.
An Update from the OAI Herbert Van de Sompel Carl Lagoze Michael Nelson Simeon Warner CNI Task Force Meeting December 7th 2004, Portland, OR.
Digital Library Architecture and Technology
Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University.
Adventures in Digital Asset Management: Fedora at the National Library of Wales Glen Robson National Library of Wales
OAI-PMH for Resource Harvesting Tutorial OAI4, October 20th 2005, CERN, Geneva, Switzerland OAIResource Software Her This work supported in part by the.
A New Model for Web Resource Harvesting Michael L. Nelson Old Dominion University joint work with: Her Herbert Van de Sompel Xiaoming Liu Carl Lagoze Simeon.
ECDL 2005, September 18th-23rd 2005, Vienna, Austria File-based storage of Digital Objects: XMLtapes & Internet Archive ARC files Xiaoming Liu, Luda.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Fedora Content Models for the National Science Digital Library Data Repository Fedora User’s Group Meeting Copenhagen, September 28, 2005 Carl Lagoze Cornell.
The DNER - a national digital library Andy Powell ZIG Meeting, York October 2001 UKOLN, University of Bath UKOLN is funded by Resource:
Aligning library-domain metadata with the Europeana Data Model Sally CHAMBERS Valentine CHARLES ELAG 2011, Prague.
OAI-PMH for Resource Harvesting Tutorial OAI4, October 20th 2005, CERN, Geneva, Switzerland A New Model for Web Resource Harvesting Her This work supported.
07/11/2002 Thomas Baron - JACoW Workshop CERN Library Requirements T. Baron CERN ETT-DH-CDS.
SCIELO AS AN OPEN ARCHIVE: the development of SciELO / OpenArchives data provider interface Prof. Carlos H. Marcondes Federal Fluminense University/ Information.
Research Library, Los Alamos National Laboratory RESEARCH OAI4 - Geneva, Switzerland Digital Library Research & Prototyping Team Multi-Graph.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting T.B. Rajashekar National Centre for Science Information (NCSI) Indian Institute of Science,
Van de Sompel, Herbert Los Alamos National Laboratory – Research Library OAI-PMH for Resource Harvesting.
UKOLN is supported by: The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK AULIC Institutional Repositories Meeting University.
Sharing With the Open Archives Initiative Jenn Riley Metadata Librarian Indiana University.
Introduction to metadata
Kurt Maly Department of Computer Science Old Dominion University Norfolk, Virginia 23529, USA Digital Libraries, OAI and Free Software.
This presentation describes the development and implementation of WSU Research Exchange, a permanent digital repository system that is being, adding WSU.
Open Archive Initiative – Protocol for metadata Harvesting (OAI-PMH) Surinder Kumar Technical Director NIC, New Delhi
An Update on the OAI-ORE Project CNI Spring 2007 Task Force Meeting, Phoenix AZ, April 17, 2007 Lagoze, Nelson & Van de Sompel An Update on the Open Archives.
Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.
GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
Integrating Access to Digital Content Sarah Shreeves University of Illinois at Urbana-Champaign Visual Resources Association 23rd Annual Conference Miami.
NSDL October 12-15, 2003 Eisenhower National Clearinghouse NSDL and the Open Archives Initiative NSDL – OAI – and the Eisenhower National Clearinghouse.
OAI Object Reuse & Exchange: Atom Serialization Nordbib Workshop, September , Stockholm, Sweden OAI-ORE: Atom Serialization The ORE Editors are:
JISC/NSF PI Meeting, June Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.
Oct 12-14, 2003 NSDL Challenges in Building Federation Services over Harvested Metadata Kurt Maly, Michael Nelson, Mohammad Zubair Digital Library.
The Open Archives Initiative Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University
Archive Ingest and Handling Test: ODU’s Perspective Michael L. Nelson Department of Computer Science Old Dominion University
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
OAI-PMH for Resource Harvesting Tutorial OAI4, October 20th 2005, CERN, Geneva, Switzerland The American Physical Society Project: Standards-based Mirroring.
UKOLN is supported by: Content packaging and MPEG-21 DID Andy Powell, UKOLN, University of Bath JISC Joint Programmes Meeting, July.
Herbert Van de Sompel Research Library, Los Alamos National Laboratory OAI4, October , CERN, Geneva, Switzerland RESEARCH LIBRARY Lessons in.
VIVA Special Collections Committee GRANT MEETING January 26, 2007 METADATA: The Who, What, Why, Where, and When Bob Vay George Mason University.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Mod_oai: Metadata Harvesting for Everyone Michael L. Nelson, Herbert Van de Sompel, Xiaoming Liu, Aravind Elango
LWW January 27, 2004, Los Alamos, NM LANL Ingestion and Repository architecture Research Library, Los Alamos National Laboratory RESEARCH LIBRARY LANL’s.
Web Services Overview Thomas Hickey. 2 What are Web Services? Machine-to-machine communication Run over standard Web protocols –XML syntax, HTTP packaging.
The Multi-Faceted Use of the OAI-PMH in the LANL Repository Written By: Henry, Xiaoming, Patrick and Herbert. Presented By: Shashi.
Building A Repository for Digital Objects
An Architecture for Complex Objects and their Relationships
VI-SEEM Data Repository
OAI and Metadata Harvesting
Digitometric Services for Open Archives Environments
A New Model for Web Resource Harvesting
An Update from the OAI
Open Archive Initiative
Institutional Repositories
Presentation transcript:

OAI-PMH for Resource Harvesting
Herbert Van de Sompel, Digital Library Research & Prototyping Team, Research Library, Los Alamos National Laboratory
Michael Nelson, Computer Science Department, Old Dominion University

Tutorial Outline
- OAI-PMH for Resource Harvesting: problem statement and conceptual solution
- MPEG-21 DIDL: an XML-based Complex Object Format for OAI-PMH-based Resource Harvesting
- Accurate mirroring of the collection of the American Physical Society using OAI-PMH-based Resource Harvesting
- mod_oai: an OAI-PMH-based model for Web Resource Harvesting
- OAIResource: a software tool for OAI-PMH-based Resource Harvesting

Resource Harvesting: Use cases
- Discovery: use the content itself in the creation of services
  o search engines that make the full text searchable
  o citation indexing systems that extract references from the full-text content
  o browsing interfaces that include thumbnail versions of high-quality images from cultural heritage collections
- Preservation:
  o periodically transfer digital content from a data repository to one or more trusted digital repositories
  o trusted digital repositories need a mechanism to automatically synchronize with the originating data repository

Resource Harvesting: Use cases
- Discovery:
  o Institutional Repository & Digital Library projects: UK JISC, DARE, DINI
  o Web search engines: competition for content (cf. Google Scholar)
- Preservation:
  o Institutional Repository & Digital Library projects: UK JISC, DARE, DINI
  o Library of Congress: NDIIPP Archive Export/Ingest, e-deposit
OAI-PMH is well-established. Can OAI-PMH be used for Resource Harvesting?

Existing OAI-PMH based approaches
Typical scenario:
1. An OAI-PMH harvester harvests Dublin Core records from the OAI-PMH repository.
2. The harvester analyzes each Dublin Core record, extracting dc.identifier information in order to determine the network location of the described resource.
3. A separate process, out-of-band from the OAI-PMH, collects the described resource from its network location.
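The first two steps of this scenario can be sketched with the Python standard library. This is a minimal illustration, assuming an OAI-PMH 2.0 ListRecords response in the oai_dc format; the sample response, the item identifier and the example.org URL are hypothetical, and the out-of-band download of step 3 is left out.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 and the Dublin Core element set used by oai_dc.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def identifiers_by_item(list_records_xml: str) -> dict:
    """Step 2: map each OAI-PMH item identifier to the dc:identifier values of its record."""
    root = ET.fromstring(list_records_xml)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        item_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        result[item_id] = [
            el.text.strip()
            for el in record.iterfind(".//dc:identifier", NS)
            if el.text and el.text.strip()
        ]
    return result

# Hypothetical ListRecords response (the output of step 1), trimmed to the relevant elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2005-10-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example paper</dc:title>
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(identifiers_by_item(SAMPLE))
```

The harvester still has to decide which of the extracted values, if any, actually locates the resource; that is the weakness discussed under Issue 1.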

Existing OAI-PMH based approaches: Issue 1
- Locating the resource based on information provided in dc.identifier:
  o dc.identifier is used to convey a variety of identifiers, sometimes simultaneously: URL, DOI, bibliographic citation, …
  o dc.identifier is not expressive enough to distinguish between an identifier and a locator.
  o Several dereferencing attempts are required.
- The URI provided in dc.identifier is commonly that of a bibliographic "splash page":
  o How does a harvester know it is a bibliographic splash page and not the resource?
  o If it is a bibliographic splash page, where is the resource?
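The ambiguity becomes concrete when one writes the kind of heuristic a harvester is forced to rely on. This is a hypothetical sketch: the regular expressions are assumptions, not part of any specification, and even a value classified as a URL may point to a splash page rather than to the resource itself.

```python
import re

# Heuristic patterns (assumptions): a bare or prefixed DOI, or an HTTP(S) URL.
DOI_RE = re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE)
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)

def classify_identifier(value: str) -> str:
    """Guess what kind of string a dc.identifier value carries."""
    v = value.strip()
    if DOI_RE.match(v):
        return "doi"
    if URL_RE.match(v):
        # Could be the resource, a splash page, or something else entirely.
        return "url"
    return "other"  # e.g. a bibliographic citation

for v in ["doi:10.1234/abc",
          "http://example.org/splash/42",
          "Vorobiev, A. (2002) Microwave engineering Europe"]:
    print(v, "->", classify_identifier(v))
```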

Existing OAI-PMH based approaches: Issue 2
- Using the OAI-PMH datestamp of the Dublin Core record to trigger incremental harvesting:
  o The datestamp of the DC record does not necessarily change when the resource changes.

                       no metadata update               metadata update
                       (DC datestamp: no change)        (DC datestamp: change)
  no resource update   OK                               unnecessary resource download
  resource update      missed resource update           OK
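The four cells of this table follow from one observation: the harvester only sees the Dublin Core record datestamp. A minimal sketch, assuming that datestamp changes exactly when the metadata changes; the function name is hypothetical.

```python
def datestamp_triggered_outcome(metadata_updated: bool, resource_updated: bool) -> str:
    """Outcome when resource downloads are triggered by the DC record datestamp."""
    # Assumption: the DC record datestamp changes exactly when the metadata changes.
    datestamp_changed = metadata_updated
    if datestamp_changed and not resource_updated:
        return "unnecessary resource download"
    if resource_updated and not datestamp_changed:
        return "missed resource update"
    return "OK"

# Print the full table, one row per combination.
for metadata_updated in (False, True):
    for resource_updated in (False, True):
        print(metadata_updated, resource_updated,
              datestamp_triggered_outcome(metadata_updated, resource_updated))
```

No harvester-side logic can repair the "missed resource update" cell, because nothing in the harvested record signals the change; this is why conventions on the Dublin Core record cannot address Issue 2.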

OAI-PMH for Resource Harvesting Tutorial OAI4, October 20 th 2005, CERN, Geneva, Switzerland Existing OAI-PMH based approaches : Conventions  Conventions address Issue 1; Issue 2 can not really be addressed.  First dc.identifier is locator of the resource  what if the resource is not digital?  Use of dc.format and/or dc.relation to convey locator

Existing OAI-PMH based approaches: Conventions
Example Dublin Core record (XML markup not preserved in the transcript):
A Simple Parallel-Plate Resonator Technique for Microwave Characterization of Thin Resistive Films
Vorobiev, A.
ING-INF/01 Elettronica (Electronics)
A parallel-plate resonator method is proposed for non-destructive characterisation of resistive films used in microwave integrated circuits. A slot made in one...
Microwave engineering Europe
2002
Document relating to a Conference or other Event
PeerReviewed
pdf
Annotations: locator of resource; splash page

Existing OAI-PMH based approaches: Conventions
… …
Annotations: locator of resource; splash page

Existing OAI-PMH based approaches: Other attempts
- dc.identifier leads to a splash page, and the splash page contains a special-purpose XHTML link to the resource(s)
  o What if there is no splash page?
  o How does a harvester know it is in this situation?
- OA-X: protocol extension
  o OK in a local context
  o Strategic problem to generalize
  o How to consolidate with the OAI-PMH data model?
- Qualified Dublin Core
  o Could bring the expressiveness to distinguish between locator and identifier
  o But what about the datestamp issue?
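The splash-page approach can be sketched as a small link extractor. The slide does not name the link type, so the rel value "x-resource" below is purely hypothetical, as are the sample page and URLs; the parser is Python's standard html.parser.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical rel value: the slide does not name the special-purpose link type.
RESOURCE_REL = "x-resource"

class ResourceLinkFinder(HTMLParser):
    """Collect href values of <link> elements carrying a given rel token."""

    def __init__(self, rel: str) -> None:
        super().__init__()
        self.rel = rel.lower()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if self.rel in rel_tokens and href:
            self.hrefs.append(href)

def resource_links(splash_html: str, rel: str = RESOURCE_REL) -> List[str]:
    finder = ResourceLinkFinder(rel)
    finder.feed(splash_html)
    finder.close()
    return finder.hrefs

SPLASH = """<html><head>
<link rel="x-resource" type="application/pdf" href="http://example.org/papers/1.pdf" />
<link rel="stylesheet" href="/style.css" />
</head><body>Abstract page</body></html>"""

print(resource_links(SPLASH))
```

An empty result is itself ambiguous: the page may be a splash page without the special link, or the dereferenced URI may not be a splash page at all, which is exactly the question raised on the slide.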

Proposed OAI-PMH based approach
- Use metadata formats that were specifically created for the representation of digital objects:
  o Complex Object Formats as OAI-PMH metadata formats
  o MPEG-21 DIDL, METS, …
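Because a complex object format is just another metadata format, requesting it uses the ordinary OAI-PMH verbs and arguments. A minimal sketch of building such a request; the base URL and the "didl" metadataPrefix are hypothetical, since each repository advertises its own prefixes, and flow control with resumptionToken is not handled.

```python
from typing import Optional
from urllib.parse import urlencode

def list_records_url(base_url: str, metadata_prefix: str,
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for a given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set membership
    return f"{base_url}?{urlencode(params)}"

# "didl" and the base URL are hypothetical examples.
print(list_records_url("http://example.org/oai", "didl", from_date="2005-10-01"))
```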

OAI-PMH data model
- resource: the resource that the records pertain to
- item: identified by the OAI-PMH identifier, the entry point to all records pertaining to the resource
- records: metadata pertaining to the resource, one per metadata format, ranging from simple to highly expressive:
  o Dublin Core: simple
  o MARCXML: more expressive
  o METS: highly expressive
  o MPEG-21 DIDL: highly expressive
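The data model can be pictured as a small data structure: one item per resource, holding one record per metadata format. This is an illustrative sketch rather than a normative rendering of the OAI-PMH specification; the class names, the "didl" prefix and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Record:
    metadata_prefix: str  # e.g. "oai_dc", or a prefix chosen for MARCXML, METS, MPEG-21 DIDL
    datestamp: str        # changes whenever this record changes
    xml: str              # serialized metadata

@dataclass
class Item:
    identifier: str  # OAI-PMH identifier: entry point to all records about the resource
    records: Dict[str, Record] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        self.records[record.metadata_prefix] = record

    def get_record(self, metadata_prefix: str) -> Optional[Record]:
        """Analogous to the GetRecord verb: (identifier, metadataPrefix) selects one record."""
        return self.records.get(metadata_prefix)

item = Item("oai:example.org:1")
item.add(Record("oai_dc", "2005-10-01", "<oai_dc:dc>…</oai_dc:dc>"))
item.add(Record("didl", "2005-10-20", "<didl:DIDL>…</didl:DIDL>"))
print(sorted(item.records))
```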

Complex Object Formats: characteristics
- Representation of a digital object by means of a wrapper XML document
- The represented resource can be:
  o a simple digital object (consisting of a single datastream)
  o a compound digital object (consisting of multiple datastreams)
- Unambiguous approach to convey identifiers of the digital object and its constituent datastreams
- Include datastreams:
  o By-Value: embedding of the base64-encoded datastream
  o By-Reference: embedding of the network location of the datastream
  o not mutually exclusive; equivalent
- Include a variety of secondary information:
  o By-Value
  o By-Reference
  o descriptive metadata, rights information, technical metadata, …
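A wrapper document carrying one datastream both By-Value and By-Reference can be sketched with ElementTree. The element and attribute names are loosely modeled on MPEG-21 DIDL (DIDL, Item, Component, Resource with mimeType, encoding and ref), but this is a simplified illustration, not validated against the DIDL schema; identifiers and secondary information are omitted, and the datastream content and URL are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# MPEG-21 DIDL namespace; the structure below is a simplified illustration.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)

def q(tag: str) -> str:
    """Qualify a tag name with the DIDL namespace."""
    return f"{{{DIDL_NS}}}{tag}"

def wrap(datastreams: list) -> str:
    """Wrap datastreams in one XML document.

    Each datastream is a dict with "mime_type" plus "data" (bytes, By-Value)
    and/or "ref" (URL, By-Reference); when both are given, two equivalent
    Resource elements are written in the same Component.
    """
    didl = ET.Element(q("DIDL"))
    item = ET.SubElement(didl, q("Item"))
    for ds in datastreams:
        component = ET.SubElement(item, q("Component"))
        if "data" in ds:  # By-Value: base64-encoded content inside the document
            by_value = ET.SubElement(component, q("Resource"),
                                     mimeType=ds["mime_type"], encoding="base64")
            by_value.text = base64.b64encode(ds["data"]).decode("ascii")
        if "ref" in ds:   # By-Reference: network location of the same datastream
            ET.SubElement(component, q("Resource"),
                          mimeType=ds["mime_type"], ref=ds["ref"])
    return ET.tostring(didl, encoding="unicode")

doc = wrap([{"mime_type": "application/pdf", "data": b"%PDF-1.4 ...",
             "ref": "http://example.org/papers/1.pdf"}])
print(doc)
```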

Example complex object (XML markup not preserved in the transcript):
A Simple Parallel-Plate Resonator Technique for Microwave Characterization of Thin Resistive Films
Vorobiev, A.
application/pdf
…

Complex Object Formats & OAI-PMH
- The resource is represented via an XML wrapper, so it can be exposed as an OAI-PMH record
- Uniform solution for simple and compound objects
- Unambiguous expression of the locator of a datastream
- Disambiguation between locators and identifiers
- The OAI-PMH datestamp changes whenever the resource (datastreams, secondary information) changes
- OAI-PMH semantics apply: "about" containers, set membership
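One way a repository could honor this datestamp semantics is to derive the record datestamp from the most recent change to any constituent of the complex object. A minimal sketch, assuming the repository tracks a timezone-aware modification time for every datastream and every piece of secondary information; how those times are stored is left open, and the function name is hypothetical. The output uses the OAI-PMH UTC datestamp format.

```python
from datetime import datetime, timezone
from typing import Iterable

def complex_object_datestamp(datastream_changes: Iterable[datetime],
                             secondary_changes: Iterable[datetime]) -> str:
    """Datestamp for a complex object record: the latest change of any constituent.

    Expects timezone-aware datetimes; returns YYYY-MM-DDThh:mm:ssZ in UTC.
    """
    latest = max([*datastream_changes, *secondary_changes])
    return latest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

utc = timezone.utc
print(complex_object_datestamp(
    [datetime(2005, 10, 1, 12, 0, tzinfo=utc)],    # e.g. a PDF datastream
    [datetime(2005, 10, 20, 9, 30, tzinfo=utc)]))  # e.g. updated rights information
```

A change to the rights information alone is enough to move the datestamp forward, so a harvester using the from argument picks up the modified complex object.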

OAI-PMH based approach using a Complex Object Format
Typical scenario:
1. An OAI-PMH harvester checks for support of a complex object format using the ListMetadataFormats verb.
2. The harvester harvests the complex object metadata. The semantics of the OAI-PMH datestamp guarantee that new and modified resources are detected.
3. A parser at the end of the harvesting application analyzes each harvested complex object record:
   - The parser extracts the bitstreams that were delivered By-Value.
   - The parser extracts the unambiguous references to the network location of bitstreams delivered By-Reference.
4. A separate process, out-of-band from the OAI-PMH, collects the bitstreams delivered By-Reference from the extracted network locations.
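Steps 1 and 3 of this scenario can be sketched as follows. This is a minimal illustration, assuming the repository exposes MPEG-21 DIDL under some metadataPrefix and that DIDL Resource elements carry a mimeType plus either encoding="base64" with inline content (By-Value) or a ref attribute (By-Reference); the sample responses, the "didl" prefix and the example.org URLs are hypothetical. Step 2 is an ordinary ListRecords harvest, and the step 4 download could use urllib.request but is not shown.

```python
import base64
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
NS = {"oai": OAI_NS}

def didl_prefix(list_metadata_formats_xml: str):
    """Step 1: return the metadataPrefix whose namespace is DIDL, or None."""
    root = ET.fromstring(list_metadata_formats_xml)
    for fmt in root.iterfind(".//oai:metadataFormat", NS):
        namespace = (fmt.findtext("oai:metadataNamespace", namespaces=NS) or "").strip()
        if namespace == DIDL_NS:
            return (fmt.findtext("oai:metadataPrefix", namespaces=NS) or "").strip()
    return None

def split_datastreams(didl_xml: str):
    """Step 3: return (by_value, by_reference) lists of (mimeType, bytes or URL)."""
    root = ET.fromstring(didl_xml)
    by_value, by_reference = [], []
    for res in root.iter(f"{{{DIDL_NS}}}Resource"):
        mime = res.get("mimeType")
        if res.get("ref"):
            by_reference.append((mime, res.get("ref")))
        elif res.get("encoding") == "base64" and res.text:
            by_value.append((mime, base64.b64decode(res.text)))
    return by_value, by_reference

LMF = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListMetadataFormats>
<metadataFormat><metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace></metadataFormat>
<metadataFormat><metadataPrefix>didl</metadataPrefix>
<schema>http://example.org/didl.xsd</schema>
<metadataNamespace>urn:mpeg:mpeg21:2002:02-DIDL-NS</metadataNamespace></metadataFormat>
</ListMetadataFormats></OAI-PMH>"""

RECORD = """<didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS"><didl:Item><didl:Component>
<didl:Resource mimeType="text/plain" encoding="base64">aGVsbG8=</didl:Resource>
<didl:Resource mimeType="application/pdf" ref="http://example.org/papers/1.pdf"/>
</didl:Component></didl:Item></didl:DIDL>"""

print(didl_prefix(LMF))
print(split_datastreams(RECORD))
```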

Complex Object Formats & OAI-PMH: existing implementations
- LANL Repository
  o Local storage of terabytes of scholarly assets
  o Assets stored as MPEG-21 DIDL documents
  o DIDL documents made accessible to downstream applications via the OAI-PMH
- Mirroring of the American Physical Society collection at LANL
  o Maps the APS document model to the MPEG-21 DIDL Transfer Profile
  o Exposes MPEG-21 DIDL documents through OAI-PMH infrastructure
  o Includes digests/signatures
- DSpace & Fedora plug-ins
  o Map the DSpace/Fedora document model to the MPEG-21 DIDL Transfer Profile
  o Expose MPEG-21 DIDL documents through OAI-PMH infrastructure
- mod_oai

Complex Object Formats & OAI-PMH: issues
- Which Complex Object Format(s)?
- How to profile Complex Object Format(s) for OAI-PMH harvesting?
- Large "records"
- Compound objects with multiple datastreams: what if only one datastream gets updated?
- Because the resource is represented as metadata, can rights pertaining to the resource be expressed according to the "rights for metadata" OAI-rights guideline?
- Tools:
  o Software library to write compliant complex objects
  o Integration of this library with repository systems (Fedora, DSpace, eprints.org, …)
  o Software to harvest resources based on the OAI-PMH model

Readings
Herbert Van de Sompel, Michael Nelson, Carl Lagoze, Simeon Warner. Resource Harvesting within the OAI-PMH Framework. D-Lib Magazine. December