Metadata Harvesting The Hague, 13 & 14 January 2009 Julie Verleyen Scientific Coordinator, Europeana Office EuropeanaLocal Knowledge Sharing Workshop.


Table of Contents
- Harvesting in Europeana: workflow and requirements
- Best practices
- Recommendations
- Common issues
- Tools / Software
- Resources
- Documentation

Harvesting in Europeana
1. Determine collections to be contributed
   - Questionnaire

Harvesting in Europeana
2. Obtain OAI-PMH repository parameters:
   - Absolute minimum (enough for fully implemented, tested and documented OAI repositories)
     - Server base URL
   - Very useful to have:
     - Mapping between described collection(s) and OAI-PMH set(s)
     - Prefix of the metadata format to use preferably for Europeana (if not described in the ListMetadataFormats response), e.g. oai_dc, mods, tel, ese
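As an illustration of how these parameters are used, here is a minimal Python sketch that builds OAI-PMH request URLs from a base URL and reads the metadata prefixes out of a ListMetadataFormats response. The base URL `https://repository.example.org/oai`, the set name `maps`, the helper names and the sample response are assumptions made for the example, not Europeana values.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def build_request(base_url, verb, **args):
    """Return the GET URL for an OAI-PMH request (hypothetical helper)."""
    params = {"verb": verb}
    params.update({k: v for k, v in args.items() if v is not None})
    return base_url + "?" + urlencode(params)

def metadata_prefixes(xml_text):
    """Extract the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [e.text for e in root.findall(path, OAI_NS)]

# Hand-written sample response, trimmed to the elements the sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>ese</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

BASE_URL = "https://repository.example.org/oai"  # assumed base URL
list_sets_url = build_request(BASE_URL, "ListSets")
list_records_url = build_request(BASE_URL, "ListRecords",
                                 metadataPrefix="oai_dc", set="maps")
prefixes = metadata_prefixes(SAMPLE)
```

With only the base URL, a harvester can discover formats and sets itself through ListMetadataFormats and ListSets; the collection-to-set mapping and the preferred prefix save that discovery step.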

Harvesting in Europeana
3. Configuration of harvester
4. Full harvest with ListRecords request
   - Records collected in XML files ≤ 10 MB
   - Harvest stored in SVN
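A full harvest of this kind can be sketched as a loop that follows resumptionTokens until the repository returns an empty one, then groups the records into batches below a size limit. This is a minimal sketch, not the Europeana harvester: the function names, the injectable `fetch` parameter and the batching strategy are assumptions, and writing the files and committing them to SVN are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def http_fetch(url):
    """Default transport: plain HTTP GET returning the response bytes."""
    with urlopen(url) as resp:
        return resp.read()

def harvest(base_url, metadata_prefix, set_spec=None, fetch=http_fetch):
    """Yield every <record> element of a full ListRecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI + "error")
        if error is not None:
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # missing or empty token: the list is complete
        # Follow-up requests carry only the verb and the resumptionToken.
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}

def chunk_records(records, max_bytes=10 * 1024 * 1024):
    """Group serialized records into batches of at most max_bytes each
    (a single record larger than the limit gets a batch of its own)."""
    batch, size = [], 0
    for rec in records:
        data = ET.tostring(rec, encoding="utf-8")
        if batch and size + len(data) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield batch
```

Passing `fetch` as a parameter keeps the loop testable without a live repository; each batch yielded by `chunk_records` would correspond to one output XML file.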

Best practices: implementation
- Compliance with the OAI-PMH 2.0 protocol specifications
  - Follow the OAI-PMH v2 implementation guidelines for repository implementers
- Full functional tests!

Best practices: OAI validation
- OAI validation = your OAI repository correctly implements the OAI-PMH!
  - Correct response to all OAI-PMH requests: with arguments, under various error conditions, with every OAI response valid against its XML schema, ...

Recommended approach to OAI validation
- Follow the Open Archives Initiative protocol conformance testing.
- Validate your server using the validator supplied by the OAI.
- To validate without registering, tick the checkbox "only validate and do not register (you may then register later)."

#Protocol_Conformance_Testing

=> bottom of the page

Issues and recommendations: sets
- Set = "an optional construct for grouping items for the purpose of selective harvesting."

Number of obstacles related to sets:
- Interpreting how a repository has organized sets and determining which sets to harvest
  - Issue: setName not human-understandable and/or no setDescription provided.
  - Issue: Large number of sets to sort through.
- Knowing when there are records that belong to no sets
  - Issue: Items that belong to no sets are included in the OAI repository.
- Knowing when there are empty sets
  - Issue: Data provider exposes sets with no records.

Number of obstacles related to sets:
- Understanding relationships between sets
  - Issue: Relationships between sets are not expressed.
  - There is a mechanism to express relationships between hierarchical sets.
  - But there is no mechanism to express relationships between overlapping sets!
  - The only way to know: harvest the identifiers or records, whose header information lists the sets each record belongs to.
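To make the last point concrete, the sketch below reads the headers of a ListIdentifiers (or ListRecords) response, maps each setSpec to the identifiers that list it, and reports overlapping sets. It relies on the protocol's convention that hierarchical sets are written as colon-separated setSpec values (for example `maps:old` under `maps`). The function names, the `NO_SET` marker and the sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI = "{http://www.openarchives.org/OAI/2.0/}"
NO_SET = ""  # assumed marker key for records that belong to no set

def set_membership(xml_text):
    """Map each setSpec to the identifiers whose header lists it."""
    members = defaultdict(set)
    root = ET.fromstring(xml_text)
    for header in root.iter(OAI + "header"):
        ident = header.findtext(OAI + "identifier")
        specs = [s.text for s in header.findall(OAI + "setSpec")]
        if not specs:
            members[NO_SET].add(ident)
        for spec in specs:
            members[spec].add(ident)
    return members

def overlapping(members):
    """Return pairs of sets that share records without being parent and child."""
    specs = sorted(s for s in members if s != NO_SET)
    pairs = []
    for i, a in enumerate(specs):
        for b in specs[i + 1:]:
            hierarchical = b.startswith(a + ":") or a.startswith(b + ":")
            if not hierarchical and members[a] & members[b]:
                pairs.append((a, b))
    return pairs

# Hand-written sample headers (identifiers and set names are made up).
SAMPLE_HEADERS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>id1</identifier>
      <setSpec>maps</setSpec><setSpec>maps:old</setSpec></header>
    <header><identifier>id2</identifier>
      <setSpec>maps</setSpec><setSpec>photos</setSpec></header>
    <header><identifier>id3</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""

membership = set_membership(SAMPLE_HEADERS)
overlaps = overlapping(membership)
```

The same membership map also answers two of the earlier questions: records under the `NO_SET` key belong to no set, and any setSpec advertised by ListSets but absent from the map is an empty set.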

Number of obstacles related to sets:
- Knowing how many records there are within a set before harvesting
  - Issue: The number of records within a set is not expressed, although it can be given via a completeListSize attribute in a resumptionToken or within the set description.
- Knowing when a set structure has been substantially changed
  - Issue: Changes in a set structure have not been communicated.
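When a repository does supply completeListSize, a harvester can read it from the first response of a list request before committing to the full harvest. A minimal sketch follows; the function name and the sample response are assumptions, and since the attribute is optional in the protocol the function returns `None` when it is absent.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def complete_list_size(xml_text):
    """Return completeListSize from the resumptionToken, or None if absent."""
    root = ET.fromstring(xml_text)
    token = root.find(f".//{OAI}resumptionToken")
    if token is None:
        return None
    size = token.get("completeListSize")
    return int(size) if size is not None else None

# Hand-written first page of a ListIdentifiers response (values made up).
FIRST_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>id1</identifier></header>
    <resumptionToken completeListSize="1250" cursor="0">abc</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

total = complete_list_size(FIRST_PAGE)
```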

Sets: recommendations
- No single best practice for the organization of sets.
- Realistically: data providers organize sets in a way which best meets the needs of their primary service provider and can be easily done within their own internal workflows.
- Useful to organize the metadata items into sets according to the collections of resources they represent.
  - The concept of collections varies and is not completely clear in Europeana.
  - Useful for the harvester to understand the notion of collection used by data providers.

Basic requirements
- Repository implementation following OAI-PMH v2.0 and tested
- Inform the person responsible for harvesting at Europeana of any repository changes / maintenance
- No regular harvesting schedule determined yet
- "SLA" between data providers and harvesters

Common issues
- Unavailability / unreliability of repository server
- Implementation of OAI-PMH v2 incomplete
  - resumptionToken not supported
  - Only ListIdentifiers
- XML syntax errors
- Character encoding errors
- Short lifetime of resumptionToken
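Two of these issues, XML syntax errors and character encoding errors, can be caught by a data provider before a harvester ever sees them. The sketch below checks that a raw response decodes as UTF-8 (the encoding OAI-PMH requires for responses) and that it parses as well-formed XML. The function name and message wording are assumptions; the check covers well-formedness only, not schema validity.

```python
import xml.etree.ElementTree as ET

def check_response(raw_bytes):
    """Return a list of problems found in a raw OAI-PMH response body."""
    problems = []
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")
    try:
        ET.fromstring(raw_bytes)
    except ET.ParseError as exc:
        problems.append(f"XML not well-formed: {exc}")
    return problems

good = check_response("<record><title>Caf\u00e9</title></record>".encode("utf-8"))
bad_encoding = check_response(b"<record><title>Caf\xe9</title></record>")
bad_syntax = check_response(b"<record><title>Cafe</record>")
```

Running such a check over every page of a test harvest is a cheap complement to the OAI validator, which exercises protocol behaviour rather than every record.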

Tools / Software
- TEL/Europeana OAI-PMH Harvester
  - Offline documentation
  - Harvester
  - Java standalone application with GUI
  - Multiple harvesting jobs
  - Resuming unfinished jobs
  - Logging
  - No scheduling, no configuration interface

Tools / Software
- REPOX - Repository + Harvester
- Java standalone application with web GUI
- Multiple harvesting jobs, scheduler
- Statistics
- Management of XML metadata repository
  - Versioning and identification of records
  - Different metadata formats
  - User interface to create metadata crosswalks: Schema mapper

Tools / Software
- OAIcat from OCLC
- Framework conforming to the OAI-PMH v2.0
- Repository + Harvesting
- Java web application
- Scheduling, logging
- Limited scalability (~2M records)

Tools / Software (TELplus D2.1)
Other implementations in different languages to plug into a Library Management System:
- PHP: OAIbiblio, a data provider implementation of the OAI-PMH, version 2.0. This toolkit can be easily customized to communicate with an already existing, multi-table MySQL database.
- Perl: Celestial, an OAI aggregator/cache application that imports OAI metadata from version 1.0, 1.1 and 2.0 OAI-compliant repositories, and re-exposes that metadata through either an aggregated or per-repository OAI-compliant 2.0 interface. Celestial requires oai-perl v2, MySQL, Perl 5.6.x and a CGI-capable web server.
- Ruby: ruby-oai, which includes a client library, a server/provider library and an interactive harvesting shell.
- Python: pyoai, a package that enables high-level access to an OAI-PMH metadata repository and also implements a framework for quickly creating OAI-PMH compliant servers.

Tools / Software
- ESE XML validation schemas developed by partners

Resources
- The Open Archives Initiative Protocol for Metadata Harvesting v2.0: col.html
- TELplus D2.1, "OAI-PMH implementation and tools guidelines", 21 pages
  - Protocol overview and description of main concepts
  - OAI-PMH implementation in libraries
  - References

Resources
- Wiki "Best Practices for OAI Data Provider Implementations and Shareable Metadata": dex.php/Main_Page
  - Excellent source of guidelines, tutorials, recommendations, implementation software and tools, references, etc.

Documentation in Europeana context
- Requirements:
  - Europeana OAI-PMH Harvesting
  - Europeana OAI-PMH Repositories
- ESE XML validation schema
- Europeana OAI-PMH data providers registry & forum/mailing list
  - Local systems
  - OAI-PMH repository solution
  - Contact

Thank you. Questions? Remarks?