Published by Tyrone Boyd. Modified over 9 years ago.
1
Metadata Harvesting
EuropeanaLocal Knowledge Sharing Workshop
The Hague, 13 & 14 January 2009
Julie Verleyen, Scientific Coordinator, Europeana Office
2
Table of Contents
Harvesting in Europeana: workflow and requirements
Best practices
Recommendations
Common issues
Tools / Software
Resources
Documentation
3
Harvesting in Europeana
1. Determine collections to be contributed
  – Questionnaire
4
Harvesting in Europeana
2. Obtain OAI-PMH repository parameters:
  – Absolute minimum (enough for fully implemented, tested and documented OAI repositories):
    Server base URL
  – Very useful to have:
    Mapping between the described collection(s) and OAI-PMH set(s)
    Prefix of the metadata format to use preferably for Europeana (if not described in the ListMetadataFormats response), e.g. oai_dc, mods, tel, ese
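As a rough illustration of how these parameters fit together, the sketch below assembles a ListRecords request from a base URL, a metadata prefix and an optional set. The helper name, the base URL http://example.org/oai and the set name "maps" are placeholders chosen for this example, not Europeana values.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None):
    """Assemble a ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Restrict the harvest to one OAI-PMH set (selective harvesting).
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder values: example.org and "maps" are assumptions for illustration.
example_url = build_list_records_url("http://example.org/oai", "oai_dc", "maps")
print(example_url)
```

With only the base URL known, a harvester has to discover the remaining two values itself (via ListSets and ListMetadataFormats), which is why the mapping and the preferred prefix are listed as very useful to have.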
5
Harvesting in Europeana
3. Configuration of harvester
4. Full harvest with ListRecords request
  – Records collected in XML files ≤ 10MB
  – Harvest stored in SVN
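A full harvest with ListRecords generally means following resumptionTokens until the repository stops issuing them. The sketch below shows that loop under simplifying assumptions: it is not the Europeana harvester, the function names are hypothetical, and it omits retries, the 10MB file splitting and the SVN storage.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Download one response as bytes (no retry or timeout handling)."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix, fetch_page=fetch):
    """Yield each raw ListRecords response until no resumptionToken remains."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        page = fetch_page(base_url + "?" + urlencode(params))
        yield page
        token = ET.fromstring(page).find(
            f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken"
        )
        # A missing or empty resumptionToken marks the end of the list.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing the download function as a parameter keeps the loop testable against canned responses before it is pointed at a live repository.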
7
Best practices: implementation
Compliance with the OAI-PMH 2.0 protocol specification: http://www.openarchives.org/OAI/openarchivesprotocol.html
Follow the implementation guidelines, OAI-PMH v2 for repository implementers: http://www.openarchives.org/OAI/2.0/guidelines-repository.htm
Full functional tests!
8
Best practices: OAI validation
OAI validation = your OAI repository correctly implements the OAI-PMH!
Correct response to all OAI-PMH requests: with arguments, under various error conditions, with every OAI response valid against its XML schema, ...
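One of the error conditions a validator exercises is an illegal verb, to which a conforming repository answers with an error element carrying the code badVerb. The sketch below extracts error codes from a response; the sample response is hand-written for illustration and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample, not captured from a real repository.
SAMPLE_ERROR_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2009-01-13T10:00:00Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>
"""


def oai_errors(response_xml):
    """Return (code, message) pairs for every error element in a response."""
    root = ET.fromstring(response_xml)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(f"{OAI_NS}error")
    ]


print(oai_errors(SAMPLE_ERROR_RESPONSE))
```

A check like this covers only one aspect of validation; it does not validate responses against the OAI-PMH XML schemas.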
9
Recommended approach to OAI validation
Follow the Open Archives Initiative protocol conformance testing.
Validate your server using the validator supplied by the OAI: http://www.openarchives.org/data/registerasprovider.html
This can be done without registering, by ticking the checkbox "only validate and do not register (you may then register later)."
10
http://www.openarchives.org/data/registerasprovider.html
11
#Protocol_Conformance_Testing
12
http://www.openarchives.org/data/registerasprovider.html => bottom of the page
14
Issues and recommendations: sets
Set = "an optional construct for grouping items for the purpose of selective harvesting."
15
A number of obstacles related to sets:
Interpreting how a repository has organized sets and determining which sets to harvest
  – Issue: setName not human-understandable and/or no setDescription provided.
  – Issue: Large number of sets to sort through.
Knowing when there are records that belong to no sets
  – Issue: Items that belong to no sets are included in the OAI repository.
Knowing when there are empty sets
  – Issue: Data provider exposes sets with no records.
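A harvester can at least detect the first of these issues automatically. The sketch below reads a ListSets response and reports the sets that come without a setDescription; the sample response and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample ListSets response (illustrative only).
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>maps</setSpec>
      <setName>Historical maps</setName>
      <setDescription><note>Scanned maps, 1600-1900</note></setDescription>
    </set>
    <set>
      <setSpec>c017</setSpec>
      <setName>c017</setName>
    </set>
  </ListSets>
</OAI-PMH>
"""


def sets_without_description(list_sets_xml):
    """Return the setSpec of every set that has no setDescription."""
    root = ET.fromstring(list_sets_xml)
    return [
        s.findtext(f"{OAI_NS}setSpec")
        for s in root.iter(f"{OAI_NS}set")
        if s.find(f"{OAI_NS}setDescription") is None
    ]


print(sets_without_description(SAMPLE_LIST_SETS))
```

Whether a setName is understandable to a human still needs a person to judge; the second set above shows the kind of name that usually prompts a question to the data provider.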
16
A number of obstacles related to sets:
Understanding relationships between sets
  – Issue: Relationships between sets are not expressed.
There is a mechanism to express relationships between hierarchical sets, but no mechanism to express relationships between overlapping sets!
The only way to know: harvest the identifiers or records, whose header information lists the sets each record belongs to.
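The header-based approach can be sketched as follows: tally the setSpec values found in each record header and collect the identifiers of records that list no set at all. The sample ListIdentifiers response, its identifiers and the function name are assumptions for illustration; the colon in "maps:atlases" follows the protocol's convention for hierarchical setSpec values.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample ListIdentifiers response (illustrative only).
SAMPLE_LIST_IDENTIFIERS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2009-01-13</datestamp>
      <setSpec>maps</setSpec>
      <setSpec>maps:atlases</setSpec>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2009-01-13</datestamp>
      <setSpec>maps</setSpec>
    </header>
    <header>
      <identifier>oai:example.org:3</identifier>
      <datestamp>2009-01-14</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def set_membership(list_identifiers_xml):
    """Return (records per setSpec, identifiers of records in no set)."""
    root = ET.fromstring(list_identifiers_xml)
    counts = Counter()
    no_set = []
    for header in root.iter(f"{OAI_NS}header"):
        specs = [e.text.strip() for e in header.findall(f"{OAI_NS}setSpec") if e.text]
        if not specs:
            no_set.append(header.findtext(f"{OAI_NS}identifier"))
        counts.update(specs)
    return counts, no_set


counts, no_set = set_membership(SAMPLE_LIST_IDENTIFIERS)
print(dict(counts), no_set)
```

Comparing these counts with the ListSets response would also reveal empty sets, namely any advertised setSpec that never appears in a header.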
17
A number of obstacles related to sets:
Knowing how many records there are within a set before harvesting
  – Issue: The number of records within a set is not expressed, although it can be expressed via a completeListSize attribute in a resumptionToken or within the set description.
Knowing when a set structure has been substantially changed
  – Issue: Changes in a set structure have not been communicated.
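Where a repository does supply completeListSize, a harvester can read it from the first response of a set-restricted request. The sketch below does so and returns None when the optional attribute is absent; the sample response, its token value and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample; the token value and the size are invented.
SAMPLE_WITH_SIZE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2009-01-13</datestamp>
    </header>
    <resumptionToken completeListSize="1250" cursor="0">token-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>
"""


def complete_list_size(response_xml):
    """Return completeListSize as an int, or None if it is not provided."""
    root = ET.fromstring(response_xml)
    token = next(root.iter(f"{OAI_NS}resumptionToken"), None)
    if token is None:
        return None
    size = token.get("completeListSize")
    return int(size) if size is not None else None


print(complete_list_size(SAMPLE_WITH_SIZE))
```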
18
Sets: recommendations
No single best practice for the organization of sets.
Realistically: data providers organize sets in the way that best meets the needs of their primary service provider and that can be easily done within their own internal workflows.
Useful to organize the metadata items into sets according to the collections of resources they represent.
  – The concept of a collection varies and is not completely clear in Europeana.
  – Useful for the harvester to understand the data providers' notion of a collection.
19
Basic requirements
Repository implementation following OAI-PMH v2.0 + tested
Inform the person responsible for harvesting at Europeana of any repository changes / maintenance
No regular harvesting scheme determined yet
"SLA" between data providers and harvesters
20
Common issues
Unavailability / unreliability of repository server
Implementation of OAI-PMH v2 incomplete
  – resumptionToken not supported
  – Only ListIdentifiers
XML syntax errors
Character encoding errors
Short lifetime of resumptionToken
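XML syntax and character encoding errors can be caught before a response is stored. OAI-PMH responses are required to be UTF-8 encoded, so the sketch below first checks that the bytes decode as UTF-8 and then that they parse as well-formed XML. The function name and the message format are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_response(raw_bytes):
    """Return a list of problems found in one raw response (empty if none)."""
    problems = []
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"encoding: {exc.reason} at byte {exc.start}")
    try:
        ET.fromstring(raw_bytes)
    except ET.ParseError as exc:
        # ParseError messages include the line and column of the fault.
        problems.append(f"xml: {exc}")
    return problems


print(check_response(b"<a>ok</a>"))
print(check_response(b"<a><b></a>"))
```

Logging these problems together with the request URL gives the data provider something concrete to fix.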
21
Tools / Software
TEL/Europeana OAI-PMH Harvester – Offline documentation
  – Harvester
  – Java standalone application with GUI
  – Multiple harvesting jobs
  – Resuming unfinished jobs
  – Logging
  – No scheduling, no configuration interface
22
Tools / Software
REPOX - http://repox.ist.utl.pt/
Repository + Harvester
Java standalone application with web GUI
Multiple harvesting jobs, Scheduler
Statistics
Management of XML metadata repository
  – Versioning and identification of records
  – Different metadata formats
  – User interface to create metadata crosswalks: Schema mapper
23
Tools / Software
OAIcat from OCLC - http://www.oclc.org/research/software/oai/cat.htm
Framework conforming to the OAI-PMH v2.0
Repository + Harvesting
Java web application
Scheduling, logging
Limited scalability (~2M records)
24
Tools / Software (TELplus D2.1)
Other implementations in different languages to plug into a Library Management System:
  – PHP: OAIbiblio, a data provider implementation of the OAI-PMH, version 2.0. This toolkit can be easily customized to communicate with an already existing, multi-table MySQL database.
  – Perl: Celestial, an OAI aggregator/cache application that imports OAI metadata from version 1.0, 1.1 and 2.0 OAI-compliant repositories and re-exposes that metadata through either an aggregated or per-repository OAI-compliant 2.0 interface. Celestial requires oai-perl v2, MySQL, Perl 5.6.x and a CGI-capable web server.
  – Ruby: ruby-oai, which includes a client library, a server/provider library and an interactive harvesting shell.
  – Python: pyoai, a package that enables high-level access to an OAI-PMH metadata repository and also implements a framework for quickly creating OAI-PMH compliant servers.
25
Tools / Software
ESE XML validation schemas developed by partners
26
Resources
The Open Archives Initiative Protocol for Metadata Harvesting v2.0: http://www.openarchives.org/OAI/openarchivesprotocol.html
TELplus D2.1, "OAI-PMH implementation and tools guidelines", 21 pages
  – Protocol overview and description of main concepts
  – OAI-PMH implementation in libraries
  – References
28
Resources
Wiki "Best Practices for OAI Data Provider Implementations and Shareable Metadata": excellent source of guidelines, tutorials, recommendations, implementation software and tools, references, etc.
http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/Main_Page
34
Documentation in Europeana context
Requirements:
  – Europeana OAI-PMH Harvesting
  – Europeana OAI-PMH Repositories
ESE XML validation schema
Europeana OAI-PMH data providers registry & forum/mailing list
  – Local systems
  – OAI-PMH repository solution
  – Contact
35
Thank you
Questions? Remarks? ...
Julie.Verleyen@kb.nl