OAI-PMH harvester for agricultural knowledge gathering (Development, testing and implementation) Francesco Castellani and Stefka Kaloyanova 4 February 2009

Presentation transcript:

1 OAI-PMH harvester for agricultural knowledge gathering (Development, testing and implementation) Francesco Castellani and Stefka Kaloyanova 4 February 2009

2 Overview
- Introduction
- The main requirements for an OAI-PMH harvester
- Selection and rationale
- Requirements for data providers
- OAI framework, workflow and the six verbs
- AGRIS Network and OAI-PMH
- Setup of a harvester
- Installation
- Technical details
- Main functions
- Management and troubleshooting
- Results, summary and conclusions
- Next steps

3 Introduction
Main role of a harvester: to set up a mechanism that automatically gathers metadata and saves it in a common place (a central repository), either a file system or a database.

4 The main requirements for an OAI-PMH harvester
- To retrieve and define remote OAI data providers for harvesting
- To collect data from them according to the rules and requirements of the OAI-PMH protocol (usually done automatically)
- To ensure that this data is saved in the central file system or database repository for further indexing and search at the service provider (portal)

5 Many harvesters available as OSS
- Selection (pros and cons)
  - PKP harvester
  - OCLC harvester
- Evaluation and testing
  - PKP harvester
  - OCLC harvester
- Selection of the OCLC harvester and its adaptation to the existing AGRIS flow

6 The requirements for OAI-PMH data providers
- Expose data over the Internet according to the six verbs of OAI-PMH
- Allow selective harvesting by date and by set
- Use resumption tokens for flow control
- Ensure response compression, validation and normalization of the data
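The selective-harvesting and flow-control requirements above can be sketched as follows. This is a minimal Python illustration, not the AGRIS harvester's code (which is a Java application on top of OCLC Harvester2). The function names are assumptions, and `fetch` is a hypothetical callable that stands in for the HTTP request and returns the XML response as text.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def first_request(metadata_prefix: str,
                  set_spec: Optional[str] = None,
                  from_date: Optional[str] = None,
                  until_date: Optional[str] = None) -> Dict[str, str]:
    """Build the arguments of an initial, selective ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return params


def iter_identifiers(fetch: Callable[[Dict[str, str]], str],
                     params: Optional[Dict[str, str]]) -> Iterator[str]:
    """Yield record identifiers page by page, following resumption tokens."""
    while params is not None:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            ident = header.find(OAI_NS + "identifier")
            if ident is not None and ident.text:
                yield ident.text.strip()
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is not None and token.text and token.text.strip():
            # A follow-up request carries only the verb and the token.
            params = {"verb": "ListRecords",
                      "resumptionToken": token.text.strip()}
        else:
            # A missing or empty token marks the last page.
            params = None
```

Passing `fetch` in as a parameter keeps the paging logic testable without network access; a real harvester would plug in an HTTP client there.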

7 OAI framework
[Diagram: the harvester (service provider) sends OAI-PMH requests for selective harvesting (datestamp, set) to the repositories (data providers), which return OAI-PMH XML records.]
- Data provider (DP): ensures that the Internet-accessible institutional repositories expose metadata for their digital objects to harvesters following OAI-PMH rules
- Service provider (SP): operates a harvester as the means of collecting metadata and provides extended services using the harvested metadata
- The quality of the service is proportional to the quality of the data harvested.

8 Workflow: database - OAI-PMH - harvester
[Diagram: on the data provider side, a CDS/ISIS database is reached through WWWISIS (wxis) scripts by ISISOAI (OAI plug-in / Java layer); the harvester on the service provider side sends an OAI request and receives an XML response.]
Script interaction with the database:
http://www4.fao.org/cgi-bin/oaiagris.exe?database=agris&search_type=query&query=ID=UY &table=mont&lang=oai&format_name=oaidc
OAI request: uruguay%3AUY

9 OAI-PMH: the six verbs
Verb: Function
- Identify: describes the repository
- ListMetadataFormats: gives all metadata formats supported by this repository
- ListSets: describes the possible subsets defined by the repository (semantic or by type of document)
- ListIdentifiers: lists record identifiers for a given set / date range / metadata format from this repository
- ListRecords: gives all records for a given set / date range / metadata format from this repository
- GetRecord: gets a single record by identifier
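As an illustration of the six verbs, the sketch below builds request URLs and checks the arguments each verb requires under OAI-PMH 2.0. It is a Python sketch under stated assumptions: `build_request_url` is a made-up helper name and `http://example.org/oai` is a placeholder base URL, not an AGRIS endpoint.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Arguments each verb requires when no resumptionToken is supplied.
REQUIRED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def build_request_url(base_url: str, verb: str,
                      args: Optional[Dict[str, str]] = None) -> str:
    """Return the GET URL for one OAI-PMH request, validating the verb."""
    args = dict(args or {})
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" not in args:
        missing = REQUIRED_ARGS[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    # The verb goes first; the remaining arguments are sorted for stable output.
    query = [("verb", verb)] + sorted(args.items())
    return base_url + "?" + urlencode(query)
```

For example, `build_request_url("http://example.org/oai", "ListRecords", {"metadataPrefix": "oai_dc", "set": "uy"})` produces a selective harvesting request for one set.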

10 AGRIS network
[Diagram: data providers run OAIagris on a local database, either not on the Internet or accessible on the Internet. A data aggregator hosting metadata (KAINet) uses OAIcat. Harvesters at the service providers (AGRIS services, FAOBIB, KAINet, OAIster) collect OAI-DC or OAI AGRIS AP records and store them in a file system XML repository.]

11 Technical details
- Customized Java application on top of OCLC Harvester2, which provides an OAI-PMH harvester framework
- Open Source Software (OSS), ready to be included in the CVS repository
- Frameworks used in this project:
  - Hibernate (object-relational mapping (ORM) for RDBMS independence), persistence layer
  - Quartz (scheduling framework)
  - Prototype AJAX framework for the Web user interface (mainly used for AGRIS centers information)
  - RDBMS (MySQL) database to keep statistics

12 Setup of a harvester
- Installation
- Register the data providers to be harvested (parameters)
- Establish the schedule procedure (parameters)
- Define the output files and where they are saved

13 Installation:
- Installation of Tomcat
- Installation of Java
- Installation of MySQL
- Installation of the harvester

14 Functionalities:
- Scheduler
- Data Provider
  - Add new
  - List / Modify / Delete
- Statistics
  - List Data Providers
  - Trace Log

15 Define parameters for each Data Provider (* = required)
- Activate or Deactivate data provider
- Title *
- Description
- URL *
- Data Provider's Name
- Administrator's
- Metadata Format *
- Set Specification
- Start Date (YYYY / MM / DD)
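The parameter form above maps naturally onto a small record type. The sketch below is a hypothetical Python model, not the harvester's actual Hibernate mapping: the class and field names are assumptions, and only the three fields marked with an asterisk are treated as required.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class DataProvider:
    """One registered data provider (field names are illustrative)."""
    title: str                      # required (*)
    url: str                        # required (*)
    metadata_format: str            # required (*), e.g. "oai_dc"
    active: bool = True             # activate / deactivate switch
    description: str = ""
    provider_name: str = ""
    set_spec: Optional[str] = None  # None means: harvest the whole repository
    start_date: Optional[date] = None

    def __post_init__(self) -> None:
        # Reject empty values for the fields the form marks as required.
        for name in ("title", "url", "metadata_format"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")

    def request_args(self) -> Dict[str, str]:
        """Translate the stored parameters into OAI-PMH request arguments."""
        args = {"metadataPrefix": self.metadata_format}
        if self.set_spec:
            args["set"] = self.set_spec
        if self.start_date:
            args["from"] = self.start_date.isoformat()
        return args
```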

16 Define data providers (DP)
- Requires Title and URL to identify the DP
- Dynamic recognition of the data provider's parameters using OAI-PMH verbs (Identify, ListSets, ListMetadataFormats)
- Additional information taken from the AGRIS data providers (mdb file)
  - center code (CC), name and acronym
  - description of the participating center
  - search in AGRIS portal etc.
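Dynamic recognition of a data provider's parameters amounts to parsing the responses of the Identify, ListSets and ListMetadataFormats verbs. The following Python sketch shows one way to do that with the standard library; the function names are assumptions, and the element names are those defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def _text(parent: ET.Element, tag: str) -> str:
    """Return the stripped text of a direct child element, or ''."""
    node = parent.find(OAI_NS + tag)
    return node.text.strip() if node is not None and node.text else ""


def parse_identify(xml_text: str) -> Dict[str, str]:
    """Extract basic repository details from an Identify response."""
    ident = ET.fromstring(xml_text).find(OAI_NS + "Identify")
    if ident is None:
        raise ValueError("not an Identify response")
    return {
        "repositoryName": _text(ident, "repositoryName"),
        "baseURL": _text(ident, "baseURL"),
        "adminEmail": _text(ident, "adminEmail"),
    }


def parse_sets(xml_text: str) -> Dict[str, str]:
    """Map setSpec to setName for every set in a ListSets response."""
    root = ET.fromstring(xml_text)
    return {_text(s, "setSpec"): _text(s, "setName")
            for s in root.iter(OAI_NS + "set")}


def parse_metadata_prefixes(xml_text: str) -> List[str]:
    """List the metadataPrefix values in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [_text(f, "metadataPrefix")
            for f in root.iter(OAI_NS + "metadataFormat")]
```

The parsed sets and prefixes could then populate the selection lists for the subset and metadata format of a provider.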

17 Parameters for metadata format and subset selection
- Available subsets as returned by the OAI-PMH ListSets verb, and selection of the one suitable for AGRIS (if none is selected, the whole database is harvested)
- Available formats for storage, from ListMetadataFormats:
  - AGRIS AP
  - DC
  - others

18 Defining schedule for each data provider
- Continuous (runs every N minutes)
- Daily (runs every day at a given time)
- Weekly (runs every week at a given day and time)
- Monthly (runs every month at a given day and time)
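The four schedule types can be illustrated by computing the next run time from the current time. The harvester itself relies on Quartz for scheduling; the Python function below is only a sketch of the semantics, with assumed parameter names. Clamping a monthly day to the last day of shorter months is a design choice of this sketch, not something the slides specify.

```python
import calendar
from datetime import datetime, timedelta


def _at(day: datetime, hour: int, minute: int) -> datetime:
    """Return the given day at hour:minute, with seconds zeroed."""
    return day.replace(hour=hour, minute=minute, second=0, microsecond=0)


def _month_run(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Run time in a given month, clamping the day to the month's length."""
    last_day = calendar.monthrange(year, month)[1]
    return datetime(year, month, min(day, last_day), hour, minute)


def next_run(kind: str, now: datetime, *, every_minutes: int = 0,
             hour: int = 0, minute: int = 0,
             weekday: int = 0, day: int = 1) -> datetime:
    """Next run strictly after `now`; weekday uses Monday = 0."""
    if kind == "continuous":
        if every_minutes <= 0:
            raise ValueError("every_minutes must be positive")
        return now + timedelta(minutes=every_minutes)
    if kind == "daily":
        run = _at(now, hour, minute)
        return run if run > now else run + timedelta(days=1)
    if kind == "weekly":
        days_ahead = (weekday - now.weekday()) % 7
        run = _at(now + timedelta(days=days_ahead), hour, minute)
        return run if run > now else run + timedelta(days=7)
    if kind == "monthly":
        run = _month_run(now.year, now.month, day, hour, minute)
        if run > now:
            return run
        if now.month == 12:
            return _month_run(now.year + 1, 1, day, hour, minute)
        return _month_run(now.year, now.month + 1, day, hour, minute)
    raise ValueError(f"unknown schedule type: {kind}")
```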

19 Data storage parameters
- Identify format/type of storage *
- File prefix for the data provider *

20 List of defined data providers
- List, delete or modify the parameters for a data provider
- Trace log for each data provider

21 List of Data providers defined for harvesting

22 Scheduler / status of the harvesting
As for topic two

23 Define a Data Provider for harvesting

24

25 List of Data providers expanded for delete or modify

26 Statistics: Trace log

27 Statistics: Trace log

28 Results from the harvesting/Trace logs

29 Structure of the result XML files
Ordered by data provider, then by format, then by subset

30 Result file from FAOBIB harvesting

31 Management of the harvesting
- Status (active / not active)
- Management of errors
- Statistics kept in the MySQL database, including:
  - the last range harvested
  - the date of the last harvesting, used as the start of the next harvesting
  - the number of records harvested
  - the names of the XML files generated
- Administration
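The statistics listed above are what make incremental harvesting possible: the date of the last harvest becomes the `from` argument of the next request. The sketch below keeps the statistics in memory for illustration, whereas the harvester stores them in MySQL; the class, field and function names are assumptions, and treating the record count as a running total is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class HarvestStats:
    """Per-provider statistics (illustrative, in memory only)."""
    provider: str
    last_harvest: Optional[date] = None
    records_harvested: int = 0
    xml_files: List[str] = field(default_factory=list)


def next_request_args(stats: HarvestStats, metadata_prefix: str,
                      today: date) -> Dict[str, str]:
    """Arguments for the next ListRecords request of this provider."""
    args = {"verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "until": today.isoformat()}
    if stats.last_harvest is not None:
        # OAI-PMH date ranges are inclusive, so records stamped on the
        # last harvest date may be returned a second time.
        args["from"] = stats.last_harvest.isoformat()
    return args


def record_run(stats: HarvestStats, run_date: date,
               record_count: int, xml_file: str) -> None:
    """Update the statistics after a completed harvesting run."""
    stats.last_harvest = run_date
    stats.records_harvested += record_count
    stats.xml_files.append(xml_file)
```

The first run for a provider has no `from` argument and therefore harvests everything up to the run date; later runs only ask for records changed since the previous run.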

32 What was done until now:
- Harvester developed (shown to the group)
- Testing with more than 15 different repositories (SciELO, Orton Library, FAOBIB, BIBSYS, National Library of Portugal, hosted WEBAGRIS databases (Uruguay, Peru))
- Fixing of bugs and a lot of new FAO requirements (or changes)
- Full documentation and installation package available

33 List of additional work done:
- Error handling: in case of bad AGRIS AP XML, the process should stop after the 3rd trial that produces empty XML
- Adding “monthly” as a possible harvesting period in the scheduler
- Changing the RDBMS that keeps the statistics to MySQL
- Introducing login and password
- Enabling changes to the path for the XML files
- Adding the number of records harvested to the initial display of the DP
- Additional modifications of the menus
- Adding additional parameters (CC, name, acronym etc.) for data providers, taken from the mdb file for AGRIS data providers
- Changing the naming of the produced output files to include the center code
- Cleaning of the OAI part and the wrong namespaces in the XML result
- Adding an activate / deactivate function
- Improvement of the statistics
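The error-handling rule in the first item, stopping after the third failed trial, can be sketched as a bounded retry loop. This Python version is an interpretation rather than the harvester's code: it treats an empty or malformed response as a failed trial, and `fetch` and `HarvestAborted` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


class HarvestAborted(Exception):
    """Raised when no usable XML was obtained within the allowed trials."""


def fetch_valid_xml(fetch: Callable[[], str],
                    max_attempts: int = 3) -> ET.Element:
    """Call `fetch` up to max_attempts times until the XML parses."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            # An empty string also raises ParseError ("no element found").
            return ET.fromstring(fetch())
        except ET.ParseError as exc:
            last_error = exc
    raise HarvestAborted(
        f"no well-formed XML after {max_attempts} attempts") from last_error
```

Stopping after a fixed number of trials keeps one broken repository from blocking the scheduler; the failure can then be written to the trace log for that data provider.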

34 Testing and implementation
- Testing
- Installation in FAO (on the commonly accessible server GILS09) for further testing
- Creation of a distribution package and documentation
- Presentation to the management and other colleagues in FAO
- Installation on another server, or just redirecting the output to the existing directory for AGRIS production
- Mechanism for inclusion in the AGRIS production cycle
- Troubleshooting for OAI-PMH repositories

35 Summary / Conclusions
- The goal of the harvester
- Benefits for AGRIS
- Possibility to use it with other FAO OA projects
- Future implementation and use in house and by our partners

36 What next
- Help AGRIS centres to install the OAI-PMH plug-in and expose it outside the firewall
- Facilitating host services for some data providers
- Installing the harvester at other aggregators, from AGRIS harvesting to the AGRIS portal
- Follow-up actions

37 Close
- A new way of organizing AGRIS harvesting
- It is not a user interface but a scheduler, and not a search interface
- Its success depends on the quality of the data exported by the OAI-PMH plug-in

38 Thank you