Programmatic Interaction with Open Access Repositories


Programmatic Interaction with Open Access Repositories Roberto Barbera and Carla Carrubba – University of Catania - Italy (roberto.barbera@ct.infn.it, carla.carrubba@ct.infn.it) e-Research Summer Hackfest – Catania (Italy)

Outline
Part 1
- Introduction: definitions and context
Part 2
- Manual resource upload through the submit interface
- Programmatic interaction with an Open Access Repository using APIs for data searching, downloading and uploading
- MARCXML tags overview
- Programmatic interaction with an Open Access Repository using the OAI-PMH standard protocol
Part 3
- Get authorship of research products

Part 1

Concepts and definitions (Source: Wikipedia) Open Access repositories are powered by Digital Asset Management Systems (DAMSes), which are “intertwined structures incorporating both software and hardware that take care of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets” A digital asset in essence is “anything that exists in a binary format and comes with the right to use” “Types of digital assets include, but are not exclusive to, photography, logos, illustrations, animations, audio-visual media, presentations, spreadsheets, Word and/or PDF documents, data and a multitude of other digital formats and their respective metadata”

Some of the most common DAMSes (name: home page, license where given)
- CKAN: http://ckan.org/ (Free)
- CONTENTdm: http://www.oclc.org/contentdm.en.html (Commercial)
- Digibib
- Digital Commons: http://digitalcommons.bepress.com/ (Commercial, hosted service)
- DigiTool: http://www.exlibrisgroup.com/category/DigiToolOverview
- DiVA-Portal: http://www.diva-portal.org (Free, hosted service)
- dLibra: http://dingo.psnc.pl/dlibra/
- Drupal: https://www.drupal.org/
- DSpace: http://www.dspace.org/
- Earmas: http://www.earmas.net/
- EPrints: http://www.eprints.org/software/
- EQUELLA Repository: http://www.equella.com/
- ETD-db: http://scholar.lib.vt.edu/ETD-db/index.shtml
- Fedora: http://www.fedora-commons.org/
- Fez: http://apsr.anu.edu.au/currentprojects/fez06.htm
- Greenstone: http://www.greenstone.org/
- HAL: https://hal.archives-ouvertes.fr/
- Invenio: http://invenio-software.org/
- Islandora/Fedora: http://islandora.ca/
- intraLibrary: http://www.intrallect.com/solutions/managing_content/
- MyCoRe: http://www.mycore.de/
- Open Repository: http://www.openrepository.com/
- OPUS: http://www.kobv.de/entwicklung/software/opus-4/
- PURE: https://www.st-andrews.ac.uk/staff/research/pure/
- SciELO: http://scielo.org/php/index.php
- VITAL: https://www.iii.com/products/vital
- WEKO: http://weko.wou.edu.my
- XooNIps: http://xoops.org/modules/repository/
Others, more business or social oriented, are listed at www.capterra.com/digital-asset-management-software/

Sci-GaIA Task 3.1: Support the creation of federated and interoperable Open Access Document and Data Repositories in Africa, compliant with EU and other international guidelines
Planned activities:
- Identification of already existing Open Access Document and Data Repositories in the region and their inclusion in web-based directories such as OpenDOAR and the CHAIN-REDS Knowledge Base
- Promotion of the Open Archives Initiative (OAI) standards and of the OpenAIRE guidelines to make the contents (both papers and data) stored in the African repositories more discoverable, searchable and hence visible worldwide
- Federation, through the use of Linked Data standards and Semantic Web technologies, of African Open Access Document and Data Repositories, to make them accessible and searchable from a single entry point included in the project website
- Feasibility study for the creation of a pilot service to issue Persistent Identifiers (PIDs), compliant with the Handle System, to be associated with documents and data
- Provision of a ready-to-install-and-configure appliance to quickly build and populate Open Access Repositories compliant with OAI, OpenDOAR and OpenAIRE standards/guidelines

The Sci-GaIA Open Access Repository
Requirements:
- Open source
- Distributed under a free license
- Deployable on a local infrastructure (i.e., not a hosted service)
- Standard compliant
- Well supported
- Scalable, up to O(10^6) to O(10^7) resources (to begin with)
Choice: Invenio (latest stable version: v1.2.1 + Sci-GaIA add-ons)
Motivations:
- Fully compliant with all the most important library standards, e.g. DCMI, Marc21 and OAI-PMH;
- Co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL and SLAC, and used as institutional repository by about 30 scientific institutions worldwide;
- The INSPIRE, SCOAP3 and ZENODO (the OpenAIRE flagship archive) repositories are based on Invenio;
- The CERN Document Server has been operating since 2002 and manages about 1.3 million records;
- UNESCO and UEMOA are leading an initiative to create a virtual library based on Invenio in 8 African countries (Benin, Burkina Faso, Côte d’Ivoire, Guinea Bissau, Mali, Niger, Senegal and Togo).

The Sci-GaIA Open Access Repository (http://oar.sci-gaia.eu/)
Federated authentication
Resources can be:
- Manually uploaded
- Automatically harvested and ingested from external sources
Sci-GaIA add-ons to Invenio:
- The possibility to mint DataCite Digital Object Identifiers (DOIs) and assign them to the records stored in the OAR
- Direct links, where they exist, to the altmetrics of each record contained in the OAR
- The correct metadata structure and the right OAI-PMH endpoint configuration to make the OAR compliant with version 3.0 of the OpenAIRE Guidelines

Compliance with standards (fully conforming with the Open Archives Initiative’s standards and registered as an OpenDOAR data provider)

The Knowledge Workflow First demonstrated @ ICT2015

Research packages

The Sci-GaIA OAR itself as a research package 6 clones of the Sci-GaIA OAR are being deployed, both in Africa and Europe

Part 2

Submit a resource


Programmatic Interaction with an (Invenio-based) Open Access Repository: Search Engine API
There are three kinds of APIs you can use:
- XML API
- JSON API
- Python API

Programmatic Interaction: XML API
Syntax: GET /search?param1=value1&param2=value2&param3=value3…
Example: get the first 10 records in XML format:
http://oar.sci-gaia.eu/search?jrec=1&rg=10&of=xm
where:
jrec = jump to record (e.g. 1 for the first hit)
rg = records in groups of (e.g. 10 hits per page)
of = output format (e.g. xm for XML format)
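The syntax above can be sketched in Python with the standard library only. This is a minimal illustration, not part of the original slides: the helper name build_search_url is hypothetical, and only the endpoint and the jrec, rg and of parameters come from the slide.

```python
from urllib.parse import urlencode

# Search endpoint quoted in the slides.
BASE_URL = "http://oar.sci-gaia.eu/search"


def build_search_url(base_url=BASE_URL, **params):
    """Return base_url followed by the URL-encoded query parameters.

    Hypothetical helper: parameters are emitted in the order given.
    """
    return base_url + "?" + urlencode(params)


# First 10 records in MARCXML: jrec=1 (first hit), rg=10 (page size), of=xm.
first_ten_xml = build_search_url(jrec=1, rg=10, of="xm")
print(first_ten_xml)
```

The function only builds the URL; fetching it (for example with urllib.request.urlopen) is left to the caller.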

Programmatic Interaction: XML API
Set ‘jrec’ and ‘rg’ appropriately to paginate the output
Example:
http://oar.sci-gaia.eu/search?of=xm&jrec=1&rg=10
http://oar.sci-gaia.eu/search?of=xm&jrec=11&rg=10
http://oar.sci-gaia.eu/search?of=xm&jrec=21&rg=10
Do not set “rg” too high: there is a server-wide safety limit for it
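A small sketch of the pagination arithmetic: with a page size of rg, page k starts at jrec = 1 + (k - 1) * rg. The generator name page_urls and the total_hits argument are assumptions made for illustration; the caller is expected to know, or to discover from a first request, how many hits there are.

```python
from urllib.parse import urlencode

BASE_URL = "http://oar.sci-gaia.eu/search"  # endpoint from the slides


def page_urls(total_hits, page_size=10, base_url=BASE_URL):
    """Yield one search URL per page of results (hypothetical helper).

    jrec is the 1-based position of the first hit on each page,
    rg is the number of hits per page.
    """
    for start in range(1, total_hits + 1, page_size):
        query = {"of": "xm", "jrec": start, "rg": page_size}
        yield base_url + "?" + urlencode(query)


# 25 hits in pages of 10 give pages starting at hits 1, 11 and 21.
urls = list(page_urls(total_hits=25))
for url in urls:
    print(url)
```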

Programmatic Interaction: XML API
Example: get the first 10 records that contain the string “Sci-GaIA Winter School” in the title:
http://oar.sci-gaia.eu/search?p=Sci-GaIA%20Winter%20School&f=title&jrec=0&rg=10&of=xm
where:
p = pattern (i.e. your query)
f = field to search within (e.g. “title”, “authors”, ...)
Get a record from a given DOI:
http://oar.sci-gaia.eu/search?p=doi:10.15169/sci-gaia:1466352420.24&of=xm
Get all records uploaded from a given date (e.g. 2016-03-21) to another given date (e.g. today):
http://oar.sci-gaia.eu/search?of=xm&d1=2016-03-21&d2=2016-07-05
where:
d1 = first date, in YYYY-mm-dd format
d2 = second date, in YYYY-mm-dd format
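The three queries above can be reproduced with the following sketch. The helper search_url is hypothetical; it keeps ':' and '/' unescaped and encodes spaces as %20 so that the output matches the URLs on the slide. Whether the server also accepts other encodings is not covered here.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "http://oar.sci-gaia.eu/search"  # endpoint from the slides


def search_url(params, base_url=BASE_URL):
    """Build a search URL, leaving ':' and '/' readable (hypothetical helper)."""
    return base_url + "?" + urlencode(params, safe=":/", quote_via=quote)


# Records whose title matches a pattern (p = pattern, f = field).
title_query = search_url(
    {"p": "Sci-GaIA Winter School", "f": "title", "jrec": 0, "rg": 10, "of": "xm"}
)

# A single record looked up by its DOI.
doi_query = search_url({"p": "doi:10.15169/sci-gaia:1466352420.24", "of": "xm"})

# Records uploaded between two dates, formatted as YYYY-mm-dd.
date_query = search_url(
    {
        "of": "xm",
        "d1": date(2016, 3, 21).isoformat(),
        "d2": date(2016, 7, 5).isoformat(),
    }
)

for url in (title_query, doi_query, date_query):
    print(url)
```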

Output of: http://oar.sci-gaia.eu/search

Programmatic Interaction: JSON API
Set “of=recjson” to get the output in JSON format
Use the same parameters as for the XML API
Example: get a record from a DOI:
http://oar.sci-gaia.eu/search?p=doi:10.15169/sci-gaia:1466352420.24&of=recjson
Get all records uploaded from a given date (e.g. 2016-03-21) to another given date (e.g. today):
http://oar.sci-gaia.eu/search?d1=2016-03-21&d2=2016-07-05&of=recjson
where:
d1 = first date, in YYYY-mm-dd format
d2 = second date, in YYYY-mm-dd format

Output of: http://oar.sci-gaia.eu/search

Programmatic Interaction: JSON API
Example: get only the abstract, title and authors of resources:
http://oar.sci-gaia.eu/search?of=recjson&ot=abstract,title,authors
where ot = output tags (e.g. '' to get all fields, 'title' to get titles only)
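A possible way to consume the JSON output from Python is sketched below. The function names are hypothetical, and the shape of the response (a JSON array with one object per record) is an assumption, as is the made-up sample payload used in place of a live request.

```python
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://oar.sci-gaia.eu/search"  # endpoint from the slides


def json_search_url(fields=(), base_url=BASE_URL, **params):
    """Build a search URL asking for recjson output (hypothetical helper).

    fields, if given, is sent as a comma-separated 'ot' parameter.
    """
    query = dict(params, of="recjson")
    if fields:
        query["ot"] = ",".join(fields)
    return base_url + "?" + urlencode(query, safe=",")


def fetch_records(url, opener=urlopen):
    """Fetch a URL and decode the body as JSON.

    opener is injectable so the function can be tried without network access.
    """
    with opener(url) as response:
        return json.load(response)


url = json_search_url(fields=("abstract", "title", "authors"))


def fake_opener(_url):
    # Stand-in for the server: a made-up payload, assumed to be a list of records.
    return io.BytesIO(b'[{"title": "Example record"}]')


records = fetch_records(url, opener=fake_opener)
print(url)
print(records[0]["title"])
```

Calling fetch_records(url) without the opener argument would perform the real HTTP request.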

Programmatic Interaction Python API Invenio Search Engine can be called from within your Python programs via both a high-level and low-level API interface. Use the same parameters as XML and JSON API To know more about Python, XML and JSON API visit this guide: http://oar.sci-gaia.eu/help/hacking/search-engine-api

Programmatic Interaction: Download records
We need:
- PUBLIC KEY (provided by the system)
- PRIVATE KEY (provided by the system)
- SIGNATURE (we have to calculate it)
Calculate the signature:
myquery = http://oar.sci-gaia.eu/search?apikey=PUBLIC-KEY&jrec=0&rg=10&of=xm
Signature = HMAC-SHA1(myquery, Private-Key)
Then add the signature to the request:
http://oar.sci-gaia.eu/search?apikey=PUBLIC-KEY&jrec=0&of=xm&rg=10&signature=SIGNATURE
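The signature step can be sketched with Python's hmac module. Several details are assumptions, since the slide does not state them: that the string to sign is the query exactly as written above, that keys and query are UTF-8 encoded, and that the server expects the digest in hexadecimal. The key values are placeholders.

```python
import hashlib
import hmac

# Placeholders: the real keys are issued by the repository.
PUBLIC_KEY = "PUBLIC-KEY"
PRIVATE_KEY = "PRIVATE-KEY"


def sign_query(query, private_key):
    """Return HMAC-SHA1(query, private_key) as a hex string.

    Assumption: hex encoding and UTF-8; check the repository documentation.
    """
    mac = hmac.new(private_key.encode("utf-8"), query.encode("utf-8"), hashlib.sha1)
    return mac.hexdigest()


my_query = (
    "http://oar.sci-gaia.eu/search?apikey=" + PUBLIC_KEY + "&jrec=0&rg=10&of=xm"
)
signature = sign_query(my_query, PRIVATE_KEY)

# Append the signature as an extra parameter.
signed_url = my_query + "&signature=" + signature
print(signed_url)
```

Because the signature depends on the exact signed string, the query must be signed in the same form in which it is sent.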

Programmatic Interaction: Upload records
We have to:
- Send an authorisation request for your IP address to admin@sci-gaia.eu
- Create a MARCXML file as input (e.g. your_file.xml)
Example:
curl -T your_file.xml http://oar.sci-gaia.eu/batchuploader/robotupload/insert -A invenio_webupload -H "Content-Type: application/marcxml+xml"
To know more about uploading: http://oar.sci-gaia.eu/help/admin/bibupload-admin-guide#2
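For readers who prefer Python to curl, the same request can be sketched with urllib. curl -T sends the file with an HTTP PUT, -A sets the User-Agent and -H adds a header; the function names below are hypothetical, and the upload only succeeds from an authorised IP address, as stated above.

```python
from urllib.request import Request, urlopen

# Upload endpoint quoted in the slides.
UPLOAD_URL = "http://oar.sci-gaia.eu/batchuploader/robotupload/insert"


def build_upload_request(marcxml_bytes, url=UPLOAD_URL):
    """Build (but do not send) a PUT request equivalent to the curl example."""
    return Request(
        url,
        data=marcxml_bytes,
        method="PUT",
        headers={
            "User-Agent": "invenio_webupload",
            "Content-Type": "application/marcxml+xml",
        },
    )


def upload(path, url=UPLOAD_URL):
    """Read a MARCXML file and send it; returns the server reply as text."""
    with open(path, "rb") as handle:
        request = build_upload_request(handle.read(), url)
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Building the request needs no network access.
request = build_upload_request(b"<collection/>")
print(request.get_method(), request.full_url)
```

Calling upload("your_file.xml") would send the file.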

your_file.xml (MARC is the standard format in the library world)
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record xmlns="http://www.loc.gov/MARC21/slim">
  </record>
  …
</collection>

your_file.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <datafield tag=" " ind1=" " ind2=" ">
      <subfield code=""></subfield>
      …..
    </datafield>
    ……
  </record>
</collection>

your_file.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <datafield tag="024" ind1="7" ind2=" ">
      <subfield code="a">DOI identifier</subfield>
      <subfield code="2">Type of identifier</subfield>
    </datafield>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">First author</subfield>
      <subfield code="v">Affiliation</subfield>
      <subfield code="w">Country</subfield>
      <subfield code="j">orcid</subfield>
    </datafield>
    ……
  </record>
</collection>
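A file with this structure can also be generated programmatically. The sketch below uses xml.etree.ElementTree and only the two datafields shown on the slide (024 and 100, with the subfield codes listed there). The function names, the sample values and the use of "DOI" as the identifier type are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialise MARC as the default namespace


def add_datafield(record, tag, subfields, ind1=" ", ind2=" "):
    """Append a datafield with (code, value) subfields to a record."""
    field = ET.SubElement(
        record, "{%s}datafield" % MARC_NS, {"tag": tag, "ind1": ind1, "ind2": ind2}
    )
    for code, value in subfields:
        sub = ET.SubElement(field, "{%s}subfield" % MARC_NS, {"code": code})
        sub.text = value
    return field


def build_collection(doi, author, affiliation, country, orcid):
    """Return a one-record MARCXML collection as text (hypothetical helper)."""
    collection = ET.Element("{%s}collection" % MARC_NS)
    record = ET.SubElement(collection, "{%s}record" % MARC_NS)
    # 024: identifier; subfield 2 holds its type ("DOI" is an assumed value).
    add_datafield(record, "024", [("a", doi), ("2", "DOI")], ind1="7")
    # 100: first author, with the subfield codes used on the slide.
    add_datafield(
        record,
        "100",
        [("a", author), ("v", affiliation), ("w", country), ("j", orcid)],
    )
    body = ET.tostring(collection, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# All values below are made-up placeholders.
xml_text = build_collection(
    doi="10.0000/example",
    author="Doe, Jane",
    affiliation="Example University",
    country="Italy",
    orcid="0000-0000-0000-0000",
)
print(xml_text)
```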

your_file.xml
To know more about MARCXML tags: http://oar.sci-gaia.eu/help/admin/howto-marc

Programmatic interaction: search based on the OAI-PMH standard protocol
The Sci-GaIA OAR OAI-PMH endpoint is publicly available at: http://oar.sci-gaia.eu/oai2d
Get general information about the repository:
http://oar.sci-gaia.eu/oai2d?verb=Identify
Get the list of Dublin Core records:
http://oar.sci-gaia.eu/oai2d?verb=ListRecords&metadataPrefix=oai_dc
Get a record from its OAI identifier:
http://oar.sci-gaia.eu/oai2d?verb=GetRecord&identifier=oai:oar.sci-gaia.eu:5&metadataPrefix=oai_dc

Output of: http://oar.sci-gaia.eu/oai2d?verb=GetRecord&identifier=oai:oar.sci-gaia.eu:8&metadataPrefix=oai_dc
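A minimal harvesting sketch for the OAI-PMH requests above. The helper names and the sample response are illustrative assumptions; the element names and the namespace are those of the OAI-PMH 2.0 specification. When a ListRecords response carries a resumptionToken, the next request passes that token together with the verb and no other arguments.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_ENDPOINT = "http://oar.sci-gaia.eu/oai2d"  # endpoint from the slides
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_url(verb, **arguments):
    """Build an OAI-PMH request URL (hypothetical helper)."""
    query = dict(verb=verb, **arguments)
    return OAI_ENDPOINT + "?" + urlencode(query, safe=":")


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or '') from a response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    identifiers = [el.text for el in root.findall(path, OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or "").strip()


# Made-up, heavily shortened response used instead of a live request.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:oar.sci-gaia.eu:5</identifier></header></record>
    <record><header><identifier>oai:oar.sci-gaia.eu:8</identifier></header></record>
    <resumptionToken>hypothetical-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

get_record = oai_url(
    "GetRecord", identifier="oai:oar.sci-gaia.eu:5", metadataPrefix="oai_dc"
)
identifiers, token = parse_list_records(SAMPLE)
next_page = oai_url("ListRecords", resumptionToken=token) if token else None
print(get_record)
print(identifiers, next_page)
```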

Part 3

“Who’s this science of?” How to provide authorship to research products?

ORCID (www.orcid.org – becoming a “de facto” standard) More than 2.2 million ORCID IDs so far

Digital Object Identifiers
Thanks to UNICT, the Sci-GaIA OAR has an official DOI prefix
An unlimited number of sub-prefixes/DOIs can be created/minted
All records in the OAR can be “claimed” in the ORCID profiles of their authors

Authorship of research products with OAR and ORCID (www.orcid.org)

Altmetrics (www.altmetrics.com) The Sci-GaIA OAR automatically links its records to their altmetrics

Thank you! sci-gaia.eu info@sci-gaia.eu

References
- DAMS introduction: http://www.sci-gaia.eu/osp-oar/
- DataCite: http://www.datacite.org
- Dublin Core: http://dublincore.org/
- Invenio (software and documentation): http://invenio-software.org/
- Marc 21: https://en.wikipedia.org/wiki/MARC_standards
- OAI-PMH: https://www.openarchives.org/OAI/openarchivesprotocol.html
- ORCID: http://www.orcid.org
- Sci-GaIA OAR installation and configuration guide: http://oar-sci-gaia.readthedocs.io/en/latest/