Research Output Repositories


Research Output Repositories Tomasz Miksa, TU Wien

Agenda
- What is a repository system?
- How to compare repository systems?
- What systems are out there?
- What can we learn from Portugal? (How to introduce a repository system?)
Note: this presentation is NOT about recommending a specific system.

What is a repository system?

Introduction
- Expectations have evolved over time, from digitization to the preservation of e-Science experiments

- Repositories manage, share, provide access to, and archive researchers’ datasets
- Specialized repositories aggregate disciplinary data
- General repositories collect over larger knowledge areas, such as the social sciences
- A repository allows examination, proof, review, transparency, and validation of a researcher’s results
Source: http://www.infotoday.com/cilmag/apr16/Uzwyshyn--Research-Data-Repositories.shtml

https://phaidra.univie.ac.at

https://phaidra.univie.ac.at/view/o:423816

http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5993/

https://data.ccca.ac.at/dataset/oks15-bias-corrected-euro-cordex-models-global-radiation/resource/88d350e9-5e91-4922-8d8c-8857553d5d2f?view_id=eeeb1e17-c707-46eb-bf24-dd5ed169f1c6

Repository scope
- Specialised
  - disciplinary data, e.g. DNA sequencing
- General
  - covering large knowledge areas, e.g. social sciences
- Aggregate experts’ data
  - globally
  - locally: university, country

Architecture
- No different from any other system that has a frontend and a backend

Architecture
- Conceptually like any web system, consisting of a frontend and a backend
- Example: online shopping

How to compare repository systems? I am going to go through a list of things to consider.

Infrastructure
- Locally hosted solution
  - own ICT infrastructure required
  - IT staff required: developers, system administrators
- Externally hosted solution
  - outsourcing of infrastructure
  - lack of control over where the data is
  - can the external party be trusted?
- Open source or proprietary
  - who owns the code?
  - is it allowed to introduce changes?
- Community support
  - number of similar instances
  - forum and mailing lists
  - professional support

Front-end Design
- Out of the box or development needed?
  - Fedora Commons is just a backend
- Customisable?
  - branding
- Multi-lingual support?
- Mobile-optimized design?

Content Organization
- Aggregations and collections
  - proceedings, department outputs, etc.
  - help in navigating the repository
  - faceted search
- Metadata
  - what are the default standards?
  - how easy is it to add another standard?

Content Organization & Multimedia
- A single object may have many representations
- Content presentation
  - PDF viewer
  - Video streaming
  - Image previews
  - Audio playback
  - Slideshows

Content Discovery
- Inside the repository
  - advanced search
  - full-text indexing
  - graphical navigation
  - geolocation
- Outside the repository
  - OAI-PMH
  - Search engine optimization
  - Google Scholar indexing
  - DOI and persistent identifiers
- Social features and notifications
  - Share, Bookmark, Comment, RSS

Publication tools
- Customisable submit forms
  - specify required information for a submission: metadata, license, etc.
- Publishing workflow
  - roles: editor, reviewer
  - notifications
- Batch processing
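The two ideas on this slide, required submission information and a role-based publishing workflow, can be sketched in a few lines. This is only an illustration: the field names, states, and role assignments below are assumptions made for the example and are not taken from any particular repository system.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; a real repository would usually let
# administrators configure these per submit form.
REQUIRED_FIELDS = ("title", "author", "license")

# Hypothetical workflow: which role may move a submission between states.
TRANSITIONS = {
    ("submitted", "in_review"): "editor",
    ("in_review", "accepted"): "reviewer",
    ("in_review", "rejected"): "reviewer",
    ("accepted", "published"): "editor",
}


@dataclass
class Submission:
    metadata: dict
    state: str = "submitted"
    notifications: list = field(default_factory=list)


def missing_fields(submission: Submission) -> list:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not submission.metadata.get(name)]


def advance(submission: Submission, new_state: str, role: str) -> None:
    """Move the submission to new_state if the given role is allowed to."""
    allowed_role = TRANSITIONS.get((submission.state, new_state))
    if allowed_role is None:
        raise ValueError(f"no transition {submission.state} -> {new_state}")
    if role != allowed_role:
        raise PermissionError(f"{role} may not perform this step")
    submission.state = new_state
    # Stand-in for a notification sent to the depositor.
    submission.notifications.append(f"submission is now {new_state}")
```

When evaluating a system, the question is how much of this (required fields, states, roles) can be changed through configuration and how much requires programming.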

Access Control & Authentication
- IP ranges, user accounts, Access Control Lists
- Embargo periods
- Authentication
  - Not an issue for open access
  - Possible integrations to be considered: LDAP, CAS, system accounts, Shibboleth
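A minimal sketch of how an embargo period and an IP-range restriction might be combined into one access decision. The network ranges and the function name are hypothetical (the addresses are reserved documentation ranges), and real systems typically layer user accounts and ACLs on top of checks like these.

```python
from datetime import date
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical campus networks, using address blocks reserved for documentation.
CAMPUS_NETWORKS = [ip_network("192.0.2.0/24"), ip_network("2001:db8::/32")]


def may_download(client_ip: str, today: date,
                 embargo_until: Optional[date] = None,
                 campus_only: bool = False) -> bool:
    """Decide whether a file may be downloaded by this client today."""
    # An embargoed file is closed to everyone until the embargo date.
    if embargo_until is not None and today < embargo_until:
        return False
    # A campus-only file is open only to clients inside the listed ranges.
    if campus_only:
        address = ip_address(client_ip)
        return any(address in network for network in CAMPUS_NETWORKS)
    return True
```

For a fully open access repository neither check applies, which is why the slide notes that authentication is not an issue there.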

Interoperability
- OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
  - query to discover repository contents
  - only for metadata
  - not for depositing
- SWORD (Simple Web-service Offering Repository Deposit)
  - deposit to multiple repositories at once
  - deposit by third-party systems (e.g. lab equipment)
- Export to Mendeley, DataCite, RefWorks, BibTeX, etc.
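To make the "query to discover repository contents, only for metadata" point concrete, here is a small sketch of an OAI-PMH harvest step. It builds a `ListRecords` request URL and extracts titles from a response. The base URL and the sample response are made up for the example, and no network call is made; `verb`, `metadataPrefix`, `oai_dc`, and the XML namespaces are part of the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request for the given OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hand-written, shortened response from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles_by_identifier(xml_text: str) -> dict:
    """Map each record identifier to its Dublin Core title."""
    root = ET.fromstring(xml_text)
    titles = {}
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        titles[identifier] = record.findtext(".//dc:title", namespaces=OAI_NS)
    return titles
```

Note that the response carries only metadata: the files themselves stay in the source repository, and depositing content requires a different protocol such as SWORD.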

Reporting
- Needed for feedback and building the case
- Download reports
- Active users
- Google Analytics integration

Preservation
- Back-ups
  - file system backup
  - import / export functionality
- LOCKSS compatibility
  - Lots of Copies Keep Stuff Safe
  - peer-to-peer network
- Preservation tools
  - format migration tools
  - risk management tools
  - preservation-specific metadata collection: PREMIS, METS
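The intuition behind keeping lots of copies can be illustrated with a toy check that compares checksums of several copies and flags the ones that disagree with the majority. This is a deliberate simplification and an assumption of this example: it is not the LOCKSS protocol, and the site names are hypothetical.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict) -> list:
    """Return the names of copies whose checksum differs from the majority.

    `copies` maps a (hypothetical) storage location name to the bytes held
    there and must contain at least one entry.
    """
    digests = {name: checksum(data) for name, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, digest in digests.items()
                  if digest != majority_digest)
```

A damaged copy found this way could then be repaired from one of the intact copies, which is only possible if more than one copy exists in the first place.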

- A comparison of research data management platforms: architecture, flexible metadata and interoperability. https://doi.org/10.1007/978-3-319-16486-1
- Institutional Repository Software Comparison. https://works.bepress.com/jean_gabriel_bankier/22/
- Open Source Software for Digital Preservation Repositories: a Survey. https://arxiv.org/abs/1707.06336
- Research Data Repositories: Review of current features, gap analysis, and recommendations for minimum requirements. https://www.rdc-drc.ca/wp-content/uploads/Review-of-Research-Data-Repositories-2015.pdf
- Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra. https://dx.doi.org/10.14288/1.0075768

Common survey and report shortcomings
- Simplifications
  - ‘customisable’ metadata: adding an XML template vs reprogramming the software
- Not up to date
  - systems evolve fast
- Superficial
  - deployment mode: sometimes both options exist, e.g. Zenodo as a service or as open source on GitHub
  - community ‘support’: check how many posts there are and how fast people got answers
- Currently there is no survey that compares them all

Best way is to get your hands dirty
- Sort out the basics
  - What is the purpose of the repository?
  - Do I need more than one repository?
  - Which functionality is a must and which is nice to have?
  - How should the system integrate with the existing infrastructure?
- Browse through
  - official websites
  - wiki pages
  - GitHub issues
  - mailing lists
- Contact people who already have an instance
- Make test deployments of a few systems
  - install
  - configure
  - populate with sample data
  - evaluate
- Be agile!
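The "must have vs nice to have" question can be turned into a simple shortlist procedure once test deployments have been evaluated. The sketch below is one possible way to do it, under the assumption that each candidate is described by a set of observed features; the system names and feature labels are placeholders, not findings about real products.

```python
def evaluate(features: set, must_have: set, nice_to_have: set) -> tuple:
    """Return (qualifies, score) for one candidate system.

    A system qualifies only if no must-have feature is missing; the score
    counts how many nice-to-have features it offers.
    """
    missing = must_have - features
    score = len(nice_to_have & features)
    return (not missing, score)


def rank(candidates: dict, must_have: set, nice_to_have: set) -> list:
    """Return qualifying system names, best score first, ties by name."""
    qualified = []
    for name, features in candidates.items():
        qualifies, score = evaluate(features, must_have, nice_to_have)
        if qualifies:
            qualified.append((score, name))
    return [name for score, name in sorted(qualified, key=lambda p: (-p[0], p[1]))]


# Placeholder data: fill in from your own test deployments.
candidates = {
    "System A": {"oai-pmh", "embargo", "doi"},
    "System B": {"oai-pmh", "doi", "data viewers", "versioning"},
    "System C": {"embargo", "versioning"},
}
shortlist = rank(candidates,
                 must_have={"oai-pmh", "doi"},
                 nice_to_have={"embargo", "data viewers", "versioning"})
```

Here "System C" is dropped because it lacks both must-have features, regardless of how many nice-to-have features it provides.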

What repository systems are out there?

Fedora based
- NOT a Linux OS distribution!
- Fedora Commons provides a backend only
  - Content model
  - RDF linked data
  - Persistent identifier
  - Versioning
- Requires further development

Fedora provides an environment for low-level management of digital objects. In the Fedora model, an object consists of a persistent identifier, an XML document for object properties, and a number of datastreams, each holding a content file or metadata record (the object can have more than one of each). Each object can assert relationships to any number of other objects through statements in RDF Linked Open Data format. Fedora does not, however, provide any high-level management functions for ingest, workflow management, preservation, search, or user authorisation. Rather, it is designed for developers to build support for these, using software layered on top of the Fedora environment.
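The object model described above (persistent identifier, properties, datastreams, and RDF-style relationships) can be pictured with a small data structure. This is a conceptual sketch only: the class and method names are invented for the illustration and do not correspond to Fedora's actual API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One content file or metadata record attached to an object."""
    label: str
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    """Illustrative stand-in for a repository object, not Fedora's API."""
    pid: str                                           # persistent identifier
    properties: dict = field(default_factory=dict)     # object properties
    datastreams: list = field(default_factory=list)    # content and metadata
    relationships: list = field(default_factory=list)  # (predicate, target pid)

    def relate(self, predicate: str, target: "DigitalObject") -> None:
        """Assert a relationship from this object to another object."""
        self.relationships.append((predicate, target.pid))

    def triples(self) -> list:
        """Return relationships as (subject, predicate, object) statements."""
        return [(self.pid, predicate, target) for predicate, target in self.relationships]


# Hypothetical identifiers and predicate name, for illustration only.
collection = DigitalObject(pid="demo:collection-1")
thesis = DigitalObject(pid="demo:thesis-42", properties={"label": "PhD thesis"})
thesis.datastreams.append(Datastream("PDF", "application/pdf", b"%PDF-"))
thesis.datastreams.append(Datastream("DC", "text/xml", b"<dc/>"))
thesis.relate("isMemberOfCollection", collection)
```

Everything beyond this level, such as ingest forms, search, and authorisation, is what frameworks layered on top of Fedora have to supply.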

Fedora based
- Hydra -> Samvera
  - Ruby on Rails frontend
  - Apache Solr for indexing
- Islandora
  - Drupal frontend
  - Virtual machines available for download
  - Integrates well with Archivematica
- Phaidra
  - Perl + Catalyst
  - Each installation becomes a partner of a consortium

Archivematica
- Digital preservation system
- Automates the process of preparing digital objects for ingest into a repository, for example:
  - scan for viruses
  - generate METS
  - generate DIP (e.g. migrate)
- Integrates with repository systems for dissemination, e.g. Islandora, DSpace

Open source digital preservation system that automates the process of preparing digital objects for ingest into a repository. It supports ingest of prepared information packages into archival storage, and their access for dissemination via a repository or content management system. Supports repository integration, e.g. with DSpace and Islandora. Also supports archival storage integration, e.g. with Arkivum and DuraSpace.
https://wiki.archivematica.org/Workflows
https://wiki.archivematica.org/Release_1.6.0

Archivematica

Archivematica https://wiki.archivematica.org/Format_policies

DSpace
- Open source
- Popular
  - scholarly publication repositories
  - some data repositories: DataShare or Dryad
- Large open source community, variety of support and consultancy providers
- Aims for ‘turnkey’ local installation, but can be complex to set up and maintain if customisation is required
- Repository, Catalogue, Preservation

Repository system providing ingest, access, data management and preservation features. DSpace structures the ingested content into collections, within a hierarchy of ‘communities’. This is intended to reflect typical research organisation structures. Collections contain items, which in turn comprise one or more files.
Pros: Support for a wide range of content types, access controls, interoperability standards, virtualised storage. Preservation functions. Submission workflows customisable per collection. Customisable search. Wide take-up for scholarly publication repositories, and some data repositories including Edinburgh DataShare and the Dryad open data repository. Large, well-established open source community, variety of support and consultancy providers.
Cons: Aims for ‘turnkey’ local installation, but can be complex to set up and maintain if customisation is required. Limited version control. Few data viewers available.
Further information: from provider DuraSpace. Also list of registered service providers.

CKAN - Comprehensive Knowledge Archive Network
- Repository for datasets
  - Stores data
  - Or holds metadata for datasets hosted externally
- Pros:
  - faceted search
  - unrestricted metadata
  - data viewers
  - customisable
  - well-established open source community
  - wide take-up in government sector
- Cons:
  - Lacks support for OAI-PMH

CKAN (Comprehensive Knowledge Archive Network) is an open source data portal platform, that is, software for building a catalogue and repository for datasets. The system can store datasets, or simply hold metadata for datasets hosted externally.
Pros: Strong support for discovery services such as faceted search, unrestricted metadata. Data viewers. Versioning. Extensively customisable. Well-established open source community, wide take-up in government sector including data.gov.uk.
Cons: Local installation is complex. Platform has relatively limited take-up for research data catalogues. Lacks support for the OAI-PMH standard protocol for harvesting metadata.
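The distinction between storing data and holding only metadata for externally hosted datasets shows up in how a catalogue is queried. The sketch below builds a search request against CKAN's Action API (`package_search`) and reads resource URLs out of a response. The site URL and the sample response are invented for the example and heavily shortened; no network call is made.

```python
import json
from urllib.parse import urlencode


def package_search_url(site_url: str, query: str, rows: int = 10) -> str:
    """Build a dataset search request for a CKAN site's Action API."""
    params = urlencode({"q": query, "rows": rows})
    return f"{site_url.rstrip('/')}/api/3/action/package_search?{params}"


# Hand-written, shortened response from a hypothetical CKAN instance.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{
            "name": "global-radiation",
            "title": "Global radiation",
            "resources": [
                # The catalogue only points to the file; it lives elsewhere.
                {"format": "NetCDF", "url": "https://files.example.org/rad.nc"},
            ],
        }],
    },
})


def resource_urls(response_text: str) -> dict:
    """Map each dataset name to the URLs of its resources."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return {
        dataset["name"]: [resource["url"] for resource in dataset["resources"]]
        for dataset in payload["result"]["results"]
    }
```

Whether a resource URL points into the CKAN instance itself or to an external host is what separates the "store data" case from the "metadata only" case.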

Zenodo
- Runs on the open source Invenio platform
- Developed by CERN
- Enables ‘upload’ of any file: data, publication or code
- Enables compliance for European (H2020) projects
- Integration with Dropbox and GitHub
- Source code available on GitHub
- Customisation requires software development
- No access or download statistics
- No data viewers

Repository service launched within the OpenAIREplus project as part of a Europe-wide research infrastructure, and run on the open source Invenio platform. Enables ingest of any file or ‘upload’, whether data, publication or code, into a ‘community’ or workspace claimed by a registered user. Multiple files are organised into named collections.
Pros: Free to use. Straightforward linking of research outputs and related information. Enables compliance and offers basic research analysis and reporting via OpenAIRE for European (H2020) projects. Flexible licensing. Open metadata standards. Access controls. Integration with Dropbox.
Cons: Institutional customisation requires software development. Provides no access or download statistics. Metadata inflexible. No versioning. No data viewers. Basic succession plan.

Dataverse
- Repository system
- Own instance -> becomes part of community
  - 22 installations, e.g. Harvard, DANS (NL)
- Supports citation, versioning, specific disciplinary metadata standards
- Easy to install
  - Vagrant (virtual machine)
- Popular in social science domain

Enables organisations to host a storage and access system for research materials. The software creates self-contained ‘dataverses’, each of which is designed to support individual researchers or research groups.
Pros: Support for citation. Access controls. Versioning. Support for specific disciplinary metadata standards. Data viewers. Basic preservation functions. Designed to be easy to install.
Cons: Currently low take-up outside quantitative social science domains. Limited take-up in the UK.

EPrints
- Available since 2000
- Wide take-up for scholarly publication repositories
  - But also supports research data (ReCollect)
- Support for wide range of content types, metadata schema, interoperability standards
- EPrints Services
  - not-for-profit commercial services organisation
  - help build your own repository
  - host for you

ReCollect is a module designed for research data repository purposes, resulting from Jisc projects at the University of Southampton and the UK Data Service at the University of Essex. ReCollect is currently a plug-in requiring a clean EPrints 3+ installation. EPrints 4 offers the ReCollect functionality as an optional repository ‘layer’. ReCollect adapts the base EPrints platform data model to enable research data and documentation files to be organised as ‘collections’.
Pros: Support for a wide range of content types, metadata schema, interoperability standards, virtualised storage (via plug-in). Customisable search. Wide take-up for scholarly publication repositories, and some data repositories including UKDS ReShare, University of Bath, and the London School of Hygiene and Tropical Medicine. EPrints has a large, well-established open source community, and several support and consultancy providers.
Cons: Less extensible than DSpace. Lacks collaboration support. Few data viewers available.

Commercial solutions
- Figshare
  - Publicly available repository
  - Free of charge
  - Alternative to Zenodo
- Figshare for Institutions
  - Dedicated, partially customised instance hosted by Figshare
  - Loss of institutional control
  - Proprietary API instead of common standards
  - Uncertain succession plan
- Preservica

Repository registries
- Directory of Open Access Repositories (DOAR)
  - Based on registrations
  - http://www.opendoar.org/
- Registry of Open Access Repositories (ROAR)
  - Automatically harvested list based on OAI-PMH
  - http://roar.eprints.org/
- Projection of DOAR and ROAR onto Google Maps
  - http://maps.repository66.org

List of repository software (not instances) http://wiki.lib.sun.ac.za/index.php?title=List_of_Repository_Software

Portugal for Palestine

Scientific Open Access Repository of Portugal (RCAAP)
- Objectives
  - Increase the visibility, accessibility and dissemination of Portuguese research results
  - Facilitate access to information about Portuguese scientific output
  - Integrate Portugal in the wide range of international initiatives in this domain
- https://www.rcaap.pt

RCAAP Portal
- Meta-repository
  - Aggregates metadata from Portuguese and Brazilian repositories
  - Actual data remains inside these repositories
- Currently:
  - 1.5 million documents
  - 126 repositories
- Runs on DSpace

RCAAP repository types
- Local Repository
  - installation, configuration and operation of a repository with own facilities and infrastructure
- Hosting Service for Institutional Repositories (SARI)
  - Software as a Service
  - Centrally hosted: hardware, hosting, connectivity, foundation systems, applications, security, backup service, monitoring
  - Institution administers, defines policies, gets customization
- Common Repository
  - Shared with others
  - Intended for institutions whose scientific production does not justify the creation of a repository
  - Also serves as an incubator for repositories

Conclusion

Conclusion
- Analysed functions
- Test systems on your own
- You may need more than one system
- You may decide on the scope based on domain, institution, etc.
- Depends on what skills you have and how much money you have
  - Fedora developed on your own vs paid service
