1
Research Output Repositories
Tomasz Miksa, TU Wien
2
Agenda
- What is a repository system?
- How to compare repository systems?
- What systems are out there?
- What can we learn from Portugal?
- (How to introduce a repository system?)
This presentation is NOT about recommending you a specific system
3
What is a repository system?
4
Introduction
Expectations evolved over time: from digitization to preservation of e-Science experiments
5
Repositories manage, share, provide access to, and archive researchers’ datasets
- specialised: aggregating disciplinary data
- general: collecting over larger knowledge areas such as social sciences
A repository allows examination, proof, review, transparency, and validation of a researcher’s results
7
https://phaidra.univie.ac.at/view/o:423816
12
Repositories scope
- Specialised
  - disciplinary data, e.g. DNA sequencing
- General
  - covering large knowledge areas, e.g. social sciences
- Aggregate experts’ data
  - globally
  - locally: university, country
13
Architecture
No different from any other system that has a frontend and a backend.
15
Architecture
Conceptually like any web system, consisting of
- frontend
- backend
Example: online shopping
16
How to compare repository systems?
I am going to go through a list of things to consider.
17
Infrastructure
- Locally hosted solution
  - own ICT infrastructure required
  - IT staff required: developers, system administrators
- Externally hosted solution
  - outsourcing of infrastructure
  - lack of control over where the data is
  - can the external party be trusted?
- Open source or proprietary
  - who owns the code?
  - is it allowed to introduce changes?
- Community support
  - number of similar instances
  - forum and mailing lists
  - professional support
18
Front-end Design
- Out of the box or development needed?
  - Fedora Commons is just a backend
- Customisable?
  - branding
- Multi-lingual support?
- Mobile-optimized design?
19
Content Organization
- Aggregations and Collections
  - proceedings, department outputs, etc.
  - help in navigating the repository
  - faceted search
- Metadata
  - what are the default standards?
  - how easy is it to add another standard?
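To make the faceted search bullet concrete, here is a minimal sketch of how facets can be derived from collection metadata. The records and field names (`type`, `department`, `year`) are invented for illustration and do not follow any particular metadata standard or repository product.

```python
from collections import Counter

# Hypothetical metadata records; field names are illustrative only.
RECORDS = [
    {"title": "Survey A", "type": "dataset", "department": "Sociology", "year": 2016},
    {"title": "Paper B", "type": "article", "department": "Sociology", "year": 2017},
    {"title": "Proceedings C", "type": "proceedings", "department": "Physics", "year": 2017},
    {"title": "Data D", "type": "dataset", "department": "Physics", "year": 2017},
]


def facet_counts(records, field):
    """Count how many records fall under each value of one facet field."""
    return Counter(record[field] for record in records)


def filter_by(records, **selected):
    """Keep only records matching every selected facet value."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in selected.items())
    ]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "type"))
    print(filter_by(RECORDS, type="dataset", year=2017))
```

A real system would normally delegate this to its search index rather than scan records in memory; the point is only that each facet is a count over one metadata field.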
20
Content Organization & Multimedia
- Single object may have many representations
- Content presentation
  - PDF viewer
  - Video streaming
  - Image previews
  - Audio playback
  - Slideshows
21
Content Discovery
- Inside of the repository
  - advanced search
  - full text indexing
  - graphical navigation
  - geolocation
- Outside of the repository
  - OAI-PMH
  - Search engine optimization
  - Google Scholar indexing
  - DOI and persistent identifiers
- Social features and notifications
  - Share, Bookmark, Comment, RSS
22
Publication tools
- Customisable submit forms
  - specify required information for a submission: metadata, license, etc.
- Publishing workflow
  - roles: editor, reviewer
  - notifications
- Batch processing
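The two ideas on this slide, required submission fields and a role-based publishing workflow, can be sketched in a few lines. Everything here is an assumption made for illustration: the required fields, the state names and the allowed transitions are not taken from any specific repository system.

```python
# Assumed required fields for a submit form (illustrative only).
REQUIRED_FIELDS = ("title", "creator", "license")

# Assumed workflow: (current state, role, action) -> next state.
TRANSITIONS = {
    ("submitted", "reviewer", "approve"): "reviewed",
    ("submitted", "reviewer", "reject"): "rejected",
    ("reviewed", "editor", "publish"): "published",
    ("reviewed", "editor", "reject"): "rejected",
}


def missing_fields(submission):
    """Return the required fields that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(submission.get(name, "")).strip()
    ]


def advance(state, role, action):
    """Move a submission to its next state, or fail if the role may not do so."""
    try:
        return TRANSITIONS[(state, role, action)]
    except KeyError:
        raise ValueError(f"{role!r} cannot {action!r} a {state!r} submission") from None


if __name__ == "__main__":
    print(missing_fields({"title": "Survey A", "creator": "Doe, Jane"}))
    print(advance(advance("submitted", "reviewer", "approve"), "editor", "publish"))
```

When comparing systems, the question is whether tables like `REQUIRED_FIELDS` and `TRANSITIONS` can be changed through configuration or only by modifying the software.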
23
Access Control & Authentication
- Access control
  - IP ranges, user accounts, Access Control Lists
  - Embargo periods
- Authentication
  - Not an issue for open access
  - Possible integrations to be considered: LDAP, CAS, system accounts, Shibboleth
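The access-control options listed here (IP ranges, Access Control Lists, embargo periods) combine into a single decision per download request. The sketch below shows one possible ordering of the checks. The policy itself is an assumption (during an embargo only ACL members get access), the item fields are hypothetical, and the network is a documentation-only address range.

```python
from datetime import date
from ipaddress import ip_address, ip_network

# Assumed campus network; 192.0.2.0/24 is reserved for documentation.
CAMPUS_NETWORKS = [ip_network("192.0.2.0/24")]


def can_download(item, client_ip, user=None, today=None):
    """Decide access from embargo date, open-access flag, ACL and IP range."""
    today = today or date.today()
    acl = item.get("acl", set())
    embargo_until = item.get("embargo_until")

    # Assumed policy: while embargoed, only users on the ACL get access.
    if embargo_until is not None and today < embargo_until:
        return user is not None and user in acl

    if item.get("open_access", False):
        return True
    if user is not None and user in acl:
        return True
    # Fall back to on-campus access by IP range.
    return any(ip_address(client_ip) in network for network in CAMPUS_NETWORKS)


if __name__ == "__main__":
    thesis = {"embargo_until": date(2018, 1, 1), "acl": {"alice"}, "open_access": True}
    print(can_download(thesis, "198.51.100.7", today=date(2017, 6, 1)))
    print(can_download(thesis, "198.51.100.7", today=date(2018, 1, 2)))
```

Authentication (LDAP, CAS, Shibboleth) would sit in front of this function and supply the `user` value; it is left out of the sketch on purpose.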
24
Interoperability
- OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
  - query to discover repository contents
  - only for metadata
  - not for depositing
- SWORD (Simple Web-service Offering Repository Deposit)
  - deposit to multiple repositories at once
  - deposit by third party systems (e.g. lab equipment)
- Export to Mendeley, DataCite, RefWorks, BibTeX, etc.
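As a rough illustration of what "query to discover repository contents, only for metadata" means in practice, the sketch below builds an OAI-PMH `ListRecords` request and parses a response using only the Python standard library. The endpoint URL and the sample response are made up; the verb, the `metadataPrefix` and `set` parameters, and the XML namespaces are those defined by OAI-PMH 2.0 and Dublin Core. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample of what a harvester might receive (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
        <datestamp>2017-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier and title from each harvested record."""
    root = ET.fromstring(xml_text)
    return [
        {
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
        }
        for record in root.iterfind(".//oai:record", NS)
    ]


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(parse_records(SAMPLE_RESPONSE))
```

Note that the response carries descriptive metadata only; the files stay in the repository, and depositing would go through a different interface such as SWORD.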
25
Reporting
- Needed for feedback and building the case
- Download reports
- Active users
- Google Analytics integration
26
Preservation
- Back-ups
  - file system backup
  - import / export functionality
- LOCKSS compatibility
  - Lots of Copies Keep Stuff Safe
  - peer to peer network
- Preservation tools
  - format migration tools
  - risk management tools
  - preservation specific metadata collection: PREMIS, METS
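The idea behind "Lots of Copies Keep Stuff Safe" can be illustrated with a simple fixity check: hash every copy of a file and flag the copies that disagree with the majority. This is a simplified sketch of the principle only, not the LOCKSS polling protocol; the site names and contents are invented.

```python
import hashlib
from collections import Counter


def sha256_hex(data):
    """Fixity value of one copy."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies):
    """Return the sites whose copy differs from the most common digest.

    `copies` maps a site name to the bytes stored there.
    """
    digests = {site: sha256_hex(content) for site, content in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, digest in digests.items() if digest != majority_digest)


if __name__ == "__main__":
    copies = {
        "site-a": b"original bytes",
        "site-b": b"original bytes",
        "site-c": b"original bytez",  # simulated bit rot
    }
    print(find_damaged_copies(copies))
```

A damaged copy found this way would then be repaired from one of the agreeing sites, which is why several independent copies matter more than a single backup.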
27
- A comparison of research data management platforms: architecture, flexible metadata and interoperability
- Institutional Repository Software Comparison
- Open Source Software for Digital Preservation Repositories: a Survey
- Research Data Repositories: Review of current features, gap analysis, and recommendations for minimum requirements
- Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra
28
Common survey and report shortcomings
- Simplifications
  - ‘customisable’ metadata: adding an XML template vs reprogramming the software
- Not up to date
  - systems evolve fast
- Superficial
  - deployment mode: sometimes both options exist (Zenodo as a service or open source on GitHub)
  - community ‘support’: check how many posts there are and how fast people got answers
- Currently there is no survey that compares them all
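The advice to check how many posts there are and how fast people got answers can be turned into a small measurement. The sketch below assumes the thread timestamps have already been collected by hand or exported from a forum; the data is invented and no real mailing-list API is used.

```python
from datetime import datetime
from statistics import median

# Hypothetical threads: (time asked, time of first reply or None if unanswered).
THREADS = [
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 11, 0)),
    (datetime(2017, 3, 2, 14, 0), datetime(2017, 3, 4, 14, 0)),
    (datetime(2017, 3, 5, 8, 0), None),
    (datetime(2017, 3, 6, 10, 0), datetime(2017, 3, 6, 16, 0)),
]


def support_summary(threads):
    """Summarise community responsiveness from question and reply times."""
    waits_in_hours = [
        (reply - asked).total_seconds() / 3600
        for asked, reply in threads
        if reply is not None
    ]
    return {
        "threads": len(threads),
        "answered": len(waits_in_hours),
        "median_hours_to_first_reply": median(waits_in_hours) if waits_in_hours else None,
    }


if __name__ == "__main__":
    print(support_summary(THREADS))
```

The median is used rather than the mean so that a single very old unanswered-then-revived thread does not dominate the picture.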
29
Best way is to get your hands dirty
- Sort out the basics
  - What is the purpose of the repository?
  - Do I need more than one repository?
  - Which functionality is a must and which is nice to have?
  - How should the system integrate with the existing infrastructure?
- Browse through
  - official websites
  - wiki pages
  - GitHub issues
  - mailing lists
- Contact people who already have an instance
- Make test deployments of a few systems
  - install
  - configure
  - populate with sample data
  - evaluate
- Be agile!
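The "must vs nice to have" question lends itself to a small decision matrix filled in after each test deployment. The sketch below is one way to do that. The feature names and the two candidate systems are placeholders, not an assessment of any real product.

```python
# Assumed requirements; replace with the outcome of your own analysis.
MUST_HAVE = {"oai-pmh", "embargo", "ldap"}
NICE_TO_HAVE = {"data viewers", "versioning", "multi-lingual ui"}

# Placeholder results of test deployments (not real products).
CANDIDATES = {
    "System A": {"oai-pmh", "embargo", "ldap", "versioning"},
    "System B": {"oai-pmh", "embargo", "data viewers", "versioning", "multi-lingual ui"},
}


def evaluate(features, must_have=MUST_HAVE, nice_to_have=NICE_TO_HAVE):
    """A system is eligible only if no must-have feature is missing."""
    missing = sorted(must_have - features)
    return {
        "eligible": not missing,
        "missing": missing,
        "nice_score": len(nice_to_have & features),
    }


def shortlist(candidates):
    """Eligible systems, ordered by how many nice-to-have features they offer."""
    results = {name: evaluate(features) for name, features in candidates.items()}
    eligible = [name for name, result in results.items() if result["eligible"]]
    return sorted(eligible, key=lambda name: -results[name]["nice_score"])


if __name__ == "__main__":
    for name, features in CANDIDATES.items():
        print(name, evaluate(features))
    print(shortlist(CANDIDATES))
```

Treating must-haves as a hard filter rather than a weight keeps a system with many attractive extras from outranking one that actually meets the requirements.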
30
What repository systems are out there?
31
Fedora based
NOT a Linux OS distribution!
- Fedora Commons provides a backend only
  - Content model
  - RDF linked data
  - Persistent identifier
  - Versioning
- Requires further development

Fedora provides an environment for low-level management of digital objects. In the Fedora model, an object consists of a persistent identifier, an XML document for object properties, and a number of datastreams each holding a content file or metadata record (the object can have more than one of each). Each object can assert relationships to any number of other objects through statements in RDF Linked Open Data format. Fedora does not, however, provide any high-level management functions for ingest, workflow management, preservation, search, or user authorisation. Rather, it is designed for developers to build support for these, using software layered on top of the Fedora environment.
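The object model described in the notes (persistent identifier, properties, several datastreams, RDF-style relationships) can be pictured as a small data structure. This is a conceptual sketch only: the class names, the `demo:` identifiers and the `isMemberOf` predicate are illustrative and are not Fedora's actual API or vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One content file or metadata record attached to an object."""
    ds_id: str
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    """An object: persistent identifier, properties, datastreams, relationships."""
    pid: str
    properties: dict = field(default_factory=dict)
    datastreams: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (subject, predicate, object)

    def add_datastream(self, datastream):
        self.datastreams[datastream.ds_id] = datastream

    def relate(self, predicate, target_pid):
        """Record a subject-predicate-object statement about this object."""
        self.relationships.append((self.pid, predicate, target_pid))


if __name__ == "__main__":
    obj = DigitalObject(pid="demo:1", properties={"label": "Field notes"})
    obj.add_datastream(Datastream("DC", "text/xml", b"<dc/>"))
    obj.add_datastream(Datastream("SCAN", "image/tiff", b"..."))
    obj.relate("isMemberOf", "demo:collection-7")
    print(sorted(obj.datastreams), obj.relationships)
```

Everything a user would recognise as a repository (ingest forms, search, access rules) has to be built on top of such objects, which is the "requires further development" point.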
32
Fedora based
- Hydra -> Samvera
  - Ruby on Rails frontend
  - Apache Solr for indexing
- Islandora
  - Drupal frontend
  - Virtual machines available for download
  - Integrates well with Archivematica
- Phaidra
  - Perl + Catalyst
  - Each installation becomes a partner of a consortium
33
Archivematica
Digital preservation system
- Automates the process of preparing digital objects for ingest into a repository, for example:
  - Scan for viruses
  - Generate METS
  - Generate DIP (e.g. migrate)
- Integrates with repository systems
  - For dissemination, e.g. Islandora, DSpace

Open source digital preservation system that automates the process of preparing digital objects for ingest into a repository. It supports ingest of prepared information packages into archival storage, and their access for dissemination via a repository or content management system. Supports repository integration, e.g. with DSpace, Islandora. Also supports archival storage integration, e.g. with Arkivum, DuraSpace.
34
Archivematica
35
Archivematica
36
DSpace
- Open source
- Popular
  - scholarly publication repositories
  - some data repositories: DataShare or Dryad
- Large open source community, variety of support and consultancy providers
- Aims for ‘turnkey’ local installation, but can be complex to set up and maintain if customisation is required
- Repository, Catalogue, Preservation

Repository system providing ingest, access, data management and preservation features. DSpace structures the ingested content into collections, within a hierarchy of ‘communities’. This is intended to reflect typical research organisation structures. Collections contain items, which in turn comprise one or more files.
Pros: Support for wide range of content types, access controls, interoperability standards, virtualised storage. Preservation functions. Submission workflows customisable per collection. Customisable search. Wide take-up for scholarly publication repositories, and some data repositories including Edinburgh DataShare, and the Dryad open data repository. Large well established open source community, variety of support and consultancy providers.
Cons: Aims for ‘turnkey’ local installation, but can be complex to set up and maintain if customisation is required. Limited version control. Few data viewers available.
Further information: from provider DuraSpace. Also list of registered service providers
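The hierarchy described in the notes (communities containing collections, collections containing items, items comprising files) can be sketched as nested records. The class and field names below are illustrative and do not reflect DSpace's internal data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    files: list = field(default_factory=list)  # file names only, for illustration


@dataclass
class Collection:
    name: str
    items: list = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: list = field(default_factory=list)
    subcommunities: list = field(default_factory=list)


def count_files(community):
    """Count files in a community, including all nested sub-communities."""
    own = sum(
        len(item.files)
        for collection in community.collections
        for item in collection.items
    )
    return own + sum(count_files(sub) for sub in community.subcommunities)


if __name__ == "__main__":
    theses = Collection("Theses", [Item("PhD thesis", ["thesis.pdf", "data.csv"])])
    reports = Collection("Reports", [Item("Annual report", ["report.pdf"])])
    institute = Community("Institute X", [theses])
    faculty = Community("Faculty of Y", [reports], [institute])
    print(count_files(faculty))
```

Mapping faculties and institutes to communities is what lets the structure mirror a research organisation, and it is also where per-collection submission workflows attach.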
37
CKAN - Comprehensive Knowledge Archive Network
- Repository for datasets
  - Stores data
  - Or holds metadata for datasets hosted externally
- Pros:
  - faceted search
  - unrestricted metadata
  - data viewers
  - customisable
  - well-established open source community
  - wide take-up in government sector
- Cons:
  - Lacks support for OAI-PMH

CKAN (Comprehensive Knowledge Archive Network) is an open source data portal platform, that is, software for building a catalogue and repository for datasets. The system can store datasets, or simply hold metadata for datasets hosted externally.
Pros: Strong support for discovery services such as faceted search, unrestricted metadata. Data viewers. Versioning. Extensively customisable. Well-established open source community, wide take-up in government sector including data.gov.uk.
Cons: Local installation is complex. Platform has relatively limited take-up for research data catalogues. Lacks support for OAI-PMH standard protocol for harvesting metadata.
38
Zenodo
- Runs on the open source Invenio platform
- Developed by CERN
- Enables ‘upload’ of any file: data, publication or code
- Enables compliance for European (H2020) projects
- Integration with Dropbox and GitHub
- Source code available on GitHub
- Customisation requires software development
- No access or download statistics
- No data viewers

Repository service launched within the OpenAIREplus project as part of a Europe-wide research infrastructure, and run on the open source Invenio platform. Enables ingest of any file or ‘upload’ whether data, publication or code into a ‘community’ or workspace claimed by a registered user. Multiple files are organised into named collections.
Pros: Free to use. Straightforward linking of research outputs and related information. Enables compliance and offers basic research analysis and reporting via OpenAIRE for European (H2020) projects. Flexible licensing. Open metadata standards. Access controls. Integration with Dropbox.
Cons: Institutional customisation requires software development. Provides no access or download statistics. Metadata inflexible. No versioning. No data viewers. Basic succession plan.
39
Dataverse
- Repository system
- Own instance -> becomes part of community
  - 22 installations, e.g. Harvard, DANS (NL)
- Supports citation, versioning, specific disciplinary metadata standards
- Easy to install
  - Vagrant (virtual machine)
- Popular in social science domain

Enables organisations to host a storage and access system for research materials. The software creates self-contained ‘dataverses’, each of which is designed to support individual researchers or research groups.
Pros: Support for citation. Access controls. Versioning. Support for specific disciplinary metadata standards. Data viewers. Basic preservation functions. Designed to be easy to install.
Cons: Currently low take-up outside quantitative social science domains. Limited take-up in UK.
40
EPrints
- Available since 2000
- Wide take-up for scholarly publication repositories
  - But also supports research data (ReCollect)
- Support for wide range of content types, metadata schema, interoperability standards
- EPrints Services
  - not-for-profit commercial services organisation
  - helps build your own repository
  - hosts for you

ReCollect is a module designed for research data repository purposes, resulting from Jisc projects at the University of Southampton and the UK Data Service at the University of Essex. ReCollect is currently a plug-in requiring a clean EPrints 3+ installation. EPrints 4 offers the ReCollect functionality as an optional repository ‘layer’. ReCollect adapts the base EPrints platform data model to enable research data and documentation files to be organised as ‘collections’.
Pros: Support for wide range of content types, metadata schema, interoperability standards, virtualised storage (via plug-in). Customisable search. Wide take-up for scholarly publication repositories, and some data repositories including UKDS ReShare, University of Bath, and the London School of Hygiene and Tropical Medicine. EPrints has a large well-established open source community, and several support and consultancy providers.
Cons: Less extensible than DSpace. Lacks collaboration support. Few data viewers available.
41
Commercial solutions
- Figshare
  - Publicly available repository
  - Free of charge
  - Alternative to Zenodo
- Figshare for Institutions
  - Dedicated, partially customised instance hosted by Figshare
  - Loss of institutional control
  - Proprietary API instead of common standards
  - Uncertain succession plan
- Preservica
42
Repository registries
- Directory of Open Access Repositories – DOAR
  - Based on registrations
- Registry of Open Access Repositories – ROAR
  - Automatically harvested list based on OAI-PMH
- Projection of DOAR and ROAR onto Google Maps
44
List of repository software (not instances)
45
Portugal for Palestine
46
Scientific Open Access Repository of Portugal (RCAAP)
Objectives
- Increase the visibility, accessibility and dissemination of Portuguese research results
- Facilitate access to information about Portuguese scientific output
- Integrate Portugal in the wide range of international initiatives in this domain
47
RCAAP Portal
- Meta-repository
  - Aggregates metadata from Portuguese and Brazilian repositories
  - Actual data remains inside of these repositories
- Currently:
  - 1.5 million documents
  - 126 repositories
- Runs on DSpace
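The meta-repository idea, aggregating metadata while the actual data stays in the source repositories, can be sketched as a central index that stores only descriptions and links back. The repository names, identifiers and URLs below are invented, and this is not a description of how the RCAAP portal is implemented.

```python
def aggregate(harvests):
    """Merge harvested metadata into one index keyed by record identifier.

    `harvests` maps a repository name to the list of records harvested from it.
    Only metadata and a link back are stored; files stay at the source.
    """
    index = {}
    for repository, records in harvests.items():
        for record in records:
            index[record["identifier"]] = {
                "title": record["title"],
                "source_repository": repository,
                "landing_page": record["url"],
            }
    return index


if __name__ == "__main__":
    # Hypothetical harvest results, e.g. obtained via OAI-PMH.
    harvests = {
        "University A": [
            {"identifier": "oai:a.example.org:1", "title": "Thesis on X",
             "url": "https://a.example.org/handle/1"},
        ],
        "Institute B": [
            {"identifier": "oai:b.example.org:9", "title": "Dataset Y",
             "url": "https://b.example.org/handle/9"},
        ],
    }
    portal_index = aggregate(harvests)
    print(len(portal_index), portal_index["oai:b.example.org:9"]["landing_page"])
```

A search on the portal then answers from the index and sends the user to the landing page in the original repository for the file itself.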
48
RCAAP repository types
- Local Repository
  - installation, configuration and operation of a repository with own facilities and infrastructure
- Hosting Service for Institutional Repositories (SARI)
  - Software as a Service
  - Centrally hosted: hardware, hosting, connectivity, foundation systems, applications, security, backup service, monitoring
  - Institution administers, defines policies, gets customization
- Common Repository
  - Shared with others
  - Intended for institutions whose scientific production does not justify the creation of a repository
  - Also serves as an incubator for repositories
49
Conclusion
50
Conclusion
- Analysed functions
  - You may need more than one system
  - You may decide on the scope based on domain, institution, etc.
- Test systems on your own
  - Depends on what skills you have and how much money you have
  - Fedora developed on your own vs paid service