Prof. Dr. Mohan Raj Pradhan Ms. Parbati Pandey


Comparative Study of Data Repository Software with Reference to Harvesting Data in the Context of Library and Information Science
Prof. Dr. Mohan Raj Pradhan, Ms. Parbati Pandey

Introduction
Data is becoming more important to business decisions, which requires tools that can collect, store and help analyze it. A data repository, also known as a data library or data archive, is such a tool: it is common in scientific research but equally useful for managing business data. It is a large database infrastructure that collects, manages and stores data sets for analysis, sharing and reporting. A data set is a collection of data. Most commonly it corresponds to the contents of a single database table or a single statistical data matrix, where every column represents a particular variable and each row corresponds to a given member of the data set.
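The table view of a data set described above can be sketched in a few lines of Python. This is a minimal illustration only: the variable names and values in `SAMPLE_CSV` are hypothetical and do not come from the presentation.

```python
import csv
import io

# Hypothetical sample data: each column is a variable,
# each row is one member of the data set.
SAMPLE_CSV = """respondent_id,age,library_visits_per_month
1,34,4
2,27,9
3,45,1
"""


def load_dataset(text):
    """Parse CSV text into a list of rows, one dict per member of the data set."""
    return list(csv.DictReader(io.StringIO(text)))


def column(rows, variable):
    """Return every value recorded for one variable (one column of the table)."""
    return [row[variable] for row in rows]


rows = load_dataset(SAMPLE_CSV)
variables = list(rows[0].keys())
```

Here `rows` holds the members of the data set and `variables` the column names. Values stay as strings because CSV carries no type information, which is one reason repositories store descriptive metadata alongside the data files.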

Data repository: the what
IT infrastructure (cloud-based/online) set up to manage, share, access, maintain and archive datasets.
An application database specialized in storing the metadata of data files, datasets and databases.
Differs from a publication repository mainly in its ability to:
  Store metadata at different levels of a hierarchy.
  Store and ingest data files in various formats for long-term preservation.
DSpace is a publication/institutional repository, i.e. text-based.
http://www.infotoday.com/cilmag/apr16/Uzwyshyn--Research-Data-Repositories.shtml

Data Repository: The Why
Easy information discovery
Easy and efficient access
More contact and intensified impact
Persistent access through a persistent URL, i.e. data become citable through the assignment of a DOI (Digital Object Identifier)
Long-term storage and preservation
Allows unprecedented use, analysis and discovery through interoperability and interlinking with other repositories
Most major federal grant agencies (NIH, NSF, NEH, USDA) require data access as a mandatory part of the grant proposal/oversight process
Negative data are data that do not enable us to reject our null hypothesis. Such data are often difficult to publish because it is not possible to prove the null hypothesis. Every active research scientist has a large drawer where these data languish.
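The persistent-URL idea can be illustrated with a small helper that turns a DOI into a link through the public resolver at https://doi.org/. This is a sketch, not part of any repository platform: the function name `doi_to_url` and the list of accepted prefixes are assumptions made for illustration, and the sample DOI in the usage note belongs to a published journal article.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Prefixes that people commonly paste in front of a DOI (illustrative list).
_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(doi):
    """Turn a DOI such as '10.1000/xyz' into a persistent resolver URL."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10." and has a
    # prefix/suffix pair separated by a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" readable; percent-encode characters that are unsafe in a path.
    return DOI_RESOLVER + quote(doi, safe="/")
```

For example, `doi_to_url("10.1007/s10209-016-0475-y")` yields a link that keeps working even if the publisher later moves the landing page, which is what makes a data set with a DOI citable.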

Data Repository: The Why…
Collecting all data in one place
Statistics on downloads and citations

What makes Data Management Repositories useful?
Makes faculty, departmental and institutional research available
Allows publication of negative data

Research Data Repository Software Characteristics
Hosted locally or remotely on a server
Software contains collaborative options
Open-source or proprietary software
Wide variety of data types (Excel to SPSS to various discipline-specific formats)

Perceived Benefits of Data Repository
Can share publications and research data
Makes research data more widely available
Statistics available on downloads and citations of data
Saves the various versions of a dataset (data lifecycle)
Collects all data in one place

Research data management tools
A survey was done to identify currently implemented standards, requirements and features related to research data repositories. Based on this, five well-known platforms were chosen for this study: DSpace, CKAN, Zenodo, Figshare and Dataverse. These tools are evaluated according to a set of key aspects: architecture, metadata handling capabilities, interoperability, content dissemination, search features and community acceptance.

Architecture
[Comparison table; most per-platform cell values did not survive extraction.]
Platforms compared: DSpace, CKAN, Figshare, Zenodo, Dataverse
Features compared, with the values that survive:
Deployment: installation package; service
Storage location: local or remote; remote
Maintenance costs: infrastructure management; monthly fee; e-mail based, free of cost
Open source: √; ×
Platform customization
Community policies
Embargo period
Private storage
Content versioning
Pre-reserving DOI

Architecture…
Class: Metadata
Required fields: DSpace: title, date of issue. CKAN: title. Figshare: author, title, categories, description. Zenodo: type, DOI, author, title, description. Dataverse: title, author, description, contact email, subject and DOI.
Exporting schemas: DSpace: any pre-loaded schema. CKAN: ×. Figshare: DC. Zenodo: DC, MARCXML. Dataverse: XML.
Schema flexibility: flexible; fixed
Validation: √; √
Versioning
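The required-fields row of the table above can be turned into a simple pre-deposit check. The sketch below assumes that the flattened cells map onto the platforms in column order (DSpace, CKAN, Figshare, Zenodo, Dataverse); the lowercase key spellings and the function name `missing_fields` are illustrative choices, not names used by any of the platforms.

```python
# Required fields per platform, transcribed from the table above.
# The snake_case key spellings are an assumption made for this sketch.
REQUIRED_FIELDS = {
    "DSpace": ["title", "date_of_issue"],
    "CKAN": ["title"],
    "Figshare": ["author", "title", "categories", "description"],
    "Zenodo": ["type", "doi", "author", "title", "description"],
    "Dataverse": ["title", "author", "description",
                  "contact_email", "subject", "doi"],
}


def missing_fields(platform, record):
    """List the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS[platform]:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

A record carrying only a title would pass for CKAN, while `missing_fields("DSpace", {"title": "Survey data"})` reports `date_of_issue`, which shows how much the minimum description differs between platforms.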

Architecture…
[Comparison table; most per-platform cell values did not survive extraction.]
Class: Dissemination
Platforms compared: DSpace, CKAN, Figshare, Zenodo, Dataverse
API: √
OAI-PMH compliance: CKAN with ckanext-harvest installed
Faceted search

Architecture
Most of the software discussed above is open source and gives its users some flexibility. Speedy and simple deployment is a crucial part of any implementation. Open-source software can be installed in-house, whereas platforms such as Figshare and Zenodo are set up and operated by their developers. DSpace, Dataverse and CKAN give better control over the stored data because they are open source.

Architecture…
Hosted services such as Figshare and Zenodo are not viable platforms for researchers and institutions, as they have to rely on the developers. DSpace, CKAN and Dataverse permit customization, with improvements ranging from small interface modifications to the development of new data visualization plugins to satisfy the needs of their users, while Zenodo allows the parametrization of settings such as community-level policies. DSpace, Zenodo and Dataverse let users stipulate an embargo period, whereas CKAN and Figshare offer private storage to let researchers control the data publication mode.

Metadata
Zenodo and Figshare are able to export records that comply with established metadata schemas: Figshare exports Dublin Core, and Zenodo exports Dublin Core and MARCXML. DSpace goes further by exporting DIPs (Dissemination Information Packages) that include METS metadata records, thus enabling the ingestion of these packages into a long-term preservation workflow.

Metadata…
Although CKAN and Dataverse metadata records do not follow any standard schema, the platforms allow the inclusion of a dictionary of key-value pairs that can be used to record domain-specific metadata as a complement to generic metadata descriptions. Neither platform natively supports collaborative validation stages where curators and researchers enforce the correct data and metadata structure, but Zenodo allows users to create a highly curated area within communities, as highlighted in the "validation" feature. If the policy of a particular community specifies manual validation, every deposit has to be validated by the community curator. Tracking content changes is an important issue in data management. CKAN provides an audit trail for each deposited dataset by showing all changes made to it since its deposit.
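As a sketch of the key-value mechanism described above: CKAN's action API represents free-form metadata as an `extras` list of `{"key": ..., "value": ...}` objects on the dataset. The helper below only builds such a payload and sends nothing; the function name, dataset name and domain fields are hypothetical.

```python
import json


def build_ckan_dataset(name, title, domain_metadata):
    """Build a CKAN-style dataset dict with domain metadata stored as extras."""
    # Each domain-specific field becomes one {"key", "value"} pair.
    # Values are stored as strings; keys are sorted for a stable order.
    extras = [
        {"key": key, "value": str(value)}
        for key, value in sorted(domain_metadata.items())
    ]
    return {"name": name, "title": title, "extras": extras}


payload = build_ckan_dataset(
    "ocean-temps-2019",                      # hypothetical dataset name
    "Ocean temperatures 2019",               # hypothetical title
    {"sampling_depth_m": 120, "instrument": "CTD probe"},
)
body = json.dumps(payload, indent=2)         # JSON body for a create request
```

Because the keys are unconstrained, two depositors can describe the same property under different names, which is the trade-off of flexible, non-standard metadata.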

Interoperability and Dissemination
All of the evaluated platforms allow the development of external clients and tools, as they already provide their own APIs for exposing metadata records to the outside community, but there are some differences regarding standards compliance. Zenodo and DSpace natively comply with OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). This widely used protocol promotes interoperability between repositories while also streamlining data dissemination, and it is a valuable resource for harvesters that index the contents of a repository.
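To make the harvesting step concrete, the following sketch builds an OAI-PMH `ListRecords` request and extracts Dublin Core fields from a response. It uses only the standard library and performs no network access: the base URL and the embedded sample response are hypothetical, while the verb, the `metadataPrefix`, `from` and `set` arguments, and the XML namespaces are those defined by the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, set_spec=None):
    """Build a ListRecords request; from_date enables incremental harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
            "creators": [e.text for e in rec.findall(".//dc:creator", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=NS)
    # An empty token means the list is complete.
    return records, (token or None)


# Hypothetical response, shaped like an oai_dc ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123/45</identifier>
        <datestamp>2019-06-30</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample survey data set</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken></resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url("https://repository.example.org/oai/request",
                       from_date="2019-01-01")
records, token = parse_records(SAMPLE_RESPONSE)
```

A real harvester would fetch `url`, call `parse_records` on the body, and repeat the request with the returned resumption token until none is left. Because every compliant repository answers the same verbs, one harvester can index DSpace, Zenodo and other OAI-PMH sources alike.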

Advantages of DSpace
Can comply with domain-level metadata schemas
Is open source and has a wide supporting community
Has extensive, community-maintained documentation
Can be fully under the institution's control
Structured metadata representation
Compliant with OAI-PMH
Supports Dublin Core and MARCXML for metadata exporting

Advantages of CKAN
Is open source and widely supported by the developer community
Features extensive and comprehensive documentation
Allows deep customization of its features

Advantages of CKAN…
Can be fully under the institution's control
Supports unrestricted (non-standards-compliant) metadata
Has faceted search with fuzzy matching
Records dataset change logs and versioning information

Advantages of Figshare
Gives credit to authors through citations and references
Can export references to Mendeley, DataCite, RefWorks, EndNote, NLM and Reference Manager
Records statistics related to citations and shares
Does not require any maintenance

Advantages of Zenodo
Allows creating communities to validate submissions
Supports Dublin Core, MARC and MARCXML for metadata exporting
Can export references to BibTeX, DataCite, DC, EndNote, NLM and RefWorks
Complies with OAI-PMH for data dissemination
Does not require any maintenance
Includes metadata records in the searchable fields

Advantages of Dataverse
Is open source and widely supported by the developer community
Data citation is generated automatically
Multiple publishing workflows
Faceted search as well as tags can be used for searches

Advantages of Dataverse…
Predefined roles, plus custom roles that can be designed and assigned to users
Branding, metadata-based facets, sub-dataverses and featured dataverses
Re-formatting, summary statistics and analysis for tabular files, with TwoRavens integration

Advantages of Dataverse…
Mapping of geospatial files and integration with WorldMap
Restricted files, as well as the ability to request access to restricted files
Three levels of metadata: description/citation, domain-specific or custom fields, and file metadata

Advantages of Dataverse…
Search API, data deposit API, etc.
Notifications are shown to the user and also sent by e-mail for access requests, role assignments and data publication
CC0 waiver by default, terms of use that the user can customise, and download statistics
Can export references in EndNote XML, RIS or BibTeX format
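As an illustration of the Search API mentioned above, the sketch below builds a query URL for a Dataverse installation. The `/api/search` endpoint and its `q`, `type` and `per_page` parameters follow the Dataverse API guide as far as we know it; the host name and the helper name `dataverse_search_url` are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

# Object types that the Dataverse Search API can filter on.
SEARCH_TYPES = ("dataverse", "dataset", "file")


def dataverse_search_url(host, query, object_type="dataset", per_page=10):
    """Build a Search API URL for a Dataverse installation (no request is sent)."""
    if object_type not in SEARCH_TYPES:
        raise ValueError(f"type must be one of {SEARCH_TYPES}")
    params = {"q": query, "type": object_type, "per_page": per_page}
    return host.rstrip("/") + "/api/search?" + urlencode(params)


# Hypothetical installation; replace with a real Dataverse host.
search_url = dataverse_search_url("https://dataverse.example.edu/",
                                  "library science")
```

An external catalogue or discovery tool could fetch `search_url` and read the JSON reply, which is how the APIs listed above let third-party clients reuse repository content.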

Conclusion
The open-source licenses of Dataverse, CKAN and DSpace stand out because they allow the platforms to be updated and customized while keeping the core functionality intact. Live demos are available for Dataverse, CKAN, DSpace and Zenodo. CKAN is mainly used by governmental institutions to disclose their data. DSpace enables system administrators to parametrize additional metadata schemas that can be used to describe resources.

Conclusion…
DSpace is often compared with Dataverse and is used for storing scientific data. Zenodo and Figshare provide ways to reserve a permanent link and a DOI, even if the actual dataset is under embargo at the time of first citation. DSpace, Dataverse and CKAN can be installed on an institutional server instead of relying on external storage provided by contracted services.

Conclusion…
Dataverse repository software focuses mainly on social science data, and its tools for analysis and exploration work only with tabular data. Dataverse handles geospatial data with the help of WorldMap. Dataverse also has features such as the Guestbook template, which records the details of users who download the data.

Re-Mix
Harvesting
XML format
Notification through e-mail whenever an update is made.
LiveDVD-Koha, DSpace, VuFind, SubjectsPlus, and WordPress.
Plugin of VuFind.
