Dataverse for citing and sharing research data. UbuntuNet-Connect 2018, 22-23 November, Zanzibar, Tanzania. Sonia Barbosa, Harvard University, USA; Obiajulu Odu, UiT The Arctic University of Norway
What is Dataverse? Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It makes data available to others and allows scholars to replicate others' work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility. Dataverse is developed at Harvard University's Institute for Quantitative Social Science (IQSS).
Some Dataverse Features: Data sharing and archiving with control and recognition for data producers; support for all file types; persistent data citations; persistent IDs: DOI (DataCite), HDL (Handle.net); data restrictions; customized branding.
Some Dataverse Features: Rich data support for certain file formats. Tabular data ingest supports SPSS (POR and SAV formats), Stata, R, Excel (XLSX only), and CSV (comma-separated values). Metadata extraction and subsetting: metadata extraction for Flexible Image Transport System (FITS) data; subsetting for social network data (GraphML).
Some Dataverse Features: Data management, standards, and archival best practices; data versioning; general and domain-specific metadata following metadata standards; traffic and download tracking with the Guestbook feature; permanent storage of data in preservation formats; geographically distributed preservation copies.
Some Dataverse Features: OAI-PMH: harvesting metadata (DC, DDI) from other Dataverse installations and from other OAI-DC compliant repositories; Shibboleth authentication; OAuth login: ORCID, GitHub, Google.
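As a rough illustration of metadata harvesting, the sketch below builds a standard OAI-PMH ListRecords request. The verb and the oai_dc metadata prefix come from the OAI-PMH protocol; the /oai path, the demo.dataverse.org base URL, and the function name list_records_url are assumptions made for the example, so check the target installation before relying on them.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords URL. No network call is made here."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional OAI set, used to harvest only part of a repository.
        params["set"] = set_spec
    # Assumption: the installation exposes its OAI server under /oai.
    return f"{base_url.rstrip('/')}/oai?{urlencode(params)}"


harvest_url = list_records_url("https://demo.dataverse.org")
print(harvest_url)
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token the server returns until all records have been retrieved.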
What is a Dataverse and a Dataset? Schematic diagram of a Dataverse and a Dataset in Dataverse 4.x. Dataverse: container for Datasets and/or Dataverses. Dataset: container for data, documentation, and code. Dataverses can contain other Dataverses.
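The containment relationship can be sketched as a small data structure. This is only an illustrative model, not the Dataverse software's internal schema; the class fields, the count_datasets helper, and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Container for data, documentation, and code (modelled as file names)."""
    title: str
    files: List[str] = field(default_factory=list)


@dataclass
class Dataverse:
    """Container for Datasets and/or other Dataverses."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    children: List["Dataverse"] = field(default_factory=list)

    def count_datasets(self) -> int:
        # Count datasets held here plus those in nested Dataverses.
        return len(self.datasets) + sum(c.count_datasets() for c in self.children)


# Hypothetical example: a university Dataverse containing a department Dataverse.
department = Dataverse("Department", datasets=[Dataset("Field notes")])
root = Dataverse(
    "University",
    datasets=[Dataset("Survey 2018", ["data.csv", "codebook.pdf", "analysis.R"])],
    children=[department],
)
total = root.count_datasets()
print(total)
```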
Simple Workflows
Data Citation Example
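As a hedged sketch of what such a citation looks like, the snippet below assembles a string from the elements a Dataverse data citation typically carries: authors, year, title, persistent identifier, repository name, and version. The function name and every value passed to it are hypothetical placeholders, and the exact punctuation of a real installation may differ.

```python
def format_citation(authors, year, title, doi, repository, version):
    """Assemble a data citation string from its parts (illustrative layout)."""
    author_part = "; ".join(authors)
    return (
        f'{author_part}, {year}, "{title}", '
        f"https://doi.org/{doi}, {repository}, V{version}"
    )


# All values below are placeholders, not a real dataset.
citation = format_citation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2018,
    title="Example Survey Data",
    doi="10.5072/FK2/EXAMPLE",
    repository="Demo Dataverse",
    version=1,
)
print(citation)
```

Because the DOI is part of the citation, the reference keeps resolving to the dataset even if the hosting URL changes.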
Some Organizational Models: Global: Harvard Dataverse. Consortium: the Texas Digital Library (TDL), a consortium of higher education institutions in Texas. National: DataverseNO; institutes pay a fixed fee for participation in DataverseNO, to which storage costs for the data are added, and every participating institute determines its own policy. Single institution, journals, courses, private archive.
Structure at Participating Institution: 1. The University Library supports and trains researchers in data curation and data management plans. 2. The researchers curate their data during the project and deposit it into Dataverse. 3. The University Library reviews the metadata and sees that a DOI is allocated and the dataset is in order. 4. Controlled storage at the IT department (Trusted Digital Repository). Persistent format in the archive. The University Library supports the researcher and provides back-office support.
Plan and Activities: Training and workshops; support services; outreach strategies; promotions; infrastructure development and needs.
Adopted by institutions worldwide
Developers, Integration, Interoperability: The Dataverse Development Community is an active group of internal and external contributors to the Dataverse software codebase. Dataverse APIs cover almost every piece of functionality available to users in the UI. You can integrate an existing application with Dataverse or use an API to interact with all the data in a Dataverse installation. The Data Explorer integration tool visualizes Dataverse DDI metadata.
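To make the API point concrete, here is a minimal sketch of querying the Dataverse Search API with only the Python standard library. The /api/search endpoint and its q, type, and per_page parameters are part of the documented API; the helper names, the demo.dataverse.org base URL, and the search term are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query, item_type="dataset", per_page=10):
    """Build a Search API URL for the given installation and query."""
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


example_url = search_url("https://demo.dataverse.org", "malaria", per_page=5)
print(example_url)

# To run the query against a live installation:
# results = fetch_json(example_url)
# for item in results["data"]["items"]:
#     print(item["name"])
```

Public searches like this need no credentials; operations that modify data, such as creating datasets, additionally require an API token.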
Some Incentives and Benefits: Dataverse provides incentives for researchers to share: recognition and credit via data citations; control over data and branding; fulfillment of Data Management Plan requirements. Benefits for researchers using Dataverse: safe and long-term data storage in preservation format; users can download your data in any format and run many advanced statistical methods online.
Some Major Future Plans: Sensitive data support through DataTags. Embargo: researchers will be able to set an availability schedule for their data. Preserve file hierarchy: researchers can preserve the directory structure of a dataset's files, for easy import, computation, and navigation. Make Data Count integration: Dataverse will integrate with Make Data Count and report standardized usage metrics. Global Dataverse Community Consortium.
Thank you / Asante! Any questions? Contact: Obiajulu Odu <obiajulu.odu@uit.no>, Sonia Barbosa <sbarbosa@g.harvard.edu>. Learn more: dataverse.org and the Forum. Try out Dataverse: demo.dataverse.org or test.dataverse.no
References: https://dataverse.org/ https://dataverse.no/ https://dataverse.harvard.edu/ https://ils.unc.edu/digccurr/curategear2015-talks/crabtree.pdf https://dataverse.org/files/dataverseorg/files/dlfdataverse_quigley.pdf Forum: https://groups.google.com/forum/#!forum/dataverse-community