Improving Research Data Sharing and Reuse: Scientists and Repositories Michael Conlon, PhD Emeritus Faculty Member, University of Florida VIVO Project.

Improving Research Data Sharing and Reuse: Scientists and Repositories
Michael Conlon, PhD
Emeritus Faculty Member, University of Florida
VIVO Project Director, Duraspace

Where does scientific data come from?

5. Harvesting

Get a world map showing temperature sensors

What are the scientific data processes?

Libraries, Institutes, Agencies, Corporations
Domain Scientist, Data Scientist, Other Scientist
Archivists preserve data: at Agencies, Libraries, Institutes, Universities, Corporations, Hospitals, …; some semantics, strong preservation, variable access policies and procedures
Scientists create data: limited/local machine-readable semantics, irregular preservation, highly variable means of production
Scientists share data: disclosure, discovery, data use agreements, some curation requirements, some formatting requirements
Scientists prepare for reuse: access control, final format and semantic alignment
Scientists reuse data: strong semantics, irregular preservation, variable processes, reproducible findings

What is sharing? Providing scientific data that others can use

Is sharing important?

Yes: Scientific argument -- linking
Science is reductionist by nature -- different groups working on different parts of a problem. By combining results across the parts, we learn things.

A Reuse Scenario
Find all faculty members whose genetic work is implicated in breast cancer.
VIVO stores information about faculty and associates them with genes (via PubMed data).
DisGeNet, and others, associate genes with diseases.
The query resolves across VIVO and the data sources it links to.
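The scenario above can be sketched as a two-step join. This is only an illustration: the dictionaries are toy stand-ins for the faculty-to-gene links in VIVO and the gene-to-disease links in a source such as DisGeNet, and every name, function, and record here is hypothetical rather than taken from either system.

```python
# Toy stand-ins for two linked sources; all entries are invented for illustration.
faculty_to_genes = {
    "Faculty A": {"BRCA1", "TP53"},
    "Faculty B": {"CFTR"},
    "Faculty C": {"BRCA2"},
}
gene_to_diseases = {
    "BRCA1": {"breast cancer", "ovarian cancer"},
    "BRCA2": {"breast cancer"},
    "CFTR": {"cystic fibrosis"},
    "TP53": {"Li-Fraumeni syndrome"},
}


def faculty_for_disease(disease, faculty_genes, gene_diseases):
    """Return sorted faculty names linked to the disease through a shared gene."""
    matches = []
    for person, genes in faculty_genes.items():
        # A person matches if any of their genes is associated with the disease.
        if any(disease in gene_diseases.get(gene, set()) for gene in genes):
            matches.append(person)
    return sorted(matches)


result = faculty_for_disease("breast cancer", faculty_to_genes, gene_to_diseases)
print(result)
```

The join only works because both sources use the same gene identifiers. In practice a query like this would more likely be expressed against linked data stores than in-memory dictionaries.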

Data Linking Data linking continues to be a serious bottleneck for the expectations of increased productivity in the pharmaceutical and biotechnology domain. “Linked Life Data” integrates common public datasets that describe the relationships between gene, protein, interaction, pathway, target, drug, disease and patient and currently consist of more than 5 billion RDF statements. The dataset interconnects more than 20 complete data sources and helps to understand the “bigger picture” of a research problem by linking previously unrelated data from heterogeneous knowledge. From the LarKC (Large Knowledge Collider), 2011,

Yes: Scientific argument -- pooling By combining or pooling data from multiple experiments, we can perform a “mega-analysis,” increasing power to detect effects. Pooling is not currently common. Sharing sufficient for pooling is rare.
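A rough sketch of why pooling increases power, under simplifying assumptions that are not from the slides: a two-sided two-sample z-test with known standard deviation, a true effect of 0.3 standard deviations, and five comparable studies of 50 subjects per group that can be combined without adjustment. The function name and all numbers are illustrative.

```python
from statistics import NormalDist


def power_two_sample(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with known sigma."""
    z = NormalDist()
    # Standard error of the difference between two group means.
    se = sigma * (2.0 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


single = power_two_sample(effect=0.3, sigma=1.0, n_per_group=50)
pooled = power_two_sample(effect=0.3, sigma=1.0, n_per_group=250)
print(f"single study: {single:.2f}, five studies pooled: {pooled:.2f}")
```

Under these assumptions a single study detects the effect roughly one time in three, while the pooled data detect it about nine times in ten. Real studies differ in protocol and population, so an actual mega-analysis would need to account for between-study differences.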

Yes: Scientific argument -- negative findings
Negative findings are difficult to publish. This inherent bias in the literature could be counterbalanced by publishing _all_ data from well-conducted studies, regardless of their findings. This is a rationale behind the NCI Cancer Study Registry and ClinicalTrials.gov, the US clinical trials registry.

Yes: Ethical argument
Without sharing of knowledge, there would be no advance in science. (But must the data be shared, or just the findings?)

Yes: Public funding argument
The taxpayers paid for the products of research. The data do not belong to the researcher; they belong to the sponsor. (Quite an unpopular argument among scientists.)

Why don’t scientists share their data?

“Competitive advantage”
“I’m not finished with the data yet. If I share now, my competitors may publish or submit grant proposals using my data.”
“If you share and I don't, I have your data and my data and you don't.” (Prisoner’s Dilemma)

“Too difficult”
Includes “I'm not paid to share my data.”
Sharing comes with costs -- creating metadata, formatting data, agreeing to access terms, data use agreements, licensing, and providing data.
These costs may not be explicitly covered by sponsors.
Scientists may feel that resources spent on data sharing are taking resources from data production.

“I don't know how to share my data”
In some cases, the processes for sharing data may be quite complex and unknown to the scientist.

“No reward for sharing”
Not on my vitae.
Not in my promotion packet.
My sponsor does not penalize me for not sharing my data.

“My sponsor/employer won't let me share”
Depending on the nature of the research, the sponsor, and the employer, the scientist may be prevented from sharing.

“Shared data cannot be reused”
The context of the shared data is missing. Only the scientists who originally produced the data understand the data sufficiently well for meaningful reuse.

The Context of the Data

The paper contains some of the context
Perhaps we outgrew the paper -- the context became complex.
Recent concerns regarding the reproducibility of science demonstrate the limitations of the paper for providing adequate context.

The protocol contains some of the context
The protocol may not be complete.
Protocols often use local conventions that may not provide context to others.
Protocols may lack the specificity required for reuse (versions of software used to produce data from instruments, InChI for chemical compounds).
Protocols are often not published.
If published, they may not be machine-readable.
If they are machine-readable, they may not be linked to the collected data.
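As an illustration of the last three points, a machine-readable protocol record might carry software versions, InChI identifiers, and explicit links to the datasets it produced. This is a minimal sketch: the ProtocolRecord class, its field names, and the identifiers "protocol-001", "dataset-042", and "instrument-export-tool" are all hypothetical and do not follow any particular metadata standard. The InChI string shown is the standard one for water.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProtocolRecord:
    """Hypothetical minimal protocol description; not a published schema."""
    protocol_id: str
    dataset_ids: list        # links from the protocol to the collected data
    software_versions: dict  # tool name -> version used to produce the data
    compounds: dict = field(default_factory=dict)  # common name -> InChI


record = ProtocolRecord(
    protocol_id="protocol-001",
    dataset_ids=["dataset-042"],
    software_versions={"instrument-export-tool": "2.3.1"},
    compounds={"water": "InChI=1S/H2O/h1H2"},
)

# Serialize to JSON so the context can travel with the data, then read it back.
serialized = json.dumps(asdict(record), indent=2, sort_keys=True)
restored = ProtocolRecord(**json.loads(serialized))
print(serialized)
```

The round trip through JSON is the point of the sketch: a record like this can be stored next to the data and parsed by someone other than the original scientist.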

What should be done to improve scientific data sharing?

Libraries, Institutes, Agencies, Corporations
Domain Scientist, Data Scientist, Other Scientist
Archivists preserve data: at Agencies, Libraries, Institutes, Universities, Corporations, Hospitals, …; some semantics, strong preservation, variable access policies and procedures
Scientists create data: limited/local machine-readable semantics, irregular preservation, highly variable means of production
Scientists share data: disclosure, discovery, data use agreements, some curation requirements, some formatting requirements
Scientists prepare for reuse: access control, final format and semantic alignment
Scientists reuse data: strong semantics, irregular preservation, variable processes, reproducible findings