Improving Research Data Sharing and Reuse: Scientists and Repositories Michael Conlon, PhD Emeritus Faculty Member, University of Florida VIVO Project.

1 Improving Research Data Sharing and Reuse: Scientists and Repositories
Michael Conlon, PhD
Emeritus Faculty Member, University of Florida
VIVO Project Director, Duraspace
https://dx.doi.org/10.6084/m9.figshare.3144739.v1

2 Where does scientific data come from?

5 5. Harvesting

6 Get a world map showing temperature sensors

8 What are the scientific data processes?

9 Libraries, Institutes, Agencies, Corporations
Domain Scientist, Data Scientist, Other Scientist
Archivists preserve data: At Agencies, Libraries, Institutes, Universities, Corporations, Hospitals, … Some semantics, strong preservation, variable access policies and procedures
Scientists create data: Limited/local machine-readable semantics, irregular preservation, highly variable means of production
Scientists share data: Disclosure, discovery, data use agreements, some curation requirements, some formatting requirements
Scientists prepare for reuse: Access control, final format and semantic alignment
Scientists reuse data: Strong semantics, irregular preservation, variable processes, reproducible findings

10 What is sharing? Providing scientific data that others can use

11 Is sharing important?

12 Yes: Scientific argument -- linking
Science is reductionist by nature -- different groups working on different parts of a problem. By combining results across the parts, we learn things.

13 A Reuse Scenario
Find all faculty members whose genetic work is implicated in breast cancer
VIVO stores information about faculty and associates them with genes (via PubMed data).
DisGeNet, and others, associate genes with diseases.
The query resolves across VIVO and the data sources it links to.
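The reuse scenario above amounts to a join across two linked sources. The following is a minimal sketch of that idea only, using in-memory Python dictionaries rather than VIVO or DisGeNet themselves. The faculty names, the sample associations, and the function name `faculty_for_disease` are all illustrative assumptions, not data or APIs from either system.

```python
# Hypothetical stand-in for VIVO: faculty member -> genes they have published on
# (in the scenario, this link is derived from PubMed data).
faculty_genes = {
    "Faculty A": {"BRCA1", "TP53"},
    "Faculty B": {"CFTR"},
    "Faculty C": {"BRCA2"},
}

# Hypothetical stand-in for a gene-disease source: gene -> associated diseases.
gene_diseases = {
    "BRCA1": {"breast cancer", "ovarian cancer"},
    "BRCA2": {"breast cancer"},
    "TP53": {"breast cancer", "Li-Fraumeni syndrome"},
    "CFTR": {"cystic fibrosis"},
}


def faculty_for_disease(disease, faculty_genes, gene_diseases):
    """Return faculty whose genes are associated with the given disease."""
    # Step 1: genes linked to the disease in the gene-disease source.
    implicated = {g for g, ds in gene_diseases.items() if disease in ds}
    # Step 2: faculty linked to any of those genes in the faculty source.
    return sorted(
        name for name, genes in faculty_genes.items() if genes & implicated
    )


result = faculty_for_disease("breast cancer", faculty_genes, gene_diseases)
print(result)
```

The join only works because both sources use the same gene identifiers, which is the point of linking: without shared identifiers the second step has nothing to match on.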

14 Data Linking
Data linking continues to be a serious bottleneck for the expectations of increased productivity in the pharmaceutical and biotechnology domain.
“Linked Life Data” integrates common public datasets that describe the relationships between gene, protein, interaction, pathway, target, drug, disease and patient, and currently consists of more than 5 billion RDF statements. The dataset interconnects more than 20 complete data sources and helps to understand the “bigger picture” of a research problem by linking previously unrelated data from heterogeneous knowledge.
From the LarKC (Large Knowledge Collider), 2011, http://www.larkc.eu

15 Yes: Scientific argument -- pooling
By combining or pooling data from multiple experiments, we can perform a “mega-analysis,” increasing power to detect effects.
Pooling is not currently common. Sharing sufficient for pooling is rare.
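A small arithmetic sketch of why pooling increases power: the standard error of a mean shrinks with the square root of the sample size. The study sizes and standard deviation below are hypothetical, and the sketch assumes the studies measure the same quantity with a common standard deviation, which real pooled analyses would need to check.

```python
import math


def standard_error(sigma, n):
    """Standard error of a sample mean for standard deviation sigma and size n."""
    return sigma / math.sqrt(n)


# Hypothetical values: three comparable studies of 25 subjects each.
study_sizes = [25, 25, 25]
sigma = 10.0  # assumed common standard deviation

se_single = standard_error(sigma, study_sizes[0])
se_pooled = standard_error(sigma, sum(study_sizes))

print(f"single study SE: {se_single:.3f}")
print(f"pooled SE:       {se_pooled:.3f}")
```

A smaller standard error means a smaller effect can be distinguished from noise at the same significance level.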

16 Yes: Scientific argument -- negative findings
Negative findings are difficult to publish. This inherent bias in the literature could be counterbalanced by publishing _all_ data from well-conducted studies, regardless of their findings. This is a rationale behind the NCI Cancer Study Registry, and ClinicalTrials.gov, the US clinical trials registry.

17 Yes: Ethical argument
Without sharing of knowledge, there would be no advance in science
(But must the data be shared, or just the findings?)

18 Yes: Public funding argument
The taxpayers paid for the products of research. The data do not belong to the researcher; they belong to the sponsor.
(Quite an unpopular argument among scientists)

19 Why don’t scientists share their data?

20 “Competitive advantage”
“I’m not finished with the data yet. If I share now, my competitors may publish or submit grant proposals using my data”
“If you share and I don't, I have your data and my data and you don't”. (Prisoner’s Dilemma)

21 “Too difficult”
Includes “I'm not paid to share my data”
Sharing comes with costs -- creating metadata, formatting data, agreeing to access terms, data use agreements, licensing, and providing data.
These costs may not be explicitly covered by sponsors.
Scientists may feel that resources spent on data sharing are taking resources from data production.

22 “I don't know how to share my data”
In some cases, the processes for sharing data may be quite complex and unknown to the scientist.

23 “No reward for sharing”
Not on my vitae
Not in my promotion packet
My sponsor does not penalize me for not sharing my data

24 “My sponsor/employer won't let me share”
Depending on the nature of the research, the sponsor, and the employer, the scientist may be prevented from sharing.

25 “Shared data cannot be reused”
The context of the shared data is missing. Only the scientists who originally produced the data understand the data sufficiently well for meaningful reuse.

26 The Context of the Data

27 The paper contains some of the context
Perhaps we outgrew the paper -- the context became complex
Recent concerns regarding the reproducibility of science demonstrate the limitations of the paper for providing adequate context

28 The protocol contains some of the context
The protocol may not be complete
Protocols often use local conventions that may not provide context to others
May lack specificity required for reuse (versions of software used to produce data from instruments, InChI for chemical compounds)
Protocols are often not published
If published, they may not be machine-readable
If they are machine-readable, they may not be linked to the collected data
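As an illustration of the gaps listed above, a protocol could be captured as a small machine-readable record that names software versions, identifies compounds by InChI, and links to the data it produced. This is only a sketch: the field names, identifiers, software name, and version are hypothetical and do not follow any particular metadata standard. The InChI shown is the standard one for water.

```python
import json

# Hypothetical machine-readable protocol record; all field names are assumptions.
protocol_record = {
    "protocol_id": "example-protocol-001",  # hypothetical identifier
    "software": [
        # Version of the software used to produce data from the instrument.
        {"name": "instrument-export-tool", "version": "2.3.1"},
    ],
    "compounds": [
        # An InChI identifies the compound without relying on local naming.
        {"name": "water", "inchi": "InChI=1S/H2O/h1H2"},
    ],
    # Links from the protocol to the collected data it describes.
    "datasets": ["example-dataset-001"],
}


def missing_context(record):
    """List the context fields that are absent or empty in a protocol record."""
    required = ("protocol_id", "software", "compounds", "datasets")
    return [field for field in required if not record.get(field)]


serialized = json.dumps(protocol_record, indent=2, sort_keys=True)
print(serialized)
print("missing:", missing_context(protocol_record))
```

A check like `missing_context` could run when data are deposited, so that incomplete context is caught while the original scientists can still supply it.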

29 What should be done to improve scientific data sharing?

30 Libraries, Institutes, Agencies, Corporations
Domain Scientist, Data Scientist, Other Scientist
Archivists preserve data: At Agencies, Libraries, Institutes, Universities, Corporations, Hospitals, … Some semantics, strong preservation, variable access policies and procedures
Scientists create data: Limited/local machine-readable semantics, irregular preservation, highly variable means of production
Scientists share data: Disclosure, discovery, data use agreements, some curation requirements, some formatting requirements
Scientists prepare for reuse: Access control, final format and semantic alignment
Scientists reuse data: Strong semantics, irregular preservation, variable processes, reproducible findings

