Published by Eustace Chambers. Modified over 9 years ago.
1
WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson GigaScience @bobbledavidson #WCSJ2015 This presentation DOI:10.6084/m9.figshare.1439750
2
Up ahead: The need for Open Data in science; GigaScience and GigaDB; Everything is data; Open is accessible; Literate programming; So, what are we going to do with data?
3
THE NEED FOR OPEN DATA IN SCIENCE
4
Researcher bias; Positive result bias: 20 teams do studies, 1 publishes p<0.05; Poorly explained analyses. DOI: 10.1371/journal.pmed.0020124
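The "20 teams do studies, 1 publishes p<0.05" point can be checked with a little arithmetic. The sketch below assumes all 20 studies are independent, all test an effect that does not exist, and all use the 0.05 threshold from the slide. The function name is illustrative and does not come from the presentation.

```python
def prob_at_least_one_false_positive(n_studies: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null studies reaches p < alpha."""
    if n_studies < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("n_studies must be >= 0 and alpha must lie in [0, 1]")
    # A single null study stays above the threshold with probability (1 - alpha),
    # so all n of them stay above it with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_studies


if __name__ == "__main__":
    # 20 teams, as on the slide: prints 0.64
    print(f"{prob_at_least_one_false_positive(20):.2f}")
```

Under those assumptions there is roughly a 64% chance that at least one team gets a publishable "positive" result by chance alone, which is why reporting only that result is misleading.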
5
Problem: Reproducibility. Out of 18 microarray papers, results from 10 could not be reproduced. DOI: 10.1038/ng.295
6
Software? http://reproducibility.cs.arizona.edu/ “The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.” 613 papers tested; 123 successful reproductions.
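To put the reproducibility counts quoted on these slides on a common scale, they can be converted to percentages. This is a minimal sketch using only the numbers stated in the slides; the helper name is hypothetical.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("need 0 <= part <= whole and whole > 0")
    return round(100.0 * part / whole, 1)


# Microarray papers whose results could not be reproduced (10 of 18).
microarray_not_reproduced = percentage(10, 18)

# Software papers that were successfully reproduced (123 of 613).
software_reproduced = percentage(123, 613)

if __name__ == "__main__":
    print(microarray_not_reproduced, software_reproduced)  # 55.6 20.1
```

Note that the two figures measure opposite outcomes: about 56% of the microarray results failed to reproduce, while only about 20% of the software papers succeeded.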
7
DOI: 10.1371/journal.pmed.1001747. 85% of research resources are wasted! We must... favor... unbiased, transparent, collaborative research with greater standardization. Share data, protocols, materials, software, other tools.
8
What, why Open Data? Knowledge is open if anyone is free to access, use, modify, and share it, subject, at most, to measures that preserve provenance and openness. http://opendefinition.org/od/
9
FAIR Data http://datafairport.org/
10
GIGASCIENCE AND GIGADB
11
The publishing tradition: 1812, 1665, 1869
12
The publishing tradition: Aimed at paper product; Limited length; Limited detail; No supporting data; No supporting code; Poor images; Limited figures
13
Anatomy of a traditional publication (diagram labels: Data, Idea, Study, Analysis, Answer, Metadata)
14
Anatomy of an Open Data publication (diagram labels: Data, Idea, Study, Analysis, Answer, Metadata)
15
Multi-faceted publication: Open-access journal; Data Publishing Platform; Data Analysis Platform. Data, Metadata, Methods, Analyses
16
“Deconstructed” Journal; “Regular” Journal; “Conscientious” Online Journal
19
Image Source: http://commons.wikimedia.org/wiki/File:System-Mechanic-California.jpg “Deconstructed” Journal; “Regular” Journal; “Conscientious” Online Journal
20
EVERYTHING IS DATA
21
Data is data DOI:10.1186/2047-217X-3-7
22
Software is data “For loading data from the provided datasets, a script that can load individual spectra or images is provided” DOI:
23
Metadata is data: Findable, reusable… Bio-ontologies/ISA-Tab: standard language; ORCID: unique, traceable authors; FundRef: track funding outputs; APIs: easy search
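One reason identifiers such as ORCID iDs make authors traceable is that they are machine-checkable: the final character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm, as described in ORCID's public documentation. The sketch below implements that check; the function names are illustrative, and the sample iD is the fictitious-researcher example used in ORCID's own documentation, not one taken from this presentation.

```python
DIGITS = "0123456789"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and check character of an iD such as 0000-0002-1825-0097."""
    compact = orcid.strip().replace("-", "")
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15].upper()
    if any(ch not in DIGITS for ch in base):
        return False
    return orcid_check_character(base) == check


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
```

A check like this only catches typos and transcription errors; it does not confirm that the iD is registered to a real person.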
24
ACCESSIBLE, USABLE DATA
25
Curation: Not all science data is pretty; ISA-Tab and SRA help; Peer-reviewed data is better data. http://bit.ly/1F47YZz
26
Software pipelines: Gigagalaxy.net. Tool list; Tool parameters; History/results
27
Visualise pipelines: Gigagalaxy.net
28
Reproducing results? SOAPdenovo2 S. aureus pipeline DOI: 10.1186/2047-217X-1-18
29
Easy installation: Virtual machine: pre-installed; peer-reviewed; reproducibility, frozen in time. DOI:10.1186/2047-217X-3-23
30
Literate programming: Data journalism for all! knitr, IPython, Project Jupyter. DOI:10.1186/2047-217X-3-3
31
WHAT ARE WE GOING TO DO WITH DATA?
32
Add value http://bit.ly/1JyTfxO
33
Do science? Data: DOI: 10.5524/100034; Subsequent analysis: DOI: 10.1126/scitranslmed.3006086; Science journalism: http://bit.ly/1AXEkKJ. Why not do part 2 as well?
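Following the chain from data to analysis to journalism starts with turning the DOIs quoted on the slides into links. A DOI can be resolved by appending it to the public resolver at https://doi.org/. The helper below is a small sketch of that step; its name and its lenient handling of a leading "DOI:" label are assumptions made for illustration.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'DOI: 10.5524/100034' into a resolvable URL."""
    cleaned = doi.strip()
    # Slides often write the label inline, e.g. "DOI:10.1186/...".
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Every DOI starts with the "10." directory code and has a prefix/suffix slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode anything unsafe in a URL path, keeping the slash separator.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    for doi in ("DOI: 10.5524/100034", "10.1126/scitranslmed.3006086"):
        print(doi_to_url(doi))
```

The function only builds the link and makes no network request, so it cannot tell whether a DOI is actually registered.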
34
Summary: Science has problems, so how good can science journalism be? Things are changing, slowly. The future is bright. The future is data-driven. Data journalists will be the new scientists?
35
THANKS! GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson