WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson #WCSJ2015 This presentation DOI:10.6084/m9.figshare.1439750.

Presentation transcript:

WHAT ARE WE GOING TO DO WITH DATA? Rob L Davidson #WCSJ2015

Up ahead
– The need for Open Data in science
– GigaScience and GigaDB
– Everything is data
– Open is accessible
– Literate programming
– So, what are we going to do with data?

THE NEED FOR OPEN DATA IN SCIENCE

Researcher bias
– Positive result bias: 20 teams do studies, 1 publishes p<0.05
– Poorly explained analyses
DOI: /journal.pmed
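
The "20 teams, 1 publishes" point can be checked with a little arithmetic. The sketch below is an illustration and not part of the talk. It assumes all 20 studies test an effect that does not exist, that the studies are independent, and that each uses the usual 0.05 threshold. The function names are made up for this example.

```python
import random


def chance_of_false_positive(teams: int = 20, alpha: float = 0.05) -> float:
    """Probability that at least one of `teams` independent studies of a
    non-existent effect still reaches p < alpha."""
    return 1.0 - (1.0 - alpha) ** teams


def simulate(teams: int = 20, alpha: float = 0.05,
             trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check: under the null hypothesis a p-value is uniform
    on [0, 1], so each study is 'significant' with probability alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Did any of the teams get a publishable p-value by chance alone?
        if any(rng.random() < alpha for _ in range(teams)):
            hits += 1
    return hits / trials


analytic = chance_of_false_positive()
simulated = simulate()
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

Under these assumptions, the chance that at least one team gets a "significant" result is about 64%. If only that team publishes, the literature records an effect that is not there.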

Problem: Reproducibility
Out of 18 microarray papers, results from 10 could not be reproduced
DOI: /ng.295

Software?
“The good news is that I was able to find some code. I am just hoping that it is a stable working version of the code... I have lost some data... The bad news is that the code is not commented and/or clean. So, I cannot really guarantee that you will enjoy playing with it.”
– 613 papers tested
– 123 successful reproductions

DOI: /journal.pmed
% of research resources are wasted!
We must... favor... unbiased, transparent, collaborative research with greater standardization
Share data, protocols, materials, software, other tools

What, why Open Data?
Knowledge is open if anyone is
– free to access,
– use,
– modify,
– and share it
– subject, at most, to measures that preserve provenance and openness.

FAIR Data

GIGASCIENCE AND GIGADB

The publishing tradition

The publishing tradition
– Aimed at paper product
– Limited length
– Limited detail
– No supporting data
– No supporting code
– Poor images
– Limited figures

Anatomy of a traditional Publication
[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]

Anatomy of an Open Data Publication
[Diagram labels: Data, Idea, Study, Analysis, Answer, Metadata]

Multi-faceted publication
– Open-access journal
– Data Publishing Platform
– Data Analysis Platform
[Diagram labels: Data, Metadata, Methods, Analyses]

“Deconstructed” Journal, “Regular” Journal, “Conscientious” Online Journal

EVERYTHING IS DATA

Data is data
DOI: / X-3-7

Software is data
“For loading data from the provided datasets, a script that can load individual spectra or images is provided”
DOI:

Metadata is data
Findable, reusable…
Bioontologies/ISA-Tab – Standard language
ORCID – Unique, traceable authors
FundRef – Track funding outputs
APIs – Easy search

ACCESSIBLE, USABLE DATA

Curation
– Not all science data is pretty
– ISA-Tab, SRA help
– Peer-reviewed data is better data

Software pipelines
Gigagalaxy.net
[Screenshot labels: Tool List, Tool Parameters, History/results]

Visualise pipelines
Gigagalaxy.net

Reproducing results?
SOAPdenovo2 S. aureus pipeline
DOI: / X-1-18

Easy installation
Virtual machine
– Pre-installed
– Peer-reviewed
– Reproducibility, frozen in time
DOI: / X-3-23

Literate programming
Data journalism for all!
KnitR, iPython, project Jupyter
DOI: / X-3-3

WHAT ARE WE GOING TO DO WITH DATA?

Add value

Do science?
Data – DOI: /
Subsequent analysis – DOI: /scitranslmed
Science journalism – Why not do part 2 as well?

Summary
– Science has problems – so how good can science journalism be?
– Things are changing – slowly
– The future is bright
– The future is data-driven
– Data journalists will be the new scientists?

THANKS!
GigaScience team: Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao, Rob Davidson