Knowledge and solutions for a changing world
Adventures in computational reproducible research for ribosomal based community profiling
Dave Beck

Background
Methane (CH4) is a greenhouse gas
–85x more potent than CO2
–Atmospheric [CH4] has increased 150% over 200 years

[Figure labels: Chicago; Minneapolis – St. Paul; Bakken Shale (CH4 flares)]

Background (continued)
Methane has been present on the planet since life began 3.6 billion years ago
–Something must have evolved to consume methane
–Evidence of this in the bacterial record from 2.73 billion years ago
Can we identify which modern-day bacteria consume methane?
Can they be engineered to consume more?

Strategy
Collect environmental samples that metabolize CH4
Enrich the communities for CH4 utilizers
Extract DNA from samples
Sequence the 16S region of each sample (454)
Extract, transform, load & clean
–39 samples with 100,000s of reads
Perform sequence clustering
Naïve Bayes taxonomy classification of sequences
Classical correspondence analysis of taxonomy abundance data
–Understand how patterns of species originate from their metabolic interactions to utilize CH4
Publish
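To make the classification step concrete, here is a minimal sketch of naive Bayes taxonomy assignment based on which k-mers a read contains. It is an illustration under stated assumptions, not the code used in the study: the word size K, the smoothing constants, the equal priors, the taxon labels and the toy sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict

K = 4  # assumed word size, chosen small for this toy example


def kmers(seq, k=K):
    """Return the set of distinct k-mers found in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references):
    """Count, per taxon, how many reference sequences contain each k-mer.

    references is a list of (taxon, sequence) pairs.
    """
    word_counts = defaultdict(Counter)
    n_seqs = Counter()
    for taxon, seq in references:
        n_seqs[taxon] += 1
        word_counts[taxon].update(kmers(seq))
    return word_counts, n_seqs


def classify(seq, word_counts, n_seqs):
    """Return the taxon with the highest summed log-probability.

    Each k-mer is treated as independent (the naive Bayes assumption),
    priors are assumed equal, and counts are smoothed so that unseen
    k-mers do not produce log(0).
    """
    best_taxon, best_score = None, -math.inf
    for taxon, total in n_seqs.items():
        score = sum(
            math.log((word_counts[taxon][word] + 0.5) / (total + 1))
            for word in kmers(seq)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Made-up reference sequences with hypothetical taxon labels.
REFERENCES = [
    ("taxon_A", "ACGTACGTACGTACGTACGT"),
    ("taxon_A", "ACGTACGTACGAACGTACGT"),
    ("taxon_B", "TTGGCCAATTGGCCAATTGG"),
    ("taxon_B", "TTGGCCAATTGGCCTATTGG"),
]

if __name__ == "__main__":
    counts, totals = train(REFERENCES)
    print(classify("ACGTACGTACGT", counts, totals))
```

A real classifier would train on a curated 16S reference set and report a confidence for each assignment; this sketch only shows the scoring idea.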

Methods section

Deposit raw data
Put the raw data into NCBI BioProject with metadata for the study
–Including sample metadata such as collection date, GPS coordinates and sequencing methodology / protocol
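One way to keep sample metadata consistent before deposit is to validate it in a script and write it out as a tab-separated table. The sketch below assumes a hypothetical set of column names covering the items listed above (collection date, GPS coordinates, sequencing method); these are not the repository's official template fields, and the example row is a placeholder rather than real study data.

```python
import csv
import io
from datetime import date

# Hypothetical column names for illustration; not an official submission template.
FIELDS = ["sample_id", "collection_date", "latitude", "longitude", "sequencing_method"]


def validate(row):
    """Return a list of problems found in one sample metadata row."""
    problems = []
    try:
        date.fromisoformat(row["collection_date"])
    except ValueError:
        problems.append("collection_date is not an ISO date (YYYY-MM-DD)")
    try:
        lat, lon = float(row["latitude"]), float(row["longitude"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append("GPS coordinates out of range")
    except ValueError:
        problems.append("GPS coordinates are not numeric")
    if not row["sequencing_method"].strip():
        problems.append("sequencing_method is empty")
    return problems


def to_tsv(rows):
    """Serialize rows as tab-separated text, refusing rows that fail validation."""
    for row in rows:
        problems = validate(row)
        if problems:
            raise ValueError(f"{row['sample_id']}: {'; '.join(problems)}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values only; not data from the study.
EXAMPLE_ROW = {
    "sample_id": "sample_01",
    "collection_date": "2000-01-01",
    "latitude": "0.0",
    "longitude": "0.0",
    "sequencing_method": "454",
}

if __name__ == "__main__":
    print(to_tsv([EXAMPLE_ROW]))
```

Keeping the table under version control alongside the code means the metadata that was deposited can be regenerated and checked later.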

Deposit source code
Transferred code from a local SVN repo to github.com
Added some documentation on pipeline requirements and basic usage
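Requirements documentation can be partly generated rather than written by hand. As a rough sketch, assuming the pipeline's Python dependencies are known by name, the function below records the interpreter version and the installed version of each package. The function name and the package list in the usage line are hypothetical; non-Python tools would still need to be documented separately.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return requirement-style lines for the interpreter and the named packages."""
    lines = [f"# python {platform.python_version()} on {sys.platform}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the report is always produced.
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    # Hypothetical package list; replace with the pipeline's real dependencies.
    print("\n".join(describe_environment(["numpy", "biopython"])))
```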

Publish (ISME Journal)

How did we do?
Version control
Replicable computations
Data & code provenance, sharing & archiving
–Data
–Code
Replicable environment
–Requirements documentation
–Virtual machine
Scoring: +, -, ?

How did we do?
Version control: +
–Transitioned from local SVN to Git after the paper was written

How did we do?
Version control: +
Replicable computations
–Used scripts for individual steps and to run the pipeline: +
–Final figures tweaked by hand: -
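The idea of scripting each step and then scripting the run itself can be sketched as a small driver that applies the steps in a fixed order and logs each one. The two step functions here are hypothetical stand-ins, not the pipeline's real cleaning or clustering code; the point is only that the whole analysis reruns from one entry point.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")


def clean_reads(reads):
    """Placeholder cleaning step: drop reads that contain ambiguous bases."""
    return [read for read in reads if "N" not in read]


def cluster_reads(reads):
    """Placeholder clustering step: group identical reads and count them."""
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return counts


# The order of this list is the order of the analysis.
STEPS = [clean_reads, cluster_reads]


def run_pipeline(data, steps=STEPS):
    """Feed the output of each step into the next, logging step names."""
    for step in steps:
        log.info("running %s", step.__name__)
        data = step(data)
    return data


if __name__ == "__main__":
    print(run_pipeline(["ACGT", "ACGT", "ANGT", "TTTT"]))
```

A figure-generation step could be appended to the same list so that the published figure also comes out of the scripted run.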

Generated figure

Final figure

How did we do?
Version control: +
Replicable computations: +/-
Data & code provenance, sharing & archiving
–Data: +
–Code: +
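For data provenance, one lightweight habit is to keep a checksum manifest next to the archived files so that anyone can confirm they have the same bytes. The following is a minimal sketch, assuming the data sit in a single directory; the function names and the manifest layout are illustrative choices, not something the talk prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_path):
    """Write one 'digest  relative/path' line per file under data_dir."""
    data_dir = Path(data_dir)
    manifest_path = Path(manifest_path)
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        if path.resolve() == manifest_path.resolve():
            continue  # do not checksum the manifest itself
        relative = path.relative_to(data_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}")
    manifest_path.write_text("\n".join(lines) + "\n")
    return lines
```

Committing the manifest with the code ties a given code version to the exact input files it was run on.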

How did we do?
Replicable environment
–Requirements documentation
–Virtual machine: Can't! The license of the usearch tool used by the pipeline forbids it

How did we do?
Version control: +
Replicable computations: +/-
Data & code provenance, sharing & archiving
–Data: +
–Code: +
Replicable environment
–Requirements documentation: +/-
–Virtual machine: -

Lessons
Use the same version control system from start to finish
Waiting until the paper is accepted means the code DOI has to go in at the proof stage
Scripting the final figures can be hard but is worth the effort