A Comprehensive Workflow for Microbial Genome Sequencing From Swab to Publication Madison I. Dunitz 1, David A. Coil 1, Jenna M. Lang 1, Guillaume Jospin.

Madison I. Dunitz¹, David A. Coil¹, Jenna M. Lang¹, Guillaume Jospin¹, Aaron E. Darling², Jonathan A. Eisen¹
UC Davis Genome Center
¹University of California, Davis; ²ithree Institute, University of Sydney, Australia

The sequencing and de novo assembly of microbial genomes has already yielded enormous scientific insight, revolutionizing fields as diverse as epidemiology and ecology. Over the past two decades, advances in DNA sequencing technology have produced a wide variety of options for DNA library preparation, sequencing, and assembly. Each option has its own advantages and disadvantages in terms of complexity, expense, computing power, time, and experience required. We designed this workflow to democratize microbial genome sequencing and de novo assembly, making them accessible on a massive scale to low-funded labs and even classrooms.
[Figure: Workflow overview. Bench Work: swab, plate, dilution streak (×2), overnight culture, DNA extraction, 16S PCR. Sanger Sequence Processing: SeqTrace, Geneious, custom script. Identify/Choose Microbe: BLAST 16S, interpret the results, GOLD, align the 16S sequences using Align Sequences Nucleotide BLAST. Library Preparation and Sequencing: library preparation kit options, considerations in library preparation, multiplexing. Assembly: download and install A5, running A5. Verification of the Assembly: interpretation of A5 stats, verification of 16S sequence, PhyloSift. Annotation: options, RAST. Making a Phylogenetic Tree: obtain the full-length 16S sequence, create an RDP alignment, building the tree, viewing the tree. GenBank Submission: create a BioProject at NCBI, FASTA2AGP (custom script), create a .sbt template, tbl2asn, create a Whole Genome Shotgun submission, submitting raw reads.]

Introduction
The objective of the present study was to design, test, troubleshoot, and publish a comprehensive workflow that takes a researcher from a swab to a microbial genome publication, so that even a lab with limited resources and bioinformatics experience can carry it out.
Results
I. Bench Work
We assume a starting point of wanting to isolate an organism from a particular environment and needing to identify it. Users starting with a known organism should proceed to "Library Preparation and Sequencing". Here we cover the steps needed to take a sample through plating, dilution streaking, overnight growth, creating a glycerol stock, 16S PCR, and preparation for Sanger sequencing.
II. Sanger Sequence Processing
In this section we identify several software programs that allow the researcher to view and edit the sequence. We detail the advantages and disadvantages of particular programs and explain how to quality-trim the reads, reverse-complement and align them, and generate a consensus sequence. This is easiest to do visually, using a chromatogram that lets the user see the trace and process the sequences manually.
III. Identify/Choose Microbe
In a classroom or undergraduate research setting, the researchers may not have a particular bacterial species in mind. In this case the 16S Sanger sequencing results must be screened for possible genome project candidates. We recommend starting with the BLAST results, then continuing to the Genomes Online Database (GOLD) and a simple Google search. In many cases it will be handy to first build a phylogenetic tree to aid in identification, since the 16S results may not be …
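Section II leaves quality trimming to programs such as SeqTrace or Geneious. To show what such trimming does, here is a minimal Python sketch of one common approach, a modified Mott algorithm. It assumes the per-base Phred quality scores have already been exported from the chromatogram as a list of integers; the 0.05 error-probability cutoff is an assumed value, and this is not necessarily the method either program uses internally.

```python
from typing import List, Tuple


def mott_trim(quals: List[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Return (start, end) of the highest-scoring region; end is exclusive.

    Each base scores cutoff - p_error, where p_error = 10 ** (-q / 10) is the
    error probability implied by its Phred quality q. The contiguous run with
    the largest total score is kept (Kadane's maximum-subarray algorithm).
    (0, 0) means no base passes the cutoff.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        score = cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # A non-positive running total cannot help, so restart here.
            run_sum, run_start = score, i
        else:
            run_sum += score
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, quals: List[int], cutoff: float = 0.05) -> str:
    """Return the quality-trimmed portion of a Sanger read."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must be the same length")
    start, end = mott_trim(quals, cutoff)
    return seq[start:end]


if __name__ == "__main__":
    # Made-up qualities: poor at both ends, good in the middle.
    quals = [5, 8, 30, 35, 40, 38, 30, 10, 4]
    print(mott_trim(quals))               # (2, 7)
    print(trim_read("GAACGTACT", quals))  # ACGTA
```

With these made-up qualities the low-quality bases at both ends are discarded and positions 2 through 6 are kept. Raising the cutoff keeps more of the read; lowering it trims more aggressively.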
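Section II also describes reverse-complementing the reverse read and merging it with the forward read into a consensus. A minimal Python sketch of both steps follows. It assumes the two reads have already been aligned to equal length (for example, in Geneious), with '-' marking gaps; the rule of keeping the higher-quality base and writing 'N' for an unresolved disagreement is a simplifying assumption, not a prescribed method.

```python
from typing import List

# Complement table; gaps ('-') and ambiguous N map to themselves.
_COMPLEMENT = str.maketrans("ACGTNacgtn-", "TGCANtgcan-")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus(fwd: str, fwd_q: List[int], rev: str, rev_q: List[int]) -> str:
    """Merge a forward read with an already reverse-complemented reverse read.

    Assumes both reads were aligned beforehand to the same length, with '-'
    for gaps (a gap's quality value is ignored). Where both reads have a
    base, the higher-quality call wins; an equal-quality disagreement
    becomes 'N'. Columns that are gaps in both reads are dropped.
    """
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("aligned reads and qualities must share one length")
    out = []
    for a, qa, b, qb in zip(fwd, fwd_q, rev, rev_q):
        if a == "-" and b == "-":
            continue
        if a == "-":
            out.append(b)
        elif b == "-":
            out.append(a)
        elif a == b or qa > qb:
            out.append(a)
        elif qb > qa:
            out.append(b)
        else:
            out.append("N")
    return "".join(out)


if __name__ == "__main__":
    print(reverse_complement("CGTAC"))  # GTACG
    # Made-up aligned reads and qualities.
    print(consensus("ACGT-A", [30, 30, 10, 30, 0, 20],
                    "-CATGA", [0, 30, 40, 30, 30, 20]))  # ACATGA
```

When reverse-complementing the reverse read, reverse its quality list too (`rev_q[::-1]`) so that each quality still lines up with its base.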
IV. Library Preparation and Sequencing
The choice of sequencing technology and library preparation method for genome sequencing is ever-changing and much-debated. Here we recommend the Illumina MiSeq for its cost, depth of coverage, and read length. Furthermore, the assembly pipeline used here, A5-miseq, requires Illumina data and is optimized for the longer reads produced by the MiSeq.
V. Assembly
Genome assembly typically consists of data cleaning (quality filtering and adaptor removal), error correction, contig assembly, scaffolding, and verification of scaffolds/contigs. A large array of programs, both commercial and open-source, can perform some or most of these steps. For this workflow we recommend the open-source A5 assembly pipeline, which automates all of the steps described above with a single command.
VI. Verification of the Assembly
Verification of a genome assembly has three parts. The first is the overall "quality" of the assembly, assessed by examining the statistics reported by A5 (e.g., the number of contigs and the contig N50). The second is confirming that the organism sequenced is the organism of interest, simply by checking its 16S sequence with BLAST. The third is the "completeness" of the genome, which is difficult to measure except when a close reference (a representative genome of the species) is available.
VII. Annotation
A number of pipelines are available for annotating bacterial genomes, including Prokka, IMG, RAST, and PGAP. Genome annotation involves identifying protein-coding regions and attempting to predict which genes they encode and what their biological functions are.
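Sections IV and VI refer to depth of coverage and to A5's assembly statistics (number of contigs and contig N50). A5 reports these itself; the Python sketch below only illustrates how they are defined, assuming contig lengths are already available as a list of integers. The depth figure is a rough estimate that treats every read as full length and ignores trimming, and `reads_needed` is a hypothetical planning helper, not part of A5.

```python
import math
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty list


def assembly_summary(lengths: List[int], n_reads: int,
                     read_len: int) -> Dict[str, float]:
    """Summarize contig lengths plus a rough mean sequencing depth."""
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
        # Rough depth: sequenced bases / assembly size (ignores trimming).
        "approx_depth": n_reads * read_len / total if total else 0.0,
    }


def reads_needed(genome_bp: int, target_depth: float, read_len: int) -> int:
    """Hypothetical helper: reads required to reach target_depth (rounded up)."""
    return math.ceil(genome_bp * target_depth / read_len)


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))  # 300
    print(assembly_summary([100, 200, 300, 400], n_reads=20, read_len=250))
    print(reads_needed(5_000_000, 50, 250))  # 1000000
```

For example, contigs of 100, 200, 300 and 400 bp give an N50 of 300 bp, because the two largest contigs together hold at least half of the 1,000 assembled bases. Similarly, a hypothetical 5 Mb genome at 50× depth with 250 bp reads needs about 1,000,000 reads (counting each mate of a pair), which is one way to reason about how many libraries to multiplex on a run.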
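Section VII describes annotation as identifying protein-coding regions. Pipelines such as Prokka, RAST and PGAP use far more sophisticated gene models, so the following Python sketch is only a toy illustration of the underlying idea: scanning all six reading frames for open reading frames (ORFs) that run from an ATG start codon to a stop codon. It ignores alternative start codons, ORFs that run off the end of a contig, and overlapping genes; the 100-codon minimum is an arbitrary assumption.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, str]]:
    """Toy six-frame ORF scan: ATG ... stop, at least min_codons codons long
    (stop codon included).

    Returns (start, end, strand) with 0-based, end-exclusive coordinates on
    the input (forward) sequence. Within each open stretch only the most
    upstream ATG is used, and ORFs without a stop codon are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    rc = seq.translate(_COMPLEMENT)[::-1]
    orfs = []
    for strand, s in (("+", seq), ("-", rc)):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start, i + 3, "+"))
                        else:
                            # Convert reverse-complement coordinates back.
                            orfs.append((n - (i + 3), n - start, "-"))
                    start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Made-up sequence containing ATG AAA TTT TAG on the forward strand.
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))  # [(2, 14, '+')]
```

Real gene callers also score codon usage, ribosome-binding sites and start-codon context, which is why the annotation pipelines named above should be used for an actual genome.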
VIII. Making a Phylogenetic Tree
There are two points in the workflow at which making a 16S phylogenetic tree may be useful: after identifying candidate organisms by Sanger sequencing, and after assembling the genome. The process is identical in both cases, but the longer, higher-quality 16S sequence available after assembly may generate a better tree. The tree can be used to identify the candidate (e.g., is the candidate found in a single-species clade?), to name the candidate (does it fall in a clade containing only members of that species, with no other members of the species found outside that clade?), and to place the organism in a phylogenetic context. In outline, this step uses the Ribosomal Database Project (RDP) to align the sequence with close relatives and an outgroup, followed by cleanup of the RDP headers, tree-building with FastTree, and viewing and analysis of the tree in Dendroscope.
IX. GenBank Submission
This section describes how to submit contigs and scaffolds (if applicable) as a Whole Genome Shotgun (WGS) submission to GenBank. We also recommend letting GenBank annotate the genome, since submitting RAST annotations to GenBank can be prohibitively complicated. Genomes deposited in GenBank are automatically shared with the DNA Data Bank of Japan (DDBJ) and EMBL. In addition, genomes from GenBank are automatically pulled into IMG and annotated there as well.
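The RDP header cleanup in section VIII is needed because spaces, commas, colons, semicolons and parentheses have special meaning in the Newick format that FastTree writes and Dendroscope reads. The Python sketch below is a hypothetical cleanup, not the workflow's own script: it assumes each header begins with an accession and organism name ending at the first ';', and replaces Newick-unsafe characters with underscores.

```python
import re
from typing import Iterable, Iterator

# Whitespace, quotes and characters with special meaning in Newick trees.
_NEWICK_UNSAFE = re.compile(r"[\s,;:()\[\]'\"]+")


def clean_header(header: str, max_len: int = 60) -> str:
    """Turn a FASTA header into a Newick-safe sequence label.

    Hypothetical rule: drop '>', keep the text before the first ';'
    (assumed to be accession plus organism name), replace each run of
    unsafe characters with '_', and truncate to max_len characters.
    """
    text = header.lstrip(">").split(";", 1)[0].strip()
    label = _NEWICK_UNSAFE.sub("_", text).strip("_")
    label = label[:max_len].rstrip("_")
    return label or "unnamed"


def clean_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (without newlines), with every header cleaned."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + clean_header(line)
        else:
            yield line


if __name__ == "__main__":
    # Made-up header in the assumed "accession organism; extra fields" form.
    print(clean_header(">S000000001 Bacillus sp. (T); strain X"))
    # S000000001_Bacillus_sp._T
```

Truncation can make two labels identical, so it is worth checking the cleaned alignment for duplicate names before running FastTree.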
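The workflow overview lists FASTA2AGP (a custom script) as part of GenBank submission. Its exact behavior is not described here, so the following Python sketch only illustrates what such a conversion typically involves, under stated assumptions: each scaffold is split into contigs at runs of at least `min_gap` Ns (10 is an arbitrary choice), and an AGP 2.0 line is written for every contig and gap, with gaps marked as `scaffold` gaps supported by `paired-ends` linkage evidence. The contig naming scheme is hypothetical; check the output against NCBI's AGP specification before submitting.

```python
import re
from typing import Dict, List, Tuple

_N_RUN = re.compile(r"[Nn]+")


def scaffold_to_agp(name: str, seq: str,
                    min_gap: int = 10) -> Tuple[List[str], Dict[str, str]]:
    """Split one scaffold into contigs at runs of >= min_gap Ns.

    Returns AGP 2.0 lines (tab-separated, 1-based inclusive coordinates)
    and a dict mapping contig IDs to sequences. Leading and trailing Ns are
    trimmed first, since an AGP object should not begin or end with a gap.
    """
    seq = seq.strip("Nn")
    if not seq:
        return [], {}
    # Pieces as (start, end, type) in 0-based, end-exclusive coordinates.
    pieces: List[Tuple[int, int, str]] = []
    pos = 0
    for m in _N_RUN.finditer(seq):
        if m.end() - m.start() >= min_gap:
            pieces.append((pos, m.start(), "W"))      # W = WGS contig
            pieces.append((m.start(), m.end(), "N"))  # N = gap of known size
            pos = m.end()
    pieces.append((pos, len(seq), "W"))

    agp: List[str] = []
    contigs: Dict[str, str] = {}
    for part, (start, end, kind) in enumerate(pieces, start=1):
        cols = [name, str(start + 1), str(end), str(part), kind]
        if kind == "W":
            contig_id = f"{name}_{len(contigs) + 1}"  # hypothetical naming
            contigs[contig_id] = seq[start:end]
            cols += [contig_id, "1", str(end - start), "+"]
        else:
            # gap_length, gap_type, linkage, linkage_evidence (assumed).
            cols += [str(end - start), "scaffold", "yes", "paired-ends"]
        agp.append("\t".join(cols))
    return agp, contigs


if __name__ == "__main__":
    lines, contigs = scaffold_to_agp("scaffold1", "ACGT" + "N" * 10 + "GGCC")
    print("\n".join(lines))
    print(contigs)  # {'scaffold1_1': 'ACGT', 'scaffold1_2': 'GGCC'}
```

Short runs of N below `min_gap` stay inside their contig, so the threshold should match whatever gap convention the assembler used when joining contigs into scaffolds.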