Overview of Shotgun Sequence Analysis

Overview of Shotgun Sequence Analysis Ami S. Bhatt, MD PhD | Stanford University | H3A Microbiome Workshop University of Witwatersrand | March 29 – 31, 2017 Image courtesy of Fiona Tamburini

Outline: Garbage in, garbage out (quality filtering, etc.); What is a k-mer?; Sequence → taxonomy (k-mer based, marker gene based); Sequence → longer sequences/contigs (assembly); Gene/ORF prediction from short and long sequences; Gene annotation

[Slide image: a wall of raw nucleotide sequence (AATCTGTCCCGATCATTGACTCCTGAATCCCGAGCTTATGCC, repeated) labeled DARK MATTER]

Pick your sequencing technique. 16S sequencing – gross taxonomic classification. Metagenomic sequencing and marker gene analysis – higher-resolution taxonomic classification. Metagenomic sequencing and full WGS analysis – species/strain-level classification, non-bacterial data, pathways.

What is a k-mer? Sequence: ATTTGCCGGTCTTTCTTTCCTGTCCGCAGTATATGTCTCCGGATTTTATGGTGT. ATTTGCC, TTTGCCG and CTTTCCT are all k-mers located within the above sequence, where k = 7 (the number of bases). We use k-mers for CLASSIFICATION (taxonomic, functional) and ASSEMBLY.
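A minimal Python sketch of k-mer extraction, using the sequence from the slide. The function name `kmers` is chosen here for illustration and is not taken from any particular tool.

```python
def kmers(sequence, k):
    """Return every substring of length k, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


seq = "ATTTGCCGGTCTTTCTTTCCTGTCCGCAGTATATGTCTCCGGATTTTATGGTGT"
seven_mers = kmers(seq, 7)

# The first two 7-mers are the ones named on the slide.
print(seven_mers[:2])
# CTTTCCT occurs further along the same sequence.
print("CTTTCCT" in seven_mers)
```

A sequence of length n contains n - k + 1 overlapping k-mers, which is why k-mer databases grow quickly with genome size.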

How Kraken (k-mer based classification) works. LCA = lowest common ancestor; RTL = root to leaf
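The LCA and root-to-leaf ideas can be sketched in a few lines of Python. This is a toy illustration and not Kraken's implementation: the taxonomy, the k-mer table, k = 5, and the names `lineage` and `classify` are all hypothetical. It assumes each database k-mer has already been mapped to the LCA of the genomes that contain it, and it breaks ties alphabetically for simplicity.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "Salmonella": "Bacteria",
}

# Hypothetical database (k = 5): k-mer -> LCA of all genomes containing it.
KMER_TO_LCA = {
    "ATTTG": "E. coli",
    "TTTGC": "Escherichia",
    "TTGCC": "Bacteria",
    "TGCCG": "Salmonella",
}


def lineage(taxon):
    """Return the taxon and all of its ancestors, up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def classify(read, k=5):
    """Assign a read to the taxon at the end of the best root-to-leaf path."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = KMER_TO_LCA.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no k-mer matched the database: unclassified

    def path_score(taxon):
        # Score of a root-to-leaf (RTL) path: k-mer hits summed along the lineage.
        return sum(hits[t] for t in lineage(taxon))

    # sorted() makes tie-breaking deterministic (a simplification).
    return max(sorted(hits), key=path_score)


print(classify("ATTTGCCG"))
```

In this toy read, the path root > Bacteria > Escherichia > E. coli collects three k-mer hits while the path ending at Salmonella collects two, so the read is assigned to E. coli.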

Why not just use BLASTn? There is a large tradeoff between SPEED and ACCURACY. Alignment with BLAST is slow, but the memory footprint for the reference database is pretty small. Kraken is FAST and fairly ACCURATE, but the memory footprint for the reference database is LARGE.

Marker gene based taxonomic classification* (MetaPhlAn). *Essentially 16S sequencing on steroids. Segata et al, Molecular Systems Biology (2013) 9, 666

De novo assembly

SPAdes and most other modern assemblers are de Bruijn graph assemblers. Chaisson and Eichler, Nature Rev Genetics 2015

Assembly – theory and practice: the Bridges of Königsberg problem. Node = landmass; edge = bridge. Can every part of the city be visited by walking across each of the seven bridges exactly once such that one returns to the starting location at the end of the stroll? Compeau, Pevzner and Tesler; Nature Biotechnology 29, 987-991 (2011)
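The question can be checked with Euler's degree rule: a connected graph has a closed walk that uses every edge exactly once only if every node touches an even number of edges. A small Python sketch follows; the landmass labels A to D (A being the central island) are assigned here for illustration, and the check assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as edges between four landmasses (labels are illustrative).
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the island to one bank
    ("A", "C"), ("A", "C"),  # two bridges from the island to the other bank
    ("A", "D"),              # island to the second island
    ("B", "D"), ("C", "D"),  # each bank to the second island
]


def degrees(edges):
    """Count how many bridge ends touch each landmass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_eulerian_circuit(edges):
    """Euler's rule for a connected graph: every node must have even degree."""
    return all(d % 2 == 0 for d in degrees(edges).values())


print(dict(degrees(BRIDGES)))
print(has_eulerian_circuit(BRIDGES))
```

Every landmass has an odd number of bridges, so no such stroll exists.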

Assembly – theory and practice: de Bruijn graph. Make a graph where every (k-1)-mer is assigned to a vertex; connect each (k-1)-mer to the next (k-1)-mer by an edge, so that each edge of the graph represents one k-mer observed in the reads. Finding a path that visits every vertex exactly once (a Hamiltonian path) is NP-complete: no efficient algorithm is known. Finding a path that traverses every edge exactly once (an Eulerian path) is solvable efficiently. Graph theory applied to genome assembly. Compeau, Pevzner and Tesler; Nature Biotechnology 29, 987-991 (2011)
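A toy Python sketch of this construction, assuming error-free k-mers that cover a single linear sequence. The function names are illustrative, and the path search is Hierholzer's algorithm, one standard way to find an Eulerian path. Real assemblers such as SPAdes add error correction, graph simplification and much more.

```python
from collections import defaultdict


def kmers_of(sequence, k):
    """All overlapping k-mers of a sequence (stand-in for error-free reads)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn(kmers):
    """Vertices are (k-1)-mers; each k-mer adds one edge from prefix to suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: a walk that uses every edge exactly once."""
    out_edges = {node: list(targets) for node, targets in graph.items()}
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    # Start at the vertex with one more outgoing than incoming edge, if any.
    start = next(iter(out_edges))
    for node, targets in out_edges.items():
        if len(targets) - in_degree[node] == 1:
            start = node
            break
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit the vertex
    return path[::-1]


def assemble(kmers):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


genome = "ATGGCGTGCA"
print(assemble(kmers_of(genome, 4)))
```

With k = 4 every 3-mer in this toy sequence is unique, so the graph is a single path and the original sequence is recovered. With k = 3 the 2-mers TG and GC repeat, more than one Eulerian path exists, and the assembler may return a different sequence with the same k-mer content. That ambiguity is how repeats complicate assembly.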

Why bother assembling metagenomic data? Longer sequences improve the accuracy of taxonomic classification. It is easier to identify full open reading frames for functional predictions. Operon structure (related genes located next to one another) can be identified. Genomic variations (structural variants and single nucleotide polymorphisms) can be identified more accurately.

What are they doing? PATHWAY ANALYSIS: translate reads, align to annotated references, quantify pathway abundance

Functional classification* (HUMAnN2). *Mapping genes of identifiable function onto annotated pathway maps. Segata et al, Molecular Systems Biology (2013) 9, 666

Thank you! bhattlab.com | asbhatt@stanford.edu