1
Overview of Shotgun Sequence Analysis
Ami S. Bhatt, MD PhD | Stanford University | H3A Microbiome Workshop, University of the Witwatersrand | March 29-31, 2017
Image courtesy of Fiona Tamburini
2
Outline
- Garbage in, garbage out (quality filtering, etc.)
- What is a k-mer?
- Sequence taxonomy: k-mer based and marker gene based
- Assembly of reads into longer sequences/contigs
- Gene/ORF prediction from short and long sequences
- Gene annotation
11/9/2019
3
[Slide graphic: a wall of raw, unclassified sequence reads (AATCTGTCCCGATCATTGACTCC…) labeled DARK MATTER]
Pick your sequencing technique
- 16S sequencing: gross taxonomic classification
- Metagenomic sequencing and marker gene analysis: higher-resolution taxonomic classification
- Metagenomic sequencing and full WGS analysis: species/strain-level classification, non-bacterial data, pathways
7
What is a k-mer?
ATTTGCCGGTCTTTCTTTCCTGTCCGCAGTATATGTCTCCGGATTTTATGGTGT
ATTTGCC, TTTGCCG and CTTTCCT are all k-mers located within the sequence above, where k = 7 (the number of bases in each k-mer).
We use k-mers for CLASSIFICATION (taxonomic, functional) and ASSEMBLY.
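Below is a minimal Python sketch of the k-mer decomposition described above. The function name `kmers` is our own. A sequence of length L contains L - k + 1 overlapping k-mers, because a new k-mer starts at every position except the last k - 1.

```python
def kmers(seq, k):
    """Return every overlapping substring of length k, in order.

    A sequence of length L yields L - k + 1 k-mers.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


seq = "ATTTGCCGGTCTTTCTTTCCTGTCCGCAGTATATGTCTCCGGATTTTATGGTGT"
seven_mers = kmers(seq, 7)
print(len(seven_mers))   # 48 for this 54-base sequence
print(seven_mers[:2])    # ['ATTTGCC', 'TTTGCCG']
```

Classification and assembly both start from this same decomposition. Classifiers look each k-mer up in a reference database. Assemblers connect k-mers that overlap.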
8
How Kraken (k-mer based classification) works
LCA = lowest common ancestor; RTL = root-to-leaf
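The Kraken diagram did not survive extraction. As we understand the original Kraken algorithm, it works in three steps:
- Each k-mer in the reference database is labeled with the LCA of all genomes that contain it.
- Each k-mer in a read is looked up, and every hit adds weight to its taxon.
- The read is assigned to the end of the highest-weighted root-to-leaf (RTL) path, or to the LCA of the ends if several paths tie.

The toy sketch below implements that idea in plain Python. The taxonomy, the "genomes" and all function names are hypothetical. Real Kraken uses a much larger, memory-optimized database.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent ("root" has no parent)
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "E. coli": "Proteobacteria",
    "S. enterica": "Proteobacteria",
    "B. subtilis": "Firmicutes",
}


def lineage(taxon):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon != "root":
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor: the first ancestor of b that is also an ancestor of a."""
    ancestors_of_a = set(lineage(a))
    return next(t for t in lineage(b) if t in ancestors_of_a)


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(genomes, k):
    """Map every k-mer to the LCA of all reference genomes that contain it."""
    db = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read, db, k):
    """Weight taxa by k-mer hits and return the end of the best-scoring RTL path."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None  # no k-mer found: the read stays unclassified
    # Score the path ending at each hit taxon (total hits along the path).
    # A path ending at an internal node always scores lower than one that
    # continues to a hit descendant, so the maximum falls on a leaf.
    scores = {t: sum(hits[a] for a in lineage(t)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    result = tied[0]
    for t in tied[1:]:
        result = lca(result, t)  # ties resolve to their LCA
    return result


GENOMES = {  # made-up "reference genomes"
    "E. coli": "ACGTACGGTCAGT",
    "S. enterica": "ACGTACGGTTTGA",
    "B. subtilis": "GGGCCCTTTAAAC",
}
DB = build_db(GENOMES, k=5)
print(classify("TACGGTCAGT", DB, k=5))  # E. coli
print(classify("ACGTACGGT", DB, k=5))   # Proteobacteria (k-mers shared by both species)
```

Reads whose k-mers are shared across species are placed higher in the tree. This is how k-mer classifiers avoid over-confident species calls.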
9
Why not just use BLASTn? There is a large tradeoff between SPEED and ACCURACY.
- Alignment with BLAST is slow, but the memory footprint of its reference database is fairly small.
- Kraken is FAST and fairly ACCURATE, but the memory footprint of its reference database is LARGE.
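Back-of-the-envelope arithmetic makes the memory side of this tradeoff concrete. Every assumption below is ours, for illustration only:
- k = 31
- k-mers are 2-bit packed into whole 64-bit words
- each entry stores a 4-byte taxon ID
- each reference base contributes one distinct k-mer
- hash-table overhead is ignored

This is not a description of either tool's actual index format.

```python
def kmer_table_bytes_per_base(k=31, taxon_id_bytes=4):
    """Rough bytes per reference base for a k-mer -> taxon lookup table.

    Hypothetical assumptions: every position contributes a distinct k-mer,
    each k-mer is 2-bit packed into whole 64-bit words, each entry stores
    one taxon ID, and hash-table overhead is ignored (real tables need more).
    """
    key_bytes = 8 * ((2 * k + 63) // 64)  # 2 bits per base, rounded up to 8-byte words
    return key_bytes + taxon_id_bytes


RAW_BYTES_PER_BASE = 2 / 8  # a reference stored at 2 bits per base

ratio = kmer_table_bytes_per_base() / RAW_BYTES_PER_BASE
print(ratio)  # 48.0
```

Under these assumptions, a per-k-mer table costs about 48 times more memory per base than the packed sequence it indexes. Kraken buys its speed with that memory: one fast lookup per k-mer, rather than an alignment.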
10
11
MetaPhlAn: marker gene based taxonomic classification*
*Essentially 16S sequencing on steroids
Segata et al, Molecular Systems Biology (2013) 9, 666
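The slide does not describe MetaPhlAn's exact estimator. As we understand it, reads are mapped against a catalog of clade-specific marker genes, and length-normalized marker coverage is turned into relative abundances. The sketch below uses a plain mean where we understand MetaPhlAn uses a more robust, truncated average. The marker and clade names are hypothetical.

```python
from statistics import mean


def clade_abundances(marker_hits, marker_len, marker_clade):
    """Relative clade abundance from reads mapped to clade-specific marker genes.

    marker_hits:  marker -> number of reads mapped to it (missing = 0)
    marker_len:   marker -> marker length in bases
    marker_clade: marker -> the single clade the marker is specific to
    """
    coverages = {}
    for marker, clade in marker_clade.items():
        # Length-normalize so that long markers do not dominate
        cov = marker_hits.get(marker, 0) / marker_len[marker]
        coverages.setdefault(clade, []).append(cov)
    # Plain mean per clade (a simplification; see the note above)
    clade_cov = {clade: mean(covs) for clade, covs in coverages.items()}
    total = sum(clade_cov.values())
    if total == 0:
        return {}  # no marker was hit
    return {clade: cov / total for clade, cov in clade_cov.items()}


abund = clade_abundances(
    marker_hits={"m1": 100, "m2": 300, "m3": 100},
    marker_len={"m1": 1000, "m2": 1000, "m3": 500},
    marker_clade={"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"},
)
print(abund)
```

Both clades come out at about 0.5. clade_A's two markers average 0.2 reads per base, which equals clade_B's single marker, even though clade_A received more reads in total.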
12
De novo assembly
13
SPAdes and most other modern short-read assemblers are de Bruijn graph assemblers
Chaisson and Eichler, Nature Rev Genetics 2015
14
Assembly – theory and practice
Node = landmass; edge = bridge
Bridges of Königsberg problem: can every part of the city be visited by walking across each of the seven bridges exactly once, such that one returns to the starting location at the end of the stroll?
Compeau, Pevzner and Tesler; Nature Biotechnology 29 (2011)
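Euler answered the Königsberg question in 1736. A connected graph has a closed walk that crosses every edge exactly once if and only if every vertex has an even number of edges (its degree).

The sketch below encodes the seven bridges as edges between four landmasses and checks Euler's criterion. The landmass labels are our own, and the code assumes, without checking, that the graph is connected.

```python
from collections import Counter


def degrees(edges):
    """Count edge endpoints per vertex (a multigraph: parallel edges are allowed)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_eulerian_circuit(edges):
    """Euler's criterion: a connected multigraph has a closed walk using every
    edge exactly once iff every vertex has even degree. Connectivity is assumed."""
    return all(d % 2 == 0 for d in degrees(edges).values())


# Landmasses (labels are our own): A = central island, B = north bank,
# C = south bank, D = eastern landmass; each tuple is one of the seven bridges
KONIGSBERG = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

print(dict(degrees(KONIGSBERG)))         # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_eulerian_circuit(KONIGSBERG))  # False
```

All four landmasses have odd degree, so no such stroll exists. De Bruijn graph assembly relies on the same edge-based (Eulerian) formulation.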
15
Assembly – theory and practice
de Bruijn graph: assign every (k-1)-mer to a vertex. For each k-mer in the reads, draw an edge from its prefix (k-1)-mer to its suffix (k-1)-mer, so that each edge represents one observed k-mer.
Hamiltonian path (visit every vertex exactly once, as in overlap-graph assembly): NP-complete, meaning no efficient (polynomial-time) algorithm is known.
Eulerian path (traverse every edge exactly once, as in de Bruijn graph assembly): solvable efficiently.
Graph theory applied to genome assembly
Compeau, Pevzner and Tesler; Nature Biotechnology 29 (2011)
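The sketch below builds the de Bruijn graph from a list of k-mers, as described above. It then finds an Eulerian path with Hierholzer's algorithm and spells the sequence along that path. It assumes error-free k-mers, each present once, forming a non-empty graph that has an Eulerian path. Real assemblers such as SPAdes must also handle sequencing errors, uneven coverage and repeats. All function names are our own.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Vertices are (k-1)-mers; each k-mer adds one edge from its prefix to its suffix."""
    graph = defaultdict(list)
    for km in kmers:
        graph[km[:-1]].append(km[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes a non-empty graph that has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one (the path's first vertex), if any
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}  # edges not yet walked
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # walk an unused edge
        else:
            path.append(stack.pop())  # dead end: this vertex is finished
    return path[::-1]


def assemble(kmers):
    """Spell the sequence: first vertex, then the last base of each following vertex."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(v[-1] for v in path[1:])


print(assemble(["ACGT", "CGTT", "GTTG", "TTGC", "TGCA"]))  # ACGTTGCA

# The 3-mers of ATGGCGTGCA: the repeated 2-mers TG and GC make the graph ambiguous
repeat_kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(assemble(repeat_kmers))  # ATGCGTGGCA
```

The second call uses every k-mer exactly once, yet returns ATGCGTGGCA rather than the original ATGGCGTGCA. Repeats create more than one valid Eulerian path, which is why they are the main obstacle to contiguous assembly.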
16
Why bother assembling metagenomic data?
- Longer sequences improve the accuracy of taxonomic classification
- It is easier to identify full open reading frames for functional predictions
- Operon structure (related genes located next to one another) can be identified
- Genomic variation (structural variants and single-nucleotide polymorphisms) is identified more accurately
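The ORF point can be illustrated with a deliberately simplified open reading frame finder. It scans the forward strand in three frames for ATG ... stop. It ignores the reverse strand, alternative start codons and nested starts. The `min_codons` threshold is tiny so the toy example works; a realistic threshold would be far larger. The function name and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return complete ORFs (ATG ... stop) on the forward strand in all three frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(seq[start:i + 3])
                start = None  # close it at the in-frame stop codon
    return orfs


contig = "CCATGAAATTTTAGCC"
read = contig[:10]  # a shorter fragment of the same sequence: CCATGAAATT
print(find_orfs(contig))  # ['ATGAAATTTTAG']
print(find_orfs(read))    # [] (the stop codon lies beyond the end of the read)
```

The same gene is complete in the longer sequence but cut off in the short fragment. Assembled contigs therefore yield more full-length ORFs for functional prediction.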
17
What are they doing?
PATHWAY ANALYSIS: translate reads, align them to annotated references, and quantify pathway abundance
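The first step, translating reads, can be sketched with the standard genetic code (NCBI translation table 1). A read's reading frame and strand are unknown, so it is translated in all six frames. The alignment and pathway quantification steps are done by dedicated tools and are not shown. The function names are our own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate one reading frame; '*' marks stop codons, 'X' codons with non-ACGT bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translation(read):
    """Translate a read in all six frames: 3 forward, 3 on the reverse complement."""
    read = read.upper()
    rc = reverse_complement(read)
    return [translate(s[f:]) for s in (read, rc) for f in range(3)]


frames = six_frame_translation("ATGAAATTTTAG")
print(frames[0])  # MKF*
print(frames[3])  # LKFH (first frame of the reverse complement)
```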
18
HUMAnN2: functional classification*
*Mapping genes of identifiable function onto annotated pathway maps
Segata et al, Molecular Systems Biology (2013) 9, 666
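The naive sketch below shows what "mapping genes onto pathway maps" means. For each pathway, it reports the fraction of its member gene families that were detected (coverage) and their summed abundance. HUMAnN2 reports pathway coverage and abundance with a considerably more sophisticated method. This is not its algorithm. All gene family and pathway names are hypothetical.

```python
def pathway_summary(gene_abundance, pathways):
    """Naive per-pathway (coverage, abundance) from gene family abundances.

    gene_abundance: gene family -> abundance (e.g. normalized read counts)
    pathways:       pathway -> set of member gene families
    """
    summary = {}
    for name, members in pathways.items():
        detected = {g for g in members if gene_abundance.get(g, 0) > 0}
        # Fraction of the pathway's steps with any evidence in the sample
        coverage = len(detected) / len(members) if members else 0.0
        # Total abundance of the detected member families
        abundance = sum(gene_abundance[g] for g in detected)
        summary[name] = (coverage, abundance)
    return summary


result = pathway_summary(
    gene_abundance={"geneA": 5.0, "geneB": 3.0},
    pathways={"pwy1": {"geneA", "geneB", "geneC"}, "pwy2": {"geneD"}},
)
print(result)
```

Here pwy1 is two-thirds covered with a summed abundance of 8.0, and pwy2 has no detected members. Reporting coverage alongside abundance helps distinguish a pathway that is actually present from one that merely shares a few abundant genes with other pathways.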
19
20
Thank you! bhattlab.com