Overview of Shotgun Sequence Analysis

1 Overview of Shotgun Sequence Analysis
Ami S. Bhatt, MD PhD | Stanford University | H3A Microbiome Workshop University of Witwatersrand | March 29 – 31, 2017 Image courtesy of Fiona Tamburini

2 Outline
Garbage in, garbage out (quality filtering, etc.)
What is a k-mer?
Sequence → Taxonomy: k-mer based; marker gene based
Sequence → longer sequences/contigs (Assembly)
Gene/ORF prediction from short and long sequences
Gene annotation
11/9/2019

3 [Slide graphic: a wall of raw, repetitive sequencing reads (AATCTGTCCCGATCATTGACTCCTGAATCCCGAGCTTATGCC...) captioned DARK MATTER]

4 Pick your sequencing technique
16S sequencing – Gross taxonomic classification

5 Pick your sequencing technique
16S sequencing – Gross taxonomic classification
Metagenomic sequencing and marker gene analysis – Higher resolution taxonomic classification

6 Pick your sequencing technique
16S sequencing – Gross taxonomic classification
Metagenomic sequencing and marker gene analysis – Higher resolution taxonomic classification
Metagenomic sequencing and full WGS analysis – Species/strain level classification, non-bacterial data, pathways

7 What is a k-mer?
ATTTGCCGGTCTTTCTTTCCTGTCCGCAGTATATGTCTCCGGATTTTATGGTGT
ATTTGCC, TTTGCCG, and CTTTCCT are all k-mers located within the above sequence, where k = 7 (the number of bases).
We use k-mers for CLASSIFICATION (taxonomic, functional) and ASSEMBLY.
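A minimal sketch of k-mer extraction, using the sequence from this slide. The function name `kmers` is a hypothetical helper chosen for illustration; it slides a window of width k across the sequence one base at a time.

```python
def kmers(sequence, k):
    """Return every overlapping k-mer in the sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

seq = "ATTTGCCGGTCTTTCTTTCCTGTCCGCAGTATATGTCTCCGGATTTTATGGTGT"
seven_mers = kmers(seq, 7)

print(seven_mers[:2])           # the first two 7-mers from the slide
print("CTTTCCT" in seven_mers)  # the third 7-mer sits further along the sequence
```

A sequence of length n contains n - k + 1 overlapping k-mers, so the number of k-mers grows almost linearly with read length.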

8 How Kraken (k-mer based classification) works
LCA = lowest common ancestor
RTL = root to leaf
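The LCA and RTL ideas can be sketched on a toy example. This is a simplified illustration and not Kraken's actual implementation: the taxonomy, the k-mer database, and the names `PARENT`, `KMER_TO_LCA`, `path_to_root`, and `classify` are all hypothetical, k is set to 4 to keep the example small, and ties between equally weighted paths are not handled.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "Salmonella": "Bacteria",
}

# Hypothetical database: each k-mer maps to the lowest common ancestor (LCA)
# of all reference genomes that contain it.
KMER_TO_LCA = {
    "ATGC": "E. coli",
    "TGCA": "Escherichia",
    "GCAT": "Bacteria",
    "CATG": "Salmonella",
}

def path_to_root(taxon):
    """List the taxon and all of its ancestors up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def classify(read, k=4):
    """Assign a read to the taxon at the end of the heaviest root-to-leaf path."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = KMER_TO_LCA.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no k-mer matched the database: unclassified
    # Weight of a root-to-leaf (RTL) path = total k-mer hits on the path.
    return max(hits, key=lambda t: sum(hits[a] for a in path_to_root(t)))

print(classify("ATGCAT"))
```

Because every lookup is an exact dictionary match, classification is fast, but the whole k-mer table has to be held in memory.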

9 Why not just use BLASTn? Large tradeoff between SPEED and ACCURACY
Alignment with BLAST is slow, but the memory footprint for the reference database is pretty small.
Kraken is FAST and fairly ACCURATE, but the memory footprint for the reference database is LARGE.

11 Marker gene based taxonomic classification*
MetaPhlAn
*essentially 16S sequencing on steroids
Segata et al, Molecular Systems Biology (2013) 9, 666

12 De novo assembly

13 SPAdes and most other modern assemblers are de Bruijn graph assemblers
Chaisson and Eichler, Nature Rev Genetics 2015

14 Assembly – theory and practice
Node = landmass; Edge = bridge
Bridges of Königsberg problem: Can every part of the city be visited by walking across each of the seven bridges exactly once, such that one returns to the starting location at the end of the stroll?
Compeau, Pevzner and Tesler; Nature Biotechnology 29, (2011)
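The bridge question can be checked with Euler's degree rule: a connected graph has a closed walk that uses every edge exactly once only if every node touches an even number of edges. The sketch below is illustrative; the landmass labels A to D and the function name `has_eulerian_cycle` are assumptions, and the function does not test connectivity itself.

```python
from collections import Counter

# The seven bridges, each joining two of the four landmasses.
# Labels are arbitrary: A = central island, B and C = river banks, D = eastern landmass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def has_eulerian_cycle(edges):
    """For a connected undirected multigraph: True if every node has even degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())

print(has_eulerian_cycle(BRIDGES))
```

Every landmass in Königsberg touches an odd number of bridges, so the stroll described on the slide is impossible.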

15 Assembly – theory and practice
de Bruijn graph: make a graph where every (k-1)-mer is assigned to a vertex; connect each (k-1)-mer to the next (k-1)-mer by an edge. Edges of the graph represent all possible k-mers.
Visiting every vertex exactly once (a Hamiltonian path) is NP-complete: no efficient algorithm is known.
Traversing every edge exactly once (an Eulerian path) is solvable efficiently.
Graph theory applied to genome assembly
Compeau, Pevzner and Tesler; Nature Biotechnology 29, (2011)
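A minimal sketch of the de Bruijn construction described on this slide, followed by an edge-by-edge walk (Hierholzer's algorithm) to spell out a sequence. It is a toy under stated assumptions, not how SPAdes or any production assembler works: the k-mers are error-free, the list is non-empty, and a walk that uses every edge exactly once exists. The names `de_bruijn`, `eulerian_path`, and `assemble` are hypothetical.

```python
from collections import defaultdict

def de_bruijn(kmers):
    """Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm: walk every edge exactly once (assumes such a walk exists)."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for w in successors:
            in_degree[w] += 1
    # Start at the vertex with one more outgoing than incoming edge, if any.
    start = next(
        (v for v in graph if len(graph[v]) - in_degree[v] == 1),
        next(iter(graph)),
    )
    remaining = {v: list(ws) for v, ws in graph.items()}
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit the vertex
    return path[::-1]

def assemble(kmers):
    """Spell the sequence: first vertex, then the last base of each later vertex."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(v[-1] for v in path[1:])

# 4-mers of a short example sequence, deliberately out of order.
print(assemble(["GCGT", "ATGC", "GTAC", "TGCG", "CGTA"]))
```

When the input contains repeats, several different edge-by-edge walks can exist, which is one reason real assemblies break into multiple contigs.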

16 Why bother assembling metagenomic data?
Longer sequences improve the accuracy of taxonomic classification
Easier to identify full open reading frames for functional predictions
Identify operon structure (related genes located next to one another)
More accurate identification of genomic variations (structural and single nucleotide polymorphisms)

17 What are they doing? PATHWAY ANALYSIS: translate reads, align to annotated references, quantify pathway abundance
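The first step of the pathway analysis above, translating reads, might be sketched as a six-frame translation with the standard genetic code. This is an illustration only and not the code used by any particular tool; the names `CODON_TABLE`, `translate`, and `six_frame` are hypothetical, and unknown codons (for example those containing N) are written as X.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame(read):
    """Three reading frames on the read, then three on its reverse complement."""
    reverse_complement = read.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (read, reverse_complement)
        for offset in range(3)
    ]

print(six_frame("ATGGCC"))
```

The translated frames would then be aligned against annotated protein references, and the hits counted per pathway to estimate abundance.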

18 Functional Classification*
HUMAnN2
*mapping genes of identifiable function onto annotated pathway maps
Segata et al, Molecular Systems Biology (2013) 9, 666

20 Thank you! bhattlab.com |

