Published by Jocelin Strickland. Modified over 6 years ago.
1
Strain profiling with StrainPhlAn and PanPhlAn
Nicola Segata, Curtis Huttenhower, Galeb Abu-Ali, Ali Rahnavard
STAMPS 2017
Harvard T.H. Chan School of Public Health, Department of Biostatistics
2
Efficient assembly-free meta’omics by leveraging isolates
NCBI isolate genomes: Archaea 300; Bacteria 12,926; Viruses 3,565; Eukaryota 112
Open reading frames: 49.0 million total genes
Species pan-genomes: 7,677, containing 18.6 million gene clusters
Core genes; marker genes
RepoPhlAn; ChocoPhlAn
3
StrainPhlAn: metagenomic strain identification and tracking
4
A tool for strain level population genomics
P. copri as an example species
Countries: China, Denmark, Estonia, Finland, Peru, Hungary, Italy, Norway, France, Spain, Sweden, USA, Germany
Alignment length: 62k nt; median SNPs: 830 [1.3%]; # pos. samples: 123
5
A tool for strain level population genomics
Alignment length: 62k nt; median SNPs: 830 [1.3%]; # pos. samples: 123
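The bracketed percentage is consistent with the median SNP count divided by the alignment length (830 / 62,000 is about 1.3%). A minimal sketch of that summary follows, assuming the statistic is a median over pairwise sample comparisons and that marker sequences are already aligned to equal length. The function names and toy sequences are illustrative and are not part of StrainPhlAn.

```python
from itertools import combinations
from statistics import median


def snp_count(seq_a, seq_b, skip="-N"):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base
    (characters in `skip`) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in skip and b not in skip
    )


def median_pairwise_snps(aligned):
    """Return (median SNP count, median SNP rate) over all sample pairs.

    `aligned` maps a sample name to its aligned marker sequence.
    """
    sequences = list(aligned.values())
    length = len(sequences[0])
    counts = [snp_count(a, b) for a, b in combinations(sequences, 2)]
    med = median(counts)
    return med, med / length


# Toy alignment (hypothetical data, 8 positions, 3 samples).
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTACGA"}
med_snps, med_rate = median_pairwise_snps(toy)

# The slide's own figures: 830 SNPs over a 62k nt alignment.
slide_rate_percent = round(830 / 62_000 * 100, 1)
```

On real data the alignment would come from the concatenated marker sequences of each positive sample rather than from hand-written strings.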
6
Most bugs (in the gut) are dominated by one stable strain
7
Most bugs (in the gut) are dominated by one stable strain
8
There’s a lot of strain-level variation left to discover
Median divergence from reference markers
9
PanPhlAn: the approach
Pipeline: reads from a metagenomic sample are mapped against microbial pangenomes to get gene coverage; genes are clustered into gene families to get pan-gene family coverage.
Plot: coverage of abundance-sorted pan-gene families, with three regions: multi-copy genes; plateau of genes from one metagenome’s strain; absent genes.
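The three regions of the coverage curve can be illustrated with a small sketch. It assumes per-family coverage has already been computed, estimates the plateau as the median coverage of covered families, and labels each family by its ratio to that plateau. The thresholds `low` and `high` and the function name are hypothetical choices for illustration, not PanPhlAn's actual parameters.

```python
from statistics import median


def classify_gene_families(coverage, low=0.5, high=2.0):
    """Label each gene family as 'absent', 'present', or 'multi-copy'.

    `coverage` maps a gene-family name to its mean read coverage.
    The plateau level is estimated as the median of non-zero coverages;
    `low` and `high` are illustrative ratio thresholds (assumptions).
    """
    covered = [c for c in coverage.values() if c > 0]
    if not covered:
        return {family: "absent" for family in coverage}

    plateau = median(covered)
    labels = {}
    for family, cov in coverage.items():
        ratio = cov / plateau
        if ratio < low:
            labels[family] = "absent"
        elif ratio > high:
            labels[family] = "multi-copy"
        else:
            labels[family] = "present"
    return labels


# Toy coverage values (hypothetical data).
toy_coverage = {
    "g1": 10.0, "g2": 11.0, "g3": 9.0,  # plateau
    "g4": 35.0,                         # well above the plateau
    "g5": 0.0, "g6": 1.0,               # at or near zero
}

# Abundance-sorted order, as on the x-axis of the slide's plot.
sorted_families = sorted(toy_coverage, key=toy_coverage.get, reverse=True)
labels = classify_gene_families(toy_coverage)
```

The resulting presence/absence labels per sample are what make strains comparable across metagenomes and reference genomes.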
10
PanPhlAn for “meta-epidemiology”
Metagenomes from [Loman et al., 2013]
11
Strain-level epidemiology of human-associated E. coli with PanPhlAn
Scholz et al., Nature Methods, 2016
~5,000 metagenomes (and counting); all continents; many EU countries
Figure labels: reference genomes; STEC; German outbreak; T2D (China); Liver Cirr. (China); Infants (Italy); CRC (Europe); HMP (USA); Obesity (Europe); Nielsen (Europe); T2D (Finland); Rampelli (Africa); Liu (Mongolia); Tito (Peru); Segre (Skin)
E. coli phylogroups: A, B1, B2, D
12
Multiple options for strain tracking in metagenomes
StrainPhlAn: map reads to core markers and call SNPs. Requires ~10x coverage, ~0.1% error rate.
PanPhlAn: map reads to pan-genomes and identify absent genes. Requires ~1x coverage, ~1% error rate.
Both work uniquely well for meta-analysis and are not sensitive to typical batch effects.
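As a rough illustration of the coverage requirements above, the sketch below picks which approach could apply to a species in a sample given its estimated coverage. The slide gives approximate figures (~10x and ~1x); treating them as hard cutoffs, and the function name itself, are assumptions made only for this example.

```python
def applicable_methods(coverage, snp_min=10.0, gene_min=1.0):
    """Return the strain-tracking approaches usable at a given coverage.

    `snp_min` and `gene_min` reflect the approximate requirements quoted
    on the slide (~10x for SNP calling, ~1x for gene presence/absence);
    using them as strict thresholds is a simplification.
    """
    methods = []
    if coverage >= snp_min:
        methods.append("StrainPhlAn")
    if coverage >= gene_min:
        methods.append("PanPhlAn")
    return methods


# Hypothetical per-sample coverage estimates for one species.
deep = applicable_methods(12.0)
shallow = applicable_methods(3.0)
trace = applicable_methods(0.2)
```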
13
https://bitbucket.org/biobakery/biobakery/wiki/strainphlan
StrainPhlAn tutorial
15
There’s a lot of strain-level variation left to discover
Phylogenetic branch % spanned by reference vs. “wild” bugs
16
Gene-family distribution curves
Select samples with a “step” distribution (colored curves): a strain of the species is present.
Reject non-step (gray) curves.
Plot labels: base coverage; E. coli gene-families.
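One way to picture the step criterion is sketched below: a sample is accepted when most covered gene families sit close to a common plateau level and that plateau is not too low. The band width, the minimum fraction, and the minimum plateau coverage are hypothetical parameters chosen for illustration; they are not PanPhlAn's actual selection rules.

```python
from statistics import median


def has_step_profile(coverages, min_plateau=1.0, band=(0.7, 1.3),
                     min_fraction=0.6):
    """Heuristically decide whether a coverage curve looks like a step.

    `coverages` is a list of per-gene-family coverage values for one
    sample. The curve counts as a step when the plateau (median of
    non-zero coverages) is at least `min_plateau` and at least
    `min_fraction` of covered families fall within `band` times the
    plateau. All thresholds are illustrative assumptions.
    """
    covered = [c for c in coverages if c > 0]
    if not covered:
        return False

    plateau = median(covered)
    if plateau < min_plateau:
        return False

    lower, upper = band[0] * plateau, band[1] * plateau
    in_band = sum(1 for c in covered if lower <= c <= upper)
    return in_band / len(covered) >= min_fraction


# Toy curves (hypothetical data).
step_curve = [30, 10.5, 10, 10, 9.8, 9.5, 10.2, 0, 0, 0]   # plateau, then drop
sloped_curve = [20, 15, 10, 7, 5, 3, 2, 1, 0.5, 0.2]       # steady decline
faint_curve = [0.3, 0.3, 0.3, 0]                           # too little coverage

step_ok = has_step_profile(step_curve)
sloped_ok = has_step_profile(sloped_curve)
faint_ok = has_step_profile(faint_curve)
```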
17
Synthetic and semi-synthetic validation
[Figure: five panels, each with axis label “Coverage”]
18
PanPhlAn on Eubacterium rectale
Only one Eubacterium rectale genome used here
19
PanPhlAn on Eubacterium rectale