Strain profiling with StrainPhlAn and PanPhlAn

1 Strain profiling with StrainPhlAn and PanPhlAn
Nicola Segata, Curtis Huttenhower, Galeb Abu-Ali, Ali Rahnavard. STAMPS 2017. Harvard T.H. Chan School of Public Health, Department of Biostatistics.

2 Efficient assembly-free meta’omics by leveraging isolates
NCBI isolate genomes: Archaea 300; Bacteria 12,926; Viruses 3,565; Eukaryota 112. Open reading frames: 49.0 million total genes. Species pan-genomes: 7,677, containing 18.6 million gene clusters. Derived gene sets: core genes, marker genes. Tools: RepoPhlAn, ChocoPhlAn.

3 StrainPhlAn: metagenomic strain identification and tracking

4 A tool for strain level population genomics
P. copri as an example species. Alignment length: 66k nt. Median SNPs: 830 [3.6%]. # pos. samples: 123. Sample countries: China, Denmark, Estonia, Finland, Peru, Hungary, Italy, Norway, France, Spain, Sweden, USA, Germany.

5 A tool for strain level population genomics
Alignment length: 62k nt. Median SNPs: 830 [1.3%]. # pos. samples: 123.
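The bracketed percentage on this slide appears to be the median SNP count divided by the alignment length. That reading is an assumption, though it matches the numbers here (830 / 62,000 ≈ 1.3%). A minimal sketch of the arithmetic follows; the helper names `count_snps` and `snp_rate_percent` and the toy sequences are hypothetical and are not part of StrainPhlAn.

```python
def count_snps(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped rather than counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in skip and b not in skip and a != b
    )


def snp_rate_percent(n_snps: int, alignment_length: int) -> float:
    """SNPs as a percentage of the alignment length (assumed definition)."""
    return 100.0 * n_snps / alignment_length


# Toy pair of aligned marker sequences (made up for illustration).
toy_snps = count_snps("ACGTACGT-A", "ACGAACGTTA")

# The figures quoted on the slide: 830 median SNPs over a 62k nt alignment.
slide_rate = snp_rate_percent(830, 62_000)
print(toy_snps, round(slide_rate, 1))
```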

6 Most bugs (in the gut) are dominated by one stable strain

7 Most bugs (in the gut) are dominated by one stable strain

8 There’s a lot of strain-level variation left to discover
Median divergence from reference markers

9 PanPhlAn: the approach
Workflow: metagenomic sample → read mapping to microbial pangenomes → gene coverage → cluster to gene families → pan-gene family coverage. Plot: coverage of abundance-sorted pan-gene families, showing multi-copy genes, a plateau of genes from the metagenome's one strain, and absent genes.
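The plateau idea on this slide can be illustrated with a small sketch: gene families whose coverage sits near the plateau are called present, those far above it multi-copy, and those far below it absent. This is not PanPhlAn's implementation. The function name `classify_gene_families`, the use of the median of non-zero coverages as the plateau estimate, and the two ratio thresholds are all assumptions chosen for illustration.

```python
from statistics import median


def classify_gene_families(coverage, absent_below=0.3, multicopy_above=1.7):
    """Label each gene family relative to an estimated coverage plateau.

    coverage: dict mapping gene-family id -> mean read coverage.
    absent_below / multicopy_above: hypothetical thresholds, expressed as
    a fraction of the plateau coverage.
    """
    nonzero = [c for c in coverage.values() if c > 0]
    if not nonzero:
        # Nothing mapped at all: every family is treated as absent.
        return {fam: "absent" for fam in coverage}

    # Assumption: the median of covered families approximates the plateau.
    plateau = median(nonzero)

    calls = {}
    for fam, cov in coverage.items():
        ratio = cov / plateau
        if ratio < absent_below:
            calls[fam] = "absent"
        elif ratio > multicopy_above:
            calls[fam] = "multi-copy"
        else:
            calls[fam] = "present"
    return calls


# Made-up coverages for six hypothetical gene families.
example = {"famA": 10.2, "famB": 9.6, "famC": 21.5,
           "famD": 0.0, "famE": 10.9, "famF": 0.4}
calls = classify_gene_families(example)
print(calls)
```

The resulting present/absent pattern across gene families is the kind of per-sample gene profile that the later slides compare between metagenomes and reference genomes.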

10 PanPhlAn for “meta-epidemiology”
Metagenomes from [Loman et al., 2013]

11 Strain-level epidemiology of human-associated E. coli with PanPhlAn
Scholz et al., Nature Methods, 2016. ~5,000 metagenomes (and counting), from all continents and many EU countries. Datasets shown: T2D (China), German outbreak, Liver Cirr. (China), Infants (Italy), CRC (Europe), HMP (USA), Obesity (Europe), Nielsen (Europe), T2D (Finland), Rampelli (Africa), Liu (Mongolia), Tito (Peru), Segre (Skin), plus reference genomes. Tree labels: STEC; E. coli phylogroups A, B1, B2, D.

12 Multiple options for strain tracking in metagenomes
StrainPhlAn: map reads to core markers and call SNPs. Requires ~10x coverage, ~0.1% error rate.
PanPhlAn: map reads to pan-genomes and identify absent genes. Requires ~1x coverage, ~1% error rate.
Both work uniquely well for meta-analysis and are not sensitive to typical batch effects.
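The coverage requirements above suggest a quick feasibility check before running either tool. The sketch below uses the usual depth approximation (mapped reads × read length / target length) and the approximate ~1x and ~10x figures from this slide. The function names, the `target_length` parameter, and the example read counts are hypothetical, and neither tool exposes such a function.

```python
def estimated_coverage(n_mapped_reads: int, read_length: int,
                       target_length: int) -> float:
    """Approximate mean depth over the target (markers or pan-genome)."""
    return n_mapped_reads * read_length / target_length


def applicable_tools(coverage: float,
                     strainphlan_min: float = 10.0,
                     panphlan_min: float = 1.0) -> list:
    """Return the tools whose rough coverage requirement is met.

    The default thresholds are the approximate values quoted on the
    slide, not hard limits enforced by the software.
    """
    tools = []
    if coverage >= panphlan_min:
        tools.append("PanPhlAn")
    if coverage >= strainphlan_min:
        tools.append("StrainPhlAn")
    return tools


# Made-up numbers: 100 bp reads against a 5 Mb target.
low = estimated_coverage(50_000, 100, 5_000_000)     # 1.0x
high = estimated_coverage(600_000, 100, 5_000_000)   # 12.0x
print(low, applicable_tools(low))
print(high, applicable_tools(high))
```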

13 https://bitbucket.org/biobakery/biobakery/wiki/strainphlan
StrainPhlAn tutorial

14

15 There’s a lot of strain-level variation left to discover
Phylogenetic branch % spanned by reference vs. “wild” bugs

16 Gene-family distribution curves
Select samples with a “step” distribution (colored curves): a strain of the species is present. Reject non-step (gray) curves. Axes: base coverage vs. E. coli gene families.
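One way to picture the step criterion: in a step-shaped curve almost every gene family sits either near the plateau or near zero, with few families at intermediate coverage. The heuristic below is only an illustration of that idea and does not reproduce PanPhlAn's actual sample-selection rules. The function name `is_step_curve` and all thresholds are assumptions.

```python
from statistics import median


def is_step_curve(coverages, min_plateau_cov=1.0, max_intermediate=0.1):
    """Heuristic check for a step-shaped gene-family coverage curve.

    coverages: per-gene-family coverage values for one sample.
    A sample is accepted when a plateau exists (median of families with
    coverage >= min_plateau_cov) and at most `max_intermediate` of all
    families fall between 20% and 50% of the plateau, i.e. on the slope
    between "present" and "absent". All thresholds are hypothetical.
    """
    covered = [c for c in coverages if c >= min_plateau_cov]
    if not covered:
        return False
    plateau = median(covered)
    intermediate = sum(1 for c in coverages
                       if 0.2 * plateau <= c < 0.5 * plateau)
    return intermediate / len(coverages) <= max_intermediate


# Made-up curves: a clear step (60 families near 10x, 40 absent) and a
# gradual decline from 10x to 0.1x with no plateau.
step_sample = [9.0 + (i % 3) for i in range(60)] + [0.0] * 40
gradual_sample = [(100 - i) / 10 for i in range(100)]
print(is_step_curve(step_sample), is_step_curve(gradual_sample))
```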

17 Synthetic and semi-synthetic validation
[Figure: five panels, each with a “Coverage” axis]

18 PanPhlAn on Eubacterium rectale
Only one Eubacterium rectale genome used here

19 PanPhlAn on Eubacterium rectale

