Published by Helmuth Bäcker. Modified over 6 years ago.
1
Identifying personal microbiomes using metagenomic codes
Eric Franzosa
Huttenhower Lab / Harvard T. H. Chan School of Public Health
STAMPS / 12 August 2016
2
Microbiome identifiability background
Are our microbial differences both large and consistent enough to stably and uniquely identify us?
3
Identification based on ecological distance: does it scale?
My microbiome at t1, t2, t3 tends to be more similar to my t0 (reference) microbiome than to that of a random individual. But as we consider more and more individuals, it is not hard to find someone who is closer to me than I am to myself, which results in frequent spurious (false positive) hits.
[Figure: samples t0 to t3 for Person1, Person2, Person3. Sources: PNAS 2010; HMP 2012]
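The scaling problem can be illustrated with a minimal sketch. It assumes Bray-Curtis dissimilarity as the ecological distance (the slides do not name a specific measure), and the profiles and the helper names `bray_curtis` and `nearest_reference` are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles (dict: feature -> relative abundance)."""
    features = set(a) | set(b)
    difference = sum(abs(a.get(f, 0.0) - b.get(f, 0.0)) for f in features)
    total = sum(a.get(f, 0.0) + b.get(f, 0.0) for f in features)
    return difference / total if total else 0.0


def nearest_reference(query, references):
    """Return the person whose reference (t0) profile is closest to the query."""
    return min(references, key=lambda person: bray_curtis(query, references[person]))


# Hypothetical t0 reference profiles for a small population.
references = {
    "Person1": {"A": 0.6, "B": 0.3, "C": 0.1},
    "Person2": {"A": 0.2, "B": 0.2, "D": 0.6},
}
person1_t1 = {"A": 0.5, "B": 0.3, "C": 0.2}  # Person1 sampled again later

small_population_match = nearest_reference(person1_t1, references)

# Adding more people makes it more likely that a stranger is closer than "me".
references["Person3"] = {"A": 0.5, "B": 0.35, "C": 0.15}
large_population_match = nearest_reference(person1_t1, references)
```

With two references the later sample matches Person1; once the hypothetical Person3 is added, the nearest reference is Person3, a false positive.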
4
An alternative approach: Metagenomic codes
Construct sets of microbial features that uniquely identify each person, even if individual features are not unique (e.g. “A”). We call these sets of features metagenomic codes.
5
The “hitting set” problem
Imagine that the dots are features of my microbiome. Ovals enclose sets of features that other people are missing. The red features are a unique code for me (everyone else is missing at least one of them).
[Figure labels: Tiffany; Koji; Levi; Emma; Tim; a specific feature; my microbiome]
Figure adapted from Selman B, Nature, 2008, 451:639
6
Definition of a metagenomic code
Find a set of features (code) that collectively distinguishes the ★’ed person (each other person is missing at least one).
7
Finding codes restated as a hitting set problem
Find a subset of the ★’ed person’s microbiome that intersects with the missing features (!) in each other person.
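The two formulations can be checked against each other in a few lines. This is a minimal sketch assuming simple presence/absence feature sets; the people, features, and function names are hypothetical.

```python
def is_unique_code(code, others):
    """Code definition: every other person is missing at least one code feature."""
    return all(code - features for features in others.values())


def is_hitting_set(code, mine, others):
    """Hitting-set restatement: the code intersects each person's set of
    features that I have and they lack."""
    missing = {person: mine - features for person, features in others.items()}
    return all(code & gap for gap in missing.values())


# Hypothetical presence/absence data.
mine = {"A", "B", "C", "D"}
others = {
    "Person2": {"A", "B"},
    "Person3": {"A", "C", "D"},
    "Person4": {"B", "C"},
}

good_code = {"B", "D"}  # everyone else lacks B or D
bad_code = {"A"}        # Person2 and Person3 both carry A
```

For any code drawn from `mine`, the two checks agree: `good_code` passes both and `bad_code` fails both.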
8
Codes need to be unique & robust (stable over time)
[Figure panels: False Positive; False Negative; True Positive]
Unlike the classical hitting set problem, we want to prioritize robust hitting sets rather than minimal hitting sets.
9
Determinants of metagenomic feature stability
[Figure panels: Taxa (16S OTUs); Species (Shotgun); Marker Genes (Shotgun). Legend: very stable over time; somewhat stable; coin toss; somewhat unstable; very unstable]
Abundance: fraction of DNA contributed by feature to sample
Prevalence: fraction of people with the feature
A minimal metagenomic hitting set would be enriched for low-prevalence features and hence unstable.
All raw data from:
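The two definitions above can be computed as follows. This is a minimal sketch: the read counts are made up, and treating any nonzero abundance as "having" the feature is an assumption, not the detection rule used in the talk.

```python
def relative_abundance(counts):
    """Abundance: fraction of a sample's DNA (here, read counts) contributed by each feature."""
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


def prevalence(profiles):
    """Prevalence: fraction of people in whom each feature is present.
    Assumption: 'present' means abundance > 0."""
    n_people = len(profiles)
    carriers = {}
    for profile in profiles.values():
        for feature, abundance in profile.items():
            if abundance > 0:
                carriers[feature] = carriers.get(feature, 0) + 1
    return {feature: n / n_people for feature, n in carriers.items()}


# Hypothetical read counts per person.
counts = {
    "Person1": {"A": 30, "B": 70},
    "Person2": {"A": 50, "C": 50},
    "Person3": {"A": 100},
}
profiles = {person: relative_abundance(c) for person, c in counts.items()}
feature_prevalence = prevalence(profiles)
```

Here feature A is carried by everyone (prevalence 1.0), while B and C each occur in one of three people, the kind of low-prevalence feature a minimal hitting set would favor.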
10
High prevalence is associated with feature acquisition
[Figure panels: Taxa (16S OTUs); Species (Shotgun); Marker Genes (Shotgun). Legend: likely to be acquired; unlikely to be acquired; not acquired]
Prevalence: fraction of people with the feature
11
A greedy approximation to minimum hitting set prioritizing differential abundance and robustness
For each subject i:
Rank each feature j by “abundance gap” (i’s abundance minus next highest abundance)
Add features to i’s code in rank order until:
Code is unique in population AND
No features are left OR
A target code size is reached
[Figure labels: abundance; abundance gap]
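The greedy procedure can be sketched as below. This is one reading of the slide, not the authors' implementation: it assumes the stopping rule is "the code is unique and has reached the target size, or no features are left", and it treats a feature as present when its abundance is above zero. The function name `build_code` and the data are hypothetical.

```python
def build_code(subject, abundance, target_size=2):
    """Greedily build a code for `subject` from features ranked by abundance gap.

    abundance: dict person -> dict feature -> abundance.
    Returns (code, is_unique).
    """
    mine = abundance[subject]
    others = {p: prof for p, prof in abundance.items() if p != subject}

    def gap(feature):
        # Subject's abundance minus the next highest abundance in the population.
        next_highest = max(
            (prof.get(feature, 0.0) for prof in others.values()), default=0.0
        )
        return mine[feature] - next_highest

    ranked = sorted((f for f, a in mine.items() if a > 0), key=gap, reverse=True)

    code = []
    unresolved = set(others)  # people who still carry every feature in the code
    for feature in ranked:
        # Assumed stopping rule: unique AND target size reached.
        if not unresolved and len(code) >= target_size:
            break
        code.append(feature)
        unresolved = {p for p in unresolved if others[p].get(feature, 0.0) > 0}
    return code, not unresolved


# Hypothetical abundance profiles.
abundance = {
    "s1": {"g1": 0.5, "g2": 0.3, "g3": 0.2},
    "s2": {"g1": 0.4, "g3": 0.6},
    "s3": {"g2": 0.1, "g3": 0.1, "g4": 0.8},
}
code, is_unique = build_code("s1", abundance, target_size=2)
```

For `s1` the ranking is g2, g1, g3: g2 excludes `s2`, g1 then excludes `s3`, and the loop stops before adding g3, whose gap is negative.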
12
Evaluation of metagenomic code uniqueness
[Figure panels: Taxa (16S OTUs); Species (Shotgun); Marker Genes (Shotgun). Axes: individuals; code size. Coding results: can’t make unique code; can make unique code]
13
Evaluation of metagenomic code stability
[Figure panels: Taxa (16S OTUs); Species (Shotgun); Marker Genes (Shotgun). Axes: individuals; code size. Coding results: can’t make unique code; False Negative; False Negative + False Positive; True Positive + False Positive; True Positive]
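The outcome categories in this figure can be assigned with a short sketch. It assumes these definitions, which the slides imply but do not spell out: a code "hits" a follow-up sample when the sample contains every code feature; hitting the owner's own follow-up sample is a true positive, missing it is a false negative, and hitting anyone else's sample adds a false positive. All names and data are hypothetical.

```python
def code_hits(code, sample_features):
    """A code hits a sample if the sample contains every feature in the code."""
    return set(code) <= set(sample_features)


def evaluate_code(owner, code, followup):
    """Label a code against follow-up samples (dict: person -> set of features)."""
    hits = {person for person, feats in followup.items() if code_hits(code, feats)}
    label = "True Positive" if owner in hits else "False Negative"
    if hits - {owner}:
        label += " + False Positive"
    return label


# Hypothetical follow-up (second time point) samples.
followup = {
    "P1": {"g1", "g2", "g3"},
    "P2": {"g1", "g2"},
    "P3": {"g3"},
}

stable_unique = evaluate_code("P1", ["g1", "g3"], followup)
stable_not_unique = evaluate_code("P1", ["g1", "g2"], followup)
lost_code = evaluate_code("P3", ["g3", "g4"], followup)
lost_and_spurious = evaluate_code("P3", ["g1", "g2"], followup)
```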
14
Effects of time interval (3-12 mos) appear to be minimal
[Figure panels: Taxa (16S OTUs); Species (Shotgun); Marker Genes (Shotgun). Axes: individuals; time interval. Coding results: False Negative; True Positive]
15
Example of a stable gene code (from stool)
[Figure: coding genes]
16
Gene-based codes capture strain variation in individuals’ most abundant (stable) bugs
17
Per-species marker gene contributions vary by body site
For the nares (skin) and fornix (vaginal) sites, a few species contribute many identifying genes.
At the oral and gut sites, identifying genes derive from a more diverse list of species.
18
Independent evaluation of fingerprint uniqueness
[Figure: grid of M independent samples from single-visit HMP individuals against N fingerprints for multi-visit HMP individuals; hits marked X]
H is the number of hits (Xs)
p = hit chance = H / (MN) if H > 0, else 1 / (MN + 1)
1/p = sub-population size below which fingerprints tend to be unique
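The hit-chance formula is small enough to check by hand. The sketch below uses hypothetical function names, and the example plugs in the anterior nares OTU row from the results slide that follows (63 fingerprints, 84 unseen subjects, 13 false positives), assuming "unseen subjects" corresponds to M.

```python
def hit_chance(hits, m_samples, n_fingerprints):
    """p = H / (MN) if H > 0, else 1 / (MN + 1)."""
    comparisons = m_samples * n_fingerprints
    if hits > 0:
        return hits / comparisons
    # With no observed hits, fall back to a conservative estimate.
    return 1 / (comparisons + 1)


def unique_among(p):
    """1/p: sub-population size below which fingerprints tend to be unique."""
    return 1 / p


# Anterior nares, OTU fingerprints: H = 13, M = 84, N = 63.
p_nares = hit_chance(13, 84, 63)
n_nares = unique_among(p_nares)

# No hits observed in a hypothetical 10 x 10 comparison.
p_no_hits = hit_chance(0, 10, 10)
```

Rounded, `p_nares` is 0.0025 and `n_nares` is 407, matching the reported FPR and "unique among N" values for that row.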
19
Independent evaluation of fingerprint uniqueness
Columns: feature type; body site; fingerprints; unseen subjects; false positives; FPR; unique among N
OTUs; Anterior nares; 63; 84; 13; 0.0025; 407
OTUs; Buccal mucosa; 45; 82; 7; 0.0019; 527
OTUs; Posterior fornix; 8; 50; 3; 0.0075; 133
OTUs; Stool; 76; 101; 10; 0.0013; 768
OTUs; Supragingival plaque; values as extracted: 68 398
OTUs; Tongue dorsum; values as extracted: 23 6 0.0032 314
Species; values as extracted: 11 36 9 0.0227 44 32 0.0035 289 16 0.0204 49 31 43 5 0.0038 267 35 1 315 37 0.0263 38
Marker Genes; values as extracted: 22 0.0091 110 0.0089 112 12 0.0156 64 42 262 39 0.0022 455 2 0.0012 814
OTU-based fingerprints are harder to hit at random, but less stable.
Gene-based codes are unique among 100s of people.
20
Microbiome identification in perspective
Metagenomic identification | Human genomic identification
Possible by encoding microbial strain variation | Possible by profiling human “strain variation”
Signatures unique among (at least) 100s of people | Signatures unique among BILLIONS of people
Stable over 6 months in up to 80% of people | Stable over a lifetime in essentially everyone
Applies to environments that don’t have a genome! |
21
Microbiome identification in perspective
...speaks to the personalization of the human microbiome, including its influence on health & disease (HMP 2012)
22
Resources
Raw Human Microbiome Data (16S + Shotgun):
Curated HMP Data (This Work):
Tools for meta’omics analysis:
23
Acknowledgements
The Huttenhower Lab: huttenhower.sph.harvard.edu
Collaborators: Dirk Gevers, Kat Huang, Brendan Bohannan, James Meadow, Katherine Lemon
Alumni
Support
24
EXTRAS
25
Microbiome identifiability background
Are our microbial differences both large and consistent enough to stably and uniquely identify us?
Metagenomics 101: Scoop, Sequence, Profile
[Figure: Person1, Person2, Person3. Sources: Fierer 2010; HMP 2012]
26
Identification based on ecological distance doesn’t scale
My microbiome at t1, t2, t3 tends to be more similar to my t0 (reference) microbiome than to that of a random individual. But as we consider more and more individuals, it is not hard to find someone who is closer to me than I am to myself. This results in a lot of spurious (false positive) matches.
[Figure: samples t0 to t3 for Person1, Person2, Person3]
27
Metagenomic features from HMP phase I and II
Columns (as extracted): Basis; Units; Confident detection; Relaxed absence; Body sites; Paired samples/site
Mothur OTUs; 16S; Relative Abundance; >10^-3; >10^-4; <10^-5; 18; 30-100
MetaPhlAn species; WMS; 6; 20-50
markers; RPKM; >5; >0.5; <0.05
KB window genome tiling
Must focus on individuals sampled at 2+ time points to evaluate feature and fingerprint stability.
28
Constructing minimal hitting sets
29
Strain profiling: Example of a False Negative
30
Strain profiling: Example of a False Positive
31
Independent evaluation of code uniqueness
Model the chance of a false positive in comparisons with an increasing number of previously unseen subjects. Strain-level codes should be unique among 100s of individuals.
32
Identification based on global similarity doesn’t scale
33
Technical replicability
34
Sensitivity to parameter settings