TIPP: Taxonomic Identification And Phylogenetic Profiling

1 TIPP: Taxonomic Identification And Phylogenetic Profiling
Nam-phuong Nguyen, Computer Science and Engineering, University of California, San Diego
I would like to first thank Tandy for introducing me and IGB for hosting this event. I would also like to thank everyone in attendance. Today I will talk about TIPP, a method I developed for what I call microbial forensics, and I will describe what I mean by that in this talk.

2 Precision Medicine
Personalized treatment based upon the patients’ phenotypes and genotypes. Precision Medicine Initiative launched with $215M in 2016. Many different aspects, including genomics, epigenetics, and the microbiome.
Precision medicine is a new paradigm for healthcare: the idea is to create personalized treatments for patients based upon their phenotypes and genotypes. This is a hot topic, and in 2016 the Precision Medicine Initiative launched with $215M in funding. Precision medicine takes many different characteristics of the patient into account, including genomics, epigenetics, and the microbiome.
Image courtesy of gurdanhealth.com


4 Human Microbiome
Roughly 10 times more bacterial cells than human cells (a commonly cited estimate). Important role in regulating health. Disruption associated with risk factors for diseases. Analysis through metagenomics.
We have more bacterial cells than human cells, so it's no surprise that bacteria play a very important role in regulating our health. Bacteria help us extract energy from our food and help maintain a healthy vaginal environment. Dysbiosis, or disruption of the microbiome, is often associated with risk factors for diseases, including bacterial vaginosis and diarrhea. Two of the key questions in understanding the microbiome are who is there and how much of each; we call the answer an abundance profile, and we answer these questions with metagenomics.
Image courtesy of humanlongevity.com

5 Metagenomics
Analyzing DNA sequences from an environmental sample. Typical datasets contain millions of reads.
I’m going to discuss this idea of microbial forensics within the framework of metagenomics.

6 Fundamental Questions
What is the identity of a read? What is the microbial profile of a sample? What genes/functions are present?
I’m going to discuss this idea of microbial forensics within the framework of metagenomics.


8 Metagenomic Taxon Identification
Objective: classify short reads in a metagenomic sample

9 Abundance Profiling
Objective: distribution of the species (or genera, or families, etc.) within the sample. For example, the distribution of a sample at the species level might be: Species A: 10%, Species B: 25%, Species C: 55%, Species D: 1%, Species E: 9%.
The second, related problem is known as abundance profiling. The connection between abundance profiling and identification is that we will use identification to solve profiling.
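Here is a minimal Python sketch of how identification feeds profiling. It counts per-read taxon labels at one rank and converts the counts to fractions. It assumes every read already carries a single label at that rank, and it simply leaves unclassified reads out. The 200 reads are invented so that the output matches the example distribution.

```python
from collections import Counter


def abundance_profile(read_labels: list[str]) -> dict[str, float]:
    """Convert per-read taxon labels at one rank into relative abundances."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    # Each taxon's share of the classified reads; the fractions sum to 1.
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical sample of 200 classified reads (counts chosen to match the example).
reads = (["Species A"] * 20 + ["Species B"] * 50 + ["Species C"] * 110
         + ["Species D"] * 2 + ["Species E"] * 18)

profile = abundance_profile(reads)
for taxon, fraction in sorted(profile.items()):
    print(f"{taxon}: {fraction:.0%}")
```

The profile step is only as good as the per-read labels, which is why the rest of the talk focuses on classifying reads.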

10 Genome-based profiling
Population of 2 bacteria, A and B. B has a genome twice as large as A's. (Figure: two cells of A, one cell of B.) True profile: 67% A, 33% B. Profile estimated from reads: 50% A, 50% B.
Genome size variation within E. coli can be as large as 20%. We can try to account for genome size by estimating abundances based upon coverage. However, bacterial genomes vary in size, and if the reads come from unsequenced organisms, this can be difficult.
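The arithmetic behind this slide can be sketched in Python. The sketch assumes the figure shows two cells of A and one cell of B, which is consistent with the 67%/33% true profile. It also assumes that shotgun reads are drawn uniformly from all DNA in the sample, so a taxon's share of reads scales with cell count times genome size. Genome sizes are in hypothetical relative units.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {taxon: w / total for taxon, w in weights.items()}


cells = {"A": 2, "B": 1}            # assumed from the figure: 2 cells of A, 1 of B
genome_size = {"A": 1.0, "B": 2.0}  # B's genome is twice as large (relative units)

# Shotgun reads are drawn from all DNA, so read share ~ cells x genome size.
expected_reads = {t: cells[t] * genome_size[t] for t in cells}
naive_profile = normalize(expected_reads)

# Dividing read share by genome size recovers the cell proportions,
# but only when every genome size is known.
corrected_profile = normalize(
    {t: expected_reads[t] / genome_size[t] for t in cells}
)

print(naive_profile)      # {'A': 0.5, 'B': 0.5}
print(corrected_profile)  # A ≈ 0.667, B ≈ 0.333
```

The correction depends on `genome_size` being known, and that is exactly what is missing for reads from unsequenced organisms.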

11 Single copy marker-based profiling
Population of 2 bacteria, A and B. B has a genome twice as large as A's. Each has a single copy of gene C. (Figure: two cells of A, one cell of B.) True profile: 67% A, 33% B. Profile estimated from reads: 67% A, 33% B.
Our focus is on phylogeny-based methods. Phylogeny-based methods try to infer the relationship of the query sequence to the known reference sequences using a phylogeny. This allows us to infer information even about novel sequences.
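The following sketch shows why a single-copy marker removes the bias. It assumes two cells of A and one of B, with reads sampled uniformly from the DNA. Under those assumptions, reads that fall on gene C scale with cell count times copy number times gene length, and genome size never enters the calculation. Treating gene C as having equal length in both taxa is an illustrative assumption.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {taxon: w / total for taxon, w in weights.items()}


cells = {"A": 2, "B": 1}          # assumed: 2 cells of A, 1 cell of B
marker_copies = {"A": 1, "B": 1}  # gene C is single-copy in both taxa
marker_length = 1.0               # same gene, so assume equal length (relative units)

# Reads that land on gene C scale with cells x copies x gene length.
# Genome size does not appear, so B's larger genome no longer inflates its share.
marker_reads = {t: cells[t] * marker_copies[t] * marker_length for t in cells}
marker_profile = normalize(marker_reads)

print(marker_profile)  # A ≈ 0.667, B ≈ 0.333
```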

12 TIPP: Taxonomic Identification And Phylogenetic Profiling
(Figure) Inputs: fragmentary unknown reads for a gene (ACCG, CGAG, CGG, GGCT, ACCT) and known full-length sequences for the gene (AGG...GCAT (species1), TAGC...CCA (species2), TAGA...CTT (species3), AGC...ACA (species4), ACT..TAGAA (species5)), together with an alignment and a tree. Method: an ensemble of HMMs plus statistics.
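Below is one way to turn an ensemble of HMMs into statistics, sketched under stated assumptions. A HMMER-style bit score s is log2 of the likelihood ratio P(read | HMM) / P(read | null model). That means 2^s can be normalized across the ensemble into a probability that each HMM generated the read, assuming a uniform prior. Keeping the smallest set of HMMs whose probabilities sum to a threshold then gives a support-based choice. This is meant to illustrate the idea, not to reproduce TIPP's implementation. The HMM names, bit scores, and the 0.95 threshold are all hypothetical.

```python
def hmm_support(bit_scores: dict[str, float]) -> dict[str, float]:
    """Convert bit scores into probabilities that each HMM generated the read.

    A bit score s = log2(P(read | HMM) / P(read | null)), so 2**s is a likelihood
    ratio; normalizing over the ensemble gives a posterior under a uniform prior.
    """
    best = max(bit_scores.values())
    # Shift by the best score before exponentiating to avoid overflow.
    weights = {h: 2.0 ** (s - best) for h, s in bit_scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def select_hmms(bit_scores: dict[str, float], threshold: float = 0.95) -> list[str]:
    """Smallest set of top-supported HMMs whose total support reaches threshold."""
    support = hmm_support(bit_scores)
    chosen, cumulative = [], 0.0
    for hmm, p in sorted(support.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(hmm)
        cumulative += p
        if cumulative >= threshold:
            break
    return chosen


# Hypothetical bit scores of one read against four subset HMMs.
scores = {"hmm_1": 52.0, "hmm_2": 50.0, "hmm_3": 41.0, "hmm_4": 30.0}
print(hmm_support(scores))
print(select_hmms(scores))  # ['hmm_1', 'hmm_2']
```

A 2-bit difference already means a 4x likelihood ratio, so a few top-scoring HMMs usually carry almost all of the support.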

13 TIPP: Taxonomic Identification And Phylogenetic Profiling
Nguyen et al., Bioinformatics, 2014
Pipeline: reads + marker genes → assign reads to marker genes → classify reads → compute profile
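The last two steps of this pipeline (classify reads, compute profile) can be sketched as follows. The sketch assumes each read has already been assigned to a marker gene and placed in the tree, so it has a support value for its taxon at each rank. A read is classified down to the deepest rank whose support meets a threshold. Reads that stop short of the requested rank are counted as "unclassified". The reads, support values, and the 0.95 threshold are invented for illustration and are not claimed to be TIPP's defaults.

```python
from collections import Counter

# Each read: (rank, taxon, support) from the top rank down; values are hypothetical.
Lineage = list[tuple[str, str, float]]


def classify(lineage: Lineage, threshold: float = 0.95) -> dict[str, str]:
    """Keep ranks from the top down until support first drops below threshold."""
    assigned = {}
    for rank, taxon, support in lineage:
        if support < threshold:
            break
        assigned[rank] = taxon
    return assigned


def profile_at(reads: list[Lineage], rank: str, threshold: float = 0.95) -> dict[str, float]:
    """Relative abundance at one rank; reads not classified that deep are 'unclassified'."""
    counts = Counter(classify(r, threshold).get(rank, "unclassified") for r in reads)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


reads = [
    [("genus", "Lactobacillus", 0.99), ("species", "L. crispatus", 0.97)],
    [("genus", "Lactobacillus", 0.98), ("species", "L. iners", 0.60)],
    [("genus", "Gardnerella", 0.99), ("species", "G. vaginalis", 0.99)],
    [("genus", "Lactobacillus", 0.90), ("species", "L. crispatus", 0.85)],
]

print(profile_at(reads, "genus"))    # Lactobacillus 0.5, Gardnerella 0.25, unclassified 0.25
print(profile_at(reads, "species"))  # L. crispatus 0.25, G. vaginalis 0.25, unclassified 0.5
```

Keeping "unclassified" in the denominator is a design choice. Renormalizing over classified reads only would give a different, more optimistic-looking profile.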

14 Abundance Profiling
Objective: distribution of the species (or genera, or families, etc.) within the sample.
Leading techniques: PhymmBL (Brady & Salzberg, Nature Methods 2009); NBC (Rosen, Reichenberger, and Rosenfeld, Bioinformatics 2011); MetaPhyler (Liu et al., BMC Genomics 2011), from the Pop Lab at the University of Maryland; MetaPhlAn (Segata et al., Nature Methods 2012), from the Huttenhower Lab at Harvard; mOTU (Sunagawa et al., Nature Methods 2013), from the Bork Lab.
MetaPhyler, MetaPhlAn, and mOTU are marker-based techniques (but use different marker genes).
Make a diagram to emphasize the differences between genome-based and marker-based methods.

15 “Hard” genome datasets (known genomes and high indel error)
On the hard datasets, the reads come from known genomes (i.e., all methods have seen the genomes that the reads come from) but have high rates of sequencing error. On these datasets there was a large separation between the methods. The y-axis shows the distance to the true profile, so lower is better, and the x-axis shows the taxonomic level at which the error is measured. This column is for long reads, and this one is for short reads. Note: NBC, MetaPhlAn, and MetaPhyler cannot classify any sequences from at least one of the high-indel long-sequence datasets. mOTU terminates with an error message on all the high-indel datasets.
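The slide does not name the distance used on the y-axis. As an assumption, the sketch below uses the L1 distance between two profiles. That is the sum of absolute differences in relative abundance over all taxa, which ranges from 0 for identical profiles to 2 for profiles with no taxa in common. It is applied to the genome-based example: a true profile of 67%/33% versus an estimate of 50%/50%.

```python
def l1_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute abundance differences; taxa missing from a profile count as 0."""
    taxa = set(p) | set(q)
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)


true_profile = {"A": 2 / 3, "B": 1 / 3}
estimated_profile = {"A": 0.5, "B": 0.5}

print(l1_distance(true_profile, estimated_profile))  # ≈ 0.333
# A spurious taxon in the estimate is penalized too:
print(l1_distance(true_profile, {"A": 2 / 3, "C": 1 / 3}))  # ≈ 0.667
```

Computing the distance separately at each taxonomic rank (species, genus, family, ...) gives one point per rank, which matches the per-level layout described for the plot.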

16 “Novel” genome datasets
Note: mOTU terminates with an error message on the long-fragment datasets and the high-indel datasets.

17 TIPP Compared To Other Profiling Methods
TIPP is highly accurate, even in the presence of novel genomes and high sequencing error. All other methods are less robust. Accurate profiles can be estimated using only a portion of the reads.

18 Do Individual Primates From The Same Species Have Personal Microbiomes?
To answer this question, we need longitudinal data from many individuals, so we went ahead and collected it.

19 Humans have personalized microbiomes
Fierer et al., PNAS 2010 showed that you can identify who had previously used a keyboard via the residual contact microbiome (three individuals in the study).
Recent research has shown that individual humans have personalized microbiomes. In 2010, Fierer showed that you could identify who had used which keyboard by comparing the residual contact microbiome on the keyboard with the skin microbiome of the user.

20 Experimental Design
Dataset (unpublished; in preparation): data collected by Patton’s Lab at U of Washington. Longitudinal study of the vaginal, rectal, and fecal microbiomes of 39 captive female pigtailed macaques. Weekly matched paired samples taken over a period of a month from each individual. 16S rRNA amplicon sequencing. TIPP (Nguyen et al. 2014) used to generate profiles.
Questions: How do the microbiomes differ by body site and individual? Can we identify an individual based upon the microbiome?
Add picture of macaques.

21 Experimental Design
(Figure: samples from Week 1, Week 2, and Week 3, with the question: which individual?)
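Here is a minimal sketch of the identification question, assuming a nearest-neighbor rule. It compares a later sample's profile with each individual's earlier reference profile and picks the closest one. Bray-Curtis dissimilarity, a standard ecological distance, is used here as an assumption; the slides do not say which measure or classifier was used. The animal IDs, taxa, and profiles are invented.

```python
def bray_curtis(p: dict[str, float], q: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|p - q| / sum(p + q); 0 means identical."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def identify(query: dict[str, float], references: dict[str, dict[str, float]]) -> str:
    """Return the individual whose reference profile is closest to the query."""
    return min(references, key=lambda ind: bray_curtis(query, references[ind]))


# Hypothetical week-1 reference profiles, one per animal.
references = {
    "macaque_1": {"taxon_1": 0.7, "taxon_2": 0.2, "taxon_3": 0.1},
    "macaque_2": {"taxon_1": 0.1, "taxon_2": 0.6, "taxon_3": 0.3},
    "macaque_3": {"taxon_1": 0.3, "taxon_2": 0.3, "taxon_3": 0.4},
}
week3_sample = {"taxon_1": 0.15, "taxon_2": 0.55, "taxon_3": 0.3}

print(identify(week3_sample, references))  # macaque_2
```

To use several body sites together, one simple option is to key each taxon by (site, taxon) before comparing. The fecal, rectal, and vaginal profiles then form one combined profile per animal.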

22 Identification Results
Body sites used: vaginal; fecal; rectal (0.769); fecal+rectal; fecal+vaginal; rectal+vaginal; fecal+rectal+vaginal (0.917).
2 matched paired samples very different from original donor

23 Future Directions
Expanding the marker set, both in the number of species and in the number of genes. A statistical approach to combining profiles from different marker genes. Developing TIPP for the virobiome.
Jigsaw analogy

24 Acknowledgements
Illinois: Tandy Warnow, Rebecca Stumpf, Bryan White, Mike Nute, Brenda Wilson
UCSD: Siavash Mirarab
UMD: Mihai Pop, Bo Liu
U of Copenhagen: Alonzo Alfaro-Núñez, Tom Hansen, Anders Hansen
Funding: NSF, University of Alberta
Double it

25 Questions?
TIPP tutorial tomorrow, 10:00-11:00, in MR7
Instructions for downloading at
Tutorial at tutorial.md
I am a computer scientist who works on developing algorithms for biology.


Download ppt "TIPP: Taxonomic Identification And Phylogenetic Profiling"

Similar presentations


Ads by Google