TIPP: Taxonomic Identification And Phylogenetic Profiling

Presentation transcript:

TIPP: Taxonomic Identification And Phylogenetic Profiling. Nam-phuong Nguyen, Computer Science and Engineering, University of California, San Diego. I would like to first thank Tandy for introducing me and IGB for hosting this event. I would also like to thank everyone in attendance. Today I will talk about TIPP, a method I developed for what I call microbial forensics, and I will describe what I mean by that in this talk.

Precision Medicine. Personalized treatment based upon the patients' phenotypes and genotypes. Precision Medicine Initiative launched with $215M in 2016. Many different aspects, including genomics, epigenetics, and the microbiome. Precision medicine is a new paradigm for healthcare: the idea is to create personal treatments for patients based upon their phenotype and genotype. This is a hot topic, and in 2016 the Precision Medicine Initiative launched with $215M in funding. Precision medicine takes many different characteristics of the patient into account, including genomics, epigenetics, and the microbiome. Image courtesy of gurdanhealth.com


Human Microbiome. 10 times more bacterial cells than human cells. Important role in regulating health. Disruption associated with risk factors for diseases. Analysis through metagenomics. We have more bacterial cells than human cells, so it's no surprise that bacteria play a very important role in regulating our health. Bacteria help us extract energy from our food and help maintain a healthy vaginal environment. Dysbiosis, or disruption of the microbiome, is often associated with risk factors for diseases, including bacterial vaginosis and diarrhea. Two of the key questions in understanding the microbiome are who is there and how much of each; we call the answer an abundance profile, and we answer these questions with metagenomics. Image courtesy of humanlongevity.com

Metagenomics. Analyzing DNA sequences from an environmental sample. Typical datasets contain millions of reads. I'm going to discuss this idea of microbial forensics under the framework of metagenomics.

Fundamental Questions. What is the identity of a read? What is the microbial profile of a sample? What genes/functions are present?


Metagenomic Taxon Identification. Objective: classify short reads in a metagenomic sample.

Abundance Profiling. Objective: distribution of the species (or genera, or families, etc.) within the sample. For example, the distribution of a sample at the species level might be: Species A: 10%, Species B: 25%, Species C: 55%, Species D: 1%, Species E: 9%. The second, related problem is known as abundance profiling. The connection between abundance profiling and identification is that we will be using identification to solve profiling.
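As a minimal sketch of how identification feeds profiling: if each read has already been given a taxon label, the profile is just the normalized label counts. The function name `abundance_profile` and the convention that unclassified reads are marked `None` are assumptions made for illustration; they are not taken from TIPP.

```python
from collections import Counter

def abundance_profile(read_labels):
    """Return {taxon: fraction} from per-read taxon labels.

    Reads labeled None (unclassified) are ignored, which is an
    assumption of this sketch rather than a rule from the talk.
    """
    counts = Counter(label for label in read_labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}

# Toy input reproducing the species-level example on the slide,
# plus a few unclassified reads.
labels = (["A"] * 10 + ["B"] * 25 + ["C"] * 55 + ["D"] * 1 + ["E"] * 9
          + [None] * 7)
profile = abundance_profile(labels)
for taxon in sorted(profile):
    print(f"Species {taxon}: {profile[taxon]:.0%}")
```

Dropping unclassified reads before normalizing keeps the fractions summing to one; whether that is appropriate depends on how the unclassified reads are distributed across taxa.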

Genome-based profiling. Population of 2 bacteria, A and B; B has twice as large a genome as A. [Figure: cells labeled A, A, B.] True profile: 67% A, 33% B. Profile estimated from reads: 50% A, 50% B. E. coli genome variation can be as large as 20%. We can try to take genome size into account by estimating abundances based upon coverage. However, bacterial genomes vary in size, and if the reads come from unsequenced organisms, this can be difficult.

Single copy marker-based profiling. Population of 2 bacteria, A and B; B has twice as large a genome as A. [Figure: cells labeled A, A, B.] Each has a single copy of gene C. True profile: 67% A, 33% B. Profile estimated from reads: 67% A, 33% B. Our focus is on using phylogeny-based methods. Phylogeny-based methods try to infer the relationship of the query sequence to the known reference sequences using a phylogeny. This allows us to infer information about novel sequences.
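The arithmetic behind the two slides can be sketched as follows, assuming reads are sampled uniformly along whatever sequence is being profiled, so the expected share of reads from a taxon is proportional to (number of cells) times (length of the profiled sequence). The function name and the unit lengths are illustrative assumptions.

```python
def expected_read_profile(cell_counts, target_length):
    """Expected fraction of reads per taxon under uniform sampling.

    cell_counts: {taxon: number of cells in the sample}
    target_length: {taxon: length of the sequence reads are drawn from},
                   either the whole genome or a single-copy marker gene.
    """
    weights = {t: cell_counts[t] * target_length[t] for t in cell_counts}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

cells = {"A": 2, "B": 1}            # true profile: 67% A, 33% B
genome_len = {"A": 1.0, "B": 2.0}   # B's genome is twice as large as A's
marker_len = {"A": 1.0, "B": 1.0}   # one copy of gene C in each genome

genome_based = expected_read_profile(cells, genome_len)
marker_based = expected_read_profile(cells, marker_len)
print("genome-based:", genome_based)
print("marker-based:", marker_based)
```

With whole genomes, B's larger genome contributes as many reads as the two A cells combined, which is why the estimate lands at 50/50. Restricting to a single-copy gene of equal length removes the genome-size term and recovers the true 2:1 ratio.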

TIPP: Taxonomic Identification And Phylogenetic Profiling. Input: fragmentary unknown reads for a gene (ACCG, CGAG, CGG, GGCT, …, ACCT). Reference: known full-length sequences for a gene, and an alignment and a tree: AGG...GCAT (species1), TAGC...CCA (species2), TAGA...CTT (species3), AGC...ACA (species4), ACT..TAGAA (species5). Method: ensemble of HMMs + statistics.

TIPP: Taxonomic Identification And Phylogenetic Profiling (Nguyen et al., Bioinformatics, 2014). Pipeline: reads; assign reads to marker genes; classify the reads for each marker gene; compute the profile.
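The three-step flow on this slide can be sketched as a small skeleton. This is only an outline under stated assumptions: `assign_marker` and `classify` are hypothetical callables supplied by the caller, not TIPP functions, and pooling counts across marker genes is one simple way to combine them, not necessarily what TIPP does.

```python
from collections import Counter, defaultdict

def profile_reads(reads, assign_marker, classify):
    """Bin reads by marker gene, classify each binned read, return a profile.

    assign_marker(read) -> marker name, or None if the read hits no marker.
    classify(marker, read) -> taxon label, or None if unclassified.
    Both callables are placeholders for whatever tools do the real work.
    """
    # Step 1: assign reads to marker genes.
    binned = defaultdict(list)
    for read in reads:
        marker = assign_marker(read)
        if marker is not None:
            binned[marker].append(read)

    # Step 2: classify the reads within each marker gene.
    counts = Counter()
    for marker, marker_reads in binned.items():
        for read in marker_reads:
            taxon = classify(marker, read)
            if taxon is not None:
                counts[taxon] += 1

    # Step 3: compute the profile (counts pooled across markers).
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}

# Toy stand-ins: lookup tables instead of real alignment and placement.
marker_of = {"r1": "geneC", "r2": "geneC", "r3": "geneD"}  # r4 hits no marker
taxon_of = {"r1": "A", "r2": "A", "r3": "B"}
profile = profile_reads(
    ["r1", "r2", "r3", "r4"],
    marker_of.get,
    lambda marker, read: taxon_of.get(read),
)
print(profile)
```

Passing the two steps in as callables keeps the skeleton independent of any particular binning or classification tool.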

Abundance Profiling. Objective: distribution of the species (or genera, or families, etc.) within the sample. Leading techniques: PhymmBL (Brady & Salzberg, Nature Methods 2009); NBC (Rosen, Reichenberger, and Rosenfeld, Bioinformatics 2011); MetaPhyler (Liu et al., BMC Genomics 2011), from the Pop Lab at the University of Maryland; MetaPhlAn (Segata et al., Nature Methods 2012), from the Huttenhower Lab at Harvard; mOTU (Bork et al., Nature Methods 2013). MetaPhyler, MetaPhlAn, and mOTU are marker-based techniques (but use different marker genes). Make a diagram to emphasize the differences between genome-based and marker-based methods.

“Hard” genome datasets (known genomes and high indel error). On the hard datasets, the reads come from known genomes (i.e., all methods have seen the genomes that the reads come from) but have high rates of sequencing error, and there was a large separation between the methods. What I'm showing is distance to the true profile on the y-axis, so lower is better, and on the x-axis is the error at the different taxonomic levels. This column is for long reads, and this one is for short reads. Note: NBC, MetaPhlAn, and MetaPhyler cannot classify any sequences from at least one of the high indel long sequence datasets. mOTU terminates with an error message on all the high indel datasets.
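The slide does not name the distance used on the y-axis. As one illustration of a "distance to the true profile", the sketch below uses the Hellinger distance, a common choice for comparing two abundance distributions; treating it as the metric behind these plots would be an assumption.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two profiles given as {taxon: fraction}.

    Taxa missing from one profile are treated as having abundance 0.
    The result lies in [0, 1]; 0 means the profiles are identical.
    """
    taxa = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2
        for t in taxa
    )
    return math.sqrt(squared / 2.0)

truth = {"A": 2 / 3, "B": 1 / 3}
estimate = {"A": 0.5, "B": 0.5}
print(f"distance to true profile: {hellinger(truth, estimate):.3f}")
```

Computing the distance separately at each taxonomic level (species, genus, family, and so on) would give one value per level, matching the layout of the x-axis described above.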

“Novel” genome datasets. Note: mOTU terminates with an error message on the long fragment datasets and high indel datasets.

TIPP Compared To Other Profiling Methods. TIPP is highly accurate, even in the presence of novel genomes and high sequencing error. All other methods are less robust. Accurate profiles can be estimated using only a portion of the reads.

Do Individual Primates From The Same Species Have Personal Microbiomes? To answer this question, we need longitudinal data from many individuals, so we went ahead and did that

Humans have personalized microbiomes. Recent research has shown that individual humans have a personalized microbiome. Fierer et al., PNAS 2010, showed that you can identify who had previously used a keyboard by comparing the residual contact microbiome on the keyboard with the skin microbiome of the user (three individuals in the study).

Experimental Design. Dataset (unpublished; in preparation): data collected by Patton's Lab at U of Washington. Longitudinal study of the vaginal, rectal, and fecal microbiome in 39 female captive pigtailed macaques. Weekly matched paired samples taken over a period of a month from each individual. 16S rRNA amplicon sequencing. TIPP (Nguyen et al. 2014) used to generate profiles. Questions: How do the microbiomes differ by body site and individual? Can we identify an individual based upon the microbiome? Add picture of macaques.

Experimental Design. [Figure: samples from Week 1, Week 2, Week 3.] Which individual?

Identification Results, by body site(s) used:
vaginal: 0.583
fecal: 0.744
rectal: 0.769
fecal+rectal: 0.859
fecal+vaginal: 0.846
rectal+vaginal: 0.846
fecal+rectal+vaginal: 0.917
2 matched paired samples very different from original donor.
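The talk does not say how a sample is matched to a donor. One simple way to picture the task, including combining body sites, is nearest-neighbor matching on profiles: sum a profile distance over the chosen sites and pick the closest reference individual. Everything in the sketch below (the L1 distance, the function names, the toy macaque IDs and profiles) is a hypothetical illustration, not the study's method or data.

```python
def l1_distance(p, q):
    """Sum of absolute abundance differences between two profiles."""
    taxa = set(p) | set(q)
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)

def identify(query, references, sites):
    """Return the reference individual closest to the query sample.

    query: {site: profile} for the unknown sample.
    references: {individual: {site: profile}} from earlier weeks.
    sites: body sites to combine; distances are summed across them.
    """
    def combined_distance(individual):
        return sum(
            l1_distance(query[s], references[individual][s]) for s in sites
        )
    return min(references, key=combined_distance)

# Made-up profiles over two made-up taxa, X and Y.
references = {
    "M1": {"fecal": {"X": 0.7, "Y": 0.3}, "rectal": {"X": 0.6, "Y": 0.4}},
    "M2": {"fecal": {"X": 0.2, "Y": 0.8}, "rectal": {"X": 0.5, "Y": 0.5}},
}
query = {"fecal": {"X": 0.65, "Y": 0.35}, "rectal": {"X": 0.58, "Y": 0.42}}

print(identify(query, references, ["fecal"]))
print(identify(query, references, ["fecal", "rectal"]))
```

Summing distances over sites lets a site with a weak signal be backed up by the others, which is one plausible reading of why the combined rows above score higher than any single site.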

Future Directions. Expanding the marker set, both in the number of species and genes. Statistical approach to combining profiles from different marker genes. Developing TIPP for virobiome. Jigsaw analogy.

Acknowledgements. Illinois: Tandy Warnow, Rebecca Stumpf, Bryan White, Mike Nute, Brenda Wilson. UCSD: Siavash Mirarab. UMD: Mihai Pop, Bo Liu. U of Copenhagen: Alonzo Alfaro-Núñez, Tom Hansen, Anders Hansen. Funding: NSF 09-35347, NSF 08-20709, NSF 0733029, University of Alberta. Double it

Questions? TIPP tutorial tomorrow at 10:00-11:00 in MR7. Instructions for downloading at https://github.com/smirarab/sepp/blob/master/README.TIPP.md. Tutorial at https://github.com/smirarab/sepp/blob/master/tutorial/tipp-tutorial.md. I am a computer scientist who works on developing algorithms for biology.