Published by Helen Brooks. Modified over 8 years ago.
1
Microbiomes and Computational Medicine Bryan A. White
2
Microbes rule the biosphere
- People: ≈ 6.87 × 10^9 (6,868,700,000)
- Bacteria in people (GI tract only): 1.5 × 10^22
- Stars: ≈ 10^24
- Bacteria on the planet: ≈ 10^30
3
The human microbiome, or the "other human genome" (image courtesy of the NIH HMP website, http://nihroadmap.nih.gov/hmp/)
- 1 × 10^14 microbial cells (microbiome)
- 3 × 10^6 microbial genes (metagenome)
- 1 × 10^13 human cells
- 2.5 × 10^4 human genes
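The slide's point is easiest to see as ratios. A quick check using only the slide's own estimates (the variable names are illustrative):

```python
# Order-of-magnitude estimates as quoted on the slide
microbial_cells = 1e14
human_cells = 1e13
microbial_genes = 3e6
human_genes = 2.5e4

cell_ratio = microbial_cells / human_cells    # microbial cells per human cell
gene_ratio = microbial_genes / human_genes    # microbial genes per human gene
print(f"cells {cell_ratio:.0f}:1, genes {gene_ratio:.0f}:1")
```

By these figures, microbial cells outnumber human cells about 10 to 1, and microbial genes outnumber human genes about 120 to 1.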
4
University of Illinois at Urbana-Champaign INSTITUTE FOR GENOMIC BIOLOGY
The Human Microbiome
Significant role in health, for example in the gastrointestinal tract:
- Fosters development of the mucosal wall.
- The development and maturation of the immune system depend on the presence of some members of the intestinal microbiota.
- Linked to human health and disease.
- Essential for the metabolism of certain compounds as well as xenobiotics.
- Protects against epithelial cell injury.
- Regulates host fat storage.
- Stimulates intestinal angiogenesis.
5
Consequences of a Perturbed Microbiome?
- Peptic ulcers
- Kidney stones
- Osteoporosis
- Obesity
- Diabetes
- Bowel disorders
- Cancer
- Pre-term birth
6
NIH Human Microbiome Project
2007 (the Jumpstart component):
- 200 reference genomes at 4 sequencing centers in the USA
- Light and in-depth 16S rDNA sequencing
- A total of 250 subjects to be recruited, with an estimated 30 sites per subject
2009 (RFA):
- Bring the entire reference collection up to 1,000 genomes
- Genomic sequencing of viruses and small eukaryotes
- In-depth metagenomic sequencing on the same subjects
- Other RFAs for development of tools and technologies to handle the HMP data
- Coordination with the international efforts
Total: ~$157M in NIH funding
7
The proliferation of human microbiome projects. Asher Mullard. Nature 453, 578-580 (2008)
8
Challenges with studying the human microbiome
- Involvement of clinicians: time, IRB, etc.
- Study groups: recruitment and maintenance
- Sample availability and quantity: Is it the right sample? How do you get enough DNA?
- Data analysis with a heavy emphasis on variable regions rather than full-length sequences
- Interpretation of data across different groups worldwide
- Do we have enough reference genomes for scaffolding?
11
HMP Metagenomics
Goal: Generate a healthy, well-defined reference cohort of specimens that will be used to analyze the microbiome of healthy adults by metagenomic analysis and to establish a reference data set.
Features:
- Developed and executed the study protocol
- Screened 554 subjects
- 300 enrollees: 150 females, 150 males
- Sampled 279 enrollees twice; sampled 100 enrollees three times
- Sampled body sites in healthy 18- to 40-year-olds
- 5 body sites: oral cavity, nares, skin, GI tract, and vagina
- 15 sites sampled for males; 18 sites sampled for females
- Collected 17,040 primary specimens
- Processed at JCVI, WashU, Broad and Baylor
12
“Healthy Cohort” Body Sites (slide courtesy of NHGRI)
- Oral: saliva; tongue dorsum; hard palate; buccal mucosa; keratinized (attached) gingiva; palatine tonsils; throat; supragingival plaque; subgingival plaque
- Skin: retroauricular crease, both ears (2); antecubital fossa (inner elbow), both arms (2)
- Nasal: anterior right and left nares (pooled)
- Gut: stool
- Vaginal: posterior fornix; midpoint; vaginal introitus
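The site list can be cross-checked against the sampling protocol (15 sites for males, 18 for females) with a small data structure. The grouping by area and the helper name `sites_for` are illustrative, not part of any HMP software.

```python
# HMP "healthy cohort" body sites grouped by body area (taken from the slide)
SITES = {
    "oral": ["saliva", "tongue dorsum", "hard palate", "buccal mucosa",
             "keratinized gingiva", "palatine tonsils", "throat",
             "supragingival plaque", "subgingival plaque"],
    "skin": ["retroauricular crease (left)", "retroauricular crease (right)",
             "antecubital fossa (left)", "antecubital fossa (right)"],
    "nasal": ["anterior nares (pooled)"],
    "gut": ["stool"],
    "vaginal": ["posterior fornix", "vaginal midpoint", "vaginal introitus"],
}

def sites_for(sex: str) -> list[str]:
    """Return the sampled sites for 'male' or 'female' subjects."""
    areas = [a for a in SITES if sex == "female" or a != "vaginal"]
    return [site for area in areas for site in SITES[area]]

print(len(sites_for("male")), len(sites_for("female")))  # 15 18
```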
13
Definition of Some Terms
- Microbiome: the collective microbial community; a microbial census of "who is there".
- Metagenome: the total functional gene content, and therefore the metabolic potential; a census of which genes are present in the microbiome.
- Phylotype: a microbial type defined at the class, family or genus level; it may also be a species or even a strain.
- OTU (operational taxonomic unit): a sequence-based descriptor, defined by 97% sequence similarity of the 16S rDNA gene.
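To make the 97% OTU definition concrete, here is a minimal greedy, centroid-based clustering sketch. It assumes the 16S sequences are already aligned to equal length and defines identity simply as the fraction of matching positions; real pipelines also handle gaps, abundance sorting and chimeras, none of which is modelled here. The function names are illustrative.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[int]]:
    """Assign each sequence to the first OTU whose centroid it matches at >= threshold.

    Sequences are visited in input order; a sequence that matches no existing
    centroid founds a new OTU and becomes that OTU's centroid.
    Returns OTUs as lists of input indices.
    """
    centroids: list[str] = []
    otus: list[list[int]] = []
    for i, s in enumerate(seqs):
        for k, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[k].append(i)
                break
        else:  # no centroid was close enough
            centroids.append(s)
            otus.append([i])
    return otus
```

For example, with a 100-base reference, a read differing at 1 position (99% identical) joins its OTU, while a read differing at 5 positions (95% identical) founds a new one.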
14
Terms
15
Methods used to investigate microbiomes
- Culture-independent approaches: 16S rRNA and other phylogenetic marker surveys (who is there)
- Limited whole-genome sequencing (reference genomes); single-cell and single-molecule sequencing on the horizon
- Subtractive hybridization studies (comparative genomics)
- Stable isotope probing: active populations
- Metagenomic sequencing: functional gene content (i.e., metabolic potential)
- Metatranscriptomics: which genes are expressed
- Metabolomics: which products are produced
17
Microbiome and Metagenomic Analysis [diagram relating DNA, RNA, 16S survey, microbiome, metagenomics, metatranscriptomics and metabolomics]
18
Biome-specific signatures based on phylogenetic content (16S rDNA analysis)
19
Pyrosequencing rDNA Tags for Deep Hypervariable-Region Amplicon Sequencing
20
Figure 4. Rarefaction curves. Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667
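A rarefaction curve plots how many OTUs one would expect to observe if only n of a sample's reads had been sequenced; a curve that flattens suggests the sequencing depth is close to saturating the observable diversity. One standard way to compute the expected count exactly is Hurlbert's (1971) formula, E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)], where N_i is the read count of OTU i and N is the total. The sketch below assumes per-OTU read counts as input; it is not necessarily how the cited figure was produced.

```python
from math import comb

def rarefy(counts: list[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n reads drawn without
    replacement (Hurlbert's formula). counts[i] is the read count of OTU i."""
    N = sum(counts)
    if not 0 <= n <= N:
        raise ValueError("subsample size must be between 0 and the total read count")
    total = comb(N, n)
    # 1 - C(N - Ni, n) / C(N, n) is the probability that OTU i is seen at least once
    return sum(1 - comb(N - Ni, n) / total for Ni in counts if Ni > 0)

def rarefaction_curve(counts: list[int], step: int = 1) -> list[tuple[int, float]]:
    """(depth, expected OTUs) points from 1 read up to the full sample."""
    N = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(1, N + 1, step)]
```

At depth 1 the expectation is always one OTU, and at full depth it equals the observed number of OTUs; the shape in between is what the curve shows.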
23
Tree Generation
Phylogenetic tree types:
- Distance matrix methods: UPGMA, neighbor joining
- Character state methods: maximum likelihood
24
What is a phylogenetic tree? A tree is a graphical representation of the relationships between organisms, species, or genomic sequences. In bioinformatics, it is based on genomic sequence.
25
What do they represent?
- Root: origin of evolution
- Leaves: current organisms, species, or genomic sequences
- Branches: relationships between organisms, species, or genomic sequences
- Branch length: evolutionary time (in a cladogram, branch length does not represent time)
26
Rooted / Unrooted trees
- Rooted tree: directed from a unique root node; a bifurcating rooted tree with n leaves has 2n - 1 nodes and 2n - 2 branches.
- Unrooted tree: shows the relatedness of the leaves without assuming any ancestry; a bifurcating unrooted tree with n leaves has 2n - 2 nodes and 2n - 3 branches.
https://www.nescent.org/wg_EvoViz/Tree
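The node and branch counts can be verified on a concrete tree. This sketch assumes a hypothetical encoding of a rooted tree as nested Python tuples (leaves are strings) and uses the fact that unrooting a bifurcating tree deletes the degree-2 root and fuses its two branches into one.

```python
def count_rooted(tree) -> tuple[int, int, int]:
    """Return (leaves, nodes, branches) for a rooted tree written as nested
    tuples, e.g. (("A", "B"), ("C", "D")); any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return 1, 1, 0
    leaves, nodes, branches = 0, 1, 0                # count this internal node
    for child in tree:
        l, n, b = count_rooted(child)
        leaves, nodes, branches = leaves + l, nodes + n, branches + b + 1  # +1 branch to child
    return leaves, nodes, branches

def count_unrooted(tree) -> tuple[int, int, int]:
    """Unrooting a bifurcating rooted tree removes the root node and merges
    its two branches: one node and one branch fewer."""
    leaves, nodes, branches = count_rooted(tree)
    return leaves, nodes - 1, branches - 1

t = ((("A", "B"), "C"), ("D", "E"))   # n = 5 leaves
print(count_rooted(t))     # (5, 9, 8): 2n - 1 nodes, 2n - 2 branches
print(count_unrooted(t))   # (5, 8, 7): 2n - 2 nodes, 2n - 3 branches
```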
27
More tree types used in bioinformatics (from the Cohen article)
- Unrooted tree
- Rooted tree
- Cladograms: branch lengths have no meaning
- Phylograms: branch lengths represent evolutionary change
- Ultrametric trees: branch lengths represent time, and the path length from the root to every leaf is the same
https://www.nescent.org/wg_EvoViz/Tree
28
How to construct a phylogenetic tree? Step 1: Make a multiple sequence alignment of the nucleotide or amino acid sequences (e.g., with MUSCLE or another multiple-alignment method).
29
How to construct a phylogenetic tree? Step 2: Check whether the multiple alignment reflects the evolutionary process. http://genome.cshlp.org/content/17/2/127.full
30
How to construct a phylogenetic tree? (cont.)
Step 3: Choose a tree-building method and, depending on the method, either calculate pairwise distances or work directly from the aligned characters.
Step 4: Verify the result statistically.
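One common way to carry out Step 4 is Felsenstein's nonparametric bootstrap (the slide does not say which test it has in mind): resample alignment columns with replacement, rebuild the tree from each pseudo-replicate with the same method, and report how often each clade reappears. The sketch below shows only the resampling step; the function names are illustrative.

```python
import random

def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """One pseudo-replicate: draw alignment columns with replacement, keeping
    each column intact across all sequences."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_replicates(alignment: list[str], n_reps: int = 100,
                         seed: int = 0) -> list[list[str]]:
    """n_reps pseudo-replicates; each would be fed to the same tree-building method."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]
```

Clade support is then the fraction of replicate trees that contain the clade; that tree-building and counting step is not shown.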
31
Distance Matrix methods
- Calculate all pairwise distances between the leaves (taxa)
- Construct a tree from those distances
- Good for continuous characters
- Not very accurate
- Fastest method
Examples: UPGMA, neighbor joining
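The first step of any distance method is the pairwise distance itself. A common choice for DNA (assumed here; the slide names none) is the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites; it corrects the raw p-distance for multiple substitutions at the same site. A minimal sketch that skips gapped columns:

```python
import math

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def jc69(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf   # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """All ordered pairs (including self-pairs) mapped to their JC69 distance."""
    names = list(seqs)
    return {(i, j): jc69(seqs[i], seqs[j]) for i in names for j in names}
```

The corrected distance is always at least the raw p-distance, and the gap between them widens as sequences diverge.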
32
UPGMA
- Abbreviation of "Unweighted Pair Group Method with Arithmetic Mean"
- Originally developed for numerical taxonomy in 1958 by Sokal and Michener
- The simplest tree-construction algorithm, so it's fast!
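A compact sketch of UPGMA, assuming the distance matrix is supplied as a Python dict keyed by ordered name pairs (a hypothetical input format). At each step it merges the closest pair of clusters, places the new node at half their distance, and sets distances to the merged cluster to the average over all original leaf pairs (the "unweighted arithmetic mean").

```python
def upgma(names: list[str], dist: dict[tuple[str, str], float]):
    """UPGMA clustering; dist[(a, b)] must exist for every ordered pair of names.

    Returns (tree, height): tree is a nested tuple, height is the root's
    height (half of the final merge distance); leaves sit at height 0."""
    size = {n: 1 for n in names}          # number of leaves under each cluster
    height = {n: 0.0 for n in names}
    d = {(a, b): dist[(a, b)] for a in names for b in names if a != b}
    clusters = list(names)
    while len(clusters) > 1:
        # closest pair among the current clusters
        a, b = min(((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
                   key=lambda p: d[p])
        new = (a, b)
        height[new] = d[(a, b)] / 2
        size[new] = size[a] + size[b]
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            # size-weighted average = plain mean over all original leaf pairs
            dc = (size[a] * d[(a, c)] + size[b] * d[(b, c)]) / size[new]
            d[(new, c)] = d[(c, new)] = dc
        clusters.append(new)
    root = clusters[0]
    return root, height[root]
```

On an ultrametric matrix with d(A,B) = 2, d(A,C) = d(B,C) = 4 and every distance to D equal to 6, this joins A with B, then C, then D, with the root at height 3.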
33
Downside of UPGMA
- Assumes a molecular clock (i.e., that the evolutionary rate is approximately constant)
- The clustering works only if the data are ultrametric
- Doesn't work in the following case:
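Whether a distance matrix is ultrametric can be checked directly with the three-point condition: for every three leaves, the two largest of their three pairwise distances must be equal. A minimal check, assuming distances are stored in a dict keyed by ordered name pairs (a hypothetical format) and allowing a small floating-point tolerance:

```python
from itertools import combinations

def is_ultrametric(names: list[str], dist: dict[tuple[str, str], float],
                   tol: float = 1e-9) -> bool:
    """Three-point condition: in every triple of leaves, the two largest
    pairwise distances are equal (within tol)."""
    for i, j, k in combinations(names, 3):
        _, second, largest = sorted([dist[(i, j)], dist[(i, k)], dist[(j, k)]])
        if abs(largest - second) > tol:
            return False
    return True
```

If this returns False, UPGMA can still produce a tree, but it may group leaves incorrectly, which is the situation the slide warns about.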
34
Neighbor-joining method
- Developed in 1987 by Saitou and Nei
- Works in a similar fashion to UPGMA
- Still fast: works well for large datasets
- Doesn't require the data to be ultrametric
- Well suited to widely varying evolutionary rates
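A sketch of neighbor joining: at each step it picks the pair (i, j) that minimizes Q(i, j) = (r - 2) d(i, j) - S_i - S_j, where r is the number of remaining nodes and S_i is the sum of i's distances to all other remaining nodes. It then joins the pair through a new internal node and updates the distances to that node. The input format (a dict keyed by ordered pairs) and the internal node labels u1, u2, ... are hypothetical choices. The output is a single unrooted tree, returned as an edge list.

```python
def neighbor_joining(names: list[str], dist: dict[tuple[str, str], float]):
    """Neighbor joining (Saitou & Nei, 1987).

    dist[(a, b)] must be present for every ordered pair of distinct names.
    Returns the unrooted tree as a list of (node, node, branch_length) edges;
    internal nodes get hypothetical labels "u1", "u2", ...
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        r = len(nodes)
        # S_i: total distance from i to every other remaining node
        s = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # pick the pair minimizing Q(i, j) = (r - 2) d(i, j) - S_i - S_j
        i, j = min(((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
                   key=lambda p: (r - 2) * d[p] - s[p[0]] - s[p[1]])
        counter += 1
        u = f"u{counter}"
        # branch lengths from i and j to the new internal node u
        li = d[(i, j)] / 2 + (s[i] - s[j]) / (2 * (r - 2))
        edges += [(u, i, li), (u, j, d[(i, j)] - li)]
        nodes = [k for k in nodes if k not in (i, j)]
        for k in nodes:
            d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes.append(u)
    a, b = nodes          # connect the final two nodes directly
    edges.append((a, b, d[(a, b)]))
    return edges
```

When the input distances are exactly additive (they fit some tree perfectly), the path lengths in the returned tree reproduce them. With noisy data, some branch lengths can come out negative.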
35
Downside of Neighbor-joining
- Generates only one possible tree
- Generates only an unrooted tree
36
Character state methods
- Need discrete characters
- Maximum likelihood
- Maximum parsimony (will be covered by Kyle)
37
Maximum likelihood
- Originally developed for statistics by Ronald Fisher between 1912 and 1922
- Therefore, it uses an explicit statistical model
- Uses all the data
- Tends to outperform parsimony or distance matrix methods
38
How to construct a tree with maximum likelihood?
Step 1: Generate all possible tree topologies for the given number of leaves.
Step 2: For each tree, calculate the likelihood of the observed data, L(Tree) = P(data | tree), optimizing the branch lengths for each tree topology.
Step 3: Pick the tree with the highest likelihood.
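Steps 2 and 3 can be illustrated on the smallest possible case: two aligned DNA sequences joined by a single branch of length t (expected substitutions per site). The sketch assumes the Jukes-Cantor substitution model with equal base frequencies, which is an assumption because the slide names no model. It finds the t that maximizes the log-likelihood with a golden-section search. For this model the optimum should match the closed-form JC69 distance -3/4 ln(1 - 4p/3), which makes a handy sanity check. Larger trees apply the same idea over whole topologies, usually via Felsenstein's pruning algorithm, which is not shown here.

```python
import math

def jc69_loglik(t: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of a pairwise alignment with n_same identical and n_diff
    differing sites, given branch length t, under Jukes-Cantor."""
    e = math.exp(-4 * t / 3)
    p_same = 0.25 + 0.75 * e      # P(base unchanged after time t)
    p_diff = 0.25 - 0.25 * e      # P(change to one specific other base)
    # each site: stationary frequency 1/4 times the transition probability
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

def ml_branch_length(n_same: int, n_diff: int, lo: float = 1e-8,
                     hi: float = 10.0, iters: int = 200) -> float:
    """Golden-section search for the branch length maximizing jc69_loglik."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if jc69_loglik(c, n_same, n_diff) < jc69_loglik(d, n_same, n_diff):
            a = c     # maximum lies in [c, b]
        else:
            b = d     # maximum lies in [a, d]
    return (a + b) / 2
```

For 90 identical and 10 differing sites (p = 0.1), the search should return about 0.107, the closed-form JC69 distance.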
39
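Sounds really great? The number of possible bifurcating trees grows explosively with the number of leaves:
3 leaves: 1 unrooted tree (3 rooted)
5 leaves: 15 unrooted (105 rooted)
10 leaves: 2,027,025 unrooted (34,459,425 rooted)
13 leaves: 13,749,310,575 unrooted (316,234,143,225 rooted)
20 leaves: 221,643,095,476,699,771,875 unrooted (8,200,794,532,637,891,559,375 rooted)
Maximum likelihood is therefore very expensive and extremely slow to compute.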
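These counts follow the standard formulas for labelled bifurcating trees: (2n - 5)!! unrooted and (2n - 3)!! rooted topologies for n leaves, where k!! is the product of every other integer from k down to 1. A short check (the function names are illustrative):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * ... down to 1 (or 2); defined as 1 for k <= 0."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def n_unrooted(n: int) -> int:
    """Distinct unrooted bifurcating trees on n >= 3 labelled leaves: (2n - 5)!!"""
    return double_factorial(2 * n - 5)

def n_rooted(n: int) -> int:
    """Distinct rooted bifurcating trees on n >= 2 labelled leaves: (2n - 3)!!"""
    return double_factorial(2 * n - 3)

for n in (3, 5, 10, 13, 20):
    print(n, n_unrooted(n), n_rooted(n))
```

Because the count grows faster than exponentially, practical maximum-likelihood programs search tree space heuristically rather than scoring every topology.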
40
What microbial species are shared between sites and different species? Dethlefsen et al. Nature 2007 vol. 449 (7164) pp. 811-818
42
In adults, each part of the body supports a distinct microbial community, with no apparent relationship to gender, age, weight, ethnicity or race. HMP Consortium (2012) "Structure, Function and Diversity of the Human Microbiome in an Adult Reference Population". The Human Microbiome Consortium.
44
The microbiome is acquired anew each generation (Dominguez-Bello et al., 2010).
1) Infants obtain microbes from the mother or the environment (Palmer et al., 2007; Koenig et al., 2010).
2) Microbial succession takes place over ~1-2 years.
3) The microbiome becomes "adult-like" in ~1-2 years.
Dominguez-Bello et al. PNAS, June 29, 2010, vol. 107, no. 26, 11975
45
Microbe:Microbe Metabolic Interactions Can Influence Composition [figure panels labeled N=1, N=3, N=1, N=5, N=1]
46
Co-abundance: Pearson correlations as a proxy for testing the interdependent structure of a microbiome [scatter plot: abundance of OTU A vs. abundance of OTU B; Pearson's correlation = 0.70]
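The co-abundance idea can be sketched in a few lines: compute Pearson's r between the abundance profiles of every pair of OTUs across samples, and draw an edge when r clears a cutoff. The 0.7 threshold and the function names below are assumptions for illustration; the slide does not state a cutoff.

```python
import math
from itertools import combinations

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length abundance vectors
    (undefined if either vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coabundance_edges(table: dict[str, list[float]], r_min: float = 0.7):
    """Connect two OTUs when their abundances across samples correlate at
    r >= r_min (hypothetical cutoff). table maps OTU -> per-sample abundances."""
    return [(a, b) for a, b in combinations(table, 2)
            if pearson(table[a], table[b]) >= r_min]
```

For example, two OTUs that rise and fall together across four samples become connected, while an OTU that moves in the opposite direction does not.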
47
Number of Connections Formed Not Influenced by OTU Abundance
48
Number of Connections Formed Not Influenced by OTU Prevalence
50
Random/Exponential vs. Scale-free Networks
51
Loss of Scale-free Structure in Perturbed Howlers (slopes = -1.2 and -0.3)
52
Scale-Free Degree Distribution in Healthy Human Samples (slope = -1.2)
53
Degree Distribution Not Affected by Natural Plasticity (slopes = -1.2, -1.1, -1.3)
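The slopes on these slides describe the degree distribution of the co-abundance network: the fraction of OTUs, P(k), that have k connections, plotted on log-log axes, where a roughly straight line suggests scale-free structure. The slides do not say how the slopes were fitted, so this sketch simply assumes an ordinary least-squares fit of log10 P(k) against log10 k. The function names are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges) -> dict[int, float]:
    """Map degree k -> fraction of (connected) nodes with that degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    counts = Counter(deg.values())
    total = len(deg)
    return {k: c / total for k, c in sorted(counts.items())}

def loglog_slope(dist: dict[int, float]) -> float:
    """Least-squares slope of log10 P(k) against log10 k
    (needs at least two distinct degrees)."""
    xs = [math.log10(k) for k in dist]
    ys = [math.log10(p) for p in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

A steep negative slope means many weakly connected OTUs and a few hubs; a slope near zero, as in the perturbed samples, means that hub structure has largely disappeared.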
55
Biome-specific signatures based on functional gene content (Metagenome-Wide Association Studies, MWAS). Hugenholtz and Tyson. 2008. Nature 455:481.
56
Figure 2. Topics in the study of the human microbiome with outstanding computational biology challenges. Gevers D, Pop M, Schloss PD, Huttenhower C (2012) Bioinformatics for the Human Microbiome Project. PLoS Comput Biol 8(11): e1002779. doi:10.1371/journal.pcbi.1002779 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002779
57
Figure 1. Environmental Shotgun Sequencing (ESS). Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667
58
Figure 3. Fragment assembly. Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667
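Fragment assembly stitches overlapping reads back into longer contigs. The toy greedy assembler below repeatedly merges the two fragments with the longest suffix-prefix overlap. It is only an illustration of the idea, assuming error-free reads and a hypothetical minimum overlap of 3 bases. Real metagenomic assemblers use graph-based methods and must cope with sequencing errors, repeats, and reads from many genomes at once.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (at least min_len), or 0 if there is none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the pair of fragments with the largest
    overlap until no overlap of at least min_len remains."""
    frags = list(dict.fromkeys(reads))     # drop exact duplicates, keep order
    # drop reads fully contained in another read
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return frags                   # remaining contigs
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
```

For example, the reads ACGTTG, GTTGCAT and GCATGCC assemble into the single contig ACGTTGCATGCC.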
59
Nature, Vol 464, 4 March 2010
60
Enterotype and Vagiotype Concept
61
Enterotypes M Arumugam et al. Nature 000, 1-7 (2011) doi:10.1038/nature09944
62
Vagiotypes. Ravel et al. PNAS. www.pnas.org/cgi/doi/10.1073/pnas.1002611107
63
INFORMATICS
Tool development for data analysis: a distributed, scalable metagenomic analysis system using clouds. Goll et al. Bioinformatics (2010) 26 (20): 2631-2632.
JCVI Metagenomics Reports (METAREP):
- Data mining of metagenomic datasets from the HMP
- Rich web interface for analysis and comparison of annotated metagenomic datasets
- High-performance search engine to query large data collections
Distributed, cloud-based design for METAREP:
- Registry for metagenomic data at different institutes/labs; data queries run across all sites
- Metagenomic pipelines on the cloud: no need for local data centers, which benefits smaller labs
- Option to install pipelines on traditional data centers/clusters for security