Published by Nancy Hampton. Modified over 8 years ago.
1
Tara A. Gianoulis, Jeroen Raes
April 13, 2010
Presenter: Quan Zhang
2
Introduction
Data collection
Three methods
  Linear model (LM)
  Canonical correlation analysis (CCA)
  Discriminative partition matching (DPM)
Results: three case studies
  Energy conversion strategies
  Balancing amino acid synthesis vs. import
  Lipid and glycan metabolism
Conclusion
Discussion
3
It is critical to understand both how the environment influences microbial communities and how microbes reshape their environment.
Direct sequencing: the first large-scale technique that lets us see the functions of these microbial communities.
Evidence for genomic adaptations: comparative metagenomics approaches examining sequence composition, genome size, evolutionary rates, and metabolic capabilities in different environments.
4
A one-dimensional representation of the environmental metabolic profiles for microbes sampled from nine environments. Dinsdale EA, et al. (2008) Functional metagenomic profiling of nine biomes. Nature 452:629–632
5
Previous studies used a coarse definition of environment, for example marine vs. land.
This study treats environments explicitly as a set of continuous features, for example temperature and sample depth.
It defines the metabolic footprint of distinct environments.
Footprint: the set of metabolic pathways that depend on or covary with the environment.
6
Data collection
Global Ocean Survey (GOS) dataset, filter size 0.1-0.8 µm
  Discard Sargasso Sea 11
  Remaining: 37 sites from CAMERA
Environmental features
  Temperature, sample depth, water depth, salinity, and monthly average chlorophyll level
Processing feature data
  Average the salinity for all nonzero values (except the freshwater site)
  Corroborate the missing measurements using the World Ocean Database
8
Assign each peptide to a particular site using a mapping algorithm that cross-references reads, scaffolds, and peptides based on predicted gene coordinates.
9
The “multiple sites” peptide distribution is similar to the distribution of all peptides, which suggests there are no major differences in assembly quality.
10
Assign each peptide to a pathway
  Similarity search tool: BLASTP
  Database: STRING 7.0 (current version: STRING 8.2)
  Threshold: bitscore > 60, with 80% consistency among the top 5 hits
Assign a pathway frequency to each site
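The thresholding rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(pathway_id, bitscore)` input format, and the toy hits are hypothetical. "80% consistency" is read here as "at least 80% of the retained top hits map to the same pathway", and the per-site frequency is taken as the fraction of assigned peptides, neither of which the slide spells out.

```python
from collections import Counter


def assign_pathway(hits, min_bitscore=60.0, top_n=5, min_agreement=0.8):
    """Return the consensus pathway for one peptide, or None if there is none.

    hits: iterable of (pathway_id, bitscore) pairs from a similarity search.
    """
    # Keep hits above the bitscore threshold, best first, and look at the top N.
    passing = sorted((h for h in hits if h[1] > min_bitscore),
                     key=lambda h: h[1], reverse=True)[:top_n]
    if not passing:
        return None
    pathway, count = Counter(p for p, _ in passing).most_common(1)[0]
    # Assumption: "consistency" is the share of retained hits that agree.
    return pathway if count / len(passing) >= min_agreement else None


def pathway_frequencies(peptide_hits):
    """Map {peptide_id: hits} for one site to {pathway_id: fraction of assigned peptides}."""
    counts = Counter()
    for hits in peptide_hits.values():
        pathway = assign_pathway(hits)
        if pathway is not None:
            counts[pathway] += 1
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


# Toy data for a single site (hypothetical peptides and scores).
site = {
    "pep1": [("photosynthesis", 210), ("photosynthesis", 180), ("photosynthesis", 150),
             ("photosynthesis", 120), ("glycolysis", 90)],   # 4 of 5 agree
    "pep2": [("glycolysis", 300), ("glycolysis", 200), ("tca_cycle", 150),
             ("tca_cycle", 100), ("glycolysis", 80)],         # only 3 of 5 agree
    "pep3": [("glycolysis", 55)],                             # below the threshold
    "pep4": [("glycolysis", 95), ("glycolysis", 70)],         # 2 of 2 agree
}
freqs = pathway_frequencies(site)
```

Peptides without a consensus pathway are simply left unassigned in this sketch, so they do not contribute to any pathway frequency.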
11
Build two matrices
  Environmental matrix: rows are sites, columns are environmental features
  Metabolic matrix: rows are sites, columns are metabolic features
12
Determine the first-order relationships between each pair of metabolic and environmental features
Two directions:
  Environmental factor as the variable to be predicted, from a subset of pathway frequencies
  Pathway frequency as the variable to be predicted, from the environmental factors
Determine the subset of predictive variables:
  Stepwise regression
  Akaike’s information criterion (AIC)
  The top 20 pathways showing the highest pairwise correlation were used
Limitations:
  Views each feature in isolation
  There are hidden dependencies among the environmental features
Ref: http://en.wikipedia.org/wiki
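As a rough illustration of stepwise selection scored by AIC, here is a minimal pure-Python sketch. It is a forward-only variant (the slide does not say whether steps were forward, backward, or both), it uses the Gaussian AIC up to an additive constant, and the function names and toy data are hypothetical. In the study the candidate set would be the top 20 correlated pathways; here there are just two candidates.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _orthogonalize(v, basis):
    """Subtract from v its projection onto each mutually orthogonal basis vector."""
    r = list(v)
    for b in basis:
        coef = _dot(r, b) / _dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of fitted coefficients."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def forward_stepwise(predictors, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    n = len(y)
    basis = [[1.0] * n]                     # intercept column
    resid = _orthogonalize(y, basis)        # residual of the intercept-only model
    best = aic(_dot(resid, resid), n, len(basis))
    selected, remaining = [], sorted(predictors)
    while remaining:
        trials = []
        for name in remaining:
            q = _orthogonalize(predictors[name], basis)
            qq = _dot(q, q)
            if qq < 1e-12:                  # collinear with chosen columns: skip
                continue
            coef = _dot(resid, q) / qq
            new_resid = [r - coef * qi for r, qi in zip(resid, q)]
            score = aic(_dot(new_resid, new_resid), n, len(basis) + 1)
            trials.append((score, name, q, new_resid))
        if not trials:
            break
        score, name, q, new_resid = min(trials, key=lambda t: t[:2])
        if score >= best:                   # no candidate improves AIC
            break
        best, resid = score, new_resid
        basis.append(q)
        selected.append(name)
        remaining.remove(name)
    return selected, best


# Toy example: predict one environmental feature from two pathway frequencies.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15]
y = [2.0 * a + e for a, e in zip(x1, noise)]
selected, best_aic = forward_stepwise({"x1": x1, "x2": x2}, y)
```

Because each accepted column is orthogonalized against the columns already in the model, the residual sum of squares after adding a candidate can be updated without refitting the whole regression.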
13
Predicting specific environmental parameters from subsets of metabolic pathways. Gianoulis T A et al. PNAS 2009;106:1374-1379 ©2009 by National Academy of Sciences
14
Canonical correlation analysis (CCA)
  Determines whether a global relationship between environmental and metabolic features exists
  Calculates the relative contribution of each feature to the global relationship by weighting both sets of features simultaneously
Discriminative partition matching (DPM)
  Analyzes whether groupings of sites based on similar environmental features also share functional (pathway) similarities
15
Looks at the relationships between two groups of variables
  Species variables vs. environmental variables (community ecology)
  Genetic variables vs. environmental variables (population genetics)
[Figure: data layout with units and two groups of variables, X's and Y's]
Ref: http://myweb.dal.ca/hwhitehe/BIOL4062/redundancy.ppt
16
Given a linear combination of the X variables, $F = f_1 X_1 + f_2 X_2 + \dots + f_p X_p$, and a linear combination of the Y variables, $G = g_1 Y_1 + g_2 Y_2 + \dots + g_q Y_q$:
The first canonical correlation is the maximum correlation coefficient between $F$ and $G$, over all $F$ and $G$.
  $F_1 = \{f_{11}, f_{12}, \dots, f_{1p}\}$ and $G_1 = \{g_{11}, g_{12}, \dots, g_{1q}\}$ are the corresponding canonical variates (dimensions).
The second canonical correlation is the maximum correlation coefficient between $F$ and $G$, over all $F$ orthogonal to $F_1$ and all $G$ orthogonal to $G_1$.
  $F_2 = \{f_{21}, f_{22}, \dots, f_{2p}\}$ and $G_2 = \{g_{21}, g_{22}, \dots, g_{2q}\}$ are the corresponding second canonical variates (dimensions).
Ref: http://myweb.dal.ca/hwhitehe/BIOL4062/redundancy.ppt
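In matrix form, the first canonical correlation defined above can be written as a single optimization problem. This is the standard textbook formulation, not something shown on the slide. The covariance symbols $\Sigma_{XX}$, $\Sigma_{YY}$, and $\Sigma_{XY}$ are notation introduced here, and both within-set covariance matrices are assumed to be invertible.

```latex
% f = (f_1, ..., f_p)^T and g = (g_1, ..., g_q)^T are the coefficient vectors of F and G.
% \Sigma_{XX}, \Sigma_{YY}: within-set covariances; \Sigma_{XY} = \Sigma_{YX}^{\top}: cross-covariance.
\rho_1 = \max_{f,\,g} \operatorname{corr}(F, G)
       = \max_{f,\,g}
         \frac{f^{\top} \Sigma_{XY}\, g}
              {\sqrt{f^{\top} \Sigma_{XX}\, f}\;\sqrt{g^{\top} \Sigma_{YY}\, g}}

% The maximizers solve an eigenvalue problem; the k-th largest eigenvalue is \rho_k^2.
\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}\, f = \rho^{2} f,
\qquad
g \propto \Sigma_{YY}^{-1} \Sigma_{YX}\, f
```

The eigenvector for the largest eigenvalue gives the first pair of coefficient vectors, the next eigenvector gives the second pair, and so on.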
19
Energy conversion
Amino acid metabolism
Lipid synthesis and glycan metabolism
20
For environmental metadata
  Cluster sites based on their quantitative environmental metadata
  Two or more clusters
For metabolism matrices
  Partition the sites in the metabolism matrices into 2 site sets
  Calculate the mean frequency of each pathway in each site set
21
If the mean pathway frequencies of the 2 site sets are not significantly different: the environment-based partitioning does not reflect functional differences
If they do differ significantly: environmental features are related to that specific aspect of metabolism
Specifically, the Benjamini-Hochberg procedure was used to correct the p-values for multiple testing
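For reference, the Benjamini-Hochberg adjustment itself is short enough to sketch in pure Python. The function name, the pathway names, and the p-values below are made up for illustration; in practice the p-values would come from the per-pathway comparison of the two site sets.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-pathway p-values from comparing two site sets.
p_by_pathway = {"photosynthesis": 0.001, "atp_synthase": 0.008,
                "lipid_synthesis": 0.039, "glycolysis": 0.041, "tca_cycle": 0.60}
names = list(p_by_pathway)
q_values = dict(zip(names, benjamini_hochberg([p_by_pathway[n] for n in names])))
significant = [n for n in names if q_values[n] <= 0.05]
```

With these toy numbers, two pathways that pass an uncorrected 0.05 cutoff (`lipid_synthesis` and `glycolysis`) no longer pass after correction.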
22
When a two-sample t-test is performed on a gene, the p-value measures how significant the difference between the two groups of samples is. Ref: http://www.silicongenetics.com/Support/GeneSpring/GSnotes/analysis_guides/mtc.pdf
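For concreteness, one common form of the two-sample test statistic is written out below. The slide does not say which variant was used, so the unequal-variance (Welch) form here is an assumption. The symbols $\bar{x}_i$, $s_i^2$, and $n_i$ (mean, sample variance, and size of group $i$) are notation introduced for this sketch.

```latex
% Welch's two-sample t statistic for groups 1 and 2.
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

% Approximate degrees of freedom (Welch-Satterthwaite).
\nu \approx
  \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
       {\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}

% Two-sided p-value, with T_\nu following a t distribution with \nu degrees of freedom.
p = 2 \, P\left( T_{\nu} \ge |t| \right)
```

A small p-value means that a difference in group means this large would be unlikely if the two groups had the same true mean.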
23
Similarities
  Both are used to explore relationships between metabolism and quantitative environmental parameters
Differences
  DPM
    All environmental variables are equally important when defining the site sets
    Robust to noise
    May lose individual differences among sites and their relationships to the environment
  CCA
    Weights each environmental feature and each metabolic pathway independently
    More sensitive, but more susceptible to noise
24
NMI stands for normalized mutual information.
NMI measures how well one classification predicts a second classification.
If both the NMI and the transposed NMI scores are high, then each classification is good at predicting the other.
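A small Python sketch can make the asymmetry behind "NMI" and "transposed NMI" concrete. It assumes that the score is the mutual information divided by the entropy of the classification being predicted, and that "transposed" means swapping the two roles. The slide does not state the normalization, so this is one plausible reading, and the function names and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def nmi(predictor, target):
    """Share of the target's entropy explained by the predictor (0 to 1)."""
    h = entropy(target)
    return mutual_information(predictor, target) / h if h > 0 else 0.0


# Toy labelings of four sites: a fine partition and a coarser one.
fine = [0, 1, 2, 3]
coarse = [0, 0, 1, 1]
score = nmi(fine, coarse)        # fine labels fully determine the coarse ones
transposed = nmi(coarse, fine)   # coarse labels only partly determine the fine ones
```

In this toy case the score in one direction is high and the transposed score is lower, which is the situation the slide contrasts with two classifications that predict each other well.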
25
Energy conversion strategies
Balancing amino acid synthesis vs. import
Lipid and glycan metabolism
26
Many of the environmentally dependent pathways were associated with energy conversion.
Ample diversification in energy conversion strategies was observed.
This helps organisms maintain adequate energy levels despite changing environmental conditions.
27
Light capture and electron transport
ATP synthase
28
Phenomenon: metabolic pathways associated with amino acid and cofactor transport and metabolism varied greatly with environment.
This variation may be a way to cope with the oligotrophic (nutrient-limited) nature of the oceans.
Example: changes in amino acid uptake strategies
29
Amino acid uptake is sensitive to light availability, which could be an additional factor in their variation.
30
Temperature and chlorophyll had the strongest influence on the metabolic pathways.
31
Phenomenon: the correlation of amino acid biosynthesis pathways with the environment was unrelated to the energetic cost of synthesizing a particular amino acid.
There was a significant positive correlation between the structural correlation of the amino acid pathways and their dependence on potentially limiting cofactors.
Import of exogenous amino acids may be preferred when cofactors are limiting.
32
Methionine is a central amino acid in oceanic microorganisms.
Cobalamin is a methionine cofactor containing cobalt.
Reduction of methionine is caused by cofactor limitation.
Observation: synthesis of methionine and cobalamin, amino acid transporters, methionine degradation
Thus, methionine has a significant role in shaping downstream environmental adaptations.
33
Lipids and glycans are important components of the microbial cell membrane.
As expected, lipid and glycan metabolism were related to environmental conditions.
Explanation: depth contributed significantly to lipid metabolism, since microbes need to maintain the buoyancy that is optimal for growth.
34
This method associates microbial community functions with quantitative, continuous features of the environment.
Metabolic pathway footprints can be used to predict environmental conditions when those data are not available.
35
Five environmental features (temperature, sample depth, water depth, salinity, and monthly average chlorophyll level) alone cannot fully describe real-world environmental complexity.
<0.3% of proteins in the GOS dataset were characterized as viral, but the true proportion is expected to be much higher.
Other questions?