1
Comparative Expression
Moran Yassour
3
Goal
- Build a multi-species gene-coexpression network
- Find functions of unknown genes
- Discover how the genes interact
- Distinguish accidentally regulated genes from those that are physiologically important
4
Construction of a gene-coexpression network
Evolutionarily diverse organisms with extensive microarray data: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae.
We first associated genes from one organism with their orthologous counterparts in the other organisms.
5
Evolution 101 Paralogs vs. Orthologs
7
Construct a metagene
- For each gene, find its best BLAST hit in each of the other organisms (human, fly, worm, yeast).
- Ignore non-reciprocal hits.
- Identify connected components; each component is a metagene (MEG).
Using this method, we assigned each gene to at most a single metagene.
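The metagene construction can be sketched in Python as below. This is a minimal illustration, not the authors' code: the `best_hits` dictionary layout and the name `build_metagenes` are hypothetical, and the BLAST searches are assumed to have been run already. Reciprocal best hits become edges, and each connected component becomes one metagene, so every gene ends up in at most one metagene.

```python
from collections import defaultdict

def build_metagenes(best_hits):
    """Group genes into metagenes using reciprocal best BLAST hits.

    best_hits (hypothetical layout): maps (organism, gene) to a dict
    {other_organism: (other_organism, best_hit_gene)}.
    Returns a list of metagenes, each a sorted list of (organism, gene) pairs.
    """
    # Keep only reciprocal best hits: a's best hit is b AND b's best hit is a.
    adjacency = defaultdict(set)
    for a, hits in best_hits.items():
        for b in hits.values():
            if best_hits.get(b, {}).get(a[0]) == a:
                adjacency[a].add(b)
                adjacency[b].add(a)

    # Each connected component of the reciprocal-hit graph is one metagene.
    seen, metagenes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        metagenes.append(sorted(component))
    return metagenes

# Toy example: H1, F1 and W1 are one another's best hits; Y1 -> H2 is one-way.
best_hits = {
    ("human", "H1"): {"fly": ("fly", "F1"), "worm": ("worm", "W1")},
    ("fly", "F1"): {"human": ("human", "H1"), "worm": ("worm", "W1")},
    ("worm", "W1"): {"human": ("human", "H1"), "fly": ("fly", "F1")},
    ("yeast", "Y1"): {"human": ("human", "H2")},
    ("human", "H2"): {"yeast": ("yeast", "Y9")},
}
print(build_metagenes(best_hits))
```

The one-directional hit Y1 -> H2 is dropped, so Y1 and H2 belong to no metagene in this toy input.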
9
Some numbers
In total we have 6307 metagenes (6591 human genes, 5180 worm genes, 5802 fly genes and 2434 yeast genes).
We sought to identify pairs of metagenes that were not only coexpressed in one experiment in one organism but also correlated across diverse experiments in multiple organisms.
10
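Edges in the graph
In each organism (human, fly, worm), the metagenes are ranked by their coexpression with MEG1; in the example, MEG2 receives the ranks {2, 4, 2}.
Is the rank vector {2, 4, 2} significant (P-value < 0.05)? If so, draw an edge between MEG1 and MEG2.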
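The slide does not say which test turns a rank vector such as {2, 4, 2} into a P-value. The sketch below uses a deliberately simple stand-in, not necessarily the statistic used in the study: under the null hypothesis each organism's rank is assumed uniform over the n candidate metagenes and independent of the other organisms, so the chance that all k ranks are at most the worst observed rank is (r_max / n)^k. The function names and the choice n = 6306 (every metagene except MEG1 is assumed to be a candidate) are hypothetical.

```python
import numpy as np

def rank_of(target, source, expr):
    """Rank (1 = most correlated) of metagene `target` among all metagenes other
    than `source`, ordered by Pearson correlation with `source`.

    expr: metagenes x experiments matrix for ONE organism.
    """
    corr = np.corrcoef(expr)[source]
    others = np.delete(np.arange(expr.shape[0]), source)
    order = others[np.argsort(-corr[others], kind="stable")]
    return int(np.flatnonzero(order == target)[0]) + 1

def rank_vector_pvalue(ranks, n_candidates):
    """Simple stand-in statistic: probability that k independent ranks, each
    uniform on 1..n_candidates, are all <= the worst observed rank."""
    worst = max(ranks) / n_candidates
    return worst ** len(ranks)

# Toy organism: 4 metagenes x 5 experiments; metagene 0 plays the role of MEG1.
toy_expr = np.array([[1, 2, 3, 4, 5],
                     [1, 2, 3, 4, 6],
                     [5, 4, 3, 2, 1],
                     [1, 3, 2, 5, 4]], dtype=float)
print([rank_of(t, 0, toy_expr) for t in (1, 2, 3)])

# Ranks of MEG2 relative to MEG1 in human, fly and worm, as on the slide.
p = rank_vector_pvalue([2, 4, 2], n_candidates=6306)
draw_edge = p < 0.05
```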
11
Statistical tests (1) – permuted metagenes
We constructed a network from a set of permuted metagenes (random collections of genes from each organism).
At P < 0.05, the real networks contained 3.5 ± 0.03 times as many interactions as the random networks.
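A minimal sketch of the permuted-metagene control, assuming each metagene is stored as a dictionary from organism to gene IDs (a hypothetical layout). Genes are shuffled within each organism while every metagene keeps its per-organism size; keeping the sizes is an assumption, since the slide only says "random collection of genes from each organism". The permuted metagenes would then go through the same edge test as the real ones, and the edge counts are compared.

```python
import random

def permute_metagenes(metagenes, seed=None):
    """Permuted metagenes: each keeps its per-organism size, but its genes are
    drawn at random (without replacement) from that organism's genes.

    metagenes (hypothetical layout): list of dicts {organism: [gene IDs]}.
    """
    rng = random.Random(seed)
    # Pool all genes of each organism that appear in any metagene, then shuffle.
    pools = {}
    for metagene in metagenes:
        for organism, genes in metagene.items():
            pools.setdefault(organism, []).extend(genes)
    for pool in pools.values():
        rng.shuffle(pool)
    # Refill every metagene slot from the shuffled pools.
    permuted = []
    for metagene in metagenes:
        new_metagene = {}
        for organism, genes in metagene.items():
            k = len(genes)
            new_metagene[organism] = pools[organism][:k]
            pools[organism] = pools[organism][k:]
        permuted.append(new_metagene)
    return permuted

metagenes = [{"human": ["H1", "H2"], "fly": ["F1"]},
             {"human": ["H3"], "fly": ["F2", "F3"]}]
permuted = permute_metagenes(metagenes, seed=1)
print(permuted)
```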
12
Statistical tests (2) – half the data
We split the microarray data into two halves and built a separate network from each half.
We then counted the fraction of interactions that were significant in one network (P < 0.05), given that they were significant in the other network at P < p, for various values of p.
At p = 0.05, 41% of the significant expression interactions in one network were also significant in the other.
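The split-half comparison reduces to a conditional fraction. The sketch below assumes that every metagene pair has a P-value in each of the two half-data networks; `reproducibility` is a hypothetical helper name and the P-values are toy numbers.

```python
import numpy as np

def reproducibility(p_first, p_second, p_threshold, alpha=0.05):
    """Fraction of interactions with P < alpha in the second network among
    those with P < p_threshold in the first network.

    p_first, p_second: P-values of the SAME metagene pairs in the two networks
    built from the two halves of the data.
    """
    p_first = np.asarray(p_first, dtype=float)
    p_second = np.asarray(p_second, dtype=float)
    selected = p_first < p_threshold
    if not selected.any():
        return float("nan")  # no interaction passes the threshold
    return float(np.mean(p_second[selected] < alpha))

p_half_a = [0.01, 0.02, 0.20, 0.001]
p_half_b = [0.03, 0.50, 0.01, 0.04]
for p in (0.05, 0.005):
    print(p, reproducibility(p_half_a, p_half_b, p))
```

Sweeping `p_threshold` over several values traces the curve described on the slide.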
13
Statistical tests (3) – noise stability
We added increasing levels of Gaussian noise to the entire data set for each of the organisms.
[Figure: negative log P-value of each interaction in the real network plotted against its negative log P-value after adding noise.]
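A rough sketch of the noise test for one organism, assuming a genes-by-experiments matrix. As a simplification, it compares the gene-gene correlation coefficients before and after noise instead of the edge P-values, and it expresses the noise level relative to the standard deviation of the data; both choices are assumptions, not the study's protocol.

```python
import numpy as np

def noise_stability(expr, noise_sd, seed=0):
    """Correlation between gene-gene correlations before and after adding
    Gaussian noise (a stand-in for comparing edge P-values).

    expr: genes x experiments matrix; noise_sd is relative to expr's overall SD.
    """
    rng = np.random.default_rng(seed)
    noisy = expr + rng.normal(0.0, noise_sd * expr.std(), size=expr.shape)
    upper = np.triu_indices(expr.shape[0], k=1)  # each gene pair once
    real_r = np.corrcoef(expr)[upper]
    noisy_r = np.corrcoef(noisy)[upper]
    return float(np.corrcoef(real_r, noisy_r)[0, 1])

rng = np.random.default_rng(42)
expr = rng.normal(size=(30, 20))
for noise_sd in (0.0, 0.5, 1.0, 10.0):
    print(noise_sd, round(noise_stability(expr, noise_sd), 3))
```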
14
Visualization
- x-y plane: negative logarithm of the P-value
- K-means clustering
- z axis: density of genes in the region
15
Example – Component 5
A total of 241 metagenes, 110 of which were previously known to be involved in the cell cycle, out of 202 cell cycle metagenes in the whole network (P-value < 10^-85).
Of the 241 metagenes in the component:
- 30 regulate the cell cycle
- 80 have terminal cell cycle functions
- 131 are of unknown function
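The slide does not name the enrichment test. A hypergeometric upper-tail probability is one standard choice, and the sketch below computes it exactly with integer arithmetic. It assumes the background is all 6307 metagenes, which the slide does not state, so the printed number is illustrative only; under that assumption it comes out below 10^-85.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_total, n_annotated, n_drawn, n_overlap):
    """Exact upper-tail hypergeometric P-value, P(X >= n_overlap).

    n_total: metagenes in the background (assumed here: all metagenes)
    n_annotated: background metagenes with the annotation (cell cycle)
    n_drawn: metagenes in the component
    n_overlap: annotated metagenes found in the component
    """
    upper = min(n_annotated, n_drawn)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_drawn - k)
        for k in range(n_overlap, upper + 1)
    )
    # Python divides the exact integers and rounds only once, to a float.
    return tail / comb(n_total, n_drawn)

# Component 5: 110 of its 241 metagenes are among the 202 cell cycle metagenes.
p = hypergeom_enrichment_pvalue(6307, 202, 241, 110)
print(p)
```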
16
Experimental validation (1) – expression data
Five metagenes with a significant number of links to known cell proliferation genes were chosen.
Their expression levels were measured in dividing pancreatic cancer cells and in nondividing normal cells.
17
Experimental validation (2) – loss-of-function mutant
Loss-of-function mutant phenotype for one of these genes (C. elegans gene ZK652.1):
RNA interference (RNAi) of ZK652.1 resulted in excess nuclei in the germ line, suggesting that the wild-type function of this gene is to suppress germline proliferation.
18
Multi-species vs. single species (1)
For each gene of the five metagenes, we constructed an organism-specific neighborhood.
On average, the neighborhoods of these five genes were over four times more enriched for cell proliferation and cell cycle genes in the multiple-species network than in the best single-species neighborhood.
19
Multi-species vs. single species (2)
The networks were compared on two criteria:
- coverage: linking together genes that were previously known to be involved in a single function
- accuracy: excluding genes not known to participate in that function
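One way to make the two criteria concrete, assumed here rather than taken from the slides, is to treat them like recall and precision for a set of genes that the network links together. The function name and the gene IDs are hypothetical.

```python
def coverage_and_accuracy(linked_genes, known_genes):
    """Score genes that the network links together against genes known to
    share one function (a recall/precision-style reading of the criteria).

    coverage: fraction of the known genes that were linked together.
    accuracy: fraction of the linked genes that are known to have the function.
    """
    linked, known = set(linked_genes), set(known_genes)
    hits = len(linked & known)
    coverage = hits / len(known) if known else 0.0
    accuracy = hits / len(linked) if linked else 0.0
    return coverage, accuracy

coverage, accuracy = coverage_and_accuracy({"a", "b", "c", "x"}, {"a", "b", "c", "d", "e"})
print(coverage, accuracy)
```

A network scores well only if it raises both numbers at once: linking everything maximizes coverage, and linking almost nothing can maximize accuracy.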
20
Huge data
The multiple-species network was built from more DNA microarray data (3182 arrays) than any single-species network.
Constructing the network from only 979 DNA microarrays (as many as in the worm data set) gave similar results.
21
Summary - Multi is good
- We map only genes that have orthologs in other species, and thus focus strongly on core, conserved biological processes.
- Interactions in the multiple-species network imply a functional relationship based on evolutionary conservation.
- Nice to have: analysis of other components.
23
Goal
Comparative study of large datasets of expression profiles from six evolutionarily distant organisms.
24
Goal
Coexpression is often conserved.
- Compare the regulatory relationships between particular functional groups in the different organisms.
- Compare global topological properties of the transcription networks derived from the expression data, using a graph-theoretical approach.
25
Homologous gene with preserved function
26
Coexpression conservation
Coexpressed groups: yeast transcription modules.
For each yeast module we constructed five “homologue modules”, one for each of the other organisms.
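Building the homologue modules amounts to mapping each yeast module through a homology table for every other organism. The sketch below is illustrative only: the table layout (organism, then yeast gene, then list of homologues), the function name and all gene IDs are hypothetical.

```python
def homologue_modules(module, homologues):
    """Map one yeast module into every other organism.

    module: yeast gene IDs.
    homologues (hypothetical layout): {organism: {yeast_gene: [homologue IDs]}}.
    Returns {organism: sorted homologue IDs}; yeast genes without homologues
    in an organism simply contribute nothing to that organism's module.
    """
    return {
        organism: sorted({h for gene in module for h in table.get(gene, [])})
        for organism, table in homologues.items()
    }

homologues = {
    "worm": {"y1": ["w1"], "y2": ["w2", "w3"]},
    "fly": {"y2": ["f1"]},
}
modules = homologue_modules(["y1", "y2", "y3"], homologues)
print(modules)
```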
27
Refining homologue modules
The signature algorithm identifies those homologues that are coexpressed under a subset of the experimental conditions.
Furthermore, it reveals additional genes that are not homologous to any of the original genes but display a similar expression pattern under those conditions.
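A simplified sketch of one signature-algorithm step, assuming a genes-by-conditions matrix: condition scores are the average normalized expression of the input genes, conditions with an extreme score are kept, and all genes are then rescored over those conditions only. The thresholds `t_cond` and `t_gene` and the normalization details are assumptions, not values from the slides. Genes that pass the gene threshold need not be in the input set, which is how non-homologous genes with a similar pattern can join the module.

```python
import numpy as np

def zscore(x, axis=None):
    """Standardize to mean 0 and standard deviation 1 along `axis`."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

def signature_step(expr, input_genes, t_cond=1.0, t_gene=2.0):
    """One simplified signature-algorithm step.

    expr: genes x conditions matrix; input_genes: indices of a homologue module.
    Returns (refined gene indices, selected condition indices).
    """
    e_gene = zscore(expr, axis=1)  # each gene normalized across conditions
    e_cond = zscore(expr, axis=0)  # each condition normalized across genes
    # Condition score: mean normalized expression of the input genes.
    cond_scores = e_gene[input_genes].mean(axis=0)
    conds = np.flatnonzero(np.abs(zscore(cond_scores)) > t_cond)
    # Gene score: expression over the selected conditions, weighted by their scores.
    gene_scores = e_cond[:, conds] @ cond_scores[conds]
    genes = np.flatnonzero(zscore(gene_scores) > t_gene)
    return genes, conds

# Toy data: genes 0-9 share an induced pattern in conditions 0-4.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.5, size=(100, 20))
expr[:10, :5] += 6.0
genes, conds = signature_step(expr, input_genes=[0, 1, 2, 3, 4])
print(genes, conds)
```

Only genes 0-4 are given as input, yet the refined set also picks up genes 5-9, which share their expression pattern.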
28
Correlation distribution
[Figure: distribution of the Z-scores for the average gene–gene correlation of all the “homologue modules”.]
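A sketch of how such a Z-score could be computed for one module. The reference distribution here comes from random gene sets of the same size as the module; that background, the function name and the toy data are assumptions, since the slide does not describe how the Z-scores were obtained.

```python
import numpy as np

def module_correlation_zscore(expr, module, n_random=1000, seed=0):
    """Z-score of a module's average gene-gene correlation against random gene
    sets of the same size (the random background is an assumption).

    expr: genes x conditions matrix; module: list of gene indices.
    """
    rng = np.random.default_rng(seed)
    corr = np.corrcoef(expr)

    def mean_pairwise_corr(genes):
        sub = corr[np.ix_(genes, genes)]
        return sub[np.triu_indices(len(genes), k=1)].mean()  # each pair once

    observed = mean_pairwise_corr(np.asarray(module))
    background = np.array([
        mean_pairwise_corr(rng.choice(expr.shape[0], size=len(module), replace=False))
        for _ in range(n_random)
    ])
    return float((observed - background.mean()) / background.std())

rng = np.random.default_rng(1)
expr = rng.normal(size=(100, 20))
expr[:10, :5] += 3.0  # genes 0-9 share a pattern, as a conserved module might
z_module = module_correlation_zscore(expr, list(range(10)), n_random=200)
z_random = module_correlation_zscore(expr, list(range(50, 60)), n_random=200)
print(z_module, z_random)
```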
29
Higher-order regulatory structures
30
Cell Cycle Experiments
31
Subsets of the data
Correlations between the sets of conditions for randomly selected subsets of the data.
Although the data are sparse, the findings reflect real properties of the expression network.
32
Decomposition of the expression data
The expression data were decomposed into a set of transcription modules using the iterative signature algorithm (ISA).
Modules are colored according to the fraction of homologues they possess in the other organism.
[Figure: module map; labeled module: protein synthesis.]
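The iterative signature algorithm feeds each output gene set back in as the next input until the set stops changing. The sketch below is a simplified single-seed version; the normalization and the thresholds `t_cond` and `t_gene` are assumptions. ISA is typically started from many random seed sets and the distinct fixed points are collected as modules, which this sketch leaves out.

```python
import numpy as np

def zscore(x, axis=None):
    """Standardize to mean 0 and standard deviation 1 along `axis`."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

def isa(expr, seed_genes, t_cond=1.0, t_gene=2.0, max_iter=50):
    """Simplified iterative signature algorithm started from one seed gene set.

    Repeats a signature step until the gene set no longer changes
    (a fixed point, i.e. a transcription module) or max_iter is reached.
    """
    e_gene = zscore(expr, axis=1)  # genes normalized across conditions
    e_cond = zscore(expr, axis=0)  # conditions normalized across genes
    genes = np.unique(seed_genes)
    conds = np.array([], dtype=int)
    for _ in range(max_iter):
        cond_scores = e_gene[genes].mean(axis=0)
        conds = np.flatnonzero(np.abs(zscore(cond_scores)) > t_cond)
        gene_scores = e_cond[:, conds] @ cond_scores[conds]
        new_genes = np.flatnonzero(zscore(gene_scores) > t_gene)
        if new_genes.size == 0 or np.array_equal(new_genes, genes):
            return new_genes, conds  # empty result or fixed point reached
        genes = new_genes
    return genes, conds

rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.5, size=(100, 20))
expr[:10, :5] += 6.0  # planted module: genes 0-9, conditions 0-4
module_genes, module_conds = isa(expr, seed_genes=[0, 1, 2])
print(module_genes, module_conds)
```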
33
Power-law connectivity distribution
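A power-law connectivity distribution means that the fraction of genes with k connections falls off as a power of k, so the distribution appears as a straight line on a log-log plot. The exponent γ is not given on the slide; C is a constant.

```latex
P(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\log P(k) = -\gamma \log k + C
```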
34
Connections & Connectivity
- Connections between genes of similar connectivity are enhanced (red regions).
- Connections between highly and weakly connected genes are suppressed (blue regions).
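One common way to quantify enhanced and suppressed connections, not necessarily the one behind this figure, is to compare the observed number of edges between connectivity classes with the number expected if edge endpoints were paired at random while every gene keeps its degree (the configuration model). Ratios above 1 correspond to enhanced connections and ratios below 1 to suppressed ones. The degree binning and the function name are assumptions.

```python
import numpy as np

def degree_class_enrichment(edges, n_nodes, bin_edges):
    """Observed / expected edge counts between connectivity (degree) classes.

    edges: undirected (u, v) pairs; bin_edges: increasing degree cut-offs.
    Expected counts follow the configuration model, where edge endpoints
    ("stubs") are paired at random while every node keeps its degree.
    """
    degree = np.zeros(n_nodes, dtype=int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    cls = np.digitize(degree, bin_edges)  # connectivity class of every node
    n_cls = len(bin_edges) + 1
    observed = np.zeros((n_cls, n_cls))
    for u, v in edges:  # count each edge from both ends so the matrix is symmetric
        observed[cls[u], cls[v]] += 1
        observed[cls[v], cls[u]] += 1
    stubs = np.bincount(cls, weights=degree, minlength=n_cls)  # total degree per class
    expected = np.outer(stubs, stubs) / (2 * len(edges))
    with np.errstate(divide="ignore", invalid="ignore"):
        return observed / expected  # NaN where a class has no edge endpoints

# Toy star graph: hub 0 linked to leaves 1-3; classes: degree < 2, degree >= 2.
ratio = degree_class_enrichment([(0, 1), (0, 2), (0, 3)], n_nodes=4, bin_edges=[2])
print(ratio)
```

In the toy star graph, hub-to-leaf edges come out enhanced (ratio 2) and leaf-to-leaf edges suppressed (ratio 0), the opposite of the pattern shown on the slide.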
35
Essentiality & Connectivity
The likelihood that a gene is essential increases with its connectivity.
36
Homology & Connectivity
Highly connected genes are more likely to have homologues in the other organisms.
37
Summary
Similarity at low resolution, differences at high resolution:
- All expression networks share common topological properties (scale-free connectivity distribution, high degree of modularity).
- The modular components of each transcription program, as well as their higher-order organization, appear to vary significantly between organisms and are likely to reflect organism-specific requirements.
38
Future
- Gene expression studies
- Evolution studies
39
Thank you …