Published by Alban Webb. Modified over 9 years ago.
2
How a cell is wired
[Diagram labels: DNA, mRNA, protein, small molecules, regulatory RNA, environment]
The dynamics of such interactions emerge as cellular processes and functions.
3
Molecular interaction networks
How do the genes and their products interact to collectively perform a function?
[Figure labels: A, B, Gene G, 35 RPM, Inhibitor, U2AF]
4
Functional interaction networks
A network containing genes connected to each other whenever they physically or functionally interact:
- Proteins that interact/co-complex (ribosomal, polymerase, etc.)
- Transcription factors and their targets
- Enzymes catalyzing different steps in the same metabolic pathway
- Genes with correlated expression
- Genes with similar phylogenetic profiles
5
Arabidopsis is the primary model organism for plants
Complex organization from the molecular to the whole-organism level. A key challenge is understanding the cellular machinery that sustains this complexity. In the current post-genomic era, a main aspect of this challenge is ‘gene function prediction’: identifying the functions of all the (~30,000) genes in the genome.
6
Extent of gene annotations in Arabidopsis
Total of ~30,000 genes in the genome:
- ~15% with some experimental annotation
- ~8% with ‘expert’ annotation
- ~13% with annotations based on manually curated computational analysis
- ~14% with electronic annotations
This leaves ~50% of the genome without any annotation.
Ashburner et al. (2000) Nat. Genet.; Swarbreck et al. (2008) Nucleic Acids Res.
7
Exploit high-throughput data
Integrating functional genomic data could lead to network models of gene interactions that resemble the underlying cellular map. Typically these networks contain gene functional interactions, connecting pairs of genes that participate in the same biological processes. In such a network, the very place of a gene establishes the functional context of that gene. ‘Guilt-by-association’: genes of unknown function can be imputed with the functions of their annotated neighbors.
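The guilt-by-association idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the study: the function name `neighbor_vote`, the toy gene names, and the edge weights are all hypothetical. Each unannotated gene is scored by the total weight of its direct links to genes already annotated with the function.

```python
from collections import defaultdict


def neighbor_vote(edges, annotated):
    """Rank unannotated genes by total edge weight to annotated neighbors.

    edges: iterable of (gene_u, gene_v, weight) for an undirected network.
    annotated: set of genes already annotated with the function of interest.
    Returns (gene, score) pairs, best first; genes with no annotated
    neighbor are omitted.
    """
    scores = defaultdict(float)
    for u, v, w in edges:
        if u in annotated and v not in annotated:
            scores[v] += w
        elif v in annotated and u not in annotated:
            scores[u] += w
    # Sort by descending score, breaking ties by gene name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network (names and weights are made up for illustration).
edges = [
    ("g1", "g2", 1.0),
    ("g1", "g3", 0.5),
    ("g2", "g3", 2.0),
    ("g1", "g4", 0.3),
    ("g4", "g5", 1.0),
]
annotated = {"g1", "g2"}
ranking = neighbor_vote(edges, annotated)
print(ranking)
```

In this toy case g3 ranks first because it is linked to both annotated genes, while g5 receives no score because it has no annotated neighbor.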
8
Functional interaction networks
Functional interaction network models have been developed for Arabidopsis.
Lee et al. (2010) Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana.
- Very comprehensive in using and integrating datasets from other organisms for application in plants.
- Integrated 24 datasets: 5 from Arabidopsis and the rest from other model organisms.
- AraNet: 19,647 genes, 1,062,222 interactions.
9
Goal of this study
We examine the state of network-based gene function prediction in Arabidopsis.
- Evaluate the performance of multiple prediction algorithms on AraNet.
- Assess the influence of the number of genes annotated to a function and the source of annotation evidence.
- Compute the correlation of prediction performance with network properties.
- Evaluate prediction performance for plant-specific functions.
10
Network-based gene function prediction algorithms
Approach: propagation of functional annotations across the network, or guilt-by-association using direct interactions.
Training examples: positive and negative examples, or only positive examples.
Algorithms: SinkSource; Hopfield; FunctionalFlow (multiple phases); Local (FunctionalFlow, 1 phase); Local+.
Each gene in the network
11
Network-based gene function prediction
12
Function A Function B Network-based gene function prediction
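The slides name SinkSource but do not spell out its update rule. The sketch below assumes the commonly described formulation: genes annotated with the function are sources fixed at 1, negative examples are sinks fixed at 0, and every other gene repeatedly takes the weighted average of its neighbors' scores. The function name, the iteration count, and the toy network are assumptions for illustration only.

```python
def sinksource(adj, positives, negatives, iterations=100):
    """SinkSource-style score propagation (illustrative sketch).

    adj: dict mapping gene -> {neighbor: weight}, symmetric.
    positives: genes fixed at score 1 (sources).
    negatives: genes fixed at score 0 (sinks).
    Returns a dict mapping gene -> score in [0, 1].
    """
    score = {g: 0.0 for g in adj}
    for g in positives:
        score[g] = 1.0
    free = [g for g in adj if g not in positives and g not in negatives]
    for _ in range(iterations):
        updated = dict(score)
        for g in free:
            total = sum(adj[g].values())
            if total > 0:
                # Weighted average of the neighbors' current scores.
                updated[g] = sum(w * score[n] for n, w in adj[g].items()) / total
        score = updated
    return score


# Hypothetical path network: source p - a - b - sink n, unit weights.
adj = {
    "p": {"a": 1.0},
    "a": {"p": 1.0, "b": 1.0},
    "b": {"a": 1.0, "n": 1.0},
    "n": {"b": 1.0},
}
scores = sinksource(adj, positives={"p"}, negatives={"n"})
print(scores)
```

On this path the scores settle near 2/3 for the gene next to the source and 1/3 for the gene next to the sink, so genes closer to known positives rank higher.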
13
In this study
[Figure labels: Sink, Source]
Recall: fraction of known examples predicted correctly = TP / (TP + FN)
Precision: fraction of predictions that are correct = TP / (TP + FP)
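As a worked illustration of these two definitions, the following sketch computes precision and recall for a set of predicted genes against a set of known annotations. The gene names are hypothetical.

```python
def precision_recall(predicted, known):
    """Return (precision, recall) for a set of predictions.

    predicted: set of genes predicted to have the function.
    known: set of genes known to have the function.
    """
    tp = len(predicted & known)   # predicted and known
    fp = len(predicted - known)   # predicted but not known
    fn = len(known - predicted)   # known but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: 5 known genes, 4 predictions, 2 of them correct.
known = {"g1", "g2", "g3", "g4", "g5"}
predicted = {"g1", "g2", "g9", "g10"}
precision, recall = precision_recall(predicted, known)
print(precision, recall)
```

Here TP = 2, FP = 2, and FN = 3, giving a precision of 2/4 and a recall of 2/5.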
14
Performance of different algorithms
Computational gene function prediction precedes and guides experimental validation. What we get is a ranked list of novel predictions. An experimenter would choose a manageable number of top-scoring predictions to pursue, so what matters is precision at the top of the prediction list. We choose precision at 20% recall (P20R) as the measure of performance.
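One way to compute P20R from a ranked prediction list is sketched below. It walks down the list until 20% of the known genes have been recovered and reports the precision at that cutoff. The slides do not say how ties or held-out genes were handled, so the function name and the toy ranking are assumptions.

```python
def precision_at_recall(ranked, known, recall_level=0.2):
    """Precision at the shortest prefix of `ranked` reaching `recall_level`.

    ranked: genes ordered from highest to lowest prediction score.
    known: set of genes known to have the function.
    Returns 0.0 if the recall level is never reached.
    """
    if not known:
        return 0.0
    tp = 0
    for k, gene in enumerate(ranked, start=1):
        if gene in known:
            tp += 1
            if tp / len(known) >= recall_level:
                return tp / k
    return 0.0


# Hypothetical example: 10 known genes (k1..k10), interleaved with misses.
known = {f"k{i}" for i in range(1, 11)}
ranked = ["k1", "x1", "k2", "x2", "k3", "x3", "k4"]
p20r = precision_at_recall(ranked, known)
print(p20r)
```

With 10 known genes, 20% recall means recovering 2 of them; that happens at rank 3 in this toy list, so P20R is 2/3.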
15
Performance of different algorithms
[Figure legend: 3rd quartile, median, 1st quartile; using only annotations based on experimental/expert evidence]
SS seems to be better than the other algorithms. What about the influence of the number of genes in a function?
16
Performance of different algorithms
[Figure: number of functions vs. number of genes annotated with a function, split into first, second, and third groups]
Each group contains ~125 functions.
17
Performance of different algorithms
For ‘small’ functions:
- The algorithm does not matter!
- Using just experimental annotations is better when you know little about a function.
For ‘medium’ functions:
- SS is a little better.
- The effect of using ‘electronic’ evidence is mixed.
For ‘large’ functions:
- SS is clearly the best.
- Using all annotations is better.
18
Performance of different algorithms
Wilcoxon test: SS vs. other algorithms
[Table columns: all evidence codes (ECs); sans IEA/ISS]
Overall, SinkSource appears to be the best algorithm.
19
Correlation of performance with network properties
Performance on a particular function might depend on how its genes are organized/connected among themselves in the network.
- Number of nodes
- Number of components
- Fraction of nodes in the largest connected component
- Total edge weight
- Weighted density
- Average weighted degree
- Average segregation
20
Correlation of performance with network properties
22
- Number of nodes = 9
- Number of components = 3
- Fraction of nodes in the largest connected component = 4/9
- Total edge weight = 8
- Weighted density = 8/36
- Average weighted degree = 16/9
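The values on this slide are consistent with the following definitions: weighted density is total edge weight divided by the number of possible node pairs (36 for 9 nodes), and average weighted degree is twice the total edge weight divided by the number of nodes. The sketch below computes all six properties for a hypothetical 9-node network; the slide's actual figure is not available, so the edges here are invented to reproduce the same numbers.

```python
from fractions import Fraction


def subnetwork_properties(nodes, edges):
    """Topological properties of the subnetwork induced by a function's genes.

    nodes: list of genes annotated with the function.
    edges: list of (u, v, weight) among those genes, undirected.
    """
    adj = {n: set() for n in nodes}
    total_weight = Fraction(0)
    for u, v, w in edges:
        adj[u].add(v)
        adj[v].add(u)
        total_weight += Fraction(w)

    # Connected components by depth-first search.
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)

    n = len(nodes)
    pairs = n * (n - 1) // 2
    return {
        "nodes": n,
        "components": len(sizes),
        "largest_fraction": Fraction(max(sizes), n) if n else Fraction(0),
        "total_weight": total_weight,
        "weighted_density": total_weight / pairs if pairs else Fraction(0),
        "avg_weighted_degree": 2 * total_weight / n if n else Fraction(0),
    }


# Hypothetical network: components of 4, 3, and 2 nodes, unit edge weights.
nodes = list("abcdefghi")
edges = [
    ("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1),
    ("e", "f", 1), ("f", "g", 1), ("g", "e", 1),
    ("h", "i", 1),
]
props = subnetwork_properties(nodes, edges)
print(props)
```

Exact fractions are used so the results can be compared directly with the slide's 4/9, 8/36, and 16/9.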
23
Correlation of performance with network properties Functional modularity: Average Segregation
24
Correlation of performance with network properties
Functional modularity: Average Segregation
[Figure: two example networks, Avg. seg = 8/22 and Avg. seg = 12/15]
25
Correlation of performance with network properties
We have:
- A vector of SS P20R values, one per function
- A vector of values of a particular topological property, one per function
Spearman rank correlation between the two vectors.
[Figure: P20R vs. weighted density]
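The correlation step can be sketched without external libraries: rank both vectors (averaging ranks over ties), then take the Pearson correlation of the ranks. The weighted-density and P20R values below are made up for illustration and are not results from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes equal-length vectors that are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-function values (one entry per function).
weighted_density = [0.05, 0.10, 0.20, 0.40, 0.80]
p20r = [0.10, 0.30, 0.25, 0.60, 0.90]
rho = spearman(weighted_density, p20r)
print(rho)
```

Because only ranks matter, the measure captures any monotonic relationship between a topological property and P20R, not just a linear one.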
26
Correlation of performance with network properties Spearman rank correlation
27
Performance on plant-specific functions
For ‘conserved’ functions:
- Performance is better than that for all functions.
- Using all annotations is better.
For ‘plant-specific’ functions:
- Performance is much worse compared to ‘conserved’ functions.
- Using only experimental annotations is better.
The underlying network is built based on data from multiple non-plant species.
[Figure legend: 3rd quartile, median, 1st quartile; using only annotations based on experimental/expert evidence]
28
Most predictable ‘conserved’ functions
Protein folding, nucleotide transport, innate immunity, cytoskeleton organization, and cell cycle.
29
Least predictable ‘conserved’ functions
regulation of …
Specialized functions
30
Most predictable ‘plant-specific’ functions
Cell wall modification, auxin/cytokinin signaling, and photosynthesis.
Contribution from Arabidopsis datasets.
31
Least predictable ‘plant-specific’ functions
Development, morphogenesis, pattern formation, and phase transitions of various tissues, organs, and growth stages.
32
Conclusions
Evaluated the performance of various prediction algorithms on AraNet.
- SinkSource is the overall best prediction algorithm.
Measured the influence of the number of genes annotated to a function and the source of annotation evidence.
- All algorithms perform poorly when only a small number of genes are ‘known’ or when annotating very specific functions.
- When only a small number of genes are ‘known’, use only experimentally verified annotations to make new predictions.
- When a considerable number of genes are ‘known’, use all annotations to make new predictions.
33
Conclusions
Measured the correlation of performance with network properties.
- Several topological properties correlate well with performance.
- ‘Average segregation’ has the strongest correlation.
34
Conclusions
Assessed performance on conserved/plant-specific functions.
- Performance on basic ‘conserved’ functions is better than that for all the functions.
- Specialized ‘conserved’ functions are hard to predict.
- Performance on ‘plant-specific’ functions is very poor. This is also a consequence of the fact that ‘plant-specific’ functions generally have a small number of annotations.
35
Conclusions
Avenues for improvement in functional interaction networks:
- Build functional interaction networks that are based on a larger collection of plant datasets.
- Rely as little as possible on data from other species.
Avenues for future experimental work:
- ‘Plant-specific’ functions and specialized ‘conserved’ functions.
36
Acknowledgements
Arjun Krishnan, Brett Tyler, Andy Pereira