GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function Sara Mostafavi, Debajyoti Ray, David Warde-Farley,

1 GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Sara Mostafavi, Debajyoti Ray, David Warde-Farley, Chris Grouios and Quaid Morris. Genome Biology 2008, 9:S4. Date: 12/5/2015. Discussion leader: Stephen Rau. Scribe: Harris Krause. Computational Network Biology, BMI 826/Computer Sciences 838. https://compnetbiocourse.discovery.wisc.edu

2 Problem overview Predicting protein function in real time. Computational approaches often use guilt-by-association algorithms, which: 1. Are not very accessible 2. Need to be more accurate 3. Need to be more regularly updated

3 Approach Treat gene function prediction as a binary classification problem. From the multiple heterogeneous input data sources, assign each functional association network a positive weight that reflects its usefulness in predicting a given function of interest. Construct a function-specific association network by taking the weighted average of the individual association networks. Use a separate objective function to fit the weights.
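The weighted-average step can be illustrated with a minimal sketch. It assumes each association network is a symmetric NumPy matrix over the same genes; the function name `composite_network` and the toy matrices are hypothetical and are not taken from the GeneMANIA software.

```python
import numpy as np

def composite_network(networks, weights):
    """Weighted average of same-sized association matrices.

    Weights are assumed non-negative and are normalized to sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("network weights are assumed to be non-negative")
    weights = weights / weights.sum()
    return sum(w * net for w, net in zip(weights, networks))

# Two toy 3-gene networks (symmetric, zero diagonal); values are made up.
coexpression = np.array([[0.0, 1.0, 0.0],
                         [1.0, 0.0, 0.5],
                         [0.0, 0.5, 0.0]])
interaction = np.array([[0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0]])

# The first network is judged three times as useful as the second.
combined = composite_network([coexpression, interaction], weights=[3.0, 1.0])
```

Because the weights are fitted per function, the same set of input networks yields a different composite for each function of interest.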

4 Approach (cont) Predict gene function from the composite network using a variation of the Gaussian field label propagation algorithm. The label propagation algorithm assigns each node in the network a score called the ‘discriminant value’, which reflects the computed degree of association between that node and the seed list defining the given function.

5 Results The GeneMANIA algorithm consists of two parts: 1. An algorithm, based on linear regression, for calculating a single composite functional association network from multiple networks derived from different genomic or proteomic data sources. 2. A label propagation algorithm for predicting gene function given this composite network.

6 GeneMANIA label propagation algorithm Input: 1. An association network 2. A list of nodes with positive labels, and possibly a list of nodes with negative labels 3. Initial label bias values. A discriminant value is assigned to each node by letting the initial label biases propagate through the association network to nearby nodes. Discriminant values assigned to positively and negatively labeled nodes may deviate from their initial biases to account for noise - ?

7 GeneMANIA label propagation algorithm (cont) A cost function allows information about the node labels to propagate through the network to affect the discriminant values of genes that are not directly connected to the seed list. In the GeneMANIA algorithm we set the initial bias of unlabeled nodes to be the average bias of the labeled nodes: $(n^+ - n^-)/(n^+ + n^-)$, where $n^+$ is the number of positive examples and $n^-$ is the number of negative examples.
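As a small worked example of the bias rule, the sketch below builds the bias vector from a list of labels. It assumes positives carry a bias of +1 and negatives a bias of -1 (the slides do not state these values), with 0 marking an unlabeled node; the name `initial_bias` is hypothetical.

```python
import numpy as np

def initial_bias(labels):
    """Build the label bias vector y.

    labels: +1 for positive, -1 for negative, 0 for unlabeled (assumed encoding).
    Unlabeled nodes receive the mean bias of the labeled nodes.
    """
    labels = np.asarray(labels, dtype=float)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == -1)
    if n_pos + n_neg == 0:
        raise ValueError("at least one labeled node is required")
    mean_bias = (n_pos - n_neg) / (n_pos + n_neg)
    y = labels.copy()
    y[labels == 0] = mean_bias
    return y

# Three positives, one negative, two unlabeled genes.
y = initial_bias([1, 1, 1, -1, 0, 0])
```

With three positives and one negative, the unlabeled genes start at (3 - 1)/(3 + 1) = 0.5.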

8 Discriminant values Computed by solving the following optimization problem: $f = \arg\min_f \sum_i (f_i - y_i)^2 + \sum_i \sum_j w_{ij} (f_i - f_j)^2$
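A short derivation shows why this objective reduces to a linear system. It is a sketch under two assumptions not stated on the slide: the weight matrix $W$ is symmetric, and the double sum runs over all ordered pairs $(i, j)$. The symbols $J$, $D$ and $L$ are introduced here for illustration only.

```latex
\begin{aligned}
J(f) &= \sum_i (f_i - y_i)^2 + \sum_i \sum_j w_{ij} (f_i - f_j)^2 \\
     &= (f - y)^{\top}(f - y) + 2 f^{\top} L f,
        \qquad L = D - W, \quad D_{ii} = \sum_j w_{ij}, \\
\nabla J(f) &= 2 (f - y) + 4 L f = 0
        \;\Longrightarrow\; (I + 2L)\, f = y .
\end{aligned}
```

If each unordered pair is counted only once, the factor of 2 disappears and the system becomes $(I + L) f = y$. Either way the coefficient matrix is symmetric and positive definite for non-negative weights, which is the setting in which conjugate gradient applies.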

9 GeneMANIA label propagation algorithm for large genomes Label propagation on the composite association network is the most time-consuming step. The conjugate gradient method is used to solve the system $y = Af$, where $f$ is the vector of discriminant values, $A$ is the coefficient matrix and $y$ is the vector of node label biases.

10 Conjugate Gradient (CG) method 1. At each iteration $t$, the current estimate, $f_t$, is multiplied by the matrix $A$. 2. If the result of this multiplication, $y_t = Af_t$, equals $y$, then $f_t$ is a correct solution. 3. If $y_t$ does not equal $y$, the CG method calculates a new estimate, $f_{t+1}$, based on the difference between $y_t$ and $y$. 4. The runtime of each CG iteration is proportional to $m$, the number of non-zero elements in $A$; $m$ is the number of edges plus the number of nodes in the functional association network that $A$ represents. 5. Reducing $m$ therefore reduces the runtime of CG.
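The iteration described above can be sketched as a textbook conjugate gradient solver. This is a generic implementation, not the GeneMANIA code. The coefficient matrix is built here as $A = I + 2L$ with $L$ the graph Laplacian, which is an assumption about how $A$ is formed (it follows from the slide 8 objective when the double sum covers all ordered pairs); the toy network and bias vector are made up.

```python
import numpy as np

def conjugate_gradient(A, y, tol=1e-10, max_iter=1000):
    """Solve A f = y for a symmetric positive-definite matrix A."""
    f = np.zeros_like(y, dtype=float)
    residual = y - A @ f            # difference between y and the current A f
    direction = residual.copy()
    rs_old = residual @ residual
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:   # A f already matches y closely enough
            break
        A_dir = A @ direction       # the one matrix-vector product per iteration
        step = rs_old / (direction @ A_dir)
        f = f + step * direction
        residual = residual - step * A_dir
        rs_new = residual @ residual
        direction = residual + (rs_new / rs_old) * direction
        rs_old = rs_new
    return f

# Toy 3-gene composite network and label biases (illustrative values).
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
L = np.diag(W.sum(axis=1)) - W      # graph Laplacian
A = np.eye(3) + 2.0 * L             # assumed form of the coefficient matrix
y = np.array([1.0, -1.0, 0.0])
f = conjugate_gradient(A, y)
```

Each iteration is dominated by the product `A @ direction`. With a sparse matrix representation that product touches only the non-zero entries of `A`, which is why the per-iteration cost scales with $m$.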

12 GeneMANIA network integration Optimizes the network weights and calculates the discriminant values separately. Runs the computationally intensive label propagation only once. The regularized linear regression algorithm is robust to the inclusion of irrelevant and redundant networks, which helps when data sources cannot be carefully controlled, such as in web repositories. Ridge regression: find a vector of network weights $\alpha = [\alpha_1, \ldots, \alpha_d]^T$ that minimizes the cost function $(t - \Omega\alpha)^T (t - \Omega\alpha) + (\alpha - \bar{\alpha})^T S (\alpha - \bar{\alpha})$, where $\bar{\alpha}$ is the mean prior weight vector.

13 GeneMANIA network integration - variables $\alpha_i$ = weight of the $i$th network; $t$ = vector derived from the initial labels of the labeled nodes; $\Omega$ = a matrix with columns corresponding to individual association networks; $\bar{\alpha}$ = the mean prior weight vector; $S$ = a diagonal precision matrix
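The ridge regression cost has a closed-form minimizer, sketched below. Setting the gradient of the cost to zero gives $(\Omega^T \Omega + S)\,\alpha = \Omega^T t + S\bar{\alpha}$. The function name `ridge_network_weights` and the synthetic data are hypothetical, and this sketch does not enforce positive weights, so any handling of negative weights is left out.

```python
import numpy as np

def ridge_network_weights(omega, t, alpha_prior, precision):
    """Minimize (t - Omega a)^T (t - Omega a) + (a - prior)^T S (a - prior).

    omega:       (n_targets, d) matrix, one column per association network
    t:           (n_targets,) target vector derived from the node labels
    alpha_prior: (d,) mean prior weight vector
    precision:   (d,) diagonal entries of the precision matrix S
    """
    omega = np.asarray(omega, dtype=float)
    S = np.diag(precision)
    lhs = omega.T @ omega + S
    rhs = omega.T @ t + S @ alpha_prior
    return np.linalg.solve(lhs, rhs)

# Synthetic example: three networks, the third carries no signal.
rng = np.random.default_rng(0)
omega = rng.random((20, 3))
t = omega @ np.array([0.7, 0.2, 0.0]) + 0.01 * rng.standard_normal(20)
alpha_prior = np.full(3, 1.0 / 3.0)
precision = np.full(3, 0.1)
alpha = ridge_network_weights(omega, t, alpha_prior, precision)
```

Larger entries in `precision` pull the corresponding weights toward the prior; as the precision grows without bound, the solution approaches `alpha_prior`.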

14 Gene function prediction in mouse Employing regularization when using linear regression to combine multiple networks results in a drastic improvement in prediction accuracy for the most specific functional classes. In the binary classification of genes according to GO classes, the choice of genes used as negative examples has a large impact on the prediction outcome with label propagation.

16 The effect of redundancy and random networks on equal weighting Constructed 20 redundant yeast networks by adding a slight amount of noise to the PfamA network. Constructed two irrelevant networks by assigning random weights between 0 and 1 to a randomly chosen 0.01% of the associations and setting the remaining associations to zero. Conducted function prediction… all networks assigned an equal weight
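One way to build such test networks is sketched below. The slide does not specify the noise model or how the random associations were drawn, so the Gaussian noise on existing edges, the `scale` value, uniform sampling over gene pairs, and both function names are assumptions made for illustration.

```python
import numpy as np

def random_irrelevant_network(n_genes, fraction=1e-4, seed=0):
    """Symmetric network with uniform(0, 1) weights on a random 0.01% of gene pairs."""
    rng = np.random.default_rng(seed)
    rows, cols = np.triu_indices(n_genes, k=1)   # each unordered gene pair once
    n_pairs = rows.size
    n_edges = max(1, int(round(fraction * n_pairs)))
    chosen = rng.choice(n_pairs, size=n_edges, replace=False)
    W = np.zeros((n_genes, n_genes))
    W[rows[chosen], cols[chosen]] = rng.uniform(0.0, 1.0, size=n_edges)
    return W + W.T

def noisy_copy(W, scale=0.01, seed=0):
    """Redundant copy of W: small symmetric Gaussian noise on existing edges only."""
    rng = np.random.default_rng(seed)
    noise = np.triu(rng.normal(0.0, scale, size=W.shape), k=1)
    noise = (noise + noise.T) * (W > 0)          # keep the sparsity pattern of W
    return np.clip(W + noise, 0.0, None)         # weights stay non-negative

irrelevant = random_irrelevant_network(500)
base = random_irrelevant_network(500, fraction=0.01, seed=1)  # stand-in for a real network
redundant = [noisy_copy(base, seed=s) for s in range(20)]
```

Here `base` is only a placeholder for a real network such as PfamA, which is not included in this sketch.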

18 Discussion Demonstrated that GeneMANIA is as accurate as, or more accurate than, other methods in gene function prediction, while sometimes requiring much less computation time. Can now perform on-demand function prediction and so can use up-to-date annotation lists and data sources. Has not tried using a gene’s prior annotations to predict new ones?? A network representation is not the most efficient encoding of the input data.

