GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function
Sara Mostafavi, Debajyoti Ray, David Warde-Farley, Chris Grouios and Quaid Morris
Genome Biology 2008, 9:S4
Date: 12/5/2015
Discussion leader: Stephen Rau
Scribe: Harris Krause
Computational Network Biology BMI 826/Computer Sciences 838

Problem overview
Predicting protein function in real time. Computational approaches often use guilt-by-association algorithms which:
1. Are not very accessible
2. Need to be more accurate
3. Need to be more regularly updated

Approach
Treat gene function prediction as a binary classification problem.
From the multiple heterogeneous input data sources, assign each functional association network a positive weight that reflects its usefulness in predicting the given function of interest.
Construct a function-specific association network by taking the weighted average of the individual association networks (a minimal sketch follows).
Use a separate objective function to fit the weights.
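To make the combination step concrete, here is a minimal sketch, assuming each network is an n x n NumPy adjacency matrix and the weights have already been fit (the fitting step appears later in the talk); it is written as a weighted sum, so dividing by the sum of the weights would give a true weighted average:

```python
import numpy as np

def composite_network(networks, weights):
    """Weighted combination of individual association networks.

    networks: list of n x n symmetric adjacency matrices (NumPy arrays)
    weights:  one non-negative weight per network, as fit by regression
    """
    W = np.zeros_like(networks[0], dtype=float)
    for W_i, alpha_i in zip(networks, weights):
        W += alpha_i * W_i
    return W
```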

Approach (cont.)
Predict gene function from the composite network using a variation of the Gaussian field label propagation algorithm.
The label propagation algorithm assigns each node in the network a score called the 'discriminant value', which reflects the computed degree of association between that node and the seed list defining the given function.

Results
The GeneMANIA algorithm consists of two parts:
1. An algorithm, based on linear regression, for calculating a single composite functional association network from multiple networks derived from different genomic or proteomic data sources
2. A label propagation algorithm for predicting gene function given this composite network

GeneMANIA label propagation algorithm
Input:
1. An association network
2. A list of nodes with positive labels (and possibly a list of nodes with negative labels)
3. Initial label bias values
A discriminant value is assigned to each node by letting the initial label biases propagate through the association network to nearby nodes. The discriminant values assigned to positively and negatively labeled nodes can deviate from their initial biases, which accounts for noise in the labels.

GeneMANIA label propagation algorithm (cont.)
A cost function allows information about the node labels to propagate through the network, affecting the discriminant values of genes that are not directly connected to the seed list.
In the GeneMANIA algorithm, the initial bias of unlabeled nodes is set to the average bias of the labeled nodes, (n_+ − n_−) / (n_+ + n_−), where n_+ is the number of positive and n_− the number of negative examples.
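A minimal sketch of this bias assignment, assuming the +1/−1 label coding implied by the positive and negative node labels:

```python
import numpy as np

def initial_label_bias(n_genes, positives, negatives):
    """Label bias vector y: +1 for positive seed genes, -1 for negatives,
    and the mean bias (n+ - n-) / (n+ + n-) for all unlabeled genes."""
    n_pos, n_neg = len(positives), len(negatives)
    y = np.full(n_genes, (n_pos - n_neg) / (n_pos + n_neg))
    y[list(positives)] = 1.0
    y[list(negatives)] = -1.0
    return y
```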

Discriminant values
Computed by minimizing the following objective function:

f* = argmin_f Σ_i (f_i − y_i)² + Σ_i Σ_j w_ij (f_i − f_j)²
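Setting the gradient of this objective to zero turns the minimization into a linear system. A minimal dense sketch: up to a constant factor on the smoothness term (and omitting the network normalization the published algorithm applies), the first-order conditions give (I + L) f = y with L = D − W the graph Laplacian:

```python
import numpy as np

def label_propagation_dense(W, y):
    """Closed-form solution of the label-propagation objective.

    Minimizing ||f - y||^2 + f^T L f, with L = D - W the graph
    Laplacian of the association network W, gives (I + L) f = y.
    """
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(W.shape[0]) + L, y)
```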

GeneMANIA label propagation algorithm for large genomes
Label propagation over the composite association network is the most time-consuming step.
The conjugate gradient method is used to solve the system y = Af, where f is the vector of discriminant values, A is the coefficient matrix, and y is the vector of node label biases.

Conjugate Gradient (CG) method
1. At each iteration t, the current estimate f_t is multiplied by the matrix A.
2. If the result of this matrix multiplication, y_t = A f_t, is equal to y, then f_t is a correct solution.
3. If y_t does not equal y, the CG method calculates a new estimate, f_{t+1}, based on the difference between y_t and y.
4. Reducing the number m of non-zero elements in A reduces the runtime of CG.
5. The runtime of each CG iteration is proportional to m, where m is the number of edges plus the number of nodes in the functional association network that A represents.
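A sketch of this step using SciPy's sparse conjugate-gradient solver; the A = I + L form carries over from the dense sketch above and is an assumption (the paper's exact normalization is omitted). A is symmetric positive definite, so CG applies, and each iteration costs one sparse matrix-vector product, i.e. time proportional to m:

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import cg

def label_propagation_cg(W, y):
    """Solve A f = y by conjugate gradients for a sparse network W.

    A = I + L with L = D - W; W is a scipy.sparse matrix, so each
    CG iteration touches only the non-zero entries of A.
    """
    degrees = np.asarray(W.sum(axis=1)).ravel()
    A = identity(W.shape[0], format="csr") + diags(degrees) - W
    f, info = cg(A, y)
    if info != 0:
        raise RuntimeError("CG did not converge")
    return f
```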

GeneMANIA network integration
Optimizes the network weights and calculates the discriminant values separately, so the computationally intensive label propagation runs only once.
The regularized linear regression algorithm is robust to the inclusion of irrelevant and redundant networks, which helps when data sources cannot be carefully controlled, such as in web repositories.
Ridge regression: find a vector of network weights α = [α_1, …, α_d]^T that minimizes the cost function

(t − Ωα)^T (t − Ωα) + (α − ᾱ)^T S (α − ᾱ)

GeneMANIA network integration - variables
α_i = weight of the i-th network
t = vector derived from the initial labels of the labeled nodes
Ω = a matrix with columns corresponding to the individual association networks
ᾱ = the mean prior weight vector
S = a diagonal precision matrix
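Setting the gradient of the ridge cost to zero gives (Ω^T Ω + S) α = Ω^T t + S ᾱ. A minimal sketch of solving for the weights; clipping negative weights to zero stands in for the paper's positivity requirement and is an assumption:

```python
import numpy as np

def fit_network_weights(Omega, t, alpha_bar, s_diag):
    """Ridge-regression fit of the network weights.

    Omega:     matrix whose columns correspond to individual networks
    t:         target vector derived from the initial labels
    alpha_bar: mean prior weight vector
    s_diag:    diagonal entries of the precision matrix S
    """
    S = np.diag(s_diag)
    alpha = np.linalg.solve(Omega.T @ Omega + S,
                            Omega.T @ t + S @ alpha_bar)
    # The algorithm assigns positive weights; clipping negatives to
    # zero is a simplification of that constraint.
    return np.maximum(alpha, 0.0)
```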

Gene function prediction in mouse
Employing regularization when combining multiple networks with linear regression drastically improves prediction accuracy on the most specific functional classes.
Demonstrated that, in the binary classification of genes according to GO classes, the choice of genes used as negative examples has a large impact on the prediction outcome with label propagation.

The effect of redundancy and random networks on equal weighting
Constructed 20 redundant yeast networks by adding a slight amount of noise to the PfamA network.
Constructed two irrelevant networks by assigning association weights between 0 and 1 to a random 0.01% of the gene pairs and setting the remaining associations to zero.
Conducted function prediction with all networks assigned an equal weight.
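A sketch of how such test networks might be generated; the noise magnitude and the symmetrization are assumptions, since the slide only says "a slight amount of noise":

```python
import numpy as np

rng = np.random.default_rng(0)

def redundant_copies(W, n_copies=20, noise_scale=0.01):
    """Redundant networks: copies of W perturbed by slight symmetric noise."""
    copies = []
    for _ in range(n_copies):
        noise = noise_scale * rng.random(W.shape)
        copies.append(W + (noise + noise.T) / 2.0)
    return copies

def irrelevant_network(n_genes, density=1e-4):
    """Irrelevant network: weights in (0, 1) on a random 0.01% of pairs."""
    W = np.zeros((n_genes, n_genes))
    mask = rng.random(W.shape) < density
    W[mask] = rng.random(mask.sum())
    return (W + W.T) / 2.0
```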

Discussion
Demonstrated that GeneMANIA is as accurate as, or more accurate than, existing approaches to gene function prediction, while sometimes requiring much less computation time.
On-demand function prediction is now feasible, so up-to-date annotation lists and data sources can be used.
Open question: the possibility of using a gene's prior annotations to predict new ones was not tried.
A network representation is not the most efficient encoding of the input data.