A comparative approach for gene network inference using time-series gene expression data Guillaume Bourque* and David Sankoff *Centre de Recherches Mathématiques,


A comparative approach for gene network inference using time-series gene expression data Guillaume Bourque* and David Sankoff *Centre de Recherches Mathématiques, Université de Montréal October 2003

DNA Microarrays
– Experiment design
– Noise reduction
– Normalization
– …
– Data analysis

Gene Expression Data

Beyond Clustering…
[Figure: time series for genes x1 to x4, and the question of which gene network, with activating (+) and repressing (−) edges, produced them.]

Comparative Framework
[Figure: related networks for Species A, Species B and Species C.]

Harder Problem? This new problem seems more ambitious and harder to solve. However, we will show that, for closely related species (samples), the comparative framework can actually improve the quality of the recovered solutions. The repetitive nature of the data can be used to filter out some of the noise and resolve some of the ambiguity.

Outline
Gene network model
Single network inference
– Algorithm
– Simulations
Multiple network inference
– Algorithm
– Simulations
Conclusions

Gene Network Model
We use linear differential equations to model the gene trajectories (Chen et al. ’99, D’haeseleer et al. ’99):
$\frac{dx_i(t)}{dt} = a_{i,0} + a_{i,1}x_1(t) + a_{i,2}x_2(t) + \dots + a_{i,m}x_m(t)$
Several reasons for that choice:
– Takes advantage of the continuous aspect of the data.
– Allows for feedback loops.
– A low number of parameters makes us less likely to overfit the data.
– Sufficient to model complex interactions between genes.
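A minimal sketch of the linear model in code, assuming NumPy and a forward Euler integrator (the slides do not say how trajectories are integrated). The function name `simulate_linear_network` and the 2-gene coefficients are hypothetical, chosen only for illustration; Python indices start at 0, so `A[i, j]` plays the role of $a_{i+1,j+1}$.

```python
import numpy as np

def simulate_linear_network(a0, A, x0, dt, steps):
    """Integrate dx/dt = a0 + A @ x with explicit (forward) Euler steps.

    a0: constant coefficients a_{i,0}, shape (m,)
    A:  interaction coefficients, shape (m, m); A[i, j] is the effect of
        gene j on the rate of change of gene i (A[i, i] is the self effect)
    x0: initial expression levels, shape (m,)
    Returns an array of shape (steps + 1, m), one row per time point.
    """
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        x = x + dt * (a0 + A @ x)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Hypothetical 2-gene network: gene 1 is produced at a constant rate and
# decays; gene 1 represses gene 2 (negative coefficient); gene 2 decays.
a0 = np.array([1.0, 0.0])
A = np.array([[-1.0, 0.0],
              [-0.5, -1.0]])
traj = simulate_linear_network(a0, A, x0=[0.0, 1.0], dt=0.1, steps=50)
print(traj.shape)  # (51, 2)
```

Gene 1 relaxes toward the level where production balances decay ($1 - x_1 = 0$), while the negative entry `A[1, 0]` pushes gene 2 down as gene 1 rises.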

Small Network Example
[Figure: four-gene network over x1 to x4 with activating (+) and repressing (−) edges.]
$\frac{dx_1(t)}{dt} = a_{1,1}x_1(t)$
$\frac{dx_2(t)}{dt} = a_{2,3}x_3(t) + a_{2,4}x_4(t)$
$\frac{dx_3(t)}{dt} = a_{3,1}x_1(t) + a_{3,3}x_3(t)$
$\frac{dx_4(t)}{dt} = a_{4,1}x_1(t) + a_{4,3}x_3(t) + a_{4,4}x_4(t)$
Each $a_{i,j}$ with $j \ge 1$ is an interaction coefficient, the weight of the edge from $x_j$ to $x_i$; a constant coefficient $a_{i,0}$ is a term that does not depend on the expression levels. [The numerical coefficient values, including the constant terms, are not recoverable from this transcript.]

Problem Revisited
[Table: unknown coefficients $a_{i,0}, a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4}$ for each gene $x_1, \dots, x_4$.]
Given the time-series data, can we find the interaction coefficients?

Linear Differential Equations
Even under the simplest linear model, there are $m(m+1)$ unknown parameters to estimate:
– $m(m-1)$ directional effects
– $m$ self effects
– $m$ constant effects
The number of data points is $mn$ ($m$ genes, $n$ time points), and we typically have $n \ll m$ (few time points), so the data alone cannot determine all the parameters. To avoid overfitting, extra constraints must be incorporated into the model, such as:
– Smoothness of the equations (D’haeseleer et al. ’99)
– Sparseness of the network, i.e. few non-null interaction coefficients (Yeung et al. ’02, De Hoon et al. ’02)
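The parameter count can be checked with a few lines of Python. The function names and the experiment size in the usage line (100 genes, 10 time points) are hypothetical.

```python
def count_parameters(m):
    """Unknown coefficients in the linear model for m genes."""
    directional = m * (m - 1)   # a_{i,j} for j != i: effect of gene j on gene i
    self_effects = m            # a_{i,i}
    constants = m               # a_{i,0}
    return directional + self_effects + constants

def count_data_points(m, n):
    """Observed values: m genes, each measured at n time points."""
    return m * n

# Hypothetical experiment size: 100 genes, 10 time points.
m, n = 100, 10
print(count_parameters(m), m * (m + 1), count_data_points(m, n))  # 10100 10100 1000
```

With these sizes there are about ten times more unknowns than observations, which is why sparseness or smoothness constraints are needed.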

Algorithm for Network Inference
To recover the interaction coefficients, we use stepwise multiple linear regression. Why?
– This procedure keeps only the coefficients that significantly improve the fit of the regression. It limits the number of non-zero coefficients (i.e. it finds sparse networks), a feature we were seeking.
– It is highly flexible and provides p-values, which are easy to interpret.
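Each gene yields its own multiple linear regression: the response is $dx_i/dt$ and the predictors are a constant and the expression levels $x_1, \dots, x_m$. A minimal sketch, assuming NumPy and forward-difference estimates of the derivatives (an assumption; the slides do not say how $dx_i/dt$ is estimated from the time points). The helper `regression_problem` and the data are hypothetical; the data are generated by Euler steps, so ordinary least squares over all regulators recovers the coefficients exactly. Stepwise selection would instead admit columns of `D` one subset at a time.

```python
import numpy as np

def regression_problem(series, times, i):
    """Build the least-squares problem for gene i.

    series: expression matrix, shape (n, m): n time points, m genes.
    times:  time stamps, shape (n,).
    Returns (D, y): D has a column of ones (for a_{i,0}) followed by the
    m gene levels at t_k; y holds forward-difference slopes of gene i.
    """
    series = np.asarray(series, dtype=float)
    y = np.diff(series[:, i]) / np.diff(times)    # approx. dx_i/dt at t_k
    levels = series[:-1, :]                       # x_j(t_k), k = 0..n-2
    D = np.column_stack([np.ones(len(levels)), levels])
    return D, y

# Hypothetical data from Euler steps of dx/dt = a0 + A x, so forward
# differences reproduce the true slopes exactly.
a0 = np.array([1.0, 0.0])
A = np.array([[-1.0, 0.0], [-0.5, -1.0]])
dt, x = 0.1, np.array([0.0, 1.0])
rows = [x]
for _ in range(20):
    x = x + dt * (a0 + A @ x)
    rows.append(x)
series = np.array(rows)
times = dt * np.arange(len(series))

D, y = regression_problem(series, times, i=1)
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
print(coef)  # approximately [0, -0.5, -1], i.e. a_{2,0}, a_{2,1}, a_{2,2}
```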

Partial F Test
The procedure finds the interaction coefficients iteratively, one gene $x_i$ at a time. A partial F test compares the sum of squared errors of the predicted gene trajectory with and without a specific subset of coefficients. A subset is added if the p-value of the test falls below an entry cutoff (the improvement in fit is significant), and removed if its p-value exceeds a removal cutoff. The procedure iterates until no subset of coefficients is added or removed.
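For a reduced model with error $SSE_R$ and a full model with $q$ extra coefficients, $p_F$ coefficients in total, error $SSE_F$ and $N$ observations, the partial F statistic is $F = \frac{(SSE_R - SSE_F)/q}{SSE_F/(N - p_F)}$. A minimal sketch of one forward (entry) step, assuming NumPy. To avoid a SciPy dependency it compares $F$ with a hypothetical F-to-enter threshold instead of a p-value cutoff; with SciPy, `scipy.stats.f.sf(F, q, N - p_F)` would give the p-value. It also tests single columns rather than arbitrary subsets, and the function names and data are hypothetical.

```python
import math
import numpy as np

def sse(D, y, cols):
    """Sum of squared errors of the least-squares fit on columns `cols` of D."""
    Dc = D[:, cols]
    coef, *_ = np.linalg.lstsq(Dc, y, rcond=None)
    resid = y - Dc @ coef
    return float(resid @ resid)

def partial_f(D, y, reduced, full):
    """Partial F statistic for the columns in `full` that are not in `reduced`."""
    q = len(full) - len(reduced)        # number of coefficients being tested
    df = len(y) - len(full)             # residual degrees of freedom of the full model
    sse_r, sse_f = sse(D, y, reduced), sse(D, y, full)
    if sse_f == 0.0:
        return math.inf                 # full model fits exactly; avoid dividing by zero
    return ((sse_r - sse_f) / q) / (sse_f / df)

def forward_step(D, y, current, f_to_enter):
    """Return (column, F) for the unused column with the largest partial F
    above f_to_enter, or (None, f_to_enter) if no column qualifies."""
    best, best_f = None, f_to_enter
    for j in range(D.shape[1]):
        if j in current:
            continue
        f = partial_f(D, y, current, current + [j])
        if f > best_f:
            best, best_f = j, f
    return best, best_f

# Hypothetical data: column 0 is the constant term, columns 1-3 are candidate
# regulators, and only column 2 actually drives y.
rng = np.random.default_rng(0)
N = 30
D = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
y = 2.0 + 1.5 * D[:, 2] + rng.normal(scale=0.1, size=N)
j, f = forward_step(D, y, current=[0], f_to_enter=4.0)
print(j)  # 2
```

A removal step mirrors this one: for each column already in the model, compute its partial F against the model without it and drop the weakest column if its F falls below an F-to-remove threshold.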

Simulations
It is difficult to find coefficients that produce realistic gene trajectories. We select coefficients such that the resulting trajectories satisfy 3 conditions:
– They are bounded.
– The correlation of any pair is not too high.
– They are not too stable.
We added Gaussian noise to model errors.
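The three acceptance conditions and the noise model can be sketched as a filter on simulated trajectories, assuming NumPy. The thresholds are hypothetical placeholders, "not too stable" is read here as "each gene's standard deviation over time is not negligible", and the correlation test uses absolute values; all three are assumptions rather than the authors' exact criteria.

```python
import numpy as np

def acceptable(traj, bound=10.0, max_corr=0.9, min_std=0.05):
    """Return True if simulated trajectories (rows = time points, columns = genes)
    meet the three conditions. All thresholds are hypothetical."""
    traj = np.asarray(traj, dtype=float)
    if not np.all(np.isfinite(traj)) or np.abs(traj).max() > bound:
        return False                      # 1. trajectories must stay bounded
    if traj.std(axis=0).min() < min_std:
        return False                      # 3. reject near-constant (too stable) genes
    corr = np.corrcoef(traj, rowvar=False)
    off_diag = np.abs(corr[~np.eye(corr.shape[0], dtype=bool)])
    return bool(off_diag.max() <= max_corr)   # 2. no highly correlated pair

def add_noise(traj, sigma, seed=0):
    """Add independent Gaussian noise with standard deviation sigma to every value."""
    rng = np.random.default_rng(seed)
    return np.asarray(traj, dtype=float) + rng.normal(scale=sigma, size=np.shape(traj))

t = np.linspace(0.0, 2.0 * np.pi, 50)
print(acceptable(np.column_stack([np.sin(t), np.cos(t)])))   # True
print(acceptable(np.column_stack([np.sin(t), np.sin(t)])))   # False: correlation 1
```

The standard-deviation check runs before the correlation check so that a constant trajectory is rejected before `np.corrcoef` would divide by zero.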

Gaussian Noise

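Network Inference
[Figure: the regression procedure maps the time series to the table of coefficients $a_{i,0}, \dots, a_{i,4}$ for genes $x_1, \dots, x_4$, and hence to the four-gene network with activating (+) and repressing (−) edges.]
The procedure perfectly recovers this network with 4 genes and 10 interaction coefficients.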

10 Genes
The procedure also perfectly recovers this network with 10 genes and 22 interaction coefficients.
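"Perfect recovery" can be made precise by comparing the signed pattern of the estimated coefficients with the true one. A minimal sketch, assuming NumPy and a hypothetical tolerance below which a coefficient counts as zero; under this reading, perfect recovery means no false positives, no false negatives and no sign errors. The function name and the 2-gene matrices are hypothetical.

```python
import numpy as np

def compare_networks(true_coef, est_coef, tol=1e-6):
    """Compare the signed interaction patterns of two coefficient matrices.

    Entries with absolute value <= tol count as absent interactions.
    Returns (true_positives, false_positives, false_negatives, sign_errors).
    """
    t = np.sign(np.where(np.abs(true_coef) > tol, true_coef, 0.0))
    e = np.sign(np.where(np.abs(est_coef) > tol, est_coef, 0.0))
    both = (t != 0) & (e != 0)
    tp = int(np.sum(both & (t == e)))        # present in both, same sign
    sign_errors = int(np.sum(both & (t != e)))
    fp = int(np.sum((t == 0) & (e != 0)))    # spurious interactions
    fn = int(np.sum((t != 0) & (e == 0)))    # missed interactions
    return tp, fp, fn, sign_errors

# Hypothetical 2-gene example: small numerical errors, same signed pattern.
true_A = np.array([[-1.0, 0.0], [-0.5, -1.0]])
est_A = np.array([[-0.98, 0.0], [-0.52, -1.01]])
print(compare_networks(true_A, est_A))  # (3, 0, 0, 0)
```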

Multiple Networks
[Figure: networks for Species A, Species B and Species C.]

Types of Problems
Multiple networks related by a graph or a tree can arise in various situations:
– Different species
– Different developmental stages
– Different tissues
The goal is now not only to maximize the fit (with as few interactions as possible) but also to minimize an evolutionary cost on the graph of the networks.

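Evolutionary Cost
[Figure: a graph whose vertices are labeled with sets of predicted regulators, {1, 2}, {1, 2, 3}, {1}, {1, 3} and {1, 2, 3}; each gain or loss of a regulator along an edge is an evolutionary event. Evolutionary cost = 3.]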
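A minimal sketch of the cost, assuming that each regulator gained or lost along an edge counts as one evolutionary event, so an edge costs the size of the symmetric difference of its endpoint sets. The edges of the slide's figure are not recoverable here, so the topology below is hypothetical: one arrangement of the five regulator sets that gives the stated cost of 3.

```python
def evolutionary_cost(labels, edges):
    """Total number of regulator gains and losses over the edges of a graph.

    labels: vertex -> set of predicted regulators
    edges:  iterable of (u, v) vertex pairs
    Each regulator in the symmetric difference of the two endpoint sets
    counts as one evolutionary event.
    """
    return sum(len(labels[u] ^ labels[v]) for u, v in edges)

# Hypothetical tree over the five regulator sets from the slide.
labels = {
    "root": {1, 2},
    "anc": {1, 2, 3},
    "A": {1},
    "B": {1, 3},
    "C": {1, 2, 3},
}
edges = [("root", "A"), ("root", "anc"), ("anc", "B"), ("anc", "C")]
print(evolutionary_cost(labels, edges))  # 3
```

Here the edges root→A (loss of 2), root→anc (gain of 3) and anc→B (loss of 2) each cost 1, and anc→C costs nothing.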

Multiple Network Inference
The stepwise regression algorithm is modified to add or remove subsets of regulators directly on the edges of the graph. Partial F tests are computed on the vertices affected by this change to evaluate the change in fit. The p-values obtained are then adjusted according to the change in evolutionary cost and finally combined into a scoring function using a Kolmogorov-Smirnov test. The algorithm iteratively adds (or removes) the best-scoring move while its score is above (or below) a certain threshold.
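The slide does not spell out how the p-values enter the Kolmogorov-Smirnov test. One plausible reading, assumed here, is that the one-sample KS statistic measures how far the p-values at the affected vertices depart from the uniform distribution expected when a move has no effect; the evolutionary-cost adjustment is not modeled. A pure-Python sketch with a hypothetical function name; `scipy.stats.kstest(pvalues, "uniform").statistic` computes the same statistic.

```python
def ks_uniform_statistic(pvalues):
    """One-sample Kolmogorov-Smirnov statistic of p-values against Uniform(0, 1).

    Under the null hypothesis p-values are uniformly distributed, so a large
    statistic means the p-values depart from what chance alone would give.
    """
    p = sorted(pvalues)
    n = len(p)
    d_plus = max((i + 1) / n - x for i, x in enumerate(p))   # empirical CDF above uniform
    d_minus = max(x - i / n for i, x in enumerate(p))        # empirical CDF below uniform
    return max(d_plus, d_minus)

# Hypothetical p-values from the partial F tests at three affected vertices.
print(round(ks_uniform_statistic([0.01, 0.02, 0.03]), 2))  # 0.97
print(round(ks_uniform_statistic([0.2, 0.5, 0.8]), 2))     # 0.2
```

Consistently small p-values across the affected vertices give a statistic near 1, which would favor the move; evenly spread p-values give a small statistic.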

Simulation Example

Simulation Results

Conclusions
– The comparative framework actually simplifies the inference process, especially for instances of the problem with more genes, more noise or fewer time points.
– The procedure could also be used for the revision of gene networks.
– Different evolutionary models could be explored.
– We need to try the procedure on real data.