Gene Network Inference From Microarray Data

Copyright notice Many of the images in this PowerPoint presentation are the work of other people. The copyright belongs to the original authors. Thanks!

Gene Network Inference

Level of Biochemical Detail
Detailed models require lots of data! Highly detailed biochemical models are only feasible for very small, extensively studied systems.
Example: Arkin et al. (1998), Genetics 149(4):1633-48, the lysis-lysogeny switch in phage lambda: 5 genes, 67 parameters based on 50 years of research; the stochastic simulation required a supercomputer.

Example: Lysis-Lysogeny Arkin et al. (1998), Genetics 149(4):1633-48

Level of Biochemical Detail
In-depth biochemical simulation of, e.g., a whole cell is infeasible (so far).
Less detailed network models are useful when data are scarce and/or the network structure is unknown.
Once the network structure has been determined, we can refine the model.

Boolean or Continuous?
Boolean networks (Kauffman (1993), The Origins of Order) assume ON/OFF gene states.
They allow analysis at the network level, provide useful insights into network dynamics, and admit algorithms for network inference from binary data.
[Figure: a three-node example with inputs A and B and output C, where C = A AND B]
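Below is a minimal sketch of how such a Boolean network updates. The synchronous update scheme, the gene names, and the constant inputs A and B are illustrative assumptions; only the C = A AND B rule comes from the slide.

```python
# Minimal synchronous Boolean network sketch (illustrative; not from the original slides).
# Each gene's next state is a Boolean function of the current states of its inputs.

def step(state):
    """One synchronous update of all genes."""
    return {
        "A": state["A"],                 # assumed constant input
        "B": state["B"],                 # assumed constant input
        "C": state["A"] and state["B"],  # the slide's rule: C = A AND B
    }

state = {"A": True, "B": True, "C": False}
for _ in range(3):
    state = step(state)
print(state)  # {'A': True, 'B': True, 'C': True} -- C settles to A AND B
```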

Boolean or Continuous?
The Boolean abstraction is a poor fit to real data. It cannot model important concepts:
amplification of a signal
subtraction and addition of signals
compensation for a smoothly varying environmental parameter (e.g. temperature, nutrients)
varying dynamical behavior (e.g. cell cycle period)
Feedback control: negative feedback is used to stabilize expression, but it causes oscillation in a Boolean model.

Deterministic or Stochastic?
The use of concentrations assumes that individual molecules can be ignored.
There are known examples (in prokaryotes) where stochastic fluctuations play an essential role (e.g. lysis-lysogeny in lambda).
This requires stochastic simulation (Arkin et al. (1998), Genetics 149(4):1633-48) or modeling molecule counts (e.g. Petri nets, Goss and Peccoud (1998), PNAS 95(12):6750-5), which significantly increases model complexity.

Deterministic or Stochastic?
Eukaryotes: larger cell volume, typically longer half-lives, and few known stochastic effects.
Yeast: 80% of the transcriptome is expressed at 0.1-2 mRNA copies/cell (Holstege et al. (1998), Cell 95:717-728).
Human: 95% of the transcriptome is expressed at <5 copies/cell (Velculescu et al. (1997), Cell 88:243-251).

Spatial or Non-Spatial?
Spatiality introduces additional complexity: intercellular interactions, spatial differentiation, cell compartments, cell types.
Spatial patterns also provide more data, e.g. stripe formation in Drosophila: Mjolsness et al. (1991), J. Theor. Biol. 152:429-454.
Few (no?) large-scale spatial gene expression data sets are available so far.

Data Requirements: Lower Bounds from Information Theory
How many bits of information are needed just to specify the connection pattern of a network?
N^2 possible connections between N nodes → N^2 bits needed to specify which connections are present or absent.
O(N) bits of information per "data point" → O(N) data points needed.

Effect of Limited Connectivity
Assume only K inputs per gene (on average) → NK connections out of N^2 possible, i.e. C(N^2, NK) possible connection patterns.
Number of bits needed to fully specify the connection pattern: log2 C(N^2, NK) ≈ NK log2(N/K), up to constant factors.
At O(N) bits per data point → O(K log(N/K)) data points needed.
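A quick numeric check of the two bounds, under the illustrative assumption of N = 1000 genes with K = 10 inputs each (log-gamma is used to avoid huge integers):

```python
# Numeric check of the information-theoretic bounds above (illustrative sizes).
from math import lgamma, log

def log2_comb(n, k):
    """log2 of the binomial coefficient C(n, k), via log-gamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

N, K = 1000, 10                       # assumed network size and mean in-degree
bits_full = N**2                      # fully connected case: one bit per possible connection
bits_sparse = log2_comb(N**2, N * K)  # K-sparse case: ~ NK log2(N/K), up to constants
print(bits_full / N)                  # ~1000 data points at O(N) bits per data point
print(round(bits_sparse / N))         # ~81 data points: O(K log(N/K))
```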

Comparison with clustering
Use pairwise correlation comparisons as a stand-in for clustering.
As the number of genes increases, the number of false positives will increase as well → we need to use a more stringent correlation test.
If we want to use the same correlation cutoff value r, we need to increase the number of data points as N increases → O(log(N)) data points needed.
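The slide does not spell out why O(log(N)) suffices; here is a hedged sketch of the standard multiple-testing argument, with m the number of data points, c a constant depending on the cutoff r, and α the tolerated number of false positives:

```latex
\Pr\bigl(|\hat r_{ij}| > r\bigr) \le 2e^{-cm} \ \text{for an unrelated pair } (i,j), \qquad
\mathbb{E}[\text{false positives}] \approx \binom{N}{2}\, 2e^{-cm} \le \alpha
\;\Longrightarrow\;
m \ge \frac{1}{c}\left(2\ln N + \ln\frac{1}{\alpha}\right) = O(\log N).
```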

Summary
Fully connected: O(N) data points (thousands)
Connectivity K: O(K log(N/K)) data points (hundreds?)
Clustering: O(log(N)) data points (tens)
Additional constraints reduce the data requirements: choice of regulatory functions, limited connectivity.
Network inference is feasible, but does require much more data than clustering.

Reverse Engineering Gene Network Methods
Boolean networks
Relevance networks (co-expression networks)
Bayesian networks
Graphical Gaussian models
Differential equations

Gene Networks: reverse engineering
Dynamical gene networks:
discrete models: Boolean networks, Bayesian networks, Petri nets
continuous models: neural networks, differential equations
Static gene networks:
statistical correlation analysis
graph theory approaches

Problems
Static models require less data but give low accuracy; dynamical models require more data but give high accuracy.
Noise and time delay → master equations.
Problem: scarcity of time-series data, i.e. a dimensionality problem: the number of genes typically far exceeds the number of time points for which data are available, making the problem an ill-posed one.

Gene Co-expression Relation
The relations among n gene expression profiles can be represented by an n×n symmetric correlation (e.g. Pearson correlation) matrix M.
Coexistence of collectivity and noise: M = Mn + Mc. The strong correlation part Mc indicates modular collectivity; the weak correlation part Mn indicates "noise" between unrelated genes.

Relevance networks (Butte and Kohane, 2000)
1. Choose a measure of association A(.,.)
2. Define a threshold value tA
3. For all pairs of domain variables (X,Y), compute their association A(X,Y)
4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value tA
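A minimal sketch of these four steps, with Pearson correlation as the assumed association measure; the data matrix and the threshold value are illustrative:

```python
# Relevance (co-expression) network sketch: threshold the pairwise Pearson correlations.
import numpy as np

def relevance_network(X, t_A=0.77):
    """X: genes x samples expression matrix; returns undirected edges (i, j) with |A| > t_A."""
    C = np.corrcoef(X)                    # step 3: association for all pairs (Pearson here)
    n = C.shape[0]
    return [(i, j) for i in range(n)      # step 4: keep pairs above the threshold
            for j in range(i + 1, n) if abs(C[i, j]) > t_A]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))             # 20 genes, 50 arrays (synthetic data)
print(relevance_network(X))               # few or no edges expected for independent genes
```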

Relevance networks (Butte and Kohane, 2000)

Determining the Threshold by Random Matrix Theory
Construct a series of correlation matrices with different cutoff values: for a given cutoff, correlation coefficients with absolute values below the cutoff are set to zero, and only those beyond the cutoff are kept.
Calculate the nearest-neighbor spacing distribution (NNSD) of the eigenvalues for this series of correlation matrices.
Determine the cutoff threshold by testing goodness-of-fit to the Poisson distribution using a chi-square test.
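A rough sketch of this procedure under strong simplifying assumptions: spectral unfolding is reduced to normalizing the spacings to unit mean, the binning is arbitrary, and the data are synthetic; none of this code comes from the original work.

```python
# NNSD threshold-test sketch: threshold the correlation matrix, compute eigenvalue
# spacings, and chi-square test them against the exponential law implied by Poisson statistics.
import numpy as np
from scipy.stats import chisquare

def nnsd_poisson_chi2(C, cutoff, bins=10):
    Ct = np.where(np.abs(C) >= cutoff, C, 0.0)   # keep only strong correlations
    eig = np.sort(np.linalg.eigvalsh(Ct))
    s = np.diff(eig)
    s = s / s.mean()                             # crude "unfolding": unit mean spacing
    edges = np.linspace(0.0, s.max() + 1e-9, bins + 1)
    observed, _ = np.histogram(s, bins=edges)
    expected = np.diff(1.0 - np.exp(-edges))     # exponential spacing law P(s) = exp(-s)
    expected = expected / expected.sum() * observed.sum()
    return chisquare(observed, expected)         # large p-value: consistent with Poisson

rng = np.random.default_rng(1)
C = np.corrcoef(rng.normal(size=(200, 80)))      # synthetic 200-gene correlation matrix
print(nnsd_poisson_chi2(C, cutoff=0.3))
```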

Yeast Gene Co-expression Network at Cutoff 0.77
Red represents the major functional category of each module, while purple, yellow and tan represent other functional categories, which are often clustered into sub-modules. Genes in lavender participate in processes closely related to those of the genes in red. White nodes are unknown genes, while black nodes are genes whose functional links to other genes are not currently understood. Green nodes are genes in metabolic processes, which are influenced by many biological processes. LightCyan nodes in Module 15 are genes involved in cell cycle regulation and related processes.

Graphical Gaussian Models
GGMs are undirected probabilistic graphical models that allow the identification of conditional independence relations among the nodes, under the assumption of a multivariate Gaussian distribution of the data. The inference of GGMs is based on a (stable) estimation of the covariance matrix of this distribution. A high correlation coefficient Cik between two nodes may indicate a direct interaction, but it can also arise from indirect effects. The strengths of the direct interactions are measured by the partial correlation coefficient πik, which describes the correlation between nodes Xi and Xk conditional on all the other nodes in the network.

Graphical Gaussian Models
[Figure: two nodes, 1 and 2, joined by a direct interaction, indicated by a strong partial correlation π12]
Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X1,X2|X3,…,Xn).
But usually: #observations < #variables.

Graphical Gaussian Models
To infer a GGM, one typically employs the following procedure. From the given data, the empirical covariance matrix is computed and inverted, and the partial correlations πik are computed. The distribution of |πik| is inspected, and edges (i,k) whose |πik| is not significantly different from zero are removed from the graph. The critical step in this procedure is the stable estimation of the covariance matrix and its inverse. Schäfer and Strimmer (2005) propose a novel covariance matrix estimator regularized by a shrinkage approach, after extensively exploring alternative regularization methods based on bagging.
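A hedged sketch of this pipeline. Scikit-learn's Ledoit-Wolf shrinkage stands in for the Schäfer-Strimmer estimator, and the data and the edge-selection threshold are illustrative assumptions:

```python
# GGM sketch: shrinkage covariance -> precision matrix -> partial correlations -> edges.
import numpy as np
from sklearn.covariance import LedoitWolf

def partial_correlations(X):
    """X: samples x genes. Returns the matrix of partial correlations pi_ik."""
    P = np.linalg.inv(LedoitWolf().fit(X).covariance_)  # shrinkage keeps this invertible
    d = np.sqrt(np.diag(P))
    pi = -P / np.outer(d, d)                            # pi_ik = -P_ik / sqrt(P_ii * P_kk)
    np.fill_diagonal(pi, 1.0)
    return pi

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 100))                       # fewer observations than variables
pi = partial_correlations(X)
edges = np.argwhere(np.triu(np.abs(pi) > 0.2, k=1))  # drop edges with small |pi_ik|
print(len(edges))
```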

Further drawbacks
Relevance networks and Graphical Gaussian models can extract undirected edges only. Bayesian networks promise to extract at least some directed edges. But can we trust these edge directions? It may be better to learn undirected edges than to learn directed edges with false orientations.

Bayesian networks (BN) in brief
Graphs in which nodes represent random variables.
Arcs (or their absence) represent conditional independence assumptions; the present and absent arcs together provide a compact representation of the joint probability distribution.
BNs have a complicated notion of independence, which takes into account the directionality of the arcs.

Bayes' Rule
We can rearrange the conditional probability formula to get P(A|B) P(B) = P(A,B), but by symmetry we also have P(B|A) P(A) = P(A,B). It follows that:
P(A|B) = P(B|A) P(A) / P(B)
The power of Bayes' rule is that in many situations where we want to compute P(A|B), it is difficult to do so directly, yet we might have direct information about P(B|A). Bayes' rule enables us to compute P(A|B) in terms of P(B|A).

Bayesian networks
A marriage between graph theory and probability theory. A directed acyclic graph (DAG) represents conditional independence relations, and the Markov assumption leads to a factorization of the joint probability distribution:
P(X1,…,Xn) = ∏i P(Xi | parents(Xi))
[Figure: example DAG with nodes A-F connected by directed edges]

Simple Bayesian network example, from the "Bayesian Networks Without Tears" article: P(hear your dog bark as you get home) = P(hb) = ?

We need prior probabilities for the root nodes, and conditional probabilities (covering all possible values of the parent nodes) for the non-root nodes.

Major benefit of BN
We can compute P(hb) based only on the conditional probabilities of hb and its parent node. We don't need to know or include all the ancestor probabilities between hb and the root nodes.
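A tiny numeric illustration of this point: P(hb) follows from P(hb | dog-out) and P(dog-out) alone. All probability values below are invented for illustration and do not come from the article.

```python
# Hypothetical CPT values for "hear-bark" (hb) and its single parent "dog-out" (do).
p_do = 0.3                 # P(do); itself computable from do's own parents, not needed here
p_hb_given_do = 0.7        # P(hb | do)
p_hb_given_not_do = 0.01   # P(hb | not do)

# Marginalize over the parent only -- no ancestors of dog-out are involved.
p_hb = p_hb_given_do * p_do + p_hb_given_not_do * (1 - p_do)
print(p_hb)  # 0.217
```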

Independence assumptions
Independence assumptions are the source of the savings in the number of values needed. From our simple example: are 'family-out' and 'hear-bark' independent, i.e. is P(hb|fo) = P(hb)? Intuition might say they are not independent…

Independence assumptions
…but in fact they can be treated as independent once certain conditions are met. The conditions are symbolized by the presence/absence and direction of arrows between nodes. Knowing whether the dog is or is not in the house is all that is needed to know the probability of hearing a bark; given 'dog-out', family being in or out adds no further information. This kind of (conditional) independence assumption is what allows the savings in how many numbers must be specified for the probabilities.

Learning Bayesian Belief Networks
1. The network structure is given in advance and all the variables are fully observable in the training examples ==> trivial case: just estimate the conditional probabilities.
2. The network structure is given in advance but only some of the variables are observable in the training data ==> similar to learning the weights for the hidden units of a neural net: gradient ascent procedure.
3. The network structure is not known in advance ==> use a heuristic search or constraint-based technique to search through potential structures.

BN from microarray
"Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data," Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N, Nature Genetics, June 2003.

Results of the SSR (Segal et al.) article
The expression data set, from other researchers circa 2000, is for genes of yeast subjected to various kinds of stress.
Compiled a list of 466 candidate regulators.
Applied the analysis to 2355 genes in all 173 arrays of the yeast data set.
This gave automatic inference of 50 modules of genes.
All modules were analyzed with external data sources to check the functional coherence of the gene products and the validity of the regulatory program.
Three novel hypotheses suggested by the method were tested in the bio lab and found to be accurate.

Differential Equations
One typically uses linear differential equations to model the gene trajectories:
dxi(t)/dt = ai,0 + ai,1 x1(t) + ai,2 x2(t) + … + ai,n xn(t)
Several reasons for that choice:
the lower number of parameters means that we are less likely to overfit the data
they are sufficient to model complex interactions between the genes

Small Network Example
A four-gene network (x1…x4); each equation has constant coefficients, and each interaction coefficient carries a sign (+ activating, - inhibiting):
dx1(t)/dt = 0.491 - 0.248 x1(t)
dx2(t)/dt = -0.473 x3(t) + 0.374 x4(t)
dx3(t)/dt = -0.427 + 0.376 x1(t) - 0.241 x3(t)
dx4(t)/dt = 0.435 x1(t) - 0.315 x3(t) - 0.437 x4(t)
[Figure: network diagram of x1…x4 with signed edges]
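To make the example concrete, here is a short simulation of exactly this four-gene system; the initial condition and the time span are arbitrary choices for illustration:

```python
# Simulate the slide's four-gene linear ODE system.
import numpy as np
from scipy.integrate import solve_ivp

def dxdt(t, x):
    x1, x2, x3, x4 = x
    return [
        0.491 - 0.248 * x1,
        -0.473 * x3 + 0.374 * x4,
        -0.427 + 0.376 * x1 - 0.241 * x3,
        0.435 * x1 - 0.315 * x3 - 0.437 * x4,
    ]

sol = solve_ivp(dxdt, (0.0, 50.0), [1.0, 1.0, 1.0, 1.0])
print(np.round(sol.y[:, -1], 3))
# x1, x3 and x4 settle toward fixed values; x2 has no self-decay term,
# so it keeps integrating its inputs rather than settling.
```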

Issues with Differential Equations
Even under the simplest linear model, there are m(m+1) unknown parameters to estimate:
m(m-1) directional effects
m self effects
m constant effects
The number of data points is mn, and we typically have n << m (few time points). To avoid overfitting, extra constraints must be incorporated into the model, such as:
smoothness of the equations
sparseness of the network (few non-null interaction coefficients)

Collins et al., PNAS (Yeung, Tegner and Collins, "Reverse engineering gene networks using singular value decomposition and robust regression")
Use SVD to obtain a family of possible solutions.
Use robust regression to choose from among them.

The goal is to use as few measurements as possible. By this method (with exact measurements): M = O(log(N)).

If the system is near a steady state, the dynamics can be approximated by a linear system of differential equations:
dxi(t)/dt = -λi xi(t) + Σj Wij xj(t) + bi + ξi(t)
xi = concentration of mRNA (reflects the expression level of gene i)
λi = self-degradation rates
bi = external stimuli
ξi = noise
Wij = type and strength of the effect of the jth gene on the ith gene

Assumptions made:
no time-dependency in the connections (so W is not time-dependent), and they are not changed by the experiments
the system is near a steady state
noise is discarded, so exact measurements are assumed, and the derivatives dxi/dt can be calculated exactly enough

The system becomes:
dX/dt = AX + B, with A = W + diag(-λi)
Compute dX/dt by using several measurements of the data for X (e.g. using interpolation).
Goal: deduce W (or A) from the rest.
If M = N we could simply compute (X^T)^-1 and solve, but mostly M << N (this is our goal: M = O(log(N))).

Therefore, use SVD (to find the least-squares solution):
X^T = U Σ V^T
Here, U and V are orthogonal (U^T = U^-1) and Σ = diag(w1,…,wN), with wi the singular values of X (written Σ here to avoid a clash with the connectivity matrix W). Suppose the zero singular values come first, so that wi = 0 for i = 1…L and wi ≠ 0 for i = L+1…N.

Then the least-squares (L2) solution to the problem is:
A0 = (dX/dt - B) X^+
where the pseudoinverse X^+ is formed from the SVD, with 1/wj replaced by 0 whenever wj = 0. So this formula tries to match every data point as closely as possible.

But all possible solutions are:
A = A0 + C V^T
with C = (cij)N×N, where cij = 0 if j > L and is otherwise an arbitrary scalar coefficient. How do we choose from this family of solutions? The least-squares method tries to match every data point as closely as possible → a not-so-sparse matrix with a lot of small entries.

Based on prior biological knowledge, impose constraints on the solutions; e.g., when we know two genes are related, the solution must reflect this in the matrix. Here, work from the assumption that real gene networks are sparse, and look for the matrix that is most sparse: search for the cij that maximize the number of zero entries in A.

So: get as many zero entries as you can, and therefore a sparse matrix; the non-zero entries form the connections. Fit as many measurements as you can exactly: "robust regression" (so exact measurements are assumed).

Do this using L1 regression. Thus, when considering A = A0 + C V^T, we want to "minimize" A in the L1 sense: look for the solution C for which Σij |aij| is minimal. This produces as many zeros as possible. The implementation was done using the simplex method (linear programming).
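A hedged sketch of the whole SVD + L1 procedure under the assumptions used above: X holds the N×M measurements, Xdot_minus_B the (assumed exact) derivatives minus stimuli, and each row of the solution family is sparsified by an L1 linear program (scipy's linprog standing in for a hand-rolled simplex):

```python
# SVD + L1 (robust regression) sketch for A such that dX/dt - B = A X, with A sparse.
import numpy as np
from scipy.optimize import linprog

def l1_sparsify_row(a0_row, U_null):
    """Pick the member of the solution family with minimal L1 norm for one row of A."""
    N, p = U_null.shape
    # LP variables [z (p), t (N)]: minimize sum(t) s.t. -t <= a0_row + U_null @ z <= t.
    c = np.concatenate([np.zeros(p), np.ones(N)])
    A_ub = np.block([[U_null, -np.eye(N)], [-U_null, -np.eye(N)]])
    b_ub = np.concatenate([-a0_row, a0_row])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (p + N))
    return a0_row + U_null @ res.x[:p]

def sparse_network(X, Xdot_minus_B, tol=1e-10):
    A0 = Xdot_minus_B @ np.linalg.pinv(X)       # least-squares (L2) solution A0
    U, w, _ = np.linalg.svd(X, full_matrices=True)
    r = int(np.sum(w > tol))
    U_null = U[:, r:]                           # basis of null(X^T): directions free to change
    if U_null.shape[1] == 0:
        return A0                               # enough measurements: solution already unique
    return np.vstack([l1_sparsify_row(row, U_null) for row in A0])

rng = np.random.default_rng(3)
N, M, k = 10, 6, 2
A_true = np.zeros((N, N))
for i in range(N):                              # random sparse "true" network
    A_true[i, rng.choice(N, size=k, replace=False)] = rng.normal(size=k)
X = rng.normal(size=(N, M))                     # M << N synthetic measurements
A_hat = sparse_network(X, A_true @ X)           # exact derivatives assumed: dX/dt - B = A_true X
print(np.round(A_hat, 2))                       # with enough measurements, close to A_true
```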

Results: Mc = O(log(N)), better than SVD alone without regression.
[Results figure omitted]

Thus, to reverse-engineer a network of N genes, we "only" need Mc = O(log(N)) experiments. Then Mc << N, and the computational cost will be O(N^4). (Brute-force methods would have a cost of O(N!/(k!(N-k)!)) = O(C(N,k)), with k the number of non-zero entries per row.)
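A back-of-envelope check of these complexity claims at illustrative sizes:

```python
# Compare the quoted costs for N = 100 genes, k = 10 non-zero entries per row.
from math import comb

N, k = 100, 10
print(N**4)        # 100,000,000: rough cost of the SVD + L1 approach
print(comb(N, k))  # ~1.7e13: brute-force search over all k-sparse connection patterns
```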

Discussion
Advantages:
few data needed, in comparison with neural networks and Bayesian models
no prior knowledge needed
easy to parallelize, as it recovers the connectivity matrix row by row (gene by gene)
also applicable to protein networks