Module Networks: Discovering Regulatory Modules and their Condition-Specific Regulators from Gene Expression Data  Cohen Jony

Outline  • The Problem  • Regulators  • Module Networks  • Learning Module Networks  • Results  • Conclusions

The Problem  • Inferring regulatory networks from gene expression data.

Regulators

Regulation types

Regulators example  This is an example of a regulatory module.

Known solution: Bayesian Networks  The problem: too many variables and too little data allow statistical noise to produce spurious dependencies, resulting in models that significantly overfit the data.

From Bayesian Networks to Module Networks

Module Networks  • We assume that we are given a domain of random variables X = {X1, ..., Xn}.  • We use Val(Xi) to denote the domain of values of the variable Xi.  • A module set C is a set of formal variables M1, ..., MK; all the variables assigned to a module share the same CPD.  • Note that all the variables assigned to a module must have the same domain of values!

Module Networks  • A module network template T = (S, θ) for C defines, for each module Mj in C: 1) a set of parents Pa(Mj) from X; 2) a conditional probability template (CPT) P(Mj | Pa(Mj)), which specifies a distribution over Val(Mj) for each assignment in Val(Pa(Mj)).  • We use S to denote the dependency structure encoded by {Pa(Mj) : Mj in C} and θ to denote the parameters required for the CPTs {P(Mj | Pa(Mj)) : Mj in C}.

• A module assignment function for C is a function A : X → {1, ..., K} such that A(Xi) = j only if Val(Xi) = Val(Mj).  • A module network is defined by both the module network template and the assignment function.

Example  • In our example, we have three modules M1, M2, and M3.  • Pa(M1) = Ø, Pa(M2) = {MSFT}, and Pa(M3) = {AMAT, INTL}.  • We have A(MSFT) = 1, A(MOT) = 2, A(INTL) = 2, and so on.
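
A minimal sketch of how such a structure might be represented in code (Python; the class and variable names are hypothetical, and the tickers follow the example above):

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """One module: a set of parents shared by all variables assigned to it."""
    name: str
    parents: list  # names of the parent (regulator) variables of this module

@dataclass
class ModuleNetwork:
    """A module network: a template (modules with parents) plus an assignment A."""
    modules: dict = field(default_factory=dict)      # module name -> Module
    assignment: dict = field(default_factory=dict)   # variable name -> module name

# The example above: three modules, MSFT regulates M2, AMAT and INTL regulate M3.
net = ModuleNetwork(
    modules={
        "M1": Module("M1", parents=[]),
        "M2": Module("M2", parents=["MSFT"]),
        "M3": Module("M3", parents=["AMAT", "INTL"]),
    },
    assignment={"MSFT": "M1", "MOT": "M2", "INTL": "M2"},
)

# All variables assigned to a module share that module's parents (and its CPD).
for var, mod in net.assignment.items():
    print(f"{var} is in {mod}, regulated by {net.modules[mod].parents}")
```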

Learning Module Networks  • The iterative learning procedure searches for the model with the highest score using the Expectation Maximization (EM) algorithm.  • An important property of the EM algorithm is that each iteration is guaranteed to improve the likelihood of the model, until convergence to a local maximum of the score.  • Each iteration of the algorithm consists of two steps: an E-step and an M-step.

Learning Module Networks cont.  • In the M-step, the procedure is given a partition of the genes into modules and learns the best regulation program (regression tree) for each module.  • The regulation program is learned via a combinatorial search over the space of trees.  • The tree is grown from the root to its leaves; at any given node, the query that best partitions the gene expression into two distinct distributions is chosen, until no such split exists.
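
A minimal sketch of the split search at a single node of such a regression tree, assuming continuous expression values and using gain in Gaussian log-likelihood as the scoring criterion (the actual procedure uses a Bayesian score; all function and variable names here are hypothetical):

```python
import numpy as np

def gaussian_loglik(values):
    """Log-likelihood of the values under a single Gaussian fit to them (ML estimate)."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        return 0.0
    var = values.var() + 1e-6  # small variance floor to avoid log(0)
    return -0.5 * values.size * (np.log(2 * np.pi * var) + 1.0)

def best_split(module_expr, regulator_expr, thresholds):
    """Pick the (regulator, threshold) query that best partitions the module's expression
    into two distinct distributions, scored by the gain in Gaussian log-likelihood.

    module_expr: array of shape (n_genes, n_arrays), expression of the module's genes
    regulator_expr: dict regulator name -> array of shape (n_arrays,)
    thresholds: candidate split values for the regulators"""
    base = gaussian_loglik(module_expr)
    best = (None, None, 0.0)
    for reg, reg_values in regulator_expr.items():
        for t in thresholds:
            mask = np.asarray(reg_values) > t        # split the arrays by the regulator's level
            if mask.sum() < 2 or (~mask).sum() < 2:  # skip degenerate splits
                continue
            gain = (gaussian_loglik(module_expr[:, mask])
                    + gaussian_loglik(module_expr[:, ~mask])
                    - base)
            if gain > best[2]:
                best = (reg, t, gain)
    return best  # (regulator, threshold, gain); regulator is None if no split helps
```

In the full procedure this search would be applied recursively, growing the tree until no split improves the score.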

Learning Module Networks cont.  • In the E-step, given the inferred regulation programs, we determine the module whose associated regulation program best predicts each gene's behavior.  • We compute the probability of a gene's measured expression values in the dataset under each regulation program, obtaining an overall probability that this gene's expression profile was generated by that program.  • We then select the module whose program gives the gene's expression profile the highest probability, and re-assign the gene to this module.  • We take care not to assign a regulator gene to a module in which it is also a regulatory input.
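
A minimal sketch of the reassignment step, assuming each module's regulation program is exposed as a function returning the log-probability of a gene's expression profile (all names are hypothetical):

```python
def reassign_genes(genes, module_programs, module_regulators, expression):
    """Reassign each gene to the module whose regulation program explains it best.

    genes: list of gene names
    module_programs: dict module -> callable(profile) returning a log-probability
    module_regulators: dict module -> set of regulator genes used by that module's program
    expression: dict gene -> expression profile (e.g. a numpy array over arrays)"""
    new_assignment = {}
    for gene in genes:
        best_module, best_logp = None, float("-inf")
        for module, log_prob in module_programs.items():
            # Do not assign a regulator gene to a module it also regulates (avoids cycles).
            if gene in module_regulators[module]:
                continue
            logp = log_prob(expression[gene])
            if logp > best_logp:
                best_module, best_logp = module, logp
        new_assignment[gene] = best_module
    return new_assignment
```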

Bayesian score  • When the priors satisfy the assumptions described on the following slides, the Bayesian score decomposes into local module scores, where…
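
The decomposition itself appeared as an image on the slide; the following reconstruction follows the form used in the module networks paper (omitting the prior terms over structure and assignment, which the slide does not list), so treat it as a best-effort restatement rather than a verbatim copy:

```latex
\mathrm{score}(S, A : D) \;=\; \sum_{j=1}^{K} \mathrm{score}_{M_j}\!\bigl(\mathrm{Pa}_{M_j},\, A^{-1}(j) : D\bigr),
\qquad
\mathrm{score}_{M_j}(U, X : D) \;=\; \log \int L_j\bigl(U, X, \theta_{M_j} : D\bigr)\, P\bigl(\theta_{M_j} \mid S_j = U\bigr)\, d\theta_{M_j}
```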

Bayesian score cont.  • Lj(U, X, θMj : D) is the likelihood function.  • P(θMj | Sj = U) is the parameter prior.  • Sj = U denotes that we chose a structure in which U is the set of parents of module Mj.  • Aj = X denotes an assignment A in which X is the set of variables assigned to module Mj.

Assumptions  • Let P(A), P(S | A), and P(θ | S, A) be the assignment, structure, and parameter priors.  • P(θ | S, A) satisfies parameter independence if it decomposes into a product of independent priors, one per module.  • P(θ | S, A) satisfies parameter modularity if, for all structures S1 and S2 in which module Mj has the same parents, the prior over θMj is the same.

Assumptions  • P(θ, S | A) satisfies assignment independence if P(θ | S, A) = P(θ | S) and P(S | A) = P(S).  • P(S) satisfies structure modularity if it is a product over modules of terms ρj(Sj), where Sj denotes the choice of parents for module Mj and ρj is a distribution over the possible parent sets for module Mj.  • P(A) satisfies assignment modularity if it is proportional to a product over modules of terms αj(Aj), where Aj is the set of variables assigned to module Mj and {αj : j = 1, ..., K} is a family of functions from 2^X to the positive reals.
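
The defining equations on these two Assumptions slides were also shown as images; the statements below reconstruct them from the surrounding text and the module networks paper, so treat them as a best-effort restatement:

```latex
% Parameter independence:
P(\theta \mid S, A) \;=\; \prod_{j=1}^{K} P\bigl(\theta_{M_j} \mid S, A\bigr)

% Parameter modularity: for all structures S_1, S_2 with \mathrm{Pa}^{S_1}_{M_j} = \mathrm{Pa}^{S_2}_{M_j}:
P\bigl(\theta_{M_j} \mid S_1, A\bigr) \;=\; P\bigl(\theta_{M_j} \mid S_2, A\bigr)

% Structure modularity:
P(S) \;=\; \prod_{j=1}^{K} \rho_j(S_j)

% Assignment modularity:
P(A) \;\propto\; \prod_{j=1}^{K} \alpha_j(A_j)
```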

Assumptions - Explanations  • Parameter independence, parameter modularity, and structure modularity are the natural analogues of standard assumptions in Bayesian network learning.  • Parameter independence implies that P(θ | S, A) is a product of terms that parallels the decomposition of the likelihood, with one prior term per local likelihood term Lj.  • Parameter modularity states that the prior for the parameters of a module Mj depends only on the choice of parents for Mj and not on other aspects of the structure.  • Structure modularity implies that the prior over the structure S is a product of terms, one per module.

Assumptions - Explanations  • The following two assumptions are new to module networks.  • Assignment independence makes the priors on the parents and parameters of a module independent of the exact set of variables assigned to the module.  • Assignment modularity implies that the prior on A is proportional to a product of local terms, one corresponding to each module.  • Thus, the reassignment of one variable from module Mi to module Mj does not change our preferences on the assignment of variables in modules other than i and j.

Experiments  • The network learning procedure was evaluated on synthetic data, gene expression data, and stock market data.  • The data consisted solely of continuous values; as all of the variables have the same domain, the definition of the module set reduces to a specification of the total number of modules.  • Beam search was used as the search algorithm, with a lookahead of three splits to evaluate each operator.  • As a comparison, Bayesian networks were learned with precisely the same structure learning algorithm, simply treating each variable as its own module.

Synthetic data  • The synthetic data was generated by a known module network.  • The generating model had 10 modules and a total of 35 variables that were parents of some module. From the learned module network, 500 variables were selected, including the 35 parents.  • This procedure was run for training sets of various sizes, ranging from 25 instances to 500 instances, each repeated 10 times for different training sets.

Synthetic data - results  • Generalization to unseen test data was measured as the likelihood ascribed by the learned model to 4500 unseen instances.  • As expected, models learned with larger training sets do better; but, when run with the correct number of 10 modules, the gain from increasing the number of data instances beyond 100 samples is small.  • Models learned with a larger number of modules had a wider spread of variable assignments to modules and consequently achieved poorer performance.

Synthetic data – results cont.  • For all training set sizes except 25, the model with 10 modules performs best. (Figure: log-likelihood per instance assigned to held-out data.)

Synthetic data – results cont.  • Models learned using 100, 200, or 500 instances and up to 50 modules assigned 80% of the variables to 10 modules. (Figure: fraction of variables assigned to the 10 largest modules.)

Synthetic data – results cont.  • The procedure recovers 74% of the true parent-child relationships in the generating model when learning from a dataset of 500 instances. (Figure: average percentage of correct parent-child relationships recovered.)

Synthetic data – results cont.  As the variables begin fragmenting over a large number of modules, the learned structure contains many spurious relationships.  Thus in domains with a modular structure, statistical noise is likely to prevent overly detailed learned models such as Bayesian networks from extracting the commonality between different variables with a shared behavior.

Gene Expression Data  • Expression data measuring the response of yeast to different stress conditions was used.  • The data consists of 6157 genes and 173 experiments.  • 2355 genes that varied significantly in the data were selected, and a module network was learned over these genes.  • A Bayesian network was also learned over this data set.

Candidate regulators  • A set of 466 candidate regulators was compiled from SGD and YPD.  • It includes both transcription factors and signaling proteins that may have transcriptional impact.  • Genes described as similar to such regulators were also included.  • Global regulators, whose regulation is not specific to a small set of genes or processes, were excluded.

Gene Expression results  • The figure demonstrates that module networks generalize much better than Bayesian networks to unseen data for almost all choices of the number of modules.

Biological validity  • The biological validity of the learned module network with 50 modules was tested.  • The enriched annotations reflect the key biological processes expected in our dataset.  • For example, the “protein folding” module contains 10 genes, 7 of which are annotated as protein folding genes. In the whole data set, there are only 26 genes with this annotation. Thus, the p-value of this annotation, that is, the probability of choosing 7 or more genes in this category when choosing 10 random genes, is less than 10^-12.  • 42 of the 50 modules had at least one significantly enriched annotation.
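
A minimal sketch of that enrichment calculation as a hypergeometric tail probability (the slide does not name the exact test, so this is an assumption; scipy is used for the computation):

```python
from scipy.stats import hypergeom

# Population: the 2355 genes the network was learned over, 26 of which carry the
# "protein folding" annotation. The module contains 10 genes, 7 of them annotated.
N, K, n, k = 2355, 26, 10, 7

# P(X >= 7) when drawing 10 genes at random without replacement.
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p-value: {p_value:.2e}")
```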

Biological validity cont.  • Both the HAP4 motif and STRE, recognized by Hap4 and Msn4 respectively, are enriched, supporting their inclusion in the module's regulation program.  • (Figure: lines represent 500 bp of genomic sequence located upstream of the start codon of each gene; colored boxes represent cis-regulatory motifs located in these regions.)

Stock Market Data  • NASDAQ stock prices for 2143 companies, covering 273 trading days.  • stock → variable, instance → trading day.  • The value of a variable is the log of the ratio between that day's and the previous day's closing stock price.  • As potential controllers, the 250 of the 2143 stocks whose average trading volume was the largest across the dataset were selected.
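
A minimal sketch of that preprocessing, assuming hypothetical pandas DataFrames `close` and `volume` with rows indexed by trading day and one column per ticker:

```python
import numpy as np
import pandas as pd

def prepare_stock_variables(close: pd.DataFrame, volume: pd.DataFrame, n_controllers: int = 250):
    """Turn closing prices into log-ratio variables and pick candidate controllers.

    close, volume: rows = trading days, columns = tickers."""
    # Each variable's value on a day is log(close_today / close_yesterday).
    log_returns = np.log(close / close.shift(1)).dropna()

    # Candidate controllers: the stocks with the largest average trading volume.
    controllers = volume.mean().nlargest(n_controllers).index.tolist()
    return log_returns, controllers
```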

Stock Market Data  • Cross-validation is used to evaluate the generalization ability of different models.  • Module networks perform significantly better than Bayesian networks in this domain.

Stock Market Data  • Significant enrichment for 21 annotations, covering a wide variety of sectors, was found.  • In 20 of the 21 cases, the enrichment was far more significant in the modules learned using module networks than in those learned by AutoClass. (Figure: module networks compared with AutoClass.)

Conclusions  • The results show that learned module networks have much higher generalization performance than a Bayesian network learned from the same data.  • Parameter sharing between variables in the same module allows each parameter to be estimated from a much larger sample; this lets us learn dependencies that would be considered too weak based on the statistics of single variables (these are well-known advantages of parameter sharing).  • An interesting aspect of the method is that it determines automatically which variables have shared parameters.

Conclusions  • The assumption of shared structure significantly restricts the space of possible dependency structures, allowing us to learn more robust models than those learned in a classical Bayesian network setting.  • In a module network, a spurious correlation would have to arise between a possible parent and a large number of other variables before the algorithm would introduce the dependency.

Overview of Module Networks

Literature  • Reference: E. Segal, M. Shapira, A. Regev, D. Pe'er, D. Botstein, D. Koller, and N. Friedman. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics, 34(2):166-176, 2003.  Bibliography:  • P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. AutoClass: a Bayesian classification system. In Proceedings of the Fifth International Conference on Machine Learning (ML '88), 1988.

THE END