Published by Meredith Harmon. Modified over 9 years ago.
1
Module networks
Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576
sroy@biostat.wisc.edu
Nov 18th & 20th, 2014
2
RECAP
Probabilistic graphical models (PGMs) provide a natural representation of molecular networks
Learning PGMs from data requires us to solve two problems
– Parameter estimation given a graph structure
– Structure learning, which requires us to learn both structure and parameters
Bayesian networks are a type of PGM popular for representing molecular networks
Learning Bayesian networks from gene expression data requires additional considerations
– Sparse candidate algorithm: heuristics to reduce the number of candidate parents
– Bootstrap to assess model confidence
– Module networks
3
What you should know
Motivation of module networks
What is a module network?
Compare and contrast the module network learning algorithm with other network inference algorithms
– E.g. Sparse candidate
4
Module Networks
Motivation:
– Most complex systems have too many variables
– Not enough data to robustly learn networks
– Large networks are hard to interpret
Key idea: Group similarly behaving variables into “modules” and learn the same parents and parameters for each module
Relevance to gene regulatory networks
– Genes that are co-expressed are likely regulated in similar ways
Segal et al 2005
5
Definition of a module
Statistical definition (specific to module networks by Segal 2005)
– A set of random variables that share a statistical model
Biological definition of a module
– Set of genes that are co-expressed and co-regulated
6
An expression module
Set of genes that behave similarly across conditions
[Figure: genes grouped into modules; Gasch & Eisen, 2002]
7
Key questions of Module Networks
How to represent the Conditional Probability Distributions (CPDs) for children?
– Regression tree
How to learn module networks?
8
Defining a Module Network
A probabilistic graphical model over N random variables X_1, ..., X_N
Set of module variables M_1, ..., M_K
Module assignments A that specify the module (1 to K) for each X_i
CPD per module P(M_j | Pa_Mj), where Pa_Mj are the parents of module M_j
– Each variable X_i in M_j has the same conditional distribution
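The definition above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `ModuleNetwork` and its fields are hypothetical, and the CPDs themselves are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleNetwork:
    """Hypothetical container for a module network (CPDs omitted)."""
    variables: list[str]               # X_1 .. X_N
    n_modules: int                     # K
    assignment: dict[str, int]         # A: variable name -> module index in 1..K
    parents: dict[int, list[str]] = field(default_factory=dict)  # Pa_Mj per module

    def members(self, j: int) -> list[str]:
        """Variables assigned to module j; they all share one CPD."""
        return [x for x in self.variables if self.assignment[x] == j]

    def parents_of(self, x: str) -> list[str]:
        """A variable inherits the parent set of its module."""
        return self.parents.get(self.assignment[x], [])


# Toy example (made-up names): X3 and X4 form module 2, regulated by X1.
net = ModuleNetwork(
    variables=["X1", "X2", "X3", "X4"],
    n_modules=2,
    assignment={"X1": 1, "X2": 1, "X3": 2, "X4": 2},
    parents={2: ["X1"]},
)
```

Storing parents per module rather than per variable is what makes the parameter sharing explicit: `net.parents_of("X3")` and `net.parents_of("X4")` are the same list by construction.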
9
Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME
10
Bayesian network vs Module network
Bayesian network
– Different CPD per random variable
– Learning only requires a search for parents
Module network
– CPD per module: same CPD for all random variables in the same module
– Learning requires parent search and module membership assignment
11
Learning a Module Network
Given a training dataset and a fixed number of modules (K)
Learn
– Module assignments A of each variable to a module
– The parents of each module
12
Score of a Module network
[Equation: score of the module network given the data, with a likelihood term for each module j]
K: number of modules
X^j: j-th module
Pa_Mj: parents of module M_j
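The symbols on this slide suggest a score that factors over modules. One way to write that decomposition, assuming the score is the data likelihood and writing D for the data and L_j for the likelihood of module j (the slide's own equation may also include prior terms, which are not shown here):

```latex
\mathcal{L}(\mathcal{M} : D) \;=\; \prod_{j=1}^{K} L_j\!\left(\mathrm{Pa}_{M_j},\, X^{j} : D\right),
\qquad
\log \mathcal{L}(\mathcal{M} : D) \;=\; \sum_{j=1}^{K} \log L_j\!\left(\mathrm{Pa}_{M_j},\, X^{j} : D\right)
```

Because each term depends only on one module's members and parents, changing one module's parents or membership only requires rescoring the modules it touches.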
13
Module network learning algorithm
14
Module assignment search
Happens in two places
Module initialization
– Interpret as clustering of the random variables
Module re-assignment
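As an illustration of initialization by clustering, the sketch below groups variables by their expression profiles with a plain k-means. This is an assumption for illustration only: the slides do not say which clustering method is used, and the function name `kmeans_modules` and the toy data are hypothetical.

```python
def kmeans_modules(data, k, n_iter=20):
    """Cluster variables into k modules.

    data maps a variable name to its expression profile (list of floats).
    Returns a dict: variable name -> module index in 1..k.
    """
    names = list(data)
    # Deterministic start: the first k profiles serve as initial centers.
    centers = [list(data[n]) for n in names[:k]]
    assignment = {}
    for _ in range(n_iter):
        # Assign every variable to its nearest center (squared Euclidean distance).
        new_assignment = {
            n: min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(data[n], centers[j])),
            ) + 1
            for n in names
        }
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # Recompute each center as the mean profile of its members.
        for j in range(k):
            members = [data[n] for n in names if assignment[n] == j + 1]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy profiles: g1 and g2 rise across conditions, g3 and g4 fall.
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [1.1, 2.1, 2.9],
    "g3": [3.0, 2.0, 1.0],
    "g4": [2.9, 2.1, 1.1],
}
modules = kmeans_modules(profiles, k=2)
```

The resulting assignment is only a starting point; the re-assignment step then refines it using the network score.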
15
Module initialization as clustering of variables for module network
16
Module re-assignment
Must preserve the acyclic graph structure
Must improve the score
Module re-assignment happens using a sequential update procedure:
– Update only one variable at a time
– Compute the change in score of moving a variable from one module to another while keeping the other variables fixed
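The sequential update can be sketched as follows. This is a simplified illustration under loud assumptions: each module is scored with a single pooled Gaussian (no parents, no regression tree), so the acyclicity check is not needed and is omitted. The function names and the toy data are hypothetical.

```python
import math


def module_loglik(values):
    """Gaussian log-likelihood of pooled values at the fitted mean and variance."""
    n = len(values)
    if n == 0:
        return 0.0  # an empty module contributes nothing
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    var = max(ss / n, 1e-6)  # floor keeps the log finite for constant data
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def total_score(data, assignment, k):
    """Sum of per-module scores; data maps variable -> list of samples."""
    return sum(
        module_loglik(
            [v for x, vals in data.items() if assignment[x] == j for v in vals]
        )
        for j in range(1, k + 1)
    )


def sequential_update(data, assignment, k):
    """Move one variable at a time to its best module, others held fixed."""
    assignment = dict(assignment)
    improved = True
    while improved:
        improved = False
        for x in data:
            current = assignment[x]
            best_j, best = current, total_score(data, assignment, k)
            for j in range(1, k + 1):
                if j == current:
                    continue
                assignment[x] = j  # tentative move
                score = total_score(data, assignment, k)
                if score > best + 1e-9:  # accept only strict improvements
                    best_j, best = j, score
            assignment[x] = best_j
            if best_j != current:
                improved = True
    return assignment


# Toy data: X1 and X2 sit near 1, X3 and X4 near 5; the start mixes them.
data = {
    "X1": [1.0, 1.1, 0.9],
    "X2": [1.05, 0.95, 1.0],
    "X3": [5.0, 5.2, 4.8],
    "X4": [5.1, 4.9, 5.0],
}
start = {"X1": 1, "X2": 2, "X3": 1, "X4": 2}
result = sequential_update(data, start, k=2)
```

Since every accepted move strictly increases the total score and there are finitely many assignments, the loop terminates. A fuller version would rescore only the two affected modules and reject moves that create a cycle.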
17
Module re-assignment via sequential update
18
Representing the Conditional probability distribution
X_i are continuous variables
How to represent the distribution of X_i given the state of its parents?
How to capture context-specific dependencies?
Module networks use a regression tree
19
Modeling the relationship between regulators and targets
Suppose we have a set of eight genes that all have the same activator/repressor binding sites in their upstream regions
20
A regression tree
A rooted binary tree T
Each node in the tree is either an interior node or a leaf node
Interior nodes are labeled with a binary test X_i < u, where u is a real number observed in the data
Leaf nodes are associated with univariate distributions of the child
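A minimal sketch of a regression tree used as a CPD, assuming a Gaussian at each leaf. The class and function names are hypothetical, and the tests are written as "parent > threshold", following the direction used in the tree figures on the later slides.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    mean: float   # Gaussian mean of the child at this leaf
    var: float    # Gaussian variance of the child at this leaf


@dataclass
class Interior:
    parent: str        # name of the parent variable being tested
    threshold: float   # u, a value observed in the data
    yes: "Node"        # subtree used when parent value > threshold
    no: "Node"         # subtree used otherwise


Node = Union[Leaf, Interior]


def find_leaf(tree, parent_values):
    """Follow the binary tests from the root down to a leaf."""
    while isinstance(tree, Interior):
        if parent_values[tree.parent] > tree.threshold:
            tree = tree.yes
        else:
            tree = tree.no
    return tree


def log_density(tree, parent_values, x):
    """log P(child = x | parents), read off the leaf selected by the parents."""
    leaf = find_leaf(tree, parent_values)
    return -0.5 * (math.log(2 * math.pi * leaf.var) + (x - leaf.mean) ** 2 / leaf.var)


# Toy tree with made-up thresholds: test X1 first, then X2 on the YES branch.
tree = Interior(
    parent="X1",
    threshold=0.5,
    yes=Interior(parent="X2", threshold=0.0, yes=Leaf(2.0, 0.25), no=Leaf(0.0, 0.25)),
    no=Leaf(-1.0, 0.25),
)
```

Note that X2 is only consulted when X1 is above its threshold, which is how the tree captures context-specific dependencies.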
21
A regression tree to capture a CPD
[Figure: regression tree with interior nodes testing X_1 > e_1 and X_2 > e_2 (YES/NO branches) and leaf nodes for X_3]
Expression of the gene represented by X_3 is modeled using a Gaussian at each leaf node
e_1, e_2 are values seen in the data
22
An example regression tree for a Module network
23
A very simple regression tree
[Figure: regression tree for X_3 with tests X_2 > e_1 and X_2 > e_2 (YES/NO branches); labels X_2, X_3, e_1, e_2]
24
Algorithm for growing a regression tree
Input: dataset D, child variable X_i, candidate parents C_i of X_i
Output: Tree T
Initialize T to a leaf node, estimated from all samples of X_i
While not converged
– For every leaf node l in T
  – Find the candidate parent in C_i with the best split at l
  – If the split improves the score
    – Add two leaf nodes, i and j, below l
    – Update the samples and parameters associated with i and j
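The growing procedure can be sketched as below. Several choices here are assumptions, not the slides' method: the tree is grown recursively rather than with an explicit loop over leaves, the score is a plain Gaussian log-likelihood, and a fixed `min_gain` threshold stands in for a penalized score (raw likelihood never gets worse after a split, so some threshold is needed). All names and the toy data are hypothetical.

```python
import math


def gaussian_loglik(ys):
    """Log-likelihood of ys under a Gaussian with fitted mean and variance."""
    n = len(ys)
    mean = sum(ys) / n
    ss = sum((y - mean) ** 2 for y in ys)
    var = max(ss / n, 1e-6)  # floor avoids log(0) on constant data
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def best_split(rows, child, candidates, min_leaf):
    """Best (gain, parent, threshold, yes_rows, no_rows), or None if no valid split."""
    base = gaussian_loglik([r[child] for r in rows])
    best = None
    for p in candidates:
        for u in sorted({r[p] for r in rows}):  # thresholds are observed values
            yes = [r for r in rows if r[p] > u]
            no = [r for r in rows if r[p] <= u]
            if len(yes) < min_leaf or len(no) < min_leaf:
                continue
            gain = (
                gaussian_loglik([r[child] for r in yes])
                + gaussian_loglik([r[child] for r in no])
                - base
            )
            if best is None or gain > best[0]:
                best = (gain, p, u, yes, no)
    return best


def grow_tree(rows, child, candidates, min_leaf=2, min_gain=1.0, max_depth=3):
    """Greedily grow a tree; rows is a list of dicts: variable name -> value."""
    ys = [r[child] for r in rows]
    leaf = {"leaf": True, "mean": sum(ys) / len(ys), "n": len(ys)}
    if max_depth == 0:
        return leaf
    split = best_split(rows, child, candidates, min_leaf)
    if split is None or split[0] < min_gain:
        return leaf  # no split improves the score enough
    _, p, u, yes, no = split
    return {
        "leaf": False,
        "parent": p,
        "threshold": u,
        "yes": grow_tree(yes, child, candidates, min_leaf, min_gain, max_depth - 1),
        "no": grow_tree(no, child, candidates, min_leaf, min_gain, max_depth - 1),
    }


# Toy data: X3 is high when X1 is high; X2 is unrelated to X3.
x1 = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
x2 = [0.5, 0.1, 0.9, 0.4, 0.8, 0.2]
x3 = [0.0, 0.1, -0.1, 2.0, 2.1, 1.9]
rows = [{"X1": a, "X2": b, "X3": c} for a, b, c in zip(x1, x2, x3)]
tree = grow_tree(rows, child="X3", candidates=["X1", "X2"])
```

On this toy data the root splits on X1, and the two resulting leaves are too small to split again under `min_leaf=2`. The `max_depth`, `min_gain`, and `min_leaf` arguments correspond to common stopping criteria for tree growth.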
25
Learning a regression tree
Assume we are searching for the parents of a variable X_3 and it already has two parents, X_1 and X_2
X_4 will be considered using “split” operations on existing leaf nodes
[Figure: one tree with tests X_1 > e_1 and X_2 > e_2 and leaves N_1, N_2, N_3, and one with an added X_4 > e_3 split and leaves N_2, N_3, N_4, N_5]
N_l: Gaussian associated with leaf l
26
Convergence in regression tree
Depth of tree
Improvement in score
Maximum number of parents
Minimum number of samples per leaf node
27
Assessing the value of using Module Networks
Using simulated data
– Generate data from a known module network
– The known module network was in turn learned from real data: 10 modules, 500 variables
– Evaluate using
  – Test data likelihood
  – Recovery of true parent-child relationships in the learned module network
Using gene expression data
– External validation of modules (Gene Ontology, motif enrichment)
– Cross-check with literature
28
Test data likelihood
Each line type represents a training data size
10 modules is best for almost all training set sizes
29
Recovery of graph structure
30
Application of Module networks to yeast expression data Segal et al, Regev, Pe’er, Gasch 2005
31
Module networks perform better than a simple Bayesian network
Gain in test data likelihood over a Bayesian network using expression data
32
The Respiration and Carbon Module
Regulation tree
33
Global View of Modules
Modules for common processes often share common
– regulators
– binding site motifs
34
Summary
Module networks
– A type of Bayesian network
– Identify modules (sets of similarly behaving random variables) and learn parents for each module
– Conditional probability distributions capture “rules” of regulatory relationships
– Learning requires inferring parent->module relationships and module assignments
In practice, module networks give more realistic networks than standard Bayesian networks