Probabilistic Graphical Models for Molecular Networks
Sushmita Roy, BMI/CS 576, Nov 11th, 2014

RECAP
– Many different types of molecular networks; networks are defined by the semantics of the vertices and edges.
– Computational problems in networks:
  – Network reconstruction: infer the structure and parameters of networks. We will examine this problem in the context of "expression-based network inference."
  – Network applications: properties of networks, interpretation of gene sets, using networks to infer the function of a gene.

Plan for next lectures
– Representing networks as probabilistic graphical models:
  – Bayesian networks (today)
  – Module networks
  – Dependency networks
– Other methods for expression-based network inference:
  – Classes of methods
  – Strengths and weaknesses of different methods

Readings
– Friedman, Nachman & Pe'er. Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence, 1999.
– Friedman, Linial, Nachman & Pe'er. Using Bayesian networks to analyze expression data. Journal of Computational Biology 7(3-4), 2000.
– Markowetz & Spang. Inferring cellular networks: a review. BMC Bioinformatics, 2007.

Modeling a regulatory network
Example: the regulators Hot1 and Sko1 control the target gene HSP12 (Hot1 regulates HSP12; HSP12 is a target of Hot1).
– Structure: who are the regulators of each target?
– Function ψ(X1, X2): how do the regulators determine the target's expression level? The function can be Boolean, linear, differential equations, probabilistic, etc.

Mathematical representations of networks
A node's output expression is a function f of the input expression of its neighbors. Models differ in the function that maps the input system state to the output state:
– Boolean networks: input/output truth tables
– Differential equations: rate equations
– Probabilistic graphical models: probability distributions

Network reconstruction
Given:
– A set of attributes associated with network nodes (typically the attributes are mRNA levels)
Do:
– Infer which nodes interact with each other
Algorithms for network reconstruction vary in what they mean by "interaction":
– Statistical dependence (mutual information, correlation)
– Predictive ability

Computational methods to infer networks
– We will focus on transcriptional regulatory networks.
– These networks are inferred from gene expression data.
– There are many methods for network inference; we will focus on probabilistic graphical models.

Notation
Assume we have N genes.
– X_i: random variable encoding the expression level of the i-th gene
– X = {X_1, ..., X_N}: set of N random variables, one for each gene
– x^d: joint assignment to all N random variables in the d-th data sample
– D: the dataset (the collection of samples x^d)
– G: graph
– Θ: parameters

Bayesian networks (BNs)
A special type of probabilistic graphical model with two parts:
– A graph that is directed and acyclic (a directed acyclic graph, DAG)
  – The nodes denote random variables X_1, ..., X_N
  – The edges encode statistical dependencies between the random variables and establish parent-child relationships
– A set of conditional distributions
  – Each node X_i has a conditional probability distribution (CPD) representing P(X_i | Parents(X_i))
Bayesian networks provide a tractable way to represent large joint distributions.

An example Bayesian network
The "sprinkler" network over four binary variables: Cloudy (C), Sprinkler (S), Rain (R), and WetGrass (W). Cloudy is a parent of both Sprinkler and Rain, and Sprinkler and Rain are the parents of WetGrass. Each node has a CPD table, e.g. P(C = T) = 0.5.
(Adapted from Kevin Murphy: Intro to Graphical Models and Bayes Networks.)
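To make the factorization concrete, here is a minimal Python sketch that computes the joint probability of one assignment in the sprinkler network by multiplying CPD entries. The CPD values are the ones commonly used in Kevin Murphy's tutorial and are illustrative assumptions, not values taken from this slide.

```python
# Minimal sketch: joint probability in the sprinkler network, assuming the
# standard CPT values from Kevin Murphy's tutorial (illustrative only).
p_c = {True: 0.5, False: 0.5}                      # P(C = T) and P(C = F)
p_s = {False: 0.5, True: 0.1}                      # P(S = T | C)
p_r = {False: 0.2, True: 0.8}                      # P(R = T | C)
p_w = {(False, False): 0.0, (True, False): 0.9,    # P(W = T | S, R)
       (False, True): 0.9, (True, True): 0.99}

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) = P(c) P(s|c) P(r|c) P(w|s,r)."""
    ps = p_s[c] if s else 1 - p_s[c]
    pr = p_r[c] if r else 1 - p_r[c]
    pw = p_w[(s, r)] if w else 1 - p_w[(s, r)]
    return p_c[c] * ps * pr * pw

# Example: probability that it is cloudy, the sprinkler is off,
# it rains, and the grass is wet.
print(joint(True, False, True, True))  # 0.5 * 0.9 * 0.8 * 0.9 = 0.324
```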

Bayesian network representation of a transcriptional network
Genes map to random variables that encode expression levels: Hot1 = X1, Sko1 = X2, HSP12 = X3. The regulators (parents) are Hot1 and Sko1; the target (child) is HSP12. Assume HSP12's expression depends on Hot1 and Sko1 binding to HSP12's promoter (HSP12 ON versus OFF). The resulting example Bayesian network has CPDs P(X1), P(X2), and P(X3 | X1, X2).

Bayesian networks compactly represent joint distributions
The joint distribution factorizes into one CPD per variable:
P(X_1, ..., X_N) = Π_i P(X_i | Parents(X_i))

Example Bayesian network of 5 variables
A network over X1, ..., X5 with CPDs P(X1), P(X2), P(X3 | X1, X2), P(X4), and P(X5 | X3, X4), so the joint distribution is
P(X1, ..., X5) = P(X1) P(X2) P(X3 | X1, X2) P(X4) P(X5 | X3, X4)

CPDs in Bayesian networks
– Conditional probability distributions (CPDs) are central to Bayesian networks.
– We have a CPD for each random variable in the graph.
– A CPD describes the distribution of a child variable given the state of its parents.
– The same structure can be parameterized in different ways; for example, for discrete variables we can use table or tree representations.

Representing CPDs as tables
Consider Boolean variables X1, X2, X3, X4, where X1, X2, and X3 are the parents of X4. The CPD P(X4 | X1, X2, X3) can be written as a table with one row for each of the eight configurations of (X1, X2, X3) and columns for P(X4 = t) and P(X4 = f).

A tree representation of a CPD
P(X4 | X1, X2, X3) can also be represented as a tree: the root splits on X1, with further splits on X2 and X3 leading to leaves such as P(X4 = t) = 0.9, 0.5, and 0.8. A tree allows a more compact representation of the CPD; configurations that do not change the distribution can share a leaf, so some quantities need not be stored separately.
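As an illustration of the table-versus-tree tradeoff, here is a small Python sketch with placeholder probabilities (the specific values are assumptions, not the slide's): the tree needs only four leaves where the full table needs eight rows.

```python
# Illustrative sketch: table vs. tree CPDs for P(X4 = t | X1, X2, X3).
# The numbers are placeholders, not the slide's actual values.

# Full table: one entry per configuration of the three parents (2^3 rows).
table_cpd = {
    (True, True, True): 0.9, (True, True, False): 0.9,
    (True, False, True): 0.8, (True, False, False): 0.5,
    (False, True, True): 0.5, (False, True, False): 0.5,
    (False, False, True): 0.5, (False, False, False): 0.5,
}

# Tree CPD: shared leaves collapse configurations that behave the same,
# so fewer parameters need to be estimated.
def tree_cpd(x1, x2, x3):
    if not x1:
        return 0.5                   # X1 = f: X2 and X3 are irrelevant
    if x2:
        return 0.9                   # X1 = t, X2 = t
    return 0.8 if x3 else 0.5        # X1 = t, X2 = f: split on X3

assert tree_cpd(True, False, True) == table_cpd[(True, False, True)]
```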

The learning problems
– Parameter learning with known structure: given training data D, estimate the parameters Θ of the CPDs.
– Structure learning: given training data D, find the statistical dependency structure G, and the parameters Θ, that best describe D.
  – Structure learning subsumes parameter learning: for every candidate graph, we need to estimate the parameters.

Example of estimating a CPD table from data
Consider the four random variables X1, X2, X3, X4 and assume we observe the following samples of assignments to these variables:

X1 X2 X3 X4
T  F  T  T
T  T  F  T
T  T  F  T
T  F  T  T
T  F  T  F
T  F  T  F
F  F  T  F

To estimate P(X4 | X1, X2, X3), we consider each configuration of (X1, X2, X3) and estimate the probability of X4 being T or F. For example, for X1 = T, X2 = F, X3 = T there are four samples, so
P(X4 = T | X1 = T, X2 = F, X3 = T) = 2/4
P(X4 = F | X1 = T, X2 = F, X3 = T) = 2/4
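A minimal Python sketch of this counting procedure, using the seven samples above; the variable layout and the True/False encoding of T/F are just for illustration.

```python
from collections import Counter

# Maximum-likelihood estimate of P(X4 | X1, X2, X3) from the samples above.
samples = [  # (X1, X2, X3, X4)
    (True, False, True, True),
    (True, True, False, True),
    (True, True, False, True),
    (True, False, True, True),
    (True, False, True, False),
    (True, False, True, False),
    (False, False, True, False),
]

counts = Counter()
for x1, x2, x3, x4 in samples:
    counts[(x1, x2, x3, x4)] += 1

def cpd(x4, x1, x2, x3):
    """P(X4 = x4 | X1 = x1, X2 = x2, X3 = x3) by relative frequency."""
    total = counts[(x1, x2, x3, True)] + counts[(x1, x2, x3, False)]
    return counts[(x1, x2, x3, x4)] / total if total else None

print(cpd(True, True, False, True))   # 2/4 = 0.5
print(cpd(False, True, False, True))  # 2/4 = 0.5
```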

Structure learning using score-based search
Search over candidate Bayesian network structures: for each candidate network, estimate maximum-likelihood parameters from the data, score the network, and keep the best-scoring structure.

Scoring a Bayesian network
The score of a Bayesian network (BN) is determined by how well the BN describes the data, which in turn is a function of the data likelihood. Given data D, the score of a BN with graph G is
Score(G : D) = log P(D | G) = Σ_d Σ_i log P(x_i^d | x_Pa(i)^d)
where Pa(i) denotes the parents of X_i and x_Pa(i)^d is the assignment to the parents of X_i in the d-th sample.

Scoring a Bayesian network (continued)
The score of a graph G decomposes over the individual variables: the double sum can be rearranged as an outer sum over variables,
Score(G : D) = Σ_i Σ_d log P(x_i^d | x_Pa(i)^d)
This lets us efficiently compute the effect of local changes on the score, that is, changes to the parent set of an individual random variable.
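Below is a hedged Python sketch of this decomposable log-likelihood score for fully observed discrete data, with CPDs estimated by maximum likelihood. The function and argument names (score, parents) are illustrative assumptions, not part of the lecture.

```python
import math
from collections import Counter

# Decomposable log-likelihood score: sum over variables i and samples d of
# log P(x_i^d | parents(x_i)^d), with CPDs estimated by ML from the data.
# 'parents' maps a variable index to a tuple of parent indices (assumed graph).
def score(data, parents):
    n_vars = len(data[0])
    total = 0.0
    for i in range(n_vars):
        pa = parents.get(i, ())
        joint, marg = Counter(), Counter()
        for row in data:
            pa_vals = tuple(row[j] for j in pa)
            joint[(pa_vals, row[i])] += 1
            marg[pa_vals] += 1
        # local (family) contribution of variable i
        for row in data:
            pa_vals = tuple(row[j] for j in pa)
            total += math.log(joint[(pa_vals, row[i])] / marg[pa_vals])
    return total

data = [(True, False, True), (True, True, True), (False, False, False)]
print(score(data, {2: (0, 1)}))  # X3 has parents X1, X2; X1 and X2 have none
```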

Learning network structure is computationally expensive
– For N variables, the number of possible networks (DAGs) grows super-exponentially with N.
– We therefore need approximate methods to search the space of networks.
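The counts behind "super-exponential" can be computed with Robinson's recurrence for the number of labeled DAGs; the short sketch below is an illustration added here, not content from the slide.

```python
from math import comb
from functools import lru_cache

# Robinson's recurrence for the number of labeled DAGs on n nodes, showing
# the super-exponential growth of the structure search space.
@lru_cache(maxsize=None)
def num_dags(n):
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in (2, 4, 6, 8, 10):
    print(n, num_dags(n))
# e.g. 2 -> 3, 4 -> 543, 6 -> 3,781,503, 10 -> about 4.2e18
```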

Heuristic search of Bayesian network structures
– Make local operations to the graph structure: add an edge, delete an edge, or reverse an edge.
– Evaluate the score and select the network configuration with the best score.
– We just need to check for cycles after each operation.
– Working with gene expression data requires additional considerations.

Structure search operators
Starting from the current network (e.g., over nodes A, B, C, D), candidate networks are generated by adding an edge or deleting an edge (and, as above, reversing an edge); each candidate must be checked for cycles.

Bayesian network search: hill-climbing

given: data set D, initial network B_0
i = 0
B_best ← B_0
while stopping criteria not met {
    for each possible operator application a {
        B_new ← apply(a, B_i)
        if score(B_new) > score(B_best)
            B_best ← B_new
    }
    ++i
    B_i ← B_best
}
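A hedged Python sketch of the same loop follows; score is assumed to be a decomposable network score supplied by the caller, and the operator and cycle-check helpers are illustrative names rather than a reference implementation.

```python
import itertools

def hill_climb(variables, data, score, max_iters=100):
    """Greedy hill-climbing over network structures, following the pseudocode
    above. 'score(edges, data)' is an assumed scoring callable."""
    edges = set()                                   # current network B_i as (parent, child) pairs
    best_edges, best_score = set(edges), score(edges, data)
    for _ in range(max_iters):
        improved = False
        for u, v in itertools.permutations(variables, 2):
            for candidate in neighbors(edges, u, v):
                if creates_cycle(candidate):
                    continue
                s = score(candidate, data)
                if s > best_score:
                    best_edges, best_score, improved = candidate, s, True
        if not improved:                            # stopping criterion: no operator improves the score
            break
        edges = set(best_edges)                     # B_{i+1} <- B_best
    return best_edges

def neighbors(edges, u, v):
    """Local operators applied to the (potential) edge u -> v."""
    if (u, v) in edges:
        yield edges - {(u, v)}                      # delete the edge
        yield (edges - {(u, v)}) | {(v, u)}         # reverse the edge
    else:
        yield edges | {(u, v)}                      # add the edge

def creates_cycle(edges):
    """True if the directed graph defined by 'edges' contains a cycle (DFS check)."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    visited, on_path = set(), set()

    def dfs(node):
        visited.add(node)
        on_path.add(node)
        for c in children.get(node, ()):
            if c in on_path or (c not in visited and dfs(c)):
                return True
        on_path.discard(node)
        return False

    return any(dfs(n) for n in children if n not in visited)
```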

Network inference from expression data is difficult
– There are many variables but not enough measurements to cover the different variable configurations.
– Good heuristics to prune the search space are therefore highly desirable.

Extensions to Bayesian networks to handle large numbers of random variables
– The Sparse Candidate algorithm
– Bootstrap-based ideas to score a high-confidence network
– Module networks (subsequent lecture)

The Sparse Candidate algorithm (Friedman et al., 1999)
Structure learning in Bayesian networks. Key idea: prune the potential parents of each node.
– Identify k promising "candidate" parents for each node based on measures of statistical dependence, with k << N (N = number of random variables).
– Restrict the search to networks that only include parents from each node's "candidate" set.
– Possible pitfall: early choices might exclude other good parents. This is resolved using an iterative algorithm.

Sparse Candidate algorithm: notation
– B^n: Bayesian network at iteration n
– C_i^n: candidate parent set for node X_i at iteration n
– Pa^n(X_i): parents of X_i in B^n

Sparse Candidate algorithm
Input:
– A data set D
– An initial network B_0
– A parameter k: the maximum number of candidate parents per node
Output:
– A network B
Loop for n = 1, 2, ... until convergence:
– Restrict: based on D and B^(n-1), select candidate parents C_i^n (|C_i^n| <= k) for each variable X_i. This defines a possibly cyclic directed network H^n = (X, E) such that every edge X_j -> X_i has X_j in C_i^n.
– Maximize: find the network B^n that maximizes Score(B^n; D) among networks satisfying Pa^n(X_i) ⊆ C_i^n.
Termination: return B^n.
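The outer loop can be sketched as follows; select_candidates (the Restrict step) and learn_constrained (the Maximize step, e.g., hill-climbing restricted to each node's candidate set) are passed in as assumed callables, so this is an outline of the control flow rather than the authors' implementation.

```python
# Hedged sketch of the Sparse Candidate outer loop; the Restrict and Maximize
# steps are supplied as callables because their details appear on later slides.
def sparse_candidate(variables, data, k, score, select_candidates, learn_constrained,
                     max_rounds=20):
    network = {v: set() for v in variables}       # B_0: empty parent sets
    prev_score = float("-inf")
    for _ in range(max_rounds):
        # Restrict: at most k candidate parents per variable, chosen using the
        # data and the current network B^(n-1).
        candidates = {v: select_candidates(v, network, data, k) for v in variables}
        # Maximize: search only networks with Pa(X_i) a subset of candidates[X_i].
        network = learn_constrained(candidates, data, score)
        new_score = score(network, data)
        if new_score <= prev_score:               # convergence: score stopped improving
            break
        prev_score = new_score
    return network
```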

The Restrict step: measures of relevance

Information-theoretic concepts
– Kullback-Leibler (KL) divergence: a measure of dissimilarity between two distributions.
– Mutual information: the mutual information between two random variables X and Y measures the statistical dependence between X and Y. It is the KL divergence between P(X, Y) and P(X)P(Y).
– Conditional mutual information: measures the information between two variables given a third.

KL divergence
P(X) and Q(X) are two probability distributions over X:
D_KL(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]

Mutual information
A measure of statistical dependence between two random variables X and Y:
I(X; Y) = Σ_{x,y} P(x, y) log [ P(x, y) / (P(x) P(y)) ]
It is also the KL divergence between the joint distribution and the product of the marginals, D_KL(P(X, Y) || P(X)P(Y)).

Conditional mutual information
Measures the mutual information between X and Y given Z:
I(X; Y | Z) = Σ_{x,y,z} P(x, y, z) log [ P(x, y | z) / (P(x | z) P(y | z)) ]
If Z captures everything about X, knowing Y gives no more information about X; in that case the conditional mutual information of X and Y given Z is zero.
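These quantities can be estimated from discrete samples by plugging empirical frequencies into the definitions; below is a minimal Python sketch (illustrative only, with toy data).

```python
import math
from collections import Counter

# Empirical mutual information and conditional mutual information for
# discrete samples, matching the definitions above.
def mutual_info(pairs):
    """I(X;Y) = sum_{x,y} P(x,y) log [ P(x,y) / (P(x)P(y)) ]."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def cond_mutual_info(triples):
    """I(X;Y|Z) = sum_z P(z) * I(X;Y | Z = z)."""
    n = len(triples)
    by_z = {}
    for x, y, z in triples:
        by_z.setdefault(z, []).append((x, y))
    return sum((len(pairs) / n) * mutual_info(pairs) for pairs in by_z.values())

xs = [0, 0, 1, 1, 0, 1]
ys = [0, 0, 1, 1, 1, 0]
zs = [0, 1, 0, 1, 0, 1]
print(mutual_info(list(zip(xs, ys))))
print(cond_mutual_info(list(zip(xs, ys, zs))))
```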

Measuring the relevance of candidate parents in the Restrict step
– A good parent for node X_i is one that has a strong statistical dependence with X_i.
– Mutual information I(X_i; X_j) provides a good measure of statistical dependence.
– Mutual information should be used only as a first approximation: candidate parents need to be iteratively refined to avoid missing important dependencies.

Mutual information can miss some parents
Consider a true network over A, B, C, and D in which B is a parent of A. If I(A; C) > I(A; D) > I(A; B) and we are selecting only two candidate parents, B will never be selected as a parent: using mutual information alone to select candidates, we might be stuck with C and D. How do we get B into the candidate set?

Sparse Candidate Restrict step
Three strategies handle the effect of greedy choices made early on (using A and D from the example above):
– Estimate the discrepancy between the (in)dependencies in the BN and those in the data: the KL divergence between P(A, D) in the data and P_B(A, D) in the network B.
– Measure how much the current parent set shields A from D: the conditional mutual information between A and D given the current parent set of A.
– Measure how much the score improves when adding D as a parent of A.

Measuring the relevance between X_i and X_j
– M_disc(X_i, X_j): discrepancy between two joint distributions, P(X_i, X_j) as represented in the training data and P_B(X_i, X_j) as represented by the BN B:
  D_KL(P(X_i, X_j) || P_B(X_i, X_j))
– M_shield(X_i, X_j): based on conditional mutual information:
  I(X_i; X_j | Pa(X_i))
– M_score(X_i, X_j): the score when adding X_j to X_i's current parent set Pa(X_i):
  Score(X_i; X_j, Pa(X_i), D)

Performance of Sparse Candidate versus simple hill-climbing
Results are shown for two simulated datasets (one with 200 variables), plotting the network score over the course of the search; a candidate set size of 15 appears to perform best.

Summary of the Sparse Candidate algorithm
– The Sparse Candidate algorithm was developed to handle structure learning of Bayesian networks with large numbers of variables.
– The main heuristic is to discard parents that are not likely to be good.
– Different ways to rank parents are based on statistical dependence: mutual information, conditional mutual information, and the increase in score when adding a new parent.

Assessing confidence in the learned network
– Given the large number of variables and small datasets, the data are not sufficient to reliably determine the "best" network.
– One can, however, estimate the confidence of specific properties of the network, expressed as graph features f(G). Examples of f(G):
  – An edge between two random variables
  – Order relations: is X an ancestor of Y?
  – Is X in the Markov blanket of Y? The Markov blanket of Y is defined as those variables that render Y independent of the rest of the network; it includes Y's parents, children, and the parents of Y's children.

Markov blanket
If MB(X) is the Markov blanket of X, then P(X | MB(X), Y) = P(X | MB(X)) for any other variable Y.
(Figure: a network over X, A, B, C, D, E, F highlighting X's Markov blanket.)
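A small Python sketch that reads off the Markov blanket from a DAG given as parent sets; the example structure is hypothetical, not the exact graph in the figure.

```python
# Markov blanket of a node in a DAG represented as a dict of parent sets:
# its parents, its children, and its children's other parents.
def markov_blanket(node, parents):
    children = [v for v, pa in parents.items() if node in pa]
    blanket = set(parents.get(node, ()))          # parents of the node
    blanket.update(children)                      # children of the node
    for c in children:                            # co-parents of its children
        blanket.update(parents[c])
    blanket.discard(node)
    return blanket

# Hypothetical structure: A -> X, B -> X, X -> D, E -> D.
parents = {"X": {"A", "B"}, "D": {"X", "E"}, "A": set(), "B": set(), "E": set()}
print(markov_blanket("X", parents))  # {'A', 'B', 'D', 'E'}
```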

How do we assess confidence in graph features?
What we want is P(f(G) | D), which is
P(f(G) | D) = Σ_G f(G) P(G | D)
But it is not feasible to compute this sum; instead we will use a "bootstrap" procedure.

Bootstrap to assess graph feature confidence
For i = 1 to m:
– Construct dataset D_i by sampling with replacement N samples from dataset D, where N is the size of the original D.
– Learn a network G_i from D_i.
For each feature of interest f, calculate the confidence
conf(f) = (1/m) Σ_i f(G_i)
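A hedged Python sketch of this procedure for the edge feature; learn_network stands in for any structure-learning routine (e.g., Sparse Candidate) and is an assumed callable, not a real library function.

```python
import random

# Bootstrap confidence of edge features: resample the data with replacement,
# relearn a network each time, and record how often each edge appears.
def edge_confidence(data, learn_network, m=200, seed=0):
    rng = random.Random(seed)
    counts = {}
    for _ in range(m):
        resampled = [rng.choice(data) for _ in range(len(data))]  # sample with replacement
        for edge in learn_network(resampled):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / m for edge, c in counts.items()}   # conf(f) for each edge feature
```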

Does the confidence estimated from the bootstrap procedure represent real relationships?
– Compare the confidence distribution to that obtained from randomized data.
– Randomize the expression matrix (genes x conditions) by shuffling the columns of each row (gene) independently, then repeat the bootstrap procedure.
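A minimal Python sketch of this randomization control; the matrix layout (one row per gene, one column per condition) is as described above.

```python
import random

# Shuffle each gene's (row's) values across conditions independently,
# destroying gene-gene dependencies while preserving each gene's marginal
# distribution of expression values.
def randomize_rows(expression, seed=0):
    rng = random.Random(seed)
    shuffled = []
    for row in expression:          # one row per gene, one column per condition
        row = list(row)
        rng.shuffle(row)
        shuffled.append(row)
    return shuffled

matrix = [[1.2, 0.3, 2.1], [0.5, 0.9, 0.1]]   # 2 genes x 3 conditions
print(randomize_rows(matrix))
```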

Application of Bayesian networks to yeast expression data
– 76 experiments/microarrays, 800 genes
– Bootstrap procedure on 200 subsampled datasets
– Sparse Candidate as the Bayesian network learning algorithm

Bootstrap-based confidence differs between the original and randomized data
(Figure: confidence distributions for the original data versus the randomized data.)

Example of a high-confidence sub-network
Comparing one learned Bayesian network with the bootstrapped-confidence Bayesian network highlights a subnetwork associated with yeast mating.

Summary
– Network inference from expression data provides a promising approach to identifying cellular networks.
– Bayesian networks are one representation of networks that has both a probabilistic and a graphical component.
– Network inference naturally translates into learning problems for Bayesian networks, but network inference is computationally challenging.
– Successful application of Bayesian network learning algorithms to expression data requires additional considerations:
  – Reduce the set of potential parents, statistically or using biological knowledge
  – Bootstrap-based confidence estimation
  – Permutation-based assessment of confidence