Bayesian Networks for Regulatory Network Learning BMI/CS 576 Colin Dewey Fall 2015.

The lac Operon: Induction by LacI
lactose present → the protein encoded by lacI won't bind to the operator (O) region

The lac Operon: Activation by Glucose
glucose absent → CAP protein promotes binding by RNA polymerase; increases transcription

Probabilistic Model of the lac Operon
suppose we represent the system by the following discrete variables
–L (lactose): present, absent
–G (glucose): present, absent
–I (lacI): present, absent
–C (CAP): present, absent
–lacI-unbound: true, false
–CAP-bound: true, false
–Z (lacZ): high, low, absent
suppose (realistically) the system is not completely deterministic
the joint distribution of the variables could be specified by 2^6 × 3 − 1 = 191 parameters

Motivation for Bayesian Networks
Explicitly state (conditional) independencies between random variables
–Provide a more compact model (fewer parameters)
Use directed graphs to specify the model
–Take advantage of graph algorithms/theory
–Provide intuitive visualizations of models

A Bayesian Network for the lac System
[slide figure: a DAG over L, G, I, C, lacI-unbound, CAP-bound, and Z, annotated with CPTs such as Pr(L), Pr(lacI-unbound | L, I), and Pr(Z | lacI-unbound, CAP-bound)]

Bayesian Networks
Also known as Directed Graphical Models
a BN is a Directed Acyclic Graph (DAG) in which
–the nodes denote random variables
–each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
the intuitive meaning of an arc from X to Y is that X directly influences Y
formally: each variable X is independent of its non-descendants given its parents

Bayesian Networks
a BN provides a factored representation of the joint probability distribution; for the lac network:
Pr(L, G, I, C, lacI-unbound, CAP-bound, Z) = Pr(L) Pr(G) Pr(I) Pr(C) Pr(lacI-unbound | L, I) Pr(CAP-bound | G, C) Pr(Z | lacI-unbound, CAP-bound)
this representation of the joint distribution can be specified with 20 parameters (vs. 191 for the unfactored representation)
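A minimal sketch checking these two parameter counts; the cardinalities come from the earlier slide, and CAP-bound's parents (G and C) are an assumption read off the network sketch:

```python
from math import prod

# Cardinalities of the seven lac-system variables (from the earlier slide).
card = {"L": 2, "G": 2, "I": 2, "C": 2, "lacI-unbound": 2, "CAP-bound": 2, "Z": 3}

# Parent sets implied by the factorization above (CAP-bound's parents are assumed).
parents = {
    "L": [], "G": [], "I": [], "C": [],
    "lacI-unbound": ["L", "I"],
    "CAP-bound": ["G", "C"],
    "Z": ["lacI-unbound", "CAP-bound"],
}

# Unfactored joint: one free parameter per joint configuration, minus one
# because the probabilities must sum to 1.
full_joint = prod(card.values()) - 1          # 2**6 * 3 - 1 = 191

# Factored (Bayes net): each node contributes (cardinality - 1) free parameters
# per configuration of its parents.
factored = sum((card[x] - 1) * prod(card[p] for p in parents[x]) for x in card)

print(full_joint, factored)                   # 191 20
```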

Representing CPDs for Discrete Variables
CPDs can be represented using tables or trees
consider the following case with Boolean variables A, B, C, D
[slide figure: the full table for Pr(D | A, B, C) over all eight settings of A, B, C, alongside an equivalent tree representation that branches on A, B, and C and has leaves such as Pr(D = T) = 0.9, Pr(D = T) = 0.5, and Pr(D = T) = 0.8]

Representing CPDs for Continuous Variables
we can also model the distribution of continuous variables in Bayesian networks
one approach: linear Gaussian models
X is normally distributed around a mean that depends linearly on the values u_i of its parents U_1, …, U_k:
Pr(X | u_1, …, u_k) = N(β_0 + β_1 u_1 + … + β_k u_k, σ²)
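A minimal sketch of sampling from such a CPD; the coefficients and noise level below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear Gaussian CPD: X ~ Normal(beta_0 + sum_i beta_i * u_i, sigma^2).
# These coefficients and this noise level are arbitrary illustrative values.
beta0 = 0.5
betas = np.array([2.0, -1.0])   # one coefficient per parent U_1, U_2
sigma = 0.3

def sample_x(parent_values):
    """Draw X given the observed values of its parents."""
    mean = beta0 + betas @ np.asarray(parent_values)
    return rng.normal(mean, sigma)

print(sample_x([1.0, 0.2]))     # one draw of X given U_1 = 1.0, U_2 = 0.2
```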

The Inference Task in Bayesian Networks
Given: values for some variables in the network (evidence), and a set of query variables
Do: compute the posterior distribution over the query variables
variables that are neither evidence variables nor query variables are hidden variables
the BN representation is flexible enough that any set can be the evidence variables and any set can be the query variables
[slide figure: the lac network with an example instance in which some variables (e.g. L = present, Z = low) are observed and the others are queried or hidden]
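For concreteness, a minimal inference-by-enumeration sketch on a toy two-variable network; the CPT values are invented for illustration and are not the lac model's:

```python
# Toy network L -> Z with made-up CPTs.
p_L = {"present": 0.5, "absent": 0.5}
p_Z_given_L = {                                   # Pr(Z | L)
    "present": {"high": 0.7, "low": 0.2, "absent": 0.1},
    "absent":  {"high": 0.1, "low": 0.3, "absent": 0.6},
}

def joint(l, z):
    return p_L[l] * p_Z_given_L[l][z]

# Inference by enumeration: Pr(L | Z = low) = Pr(L, Z=low) / sum_l Pr(l, Z=low).
evidence_z = "low"
unnorm = {l: joint(l, evidence_z) for l in p_L}
total = sum(unnorm.values())
posterior = {l: p / total for l, p in unnorm.items()}
print(posterior)   # {'present': 0.4, 'absent': 0.6}
```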

The Parameter Learning Task
Given: a set of training instances and the graph structure of a BN
Do: infer the parameters of the CPDs
this is straightforward when there aren't missing values or hidden variables
[slide figure: the lac network alongside a training table with one fully observed row per instance, e.g. L = present, …, lacI-unbound = true, CAP-bound = false, Z = low]
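A minimal sketch of maximum-likelihood estimation of one CPT from complete data; the rows below are invented, not from any real data set:

```python
from collections import Counter

# Fully observed training instances for a single CPT, Pr(Z | lacI-unbound, CAP-bound).
data = [
    {"lacI-unbound": True,  "CAP-bound": False, "Z": "low"},
    {"lacI-unbound": True,  "CAP-bound": True,  "Z": "high"},
    {"lacI-unbound": True,  "CAP-bound": True,  "Z": "high"},
    {"lacI-unbound": False, "CAP-bound": True,  "Z": "absent"},
]

# Maximum likelihood: for each parent configuration, the CPT entry is simply
# the relative frequency of each child value.
counts = Counter((row["lacI-unbound"], row["CAP-bound"], row["Z"]) for row in data)
parent_totals = Counter((row["lacI-unbound"], row["CAP-bound"]) for row in data)

cpt = {(u, c, z): n / parent_totals[(u, c)] for (u, c, z), n in counts.items()}
print(cpt[(True, True, "high")])   # 1.0 with these four rows
```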

The Structure Learning Task
Given: a set of training instances
Do: infer the graph structure (and perhaps the parameters of the CPDs too)
[slide figure: the same fully observed training table, now with the network structure unknown]

Bayes Net Structure Learning Case Study: Friedman et al., JCB 2000
expression levels in populations of yeast cells
–800 genes
–76 experimental conditions

Learning Bayesian Network Structure
given a function for scoring network structures, we can cast the structure-learning task as a search problem
[figure from Friedman et al., Journal of Computational Biology, 2000]

Structure Search Operators
–add an edge
–delete an edge
–reverse an edge
[slide figure: a four-node network (A, B, C, D) shown before and after each operator]
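A minimal sketch of these operators on an edge-set representation, with a cycle check so that add and reverse moves keep the graph a DAG; the helper names and example network are my own:

```python
def has_cycle(nodes, edges):
    """Detect a directed cycle with a depth-first search."""
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(u):
        color[u] = GRAY
        for v in children[u]:
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

def neighbors(nodes, edges):
    """All edge sets reachable by one add-edge, delete-edge, or reverse-edge move."""
    for u, v in edges:
        yield edges - {(u, v)}                        # delete an edge
        reversed_edges = (edges - {(u, v)}) | {(v, u)}
        if not has_cycle(nodes, reversed_edges):      # reverse an edge (stay acyclic)
            yield reversed_edges
    for u in nodes:
        for v in nodes:
            if u != v and (u, v) not in edges:
                added = edges | {(u, v)}
                if not has_cycle(nodes, added):       # add an edge (stay acyclic)
                    yield added

# Example: one current network over A, B, C, D.
nodes = {"A", "B", "C", "D"}
edges = frozenset({("A", "B"), ("B", "C")})
print(sum(1 for _ in neighbors(nodes, edges)))        # number of candidate moves
```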

Bayesian Network Structure Learning
we need a scoring function to evaluate candidate networks; Friedman et al. use one with the form
Score(G : D) = log Pr(D | G) + log Pr(G) + constant
where log Pr(D | G) is the log probability of the data D given graph G, log Pr(G) is the log prior probability of graph G, and the constant depends only on D
they take a Bayesian approach to computing Pr(D | G), i.e. they don't commit to particular parameters in the Bayes net

The Bayesian Approach to Structure Learning
Friedman et al. take a Bayesian approach: they average over the parameters Θ rather than committing to one setting
Pr(D | G) = ∫ Pr(D | G, Θ) Pr(Θ | G) dΘ
How can we calculate the probability of the data without using specific parameters (i.e. probabilities in the CPDs)?
let's consider a simple case: estimating the parameter of a weighted coin…

The Beta Distribution
suppose we're taking a Bayesian approach to estimating the parameter θ of a weighted coin
the Beta distribution provides a convenient prior:
P(θ) = Beta(θ; α_h, α_t) = [Γ(α_h + α_t) / (Γ(α_h) Γ(α_t))] θ^(α_h − 1) (1 − θ)^(α_t − 1)
where α_h is the # of "imaginary" heads we have seen already, α_t is the # of "imaginary" tails we have seen already, and Γ is the Gamma function, a continuous generalization of the factorial function

The Beta Distribution
suppose now we're given a data set D in which we observe M_h heads and M_t tails
the posterior distribution is also Beta:
P(θ | D) = Beta(θ; α_h + M_h, α_t + M_t)
we say that the set of Beta distributions is a conjugate family for binomial sampling

The Beta Distribution
assume we have a distribution P(θ) that is Beta(α_h, α_t)
what's the marginal probability (i.e. integrating over all θ) that our next coin flip would be heads?
P(X = heads) = ∫ P(X = heads | θ) P(θ) dθ = α_h / (α_h + α_t)
what if we ask the same question after we've seen M actual coin flips (M_h heads and M_t tails)?
P(X_{M+1} = heads | D) = (α_h + M_h) / (α_h + α_t + M)
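A tiny numeric sketch of these two predictive probabilities; the prior counts and the data below are invented:

```python
# Beta prior with 2 "imaginary" heads and 2 "imaginary" tails, then 10 real flips.
alpha_h, alpha_t = 2, 2
M_h, M_t = 7, 3
M = M_h + M_t

prior_pred = alpha_h / (alpha_h + alpha_t)                   # before seeing data
posterior_pred = (alpha_h + M_h) / (alpha_h + alpha_t + M)   # after M flips

print(prior_pred, posterior_pred)    # 0.5 0.6428571428571429
```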

The Dirichlet Distribution
for discrete variables with more than two possible values, we can use Dirichlet priors
Dirichlet priors are a conjugate family for multinomial data
if P(θ) is Dirichlet(α_1, …, α_K), then P(θ | D) is Dirichlet(α_1 + M_1, …, α_K + M_K), where M_i is the # of occurrences of the i-th value

The Bayesian Approach to Scoring BN Structures
we can evaluate the marginal likelihood Pr(D | G) = ∫ Pr(D | G, Θ) Pr(Θ | G) dΘ fairly easily because
–parameter independence: the integral can be decomposed into a product of terms, one per variable
–Beta/Dirichlet are conjugate families (i.e. if we start with Beta priors, we still have Beta distributions after updating with data)
–the integrals have closed-form solutions

Scoring Bayesian Network Structures
when the appropriate priors are used, and all instances in D are complete, the scoring function can be decomposed as follows:
Score(G : D) = Σ_i score(X_i, Parents_G(X_i) : D)
thus we can
–score a network by summing terms (computed as just discussed) over the nodes in the network
–efficiently score changes in a local search procedure, since only the terms for nodes whose parent sets change need to be recomputed
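A minimal sketch of why decomposability makes local search cheap: only the node whose parent set changes is rescored. The local_score below is a stand-in placeholder, not the actual Bayesian family score:

```python
def local_score(node, parents, data):
    """Stand-in for the per-node score term (in a real implementation this
    would be the closed-form Beta/Dirichlet marginal likelihood)."""
    return -len(parents)   # placeholder so the sketch runs

def total_score(parent_sets, data):
    return sum(local_score(x, ps, data) for x, ps in parent_sets.items())

def score_after_adding_edge(parent_sets, scores, u, v, data):
    """Rescore only node v when the edge u -> v is added."""
    new_term = local_score(v, parent_sets[v] | {u}, data)
    return sum(scores.values()) - scores[v] + new_term

parent_sets = {"A": set(), "B": {"A"}, "C": {"B"}, "D": set()}
data = None   # unused by the placeholder score
scores = {x: local_score(x, ps, data) for x, ps in parent_sets.items()}
print(total_score(parent_sets, data),
      score_after_adding_edge(parent_sets, scores, "A", "D", data))   # -2 -3
```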

Bayesian Network Search: The Sparse Candidate Algorithm [Friedman et al., UAI 1999]
Given: data set D, initial network B_0, parameter k
loop for n = 1, 2, … until convergence
–Restrict: based on D and B_{n−1}, select for each variable X_i a candidate set C_i of at most k parents
–Maximize: find a high-scoring network B_n in which the parents of each X_i are drawn from C_i
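A schematic sketch of that outer loop; the restrict and maximize helpers here are trivial placeholders for the steps described on the following slides, and the names are my own:

```python
def sparse_candidate(data, initial_network, k, score, restrict, maximize, max_iters=20):
    """Schematic outer loop of the Sparse Candidate search (a sketch, not the
    authors' exact pseudocode)."""
    network, best = initial_network, score(initial_network, data)
    for _ in range(max_iters):
        candidates = restrict(data, network, k)           # Restrict step
        new_network = maximize(data, candidates, score)   # Maximize step
        new_score = score(new_network, data)
        if new_score <= best:        # stop once there is no improvement
            return network
        network, best = new_network, new_score
    return network

# Trivial stand-ins so the sketch runs end to end; real versions would use the
# mutual-information-based restrict and hill-climbing maximize described below.
dummy_restrict = lambda data, net, k: {x: set() for x in net}
dummy_maximize = lambda data, cands, score: {x: set() for x in cands}
dummy_score = lambda net, data: 0.0

print(sparse_candidate(None, {"A": set(), "B": set()}, 2,
                       dummy_score, dummy_restrict, dummy_maximize))
```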

The Restrict Step in Sparse Candidate
to identify candidate parents in the first iteration, we can compute the mutual information between pairs of variables:
I(X; Y) = Σ_{x,y} P̂(x, y) log [ P̂(x, y) / ( P̂(x) P̂(y) ) ]
where P̂ denotes the probabilities estimated from the data set
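A minimal sketch of estimating I(X; Y) from counts; the toy sample below is invented:

```python
import math
from collections import Counter

# Toy paired observations of two discrete variables X and Y.
pairs = [("hi", "hi"), ("hi", "hi"), ("lo", "lo"),
         ("lo", "lo"), ("hi", "lo"), ("lo", "hi")]
n = len(pairs)

# Empirical joint and marginal distributions.
p_xy = {k: v / n for k, v in Counter(pairs).items()}
p_x = {k: v / n for k, v in Counter(x for x, _ in pairs).items()}
p_y = {k: v / n for k, v in Counter(y for _, y in pairs).items()}

mi = sum(p * math.log(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())
print(mi)   # > 0 when X and Y look dependent in the sample
```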

The Restrict Step in Sparse Candidate
suppose the true network structure is the one shown on the slide
[slide figure: a four-node true network over A, B, C, D, and the network currently being considered]
we're selecting two candidate parents for A, and I(A;C) > I(A;D) > I(A;B)
the candidate parents for A would then be C and D; how could we get B as a candidate parent on the next iteration?

The Restrict Step in Sparse Candidate
Kullback-Leibler (KL) divergence provides a distance measure between two distributions P and Q:
D_KL(P || Q) = Σ_x P(x) log [ P(x) / Q(x) ]
mutual information can be thought of as the KL divergence between the joint distribution P(X, Y) and the product distribution P(X) P(Y) (which assumes X and Y are independent):
I(X; Y) = D_KL( P(X, Y) || P(X) P(Y) )

The Restrict Step in Sparse Candidate
we can use KL divergence to assess the discrepancy between the network's estimate P_net(X, Y) and the empirical estimate P̂(X, Y):
D_KL( P̂(X, Y) || P_net(X, Y) )
[slide figure: the true distribution's network next to the current Bayes net over A, B, C, D]

The Restrict Step in Sparse Candidate
important to ensure monotonic improvement (e.g. by keeping each variable's current parents in its candidate set, so the maximize step can never decrease the score)

The Maximize Step in Sparse Candidate
hill-climbing search with add-edge, delete-edge, and reverse-edge operators
test to ensure that cycles aren't introduced into the graph

Efficiency of Sparse Candidate
[slide table comparing ordinary greedy search, greedy search with at most k parents, and Sparse Candidate on three quantities: the number of possible parent sets for each node, the number of changes scored on the first iteration of search, and the number of changes scored on subsequent iterations]

Bayes Net Structure Learning Case Study: Friedman et al., JCB 2000
expression levels in populations of yeast cells
–800 genes
–76 experimental conditions
used two representations of the data
–discrete representation (underexpressed, normal, overexpressed) with CPTs in the models
–continuous representation with linear Gaussians

Bayes Net Structure Learning Case Study: Two Key Issues
Since there are many variables but data is sparse, there is not enough information to determine the "right" model. Instead, can we consider many of the high-scoring networks?
How can we tell if the structure learning procedure is finding real relationships in the data? Is it doing better than chance?

Representing Partial Models
How can we consider many high-scoring models?
Use the bootstrap method to identify high-confidence features of interest.
Friedman et al. focus on finding two types of "features" common to lots of models that could explain the data
–Markov relations: is Y in the Markov blanket of X?
–order relations: is X an ancestor of Y?
[slide figure: a small network illustrating these two kinds of features]

Markov Blankets
the Markov blanket for node X consists of its parents, its children, and its children's parents
every other node Y in the network is conditionally independent of X when conditioned on X's Markov blanket MB(X)
[slide figure: a network with node X and its Markov blanket highlighted]
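A minimal sketch of reading off a Markov blanket from a parent-set representation of a DAG; the example network is invented:

```python
# Each node maps to its set of parents (an invented example network).
parents = {
    "A": set(), "B": set(),
    "X": {"A", "B"},
    "C": {"X"}, "D": {"X", "E"},
    "E": set(), "F": {"C"},
}

def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    children = {v for v, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | spouses

print(markov_blanket("X", parents))   # {'A', 'B', 'C', 'D', 'E'}
```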

Markov Blankets
why are the parents of X's children in its Markov blanket?
suppose we're using the following network to infer the probability that it rained last night
[slide figure: Rained → Grass-wet ← Sprinkler-on]
we observe the grass is wet; is the Sprinkler-on variable now irrelevant?
no – if we observe that the sprinkler is on, this helps to "explain away" the grass being wet

Estimating Confidence in Features: The Bootstrap Method
for i = 1 to m
–randomly draw a sample S_i of N expression experiments from the original N expression experiments, with replacement
–learn a Bayesian network B_i from S_i
some expression experiments will be included multiple times in a given sample, some will be left out
the confidence in a feature is the fraction of the m models in which it was represented
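A minimal sketch of that loop; the learner and the feature test below are trivial placeholders so the sketch runs, and all names are my own:

```python
import random

def bootstrap_feature_confidence(experiments, learn_network, has_feature, m=100, seed=0):
    """Resample the experiments with replacement m times, relearn a network each
    time, and report the fraction of learned networks containing the feature."""
    rng = random.Random(seed)
    n = len(experiments)
    hits = 0
    for _ in range(m):
        sample = [experiments[rng.randrange(n)] for _ in range(n)]  # with replacement
        network = learn_network(sample)
        hits += has_feature(network)
    return hits / m

# Trivial stand-ins so the sketch runs end to end.
experiments = list(range(76))                     # e.g. 76 conditions
learn_network = lambda sample: {"edges": set()}   # placeholder learner
has_feature = lambda network: False               # placeholder feature test
print(bootstrap_feature_confidence(experiments, learn_network, has_feature))
```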

Permutation Testing: Do the Networks Represent Real Relationships?
how can we tell if the high-confidence features are meaningful?
compare against confidence values for randomized data – the genes should then be independent and we shouldn't find "real" features
randomize each row (gene) of the genes × conditions expression matrix independently
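A minimal sketch of that row-wise randomization; the matrix size and values are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy genes x conditions expression matrix.
expression = rng.normal(size=(5, 8))

# Permute each row (gene) independently across conditions: this breaks any
# dependence between genes while preserving each gene's own value distribution.
randomized = np.array([rng.permutation(row) for row in expression])
print(randomized.shape)   # (5, 8)
```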

Confidence Levels of Features: Real vs. Randomized Data
[figure from Friedman et al., Journal of Computational Biology, 2000: confidence distributions for Markov features and order features on real vs. randomized data]

Bayes Net Structure Learning Case Study: Sachs et al., Science 2005
measured
–levels of key molecules in (thousands of) single cells, discretized to low, medium, high
–11 phosphoproteins and phospholipids
–9 specific perturbations
[figure from Sachs et al., Science, 2005]

A Signaling Network
[figure from Sachs et al., Science, 2005]

Causality
given only observational data, there are many cases in which we can't determine causality
Smokes  Lung-Cancer
T       T
T       F
F       F
T       T
F       T
T       T
F       F
F       F
the two networks Smokes → Lung-Cancer and Lung-Cancer → Smokes explain the observations in the table equally well

How to Get at Causality?
Observing how events are related in time can provide information about causality.
[slide figure: a timeline on which smoking events precede cancer]

How to Get at Causality? Interventions -- manipulating variables of interest -- can provide information about causality.

Sachs et al. Computational Experiments
simulated-annealing search with add-edge, delete-edge, and reverse-edge operators
repeated 500 times with different initial random graphs
final model includes edges with confidence > 85%
to evaluate the importance of a large data set of single-cell measurements with interventions, constructed control data sets:
1. small
2. observation-only
3. population-average

Interventions

Evaluating the Model Learned by Sachs et al.

The Value of Interventions, Data Set Size and Single-Cell Data

Summary of BN Structure Learning
structure learning is often cast as a search problem
the sparse candidate algorithm is more efficient than a naïve greedy search -- it takes advantage of the assumption that networks in this domain are sparsely connected
we can score candidate networks efficiently because of parameter independence (the overall score is a sum of local terms)
we can score candidate networks efficiently with a Bayesian approach if we use conjugate priors
high-scoring network structures can be characterized using a bootstrapping approach that considers order and Markov-blanket features
the significance of such features can be calculated using permutation testing
we can gain more information about causality via time-series data and interventions