Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks is well-defined and finite. Unfortunately, it is super-exponential in the number of variables. We can define a transition function between states (network structures), such as adding an arc, deleting an arc, or changing the

Learning Structure (Continued) … direction of an arc. For each state (structure), we take our best guess of the CPTs given the data as before. We define the score of the network to be either the probability of the data given the network (maximum likelihood framework) or the posterior probability of the network (the product of the prior probability of the

Learning Structure (Continued) … network and the probability of the data given the network, normalized over all possible networks). Given a state space with a transition function and a scoring function, we now have a traditional AI search space to which we can apply greedy hill-climbing, randomized walks with multiple restarts, or a variety of

Learning Structure (Continued) … other heuristic search techniques. The balance of opinion currently appears to favor greedy hill-climbing search for this application, but search techniques for learning Bayes Net structure are wide open for further research -- a nice thesis topic.
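
To make the search concrete, here is a minimal sketch (toy code, not from the slides or any particular paper): states are sets of directed arcs, the moves are arc addition, deletion, and reversal, and the `score` argument stands in for whatever scoring function we chose (data likelihood or posterior). The names and the toy score are illustrative assumptions.

```python
import itertools

def neighbors(dag, variables):
    """Structures reachable by one arc addition, deletion, or reversal."""
    for x, y in itertools.permutations(variables, 2):
        if (x, y) in dag:
            yield dag - {(x, y)}                    # delete x -> y
            yield (dag - {(x, y)}) | {(y, x)}       # reverse x -> y
        elif (y, x) not in dag:
            yield dag | {(x, y)}                    # add x -> y

def is_acyclic(dag, variables):
    """Topological elimination: repeatedly remove nodes with no remaining parents."""
    remaining, arcs = set(variables), set(dag)
    while remaining:
        roots = {v for v in remaining if not any(c == v for _, c in arcs)}
        if not roots:
            return False                            # every remaining node has a parent: cycle
        remaining -= roots
        arcs = {(p, c) for p, c in arcs if p in remaining and c in remaining}
    return True

def hill_climb(variables, score, max_steps=1000):
    """Greedy search: repeatedly move to the best-scoring legal neighbor."""
    current = frozenset()                           # start from the empty network
    current_score = score(current)
    for _ in range(max_steps):
        improved = None
        for cand in neighbors(current, variables):
            if is_acyclic(cand, variables) and score(cand) > current_score:
                improved, current_score = cand, score(cand)
        if improved is None:
            return current, current_score           # local optimum reached
        current = frozenset(improved)
    return current, current_score

# Toy usage: a score that simply rewards arcs from a fixed target structure.
target = {("E", "B"), ("A", "B"), ("B", "C")}
toy_score = lambda g: len(set(g) & target) - len(set(g) - target)
print(hill_climb(["E", "A", "B", "C"], toy_score))
```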

Structural Equivalence Independence Equivalent: 2 structures are independence equivalent if they encode the same conditional independence relations. Distribution Equivalent with respect to a family of CPT formats: 2 structures are equivalent if they represent the same set of possible distributions. Likelihood Equivalent: the data does not help discriminate between the 2 structures.
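
The standard test for independence equivalence (the Verma-Pearl characterization) compares skeletons and v-structures: two DAGs encode the same conditional independencies exactly when they have the same undirected skeleton and the same colliders X -> Z <- Y with X and Y non-adjacent. A small sketch, with DAGs represented as sets of (parent, child) pairs:

```python
def skeleton(dag):
    """Undirected version of the arc set."""
    return {frozenset(arc) for arc in dag}

def v_structures(dag):
    """Colliders X -> Z <- Y where X and Y are not adjacent."""
    skel = skeleton(dag)
    return {(frozenset((x, y)), z)
            for (x, z) in dag for (y, z2) in dag
            if z == z2 and x != y and frozenset((x, y)) not in skel}

def independence_equivalent(dag1, dag2):
    return skeleton(dag1) == skeleton(dag2) and v_structures(dag1) == v_structures(dag2)

# Example: reversing A -> B destroys the v-structure at B, so the two
# networks below are NOT independence equivalent despite sharing a skeleton.
g1 = {("A", "B"), ("C", "B")}
g2 = {("B", "A"), ("C", "B")}
print(independence_equivalent(g1, g2))   # False
```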

One Other Key Point The previous discussion assumes we are going to make a prediction based on the best (e.g., MAP or maximum likelihood) single hypothesis. Alternatively, we could avoid committing to a single Bayes Net. Instead we could consider all possible Bayes Nets and compute a probability for each. For any new query

One Other Key Point (Continued) … we could calculate the prediction of every network. We could then weight each network's prediction by the probability that it is the correct network (given our previous training data), and go with the highest-scoring prediction. Such a predictor is the Bayes-optimal predictor.

Problem with Bayes Optimal Because the number of structures is super-exponential, we don't want to average over all of them. Two options are used in practice: Selective model averaging: just choose a subset of "best" but "distinct" models (networks) and pretend it's exhaustive. Model selection: go back to MAP/ML.
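
A rough sketch of the two prediction strategies, assuming each model is given as a (posterior weight, predictive-distribution function) pair; the names and the toy "models" here are illustrative, not from the slides:

```python
def bayes_optimal_predict(models, query, labels):
    """Average each model's predictive distribution, weighted by its posterior."""
    total = sum(w for w, _ in models)
    avg = {y: sum(w * predict(query).get(y, 0.0) for w, predict in models) / total
           for y in labels}
    return max(avg, key=avg.get)

def map_predict(models, query, labels):
    """Model selection: commit to the single highest-posterior model."""
    _, best = max(models, key=lambda m: m[0])
    dist = best(query)
    return max(labels, key=lambda y: dist.get(y, 0.0))

# Toy usage: two hand-made "models" returning P(label | query).
models = [(0.7, lambda q: {"yes": 0.4, "no": 0.6}),
          (0.3, lambda q: {"yes": 0.9, "no": 0.1})]
print(bayes_optimal_predict(models, None, ["yes", "no"]))  # "yes" (0.55 vs. 0.45)
print(map_predict(models, None, ["yes", "no"]))            # "no"  (the MAP model alone)
```

Note that the two strategies can disagree, which is exactly why averaging (even selective averaging) can beat committing to a single model.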

Example of Structure Learning: Modeling Gene Expression Data Expression of a gene: producing the protein for which the gene codes (involves transcription and translation). Can estimate expression via the transcription level (the amount of mRNA made from the gene). DNA hybridization arrays: "chips" that simultaneously measure the levels at which all genes in a sample are expressed.

Importance of Expression Data Often the best clue to a disease, or the best measure of a treatment's success, is the degree to which genes are expressed. Such data also gives insight into regulatory networks among genes (one gene may code for a protein that affects another's expression rate). Can get snapshots of global expression levels.

Modeling Expression Data by Learning a Bayes Net We can model the effects of genes on one another by learning a Bayes Net (both structure and CPTs). Friedman et al. do so; see the associated figure. Expression of gene E might promote expression of B, while expression of A might inhibit B. The facts that E and A directly influence B are captured by the network structure, and the

Modeling Gene Expression Data (Continued) … fact that E promotes while A inhibits is captured in the CPT for B given its parents. B directly influences C according to the network, but E and A influence C only indirectly via B.
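
For concreteness, here is a hypothetical CPT for B given its parents E and A (all numbers are invented for illustration): the graph only records that E and A are parents of B, while the promote/inhibit pattern lives in these parameters.

```python
# Hypothetical CPT for P(B = expressed | E, A); the numbers are made up.
cpt_B = {
    # (E expressed, A expressed): P(B expressed)
    (True,  False): 0.90,  # E active, no inhibitor: B very likely expressed
    (True,  True):  0.30,  # A largely cancels E's effect
    (False, False): 0.20,  # baseline expression with neither parent active
    (False, True):  0.05,  # inhibited below baseline
}
```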

Decisions Made in this Application Use selective model averaging. Learn multiple models via bootstrapping. (Randomly sample with replacement from the original data set -- each run uses a different random sample of the original data.) Average simply by extracting common features: Is Y in the Markov Blanket of X? Is Y an ancestor of X?
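
A sketch of the bootstrapping step, assuming some `learn_structure(data)` routine is available (for instance, the hill-climbing sketch above). For simplicity the feature counted here is plain arc presence; Friedman et al. tally richer features such as Markov-blanket membership and ancestor relations.

```python
import random

def bootstrap_confidence(data, learn_structure, n_runs=100):
    """Confidence in each arc = fraction of bootstrap replicates whose learned net contains it."""
    counts = {}
    for _ in range(n_runs):
        # Resample the data set with replacement, same size as the original.
        sample = [random.choice(data) for _ in range(len(data))]
        for arc in learn_structure(sample):
            counts[arc] = counts.get(arc, 0) + 1
    return {arc: count / n_runs for arc, count in counts.items()}
```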

Decisions (Continued) Use Independence Equivalence (two models are equivalent if and only if they encode the same conditional independence information). Must choose a prior distribution over structures. Choose one such that (1) equivalent structures have equivalent scores, and (2) scores are decomposable (the score is a sum of per-node scores, each depending only on that node and its parents).

Decisions (Continued) Use arc insertion, arc deletion, or arc direction switch as the transition function. Use greedy hill-climbing as the search strategy, with the following (also greedy) additional heuristic. Sparse candidate algorithm: consider only networks in which the parents of X are nodes that have a “high” correlation with X

Decisions (Continued) … when taken individually. Note the similarity of this heuristic to the greedy heuristic in decision tree learning. This means that probabilistic versions of exclusive-OR (for example) will cause problems.
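
A sketch of the candidate-parent filter under simplifying assumptions: each variable X keeps only its k most strongly (in absolute value) correlated variables as allowed parents. The actual Sparse Candidate algorithm uses an information-theoretic measure and revises the candidate sets between search rounds; this pairwise version just illustrates the idea, and also why XOR-like interactions, which look uncorrelated pairwise, get filtered out.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def candidate_parents(data, variables, k=5):
    """data maps each variable name to its list of observed values (one per sample)."""
    candidates = {}
    for x in variables:
        scored = sorted(((abs(pearson(data[x], data[y])), y)
                         for y in variables if y != x), reverse=True)
        candidates[x] = [y for _, y in scored[:k]]
    return candidates
```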