Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm
Nir Friedman, Iftach Nachman, and Dana Pe'er
Presenter: Kyu-Baek Hwang

Abstract
- Learning a Bayesian network can be cast as an optimization problem (in machine learning) or as a constraint-satisfaction problem (in statistics).
- The search space is extremely large, and the search procedure spends most of its time examining extremely unreasonable candidate structures.
- If we can reduce the search space, faster learning becomes possible: we place some restrictions on the candidate parent variables of each variable.
- Motivating application: bioinformatics.

Learning Bayesian Network Structures
- Constraint-satisfaction approach: independence tests such as the χ²-test.
- Optimization approach: scores such as BDe and MDL; learning means finding the structure that maximizes these scores.
- Search techniques: the problem is generally NP-hard; in practice greedy hill-climbing or simulated annealing is used, examining on the order of O(n²) local changes per step.
- If the number of examples and the number of attributes are both large, the computational cost is too expensive to obtain a tractable result.

Combining Statistical Properties
- Most of the candidates considered during the search procedure can be eliminated in advance, based on our statistical understanding of the domain.
- If X and Y are almost independent in the data, we might decide not to consider Y as a parent of X; mutual information is one such measure.
- Restrict the possible parents of each variable to a candidate set of size k, with k << n - 1 (see the sketch below).
- The key idea is to use the network structure found at the previous stage to find better candidate parents.
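As a concrete illustration of this restriction, here is a minimal sketch of a Restrict step that keeps, for each variable, the k variables with the highest pairwise mutual information. The function name and the assumption of a precomputed MI matrix are mine, not the paper's.

```python
import numpy as np

def restrict_by_mutual_info(mi, k):
    """First Restrict step of a Sparse-Candidate-style loop (a sketch).

    mi : (n, n) array with mi[i, j] = estimated I(X_i; X_j); the diagonal
         is ignored.
    k  : size of each candidate-parent set.

    Returns one candidate-parent index set per variable.
    """
    n = mi.shape[0]
    candidates = []
    for i in range(n):
        scores = mi[i].astype(float)
        scores[i] = -np.inf                  # a variable is never its own parent
        top_k = np.argsort(scores)[-k:]      # indices of the k largest MI values
        candidates.append(set(top_k.tolist()))
    return candidates
```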

Background
- A Bayesian network over X = {X_1, X_2, ..., X_n} is a pair B = <G, Θ>, where G is a DAG over the variables and Θ its parameters.
- The problem of learning a Bayesian network: given a training set D of N instances, find a B that best matches D, as measured by a score such as BDe or MDL.
- These scores decompose over families: Score(G : D) = Σ_i Score(X_i | Pa(X_i) : N_{X_i, Pa(X_i)}), where N_{X_i, Pa(X_i)} are the sufficient statistics (counts) for X_i and its parents.
- Greedy hill-climbing search: at each step, all possible local changes are examined and the change that brings the maximal gain in score is selected (a small illustration follows).
- Calculation of the sufficient statistics is the computational bottleneck.
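To illustrate why the decomposition matters for hill-climbing, here is a minimal sketch; family_score is a hypothetical callable standing in for a BDe or MDL family term, and because the score is a sum over families, evaluating a local change such as adding an edge X_j -> X_i only requires re-scoring X_i's family.

```python
def network_score(parents, family_score):
    """Total score of a structure given as {variable: set of parents}.
    family_score(i, parent_set) is a hypothetical callable standing in
    for a decomposable family term such as BDe or MDL."""
    return sum(family_score(i, pa) for i, pa in parents.items())

def gain_of_adding_edge(parents, family_score, j, i):
    """Score gain of the local change "add edge X_j -> X_i".
    Because the score decomposes over families, only X_i's family
    has to be re-scored; all other terms of the sum are unchanged."""
    return family_score(i, parents[i] | {j}) - family_score(i, parents[i])
```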

Simple Intuitions
- Use mutual information or correlation: if the true structure is X -> Y -> Z, then I(X;Y) > 0, I(Y;Z) > 0, and I(X;Z) > 0, but I(X;Z | Y) = 0.
- Basic idea of the "Sparse Candidate" algorithm: for each variable X, find a set of variables Y_1, Y_2, ..., Y_k that are the most promising candidate parents of X. This gives a much smaller search space.
- The main drawback of this idea: a mistake in the initial stage can lead us to an inferior-scoring network. The remedy is to iterate the basic procedure, using the previously constructed network to reconsider the candidate parents.

Outline of the Sparse Candidate Algorithm
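The pseudocode on this slide is not reproduced in the transcript. Based on the surrounding slides, the algorithm alternates a Restrict step (choose candidate parent sets) and a Maximize step (search for a high-scoring network constrained to those candidates) until the score stops improving. A minimal sketch, with restrict_candidates, learn_constrained_network, and score as hypothetical stand-ins:

```python
def sparse_candidate(data, k, restrict_candidates, learn_constrained_network, score):
    """Outer loop of a Sparse-Candidate-style learner (a sketch).

    restrict_candidates(data, network, k)       -> candidate sets C_1..C_n
    learn_constrained_network(data, candidates) -> network whose parents are
                                                   drawn only from the C_i
    score(network, data)                        -> network score (BDe, MDL, ...)
    All three are hypothetical stand-ins for the paper's Restrict and
    Maximize steps and scoring function.
    """
    network, best = None, float("-inf")   # network=None: fall back to mutual
                                          # information in the first Restrict step
    while True:
        candidates = restrict_candidates(data, network, k)      # Restrict
        network = learn_constrained_network(data, candidates)   # Maximize
        current = score(network, data)
        if current <= best:               # score no longer improves: stop
            return network
        best = current
```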

Convergence Properties of the Sparse Candidate Algorithm
- We require that, in the Restrict step, the selected candidates for X_i's parents include X_i's current parents: Pa_{G_n}(X_i) ⊆ C_i^{n+1}.
- This requirement implies that the winning network B_n is a legal structure in the (n+1)-th iteration, so Score(B_{n+1} | D) ≥ Score(B_n | D).
- Stopping criterion: Score(B_n) = Score(B_{n-1}).

Mutual Information
- The mutual information I(X;Y) = Σ_{x,y} P̂(x,y) log [ P̂(x,y) / (P̂(x) P̂(y)) ] measures how strongly X and Y depend on each other in the data (a small computation sketch follows).
- Example, for the four-node network over A, B, C, and D shown on the slide: I(A;C) > I(A;D) > I(A;B).
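For completeness, a small sketch of computing the empirical mutual information from a contingency table of joint counts (the function name is mine):

```python
import numpy as np

def mutual_information(counts):
    """Empirical mutual information I(X;Y) in nats, from a contingency
    table of joint counts with counts[x, y] = #{X = x, Y = y}."""
    joint = counts / counts.sum()                 # empirical joint P(x, y)
    px = joint.sum(axis=1, keepdims=True)         # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)         # marginal P(y)
    nz = joint > 0                                # skip zero-probability cells
    return float((joint[nz] * np.log(joint[nz] / (px * py)[nz])).sum())

# Example: a pair of strongly dependent binary variables.
# mutual_information(np.array([[40.0, 10.0], [10.0, 40.0]]))  # ~0.19 nats
```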

Discrepancy Test
- The initial iteration uses mutual information; subsequent iterations use a discrepancy measure between the empirical distribution and the distribution represented by the current network (a sketch follows).
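The measure itself is not shown in the transcript. As far as I recall from the paper, the discrepancy of a variable pair given the current network B is the KL divergence between the empirical pairwise joint and the pairwise joint that B induces; treat the following as a hedged sketch under that assumption.

```python
import numpy as np

def discrepancy(p_emp, p_net):
    """KL divergence D(P_emp || P_net) between the empirical pairwise joint
    P_emp(X, Y) and the pairwise joint P_net(X, Y) induced by the current
    network (a sketch; both arguments are probability tables of the same
    shape, and P_net is assumed strictly positive wherever P_emp is)."""
    nz = p_emp > 0
    return float((p_emp[nz] * np.log(p_emp[nz] / p_net[nz])).sum())
```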

Other Tests
- Conditional mutual information (a sketch follows).
- Tests that penalize structures with more parameters.
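A small sketch of a conditional mutual information estimate I(X;Y | Z) from a three-way contingency table (a generic estimator; the function name is mine):

```python
import numpy as np

def conditional_mutual_information(counts):
    """Empirical conditional mutual information I(X;Y | Z) in nats, from a
    three-way contingency table with counts[x, y, z] = #{X = x, Y = y, Z = z}."""
    joint = counts / counts.sum()                  # P(x, y, z)
    pz = joint.sum(axis=(0, 1), keepdims=True)     # P(z)
    pxz = joint.sum(axis=1, keepdims=True)         # P(x, z)
    pyz = joint.sum(axis=0, keepdims=True)         # P(y, z)
    nz = joint > 0                                 # skip zero-probability cells
    # I(X;Y|Z) = sum_xyz P(x,y,z) * log[ P(x,y,z) P(z) / (P(x,z) P(y,z)) ]
    return float((joint[nz] * np.log((joint * pz)[nz] / (pxz * pyz)[nz])).sum())
```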

Learning with Small Candidate Sets
- Standard heuristics:
  - Unconstrained: space O(C(n, k)), time O(n²).
  - Constrained by small candidate sets: space O(2^k), time O(kn).
- Divide-and-conquer heuristics (next slides); a quick size comparison follows.
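To get a feel for the reduction, a quick back-of-the-envelope comparison with illustrative values n = 100 and k = 5 (these numbers are mine, not the paper's):

```python
from math import comb

n, k = 100, 5                 # illustrative values, not from the paper
print(comb(n, k))             # 75287520 ways to pick k parents out of n variables
print(2 ** k)                 # 32 subsets of a k-element candidate set
```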

Strongly Connected Components
- Decomposing the candidate graph H into strongly connected components takes linear time (a sketch follows).
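As an illustration of the linear-time decomposition, here is a sketch of Kosaraju's algorithm; any standard SCC algorithm would do, and the representation of H as an adjacency dict is an assumption of mine.

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's linear-time SCC decomposition (a sketch).
    graph: dict mapping every node to an iterable of its successors."""
    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited, order = set(), []
    for root in graph:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:                        # all successors done: node is finished
                order.append(node)
                stack.pop()
    # Pass 2: DFS on the reversed graph, in decreasing finish order.
    reverse = defaultdict(list)
    for node, successors in graph.items():
        for nxt in successors:
            reverse[nxt].append(node)
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [], [root]
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components

# Example: two components, {0, 1, 2} and {3}.
# strongly_connected_components({0: [1], 1: [2], 2: [0, 3], 3: []})
```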

Separator Decomposition
- (Slide figure: H decomposed into components H1 and H2 joined by a separator S; labels H'1, H'2, X, and Y appear in the diagram.)
- The bottleneck is S: we can order the variables in S so as to disallow any cycle in H1 ∪ H2.

Experiments on Synthetic Data

Experiments on Real-Life Data

Conclusions
- Sparse candidate sets enable us to search for good structures efficiently.
- A better selection criterion is still necessary.
- The authors applied these techniques to Spellman's cell-cycle data.
- Exploiting the network structure for the search in H needs to be improved.