Latent variable discovery in classification models

Latent variable discovery in classification models
Nevin L. Zhang, Thomas D. Nielsen, Finn V. Jensen
Artificial Intelligence in Medicine 30 (2004) 283–299
Advisor: Professor Chung-Chian Hsu
Reporter: Wen-Chung Liao, 2006/7/12

Outline
Motivation
Objectives
HNB models
Learning HNB models
Results
Concluding remarks
Personal comments

Motivation
The naive Bayes model makes the often unrealistic assumption that the feature variables are mutually independent given the class variable.
Latent variable discovery is especially interesting in medical applications.

Objectives Show how latent variables can be detected.

HNB models
Hierarchical naive Bayes (HNB) model: a tree-shaped Bayesian network M = (m, θ), where m is the model structure and θ are the parameters; the class variable is the root, the feature variables are the leaves, and the latent variables lie in between.
Two key notions: parsimonious models and regular models. For a latent variable Z in an HNB model, enumerate its neighbors (parent and children) as Z1, Z2, …, Zk; regularity bounds the cardinality |Z| by |Z| ≤ (|Z1| × |Z2| × … × |Zk|) / max_i |Zi|, with strict inequality when Z has only two neighbors.
Theorem 1. Parsimonious HNB models are regular.
Theorem 2. The set of all regular HNB models for a given set of class and feature variables is finite.
Lemma 1. In a regular HNB model, no two singly connected latent nodes can be neighbors.
Model selection uses the Bayesian information criterion: the BIC score of a model m given a data set D is BIC(m|D) = log P(D | m, θ*) − d(m)/2 · log N, where θ* is the maximum likelihood parameter estimate, d(m) is the number of free parameters of m, and N is the number of records in D.
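To make the regularity bound and the BIC score concrete, here is a minimal Python sketch. The function names are mine; the bound and the BIC formula are the ones stated above, and the log-likelihood is assumed to be supplied by the caller.

```python
import math

def regular_cardinality_bound(neighbor_cards):
    """Upper bound on |Z| under the regularity condition:
    |Z| <= (product of neighbor cardinalities) / (largest neighbor cardinality)."""
    return math.prod(neighbor_cards) // max(neighbor_cards)

def bic_score(loglik, num_free_params, num_records):
    """BIC(m|D) = log P(D | m, theta*) - d(m)/2 * log N."""
    return loglik - 0.5 * num_free_params * math.log(num_records)

# A latent variable whose parent and children have cardinalities 3, 3 and 2
# may itself have at most 3*3*2 / 3 = 6 states under the regularity condition.
print(regular_cardinality_bound([3, 3, 2]))                       # -> 6
print(bic_score(loglik=-1234.5, num_free_params=40, num_records=5000))
```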

Learning HNB models
A hill-climbing algorithm requires a search space and search operators.
A natural search space: the set of all regular HNB models for a given set of class and feature variables.
Restructure the space into two levels:
1. Given a model structure, find an optimal cardinality for the latent variables.
2. Find an optimal model structure.
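A sketch of the lower level of this search: choose the cardinality of a latent variable under a fixed structure by greedily adding states while the BIC score improves. This is illustrative rather than the paper's exact procedure; optimize_cardinality and score_with_em are hypothetical names, and max_card could be set to the regularity bound sketched earlier.

```python
def optimize_cardinality(structure, latent_var, data, score_with_em, max_card=None):
    """Greedy search over the cardinality of one latent variable: keep adding
    states while the BIC score improves. `score_with_em(structure, cards, data)`
    is a hypothetical helper that runs EM for the given cardinalities and
    returns the BIC score of the fitted model."""
    card = 2                                   # smallest non-trivial cardinality
    best_score = score_with_em(structure, {latent_var: card}, data)
    while max_card is None or card < max_card:
        score = score_with_em(structure, {latent_var: card + 1}, data)
        if score <= best_score:                # BIC stopped improving
            break
        card, best_score = card + 1, score
    return card, best_score
```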

Learning HNB models: learning cardinality and learning model structures
To search the space of model structures, start with the naive Bayes model structure. At each step, modify the current model to construct a number of new model structures. The new structures are then evaluated, and the best structure is selected to seed the next search step.
Three operators: parent-introduction, parent-alteration, node-deletion.
Theorem 3. Starting from the naive Bayes model structure, we can reach any regular HNB model structure using parent-introduction and parent-alteration.
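The structure level can be sketched as a standard greedy loop over candidate structures produced by the three operators. generate_neighbors and score_structure are hypothetical helpers, the latter standing in for cardinality optimization, EM parameter estimation, and BIC scoring.

```python
def hill_climb_structures(initial_structure, data, generate_neighbors, score_structure):
    """Greedy structure search starting from the naive Bayes structure.
    `generate_neighbors` applies parent-introduction, parent-alteration and
    node-deletion to the current structure; `score_structure` returns the BIC
    score of a structure after fitting it to the data. Both are hypothetical."""
    current = initial_structure
    current_score = score_structure(current, data)
    while True:
        candidates = generate_neighbors(current)
        if not candidates:
            return current, current_score
        scored = [(score_structure(s, data), s) for s in candidates]
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= current_score:        # no candidate improves: stop
            return current, current_score
        current, current_score = best, best_score
```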

Results: synthetic data
Two questions:
1. Can our algorithm discover interesting latent variables?
2. Can our algorithm yield better classifiers than the naive Bayes classifier?
Three experiments used synthetic data sampled from HNB models in which all variables have three states. The experiments vary the strength of correlation between the observed variables and the latent variables, and the strength of correlation among the observed variables. In each experiment, five training sets were used.
Result: in all cases, our algorithm recovered the structures of the original models precisely, i.e., it correctly detected the latent variables.
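To reproduce a similar setup, synthetic data can be generated by ancestral sampling from a tree-shaped model: sample the root, then each child given its parent. The structure and CPTs below are hypothetical stand-ins, not the generative models used in the paper.

```python
import random

def sample_tree_model(root, children, root_dist, cpts, n_records, seed=0):
    """Ancestral sampling from a tree-shaped Bayesian network.
    children[v] -> list of v's children
    root_dist   -> distribution over the root's states
    cpts[v][p]  -> distribution over v's states given parent state p"""
    rng = random.Random(seed)
    records = []
    for _ in range(n_records):
        values = {root: rng.choices(range(len(root_dist)), weights=root_dist)[0]}
        stack = [root]
        while stack:                          # walk the tree top-down
            parent = stack.pop()
            for child in children.get(parent, []):
                dist = cpts[child][values[parent]]
                values[child] = rng.choices(range(len(dist)), weights=dist)[0]
                stack.append(child)
        records.append(values)
    return records

# Hypothetical 3-state example: class C -> latent Z -> features F1, F2.
peaked = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
children = {"C": ["Z"], "Z": ["F1", "F2"]}
data = sample_tree_model("C", children, [1/3, 1/3, 1/3],
                         {"Z": peaked, "F1": peaked, "F2": peaked}, n_records=5)
```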

Evaluation used a test set of 5000 samples. For each of the 5000 records in the test set, we computed the posterior distribution of the class variable given the values of the feature variables. The KL divergence between this distribution in the generative model and that in the learned model was then calculated.
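A minimal sketch of this evaluation, assuming the class posteriors under the generative model and the learned model have already been computed for each test record:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_c p(c) * log(p(c) / q(c)); eps guards against log 0."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_posterior_kl(true_posteriors, learned_posteriors):
    """Average, over the test records, of the KL divergence between the class
    posterior under the generative model and under the learned model."""
    pairs = list(zip(true_posteriors, learned_posteriors))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

# Hypothetical two-record, two-class example:
print(mean_posterior_kl([[0.9, 0.1], [0.3, 0.7]],
                        [[0.8, 0.2], [0.35, 0.65]]))
```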

Results: Monk data sets
The Monk's problems Monk-1, Monk-2, and Monk-3 each have one binary class variable and six feature variables a1, …, a6 with 2-4 possible states. Each problem has a data set of 432 records. Between 30 and 40% of the records were used for training and all records were used in testing. Training sets were also enlarged by increasing record counts by 3, 9, and 27 times.
Fig. 7. Structures of HNB models constructed for Monk-1. They match the target concepts nicely.
Fig. 9. Structures of HNB models constructed for Monk-3. They match the target concepts nicely.

Results: other UCI data sets
The Wisconsin Breast Cancer data set and the Pima Indians Diabetes data set were used. These data sets consist of more than 500 records and have no more than 10 feature variables.
For the breast cancer data set, features "uniformity-of-cell-size" and "uniformity-of-cell-shape" always share the same (latent) parent. For the diabetes data set, there is always a latent node that is the parent of both "age" and "number-of-pregnancies".

Concluding remarks
HNB models provide a framework for detecting latent variables in naive Bayes models, and a hill-climbing algorithm for inducing HNB models from data has been developed.
A major drawback of the algorithm is its high complexity: at each step of the search it generates a set of new candidate models based on the current model, estimates parameters for each candidate using EM, scores them, and picks the best one to seed the next step of the search. A way to overcome this difficulty might be to use structural EM.

Personal Comments
Applications: business, medicine, …
Advantages: the paper establishes many properties of HNB models.
Drawbacks: the search space is still large, the algorithm has high complexity, and parts of the algorithm are left ambiguous.
Possible improvements: use a genetic algorithm, or find properties that allow the search space to be partitioned.