Finite Mixture Model of Bounded Semi-Naïve Bayesian Network Classifiers. Kaizhu Huang, Irwin King, Michael R. Lyu, Multimedia Information Processing Laboratory.

Presentation transcript:

Slide 1: Finite Mixture Model of Bounded Semi-Naïve Bayesian Network Classifiers
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
{kzhuang, king,
ICANN & ICONIP 2003, June 2003, Istanbul, Turkey

Slide 2: Outline
- Abstract
- Background: classifiers; Naïve Bayesian classifiers; Semi-Naïve Bayesian classifiers; Chow-Liu tree
- Bounded Semi-Naïve Bayesian classifiers
- Mixture of Bounded Semi-Naïve Bayesian classifiers
- Experimental results
- Discussion
- Conclusion

Slide 3: Abstract
- We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into one node is bounded. It has a lower computational cost than traditional semi-naïve Bayesian networks, and experiments show the proposed technique is also more accurate.
- We then upgrade the semi-naïve structure into a mixture structure. This increases its expressive power, and experiments show the mixture approach outperforms other types of classifiers.

Slide 4: A Typical Classification Problem
- Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

Slide 5: Background: Probabilistic Classifiers
- The classification mapping function maximizes the posterior probability; since P(x) is a constant for a given x with respect to the class c_l, this is equivalent to maximizing the joint probability (the slide's formulas are reconstructed below).
- The joint probability is not easily estimated from the dataset; usually an assumption about the distribution has to be made, e.g., are the attributes dependent or independent?
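The formulas on this slide were images that did not survive the transcript; a standard reconstruction of the mapping the slide describes, using its "posterior probability" and "joint probability" labels, is:

```latex
c^{*} \;=\; \arg\max_{c_l} \underbrace{P(c_l \mid \mathbf{x})}_{\text{posterior}}
      \;=\; \arg\max_{c_l} \frac{P(c_l, \mathbf{x})}{P(\mathbf{x})}
      \;=\; \arg\max_{c_l} \underbrace{P(c_l, \mathbf{x})}_{\text{joint}},
```

where the last equality holds because P(x) is a constant for a given x with respect to c_l.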

Slide 6: Related Work (1): Naïve Bayesian Classifiers (NB)
- Assumption: given the class label C, the attributes are independent.
- The classification mapping function follows directly from this assumption (see the reconstruction below).
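The slide's formulas were images; the standard NB assumption and the resulting mapping function, consistent with the description above, are:

```latex
P(x_1,\dots,x_n \mid C) \;=\; \prod_{i=1}^{n} P(x_i \mid C)
\qquad\Longrightarrow\qquad
c^{*} \;=\; \arg\max_{c} \, P(c)\prod_{i=1}^{n} P(x_i \mid c).
```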

Slide 7: Related Work: Naïve Bayesian Classifiers
- NB's performance is comparable with some state-of-the-art classifiers even though its independence assumption does not hold in most real cases.
- Question: can the performance be improved when the conditional independence assumption of NB is relaxed?

Slide 8: Related Work: Semi-Naïve Bayesian Classifiers (SNB)
- A looser assumption than NB: given the class label C, independence holds among the joined variables (groups of attributes) rather than among the individual attributes (factorization reconstructed below).
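A reconstruction of the SNB factorization the slide refers to, with the attributes grouped into disjoint joined variables B_1, ..., B_m:

```latex
P(x_1,\dots,x_n \mid C) \;=\; \prod_{j=1}^{m} P(\mathbf{B}_j \mid C),
\qquad
\bigcup_{j=1}^{m} \mathbf{B}_j = \{x_1,\dots,x_n\},\quad
\mathbf{B}_i \cap \mathbf{B}_j = \varnothing \;\; (i \neq j).
```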

Slide 9: Related Work: Chow-Liu Tree (CLT)
- Another looser assumption than NB: given the class variable C, a tree dependence structure exists among the variables (factorization reconstructed below).
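A reconstruction of the CLT factorization; Chow and Liu (1968) showed that the maximum-likelihood tree is the maximum-weight spanning tree with conditional mutual information I(x_i; x_j | C) as edge weights:

```latex
P(x_1,\dots,x_n \mid C) \;=\; \prod_{i=1}^{n} P\big(x_i \mid x_{pa(i)}, C\big),
```

where pa(i) denotes the parent of x_i in the tree (the root has no parent).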

Slide 10: Summary of Related Work
- CLT: a conditional tree dependency assumption among variables; Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.
- SNB: a conditional independence assumption among joined variables; traditional SNBs are not as well developed as CLT.

Slide 11: Problems of Traditional SNBs (Kononenko 91, Pazzani 96)
- Accurate? No: both rely on local heuristic search.
- Efficient? No: inefficient even when joining only 3 variables; exponential time cost.
- Strong assumption? Yes: the semi-dependence assumption does not hold in real cases either.

Slide 12: Our Solution
- Bounded Semi-Naïve Bayesian Network (B-SNB)
  - Accurate? We use a global combinatorial optimization method.
  - Efficient? We find the network via linear programming, which can be solved in polynomial time.
- Mixture of B-SNB (MBSNB)
  - Strong assumption? The mixture structure is a superclass of B-SNB.

Slide 13: Our Solution (comparison diagram; annotation: "Improved significantly")

Slide 14: Bounded Semi-Naïve Bayesian Network Model Definition
- Joined variables
- Completely cover the variable set without overlapping
- Conditional independence among the joined variables, given the class
- Bounded: the number of variables combined into one joined variable is limited

Slide 15: Constraining the Search Space
- The search space is large; we reduce it by adding the constraint that the cardinality of each joined variable is exactly equal to K.
- Hidden principle: when K is small, keeping K variables in one joined variable is more accurate than splitting them into several smaller joined variables. Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d). (An information-theoretic reading of this principle is sketched below.)
- This constraint reduces the search space substantially.
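One way to make the "hidden principle" precise, as an information-theoretic reading not spelled out on the slide: the coarser partition never fits the true joint worse, because entropy is subadditive, H(c,d) ≤ H(c) + H(d), so

```latex
D_{\mathrm{KL}}\big(P(a,b,c,d)\,\|\,P(a,b)P(c,d)\big)
= H(a,b) + H(c,d) - H(a,b,c,d)
\;\le\;
H(a,b) + H(c) + H(d) - H(a,b,c,d)
= D_{\mathrm{KL}}\big(P(a,b,c,d)\,\|\,P(a,b)P(c)P(d)\big).
```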

Slide 16: Searching for the K-Bounded-SNB Model
- How do we search for the appropriate model? Find the m = [n/K] K-cardinality subsets (joined variables) of the variable (feature) set that satisfy the SNB conditions and maximize the log-likelihood; [x] denotes rounding x to the nearest integer. (The equivalent entropy-based objective is written out below.)
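Under maximum-likelihood parameter estimates, the per-sample log-likelihood of an SNB partition reduces to a sum of negative empirical entropies (a standard identity stated here for context, with all quantities conditioned on the class label; it is not copied from the slide):

```latex
\frac{1}{N}\,\ell(S) \;=\; \sum_{j=1}^{m} \sum_{\mathbf{b}} \hat{P}(\mathbf{B}_j = \mathbf{b}) \log \hat{P}(\mathbf{B}_j = \mathbf{b})
\;=\; -\sum_{j=1}^{m} \hat{H}(\mathbf{B}_j),
```

so maximizing the log-likelihood is equivalent to choosing the partition into K-cardinality joined variables with minimum total empirical entropy.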

Slide 17: Global Optimization Procedure
- Constraints: no overlap among the joined variables, and together the joined variables form the whole variable set.
- Relax the 0/1 constraints to 0 ≤ x ≤ 1: the integer programming (IP) problem is changed into a linear programming (LP) problem.
- Rounding scheme: round the LP solution back into an IP solution. (A hypothetical code sketch of this procedure follows.)
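A minimal sketch of the LP-relaxation-plus-rounding step described on this slide, under assumed details: one indicator per candidate K-subset, an exact-cover constraint per attribute, the relaxed LP solved with SciPy, then a simple greedy rounding. The names, the entropy cost, and the rounding rule are illustrative, not the authors' exact procedure.

```python
# Sketch of a K-bounded SNB structure search via LP relaxation (assumed details).
from itertools import combinations
from collections import Counter
import numpy as np
from scipy.optimize import linprog

def empirical_entropy(data, cols):
    """Empirical joint entropy (in nats) of the attributes indexed by cols."""
    counts = Counter(tuple(row[c] for c in cols) for row in data)
    total = sum(counts.values())
    probs = np.array([v / total for v in counts.values()])
    return float(-(probs * np.log(probs)).sum())

def bsnb_partition(data, n_attrs, K):
    """Pick disjoint K-subsets covering all attributes with small total entropy."""
    subsets = list(combinations(range(n_attrs), K))          # candidate joined variables
    cost = np.array([empirical_entropy(data, s) for s in subsets])
    # Exact-cover constraints: each attribute appears in exactly one chosen subset.
    A_eq = np.zeros((n_attrs, len(subsets)))
    for j, s in enumerate(subsets):
        for a in s:
            A_eq[a, j] = 1.0
    b_eq = np.ones(n_attrs)
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, 1.0), method="highs")
    # Greedy rounding: take subsets by decreasing LP value, skipping overlaps.
    order = np.argsort(-res.x)
    covered, chosen = set(), []
    for j in order:
        if not covered.intersection(subsets[j]):
            chosen.append(subsets[j])
            covered.update(subsets[j])
    return chosen

# Toy usage: 4 binary attributes, K = 2 (assumes n_attrs divisible by K).
rng = np.random.default_rng(0)
toy = rng.integers(0, 2, size=(200, 4)).tolist()
print(bsnb_partition(toy, n_attrs=4, K=2))
```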

Slide 18: Mixture Upgrading (using EM)
- The E-step and M-step alternate; in the M-step, each component S_k is updated by the B-SNB method (the standard update equations are reconstructed below).
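The update equations on this slide were images; a standard EM formulation for a finite mixture with weights λ_k and component distributions P_k, consistent with the slide's E-step/M-step outline though not copied from it, is:

```latex
\text{E-step:}\quad
\gamma_k^{(t)} \;=\; \frac{\lambda_k\, P_k(\mathbf{x}^{(t)})}{\sum_{j} \lambda_j\, P_j(\mathbf{x}^{(t)})},
\qquad
\text{M-step:}\quad
\lambda_k \;=\; \frac{1}{N}\sum_{t=1}^{N} \gamma_k^{(t)},
```

with each component S_k then refit by the B-SNB method on the data weighted by the responsibilities γ_k^(t).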

Slide 19: Experimental Setup
- Datasets: 6 benchmark datasets from the UCI machine learning repository, plus 1 synthetically generated dataset named "XOR".
- Experimental environment: platform Windows 2000; development tool Matlab 6.1.

Slide 20: Overall Prediction Rate (%)
- We set the bound parameter K to 2 and 3; "2-BSNB" denotes the B-SNB model with the bound parameter set to 2. (Results table shown on the slide.)

Slide 21: NB vs. MBSNB (comparison chart)

Slide 22: BSNB vs. MBSNB (comparison chart)

Slide 23: CLT vs. MBSNB (comparison chart)

Slide 24: C4.5 vs. MBSNB (comparison chart)

Slide 25: Average Error Rate (chart)

Slide 26: Observations
- B-SNBs with large K are not good for sparse datasets. The Post dataset has only 90 samples; with K = 3 the accuracy decreases.
- Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias, and with K = 3 the accuracy increases.

Slide 27: Discussion
- When n cannot be divided by K exactly, i.e., n mod K = l with l ≠ 0, the assumption that all joined variables have the same cardinality K is violated. Solution: find an l-cardinality joined variable with minimum entropy, then run the optimization on the remaining n - l variables, since (n - l) mod K = 0.
- How to choose K? When the number of samples in the dataset is small, a large K may not give good performance. A good K should be related to the nature of the dataset; a natural way is to use cross validation to find the optimal K (a hypothetical sketch follows).
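A hypothetical sketch of choosing the bound K by cross validation, as the slide suggests. The learner is passed in through fit_fn and predict_fn placeholders, which stand for a B-SNB trainer and predictor and are assumptions, not the authors' code.

```python
# Cross-validated selection of the bound parameter K (illustrative sketch).
import numpy as np

def choose_K(X, y, fit_fn, predict_fn, candidate_Ks=(1, 2, 3), n_folds=5, seed=0):
    """Return the K with the best mean cross-validated accuracy, plus all scores."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = {}
    for K in candidate_Ks:
        accs = []
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            model = fit_fn(X[train], y[train], K)             # train a K-bounded SNB
            accs.append(np.mean(predict_fn(model, X[test]) == y[test]))
        scores[K] = float(np.mean(accs))
    best_K = max(scores, key=scores.get)
    return best_K, scores
```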

Slide 28: Conclusion
- A novel Bounded Semi-Naïve Bayesian classifier is proposed. A direct combinatorial optimization method gives B-SNB global optimization, and the transformation from an IP into an LP problem reduces the computational complexity to polynomial time.
- A mixture of B-SNBs is developed, expanding the expressive power of B-SNB; experimental results show the mixture approach outperforms other types of classifiers.

Slide 29: Main References
- Chow, C. K. and Liu, C. N. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, vol. 14, 1968.
- Kononenko, I. Semi-naive Bayesian classifier. In Proceedings of the Sixth European Working Session on Learning. Springer-Verlag, 1991.
- Pazzani, M. J. Searching dependency in Bayesian classifiers. In D. Fisher and H.-J. Lenz, editors, Learning from Data: Artificial Intelligence and Statistics V. New York: Springer-Verlag, 1996.
- Srebro, N. Maximum likelihood bounded tree-width Markov networks. Master's thesis, MIT.
- Murphy, P. M. UCI repository of machine learning databases. ftp.ics.uci.edu: pub/machine-learning-databases. mlearn/MLRepository.html.

Slide 30: Thank you!