
1 Finite Mixture Model of Bounded Semi-Naïve Bayesian Network Classifiers. Kaizhu Huang, Irwin King, Michael R. Lyu. Multimedia Information Processing Laboratory, The Chinese University of Hong Kong, Shatin, NT, Hong Kong. {kzhuang, king, lyu}@cse.cuhk.edu.hk. ICANN&ICONIP 2003, June 2003, Istanbul, Turkey.

2 Outline  Abstract  Background: Classifiers, Naïve Bayesian Classifiers, Semi-Naïve Bayesian Classifiers, Chow-Liu Tree  Bounded Semi-Naïve Bayesian Classifiers  Mixture of Bounded Semi-Naïve Bayesian Classifiers  Experimental Results  Discussion  Conclusion

3 Abstract  Propose a technique for constructing semi-naïve Bayesian classifiers: the number of variables that can be combined into a node is bounded, and the computational cost is lower than that of traditional semi-naïve Bayesian networks. Experiments show the proposed technique is more accurate.  Upgrade the semi-naïve structure into a mixture structure: the expressive power is increased, and experiments show the mixture approach outperforms other types of classifiers.

4 A Typical Classification Problem  Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

5 Background  Probabilistic classifiers: the classification mapping function is defined as c* = argmax_{c_l} P(c_l | x) (the posterior probability) = argmax_{c_l} P(x, c_l) (the joint probability), since P(x) is a constant for a given x w.r.t. c_l.  The joint probability is not easily estimated from the dataset; usually an assumption about the distribution has to be made, e.g., dependent or independent?
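A minimal sketch of this mapping function, assuming discrete attributes and estimating the joint probability P(x, c) by simple frequency counting (the function and variable names are illustrative, not from the paper):

```python
from collections import Counter

def fit_joint(X, y):
    """Estimate the joint probability P(x, c) by frequency counting."""
    counts = Counter((tuple(x), c) for x, c in zip(X, y))
    n = len(X)
    return {key: m / n for key, m in counts.items()}

def classify(joint, x, classes):
    """c* = argmax_c P(c | x) = argmax_c P(x, c), since P(x) is constant for a given x."""
    return max(classes, key=lambda c: joint.get((tuple(x), c), 0.0))
```

This direct estimate illustrates exactly what the slide warns about: the table of P(x, c) is far too sparse to estimate reliably, which is why a distributional assumption (NB, SNB, CLT) is needed.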

6 Related Work (1)  Naïve Bayesian Classifiers (NB)  Assumption: given the class label C, the attributes are independent, i.e., P(x_1, ..., x_n | C) = P(x_1 | C) ... P(x_n | C).  Classification mapping function: c* = argmax_{c_l} P(c_l) P(x_1 | c_l) ... P(x_n | c_l).
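A sketch of an NB classifier under this assumption, for discrete attributes; Laplace smoothing is an assumed detail, since the slide does not specify the estimator:

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors P(c) and conditionals P(x_i | c) for discrete attributes,
    with Laplace smoothing controlled by alpha."""
    X, y = np.asarray(X), np.asarray(y)
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    cond = {}
    for c in classes:
        Xc = X[y == c]
        for i in range(X.shape[1]):
            vals = np.unique(X[:, i])  # all values observed for attribute i
            cond[(c, i)] = {v: (np.sum(Xc[:, i] == v) + alpha) / (len(Xc) + alpha * len(vals))
                            for v in vals}
    return classes, priors, cond

def nb_classify(x, classes, priors, cond):
    """c* = argmax_c P(c) * prod_i P(x_i | c), the NB mapping function on this slide."""
    def log_post(c):
        return np.log(priors[c]) + sum(np.log(cond[(c, i)][v]) for i, v in enumerate(x))
    return max(classes, key=log_post)
```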

7 Related Work  Naïve Bayesian Classifiers  NB's performance is comparable with some state-of-the-art classifiers, even though its independence assumption does not normally hold.  Question: can the performance be improved when the conditional independence assumption of NB is relaxed?

8 Related Work  Semi-Naïve Bayesian Classifiers (SNB)  A looser assumption than NB: given the class label C, independence holds among the jointed variables (groups of combined attributes) rather than among the individual attributes.
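A sketch of the corresponding factorization: the attributes are partitioned into groups (the jointed variables), and P(x | C) factorizes over the groups. The `partition` and `cond` structures are illustrative assumptions:

```python
import numpy as np

def snb_log_prob(x, c, partition, cond):
    """log P(x | c) under an SNB: independence holds between groups of attributes
    ("jointed variables"), not between individual attributes.
    partition: e.g. [(0, 1), (2,), (3, 4)] -- disjoint index groups covering all attributes.
    cond[(c, group)]: maps a tuple of values for that group to P(values | c)."""
    return sum(np.log(cond[(c, group)][tuple(x[i] for i in group)]) for group in partition)
```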

9 Related Work  Chow-Liu Tree (CLT)  Another looser assumption than NB: given the class variable C, a dependence tree exists among the variables (a tree dependence structure).
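A sketch of the Chow-Liu structure-learning step for the data of one class, assuming discrete attributes; networkx is an assumed dependency used here only for the maximum spanning tree:

```python
import numpy as np
import networkx as nx

def mutual_information(xi, xj):
    """Empirical mutual information I(X_i; X_j) between two discrete columns."""
    mi = 0.0
    for a in np.unique(xi):
        for b in np.unique(xj):
            p_ab = np.mean((xi == a) & (xj == b))
            if p_ab > 0:
                p_a, p_b = np.mean(xi == a), np.mean(xj == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(X):
    """Chow & Liu (1968): the maximum-weight spanning tree over pairwise mutual
    information gives the optimal tree-structured approximation of the joint."""
    X = np.asarray(X)
    n_vars = X.shape[1]
    G = nx.Graph()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            G.add_edge(i, j, weight=mutual_information(X[:, i], X[:, j]))
    return nx.maximum_spanning_tree(G)
```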

10 Summary of Related Work  CLT: a conditional tree dependency assumption among variables; Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.  SNB: a conditional independency assumption among jointed variables; traditional SNBs are not as well developed as CLT.

11 Problems of Traditional SNBs  Kononenko91 and Pazzani96, both local heuristics, are judged on three questions: Efficient? Accurate? Strong assumption?  Efficient? No: Kononenko91 is inefficient even when jointing 3 variables, and Pazzani96 has an exponential time cost.  Strong assumption? Yes: the semi-dependence assumption does not hold in real cases either.

12 Our Solution  Bounded Semi-Naïve Bayesian Network (B-SNB)  Accurate? We use a global combinatorial optimization method.  Efficient? We find the network based on linear programming, which can be solved in polynomial time.  Mixture of B-SNB (MBSNB)  Strong assumption? The mixture structure is a superclass of B-SNB.

13 Our Solution  (comparison table; annotation: "Improved significantly")

14 Bounded Semi-Naïve Bayesian Network: Model Definition  Jointed variables  Completely covering the variable set without overlapping  Conditional independency among the jointed variables, given the class  Bounded cardinality of each jointed variable

15 Constraining the Search Space  Large search space  Reduced by adding the following constraint: the cardinality of each jointed variable is exactly equal to K.  Hidden principle: when K is small, keeping a jointed variable of cardinality K is more accurate than splitting it into several smaller jointed variables.  Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d).  Search space after reduction: the partitions of the variables into K-cardinality jointed variables.
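A small numeric illustration of the hidden principle; the synthetic data and all names here are ours, not from the paper. When (a,b) and (c,d) are each internally dependent, the 2-cardinality factorization P(a,b)P(c,d) has a much smaller KL divergence from the empirical joint than P(a,b)P(c)P(d), which splits c and d apart:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
# Synthetic binary data with dependence inside the pairs (a, b) and (c, d).
a = rng.integers(0, 2, 5000)
b = (a ^ (rng.random(5000) < 0.1)).astype(int)   # b mostly copies a
c = rng.integers(0, 2, 5000)
d = (c ^ (rng.random(5000) < 0.1)).astype(int)   # d mostly copies c
data = np.column_stack([a, b, c, d])

def prob(cols, values):
    """Empirical probability that the listed columns take the listed values."""
    return np.all(data[:, cols] == values, axis=1).mean()

def kl_to_factorization(groups):
    """KL( P(a,b,c,d) || product of group marginals ), summed over all 2^4 configurations."""
    kl = 0.0
    for x in product([0, 1], repeat=4):
        p = prob([0, 1, 2, 3], list(x))
        if p > 0:
            q = np.prod([prob(list(g), [x[i] for i in g]) for g in groups])
            kl += p * np.log(p / q)
    return kl

print(kl_to_factorization([(0, 1), (2, 3)]))      # P(a,b)P(c,d): close to 0
print(kl_to_factorization([(0, 1), (2,), (3,)]))  # P(a,b)P(c)P(d): noticeably larger
```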

16 Searching for the K-Bounded-SNB Model  How to search for the appropriate model? Find the m = [n/K] K-cardinality subsets (jointed variables) from the variable (feature) set that satisfy the SNB conditions and maximize the log-likelihood. ([x] means rounding x to the nearest integer.)
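A sketch of the score being maximized: the log-likelihood of the data under one candidate partition, using empirical group marginals (maximizing it is equivalent to minimizing the summed entropies of the jointed variables). The function and argument names are illustrative:

```python
import numpy as np
from collections import Counter

def partition_log_likelihood(X, partition):
    """Log-likelihood of the data under one candidate partition of the attributes into
    disjoint K-cardinality groups ("jointed variables"), with empirical group marginals.
    The K-Bounded-SNB search maximizes this score over partitions."""
    X = np.asarray(X)
    n = len(X)
    ll = 0.0
    for group in partition:
        counts = Counter(tuple(row) for row in X[:, list(group)])
        ll += sum(k * np.log(k / n) for k in counts.values())
    return ll
```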

17 Global Optimization Procedure  No overlap among jointed variables.  All the jointed variables together form the variable set.  Relax the previous constraints to 0 ≤ x ≤ 1: the integer programming (IP) problem becomes a linear programming (LP) problem.  Rounding scheme: round the LP solution into an IP solution.
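A sketch of this procedure using scipy.optimize.linprog, assuming n is divisible by K and using a simple greedy rounding; the paper's exact formulation and rounding scheme may differ. `weight(subset)` is an assumed callable returning that subset's log-likelihood contribution (e.g. from `partition_log_likelihood` above, restricted to one group):

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def bsnb_lp(n_vars, K, weight):
    """LP relaxation of the B-SNB subset-selection IP (a sketch).
    Each candidate K-subset j gets an indicator x_j; every variable must be covered
    exactly once, and x_j is relaxed from {0, 1} to [0, 1]."""
    subsets = list(combinations(range(n_vars), K))
    w = np.array([weight(s) for s in subsets])
    # Coverage constraints: for each variable i, sum over subsets containing i of x_j = 1.
    A_eq = np.array([[1.0 if i in s else 0.0 for s in subsets] for i in range(n_vars)])
    b_eq = np.ones(n_vars)
    res = linprog(c=-w, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))  # linprog minimizes, so negate w
    # Simple rounding scheme: take subsets by descending LP value, skipping overlaps.
    chosen, covered = [], set()
    for j in np.argsort(-res.x):
        if not covered & set(subsets[j]):
            chosen.append(subsets[j])
            covered |= set(subsets[j])
        if len(covered) == n_vars:
            break
    return chosen
```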

18 Mixture Upgrading (using EM)  E step  M step: update S_k by the B-SNB method
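A high-level EM sketch for the mixture, where `fit_bsnb(X, weights)` and `log_prob(model, x)` are assumed stand-ins for the B-SNB fitting and scoring routines; the paper's exact E/M updates may differ:

```python
import numpy as np

def em_mixture(X, n_components, fit_bsnb, log_prob, n_iter=20, seed=0):
    """EM for a mixture of B-SNB components (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    resp = rng.dirichlet(np.ones(n_components), size=n)  # random soft assignments to start
    for _ in range(n_iter):
        # M step: mixing weights, and one B-SNB per component fitted on weighted data.
        mix = resp.mean(axis=0)
        models = [fit_bsnb(X, resp[:, k]) for k in range(n_components)]
        # E step: responsibilities proportional to mixing weight times component likelihood.
        log_r = np.array([[np.log(mix[k]) + log_prob(models[k], x)
                           for k in range(n_components)] for x in X])
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
    return mix, models
```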

19 Experimental Setup  Datasets: 6 benchmark datasets from the UCI machine learning repository, plus 1 synthetically generated dataset named "XOR".  Experimental environment: platform Windows 2000; developing tool Matlab 6.1.

20 Experimental Results  Overall prediction rate (%).  We set the bound parameter K to 2 and 3; 2-BSNB means the BSNB model with the bound parameter set to 2.

21 NB vs. MBSNB

22 BSNB vs. MBSNB

23 CLT vs. MBSNB

24 C4.5 vs. MBSNB

25 Average Error Rate Chart

26 Observations  B-SNBs with large K are not good for sparse datasets. Post dataset: 90 samples; with K=3 the accuracy decreases.  Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; with K=3 the accuracy increases.

27 Discussion  When n cannot be divided by K exactly, i.e., (n mod K) = l with l ≠ 0, the assumption that all jointed variables have the same cardinality K is violated. Solution: find an l-cardinality jointed variable with the minimum entropy, then do the optimization on the other n-l variables, since ((n-l) mod K) = 0.  How to choose K? When the number of samples in the dataset is small, a large K may not give good performance. A good K should be related to the nature of the dataset; a natural way is to use cross-validation to find the optimal K.
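A sketch of the cross-validation selection suggested here, using scikit-learn's StratifiedKFold (an assumed dependency); `train_and_score` is a hypothetical helper that fits a K-bounded SNB on the training fold and returns its accuracy on the held-out fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def choose_K(X, y, candidate_Ks, train_and_score, n_splits=5):
    """Pick the bound K by cross-validated accuracy. X, y are numpy arrays."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for K in candidate_Ks:
        accs = [train_and_score(K, X[tr], y[tr], X[te], y[te])
                for tr, te in skf.split(X, y)]
        scores[K] = float(np.mean(accs))
    best_K = max(scores, key=scores.get)
    return best_K, scores
```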

28 Conclusion  A novel Bounded Semi-Naïve Bayesian classifier is proposed. The direct combinatorial optimization method gives B-SNB a global optimization, and the transformation from an IP into an LP problem reduces the computational complexity to polynomial.  A mixture of B-SNB is developed. It expands the expressive power of B-SNB, and experimental results show the mixture approach outperforms other types of classifiers.

29 Main References  Chow, C. K. and Liu, C. N. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462-467, 1968.  Kononenko, I. Semi-naive Bayesian classifier. In Proceedings of the Sixth European Working Session on Learning, pages 206-219. Springer-Verlag, 1991.  Pazzani, M. J. Searching for dependencies in Bayesian classifiers. In D. Fisher and H.-J. Lenz, editors, Learning from Data: Artificial Intelligence and Statistics V, pages 239-248. New York, NY: Springer-Verlag, 1996.  Srebro, N. Maximum likelihood bounded tree-width Markov networks. Master's thesis, MIT, 2001.  Murphy, P. M. UCI repository of machine learning databases. ftp.ics.uci.edu: pub/machine-learning-databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.  Thanks!

30 Thank you!

