1
Classifier based on mixture of density trees
Huang, Kaizhu
CSE Department, The Chinese University of Hong Kong
2
Basic problem
Given a dataset {(X_1, C), (X_2, C), …, (X_N, C)}, where X_i stands for a training instance and C for its class label, and assuming we have m classes, we estimate the probability P(C_i | X), i = 1, 2, …, m. (1)
The classifier is then c(X) = argmax_i P(C_i | X). The key point is: how can we estimate the posterior probability (1)?
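For concreteness, the posterior in (1) is normally obtained through Bayes' rule from class-conditional densities and class priors; this is standard background rather than a quote from the slides:

  P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{\sum_{j=1}^{m} P(X \mid C_j)\, P(C_j)},
  \qquad
  c(X) = \arg\max_{i} P(X \mid C_i)\, P(C_i)

This is why the remaining slides concentrate on estimating the (class-conditional) density well.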
3
Density estimation problem
Given a dataset D = {X_1, X_2, …, X_N}, where each X_i is an instance of an m-variable vector (v_1, v_2, …, v_m), the goal is to find a joint distribution P(v_1, v_2, …, v_m) that maximizes the negative entropy (equivalently, the log-likelihood of D under P, Σ_{i=1}^{N} log P(X_i)).
4
Interpretation
According to information theory, this quantity measures how many bits are needed to describe D based on the probability distribution P. Finding the P that maximizes it therefore amounts to finding the well-known MDL (minimum description length) model.
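A one-line restatement of that link, in my notation rather than the slide's: the description length of D under P is

  L(D \mid P) = -\sum_{i=1}^{N} \log_2 P(X_i)

so maximizing the data log-likelihood is the same as minimizing the number of bits needed to encode D, i.e. the MDL criterion (up to the cost of describing the model itself).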
5
Graphical density estimation
- Naive Bayesian network (NB)
- Tree-augmented naive Bayesian network (TANB)
- Chow-Liu tree network (CL)
- Mixture-of-trees network (MT)
6
(Overview: NB, TANB, the Chow-Liu tree, the mixture of trees, and EM.)
7
Naive Bayesian network
Given the problem on Slide 3, we make the following assumption about the variables: all the variables are independent given the class label. The joint distribution P can then be written as P(v_1, v_2, …, v_m | C) = P(v_1 | C) P(v_2 | C) … P(v_m | C).
8
Structure of NB
With this structure it is easy to estimate the joint distribution, since each factor P(v_i | C) can be obtained by simple counting (accumulation) over the training data, as sketched below.
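Here is a minimal illustration of this counting step for discrete features; the function and variable names are mine, not the author's:

from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate P(C) and P(v_i | C) by counting over the training data.
    X: list of discrete feature vectors (tuples of equal length m)
    y: list of class labels, same length as X
    """
    n = len(y)
    class_counts = Counter(y)
    priors = {c: class_counts[c] / n for c in class_counts}
    cond = defaultdict(Counter)            # (class, feature index) -> value counts
    for x, c in zip(X, y):
        for i, v in enumerate(x):
            cond[(c, i)][v] += 1
    cond_probs = {                         # normalize counts into P(v_i = value | C = c)
        key: {v: cnt / sum(counter.values()) for v, cnt in counter.items()}
        for key, counter in cond.items()
    }
    return priors, cond_probs

Classification then multiplies P(C) by the per-feature conditionals P(v_i | C) and takes the argmax over classes, which is exactly rule (1).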
9
Chow-Liu tree network
Given the problem on Slide 3, we make the following assumption about the variables: each variable depends directly on exactly one other variable (its parent in a tree) and, given that variable, is conditionally independent of the other variables. The joint distribution can then be written as P(v_1, …, v_m) = Π_i P(v_i | v_pa(i)), where v_pa(i) denotes the parent of v_i and the root variable, having no parent, contributes only its marginal.
10
Example of the CL method
Fig. 2 is an example of a CL tree, where:
1. v3 depends only on v4 and is conditionally independent of the other variables: P(v3 | v4, B) = P(v3 | v4)
2. v5 depends only on v4 and is conditionally independent of the other variables: P(v5 | v4, B) = P(v5 | v4)
3. v2 depends only on v3 and is conditionally independent of the other variables: P(v2 | v3, B) = P(v2 | v3)
4. v1 depends only on v3 and is conditionally independent of the other variables: P(v1 | v3, B) = P(v1 | v3)
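Written out, and assuming v4 is taken as the root of the tree in Fig. 2 (the figure itself is not reproduced in this text), the joint distribution of the example factorizes as

  P(v_1, v_2, v_3, v_4, v_5) = P(v_4)\, P(v_3 \mid v_4)\, P(v_5 \mid v_4)\, P(v_2 \mid v_3)\, P(v_1 \mid v_3)

which is exactly a product of pairwise (two-dimensional) factors.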
11
(Figure omitted: the example CL tree of Fig. 2.)
12
CL tree
The key point about the CL tree is that we use a product of two-dimensional (pairwise) distributions to approximate the high-dimensional distribution. The question is then: how can we find the product of pairwise distributions that approximates the high-dimensional distribution optimally?
13
CL tree algorithm
1. Obtain P(v_i | v_j) and P(v_i, v_j) for each pair (v_i, v_j) by an accumulation (counting) pass over the data.
2. Calculate the mutual information I(v_i; v_j) = Σ_{v_i, v_j} P(v_i, v_j) log [ P(v_i, v_j) / (P(v_i) P(v_j)) ].
3. Use a maximum spanning tree algorithm to find the optimal tree structure, where the weight of the edge between nodes v_i and v_j is I(v_i; v_j).
This CL algorithm was proved to be optimal in [1]. An illustration of the procedure follows.
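A compact sketch of these three steps for discrete data; the helper names and the use of Prim's algorithm for the maximum spanning tree are my own choices, not taken from the slides or from [1]:

import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two discrete columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_edges(data):
    """data: list of equal-length discrete feature vectors.
    Returns the edges of the maximum spanning tree under mutual-information weights."""
    m = len(data[0])
    cols = [[row[i] for row in data] for i in range(m)]
    weight = {}
    for i, j in combinations(range(m), 2):
        weight[(i, j)] = weight[(j, i)] = mutual_information(cols[i], cols[j])
    in_tree, edges = {0}, []                      # grow the tree from variable 0
    while len(in_tree) < m:
        i, j = max(((a, b) for a in in_tree for b in range(m) if b not in in_tree),
                   key=lambda e: weight[e])
        edges.append((i, j))                      # j's parent in the directed tree is i
        in_tree.add(j)
    return edges

Reading each returned edge (i, j) as parent-child gives the factors P(v_j | v_i) whose product approximates the joint distribution.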
14
Mixture of trees (MT)
A mixture-of-trees model is defined to be a distribution of the form Q(x) = Σ_{k=1}^{K} λ_k T_k(x), where each T_k is a tree distribution, λ_k ≥ 0, Σ_k λ_k = 1, and K is the number of component trees. An MT can be viewed as containing an unobserved choice variable z, which takes values k ∈ {1, …, K}.
15
(Figure omitted: graphical structure of the mixture-of-trees model with the choice variable z.)
16
Difference between MT and CL
z can be any integer variable; in particular, when the unobserved choice variable z is the class label, the MT turns into the multi-CL tree. CL is a supervised learning algorithm, which has to train one tree for each class. MT is an unsupervised learning algorithm, which treats the class variable as part of the training data.
17
Optimization problem of MT
Given a data set of observations, we are required to find the mixture of trees Q that maximizes the likelihood of the data (the objective is restated below). As in other mixture models, this optimization problem can be solved by the EM (Expectation-Maximization) method.
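In standard notation (mine; the slide's own equation is not reproduced in this text) the problem is

  Q^{*} = \arg\max_{\lambda, T} \sum_{i=1}^{N} \log \sum_{k=1}^{K} \lambda_k\, T_k(x_i),
  \qquad \lambda_k \ge 0,\ \ \sum_{k=1}^{K} \lambda_k = 1

i.e. maximum likelihood over both the mixture weights and the component tree distributions.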
18
(Slide content omitted: the EM objective for the tree mixture, Eq. (7), and its expansion.)
20
We maximize (7) with respect to λ_k and T_k under the constraint Σ_k λ_k = 1, which gives the update equation for λ_k. As for the second term of (7), it is in fact a CL procedure, so we can maximize it by finding a CL tree based on the data weighted by the posterior probabilities of component k. The standard updates are sketched below.
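In the usual mixture-of-trees notation (not a copy of the slide's equations), the updates are:

  E-step:\quad \gamma_k(x_i) = \frac{\lambda_k\, T_k(x_i)}{\sum_{l=1}^{K} \lambda_l\, T_l(x_i)}

  M-step:\quad \lambda_k \leftarrow \frac{1}{N} \sum_{i=1}^{N} \gamma_k(x_i),
  \qquad T_k \leftarrow \text{the CL tree fitted to the data with sample weights } \gamma_k(x_i)

Each M-step therefore reduces to one weighted run of the CL algorithm per component.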
21
MT in classifiers
1. Training phase: train the MT model on the training data domain {C} ∪ V, where C is the class label and V is the input domain.
2. Testing phase: a new instance x ∈ V is classified by picking the most likely value of the class variable given the settings of the other variables, as written below.
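In symbols, my reconstruction of the decision rule the slide leaves implicit:

  c(x) = \arg\max_{c} Q(C = c,\, x) = \arg\max_{c} \sum_{k=1}^{K} \lambda_k\, T_k(C = c,\, x)

i.e. the class value that, together with the observed inputs, has the highest probability under the learned mixture.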
22
Multi-CL in handwritten digit recognition
1. Feature extraction: the four basic configurations above are rotated in the four cardinal directions and applied to the characters in the six overlapping zones shown in the following figure.
23
Multi-CL in handwritten digit recognition
So we have 4 × 4 × 6 = 96 feature dimensions.
24
Multi-CL in handwritten digit recognition
1. For a given pattern, we calculate the probability that the pattern belongs to each class (for digits we have 10 classes: 0, 1, 2, …, 9).
2. We choose the class with the maximum probability as the classification result.
Here the probability that the pattern "2" belongs to class 2 is the largest, so it is classified as digit 2; the decision step is illustrated below.
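A minimal sketch of this step, assuming one fitted CL tree per digit class and a hypothetical log_likelihood(tree, x) helper (neither is code from the presentation):

import math

def classify(x, class_trees, class_priors, log_likelihood):
    """Pick the digit whose class-conditional tree explains x best.
    class_trees:  dict mapping digit -> fitted tree model (one per class)
    class_priors: dict mapping digit -> P(class)
    log_likelihood: assumed helper, (tree, x) -> log P(x | tree)
    """
    scores = {c: math.log(class_priors[c]) + log_likelihood(tree, x)
              for c, tree in class_trees.items()}
    return max(scores, key=scores.get)    # e.g. returns 2 when class 2 scores highest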
25
Discussion
1. When all of the component trees share the same structure, the MT becomes a TANB model.
2. When z is the class label, the MT becomes a (multi-)CL model.
3. The MT is thus a generalization of the CL and naive Bayesian models, so its performance is expected to be better than that of NB, TANB, and CL.