1
Classifier based on mixture of density trees
Huang, Kaizhu
CSE Department, The Chinese University of Hong Kong
2
Basic problem
Given a dataset {(X_1, C), (X_2, C), …, (X_N, C)}, where X_i stands for a training instance and C for its class label, and assuming we have m classes, we estimate the probability P(C_i | X), i = 1, 2, …, m. (1)
The classifier is then c(X) = argmax_i P(C_i | X). The key point is: how can we estimate the posterior probability (1)?
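For concreteness, the posterior in (1) is normally obtained through Bayes' rule from class-conditional densities and class priors; this is standard background rather than a quote from the slides:

  P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{\sum_{j=1}^{m} P(X \mid C_j)\, P(C_j)},
  \qquad
  c(X) = \arg\max_{i} P(X \mid C_i)\, P(C_i)

This is why the remaining slides concentrate on estimating the (class-conditional) density well.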
3
Density estimation problem
Given a dataset D = {X_1, X_2, …, X_N}, where each X_i is an instance of an m-variable vector (v_1, v_2, …, v_m), the goal is to find a joint distribution P(v_1, v_2, …, v_m) that maximizes the negative entropy (equivalently, the log-likelihood of D under P, Σ_{i=1}^{N} log P(X_i)).
4
Interpretation
According to information theory, this quantity measures how many bits are needed to describe D based on the probability distribution P. Finding the P that maximizes it therefore amounts to finding the well-known MDL (minimum description length) model.
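A one-line restatement of that link, in my notation rather than the slide's: the description length of D under P is

  L(D \mid P) = -\sum_{i=1}^{N} \log_2 P(X_i)

so maximizing the data log-likelihood is the same as minimizing the number of bits needed to encode D, i.e. the MDL criterion (up to the cost of describing the model itself).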
5
Graphical density estimation
- Naive Bayesian network (NB)
- Tree-augmented naive Bayesian network (TANB)
- Chow-Liu tree network (CL)
- Mixture-of-trees network (MT)
6
(Overview: NB, TANB, the Chow-Liu tree, the mixture of trees, and EM.)
7
Naive Bayesian network
Given the problem on Slide 3, we make the following assumption about the variables: all the variables are independent given the class label. The joint distribution P can then be written as P(v_1, v_2, …, v_m | C) = P(v_1 | C) P(v_2 | C) … P(v_m | C).
8
Structure of NB
With this structure it is easy to estimate the joint distribution, since each factor P(v_i | C) can be obtained by simple counting (accumulation) over the training data, as sketched below.
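Here is a minimal illustration of this counting step for discrete features; the function and variable names are mine, not the author's:

from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate P(C) and P(v_i | C) by counting over the training data.
    X: list of discrete feature vectors (tuples of equal length m)
    y: list of class labels, same length as X
    """
    n = len(y)
    class_counts = Counter(y)
    priors = {c: class_counts[c] / n for c in class_counts}
    cond = defaultdict(Counter)            # (class, feature index) -> value counts
    for x, c in zip(X, y):
        for i, v in enumerate(x):
            cond[(c, i)][v] += 1
    cond_probs = {                         # normalize counts into P(v_i = value | C = c)
        key: {v: cnt / sum(counter.values()) for v, cnt in counter.items()}
        for key, counter in cond.items()
    }
    return priors, cond_probs

Classification then multiplies P(C) by the per-feature conditionals P(v_i | C) and takes the argmax over classes, which is exactly rule (1).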
9
Chow-Liu tree network
Given the problem on Slide 3, we make the following assumption about the variables: each variable depends directly on exactly one other variable (its parent in a tree) and, given that variable, is conditionally independent of the other variables. The joint distribution can then be written as P(v_1, …, v_m) = Π_i P(v_i | v_pa(i)), where v_pa(i) denotes the parent of v_i and the root variable, having no parent, contributes only its marginal.
10
Example of the CL method
Fig. 2 is an example of a CL tree, where:
1. v3 depends only on v4 and is conditionally independent of the other variables: P(v3 | v4, B) = P(v3 | v4)
2. v5 depends only on v4 and is conditionally independent of the other variables: P(v5 | v4, B) = P(v5 | v4)
3. v2 depends only on v3 and is conditionally independent of the other variables: P(v2 | v3, B) = P(v2 | v3)
4. v1 depends only on v3 and is conditionally independent of the other variables: P(v1 | v3, B) = P(v1 | v3)
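Written out, and assuming v4 is taken as the root of the tree in Fig. 2 (the figure itself is not reproduced in this text), the joint distribution of the example factorizes as

  P(v_1, v_2, v_3, v_4, v_5) = P(v_4)\, P(v_3 \mid v_4)\, P(v_5 \mid v_4)\, P(v_2 \mid v_3)\, P(v_1 \mid v_3)

which is exactly a product of pairwise (two-dimensional) factors.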
11
(Figure omitted: the example CL tree of Fig. 2.)
12
CL tree
The key point about the CL tree is that we use a product of two-dimensional (pairwise) distributions to approximate the high-dimensional distribution. The question is then: how can we find the product of pairwise distributions that approximates the high-dimensional distribution optimally?
13
CL tree algorithm
1. Obtain P(v_i | v_j) and P(v_i, v_j) for each pair (v_i, v_j) by an accumulation (counting) pass over the data.
2. Calculate the mutual information I(v_i; v_j) = Σ_{v_i, v_j} P(v_i, v_j) log [ P(v_i, v_j) / (P(v_i) P(v_j)) ].
3. Use a maximum spanning tree algorithm to find the optimal tree structure, where the weight of the edge between nodes v_i and v_j is I(v_i; v_j).
This CL algorithm was proved to be optimal in [1]. An illustration of the procedure follows.
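A compact sketch of these three steps for discrete data; the helper names and the use of Prim's algorithm for the maximum spanning tree are my own choices, not taken from the slides or from [1]:

import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two discrete columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_edges(data):
    """data: list of equal-length discrete feature vectors.
    Returns the edges of the maximum spanning tree under mutual-information weights."""
    m = len(data[0])
    cols = [[row[i] for row in data] for i in range(m)]
    weight = {}
    for i, j in combinations(range(m), 2):
        weight[(i, j)] = weight[(j, i)] = mutual_information(cols[i], cols[j])
    in_tree, edges = {0}, []                      # grow the tree from variable 0
    while len(in_tree) < m:
        i, j = max(((a, b) for a in in_tree for b in range(m) if b not in in_tree),
                   key=lambda e: weight[e])
        edges.append((i, j))                      # j's parent in the directed tree is i
        in_tree.add(j)
    return edges

Reading each returned edge (i, j) as parent-child gives the factors P(v_j | v_i) whose product approximates the joint distribution.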
14
Mixture of trees (MT)
A mixture-of-trees model is defined to be a distribution of the form Q(x) = Σ_{k=1}^{K} λ_k T_k(x), where each T_k is a tree distribution, λ_k ≥ 0, Σ_k λ_k = 1, and K is the number of component trees. An MT can be viewed as containing an unobserved choice variable z, which takes values k ∈ {1, …, K}.
15
(Figure omitted: graphical structure of the mixture-of-trees model with the choice variable z.)
16
Difference between MT and CL
z can be any integer variable; in particular, when the unobserved choice variable z is the class label, the MT turns into the multi-CL tree. CL is a supervised learning algorithm, which has to train one tree for each class. MT is an unsupervised learning algorithm, which treats the class variable as part of the training data.
17
Optimization problem of MT
Given a data set of observations, we are required to find the mixture of trees Q that maximizes the likelihood of the data (the objective is restated below). As in other mixture models, this optimization problem can be solved by the EM (Expectation-Maximization) method.
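In standard notation (mine; the slide's own equation is not reproduced in this text) the problem is

  Q^{*} = \arg\max_{\lambda, T} \sum_{i=1}^{N} \log \sum_{k=1}^{K} \lambda_k\, T_k(x_i),
  \qquad \lambda_k \ge 0,\ \ \sum_{k=1}^{K} \lambda_k = 1

i.e. maximum likelihood over both the mixture weights and the component tree distributions.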
18
(Slide content omitted: the EM objective for the tree mixture, Eq. (7), and its expansion.)
20
We maximize (7) with respect to λ_k and T_k under the constraint Σ_k λ_k = 1, which gives the update equation for λ_k. As for the second term of (7), it is in fact a CL procedure, so we can maximize it by finding a CL tree based on the data weighted by the posterior probabilities of component k. The standard updates are sketched below.
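In the usual mixture-of-trees notation (not a copy of the slide's equations), the updates are:

  E-step:\quad \gamma_k(x_i) = \frac{\lambda_k\, T_k(x_i)}{\sum_{l=1}^{K} \lambda_l\, T_l(x_i)}

  M-step:\quad \lambda_k \leftarrow \frac{1}{N} \sum_{i=1}^{N} \gamma_k(x_i),
  \qquad T_k \leftarrow \text{the CL tree fitted to the data with sample weights } \gamma_k(x_i)

Each M-step therefore reduces to one weighted run of the CL algorithm per component.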
21
MT in classifiers
1. Training phase: train the MT model on the training data domain {C} ∪ V, where C is the class label and V is the input domain.
2. Testing phase: a new instance x ∈ V is classified by picking the most likely value of the class variable given the settings of the other variables, as written below.
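In symbols, my reconstruction of the decision rule the slide leaves implicit:

  c(x) = \arg\max_{c} Q(C = c,\, x) = \arg\max_{c} \sum_{k=1}^{K} \lambda_k\, T_k(C = c,\, x)

i.e. the class value that, together with the observed inputs, has the highest probability under the learned mixture.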
22
Multi-CL in handwritten digit recognition
1. Feature extraction: the four basic configurations above are rotated in the four cardinal directions and applied to the characters in the six overlapping zones shown in the following figure.
23
Multi-CL in handwritten digit recognition
So we have 4 × 4 × 6 = 96 feature dimensions.
24
Multi-CL in handwritten digit recognition
1. For a given pattern, we calculate the probability that the pattern belongs to each class (for digits we have 10 classes: 0, 1, 2, …, 9).
2. We choose the class with the maximum probability as the classification result.
Here the probability that the pattern "2" belongs to class 2 is the largest, so it is classified as digit 2; the decision step is illustrated below.
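A minimal sketch of this step, assuming one fitted CL tree per digit class and a hypothetical log_likelihood(tree, x) helper (neither is code from the presentation):

import math

def classify(x, class_trees, class_priors, log_likelihood):
    """Pick the digit whose class-conditional tree explains x best.
    class_trees:  dict mapping digit -> fitted tree model (one per class)
    class_priors: dict mapping digit -> P(class)
    log_likelihood: assumed helper, (tree, x) -> log P(x | tree)
    """
    scores = {c: math.log(class_priors[c]) + log_likelihood(tree, x)
              for c, tree in class_trees.items()}
    return max(scores, key=scores.get)    # e.g. returns 2 when class 2 scores highest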
25
Discussion
1. When all of the component trees share the same structure, the MT becomes a TANB model.
2. When z is the class label, the MT becomes a (multi-)CL model.
3. The MT is thus a generalization of the CL and naive Bayesian models, so its performance is expected to be better than that of NB, TANB, and CL.