Kaizhu Huang. Classifier Based on a Mixture of Density Trees. CSE Department, The Chinese University of Hong Kong.

Presentation transcript:

Classifier Based on a Mixture of Density Trees. Kaizhu Huang, CSE Department, The Chinese University of Hong Kong

Basic Problem
Given a dataset {(X_1, C), (X_2, C), …, (X_N, C)}, where X_i denotes a training instance and C its class label, and assuming there are m classes, we estimate the posterior probability P(C_i | X), i = 1, 2, …, m. (1)
The classifier then assigns the class that maximizes this posterior: C(X) = argmax_i P(C_i | X). The key point is how we can estimate the posterior probability (1).
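To make the decision rule concrete, here is a minimal sketch (not the author's code; the function and argument names are assumptions), showing that the shared normalizer P(X) can be dropped when taking the argmax over classes:

import numpy as np

def bayes_classify(x, class_priors, class_conditional_densities):
    # P(C_i | x) is proportional to P(x | C_i) * P(C_i); the common
    # factor 1 / P(x) does not affect which class attains the maximum.
    scores = [prior * density(x)
              for prior, density in zip(class_priors, class_conditional_densities)]
    return int(np.argmax(scores))

Each class-conditional density here would come from one of the estimators discussed in the following slides (NB, CL tree, or MT).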

Density Estimation Problem
Given a dataset D = {X_1, X_2, …, X_N}, where each X_i is an instance of an m-variable vector (v_1, v_2, …, v_m), the goal is to find a joint distribution P(v_1, v_2, …, v_m) that maximizes the data log-likelihood Σ_{i=1}^{N} log P(X_i), i.e., the negative empirical cross-entropy between the data and P.

Interpretation
In information-theoretic terms, -Σ_{i=1}^{N} log_2 P(X_i) measures how many bits are needed to describe D under the probability distribution P. Finding the P that maximizes the likelihood is therefore equivalent to minimizing the description length, which is the well-known MDL (Minimum Description Length) principle.
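As a small illustration of this code-length reading (an assumed helper, not from the slides), the description length of D under P is simply the negative base-2 log-likelihood:

import numpy as np

def description_length_bits(data, prob):
    # Bits needed to encode the dataset under distribution P:
    # -sum_i log2 P(x_i). Maximizing the likelihood minimizes this length.
    return -sum(np.log2(prob(x)) for x in data)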

Graphical Density Estimation
Naive Bayesian network (NB)
Tree-Augmented Naive Bayesian network (TANB)
Chow-Liu tree network (CL)
Mixture of trees network (MT)

[Diagram: relationships among the NB, TANB, Chow-Liu, and mixture-of-trees models, with EM used to fit the mixture of trees]

Naive Bayesian Network
For the problem in Slide 3, we make the following assumption about the variables: all the variables are independent given the class label. The joint distribution P can then be written as:
P(v_1, v_2, …, v_m | C) = ∏_{i=1}^{m} P(v_i | C)

Structure of NB
1. With this structure, it is easy to estimate the joint distribution, since each conditional P(v_i | C) can be obtained by a simple counting (accumulation) pass over the training data: P(v_i = a | C = c) = N(v_i = a, C = c) / N(C = c).
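A minimal sketch of this accumulation step, assuming features are integer-coded in {0, …, K_i - 1}; the Laplace smoothing and the function names are additions, not from the slides:

import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    # X: (N, m) integer-coded features, y: (N,) integer class labels.
    # Returns priors[c] = P(C=c) and tables[c][i][a] = P(v_i = a | C = c).
    N, m = X.shape
    n_vals = [X[:, i].max() + 1 for i in range(m)]   # cardinality of each feature
    priors, tables = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / N
        tables[c] = []
        for i in range(m):
            counts = np.bincount(Xc[:, i], minlength=n_vals[i]).astype(float)
            tables[c].append((counts + alpha) / (counts.sum() + alpha * n_vals[i]))
    return priors, tables

def nb_log_joint(x, c, priors, tables):
    # log P(C=c) + sum_i log P(v_i = x_i | C=c) under the NB factorization.
    return np.log(priors[c]) + sum(np.log(tables[c][i][x[i]]) for i in range(len(x)))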

Chow-Liu Tree Network
For the problem in Slide 3, we make the following assumption about the variables: each variable depends directly on just one other variable (its parent in a tree) and, given the class label, is conditionally independent of the remaining variables. Thus the joint distribution can be written as:
P(v_1, v_2, …, v_m) = ∏_{i=1}^{m} P(v_i | v_pa(i)), where pa(i) is the single parent of v_i in the tree (for the root, the marginal P(v_root) is used).
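Under this factorization, evaluating log P(x) needs only the root marginal and the parent-child conditional tables. A small sketch; the data layout and names are assumptions, not specified on the slides:

import numpy as np

def tree_log_prob(x, root, parents, root_table, cond_tables):
    # parents[i] is the parent index of variable i (None for the root);
    # cond_tables[i][a, b] = P(v_i = b | v_parent(i) = a).
    logp = np.log(root_table[x[root]])
    for i, pa in enumerate(parents):
        if pa is not None:
            logp += np.log(cond_tables[i][x[pa], x[i]])
    return logp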

Example of the CL Method
Fig. 2 is an example of a CL tree in which:
1. v3 is directly dependent only on v4 and conditionally independent of the other variables: P(v3 | v4, B) = P(v3 | v4)
2. v5 is directly dependent only on v4 and conditionally independent of the other variables: P(v5 | v4, B) = P(v5 | v4)
3. v2 is directly dependent only on v3 and conditionally independent of the other variables: P(v2 | v3, B) = P(v2 | v3)
4. v1 is directly dependent only on v3 and conditionally independent of the other variables: P(v1 | v3, B) = P(v1 | v3)
Here B stands for the remaining variables.

[Fig. 2: the example CL tree structure over v1–v5 referenced above]

CL Tree
The key point of the CL tree is that we approximate a high-dimensional joint distribution by a product of two-dimensional (pairwise) distributions. The question is then how to find the product of pairwise distributions that approximates the high-dimensional distribution optimally.

CL Tree Algorithm
1. Obtain P(v_i | v_j) and P(v_i, v_j) for each pair (v_i, v_j) by a counting (accumulation) process.
2. Compute the mutual information I(v_i; v_j) = Σ_{v_i, v_j} P(v_i, v_j) log [ P(v_i, v_j) / (P(v_i) P(v_j)) ].
3. Run a maximum spanning tree algorithm, with the edge weight between nodes v_i and v_j set to I(v_i; v_j), to find the optimal tree structure.
This CL algorithm was proved to be optimal in [1].
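A compact sketch of these three steps for integer-coded data (illustrative only; Prim's algorithm is used here for the maximum spanning tree, and the function name is an assumption):

import numpy as np

def chow_liu_edges(X):
    # Step 1: pairwise joints by counting; Step 2: mutual information;
    # Step 3: maximum spanning tree with I(v_i; v_j) as edge weights.
    N, m = X.shape
    n_vals = [X[:, i].max() + 1 for i in range(m)]
    mi = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            joint = np.zeros((n_vals[i], n_vals[j]))
            for a, b in zip(X[:, i], X[:, j]):
                joint[a, b] += 1.0
            joint /= N
            outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
            nz = joint > 0
            mi[i, j] = mi[j, i] = np.sum(joint[nz] * np.log(joint[nz] / outer[nz]))
    # Prim's algorithm: greedily add the highest-MI edge crossing the cut.
    in_tree, edges = {0}, []
    while len(in_tree) < m:
        best = (-np.inf, None, None)
        for u in in_tree:
            for v in range(m):
                if v not in in_tree and mi[u, v] > best[0]:
                    best = (mi[u, v], u, v)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges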

Mixture of Trees (MT)
A mixture-of-trees model is defined to be a distribution of the form
Q(x) = Σ_{k=1}^{K} λ_k T_k(x), with λ_k ≥ 0 and Σ_{k=1}^{K} λ_k = 1,
where each T_k is a tree distribution. An MT can be viewed as containing an unobserved choice variable z, which takes values k ∈ {1, …, K}.

Difference between MT and CL
The choice variable z can be any discrete variable; in particular, when the (otherwise unobserved) choice variable z is identified with the class label, the MT turns into the multi-CL tree.
CL is a supervised learning algorithm: one tree has to be trained for each class.
MT is an unsupervised learning algorithm: it treats the class variable simply as part of the training data.

Optimization Problem of MT
Given a dataset of observations D = {x_1, …, x_N}, we are required to find the mixture of trees Q that satisfies
Q* = argmax_Q Σ_{i=1}^{N} log Q(x_i).
This optimization problem for a mixture model can be solved by the EM (Expectation-Maximization) method.
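The objective can be written directly as a score function (a sketch; tree_prob(t, x) is an assumed helper returning T_k(x)):

import numpy as np

def mt_log_likelihood(X, lam, trees, tree_prob):
    # sum_i log Q(x_i) with Q(x) = sum_k lambda_k * T_k(x).
    return sum(np.log(sum(l * tree_prob(t, x) for l, t in zip(lam, trees))) for x in X)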

In the EM derivation, the quantity maximized at each iteration is the expected complete log-likelihood (equation (7) on the original slide):
Σ_{i=1}^{N} Σ_{k=1}^{K} γ_k(x_i) log λ_k + Σ_{i=1}^{N} Σ_{k=1}^{K} γ_k(x_i) log T_k(x_i),
where γ_k(x_i) = λ_k T_k(x_i) / Σ_l λ_l T_l(x_i) is the posterior responsibility of tree k for instance x_i (the E-step).

We maximize (7) with respect to λ_k and T_k under the constraint Σ_{k=1}^{K} λ_k = 1. From the first term we obtain the update equation
λ_k = (1/N) Σ_{i=1}^{N} γ_k(x_i).
As for the second term of (7), maximizing it is in fact a CL procedure, so we can maximize it by finding a CL tree based on the data reweighted by γ_k(x_i).
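One possible shape of the resulting EM loop (illustrative only; fit_weighted_tree and tree_log_prob_all are assumed helpers that fit a CL tree to responsibility-weighted data and evaluate log T_k for all instances):

import numpy as np

def em_mixture_of_trees(X, K, n_iter, fit_weighted_tree, tree_log_prob_all, seed=0):
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    lam = np.full(K, 1.0 / K)                      # mixture weights lambda_k
    gamma = rng.dirichlet(np.ones(K), size=N)      # random initial responsibilities
    trees = [fit_weighted_tree(X, gamma[:, k]) for k in range(K)]
    for _ in range(n_iter):
        # E-step: gamma_ik proportional to lambda_k * T_k(x_i).
        log_p = np.stack([tree_log_prob_all(t, X) for t in trees], axis=1) + np.log(lam)
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: lambda_k = (1/N) sum_i gamma_k(x_i); refit each tree on
        # the responsibility-weighted data (the CL procedure from the slide).
        lam = gamma.mean(axis=0)
        trees = [fit_weighted_tree(X, gamma[:, k]) for k in range(K)]
    return lam, trees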

MT in Classifiers
1. Training phase: train the MT model on the joint domain {C} × V, where C is the class label and V is the input domain.
2. Testing phase: a new instance x ∈ V is classified by picking the most likely value of the class variable given the settings of the other variables:
c(x) = argmax_c Q(C = c | V = x).
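A sketch of the testing step (mt_log_joint(c, x) is a hypothetical helper returning log Q(C = c, V = x)); since Q(C = c | x) = Q(c, x) / Q(x) and Q(x) does not depend on c, the argmax over the joint equals the argmax over the conditional:

import numpy as np

def classify_with_mt(x, class_values, mt_log_joint):
    # Pick the class value c that maximizes Q(c, x) under the trained MT.
    scores = [mt_log_joint(c, x) for c in class_values]
    return class_values[int(np.argmax(scores))]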

Multi-CL in Handwritten Digit Recognition
1. Feature extraction: the four basic configurations above are rotated in the four cardinal directions and applied to the characters in the six overlapped zones shown in the following figure.

Multi-CL in Handwritten Digit Recognition
So we have 4 × 4 × 6 = 96-dimensional features.

Multi-CL in Handwritten Digit Recognition
1. For a given pattern, we calculate the probability that the pattern belongs to each class (for digits we have 10 classes: 0, 1, 2, …, 9).
2. We choose the class with the maximum probability as the classification result. Here the probability that the pattern "2" belongs to class 2 is the maximum, so we classify it as digit 2.

Discussion
1. When all of the component trees have the same structure, the MT becomes a TANB model.
2. When z is the class label, the MT becomes a CL model.
3. The MT is therefore a generalization of the CL and naive Bayesian models, so its performance is expected to be better than NB, TANB, and CL.