Knowledge Mining and Soil Mapping using Maximum Likelihood Classifier with Gaussian Mixture Models
ECE539 final project, Instructor: Yu Hen Hu, Fall 2005
Jian Liu, 12/13/2005

Overview
This study deals with data mining from soil survey maps and with soil mapping using the mined soil-landscape knowledge.

Soil-landscape models
Soil is a product of the interaction of its surrounding environment, the "soil-landscape model" (Hudson, 1992).
Hence, soil can be predicted given the environmental conditions.

Environmental variables
Environmental factors affecting soil formation (the terrain derivatives are computed from the DEM, as sketched below):
- Bedrock geology
- Elevation (DEM)
- Slope gradient: 1st derivative of elevation along the steepest slope
- Profile curvature: 2nd derivative along the steepest slope
- Planform curvature: 2nd derivative along the contour direction (perpendicular to the steepest slope)
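For concreteness, here is a minimal numpy sketch of how these terrain derivatives can be computed from a regular-grid DEM. This is an illustration, not the project's code; sign and normalization conventions for the curvatures vary between GIS packages.

import numpy as np

def terrain_derivatives(dem, cell=1.0):
    """Slope gradient, profile and planform curvature from a gridded DEM (illustrative)."""
    q, p = np.gradient(dem, cell)      # q = dz/dy (rows), p = dz/dx (cols)
    s1, r = np.gradient(p, cell)       # r = d2z/dx2, s1 = d2z/dxdy
    t, s2 = np.gradient(q, cell)       # t = d2z/dy2, s2 = d2z/dydx
    s = 0.5 * (s1 + s2)                # symmetrize the mixed derivative

    g2 = p**2 + q**2 + 1e-12           # squared gradient magnitude (guard against /0)
    slope = np.sqrt(p**2 + q**2)                    # 1st derivative along steepest slope
    profile = (r*p**2 + 2*s*p*q + t*q**2) / g2      # 2nd derivative along steepest slope
    planform = (r*q**2 - 2*s*p*q + t*p**2) / g2     # 2nd derivative along contour direction
    return slope, profile, planform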

Previous Approaches & Problems
- Fuzzy system (Zhu 2001): elicits knowledge from a soil scientist and represents it with arbitrary curves; assumes independence of each environmental variable.
- ANN (Zhu 2000; Behrens 2005; Scull 2005): black-box knowledge representation; the high-dimensional weight matrix is hard to comprehend.
- Decision trees (Bui 1999; Qi et al. 2003): the extracted knowledge is crisp (typical case only), with no information about gradation.

Proposal – Knowledge Representation
GMM representation is more suitable because:
- A probability representation captures the physical gradation of the phenomenon well.
- The interactions between the environmental variables are taken into account by the multivariate Gaussian distribution.
- A mixture model has great potential to capture the real distribution: physically, a soil type may have multiple instances.
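Concretely, each soil class c is modeled by a standard Gaussian mixture density over the feature vector x of environmental variables (written out here for reference; the component count M is the preset "# of mixtures"):

p(\mathbf{x} \mid c) = \sum_{k=1}^{M} w_{c,k}\, \mathcal{N}(\mathbf{x};\ \boldsymbol{\mu}_{c,k}, \boldsymbol{\Sigma}_{c,k}), \qquad \sum_{k=1}^{M} w_{c,k} = 1

The full covariance matrices \Sigma_{c,k} capture the interactions between the environmental variables, and the M components allow one soil type to have multiple instances.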

Proposal – Maximum Likelihood Classifier
Example: if p(A | Class 1) = 0.8 and p(A | Class 2) = 0.5, then A is classified into Class 1 by the maximum-likelihood rule.
This naturally evaluates the composite effect that the environmental variables have on the probability of soil formation.
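In symbols, with the class-conditional GMM densities above, the decision rule for a sample x is

\hat{c}(\mathbf{x}) = \arg\max_{c}\ p(\mathbf{x} \mid c)

which coincides with the Bayes (MAP) rule when the class priors are equal.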

Algorithm
Training procedure:
- Standardize the feature dimensions of the training set.
- For each geology group in the training data, and for each soil type in the geology group, fit a GMM using the EM algorithm (the number of mixtures is preset; k-means is used to initialize the cluster centers).
Testing procedure:
- Standardize the feature dimensions of the testing set.
- For each sample point, calculate the likelihood under the GMM of each class in the corresponding geology group.
- Classify the point to the class with the maximum likelihood.
A minimal sketch of this pipeline is given below.
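The following Python sketch shows the training/testing loop, assuming scikit-learn (GaussianMixture fits by EM, and its init_params='kmeans' matches the slide's k-means initialization). The data layout, and whether standardization is global or per geology group, are assumptions for illustration.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

def train(groups, n_mix=8):
    # groups: {geology_id: {soil_type: array of shape (n_samples, n_features)}}
    models, scalers = {}, {}
    for geo, soils in groups.items():
        # Standardize feature dimensions over all samples in this geology group
        scalers[geo] = StandardScaler().fit(np.vstack(list(soils.values())))
        models[geo] = {
            soil: GaussianMixture(n_components=n_mix, covariance_type='full',
                                  init_params='kmeans')   # k-means initializes centers
                  .fit(scalers[geo].transform(X))         # one GMM per soil type, fit by EM
            for soil, X in soils.items()
        }
    return models, scalers

def classify(x, geo, models, scalers):
    # Assign the point to the soil class with the maximum GMM log-likelihood.
    xs = scalers[geo].transform(x.reshape(1, -1))
    scores = {soil: gmm.score_samples(xs)[0] for soil, gmm in models[geo].items()}
    return max(scores, key=scores.get)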

Case Study
Inputs: elevation, slope gradient, profile curvature, planform curvature, geology, and the existing soil map, divided into a training set and a testing set.
[Maps of the input layers, the training/testing split, and the soil map were shown here.]

Evaluation of the GMM representation
The GMM representations capture the gradation of soil over the landscape well, which agrees with expert knowledge, e.g. Council at the footslope and Elbaville at the backslope.

Training accuracy & testing accuracy
Overall, about 80% classification accuracy against the testing data. Increasing the number of mixtures leads to higher classification accuracy, at the expense of storage and computational load that grow in proportion to the number of mixtures.

Classification accuracy (%):

                geology area 1       geology area 2
# of mixtures   training  testing   training  testing
      1          70.04     68.07     79.80     77.13
      2          76.66     74.50     78.99     76.84
      4          81.51     79.27     80.03     75.55
      8          83.17     80.12     84.07     79.23
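For reference, the standard parameter count (not from the slides) for an M-component, full-covariance GMM over d features is

\#\text{params} = M\left(1 + d + \frac{d(d+1)}{2}\right)

per class (mixture weights, means, and symmetric covariance matrices). Assuming d = 4 continuous terrain features (elevation, slope gradient, and the two curvatures), that is 15M parameters per soil type, doubling each time M doubles.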

Classification Accuracy vs. # of Mixtures
[Chart of classification accuracy as a function of the number of mixtures was shown here.]

Mapping accuracy based on field data
64 out of 83 field sample points are correctly classified (77%), higher than traditional manual soil survey (usually around 60%).
[Figure: classification result using 8 mixtures; the dark blue areas are not mapped.]

More comments
- Standardization of the feature dimensions is very effective: it improves mapping accuracy from 55% to 80% (the standardization is written out below).
- Preprocessing techniques such as data cleaning, which decision trees require, are not critical here: the ML classifier is not as sensitive to training errors, as long as there are not a huge number of them.
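The standardization is presumably the usual per-dimension z-score (the exact variant is not stated in the slides): for feature dimension j with training-set mean \mu_j and standard deviation \sigma_j,

x'_j = \frac{x_j - \mu_j}{\sigma_j}

applied with the training-set statistics to both the training and the testing data.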

Conclusion
- GMM is suitable for representing soil-landscape knowledge.
- The ML classifier with GMMs is promising for soil knowledge mining and soil mapping.

Future improvement?
- Reduce the storage and computational load so that a larger number of mixtures can be used to improve classification accuracy.
- Use diagonal covariance matrices in place of the full covariance matrices (after applying de-correlation to the features)? One way this could look is sketched below.
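A sketch of that last idea, assuming scikit-learn (not the project's implementation): PCA whitening de-correlates the features, so a diagonal-covariance GMM, which stores d variance terms per component instead of d(d+1)/2 covariance terms, becomes a reasonable approximation.

from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline

# De-correlate (and whiten) the features, then fit a lighter diagonal-covariance GMM.
decorrelated_gmm = Pipeline([
    ('pca', PCA(whiten=True)),                        # rotate onto uncorrelated axes
    ('gmm', GaussianMixture(n_components=8,
                            covariance_type='diag')), # diagonal covariance per component
])
# Usage (X is one soil type's standardized feature array):
# decorrelated_gmm.fit(X); log_lik = decorrelated_gmm.score_samples(X_new)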