A New Fuzzy Stacked Generalization Technique and Analysis of its Performance
Mete Ozay, Fatos T. Yarman Vural. Presented by Tianxiao Jiang


Stacked Generalization http://www.scholarpedia.org/article/Ensemble_learning

Black art problem: Stacked Generalization (SG) usually boosts classification performance, but in some cases it performs worse than the individual classifiers. Selecting the parameters of SG and designing the classifiers and feature spaces have therefore been considered a black art.

Main idea: the Fuzzy Stacked Generalization (FSG) technique is introduced to resolve the black art problem by minimizing the classification error of the nearest neighbor rule.

Highlights of 'hierarchical distance learning':
- Minimize the difference between the N-sample error and the large-sample error, rather than just minimizing the large-sample error.
- Use the class posterior probabilities of the samples as feature vectors for the meta-classifier input, rather than a discriminative transformation.
- Different feature space mappings can be employed by different classifiers in the ensemble.

Asymptotic convergence of nearest neighbor (NN) classifier

Optimal distance measure for the nearest neighbor classifier: the misclassification probability for a two-class problem, where w1 and w0 denote the class labels, x0 is the new sample to be classified, and x_nn is its nearest neighbor.
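The equation on this slide was not transcribed; assuming it follows the standard two-class nearest neighbor analysis (Cover-Hart, Short-Fukunaga) and using the slide's symbols w1, w0, x0, and x_nn, it is presumably of the following form:

```latex
% Conditional probability of error of the NN rule given the test point x_0
% and its nearest neighbor x_nn (two-class case): an error occurs when the
% true class of x_0 and the class assigned by x_nn disagree.
\[
  P(\text{error} \mid x_0, x_{nn})
    = P(w_1 \mid x_0)\, P(w_0 \mid x_{nn})
    + P(w_0 \mid x_0)\, P(w_1 \mid x_{nn}).
\]
```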

Optimal distance measure for the nearest neighbor classifier: when N is large enough, the misclassification rate approaches its asymptotic value. Increasing the number of sample vectors decreases the difference between the N-sample and asymptotic errors at a fixed x0; however, the same effect may be realized by a proper choice of the distance function d(x, .), without the added cost of more samples. The quantity to minimize is therefore this error difference.

Optimal distance measure for the nearest neighbor classifier: using a Taylor series approximation in a local region around x0, the optimal distance function should select the nearest neighbor x_nn that minimizes this difference. If that distance function is chosen, the difference is guaranteed to be minimal within the local region of x0; it is the best local distance measure.
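A hedged reconstruction of the Taylor-expansion step and the resulting local distance measure, following the classical Short-Fukunaga analysis that this part of the derivation builds on (the exact notation in the original slides may differ):

```latex
% First-order Taylor expansion of the class posterior around x_0:
\[
  P(w_1 \mid x_{nn}) \approx P(w_1 \mid x_0)
    + \nabla P(w_1 \mid x_0)^{\top} (x_{nn} - x_0),
\]
% so the excess of the N-sample error over the asymptotic error at x_0 is
% proportional to |P(w_1 | x_nn) - P(w_1 | x_0)|, and the best local
% distance measure picks the neighbor minimizing this posterior difference:
\[
  d(x, x_0) = \bigl|\, \nabla P(w_1 \mid x_0)^{\top} (x - x_0) \,\bigr|.
\]
```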

N-sample and large-sample classification errors of NN: the classification error of the Bayes classifier is defined in terms of a loss function L, where s is a sample, x_j is a feature vector representation of s, and w_c is the class label; the expected error is obtained by averaging this quantity over the samples. (Presenter's note: why the min over c? It would be meaningless to take a summation over all possible c.)
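On the question about the min over c: with a zero-one loss, the Bayes rule picks the class with minimum conditional risk, so a min (not a sum) over c appears inside the expectation. A standard form, assuming zero-one loss (the paper's exact notation may differ):

```latex
% Conditional risk of deciding class w_c at feature vector x_j, with loss L:
\[
  R(w_c \mid x_j) = \sum_{c'} L(w_c, w_{c'})\, P(w_{c'} \mid x_j).
\]
% Bayes error: take the minimum conditional risk at each x_j and average over
% the data distribution; with zero-one loss this reduces to 1 - max_c P(w_c | x_j).
\[
  e^{*} = \mathbb{E}_{x_j}\Bigl[\, \min_{c}\, R(w_c \mid x_j) \Bigr]
        = \mathbb{E}_{x_j}\bigl[\, 1 - \max_{c}\, P(w_c \mid x_j) \bigr].
\]
```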

N-sample and large-sample classification errors of NN: the probability of error of the NN rule is then given, together with its asymptotic (large-sample) limit.

N-sample and large-sample classification errors of NN: the Bayes error, the asymptotic NN error, and the k-NN error are related. (Presenter's note: or should this be 2e(1-e)?) The main goal is to minimize the error difference.
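For reference, the classical Cover-Hart relations between these errors; the two-class asymptotic NN error is indeed of the 2e(1-e) form the presenter asks about. Here e(x) denotes the conditional Bayes error and E* the Bayes error, in the standard i.i.d. setting:

```latex
% Two-class asymptotic NN error, with e(x) = min(P(w_1 | x), P(w_0 | x)):
\[
  E_{\infty} = \mathbb{E}_{x}\bigl[\, 2\, e(x)\,(1 - e(x)) \bigr],
\]
% and for C classes the Cover-Hart bounds relate it to the Bayes error E*:
\[
  E^{*} \;\le\; E_{\infty} \;\le\; E^{*}\Bigl(2 - \tfrac{C}{C-1}\, E^{*}\Bigr).
\]
```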

N-sample and large-sample classification errors of NN: (presenter's note: the symbol should be E_c; there is a typo in the paper). By the Taylor approximation, either increasing N or designing a suitable distance function can decrease the error function; minimizing the error difference is equivalent to minimizing this error function.

A proper distance function

Averaged error: each feature dimension has its own classifier, and an averaged error over the ensemble of classifiers is defined; this averaged error is the quantity to minimize.
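The averaged-error equation was not transcribed; a plausible reconstruction, assuming J base-layer classifiers with individual N-sample errors e_j, is simply their mean:

```latex
% Ensemble-averaged error over the J base-layer classifiers;
% the objective is to minimize this averaged error.
\[
  \bar{e} = \frac{1}{J} \sum_{j=1}^{J} e_j, \qquad \min\ \bar{e}.
\]
```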

Hierarchical decision fusion algorithm:
1. Estimate posterior probabilities using an individual k-NN classifier for each feature space as the base layer.
2. Concatenate the posterior probability vectors from all base-layer classifiers (one per feature space) into a fusion vector.
3. Use a single k-NN classifier on the fusion vectors as the meta-layer classifier.
A minimal sketch of this pipeline is given below.
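This sketch only illustrates the structure of the pipeline; scikit-learn's plain KNeighborsClassifier stands in for the paper's fuzzy k-NN, its predict_proba output plays the role of the fuzzy class memberships, and the toy dataset and two-way feature split are purely illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset with 4 features, split into J = 2 feature spaces of 2 dims each.
X, y = make_classification(n_samples=400, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
feature_spaces = [(0, 2), (2, 4)]   # column ranges of the J feature spaces

# Base layer: one k-NN classifier per feature space, producing posterior vectors.
base = [KNeighborsClassifier(n_neighbors=5).fit(X_tr[:, lo:hi], y_tr)
        for lo, hi in feature_spaces]

def fusion_vectors(X_part):
    """Concatenate the C-dimensional posterior vectors of all J base classifiers
    into a single (C * J)-dimensional fusion vector per sample."""
    return np.hstack([clf.predict_proba(X_part[:, lo:hi])
                      for clf, (lo, hi) in zip(base, feature_spaces)])

# Meta layer: a single k-NN classifier trained on the fusion space.
# Note: in proper stacked generalization the meta-training posteriors are
# obtained by cross-validation on the training set; here the base classifiers
# are simply reused on the same split for brevity.
meta = KNeighborsClassifier(n_neighbors=5).fit(fusion_vectors(X_tr), y_tr)
print("meta-layer accuracy:", meta.score(fusion_vectors(X_te), y_te))
```

Whatever the dimensionality of the individual feature spaces, the meta-layer input produced by fusion_vectors always has C * J dimensions, which is the point made on the later "Benefit" slide.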

Fuzzy k-NN: classification algorithms that implement fuzzy rules usually outperform algorithms that implement crisp rules. However, the effect of employing fuzzy rules on the classification performance of SG has been left as an open problem.

Estimating the posterior probability in fuzzy k-NN: the class membership of a sample is computed from its k nearest neighbors, weighted by distance and controlled by a fuzzifier parameter.
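The membership estimate is presumably of the classical fuzzy k-NN form of Keller et al.; a hedged reconstruction, where m > 1 is the fuzzifier parameter referred to on the slide and mu_ck is the membership of the k-th neighbor in class c (0 or 1 for crisp labels):

```latex
% Fuzzy k-NN class membership of a query x, computed from its K nearest
% neighbors x_k with inverse-distance weights controlled by the fuzzifier m.
\[
  \mu_c(x) =
  \frac{\displaystyle \sum_{k=1}^{K} \mu_{ck}\,
        \lVert x - x_k \rVert^{-2/(m-1)}}
       {\displaystyle \sum_{k=1}^{K}
        \lVert x - x_k \rVert^{-2/(m-1)}}.
\]
```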

Benefit:
- Splitting the features and assigning an individual classifier to each feature space may increase the diversity of the classifiers.
- Concatenating the posteriors in the fusion space may improve overall performance at the cost of a higher dimensionality (CJ, the number of classes times the number of base-layer classifiers); if C is too large, this affects performance. There are many parameters and no generalized framework.
- FSG establishes a balance between the curse of dimensionality and information deficiency by using an individual feature space and classifier per descriptor, rather than concatenating all features and performing feature selection or normalization in a single classifier.
- No matter how high the dimensions of the base-layer feature spaces are, the fusion space dimension is always CJ. This can be a partial solution to the curse of dimensionality.

Performance on artificial datasets: the datasets are constructed in such a way that each sample is correctly recognized by at least one of the base-layer classifiers.

Performance on artificial dataset 90% of the samples are correctly classified by at least one of the base-layer classifiers

For instance, the sample marked with a red star, s1, is misclassified by the first classifier as shown in Fig. 2.a, but correctly classified by the second classifier as shown in Fig. 2.b. Similarly, the sample marked with a blue star, s2, is correctly classified by the first classifier as shown in Fig. 2.a, but misclassified by the second classifier as shown in Fig. 2.b.

The concatenation operation creates a 4-dimensional (2 classes × 2 classifiers) fusion space as the meta-layer input feature space, i.e. the fusion space.

Experiments on multi-attribute datasets: due to the curse of dimensionality, the performance gain of FSG over the aforementioned algorithms decreases as the number of attributes A increases.

Experiments on multi-feature datasets

The End