Combining Ensemble Technique of Support Vector Machines with the Optimal Kernel Method for High Dimensional Data Classification


Combining Ensemble Technique of Support Vector Machines with the Optimal Kernel Method for High Dimensional Data Classification
I-Ling Chen 1, Bor-Chen Kuo 1, Cheng-Hsuan Li 2, Chih-Cheng Hung 3
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.

Outline
– Introduction: statement of problems, the objective
– Literature Review
  – Support Vector Machines: kernel method
  – Multiple Classifier System: random subspace method, dynamic subspace method
– An Optimal Kernel Method for Selecting the RBF Kernel Parameter
– Optimal Kernel-based Dynamic Subspace Method
– Experimental Design and Results
– Conclusion and Future Work

INTRODUCTION

Hughes Phenomenon (Hughes, 1968), also called the curse of dimensionality or peaking phenomenon: when the sample size N is small and the dimensionality d is high, classification performance degrades.

Support Vector Machines (SVM)
Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998).
SVM is robust against the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006).
SVM includes the kernel trick and support vector learning.

The Goal of the Kernel Method for Classification
Samples in the same class should be mapped into the same area; samples in different classes should be mapped into different areas.

Support Vector Learning
SV learning tries to learn a linear separating hyperplane for a two-class classification problem via a given training set, after a nonlinear feature mapping induced by the kernel trick.
[Figure: illustration of SV learning with the kernel trick, showing the nonlinear feature mapping, the optimal hyperplane, the margins, and the support vectors.]
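As a concrete illustration (not from the original slides), here is a minimal sketch of training an RBF-kernel SVM with scikit-learn; the toy data and parameter values are placeholders.

```python
# Minimal sketch (assumed scikit-learn API): an RBF-kernel SVM on a
# two-class toy problem. Data and parameter values are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                        # 200 samples, 5 features
y = (np.linalg.norm(X, axis=1) > 2.0).astype(int)    # nonlinearly separable labels

sigma2 = 1.0                                         # placeholder kernel width sigma^2
clf = SVC(kernel="rbf", C=10.0, gamma=1.0 / (2.0 * sigma2))
clf.fit(X, y)
print("support vectors per class:", clf.n_support_)  # samples on/inside the margins
```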

Multiple Classifier System
There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets (Ho, 1998; Yang, Kuo, Yu, & Chuang, 2010).
[Figure: approaches to building classifier ensembles, from Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.]

THE FRAMEWORK OF RANDOM SUBSPACE METHOD (RSM) BASED ON SVM (HO, 1998)
Given the learning algorithm, SVM, and the ensemble size, S, each base classifier is trained on a randomly selected feature subset, as sketched below.
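A short sketch of this RSM procedure, assuming scikit-learn SVM base classifiers; the ensemble size S and subspace dimensionality w are placeholders, and class labels are assumed to be non-negative integers.

```python
# Sketch of RSM with SVM base classifiers: each of the S learners is
# trained on a random w-dimensional feature subset, and test-time
# predictions are fused by majority vote over integer class labels.
import numpy as np
from sklearn.svm import SVC

def rsm_svm_predict(X_train, y_train, X_test, S=10, w=5, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(S):
        subset = rng.choice(X_train.shape[1], size=w, replace=False)
        clf = SVC(kernel="rbf").fit(X_train[:, subset], y_train)
        votes.append(clf.predict(X_test[:, subset]))
    votes = np.asarray(votes)                        # shape (S, n_test)
    n_labels = votes.max() + 1
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_labels), 0, votes)
    return counts.argmax(axis=0)                     # majority-voted label per sample
```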

THE INADEQUACIES OF RSM
* Irregular rule: each individual feature potentially possesses different discriminating power for classification, but a randomized feature-selection strategy is unable to distinguish informative features from redundant ones.
* Implicit number: how to choose a suitable subspace dimensionality w for the SVM is left unspecified, and without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.

DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010)
Two importance distributions:
– Importance distribution of feature weight: the W distribution models the selection probability of each feature, derived either from the class separability of LDA for each feature or from the re-substitution accuracy for each feature.
– Importance distribution of subspace dimensionality: the R distribution automatically determines the suitable subspace size, initialized as R0 and updated by kernel smoothing.
[Figure: feature-density (%) plots of the W and R distributions.]
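To illustrate how the two distributions drive subspace sampling, here is a hedged sketch; the exact update rules are in Yang et al. (2010), and the distributions below are placeholders.

```python
# Sketch of DSM-style subspace sampling: draw the subspace size r from
# the R distribution, then draw r distinct features with probabilities
# given by the W distribution, so informative features appear more often.
import numpy as np

def sample_subspace(W, R, rng):
    d = len(W)                                        # number of features
    r = rng.choice(np.arange(1, d + 1), p=R)          # subspace size ~ R
    return rng.choice(d, size=r, replace=False, p=W)  # feature indices ~ W

# placeholder distributions over d = 6 features
W = np.array([0.3, 0.2, 0.2, 0.1, 0.1, 0.1])          # feature weights, sums to 1
R = np.full(6, 1 / 6)                                 # uniform over sizes 1..6
rng = np.random.default_rng(0)
print(sample_subspace(W, R, rng))
```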

THE FRAMEWORK OF DSM BASED ON SVM
Given the learning algorithm, SVM, and the ensemble size, S.

INADEQUACIES OF DSM
* Kernel function: the SVM algorithm provides an effective way to perform supervised classification, but the kernel function critically influences the performance of SVM.
* Time-consuming: choosing a proper kernel function, or a better kernel parameter, for SVM is quite important yet ordinarily time-consuming; in particular, the updated R distribution in DSM is obtained from the re-substitution accuracy.

An Optimal Kernel Method for Selecting the RBF Kernel Parameter
The performance of SVM depends on choosing a proper kernel function and proper parameters for it. Li, Lin, Kuo, and Chu (2010) presented a novel criterion for choosing a proper parameter σ of the RBF kernel function automatically.
Gaussian Radial Basis Function (RBF) kernel: k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)).
In the feature space determined by the RBF kernel, the norm of every sample is one and the kernel values are positive; hence the samples are mapped onto the surface of a hypersphere.
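The slide does not reproduce the exact criterion of Li et al. (2010); the sketch below uses one plausible surrogate, scoring each candidate σ by the gap between the mean within-class and mean between-class kernel values, and should not be read as their precise formula.

```python
# Hedged sketch of an automatic RBF-width search: keep the sigma that
# maximizes mean within-class kernel value minus mean between-class
# kernel value. A plausible surrogate, not the exact Li et al. criterion.
import numpy as np

def rbf_kernel(X, sigma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-sq / (2.0 * sigma ** 2))

def select_sigma(X, y, candidates):
    best, best_score = None, -np.inf
    for sigma in candidates:
        K = rbf_kernel(X, sigma)
        same = (y[:, None] == y[None, :])                  # same-class indicator
        score = K[same].mean() - K[~same].mean()           # within vs. between gap
        if score > best_score:
            best, best_score = sigma, score
    return best
```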

Kernel-based Dynamic Subspace Method (KDSM)

THE FRAMEWORK OF KDSM
[Flowchart with components: original dataset X; optimal RBF kernel algorithm; kernel space (L-dimensional); feature (band) separability; kernel-based W distribution; optimal RBF kernel algorithm + kernel smoothing; distribution M_dist; kernel-based feature selection; subspace pool (reduced dataset); multiple classifiers; decision fusion (majority voting); iterate until the performance of classification is stable.]
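Of the flowchart steps, the kernel-based W distribution is sketched below under an assumed form: each band is scored by the within-minus-between kernel separability computed on that band alone, and the scores are normalized into a probability distribution. The slide does not give the exact construction, so this is only a stand-in.

```python
# Hedged sketch of a kernel-based W distribution: per-band RBF kernel
# separability (mean within-class minus mean between-class kernel value),
# clipped to stay positive and normalized to sum to one.
import numpy as np

def kernel_feature_weights(X, y, sigma):
    same = (y[:, None] == y[None, :])                 # same-class indicator
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):                       # score one band at a time
        diff = X[:, j, None] - X[None, :, j]
        K = np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
        scores[j] = K[same].mean() - K[~same].mean()
    scores = np.clip(scores, 1e-12, None)             # keep weights positive
    return scores / scores.sum()                      # the W distribution
```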

Experiment Design
Algorithm – Description
SVM_CV – a single SVM without any dimension reduction, with the CV method
SVM_OP – a single SVM without any dimension reduction, with the OP method
DSM_W_ACC – DSM with the re-substitution accuracy as the feature weights
DSM_W_LDA – DSM with the separability of Fisher's LDA as the feature weights
KDSM – the kernel-based dynamic subspace method proposed in this research
(OP: the optimal method for choosing the kernel parameter; CV: 5-fold cross-validation.)
We use a grid search within the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper RBF kernel parameter 2σ², and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper penalty parameter for the slack variables that control the margins.
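A sketch of this search with scikit-learn's GridSearchCV. Note that SVC parameterizes the RBF kernel by gamma = 1/(2σ²), so the stated grid over 2σ² maps to a grid over gamma; the grid resolution and toy data are placeholders.

```python
# Sketch of the stated parameter search: 5-fold CV over 2*sigma^2 in
# [0.01, 10] and the SVM penalty C from the listed set.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

two_sigma2_grid = np.linspace(0.01, 10, 20)           # grid over 2*sigma^2
param_grid = {
    "gamma": 1.0 / two_sigma2_grid,                   # gamma = 1 / (2*sigma^2)
    "C": [0.1, 1, 10, 20, 60, 100, 160, 200, 1000],   # slack-variable penalty
}

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))                          # placeholder training data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```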

EXPERIMENTAL DATASET
Hyperspectral image data: Washington, DC Mall (number of bands d = 191), 7 classes.
Category (number of labeled samples): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840).
[Figure: IR image of the scene.]

Experimental Results
There are three cases in Washington, DC Mall (see the classification maps below): case 1: N_i = 20; case 2: N_i = 40; case 3: N_i = 300, where N_i is the number of training samples in class i and N is the number of all training samples.
[Table: accuracy (%) and CPU time (sec) of SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, and KDSM for each case; the numeric entries were not preserved in this transcript.]

Experiment Results in Washington, DC Mall
The outcome of classification using the various multiple classifier systems (accuracy and ratio per case; entries not preserved in this transcript are marked –):

Method       Case 1            Case 2            Case 3
             Accuracy  Ratio   Accuracy  Ratio   Accuracy  Ratio
DSM_W_ACC    85.49%    –       –         –       –         –
DSM_W_LDA    87.47%    –       –         –       –         –
KDSM         88.64%    1       92.53%    1       97.43%    1

Classification Maps with N_i = 20 in Washington, DC Mall
[Maps for SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, and KDSM. Legend: □ Background, ■ Water, ■ Tree, ■ Path, ■ Grass, ■ Roof, ■ Road, ■ Shadow.]

Classification Maps (roof) with N_i = 40
[Maps for SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, and KDSM. Legend: □ Background, ■ Water, ■ Tree, ■ Path, ■ Grass, ■ Roof, ■ Road, ■ Shadow.]

Classification Maps with N_i = 300 in Washington, DC Mall
[Maps for SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, and KDSM. Legend: □ Background, ■ Water, ■ Tree, ■ Path, ■ Grass, ■ Roof, ■ Road, ■ Shadow.]

Conclusions
In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF parameter and the dynamic subspace method within a subspace-selection-based MCS, to improve classification results on high dimensional datasets.
The experimental results showed that the classification accuracies of KDSM were invariably the best among all classifiers in each case of the Washington, DC Mall dataset.
Moreover, these results show that, compared with DSM, KDSM not only obtains more accurate classification but also economizes on computing time.

Thank You