Full model selection with heuristic search: a first approach with PSO
Hugo Jair Escalante, Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica
CHALEARN's reading group on model selection, June 14, 2012

Design cycle of a classifier
Data → Preprocessing → Feature selection → Classification method → Model training → Model evaluation (error estimate), with the evaluation error fed back to revise earlier choices. [Slide figure: an instances × features data matrix feeding this pipeline.]

Full model selection
Given only the data, select the complete pipeline at once: preprocessing, feature selection, and classification methods, together with their hyperparameters. [Slide figure: a data set feeding a full-model-selection box.]

Wrappers for model selection
In most related work a single model type is considered and its hyperparameters are optimized via heuristic optimization:
– swarm optimization,
– genetic algorithms,
– pattern search,
– genetic programming,
– …
M. Momma, K. Bennett. A Pattern Search Method for Model Selection of Support Vector Regression. Proceedings of the SIAM Conference on Data Mining, pp. 261–274.
T. Howley, M. Madden. The Genetic Kernel Support Vector Machine: Description and Evaluation. Artificial Intelligence Review, Vol. 24(3–4): 379–395.
B. Zhang and H. Muhlenbein. Evolving optimal neural networks using genetic algorithms with Occam's razor. Complex Systems, Vol. 7 (1993).

Model type selection for regression
Genetic algorithms are used to select the model type (learning method, feature selection, preprocessing) and to optimize parameters for regression problems.
D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. Journal of Machine Learning Research, 10(Jul).

PSMS: PSO for full model selection
Particle swarm model selection: use particle swarm optimization to explore the search space of full models in a particular ML toolbox. Example candidate full models (a rough code analogue follows the list):
– Normalize + RBF-SVM (γ = 0.01)
– PCA + Neural Net (10 hidden units)
– Relief (feature selection) + Naïve Bayes
– Normalize + Poly-SVM (d = 2)
– Neural Net (3 hidden units)
– s2n (feature selection) + Kernel ridge
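PSMS itself was built on a Matlab toolbox; as a hedged illustration only, candidate full models like the ones above could be expressed as scikit-learn pipelines (method choices and parameter values here are illustrative, not the original implementation):

```python
# Illustrative sketch: candidate "full models" as scikit-learn pipelines.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

candidates = [
    # Normalize + RBF-SVM (gamma = 0.01)
    Pipeline([("pre", Normalizer()), ("clf", SVC(kernel="rbf", gamma=0.01))]),
    # PCA + neural net with 10 hidden units
    Pipeline([("pre", PCA(n_components=10)),
              ("clf", MLPClassifier(hidden_layer_sizes=(10,)))]),
    # Polynomial SVM of degree 2 on normalized data
    Pipeline([("pre", Normalizer()), ("clf", SVC(kernel="poly", degree=2))]),
    # Plain naive Bayes (feature selection omitted in this sketch)
    Pipeline([("clf", GaussianNB())]),
]
# Each candidate is one point in the full-model search space that PSO explores.
```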

Particle swarm optimization
1. Randomly initialize a population of particles (i.e., the swarm).
2. Repeat the following iterative process until a stop criterion is met:
   a) evaluate the fitness of each particle;
   b) find the personal best (p_i) and the global best (p_g);
   c) update the particles;
   d) update the best solution found (if needed).
3. Return the best particle (solution).
[Slide figure: fitness of the swarm at initialization and at iterations t = 1, 2, …, max_it.]
A minimal sketch of this loop follows.
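This is a minimal, self-contained PSO sketch on a toy objective; the hyperparameter values (w, c1, c2) and the objective function are illustrative assumptions, not the PSMS settings:

```python
import numpy as np

def pso(fitness, dim, n_particles=20, max_it=100, w=0.7, c1=1.5, c2=1.5):
    # 1. Randomly initialize positions and velocities of the swarm.
    X = np.random.uniform(-1, 1, (n_particles, dim))
    V = np.zeros((n_particles, dim))
    P = X.copy()                                   # personal bests p_i
    p_fit = np.array([fitness(x) for x in X])
    g = P[np.argmin(p_fit)].copy()                 # global best p_g (minimization)
    for t in range(max_it):                        # 2. iterate until stop criterion
        f = np.array([fitness(x) for x in X])      # a) evaluate fitness
        better = f < p_fit                         # b) update personal bests...
        P[better], p_fit[better] = X[better], f[better]
        g = P[np.argmin(p_fit)].copy()             #    ...and the global best
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)  # c) update velocities
        X = X + V                                  #    and positions
    return g                                       # 3. return the best solution

# Toy usage: minimize the sphere function in 5 dimensions.
best = pso(lambda x: float(np.sum(x ** 2)), dim=5)
```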

PSMS: PSO for full model selection
Codification of solutions as real-valued vectors encoding:
– the choice of methods for preprocessing, feature selection, and classification;
– the hyperparameters of the selected methods;
– whether preprocessing is applied before feature selection.
A sketch of one possible decoding follows.
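The following is a hedged sketch of how a real-valued particle position could be decoded into a full model; the vector layout, method lists, and hyperparameter mapping are illustrative assumptions, not the exact PSMS codification:

```python
import numpy as np

PREPROCESSORS = ["none", "normalize", "pca"]        # illustrative choices
FEATURE_SELECTORS = ["none", "relief", "s2n"]
CLASSIFIERS = ["rbf_svm", "poly_svm", "neural_net", "naive_bayes"]

def decode(x):
    """Map a real-valued position x to a full-model description."""
    # Discrete choices: clip, then truncate each entry to a valid list index.
    pre = PREPROCESSORS[int(np.clip(x[0], 0, len(PREPROCESSORS) - 1e-9))]
    fs  = FEATURE_SELECTORS[int(np.clip(x[1], 0, len(FEATURE_SELECTORS) - 1e-9))]
    clf = CLASSIFIERS[int(np.clip(x[2], 0, len(CLASSIFIERS) - 1e-9))]
    return {
        "pre_before_fs": x[3] > 0.5,           # preprocessing before feature selection?
        "preprocessing": pre,
        "feature_selection": fs,
        "classifier": clf,
        "gamma": 10 ** np.clip(x[4], -4, 1),   # example continuous hyperparameter
    }

print(decode(np.array([1.2, 0.3, 0.9, 0.7, -2.0])))
```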

Particle swarm optimization
Each individual (particle) i has:
– a position in the search space (X_i^t), which represents a solution to the problem at hand;
– a velocity vector (V_i^t), which determines how the particle explores the search space.
After random initialization, particles update their positions according to:

V_i^{t+1} = w V_i^t + c_1 r_1 (p_i − X_i^t) + c_2 r_2 (p_g − X_i^t)
X_i^{t+1} = X_i^t + V_i^{t+1}

where w is the inertia weight, the c_1 r_1 (p_i − X_i^t) term contributes local information (the particle's personal best p_i), the c_2 r_2 (p_g − X_i^t) term contributes global information (the swarm's best p_g), and r_1, r_2 are random numbers in [0, 1].

A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley, 2006.
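As a concrete one-step check of the update rule (all numbers are arbitrary illustrative values):

```python
# One update step of a single 1-D particle (arbitrary illustrative numbers).
w, c1, c2 = 0.7, 1.5, 1.5
X, V = 0.50, 0.10          # current position and velocity
p_i, p_g = 0.80, 1.00      # personal and global best positions
r1, r2 = 0.4, 0.9          # draws from U[0, 1]

V_next = w * V + c1 * r1 * (p_i - X) + c2 * r2 * (p_g - X)
X_next = X + V_next
print(V_next, X_next)      # 0.07 + 0.18 + 0.675 = 0.925; X_next = 1.425
```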

Some results in benchmark data
Comparison of PSMS and pattern search. [Slide figures: performance plots not recovered.]
H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection. Journal of Machine Learning Research, 10(Feb), 2009.


Some results in benchmark data
The inertia weight may help to avoid overfitting and to direct the search.

Some results in benchmark data
Global and local information also helps! [Slide figure: the case where no local information is considered.]

Some results in benchmark data
Global and local information also helps! [Slide figure: the case where no global information is considered.]

Some results in benchmark data
Global and local information also helps! [Slide figure: the case where both global and local information are considered.]

Some results in benchmark data
Mechanisms of PSO that may help to avoid overfitting:
– each solution is updated using local and global information;
– the inertia weight;
– early stopping;
– randomness;
– the heterogeneous representation.

PSMS in the ALvsPK challenge
Five data sets for binary classification. Goal: obtain the best classification model for each data set. Two tracks:
– prior knowledge;
– agnostic learning.

PSMS in the ALvsPK challenge
Best configuration of PSMS: comparison of the performance of models selected with PSMS against that obtained by other techniques in the ALvsPK challenge, together with the models selected with PSMS for the different data sets.
[Slide table: entries Interim-all-prior (best prior-knowledge entry), psmsx_jmlr_run_I (PSMS), and Logitboost-trees (best agnostic-learning entry), with per-data-set scores on Ada, Gina, Hiva, Nova, and Sylva, an overall score, and a rank; the numeric values were not recovered from the slide.]

Ensemble PSMS
Idea: take advantage of the large number of models evaluated during the search to build ensemble classifiers.
Problem: how to select partial solutions from PSMS so that they are accurate and diverse from one another.
Motivation: the success of ensemble classifiers depends mainly on two key aspects of the individual models: accuracy and diversity.

Ensemble PSMS
How to select potential models for building ensembles?
– BS: store the global best model of each iteration.
– BI: store the best model of each iteration.
– SE: combine the outputs of the final swarm.
How to fuse the outputs of the selected models? Simple (unweighted) voting, as sketched below.
[Slide figure: swarm fitness across iterations, highlighting the selected models.]
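A minimal sketch of the fusion step, assuming the selected models are already trained and expose a scikit-learn-style predict(); the selection strategies above only change which models end up in the list:

```python
import numpy as np

def vote(models, X):
    """Simple unweighted voting over the predictions of the selected models."""
    preds = np.array([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    fused = []
    for col in preds.T:                               # majority label per sample
        labels, counts = np.unique(col, return_counts=True)
        fused.append(labels[np.argmax(counts)])
    return np.array(fused)
```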


Experimental results
Data:
– 9 benchmark machine learning data sets (binary classification);
– 1 object recognition data set (multiclass, 10 classes).
[Slide table: data set IDs, names (Breast-cancer, Diabetes, Flare solar, German, Heart, Image, Splice, Thyroid, Titanic, ORSCEF), and their training/testing/feature counts; the counts were not recovered from the slide.]
Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 / WCCI 2010), IEEE, 2010 [Best Student Paper Award].

Experimental results: performance
Benchmark data sets: better performance is obtained by the ensemble methods.
[Slide table: average accuracy (± standard deviation) over 10 trials of PSMS, EPSMS-BS, EPSMS-SE, and EPSMS-BI on each benchmark data set; most numeric entries were not recovered from the slide (e.g., only the PSMS column average, 82.90, survived intact).]
Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 / WCCI 2010), IEEE, 2010 [Best Student Paper Award].

Experimental results: diversity of the ensemble
EPSMS-SE models are more diverse.
[Slide table: diversity (± standard deviation) of EPSMS-BS, EPSMS-SE, and EPSMS-BI per data set; the numeric entries were not recovered from the slide.]
Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 / WCCI 2010), IEEE, 2010 [Best Student Paper Award].

Experimental results: region labeling
Per-concept improvement of the EPSMS variants over straight PSMS:

       PSMS      EPSMS-BS   EPSMS-SE   EPSMS-BI
AUC    91.53±…   …          …          …±5.3
MCC    69.58%    76.59%     79.13%     81.49%

(The standard deviations in the AUC row were only partially recovered from the slide.)

Lessons learned
– Ensembles generated with EPSMS outperformed individual classifiers, including those selected with PSMS.
– The models evaluated by PSMS are accurate and diverse from one another.
– More stable predictions are obtained with the ensemble version of PSMS.

Other applications of PSMS/EPSMS
Successful:
– acute leukemia classification;
– authorship verification (Spanish/English);
– authorship attribution;
– region labeling;
– ML challenges.
Not successful:
– review recommendation (14 features);
– region labeling (~90 classes);
– sentiment analysis on speech signals (high p, small n);
– plagiarism detection (a few samples);
– ML challenges.

Thank you!