Novel representations and methods in text classification Manuel Montes, Hugo Jair Escalante Instituto Nacional de Astrofísica, Óptica y Electrónica, México.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Classification Classification Examples
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Particle swarm optimization for parameter determination and feature selection of support vector machines Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen,
Particle Swarm Optimization (PSO)  Kennedy, J., Eberhart, R. C. (1995). Particle swarm optimization. Proc. IEEE International Conference.
Correlation Aware Feature Selection Annalisa Barla Cesare Furlanello Giuseppe Jurman Stefano Merler Silvano Paoli Berlin – 8/10/2005.
Industrial Engineering College of Engineering Bayesian Kernel Methods for Binary Classification and Online Learning Problems Theodore Trafalis Workshop.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Reduced Support Vector Machine
1 MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING By Kaan Tariman M.S. in Computer Science CSCI 8810 Course Project.
Presented by Zeehasham Rasheed
Feature Selection and Its Application in Genomic Data Analysis March 9, 2004 Lei Yu Arizona State University.
Modeling Gene Interactions in Disease CS 686 Bioinformatics.
Learning Programs Danielle and Joseph Bennett (and Lorelei) 4 December 2007.
Statistical Learning: Pattern Classification, Prediction, and Control Peter Bartlett August 2002, UC Berkeley CIS.
Online Stacked Graphical Learning Zhenzhen Kou +, Vitor R. Carvalho *, and William W. Cohen + Machine Learning Department + / Language Technologies Institute.
Introduction to Machine Learning Approach Lecture 5.
Introduction to machine learning
Handwritten Character Recognition using Hidden Markov Models Quantifying the marginal benefit of exploiting correlations between adjacent characters and.
CS Machine Learning. What is Machine Learning? Adapt to / learn from data  To optimize a performance function Can be used to:  Extract knowledge.
APPLICATION : DIAGNOSTIC CODING 1 SIEMENS  Coding is the translation of diagnosis terms describing patients diagnosis or treatment into a coded number.
RESULTS OF THE WCCI 2006 PERFORMANCE PREDICTION CHALLENGE Isabelle Guyon Amir Reza Saffari Azar Alamdari Gideon Dror.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
A Genetic Algorithms Approach to Feature Subset Selection Problem by Hasan Doğu TAŞKIRAN CS 550 – Machine Learning Workshop Department of Computer Engineering.
Issues with Data Mining
Baseline Methods for the Feature Extraction Class Isabelle Guyon Best BER=1.26  0.14% - n0=1000 (20%) – BER0=1.80% GISETTE Best BER=1.26  0.14% - n0=1000.
Efficient Model Selection for Support Vector Machines
Cost-Sensitive Bayesian Network algorithm Introduction: Machine learning algorithms are becoming an increasingly important area for research and application.
Towards Improving Classification of Real World Biomedical Articles Kostas Fragos TEI of Athens Christos Skourlas TEI of Athens
Slides are based on Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems.
Full model selection with heuristic search: a first approach with PSO Hugo Jair Escalante Computer Science Department, Instituto Nacional de Astrofísica,
Participation in the NIPS 2003 Challenge Theodor Mader ETH Zurich, Five Datasets were provided for experiments: ARCENE: cancer diagnosis.
315 Feature Selection. 316 Goals –What is Feature Selection for classification? –Why feature selection is important? –What is the filter and what is the.
INDEPENDENT COMPONENT ANALYSIS OF TEXTURES based on the article R.Manduchi, J. Portilla, ICA of Textures, The Proc. of the 7 th IEEE Int. Conf. On Comp.
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
LOGO Ensemble Learning Lecturer: Dr. Bo Yuan
Universit at Dortmund, LS VIII
Benk Erika Kelemen Zsolt
Prediction of Malignancy of Ovarian Tumors Using Least Squares Support Vector Machines C. Lu 1, T. Van Gestel 1, J. A. K. Suykens 1, S. Van Huffel 1, I.
Special topics on text mining [ Part I: text classification ] Hugo Jair Escalante, Aurelio Lopez, Manuel Montes and Luis Villaseñor.
1 Machine Learning 1.Where does machine learning fit in computer science? 2.What is machine learning? 3.Where can machine learning be applied? 4.Should.
Ensemble Methods: Bagging and Boosting
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm Chen, Yi-wen( 陳憶文 ) Graduate Institute of Computer Science & Information Engineering.
PSMS for Neural Networks on the Agnostic vs Prior Knowledge Challenge Hugo Jair Escalante, Manuel Montes and Enrique Sucar Computer Science Department.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Externally growing self-organizing maps and its application to database visualization and exploration.
AISTATS 2010 Active Learning Challenge: A Fast Active Learning Algorithm Based on Parzen Window Classification L.Lan, H.Shi, Z.Wang, S.Vucetic Temple.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Intelligent Database Systems Lab Advisor : Dr.Hsu Graduate : Keng-Wei Chang Author : Lian Yan and David J. Miller 國立雲林科技大學 National Yunlin University of.
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Guest lecture: Feature Selection Alan Qi Dec 2, 2004.
Consensus Group Stable Feature Selection
Feature Selction for SVMs J. Weston et al., NIPS 2000 오장민 (2000/01/04) Second reference : Mark A. Holl, Correlation-based Feature Selection for Machine.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
A Presentation on Adaptive Neuro-Fuzzy Inference System using Particle Swarm Optimization and it’s Application By Sumanta Kundu (En.R.No.
Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.
Introduction to Machine Learning, its potential usage in network area,
Brief Intro to Machine Learning CS539
In Search of the Optimal Set of Indicators when Classifying Histopathological Images Catalin Stoean University of Craiova, Romania
An Artificial Intelligence Approach to Precision Oncology
School of Computer Science & Engineering
An Inteligent System to Diabetes Prediction
Overview of Machine Learning
iSRD Spam Review Detection with Imbalanced Data Distributions
Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen, Zne-Jung Lee
Presentation transcript:

Novel representations and methods in text classification Manuel Montes, Hugo Jair Escalante Instituto Nacional de Astrofísica, Óptica y Electrónica, México. {mmontesg, 7th Russian Summer School in Information Retrieval Kazan, Russia, September 2013

Overview of the course Day 1: Introduction to text classification Day 2: Bag of concepts representations Day 3: Representations that incorporate sequential and syntactic information Day 4: Methods for non-conventional classification Day 5: Automated construction of classification models 2

AUTOMATED CONSTRUCTION OF CLASSIFICATION MODELS: FULL MODEL SELECTION Novel represensations and methods in text classification 3

Outline Pattern classification Model selection in a broad sense: FMS Related works PSO for full model selection Experimental results and applications Conclusions and future work directions 4

Pattern classification Trained machine 1 (is a car) -1 (is not a car) Queries Answers Learning machine Training data 5

Applications Natural language processing, Computer vision, Robotics, Information technology, Medicine, Science, Entertainment, …. 6

Example: galaxy classification from images Data Preprocessing Feature selection Classification method Training model Model evaluation ERROR [ ] ? … Instances Features Data 7

Some issues with the cycle of design How to preprocess the data? What method to use? Data Preprocessing Feature selection Classification method Training model Model evaluation Data 8

Some issues with the cycle of design What feature selection method to use? How many features are enough? How to set the hyperparameters of the FS method? Data Preprocessing Feature selection Classification method Training model Model evaluation Data 9

Some issues with the cycle of design What learning machine to use? How to set the hyperparameters of the chosen method? Data Preprocessing Feature selection Classification method Training model Model evaluation Data 10

Some issues with the cycle of design What combination of methods works better? How to evaluate model’s performance How to avoid overfitting? Data Preprocessing Feature selection Classification method Training model Model evaluation Data 11

Some issues with the cycle of design The above issues are usually fixed manually: – Domain expert’s knowledge – Machine learning specialists’ knowledge – Trial and error The design/development of a pattern classification system relies on the knowledge and biases of humans, which may be risky, expensive and time consuming Automated solutions are available but only for particular processes (e.g., either feature selection, or classifier selection but not both) Is it possible to automate the whole process? 12

Full model selection Data Full model selection H. J. Escalante. Towards a Particle Swarm Model Selection algorithm. Multi-level inference workshop and model selection game, NIPS, Whistler, VA, BC, Canada, H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection, In Journal of Machine Learning Research, 10(Feb): ,

Full model selection Given a set of methods for data preprocessing, feature selection and classification select the combination of methods (together with their hyperparameters) that minimizes an estimate of classification performance H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection, In Journal of Machine Learning Research, 10(Feb): ,

Full model selection Full model: A model composed of methods for data preprocessing, feature selection and classification Example: chain { 1:normalize center=0 2:relief f_max=Inf w_min=-Inf k_num=4 3:neural units=10 shrinkage=1e-014 balance=0 maxiter=100 } chain { 1:standardize center=1 2:svcrfe svc kernel linear coef0=0 degree=1 gamma=0 shrinkage=0.001 f_max=Inf 3:pc_extract f_max=2000 4:svm kernel linear C=Inf ridge=1e-013 balanced_ridge=0 nu=0 alpha_cutoff=-1 b0=0 nob=0 } 15

Full model selection Pros – The job of the data analyst is considerably reduced – Neither knowledge on the application domain nor on machine learning is required – Different methods for preprocessing, feature selection and classification are considered – It can be used in any classification problem Cons – It is real function + combinatoric optimization problem – Computationally expensive – Risk of overfitting 16

OVERVIEW OF RELATED WORKS Novel represensations and methods in text classification 17

Model selection via heuristic optimization A single model is considered and their hyperparameters are optimized via heuristic optimization: – Swarm optimization, – Genetic algorithms, – Pattern search, – Genetic programming – … 18

GEMS GEMS (Gene Expression Model Selection) is a system for automated development and valuation of high- quality cancer diagnostic models and biomarker discovery from microarray gene expression data A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. GEMS: A System for Automated Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data. International Journal of Medical Informatics, 2005 Aug;74(7-8): N. Fanananapazir, A. Statnikov, C.F. Aliferis. The Fast-AIMS Clinical Mass Spectrometry Analysis System. Advaces in Bioinformatics, 2009, Article ID

GEMS The user specifies the models, and methods to be considered GEMS explores all of the combinations of methods, using grid search to optimize hyperparameters 20

GEMS The user specifies the models, and methods to be considered GEMS explores all of the combinations of methods, using grid search to optimize hyperparameters 21

Model type selection for regression Genetic algorithms are used for the selection of model type (learning method, feature selection, preprocessing) and parameter optimization for regression problems D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. In Journal of Machine Learning Research, 10(Jul): ,

Meta-learning: learning to learn Evaluates and compares the application of learning algorithms on (many) previous tasks/domains to suggest learning algorithms (combinations, rankings) for new tasks Focuses on the relation between tasks/domains and learning algorithms Accumulating experience on the performance of multiple applications of learning methods Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applications to Data Mining. Springer Verlag. ISBN: , Brazdil P., Vilalta R, Giraud-Carrier C., Soares C.. Metalearning. Encyclopedia of Machine learning. Springer,

Meta-learning: learning to learn Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applications to Data Mining. Springer Verlag. ISBN: , Brazdil P., Vilalta R, Giraud-Carrier C., Soares C.. Metalearning. Encyclopedia of Machine learning. Springer,

Google prediction API “Machine learning as a service in the cloud” Upload your data, train a model and perform queries Nothing is for free! 25

Google prediction API Your data become property of Google! 26

IBM’s SPSS modeler 27

$$$$$ 28

Who cares? 29

PSMS: OUR APPROACH TO FULL MODEL SELECTION Novel represensations and methods in text classification 30

Full model selection Data Full model selection 31

PSMS: Our approach to full model selection Particle swarm model selection: Use particle swarm optimization for exploring the search space of full models in a particular ML-toolbox H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection, In Journal of Machine Learning Research, 10(Feb): , H. J. Escalante. Towards a Particle Swarm Model Selection algorithm. Multi-level inference workshop and model selection game, NIPS, Whistler, VA, BC, Canada, Normalize + RBF-SVM (γ = 0.01) PCA + Neural- Net (10 h. units) Relief (feat. Sel.) +Naïve Bayes Normalize + PolySVM (d= 2) Neural Net ( 3 units) s2n (feat. Sel.) +K-ridge 32

Particle swarm optimization Population-based search heuristic Inspired on the behavior of biological communities that exhibit local and social behaviors A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley,

Particle swarm optimization Each individual (particle) i has: – A position in the search space (X i t ), which represents a solution to the problem at hand, – A velocity vector (V i t ), which determines how a particle explores the search space After random initialization, particles update their positions according to: A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley,

Particle swarm optimization 1.Randomly initialize a population of particles (i.e., the swarm) 2.Repeat the following iterative process until stop criterion is meet: a)Evaluate the fitness of each particle b)Find personal best (p i ) and global best (p g ) c)Update particles d)Update best solution found (if needed) 3.Return the best particle (solution) Fitness value... Initt=1t=2t=max_it... 35

PSMS : PSO for full model selection Set of methods (not restricted to this set) Classification Feature selection Preprocessing 36

PSMS : PSO for full model selection Codification of solutions as real valued vectors Choice of methods for preprocessing, feature selection and classification Hyperparameters for the selected methods Preprocessing before feature selection? 37

PSMS : PSO for full model selection Fitness function: – K-fold cross-validation balanced error rate – K-fold cross-validation area under the ROC curve Training data Train Test Error fold 1 Error fold 2 Error fold 3 Error fold 4 Error fold 5 CV estimate 38

SOME EXPERIMENTAL RESULTS OF PSMS Novel represensations and methods in text classification 39

PSMS in the ALvsPK challenge Five data sets for binary classification Goal: to obtain the best classification model for each data set Two tracks: – Prior knowledge – Agnostic learning 40

PSMS in the ALvsPK challenge Best configuration of PSMS: EntryDescriptionAdaGinaHivaNovaSylvaOverallRank Interim-all-priorBest PK st psmsx_jmlr_run_IPSMS nd Logitboost-treesBest AL th Comparison of the performance of models selected with PSMS with that obtained by other techniques in the ALvsPK challenge Models selected with PSMS for the different data sets 41

PSMS in the ALvsPK challenge Official ranking: 42

Some results in benchmark data Comparison of PSMS and pattern search 43

Some results in benchmark data Comparison of PSMS and pattern search 44

Some results in benchmark data Comparison of PSMS and pattern search 45

PSMS: Interactive demo Isabelle Guyon, Amir Saffari, Hugo Jair Escalante, Gokan Bakir, and Gavin Cawley, CLOP: a Matlab Learning Object Package. NIPS 2007 Demonstrations, Vancouver, British Columbia, Canada

PSMS: APPLICATIONS AND EXTENSIONS Novel represensations and methods in text classification 47

Authorship verification

The task of deciding whether given text documents were or were not written by a certain author (i.e., a binary classification problem) Applications include: fraud detection, spam filtering, computer forensics and plagiarism detection 49

Authorship verification 50 Dear Mary, I have been busy in the last few weeks, however, I would like to ….…. This paper describes an novel approach to region labeling by using a Markov random field model …. Hereby I am sending my application for the PhD position offered in your research group … Hi Jane, how things are going on in Mexico; I hope you are not having problems because … Sample documents from author “X” Our paper introduces a novel method for … Unseen document Model for author “X” Feature extractionModel construction and testing This document was not written by “X” Prediction Today in Brazil an Air France plane crashed down, during a storm…. Sample documents from other authors than “X” … … … …

PSMS for authorship verification Documents are represented by their (binary) bag- of-words We apply straight PSMS for selecting the classification model for each author We report average performance over several authors Iberoamerican congress on pattern recognition IberoamericanAccommodationCongressPatternRecognition …… ……… Zorro Hugo Jair Escalante, Manuel Montes, and Luis Villaseñor. Particle Swarm Model Selection for Authorship Verification. LNCS 5856, pp , Springer,

Experimental results Two collections were considered for experiments: Evaluation measures: – Balanced error rate (BER) – F 1 -measure We compare PSMS to results obtained with individual classification models CollectionTrainingTestingFeaturesAuthorsLanguageDomain MX-PO281728,9705SpanishPoetry CCAT2,500 3,40050EnglishNews 52

Experimental results Col. / Class. zarbinaïvelboostneuralsvckridgerflssvmFMS/1FMS/0FMS/-1 MX-PO CCAT Average (over the authors) of balanced error rate obtained by each technique in the test sets 53

Experimental results Full models selected for the MX-PO data set PoetPreprocessingFeature Selection ClassifierBER Efrain Huertastandardize(1)-zarbi10.28 % Jaime Sabines--zarbi26.79 % Octavio Paznormalize(0)Zfilter(3070)zarbi25.09 % Rosario Castellanos normalize(0)Zfilter(3400)Kridge (γ=0.44; s= 0.407) 33.04% Ruben Bonifazshift-scale (log)-neural (u=3;s=1.81;i=15;b=1);35.71% 54

Experimental results ClassifiersFeature selection Preprocessing zarbinaiveneuralsvckridgelssvmWWOW Frequency of selection 68%10% 2% 8%76%24%88%12% Balanced error rate (BER) Statistics on the selection of methods when using PSMS (FMS/1) for the CCAT collection 55

Experimental results Per-author performance by models selected with FMS/1 for the CCA data set 96.91% [normalize+standardize+shift-scale] + [Ftest (104)] + [naïve] Most relevant features (words) for this author: china, chinesse, beijing, official diplomat 56

Experimental results Per-author performance by models selected with FMS/1 for the CCA data set 14.81% [normalize] + [] + [lssvm] [normalize+standardize+shift-scale] + [Zfilter (234)] + [zarbi] When using FMS/0 we got F1= 45.71% 57

Lessons learned PSMS selected very effective models for each author, Allowing the user to provide/fix a classifier resulted beneficial We were not able to outperform baseline methods in the multiclass classification task of authorship attribution 58

ENSEMBLE PSMS Novel represensations and methods in text classification 59

Ensemble PSMS Many models are evaluated during the search process of PSMS; although a single model is selected 60

Ensemble PSMS Idea: taking advantage of the large number of models that are evaluated during the search for building ensemble classifiers Problem: How to select the partial solutions from PSMS so that they are accurate and diverse to each other Motivation: The success of ensemble classifiers depends mainly in two key aspects of individual models: Accuracy and diversity 61

Ensemble PSMS How to select potential models for building ensembles? BS: store the global best model in each iteration BI: the best model in each iteration SE: combine the outputs of the final swarm How to fuse the outputs of the selected models? Simple (un-weighted) voting Fitness value... InitI=1I=4I=max_it... I=2I=3 62

How to select potential models for building ensembles? BS: store the global best model in each iteration BI: the best model in each iteration SE: combine the outputs of the final swarm How to fuse the outputs of the selected models? Simple (un-weighted) voting Fitness value... InitI=1I=4I=max_it... I=2I=3 63 Ensemble PSMS

How to select potential models for building ensembles? BS: store the global best model in each iteration BI: the best model in each iteration SE: combine the outputs of the final swarm How to fuse the outputs of the selected models? Simple (un-weighted) voting Fitness value... InitI=1I=4I=max_it... I=2I=3 64 Ensemble PSMS

Experimental results Data: – 9 Benchmark machine learning data sets (binary classification) – 1 Object recognition data set (multiclass, 10 classes) IDData setTrainingTestingFeatures 1Breast-cancer Diabetes Flare solar German Heart Image Splice Thyroid Titanic ORSCEF Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN2010 – WCCI2010), pp , IEEE,, 2010

Experimental results Evaluation: – Average of area under the ROC curve (performance) – Coincident failure diversity (ensemble diversity) If p 0 < 1 If p 0 = 1 W. Wang. Some fundamental issues in ensemble methods. Proc. Of IJCNN07, pp. 2244–2251, IEEE, July

Experimental results: performance Benchmark data sets: better performance is obtained by ensemble methods IDPSMSEPSMS-BSEPSMS-SEEPSMS-BI ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ±0.91 Avg.82.90± ± ± ±0.65 Average accuracy over 10-trials of PSMS and EPSMS in benchmark data 67 Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN2010 – WCCI2010), pp , IEEE,, 2010

Experimental results: Diversity of ensemble Diversity results IDEPSMS-BSEPSMS-SEEPSMS-BI ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± Avg ± ± ± EPSMS-SE models are more diverse 68 Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN2010 – WCCI2010), pp , IEEE,, 2010

Experimental results: region labeling Per-concept improvement of EPSMS variants over straight PSMS IDPSMSEPSMS-BSEPSMS-SEEPSMS-BI AUC91.53± ± ± ±5.3 MCC69.58%76.59%79.13%81.49% 69

Lessons learned Ensembles generated with EPSMS outperformed individual classifiers; including those selected with PSMS Models evaluated by PSMS are diverse to each other and accurate The extension of PSMS for generating ensembles is promising 70

Summary Full model selection is a broad view of the model selection problem in supervised learning There is an increasing interest for this type of methods The search space is huge and multiple minima may exist There is no guarantee of avoiding overfitting Yet, we showed evidence that it is possible to attempt to automate the cycle of design of pattern classification systems 71

Summary PSMS / EPSMS an automatic tool for the selection of classification models for any classification task PSMS / EPSMS has been successfully applied in several domains Disclaimer: We do not expect PSMS / EPSMS to perform well in every application it is tested, although we recommend it as a first option when building a classifier 72

Summary For multiclass classification straight OVA strategies did not work Alternative methods that do not use the output of classifiers are a good option to explore as future work 73

Research opportunities Multi-Swarm PSMS for building ensembles Multiclass PSMS/EPSMS: – Bi-level optimization (i.e., individual and multiclass performance) – Learning to fuse the outputs (e.g., using genetic programming) Meta-ensembles 74

Research opportunities Other search strategies for full model selection (e.g., Genetic programming, GRASP, Tabu search) Other toolboxes (e.g., weka) Meta-learning + PSMS Combination of different full model selection techniques 75

Applications Any classification problem where specific classifiers are required for each class 76

References Isabelle Guyon. A Practical Guide to Model Selection. In Jeremie Marie, editor, Machine Learning Summer School 2008,Springer Texts in Statistics, (slide from I.Guyon’s) A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. GEMS: A System for Automated Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data. International Journal of Medical Informatics, 2005 Aug;74(7-8): D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. In Journal of Machine Learning Research, 10(Jul): , 2009 Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applications to Data Mining. Springer Verlag. ISBN: , H. J. Escalante. Towards a Particle Swarm Model Selection algorithm. Multi-level inference workshop and model selection game, NIPS, Whistler, VA, BC, Canada, Isabelle Guyon, Amir Saffari, Hugo Jair Escalante, Gokan Bakir, and Gavin Cawley, CLOP: a Matlab Learning Object Package. NIPS 2007 Demonstrations, Vancouver, British Columbia, Canada H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection, In Journal of Machine Learning Research, 10(Feb): , Hugo Jair Escalante, Jesus A. Gonzalez, Manuel Montes-y-Gomez, Pilar Gómez, Carolina Reta, Carlos A. Reyes, Alejandro Rosales. Acute Leukemia Classiffcation with Ensemble Particle Swarm Model Selection, In press, Artificial Intelligence in Medicine, Available online 15 April Hugo Jair Escalante, Manuel Montes, and Luis Villaseñor. Particle Swarm Model Selection for Authorship Verification. LNCS 5856, pp , Springer,

Questions? 7th Russian Summer School in Information Retrieval Kazan, Russia, September

79

EPSMS for acute leukemia classification Acute leukemia: a malignant disease that affects a considerable portion of the world population There are different types and subtypes of acute leukemia, requiring different treatments. 80

EPSMS for acute leukemia classification Different types/subtypes: – Acute Lymphocytic Leukemia (ALL): L1, L2, L3 – Acute Myelogenous Leukemia (AML): M0, M1, M2, M3, M4, M5, M6, M7 Considered tasks: – Binary ALL vs AML L1 vs L2 M2 vs M3, M5, M3 vs. M2, M5, M3 vs M2, M5, – Multiclass M1 vs M2 vs M3 L1 vs L2 vs M1 vs M2 vs M3 81

EPSMS for acute leukemia classification Despite the fact that there are advanced and precise methods to identify leukemia types, they are very expensive and unavailable in most of hospitals of third world countries A Cheaper alternative: morphological acute leukemia classification from bone marrow cell images 82

EPSMS for acute leukemia classification Morphological classification: – Image registration – Image segmentation – Feature extraction (Morphological, statistical, texture) – Classifier construction CellNucleusCytoplasm J. A. Gonzalez et al. Leukemia Identification from Bone Marrow Cells Images using a Machine Vision and Data Mining Strategy, Intelligent Data Analysis, Vol 15(3):443—462,

EPSMS for acute leukemia classification In previous work either: – the same classification model has been considered for different problems, or – Models have been manually selected by trial and error Proposal: Using EPSMS for automatically selecting specific classifiers for the different tasks Hugo Jair Escalante, et al.. Acute Leukemia Classiffcation with Ensemble Particle Swarm Model Selection, Artificial Intelligence in Medicine, 55(3): (2012) 84

EPSMS for acute leukemia classification Experiments were performed with real data collected by a Mexican health institution (IMSS), using 10-fold CV 85

EPSMS for acute leukemia classification In general ensembles generated with EPSMS outperformed previous work Models selected with EPSMS were more effective than those selected with straight PSMS 86 Hugo Jair Escalante, et al.. Acute Leukemia Classiffcation with Ensemble Particle Swarm Model Selection, Artificial Intelligence in Medicine, 55(3): (2012)

EPSMS for ALC - binary tasks 87

EPSMS for ALC – Multiclass MeasureRegionRef.PSMSE1E2E3 M2 vs M3 vs M5 Avg. AUCC AccuracyC Avg. AUCN&C AccuracyN&C L1 vs L2 vs M2 vs M3 vs M5 Avg. AUCC AccuracyC Avg. AUCN&C AccuracyN&C Hugo Jair Escalante, et al.. Acute Leukemia Classiffcation with Ensemble Particle Swarm Model Selection, Artificial Intelligence in Medicine, 55(3): (2012)

Models selected with EPSMS 89 Hugo Jair Escalante, et al.. Acute Leukemia Classiffcation with Ensemble Particle Swarm Model Selection, Artificial Intelligence in Medicine, 55(3): (2012)

Models selected with EPSMS 90 Hugo Jair Escalante, et al.. Acute Leukemia Classiffcation with Ensemble Particle Swarm Model Selection, Artificial Intelligence in Medicine, 55(3): (2012)

Lessons learned Models selected with PSMS / EPSMS outperformed those manually selected after a careful study Different settings of EPMS outperformed both PSMS and the baseline, although we did not found a better strategy for all of the tasks The multiclass classification performance was acceptable, although there is still much work to do 91