Novel representations and methods in text classification
Manuel Montes, Hugo Jair Escalante
Instituto Nacional de Astrofísica, Óptica y Electrónica, México
7th Russian Summer School in Information Retrieval, Kazan, Russia, September 2013
Overview of the course
Day 1: Introduction to text classification
Day 2: Bag of concepts representations
Day 3: Representations that incorporate sequential and syntactic information
Day 4: Methods for non-conventional classification
Day 5: Automated construction of classification models 2
AUTOMATED CONSTRUCTION OF CLASSIFICATION MODELS: FULL MODEL SELECTION 3
Outline Pattern classification Model selection in a broad sense: FMS Related works PSO for full model selection Experimental results and applications Conclusions and future work directions 4
Pattern classification — (Figure: a learning machine is trained on labeled training data; the trained machine maps queries, e.g. images, to answers: 1 "is a car", -1 "is not a car") 5
Applications Natural language processing, Computer vision, Robotics, Information technology, Medicine, Science, Entertainment, …. 6
Example: galaxy classification from images — (Figure: the design cycle: data, with instances as rows and features as columns, goes through preprocessing, feature selection, and a classification method to train a model, which is then evaluated to estimate its error) 7
Some issues with the cycle of design: How to preprocess the data? What method to use? 8
Some issues with the cycle of design: What feature selection method to use? How many features are enough? How to set the hyperparameters of the FS method? 9
Some issues with the cycle of design: What learning machine to use? How to set the hyperparameters of the chosen method? 10
Some issues with the cycle of design: What combination of methods works best? How to evaluate the model's performance? How to avoid overfitting? 11
Some issues with the cycle of design
The above issues are usually fixed manually:
– Domain expert's knowledge
– Machine learning specialists' knowledge
– Trial and error
The design/development of a pattern classification system relies on the knowledge and biases of humans, which may be risky, expensive and time-consuming. Automated solutions are available, but only for particular processes (e.g., either feature selection or classifier selection, but not both). Is it possible to automate the whole process? 12
Full model selection — (Figure: data goes in, full model selection returns a complete classification model)
H. J. Escalante. Towards a Particle Swarm Model Selection algorithm. Multi-level inference workshop and model selection game, NIPS, Whistler, BC, Canada, 2006.
H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection. Journal of Machine Learning Research, 10(Feb):405–440, 2009.
Full model selection
Given a set of methods for data preprocessing, feature selection and classification, select the combination of methods (together with their hyperparameters) that minimizes an estimate of classification error
H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection. Journal of Machine Learning Research, 10(Feb):405–440, 2009.
Full model selection
Full model: a model composed of methods for data preprocessing, feature selection and classification.
Example:
chain { 1:normalize center=0 2:relief f_max=Inf w_min=-Inf k_num=4 3:neural units=10 shrinkage=1e-014 balance=0 maxiter=100 }
chain { 1:standardize center=1 2:svcrfe svc kernel linear coef0=0 degree=1 gamma=0 shrinkage=0.001 f_max=Inf 3:pc_extract f_max=2000 4:svm kernel linear C=Inf ridge=1e-013 balanced_ridge=0 nu=0 alpha_cutoff=-1 b0=0 nob=0 } 15
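As an illustration, a full model like the chains above can be seen as one method per stage plus that method's hyperparameters. Below is a minimal Python sketch of sampling such a model from a search space; the method and parameter names in SEARCH_SPACE are invented stand-ins, not CLOP's actual modules.

```python
import random

# Hypothetical search space in the spirit of the chains above: one method
# per stage, each with its own hyperparameter choices. Names are
# illustrative stand-ins, not CLOP's actual modules.
SEARCH_SPACE = {
    "preprocessing": {"none": {}, "normalize": {"center": [0, 1]},
                      "standardize": {"center": [0, 1]}},
    "feature_selection": {"none": {}, "relief": {"k_num": [2, 4, 8]}},
    "classifier": {"naive_bayes": {}, "svm": {"C": [0.1, 1.0, 10.0]},
                   "neural": {"units": [5, 10, 20]}},
}

def sample_full_model(rng):
    """Draw one random full model: a (method, hyperparameters) pair per stage."""
    model = {}
    for stage, methods in SEARCH_SPACE.items():
        name = rng.choice(sorted(methods))
        model[stage] = (name, {p: rng.choice(v) for p, v in methods[name].items()})
    return model

model = sample_full_model(random.Random(42))
```

A full-model-selection algorithm searches this joint space of discrete method choices and (mixed discrete/continuous) hyperparameters.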
Full model selection
Pros
– The job of the data analyst is considerably reduced
– Neither knowledge of the application domain nor of machine learning is required
– Different methods for preprocessing, feature selection and classification are considered
– It can be used in any classification problem
Cons
– It is a combined real-valued and combinatorial optimization problem
– Computationally expensive
– Risk of overfitting 16
OVERVIEW OF RELATED WORKS 17
Model selection via heuristic optimization
A single model is considered and its hyperparameters are optimized via heuristic optimization:
– Swarm optimization
– Genetic algorithms
– Pattern search
– Genetic programming
– … 18
GEMS
GEMS (Gene Expression Model Selection) is a system for the automated development and evaluation of high-quality cancer diagnostic models and biomarker discovery from microarray gene expression data
A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. GEMS: A System for Automated Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data. International Journal of Medical Informatics, 74(7-8), 2005
N. Fananapazir, A. Statnikov, C.F. Aliferis. The Fast-AIMS Clinical Mass Spectrometry Analysis System. Advances in Bioinformatics, 2009
GEMS The user specifies the models, and methods to be considered GEMS explores all of the combinations of methods, using grid search to optimize hyperparameters 20
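GEMS-style exhaustive exploration can be sketched as a grid search over every method/hyperparameter combination. The grid and the error function below are invented for illustration; GEMS scores real cross-validation estimates rather than this toy function.

```python
from itertools import product

# Toy grid-search sketch: every combination of a method and its
# hyperparameter grid is scored, and the lowest-error combination is kept.
GRID = {
    "svm": {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]},
    "knn": {"k": [1, 3, 5]},
}

def toy_cv_error(method, params):
    # Deterministic stand-in for a cross-validation error estimate.
    if method == "svm":
        return 0.10 + 0.01 * abs(params["C"] - 1.0) + abs(params["gamma"] - 0.1)
    return 0.20 + 0.01 * params["k"]

best = None
for method, grid in GRID.items():
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        err = toy_cv_error(method, params)
        if best is None or err < best[0]:
            best = (err, method, params)
```

The cost grows multiplicatively with the grid sizes, which is why heuristic search (as in PSMS, below) becomes attractive for larger spaces.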
Model type selection for regression
Genetic algorithms are used to select the model type (learning method, feature selection, preprocessing) and to optimize its parameters for regression problems
D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. Journal of Machine Learning Research, 10(Jul), 2009
Meta-learning: learning to learn
Evaluates and compares the application of learning algorithms on (many) previous tasks/domains to suggest learning algorithms (combinations, rankings) for new tasks. Focuses on the relation between tasks/domains and learning algorithms, accumulating experience on the performance of multiple applications of learning methods
Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applications to Data Mining. Springer, 2009
Brazdil P., Vilalta R., Giraud-Carrier C., Soares C. Metalearning. In: Encyclopedia of Machine Learning. Springer, 2010
Meta-learning: learning to learn 24
Google prediction API
"Machine learning as a service in the cloud": upload your data, train a model and perform queries. Nothing is free! 25
Google prediction API Your data become property of Google! 26
IBM’s SPSS modeler 27
$$$$$ 28
Who cares? 29
PSMS: OUR APPROACH TO FULL MODEL SELECTION 30
Full model selection — (Figure: data goes in, full model selection returns a complete classification model) 31
PSMS: Our approach to full model selection
Particle swarm model selection: use particle swarm optimization to explore the search space of full models in a particular ML toolbox.
Example candidate full models:
– Normalize + RBF-SVM (γ = 0.01)
– PCA + Neural Net (10 hidden units)
– Relief (feat. sel.) + Naïve Bayes
– Normalize + Poly-SVM (d = 2)
– Neural Net (3 units)
– s2n (feat. sel.) + K-ridge 32
Particle swarm optimization
A population-based search heuristic, inspired by the behavior of biological communities that exhibit local and social behaviors
A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley, 2005
Particle swarm optimization
Each individual (particle) i has:
– A position in the search space (X_i^t), which represents a solution to the problem at hand
– A velocity vector (V_i^t), which determines how the particle explores the search space
After random initialization, particles update their positions according to:
V_i^{t+1} = w V_i^t + c1 r1 (p_i − X_i^t) + c2 r2 (p_g − X_i^t)
X_i^{t+1} = X_i^t + V_i^{t+1}
where p_i and p_g are the personal and global best positions, w is the inertia weight, c1 and c2 are acceleration constants, and r1, r2 are random numbers in [0, 1]
A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley, 2005
Particle swarm optimization
1. Randomly initialize a population of particles (i.e., the swarm)
2. Repeat the following iterative process until a stop criterion is met:
a) Evaluate the fitness of each particle
b) Find the personal best (p_i) and global best (p_g)
c) Update the particles
d) Update the best solution found (if needed)
3. Return the best particle (solution)
(Figure: fitness of the swarm members across iterations t = 1 … max_it) 35
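The iterative process above, in a minimal sketch: a 2-D sphere function stands in for the fitness, and the swarm size, inertia and acceleration constants are illustrative defaults, not PSMS's actual settings.

```python
import random

def pso(f, dim=2, swarm=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f with a basic particle swarm; returns (best position, best value)."""
    rng = random.Random(seed)
    # 1. Random initialization of positions and velocities
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    V = [[0.0] * dim for _ in range(swarm)]
    P = [x[:] for x in X]                       # personal bests (p_i)
    pbest = [f(x) for x in X]
    g = min(range(swarm), key=lambda i: pbest[i])
    G, gbest = P[g][:], pbest[g]                # global best (p_g)
    for _ in range(iters):                      # 2. iterate until budget exhausted
        for i in range(swarm):
            for d in range(dim):                # c) velocity and position update
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d] + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))
                X[i][d] += V[i][d]
            fx = f(X[i])                        # a) fitness evaluation
            if fx < pbest[i]:                   # b)/d) best updates
                P[i], pbest[i] = X[i][:], fx
                if fx < gbest:
                    G, gbest = X[i][:], fx
    return G, gbest                             # 3. best solution found

sol, val = pso(lambda x: sum(v * v for v in x))
```

In PSMS each position encodes a full model (method choices plus hyperparameters) and the fitness is a cross-validation error estimate rather than an analytic function.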
PSMS: PSO for full model selection — the search explores a set of preprocessing, feature selection and classification methods (not restricted to this set) 36
PSMS: PSO for full model selection
Codification of solutions as real-valued vectors, encoding:
– The choice of methods for preprocessing, feature selection and classification
– The hyperparameters of the selected methods
– Whether preprocessing is applied before feature selection 37
PSMS: PSO for full model selection
Fitness function:
– K-fold cross-validation balanced error rate
– K-fold cross-validation area under the ROC curve
(Figure: the training data is split into K folds; each fold's error is computed on held-out data and the K fold errors are averaged into the CV estimate) 38
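The CV-based fitness can be sketched as follows. The train_and_predict callable stands in for a candidate full model; any function with that signature works, and the fold scheme here is a simple interleaved split for illustration.

```python
# Fitness sketch: balanced error rate (BER), averaged over K folds.
def ber(y_true, y_pred):
    """BER: mean of the error rates on the +1 class and the -1 class."""
    err = {}
    for label in (1, -1):
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
        err[label] = sum(p != t for t, p in pairs) / len(pairs)
    return (err[1] + err[-1]) / 2

def kfold_ber(X, y, train_and_predict, k=5):
    """Average BER of a learner over k interleaved folds."""
    n = len(y)
    scores = []
    for f in range(k):
        test_idx = list(range(f, n, k))
        train_idx = [i for i in range(n) if i not in test_idx]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        scores.append(ber([y[i] for i in test_idx], preds))
    return sum(scores) / k
```

BER is preferred over plain error on unbalanced data: a classifier that always predicts the majority class still gets a BER of 0.5.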
SOME EXPERIMENTAL RESULTS OF PSMS 39
PSMS in the ALvsPK challenge Five data sets for binary classification Goal: to obtain the best classification model for each data set Two tracks: – Prior knowledge – Agnostic learning 40
PSMS in the ALvsPK challenge
Comparison of models selected with PSMS against other entries in the ALvsPK challenge — (Table: columns Entry, Description, Ada, Gina, Hiva, Nova, Sylva, Overall, Rank; Interim-all-prior, the best prior-knowledge entry, ranked 1st; psmsx_jmlr_run_I, the PSMS entry, ranked 2nd; Logitboost-trees was the best agnostic-learning entry)
Models selected with PSMS for the different data sets 41
PSMS in the ALvsPK challenge Official ranking: 42
Some results in benchmark data Comparison of PSMS and pattern search 43
PSMS: Interactive demo
Isabelle Guyon, Amir Saffari, Hugo Jair Escalante, Gokhan Bakir, and Gavin Cawley. CLOP: a Matlab Learning Object Package. NIPS 2007 Demonstrations, Vancouver, British Columbia, Canada, 2007
PSMS: APPLICATIONS AND EXTENSIONS 47
Authorship verification
The task of deciding whether given text documents were or were not written by a certain author (i.e., a binary classification problem) Applications include: fraud detection, spam filtering, computer forensics and plagiarism detection 49
Authorship verification — (Figure: sample documents from author "X" and from other authors go through feature extraction and model construction; the resulting model for author "X" then predicts whether an unseen document was or was not written by "X") 50
PSMS for authorship verification
Documents are represented by their (binary) bag-of-words. We apply straight PSMS to select the classification model for each author, and report average performance over several authors.
(Figure: the sentence "Iberoamerican congress on pattern recognition" as a binary vector over a vocabulary ranging from "Accommodation" to "Zorro")
Hugo Jair Escalante, Manuel Montes, and Luis Villaseñor. Particle Swarm Model Selection for Authorship Verification. LNCS 5856, Springer, 2009
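A minimal sketch of the binary bag-of-words representation used here: one column per vocabulary word, set to 1 if the word occurs in the document and 0 otherwise (whitespace tokenization is a simplification).

```python
def binary_bow(documents):
    """Binary bag-of-words: (sorted vocabulary, one 0/1 row per document)."""
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    vectors = []
    for doc in documents:
        row = [0] * len(vocab)
        for w in set(doc.lower().split()):  # presence, not frequency
            row[index[w]] = 1
        vectors.append(row)
    return vocab, vectors

docs = ["Iberoamerican congress on pattern recognition",
        "pattern recognition in congress proceedings"]
vocab, X = binary_bow(docs)
```

Each author's verifier is then trained on such vectors, with the label +1 for documents by that author and -1 otherwise.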
Experimental results
Two collections were considered for the experiments:
Collection | Training | Testing | Features | Authors | Language | Domain
MX-PO | … | … | 28,970 | 5 | Spanish | Poetry
CCAT | 2,500 | 3,400 | … | 50 | English | News
Evaluation measures:
– Balanced error rate (BER)
– F1-measure
We compare PSMS to the results obtained with individual classification models 52
Experimental results
Average (over the authors) balanced error rate obtained by each technique on the test sets — (Table: columns zarbi, naïve, lboost, neural, svc, kridge, rf, lssvm, FMS/1, FMS/0, FMS/-1; rows MX-PO and CCAT) 53
Experimental results
Full models selected for the MX-PO data set:
Poet | Preprocessing | Feature selection | Classifier | BER
Efrain Huerta | standardize(1) | – | zarbi | 10.28%
Jaime Sabines | – | – | zarbi | 26.79%
Octavio Paz | normalize(0) | Zfilter(3070) | zarbi | 25.09%
Rosario Castellanos | normalize(0) | Zfilter(3400) | kridge (γ=0.44; s=0.407) | 33.04%
Ruben Bonifaz | shift-scale (log) | – | neural (u=3; s=1.81; i=15; b=1) | 35.71% 54
Experimental results
Statistics on the selection of methods when using PSMS (FMS/1) for the CCAT collection — (Table: frequency of selection of each classifier: zarbi 68%, naive 10%, the remaining classifiers between 2% and 8% each; feature selection was included in 76% of the selected models and preprocessing in 88%; the table also reports the resulting balanced error rate) 55
Experimental results
Per-author performance of the models selected with FMS/1 for the CCAT data set. Best-classified author (96.91%): [normalize+standardize+shift-scale] + [Ftest (104)] + [naïve]. Most relevant features (words) for this author: china, chinese, beijing, official, diplomat 56
Experimental results
Per-author performance of the models selected with FMS/1 for the CCAT data set. Worst-classified author (14.81%): [normalize] + [] + [lssvm]. When using FMS/0, which selected [normalize+standardize+shift-scale] + [Zfilter (234)] + [zarbi], we got F1 = 45.71% 57
Lessons learned
– PSMS selected very effective models for each author
– Allowing the user to provide or fix a classifier proved beneficial
– We were not able to outperform baseline methods in the multiclass classification task of authorship attribution 58
ENSEMBLE PSMS 59
Ensemble PSMS
Many models are evaluated during the search process of PSMS, although only a single model is finally selected 60
Ensemble PSMS
Idea: take advantage of the large number of models evaluated during the search to build ensemble classifiers
Problem: how to select the partial solutions from PSMS so that they are accurate and diverse from each other
Motivation: the success of ensemble classifiers depends mainly on two key properties of the individual models: accuracy and diversity 61
Ensemble PSMS
How to select potential models for building ensembles?
– BS: store the global best model at each iteration
– BI: store the best model of each iteration
– SE: combine the outputs of the final swarm
How to fuse the outputs of the selected models? Simple (un-weighted) voting
(Figure: fitness of the swarm members across iterations I = 1 … max_it) 62
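The fusion step can be sketched as follows: the selected models (whether gathered with BS, BI or SE) vote with equal weight on each test example. The tie-breaking rule toward +1 is an arbitrary choice for illustration.

```python
# Un-weighted voting over the +1/-1 outputs of the selected models.
def majority_vote(predictions_per_model):
    """predictions_per_model: one list of +1/-1 predictions per model."""
    n = len(predictions_per_model[0])
    fused = []
    for j in range(n):
        votes = sum(preds[j] for preds in predictions_per_model)
        fused.append(1 if votes >= 0 else -1)  # ties broken toward +1
    return fused

ensemble_preds = majority_vote([[1, 1, -1, -1],
                                [1, -1, -1, 1],
                                [1, 1, 1, -1]])
```

With an odd number of models and +1/-1 outputs, ties cannot occur and the tie-breaking rule never fires.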
Experimental results
Data:
– 9 benchmark machine learning data sets (binary classification): Breast-cancer, Diabetes, Flare-solar, German, Heart, Image, Splice, Thyroid, Titanic
– 1 object recognition data set (multiclass, 10 classes)
(Table: training/testing partition sizes and number of features per data set)
Hugo Jair Escalante, Manuel Montes and Enrique Sucar. Ensemble Particle Swarm Model Selection. Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010 – WCCI 2010), IEEE, 2010
Experimental results
Evaluation:
– Average area under the ROC curve (performance)
– Coincident failure diversity, CFD (ensemble diversity): with M ensemble members and p_m the probability that exactly m members fail on a random test example,
CFD = (1 / (1 − p_0)) · Σ_{m=1}^{M} ((M − m) / (M − 1)) · p_m, if p_0 < 1
CFD = 0, if p_0 = 1
W. Wang. Some fundamental issues in ensemble methods. Proc. of IJCNN 2007, pp. 2244–2251, IEEE, July 2007
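Coincident failure diversity, in its standard form (CFD = 0 when p_0 = 1, otherwise (1/(1−p_0))·Σ_{m=1..M} ((M−m)/(M−1))·p_m, where p_m is the probability that exactly m of the M members fail), can be computed from the number of failing members on each test example; this sketch assumes predictions have already been compared against the labels.

```python
# Coincident failure diversity (CFD): high when members that fail tend to
# fail alone, low when they fail simultaneously.
def cfd(fail_counts, M):
    """fail_counts: number of failing ensemble members on each test example."""
    n = len(fail_counts)
    p = [fail_counts.count(m) / n for m in range(M + 1)]  # p_m estimates
    if p[0] == 1:
        return 0.0
    return sum((M - m) / (M - 1) * p[m] for m in range(1, M + 1)) / (1 - p[0])
```

CFD reaches 1 when every failure involves exactly one member, and 0 when all members always fail together (or never fail).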
Experimental results: performance
Benchmark data sets: better performance is obtained by the ensemble methods.
Average AUC over 10 trials of PSMS and EPSMS on the benchmark data — (Table: one row per data set with columns PSMS, EPSMS-BS, EPSMS-SE, EPSMS-BI; PSMS averaged 82.90 and the EPSMS variants averaged higher) 67
Experimental results: diversity of the ensemble
Coincident failure diversity results — (Table: one row per data set with columns EPSMS-BS, EPSMS-SE, EPSMS-BI)
EPSMS-SE models are the most diverse 68
Experimental results: region labeling
Per-concept improvement of the EPSMS variants over straight PSMS:
Measure | PSMS | EPSMS-BS | EPSMS-SE | EPSMS-BI
AUC | 91.53 | … | … | …
MCC | 69.58% | 76.59% | 79.13% | 81.49% 69
Lessons learned
– Ensembles generated with EPSMS outperformed individual classifiers, including those selected with PSMS
– The models evaluated by PSMS are accurate and diverse from each other
– The extension of PSMS for generating ensembles is promising 70
Summary
Full model selection is a broad view of the model selection problem in supervised learning, and there is increasing interest in this type of method. The search space is huge and multiple local minima may exist, and there is no guarantee of avoiding overfitting. Yet, we showed evidence that it is feasible to automate the cycle of design of pattern classification systems 71
Summary
PSMS / EPSMS is an automatic tool for selecting classification models for any classification task, and it has been successfully applied in several domains.
Disclaimer: we do not expect PSMS / EPSMS to perform well in every application in which it is tested, although we recommend it as a first option when building a classifier 72
Summary
For multiclass classification, straight one-vs-all (OVA) strategies did not work. Alternative methods that do not rely on the outputs of the binary classifiers are a good option to explore as future work 73
Research opportunities Multi-Swarm PSMS for building ensembles Multiclass PSMS/EPSMS: – Bi-level optimization (i.e., individual and multiclass performance) – Learning to fuse the outputs (e.g., using genetic programming) Meta-ensembles 74
Research opportunities
Other search strategies for full model selection (e.g., genetic programming, GRASP, tabu search)
Other toolboxes (e.g., Weka)
Meta-learning + PSMS
Combination of different full model selection techniques 75
Applications Any classification problem where specific classifiers are required for each class 76
References
Isabelle Guyon. A Practical Guide to Model Selection. In Jeremie Marie, editor, Machine Learning Summer School 2008, Springer Texts in Statistics (slide from I. Guyon)
A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. GEMS: A System for Automated Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data. International Journal of Medical Informatics, 74(7-8), 2005
D. Gorissen, T. Dhaene, F. de Turck. Evolutionary Model Type Selection for Global Surrogate Modeling. Journal of Machine Learning Research, 10(Jul), 2009
Brazdil P., Giraud-Carrier C., Soares C., Vilalta R. Metalearning: Applications to Data Mining. Springer, 2009
H. J. Escalante. Towards a Particle Swarm Model Selection algorithm. Multi-level inference workshop and model selection game, NIPS, Whistler, BC, Canada, 2006
Isabelle Guyon, Amir Saffari, Hugo Jair Escalante, Gokhan Bakir, and Gavin Cawley. CLOP: a Matlab Learning Object Package. NIPS 2007 Demonstrations, Vancouver, British Columbia, Canada, 2007
H. J. Escalante, E. Sucar, M. Montes. Particle Swarm Model Selection. Journal of Machine Learning Research, 10(Feb):405–440, 2009
Hugo Jair Escalante, Jesus A. Gonzalez, Manuel Montes-y-Gomez, Pilar Gómez, Carolina Reta, Carlos A. Reyes, Alejandro Rosales. Acute Leukemia Classification with Ensemble Particle Swarm Model Selection. Artificial Intelligence in Medicine, 55(3), 2012
Hugo Jair Escalante, Manuel Montes, and Luis Villaseñor. Particle Swarm Model Selection for Authorship Verification. LNCS 5856, Springer, 2009
Questions?
EPSMS for acute leukemia classification Acute leukemia: a malignant disease that affects a considerable portion of the world population There are different types and subtypes of acute leukemia, requiring different treatments. 80
EPSMS for acute leukemia classification
Different types/subtypes:
– Acute Lymphocytic Leukemia (ALL): L1, L2, L3
– Acute Myelogenous Leukemia (AML): M0, M1, M2, M3, M4, M5, M6, M7
Considered tasks:
– Binary: ALL vs AML; L1 vs L2; M2 vs M3; M2 vs M5; M3 vs M5
– Multiclass: M2 vs M3 vs M5; L1 vs L2 vs M2 vs M3 vs M5 81
EPSMS for acute leukemia classification
Although there are advanced and precise methods to identify leukemia types, they are very expensive and unavailable in most hospitals of developing countries. A cheaper alternative: morphological acute leukemia classification from bone marrow cell images 82
EPSMS for acute leukemia classification
Morphological classification:
– Image registration
– Image segmentation (cell, nucleus and cytoplasm)
– Feature extraction (morphological, statistical, texture)
– Classifier construction
J. A. Gonzalez et al. Leukemia Identification from Bone Marrow Cells Images using a Machine Vision and Data Mining Strategy. Intelligent Data Analysis, 15(3):443–462
EPSMS for acute leukemia classification
In previous work, either the same classification model was used for the different problems, or models were manually selected by trial and error.
Proposal: use EPSMS to automatically select specific classifiers for the different tasks
Hugo Jair Escalante, et al. Acute Leukemia Classification with Ensemble Particle Swarm Model Selection. Artificial Intelligence in Medicine, 55(3), 2012 84
EPSMS for acute leukemia classification Experiments were performed with real data collected by a Mexican health institution (IMSS), using 10-fold CV 85
EPSMS for acute leukemia classification
In general, ensembles generated with EPSMS outperformed previous work. Models selected with EPSMS were more effective than those selected with straight PSMS 86
EPSMS for ALC - binary tasks 87
EPSMS for ALC – Multiclass
(Table: columns Measure, Region, Ref., PSMS, E1, E2, E3; tasks M2 vs M3 vs M5 and L1 vs L2 vs M2 vs M3 vs M5; measures Avg. AUC and Accuracy, each reported on the C and N&C regions) 88
Models selected with EPSMS 89
Lessons learned
– Models selected with PSMS / EPSMS outperformed those manually selected after a careful study
– Different settings of EPSMS outperformed both PSMS and the baseline, although we did not find a single best strategy for all of the tasks
– The multiclass classification performance was acceptable, although there is still much work to do 91