Feature Selection and Weighting using Genetic Algorithm for Off-line Character Recognition Systems
Presented by Faten Hussein
The University of British Columbia, Department of Electrical & Computer Engineering
Outline
Introduction & Problem Definition
Motivation & Objectives
System Overview
Results
Conclusions
Introduction: Off-line Character Recognition System
Pipeline: text document -> Scanning -> Pre-Processing -> Feature Extraction -> Classification -> Post-Processing -> classified text
Applications: address readers, bank cheque readers, reading data entered in forms (e.g. tax forms), detecting forged signatures
Introduction
For a typical handwritten recognition task:
Characters (symbols) vary widely in shape and size.
Different writers have different writing styles, and even the same person's writing style varies.
Thus, an unlimited number of variations exists for a single character.
Introduction
[Figure: variations in handwritten digits extracted from zip codes]
To overcome this diversity, a large number of features must be added.
Examples of features we used: moment invariants, number of loops (L), number of end points (E), centroid, area, circularity, and so on (e.g., sample digits labelled L=2, E=0; L=1, E=1; L=0, E=3).
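To make the feature list concrete, the following is a minimal sketch (not the thesis code) of how a few of these features could be computed from a binarized digit image, assuming OpenCV and NumPy; the function name and the loop-counting heuristic are illustrative assumptions.

```python
import cv2
import numpy as np

def extract_features(binary_img: np.ndarray) -> np.ndarray:
    """binary_img: 2-D uint8 array, foreground = 255, background = 0 (assumed)."""
    m = cv2.moments(binary_img, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()                  # 7 Hu moment invariants
    area = m["m00"]                                  # foreground pixel count
    cx, cy = (m["m10"] / area, m["m01"] / area) if area else (0.0, 0.0)  # centroid
    # OpenCV >= 4 returns (contours, hierarchy)
    contours, hierarchy = cv2.findContours(binary_img, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    perimeter = cv2.arcLength(contours[0], True) if contours else 0.0  # rough outline length
    circularity = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    # Inner contours (holes) approximate the number of loops (L) in the digit;
    # end points (E) would need skeletonization and are omitted here.
    loops = 0 if hierarchy is None else sum(1 for h in hierarchy[0] if h[3] != -1)
    return np.hstack([hu, [area, cx, cy, circularity, loops]])
```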
Character Recognition System Problem: a Dilemma
Adding more features accommodates variations in symbols and is hoped to increase classification accuracy.
But adding more features increases the problem size and the run time/memory needed for classification.
Feature design is an ad-hoc process that depends on experience and trial and error; it might add redundant/irrelevant features which decrease the accuracy.
Solution: Feature Selection
Definition: select a relevant subset of features from a larger set of features while maintaining or enhancing accuracy.
Advantages of feature selection:
Removes irrelevant and redundant features (e.g., a total of 40 features reduced to 16; of the 7 Hu moments only the first three kept; area removed as redundant with circularity).
Maintains or enhances classification accuracy (e.g., a 70% recognition rate using 40 features rose to 75% after FS using only 16 features).
Faster classification and lower memory requirements.
Feature Selection/Weighting
Assigning weights (binary or real-valued) to features requires a search algorithm to find the set of weights that yields the best classification accuracy (an optimization problem); the genetic algorithm is a good search method for optimization problems.
Feature Selection (FS): the special case; binary weights (0 for irrelevant/redundant features, 1 for relevant ones); with N features there are 2^N possible feature subset combinations.
Feature Weighting (FW): the general case; real-valued weights graded by feature relevance; with q discrete weight levels the search space grows to q^N weight combinations.
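For illustration only (assumed, not taken from the thesis), feature weighting can be folded directly into the 1-NN distance; feature selection is then just the special case of binary weights.

```python
import numpy as np

def weighted_nn_predict(train_X, train_y, query, weights):
    """Label of the nearest training sample under a weighted Euclidean distance."""
    diffs = (train_X - query) * np.sqrt(weights)   # scale each feature dimension
    dists = np.einsum("ij,ij->i", diffs, diffs)    # squared weighted distances
    return train_y[np.argmin(dists)]

fs_weights = np.array([1, 0, 1, 1, 0])              # FS: keep features 0, 2, 3 only
fw_weights = np.array([1.0, 0.25, 0.75, 1.0, 0.0])  # FW: graded relevance
```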
Genetic Feature Selection/Weighting: Why Use GA for FS/FW?
GA has proven to be a powerful search method for the FS problem.
It requires no derivative information or extra knowledge; only the objective function (the classifier's error rate) is needed to evaluate the quality of a feature subset.
It searches a population of solutions in parallel, so it can provide a number of potential solutions rather than only one.
GA is resistant to becoming trapped in local minima.
Objectives & Motivations
Build a genetic feature selection/weighting system to be applied to the character recognition problem and investigate the following issues:
Study the effect of varying weight values on the number of selected features (FS often eliminates more features than FW, but by how much?).
Compare the performance of genetic feature selection/weighting in the presence of irrelevant and redundant features (not studied before).
Compare the performance of genetic feature selection/weighting for regular cases (test the hypothesis that FW should perform at least as well as FS).
Evaluate the performance of the better method (GFS or GFW) in terms of optimality and time complexity (study the feasibility of genetic search with respect to optimality and run time).
Methodology
The recognition problem is to classify isolated handwritten digits.
Used a k-nearest-neighbour classifier (k = 1).
Used a genetic algorithm as the search method.
Applied genetic feature selection and weighting in the wrapper approach (i.e. the fitness function is based on the classifier's error rate).
Used two phases during the program run: a training/testing phase and a validation phase.
System Overview
Input (isolated handwritten digit images) -> Pre-Processing Module -> clean images -> Feature Extraction Module -> all N extracted features -> Feature Selection/Weighting Module (GA) <-> Evaluation Module (KNN classifier), which exchange candidate feature subsets and their assessments during training/testing -> best feature subset (M < N), evaluated in the validation phase.
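The sketch below condenses this loop into code as a rough, hypothetical illustration of the wrapper approach with GFS (chromosome = binary feature mask, fitness = 1-NN accuracy on a test split); the GA operators and parameter values shown are assumptions, not the settings used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_accuracy(mask, train_X, train_y, test_X, test_y):
    """Fitness of one chromosome: 1-NN accuracy using only the masked features."""
    if not mask.any():                      # empty feature set: worst fitness
        return 0.0
    tr, te = train_X[:, mask == 1], test_X[:, mask == 1]
    # 1-NN: label of the closest training sample for every test sample
    d = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(axis=2)
    pred = train_y[d.argmin(axis=1)]
    return (pred == test_y).mean()

def genetic_feature_selection(train_X, train_y, test_X, test_y,
                              pop_size=30, generations=50, p_mut=0.02):
    n = train_X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))          # random binary masks
    for _ in range(generations):
        fit = np.array([nn_accuracy(m, train_X, train_y, test_X, test_y)
                        for m in pop])
        # fitness-proportional selection, one-point crossover, bit-flip mutation
        probs = fit / fit.sum() if fit.sum() > 0 else np.full(pop_size, 1 / pop_size)
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        children = parents.copy()
        for i, c in enumerate(rng.integers(1, n, size=pop_size // 2)):
            a, b = 2 * i, 2 * i + 1
            children[a, c:], children[b, c:] = parents[b, c:], parents[a, c:]
        flip = rng.random(children.shape) < p_mut
        pop = np.where(flip, 1 - children, children)
    fit = np.array([nn_accuracy(m, train_X, train_y, test_X, test_y) for m in pop])
    return pop[fit.argmax()]                 # best feature mask found
```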
Results (Comparison 1): Effect of varying weight values on the number of selected features
As the number of allowed weight values increases, the probability of a feature having weight value 0 (POZ) decreases, so the number of eliminated features decreases.
GFS eliminates more features (and thus selects fewer) than GFW because of its smaller number of weight values (0/1), without compromising classification accuracy.
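As a purely illustrative back-of-the-envelope check (numbers assumed, not taken from the experiments): if each feature's weight is drawn uniformly from q allowed levels, POZ = 1/q, so the expected number of zero-weighted (eliminated) features shrinks as q grows.

```python
n_features = 40                      # assumed feature-set size, for illustration
for q in (2, 5, 10):                 # number of allowed weight levels
    print(f"{q} levels -> POZ = {1/q:.2f}, "
          f"expected zero-weight features = {n_features / q:.0f}")
# 2 levels -> POZ = 0.50, expected zero-weight features = 20
# 5 levels -> POZ = 0.20, expected zero-weight features = 8
# 10 levels -> POZ = 0.10, expected zero-weight features = 4
```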
Results (Comparison 2): Performance of genetic feature selection/weighting in the presence of irrelevant features
The performance of the 1-NN classifier degrades rapidly as the number of irrelevant features increases.
As the number of irrelevant features increases, FS outperforms all FW settings in both classification accuracy and feature elimination.
Results (Comparison 3): Performance of genetic feature selection/weighting in the presence of redundant features
The classification accuracy of 1-NN does not suffer much from added redundant features, but they increase the problem size.
As the number of redundant features increases, FS has slightly better classification accuracy than all FW settings, and significantly outperforms FW in feature elimination.
Results (Comparison 4): Performance of genetic feature selection/weighting for regular cases (not necessarily containing irrelevant/redundant features)
FW achieves better training accuracies than FS, but FS generalizes better (higher accuracies on unseen validation samples).
FW over-fits the training samples.
Results (Evaluation 1): Convergence of GFS to an optimal or near-optimal set of features
[Table: number of features vs. best exhaustive classification rate (%), best GA classification rate (%), and average GA rate over 5 runs]
GFS was able to return optimal or near-optimal values (those reached by the exhaustive search).
The worst average value obtained by GFS was less than 1% away from the optimal value.
Results (Evaluation 2): Convergence of GFS to an optimal or near-optimal set of features within an acceptable number of generations
[Table: number of features vs. best exhaustive result (optimal and near-optimal), exhaustive run time, best GA result, average GA result over 5 runs, number of generations, and GA run time for a single run; exhaustive run times grow from minutes to hours and days as the feature count rises, while single GA runs remain within minutes]
The time needed for GFS is bounded by a linear-fit curve (lower) and an exponential-fit curve (upper).
The use of GFS for highly dimensional problems needs parallel processing.
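Since each chromosome's fitness evaluation is independent, the parallelization mentioned above can be sketched (as an assumption, reusing the hypothetical nn_accuracy() helper from the system-overview sketch) by farming the evaluations out to a process pool.

```python
from functools import partial
from multiprocessing import Pool

def evaluate_population(pop, train_X, train_y, test_X, test_y, workers=8):
    """Score every chromosome (feature mask) in the population in parallel."""
    score = partial(nn_accuracy, train_X=train_X, train_y=train_y,
                    test_X=test_X, test_y=test_y)
    with Pool(processes=workers) as pool:
        return pool.map(score, list(pop))   # one fitness value per chromosome
```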
Conclusions
GFS is superior to GFW in feature reduction, without compromising classification accuracy.
In the presence of irrelevant features, GFS is better than GFW in both feature reduction and classification accuracy.
In the presence of redundant features, GFS is also preferred over GFW due to its greater ability to eliminate features.
For regular databases, it is advisable to use at most 2 or 3 weight values to avoid over-fitting.
GFS is a reliable method for finding optimal or near-optimal solutions, but needs parallel processing for large problem sizes.
Questions ?