Feature Selection and Weighting using Genetic Algorithm for Off-line Character Recognition Systems
Presented by Faten Hussein
The University of British Columbia, Department of Electrical & Computer Engineering
Outline
Introduction & Problem Definition
Motivation & Objectives
System Overview
Results
Conclusions
Introduction: Off-line Character Recognition System
Pipeline: text document -> Scanning -> Pre-Processing -> Feature Extraction -> Classification -> Post-Processing -> classified text
Applications: address readers, bank cheque readers, reading data entered in forms (e.g. tax forms), detecting forged signatures
Introduction
For a typical handwritten recognition task:
Characters (symbols) vary widely in shape and size.
Different writers have different writing styles, and even the same person's writing style varies.
Thus, an unlimited number of variations exists for a single character.
Introduction
[Figure: variations in handwritten digits extracted from zip codes]
To overcome this diversity, a large number of features must be added.
Examples of features we used: moment invariants, number of loops (L), number of end points (E), centroid, area, circularity, and so on (e.g., sample digits labelled L=2, E=0; L=1, E=1; L=0, E=3).
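To make the feature list concrete, the following is a minimal sketch (not the thesis code) of how a few of these features could be computed from a binarized digit image, assuming OpenCV and NumPy; the function name and the loop-counting heuristic are illustrative assumptions.

```python
import cv2
import numpy as np

def extract_features(binary_img: np.ndarray) -> np.ndarray:
    """binary_img: 2-D uint8 array, foreground = 255, background = 0 (assumed)."""
    m = cv2.moments(binary_img, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()                  # 7 Hu moment invariants
    area = m["m00"]                                  # foreground pixel count
    cx, cy = (m["m10"] / area, m["m01"] / area) if area else (0.0, 0.0)  # centroid
    # OpenCV >= 4 returns (contours, hierarchy)
    contours, hierarchy = cv2.findContours(binary_img, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    perimeter = cv2.arcLength(contours[0], True) if contours else 0.0  # rough outline length
    circularity = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    # Inner contours (holes) approximate the number of loops (L) in the digit;
    # end points (E) would need skeletonization and are omitted here.
    loops = 0 if hierarchy is None else sum(1 for h in hierarchy[0] if h[3] != -1)
    return np.hstack([hu, [area, cx, cy, circularity, loops]])
```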
Character Recognition System Problem: a Dilemma
Adding more features accommodates variations in symbols and is hoped to increase classification accuracy.
But adding more features increases the problem size and the run time/memory needed for classification.
Feature design is an ad-hoc process that depends on experience and trial and error; it might add redundant/irrelevant features which decrease the accuracy.
Solution: Feature Selection
Definition: select a relevant subset of features from a larger set of features while maintaining or enhancing accuracy.
Advantages of feature selection:
Removes irrelevant and redundant features (e.g., a total of 40 features reduced to 16; of the 7 Hu moments only the first three kept; area removed as redundant with circularity).
Maintains or enhances classification accuracy (e.g., a 70% recognition rate using 40 features rose to 75% after FS using only 16 features).
Faster classification and lower memory requirements.
Feature Selection/Weighting
Assigning weights (binary or real-valued) to features requires a search algorithm to find the set of weights that yields the best classification accuracy (an optimization problem); the genetic algorithm is a good search method for optimization problems.
Feature Selection (FS): the special case; binary weights (0 for irrelevant/redundant features, 1 for relevant ones); with N features there are 2^N possible feature subset combinations.
Feature Weighting (FW): the general case; real-valued weights graded by feature relevance; with q discrete weight levels the search space grows to q^N weight combinations.
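For illustration only (assumed, not taken from the thesis), feature weighting can be folded directly into the 1-NN distance; feature selection is then just the special case of binary weights.

```python
import numpy as np

def weighted_nn_predict(train_X, train_y, query, weights):
    """Label of the nearest training sample under a weighted Euclidean distance."""
    diffs = (train_X - query) * np.sqrt(weights)   # scale each feature dimension
    dists = np.einsum("ij,ij->i", diffs, diffs)    # squared weighted distances
    return train_y[np.argmin(dists)]

fs_weights = np.array([1, 0, 1, 1, 0])              # FS: keep features 0, 2, 3 only
fw_weights = np.array([1.0, 0.25, 0.75, 1.0, 0.0])  # FW: graded relevance
```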
Genetic Feature Selection/Weighting: Why Use GA for FS/FW?
GA has proven to be a powerful search method for the FS problem.
It requires no derivative information or extra knowledge; only the objective function (the classifier's error rate) is needed to evaluate the quality of a feature subset.
It searches a population of solutions in parallel, so it can provide a number of potential solutions rather than only one.
GA is resistant to becoming trapped in local minima.
Objectives & Motivations
Build a genetic feature selection/weighting system to be applied to the character recognition problem and investigate the following issues:
Study the effect of varying weight values on the number of selected features (FS often eliminates more features than FW, but by how much?).
Compare the performance of genetic feature selection/weighting in the presence of irrelevant and redundant features (not studied before).
Compare the performance of genetic feature selection/weighting for regular cases (test the hypothesis that FW should perform at least as well as FS).
Evaluate the performance of the better method (GFS or GFW) in terms of optimality and time complexity (study the feasibility of genetic search with respect to optimality and run time).
Methodology
The recognition problem is to classify isolated handwritten digits.
Used a k-nearest-neighbour classifier (k = 1).
Used a genetic algorithm as the search method.
Applied genetic feature selection and weighting in the wrapper approach (i.e. the fitness function is based on the classifier's error rate).
Used two phases during the program run: a training/testing phase and a validation phase.
System Overview
Input (isolated handwritten digit images) -> Pre-Processing Module -> clean images -> Feature Extraction Module -> all N extracted features -> Feature Selection/Weighting Module (GA) <-> Evaluation Module (KNN classifier), which exchange candidate feature subsets and their assessments during training/testing -> best feature subset (M < N), evaluated in the validation phase.
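The sketch below condenses this loop into code as a rough, hypothetical illustration of the wrapper approach with GFS (chromosome = binary feature mask, fitness = 1-NN accuracy on a test split); the GA operators and parameter values shown are assumptions, not the settings used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_accuracy(mask, train_X, train_y, test_X, test_y):
    """Fitness of one chromosome: 1-NN accuracy using only the masked features."""
    if not mask.any():                      # empty feature set: worst fitness
        return 0.0
    tr, te = train_X[:, mask == 1], test_X[:, mask == 1]
    # 1-NN: label of the closest training sample for every test sample
    d = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(axis=2)
    pred = train_y[d.argmin(axis=1)]
    return (pred == test_y).mean()

def genetic_feature_selection(train_X, train_y, test_X, test_y,
                              pop_size=30, generations=50, p_mut=0.02):
    n = train_X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))          # random binary masks
    for _ in range(generations):
        fit = np.array([nn_accuracy(m, train_X, train_y, test_X, test_y)
                        for m in pop])
        # fitness-proportional selection, one-point crossover, bit-flip mutation
        probs = fit / fit.sum() if fit.sum() > 0 else np.full(pop_size, 1 / pop_size)
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        children = parents.copy()
        for i, c in enumerate(rng.integers(1, n, size=pop_size // 2)):
            a, b = 2 * i, 2 * i + 1
            children[a, c:], children[b, c:] = parents[b, c:], parents[a, c:]
        flip = rng.random(children.shape) < p_mut
        pop = np.where(flip, 1 - children, children)
    fit = np.array([nn_accuracy(m, train_X, train_y, test_X, test_y) for m in pop])
    return pop[fit.argmax()]                 # best feature mask found
```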
Results (Comparison 1): Effect of varying weight values on the number of selected features
As the number of allowed weight values increases, the probability of a feature having weight value 0 (POZ) decreases, so the number of eliminated features decreases.
GFS eliminates more features (and thus selects fewer) than GFW because of its smaller number of weight values (0/1), without compromising classification accuracy.
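As a purely illustrative back-of-the-envelope check (numbers assumed, not taken from the experiments): if each feature's weight is drawn uniformly from q allowed levels, POZ = 1/q, so the expected number of zero-weighted (eliminated) features shrinks as q grows.

```python
n_features = 40                      # assumed feature-set size, for illustration
for q in (2, 5, 10):                 # number of allowed weight levels
    print(f"{q} levels -> POZ = {1/q:.2f}, "
          f"expected zero-weight features = {n_features / q:.0f}")
# 2 levels -> POZ = 0.50, expected zero-weight features = 20
# 5 levels -> POZ = 0.20, expected zero-weight features = 8
# 10 levels -> POZ = 0.10, expected zero-weight features = 4
```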
Results (Comparison 2): Performance of genetic feature selection/weighting in the presence of irrelevant features
The performance of the 1-NN classifier degrades rapidly as the number of irrelevant features increases.
As the number of irrelevant features increases, FS outperforms all FW settings in both classification accuracy and feature elimination.
Results (Comparison 3): Performance of genetic feature selection/weighting in the presence of redundant features
The classification accuracy of 1-NN does not suffer much from added redundant features, but they increase the problem size.
As the number of redundant features increases, FS has slightly better classification accuracy than all FW settings, and significantly outperforms FW in feature elimination.
Results (Comparison 4): Performance of genetic feature selection/weighting for regular cases (not necessarily containing irrelevant/redundant features)
FW achieves better training accuracies than FS, but FS generalizes better (higher accuracies on unseen validation samples).
FW over-fits the training samples.
Results (Evaluation 1): Convergence of GFS to an optimal or near-optimal set of features
[Table: number of features vs. best exhaustive classification rate (%), best GA classification rate (%), and average GA rate over 5 runs]
GFS was able to return optimal or near-optimal values (those reached by the exhaustive search).
The worst average value obtained by GFS was less than 1% away from the optimal value.
Results (Evaluation 2): Convergence of GFS to an optimal or near-optimal set of features within an acceptable number of generations
[Table: number of features vs. best exhaustive result (optimal and near-optimal), exhaustive run time, best GA result, average GA result over 5 runs, number of generations, and GA run time for a single run; exhaustive run times grow from minutes to hours and days as the feature count rises, while single GA runs remain within minutes]
The time needed for GFS is bounded by a linear-fit curve (lower) and an exponential-fit curve (upper).
The use of GFS for highly dimensional problems needs parallel processing.
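Since each chromosome's fitness evaluation is independent, the parallelization mentioned above can be sketched (as an assumption, reusing the hypothetical nn_accuracy() helper from the system-overview sketch) by farming the evaluations out to a process pool.

```python
from functools import partial
from multiprocessing import Pool

def evaluate_population(pop, train_X, train_y, test_X, test_y, workers=8):
    """Score every chromosome (feature mask) in the population in parallel."""
    score = partial(nn_accuracy, train_X=train_X, train_y=train_y,
                    test_X=test_X, test_y=test_y)
    with Pool(processes=workers) as pool:
        return pool.map(score, list(pop))   # one fitness value per chromosome
```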
Conclusions
GFS is superior to GFW in feature reduction, without compromising classification accuracy.
In the presence of irrelevant features, GFS is better than GFW in both feature reduction and classification accuracy.
In the presence of redundant features, GFS is also preferred over GFW due to its greater ability to eliminate features.
For regular databases, it is advisable to use at most 2 or 3 weight values to avoid over-fitting.
GFS is a reliable method for finding optimal or near-optimal solutions, but needs parallel processing for large problem sizes.
Questions ?