Download presentation
Presentation is loading. Please wait.
Published byNeil Bates Modified over 6 years ago
1
Genetic-Algorithm-Based Instance and Feature Selection
Instance Selection and Construction for Data Mining Ch. 6 H. Ishibuchi, T. Nakashima, and M. Nii
2
Abstract GA based approach for selecting a small number of instances from a given data set in a pattern classification problem. To improve the classification ability of our nearest neighbor classifier by searching for an appropriate reference set.
3
Genetic Algorithm Coding Fitness function
Binary string of the length (n+m) ai : inclusion or exclusion of the i-th feature sp : the inclusion or exclusion of the p-th instance Fitness function Minimize |F|, minimize |P|, and maximize g(S) |F| : number of selected feature |P| : number of selected instance g(S) : classification performance
4
Genetic Algorithm Performance measure (first one) : gA(S)
The number of correctly classified instances Minimize |P| subject to gA(S) = m Performance measure (second one) : gB(S) When an instance xq was included in the reference set, xq was not selected as its own nearest neighbor. fitness
5
Genetic Algorithm Initialization
Genetic Operation: Iterate the following procedure Npop/2 times to generate Npop string Randomly select a pair of strings Apply a uniform crossover Apply a mutation operator Generation Update: Select the Npop best string from 2Npop Termination test
6
Numerical Example
7
Biased Mutation For effectively decreasing the number of selected instances is to bias the mutation probability In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0.
8
Data sets 2 artificial + 4 real Normal distribution with small overlap
Normal distribution with large overlap Iris data Appendicitis Data Cancer Data Wine Data
9
Parameter Specifications
Pop Size : 50 Crossover Prob. : 1.0 Mutation Prob. Pm = 0.01 for feature selection Pm(1 0) = 0.1 for instance selection Pm(0 1) = 0.01 for instance selection Stopping condition : 500 gen. Weight values : Wg = 5; WF = 1; WP = 1 Performance measure : gA(S) or gB(S) 30 trials for each data
10
Performance on Training Data
11
Performance on Test Data
Leaving-one-out procedure (iris & appendicitis) 10-fold cross-validation (cancer & wine)
12
Effect of Feature Selection
13
Effect on NN
14
Some Variants
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.