Genetic-Algorithm-Based Instance and Feature Selection

1 Genetic-Algorithm-Based Instance and Feature Selection
Instance Selection and Construction for Data Mining, Ch. 6
H. Ishibuchi, T. Nakashima, and M. Nii

2 Abstract
A GA-based approach for selecting a small number of instances from a given data set in a pattern classification problem.
The aim is to improve the classification ability of the nearest neighbor classifier by searching for an appropriate reference set.

3 Genetic Algorithm
Coding
  Binary string of length (n + m), for n features and m instances
  ai: inclusion or exclusion of the i-th feature
  sp: inclusion or exclusion of the p-th instance
Fitness function
  Minimize |F|, minimize |P|, and maximize g(S)
  |F|: number of selected features
  |P|: number of selected instances
  g(S): classification performance
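The coding and the three objectives can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names `decode` and `fitness` are hypothetical, the layout (feature bits first, then instance bits) is an assumption, and the weighted-sum form of the fitness is assumed from the three objectives and the weights Wg, WF, and WP listed in the parameter specifications.

```python
from typing import List, Tuple


def decode(string: List[int], n: int, m: int) -> Tuple[List[int], List[int]]:
    """Split a binary string of length n + m into selected feature and instance indices."""
    if len(string) != n + m:
        raise ValueError("string must have length n + m")
    # Assumed layout: the first n bits are a_1..a_n, the last m bits are s_1..s_m.
    features = [i for i in range(n) if string[i] == 1]       # F
    instances = [p for p in range(m) if string[n + p] == 1]  # P
    return features, instances


def fitness(g_s: float, num_features: int, num_instances: int,
            w_g: float = 5.0, w_f: float = 1.0, w_p: float = 1.0) -> float:
    """Assumed weighted sum: reward g(S), penalize |F| and |P|."""
    return w_g * g_s - w_f * num_features - w_p * num_instances


# Toy string with n = 3 features and m = 4 instances.
string = [1, 0, 1, 0, 1, 1, 0]
F, P = decode(string, n=3, m=4)
score = fitness(g_s=4, num_features=len(F), num_instances=len(P))
```

With a larger Wg than WF and WP, classification performance dominates, and the two size penalties break ties in favor of smaller feature and reference sets.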

4 Genetic Algorithm
Performance measure (first one): gA(S)
  The number of correctly classified instances
  Minimize |P| subject to gA(S) = m
Performance measure (second one): gB(S)
  When an instance xq is included in the reference set, xq is not selected as its own nearest neighbor.
Fitness
  Combines g(S), |F|, and |P| using the weights Wg, WF, and WP
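The difference between the two performance measures can be sketched as follows. This is an illustration under stated assumptions: squared Euclidean distance over the selected features, a 1-nearest-neighbor rule, and hypothetical helper names (`sq_dist`, `count_correct`, `g_a`, `g_b`). An instance with no available reference instance is counted as not correctly classified, which is also an assumption.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], y: Sequence[float], features: List[int]) -> float:
    """Squared Euclidean distance restricted to the selected features."""
    return sum((x[i] - y[i]) ** 2 for i in features)


def count_correct(X, y, features, instances, exclude_self: bool) -> int:
    """Count training instances classified correctly by 1-NN on the reference set."""
    correct = 0
    for q in range(len(X)):
        # For gB, an instance may not act as its own nearest neighbor.
        candidates = [p for p in instances if not (exclude_self and p == q)]
        if not candidates:
            continue  # assumption: no reference instance means "not correct"
        nearest = min(candidates, key=lambda p: sq_dist(X[q], X[p], features))
        if y[nearest] == y[q]:
            correct += 1
    return correct


def g_a(X, y, features, instances) -> int:
    return count_correct(X, y, features, instances, exclude_self=False)


def g_b(X, y, features, instances) -> int:
    return count_correct(X, y, features, instances, exclude_self=True)


# Toy data: 4 instances, 2 features, 2 classes; select feature 0 and instances 0 and 2.
X = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]]
y = [0, 0, 1, 1]
score_a = g_a(X, y, features=[0], instances=[0, 2])
score_b = g_b(X, y, features=[0], instances=[0, 2])
```

In this toy case every instance is classified correctly under gA(S), since each reference instance trivially matches itself, while gB(S) is lower because instances 0 and 2 must be classified by the other reference instance.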

5 Genetic Algorithm
Initialization
Genetic Operation: Iterate the following procedure Npop/2 times to generate Npop strings
  Randomly select a pair of strings
  Apply a uniform crossover
  Apply a mutation operator
Generation Update: Select the Npop best strings from the 2Npop strings
Termination test
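The loop above can be sketched as follows. This is a minimal sketch rather than the authors' code: the function names are hypothetical, initialization is assumed to be uniformly random, the mutation here is an unbiased bit flip, and the termination test is assumed to be a fixed number of generations. Npop is assumed to be even so that Npop/2 pairs yield exactly Npop offspring.

```python
import random
from typing import Callable, List, Tuple


def uniform_crossover(a: List[int], b: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap each bit position between the two parents with probability 0.5."""
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return c1, c2


def mutate(s: List[int], pm: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability pm."""
    return [1 - bit if rng.random() < pm else bit for bit in s]


def evolve(fitness: Callable[[List[int]], float], length: int,
           n_pop: int = 50, generations: int = 500, pm: float = 0.01,
           seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Initialization (assumed): Npop random binary strings.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_pop)]
    for _ in range(generations):  # termination test: fixed generation count
        offspring = []
        for _ in range(n_pop // 2):
            a, b = rng.sample(population, 2)           # randomly select a pair
            for child in uniform_crossover(a, b, rng):
                offspring.append(mutate(child, pm, rng))
        # Generation update: keep the Npop best of the 2*Npop strings.
        merged = sorted(population + offspring, key=fitness, reverse=True)
        population = merged[:n_pop]
    return max(population, key=fitness)


# Toy run: prefer strings with few 1s.
best = evolve(lambda s: -sum(s), length=12, n_pop=10, generations=30, seed=1)
```

Because parents compete with their offspring in the generation update, the best fitness in the population never decreases from one generation to the next.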

6 Numerical Example

7 Biased Mutation
An effective way to decrease the number of selected instances is to bias the mutation probability.
In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0 than to the mutation from sp = 0 to sp = 1.
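A sketch of the biased mutation, assuming the string stores the n feature bits first and the instance bits after them. The function name `biased_mutation` is hypothetical; the default probabilities are the values given in the parameter specifications (0.01 for feature bits, 0.1 for instance bits going from 1 to 0, 0.01 for instance bits going from 0 to 1).

```python
import random
from typing import List


def biased_mutation(string: List[int], n: int, rng: random.Random,
                    pm_feature: float = 0.01,
                    pm_one_to_zero: float = 0.1,
                    pm_zero_to_one: float = 0.01) -> List[int]:
    """Flip feature bits with an unbiased probability and instance bits with a biased one."""
    out = []
    for idx, bit in enumerate(string):
        if idx < n:
            pm = pm_feature        # feature bit a_i: same probability in both directions
        elif bit == 1:
            pm = pm_one_to_zero    # instance bit s_p = 1: likely to be dropped
        else:
            pm = pm_zero_to_one    # instance bit s_p = 0: rarely added back
        out.append(1 - bit if rng.random() < pm else bit)
    return out


rng = random.Random(0)
mutated = biased_mutation([1, 0, 1, 1, 1, 0, 1], n=3, rng=rng)
```

Since removals are ten times more likely than additions with these values, repeated mutation pushes the reference set toward fewer instances, and selection keeps only the removals that do not hurt fitness.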

8 Data sets
2 artificial + 4 real
  Normal distribution with small overlap
  Normal distribution with large overlap
  Iris data
  Appendicitis data
  Cancer data
  Wine data

9 Parameter Specifications
Pop size: 50
Crossover prob.: 1.0
Mutation prob.:
  Pm = 0.01 for feature selection
  Pm(1 → 0) = 0.1 for instance selection
  Pm(0 → 1) = 0.01 for instance selection
Stopping condition: 500 generations
Weight values: Wg = 5; WF = 1; WP = 1
Performance measure: gA(S) or gB(S)
30 trials for each data set

10 Performance on Training Data

11 Performance on Test Data
Leaving-one-out procedure (iris & appendicitis)
10-fold cross-validation (cancer & wine)
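The two evaluation procedures differ only in how the data are split into training and test indices. A minimal sketch with hypothetical helper names follows; the shuffling and the way folds are formed are assumptions, not details taken from the slides.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def leave_one_out(m: int) -> Iterator[Split]:
    """Each instance is held out once; the other m - 1 form the training data."""
    for q in range(m):
        yield [p for p in range(m) if p != q], [q]


def k_fold(m: int, k: int = 10, seed: int = 0) -> Iterator[Split]:
    """Shuffle the indices, cut them into k folds, and hold out one fold at a time."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, folds[i]


loo_splits = list(leave_one_out(3))
cv_splits = list(k_fold(25, k=10))
```

In either procedure, the genetic algorithm would be run on the training indices only, and the selected features and reference set would then classify the held-out instances.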

12 Effect of Feature Selection

13 Effect on NN

14 Some Variants

