Genetic-Algorithm-Based Instance and Feature Selection


Genetic-Algorithm-Based Instance and Feature Selection
Instance Selection and Construction for Data Mining, Ch. 6
H. Ishibuchi, T. Nakashima, and M. Nii

Abstract
A GA-based approach for selecting a small number of instances from a given data set in a pattern classification problem.
The goal is to improve the classification ability of a nearest neighbor classifier by searching for an appropriate reference set.

Genetic Algorithm: Coding and Fitness Function
Coding: binary string of length (n + m), for n features and m instances
ai: inclusion or exclusion of the i-th feature
sp: inclusion or exclusion of the p-th instance
Fitness function: minimize |F|, minimize |P|, and maximize g(S)
|F|: number of selected features
|P|: number of selected instances
g(S): classification performance
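
The coding and the three objectives can be illustrated with a short Python sketch. The weighted-sum form of the fitness is an assumption, inferred from the weight values Wg, WF, and WP listed under Parameter Specifications; the slide does not show the formula. The function names `decode` and `fitness` are hypothetical.

```python
from typing import List, Tuple


def decode(chromosome: List[int], n: int, m: int) -> Tuple[List[int], List[int]]:
    """Split a 0/1 string of length n + m into selected feature and instance indices."""
    if len(chromosome) != n + m:
        raise ValueError("chromosome must have length n + m")
    # The first n bits are a_1..a_n (features), the last m bits are s_1..s_m (instances).
    features = [i for i in range(n) if chromosome[i] == 1]
    instances = [p for p in range(m) if chromosome[n + p] == 1]
    return features, instances


def fitness(g_s: float, num_features: int, num_instances: int,
            w_g: float = 5.0, w_f: float = 1.0, w_p: float = 1.0) -> float:
    """Assumed weighted sum: reward g(S), penalize |F| and |P|."""
    return w_g * g_s - w_f * num_features - w_p * num_instances


if __name__ == "__main__":
    F, P = decode([1, 0, 1, 0, 1, 1, 0], n=3, m=4)
    print(F, P, fitness(g_s=4, num_features=len(F), num_instances=len(P)))
```

With this form, a larger Wg makes the search prefer classification performance over compactness of the selected sets.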

Genetic Algorithm: Performance Measures
First performance measure, gA(S): the number of correctly classified instances
Minimize |P| subject to gA(S) = m
Second performance measure, gB(S): when an instance xq is included in the reference set, xq is not selected as its own nearest neighbor
Fitness: [formula not preserved in the transcript]
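
The difference between the two measures can be sketched in Python as follows. This is one reading of the slide, not the authors' code: both measures count correctly classified instances with a 1-nearest-neighbor rule over the selected features, and gB additionally forbids an instance from matching itself. Squared Euclidean distance, the tie-breaking rule (first candidate wins), and all function names are assumptions.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], y: Sequence[float], features: List[int]) -> float:
    """Squared Euclidean distance restricted to the selected features."""
    return sum((x[i] - y[i]) ** 2 for i in features)


def count_correct(X, y, features, instances, exclude_self: bool) -> int:
    """Count instances whose nearest reference instance has the same label."""
    correct = 0
    for q in range(len(X)):
        candidates = [p for p in instances if not (exclude_self and p == q)]
        if not candidates:
            continue  # no usable reference instance: counted as misclassified
        nearest = min(candidates, key=lambda p: sq_dist(X[q], X[p], features))
        if y[nearest] == y[q]:
            correct += 1
    return correct


def g_a(X, y, features, instances) -> int:
    """First measure: a selected instance may be its own nearest neighbor."""
    return count_correct(X, y, features, instances, exclude_self=False)


def g_b(X, y, features, instances) -> int:
    """Second measure: a selected instance is never matched with itself."""
    return count_correct(X, y, features, instances, exclude_self=True)


if __name__ == "__main__":
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = [0, 0, 1, 1]
    print(g_a(X, y, [0], [0, 2]), g_b(X, y, [0], [0, 2]))
```

Under gA every selected instance classifies itself correctly, so gA can overstate how well the reference set generalizes; gB removes that effect.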

Genetic Algorithm: Procedure
Initialization
Genetic operation: iterate the following procedure Npop/2 times to generate Npop strings
  Randomly select a pair of strings
  Apply uniform crossover
  Apply a mutation operator
Generation update: select the Npop best strings from the 2·Npop strings (current and newly generated)
Termination test
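
The procedure can be sketched as a small Python loop. This is a minimal illustration under stated assumptions: parents are drawn uniformly at random without replacement, the initial population is uniformly random, the termination test is a fixed generation count, and a single mutation probability `pm` is used for every bit. The names `run_ga`, `uniform_crossover`, and `mutate` are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def uniform_crossover(a: List[int], b: List[int],
                      rng: random.Random) -> Tuple[List[int], List[int]]:
    """Each position is swapped between the two parents with probability 0.5."""
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return c1, c2


def mutate(s: List[int], pm: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability pm."""
    return [1 - bit if rng.random() < pm else bit for bit in s]


def run_ga(fitness: Callable[[List[int]], float], length: int, n_pop: int,
           generations: int, pm: float, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Initialization: n_pop random binary strings.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_pop)]
    for _ in range(generations):  # termination test: fixed number of generations
        offspring = []
        for _ in range(n_pop // 2):  # each iteration yields two new strings
            a, b = rng.sample(population, 2)
            c1, c2 = uniform_crossover(a, b, rng)
            offspring.append(mutate(c1, pm, rng))
            offspring.append(mutate(c2, pm, rng))
        # Generation update: keep the n_pop best of parents plus offspring.
        merged = sorted(population + offspring, key=fitness, reverse=True)
        population = merged[:n_pop]
    return population[0]


if __name__ == "__main__":
    # Toy objective (count of ones), used only to exercise the loop.
    print(run_ga(fitness=sum, length=12, n_pop=10, generations=30, pm=0.05))
```

Because the update keeps the best strings from the merged pool, the best fitness in the population never decreases from one generation to the next.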

Numerical Example

Biased Mutation
An effective way to decrease the number of selected instances is to bias the mutation probability.
In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0 than to the mutation from sp = 0 to sp = 1.
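
A minimal Python sketch of such an operator follows. It assumes the chromosome layout described earlier (n feature bits followed by the instance bits) and uses the probabilities from the Parameter Specifications slide as defaults; the function name `biased_mutation` is hypothetical.

```python
import random
from typing import List, Optional


def biased_mutation(chromosome: List[int], n: int,
                    pm_feature: float = 0.01,
                    pm_one_to_zero: float = 0.1,
                    pm_zero_to_one: float = 0.01,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Mutate feature bits symmetrically and instance bits with a bias toward 0."""
    rng = rng or random.Random()
    result = []
    for idx, bit in enumerate(chromosome):
        if idx < n:
            p = pm_feature        # feature bit a_i: unbiased mutation
        elif bit == 1:
            p = pm_one_to_zero    # instance bit s_p = 1: removal is more likely
        else:
            p = pm_zero_to_one    # instance bit s_p = 0: insertion is less likely
        result.append(1 - bit if rng.random() < p else bit)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    s = [1, 1] + [1] * 20
    for _ in range(50):
        s = biased_mutation(s, n=2, rng=rng)
    print(sum(s[2:]), "of 20 instances still selected")
```

With the default rates, a selected instance is ten times more likely to be dropped than an unselected one is to be added, so mutation alone pushes |P| downward.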

Data Sets
2 artificial + 4 real
Artificial: normal distribution with small overlap; normal distribution with large overlap
Real: iris data, appendicitis data, cancer data, wine data

Parameter Specifications
Population size: 50
Crossover probability: 1.0
Mutation probability:
  Pm = 0.01 for feature selection
  Pm(1 → 0) = 0.1 for instance selection
  Pm(0 → 1) = 0.01 for instance selection
Stopping condition: 500 generations
Weight values: Wg = 5; WF = 1; WP = 1
Performance measure: gA(S) or gB(S)
30 trials for each data set

Performance on Training Data

Performance on Test Data
Leave-one-out procedure (iris and appendicitis)
10-fold cross-validation (cancer and wine)
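
The two evaluation schemes differ only in how the data are split, which the following Python sketch illustrates. It is a generic illustration, not the authors' experimental code: the shuffling, the seed, and the function names are assumptions. In each split the GA would select features and instances from the training indices only, and the resulting nearest neighbor classifier would then be scored on the held-out indices.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def k_fold_splits(num_instances: int, k: int, seed: int = 0) -> List[Split]:
    """Partition shuffled indices into k folds; each fold is held out once."""
    if not 2 <= k <= num_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)
    splits = []
    for fold in range(k):
        test = sorted(indices[fold::k])
        held_out = set(test)
        train = [i for i in range(num_instances) if i not in held_out]
        splits.append((train, test))
    return splits


def leave_one_out_splits(num_instances: int) -> List[Split]:
    """Hold out exactly one instance per split (k-fold with k = num_instances)."""
    return [([i for i in range(num_instances) if i != q], [q])
            for q in range(num_instances)]


if __name__ == "__main__":
    for train, test in k_fold_splits(25, k=10):
        print(len(train), test)
```

Leave-one-out needs one GA run per instance, which is affordable only for small data sets; 10-fold cross-validation needs ten runs regardless of data set size.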

Effect of Feature Selection

Effect on NN

Some Variants