Classifier Representation in LCS

Classifier Representation in LCS James Marshall and Tim Kovacs

Classifier Representations We are comparing traditional LCS representations with alternatives drawn from other classification algorithms, e.g. Artificial Immune Systems (AIS).

LCS Classifiers Classifier conditions in LCS are specified over a ternary alphabet and look like this: 00#1011##0. A classifier matches an instance if all its specified bits match; wildcards (#) match either 0 or 1, e.g.:
00#1011##0 (classifier)
0011011010 (instance)
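A minimal sketch (in Python, not from the slides) of this matching rule; the function name is illustrative:

def matches_hyperplane(condition: str, instance: str) -> bool:
    """A ternary condition matches a binary instance iff every non-# bit agrees."""
    assert len(condition) == len(instance)
    return all(c == "#" or c == b for c, b in zip(condition, instance))

# The slide's example: 00#1011##0 matches 0011011010
print(matches_hyperplane("00#1011##0", "0011011010"))  # True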

LCS Classifiers So, a classifier matches the instances lying on a d-dimensional hyperplane, where d is the number of #s in its condition. Classifiers specify an action as well as a condition; in classification, this can be a predicted class for matched instances: 00#1011##0:1

AIS Classifiers Hyperplanes are not the only possible shape. An obvious alternative classifier representation comes from a common AIS representation: a classifier matches an instance if the Hamming distance between them is below a threshold, i.e. classifiers are hyperspheres of a given radius.
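A corresponding sketch of the AIS-style rule, again with illustrative names; whether the bound is strict or inclusive is a detail (<= is used here):

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def matches_hypersphere(centre: str, radius: int, instance: str) -> bool:
    """An instance matches iff it lies inside the Hamming ball around the centre."""
    return hamming(centre, instance) <= radius

print(matches_hypersphere("0011011010", 1, "0011011110"))  # True (distance 1)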

Representation Comparison Q: Apart from the obvious differences in calculating matches, how do LCS and AIS representations differ? A: quite a lot. The number of instances a classifier covers scales differently with classifier size, and the size of the classifier search space is substantially different.

Instance Coverage Hyperplane coverage varies with dimension d: a d-dimensional hyperplane covers 2^d instances. Hypersphere coverage varies with problem size n and radius r: a hypersphere covers Σ_{i=0..r} C(n, i) instances.
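These counts follow from standard combinatorics; a quick sketch to check them (hypersphere_coverage counts the strings within Hamming radius r of a centre):

from math import comb

def hyperplane_coverage(d: int) -> int:
    return 2 ** d                                   # free choices for the # positions

def hypersphere_coverage(n: int, r: int) -> int:
    return sum(comb(n, i) for i in range(r + 1))    # centre plus all flips of up to r bits

print(hyperplane_coverage(2))       # 4
print(hypersphere_coverage(11, 1))  # 12 = 1 centre + 11 one-bit flips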

Instance Coverage

Classifier Search Space The number of possible hyperspheres changes with problem size but is the same for any given radius: 2^n (one per possible centre). The number of possible hyperplanes changes with dimension and problem size: C(n, d)·2^(n−d) hyperplanes of dimension d, and 3^n in total.

Classifier Search Space N.B. as n increases, 2^n / 3^n → 0, i.e. the hypersphere search space is much smaller than the hyperplane search space.
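A tiny numeric check of this claim, assuming the counts above (2^n hypersphere centres per radius versus 3^n ternary conditions):

for n in (6, 11, 20):
    print(n, 2 ** n, 3 ** n, 2 ** n / 3 ** n)   # the ratio shrinks as n grows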

Comparing Classifier Performance on Multiplexers

Multiplexers A longstanding testbed for LCS. Instances consist of address bits and data bits; the instance class is given by the value of the addressed data bit. Typical multiplexer sizes are 6 (2 + 2^2) and 11 (3 + 2^3), e.g. the 11-multiplexer instance 010 00101001 (address 010 selects data bit 2).
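A minimal sketch of the k-address-bit multiplexer (6-MUX: k = 2, 11-MUX: k = 3); the function name is illustrative:

def multiplexer(instance: str, k: int) -> int:
    """Return the data bit selected by the first k (address) bits."""
    address = int(instance[:k], 2)
    return int(instance[k + address])

# The slide's 11-multiplexer example: address 010 selects data bit 2
print(multiplexer("01000101001", 3))  # 1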

Proofs It’s easy to prove the following theorems for the multiplexer:
1. 100% accurate hyperplanes are always possible
2. 100% accurate hyperspheres are never possible
3. Hyperspheres must be paired and have specificity to be 100% accurate
4. Hyperspheres must have variable radius to avoid ambiguity
Proposition: more hyperspheres than hyperplanes are required to accurately classify the instance space

Enumeration of Classifiers The 11-multiplexer is small enough to enumerate classifiers and look at the accuracy distribution, i.e. measure the percentage of instances covered by a classifier that belong to the same (majority) class. Let’s do this just for the smallest classifiers of comparable size that generalise (i.e. dimension 2, or radius 1)…
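A sketch of how such an enumeration might look for the dimension-2 hyperplanes (not the authors' code; it builds the four instances each condition matches and tallies the majority-class accuracy):

from collections import Counter
from itertools import combinations, product

N, K = 11, 3  # 11-multiplexer: 3 address bits, 8 data bits

def multiplexer(instance: str, k: int = K) -> int:
    return int(instance[k + int(instance[:k], 2)])

accuracy_counts = Counter()
for wild in combinations(range(N), 2):                # positions of the two #s
    fixed = [i for i in range(N) if i not in wild]
    for values in product("01", repeat=N - 2):        # the specified bits
        matched = []
        for wvals in product("01", repeat=2):         # the instances this condition matches
            bits = [""] * N
            for i, v in zip(fixed, values):
                bits[i] = v
            for i, v in zip(wild, wvals):
                bits[i] = v
            matched.append("".join(bits))
        majority = Counter(multiplexer(x) for x in matched).most_common(1)[0][1]
        accuracy_counts[majority / len(matched)] += 1

print(accuracy_counts)  # distribution of accuracies over all 2-d hyperplanes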

Enumeration of Classifiers N.B. 100% accurate classifiers are the mode for 2-dimensional hyperplanes, while no 100% accurate hyperspheres exist… …as predicted by theorems 1 and 2.

Enumeration of Classifiers For 4-d hyperplanes, 75% accurate classifiers are the mode, and ~25% of all classifiers are 100% accurate. Could this help explain Tim’s result* on the effectiveness of selection and reinforcement of randomly generated rules (i.e. no GA rule exploration)? *Kovacs & Kerber. GECCO 2004, LNCS 3103, 785-796

XCSphere We extended an existing XCS implementation to use hyperspheres instead of hyperplanes: restrict to a binary alphabet instead of ternary; match via the rule Hamming distance < radius; generalisation of hyperspheres is a proper-superset condition that is easy to evaluate.
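One way the proper-superset (generalisation) check could be evaluated, sketched here via the triangle inequality; this is a sufficient containment test and may not be the exact rule used:

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def sphere_subsumes(centre_a: str, radius_a: int, centre_b: str, radius_b: int) -> bool:
    # if the centres are within radius_a - radius_b of each other,
    # every instance inside sphere B is also inside sphere A
    return hamming(centre_a, centre_b) + radius_b <= radius_a

print(sphere_subsumes("0000000", 3, "0000011", 1))  # True: 2 + 1 <= 3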

Evaluation Results on the 11-multiplexer [figure: performance curves for XCS vs. XCSphere]

Comparing Classifier Performance on Hypersphere Function

Hypersphere Function We devised a new function whose most efficient representation is with hyperspheres: given a boolean string of odd length, assign class 0 to all instances closest to the all-0s string, and class 1 to all other instances.
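A minimal sketch of this function; for odd-length strings there can be no ties:

def hypersphere_function(instance: str) -> int:
    """Class 0 iff the instance is closer to the all-0s string than to the all-1s string."""
    n = len(instance)
    assert n % 2 == 1, "odd length rules out ties"
    zeros = instance.count("0")        # = Hamming distance to the all-1s string
    return 0 if zeros > n - zeros else 1

print(hypersphere_function("0010001"))  # 0 (closer to 0000000)
print(hypersphere_function("1110101"))  # 1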

Evaluation Results on the hypersphere function [figure: performance curves for XCS vs. XCSphere]

XCSphere: Multiple Representation XCS

Competing Representations Competition between overlapping classifiers is intense in XCS. We can exploit this to implement a hybrid XCS with both hyperplane and hypersphere classifiers: seed the initial population with 50% of each, and do the same during covering. Sphere and plane classifiers can’t recombine, so they behave like different species.
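A hedged sketch of how covering might pick a representation uniformly at random in such a hybrid; the names, the wildcard probability, and the radius range are illustrative, not the authors' settings:

import random

def cover(instance: str, action: int) -> dict:
    """Create a classifier guaranteed to match the given instance."""
    if random.random() < 0.5:
        # hyperplane: keep each bit or generalise it to '#'
        condition = "".join("#" if random.random() < 0.33 else b for b in instance)
        return {"repr": "hyperplane", "condition": condition, "action": action}
    # hypersphere: centre on the instance with a small random radius
    return {"repr": "hypersphere", "centre": instance,
            "radius": random.randint(1, 3), "action": action}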

Evaluation Results for XCSphere [figures: multiplexer and hypersphere function performance curves]

XCSphere Results XCSphere achieved generally better performance across all three problems than the representation-specific XCS versions. XCSphere was slower to converge on the multiplexer than XCS with hyperplanes… …but there is weak evidence that XCSphere was faster to converge on the hypersphere function than XCS with hyperspheres.

Summary Hybrid representations in a single classifier system: A useful way to mitigate representational bias? Possibility of evolving representations?