Classifier Representation in LCS James Marshall and Tim Kovacs
Classifier Representations We compare traditional LCS representations with alternatives drawn from other classification algorithms, e.g. Artificial Immune Systems (AIS)
LCS Classifiers Classifier conditions in LCS are specified over a ternary alphabet {0, 1, #} and look like this: 00#1011##0. A classifier matches an instance if all its bits match, apart from wildcards (#), which match either 0 or 1, e.g.: 00#1011##0 (classifier) 0011011010 (instance)
LCS Classifiers So, a classifier matches instances on a d-dimensional hyperplane, where d is the number of #s in the condition. Classifiers specify an action as well as a condition; in classification, this can be a predicted class for matched instances: 00#1011##0:1
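The ternary matching rule above can be sketched as follows (a minimal illustration; the function name is ours, not taken from any particular LCS implementation):

```python
def plane_matches(condition: str, instance: str) -> bool:
    # a ternary condition matches when every non-# bit equals the
    # corresponding instance bit; '#' is a wildcard matching 0 or 1
    return all(c == "#" or c == b for c, b in zip(condition, instance))

# the example from the slide: 00#1011##0 matches 0011011010
print(plane_matches("00#1011##0", "0011011010"))  # True
```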
AIS Classifiers Hyperplanes are not the only shape. An obvious alternative classifier representation comes from one AIS representation: a classifier matches an instance if the Hamming distance between them is below a threshold, i.e. hyperspheres of a given radius
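The hypersphere matching rule can be sketched like this (our own sketch, assuming distance ≤ radius, so that a radius-1 sphere covers more than just its centre, consistent with the enumeration slides below):

```python
def sphere_matches(centre: str, radius: int, instance: str) -> bool:
    # an instance matches when its Hamming distance to the centre
    # string is within the radius
    distance = sum(a != b for a, b in zip(centre, instance))
    return distance <= radius

print(sphere_matches("0011011010", 1, "0011011110"))  # True: distance 1
print(sphere_matches("0011011010", 1, "0011010110"))  # False: distance 2
```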
Representation Comparison Q: Apart from the obvious differences in calculating matches, how do the LCS and AIS representations differ? A: quite a lot. The number of instances covered by a classifier changes in different ways with size, and the size of the classifier search space is substantially different
Instance Coverage Hyperplane coverage varies with dimension d: 2^d instances. Hypersphere coverage varies with problem size n and radius r: Σ_{i=0..r} C(n, i) instances
Instance Coverage [figure: coverage plots]
Classifier Search Space The number of possible hyperspheres changes with problem size n, but is constant for any given radius: 2^n (one per centre string). The number of possible hyperplanes changes with dimension d and problem size n: C(n, d) · 2^(n−d), i.e. 3^n over all dimensions
Classifier Search Space N.B. as n increases i.e. hypersphere search space much smaller than hyperplane search space
Comparing Classifier Performance on Multiplexers
Multiplexers A longstanding testbed for LCS. Instances consist of address bits and data bits; the instance class is given by the value of the addressed data bit. Typical multiplexer sizes used are 6 (2 + 2^2) and 11 (3 + 2^3), e.g. the 11-multiplexer instance 010 00101001, where address 010 selects data bit 2, giving class 1
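The multiplexer function can be sketched directly from this definition (data bits indexed from 0, as in the slide's example):

```python
def multiplexer(instance: str, k: int) -> int:
    # the first k address bits select one of the 2**k data bits;
    # the value of that data bit is the instance's class
    address = int(instance[:k], 2)
    return int(instance[k + address])

# the 11-multiplexer example from the slide: address 010 selects data bit 2
print(multiplexer("010" + "00101001", 3))  # 1
```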
Proofs It’s easy to prove the following theorems for the multiplexer: (1) 100% accurate hyperplanes are always possible; (2) 100% accurate hyperspheres are never possible; (3) hyperspheres must be paired and have specificity to be 100% accurate; (4) hyperspheres must have variable radius to avoid ambiguity. Proposition: more hyperspheres than hyperplanes are required to accurately classify the instance space
Enumeration of Classifiers The 11-multiplexer is small enough to enumerate all classifiers and look at the accuracy distribution, i.e. measure the percentage of instances covered by a classifier that belong to the same (majority) class. Let’s do this just for the smallest classifiers of comparable size that generalise (i.e. dimension 2, or radius 1)…
Enumeration of Classifiers N.B. 100% accurate classifiers are the mode for 2-dimensional hyperplanes, no 100% accurate hyperspheres exist… …as predicted by theorems 1 and 2
Enumeration of Classifiers For 4-dimensional hyperplanes, 75% accurate classifiers are the mode, and ~25% of all classifiers are 100% accurate. Could this help explain Tim’s result* on the effectiveness of selection and reinforcement of randomly generated rules (i.e. no GA rule exploration)? *Kovacs & Kerber. GECCO 2004, LNCS 3103, 785-796
XCSphere We extended an existing XCS implementation to use hyperspheres instead of hyperplanes: restricted to a binary alphabet instead of ternary; Hamming distance ≤ radius matching rule; generalisation of hyperspheres via a proper-superset condition that is easy to evaluate
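The superset (subsumption) test for hyperspheres can be sketched as a Hamming-ball containment check (our own formulation, a sufficient condition; not taken from the XCSphere source):

```python
def sphere_subsumes(centre1: str, r1: int, centre2: str, r2: int) -> bool:
    # ball (centre2, r2) lies inside ball (centre1, r1) whenever the
    # centre-to-centre Hamming distance plus the smaller radius fits
    # within the larger radius
    d = sum(a != b for a, b in zip(centre1, centre2))
    return d + r2 <= r1

print(sphere_subsumes("0000", 3, "0001", 1))  # True: 1 + 1 <= 3
print(sphere_subsumes("0000", 1, "0011", 1))  # False: 2 + 1 > 1
```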
Evaluation Results on the 11-multiplexer: [figures: XCS vs XCSphere performance]
Comparing Classifier Performance on Hypersphere Function
Hypersphere Function We devised a new function whose most efficient representation is with hyperspheres. Given binary strings of odd length, assign class 0 to all instances closest to the all-0s string, and class 1 to all other instances
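As we read the definition above, the function can be sketched as follows (class 0 iff the instance lies in the Hamming ball of radius (n−1)/2 around the all-0s string; odd length n rules out ties):

```python
def hypersphere_function(instance: str) -> int:
    # class 0 iff the instance has fewer 1s than 0s, i.e. is closer
    # to the all-0s string than to its complement
    ones = instance.count("1")
    return 0 if ones < len(instance) - ones else 1

print(hypersphere_function("00100"))  # 0: closer to 00000
print(hypersphere_function("11010"))  # 1
```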
Evaluation Results on the hypersphere function: [figures: XCS vs XCSphere performance]
XCSphere: Multiple Representation XCS
Competing Representations Competition between overlapping classifiers is intense in XCS. We can use this to implement a hybrid XCS with both hyperplane and hypersphere classifiers: seed the initial population with 50% of each, and do likewise during covering. Sphere and plane classifiers can’t recombine, and hence behave like different species
Evaluation Results for XCSphere: [figures: multiplexer; hypersphere function]
XCSphere Results XCSphere achieved generally better performance, across all three problems, than the single-representation XCS versions. XCSphere was slower to converge on the multiplexer than XCS with hyperplanes… …but there is weak evidence that XCSphere converges faster on the sphere function than XCS with hyperspheres
Summary Hybrid representations in a single classifier system: a useful way to mitigate representational bias? The possibility of evolving representations?