1 Learning Classifier Systems Andrew Cannon Angeline Honggowarsito 1.

Slides:



Advertisements
Similar presentations
© Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems Introduction.
Advertisements

Using Parallel Genetic Algorithm in a Predictive Job Scheduling
Genetic Algorithms Contents 1. Basic Concepts 2. Algorithm
1 Wendy Williams Metaheuristic Algorithms Genetic Algorithms: A Tutorial “Genetic Algorithms are good at taking large, potentially huge search spaces and.
Genetic Algorithms1 COMP305. Part II. Genetic Algorithms.
A Heuristic Bidding Strategy for Multiple Heterogeneous Auctions Patricia Anthony & Nicholas R. Jennings Dept. of Electronics and Computer Science University.
Feature Selection for Regression Problems
Reduced Support Vector Machine
Genetic Algorithms Nehaya Tayseer 1.Introduction What is a Genetic algorithm? A search technique used in computer science to find approximate solutions.
7/2/2015Intelligent Systems and Soft Computing1 Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Introduction,
Intro to AI Genetic Algorithm Ruth Bergman Fall 2004.
Learning Classifier Systems Dominic Cockman, Jesper Madsen, Qiuzhen Zhu 1.
Genetic Algorithms Overview Genetic Algorithms: a gentle introduction –What are GAs –How do they work/ Why? –Critical issues Use in Data Mining –GAs.
Genetic Algorithms: A Tutorial
Introduction to Learning Classifier Systems Dr. J. Bacardit, Dr. N. Krasnogor G53BIO - Bioinformatics.
CHAPTER 12 ADVANCED INTELLIGENT SYSTEMS © 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang.
Genetic Algorithm.
Issues with Data Mining
Genetic Algorithms and Ant Colony Optimisation
Evolutionary Intelligence
© Negnevitsky, Pearson Education, CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University.
Efficient Model Selection for Support Vector Machines
Ch. Eick: Evolutionary Machine Learning Classifier Systems n According to Goldberg [113], a classifier system is “a machine learning system that learns.
Slides are based on Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems.
SOFT COMPUTING (Optimization Techniques using GA) Dr. N.Uma Maheswari Professor/CSE PSNA CET.
Genetic algorithms Prof Kang Li
Ch. Eick: Evolutionary Machine Learning n Different Forms of Learning: Learning agent receives feedback with respect to its actions (e.g. from a teacher)
Chapter 8 The k-Means Algorithm and Genetic Algorithm.
Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.
Genetic algorithms Charles Darwin "A man who dares to waste an hour of life has not discovered the value of life"
Genetic Algorithms Genetic algorithms imitate a natural optimization process: natural selection in evolution. Developed by John Holland at the University.
1 Part II: Practical Implementations.. 2 Modeling the Classes Stochastic Discrimination.
ART – Artificial Reasoning Toolkit Evolving a complex system Marco Lamieri
An Introduction to Genetic Algorithms Lecture 2 November, 2010 Ivan Garibay
1 “Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things, solutions.
GENETIC ALGORITHM A biologically inspired model of intelligence and the principles of biological evolution are applied to find solutions to difficult problems.
LCS Case Studies BRANDEN PAPALIA, JAMES PATRICK, MICHAEL STEWART FACULTY OF ENGINEERING, COMPUTING AND MATHEMATICS.
© Negnevitsky, Pearson Education, Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Introduction,
Genetic Algorithms. Evolutionary Methods Methods inspired by the process of biological evolution. Main ideas: Population of solutions Assign a score or.
Niching Genetic Algorithms Motivation The Idea Ecological Meaning Niching Techniques.
 Negnevitsky, Pearson Education, Lecture 9 Evolutionary Computation: Genetic algorithms n Introduction, or can evolution be intelligent? n Simulation.
Ch. Eick: Evolutionary Machine Learning n Different Forms of Learning: Learning agent receives feedback with respect to its actions (e.g. from a teacher)
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Externally growing self-organizing maps and its application to database visualization and exploration.
Learning Classifier Systems BRANDEN PAPALIA, MICHAEL STEWART, JAMES PATRICK FACULTY OF ENGINEERING, COMPUTING AND MATHEMATICS.
Introduction to Genetic Algorithms. Genetic Algorithms We’ve covered enough material that we can write programs that use genetic algorithms! –More advanced.
Edge Assembly Crossover
Chapter 12 FUSION OF FUZZY SYSTEM AND GENETIC ALGORITHMS Chi-Yuan Yeh.
Learning Classifier Systems (Introduction) Muhammad Iqbal Evolutionary Computation Research Group School of Engineering and Computer Science Victoria University.
Data Mining and Decision Support
Towards a Mapping of Modern AIS and Learning Classifier Systems Larry Bull Department of Computer Science & Creative Technologies University of the West.
1 Contents 1. Basic Concepts 2. Algorithm 3. Practical considerations Genetic Algorithm (GA)
Genetic Algorithm Dr. Md. Al-amin Bhuiyan Professor, Dept. of CSE Jahangirnagar University.
Novel Approaches to Optimised Self-configuration in High Performance Multiple Experts M.C. Fairhurst and S. Hoque University of Kent UK A.F. R. Rahman.
Artificial Intelligence By Mr. Ejaz CIIT Sahiwal Evolutionary Computation.
Genetic Algorithms. Solution Search in Problem Space.
Genetic Algorithm. Outline Motivation Genetic algorithms An illustrative example Hypothesis space search.
 Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems n Introduction.
1 Genetic Algorithms Contents 1. Basic Concepts 2. Algorithm 3. Practical considerations.
Introduction to Machine Learning, its potential usage in network area,
Selected Topics in CI I Genetic Programming Dr. Widodo Budiharto 2014.
Genetic Algorithms.
Rule Induction for Classification Using
Advanced Artificial Intelligence Evolutionary Search Algorithm
Artificial Intelligence Chapter 4. Machine Evolution
Genetic Algorithms: A Tutorial
Artificial Intelligence Chapter 4. Machine Evolution
Boltzmann Machine (BM) (§6.4)
Traveling Salesman Problem by Genetic Algorithm
Evolutionary Ensembles with Negative Correlation Learning
Genetic Algorithms: A Tutorial
Presentation transcript:

1 Learning Classifier Systems Andrew Cannon Angeline Honggowarsito 1

2  Introduction to LCS / LCS Metaphor  The Driving Mechanism ◦ Learning ◦ Evolution  Minimal Classifier System  Michigan VS Pittsburgh  Categories of LCS  Optimisation  Application: data mining Contents

Introduction  Our world is a Complex System ◦ Interconnected parts ◦ Properties exhibited by collective parts might be different from individual parts  Adaptive ◦ Capacity to change and learn from experience 3

Introduction  John Holland in 1975 ◦ New York city, as a system that exists in a steady state of operation, made up of “buyers, sellers, administrations, streets, bridges, and buildings that are always changing. Like the standing wave in front of a rock in a fast-moving stream, a city is a pattern in time.” 4

5 Rule-Based Agents  Represented by rule-based agents. ◦ Agents - Single Components  IF condition THEN action  Use system’s environment information to make decision

Metaphor  Two biological Metaphors: ◦ Evolution ◦ Learning  Genetic Algorithm & Learning Mechanism  Environment of the system  Example ◦ Robots navigating maze environment 6

The Driving Mechanism  Discovery - The Genetic Algorithm ◦ Rule Discovery ◦ Apply Genetic Algorithm  The fitness function quantifies the optimality of a given rule ◦ Classification Accuracy most widely used as metric of fitness 7

The Driving Mechanism  Learning ◦ “The improvement of performance in some environment through the acquisition of knowledge resulting from experience in that environment.” ◦ Each classifier has one or more parameters ◦ Iteratively update the parameters 8

Learning  Purposes: ◦ Identify useful classifiers ◦ Discovery of better rules  Different problem domains require different styles of learning.  Learning based on the information provided ◦ Batch Learning  Training instances presented simultaneously.  End result: rule set that does not change with respect to time. 9

Learning  Incremental learning ◦ One training instances at a time ◦ End result:  Rule set that changes continuously  Learning based on type of feedback ◦ Supervised Learning ◦ Reinforcement Learning 10

Minimal Classifier System  Basic LCS Implementation  Developed by Larry Bull  Advancing LCS theory  Designed to understand more complex implementations, instead of solving real world problems. 11

Minimal Classifier System 12

Minimal Classifier System  Input Data ◦ 4 Digit binary Number  Learning Iteratively, one instance at a time  Population [N] ◦ Condition {C} ◦ Action {A} ◦ Fitness Parameter {F}  Population is randomly initialized 13

Population  Condition ◦ String of “0, 1, #” ◦ 00#1 matches 0011 or 0001  Action ◦ Which action is possible (0 or 1)  Fitness Parameter ◦ How good is the classifier CAF 00#1188 0##1017 # #191 0# #

Match Set  Population Scanned  Match Set : List of rules whose condition matches the input string at each position  Input = 0011 CAF 00#1188 0##1017 # #191 0# #007 CAF 00#1188 0## #191 0#11066 Population Match set 15

Action Set  Action Set established using explore/exploit scheme by alternating between: ◦ Select action found in M (Explore) ◦ Select deterministically with prediction array CAF 00#1188 0## # #066 00# #191 Action 1179 Action 083 Match set Action set Prediction array 16

Minimal Classifier System  Prediction array: List of prediction values calculated for each action  Prediction value: sum of fitness values found in the subset of M advocating the same action  Learning starts when the reward is received 17

Michigan VS Pittsburgh  Holland proposed Michigan style  Pittsburgh was proposed by Kenneth Dejong and his student  Main Distinction between two approaches: ◦ Individuals structure ◦ Problem solving structure ◦ Individuals competition/competitors ◦ Online VS offline learning 18

Michigan VS Pittsburgh  Structure of the individual ◦ Michigan, each individual is a classifier, entire population is the problem solution ◦ Pittsburgh, each individual is a set of classifiers representing a solution  Individual competition / cooperation ◦ Michigan, apply individual competition ◦ Pittsburgh, apply individual cooperation 19

Michigan VS Pittsburgh  Offline / Online Learning ◦ Michigan apply online or offline learning, while Pittsburgh apply offline learning  Problem solution ◦ Michigan, distribute problem solution ◦ Pittsburgh, compact problem solution 20

21  Sigaud, O & Wilson, SW 2007, ‘Learning classifier systems: a survey’, Soft Computing, vol. 11, no. 11, pp  Urbanowicz, RJ & Moore, JH 2009, ‘Learning Classifier Systems: A Complete Introduction, Review and Roadmap’, Journal of Artificial Evolution and Applications, vol. 2009, 25 pages. Sources

22  Categories of LCS ◦ Strength based (ZCS) ◦ Accuracy based (XCS) ◦ Anticipation based (ALCS)  Optimisation  Application: data mining Contents

23  Zeroth level classifier systems (ZCS)  Introduced by Wilson in 1994 Strength based (ZCS)

24  Zeroth level classifier systems (ZCS)  Introduced by Wilson in 1994 Strength based (ZCS) Fixed size population of rules

25  Rules maps conditions to actions Strength based (ZCS) Condition → Action

26  Rules map conditions to actions  Fitness is the predicted accumulated reward (initialised to some value S 0 ) Strength based (ZCS) Condition → ActionFitness

27 Strength based (ZCS) Match set M Stimuli

28  Select an action from M via roulette wheel selection Strength based (ZCS) Match set M P(rule) = Fitness(rule) / (sum of fitnesses of all rules in M)

29  All the rules in M that advocated the action from the selected rule become the action set A Strength based (ZCS) Action set A

30  Distribute rewards  First reward classifiers from the previous step 1. Take the sum of fitnesses of rules in the current action set A 2. Multiple by discount factor γ and learning factor α 3. Distribute equally among the members of the action set from the previous step Strength based (ZCS)

31  Distribute rewards  Then receive a reward from the environment as a result of executing the current action 1. Multiply the reward received from the environment by learning rate α 2. Distribute equally among the members of the current action set Strength based (ZCS)

32  Penalise other members of the match set  Multiple the fitness of each member of the current match set that is not contained in the current action set (M\A) by tax τ Strength based (ZCS)

33  At each step, a genetic algorithm is run with probability p  Two members of the global rule population are selected via roulette wheel selection  Two offspring are produced via one point crossover and mutation with fitnesses given by the average fitness of their parents  Two members of the global population are deleted via roulette wheel selection based on inverse fitnesses Strength based (ZCS)

34  Run a covering operator if no rules match the environmental stimuli (match set M is empty) or if every rule in the match set has fitness equal to or less than some fraction Ф of the population average  Create a new rule with random action and average fitness that fires under the current stimuli (possibly generalised). Replace an existing rule selected via roulette wheel selection based on inverse fitness Strength based (ZCS)

35  Suggested parameters (Bull and Hurst 2002) ◦ Population size of 400 ◦ Initial rule fitness S 0 = 20.0 ◦ Learning rate α = 0.2 ◦ Discount factor γ = 0.71 ◦ Tax τ = 0.1 ◦ Genetic algorithm run with probability p = 0.25 ◦ Cover operating firing fraction Ф = 0.5 Strength based (ZCS)

36  May not fully represent problem space ZCS Disadvantages

37  Extended classifier system  Most studied and widely used family of LCS  Each rule predicts a particular reward (and error)  Each rule has a particular fitness  Retain rules that predict lower rewards as long as those predictions are accurate Accuracy based (XCS)

38  Population of rules (initially empty but bounded to some size P) specifying actions in response to conditions  Match set formed in response to stimuli from environment  Action selected from match set ◦ Highest fitness ◦ Roulette wheel selection ◦ Alternation between exploration and exploitation  Rules advocating the same action form the action set Accuracy based (XCS)

39  Receive a reward r from the environment for executing the specified action  Update the predicted reward for each rule in the action set ◦ p ← p + β (r-p)  Update the predicted error for each rule in the action set ◦ ε ← ε + β (|r-p| - ε) ◦ β = estimation rate Accuracy based (XCS)

40  If ε < ε 0, set prediction accuracy k=1  Otherwise, set prediction accuracy ◦ k = α(ε 0 /ε) v for some α,v>0  Calculate relative prediction accuracy ◦ k’ = k(rule) / (sum of k for all rules in action set)  Update the fitness of each rule ◦ f ← f + β (k’ - f) ◦ α = learning rate ◦ β = estimation rate Accuracy based (XCS)

41  Run genetic algorithm to introduce diversity and increase fitness of population  Run every θ GA time steps  Run on members of action set (rather than members of global population)  Favours accurate classifiers  Two parents selected via roulette wheel selection (based on fitness) produce two offspring via mutation and crossover Accuracy based (XCS)

42  Covering operator adds new rules when no rules match the current environmental condition (possibly generalised)  If the population exceeds its bounded size, the requisite number of rules are deleted via roulette wheel selection based on the average size of the action sets containing each rule Accuracy based (XCS)

43  Sometimes, when a new rule is added, it is checked whether or not a more general rule already exists – if it does, another copy of the more general rule is added instead of the more specific rule  Favours generalisation  Computationally expensive Accuracy based (XCS)

44  Suggested parameters (Butz and Wilson 2002) ◦ Maximum population size P=800 or P=2000 ◦ Learning rate α = 0.1 ◦ Estimation rate β = 0.2 ◦ Genetic algorithm run every θ GA = 25 time steps Accuracy based (XCS)

45  Anticipatory learning classifier systems  Anticipate the effect of taking a particular action under a particular condition  Optimise accuracy of predicted effects Anticipation based (ALCS) (Condition, Action) → Effect

46  Effects might specify that after taking an action (in a specific state) that particular environmental variables might stay the same (=), adopt a particular value or cannot be predicted (?) Anticipation based (ALCS)

47  For example, the rule [##1][a] → [1?=] would predict that after taking action a in a state matching the condition ##1, that the first environment variable is 1, the second cannot be predicted while the third does not change Anticipation based (ALCS)

48  For example, the rule [##1][a] → [1?=] would predict that after taking action a in a state matching the condition ##1, that the first environment variable is 1, the second cannot be predicted while the third does not change Anticipation based (ALCS)

49  For example, the rule [##1][a] → [1?=] would predict that after taking action a in a state matching the condition ##1, that the first environment variable is 1, the second cannot be predicted while the third does not change Anticipation based (ALCS)

50  For example, the rule [##1][a] → [1?=] would predict that after taking action a in a state matching the condition ##1, that the first environment variable is 1, the second cannot be predicted while the third does not change Anticipation based (ALCS)

51  For example, the rule [##1][a] → [1?=] would predict that after taking action a in a state matching the condition ##1, that the first environment variable is 1, the second cannot be predicted while the third does not change Anticipation based (ALCS)

52  In a grid based maze, if there is a wall to the North of an agent, moving North will result in no environmental variables changing  If there is a wall to the North of an agent and no wall to the East of an agent, moving East will result in there being a wall to the North West of the agent Anticipation based (ALCS)

53  Apply heuristics to specialise or generalise classifiers  May favour exploration over exploitation to be able to efficiently explore the problem space Anticipation based (ALCS)

54  In practice, seek to optimise ◦ Performance (quality of solution) ◦ Scalability ◦ Adaptability ◦ Speed  Ideal rule set is ◦ Correct ◦ Complete ◦ Compact/minimal ◦ Non-overlapping Optimisation

55  Kharbat, Odeh and Bull 2008  Perform data mining from a breast cancer data set to aid in diagnosis Application: data mining

56  UK health trust had a data set on breast cancer patients  Early diagnosis is very important  Seek to find patterns to aid in diagnosis  Each patient represented by 45 attributes that may be binary, categorical or real valued  Three grades of cancer aggressiveness (G1, G2, G3)  Seek to find patterns of data corresponding to each grade Application: data mining

57  Selected a sample of 1150 patients from the data set  Data needed to be pre-processed ◦ Normalise real-valued attributes to the range [0,1] ◦ Balance the three grades of cancer Application: data mining

58  Accuracy based LCS (XCS) used  Performance of XCS was compared to C4.5 (decision tree inductive learning technique)  Train XCS to match patterns in collections of attributes to grades of cancer aggressiveness (G1, G2, G3)  Parameters of XCS determined empirically – maximum population size of 10,000 found to be optimal Application: data mining

59  After training, rule set was compacted to make it more manageable using techniques such as removing low accuracy rules and clustering similar rules together  Domain experts asked to comment on the quality and usefulness of rules found and whether they revealed interesting or new information that could aid in diagnosis Application: data mining

60  2,901 rules were randomly selected from XCS and compacted to 300 to be examined by a domain expert  9 considered new or interesting  Some of the rules from the compacted set matched patterns that were already well-known  None contradicted existing knowledge  Not all rules were found to be useful Application: data mining

61  Performance of XCS was superior to C4.5 in terms of originality, quality, richness and descriptiveness of rules  More complicated rules  Very large rule set (300 when compacted) makes it more tedious to find interesting new results Application: data mining

62  XCS system used as a means to an end in this case  Alayón et al describe a system used to recognise patterns in medical images used for cancer diagnosis  Stone and Bull 2008 describe a system intended for foreign exchange trading Application: data mining

63  Alayón, S, Estévez, JI, Sigut, J, Sánchez, JL & Toledo, P 2006, ‘An evolutionary Michigan recurrent fuzzy system for nuclei classification in cytological images using nuclear chromatin distribution’, Journal of Biomedical Informatics, vol. 39, no. 6, pp  Bull, L & Hurst, J 2002, ‘ZCS redux’, Evolutionary computation, vol. 10, no. 2, pp  Butz, MV & Wilson, SW 2002, ‘An algorithmic description of XCS’, Soft Computing, vol. 6, no. 3, pp  Gérard, P, Meyer, J & Sigaud, O 2005, ‘Combining latent learning with dynamic programming in the modular anticipatory classifier system’, European Journal of Operational Research, vol. 160, no. 3, pp  Kharbat, F, Odeh, M & Bull, L 2008, ‘Knowledge Discovery from Medical Data: An Empirical Study with XCS’ in Studies in Computational Intelligence 125: Learning Classifier Systems in Data Mining, eds L Bull, E Bernadó- Mansilla & J Holmes, Springer, Berlin, pp  Sigaud, O & Wilson, SW 2007, ‘Learning classifier systems: a survey’, Soft Computing, vol. 11, no. 11, pp  Stolzmann, W 2001, ‘Anticipatory Classifier Systems: An introduction’, AIP Conference Proceedings, vol. 2001, no. 573, pp  Stone, C & Bull, L 2008, ‘Foreign Exchange Trading Using a Learning Classifier System’ in Studies in Computational Intelligence 125: Learning Classifier Systems in Data Mining, eds L Bull, E Bernadó-Mansilla & J Holmes, Springer, Berlin, pp  Urbanowicz, RJ & Moore, JH 2009, ‘Learning Classifier Systems: A Complete Introduction, Review and Roadmap’, Journal of Artificial Evolution and Applications, vol. 2009, 25 pages. Sources

64 Questions?