A survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery, Alex Freitas (2001): A look at Genetic Algorithms (GA) and Genetic Programming (GP)


A survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery, Alex Freitas (2001). A look at Genetic Algorithms (GA) and Genetic Programming (GP), also from the point of view of the user.

An overview of Data Mining and Knowledge Discovery. The three desirable properties of discovered knowledge:
1) Accurate. We would like a high predictive accuracy rate for our data mining task of classification.
2) Comprehensible. The user should not be presented with a "black box" predictor that spits out a number of IF_THEN rules and declares them the output; prediction rules should be comprehensible.
3) Interesting. In many cases the user will be presented with an ensemble of discovered rules, not all of which will be interesting in practice.

The overview:
- Show the user a schematic of the knowledge discovery process:

    Data Set#1, Data Set#2, ..., Data Set#6
        ---> Data Integration ---> Pre-Processing ---> Data Mining ---> Post-Processing ---> Output Knowledge

Motivation
- "We believe that ideally a combination of subjective and objective approaches should be used to try to solve the very hard problem of returning interesting knowledge to the user."
- Objective approaches: the algorithms and computer processing give us high-level knowledge.
- Subjective approaches: the user helps to define and re-define the formulas or IF_THEN rules that are of interest to the knowledge discovery process.

Data Mining Task: Classification
- Many KD procedures aim to predict the value (the class) of a user-specified goal attribute based on the predicting attributes.
- E.g.: IF (Unpaid_Loan = "No") AND (Overdrafts = "Yes") THEN (Credit = "Bad")
- For a comprehensive discussion of how to measure the predictive accuracy of classification rules we are referred to [34] Hand, D.J., Construction and Assessment of Classification Rules.
- Tom Mitchell's book (Machine Learning) also has good information.

Data Mining: Other Tasks
- Dependence Modelling: an extension or generalization of classification.
- Clustering: discovering groups; unsupervised learning.
- Discovery of Association Rules: more than one item in the consequent is possible; the classification task, by contrast, may be asymmetric with respect to the predicting attributes and the consequent attribute.

The Knowledge Discovery Process
- Data integration
- Data cleansing
- Discretization: transforming a "continuous" attribute into a discrete one, for example "low", "medium", "high" (sketched below).
- Attribute Selection: selecting a set of attributes relevant for classification.
- The motivation for attribute selection may be obvious: it has been found that irrelevant attributes can "confuse" the data mining algorithm, leading to the discovery of inaccurate or useless knowledge. (A wrapper method can help select attributes.)
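To make the discretization step concrete, here is a minimal sketch of equal-width binning; the function name, bin edges and three labels are assumptions for illustration, not something prescribed by the survey.

```python
# A minimal sketch of equal-width discretization (assumed example).

def discretize(value, low, high, labels=("low", "medium", "high")):
    """Map a continuous value into one of len(labels) equal-width bins."""
    if value <= low:
        return labels[0]
    width = (high - low) / len(labels)
    index = min(int((value - low) / width), len(labels) - 1)
    return labels[index]

print(discretize(42.0, low=0.0, high=100.0))  # -> 'medium'
```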

Discovered Knowledge Postprocessing
- Two main motivations:
  - First, when the discovered rule set is large, we want to simplify it. (The user may or may not help at this stage.) Some other techniques will be addressed.
  - Second, we often want to extract a subset of interesting rules from among all the discovered ones. We will look at some objective methods later on (GA), but subjective methods involving a user/data-miner collaboration may also be important.

Genetic Algorithms (GA) for Rule Discovery
- Michigan approach: the population consists of individuals ("chromosomes") where each individual encodes a single prediction rule.
- Pittsburgh approach: each individual encodes a set of prediction rules.
- Pluses and minuses: the Pittsburgh approach directly takes rule interaction into account when computing the fitness function of an individual, but it leads to syntactically longer individuals. In the Michigan approach the individuals are simpler and syntactically shorter, which simplifies the design of the genetic operators, but rule interactions are not taken into account.
- Take the rule: IF cond#1 AND cond#2 AND ... cond#n THEN class = c(i). We need:
  - a representation of the rule antecedent (the IF part), and
  - a representation of the rule consequent (the THEN part).
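The following sketch contrasts the two encodings; the `Rule` layout and field names are illustrative assumptions, not Freitas's actual representation.

```python
# A hedged sketch of the Michigan vs. Pittsburgh encodings (assumed layout).
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: list[tuple[str, str, object]]  # e.g. [("Overdrafts", "=", "Yes")]
    consequent: str                            # predicted class, e.g. "Bad"

# Michigan approach: one individual == one rule.
michigan_individual = Rule([("Unpaid_Loan", "=", "No"),
                            ("Overdrafts", "=", "Yes")], "Bad")

# Pittsburgh approach: one individual == a whole rule set, so fitness can
# score rule interaction, at the cost of a syntactically longer genome.
pittsburgh_individual = [
    Rule([("Unpaid_Loan", "=", "No"), ("Overdrafts", "=", "Yes")], "Bad"),
    Rule([("Unpaid_Loan", "=", "Yes")], "Good"),
]
```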

The Rule Antecedent (Using GA)
- Often there is a conjunction of conditions.
- Binary encoding is usual. If a given attribute can take on k discrete values, its condition can be encoded with k bits (one "on"/"off" bit per value). This allows for internal disjunctions.
- All k bits can be set to "1" in order to "turn off" the condition: any value then satisfies it.
- Non-binary encodings are possible, but variable-length individuals will arise, and crossover may have to be modified to cope with them.
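A minimal sketch of this k-bit encoding, assuming a hypothetical attribute with four discrete values; the names and layout are ours.

```python
# One bit per discrete value of the (assumed) attribute "Marital_Status".
VALUES = ("single", "married", "divorced", "widowed")

def condition_matches(bit_mask, value):
    """Set bits form an internal disjunction; all bits set == the
    condition is turned off (every value satisfies it)."""
    return bit_mask[VALUES.index(value)] == 1

mask = [1, 0, 1, 0]   # Marital_Status IN {single, divorced}
print(condition_matches(mask, "divorced"))  # True
print(condition_matches(mask, "married"))   # False
```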

Representing the Rule Consequent (Predicted Class)
- Three ways of representing the predicted class (the THEN part):
  - First, encode it in the genome of an individual (possibly making it subject to evolution).
  - Second, associate all individuals of the population with the same predicted class, which is never modified during the running of the algorithm.
  - Third, choose the predicted class most suitable for a rule in a deterministic way as soon as the corresponding rule antecedent is formed (e.g., the class that maximizes fitness).
- The author believes the third possibility to be the most sound overall.
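As a sketch of the third option, the helper below deterministically picks the consequent that maximizes a simple fitness, here the confidence factor TP/(TP + FP), anticipating the fitness slide below. The function and data layout are assumptions for illustration.

```python
# Deterministic choice of consequent once an antecedent is formed (sketch).

def best_consequent(antecedent_matches, examples, classes):
    """examples: list of (attribute_dict, class_label) pairs."""
    def fitness(c):
        tp = sum(1 for x, y in examples if antecedent_matches(x) and y == c)
        fp = sum(1 for x, y in examples if antecedent_matches(x) and y != c)
        return tp / (tp + fp) if tp + fp else 0.0
    return max(classes, key=fitness)

examples = [({"Overdrafts": "Yes"}, "Bad"), ({"Overdrafts": "Yes"}, "Bad"),
            ({"Overdrafts": "No"}, "Good")]
matches = lambda x: x["Overdrafts"] == "Yes"
print(best_consequent(matches, examples, ["Good", "Bad"]))  # -> 'Bad'
```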

Genetic Operators for Rule Discovery
- Selection: each individual represents a single rule (Michigan approach). An approach called REGAL can be used, where individuals to be "mated" are "elected" by training examples (using a fitness operator or a probabilistic model).
- Generalizing/specializing crossover: the basic idea of this special kind of crossover is to generalize or specialize a given rule, depending on whether it is currently overfitting or underfitting the data (sketched below).
- Generalizing/specializing-condition operator: the generalization/specialization of a rule can also be done independently of crossover, by tweaking the antecedent conditions (especially if continuous conditions exist).
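On the bit-mask encoding shown earlier, generalizing/specializing crossover can be read as a bitwise OR (more values accepted, so the rule generalizes) versus a bitwise AND (fewer values accepted, so it specializes); the sketch below assumes that reading.

```python
# Generalizing/specializing crossover on bit-mask antecedents (assumed reading).

def generalizing_crossover(parent_a, parent_b):
    return [a | b for a, b in zip(parent_a, parent_b)]   # union of values

def specializing_crossover(parent_a, parent_b):
    return [a & b for a, b in zip(parent_a, parent_b)]   # intersection

print(generalizing_crossover([1, 0, 1, 0], [0, 0, 1, 1]))  # [1, 0, 1, 1]
print(specializing_crossover([1, 0, 1, 0], [0, 0, 1, 1]))  # [0, 0, 1, 0]
```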

Fitness Function for Rule Discovery
- Remember: accuracy, comprehensibility and interestingness.
- How do we incorporate these three rule-quality criteria into a fitness function?
- Let a rule be IF A THEN C. Calculate the confidence factor CF = TP / (TP + FP) using the confusion matrix:

                        Actual class
                        C        not C
    Predicted   C       TP       FP
    class       not C   FN       TN

- Comp (completeness measure) = TP / (TP + FN)
- Fitness = CF * Comp = TP^2 / ((TP + FP)(TP + FN))
- Fitness = w1 * (CF * Comp) + w2 * Simp, where Simp is a measure of rule simplicity with 0 < Simp < 1, and w1 and w2 are weights.
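The formulas above transcribe directly into code; the weight values below are arbitrary examples, not values from the survey.

```python
# Rule fitness from the confusion-matrix counts above (weights assumed).

def rule_fitness(tp, fp, fn, simp, w1=0.8, w2=0.2):
    cf = tp / (tp + fp) if tp + fp else 0.0      # confidence factor
    comp = tp / (tp + fn) if tp + fn else 0.0    # completeness
    return w1 * (cf * comp) + w2 * simp

print(rule_fitness(tp=40, fp=10, fn=10, simp=0.9))  # 0.8*0.64 + 0.2*0.9 = 0.692
```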

Genetic Algorithms (GAs) for Pre-processing
- "The use of GAs for attribute selection seems natural. The main reason is that the major source of difficulty in attribute selection is attribute interaction, and one of the strengths of GAs is that they usually cope well with attribute interactions."
- We can use a very simple genetic encoding where each individual represents a candidate attribute subset: a string of m binary genes, where m is the number of attributes and each gene can take on a "0" or "1".
- Standard crossover and mutation procedures follow.
- A GA can be used with nearest-neighbour algorithms (NNA) to "tweak" them for better results.
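A sketch of the wrapper evaluation hinted at above: the genome is one binary gene per attribute, and fitness is (for illustration) cross-validated 1-NN accuracy on the selected columns. The scikit-learn calls are real; the pipeline itself is an assumed example, not the survey's.

```python
# Wrapper fitness for a binary attribute-selection genome (assumed example).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def subset_fitness(genome, X, y):
    """genome: list of m 0/1 genes, one per column of X."""
    mask = np.asarray(genome, dtype=bool)
    if not mask.any():
        return 0.0                       # empty subsets score nothing
    clf = KNeighborsClassifier(n_neighbors=1)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()
```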

Genetic Algorithms (GAs) for Post-processing
- GAs can be used in the post-processing step when an ensemble of classifiers (e.g., rule sets) has been created.
- Generating an ensemble of classifiers is useful since it has been shown that in several cases an ensemble of classifiers has better predictive accuracy than a single classifier.
- A fitness function may be created using weights for each classifier in the ensemble (a user may help), and there are also GA schemes to optimize the weights of the classifiers.
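A sketch of what such a GA would be optimizing: one weight per classifier, used in a weighted vote. The function and data shapes are illustrative assumptions.

```python
# Weighted ensemble vote; a GA would evolve the `weights` vector (sketch).
from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    """classifiers: callables mapping an example to a class label."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)
```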

Genetic Programming (GP) for Rule Discovery
- Individual representation: attributes are often numeric.
- Functions such as +, -, *, /, comparison operators (<, >, =) and logical operators (AND, OR, ...) are used, as well as input arguments.
- An individual is often represented as a tree.
- Once we apply the functions in the internal nodes of a GP individual to the values of the attributes in the leaf nodes, the system computes a value that is output at the root of the tree.
- Discovering comprehensible rules using GP: these rules could be scored much like GA rules, but there are some other measures, such as:

    Simplicity = (MaxNodes - 0.5 * NumNodes - 0.5) / (MaxNodes - 1)

  - This could lead to the discovery of short, simple rules, which may be required in, for example, the medical field.
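A sketch of a GP individual as a nested tuple tree, together with the simplicity measure quoted above; the node layout and the small function set are assumptions for illustration.

```python
# GP individual as a tree; leaves are attribute names (assumed layout).

def evaluate(node, attrs):
    """Apply internal-node functions to leaf attribute values, bottom-up."""
    if isinstance(node, str):            # leaf: look up the attribute value
        return attrs[node]
    op, left, right = node
    a, b = evaluate(left, attrs), evaluate(right, attrs)
    return {"+": a + b, "-": a - b, "*": a * b,
            ">": a > b, "AND": a and b}[op]

def simplicity(num_nodes, max_nodes):
    return (max_nodes - 0.5 * num_nodes - 0.5) / (max_nodes - 1)

tree = ("AND", (">", "Income", "Expenses"), (">", "Age", "Tenure"))
print(evaluate(tree, {"Income": 5, "Expenses": 3, "Age": 40, "Tenure": 10}))
print(simplicity(num_nodes=7, max_nodes=15))  # shorter trees score closer to 1
```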

Genetic Programming (GP) for Data Pre-Processing
- A major problem in attribute construction is that the search space tends to be huge. If the search can be performed with, in particular, the relational operator ">", then many good candidate constructed attributes may evolve.
- There are also combined GA/GP methods for pre-processing.

Conclusions
- In his chapter on evolutionary algorithms, Alex Freitas has shown us where the emphasis should be laid in knowledge learning. His goal of "transparency" of the pre-processing, rule-learning and post-processing methods should be taken into account. A user would like to know where a classifier or an IF_THEN rule came from, and whether he can help influence the process to get a more intelligent result.

High Classification Accuracy Does Not Imply Effective Genetic Search (Tim Kovacs; Manfred Kerber)
- The authors publish experimental results which they believe will help clear up some of the limitations of GA methods. In particular they examine XCS, a popular classification system which uses genetic algorithms.
- The paper by K+K refers us to work by Stewart Wilson (1995) entitled "Classifier Fitness Based on Accuracy". In XCS each classifier maintains a prediction p of expected payoff, but the classifier's fitness is given by a measure of the accuracy of that prediction. Wilson's example shows some individuals in the population P:

    Condition   p   ε   F
    #011        …   …   …
    11##        …   …   …
    #0##        …   …   …

  where p = prediction, ε = prediction error, F = fitness parameter.

XCS and XCS-NGA Are Compared
- The authors' XCS-NGA (XCS with no GA) is XCS modified so that genetic search does not operate on the initial rule population. In all other respects XCS-NGA functions as XCS.
- XCS classifies a data point by a vote among the rules which match it, with each vote weighted by the rule's fitness. In this way, when a low-accuracy rule and a high-accuracy rule both match, the data point is effectively given the classification of the high-accuracy rule.
- In XCS, the rules (region shapes and sizes) are adapted by the GA.
- XCS-NGA lacks a GA, so its region shapes and sizes do not change.
- XCS-NGA relies on there being enough rules to adequately solve the rule improvement problem (rule discovery) by random chance.
- Roughly speaking, XCS-NGA's approach is to generate many random rules and ignore those which happen to have low accuracy.
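A hedged sketch of that fitness-weighted vote, using the ternary "#" wildcard convention of XCS conditions; this illustrates the idea, not the XCS implementation.

```python
# Fitness-weighted vote among matching ternary-condition rules (sketch).
from collections import defaultdict

def matches(condition, state):
    return all(c in ("#", s) for c, s in zip(condition, state))

def classify(rules, state):
    """rules: list of (condition, action, fitness) triples."""
    votes = defaultdict(float)
    for condition, action, fitness in rules:
        if matches(condition, state):
            votes[action] += fitness
    return max(votes, key=votes.get)

rules = [("#011", "1", 0.9), ("11##", "0", 0.1), ("#0##", "0", 0.4)]
print(classify(rules, "1011"))  # the high-fitness matching rule wins: '1'
```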

High-Accuracy Implications of XCS-NGA
- The authors have obtained very good results with XCS-NGA. They agree that this alternative has its limitations, but they want to address the practice of publishing classification accuracy as a goal unto itself.
- Many published papers use only the 6-bit multiplexer function (sketched below). Strings are of length L = k + 2^k, so if k = 2 address bits then L = 6; for the 70-bit multiplexer, k = 6 and L = 70.
- Because of its random nature, XCS-NGA thrives with more initial rules, and then gives excellent results.
- They argue that only those studies which claim effective genetic search based on results with small functions are demonstrated invalid by their results with XCS-NGA.
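For reference, the multiplexer family is easy to state in code: the first k bits form an address that selects one of the remaining 2^k data bits.

```python
# The k-bit multiplexer test function; L = k + 2**k (so k=2 -> L=6, k=6 -> L=70).

def multiplexer(bits, k):
    assert len(bits) == k + 2 ** k
    address = int("".join(map(str, bits[:k])), 2)  # address bits in binary
    return bits[k + address]                       # selected data bit

print(multiplexer([1, 0, 0, 0, 1, 0], k=2))  # address 0b10 = 2 selects bit 1
```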

A More Powerful Metric for Evaluating Genetic Search
- The metric is symbolized by %[O], defined as the proportion of the optimal solution [O] present in the rule population on a given time step.
- This metric has been shown to have greater discriminatory power than the performance metric introduced by Wilson.
- Wilson's "performance" is defined as a moving average of the proportion of the last n trials in which the system has responded with the correct action (n is traditionally 50).
- There are often 400 rules to start with; XCS-NGA does better with more rules.
- %[O] is better able to discern the progress of genetic search than the performance metric, and the new metric extends the utility of small tests.
- %[O] has disadvantages, including the need to compute the optimal solution in advance, as well as the computational expense of evaluating it.
- Finally, replacing the GA with a random rule generator would provide a baseline against which to compare genetic search.
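The definition of %[O] transcribes directly: count how much of a known optimal rule set is present in the population at a given time step. The rule representation below is an assumed illustration.

```python
# %[O]: proportion of the known optimal solution present in the population.

def percent_optimal(population, optimal_rules):
    """population: list of rules; optimal_rules: set of rules in [O]."""
    found = sum(1 for rule in set(population) if rule in optimal_rules)
    return found / len(optimal_rules)

optimal = {("#011", "1"), ("#0##", "0")}       # assumed optimal set [O]
population = [("#011", "1"), ("11##", "0")]
print(percent_optimal(population, optimal))    # -> 0.5
```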