
1 Breeding Decision Trees Using Evolutionary Techniques. Written by Athanasios Papagelis and Dimitris Kalles. Presented by Alexandre Temporel.

2 Abstract
The idea is to use evolutionary techniques to evolve decision trees. Can the learner:
- search simple and complex hypothesis spaces efficiently?
- discover conditionally dependent attributes?
- cope with irrelevant attributes?
Tests use standard concept-learning tasks and compare against two well-known algorithms:
- C4.5 (Quinlan, 1993)
- OneR (Holte, 1993)
GOAL: demonstrate the potential advantages of these evolutionary techniques compared to other classifiers.

3 Outline
1) Problem statement
2) Construction of the GATree system
   - Operators
   - Fitness function
   - Advanced features
3) Experiments
   - 1st exp: search simple and complex hypothesis spaces efficiently
   - 2nd exp: conditionally dependent attributes, irrelevant attributes
   - 3rd exp: search target concepts on standard databases
4) Discussion of the search type of GATree
5) Conclusion

4 Related work
GAs have been widely used for classification and concept-learning tasks:
- J. Bala, J. Huang and H. Vafaie (1995). Hybrid Learning Using Genetic Algorithms and Decision Trees for Pattern Classification. School of Information Technology and Engineering.
Work on their ability as a tool to evolve decision trees:
- C. Nathan Burnett (May 2001). Evolutionary Induction of Decision Trees.
- Kenneth A. De Jong, William M. Spears and Diana F. Gordon. Using Genetic Algorithms for Concept Learning. Naval Research Laboratory.
- John R. Koza. Concept Formation and Decision Tree Induction Using the Genetic Programming Paradigm.
- Martijn C. J. Bot and William B. Langdon. Application of Genetic Programming to Induction of Linear Classification Trees.
- H. Kennedy, C. Chinniah, P. Bradbeer and L. Morss. The Construction and Evaluation of Decision Trees: A Comparison of Evolutionary and Concept Learning Methods. Napier University.

5 Problem statement (1/2)
The presence of irrelevant attributes in a data set can:
- mislead the impurity functions
- produce bigger, less comprehensible and lower-performance trees
Using evolutionary techniques allows us to:
- overcome the use of greedy heuristics
- search the hypothesis space in a natural way
- evolve simple, accurate and robust decision trees

6 Problem statement (2/2)
- Decision trees are used in many domains (e.g. pattern classification).
- Finding the optimal decision tree is NP-complete (Murthy, 1998).
- Current inductive learning algorithms rely on impurity measures: information gain and gain ratio (Quinlan, 1986), the Gini index (Breiman, 1984; sketched below).
- Assumption: attributes are conditionally independent, hence poor performance on data sets with strong conditional dependence.
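
For concreteness, the Gini index mentioned above measures node impurity as one minus the sum of squared class proportions; a minimal sketch:

```python
def gini(labels):
    """Gini index of a set of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["T", "T", "F", "F"]))  # 0.5 -- maximally impure binary node
print(gini(["T", "T", "T", "T"]))  # 0.0 -- pure node
```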

7 GATree system (1/3): Operators
The GATree program uses GAlib (Wall, 1996), a robust C++ library of genetic algorithm components: http://lancet.mit.edu/ga/
- Initial population: minimal binary decision trees
- Crossover operator
- Mutation operator (both sketched below)
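
The slide names the operators without detail. As a hedged illustration only (GATree itself is C++ on GAlib; the Node class and helpers here are hypothetical), this Python sketch shows the usual genetic-decision-tree operators: crossover swaps randomly chosen subtrees between two parents, mutation replaces the test of a random internal node or the label of a random leaf.

```python
import copy
import random

class Node:
    """A binary decision-tree node: internal nodes hold a test, leaves a class label."""
    def __init__(self, test=None, left=None, right=None, label=None):
        self.test, self.left, self.right, self.label = test, left, right, label

    def is_leaf(self):
        return self.label is not None

def all_nodes(tree):
    """Flatten the tree so a random subtree root can be drawn uniformly."""
    if tree.is_leaf():
        return [tree]
    return [tree] + all_nodes(tree.left) + all_nodes(tree.right)

def crossover(parent_a, parent_b):
    """Swap two randomly chosen subtrees between deep copies of the parents."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    node_a, node_b = random.choice(all_nodes(a)), random.choice(all_nodes(b))
    node_a.__dict__, node_b.__dict__ = node_b.__dict__, node_a.__dict__  # in-place swap
    return a, b

def mutate(tree, tests, labels):
    """Replace a random internal node's test, or a random leaf's label."""
    node = random.choice(all_nodes(tree))
    if node.is_leaf():
        node.label = random.choice(labels)
    else:
        node.test = random.choice(tests)
    return tree
```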

8 GATree system (2/3): Fitness function
The size factor uses a constant x (e.g. x = 1000):
- if x is small, fitness decreases sharply for bigger trees: SMALLER TREES -> BETTER FITNESS
- if x is large, the size penalty fades and we search a bigger space: ONLY ACCURACY REALLY MATTERS
A sketch of this payoff follows.
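
The slide describes the role of x but omits the payoff formula itself. A minimal sketch, assuming the payoff form reported in the GATree paper (squared accuracy scaled by a size factor x/(size^2 + x)); tree.classify and tree.node_count are hypothetical helpers:

```python
def fitness(tree, train_set, x=1000):
    """Payoff balancing accuracy against size: the factor x / (size**2 + x)
    is close to 1 for small trees and decays towards 0 as trees grow."""
    correct = sum(1 for features, label in train_set
                  if tree.classify(features) == label)  # hypothetical helper
    accuracy = correct / len(train_set)
    size = tree.node_count()                            # hypothetical helper
    return accuracy ** 2 * (x / (size ** 2 + x))
```

With x = 1000, a 10-node tree keeps about 91% of its accuracy-only payoff while a 100-node tree keeps about 9%, which matches the slide's description of how x trades size against accuracy.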

9 GATree system (3/3): Advanced features
- Overcrowding problem (Goldberg): use of a scaled payoff function, which reduces the fitness of similar trees in a population.
- Alternative crossover and mutation operators: more accurate subtrees have less chance of being selected for crossover or mutation.
- To speed up evolution: Limited Error Fitness (LEF) (Gathercole & Ross, 1997), saving CPU time with insignificant accuracy losses (sketched below).
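
A sketch of the LEF idea from Gathercole & Ross, not GATree's exact implementation: stop grading an individual as soon as its error count passes a limit, charging it a worst-case fitness instead of paying for a full pass over the data.

```python
def lef_fitness(tree, train_set, error_limit, worst_fitness=0.0):
    """Limited Error Fitness: abandon evaluation early for clearly bad trees."""
    errors = 0
    for features, label in train_set:
        if tree.classify(features) != label:  # hypothetical helper
            errors += 1
            if errors > error_limit:
                return worst_fitness          # skip the remaining examples
    return (len(train_set) - errors) / len(train_set)
```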

10 Experiments: foreword (1/2)
- 1st experiment: DataGen (Melli, 1999) generates artificial data sets from random rules (to ensure complexity variety); the goal is to reconstruct the underlying knowledge.
- 2nd experiment: more or less complicated target concepts (XOR, parity, ...) to see how GATree performs against C4.5.
- 3rd experiment: WEKA (Witten & Frank, 2000) runs the C4.5 and OneR algorithms on standard data sets.
All experiments use 5-fold cross-validation (a generic sketch follows).
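
A generic k-fold cross-validation sketch; train_and_score is a hypothetical callback that trains a classifier on one split and returns its accuracy on the other:

```python
import random

def cross_validate(dataset, train_and_score, k=5, seed=0):
    """Shuffle the data, split it into k folds, and average held-out accuracy."""
    data = dataset[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k
```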

11 Experiments: foreword (2/2)
Evolution type: generational
Init. population: 200
Generations: 800
Generation gap: 25%
Mutation prob.: 0.005
Crossover prob.: 0.93
Size factor: 1000
Random seed: 123456789
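
The same settings restated as a configuration mapping; the key names are illustrative only, not GAlib's actual API:

```python
GA_PARAMS = {
    "evolution_type": "generational",
    "population_size": 200,
    "generations": 800,
    "generation_gap": 0.25,   # fraction of the population replaced each generation
    "mutation_prob": 0.005,
    "crossover_prob": 0.93,
    "size_factor_x": 1000,    # the x constant in the fitness function
    "random_seed": 123456789,
}
```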

12 1st exp: Simple concept
3 rules to extract:
(31.0%) c1 <- B=(f or g or j) & C=(a or g or j)
(28.0%) c2 <- C=(b or e)
(41.0%) c3 <- B=(b or i) & C=(d or i)
[Figure: fitness-accuracy and size of the best individual vs. number of generations]

13 1st exp: Complex concept
8 rules to extract:
(15.0%) c1 <- B=(f or l or s or w) & C=(c or e or f or k)
(14.0%) c2 <- A=(a or b or t) & B=(a or h or q or k)
... etc.
[Figure: fitness-accuracy and size of the best individual vs. number of generations]

14 2nd exp (irrelevant attribute)

A1  A2  A3  Class
T   F   T   T
T   F   F   T
F   T   F   T
F   T   T   T
F   F   T   F
F   F   F   F
T   T   T   F
T   T   F   F

An easy concept (Class = A1 XOR A2), but C4.5 falsely estimates the contribution of the irrelevant attribute A3. GATree produces the optimal solution.
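
To see why the impurity measure is misled here, a quick check (assuming standard information gain; the table encodes Class = A1 XOR A2): every single attribute, including the genuinely relevant A1 and A2, has zero gain at the root, so a greedy learner has no principled way to avoid splitting on the irrelevant A3.

```python
from itertools import product
from math import log2

# The slide's truth table: Class = A1 XOR A2, A3 irrelevant.
rows = [(a1, a2, a3, a1 != a2) for a1, a2, a3 in product([True, False], repeat=3)]

def entropy(labels):
    p = sum(labels) / len(labels)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def info_gain(rows, attr):
    gain = entropy([r[3] for r in rows])
    for value in (True, False):
        subset = [r[3] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for attr, name in [(0, "A1"), (1, "A2"), (2, "A3")]:
    print(name, info_gain(rows, attr))  # all three print 0.0
```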

15 2nd exp (conditionally dependent attributes)

Name  Attribs  Class function                             Noise            Instances  Random attribs
XOR1  10       (A1 xor A2) or (A3 xor A4)                 No               100        6
XOR2  10       (A1 xor A2) xor (A3 xor A4)                No               100        6
XOR3  10       (A1 xor A2) or (A3 and A4) or (A5 and A6)  10% class error  100        4
Par1  10       3-attribute parity problem                 No               100        7
Par2  10       4-attribute parity problem                 No               100        6

Accuracy   C4.5        GATree
XOR1       67±12.04    100±0
XOR2       53±18.57    90±17.32
XOR3       79±6.52     78±8.37
Par1       70±24.49    100±0
Par2       63±6.71     85±7.91

- Greedy heuristics perform poorly when attributes are conditionally dependent.
- GATree substantially outperformed C4.5.
- GATree can be disturbed by class noise (XOR3).
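
A sketch of how such a data set can be generated; make_xor1 is an illustrative name, with instance and attribute counts taken from the table above:

```python
import random

def make_xor1(n_instances=100, n_attribs=10, seed=0):
    """XOR1-style data: 10 boolean attributes, Class = (A1 xor A2) or (A3 xor A4);
    the six remaining attributes are pure noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_instances):
        a = [rng.random() < 0.5 for _ in range(n_attribs)]
        label = (a[0] != a[1]) or (a[2] != a[3])
        data.append((a, label))
    return data
```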

16 3rd exp (results on standard sets)

                 Accuracy                                  Size
                 C4.5          OneR          GATree        C4.5    GATree
Colic            83.84±3.41    81.37±5.36    85.01±4.55    27.4    5.84
Heart-Statlog    74.44±3.56    76.3±3.04     77.48±3.07    39.4    8.28
Diabetes         66.27±3.71    63.27±2.59    63.97±3.71    140.6   6.6
Credit           83.77±2.93    86.81±4.45    86.81±4       57.8    3
Hepatitis        77.42±6.84    84.52±6.2     80.46±5.39    19.8    5.56
Iris             92±2.98       94.67±3.8     93.8±4.02     9.6     7.48
Labor            85.26±7.98    72.73±14.37   87.27±7.24    8.6     8.72
Lymph            65.52±14.63   74.14±7.18    75.24±10.69   28.2    7.96
Breast-Cancer    71.93±5.11    68.17±7.93    71.03±8.34    35.4    6.68
Zoo              90±7.91       43.8±10.47    82.4±4.02     17      10.12
Vote             96.09±3.86    95.63±4.33    -             11      3
Glass            55.24±7.49    43.19±4.33    53.48±4.33    60.2    8.98
Balance-Scale    78.24±4.4     59.68±4.4     71.15±6.47    106.6   8.92
AVERAGES         78.46         72.64         79.75         43.2    7.01

17 3rd exp (observations)
- GATree performs as well as, or slightly better than, C4.5 on the standard data sets.
- The trees produced by GATree are about 6 times smaller than those produced by C4.5.
- OneR is good for noisy data sets but performs substantially worse overall.

18 Discussion on the search type of GATree
- GATree adopts a less greedy strategy than other learners: it tries to minimise the size of the tree while maximising accuracy.
- GATree is neither a hill-climbing nor an exhaustive searcher, but rather a type of beam search (balancing exploration and exploitation).
- However, when tuned properly, these search types share the same characteristics.

19 Conclusion
- The derived hypotheses of standard algorithms can substantially deviate from the optimum, due to their greedy strategy; this can be addressed by using global metrics of tree quality.
- Compared to greedy induction, GATree produces accurate, small and comprehensible trees.
- GAs adapt themselves dynamically to a variety of different target concepts.

20 Encoding
It is important to select the proper encoding for a problem. The encoding maps the problem being solved onto one of the following representations:
- Value encoding: a chromosome is a vector of values, e.g. chromosome A: 45.9845 12.8375 102.46556 55.3857 36.39857
- Binary encoding: chromosomes are strings of 0s and 1s, e.g. chromosome A: 10001010001111101010111
- Tree encoding: GAs may also be used for program design and construction; chromosome genes then represent programming-language commands, mathematical operations and other program components.
GATree uses a natural representation of the search space: actual decision trees, not binary strings (see the sketch below).
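
The three encodings side by side, using the slide's own example chromosomes; the tree literal is an illustrative nested-tuple decision tree for XOR, not GATree's internal representation:

```python
# Value encoding: a chromosome is a vector of (here real-valued) genes.
value_chromosome = [45.9845, 12.8375, 102.46556, 55.3857, 36.39857]

# Binary encoding: a chromosome is a string of 0s and 1s.
binary_chromosome = "10001010001111101010111"

# Tree encoding: a chromosome is a nested structure; here a decision tree
# (test, left_subtree, right_subtree) that represents A1 XOR A2.
tree_chromosome = ("A1 is True",
                   ("A2 is True", "class F", "class T"),
                   ("A2 is True", "class T", "class F"))
```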

21 Bias
Without bias we have no basis for classifying unseen examples; the best we can do is memorise the training samples and classify new examples at random. The problem with a strongly biased algorithm is that it shrinks the hypothesis space, so we might miss the optimal hypothesis.
- Preference bias is based on the learner's behaviour; it is desirable when it determines the characteristics of the produced tree. C4.5 is biased towards small and accurate trees (preference bias).
- Procedural bias is based on the learner's design; if inadequate, it may affect the quality of the output. C4.5 uses the gain-ratio metric and minimum-error pruning (procedural bias).
GATree has a new, weak procedural bias: it considers a relatively large number of hypotheses in a relatively efficient manner, and it employs global metrics of tree quality, i.e. a set of minimum numerical performance measurements related to a goal: FITNESS.

