1
The Performance of Evolutionary Artificial Neural Networks in Ambiguous and Unambiguous Learning Situations
Melissa K. Carroll
October 2004
2
Artificial Neural Networks and Supervised Learning
3
Backpropagation and Associated Parameters: Gain
- Activation function: used to compute the output of a neuron from its inputs
- Sigmoid function: as gain increases, the slope of the neurons' activation function increases (see the sketch below)
- Red: gain = 1; Blue: gain = 2; Green: gain = 0.5
Diagram source: http://www.willamette.edu/~gorr/classes/cs449/Maple/ActivationFuncs/active.html
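A minimal sketch, assuming the standard logistic sigmoid with a multiplicative gain parameter (the function and variable names are illustrative, not from the thesis), showing how increasing the gain steepens the activation function:

```python
import numpy as np

# Illustrative sigmoid with a gain parameter; larger gain -> steeper slope.
def sigmoid(x, gain=1.0):
    return 1.0 / (1.0 + np.exp(-gain * x))

x = np.linspace(-5, 5, 11)
for g in (0.5, 1.0, 2.0):   # the three gains compared on the slide
    # For this sigmoid the slope at x = 0 is gain / 4, so doubling the gain
    # doubles the steepness of the activation function around the origin.
    print(f"gain={g}: slope at 0 = {g / 4:.3f}, outputs = {np.round(sigmoid(x, g), 2)}")
```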
4
Effects of Learning Rate
Diagram source: http://www.willamette.edu/~gorr/classes/cs449/linear2.html
5
Methods to Ensure or Speed Up Convergence That Often Work
- Adjust architecture: add more layers or more neurons per layer
- Adjust topology, i.e., the connections between neurons
- Add a bias neuron that always outputs 1
  - No learning can occur with backprop when a neuron is outputting 0
  - Equivalent to shifting the range of the activation function; reduces the number of neurons outputting 0
- Add a momentum term to the weight adjustment equations (see the sketch below)
  - Smooths learning to allow a high learning rate without divergence
- The ANN programmer must manipulate all of these parameters using expert knowledge
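A minimal sketch of the momentum term and the bias neuron mentioned above, under assumed names (`update_weights`, `add_bias`) and illustrative parameter values; the actual update equations used in the thesis may differ:

```python
import numpy as np

def update_weights(weights, gradient, velocity, learning_rate=0.5, momentum=0.9):
    """One weight update with a momentum term: the previous update (velocity)
    is blended in, smoothing learning so a relatively high learning rate can
    be used without divergence."""
    velocity = momentum * velocity - learning_rate * gradient
    return weights + velocity, velocity

def add_bias(inputs):
    """Append a constant bias input of 1, equivalent to shifting the range of
    the activation function; a neuron's incoming weights can then still change
    even when its other inputs are all 0 (where plain backprop makes no update)."""
    return np.append(inputs, 1.0)
```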
6
Introduction: Genetic Algorithms (GAs)
- Another set of adaptive algorithms derived from a natural process (evolution)
- Organisms possess chromosomes made up of genes encoding for traits
- There is variability among organisms
- Some individuals will naturally be able to reproduce more in a particular environment, making them more "fit" for that environment
- By definition, the genes of the more fit individuals will become more numerous in the population
- The population is skewed towards individuals more fit for the given environment
- The forces of variability then act on these genes, leading to new, more "fit" discoveries (the basic GA cycle is sketched below)
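A minimal sketch of the generational cycle just described: selection skews the population towards fitter genomes, then crossover and mutation introduce new variability. The genome representation (a list of numbers), the truncation selection scheme, and the parameter values are illustrative assumptions, not the thesis configuration:

```python
import random

def evolve(population, fitness, generations=40, mutation_rate=0.05):
    """Minimal generational GA: fitter genomes reproduce more, then the forces
    of variability (crossover and mutation) act on their genes."""
    for _ in range(generations):
        # Selection: the fitter half of the population gets to reproduce.
        parents = sorted(population, key=fitness, reverse=True)[:len(population) // 2]
        children = []
        while len(children) < len(population):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]                          # one-point crossover
            child = [g + random.gauss(0, 0.1) if random.random() < mutation_rate
                     else g for g in child]                    # point mutation
            children.append(child)
        population = children
    return max(population, key=fitness)

# Toy usage: evolve 20 five-gene genomes towards the all-ones vector.
best = evolve([[random.random() for _ in range(5)] for _ in range(20)],
              fitness=lambda genome: -sum((g - 1.0) ** 2 for g in genome))
```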
7
The Genetic Algorithm
8
Designing and Training ANNs with GAs: Rationale
- Designing the right ANN for a particular task requires manipulating all of the parameters described previously, which requires expertise and much trial and error (and sometimes luck!)
- GAs are optimizers and can optimize these parameters
- Traditional training algorithms like backpropagation have a tendency to get stuck in local minima of multimodal or "hilly" error curves, missing the global minimum
- GAs perform a "global search" and are hence more likely to find the global minimum (see the toy comparison below)
Diagram source: http://www.willamette.edu/~gorr/classes/cs449/momrate.html
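A toy illustration of the local-minimum argument, assuming a made-up one-dimensional "hilly" error curve: plain gradient descent (standing in for backpropagation) settles in the nearest local minimum, while a population spread across the whole range, as a GA maintains, almost certainly includes a point in the basin of the global minimum:

```python
import numpy as np

def error(w):
    """Toy multimodal ("hilly") error curve with local minima and one global minimum."""
    return np.sin(3 * w) + 0.3 * (w - 2) ** 2

def grad(w, eps=1e-5):
    return (error(w + eps) - error(w - eps)) / (2 * eps)

# Gradient descent from an unlucky starting point settles in the local minimum
# near w = -0.37 (error about 0.79) ...
w = -1.0
for _ in range(200):
    w -= 0.05 * grad(w)
print("gradient descent:", round(w, 2), "error", round(error(w), 2))

# ... while a population sampled over the whole range almost certainly contains
# a point in the basin of the global minimum near w = 1.6 (error about -0.95).
population = np.random.uniform(-3, 5, size=50)
best = population[np.argmin(error(population))]
print("best of population:", round(best, 2), "error", round(error(best), 2))
```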
9
Designing and Training ANNs with GAs: Implementation
- Direct (matrix) encoding (sketched below)
- Some classes of GAs for evolving ANNs:
  - Darwinian
  - Hybrid Darwinian
  - Baldwinian
  - Lamarckian
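A minimal sketch of direct (matrix) encoding for a single-hidden-layer network, assuming the genome is simply the weight matrices laid out end to end; the layer sizes and names are illustrative:

```python
import numpy as np

def decode(genome, n_inputs=4, n_hidden=3, n_outputs=2):
    """Direct (matrix) encoding: the genome is just the network's weight
    matrices laid out end to end, one gene per connection weight."""
    genome = np.asarray(genome, dtype=float)
    split = n_inputs * n_hidden
    w_input_hidden = genome[:split].reshape(n_inputs, n_hidden)
    w_hidden_output = genome[split:].reshape(n_hidden, n_outputs)
    return w_input_hidden, w_hidden_output

# A 4-3-2 network needs 4*3 + 3*2 = 18 genes; crossover and mutation then
# operate directly on this flat vector of weights.
genome = np.random.uniform(-1, 1, size=18)
w1, w2 = decode(genome)
```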
10
Introduction: Wisconsin Card Sorting Test (WCST)
- Psychological task requiring adaptive thinking: measures flexibility in thought, and is therefore interesting for testing properties of ANN learning
- Requires the subject to resolve ambiguities:
  - Which card was the correct card when negative feedback is given?
  - Which rule was the current rule when a stimulus card matches a target card on more than one dimension?
11
Purpose and Implementation
14
Hypotheses Regarding Learning
- A highly accurate network trained on the unambiguous pattern should produce output identical to the training set
  - Accuracy rate of the rule-to-card network should be 100%
- A calculus proof led to the prediction that a network trained on the ambiguous pattern would output, at each node, the probability of the corresponding rule being the current rule (see the derivation below)
  - Accuracy rates should be 100%, 50%, and 33.3% for input patterns with 1, 2, and 3 associated target patterns, respectively
  - The minimum error rate for the ambiguous pattern is a very high 0.22916
- When the whole model is combined, it will be interesting to see whether the networks can generalize to data not seen in training
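The probability prediction follows from minimizing squared error. A short, general derivation (the 0.22916 figure presumably comes from summing the residual terms over the specific training patterns, which are not restated here):

```latex
% t = 1 with probability p (the node's rule is the current rule), t = 0 otherwise.
% For a constant output y, the expected squared error at that node is
\mathbb{E}\big[(y - t)^2\big] = p\,(y - 1)^2 + (1 - p)\,y^2 = (y - p)^2 + p\,(1 - p)
% which is minimized at y = p, leaving an irreducible error of p(1 - p);
% for inputs with 2 or 3 equally likely targets, p = 1/2 or 1/3 at the relevant nodes.
```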
15
Experiment Performed
Compare the performance of six GAs and one non-GA algorithm. Algorithms tested:
- Non-GA "brute force" algorithm: try all combinations of parameters
- Darwinian, evolution only (Pure Darwinian)
- Darwinian with additional backpropagation training (Hybrid Darwinian)
- Baldwinian, evolving architecture only
- Baldwinian, evolving architecture and weights
- Lamarckian
- One "made up" algorithm: "Reverse Baldwinian"
  - Motivation for Reverse Baldwinian: produce greater variability and evaluate fitness over longer training periods without increasing computation time
The fitness schemes that distinguish the Darwinian, Baldwinian, and Lamarckian classes are sketched below.
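A minimal sketch of those fitness schemes, using toy stand-in functions (decode_network, encode_network, train_backprop, and error_rate are placeholders, not the thesis code): pure Darwinian scores the genome as decoded, Baldwinian lets learning improve the score but leaves the genome untouched, and Lamarckian writes the learned weights back into the genome. (Hybrid Darwinian, by contrast, simply adds further backpropagation training after the GA run.)

```python
import numpy as np

# Toy stand-ins so the three fitness schemes are runnable; the real decoding,
# backpropagation training, and error measure in the thesis are different.
def decode_network(genome):  return np.array(genome, dtype=float)
def encode_network(net):     return list(net)
def train_backprop(net):     return net * 0.9          # pretend training reduces error
def error_rate(net):         return float(np.abs(net).mean())

def darwinian_fitness(genome):
    """Pure Darwinian: score the genome exactly as decoded; no learning at all."""
    return -error_rate(decode_network(genome))

def baldwinian_fitness(genome):
    """Baldwinian: learning improves the score, but the genome itself is unchanged."""
    return -error_rate(train_backprop(decode_network(genome)))

def lamarckian_fitness(genome):
    """Lamarckian: trained weights are written back into the genome, so what was
    learned during the individual's "lifetime" is inherited."""
    net = train_backprop(decode_network(genome))
    genome[:] = encode_network(net)
    return -error_rate(net)
```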
16
Hypotheses Regarding Algorithm Performance
- Good chance the GAs would outperform the non-GA algorithm, but some doubts due to known problems with GAs
- Hybrid Darwinian more effective than Pure Darwinian, based on previous research
- Baldwinian and Lamarckian more effective than Darwinian, based on previous research
- Lamarckian more effective than Baldwinian, due to the relatively short runs (approximately 40 generations)
17
Results and Discussion
18
Learning Performance
- Accuracy of best networks found by the best and second-worst algorithms on the unambiguous rule-to-card pattern
- Accuracy of best networks found by the best and second-worst algorithms on the ambiguous card-to-rule pattern
19
Sample Output of Best Card-to-Rule Learner
20
Nature of Learning Ambiguous Pattern
21
Parameters of Best Non-GA Nets
22
Lowest Error Rate Found by All Algorithms
** These algorithms did not include the additional 1000 training epochs; error values are the lowest attained by any of the networks produced by the GA run alone.
23
Performance of Pure Darwinian Algorithm
24
Sample Output of Best Pure Darwinian Net on Card-to-Rule Pattern
29
Did Evolution Work At All?
- Fitness graphs generally show an increase in fitness over generations
- T-tests show that the selection mechanism selected more fit individuals
- Best Lamarckian nets still "better" than the best non-GA net after equivalent amounts of training
- T-tests show that error rates of nets during the Lamarckian run were significantly better than error rates for random nets at equivalent time points for the unambiguous pattern
  - However, results were the reverse for the ambiguous pattern
  - Due to the nature of the paired t-test performed, these results can't easily be explained by the theory that the assessment time point is critical
30
To Evolve or Not To Evolve
General reasons why evolution may not have been appropriate in this case (in addition to those specific to the ambiguous pattern):
- The patterns may have been easy to learn; backpropagation often outperforms GAs on weight training for easy patterns
- Crossover is often not effective when using a matrix encoding scheme
- Although one GA did outperform the non-GA algorithm, the difference was almost irrelevant since both were highly successful
- The non-GA algorithm is easier to program and almost five times faster to run
31
Suggestions for Future Work
- Attempt to combine and train the entire ANN model
- Manipulate GA parameters, such as mutation rate, crossover rate, population size, and number of generations
- Try different selection mechanisms
- Use a different encoding scheme
- Experiment with a new fitness function for the ambiguous pattern
- Test different GAs or other evolutionary algorithms altogether
- Investigate ambiguous patterns further, including the role of momentum in their non-linear learning curve
32
What Does It All Mean?
- Learning power of ANNs: the ANNs learned two sub-tasks that are difficult for many humans
- Ambiguous patterns may be more difficult to design and train with GAs
  - Training ambiguous patterns may require special modifications, such as eliminating the momentum term
- Additional support for existing theories based on prior research:
  - GAs are not as effective on easy-to-learn patterns
  - Hybrid algorithms generally outperform evolution-only algorithms
- Clarifying the properties of ANNs and GAs is tremendously useful for engineering and may also elucidate properties of natural processes