Published by Cathleen Lee. Modified over 8 years ago.
1
Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping
Jacob Schrum and Risto Miikkulainen
University of Texas at Austin, Department of Computer Science
2
Typical Uses of MOEAs
Where have MOEAs proven themselves?
  Wireless sensor networks (Woehrle et al., 2010)
  Groundwater management (Siegfried et al., 2009)
  Hydrologic model calibration (Tang et al., 2006)
  Epoxy polymerization (Deb et al., 2004)
  Voltage-controlled oscillator design (Chu et al., 2004)
  Multi-spindle gear-box design (Deb & Jain, 2003)
  Foundry casting scheduling (Deb & Reddy, 2001)
  Multipoint airfoil design (Poloni & Pediroda, 1997)
  Design of aerodynamic compressor blades (Obayashi, 1997)
  Electromagnetic system design (Michielssen & Weile, 1995)
  Microprocessor design (Stanley & Mudge, 1995)
  Design of laminated ceramic composites (Belegundu et al., 1994)
Many engineering/design problems!
3
New Domains for MOEAs
Simulated agents often face multiple objectives
Automatic discovery of intelligent behavior
  Video game opponents in Unreal Tournament (van Hoorn, 2009)
  Predator/prey scenarios (Schrum & Miikkulainen, 2009)
  Race car driving in TORCS (Agapitos et al., 2008)
Comparatively little so far
Direct application of MOEA seldom successful
Success often depends on “shaping”
4
What is Shaping?
Term from behavioral psychology
  Identified by B. F. Skinner (1938)
Task-based example: train rat to press lever
  First reward proximity
  Then any interaction with lever
  Then actual pressing of lever
5
Evolutionary Shaping
Environment changes, making task harder
Evolution shapes behavior across generations
Example: migration given continental drift [1]
  Animals become accustomed to short migration
  Continental drift increases distance of migration
  Ability to travel increasing distances required
EC models with incremental evolution (e.g., [2])
[Images: Arctic Tern, Atlantic Salmon]
[1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975.
[2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.
6
Fitness-Based Shaping
Not extensively used
Little/no domain knowledge needed
Multiobjective approach a good fit
Selection criteria change
  Exploiting ignored objectives (TUG)
  Exploiting unfilled niches (BD)
[Figure labels: Behavior Space with crowded and uncrowded niches; Objective Space with a point that is dominated but exploits a mostly ignored objective]
7
Multiobjective Optimization
Pareto dominance: one point dominates another iff it is at least as good in every objective and strictly better in at least one
  Assumes maximization
  Want nondominated points
NSGA-II used in this work
What to evolve?
  NNs as control policies
[Figure label: Nondominated]
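The dominance test and the nondominated filter can be sketched in a few lines. This is a minimal illustration of the standard definition under maximization, not the NSGA-II sorting procedure itself; the function names and the sample scores are made up for the example.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v, assuming every objective is maximized."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores for five individuals.
scores = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
front = nondominated(scores)
```

The quadratic filter is fine for small populations; NSGA-II additionally ranks the dominated points into successive fronts and breaks ties by crowding distance.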
8
Constructive Neuroevolution
Genetic Algorithms + Neural Networks
Build structure incrementally (complexification)
Good at generating control policies
Three basic mutations (no crossover used)
  Perturb Weight
  Add Connection
  Add Node
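A rough sketch of the three mutations on a toy genome follows. The `Genome` layout, the Gaussian perturbation, the weight range, and the choice to add a node by splicing it into an existing link are all assumptions made for illustration; the slides name the mutations but do not specify how they are implemented.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Genome:
    """Hypothetical network encoding: a node count plus weighted links."""
    num_nodes: int
    links: List[Tuple[int, int, float]] = field(default_factory=list)


def perturb_weight(g: Genome, rng: random.Random, sigma: float = 0.5) -> None:
    """Add Gaussian noise to the weight of one randomly chosen link."""
    if not g.links:
        return
    i = rng.randrange(len(g.links))
    src, dst, w = g.links[i]
    g.links[i] = (src, dst, w + rng.gauss(0.0, sigma))


def add_connection(g: Genome, rng: random.Random) -> None:
    """Add a new link with a random weight between two randomly chosen nodes."""
    src = rng.randrange(g.num_nodes)
    dst = rng.randrange(g.num_nodes)
    g.links.append((src, dst, rng.uniform(-1.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Splice a new node into a randomly chosen link (assumed scheme)."""
    if not g.links:
        return
    src, dst, w = g.links.pop(rng.randrange(len(g.links)))
    new = g.num_nodes
    g.num_nodes += 1
    g.links.append((src, new, 1.0))
    g.links.append((new, dst, w))


rng = random.Random(0)
genome = Genome(num_nodes=3, links=[(0, 2, 0.5)])
add_node(genome, rng)        # 4 nodes, 2 links
add_connection(genome, rng)  # 4 nodes, 3 links
perturb_weight(genome, rng)  # structure unchanged, one weight nudged
```

Because only structure-adding mutations exist, networks start small and grow, which is the complexification the slide refers to.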
9
Targeting Unachieved Goals
Main ideas:
  Temporarily deactivate “easy” objectives
  Focus on “hard” objectives
“Hard” and “easy” defined in terms of goal values
  Easy: average fitness “persists” above goal (achieved)
  Hard: goal not yet achieved
Objectives reactivated when no longer achieved
Increase goal values when all achieved
[Figure label: evolution is directed toward the hard objectives]
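The bookkeeping behind TUG can be sketched as a small tracker that decides, each generation, which objectives take part in selection. This is a minimal sketch under stated assumptions: the class name `TugTracker`, the step size `alpha`, the fixed `goal_step` increment, and resetting the averages to a `floor` value are hypothetical choices; the slides say only that a recency-weighted average must surpass the goal, that goals increase once all are achieved, and that the average is reset at that point.

```python
from typing import List


class TugTracker:
    """Tracks which objectives are currently 'hard' (active) under TUG."""

    def __init__(self, goals: List[float], floor: List[float],
                 alpha: float = 0.15, goal_step: float = 1.0) -> None:
        self.goals = list(goals)      # current goal value per objective
        self.floor = list(floor)      # assumed reset value for the averages
        self.alpha = alpha            # assumed step size of the average
        self.goal_step = goal_step    # assumed fixed goal increment
        self.averages = list(floor)   # recency-weighted averages

    def update(self, population_means: List[float]) -> List[bool]:
        """Feed one generation's average scores; return active flags."""
        self.averages = [r + self.alpha * (x - r)
                         for r, x in zip(self.averages, population_means)]
        achieved = [r > g for r, g in zip(self.averages, self.goals)]
        if all(achieved):
            # Every goal is met: raise the goals and reset the averages.
            self.goals = [g + self.goal_step for g in self.goals]
            self.averages = list(self.floor)
            achieved = [False] * len(self.goals)
        # An objective is active (used for selection) while unachieved.
        return [not a for a in achieved]


tug = TugTracker(goals=[5.0, 5.0], floor=[0.0, 0.0], alpha=0.5)
first = tug.update([20.0, 2.0])    # objective 0 achieved, objective 1 still hard
second = tug.update([20.0, 20.0])  # both achieved, so goals rise and all reactivate
```

Because the flags are recomputed from the averages every generation, an objective whose average falls back below its goal is reactivated automatically.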
10
TUG Example
[Plot annotations:]
  Goal achieved
  Other goals also achieved → Goals increase
  Reset recency-weighted average
  Noisy evaluations
11
Behavioral Diversity
Originally developed for single-objective tasks [3]
  Add behavioral diversity objective
  Encourage exploration of new behaviors
  Domain-specific behavior measure required
Extensions in this work:
  Multiobjective task
  Domain-independent method
  Only requires policy mapping ℝ^N to ℝ^M, e.g. NNs
[Figure labels: Senses (N inputs), Actions (M outputs)]
[3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.
12
Behavioral Diversity Details
Behavior vector:
  Given input vectors, concatenate outputs
Behavioral diversity objective:
  Average distance from other behavior vectors
[Figure: the outputs produced for each input vector are concatenated into one behavior vector; a point with high average distance from the other points is highlighted]
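The two steps above can be sketched directly. Euclidean distance is an assumption here, since the slide says only "distance", and the toy policies stand in for evolved neural networks.

```python
import math
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def behavior_vector(policy: Policy, inputs: List[Sequence[float]]) -> List[float]:
    """Concatenate the policy's outputs over a fixed list of input vectors."""
    vec: List[float] = []
    for x in inputs:
        vec.extend(policy(x))
    return vec


def diversity_scores(vectors: List[List[float]]) -> List[float]:
    """Average distance from each behavior vector to all the others."""
    scores = []
    for i, v in enumerate(vectors):
        others = [math.dist(v, w) for j, w in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


def policy_a(x: Sequence[float]) -> List[float]:
    return [x[0]]


def policy_b(x: Sequence[float]) -> List[float]:
    return [x[0] + 3.0]


# Every policy is probed with the same inputs so the vectors are comparable.
probe_inputs = [(0.0,), (1.0,)]
vectors = [behavior_vector(p, probe_inputs) for p in (policy_a, policy_b, policy_a)]
scores = diversity_scores(vectors)
```

The resulting score is simply added as one more objective to maximize, so a policy that behaves unlike the rest of the population gains a selection advantage even when its task scores are dominated.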
13
Battle Domain
Evolved monsters (blue)
  Monsters can hurt fighter
Scripted fighter (green)
  Bat can hurt monsters
Three objectives
  Deal damage
  Avoid damage
  Stay alive
Previous work required incremental evolution to solve
14
Experimental Comparison
NN copied to 4 monsters
  Homogeneous teams
In paper
  Control: Plain NSGA-II
  TUG: NSGA-II with TUG using expert initial goals
  BD: NSGA-II with BD using random input vectors
Additional methods since publication
  TUG-Low: NSGA-II with TUG using minimal initial goals
  BD-Obs: NSGA-II with BD using inputs from evaluations
Each repeated 30 times
15
Attainment Surfaces [4]
Result attainment surface
  Shows space dominated by single Pareto front
Summary attainment surface s
  Union of space dominated in at least s out of n runs
  Surface s weakly dominates s+1, etc.
[Figure panels: Pareto Fronts (Approximation Sets); Result Attainment Surfaces (individual surfaces intersect); Summary Attainment Surfaces (Surface 1, Surface 2, Surface 3)]
[4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
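Membership in the region bounded by summary attainment surface s can be checked point by point. The sketch below follows the definition on the slide (attained in at least s of n runs, with maximization); it is not the plotting method from [4], and the function names and sample fronts are made up.

```python
from typing import List, Sequence

Front = List[Sequence[float]]


def weakly_dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is at least as good as v in every (maximized) objective."""
    return all(a >= b for a, b in zip(u, v))


def attained_by(front: Front, point: Sequence[float]) -> bool:
    """A run attains a point if some member of its front weakly dominates it."""
    return any(weakly_dominates(p, point) for p in front)


def attainment_count(fronts: List[Front], point: Sequence[float]) -> int:
    """Number of runs whose final front attains the point."""
    return sum(attained_by(f, point) for f in fronts)


def within_surface(fronts: List[Front], point: Sequence[float], s: int) -> bool:
    """True if the point is attained in at least s of the runs."""
    return attainment_count(fronts, point) >= s


# Hypothetical final fronts from three runs of a two-objective problem.
fronts = [[(3, 1), (1, 3)], [(2, 2)], [(1, 1)]]
counts = [attainment_count(fronts, p) for p in [(1, 1), (2, 1), (2, 2)]]
```

Any point inside surface s+1 is also inside surface s, which is why surface s weakly dominates surface s+1.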
16
Final Summary Attainment Surfaces
[Panels: Control, TUG, BD, TUG-Low, BD-Obs]
Animation: worst to best summary attainment surface
17
Hypervolume Metric [5]
Hypervolume of result attainment surface
  Simply “volume” for 3 domain objectives
With respect to a reference point
  Slightly less than minimum scores
Pareto-compliant metric
[Figure: Hypervolume = A + B + C + D]
[5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
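For small fronts the hypervolume can be computed exactly by cutting the objective space into grid cells at the points' coordinates and summing the cells that some point dominates, much like adding up the regions A to D in the figure. This is a simple sketch for illustration, assuming maximization; it scales poorly with front size and is not the algorithm used in [5] or in the experiments.

```python
import itertools
import math
from typing import List, Sequence


def hypervolume(points: List[Sequence[float]], ref: Sequence[float]) -> float:
    """Volume dominated by the points and bounded below by the reference point."""
    dims = range(len(ref))
    # Grid lines per objective: the reference value plus every larger coordinate.
    axes = [sorted({ref[k]} | {p[k] for p in points if p[k] > ref[k]})
            for k in dims]
    total = 0.0
    for cell in itertools.product(*(range(len(a) - 1) for a in axes)):
        upper = [axes[k][i + 1] for k, i in zip(dims, cell)]
        # A cell counts if some point reaches its upper corner in every objective.
        if any(all(p[k] >= upper[k] for k in dims) for p in points):
            total += math.prod(axes[k][i + 1] - axes[k][i]
                               for k, i in zip(dims, cell))
    return total


area = hypervolume([(2, 1), (1, 2)], ref=(0, 0))      # two overlapping rectangles
volume = hypervolume([(1, 2, 3)], ref=(0, 0, 0))      # a single box
```

Placing the reference point slightly below the minimum scores, as the slide suggests, ensures that every point on a front contributes a nonzero volume.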
18
Hypervolume
19
Successful Behaviors
[Videos: BD, BD-Obs, TUG, TUG-Low]
20
Discussion
Control: more extreme trade-offs
BD: more precise timing
BD-Obs and BD similar
  “Real” inputs give no advantage
TUG: more teamwork
  Particular initial objectives
  TUG-Low more like BD than TUG
ALL are better than Control
21
Future Work
How to combine TUG and BD
  Naïve combination doesn’t work
Scaling up
  Many objectives
  More complex domains
  Current work in Unreal Tournament promising
22
Conclusion
BD and TUG improve MO evolution
Domain independence!
  Contrast to task-based shaping
Expand MOEAs to a new range of domains
23
Questions?
Email: schrum2@cs.utexas.edu
See movies at: http://nn.cs.utexas.edu/?fitness-shaping
24
TUG Details
Persistence:
  Recency-weighted average surpasses goal
Goals:
  Initial values based on domain knowledge
  Or simply the minimal values for objectives
  Increase each goal when all are achieved
Objectives reactivated when no longer achieved
[Figure label: Goal achieved]
26
TUG Cycles