Evolving Multi-modal Behavior in NPCs
Jacob Schrum (schrum2@cs.utexas.edu)
Risto Miikkulainen (risto@cs.utexas.edu)
University of Texas at Austin, Department of Computer Sciences

Introduction
- Goal: discover NPC behavior automatically
- Benefits
  - Save production time/effort
  - Learn counterintuitive behaviors
  - Find weaknesses in static scripts
  - Tailor behavior to human players

Introduction
- Challenges
  - Games are complex
  - Multiple objectives
  - Multi-modal behavior required
- RL & Evolution popular approaches
- How to encourage multi-modal behavior?

Typical Agent Architecture
- One policy
- Why not several policies?
[Diagram: the environment sends sensor input to the agent; the agent's single policy returns actions.]

Agent With Multiple Policies
- Policy for each mode
- Individual policies simpler than monolithic policy
- Must choose which policy to use
[Diagram: sensor input from the environment feeds policy 1, policy 2, ..., policy n inside the agent; an arbitration step decides which policy's actions reach the environment.]

Multi-modal Game
- Game to test multi-modal architecture
- Make task delineation clear
- Same NPCs perform two distinct tasks
- Must determine their task from sensors
- New Game: “Fight or Flight”

Fight or Flight
- Fight Task
  - Player fights with bat
  - NPCs avoid bat
  - NPCs fight back
- Flight Task
  - Player has no weapon
  - Player runs away
  - NPCs confine/attack

NPC Objectives
- Fight Task
  - Deal damage
  - Avoid damage
  - Stay alive
- Flight Task
  - Deal damage
    - Not the same objective as in the Fight task!
- How do we deal with multiple, competing objectives?

Multi-Objective Optimization
- Imagine a game with two objectives:
  - Damage Dealt
  - Health Remaining
- A dominates B iff A is strictly better in at least one objective and at least as good in all others
- The points in the population that are not dominated are best: the Pareto Front
[Figure: trade-off between objectives. Annotated points: high health but did not deal much damage; dealt a lot of damage but lost a lot of health.]
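
The dominance definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: it assumes every objective is maximized, and the (damage dealt, health remaining) scores are invented for the example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (damage dealt, health remaining) scores, invented for illustration.
scores = [(10, 90), (80, 20), (50, 50), (40, 40), (10, 50)]
front = pareto_front(scores)
print(front)
```

Here (40, 40) and (10, 50) drop out because (50, 50) is at least as good in both objectives and better in at least one; the three remaining points are different trade-offs, none better than the others.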

NSGA-II
- Evolution: natural approach for finding optimal population
- Non-Dominated Sorting Genetic Algorithm II*
  - Population P with size N; Evaluate P
  - Use mutation to get P´ size N; Evaluate P´
  - Calculate non-dominated fronts of {P ∪ P´} size 2N
  - New population size N from highest fronts of {P ∪ P´}
*K. Deb et al. 2000
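
The four steps listed on this slide can be sketched as one generation of selection. This is a simplified illustration under stated assumptions, not the authors' implementation: all objectives are maximized, the function names are hypothetical, and the last front that does not fit is simply truncated, whereas NSGA-II as published breaks such ties with a crowding-distance measure.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """Pareto dominance with every objective maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(scores: List[Scores]) -> List[List[int]]:
    """Peel off successive Pareto fronts; returns index lists, best front first."""
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def next_generation(
    population: List[T],
    evaluate: Callable[[T], Scores],
    mutate: Callable[[T], T],
) -> List[T]:
    """One generation: mutate P into P', sort P plus P' into fronts, keep the best N."""
    n = len(population)
    children = [mutate(parent) for parent in population]
    combined = population + children          # size 2N
    scores = [evaluate(ind) for ind in combined]
    survivors: List[T] = []
    for front in non_dominated_fronts(scores):
        for i in front:
            if len(survivors) == n:           # simplification: no crowding distance
                return survivors
            survivors.append(combined[i])
    return survivors


# Toy individuals whose genome is also their score vector (illustration only).
population = [(1.0, 1.0), (2.0, 2.0)]
new_population = next_generation(
    population,
    evaluate=lambda ind: ind,
    mutate=lambda ind: (ind[0] + 1.0, ind[1] + 1.0),
)
print(new_population)
```

In the toy run, the child (3.0, 3.0) forms the first front on its own, so it survives together with one copy of (2.0, 2.0) from the second front.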

Neuroevolution
- Genetic Algorithms + Artificial Neural Networks
- NNs good at generating behavior
- GA creates new nets, evaluates them
- Four basic mutations (no crossover used)
  - Perturb Weight
  - Add Connection
  - Add Neuron
  - Merge Neurons

New Mode Mutation
- New mode with inputs from preexisting mode
- Maximum preference neuron determines mode
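
Arbitration by preference neuron can be sketched as follows. The output layout is an assumption made for this example (each mode contributes a contiguous block of policy outputs followed by one preference output); the slide only states that the mode with the maximum preference neuron is used. The function name `select_mode` is hypothetical.

```python
from typing import List, Sequence, Tuple


def select_mode(outputs: Sequence[float], outputs_per_mode: int) -> Tuple[int, List[float]]:
    """Pick the mode whose preference neuron has the highest activation.

    Assumed layout: for each mode, `outputs_per_mode` policy outputs
    followed by a single preference output.
    """
    block = outputs_per_mode + 1
    if not outputs or len(outputs) % block != 0:
        raise ValueError("output vector does not divide evenly into modes")
    num_modes = len(outputs) // block
    preferences = [outputs[m * block + outputs_per_mode] for m in range(num_modes)]
    # Ties go to the lowest-numbered mode, an arbitrary choice for this sketch.
    chosen = max(range(num_modes), key=lambda m: preferences[m])
    start = chosen * block
    return chosen, list(outputs[start:start + outputs_per_mode])


# Two modes, two policy outputs each; the values are invented for illustration.
network_outputs = [0.1, -0.4, 0.2,   # mode 0: policy outputs, then preference 0.2
                   0.9, 0.3, 0.7]    # mode 1: policy outputs, then preference 0.7
mode, actions = select_mode(network_outputs, outputs_per_mode=2)
print(mode, actions)
```

Under this layout, a mode-adding mutation would append one more block to the output vector, and the arbitration code would not need to change.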

Experiment
- Compare 1Mode vs. ModeMutation
  - 10 trials each
- What to evolve against?
  - Bot with static policy (instead of player)
  - Bot has a first person perspective
- Fight Task
  - Swing bat constantly
  - Approach nearest bot in front
- Flight Task
  - Back away from nearest bot in front

Incremental Evolution
- Hard to evolve against proposed bot strategies
  - Could easily fail to evolve interesting behavior
- Incremental evolution against increasing speeds
  - 0%, 40%, 80%, 100%
  - Increase speed when all goals are met
  - End when goals met at 100%
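
The progression rule above can be sketched as a small loop. The callback `run_generation` and the `max_generations` cap are assumptions introduced for this example; only the speed schedule and the advance/stop conditions come from the slide.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_evolution(
    run_generation: Callable[[float], bool],
    speeds: Sequence[float] = (0.0, 0.4, 0.8, 1.0),
    max_generations: int = 1000,  # safety cap; not specified in the talk
) -> Tuple[int, bool]:
    """Evolve against increasingly fast bots.

    `run_generation(speed)` is a hypothetical callback that evolves one
    generation against a bot moving at `speed` (a fraction of full speed)
    and returns True when all goals were met in that generation.
    Returns (generations used, whether goals were met at the final speed).
    """
    stage = 0
    for generation in range(1, max_generations + 1):
        if run_generation(speeds[stage]):
            if stage == len(speeds) - 1:
                return generation, True   # goals met at 100% speed: done
            stage += 1                    # goals met: move to the next speed
    return max_generations, False


# Stand-in for real evolution: pretend the goals are met every third generation.
calls: List[float] = []


def fake_generation(speed: float) -> bool:
    calls.append(speed)
    return len(calls) % 3 == 0


result = incremental_evolution(fake_generation)
print(result, calls)
```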

Goals
- Average population performance high enough? → Then increase speed
- Each objective has a goal:
  - Fight
    - At least 50 damage to bot (1 kill)
    - Less than 20 damage per NPC on average (2 hits)
    - Survive at least 800 time steps (80% of trial)
  - Flight
    - At least 100 damage to bot (2 kills)
- Average population objective score met goal value? → Goal met
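
A goal check of this kind might look like the sketch below. The goal values are the ones on the slide; the objective names, the dictionary layout, and the example scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Goal values from the slide; the objective names are hypothetical labels.
FIGHT_GOALS: Dict[str, Tuple[str, float]] = {
    "damage_dealt": ("at_least", 50),      # 1 kill
    "damage_received": ("less_than", 20),  # 2 hits per NPC on average
    "time_alive": ("at_least", 800),       # 80% of a trial
}
FLIGHT_GOALS: Dict[str, Tuple[str, float]] = {
    "damage_dealt": ("at_least", 100),     # 2 kills
}


def goals_met(
    population_scores: List[Dict[str, float]],
    goals: Dict[str, Tuple[str, float]],
) -> Dict[str, bool]:
    """Compare the population average of each objective with its goal value."""
    status = {}
    for name, (kind, target) in goals.items():
        average = mean(ind[name] for ind in population_scores)
        status[name] = average >= target if kind == "at_least" else average < target
    return status


# Invented Fight-task scores for a population of three individuals.
fight_scores = [
    {"damage_dealt": 60, "damage_received": 10, "time_alive": 900},
    {"damage_dealt": 50, "damage_received": 20, "time_alive": 800},
    {"damage_dealt": 40, "damage_received": 30, "time_alive": 850},
]
status = goals_met(fight_scores, FIGHT_GOALS)
print(status, all(status.values()))
```

In this example the average damage received is exactly 20, so the "less than 20" goal is not yet met and the bot speed would stay where it is.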

Mode Mutation Results
- Performs well in both tasks
- Fight Task
  - Baiting behavior
  - One NPC takes damage so others can sneak up behind
  - Bot knocked back and forth
- Flight Task
  - Corralling behavior
  - Keep bot confined in ring of NPCs
  - Move to scare the bot into enclosure

Use of Multiple Modes
- Different modes for baiting and attacking
- Similar elements of modes co-opted for different tasks
- Many unselected modes
  - As many as 7 unused modes
  - Still have outward connections
  - Are they vestigial?

1 Mode Results
- Only performs well in one task
- Example 1
  - Runs away in Fight task
  - Corralling behavior in Flight task
- Example 2
  - Overly aggressive in Fight task
  - Lets bot escape in Flight task
- Population averages of individual objectives are high enough, but few individuals do well in all objectives

Why Different Behaviors?
- Progression method
  - Numerically similar performance
  - Drastically different distribution of behaviors
- 1Mode evolves groups for subsets of objectives
- ModeMutation biases towards solving all objectives
  - Changes shape of fitness landscape

Future Work
- Improve progression
  - More granularity in tougher end of task sequence
  - Can incremental evolution be avoided?
- Improve multiobjective selection
  - Bias towards middle of trade-off surface
  - Other algorithms:
    - SPEA2
    - PESA-II

Future Work
- Improve ModeMutation
  - Should new modes be strongly differentiated?
  - Different arbitration mechanism?
  - Better option than randomly applying mutation?
  - Different initial connectivity?

Conclusion
- ModeMutation encourages multi-modal behavior
  - Biases search toward multi-modal solutions
- ModeMutation better than 1Mode
  - More successes in shorter amount of time
- Lead to multi-modal behavior in future games

Questions?
- Movies: http://nn.cs.utexas.edu/?multimodal09
- E-mail: schrum2@cs.utexas.edu

Auxiliary Slides

Ignore Achieved Goals for Objectives
- Goal is met → Drop objective
- Focus selection on most difficult objectives
- Prevents stagnation
  - Reshaping fitness landscape helps escape peaks
- Project scores into lower dimension
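
Dropping achieved objectives amounts to projecting each score vector onto the objectives whose goals are still unmet. The sketch below illustrates that idea only; the objective names, the data layout, and the example values are hypothetical, and how the projected scores feed into selection is left out.

```python
from typing import Dict, List, Tuple


def active_objectives(goal_met: Dict[str, bool]) -> List[str]:
    """Objectives whose goal has not been met yet, in their original order."""
    return [name for name, met in goal_met.items() if not met]


def project_scores(
    scores: List[Dict[str, float]],
    goal_met: Dict[str, bool],
) -> List[Tuple[float, ...]]:
    """Keep only the unmet objectives, giving lower-dimensional score vectors."""
    active = active_objectives(goal_met)
    return [tuple(ind[name] for name in active) for ind in scores]


# Invented example: the damage goal is already met, so it is dropped.
scores = [
    {"damage_dealt": 60, "health_remaining": 30, "time_alive": 700},
    {"damage_dealt": 55, "health_remaining": 80, "time_alive": 500},
]
goal_met = {"damage_dealt": True, "health_remaining": False, "time_alive": False}
projected = project_scores(scores, goal_met)
print(active_objectives(goal_met), projected)
```

Selection on the projected vectors compares individuals only on the objectives that are still hard, which is one way to read "focus selection on most difficult objectives".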
