Evolution of Teamwork in Multiagent Systems
Research Preparation Examination by Jacob Schrum


Why Multiple Agents?
- Many applications
  – Physical world: robotics, autonomous automobiles, military applications, network systems
  – Artificial world: games, graphics, entertainment, artificial life

Why a Multiagent Perspective?
- Decentralized control
  – Failure recovery
  – Individual agents simpler than the whole
  – Some environments don't support central control
- Human interaction
  – Humans are also agents
  – Agents interacting with humans are in a MAS

Teamwork in Multiagent Systems
- Problem divided amongst many agents
- Teamwork often required for success
- Communication sometimes an issue
- How to learn teamwork: open question

Direct Approach: Careful Design
- Hand-code everything
- Benefits:
  – Understand the end product
- Drawbacks:
  – Not general
  – Difficult
  – Programmer time
- Common in:
  – Robotics
  – Video games
  – Most deployed systems
- What if no one knows how to program it?

Learn It: Reinforcement Learning
- Environment is a Markov Decision Process (MDP)
- Learn an optimal policy
  – Depends on a value function (TD methods)
  – Proven convergence in the tabular case
  – Function approximation needed for bigger problems
- Problems with Partially Observable MDPs
- Successes in:
  – Predator/prey scenarios (Tan 1993)
  – Soccer keepaway (Kalyanakrishnan, Stone 2009)
  – RoboCup soccer (many…)
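To make the tabular value-function idea above concrete, here is a minimal Q-learning sketch in Python. The environment interface (reset(), step(), a discrete env.actions list) and the parameter values are illustrative assumptions, not details taken from the cited work.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular Q-learning: a TD method with proven convergence in the
        # finite (tabular) case under standard conditions.
        Q = defaultdict(float)  # Q[(state, action)] -> estimated return

        def greedy(state):
            return max(env.actions, key=lambda a: Q[(state, a)])

        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy exploration
                if random.random() < epsilon:
                    action = random.choice(env.actions)
                else:
                    action = greedy(state)
                next_state, reward, done = env.step(action)
                # One-step TD update toward the bootstrapped target
                best_next = 0.0 if done else Q[(next_state, greedy(next_state))]
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

Function approximation replaces the table Q with a parameterized function when the state space is too large to enumerate.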

Breed It: Evolution
- Based on evolution via natural selection
- Benefits:
  – Less restrictive policy representation
  – Demonstrated success in POMDP domains
- Drawbacks:
  – Computationally intensive
  – Time intensive
- Focus of this talk

Evolution Basics
1. Initialize population P
2. Evaluate all p in P (assign fitness)
3. Derive P' by selecting/modifying members of P based on their fitness scores
4. Repeat from step 2 with P' as P until done
- P' is usually similar to P, but slightly better
- Many variations: Genetic Algorithms, Evolution Strategies, etc.
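The loop above maps directly onto code. Below is a minimal generic sketch in Python; the genome representation (a vector of floats), tournament selection, and Gaussian mutation are illustrative assumptions rather than details from the talk, and evaluate(genome) is supplied by the problem.

    import random

    def evolve(evaluate, genome_len=10, pop_size=50, generations=100,
               tournament=3, mutation_std=0.1):
        # Step 1: initialize population P
        P = [[random.gauss(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
        for _ in range(generations):
            # Step 2: evaluate all p in P (assign fitness)
            fitness = [evaluate(p) for p in P]

            # Step 3: derive P' by selecting/modifying members of P
            def select():
                contenders = random.sample(range(pop_size), tournament)
                return P[max(contenders, key=lambda i: fitness[i])]

            def mutate(genome):
                return [g + random.gauss(0, mutation_std) for g in genome]

            # Step 4: P' becomes P for the next iteration
            P = [mutate(select()) for _ in range(pop_size)]
        return max(P, key=evaluate)

Genetic Algorithms, Evolution Strategies, and neuroevolution methods all follow this outline but differ in representation and in how selection and variation are performed.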

Evolution in Multiagent Systems
1. Team composition
   A. Homogeneous
   B. Heterogeneous
   C. Heterogeneous from subpopulations (pick one member from each subpopulation to make a team)
   D. Entire population
2. Type of selection
   A. Individual
   B. Team
   C. Self-selection
3. Multiple objectives

1.A. Homogeneous Teams
- Team members share the same policy
- Members know what to expect from their teammates
- One individual evaluated per trial
- Evaluations reliable because of consistent team composition

1.B. Heterogeneous Teams
- Team composed of several policies
- Uncertainty as to who teammates will be
- Multiple individuals evaluated per trial
- Evaluation differs depending on the choice of team members

1.C. Subpopulations
- Each slot filled by a representative from a specific subpopulation
- Subpopulations specialize
- Individuals know what to expect of the member in each slot
- Team composition is still heterogeneous

1.D. Entire Population
- The entire population is seen as one cooperating team
- Team-level selection not possible
- Population may divide into competing subpopulations
  – Mating restrictions
  – Genetic/tag-based recognition
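A minimal sketch contrasting how a team is assembled for evaluation under the four compositions above (1.A through 1.D); the function names and the random choice of genomes are illustrative assumptions.

    import random

    def homogeneous_team(population, team_size):
        # 1.A: a single genome is copied into every team slot
        genome = random.choice(population)
        return [genome] * team_size

    def heterogeneous_team(population, team_size):
        # 1.B: several distinct genomes are evaluated together in one trial
        return random.sample(population, team_size)

    def subpopulation_team(subpopulations):
        # 1.C: one representative is drawn from each specialized subpopulation
        return [random.choice(sub) for sub in subpopulations]

    def entire_population_team(population):
        # 1.D: every member of the population acts in the same shared world
        return list(population)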

2.A. Individual Selection
- Individuals selected based on their own fitness
- Commonly used with heterogeneous teams
- Can result in selfish behaviors
- Altruism relevant: sacrificing own fitness to raise the fitness of another
- Reciprocity relevant: helping another to get help in return

2.B. Team Selection
- Individuals selected based on team fitness
  – Common fitness, sum, average, etc.
- Commonly used with homogeneous teams
- Enables slackers in heterogeneous teams
- Altruism and reciprocity have no meaning
- No credit assignment problems between members
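The difference between individual and team selection comes down to how trial scores are credited before selection. A small sketch (using the average as the shared team score is an assumption; a sum or a single common score works the same way):

    def individual_selection_fitness(member_scores):
        # 2.A: each team member is credited with the score it earned itself
        return list(member_scores)

    def team_selection_fitness(member_scores):
        # 2.B: every member receives the same shared team score, which removes
        # credit assignment between members but lets slackers ride along
        shared = sum(member_scores) / len(member_scores)
        return [shared] * len(member_scores)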

2.C. Self-Selection
- Individuals choose when and with whom to mate
- Common in Artificial Life simulations
  – AL studies the emergence of biological phenomena
- Usually involves a spatial component
- Extinction is possible
  – Auto-restart
  – Spawn new members

3. Multiple Objectives
- Assume each individual has a vector of fitness scores F = (f1, …, fN) in objectives 1 through N
- Which values of F are best?
- Traditional approach:
  – fitness(F) = f1*w1 + … + fN*wN for weights w1, …, wN
- Pareto-based approach:
  – Partition the population into non-dominated Pareto fronts
  – Assign fitness based on Pareto front
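A small sketch of the two approaches (maximization of every objective is assumed):

    def weighted_sum_fitness(F, weights):
        # Traditional approach: collapse the objective vector into one scalar
        return sum(f * w for f, w in zip(F, weights))

    def dominates(F_a, F_b):
        # Pareto dominance: F_a is at least as good in every objective
        # and strictly better in at least one
        return (all(a >= b for a, b in zip(F_a, F_b))
                and any(a > b for a, b in zip(F_a, F_b)))

The weighted sum requires choosing the weights in advance; the Pareto-based approach instead ranks individuals by dominance, as illustrated on the next slide.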

Pareto Front Example
- Each point represents an individual's scores
- A point dominates the other points in its box
- 3 Pareto fronts of non-dominated points
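Extracting successive non-dominated fronts, as in the example above, can be sketched as follows; the simple quadratic peeling loop is an illustrative choice, not a specific algorithm from the talk (maximization assumed).

    def dominates(F_a, F_b):
        return (all(a >= b for a, b in zip(F_a, F_b))
                and any(a > b for a, b in zip(F_a, F_b)))

    def pareto_fronts(scores):
        # Repeatedly peel off the set of points dominated by no remaining point;
        # `scores` is a list of objective vectors, fronts are lists of indices
        remaining = list(range(len(scores)))
        fronts = []
        while remaining:
            front = [i for i in remaining
                     if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)]
            fronts.append(front)
            remaining = [i for i in remaining if i not in front]
        return fronts

    # Example with two objectives: the first two points are mutually non-dominated
    print(pareto_fronts([(3, 1), (1, 3), (1, 1)]))  # -> [[0, 1], [2]]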

Case Studies
- Review the state of the art
- For each study:
  – Classify type of selection
  – Classify team composition
  – Identify unanswered questions
  – Future research directions

AntFarm
- Evolve foraging behavior
  – Pheromones to communicate
- Individual selection
- Entire population as a team
- No cooperative foraging!
  – Likely cause: individual selection
→ Individual selection offers less incentive for teamwork
→ Teamwork especially difficult when there is only one team
* AntFarm: Towards Simulated Evolution. Collins, Jefferson. 1991

Evolving Communication
- Exploration task
  – Pheromones to communicate
- Team selection
- Homogeneous teams vs. static bots
- Pairs of objectives, Pareto-based
- Different behaviors in different runs
  – Compromise strategy
  – Blocking strategy
→ Teamwork possible with homogeneous teams
→ Need to move beyond grid worlds
→ Move beyond two objectives
* Emergence of Communication in Competitive Multi-Agent Systems: A Pareto Multi-Objective Approach. McPartland, Nolfi, Abbass. 2005

SwarmEvolveTags
- Birds visit food stations
- Energy can be shared
  – Sharing based on tags
- Self-selection
- Entire population as a team
  – Competing subpopulations emerged
→ Cooperation in the entire population without team selection
→ Altruism via aiding similar individuals
→ Teamwork as a result of subpopulation homogeneity
* Tags and the Evolution of Cooperation in Complex Environments. Spector, Klein, Perry
* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001

Legion-I
- Roman legions defend the countryside and cities
- Team-level selection
- Homogeneous teams
- Multi-modal behavior
  – Defend city
  – Pursue barbarians
→ Homogeneous team members must fill all roles
→ Could not learn more complicated/strategic tasks (e.g., building roads to speed up travel)
* Neuroevolution for Adaptive Teams. Bryant, Miikkulainen. 2003

Role-Based Cooperation
- Toroidal predator/prey grid world
- Individual selection
  – Team fitness shared by team members
- Multi-Agent ESP: subpopulation-based
- Simple non-communicating method outperforms communicating method
→ Teamwork without homogeneity
→ Communication not always needed
→ May only apply to simple domains
→ Still need to scale up complexity
→ Get away from grid worlds
* Coevolution of Role-Based Cooperation in Multi-Agent Systems. Yong, Miikkulainen. 2007

NERO
- Machine learning game
  – Human interaction via the fitness function
- Individual selection
- Entire population is the team
- Multiple objectives
  – User defines weights dynamically
→ Maintenance of the fitness function
→ Old behaviors can be forgotten when learning new ones
→ Need to learn multiple tasks simultaneously
* Evolving Neural Network Agents in the NERO Videogame. Stanley, Bryant, Miikkulainen. 2005

Pareto Multi-objective NPCs
- Evolved monsters vs. a bot with a stick
- Individual selection
- Large heterogeneous teams of 15
  – A third of the entire population
- Multiple objectives, Pareto-based
  – Credit assignment trick
→ Learns multiple objectives simultaneously
→ Different runs can lead to very different results
  – Different areas of the trade-off surface
→ Population becomes mostly homogeneous
* Constructing Complex NPC Behavior via Multi-Objective Neuroevolution. Schrum, Miikkulainen. 2008

Dead End Game
- Human prey vs. predators
- Offline evolution vs. a bot
  – Team-level selection
  – Homogeneous teams
- Online evolution vs. the human
  – Individual selection
  – Small heterogeneous team
→ Different configurations appropriate at different levels
→ Sometimes the domain leaves no choice
* Interactive Opponents Generate Interesting Games. Yannakakis, Hallam

Cooperating Robots
- Retrieve tokens
- Simulation → Robots
- Compared selection levels
  – Individual vs. team
- Compared team compositions
  – Homogeneous vs. heterogeneous
→ Homogeneous better with teamwork and altruism
→ Homogeneous best with team selection
→ Heterogeneous best with individual selection
→ Did not consider subpopulations
→ Tasks only involved foraging (no other objectives)
* Genetic Team Composition and Level of Selection in the Evolution of Cooperation. Waibel, Keller, Floreano. 2008

Summary of Issues
- More complexity
  – Move beyond grid worlds
  – Need multiple contradictory objectives
  – Act in continuous, real-time worlds
- Best evolutionary configuration
  – More comparisons between team compositions (especially the subpopulation-based method)
  – Task/configuration pairings?
  – Credit assignment issues
- Multi-modal behavior
  – What to do and when

Experiment
- Four monsters vs. a bot with a stick
  – Smaller team makes the task harder
- Compare homogeneous, heterogeneous, and subpopulation approaches
  – Homogeneous uses team selection
  – Others use individual selection
- Multiple objectives:
  – Group damage
  – Individual injury
  – Individual time alive

Heterogeneous Results
- Many generations (600+)
  – Not that long in real time
- Mostly selfish
  – Good teamwork can arise though (Baiting)
- Teamwork depends on the population being homogeneous
[Examples: Teamwork, Selfish]

Homogeneous Results
- Fewer generations ( )
  – Actually longer in real time
- Always some form of teamwork
  – Baiting
  – Timed Assault
[Examples: Baiting, Timed Assault]

Subpopulation Results
- Many generations (400+)
- Each generation takes a lot of real time
- Easy for a slacker subpopulation to persist
- Limited teamwork
  – Only some members participate
[Example: Cooperating Pair]

Discussion
- Can the subpopulation method do better?
  – Better credit assignment
  – Team-level selection (how?)
- Speed up homogeneous and subpopulation methods
- Heterogeneous: discourage selfishness

Future Research Questions
- Credit assignment issues
  – Cooperating individuals cannot be identified
  – Do objectives define the best evolutionary configuration?
- Complex domains/real problems
  – Many objectives
  – Continuous, real-time
- Potential challenge domains
  – RoboCup Soccer
  – Unreal Tournament

Conclusion
- Teamwork in multiagent systems: an important area
- Evolution has been successful
- Better understand why:
  – Team configuration
  – Level of selection
  – Presence/absence of credit assignment problems
- Apply to harder domains:
  – Real-time
  – Continuous/noisy
  – Multiple contradictory objectives

Questions?

Auxiliary Slides

Cooperation Without Reciprocity
- Abstract study of the evolution of cooperation
- Donor/recipient model
  – 3 random pairings with the option of donating fitness c so that the recipient can gain fitness b
  – Choice to donate based on similarity of tags
- Individual selection with the entire population as the team
  – Subpopulations emerged based on tags
- Donation rate changes cyclically, but generally stays high (73%) for c < b
- Need to apply in an actual domain requiring teamwork
* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001
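A minimal sketch of one generation of a tag-based donation model in the spirit of the slide above; the tag/tolerance representation follows the general scheme of Riolo, Cohen, and Axelrod, but the reproduction and mutation details here are illustrative assumptions.

    import random

    def tag_donation_generation(agents, pairings=3, b=1.0, c=0.1):
        # Each agent is a dict with 'tag' and 'tolerance' values in [0, 1]
        for agent in agents:
            agent['fitness'] = 0.0
        for donor in agents:
            for _ in range(pairings):
                recipient = random.choice([a for a in agents if a is not donor])
                # Donate (cost c to donor, benefit b to recipient) only to
                # partners whose tag is sufficiently similar
                if abs(donor['tag'] - recipient['tag']) <= donor['tolerance']:
                    donor['fitness'] -= c
                    recipient['fitness'] += b
        # Individual selection: the fitter of two random agents reproduces,
        # with small Gaussian mutation of tag and tolerance
        def clamp(x):
            return min(1.0, max(0.0, x))
        next_gen = []
        for _ in agents:
            a, other = random.sample(agents, 2)
            parent = a if a['fitness'] >= other['fitness'] else other
            next_gen.append({'tag': clamp(parent['tag'] + random.gauss(0, 0.01)),
                             'tolerance': clamp(parent['tolerance'] + random.gauss(0, 0.01)),
                             'fitness': 0.0})
        return next_gen

Iterating this generation function with c < b is the kind of dynamic in which clusters of similar tags and high donation rates can emerge, as described above.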

Cooperation Without Reciprocity Results

Team Composition in MAS
- Taxonomy proposed by Stone*:
  – Heterogeneous communicating agents
  – Heterogeneous non-communicating agents
  – Homogeneous communicating agents
  – Homogeneous non-communicating agents
- Definition of communication is broad:
  – Message passing, blackboard, information sharing, etc.
* Multiagent Systems: A Survey from a Machine Learning Perspective. Stone