Evolution of Teamwork in Multiagent Systems Research Preparation Examination by Jacob Schrum.

Why Multiple Agents?
Many applications
–Physical World
  • Robotics
  • Autonomous automobiles
  • Military applications
  • Network Systems
–Artificial World
  • Games
  • Graphics
  • Entertainment
  • Artificial Life

Why Multiagent Perspective?
Decentralized control
–Failure recovery
–Individual agents simpler than whole
–Some environments don’t support central control
Human interaction
–Humans are also agents
–Agents interacting with humans are in MAS

Teamwork in Multiagent Systems
Problem divided amongst many agents
Teamwork often required for success
Communication sometimes an issue
How to learn teamwork: open question

Direct Approach: Careful Design
Hand code everything
Benefits:
–Understand end product
Drawbacks:
–Not general
–Difficult
–Programmer time
Common in:
–Robotics
–Video games
–Most deployed systems
What if no one knows how to program it?

Learn it: Reinforcement Learning
Environment is Markov Decision Process
Learn optimal policy
–Depends on value function (TD methods)
–Proven convergence in tabular case
–Function approximation needed for bigger problems
Problems with Partially Observable MDPs
Successes in
–Pred/Prey Scenarios (Tan 1993)
–Soccer keep away (Kalyanakrishnan, Stone 2009)
–Robocup soccer (many…)

Breed it: Evolution
Based on evolution via natural selection
Benefits:
–Less restrictive policy representation
–Demonstrated success in POMDP domains
Drawbacks:
–Computationally intensive
–Time intensive
Focus of talk

Evolution Basics
1. Initialize population P
2. Evaluate all p in P (assign fitness)
3. Derive P’ by selecting/modifying members of P based on their fitness scores
4. Repeat from step 2 with P’ as P until done
P’ is usually similar to P, but slightly better
Many variations:
–Genetic Algorithms, Evolution Strategies, etc.
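
The four steps above can be sketched as a short generational loop. This is only an illustration: truncation selection (keep the better half), Gaussian mutation, the toy fitness function, and the names `evolve`, `init` and `mutate` are all assumptions, not details from the talk.

```python
import random

def evolve(fitness, init, mutate, pop_size=20, generations=50, seed=0):
    """Generic generational loop following the four steps on the slide."""
    rng = random.Random(seed)
    # Step 1: initialize population P.
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate all p in P and rank them by fitness.
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 3: derive P' by keeping the better half (assumed selection
        # scheme) and filling the rest with mutated copies of survivors.
        parents = ranked[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        # Step 4: repeat with P' as P.
        population = parents + children
    return max(population, key=fitness)

# Hypothetical toy problem: maximize -(x - 3)^2, whose optimum is x = 3.
best = evolve(
    fitness=lambda x: -(x - 3.0) ** 2,
    init=lambda rng: rng.uniform(-10.0, 10.0),
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.5),
)
```

Because the surviving parents are carried over unchanged, the best fitness in this sketch never decreases from one generation to the next.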

Evolution in Multiagent Systems
1. Team Composition
  A. Homogeneous
  B. Heterogeneous
  C. Heterogeneous from Subpopulations (pick one member from each subpopulation to make a team)
  D. Entire population
2. Type of Selection
  A. Individual
  B. Team
  C. Self-Selection
3. Multiple Objectives

1.A. Homogeneous Teams
Team members share same policy
Members know what to expect from team members
One individual evaluated per trial
Evaluations reliable because of consistent team composition

1.B. Heterogeneous Teams
Team composed of several policies
Uncertainty as to who teammates will be
Multiple individuals evaluated per trial
Evaluation differs depending on choice of team members

1.C. Subpopulations
Each slot filled by representative from specific subpopulation
Subpopulations specialize
Individuals know what to expect of members in each slot
Team composition is still heterogeneous
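
A rough sketch of how a team might be assembled under compositions 1.A through 1.C. The function names and the string placeholders for policies are hypothetical, and drawing heterogeneous teammates without replacement is an assumption.

```python
import random

def homogeneous_team(policy, size):
    # 1.A: every slot holds the same policy.
    return [policy] * size

def heterogeneous_team(population, size, rng):
    # 1.B: several different policies drawn from one shared population
    # (sampling without replacement is an assumption of this sketch).
    return rng.sample(population, size)

def subpopulation_team(subpopulations, rng):
    # 1.C: slot i is always filled by a representative of subpopulation i.
    return [rng.choice(sub) for sub in subpopulations]

rng = random.Random(0)
population = [f"p{i}" for i in range(10)]  # placeholder policies
subpops = [[f"s{slot}_{i}" for i in range(5)] for slot in range(4)]

team_a = homogeneous_team(population[0], 4)
team_b = heterogeneous_team(population, 4, rng)
team_c = subpopulation_team(subpops, rng)
```

In the subpopulation case the team is still heterogeneous, but each slot has a fixed source, which is what lets the subpopulations specialize.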

1.D. Entire Population
The entire population is seen as a cooperating team
Team level selection not possible
Population may divide into competing subpopulations
–Mating restrictions
–Genetic/Tag-based recognition

2.A. Individual Selection
Individuals selected based on own fitness
Commonly used with heterogeneous teams
Can result in selfish behaviors
Altruism relevant
–Sacrificing own fitness to raise fitness of another
Reciprocity relevant
–Helping another to get help in return

2.B. Team Selection
Individuals selected based on team fitness
–Common fitness, sum, average, etc.
Commonly used with homogeneous teams
Enables slackers in heterogeneous teams
Altruism and reciprocity have no meaning
No credit assignment problems between members
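
The difference between individual and team selection comes down to which value each member is credited with. A minimal sketch, using hypothetical per-member scores and function names:

```python
def individual_fitness(scores):
    # 2.A: each member is credited with its own score.
    return list(scores)

def team_fitness(scores, combine=sum):
    # 2.B: every member receives the same team-level value
    # (sum by default; an average or other common fitness also works).
    shared = combine(scores)
    return [shared] * len(scores)

scores = [5.0, 0.0, 3.0]  # hypothetical; the second member contributes nothing

per_member = individual_fitness(scores)
per_team = team_fitness(scores)
per_team_avg = team_fitness(scores, combine=lambda s: sum(s) / len(s))
```

Under team selection the member who scored 0.0 receives the same fitness as its teammates, which is how slackers can persist in heterogeneous teams.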

2.C. Self-Selection
Individuals choose when and with whom to mate
Common in Artificial Life simulations
–AL studies emergence of biological phenomena
Usually involves a spatial component
Extinction is possible
–Auto restart
–Spawn new members

3. Multiple Objectives
Assume individual has fitness scores:
–F = (f1,…,fN) in objectives 1 through N
Which values of F are best?
Traditional approach
–fitness(F) = f1*w1 + … + fN*wN for weights w1,…,wN
Pareto-based approach
–Partition population into non-dominated Pareto fronts
–Assign fitness based on Pareto-front
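
The two approaches can be contrasted in a few lines. The sketch assumes every objective is maximized; the function names and the score vectors are made up for illustration.

```python
def weighted_sum(scores, weights):
    # Traditional approach: fitness(F) = f1*w1 + ... + fN*wN.
    return sum(f * w for f, w in zip(scores, weights))

def dominates(a, b):
    # Pareto dominance (maximization assumed): a is at least as good as b
    # in every objective and strictly better in at least one.
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

a, b = (3, 1), (1, 3)  # hypothetical scores in two objectives

equal_weights = (weighted_sum(a, (1, 1)), weighted_sum(b, (1, 1)))
skewed_weights = (weighted_sum(a, (2, 1)), weighted_sum(b, (2, 1)))
neither_dominates = not dominates(a, b) and not dominates(b, a)
```

With equal weights the two individuals tie, and changing the weights decides which one wins; under Pareto dominance neither beats the other, so both would sit on the same front.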

Pareto Front Example
Each point represents an individual’s scores
Point dominates other points in its box
3 Pareto fronts of non-dominated points
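
One simple way to partition a population into successive non-dominated fronts is to peel off the non-dominated points repeatedly. The sketch below assumes maximization in every objective; the six points are invented and are not the ones plotted on the slide.

```python
def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in one.
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_fronts(points):
    """Repeatedly remove the current set of non-dominated points."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

points = [(1, 5), (3, 3), (5, 1), (1, 3), (3, 1), (1, 1)]  # hypothetical scores
fronts = pareto_fronts(points)
```

For these points the procedure yields three fronts: the first holds (1, 5), (3, 3) and (5, 1), the second (1, 3) and (3, 1), and the third (1, 1). Fitness can then be assigned by front index, with earlier fronts preferred.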

Case Studies
Review State of the Art
For each study:
–Classify type of selection
–Classify team composition
–Identify unanswered questions
–Future research directions

AntFarm
Evolve foraging behavior
–Pheromones to communicate
Individual selection
Entire population as a team
No cooperative foraging!
–Likely cause: individual selection
• Individual selection offers less incentive for teamwork
• Teamwork especially difficult when there is only one team
* AntFarm: Towards Simulated Evolution. Collins, Jefferson. 1991

Evolving Communication
Exploration task
–Pheromones to communicate
Team selection
Homogeneous teams vs. static bots
Pairs of objectives, Pareto-based
Different behaviors in different runs
–Compromise strategy
–Blocking strategy
• Teamwork possible with homogeneous teams
• Need to move beyond grid-worlds
• Move beyond two objectives
* Emergence of Communication in Competitive Multi-Agent Systems: A Pareto Multi-Objective Approach. McPartland, Nolfi, Abbass. 2005

SwarmEvolveTags
Birds visit food stations
Energy can be shared
–Sharing based on tags
Self-selection
Entire population as team
–Competing subpopulations emerged
• Cooperation in entire population without team selection
• Altruism via aiding similar individuals
• Teamwork as a result of subpopulation homogeneity
* Tags and the Evolution of Cooperation in Complex Environments. Spector, Klein, Perry. 2004
* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001

Legion-I
Roman legions defend countryside and cities
Team level selection
Homogeneous teams
Multi-modal behavior
–Defend city
–Pursue barbarians
• Homogeneous team members must fill all roles
• Could not learn more complicated/strategic tasks
• Example: building roads to speed up travel
* Neuroevolution for Adaptive Teams. Bryant, Miikkulainen. 2003

Role-Based Cooperation
Toroidal predator/prey grid world
Individual selection
–Team fitness shared by team members
Multi-Agent ESP: subpopulation based
Simple non-communicating method outperforms communicating method
• Teamwork without homogeneity
• Communication not always needed
• May only apply to simple domains
• Still need to scale up complexity
• Get away from grid worlds
* Coevolution of Role-Based Cooperation in Multi-Agent Systems. Yong, Miikkulainen. 2007

NERO
Machine Learning game
–Human interaction via fitness function
Individual selection
Entire population is team
Multiple objectives
–User defines weights dynamically
• Maintenance of fitness function
• Old behaviors can be forgotten when learning new ones
• Need to learn multiple tasks simultaneously
* Evolving Neural Network Agents in the NERO Videogame. Stanley, Bryant, Miikkulainen. 2005

Pareto Multi-objective NPCs
Evolved monsters vs. bot with stick
Individual selection
Large heterogeneous teams of 15
–Third of entire population
Multiple objectives, Pareto-based
–Credit assignment trick
• Learns multiple objectives simultaneously
• Different runs can lead to very different results
• Different areas of trade-off surface
• Population becomes mostly homogeneous
* Constructing Complex NPC Behavior via Multi-Objective Neuroevolution. Schrum, Miikkulainen. 2008

Dead End Game
Human prey vs. predators
Offline evolution vs. bot
–Team level selection
–Homogeneous teams
Online evolution vs. human
–Individual selection
–Small heterogeneous team
• Different configurations appropriate at different levels
• Sometimes the domain leaves no choice
* Interactive Opponents Generate Interesting Games. Yannakakis, Hallam. 2004

Cooperating Robots
Retrieve tokens
Simulation → Robots
Compared selection levels
–Individual vs. Team
Compared team compositions
–Homogeneous vs. heterogeneous
• Homogeneous better with teamwork and altruism
• Homogeneous best with team selection
• Heterogeneous best with individual selection
• Did not consider subpopulations
• Tasks only involved foraging (no other objectives)
* Genetic Team Composition and Level of Selection in the Evolution of Cooperation. Waibel, Keller, Floreano. 2008

Summary of Issues
More complexity
–Move beyond grid worlds
–Need multiple contradictory objectives
–Act in continuous, real-time world
Best evolutionary configuration
–More comparisons between team compositions
  • Especially subpopulation-based method
–Task/configuration pairings?
–Credit assignment issues
Multi-modal behavior
–What to do and when

Experiment
Four monsters vs. bot with stick
–Smaller team makes task harder
Compare homogeneous, heterogeneous and subpopulation
–Homogeneous uses team selection
–Others use individual selection
Multiple objectives:
–Group damage
–Individual injury
–Individual time alive

Heterogeneous Results
Many generations (600+)
–Not that long in real time
Mostly selfish
–Good teamwork can arise though (Baiting)
Teamwork depends on population being homogeneous
[Figure panels: Teamwork, Selfish]

Homogeneous Results
Fewer generations (100-200)
–Actually longer in real time
Always some form of teamwork
–Baiting
–Timed Assault
[Figure panels: Baiting, Timed Assault]

Subpopulations Results
Many generations (400+)
Each generation takes a lot of real time
Easy for slacker subpopulation to persist
Limited teamwork
–Only some members participate
[Figure panel: Cooperating Pair]

Discussion
Can subpopulation method do better?
–Better credit assignment
–Team level selection (how?)
Speed up homogeneous and subpopulations
Heterogeneous: discourage selfishness

Future Research Questions
Credit assignment issues
–Cooperating individuals cannot be identified
–Objectives define best evolutionary configuration?
Complex domains/real problems
–Many objectives
–Continuous, real-time
Potential challenge domains
–Robocup Soccer
–Unreal Tournament

Conclusion
Teamwork in Multiagent Systems important area
Evolution has been successful
Better understand why
–Team configuration
–Level of selection
–Presence/absence of credit assignment problems
Apply to harder domains
–Real-time
–Continuous/noisy
–Multiple contradictory objectives

Questions? schrum2@cs.utexas.edu

Auxiliary Slides

Cooperation Without Reciprocity
Abstract study of the evolution of cooperation
Donor/recipient model
3 random pairings with option of donating fitness c so that recipient can gain fitness b
Choice to donate based on similarity of tags
Individual selection with entire population as team
–Subpopulations emerged based on tags
Donation rate changes cyclically, but generally stays high (73%) for c < b
Need to apply in actual domain requiring teamwork
* Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001
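
A rough sketch of one round of the donor/recipient model described above. Several details are assumptions made for illustration: each agent acts as donor in its 3 pairings, "similar" means the tag difference is within the donor's own tolerance, and the values cost = 0.1 and benefit = 1.0 are placeholders for c and b. The dictionary layout and function name are hypothetical.

```python
import random

def donation_round(agents, rng, pairings=3, cost=0.1, benefit=1.0):
    """Run one round and return the fraction of pairings that led to a donation."""
    donations = 0
    for donor in agents:
        for _ in range(pairings):
            # Pair the donor with a random other agent.
            recipient = rng.choice([a for a in agents if a is not donor])
            # Assumed similarity rule: donate when the tags are close enough.
            if abs(donor["tag"] - recipient["tag"]) <= donor["tolerance"]:
                donor["fitness"] -= cost         # donor pays c
                recipient["fitness"] += benefit  # recipient gains b
                donations += 1
    return donations / (len(agents) * pairings)

rng = random.Random(0)

# Four agents with identical tags always donate to each other.
kin = [{"tag": 0.5, "tolerance": 0.0, "fitness": 0.0} for _ in range(4)]
kin_rate = donation_round(kin, rng)

# Two agents with distant tags and low tolerance never donate.
strangers = [{"tag": 0.0, "tolerance": 0.1, "fitness": 0.0},
             {"tag": 1.0, "tolerance": 0.1, "fitness": 0.0}]
stranger_rate = donation_round(strangers, rng)
```

When c < b every donation raises total fitness in the population, which is why groups with matching tags can do well even though each individual donor pays a cost.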

Cooperation Without Reciprocity Results

Team Composition in MAS
Taxonomy proposed by Stone*:
–Heterogeneous Communicating Agents
–Heterogeneous Non-communicating Agents
–Homogeneous Communicating Agents
–Homogeneous Non-communicating Agents
Definition of communication is broad:
–Message passing, blackboard, information sharing, etc.
* Multiagent Systems: A Survey from a Machine Learning Perspective. Stone. 2000
