Iterated Prisoner’s Dilemma Game in Evolutionary Computation 2003. 10. 2 Seung-Ryong Yang.

2 Agenda
- Motivation
- Iterated Prisoner’s Dilemma Game
- Related Works
- Strategic Coalition
- Improving Generalization Ability
- Experimental Results
- Conclusion

3 Motivation
Evolutionary approach
- Understanding complex behaviors by investigating simulation results from an evolutionary process
- Providing a way to find optimal strategies in a dynamic environment
IPD game
- Models complex phenomena such as social and economic behaviors
- Provides a testbed for modeling dynamic environments
Objectives
- Obtaining multiple good strategies
- Forming coalitions to improve generalization ability

4 Iterated Prisoner’s Dilemma Game (1/2)
Overview
- Prisoner’s possible choices: Defection, Cooperation
- Characteristics: Non-cooperative, Non-zero-sum
- Types of game: 2IPD (2-player Iterated Prisoner’s Dilemma) game, NIPD (N-player Iterated Prisoner’s Dilemma) game

General payoff matrix (row player / column player)
              Cooperate   Defect
  Cooperate   R / R       S / T
  Defect      T / S       P / P

Payoff Matrix of 2IPD Game by Axelrod, R. (1984)
              Cooperate   Defect
  Cooperate   3 / 3       0 / 5
  Defect      5 / 0       1 / 1
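
As an illustration of the Axelrod payoff values above, the following Python sketch scores a sequence of rounds. It is not taken from the presentation: the names `PAYOFF`, `play_round` and `total_payoffs` are hypothetical, and moves are written as "C" (cooperate) and "D" (defect).

```python
# Axelrod (1984) payoff values: Temptation, Reward, Punishment, Sucker.
T, R, P, S = 5, 3, 1, 0

# The usual Prisoner's Dilemma conditions hold for these values.
assert T > R > P > S
assert 2 * R > T + S

# (my move, opponent's move) -> (my payoff, opponent's payoff)
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def play_round(move_a, move_b):
    """Return the payoffs (a, b) for a single round."""
    return PAYOFF[(move_a, move_b)]


def total_payoffs(moves_a, moves_b):
    """Sum the payoffs over two equally long move sequences."""
    if len(moves_a) != len(moves_b):
        raise ValueError("move sequences must have the same length")
    total_a = total_b = 0
    for a, b in zip(moves_a, moves_b):
        pa, pb = play_round(a, b)
        total_a += pa
        total_b += pb
    return total_a, total_b
```

For example, `play_round("C", "D")` gives the cooperating player 0 and the defecting player 5.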

5 Iterated Prisoner’s Dilemma Game (2/2)
Representation of Strategy
- History table: Own History (Recent Action ∙∙∙ Last Action) followed by Opponent’s History (Recent Action ∙∙∙ Last Action)
- Example with l = 2: History 11 01
[Figure: strategy bit string 0 1 0 ∙∙∙ 1 indexed by history; label "2 N History"]
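
The history-table representation can be sketched as a lookup indexed by the recent moves. The Python below is a minimal sketch under stated assumptions, not the presenter's implementation: 0 means cooperate and 1 means defect, the table has 2^(2l) entries, own history comes before the opponent's history in the index, and both players start from an all-cooperate history. The names `history_index`, `next_move` and `play` are hypothetical.

```python
L = 2  # history length per player, as in the l = 2 example


def history_index(own, opp):
    """Pack own and opponent histories ('0'/'1' strings, oldest first) into an index."""
    if len(own) != L or len(opp) != L:
        raise ValueError("each history must hold exactly L moves")
    return int(own + opp, 2)


def next_move(table, own, opp):
    """Look up the next move ('0' or '1') for the given histories."""
    if len(table) != 2 ** (2 * L):
        raise ValueError("table must have 2**(2*L) entries")
    return table[history_index(own, opp)]


def play(table_a, table_b, rounds, start="0" * L):
    """Play two lookup-table strategies against each other; return both move strings."""
    hist_a = hist_b = start  # assumed initial history: all cooperation
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = next_move(table_a, hist_a, hist_b)
        b = next_move(table_b, hist_b, hist_a)
        moves_a.append(a)
        moves_b.append(b)
        # Slide the history window forward by one move.
        hist_a = hist_a[1:] + a
        hist_b = hist_b[1:] + b
    return "".join(moves_a), "".join(moves_b)


# Two simple tables under this encoding.
ALL_D = "1" * 16
# Tit-for-Tat copies the opponent's last move, which is the lowest index bit.
TIT_FOR_TAT = "".join(str(i & 1) for i in range(16))
```

With this encoding the example history 11 01 maps to table entry 13.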

6 Related Works
Previous Study
- Paul J. Darwen and Xin Yao (1997): Speciation as Automatic Categorical Modularization
- Onn M. Shehory, et al. (1998): Multi-agent Coordination through Coalition Formation
- Y. G. Seo and S. B. Cho (1999): Exploiting Coalition in Co-Evolutionary Learning
Issues
- Existing work on coalition formation in multi-agent environments covers broad topics
- Darwen and Yao studied coalitions in the IPD game, but their approach differs from this work
- Previous studies focused on cooperation, the number of players, payoff variances, etc.

7 What is Different?
Co-evolutionary Learning
- Selection method: rank-based, roulette wheel, tournament
Coalition Formation
- A coalition keeps surviving to the next generation
- The condition for forming a coalition is flexible
Decision Making in Coalition
- Several decision-making methods are adapted to coalitions: Borda function, Condorcet function, average payoff, highest payoff, weighted voting

8 Evolving Strategy
To evolve strategies, we use:
- Genetic algorithm
- Co-evolutionary learning
- Strategic coalition
[Figure: Evolutionary Process]

9 Evolution of Agents (1/2)
[Figure: coalitions across generations. Before Population: Ci, C1, Ck. Current Population: Ci, C1, Ck, Cj. Next Population: Ci, C1, Ck, Cj, Cl.]
Evolution of Agents
- Agents can develop their strategies using co-evolutionary learning
- Weak agents are removed from the population
Evolution of Coalition
- A formed coalition survives to the next generation
- Agents can join a coalition generation by generation
- A coalition survives or grows

10 Evolution of Agents (2/2)
Problem: the population may end up evolving from weak agents
- Caused by removing from the population the better agents, which belong to coalitions
- Remedy: make new agents by mixing the better agents within a coalition
[Figure: from the coalitions Ck, Ci, Cj in the population, two agents A1 and A2 are randomly extracted from a coalition and, after mutation, yield a new agent Ai; this is repeated as many times as there are agents in the coalition]
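
The mixing step can be sketched as follows. This is an illustration only: the slide names random extraction and mutation but not the mixing operator, so one-point crossover is an assumption here, as are the bit-string genotype encoding and the function name `mix_coalition`. The default mutation rate of 0.001 is the value given in the test environment.

```python
import random


def mix_coalition(coalition, mutation_rate=0.001, rng=None):
    """Create len(coalition) new genotypes from members of one coalition.

    coalition: list of equal-length bit strings (at least two members,
    each at least two bits long).
    """
    if len(coalition) < 2:
        raise ValueError("need at least two agents to mix")
    rng = rng or random.Random()
    children = []
    # Repeat as many times as there are agents in the coalition.
    for _ in range(len(coalition)):
        # Random extraction of two distinct members.
        parent_1, parent_2 = rng.sample(coalition, 2)
        # Assumed mixing operator: one-point crossover.
        cut = rng.randrange(1, len(parent_1))
        child = parent_1[:cut] + parent_2[cut:]
        # Bit-flip mutation, applied independently to each bit.
        child = "".join(
            ("1" if bit == "0" else "0") if rng.random() < mutation_rate else bit
            for bit in child
        )
        children.append(child)
    return children
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing coalition settings.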

11 Strategic Coalition (1/2)
What is Coalition?
- A cooperative game as a set A of agents in which each subset of A is called coalition (Matthias Klusch and Andreas Gerber, 2002)
- A group of agents that work jointly in order to accomplish their tasks (Onn M. Shehory, 1995)
Coalition in the IPD game
- Forming coalitions through round-robin games
- Pursuing more payoff using generalization ability
- Coalitions form autonomously without supervision

12 Strategic Coalition (2/2)
Definitions
- Definition 1: Coalition Value
- Definition 2: Payoff Function
- Definition 3: Coalition Identification
- Definition 4: Decision Making
- Definition 5: Payoff Distribution
[Equations (1), (2), (3)]

13 Coalition Formation (1/2)
[Figure: an Initial Population of agents A1, A2, A3, A4, A5, ..., Ai, Aj, Ak, ..., Am, An plays the 2IPD game and forms coalitions, giving a Population Including coalition C1, C2, ..., Ci alongside the remaining agents]

14 Coalition Formation (2/2)
Algorithm
[Flowchart: 2IPD Game; Exceeds iteration per generation? (Y/N); Game type? (Agent vs. Agent, Agent vs. Coalition, Coalition vs. Coalition); Satisfy condition for forming coalition? (Y/N); Forming Coalition; Joining Coalition; Genetic Operation; Satisfy condition? (Y: Stop, N: continue)]
Forming coalition
1. Round-robin 2IPD game
2. Obtain rank
3. Determine confidence of agent according to the rank
Joining coalition
1. Round-robin 2IPD game
2. Obtain rank
3. If number of agents > max. number of agents within a coalition, remove the weakest agent
4. Determine confidence of each agent
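
The rank and confidence steps can be sketched in Python. The slide does not give the confidence formula, so the linear rank weighting below is an assumption, as are the function names `rank_agents`, `confidences` and `join_coalition`. The `scores` argument stands for the total payoffs each agent collected in the round-robin 2IPD game.

```python
def rank_agents(scores):
    """Return agent ids ordered from strongest to weakest by round-robin payoff."""
    return sorted(scores, key=scores.get, reverse=True)


def confidences(ranking):
    """Assign each agent a confidence from its rank (assumed: linear, summing to 1)."""
    n = len(ranking)
    total = n * (n + 1) / 2
    return {agent: (n - position) / total for position, agent in enumerate(ranking)}


def join_coalition(scores, max_size):
    """Rank the candidate members, drop the weakest beyond max_size, then weight the rest."""
    ranking = rank_agents(scores)
    kept = ranking[:max_size]  # removes the weakest agents if the coalition is too large
    return confidences(kept)
```

For instance, with payoffs `{"a": 10, "b": 30, "c": 20}` and `max_size=2`, agent "a" is removed and "b" receives twice the confidence of "c" under this weighting.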

15 Coalition Decision Making
Decision making
- To decide the coalition’s opinion
- Uses the weighted voting method
Sharing profits
- Payoff is distributed according to each agent’s confidence
- Rank influences each weight
Determining next action of coalition
- Weight for cooperation of coalition Ci
- Weight for defection of coalition Ci
[Figure: the Previous Actions of the members Ci, Cj, Ck, Cl are summed (∑) with their weights to give the Next Action, C or D]
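
Weighted voting and confidence-based payoff sharing can be sketched as below. This is a minimal illustration rather than the presenter's formulas, which are not recoverable from the transcript: the tie-breaking rule (cooperate on equal weights) and the names `coalition_move` and `share_payoff` are assumptions.

```python
def coalition_move(votes, weights):
    """Pick the coalition's next action by weighted voting.

    votes:   agent id -> "C" or "D" (each member's proposed action)
    weights: agent id -> non-negative weight (e.g. its confidence)
    """
    weight_coop = sum(weights[a] for a, move in votes.items() if move == "C")
    weight_defect = sum(weights[a] for a, move in votes.items() if move == "D")
    # Assumed tie-break: cooperate when both sides carry equal weight.
    return "C" if weight_coop >= weight_defect else "D"


def share_payoff(payoff, weights):
    """Split a coalition payoff among members in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {agent: payoff * w / total for agent, w in weights.items()}
```

A single high-confidence member can outvote several low-confidence ones: with weights 0.5, 0.3 and 0.1, one "C" vote at 0.5 beats two "D" votes totalling 0.4.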

16 Weight of Agents
Adjusting weight
- Gives an incentive to agents in the coalition
- Reflects the decision making of the coalition
[Figure: the weights of the members Ci, Cj, Ck, Cl are adjusted after their Previous Actions are summed (∑) into the coalition’s Next Action, C or D]

17 Improving Generalization Ability (1/2)
Problem of one good strategy
- Not adaptive to dynamic environment
- Obtain multiple good strategies for specific environment
- Ex) Biological immune system
Method
- Fitness sharing
- Adjust confidences of multiple strategies by evolution
- Co-evolution
- Coalition formation

18 Improving Generalization Ability (2/2)
Generalization ability: how well a player performs against unknown players
Evaluation
[Figure: Random Generation of 100 Strategies; 2IPD Game; Extract Top Strategies in the Population; Genetically Evolved Strategies play the IPD Game against the Top Strategies]
Top Strategies
  1   0001110...
  2   0000100...
  3   0100100...
  4   0001100...
  5   0010010...
  10  0000010...

19 Test Strategy
Test Strategies (Strategy: Characteristics; Example Strategy)
- Tit-For-Tat: Initially cooperate, and then follow opponent; 00101100
- Trigger: Initially cooperate. Once opponent defects, continuously defect; 00011111
- AllD: Always defect; 11111111
- CDCD: Cooperate and defect over and over; 01010101
- CCD: Cooperate and cooperate and defect; 00100100
- Random: Random move; 11010011
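
The six test strategies can be written as small functions of the move histories. The Python below is a sketch for illustration: the function signatures, the "C"/"D" encoding and the helper `run` are assumptions, not part of the presentation.

```python
import random


def tit_for_tat(own, opp):
    """Cooperate first, then copy the opponent's last move."""
    return "C" if not opp else opp[-1]


def trigger(own, opp):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in opp else "C"


def all_d(own, opp):
    """Always defect."""
    return "D"


def cdcd(own, opp):
    """Alternate cooperate, defect."""
    return "CD"[len(own) % 2]


def ccd(own, opp):
    """Repeat cooperate, cooperate, defect."""
    return "CCD"[len(own) % 3]


def random_move(own, opp, rng=random):
    """Choose a move uniformly at random."""
    return rng.choice("CD")


def run(strategy_a, strategy_b, rounds):
    """Play two strategies against each other and return both move strings."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return "".join(hist_a), "".join(hist_b)
```

Reading 0 as cooperate and 1 as defect, eight rounds of `cdcd` and `ccd` reproduce the example strings 01010101 and 00100100 listed for CDCD and CCD.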

20 Example of Game
[Figure: Tit-for-Tat vs. Evolved Strategy. Genotypes shown: 10111001111010111101 and 00110001001110110001. History entries 0 to 15 (for example 0000, 1000) index the bit string 11101111111100101011111111110100. Payoffs for rounds 1 to 5: 3 5 1 1 1 and 3 0 1 1 1, each shown twice.]

21 Test Environment
- Population size: 100
- Crossover rate: 0.3
- Mutation rate: 0.001
- Number of generations: 200
- Number of iterations: a third of population
- Training set: Well-known 6 strategies
Experimental Result

22 Evolved Strategy vs. Random
  Rank  Genotype               Evolved strategy        Random
                               Avg. Payoff  S.D.       Avg. Payoff  S.D.
  1     10111001111010111101   3.080000     1.998399   0.480000     0.499600
  2     00111001111010111101   2.800000     1.989975   0.550000     0.497494
  3     10111011111111111111   2.920000     1.998399   0.520000     0.499600
  4     00111011111111111101   2.880000     1.996397   0.570000     0.667158
  5     00111011111011111011   2.940000     1.989070   0.540000     0.555338
  6     00110000111111111111   2.680000     1.690444   2.350000     1.996873
  7     00111011111111111011   3.040000     1.999600   0.490000     0.499900
  8     00111011111011111011   3.160000     1.993590   0.500000     0.670820
  9     10111111111111111111   3.480000     1.941546   0.380000     0.485386
  10    10111001111111111101   2.760000     1.985548   0.560000     0.496387
The Random strategy is one of the weakest strategies in the 2IPD game. The evolved strategies perform well here: all of them beat the Random test strategy with high payoffs.

23 Evolved Strategy vs. Tit-for-Tat
  Rank  Genotype               Evolved strategy        Tit-for-Tat
                               Avg. Payoff  S.D.       Avg. Payoff  S.D.
  1     11000100001011011100   3.020000     1.636948   2.640000     2.061650
  2     01101100001010011100   3.000000     0.000000   3.000000     0.000000
  3     10001000001011011100   1.040000     0.397995   0.990000     0.099499
  4     00000100001010011100   1.080000     0.560000   1.020000     0.423792
  5     10001000001011011100   2.980000     0.345832   2.970000     0.411218
  6     01010100001011011100   3.000000     1.624808   2.670000     2.044774
  7     11001000001010011100   1.040000     0.397995   0.990000     0.099499
  8     11001100001011011110   3.000000     0.000000   3.000000     0.000000
  9     01110100001011011100   3.020000     1.636948   2.640000     2.061650
  10    01010100011011011100   3.000000     0.000000   3.000000     0.000000
Tit-for-Tat is a mimic strategy that plays “cooperation” on the first move of the 2IPD game. The evolved strategies respond appropriately so as not to lose the game, which indicates that they generalize well.

24 Evolved Strategy vs. Trigger
Columns: Rank (1 to 10), Genotype of Evolved strategy, Evolved strategy Avg. Payoff and S.D., Trigger Avg. Payoff and S.D.
Genotypes listed: 10111011110011101000, 10111011110011101001, 00111011110011111000, 10111011110011111001, 10111111110010111000, 00111011110011111001, 10111011110011111001, 00111011110011111001
Evolved strategy Avg. Payoff: 1.040000, 1.060000, 1.040000, 1.080000, 1.040000, 1.060000, 1.040000
Evolved strategy S.D.: 0.397995, 0.443170, 0.397995, 0.483322, 0.397995, 0.443170, 0.397995
Trigger Avg. Payoff: 0.990000, 1.010000, 0.990000, 1.030000, 0.990000, 1.010000, 0.990000
Trigger S.D.: 0.099499, 0.223383, 0.099499, 0.298496, 0.099499, 0.223383, 0.099499
The Trigger strategy never forgives an opponent’s defection. The way to win a game against Trigger is therefore also to choose “defection” repeatedly.

25 Evolved Strategy vs. AllD
Columns: Rank (1 to 10), Genotype of Evolved strategy, Evolved strategy Avg. Payoff and S.D., AllD Avg. Payoff and S.D.
Genotypes listed: 00111111111110101111, 00111011111110101111, 10111111111110101111, 00111111111110101111, 10111011111110101111, 00111111111110101111, 00111111111110101011, 00111111111110101111
Payoff values listed: 1.000000, 0.000000, 1.000000, 1.040000, 1.000000, 0.000000, 0.397995, 0.000000
The only way not to lose against AllD is to choose “defection” on every move. There is no room for cooperation in this game.

26 Number of Coalition
[Figure: number of coalitions (axis: Coalition) per generation (axis: Generation)]
- Coalitions survive into the next generation.
- Most coalitions are formed early in the evolutionary process.
- This keeps genetic diversity high and allows better choices against opponents.
- A coalition can grow if the agents satisfy the conditions.

27 Comparing the Results
The evolved strategies earn more payoff against Random, CCD and CDCD than against Tit-for-Tat, Trigger and AllD. This suggests that the evolved strategies exploit the opponent’s actions well.

28 Bias of the Strategy
[Figure: Bias per Generation]
Bias shows how the strategies select their next choice against their opponents. A higher bias rate means that a strategy chooses “cooperation” more often than “defection”, and a lower rate means the opposite.

29 Conclusions
Conclusion
- Strategic coalition might be a robust method that can adapt to a dynamic environment
- Decision-making methods influence the results, but not seriously
- The strategies evolved by coalition generalize well against various opponents
Discussion
- Can the strategic coalition be adapted to the n-IPD game?
- Which parameters in the IPD game influence generalization ability?
- How can opponent strategies be constructed for testing?
- How can this problem be adapted to the real world?

30 Examples (1) Market Observer

31 Examples (2) Forest Prediction
