Evolutionary Bargaining
26 December 2015. All Rights Reserved, Edward Tsang.

1 Evolutionary Bargaining
Player A and Player B: “What is my share?”
• Game theory: two-player alternating-offers game
– Subgame perfect equilibrium found
– Slight game modification → laborious work on new solutions
– Perfect rationality assumption
• Technical details (non-trivial)
– Co-evolution
– Incentive methods invented
• Proposal: EC for approximating solutions on new games

2 Evolutionary Bargaining Strategies
Edward Tsang, Nanlin Jin
Acknowledgement: Abhinay Muthoo

3 Evolutionary Bargaining Conclusions
• Demonstrated GP’s flexibility
– Models with known and unknown solutions
– Outside option
– Incomplete, asymmetric and limited information
• Co-evolution is an alternative approximation method to find game theoretical solutions
– Perfect rationality assumption relaxed
– Relatively quick for approximate solutions
– Relatively easy to modify for new models
• Genetic Programming with incentive / constraints
– Constraints helped to focus the search in promising spaces
• Lots remain to be done…

4 Two-player alternating-offers game
Player 1: How about 70% for me, 30% for you?
Player 2: No, I want 60%.
Player 1: No, how about 50-50?
Player 2: No, I want at least 55%.
• If neither player has any incentive to compromise, this can go on for ever
• Payoff drops over time, which gives an incentive to compromise
• A’s payoff = $x_A \exp(-r_A t \Delta)$
• Example: let $r_1$ be 0.1 and $\Delta$ be 1
– At t = 0, Player 1’s payoff is 70%
– At t = 2, Player 1’s payoff is 50% × $e^{-0.1 \times 2}$ = 41%
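
The discounting rule on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `discounted_payoff` and its parameter names are not from the slides, and `step` stands for the time Δ per round.

```python
import math


def discounted_payoff(share: float, rate: float, t: int, step: float = 1.0) -> float:
    """Payoff of a share agreed in round t: share * exp(-rate * t * step)."""
    return share * math.exp(-rate * t * step)


# Slide example: r_1 = 0.1, step (Delta) = 1.
print(round(discounted_payoff(0.70, 0.1, t=0), 2))  # 0.7
print(round(discounted_payoff(0.50, 0.1, t=2), 2))  # 0.41
```

Accepting 50% two rounds later is therefore worth less to Player 1 than the nominal 50%, which is the incentive to compromise early.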

5 Payoff decreases over time

6 Bargaining in Game Theory
• Rubinstein 1982 model:
– $\pi$ = cake to share between A and B (= 1)
– A and B make alternate offers
– $x_A$ = A’s share ($x_B = \pi - x_A$)
– $r_A$ = A’s discount rate
– t = number of rounds, at time $\Delta$ per round
• A’s payoff from $x_A$ drops as time goes by: A’s payoff = $x_A \exp(-r_A t \Delta)$
• Important assumptions:
– Both players are rational
– Both players know everything
• Equilibrium solution for A: $\mu_A = (1 - \delta_B) / (1 - \delta_A \delta_B)$, where $\delta_i = \exp(-r_i \Delta)$
– Notice: no time t here
– Optimal offer: $x_A = \mu_A$ at t = 0
• In reality: offer at time t = $f(r_A, r_B, t)$. Is it necessary? Is it rational? (What is rational?)
[Figure: the cake from 0 to $\pi$, divided into $x_A$ for A and $x_B$ for B]
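
As a hedged illustration, the equilibrium share for A can be computed directly from the two discount factors. The function names below are hypothetical, and `step` again stands for Δ; the three printed values agree with the "Rubinstein Solution" column reported later in the deck.

```python
import math


def discount_factor(rate: float, step: float = 1.0) -> float:
    """delta_i = exp(-r_i * step), the per-round discount factor."""
    return math.exp(-rate * step)


def rubinstein_share(delta_a: float, delta_b: float) -> float:
    """A's equilibrium share: (1 - delta_B) / (1 - delta_A * delta_B)."""
    return (1.0 - delta_b) / (1.0 - delta_a * delta_b)


for pair in [(0.4, 0.4), (0.9, 0.6), (0.9, 0.99)]:
    print(pair, round(rubinstein_share(*pair), 4))
# (0.4, 0.4) 0.7143
# (0.9, 0.6) 0.8696
# (0.9, 0.99) 0.0917
```

Note that the share depends only on the two discount factors, not on the round number t.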

7 Evolutionary Bargaining Strategies
• Prisoners’ Dilemma
– Co-evolution
– GA vs PBIL
• Rubinstein’s models
– Offer at time t = $f(r_A, r_B, t)$
– $x_A^*$, $x_B^*$ emerged as best results
– Other solutions evolved occasionally
• Current work
– Asymmetric information
– Outside options

8 Evolutionary Rubinstein Bargaining Overview
• Game theorists solved the Rubinstein bargaining problem
– Subgame Perfect Equilibrium (SPE)
• Slight problem alterations lead to different solutions
– E.g. outside option, asymmetric information, different time scales
• Evolutionary computation
– Succeeded in solving a wide range of problems
– What can it do for finding subgame perfect equilibria?
• Co-evolution is an alternative approximation method to find game theoretical solutions
– Less time for approximate SPEs
– Fewer modifications for new problems

9 Evolutionary Computation for Bargaining Technical Details

10 Issues Addressed in EC for Bargaining
• Representation
– Should t be in the language?
• One or two populations?
• How to evaluate fitness
– Fixed or relative fitness?
• How to contain the search space?
• Discourage irrational strategies:
– Ask for $x_A > 1$?
– Ask for more over time?
– Ask for more when $\delta_A$ is low?
[Figure: example GP expression tree built from 1, $\delta_A$ and $\delta_B$; link: Details of GP]

11 Two populations – co-evolution
• We want to deal with asymmetric games
– E.g. two players may have different information
• One population for training each player’s strategies
• Co-evolution, using relative fitness
– Alternative: use absolute fitness
[Figure: the Player 1 and Player 2 populations evolving over time]

12 Representation of Strategies
• A tree represents a mathematical function g
• Terminal set: {1, $\delta_A$, $\delta_B$}
• Function set: {+, −, ×, ÷}
• Given g, a player with discount rate r plays $g \times (1 - r)^t$ at time t
• Language can be enriched:
– Could have added e or time t to the terminal set
– Could have added power ^ to the function set
• Richer language → larger search space → harder search problem
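
One way to picture this representation is as nested tuples. The sketch below is an assumption-laden illustration, not the authors' code: the tuple encoding, the leaf names `"dA"` and `"dB"`, and the function names are all hypothetical, and division is left unprotected (a real GP system would need some rule for division by zero).

```python
import operator

# Function set {+, -, *, /}; terminal set {1, "dA", "dB"}.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(tree, delta_a: float, delta_b: float) -> float:
    """Evaluate a tree given as (op, left, right) tuples or a terminal."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, delta_a, delta_b), evaluate(right, delta_a, delta_b))
    if tree == "dA":
        return delta_a
    if tree == "dB":
        return delta_b
    return float(tree)  # the constant terminal 1


def demand_at(tree, delta_a: float, delta_b: float, rate: float, t: int) -> float:
    """Demand played at time t: g * (1 - r)^t, as stated on the slide."""
    return evaluate(tree, delta_a, delta_b) * (1.0 - rate) ** t


# The tree (1 - dB) / (1 - dA * dB) can be written in this language.
SPE_TREE = ("/", ("-", 1, "dB"), ("-", 1, ("*", "dA", "dB")))
print(round(evaluate(SPE_TREE, 0.9, 0.6), 4))  # 0.8696
```

Because the terminal set contains no t, every tree evaluates to a constant once the discount factors are fixed; time enters only through the $(1 - r)^t$ factor.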

13 Incentive Method: Constrained Fitness Function
• No magic in evolutionary computation
– Larger search space → less chance to succeed
• Constraints are heuristics to focus a search
– Focus on space where promising solutions may lie
• Incentives for the following properties in the function returned:
– The function returns a value in (0, 1]
– Everything else being equal, lower $\delta_A$ → smaller share
– Everything else being equal, lower $\delta_B$ → larger share
• Note: this is the key to search effectiveness

14 Incentives for Bargaining
• $GF(s(g_i))$ is the game fitness (GF) of a strategy (s) based on the function $g_i$, which is generated by genetic programming
• When $g_i$ is outside (0, 1], the strategy does not enter the games; the bigger $|g_i|$, the lower the fitness
• $B \ge 3$ (tournament selection is used)
$$F(g_i) = \begin{cases} GF(s(g_i)) + B & \text{if } g_i \in (0, 1],\ SM_i > 0 \text{ and } SM_j > 0 \\ GF(s(g_i)) + ATT(i) + ATT(j) & \text{if } g_i \in (0, 1] \text{ and } (SM_i \le 0 \text{ or } SM_j \le 0) \\ ATT(i) + ATT(j) - e^{-1/|g_i|} & \text{if } g_i \notin (0, 1] \end{cases}$$
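
The three fitness cases on this slide can be sketched as a single function. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, the SM and ATT values are assumed to be computed elsewhere, and treating g = 0 as zero penalty (the limit of $e^{-1/|g|}$ as g approaches 0) is an assumption made only to avoid a division by zero.

```python
import math


def incentive_fitness(g, game_fitness, sm_i, sm_j, att_i, att_j, bonus=3.0):
    """Constrained fitness F(g_i) following the three cases on the slide."""
    feasible = 0.0 < g <= 1.0
    if feasible and sm_i > 0 and sm_j > 0:
        # Feasible and sensitive in the desired directions: game fitness plus bonus B.
        return game_fitness + bonus
    if feasible:
        # Feasible, but at least one sensitivity measure is not positive.
        return game_fitness + att_i + att_j
    # Infeasible: the strategy plays no games, so game_fitness is ignored.
    penalty = math.exp(-1.0 / abs(g)) if g != 0 else 0.0  # g == 0 handling is an assumption
    return att_i + att_j - penalty


print(incentive_fitness(0.5, 1.2, 0.1, 0.2, 0.0, 0.0))
print(incentive_fitness(2.0, 0.0, 0.1, 0.1, 0.5, 0.5))
print(incentive_fitness(10.0, 0.0, 0.1, 0.1, 0.5, 0.5))
```

With each ATT value at most 1, the second case adds at most 2 to the game fitness, so a bonus of 3 keeps fully compliant functions ahead of partially compliant ones with the same game fitness.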

15 C1: Incentive for Feasible Solutions
• The function returns a value in (0, 1]
• The function participates in game plays
• The game fitness (GF) is measured
• A bonus (B) incentive is added to GF
– B is set to 3
– Since tournament selection is used, the absolute value of B does not matter (as long as B ≥ 3)

16 C2: Incentive for Rational $\delta_A$
• Everything else being equal, lower $\delta_A$ → smaller share for A
• Given a function $g_i$, the sensitivity measure $SM_i(\delta_i, \delta_j, \alpha)$ measures how $g_i$ responds when $\delta_i$ is scaled up by a factor of $(1 + \alpha)$; it is positive when $g_i$ increases
• Attribute:
$$ATT(i) = \begin{cases} 1 & \text{if } SM_i(\delta_i, \delta_j, \alpha) > 1 \\ e^{-1 / SM_i(\delta_i, \delta_j, \alpha)} & \text{if } SM_i(\delta_i, \delta_j, \alpha) \le 1 \end{cases}$$
• $0 < ATT(i) \le 1$

17 C3: Incentive for Rational $\delta_B$
• Everything else being equal, lower $\delta_B$ → larger share for A
• Given a function $g_i$, the sensitivity measure $SM_j(\delta_i, \delta_j, \alpha)$ measures how $g_i$ responds when $\delta_j$ is scaled up by a factor of $(1 + \alpha)$; it is positive when $g_i$ decreases
• Attribute:
$$ATT(j) = \begin{cases} 1 & \text{if } SM_j(\delta_i, \delta_j, \alpha) > 1 \\ e^{-1 / SM_j(\delta_i, \delta_j, \alpha)} & \text{if } SM_j(\delta_i, \delta_j, \alpha) \le 1 \end{cases}$$
• $0 < ATT(j) \le 1$

18 Sensitivity Measure (SM) for $\delta_i$
$$SM_i(\delta_i, \delta_j, \alpha) = \begin{cases} \dfrac{g_i(\delta_i (1 + \alpha), \delta_j) - g_i(\delta_i, \delta_j)}{g_i(\delta_i, \delta_j)} & \text{if } \delta_i (1 + \alpha) < 1 \\[2ex] \dfrac{g_i(\delta_i, \delta_j) - g_i(\delta_i (1 - \alpha), \delta_j)}{g_i(\delta_i, \delta_j)} & \text{if } \delta_i (1 + \alpha) \ge 1 \end{cases}$$
• $SM_i$ approximates the derivative of $g_i$ with respect to $\delta_i$: the relative change in $g_i$ when $\delta_i$ is scaled by $(1 + \alpha)$, positive when $g_i$ increases with $\delta_i$ ($g_i$ is the function whose fitness is to be measured)

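19 Sensitivity Measure (SM) for $\delta_j$
$$SM_j(\delta_i, \delta_j, \alpha) = \begin{cases} \dfrac{g_i(\delta_i, \delta_j) - g_i(\delta_i, \delta_j (1 + \alpha))}{g_i(\delta_i, \delta_j)} & \text{if } \delta_j (1 + \alpha) < 1 \\[2ex] \dfrac{g_i(\delta_i, \delta_j (1 - \alpha)) - g_i(\delta_i, \delta_j)}{g_i(\delta_i, \delta_j)} & \text{if } \delta_j (1 + \alpha) \ge 1 \end{cases}$$
• $SM_j$ approximates the derivative of $g_i$ with respect to $\delta_j$: the relative change in $g_i$ when $\delta_j$ is scaled by $(1 + \alpha)$, positive when $g_i$ decreases as $\delta_j$ increases ($g_i$ is the function whose fitness is to be measured)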
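
The two sensitivity measures can be sketched as finite differences. This sketch assumes the reading of the slide formulas in which SM_i is the relative rise of g when δ_i is scaled by (1 + α), SM_j is the relative fall of g when δ_j is scaled by (1 + α), and the step is taken backwards when the scaled discount factor would reach 1. The function names and the test function `spe` are hypothetical.

```python
def sensitivity_i(g, delta_i: float, delta_j: float, alpha: float) -> float:
    """Relative rise of g when the player's own discount factor goes up."""
    base = g(delta_i, delta_j)
    if delta_i * (1 + alpha) < 1:
        return (g(delta_i * (1 + alpha), delta_j) - base) / base
    return (base - g(delta_i * (1 - alpha), delta_j)) / base


def sensitivity_j(g, delta_i: float, delta_j: float, alpha: float) -> float:
    """Relative fall of g when the opponent's discount factor goes up."""
    base = g(delta_i, delta_j)
    if delta_j * (1 + alpha) < 1:
        return (base - g(delta_i, delta_j * (1 + alpha))) / base
    return (g(delta_i, delta_j * (1 - alpha)) - base) / base


def spe(delta_i: float, delta_j: float) -> float:
    """Rubinstein share, used here only as a well-behaved example function."""
    return (1 - delta_j) / (1 - delta_i * delta_j)


print(sensitivity_i(spe, 0.5, 0.5, 0.1) > 0)  # True
print(sensitivity_j(spe, 0.5, 0.5, 0.1) > 0)  # True
```

Both measures come out positive for the Rubinstein share, whereas a function such as `lambda di, dj: 1 - di`, which asks for less as the player becomes more patient, gives a negative `sensitivity_i`.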

20 Bargaining Models Tackled
Determinants | Complete information | 1-sided uncertainty | 2-sided uncertainty
Discount factors | * Rubinstein 82 | * Rubinstein 85; x Imprecise info, ignorance | x Bilateral ignorance
+ Outside options | * Binmore 85 | x Uncertainty + outside options | More could be done easily
* = game theoretic solutions known
x = game theoretic solutions unknown

21 Models with known equilibriums
Complete information
• Rubinstein 82 model:
– Alternating offers; both A and B know $\delta_A$ and $\delta_B$
• Binmore 85 model, outside options:
– As above, but each player has an outside offer, $w_A$ and $w_B$
Incomplete information
• Rubinstein 85 model:
– B knows $\delta_A$ and $\delta_B$
– A knows $\delta_A$
– A knows $\delta_B$ is $\delta_w$ with probability $w_0$, $\delta_s$ (> $\delta_w$) otherwise

22 Models with unknown equilibriums
Modified Rubinstein 85 / Binmore 85 models:
• 1-sided imprecise information
– B knows $\delta_A$ and $\delta_B$; A knows $\delta_A$ and a normal distribution of $\delta_B$
• 1-sided ignorance
– B knows both $\delta_A$ and $\delta_B$; A knows $\delta_A$ but not $\delta_B$
• 2-sided ignorance
– B knows $\delta_B$ but not $\delta_A$; A knows $\delta_A$ but not $\delta_B$
• Rubinstein 85 + 1-sided outside option

23 Equilibrium with Outside Option
$x_A^*$ | Conditions
$\mu_A$ | $w_A \le \delta_A \mu_A$ and $w_B \le \delta_B \mu_B$
$1 - w_B$ | $w_A \le \delta_A (1 - w_B)$ and $w_B > \delta_B \mu_B$
$\delta_B w_A + (1 - \delta_B)$ | $w_A > \delta_A \mu_A$ and $w_B \le \delta_B (1 - w_A)$
$1 - w_B$ | $w_A > \delta_A (1 - w_B)$ and $w_B > \delta_B (1 - w_A)$
$w_A$ | $w_A + w_B > 1$
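
A hedged sketch of how this table could be applied follows. It assumes the table reads as five (share, condition) rows: the Rubinstein share when neither outside option binds, 1 − w_B or δ_B·w_A + (1 − δ_B) when one binds, 1 − w_B when both bind, and w_A when w_A + w_B > 1. It also assumes that the Rubinstein share for B is defined symmetrically to A's, which the slide does not state. All names are hypothetical.

```python
def outside_option_share(delta_a: float, delta_b: float, w_a: float, w_b: float) -> float:
    """A's equilibrium share with outside options w_a and w_b (assumed reading of the table)."""
    mu_a = (1 - delta_b) / (1 - delta_a * delta_b)
    mu_b = (1 - delta_a) / (1 - delta_a * delta_b)  # symmetric definition, an assumption
    if w_a + w_b > 1:
        return w_a  # outside options together exceed the cake
    if w_a <= delta_a * mu_a and w_b <= delta_b * mu_b:
        return mu_a  # neither outside option binds
    if w_a <= delta_a * (1 - w_b) and w_b > delta_b * mu_b:
        return 1 - w_b  # only B's outside option binds
    if w_a > delta_a * mu_a and w_b <= delta_b * (1 - w_a):
        return delta_b * w_a + (1 - delta_b)  # only A's outside option binds
    if w_a > delta_a * (1 - w_b) and w_b > delta_b * (1 - w_a):
        return 1 - w_b  # both outside options bind
    raise ValueError("no row of the table applies")


print(round(outside_option_share(0.9, 0.9, 0.0, 0.0), 4))  # 0.5263
print(round(outside_option_share(0.9, 0.9, 0.1, 0.6), 4))  # 0.4
print(round(outside_option_share(0.9, 0.9, 0.6, 0.1), 4))  # 0.64
```

The w_A + w_B > 1 row is tested first because it overlaps with the row in which both outside options bind.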

24 Equilibrium in Uncertainty – Rub85
Case | $\delta_2 = \delta_w$: $x_1^*$ | $\delta_2 = \delta_w$: $t^*$ | $\delta_2 = \delta_s$: $x_1^*$ | $\delta_2 = \delta_s$: $t^*$
$w_0 < w^*$ | $V_s$ | 0 | $V_s$ | 0
$w_0 > w^*$ | $x_{w_0}$ | 0 | $1 - ((1 - x_{w_0}) / \delta_w)$ | 1

25 Rubinstein Solution vs Experimental Results
($\delta_A$, $\delta_B$) | Rubinstein solution $x_A'$ | Experimental $x_A$: μ | Experimental $x_A$: σ
(0.4, 0.4) | 0.7143 | 0.8973 | 0.0247
(0.4, 0.6) | 0.5263 | 0.5090 | 0.0096
(0.4, 0.9) | 0.1563 | 0.1469 | 0.1467
(0.9, 0.4) | 0.9375 | 0.9107 | 0.0106
(0.9, 0.6) | 0.8696 | 0.8000 | 0.1419
(0.9, 0.9) | 0.5263 | 0.5065 | 0.1097
(0.9, 0.99) | 0.0917 | 0.1474 | 0.1023
$x_A$: agreement made by the best strategies in the final (300th) generation
Population size 100; crossover rates 0 to 0.1; mutation rates 0.01 to 0.5; tournament size 3

26 Running GP in Bargaining: Representation, Evaluation, Selection, Crossover, Mutation

27 Representation
• Given $\delta_A$ and $\delta_B$, every tree represents a constant
[Figure: example expression trees built from 1, $\delta_A$, $\delta_B$ and the arithmetic operators]

28 Population Dynamics
[Figure: for each of Player 1 and Player 2, a population of strategies goes through Select, Crossover, Mutation to give a new population; an Evaluate Fitness step links the two populations]

29 Evaluation
• Given the discount factors, each tree is translated into a constant x
– It is the demand represented by the tree
• All trees where x ≤ 0 or x > 1 are evaluated using rules defined by the incentive method
• All trees where 0 < x ≤ 1 enter game playing
• Every tree for Player 1 is played against every tree for Player 2

30 Evaluation Through Bargaining
• Incentive method ignored here for simplicity
Player 1 demand | vs .46 | vs .31 | vs .65 | vs .20 | Player 1 fitness
.75 | 0 | 0 | 0 | .75 | 0.75
.24 | .24 | .24 | .24 | .24 | 0.96
.36 | .36 | .36 | 0 | .36 | 1.08
.59 | 0 | .59 | 0 | .59 | 1.18
The columns .46, .31, .65 and .20 are the demands by Player 2’s strategies.
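
The fitness figures on this slide can be reproduced with a short round-robin sketch. The agreement rule used here is an assumption inferred from the numbers rather than stated on the slide: a player receives its demand when the two demands sum to at most 1, and 0 otherwise. The function names are hypothetical, and discounting and the incentive method are left out.

```python
def payoff(demand: float, opponent_demand: float) -> float:
    """Assumed rule: the demand is paid only if the two demands are compatible."""
    return demand if demand + opponent_demand <= 1.0 else 0.0


def fitness(own_demands, opponent_demands):
    """Total payoff of each strategy against every opposing strategy."""
    return [sum(payoff(d, o) for o in opponent_demands) for d in own_demands]


player_1 = [0.75, 0.24, 0.36, 0.59]
player_2 = [0.46, 0.31, 0.65, 0.20]
print([round(f, 2) for f in fitness(player_1, player_2)])  # [0.75, 0.96, 1.08, 1.18]
```

Greedy demands score well only against modest opponents, so the fitness of a strategy depends on the current make-up of the other population.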

31 Selection
• A random number r between 0 and 1 is generated
• If, say, r = 0.48 (which is > 0.43), then rule R3 is selected
Rule (Demand) | Fitness | Normalized | Accumulated
R1 (0.75) | 0.75 | 0.19 | 0.19
R2 (0.96) | 0.96 | 0.24 | 0.43
R3 (1.08) | 1.08 | 0.27 | 0.70
R4 (1.18) | 1.18 | 0.30 | 1.00
Sum | 3.97 | 1 |
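
The selection step illustrated on this slide is fitness-proportionate (roulette wheel) selection. A minimal sketch follows; the function name and the optional `r` argument are hypothetical, and the cumulative sums are left unnormalised, which is equivalent to comparing r with the Accumulated column.

```python
import bisect
import itertools
import random


def roulette_select(fitnesses, r=None) -> int:
    """Return the index of the rule whose accumulated fitness interval contains r."""
    if r is None:
        r = random.random()
    cumulative = list(itertools.accumulate(fitnesses))
    index = bisect.bisect_right(cumulative, r * cumulative[-1])
    return min(index, len(fitnesses) - 1)  # guard against r == 1.0


rules = [0.75, 0.96, 1.08, 1.18]  # fitness of R1..R4
print(roulette_select(rules, r=0.48))  # 2, i.e. rule R3
```

Passing `r` explicitly makes the slide example reproducible; leaving it out draws a fresh random number on each call.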

32 Crossover
[Figure: two parent expression trees over 1, $\delta_A$ and $\delta_B$ exchange subtrees to produce two offspring]

33 Mutation
• With a small probability, a branch (or a node) may be randomly changed
[Figure: an expression tree before and after mutation]
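
Subtree crossover and mutation can be sketched on trees written as nested tuples. Everything here is illustrative and assumed: the tuple encoding `(op, left, right)`, the leaf names `"dA"` and `"dB"`, the helper names, and the simplification that mutation replaces a randomly chosen node with a random terminal.

```python
import random

TERMINALS = [1, "dA", "dB"]


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for child in (1, 2):
            yield from paths(tree[child], prefix + (child,))


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for child in path:
        tree = tree[child]
    return tree


def replace(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], subtree)
    return tuple(node)


def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between the two parents."""
    path_a = rng.choice(list(paths(parent_a)))
    path_b = rng.choice(list(paths(parent_b)))
    child_a = replace(parent_a, path_a, get(parent_b, path_b))
    child_b = replace(parent_b, path_b, get(parent_a, path_a))
    return child_a, child_b


def mutate(tree, rng):
    """Replace one randomly chosen node (and its branch) with a random terminal."""
    path = rng.choice(list(paths(tree)))
    return replace(tree, path, rng.choice(TERMINALS))


def size(tree) -> int:
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))


rng = random.Random(0)
a = ("/", ("-", 1, "dB"), ("-", 1, ("*", "dA", "dB")))
b = ("+", 1, ("*", "dA", "dB"))
c, d = crossover(a, b, rng)
print(size(a) + size(b) == size(c) + size(d))  # True: nodes are exchanged, not created
```

Because the trees are immutable tuples, the parents are left untouched and each operator returns new offspring.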

34 Discussions on Bargaining http://cswww.essex.ac.uk/CSP/bargain

35 What is Rationality?
• Are we all logical?
• What if computation is involved?
• Does Consequential Closure hold?
– If we know P is true and P → Q, then we know Q is true
– We know all the rules in chess, but not the optimal moves, due to combinatorial explosion
• “Rationality” depends on computation power!
– Think faster → “more rational” (The CIDER Theory)

36 Combinatorial Explosion
[Figure: chessboard with columns A to H and rows 1 to 8; the first squares hold 1, 2, 4, 8, 16, 32, 64, 128, … pennies]
• Put 1 penny in square 1
• 2 pennies in square 2
• 4 pennies in square 3, etc.
• Even the world’s richest man can’t afford it
– $10^{19}$ p = £100,000,000 Billion
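
The order of magnitude quoted on this slide can be checked with exact integer arithmetic. The variable names are illustrative only.

```python
squares = 64
last_square = 2 ** (squares - 1)   # pennies on square 64
total_pence = 2 ** squares - 1     # pennies on the whole board
print(last_square)                 # 9223372036854775808
print(total_pence)                 # 18446744073709551615
print(total_pence // 100)          # whole pounds
```

Both figures are of the order of $10^{19}$ pence, in line with the slide.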

37 Stochastic Search, Motivation
• Schedule 30 jobs to 10 machines:
– Search space: $10^{30}$ leaf nodes
• Generously allow:
– Explore one in every $10^{10}$ leaf nodes!
– Examine $10^{10}$ nodes per second!
• Problem may take 300 years to solve!!!
– May be lucky to find first solution
– But finding optimality takes time
– Complete methods limited by combinatorial explosion
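
The "300 years" estimate follows from the figures on the slide. The short calculation below is a sketch with illustrative variable names, using a 365.25-day year.

```python
leaf_nodes = 10 ** 30                    # 30 jobs, each assigned to one of 10 machines
explored = leaf_nodes // 10 ** 10        # explore one in every 10^10 leaf nodes
seconds = explored // 10 ** 10           # at 10^10 nodes per second
years = seconds / (365.25 * 24 * 3600)
print(round(years))                      # 317
```

Even with these generous allowances the search takes roughly three centuries.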

38 Game Theory Hall of Fame
• 1994 Nobel Prize: John Harsanyi, John Nash, Reinhard Selten
• 2005 Nobel Prize: Robert Aumann, Thomas Schelling
• 2012 Nobel Prize: Alvin Roth, Lloyd Shapley

39 1994 Nobel Economic Prize Winners
• John Harsanyi (Berkeley): incomplete information
• John Forbes Nash (Princeton): non-cooperative games
• Reinhard Selten (Bonn): bounded rationality (after Herbert Simon), experimental economics

40 1978 Nobel Economic Prize Winner
Herbert Simon (CMU)
• Artificial intelligence
• “For his pioneering research into the decision-making process within economic organizations”
• “The social sciences, I thought, needed the same kind of rigor and the same mathematical underpinnings that had made the "hard" sciences so brilliantly successful.”
• Bounded Rationality
– A Behavioral Model of Rational Choice, 1957
Sources: http://nobelprize.org/economics/laureates/1978/ and http://nobelprize.org/economics/laureates/1978/simon-autobio.html

41 2005 Nobel Economic Prize Winners
• Robert J. Aumann (75) and Thomas C. Schelling (84) won 2005’s Nobel memorial prize in economic sciences
• For having enhanced our understanding of conflict and cooperation through game-theory analysis
Source: http://www.msnbc.msn.com/id/9649575/ (updated 2:49 p.m. ET, Oct. 10, 2005)

42 Robert J Aumann, Winner of 2005 Nobel Economic Prize
• Born 1930
• Hebrew Univ of Jerusalem & US National Academy of Sciences
• “Producer of Game Theory” (Schelling)
• Repeated games
• Defined “Correlated Equilibrium”
– Uncertainty not random
– But depends on info on opponent
• Common knowledge

43 Thomas C Schelling, Winner of 2005 Nobel Economic Prize
• Born 1921
• University of Maryland
• “User of Game Theory” (Schelling)
• Book “The Strategy of Conflict” 1960
– Bargaining theory and strategic behavior
• Book “Arms and Influence” 1966
– Foreign affairs, national security, nuclear strategy, ...
• Paper “Dynamic models of segregation” 1971
– Small preference to one’s neighbour → segregation

44 Alvin E Roth, Winner of 2012 Nobel Economic Prize
• Born 1951
• Columbia, Stanford
• “Game theory for real world problems”
• Case study in game theory
– Stable Marriage Problem
– National Resident Matching Program

45 Lloyd Shapley, Winner of 2012 Nobel Economic Prize
• Born 1923
• UCLA
• Mathematical economics
• Game theory
– Including stochastic games
– Applications include the stable marriage problem
• Shapley-Shubik power index
– For ranking planning and group decision-making

