1 Satisfaction Equilibrium
Stéphane Ross (Canadian AI 2006)

2 Problem
In real-life multiagent systems:
- Agents generally do not know the preferences (rewards) of their opponents
- Agents may not observe the actions of their opponents
In this context, most game-theoretic solution concepts are hardly applicable. We may instead try to define equilibrium concepts that:
- do not require complete information
- are achievable through learning, over repeated play

3 Plan
- Game model
- Satisfaction Equilibrium
- Satisfaction Equilibrium Learning
- Results
- Conclusion
- Questions

5 Game model
- n : the number of agents
- A = A_1 × … × A_n : the joint action space
- O : the set of possible outcomes
- ω : A → O : the outcome function
- r_i : O → ℝ : agent i's reward function
Agent i only knows A_i and O. After each turn, every agent observes an outcome o ∈ O.
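
To make this concrete, here is a minimal Python sketch of the model; the class and field names (Game, outcome_fn, reward_fns) are illustrative choices, not notation from the talk:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

JointAction = Tuple[str, ...]  # one action label per agent

@dataclass
class Game:
    n_agents: int                                 # n
    action_spaces: List[List[str]]                # A_i for each agent i
    outcome_fn: Callable[[JointAction], object]   # omega : A -> O
    reward_fns: List[Callable[[object], float]]   # r_i : O -> R

    def play(self, joint_action: JointAction) -> List[float]:
        # One turn: every agent observes the outcome; agent i sees only r_i(o).
        o = self.outcome_fn(joint_action)
        return [r(o) for r in self.reward_fns]
```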

6 Game model
Observations:
- The agents do not know the game matrix
- They are unable to compute best responses or a Nash equilibrium
- They can only reason on their history of actions and rewards

        A      B
  A   a, ?   b, ?
  B   c, ?   d, ?

(a, b, c, d: the agent's own payoffs; "?" marks the opponent's unknown payoffs)

8 Satisfaction Equilibrium
Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
- If an agent is satisfied by its current reward, it should keep playing the same strategy
- An unsatisfied agent may decide to change its strategy according to some exploration function
An equilibrium arises when all agents are satisfied.

9 Satisfaction Equilibrium
Formally:
- S_i : O → {0, 1} is the satisfaction function of agent i:
    S_i(o) = 1 if r_i(o) ≥ σ_i (agent i is satisfied)
    S_i(o) = 0 if r_i(o) < σ_i (agent i is not satisfied)
- σ_i is the satisfaction threshold of agent i
A joint strategy a ∈ A is a satisfaction equilibrium if:
- S_i(ω(a)) = 1 for every agent i
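
This definition transcribes directly into code; a sketch reusing the illustrative Game class above, with the thresholds σ_i passed as a list:

```python
def is_satisfaction_equilibrium(game: Game, joint_action: JointAction,
                                thresholds: List[float]) -> bool:
    # S_i(o) = 1 iff r_i(o) >= sigma_i; an equilibrium needs S_i(omega(a)) = 1
    # for every agent i.
    o = game.outcome_fn(joint_action)
    return all(game.reward_fns[i](o) >= thresholds[i]
               for i in range(game.n_agents))
```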

10 Example: Prisoner's dilemma

Payoff matrix:

        C         D
  C   -1, -1   -10,  0
  D    0, -10   -8, -8

Dominant strategy: D
Nash equilibrium: (D, D)
Pareto-optimal: (C, C), (D, C), (C, D)

Possible satisfaction matrices:

        C      D
  C   1, 1   0, 1
  D   1, 0   0, 0

        C      D
  C   1, 1   0, 1
  D   1, 0   1, 1
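
As a sanity check, the two satisfaction matrices above correspond to thresholds of -1 and -8 for both agents; a usage sketch with the helpers defined earlier:

```python
# Prisoner's dilemma from the example; the outcome is simply the joint action.
R = {("C", "C"): (-1, -1), ("C", "D"): (-10, 0),
     ("D", "C"): (0, -10), ("D", "D"): (-8, -8)}
pd = Game(n_agents=2,
          action_spaces=[["C", "D"], ["C", "D"]],
          outcome_fn=lambda a: a,
          reward_fns=[lambda o: R[o][0], lambda o: R[o][1]])

print(is_satisfaction_equilibrium(pd, ("C", "C"), [-1, -1]))  # True
print(is_satisfaction_equilibrium(pd, ("D", "D"), [-1, -1]))  # False
print(is_satisfaction_equilibrium(pd, ("D", "D"), [-8, -8]))  # True: with the
# lower thresholds (second matrix), mutual defection also satisfies both agents.
```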

11 Satisfaction Equilibrium
However, even if a satisfaction equilibrium exists, it may be unreachable:

        A      B      C
  A   1, 1   0, 1
  B   1, 0   0, 1
  C   1, 0   0, 1   1, 0

13 Satisfaction Equilibrium Learning
If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning (see the sketch below):
- Choose a strategy randomly
- If satisfied, keep playing the same strategy
- Else choose a new strategy randomly
We can also use other exploration functions which favour actions that have not been explored often.
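
A minimal sketch of this loop, assuming uniform random exploration (the slide's example of a biased exploration function was lost in extraction and is not reproduced here):

```python
import random

def satisfaction_learning(game: Game, thresholds: List[float],
                          n_rounds: int = 1000, rng=random):
    # Every agent starts with a randomly chosen strategy.
    actions = [rng.choice(space) for space in game.action_spaces]
    for _ in range(n_rounds):
        o = game.outcome_fn(tuple(actions))
        satisfied = [game.reward_fns[i](o) >= thresholds[i]
                     for i in range(game.n_agents)]
        if all(satisfied):
            # Satisfied agents keep their strategies forever: an equilibrium.
            return tuple(actions)
        for i in range(game.n_agents):
            if not satisfied[i]:
                # Unsatisfied agents re-draw a strategy at random.
                actions[i] = rng.choice(game.action_spaces[i])
    return None  # no satisfaction equilibrium reached within the horizon
```

On the prisoner's dilemma above, satisfaction_learning(pd, [-1, -1]) eventually returns ("C", "C"), the only cell of the first satisfaction matrix where both agents are satisfied.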

14 Satisfaction Equilibrium Learning
We use a simple update rule:
- When the agent is satisfied, we increment its satisfaction threshold by some variable δ
- When the agent is unsatisfied, we decrement its satisfaction threshold by δ
- δ is multiplied by a factor each turn such that it converges to 0
We also use a limited history of the previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold.
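
The symbols for the step size and its decay factor were lost in extraction; below is a sketch of one update step under the assumption that the step δ decays geometrically by a factor γ (the bounded history of past satisfaction states and thresholds is omitted here):

```python
def update_threshold(sigma: float, delta: float, satisfied: bool,
                     gamma: float = 0.99):
    # Raise the threshold when satisfied, lower it when not, then decay the
    # step so that the threshold eventually stabilises.
    sigma = sigma + delta if satisfied else sigma - delta
    return sigma, delta * gamma
```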

16 Results
Fixed satisfaction thresholds:
- In simple games, we were always able to reach a satisfaction equilibrium
- Using a biased exploration improves the speed of convergence of the algorithm
Learning the satisfaction thresholds:
- We are generally able to learn the optimal satisfaction equilibrium in simple games
- Using a biased exploration improves the convergence percentage of the algorithm
- The decay factor and the history size affect the convergence of the algorithm and need to be adjusted to get optimal results

17 Results – Prisoner's dilemma

19 Conclusion
- It is possible to learn stable outcomes without observing anything but our own rewards
- Satisfaction equilibria can be defined on any Pareto-optimal solution
  - However, satisfaction equilibria are not always reachable
- The proposed learning algorithms achieve good performance in simple games
  - However, they require game-specific adjustments for optimal performance

20 Conclusion
For more information, you can consult my publications at:
http://www.damas.ift.ulaval.ca/~ross
Thank You!

21 Questions?

