Satisfaction Equilibrium
Stéphane Ross
Canadian AI 2006
Problem

In real-life multiagent systems:
- Agents generally do not know the preferences (rewards) of their opponents.
- Agents may not observe the actions of their opponents.

In this context, most game-theoretic solution concepts are hardly applicable. We may instead try to define equilibrium concepts that:
- do not require complete information;
- are achievable through learning, over repeated play.
Plan
- Game model
- Satisfaction Equilibrium
- Satisfaction Equilibrium Learning
- Results
- Conclusion
- Questions
Game model
- n : the number of agents
- A = A_1 × … × A_n : the joint action space
- O : the set of possible outcomes
- ω : A → O : the outcome function
- r_i : O → ℝ : agent i's reward function

Agent i only knows A_i, O and r_i. After each turn, every agent observes an outcome o ∈ O.
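As a reading aid, here is a minimal sketch of this model in Python. The names (Game, outcome_fn, reward_fns, play) are illustrative, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Outcome = str                        # an element of the outcome set O
JointAction = Tuple[str, ...]        # one action per agent

@dataclass
class Game:
    n: int                                           # number of agents
    action_sets: List[List[str]]                     # A_i for each agent i
    outcome_fn: Callable[[JointAction], Outcome]     # ω : A → O
    reward_fns: List[Callable[[Outcome], float]]     # r_i : O → ℝ

    def play(self, joint_action: JointAction) -> List[float]:
        """One turn: compute the outcome and each agent's private reward."""
        o = self.outcome_fn(joint_action)
        return [r(o) for r in self.reward_fns]
```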
Game model – Observations
- The agents do not know the game matrix.
- They are unable to compute best responses and Nash equilibria.
- They can only reason on their history of actions and rewards.

From agent 1's point of view, the game matrix looks as follows: it knows its own payoffs a, b, c, d, but not its opponent's.

        A       B
  A   a, ?    b, ?
  B   c, ?    d, ?
Satisfaction Equilibrium

Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
- If an agent is satisfied with its current reward, it should keep playing the same strategy.
- An unsatisfied agent may decide to change its strategy according to some exploration function.

An equilibrium arises when all agents are satisfied.
Satisfaction Equilibrium

Formally, s_i : ℝ → {0, 1} is the satisfaction function of agent i:
- s_i(r_i) = 1 if r_i ≥ σ_i (agent i is satisfied)
- s_i(r_i) = 0 if r_i < σ_i (agent i is not satisfied)
where σ_i is the satisfaction threshold of agent i.

A joint strategy a is a satisfaction equilibrium if s_i(r_i(ω(a))) = 1 for every agent i.
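A direct translation of this definition, reusing the hypothetical Game sketch above:

```python
def is_satisfaction_equilibrium(game, joint_action, thresholds):
    """s_i(r_i) = 1 iff r_i >= sigma_i; a joint action is a satisfaction
    equilibrium when every agent is satisfied simultaneously."""
    rewards = game.play(joint_action)
    return all(r >= sigma for r, sigma in zip(rewards, thresholds))
```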
Example – Prisoner's dilemma

        C          D
  C   -1, -1    -10, 0
  D   0, -10    -8, -8

Dominant strategy: D. Nash equilibrium: (D, D). Pareto-optimal: (C, C), (D, C), (C, D).

Possible satisfaction matrices (each entry gives the two agents' satisfaction values). With thresholds σ_1 = σ_2 = -1, the only satisfaction equilibrium is (C, C):

        C       D
  C   1, 1    0, 1
  D   1, 0    0, 0

With σ_1 = σ_2 = -8, both (C, C) and (D, D) are satisfaction equilibria:

        C       D
  C   1, 1    0, 1
  D   1, 0    1, 1
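The threshold values σ = -1 and σ = -8 are one choice consistent with the two matrices shown; a small illustrative script derives them from the payoffs:

```python
# Prisoner's dilemma payoffs as (row reward, column reward).
payoffs = {
    ("C", "C"): (-1, -1), ("C", "D"): (-10, 0),
    ("D", "C"): (0, -10), ("D", "D"): (-8, -8),
}

def satisfaction_matrix(payoffs, sigma):
    """Map each joint action to (s_1, s_2) for a common threshold sigma."""
    return {a: tuple(int(r >= sigma) for r in rs) for a, rs in payoffs.items()}

print(satisfaction_matrix(payoffs, sigma=-1))   # only (C, C) -> (1, 1)
print(satisfaction_matrix(payoffs, sigma=-8))   # (C, C) and (D, D) -> (1, 1)
```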
Satisfaction Equilibrium

However, even if a satisfaction equilibrium exists, it may be unreachable:

        A       B       C
  A   1, 1    0, 1    0, 1
  B   1, 0    0, 1    0, 1
  C   1, 0    0, 1    1, 0

Here the only satisfaction equilibrium is (A, A), but in every other cell exactly one agent is satisfied and keeps its action, so the joint play can never move to (A, A).
Satisfaction Equilibrium Learning

If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
- Choose a strategy randomly.
- If satisfied, keep playing the same strategy.
- Else, choose a new strategy randomly.

We can also use other exploration functions which favour actions that have not been explored often (see the sketch below).
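A minimal sketch of the fixed-threshold procedure, assuming the hypothetical Game class sketched earlier; it uses uniform random exploration, where the slides also allow biased exploration functions:

```python
import random

def satisfaction_learning(game, thresholds, horizon=10_000, rng=random):
    """Fixed-threshold satisfaction learning: satisfied agents repeat
    their action, unsatisfied agents pick a new action at random."""
    actions = [rng.choice(acts) for acts in game.action_sets]
    for _ in range(horizon):
        rewards = game.play(tuple(actions))
        if all(r >= s for r, s in zip(rewards, thresholds)):
            return tuple(actions)       # all satisfied: equilibrium reached
        for i, (r, s) in enumerate(zip(rewards, thresholds)):
            if r < s:                   # unsatisfied: explore uniformly
                actions[i] = rng.choice(game.action_sets[i])
    return None                         # no satisfaction equilibrium found
```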
Satisfaction Equilibrium Learning

To learn the thresholds themselves, we use a simple update rule:
- When an agent is satisfied, we increment its satisfaction threshold by some value δ.
- If the agent is unsatisfied, we decrement its satisfaction threshold by δ.
- δ is multiplied by a factor each turn, such that it converges to 0.

We also use a limited history of the previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold (see the sketch below).
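A one-step sketch of this update rule; the decay factor 0.99 is an illustrative value (the slides only require that δ shrink toward 0), and the limited-history bounding is omitted:

```python
def update_threshold(sigma, satisfied, delta, decay=0.99):
    """Raise the threshold by delta after a satisfied turn, lower it
    otherwise; delta itself shrinks toward 0 by a constant factor."""
    sigma = sigma + delta if satisfied else sigma - delta
    return sigma, delta * decay
```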
Results

Fixed satisfaction thresholds:
- In simple games, we were always able to reach a satisfaction equilibrium.
- Using a biased exploration improves the speed of convergence of the algorithm.

Learning the satisfaction thresholds:
- We are generally able to learn the optimal satisfaction equilibrium in simple games.
- Using a biased exploration improves the convergence percentage of the algorithm.
- The decay factor and the history size affect the convergence of the algorithm and need to be adjusted to obtain optimal results.
Results – Prisoner's dilemma
Conclusion
- It is possible to learn stable outcomes without observing anything but our own rewards.
- Satisfaction equilibria can be defined on any Pareto-optimal solution; however, satisfaction equilibria are not always reachable.
- The proposed learning algorithms achieve good performance in simple games, but they require game-specific adjustments for optimal performance.
For more information, you can consult my publications at http://www.damas.ift.ulaval.ca/~ross

Thank you!
Questions?