Satisfaction Equilibrium
Stéphane Ross
Problem
In real-life multiagent systems:
- Agents generally do not know the preferences (rewards) of their opponents.
- Agents may not observe the actions of their opponents.
In this context, most game-theoretic solution concepts are difficult to apply. We may instead try to define equilibrium concepts that:
- do not require complete information
- are achievable through learning, over repeated play
Presentation Plan
- Game model
- Satisfaction Equilibrium
- Satisfaction Equilibrium Learning
- Results
- Conclusion
- Questions
Game model
- n: the number of agents
- A = A_1 × … × A_n: the joint action space
- O: the set of possible outcomes
- o : A → O: the outcome function
- r_i : O → R: agent i's reward function
Agent i only knows its own action set A_i and its own reward function r_i. After each turn, every agent observes an outcome o ∈ O (and hence its own reward r_i(o)).
Game model
Observations:
- The agents do not know the game matrix.
- They are unable to compute best responses and Nash equilibria.
- They can only reason on their history of actions and rewards.
From an agent's point of view, the game matrix looks as follows: it sees its own payoffs a, b, c, d, but not those of its opponent (?):

        (opponent)
A     a, ?     b, ?
B     c, ?     d, ?
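To make this information structure concrete, here is a minimal sketch (the names are ours, not from the talk) of what a single agent actually has access to: its own action set, its own reward function, and the history of its own actions, observed outcomes and rewards.

```python
class AgentView:
    """What agent i can see: A_i, r_i, and its own play history -- nothing else."""

    def __init__(self, own_actions, own_reward_fn):
        self.own_actions = own_actions      # A_i: the agent's own action set
        self.own_reward_fn = own_reward_fn  # r_i: outcome -> reward, known only to agent i
        self.history = []                   # [(own action, observed outcome, own reward), ...]

    def observe(self, own_action, outcome):
        """Record one turn: the agent sees the outcome and its own reward only."""
        reward = self.own_reward_fn(outcome)
        self.history.append((own_action, outcome, reward))
        return reward
```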
Satisfaction Equilibrium
Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
- If an agent is satisfied with its current reward, it should keep playing the same strategy.
- An unsatisfied agent may decide to change its strategy according to some exploration function.
An equilibrium arises when all agents are satisfied.
Satisfaction Equilibrium
Formally, S_i is the satisfaction function of agent i:
- S_i = 1 if r_i ≥ σ_i (agent i is satisfied)
- S_i = 0 if r_i < σ_i (agent i is not satisfied)
where σ_i is the satisfaction threshold of agent i. A joint strategy is a satisfaction equilibrium if S_i = 1 for every agent i, i.e., every agent is satisfied with the reward it obtains.
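As a sketch (assuming, as outside observers, we can evaluate every agent's reward function; the agents themselves cannot), the condition reads:

```python
def satisfied(reward, threshold):
    """S_i = 1 iff the reward meets or exceeds the agent's satisfaction threshold."""
    return reward >= threshold

def is_satisfaction_equilibrium(joint_action, reward_fns, thresholds):
    """A joint strategy is a satisfaction equilibrium when every agent is satisfied."""
    return all(
        satisfied(r_i(joint_action), sigma_i)
        for r_i, sigma_i in zip(reward_fns, thresholds)
    )
```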
Example: Prisoner's dilemma

        C          D
C    -1, -1    -10,  0
D     0, -10    -8, -8

Dominant strategy: D. Nash equilibrium: (D,D). Pareto-optimal: (C,C), (D,C), (C,D).

Two possible satisfaction matrices:

        C        D                  C        D
C     1, 1     0, 1            C  1, 1     0, 1
D     1, 0     0, 0            D  1, 0     1, 1

In the first, (C,C) is the only satisfaction equilibrium; in the second, both (C,C) and (D,D) are satisfaction equilibria.
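These satisfaction matrices can be reproduced from the payoff matrix. The talk does not state the thresholds used; any threshold in (-8, -1] yields the first matrix and any threshold in (-10, -8] yields the second, so the values below are only illustrative.

```python
# (row action, col action) -> (reward of player 1, reward of player 2)
PAYOFFS = {
    ("C", "C"): (-1, -1),  ("C", "D"): (-10, 0),
    ("D", "C"): (0, -10),  ("D", "D"): (-8, -8),
}

def satisfaction_matrix(payoffs, sigma1, sigma2):
    """S_i = 1 iff r_i >= sigma_i, computed cell by cell."""
    return {a: (int(r1 >= sigma1), int(r2 >= sigma2)) for a, (r1, r2) in payoffs.items()}

print(satisfaction_matrix(PAYOFFS, -1, -1))  # only (C,C) -> (1,1): unique satisfaction equilibrium
print(satisfaction_matrix(PAYOFFS, -8, -8))  # both (C,C) and (D,D) -> (1,1)
```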
Satisfaction Equilibrium
However, even if a satisfaction equilibrium exists, it may be unreachable. Example satisfaction matrix:

        A        B        C
A     1, 1     0, 1      …
B     1, 0     0, 1      …
C     1, 0     0, 1     1, 0
Satisfaction Equilibrium Learning
If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
- Choose a strategy randomly.
- If satisfied, keep playing the same strategy.
- Else, choose a new strategy randomly.
We can also use other exploration functions which favour actions that have not been explored often (see the sketch below).
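A sketch of this fixed-threshold learning loop, with one plausible biased exploration rule (favouring the least-explored actions); the exact exploration function used in the talk is not reproduced here.

```python
import random
from collections import defaultdict

def play(actions, reward_of, threshold, n_turns=1000, biased=True):
    """Satisfaction-based play with a fixed threshold.

    reward_of(action) stands in for the reward observed after all agents act.
    """
    counts = defaultdict(int)          # how often each action has been tried
    current = random.choice(actions)   # start with a random strategy
    for _ in range(n_turns):
        counts[current] += 1
        reward = reward_of(current)
        if reward >= threshold:
            continue                   # satisfied: keep playing the same strategy
        if biased:
            # unsatisfied: prefer the least-explored actions (ties broken randomly)
            current = min(actions, key=lambda a: (counts[a], random.random()))
        else:
            current = random.choice(actions)
    return current
```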
Satisfaction Equilibrium Learning
To learn the thresholds themselves, we use a simple update rule:
- When the agent is satisfied, we increment its satisfaction threshold by some step value.
- When the agent is unsatisfied, we decrement its satisfaction threshold by the same step.
- The step is multiplied by a decay factor each turn, so that it converges to 0.
We also use a limited history of the previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold.
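A sketch of this update rule; the parameter names and the exact use of the limited history are our assumptions (here a short history of observed rewards is used to clip the threshold).

```python
from collections import deque

class ThresholdLearner:
    """Learns a satisfaction threshold with a decaying step and a bounded history."""

    def __init__(self, sigma0=0.0, step=1.0, decay=0.99, history_size=20):
        self.sigma = sigma0                       # current satisfaction threshold
        self.step = step                          # step added/subtracted each turn
        self.decay = decay                        # multiplies the step so it converges to 0
        self.recent = deque(maxlen=history_size)  # limited history of observed rewards

    def update(self, reward):
        satisfied = reward >= self.sigma
        self.recent.append(reward)
        # satisfied: raise the threshold; unsatisfied: lower it
        self.sigma += self.step if satisfied else -self.step
        # bound the threshold using the limited history
        self.sigma = min(max(self.sigma, min(self.recent)), max(self.recent))
        self.step *= self.decay
        return satisfied
```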
Results
Fixed satisfaction thresholds:
- In simple games, we were always able to reach a satisfaction equilibrium.
- Using a biased exploration improves the speed of convergence of the algorithm.
Learning the satisfaction thresholds:
- We are generally able to learn the optimal satisfaction equilibrium in simple games.
- Using a biased exploration improves the convergence percentage of the algorithm.
- The decay factor and the history size affect the convergence of the algorithm and need to be adjusted to get optimal results.
Results – Prisoner's dilemma
Conclusion
- It is possible to learn stable outcomes without observing anything but our own rewards.
- Satisfaction equilibria can be defined on any Pareto-optimal solution. However, satisfaction equilibria are not always reachable.
- The proposed learning algorithms achieve good performance in simple games. However, they require game-specific adjustments for optimal performance.
Conclusion
For more information, you can consult my publications at:
Thank you!
Questions?