Decision Theory and Game Theory

An Introduction to MultiAgent Systems; Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations

Decision Theory
Probability, self-interested agents, utilities and preferences, rationality

Decision Theory
Decision theory treats a decision as a game between an agent and nature, such as a lottery or a slot machine.

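Probability
$X_i$ is a variable that captures some aspect of the current state of the environment.
$x_i$ is a possible value of $X_i$.
$P(X_i = x_i)$ is the probability of that value, with $P(X_i = x_i) \in [0, 1]$ and $\sum_{x_i} P(X_i = x_i) = 1$.
Joint probability: $P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1)\,P(X_2 = x_2)$ if $X_1$ and $X_2$ are independent. If not, $P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1 \mid X_2 = x_2)\,P(X_2 = x_2)$.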
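
A minimal sketch of the two joint-probability cases, in Python. The variable values and all the numbers are made up for illustration; only the two product formulas come from standard probability.

```python
from itertools import product

# Hypothetical marginal distributions for two variables X1 and X2.
p_x1 = {"sunny": 0.7, "rainy": 0.3}
p_x2 = {"heads": 0.5, "tails": 0.5}

def joint_independent(p_a, p_b):
    """Joint distribution of independent variables: P(a, b) = P(a) * P(b)."""
    return {(a, b): p_a[a] * p_b[b] for a, b in product(p_a, p_b)}

def joint_from_conditional(p_b, p_a_given_b):
    """Joint distribution in the dependent case: P(a, b) = P(a | b) * P(b)."""
    return {
        (a, b): p_a_given_b[b][a] * p_b[b]
        for b in p_b
        for a in p_a_given_b[b]
    }

# Hypothetical conditional table P(X1 | X2), used only for the dependent case.
p_x1_given_x2 = {
    "heads": {"sunny": 0.9, "rainy": 0.1},
    "tails": {"sunny": 0.5, "rainy": 0.5},
}

joint_ind = joint_independent(p_x1, p_x2)
joint_dep = joint_from_conditional(p_x2, p_x1_given_x2)
```

Both results are dictionaries keyed by value pairs, and each sums to 1, as any joint distribution must.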

Self-Interested Agents
What does it mean to say that an agent is self-interested?
Not that it wants to harm other agents.
Not that it cares only about things that benefit itself.
Rather, the agent has its own description of how it would like the environment to be (its preferences), and its actions are motivated by this description.

State-Action Diagram
There is a set of agents, and each agent has a set of actions.
Actions define outcomes: for each possible combination of actions there is an outcome.
Outcomes define payoffs: agents derive utility from different outcomes.
[Figure: state-action diagram with example payoffs 5, 10, -8]

Utilities and Preferences
Assume we have just two agents: Ag = {i, j}.
Assume W = {w1, w2, …} is the set of “outcomes” that agents have preferences over.
We capture preferences by utility functions: $u_i : W \to \mathbb{R}$ and $u_j : W \to \mathbb{R}$.
Utility functions lead to preference orderings over outcomes:
$w \succeq_i w'$ means $u_i(w) \ge u_i(w')$ (preferred)
$w \succ_i w'$ means $u_i(w) > u_i(w')$ (strictly preferred)
$w \sim_i w'$ means $w \succeq_i w'$ and $w' \succeq_i w$, i.e. $u_i(w) = u_i(w')$ (indifferent)
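
As a rough illustration, a utility function can be stored as a dictionary and the three preference relations derived from it. The outcome names and utility values below are hypothetical; indifference is implemented as weak preference in both directions.

```python
# Hypothetical utilities for agent i over three outcomes.
u_i = {"w1": 1.0, "w2": 4.0, "w3": 4.0}

def weakly_prefers(u, w, w_prime):
    """w is at least as good as w_prime: u(w) >= u(w_prime)."""
    return u[w] >= u[w_prime]

def strictly_prefers(u, w, w_prime):
    """w is strictly better than w_prime: u(w) > u(w_prime)."""
    return u[w] > u[w_prime]

def indifferent(u, w, w_prime):
    """Weak preference holds in both directions, so the utilities are equal."""
    return weakly_prefers(u, w, w_prime) and weakly_prefers(u, w_prime, w)

# Outcomes sorted from most to least preferred (ties keep dictionary order).
ranking = sorted(u_i, key=u_i.get, reverse=True)
```

Here agent i is indifferent between w2 and w3 and strictly prefers either of them to w1.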

What is Utility?
Utility is not money (but money is a useful analogy).
Typical relationship between utility and money: [Figure: utility plotted against money]

Rationality
A rational agent attempts to maximize its expected utility.
[Figure: utility u plotted against money, with curves for risk-seeking, risk-neutral, and risk-averse agents]
Lottery 1: $0.5M w.p. 1
Lottery 2: $1M w.p. 0.5, $0 w.p. 0.5
The agent’s strategy is its choice of lottery.
Risk aversion explains the existence of insurance companies.
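
The lottery comparison can be sketched in Python as follows. The three utility functions (square root, identity, square) are illustrative stand-ins for concave, linear, and convex curves, not the curves from the slide, and the losing outcome of Lottery 2 is assumed to be $0.

```python
import math

# Lotteries as lists of (probability, payoff in $M).
# Assumption: the losing outcome of Lottery 2 pays $0.
LOTTERY_1 = [(1.0, 0.5)]
LOTTERY_2 = [(0.5, 1.0), (0.5, 0.0)]

# Hypothetical utility functions, one per risk attitude.
UTILITIES = {
    "risk averse": math.sqrt,          # concave
    "risk neutral": lambda x: x,       # linear
    "risk seeking": lambda x: x ** 2,  # convex
}

def expected_utility(lottery, u):
    """Probability-weighted sum of the utility of each payoff."""
    return sum(p * u(x) for p, x in lottery)

def choose(u):
    """Return the lottery an expected-utility maximizer with utility u picks."""
    eu1 = expected_utility(LOTTERY_1, u)
    eu2 = expected_utility(LOTTERY_2, u)
    if math.isclose(eu1, eu2):
        return "indifferent"
    return "Lottery 1" if eu1 > eu2 else "Lottery 2"

for attitude, u in UTILITIES.items():
    print(attitude, "->", choose(u))
```

Under these assumptions the risk-averse agent takes the sure $0.5M, the risk-seeking agent gambles, and the risk-neutral agent is indifferent because both lotteries have the same expected payoff.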

Game Theory
What is a game, strategies, dominant strategies, Nash equilibrium, Pareto optimality, competition games, cooperation games, coordination games, Axelrod’s tournament, bounded rationality

What is a Game
Game: a formal representation of a situation of strategic interdependence.
We focus on games where:
There are 2 or more players.
There is some choice of action where strategy matters.
The game has one or more outcomes, e.g. someone wins, someone loses.
The outcome depends on the strategies chosen by all players; there is strategic interaction.
What does this rule out?
Games of pure chance, e.g. lotteries and slot machines (strategies don't matter).
Games without strategic interaction between players, e.g. Solitaire.

Strategies
A strategy $s_i$ is a comprehensive plan of action: it defines the action agent i should take in every possible state of the world.
Prisoner’s Dilemma: Cooperate (remain silent) or Defect (confess).
Strategy profile: $s = (s_1, \ldots, s_n)$, with $s_{-i} = (s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n)$ denoting the strategies of all agents other than i.
Utility function: $u_i(s_i, s_{-i})$.
Note that the utility of an agent depends on the whole strategy profile, not just its own strategy.
We assume agents are expected utility maximizers.

Three Elements of a Game:
The players: how many players are there (here, two players with two actions each)? Does nature/chance play a role?
A complete description of the strategies of each player: pure strategies and mixed strategies (chosen with some probability).
A description of the consequences (payoffs) for each player for every possible profile of strategy choices of all players.

Assumptions Game Theorists Make
Payoffs are known and fixed.
People treat expected payoffs the same as certain payoffs (they are risk neutral). Example: a risk-neutral person is indifferent between $25 for certain and a 25% chance of earning $100 with a 75% chance of earning $0. We can relax this assumption to capture risk-averse behavior.
All players behave rationally: they understand and seek to maximize their own payoffs, and they are flawless in calculating which actions will maximize their payoffs.
The rules of the game are common knowledge: each player knows the set of players, the strategies, and the payoffs from all possible combinations of strategies.

Multi-Agent Encounters
We need a model of the environment in which these agents will act:
agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in W will result;
the actual outcome depends on the combination of actions;
assume each agent has just two possible actions that it can perform, C (“cooperate”) and D (“defect”).
Environment behavior is given by a state transformer function $\tau : Ac \times Ac \to W$, where $Ac = \{C, D\}$, the first argument is agent i's action, and the second is agent j's.

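Multi-Agent Encounters
Here is a state transformer function, writing $\tau(a_i, a_j)$ for the outcome when i plays $a_i$ and j plays $a_j$: $\tau(D,D) = w_1$, $\tau(D,C) = w_2$, $\tau(C,D) = w_3$, $\tau(C,C) = w_4$. (This environment is sensitive to the actions of both agents.)
Here is another: $\tau(D,D) = \tau(D,C) = \tau(C,D) = \tau(C,C) = w_1$. (Neither agent has any influence in this environment.)
And here is another: $\tau(D,D) = w_1$, $\tau(D,C) = w_2$, $\tau(C,D) = w_1$, $\tau(C,C) = w_2$. (This environment is controlled by j.)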
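
The three kinds of environment can be sketched as Python functions from a pair of actions to an outcome. The specific outcome assignments are illustrative choices that match the three descriptions, and `influences` is a hypothetical helper that checks whether an agent can change the outcome by changing its own action.

```python
ACTIONS = ("C", "D")

def tau_both(a_i, a_j):
    """Every combination of actions leads to a different outcome."""
    table = {("D", "D"): "w1", ("D", "C"): "w2",
             ("C", "D"): "w3", ("C", "C"): "w4"}
    return table[(a_i, a_j)]

def tau_neither(a_i, a_j):
    """The outcome is the same whatever the agents do."""
    return "w1"

def tau_only_j(a_i, a_j):
    """The outcome depends only on agent j's action."""
    return "w1" if a_j == "D" else "w2"

def influences(tau, agent):
    """True if `agent` (0 for i, 1 for j) can change the outcome of tau
    by changing its own action while the other agent's action is fixed."""
    for other in ACTIONS:
        outcomes = set()
        for own in ACTIONS:
            pair = (own, other) if agent == 0 else (other, own)
            outcomes.add(tau(*pair))
        if len(outcomes) > 1:
            return True
    return False
```

For example, `influences(tau_only_j, 0)` is false and `influences(tau_only_j, 1)` is true, which is what "controlled by j" means.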

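Rational Action
Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows: [Figure: utility functions $u_i$ and $u_j$ over outcomes]
With a bit of abuse of notation: [Figure: the same utilities written over action pairs]
Then agent i’s preferences are: [Figure: agent i's preference ordering over action pairs]
“C” is the rational choice for i, because i prefers all outcomes that arise through C over all outcomes that arise through D.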

Normal Form
We can characterize the previous scenario in a payoff matrix, where agent i is the column player and agent j is the row player.
Normal form is a way of describing a game: it represents the game as a matrix.
This representation is useful for identifying dominated strategies and Nash equilibria.

Dominant Strategies
Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes.
We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2.
A rational agent will never play a dominated strategy, so in deciding what to do, we can delete dominated strategies.
Unfortunately, there isn’t always a unique undominated strategy.
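
A small sketch of deleting dominated strategies. Note the swap: the code uses the standard pointwise test (s1 beats s2 against each fixed opponent action), which is weaker than the set-based wording above. The payoffs are the prisoner's dilemma values used elsewhere in this deck (4, 3, 2, 1), and the helper names are hypothetical.

```python
ACTIONS = ("C", "D")

# PAYOFFS[(a_i, a_j)] = (u_i, u_j): prisoner's dilemma values from this deck.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (1, 4),
    ("D", "C"): (4, 1), ("D", "D"): (2, 2),
}

def payoff(agent, own, other):
    """Payoff to `agent` (0 for i, 1 for j) playing `own` against `other`."""
    profile = (own, other) if agent == 0 else (other, own)
    return PAYOFFS[profile][agent]

def strictly_dominates(agent, s1, s2):
    """s1 yields a strictly higher payoff than s2 against every opponent action."""
    return all(payoff(agent, s1, o) > payoff(agent, s2, o) for o in ACTIONS)

def undominated(agent):
    """Strategies of `agent` that no other strategy strictly dominates."""
    return [
        s for s in ACTIONS
        if not any(strictly_dominates(agent, t, s) for t in ACTIONS if t != s)
    ]
```

With these payoffs, D strictly dominates C for both agents, so `undominated(0)` and `undominated(1)` each leave only D. Under the stricter set-based wording, D would not dominate C here, because the best outcome from C (3) exceeds the worst outcome from D (2).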

Nash Equilibrium
In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
under the assumption that agent i plays s1, agent j can do no better than play s2; and
under the assumption that agent j plays s2, agent i can do no better than play s1.
Neither agent has any incentive to deviate from a Nash equilibrium.
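
The definition translates directly into a brute-force search over pure strategy profiles of a two-player game. This is a sketch, not an efficient solver. The prisoner's dilemma payoffs are the values used elsewhere in this deck; the Matching Pennies payoffs assume that i wins when the coins match.

```python
from itertools import product

def pure_nash(payoffs, actions_i, actions_j):
    """Return all pure-strategy profiles (a_i, a_j) from which neither
    player can gain by deviating alone.
    payoffs[(a_i, a_j)] = (u_i, u_j)."""
    equilibria = []
    for a_i, a_j in product(actions_i, actions_j):
        u_i, u_j = payoffs[(a_i, a_j)]
        # i cannot do better against a_j, and j cannot do better against a_i.
        i_ok = all(payoffs[(b, a_j)][0] <= u_i for b in actions_i)
        j_ok = all(payoffs[(a_i, b)][1] <= u_j for b in actions_j)
        if i_ok and j_ok:
            equilibria.append((a_i, a_j))
    return equilibria

PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3), ("C", "D"): (1, 4),
    ("D", "C"): (4, 1), ("D", "D"): (2, 2),
}

# Assumption: i wins (+1) when the coins match, j wins otherwise.
MATCHING_PENNIES = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}

pd_equilibria = pure_nash(PRISONERS_DILEMMA, "CD", "CD")
mp_equilibria = pure_nash(MATCHING_PENNIES, "HT", "HT")
```

The prisoner's dilemma has the single pure equilibrium (D, D), while Matching Pennies has none, which illustrates that a pure-strategy equilibrium need not exist.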

Nash Equilibrium: Interpretations and Criticisms
Interpretations: focal points, self-enforcing agreements, stable social conventions, consequences of rational inference.
Criticisms:
They may not be unique (Bach or Stravinsky). Ways of overcoming this: refinements of the equilibrium concept, mediation, learning.
They do not exist in all games (in the form defined).
They may be hard to find.
People don’t always behave as equilibria would predict (ultimatum games and notions of fairness, …).

Pareto Optimality
Sometimes, one outcome O is at least as good for every agent as another outcome O', and there is some agent who strictly prefers O to O'.
In this case, it seems reasonable to say that O is better than O'. We say that O Pareto-dominates O'.
An outcome O* is Pareto-optimal (or Pareto-efficient) if there is no other outcome that Pareto-dominates it.
Pareto optimality is implied by social welfare maximization.
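
A minimal sketch of the Pareto test, applied to the prisoner's dilemma payoffs used elsewhere in this deck. Outcomes are represented by their payoff tuples; the function names are hypothetical.

```python
def pareto_dominates(u, v):
    """u is at least as good as v for every agent and strictly better for one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_optimal(payoffs):
    """Outcomes whose payoff tuple is not Pareto-dominated by any other."""
    return [
        outcome for outcome, u in payoffs.items()
        if not any(pareto_dominates(v, u) for v in payoffs.values())
    ]

# payoffs[(a_i, a_j)] = (u_i, u_j)
PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3), ("C", "D"): (1, 4),
    ("D", "C"): (4, 1), ("D", "D"): (2, 2),
}

optimal = pareto_optimal(PRISONERS_DILEMMA)
```

Only mutual defection is excluded, since (3, 3) Pareto-dominates (2, 2). The other three outcomes are all Pareto-optimal.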

Competition Games
Where the preferences of agents are diametrically opposed, we have strictly competitive scenarios.
Players have exactly opposed interests.
There must be precisely two players (otherwise they can’t have exactly opposed interests).
There is some constant $C$ such that, for every strategy profile $s \in S$, $u_i(s) + u_j(s) = C$.

Zero-Sum Games
Zero-sum games are those where utilities sum to zero: $u_i(s) + u_j(s) = 0$ for every strategy profile $s$.
E.g. Matching Pennies (a zero-sum game), with i as the column player and j as the row player:

            i: Head    i: Tail
j: Head      1, -1      -1, 1
j: Tail     -1, 1        1, -1
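
Whether a two-player game is strictly competitive can be checked by summing the payoffs in each profile. The sketch below returns the shared constant if there is one. The Matching Pennies payoffs assume i wins on a match, and the prisoner's dilemma values are those used later in this deck.

```python
def constant_sum(payoffs):
    """Return C if u_i(s) + u_j(s) = C for every profile s, else None."""
    totals = {u_i + u_j for u_i, u_j in payoffs.values()}
    return totals.pop() if len(totals) == 1 else None

# Assumption: i wins (+1) when the coins match, j wins otherwise.
MATCHING_PENNIES = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}

PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3), ("C", "D"): (1, 4),
    ("D", "C"): (4, 1), ("D", "D"): (2, 2),
}

mp_constant = constant_sum(MATCHING_PENNIES)
pd_constant = constant_sum(PRISONERS_DILEMMA)
```

Matching Pennies gives the constant 0 (zero-sum), while the prisoner's dilemma gives `None`, so it is not a strictly competitive game.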

Cooperation Game
Players have exactly the same interests. No conflict: all players want the same things.
$\forall s \in S,\ \forall i, j:\ u_i(s) = u_j(s)$

          i: A     i: B
j: A      1, 1     0, 0
j: B      0, 0     1, 1

Two Nash equilibria: (A, A) and (B, B). Both are also Pareto optimal.

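Coordination Game - The Prisoner’s Dilemma
The Prisoner’s Dilemma is any game with the payoff structure below where T > R > P > S (T: temptation to defect, R: reward for mutual cooperation, P: punishment for mutual defection, S: sucker’s payoff). Payoffs are listed as (i, j), with i as the column player and j as the row player.

          i: D     i: C
j: D      P, P     S, T
j: C      T, S     R, R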

Prisoner’s Dilemma
Two people are arrested for a crime.
If neither suspect confesses, both are released.
If both confess, then both are sent to jail.
If one confesses and the other does not, then the confessor gets a light sentence and the other gets a heavy sentence.
Remain Silent = Cooperate; Confess = Defect.
Payoffs are listed as (i, j), with i as the column player and j as the row player:

          i: D     i: C
j: D      P, P     S, T
j: C      T, S     R, R

The Prisoner’s Dilemma
Payoff matrix for the prisoner’s dilemma (payoffs listed as i, j; i is the column player and j is the row player):

          i: D     i: C
j: D      2, 2     1, 4
j: C      4, 1     3, 3

Top left: if both defect, then both get the punishment for mutual defection (2).
Top right: if i cooperates and j defects, i gets the sucker’s payoff of 1, while j gets 4.
Bottom left: if j cooperates and i defects, j gets the sucker’s payoff of 1, while i gets 4.
Bottom right: reward for mutual cooperation (3).

The individually rational action is to defect.
This guarantees a payoff of no worse than 2, whereas cooperating guarantees a payoff of only 1 in the worst case.
So defection is the best response to all possible strategies: both agents defect, and each gets a payoff of 2.
But intuition says this is not the best outcome: surely they should both cooperate and each get a payoff of 3!
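
The "guarantee" argument can be checked numerically with the payoffs 4, 3, 2, 1 from the matrix description. The sketch computes agent i's worst-case payoff for each action and its best response to each action of j. The function names are hypothetical.

```python
ACTIONS = ("C", "D")

# PD_PAYOFFS[(a_i, a_j)] = (u_i, u_j)
PD_PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (1, 4),
    ("D", "C"): (4, 1), ("D", "D"): (2, 2),
}

def security_level(action_i):
    """Worst-case payoff to agent i when it plays action_i."""
    return min(PD_PAYOFFS[(action_i, a_j)][0] for a_j in ACTIONS)

def best_response(action_j):
    """Action of agent i that maximizes its payoff against action_j."""
    return max(ACTIONS, key=lambda a_i: PD_PAYOFFS[(a_i, action_j)][0])

guarantees = {a: security_level(a) for a in ACTIONS}
responses = {a_j: best_response(a_j) for a_j in ACTIONS}
```

Defecting guarantees at least 2 and cooperating only 1, and D is the best response to both C and D. Yet mutual cooperation would pay each agent 3, more than the 2 they get from mutual defection.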

This apparent paradox is the fundamental problem of multi-agent interactions. It appears to imply that cooperation will not occur in societies of self-interested agents.
Real-world examples:
Nuclear arms reduction (“why don’t I keep mine …”)
Free-rider systems: public transport; in the UK, television licenses

Axelrod’s Tournament
Suppose you play the iterated prisoner’s dilemma against a range of opponents. What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner’s dilemma.

Strategies in Axelrod’s Tournament
ALLD: “Always defect”, the hawk strategy.
TIT-FOR-TAT: On round u = 0, cooperate. On round u > 0, do what your opponent did on round u - 1.
TESTER: On the 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
JOSS: As TIT-FOR-TAT, except periodically defect.
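
A rough sketch of an iterated prisoner's dilemma match between ALLD and TIT-FOR-TAT. It is not Axelrod's tournament code. It uses the 4, 3, 2, 1 payoffs from this deck rather than Axelrod's own values, the round count is arbitrary, and TESTER and JOSS are left out because their descriptions here do not pin down exact behavior.

```python
# PAYOFF[(move_a, move_b)] = (payoff_a, payoff_b)
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (1, 4),
    ("D", "C"): (4, 1), ("D", "D"): (2, 2),
}

def all_d(my_history, their_history):
    """ALLD: defect on every round."""
    return "D"

def tit_for_tat(my_history, their_history):
    """Cooperate on round 0, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def play(strategy_a, strategy_b, rounds=10):
    """Play `rounds` rounds and return the total payoffs (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both players move simultaneously, seeing only past rounds.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))  # (30, 30)
print(play(all_d, tit_for_tat))        # (22, 19)
print(play(all_d, all_d))              # (20, 20)
```

ALLD beats TIT-FOR-TAT head to head (22 to 19) but only by exploiting the first round, while two TIT-FOR-TAT players earn 30 each. Total payoff across many opponents is what the tournament rewards, not winning each individual match.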

Recipes for Success in Axelrod’s Tournament
Axelrod suggests the following rules for succeeding in his tournament:
Don’t be envious: don’t play as if it were zero-sum!
Be nice: start by cooperating, and reciprocate cooperation.
Retaliate appropriately: always punish defection immediately, but use “measured” force; don’t overdo it.
Don’t hold grudges: always reciprocate cooperation immediately.

Game of Chicken
Consider another type of encounter, the game of chicken. (Think of James Dean in Rebel Without a Cause: swerving = cooperate, driving straight = defect.)
It refers to a situation in which there is a competition for a shared resource and the contestants can choose either conciliation or conflict.
Strategies (C, D) and (D, C) are in Nash equilibrium.
Difference from the prisoner’s dilemma: mutual defection is the most feared outcome (whereas the sucker’s payoff is most feared in the prisoner’s dilemma).

Bounded Rationality
According to Herbert Simon, perfectly rational decisions are often not feasible in practice because of the finite computational resources available for making them.
Game theory assumes that it is possible to characterize an agent’s preferences with respect to possible outcomes. Humans, however, find it extremely hard to consistently define their preferences over outcomes.
Most game-theoretic negotiation techniques tend to assume the availability of unlimited computational resources to find an optimal solution; the underlying problems have the characteristics of NP-hard problems.
