Cooperative Agent Systems: Artificial Agents Play the Ultimatum Game
Steven O. Kimbrough
Joint work with Fang Zhong and D.J. Wu
Presented at FMEC 2001, Oslo, 6/2/2001
Research Motivation
How to design and control cooperative agent systems in strategic situations?
How well do different identity-centric agents perform against each other?
How well do various adaptive mechanisms perform?
The value of intelligence: what does intelligence buy you?
Methodology
Adaptive artificial agents play the iterated ultimatum game.
The ultimatum game is the most fundamental building block of negotiation (e.g., Croson, 1996).
Reinforcement learning (a simple version).
Regimes of play:
–Two agents play against each other.
–Populations of different types of agents.
One-Shot Ultimatum Game
Two players, A and B. Player A has an endowment of N.
Player A offers x ∈ [0, N] (N = 100 in this study).
Player B can either accept or reject the offer.
If B accepts, the payoffs are (N − x, x); if B rejects, both players get (0, 0).
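The payoff structure above can be written as a small function (a minimal sketch; the function and parameter names are mine, not from the talk):

```python
def ultimatum_payoffs(offer, accepts, endowment=100):
    """One-shot ultimatum game: A offers x in [0, N] from an endowment N
    (N = 100 in this study). If B accepts, payoffs are (N - x, x);
    if B rejects, both players receive 0."""
    if not 0 <= offer <= endowment:
        raise ValueError("offer must lie in [0, N]")
    if accepts:
        return endowment - offer, offer
    return 0, 0
```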
One-Shot Ultimatum Game (Cont.)
Classical game theory:
–Player A offers a tiny amount ε, and player B always accepts this offer.
–There are an infinite number of Nash equilibria along the line x + y = N.
Behavioral game theory:
–Human beings in the lab do not behave as classical game theory predicts (e.g., people tend to be fair, and reject offers that fall below their threshold share).
Repeated Ultimatum Game
A “supergame” consists of iterations of the ultimatum game.
Indefinitely many episodes:
–Agents do not know how many iterations are yet to come.
There is no single best strategy for the repeated ultimatum game.
Reinforcement Learning
Favor actions that produce better results.
Estimate the values of state–action pairs.
Sample-average method for estimation/evaluation.
ε-greedy method for action selection.
Reinforcement Learning (Cont.)
Algorithm:
Initialize Q(s, a) = 0 for all state–action pairs.
Repeat for each episode:
–Choose action a from the current state s (ε-greedy).
–Receive the immediate payoff r and move to the next state.
–Q(s, a) ← Q(s, a) · (k − 1)/k + r/k, where k is the number of times (s, a) has been tried.
Until n episodes have been played.
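The update above is an incremental sample average (Q · (k − 1)/k + r/k equals Q + (r − Q)/k). A minimal sketch of the learner, with class and method names of my own choosing:

```python
import random
from collections import defaultdict

class SampleAverageLearner:
    """Sample-average reinforcement learner with epsilon-greedy selection,
    as sketched on the Reinforcement Learning slides."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.q = defaultdict(float)   # Q(s, a), initialized to 0
        self.k = defaultdict(int)     # visit count for each (s, a)
        self.rng = random.Random(seed)

    def choose(self, state):
        # epsilon-greedy: explore with probability epsilon, else exploit
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward):
        # incremental form of Q <- Q*(k-1)/k + r/k
        self.k[(state, action)] += 1
        k = self.k[(state, action)]
        self.q[(state, action)] += (reward - self.q[(state, action)]) / k
```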
Experiment 0: Repeated One-Shot Game
Agents have no memory of past actions.
The agents find the game-theoretic result.
No cooperation emerges among the agents.
Experiment 1: Learning Agent Against Fixed Rules
Player B's strategy is fixed:
IF (currentOffer >= p * Endowment)
  Accept currentOffer.
ELSE
  Reject currentOffer.
with 0 < p < 1.
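Player B's fixed rule is a direct transcription of the pseudocode above (the function name is mine):

```python
def fixed_rule_responder(current_offer, p, endowment=100):
    """Experiment 1's fixed rule for player B: accept the current offer
    iff it is at least the fraction p (0 < p < 1) of the endowment."""
    return current_offer >= p * endowment
```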
Experiment 1 (Cont.)
Player A proposes an offer no greater than his last offer whenever player B accepted that offer.
Player A eventually learns the value of p and proposes only the amount pN.
Experiment 2: Learning Agent Against Dynamic Rules
The value of p changes over the course of play.
Agent A can track the change very well, given enough time periods.
The values of p in different episodes:
Episode:  1     2000   5000   7000   10000
p:        0.40  0.35   0.45   0.60   0.40
Experiment 3: Learning Agent Against Rotating Rules
The value of p rotates through a fixed pattern, e.g., p_{t-1} = .40, p_t = .50, p_{t+1} = .60.
Player A converges to a proposal of 60, which is the highest value of p × 100.
A memory of at least one previous move might let player A track the rotating rule.
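The rotating rule can be written as a periodic schedule (a sketch; the three-value cycle matches the slide's example, the function name is mine):

```python
def rotating_p(t, cycle=(0.40, 0.50, 0.60)):
    """Experiment 3's rotating threshold: p cycles through .40, .50, .60
    as the episode index t advances, so B's demand rotates each episode."""
    return cycle[t % len(cycle)]
```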
Experiment 4: Learning Simultaneously
Both agents have a memory of one previous move.
Player B chooses the value of p for each episode according to:
IF b_{t-1} is “accept” THEN p_t = d_{t-1} / N
ELSE p_t is drawn at random from [0, N] / N
(where d_{t-1} is A's previous offer and b_{t-1} is B's previous response).
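Player B's update rule can be sketched as follows (the uniform draw on rejection is my reading of “p_t ∈ [0, N] / N”; the names are mine):

```python
import random

def choose_threshold(prev_response, prev_offer, endowment=100, rng=random):
    """Experiment 4's rule for player B: if B accepted A's previous
    offer d_{t-1}, demand at least that fraction again (p_t = d_{t-1}/N);
    otherwise draw a fresh threshold uniformly from {0, ..., N}/N."""
    if prev_response == "accept":
        return prev_offer / endowment
    return rng.randint(0, endowment) / endowment
```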
Experiment 4 (Cont.)
Decision-making process modeled with finite automata.
[Figure: Agent A's finite automaton; transitions labeled Accept and Reject over offer levels d and d*.]
Experiment 4 (Cont.)
[Figure: Agent B's finite automaton; transitions labeled C and D over thresholds p and p*.]
Experiment 4 – Results
Cooperation emerges through co-evolution within 2000 episodes.
Player A converges to proposing 55 or 56, and correspondingly player B converges to setting his lower limit at 55 or 56.
Value of Intelligence
Will smart agents be able to do better than dumb ones through learning?
Experiments:
–5a: A population of smart agents plays against a population of various dumb agents.
–5b: A population of smart agents plays against each other and against a population of various dumb agents.
Experiment 5a: One Smart Agent vs. Multiple Dumb Agents
Three types of dumb agents use fixed rules:
–db1: demand/accept 70 or higher;
–db2: demand/accept 50 or higher;
–db3: demand/accept 30 or higher.
The smart agent learns via reinforcement learning.
There is a 25 percent probability that a smart agent is chosen to play a given game.
The changing population of dumb agents is tracked for each generation.
Experiment 5a: Process
1. Draw one smart agent with 25 percent probability; otherwise draw one dumb agent at random, in proportion to the type frequencies.
2. Draw another dumb agent at random, in proportion to the type frequencies.
3. Decide the role of each agent (proposer or responder).
4. The agents play the one-shot game against each other.
5. Return to the first step until a certain number of games (e.g., 1000 episodes) has been completed.
6. Update the frequencies of the dumb-agent types.
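The matching step above can be sketched as a weighted draw (the 25 percent figure and the db1–db3 thresholds are from the slides; the code names are mine):

```python
import random

# Dumb-agent thresholds from Experiment 5a (demand/accept this or higher).
DUMB_TYPES = {"db1": 70, "db2": 50, "db3": 30}

def draw_agent(freqs, p_smart=0.25, rng=random):
    """Step 1 of the Experiment 5a process: return a smart agent with
    probability p_smart; otherwise draw a dumb-agent type at random,
    in proportion to its current frequency in the population."""
    if rng.random() < p_smart:
        return "smart"
    types = list(freqs)
    return rng.choices(types, weights=[freqs[t] for t in types], k=1)[0]
```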
Experiment 5a – Results
Fair dumb agents (db2: demand/accept 50 or higher) take over the dumb-agent population.
Smart agents learn to be fair.
Experiment 5a – Results (Cont.)
[Result charts; figures not recoverable from the source.]
Experiment 5b: Multiple Smart Agents vs. Dumb Agents
Smart agents can also play against each other.
Experiment 5b (Cont.)
[Result chart; figure not recoverable from the source.]
Comparison of 5a & 5b
[Comparison chart; figure not recoverable from the source.]
Impact of Memory
Repeat experiments 5a and 5b, introducing a different memory size in each run.
Conclusions
Artificial agents using reinforcement learning can play the ultimatum game efficiently and effectively.
Agent intelligence and memory both affect performance.
The agent-based approach better replicates and explains real human behavior.
Future Research
Toward cooperative agent systems in strategic situations in virtual communities, especially in electronic commerce, such as supply chains.
Currently investigating two versions of the trust game: “the classical economic trust game” vs. “the Mad Mex Game.”
Comments?