On the Difficulty of Achieving Equilibrium in Interactive POMDPs
Prashant Doshi, Dept. of Computer Science, University of Georgia, Athens, GA 30602
Piotr J. Gmytrasiewicz, Dept. of Computer Science, University of Illinois at Chicago, Chicago, IL 60607
Twenty First National Conference on AI (AAAI 2006)

Outline
– Background on Interactive POMDPs
– Subjective Equilibrium in I-POMDPs and Sufficient Conditions
– Difficulty in Satisfying the Conditions

Interactive POMDPs
Background
– Well-known framework for decision making in single-agent, partially observable settings: the POMDP
– Traditional analysis of multiagent interactions: game theory
Problem
"... there is currently no good way to combine game theoretic and POMDP control strategies." - Russell and Norvig, AI: A Modern Approach, 2nd Ed.

Interactive POMDPs
General Problem Setting
[Figure: two agents interact with a shared environment state; each maintains beliefs, takes actions, and receives observations, and each optimizes its own preferences given its beliefs.]

Interactive POMDPs
Key ideas: integrate game-theoretic concepts into a decision-theoretic framework
– Include possible models of other agents in your decision making → intentional (types) and subintentional models
– Address uncertainty by maintaining beliefs over the state and the models of other agents → Bayesian learning
– Beliefs over intentional models give rise to interactive belief systems → interactive epistemology, recursive modeling
– Computable approximation of the interactive belief system → finitely nested belief system
– Compute best responses to your beliefs → subjective rationality

Interactive POMDPs
Interactive state space
– Include models of the other agents in the state space
Beliefs in I-POMDPs are then defined over these interactive states and remain computable; a sketch of the standard definitions follows.
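The notation below is a hedged sketch of the finitely nested interactive state space and beliefs, following the standard I-POMDP formulation (Gmytrasiewicz & Doshi); the symbols are assumed rather than taken from this slide.

```latex
% Finitely nested interactive states for agent i at level l, and i's beliefs (assumed notation)
IS_{i,0} = S, \qquad
IS_{i,l} = S \times M_{j,l-1} \;\;(l \ge 1), \qquad
M_{j,l-1} = \Theta_{j,l-1} \cup SM_j, \qquad
b_{i,l} \in \Delta(IS_{i,l})
```

Here Θ_{j,l-1} are agent j's intentional models (types: a level l-1 belief together with a frame) and SM_j its subintentional models; the nesting bottoms out at level 0, which keeps the beliefs computable.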

Interactive POMDPs: Formal Definition and Relevant Properties
Belief update: the belief update function for I-POMDP_i involves:
– Prediction: use the other agent's model to predict its action(s), and anticipate the other agent's observations and how it updates its model
– Correction: use your own observations to correct your beliefs
Policy computation: analogous to POMDPs (given the new belief update)
A sketch of the update appears below.
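The following is a hedged reconstruction of the level-l belief update along the lines of the standard I-POMDP update; the transition function T, observation functions O_i and O_j, and model-update function τ are assumed notation, and the slide's original equation is not reproduced exactly.

```latex
% Prediction and correction factors of the I-POMDP belief update (assumed notation)
b_{i,l}^{t}(is^{t}) \propto
\sum_{is^{t-1}} b_{i,l}^{t-1}(is^{t-1})
\sum_{a_j^{t-1}} \Pr\!\big(a_j^{t-1} \mid \theta_{j,l-1}^{t-1}\big)\,
T\big(s^{t-1}, a_i^{t-1}, a_j^{t-1}, s^{t}\big)\,
O_i\big(s^{t}, a_i^{t-1}, a_j^{t-1}, o_i^{t}\big)
\sum_{o_j^{t}} O_j\big(s^{t}, a_i^{t-1}, a_j^{t-1}, o_j^{t}\big)\,
\tau\big(b_{j,l-1}^{t-1}, a_j^{t-1}, o_j^{t}, b_{j,l-1}^{t}\big)
```

The terms involving j's predicted action and the state transition implement the prediction step, O_i implements the correction from i's own observation, and the final sum anticipates j's observations and the resulting update of its model.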

Example: Multiagent Tiger Problem (agents i and j)
– Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
– Each agent hears growls (GL or GR) as well as creaks (S, CL, or CR)
– Each agent may open doors or listen (OL, OR, or L)
– Each agent is unable to perceive the other's observation
A data-only sketch of the domain follows.
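A minimal sketch of the multiagent tiger domain as data, matching the description above; the reward values are assumptions carried over from the usual single-agent tiger problem and are illustrative only.

```python
# Multiagent tiger problem: states, per-agent actions, and per-agent observations.
# Reward numbers are assumed (typical tiger-problem values), not taken from the slide.

STATES = ["tiger-left", "tiger-right"]   # where the tiger is (gold is behind the other door)

ACTIONS = ["OL", "OR", "L"]              # open left door, open right door, listen

GROWLS = ["GL", "GR"]                    # growl heard from the left / right
CREAKS = ["S", "CL", "CR"]               # silence, creak left, creak right (caused by the other agent)
OBSERVATIONS = [(g, c) for g in GROWLS for c in CREAKS]

def reward(state: str, action: str) -> float:
    """Illustrative per-agent reward: opening the tiger's door is costly, the other door pays off."""
    if action == "L":
        return -1.0                      # small cost for listening
    opened_tiger_door = (action == "OL") == (state == "tiger-left")
    return -100.0 if opened_tiger_door else 10.0
```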

Subjective Equilibrium and Conditions for Achieving It

Subjective Equilibrium in I-POMDPs
Theoretical analysis: joint observation histories (paths of play) in the multiagent tiger problem

– Agents i and j's joint policies induce a true distribution over the future observation sequences (the true distribution over observation histories)
– Agent i's beliefs over j's models, together with its own policy, induce a subjective distribution over the future observation sequences (the subjective distribution over observation histories)

Subjective Equilibrium in I-POMDPs
Absolute Continuity Condition (ACC)
– The subjective distribution should not rule out the observation histories considered possible by the true distribution
– Cautious beliefs → the "grain of truth" assumption
– The "grain of truth" is sufficient but not necessary to satisfy the ACC
Formally, the ACC can be written as below.
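Spelled out, the ACC is the absolute-continuity requirement of Kalai and Lehrer; in this sketch, μ denotes the true distribution over observation histories and μ̃_i agent i's subjective distribution (symbols assumed, not from the slide).

```latex
% ACC: the true distribution over play is absolutely continuous w.r.t. i's subjective one
\mu \ll \tilde{\mu}_i
\quad\Longleftrightarrow\quad
\text{for every measurable set } E \text{ of observation histories:}\;\;
\tilde{\mu}_i(E) = 0 \;\Rightarrow\; \mu(E) = 0
```

A grain of truth, i.e. assigning positive prior probability to the other agent's true model, implies this condition but is not required by it.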

Subjective Equilibrium in I-POMDPs
Proposition 1 (Convergence): under the ACC, an agent's belief over the other's models, updated using the I-POMDP belief update, converges with probability 1
– Proof sketch: show that Bayesian learning in I-POMDPs is a martingale, then apply the Martingale Convergence Theorem (Doob, 1953)
ε-closeness of distributions: see the sketch of the definition below.
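The ε-closeness used in what follows is, in the sense of Kalai and Lehrer (1993), roughly the condition below; this is a hedged reconstruction, since the slide's original inequalities are not reproduced exactly.

```latex
% mu is eps-close to mu' if they nearly agree on a set of histories of measure close to 1
\exists\, Q \;\text{with}\; \mu(Q) \ge 1-\epsilon \;\text{and}\; \mu'(Q) \ge 1-\epsilon
\;\;\text{such that for all measurable}\; A \subseteq Q:\quad
(1-\epsilon)\,\mu'(A) \;\le\; \mu(A) \;\le\; (1+\epsilon)\,\mu'(A)
```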

Prediction
Lemma (Blackwell & Dubins, 1962): for all agents, if their initial beliefs satisfy the ACC, then after a finite time T(ε) each of their beliefs is ε-close to the true distribution over the future observation paths
Subjective ε-equilibrium (Kalai & Lehrer, 1993): a profile of strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation history
– Subjective equilibrium is stable under learning and optimization

Subjective Equilibrium in I-POMDPs: Main Result
Proposition 2: if agents' beliefs within the I-POMDP framework satisfy the ACC, then after finite time T their strategies are in subjective ε-equilibrium, where ε is a function of T
– When ε = 0, subjective equilibrium obtains
– Proof follows from the convergence of the I-POMDP belief update and (Blackwell & Dubins, 1962)
– The ACC is a sufficient condition, but not a necessary one

Difficulty in Practically Satisfying the Conditions

Computational Difficulties in Achieving Equilibrium
– There exist computable strategies that admit no computable exact best responses (Nachbar & Zame, 1996)
– If the possible strategies are assumed computable, then i's best response may not be computable; therefore, j's cautious beliefs cannot contain a grain of truth
– Subtle tension between prediction and optimization
– Strictness of the ACC

Computational Difficulties in Achieving Equilibrium
Proposition 3 (Impossibility): within the finitely nested I-POMDP framework, all the agents' beliefs will never simultaneously satisfy the grain of truth assumption
It is therefore difficult to realize the equilibrium!

Summary
Absolute Continuity Condition (ACC)
– A more concrete condition: the "grain of truth" assumption
– The grain of truth condition is stronger than the ACC
Equilibria in I-POMDPs
– Theoretical convergence to subjective equilibrium given the ACC
Strictness of the ACC
– Impossible for all agents to simultaneously satisfy the grain of truth
– Computational obstacles to satisfying the ACC
Future work: investigate the connection between subjective equilibrium and Nash equilibrium

Thank You
Questions?

Introduction
Significance: real-world applications
1. Robotics
– Planetary exploration: surface mapping by rovers (e.g., Spirit and Opportunity); coordinate to explore a pre-defined region optimally; uncertainty due to sensors
– Robot soccer (RoboCup competition): coordinate with teammates and deceive opponents; anticipate and track others' actions

Interactive POMDPs
Limitations of Nash equilibrium
– Not suitable for general control
– Incomplete: does not say what to do off equilibrium
– Non-unique: multiple solutions, no way to choose among them
"…game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment." - Russell and Norvig, AI: A Modern Approach, 2nd Ed.