Software Multiagent Systems: Lecture 13
Milind Tambe, University of Southern California
Teamwork When agents act together
Understanding Teamwork
- Ordinary traffic
- Driving in a convoy
- Two friends A & B together drive in a convoy
- B is secretly following A
- Pass play in soccer
- Contracting with a software company
- Orchestra
Understanding Teamwork
- Not just a union of simultaneous, coordinated actions
- Different from contracting
- Acting together toward a joint goal
- "Collaborate" = co-labor
Why Teamwork?
- Why not master-slave?
- Why not contracts?
Why Teams?
- Robust organizations
  - Responsibility to substitute
  - Mutual assistance
  - Information communicated to peers
- Still capable of structure (not necessarily flat)
  - Subteams, sub-subteams
- Variations in capabilities and limitations
Approach
- Theory
- Practical teamwork architectures
Taking a step back…
Key Approaches in Multiagent Systems
- Market mechanisms: auctions
- Distributed Constraint Optimization (DCOP) [figure: constraint graph over variables x1, x2, x3, x4]
- Belief-Desire-Intention (BDI): logics and psychology, e.g. the joint persistent goal
  (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB ¬p)] (WMG p))
- Distributed POMDPs
- Hybrids of DCOP / POMDP / auctions / BDI
- Essential in large-scale multiagent teams: synergistic interactions
Key Approaches for Multiagent Teams
[Diagram comparing approaches — Markets, BDI, Distributed POMDPs, DCOP — along the dimensions of local interactions, uncertainty, local utility, and human usability & plan structure; a BDI-POMDP hybrid combines their strengths]
Distributed POMDPs
Three papers on the web pages. What to read:
- Ignore all the proofs
- Ignore the complexity results
- JAIR article: the model and the results at the end
- Understand the fundamental principles
Domain: Teamwork for Disaster Response
Multiagent Team Decision Problem (MTDP)
- S: {s1, s2, s3, …} — a single global world state, one per epoch
- A: domain-level actions; A = {A1, A2, A3, …, An}, where Ai is the set of actions for agent i
- The agents act via a joint action (a1, …, an)
MTDP (cont'd)
- P: transition function, P(s' | s, a1, a2, …, an)
- R_A: reward, R(s, a1, a2, …, an)
  - One common reward, not separate rewards — central to teamwork
MTDP (cont'd)
- Ω: observations; each agent i has its own finite set Ωi of possible observations
- O: observation function, O(destination state, joint action, joint observation) = P(o1, o2, …, on | a1, a2, …, an, s')
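To make the tuple concrete, here is a minimal Python sketch of an MTDP ⟨S, A, P, R, Ω, O⟩. All names (the MTDP class, the type aliases) are illustrative, not from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = str
Action = str
Obs = str
JointAction = Tuple[Action, ...]   # one action per agent
JointObs = Tuple[Obs, ...]         # one observation per agent

@dataclass
class MTDP:
    """Minimal MTDP tuple <S, A, P, R, Omega, O> (illustrative sketch)."""
    states: List[State]                    # S: single global world state
    actions: List[List[Action]]            # A_i: action set for each agent i
    observations: List[List[Obs]]          # Omega_i: observation set per agent i
    P: Callable[[State, JointAction, State], float]    # P(s' | s, a1..an)
    R: Callable[[State, JointAction], float]           # one common team reward
    O: Callable[[State, JointAction, JointObs], float] # P(o1..on | a1..an, s')
```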
Simple Scenario
- Cost of action: -0.2
- Agents must fight fires together
- Each agent observes its own location and the fire status
MTDP Policy
The problem: find optimal JOINT policies — one policy πi for each agent i.
- πi: action policy, mapping the agent's belief state into domain actions (πi: Bi → Ai)
- Belief state: the agent's sequence of observations
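Since a belief state in the basic MTDP is just an observation history, a finite-horizon policy can be written down as a lookup table. A small illustrative sketch — the observations and actions are invented for the fire-fighting scenario:

```python
from typing import Dict, Tuple

# An individual policy pi_i maps an observation history (the agent's
# belief state in the basic MTDP) to a domain action.
ObsHistory = Tuple[str, ...]

# Hypothetical two-step policy for one fire-fighting agent:
pi_i: Dict[ObsHistory, str] = {
    (): "scan",                          # no observations yet
    ("fire-left",): "move-left",
    ("fire-right",): "move-right",
    ("fire-left", "fire-left"): "fight-fire",
}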
MTDP Domain Types
- Collectively partially observable: general case, no assumptions
- Collectively observable: the team (as a whole) observes the state
  - For every joint observation there is a state s such that, for all other states s' ≠ s, Pr(o1, o2, …, on | s') = 0
  - Pr(o1, o2, …, on | s) = ?  Pr(s | o1, o2, …, on) = ?
- Individually observable: each agent observes the state
  - For every individual observation oi there is a state s such that, for all other states s' ≠ s, Pr(oi | s') = 0
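These definitions are mechanical enough to check in code: a domain is collectively observable exactly when each joint observation is consistent with at most one state, in which case Pr(s | o1, …, on) = 1. A sketch, assuming the illustrative MTDP class from the earlier example (note we also quantify over joint actions, since O conditions on them):

```python
from itertools import product

def is_collectively_observable(m: MTDP) -> bool:
    """True if every joint observation pins down a unique state,
    i.e. at most one s with Pr(o1..on | s) > 0, so Pr(s | o1..on) = 1."""
    for a in product(*m.actions):
        for o in product(*m.observations):
            support = [s for s in m.states if m.O(s, a, o) > 0]
            if len(support) > 1:   # same joint obs consistent with 2+ states
                return False
    return True
```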
From MTDP to COM-MTDP
- Two separate kinds of actions: communication vs. domain actions
- Two separate reward types: communication rewards and domain rewards
- Total reward: the sum of the two rewards
- Explicit treatment of communication enables analysis
Communicative MTDPs (COM-MTDPs)
- Σ: communication capabilities, the possible "speech acts" — e.g., "I am moving to fire1."
- R_Σ: communication cost (over messages) — e.g., saying "I am moving to fire1" has a cost R_Σ
- Given costs, why ever communicate?
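A COM-MTDP is then the MTDP tuple extended with Σ and R_Σ; a minimal sketch reusing the illustrative MTDP class and type aliases from above:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class COMMTDP(MTDP):   # extends the MTDP sketch above
    messages: List[List[str]]                          # Sigma_i: speech acts per agent
    R_comm: Callable[[State, Tuple[str, ...]], float]  # R_Sigma(s, sigma): a cost, <= 0

    def total_reward(self, s: State, a: JointAction,
                     sigma: Tuple[str, ...]) -> float:
        # Total reward = domain reward + communication reward (a cost).
        return self.R(s, a) + self.R_comm(s, sigma)
```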
Two-Stage Decision Process
[Diagram: the agent observes the world; state estimator SE1 produces pre-communication belief b1, which the communication policy P1 maps to messages sent to and from the other agents; state estimator SE2 folds the communication into post-communication belief b2, which the action policy P2 maps to domain actions]
- P1: communication policy; P2: action policy
- Two state estimators, two belief-state updates
COM-MTDP (cont'd)
- Belief state: each Bi is a history of observations and communication
- Two-stage belief update:
  - Stage 1: pre-communication belief state for agent i, updated from observations only — the observations ω_i^0, ω_i^1, …, ω_i^t plus the messages exchanged through epoch t-1
  - Stage 2: post-communication belief state for agent i, updated from observations and communication — the above plus the messages exchanged at epoch t
- In general an agent cannot form a probability distribution over states (it never sees the other agents' observations), so beliefs remain histories
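Because no distribution over states is available, the two-stage update just extends histories. A minimal sketch with invented names:

```python
from typing import Tuple

# A belief is a pair of histories: (observations seen, messages heard).
Belief = Tuple[Tuple[str, ...], Tuple[str, ...]]

def pre_comm_update(belief: Belief, obs_t: str) -> Belief:
    """Stage 1: fold in the new observation only."""
    obs, msgs = belief
    return (obs + (obs_t,), msgs)

def post_comm_update(belief: Belief, msgs_t: Tuple[str, ...]) -> Belief:
    """Stage 2: fold in the messages exchanged this epoch."""
    obs, msgs = belief
    return (obs, msgs + msgs_t)
```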
COM-MTDP (cont'd)
The problem: find optimal JOINT policies — one pair of policies for each agent:
- πi^Σ: communication policy, mapping the pre-communication belief state into messages (Bi → Σ) for each agent
- πi^A: action policy, mapping the post-communication belief state into domain actions (Bi → A) for each agent
More Domain Types
- General communication: no assumptions on R_Σ
- Free communication: R_Σ(s, σ) = 0
- No communication: R_Σ(s, σ) is negatively infinite
Teamwork Complexity Results

                        | Individual obs. | Collective obs. | Collective partial obs.
No communication        | P-complete      | NEXP-complete   | NEXP-complete
General communication   | P-complete      | NEXP-complete   | NEXP-complete
Full communication      | P-complete      | P-complete      | PSPACE-complete
Classifying Different Models

                        | Individual obs. | Collective obs. | Collective partial obs.
No communication        | MMDP            |                 | DEC-POMDP, POIPSG
General communication   |                 | Xuan-Lesser     | COM-MTDP
Full communication      |                 |                 |
True or False?
1. If agents communicated all their observations at each step, the distributed POMDP would essentially be a single-agent POMDP.
2. In distributed POMDPs, each agent plans its own policy.
3. Solving a distributed POMDP with two agents is of the same complexity as solving two separate individual POMDPs.
Algorithms
NEXP-complete: no known efficient algorithms.
Brute-force search:
1. Generate the space of possible joint policies
2. For each policy in the policy space:
3. Evaluate it over finite horizon T
Complexity: (number of joint policies) × (cost of evaluating one policy)
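A direct implementation of this brute-force search for two agents, using the illustrative MTDP class above: each policy is a table from observation histories to actions, and evaluation expands the joint distribution over (state, histories) for T steps. It is exponential, as the complexity result predicts.

```python
from itertools import product

def all_histories(obs_i, T):
    """All observation histories of length 0..T-1 for one agent."""
    hists = [()]
    for t in range(1, T):
        hists += [h + (o,) for h in hists if len(h) == t - 1 for o in obs_i]
    return hists

def all_policies(actions_i, obs_i, T):
    """Every mapping from histories to actions (exponentially many)."""
    hists = all_histories(obs_i, T)
    for choice in product(actions_i, repeat=len(hists)):
        yield dict(zip(hists, choice))

def evaluate(m, policies, b0, T):
    """Expected joint reward over horizon T; b0 is the initial state dist.
    `beliefs` maps (state, per-agent histories) -> probability."""
    beliefs = {(s, ((),) * len(policies)): p for s, p in b0.items() if p > 0}
    total = 0.0
    for _ in range(T):
        nxt = {}
        for (s, hists), p in beliefs.items():
            a = tuple(pi[h] for pi, h in zip(policies, hists))
            total += p * m.R(s, a)
            for s2 in m.states:
                for o in product(*m.observations):
                    p2 = p * m.P(s, a, s2) * m.O(s2, a, o)
                    if p2 > 0:
                        h2 = tuple(h + (oi,) for h, oi in zip(hists, o))
                        nxt[(s2, h2)] = nxt.get((s2, h2), 0.0) + p2
        beliefs = nxt
    return total

def brute_force(m, b0, T):
    """Search the full joint-policy space for the best pair of policies."""
    best, best_val = None, float("-inf")
    for joint in product(all_policies(m.actions[0], m.observations[0], T),
                         all_policies(m.actions[1], m.observations[1], T)):
        v = evaluate(m, list(joint), b0, T)
        if v > best_val:
            best, best_val = joint, v
    return best, best_val
```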
Locally Optimal Search
Joint Equilibrium-based Search for Policies (JESP)
Nash Equilibrium in Team Games
Nash equilibrium vs. globally optimal reward for the team:

[Payoff matrix: agent A picks a row (x, y, z), agent B picks a column (u, v)]

        u      v
  x    3,6    7,1
  y    5,1    8,2
  z    6,0    6,2
JESP: Locally Optimal Joint Policy
[Figure: the game matrix again, with one agent's policy held fixed while the other's best response is computed]
- Iterate, keeping one agent's policy fixed at a time
- More complex policies are handled the same way
Joint Equilibrium-based Search
Description of the algorithm:
1. Repeat until convergence:
2. For each agent i:
3. Fix the policies of all agents apart from i
4. Find the policy for i that maximizes joint reward
Exhaustive-JESP does a brute-force search in agent i's policy space — expensive.
JESP: Joint Equilibrium Search (Nair et al., IJCAI'03)
Repeat until convergence to a local equilibrium; for each agent k:
- Fix the policies of all agents except agent k
- Find the optimal response policy for agent k
The optimal response for k, given fixed policies for the others in the MTDP, transforms into a single-agent POMDP problem:
- "Extended" state defined as (s, observation histories of the other agents), not just s
- Define a new transition function and a new observation function over extended states
- Define a multiagent belief state: a distribution over extended states
- Dynamic programming over these belief states gives fast computation of the optimal response
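The outer JESP loop itself is short; Exhaustive-JESP fills the best-response step with brute force, while DP-JESP replaces it with the dynamic program over multiagent belief states described above. An illustrative sketch reusing evaluate and all_policies from the brute-force example:

```python
def jesp(m, b0, T, init_policies):
    """Alternate best responses until no agent can improve the joint reward.
    Converges to a local optimum (a Nash equilibrium of the team game),
    not necessarily the global optimum."""
    policies = list(init_policies)
    value = evaluate(m, policies, b0, T)
    improved = True
    while improved:
        improved = False
        for i in range(len(policies)):
            # Fix everyone but agent i; search i's policy space exhaustively
            # (this inner loop is what DP-JESP speeds up).
            for cand in all_policies(m.actions[i], m.observations[i], T):
                trial = policies[:i] + [cand] + policies[i+1:]
                v = evaluate(m, trial, b0, T)
                if v > value + 1e-12:
                    policies, value, improved = trial, v, True
    return policies, value
```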
Extended State, Belief State
[Figure: sample progression of multiagent beliefs in the two-agent tiger example; HL and HR are observations, a2 = Listen]
Run-time Results
[Table: run times for Exhaustive-JESP vs. DP-JESP]
Is JESP guaranteed to find the global optimum? No — it converges only to a local equilibrium; random restarts help escape poor local optima.
Not All Agents are Equal Scaling up Distributed POMDPs for Agent Networks
Runtime
[Figure: runtime results]
POMDP vs. Distributed POMDP
- Distributed POMDPs are more complex: joint transition and observation functions
- Joint planning yields a better policy than separate individual POMDPs
- With free communication, a distributed POMDP reduces to a single-agent POMDP
- Less inter-agent dependency means lower complexity
BDI vs. Distributed POMDP

BDI teamwork                    | Distributed POMDP teamwork
Explicit joint goal             | Explicit joint reward
Plan/organization hierarchies   | Unstructured plans/teams
Explicit commitments            | Implicit commitments
No costs / uncertainties        | Costs & uncertainties included