Conflicts about Teamwork: Hybrids to the Rescue
Milind Tambe, University of Southern California
with Emma Bowring, Hyuckchul Jung, Gal Kaminka, Rajiv Maheswaran, Janusz Marecki, Jay Modi, Ranjit Nair, Steven Okamoto, Praveen Paruchuri, Jonathan Pearce, David Pynadath, Nathan Schurr, Pradeep Varakantham, Paul Scerri
TEAMCORE GROUP, teamcore.usc.edu

Long-Term Research Goal
Building heterogeneous, dynamic teams
- Types of entities: agents, people, sensors, resources, robots, ...
- Scale: 1000s or more
- Domains: highly uncertain, real-time, dynamic
Example domains: large-scale disaster rescue, agent-facilitated human organizations, space missions

Key Approaches in Multiagent Systems
- Market mechanisms, auctions
- Distributed Constraint Optimization (DCOP): constraint graph over variables x1, x2, x3, x4
- Belief-Desire-Intention (BDI): logics and psychology, e.g. joint persistent goals:
  (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p))
- Distributed POMDPs
Hybrid DCOP / POMDP / auction / BDI approaches: essential in large-scale multiagent teams; synergistic interactions

Why Hybrid Approaches?
[Matrix on slide rates Markets, BDI, Distributed POMDPs, and DCOP on four dimensions: local interactions, uncertainty, local utility, and human usability & plan structure]
BDI-POMDP and DCOP-POMDP hybrids combine the complementary strengths of the individual approaches

Hybrid Multiagent Systems: Examples
Axes: team scale & complexity (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous) vs. task complexity (low, medium, high)
Example: office-assistant teams of Teamcore proxies working with a meeting maker, scheduler agent, and interest matcher to reschedule meetings, decide presenters, and order meals
"More & more computers are ordering food, ... we need to think about marketing [to these computers]" (local Subway owner)

Hybrid Teams: Underlying Algorithms
Teamcore proxies execute a BDI team plan (Role1 ... Role_i ... Role_n) using task scheduling/allocation, communication, role allocation, and adjustable-autonomy algorithms (Scerri et al 03; Pynadath/Tambe 03); these algorithms are themselves hybrids of BDI, DCOP, POMDP, and game-theoretic techniques.
Task allocation (multi-agent and agent-human, i.e. adjustable autonomy):
- Offline optimal: DCOP: ADOPT, complete (Modi et al 04; Ali et al 05); BDI+POMDP hybrid (Nair & T, 05); POMDPs: transfer-of-control policies (Scerri et al, 02; Varakantham et al, 05)
- Online approximate: DCOP + games: k-coordination, incomplete (Maheswaran et al, 04); DCOP: LA, incomplete (Scerri et al, 05); BDI plans (Schurr et al, 05)
Communication (multi-agent and agent-human):
- Explicit: BDI + decision-theoretic filter (Pynadath & T, 03); distributed POMDPs (Pynadath & T, 02; Nair et al, 04)
- Implicit: plan recognition (Kaminka & T, 00); distributed POMDPs (Nair et al, 03); probabilistic plan recognition via overhearing (Kaminka et al, 02)

Outline: Sampling Recent Results
- Task allocation: agent-human, offline (adjustable autonomy); key result: BDI/POMDP hybrid, speeding up POMDPs
- Communication: multiagent, explicit & implicit
- Task allocation: multiagent, offline
- Task allocation: multiagent, online
(each topic illustrated with a domain)

Adjustable Autonomy in Teams (Scerri, Pynadath & T, JAIR 02; Chalupsky et al, IAAI 01)
Agents dynamically adjust their level of autonomy. Key question: when should control transfer to humans, and when not?
Previous work: one-shot transfer of control to the human or the agent. This is too rigid in teams, e.g. if the human fails to decide quickly the team miscoordinates (agents misbehave, humans apologize).
Solution: MDPs for flexible back-and-forth transfer-of-control policies, e.g. ask-human, delay, ask-human, cancel-team-activity
- Address user response uncertainty, costs to the team, ...
- Exploit hybrids to decompose into smaller MDPs
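To make the transfer-of-control idea concrete, here is a minimal sketch of solving a tiny finite-horizon MDP whose actions are ask-human, act-autonomously, and delay-meeting; the states, probabilities, and costs below are illustrative assumptions, not the E-Elves model.

```python
# A minimal sketch (not the E-Elves implementation) of computing a
# transfer-of-control policy from a finite-horizon MDP. All numbers below are
# illustrative assumptions.
T = 5                       # decision horizon (steps before the meeting)
P_HUMAN_REPLIES = 0.4       # chance the user answers in any one step (assumed)
R_HUMAN_DECISION = 10.0     # value of a (higher-quality) human decision
R_AGENT_DECISION = 6.0      # value of an autonomous agent decision
C_WAIT = 1.0                # per-step team cost of waiting on the user
C_DELAY = 3.0               # cost of delaying the meeting to buy time
R_MISCOORDINATION = -20.0   # deadline reached with no decision made

def value(t):
    """Return (best_value, best_action) when t steps remain."""
    if t == 0:
        return R_MISCOORDINATION, None     # out of time: the team miscoordinates
    v_next, _ = value(t - 1)
    q = {}
    # Asking the human: she may reply this step; otherwise we pay a waiting cost.
    q["ask_human"] = (P_HUMAN_REPLIES * R_HUMAN_DECISION
                      + (1 - P_HUMAN_REPLIES) * (v_next - C_WAIT))
    # Acting autonomously ends the episode with a lower-quality decision.
    q["act_autonomously"] = R_AGENT_DECISION
    # Delaying trades a fixed cost for another chance to reach the human.
    q["delay_meeting"] = v_next - C_DELAY
    best = max(q, key=q.get)
    return q[best], best

if __name__ == "__main__":
    for t in range(T, 0, -1):
        v, a = value(t)
        print(f"{t} steps left: take '{a}' (expected value {v:.2f})")
```

With these numbers the policy asks the human while time remains and acts autonomously near the deadline, i.e. a back-and-forth transfer-of-control policy rather than a one-shot decision.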

E-Elves: Hybrids and Adjustable Autonomy
Teamcore proxies execute a BDI team plan (e.g. reschedule meetings) using proxy algorithms for communication, role allocation, and adjustable autonomy.
Adjustable autonomy via MDPs for transfer-of-control policies, e.g. for meeting M1 the role is to enable the user to attend while avoiding waste of team time.
BDI plans provide structure with roles; the MDPs operate within that BDI structure.

Personal Assistants: Observational Uncertainty
Problem: monitor the user over time and decide, e.g., on a transfer-of-control policy.
Observations of the user are uncertain, so POMDPs are needed rather than MDPs, but POMDP policy generation is extremely slow.
Dynamic belief bounds (maximum reachable belief probability) give big speedups by reducing the region of belief space searched for dominant policy vectors.
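The sketch below illustrates the underlying idea by enumerating reachable beliefs and recording each state's maximum probability; the paper instead computes such bounds in polynomial time with Lagrangian techniques, and the two-state user model here is purely an illustrative assumption.

```python
# A minimal sketch of the dynamic-belief-bounds idea: propagate the set of
# reachable beliefs forward and record, per state, the maximum probability it
# can ever receive, so value backups can ignore the rest of belief space.
import numpy as np

T = np.array([[0.9, 0.1],      # T[s, s']: user stays in state s with p = 0.9 (assumed)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],      # O[s', o]: observation accuracy 0.8 (assumed)
              [0.3, 0.7]])

def belief_update(b, o):
    """Standard POMDP belief update for observation o (action dependence omitted)."""
    b_new = O[:, o] * (T.T @ b)
    return b_new / b_new.sum()

def max_belief_bounds(b0, horizon):
    """Max probability each state can attain in any belief reachable within `horizon`."""
    frontier = [np.asarray(b0, float)]
    bounds = np.asarray(b0, float).copy()
    for _ in range(horizon):
        frontier = [belief_update(b, o) for b in frontier for o in (0, 1)]
        bounds = np.maximum(bounds, np.max(frontier, axis=0))
    return bounds

if __name__ == "__main__":
    print(max_belief_bounds([0.5, 0.5], horizon=3))
```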

Speedups: Dynamic Belief Bounds (DB) (Varakantham, Maheswaran, T, AAMAS'05)
Using Lagrangian techniques, the belief-bound optimization problem is solved in polynomial time [formulation shown on slide].

Outline
- Task allocation: agent-human, offline (adjustable autonomy)
- Communication: multiagent, explicit & implicit; key result: distributed POMDPs, analysis of BDI programs
- Task allocation: multiagent, offline
- Task allocation: multiagent, online

Multiagent Tiger Problem
Two agents face doors A and B, with a tiger behind one of them. Each agent can open left, open right, listen, or communicate ("What did 1 hear? What did 2 hear?").
- Shared reward; the state resets whenever a door is opened
- Listening has a small cost and is unreliable
- Communication has a cost
What is the best joint policy over horizon T?
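A minimal sketch of this two-agent tiger problem as a decentralized POMDP, to make the slide's ingredients (shared reward, noisy listening, reset on opening) concrete; the probabilities and payoffs below are common values from the literature and should be read as assumptions, not necessarily those used in the talk.

```python
# A minimal sketch of the two-agent tiger Dec-POMDP (assumed numeric values).
import itertools
import random

STATES = ["SL", "SR"]                        # tiger behind the left / right door
ACTIONS = ["OpenLeft", "OpenRight", "Listen"]
OBS = ["HL", "HR"]                           # hear the tiger on the left / right

def transition(state, joint_action):
    """The state resets uniformly at random whenever any door is opened."""
    if any(a != "Listen" for a in joint_action):
        return random.choice(STATES)
    return state

def observe(state, joint_action):
    """Each listening agent independently hears the correct side with p = 0.85."""
    def obs_for(action):
        if action != "Listen":
            return random.choice(OBS)        # opening a door is uninformative
        correct = "HL" if state == "SL" else "HR"
        return correct if random.random() < 0.85 else ("HR" if correct == "HL" else "HL")
    return tuple(obs_for(a) for a in joint_action)

def reward(state, joint_action):
    """Shared team reward (assumed values): coordinated safe opening pays off,
    opening the tiger door is costly, and every Listen costs 1."""
    tiger_door = "OpenLeft" if state == "SL" else "OpenRight"
    r = -sum(1.0 for a in joint_action if a == "Listen")
    opened = [a for a in joint_action if a != "Listen"]
    if any(a == tiger_door for a in opened):
        r -= 50.0 if opened == [tiger_door, tiger_door] else 100.0
    elif len(opened) == 2:
        r += 20.0                            # both agents open the reward door
    elif len(opened) == 1:
        r += 9.0                             # uncoordinated safe opening earns less
    return r

if __name__ == "__main__":
    state = random.choice(STATES)
    for ja in itertools.product(ACTIONS, repeat=2):
        print(ja, reward(state, ja))
```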

COM-MTDP (Pynadath & T, JAIR '02)
Communicating Multiagent Team Decision Problem:
- S: set of world states, S = {SL, SR}
- A = ×_i A_i: set of joint actions, A_i = {OpenLeft, OpenRight, Listen}
- P: state transition function
- Ω: set of joint observations, Ω_i = {HL, HR}
- O: joint observation function
- R: joint reward function
[Slide shows the tiger-domain transition, observation, and reward tables]

COM-MTDP (continued)
- B_i: belief state (each B_i is the history of observations and messages)
- Individual domain-action policies π_iA: B_i → A_i, e.g. mapping a belief to Listen
- Individual communication policies π_iΣ: B_i → Σ_i
- Σ: communication capabilities; R_Σ: communication cost [cost table shown on slide]
Goal: find joint policies π_A and π_Σ that maximize total expected reward over horizon T

Complexity Results in COM-MTDP
                        Individual obs.   Collective obs.   Collective partial obs.   No obs.
No communication        P-complete        NEXP-complete     NEXP-complete             NP-complete
General communication   P-complete        NEXP-complete     NEXP-complete             NP-complete
Full communication      P-complete        P-complete        PSPACE-complete           NP-complete
Tackling this complexity: local optimality; hybrids

JESP: Joint Equilibrium-based Search for Policies in COM-MTDP (Nair et al, IJCAI 03)
Repeat until convergence to a local equilibrium, for each agent K:
- Fix the policies of all agents except K
- Generate K's optimal response
The optimal response for K, given the fixed policies of the other agents, is found by transforming the problem into a single-agent POMDP:
- "Extended" state defined as the world state together with the other agents' observation histories, not the world state alone
- Define a new transition function and a new observation function over extended states
- Define a multiagent belief state
- Dynamic programming over belief states
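A minimal sketch of the JESP idea in its exhaustive form follows: alternate best responses, holding all other agents' policies fixed, until no agent can improve the joint value. The tiny policy spaces and the `evaluate` stand-in are illustrative assumptions; the real algorithm evaluates Dec-POMDP policies via the dynamic program described above.

```python
# Exhaustive-JESP sketch: converges to a locally optimal joint policy.
POLICY_SPACE = {
    0: ["listen_then_open", "always_listen", "open_immediately"],
    1: ["listen_then_open", "always_listen", "open_immediately"],
}

def evaluate(joint_policy):
    """Stand-in for the expected joint reward of a joint policy (assumed values)."""
    scores = {("listen_then_open", "listen_then_open"): 10.0,
              ("always_listen", "always_listen"): 2.0}
    return scores.get(joint_policy, -5.0)     # uncoordinated joint policies do poorly

def as_tuple(joint):
    return tuple(joint[i] for i in sorted(joint))

def jesp(initial):
    joint = dict(initial)
    improved = True
    while improved:                           # stop at a local equilibrium
        improved = False
        for k in POLICY_SPACE:                # best response for agent k, others fixed
            best_value, best_policy = evaluate(as_tuple(joint)), joint[k]
            for candidate in POLICY_SPACE[k]:
                trial = dict(joint)
                trial[k] = candidate
                if evaluate(as_tuple(trial)) > best_value + 1e-9:
                    best_value, best_policy = evaluate(as_tuple(trial)), candidate
            if best_policy != joint[k]:
                joint[k], improved = best_policy, True
    return joint, evaluate(as_tuple(joint))

if __name__ == "__main__":
    # From this start JESP converges to (always_listen, always_listen), a local
    # equilibrium that is worse than the best coordinated joint policy.
    print(jesp({0: "open_immediately", 1: "always_listen"}))
```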

JESP and Communication (Nair, T, Yokoo & Roth, AAMAS'04)
[Slide plots JESP run-time vs. communication frequency]
Communication synchronizes the agents, giving a compact belief state, e.g. at t=3:
(SL (HL HL) p1) (SL (HL HR) p2) (SL (HR HR) p3) (SL (HR HL) p4)
(SR (HL HL) p5) (SR (HL HR) p6) (SR (HR HR) p7) (SR (HR HL) p8)

BDI + COM-MTDP Hybrid I
RULE1 ("joint intentions", Levesque et al 90): If (fact F ∈ agent's private state) AND F matches a goal of the team's plan AND (F ∉ team state), Then adopt a possible communicative goal CG to communicate F.
RULE2 (TEAMCORE: Rule1 + Rule2): If possible communicative goal CG AND (expected miscoordination cost > communication cost), Then communicate CG.
Why hybridize with BDI?
- Ease of use for human developers
- Hierarchical plan & organization structure improve scalability
- BUT: quantitative team optimization is difficult (given uncertainty and cost)
COM-MTDP: quantitative evaluation of such communication heuristics
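A minimal sketch of a RULE1/RULE2-style decision-theoretic communication filter: raise communicative goals for privately known facts the team lacks, then send only when the expected miscoordination cost outweighs the message cost. The example facts, goals, and numbers are illustrative assumptions.

```python
def communicative_goals(private_state, team_state, team_plan_goals):
    """RULE1: facts I know, relevant to the team plan, that the team does not know."""
    return [f for f in private_state
            if f in team_plan_goals and f not in team_state]

def should_communicate(p_miscoordination, miscoordination_cost, comm_cost):
    """RULE2: communicate only if the expected cost of staying silent is higher."""
    return p_miscoordination * miscoordination_cost > comm_cost

if __name__ == "__main__":
    private = {"route_2_clear", "fuel_low"}
    team = {"fuel_low"}
    goals = {"route_2_clear"}                      # facts the team plan cares about
    for fact in communicative_goals(private, team, goals):
        if should_communicate(p_miscoordination=0.3,
                              miscoordination_cost=50.0, comm_cost=5.0):
            print("communicate:", fact)
```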

BDI + POMDP Hybrid I
Domain-level BDI team plans run on the proxies, whose algorithms handle communication, role allocation, and adjustable autonomy.
A distributed POMDP model (COM-MTDP) with a fixed domain-action policy π_A is used to:
- Evaluate alternative communication policies π_Σ, from heuristic to locally and globally optimal
- Feed back results for modifying the proxies' communication algorithms

Compare Communication Policies
Given a domain, for different observability conditions and communication costs:
- Evaluate Teamcore (Rule1 + Rule2), Jennings' and other heuristics, and compare with the optimal communication policy
- [Slide compares the evaluation complexity of the optimal policy and the TEAMCORE heuristic, on the order of O((|S| |Ω|)^T)]

Outline
- Task allocation: agent-human, offline (adjustable autonomy)
- Communication: multiagent, explicit & implicit
- Task allocation: multiagent, offline; key result: synergistic interaction of BDI + POMDP
- Task allocation: multiagent, online

BDI+POMDP II: Role Allocation
Task: urgently move supplies from X to refugees at Y
- Three routes of varying length; scouts make a route safe for the transports
- Uncertainty in actions and observations: scouts may fail along a route (and transports may replace scouts), and a scout's failure may not be observable to the transports
How many scouts? On which routes?
[Slide shows the map from X to Y with Route1, Route2, Route3, scout and transport helicopters, and a failure]

BDI+POMDP II: Role Allocation RoboCup Rescue

Hybrid BDI+POMDP for Role Allocation (Nair & T, JAIR 05)
BDI team plans for the domain, with role allocation, communication, and adjustable autonomy handled by the proxy algorithms.
A distributed POMDP model (MTDP) evaluates alternative role-taking policies, searches the policy space for the optimal role-taking policy for executing roles, and feeds the resulting role allocation back into the team plans.
Synergy: the POMDP optimizes the team plan's role allocation, while the team plans constrain the POMDP policy search, giving a significant speedup.

BDI-POMDP Hybrids: Advantage II
Belief-based policy evaluation: policies are evaluated over only the beliefs required by the BDI plans, not over entire observation histories.
[Slide contrasts full observation histories at T=1 and T=2 with the much smaller belief representation]

BDI-POMDP Hybrids: Advantage III
Humans specify BDI team plans; the organization hierarchy and plan hierarchy are exploited for more efficient policy search.
Best role allocation: how many helos in scouting teams A, B, C and in Transport?

BDI-POMDP Hybrid: Advantage III
Hierarchical policy groups over the 6 helos in the task force [allocation tree shown on slide]

BDI-POMDP Hybrid: Advantage III
Obtaining an upper bound on a policy group's value: take the maximum for each plan component (DoScouting [Scout 2; Transport 4], DoTransport [transports from previous], RemainScouts [scouts from previous]) over all start states and observation histories, e.g. component maxima [84], [3300], [36].
Dependence: the start of the next component is based on the end state of the previous one.
Why the speedup: duplicate start states and observation histories are avoided.
[Slide lists candidate allocations, e.g. Team-A=2, Team-B=0, Team-C=0, Transport=4; Team-A=1, Team-B=1, Team-C=0, Transport=4; SafeRoute=1, Transport=3; SafeRoute=2, Transport=4; ...]
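A minimal sketch of the resulting branch-and-bound search: bound each policy group by summing its plan components' maxima and skip full evaluation when the bound cannot beat the best allocation found so far. All numbers, group names, and the `evaluate` stand-in are illustrative assumptions, not values from the helicopter experiments.

```python
GROUP_COMPONENT_MAX = {        # per-group maxima for [DoScouting, DoTransport, RemainScouts]
    "2 scouts": [84.0, 3300.0, 36.0],
    "3 scouts": [60.0, 1200.0, 30.0],
}

CANDIDATE_GROUPS = {           # policy group -> concrete role allocations inside it
    "2 scouts": [{"Team-A": 2, "Team-B": 0, "Team-C": 0, "Transport": 4},
                 {"Team-A": 1, "Team-B": 1, "Team-C": 0, "Transport": 4}],
    "3 scouts": [{"Team-A": 2, "Team-B": 1, "Team-C": 0, "Transport": 3}],
}

def upper_bound(group):
    """Sum of component maxima: never less than any member allocation's true value."""
    return sum(GROUP_COMPONENT_MAX[group])

def evaluate(allocation):
    """Stand-in for the expensive distributed-POMDP evaluation of one allocation."""
    return 1000.0 + 100.0 * allocation["Transport"] - 50.0 * allocation["Team-B"]

def branch_and_bound(groups):
    best_value, best_alloc = float("-inf"), None
    for group, allocations in groups.items():
        if upper_bound(group) <= best_value:
            continue                          # prune the entire policy group
        for alloc in allocations:
            value = evaluate(alloc)
            if value > best_value:
                best_value, best_alloc = value, alloc
    return best_alloc, best_value

if __name__ == "__main__":
    print(branch_and_bound(CANDIDATE_GROUPS))  # here the "3 scouts" group is pruned
```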

Helicopter Domain: BDI + POMDP Synergy
[Experimental plots: BDI helps POMDPs; POMDPs help BDI]

RoboCup Rescue
[Experimental plots: BDI helps POMDPs; POMDPs help BDI; distributed POMDP]

Outline
- Task allocation: agent-human, offline (adjustable autonomy)
- Communication: multiagent, explicit & implicit
- Task allocation: multiagent, offline
- Task allocation: multiagent, online; key results: distributed constraint optimization + distributed POMDPs, and distributed constraint optimization + a graphical-game perspective

ADOPT Algorithm for DCOPs (Modi, Shen, T, Yokoo, AIJ 05)
[Constraint graph over x1, x2, x3, x4 with a pairwise cost table f(di, dj); one assignment shown with Cost = 0, another with Cost = 7]
ADOPT: the first asynchronous complete DCOP algorithm; ADOPT's asynchrony yields significant speedups
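A minimal sketch of the kind of DCOP instance shown on the slide: binary variables connected by pairwise cost tables f(di, dj), with the total cost of an assignment summed over the constrained pairs. The graph and cost table are assumptions chosen so that an all-equal assignment costs 0, echoing the slide's example; the brute-force search below is only a centralized reference point for what ADOPT computes asynchronously.

```python
from itertools import product

EDGES = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x2", "x4")]  # constraint graph (assumed)

def f(di, dj):
    """Pairwise cost table: agreeing values are free, disagreements cost (assumed)."""
    return 0 if di == dj else (2 if di == 0 else 1)

def total_cost(assignment):
    return sum(f(assignment[i], assignment[j]) for i, j in EDGES)

def brute_force_optimum(variables, domain=(0, 1)):
    """Centralized reference optimum; ADOPT finds the same cost asynchronously."""
    best = min((dict(zip(variables, values))
                for values in product(domain, repeat=len(variables))),
               key=total_cost)
    return best, total_cost(best)

if __name__ == "__main__":
    variables = ["x1", "x2", "x3", "x4"]
    print(total_cost({"x1": 0, "x2": 0, "x3": 0, "x4": 0}))   # 0
    print(total_cost({"x1": 0, "x2": 1, "x3": 1, "x4": 0}))   # a positive cost
    print(brute_force_optimum(variables))
```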

Speeding up ADOPT via Preprocessing (Maheswaran et al, AAMAS’04; Ali, Koenig, T, AAMAS’05)

Hybrid DCOP-POMDP (Nair, T, Varakantham, Yokoo, AAAI'05)
- Add uncertainty to DCOP; add interaction structure to distributed POMDPs: networked distributed POMDPs
- Exploit locality of network interaction: locally interacting distributed JESP
- Locally optimal joint policy; significant speedups over JESP
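The locality being exploited is that the joint reward decomposes over an interaction graph, so each agent's best response depends only on its neighbors. A minimal sketch of that decomposition, with an assumed graph and assumed local reward values (world state omitted for brevity):

```python
INTERACTION_EDGES = {                   # edge -> local reward over just that edge's agents
    ("x1", "x2"): lambda a: 5.0 if a["x1"] == a["x2"] else -1.0,
    ("x2", "x3"): lambda a: 3.0 if a["x2"] != a["x3"] else 0.0,
    ("x3", "x4"): lambda a: 2.0,
}

def joint_reward(joint_action):
    """R(a) = sum of local rewards over interaction-graph edges."""
    return sum(r({v: joint_action[v] for v in edge})
               for edge, r in INTERACTION_EDGES.items())

def neighbors(agent):
    """Only these agents' policies matter for `agent`'s best response."""
    return {v for edge in INTERACTION_EDGES for v in edge if agent in edge} - {agent}

if __name__ == "__main__":
    print(joint_reward({"x1": 0, "x2": 0, "x3": 1, "x4": 1}))   # 5 + 3 + 2 = 10
    print(neighbors("x2"))                                       # {'x1', 'x3'}
```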

Hybrid: DCOP and Graphical Games (Pearce, Maheswaran, T, PDCS'04)
Incomplete, i.e. locally optimal, algorithms.
k-optimal solution: groups of up to k agents maximize their local utility; no subset of k or fewer agents can deviate and improve solution quality.
Key features: multiple diverse k-optimal solutions, each of high relative quality; robustness against k failures; higher global quality.
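A minimal sketch of checking k-optimality for k = 1 on a tiny DCOP: an assignment is 1-optimal if no single agent can change its own value and raise the team's total utility. The graph and utility tables are illustrative assumptions; larger k would enumerate deviations by every subset of up to k agents.

```python
from itertools import product

DOMAIN = (0, 1)
EDGES = {("a", "b"): {(0, 0): 4, (0, 1): 0, (1, 0): 0, (1, 1): 3},
         ("b", "c"): {(0, 0): 2, (0, 1): 5, (1, 0): 0, (1, 1): 2}}

def team_utility(assignment):
    return sum(table[(assignment[i], assignment[j])] for (i, j), table in EDGES.items())

def is_1_optimal(assignment):
    """True if no single agent can unilaterally improve the team utility."""
    base = team_utility(assignment)
    for agent in assignment:
        for value in DOMAIN:
            if value != assignment[agent]:
                deviated = dict(assignment)
                deviated[agent] = value
                if team_utility(deviated) > base:
                    return False
    return True

if __name__ == "__main__":
    # Prints multiple 1-optimal assignments of differing quality, illustrating
    # the "multiple diverse k-optimal solutions" point on the slide.
    for values in product(DOMAIN, repeat=3):
        assignment = dict(zip(("a", "b", "c"), values))
        if is_1_optimal(assignment):
            print(assignment, team_utility(assignment))
```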

Summary
Key approaches: game theory and auctions; Distributed Constraint Optimization (DCOP); Belief-Desire-Intention (BDI) logics and folk psychology, e.g.
(JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p));
distributed POMDPs.
Hybrid techniques deserve first-class citizenship; synergies build on each approach's strengths.
Key future research (this is just the beginning of hybrid techniques):
- A science of hybrid techniques in multiagent systems
- Which interactions among techniques are positive (or negative)?
- How to exploit (or avoid) them?
- Exploiting hybrids in large-scale multiagent systems

Humans in Agent Worlds
Virtual environments for training, entertainment, and education: large numbers of agents, realistic interactions with humans.
Los Angeles Fire Dept: large-scale disaster simulations (Schurr, Marecki, T, Scerri, IAAI'05)

Agents in Human Worlds
The world is growing flat and interconnected: command and control is disappearing, geography is history, and collaborations that cross regional and national boundaries matter.
Research on agent teams provides an "agents net" infrastructure that allows rapid virtual organizations.

Thank You
Mentors and collaborators: Prof. Lesser, Prof. Grosz, Prof. Yokoo, Prof. Kraus

Thank You! Spring’05

CONTACT Milind Tambe THANK YOU!

Thank You!