
1 Conflicts about Teamwork: Hybrids to the Rescue Milind Tambe University of Southern California with Emma Bowring, Hyuckchul Jung, Gal Kaminka, Rajiv Maheswaran, Janusz Marecki, Jay Modi, Ranjit Nair, Steven Okamoto, Praveen Paruchuri, Jonathan Pearce, David Pynadath, Nathan Schurr, Pradeep Varakantham, Paul Scerri TEAMCORE GROUP teamcore.usc.edu

2 Long-Term Research Goal
Building heterogeneous, dynamic teams.
- Types of entities: agents, people, sensors, resources, robots, ...
- Scale: 1000s or more
- Domains: highly uncertain, real-time, dynamic
- Examples: large-scale disaster rescue, agent-facilitated human organizations, space missions

3 Key Approaches in Multiagent Systems
- Market mechanisms / auctions
- Distributed Constraint Optimization (DCOP): constraint graph over variables x1, x2, x3, x4
- Belief-Desire-Intention (BDI): logics and psychology, e.g. (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p))
- Distributed POMDPs
Hybrid DCOP/POMDP/auctions/BDI: essential in large-scale multiagent teams; synergistic interactions.

4 Why Hybrid Approaches?
Each approach covers only some dimensions: Markets, BDI, distributed POMDPs, and DCOP differ in how well they handle local interactions, uncertainty, local utility, and human usability & plan structure. Hybrids (e.g., BDI-POMDP, DCOP-POMDP) combine complementary strengths to cover these dimensions jointly.

5 Hybrid Multiagent Systems: Examples
Axes: team scale & complexity (small-scale homogeneous / small-scale heterogeneous / large-scale heterogeneous) vs. task complexity (low / medium / high).
Example tasks handled by Teamcore proxies: reschedule meetings (Meeting Maker, Scheduler agent), decide presenters (Interest Matcher), order our meals.
"More & more computers are ordering food, ... we need to think about marketing [to these computers]" (local Subway owner)

6 Hybrid Teams: Underlying Algorithms
TEAMCORE proxies execute a BDI team plan (Role1 ... Role_i ... Role_n) with algorithms for communication, role/task allocation, and adjustable autonomy.
Communication algorithms (Scerri et al 03; Pynadath/Tambe 03):
- Multi-agent, explicit: BDI + decision-theoretic filter (Pynadath & T, 03); distributed POMDPs (Pynadath & T, 02; Nair et al 04)
- Multi-agent, implicit: BDI plan recognition (Kaminka & T 00); distributed POMDPs (Nair et al 03); probabilistic plan recognition via overhearing (Kaminka et al 02)
Task scheduling/allocation algorithms:
- Offline optimal, multi-agent: DCOP: ADOPT, complete (Modi et al 04; Ali et al 05); BDI+POMDP hybrid (Nair & T, 05)
- Offline optimal, agent-human (adjustable autonomy): POMDPs: transfer-of-control policies (Scerri et al, 02; Varakantham et al, 05)
- Online approximate, multi-agent: DCOP+games: k-coordination, incomplete (Maheswaran et al 04); DCOP: LA, incomplete (Scerri et al 05)
- Online approximate, agent-human: BDI plans (Schurr et al 05)

7 Outline: Sampling Recent Results
- Task allocation: agent-human, offline (adjustable autonomy). Key result: BDI/POMDP hybrid; speeding up POMDPs
- Communication: multiagent, explicit & implicit
- Task allocation: multiagent, offline
- Task allocation: multiagent, online

8 Adjustable Autonomy in Teams (Scerri, Pynadath & T, JAIR 02; Chalupsky et al, IAAI 01)
Agents dynamically adjust their level of autonomy. Key question: when to transfer control to humans, and when not?
Previous work: one-shot transfer of control to human or agent. Too rigid in teams: e.g., if the human fails to decide quickly, the team miscoordinates (agents misbehave, humans apologize).
Solution: MDPs for flexible back-and-forth transfer-of-control policies, e.g. ask-human, delay, ask-human, cancel-team-activity. These address user response uncertainty, costs to the team, and more, and exploit hybrids for decompositions into smaller MDPs.
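The back-and-forth transfer-of-control idea can be sketched as a tiny finite-horizon MDP solved by backward induction. All numbers here (response probability, decision values, miscoordination cost) are illustrative assumptions, not the E-Elves model; the point is only that the optimal policy waits for the human while time remains and acts autonomously near the deadline.

```python
# Minimal transfer-of-control MDP sketch (hypothetical parameters).
# State: number of steps remaining before the team's deadline.
T = 5
P_RESPOND = 0.3     # assumed chance the human answers in any one step
R_HUMAN = 10.0      # assumed value of a human-quality decision
R_AGENT = 4.0       # assumed value of an autonomous decision
COST_MISS = -20.0   # assumed team miscoordination cost at the deadline

def solve():
    """Backward induction: V[t] = value with t steps left."""
    V = [0.0] * (T + 1)
    policy = ["-"] * (T + 1)
    V[0] = COST_MISS                      # deadline passed with no decision
    for t in range(1, T + 1):
        # Wait: human may respond now; otherwise one step is lost.
        q_wait = P_RESPOND * R_HUMAN + (1 - P_RESPOND) * V[t - 1]
        q_act = R_AGENT                   # act autonomously right away
        if q_wait >= q_act:
            V[t], policy[t] = q_wait, "wait-for-human"
        else:
            V[t], policy[t] = q_act, "act-autonomously"
    return V, policy

V, policy = solve()   # policy waits early, acts autonomously at the last step
```

With these numbers the policy is "ask the human, and if no answer arrives by the final step, decide autonomously rather than miss the deadline", which is the flexible behavior the slide contrasts with one-shot transfer of control.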

9 E-Elves: Hybrids and Adjustable Autonomy
Teamcore proxies execute BDI team plans (e.g., reschedule meetings) via proxy algorithms: communication, role allocation, and adjustable autonomy (MDPs for transfer-of-control policies).
Example: meeting M1, role: enable the user to attend; avoid wasting the team's time.
BDI plans provide structure with roles; MDPs operate within the BDI structure.

10 Personal Assistants: Observational Uncertainty
Problem: monitor the user over time and decide, e.g., a transfer-of-control policy. Observational uncertainty (noisy observations of the user) calls for POMDPs, not MDPs, but POMDP policy generation is extremely slow.
Dynamic belief bounds (max belief probability) give big speedups by reducing the region of belief space searched for dominant policy vectors.
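The quantity a POMDP policy conditions on is the belief state, updated by Bayes' rule after each observation. A minimal sketch of that update, with a two-state model and illustrative observation probabilities (the 0.85/0.15 noise level echoes the tiger-style examples later in the deck; it is not the personal-assistant model itself):

```python
# Bayesian belief update over hidden states (illustrative model).
def belief_update(b, obs, trans, obs_model):
    """b: {state: prob}; trans: {s: {s2: p}}; obs_model: {s2: {obs: p}}."""
    new_b = {}
    for s2 in obs_model:
        # Prediction step: propagate the prior belief through the dynamics.
        pred = sum(b[s] * trans[s].get(s2, 0.0) for s in b)
        # Correction step: weight by the likelihood of the observation.
        new_b[s2] = obs_model[s2].get(obs, 0.0) * pred
    z = sum(new_b.values())
    return {s: p / z for s, p in new_b.items()}

# Two hidden states; the hidden state is static under observation.
trans = {"SL": {"SL": 1.0}, "SR": {"SR": 1.0}}
obs_model = {"SL": {"HL": 0.85, "HR": 0.15},
             "SR": {"HL": 0.15, "HR": 0.85}}
b = {"SL": 0.5, "SR": 0.5}
b = belief_update(b, "HL", trans, obs_model)  # belief shifts toward SL
```

Dominant-policy-vector search ranges over exactly these belief points; the dynamic belief bounds of the next slide restrict how far such beliefs can move, shrinking the search region.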

11 Speedups: Dynamic Belief Bounds (DB) (Varakantham, Maheswaran, T AAMAS’05) Using Lagrangian techniques, the maximum-belief-probability optimization is solved in polynomial time.

12 Outline
- Task allocation: agent-human, offline (adjustable autonomy)
- Communication: multiagent, explicit & implicit. Key result: distributed POMDPs; analysis of BDI programs
- Task allocation: multiagent, offline
- Task allocation: multiagent, online

13 Multiagent Tiger Problem
Two agents (1 and 2) face two doors (A and B); each chooses: open left, open right, listen, or communicate?
Shared reward. Listening has a small cost and is unreliable: what did agent 1 hear? what did agent 2 hear? Communication has a cost; the state resets whenever a door is opened.
What is the best joint policy over horizon T?

14 COM-MTDP (Pynadath & T, JAIR ’02)
Communicating Multiagent Team Decision Problem:
- S: set of world states. S = {SL, SR}
- A = ×_i A_i: set of joint actions. A_i = {OpenLeft, OpenRight, Listen}
- P: state transition function. Listen leaves the state unchanged (SL→SL and SR→SR with probability 1.0); opening a door resets the state uniformly (0.5 / 0.5)
- Ω: set of joint observations. Ω_i = {HL, HR}
- O: joint observation function. Under Listen, the correct signal is heard with probability 0.85 (in SL: HL 0.85, HR 0.15); for other actions observations are uninformative (0.5 each)
- R: joint reward function, e.g. +20 for jointly opening the treasure door, penalties of -50 and -100 for opening the tiger door, -2 for listening
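The COM-MTDP components above can be encoded directly as executable model functions. The transition and observation probabilities follow the slide's tables; the exact joint-reward structure is an assumption loosely based on the numbers shown (+20, -50, -100, -2), so treat those cases as illustrative.

```python
import random

# Runnable encoding of the multiagent tiger dynamics (rewards assumed).
STATES = ["SL", "SR"]               # tiger behind left / right door

def transition(s, a1, a2, rng):
    # Joint listening leaves the state unchanged; any door opening
    # resets the tiger's position uniformly at random.
    if a1 == a2 == "Listen":
        return s
    return rng.choice(STATES)

def observe(s, a1, a2, rng):
    # Only joint listening yields informative (noisy) observations.
    if a1 == a2 == "Listen":
        correct = "HL" if s == "SL" else "HR"
        wrong = "HR" if s == "SL" else "HL"
        return tuple(correct if rng.random() < 0.85 else wrong
                     for _ in range(2))
    return tuple(rng.choice(["HL", "HR"]) for _ in range(2))

def reward(s, a1, a2):
    # Assumed shared reward: +20 for jointly opening the treasure door,
    # -50 for jointly opening the tiger door, -100 when the agents act
    # in an uncoordinated way, -2 for jointly listening.
    if a1 == a2 == "Listen":
        return -2
    good = "OpenRight" if s == "SL" else "OpenLeft"
    if a1 == a2 == good:
        return 20
    if a1 == a2:
        return -50
    return -100
```

A simulator like this is what a joint policy over horizon T would be rolled out against; the point of COM-MTDP is that the reward is shared, so each agent's best action depends on what its teammate believes and does.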

15 COM-MTDP (continued)
- B_i: belief state (each B_i is a history of observations and messages)
- Individual domain-action policies π_iA : B_i → A_i (e.g. Listen)
- Individual communication policies π_iΣ : B_i → Σ_i
- Σ: communication capabilities; R_Σ: communication cost
Goal: find joint policies π_A and π_Σ that maximize total expected reward over horizon T.

16 Complexity Results in COM-MTDP

Communication          | Individual observability | Collective observability | Collective partial observability | No observability
No communication       | P-complete               | NEXP-complete            | NEXP-complete                    | NP-complete
General communication  | P-complete               | NEXP-complete            | NEXP-complete                    | NP-complete
Full communication     | P-complete               | P-complete               | PSPACE-complete                  | NP-complete

Given this complexity: local optimality and hybrids.

17 JESP: Joint Equilibrium Search in COM-MTDP (Nair et al, IJCAI 03)
Repeat until convergence to a local equilibrium: for each agent K, fix the policies of all agents except K and generate K's optimal response.
Computing K's optimal response policy, given fixed policies for the other agents, is transformed into a single-agent POMDP problem:
- "Extended" state defined as the world state together with the other agents' observation histories, not the world state alone
- Define a new transition function and a new observation function over extended states
- Define a multiagent belief state; dynamic programming over belief states
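JESP's outer loop can be sketched in a few lines. Here the inner "optimal response" is exhaustive search over a toy one-shot policy space with assumed payoffs, not the extended-state POMDP dynamic programming the algorithm actually uses; the toy payoffs are chosen so the loop converges to a local equilibrium that is not globally optimal, which is exactly JESP's local-optimality caveat.

```python
# Skeletal JESP outer loop: alternating best responses (toy payoffs).
def jesp(policy_spaces, payoff):
    """policy_spaces: per-agent lists of candidate policies.
    payoff: dict mapping joint-policy tuples to team reward."""
    joint = [ps[0] for ps in policy_spaces]        # arbitrary initial policies
    improved = True
    while improved:
        improved = False
        for k, space in enumerate(policy_spaces):  # best-respond for agent k
            def value(p, k=k):
                return payoff[tuple(joint[:k] + [p] + joint[k + 1:])]
            best = max(space, key=value)
            if best != joint[k] and value(best) > payoff[tuple(joint)]:
                joint[k], improved = best, True
    return tuple(joint)                            # locally optimal joint policy

# Toy coordination problem: agents must match, and "open" is the better match.
spaces = [["listen", "open"], ["listen", "open"]]
payoff = {("listen", "listen"): 5, ("open", "open"): 8,
          ("listen", "open"): -10, ("open", "listen"): -10}
result = jesp(spaces, payoff)   # converges to ("listen", "listen"), value 5
```

Starting from ("listen", "listen"), neither agent can improve unilaterally, so the loop stops at value 5 even though ("open", "open") is worth 8: a local equilibrium, as the slide's "convergence to local equilibrium" states.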

18 JESP and Communication (Nair, T, Yokoo & Roth, AAMAS’04)
Run-time vs. communication frequency: communication synchronizes the agents' beliefs, yielding a compact belief state for JESP.
E.g., at t=3 the multiagent belief state ranges over (SL, (HL HL), p1), (SL, (HL HR), p2), (SL, (HR HR), p3), (SL, (HR HL), p4), (SR, (HL HL), p5), (SR, (HL HR), p6), (SR, (HR HR), p7), (SR, (HR HL), p8).

19 BDI + COM-MTDP Hybrid I
RULE1 ("joint intentions", Levesque et al 90): If (fact F ∈ agent's private state) AND F matches a goal of the team's plan AND (F ∉ team state), Then adopt a possible communicative goal CG to communicate F.
RULE2 (TEAMCORE: Rule1 + Rule2): If possible communicative goal CG AND (expected miscoordination cost > communication cost), Then communicate CG.
Why hybrid with BDI? Ease of use for human developers; hierarchical plan & organization structure improve scalability. BUT quantitative team optimization is difficult (given uncertainty/cost). COM-MTDP: quantitative evaluation of communication heuristics.
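RULE2's decision-theoretic filter reduces to a one-line expected-cost comparison. A minimal sketch, with the probability and cost values in the usage assertions being illustrative assumptions:

```python
# RULE2 as a decision-theoretic filter: communicate a fact only when the
# expected cost of miscoordination from staying silent exceeds the cost
# of sending the message.
def should_communicate(p_miscoordination, miscoordination_cost, comm_cost):
    """p_miscoordination: chance teammates act on a stale team state."""
    return p_miscoordination * miscoordination_cost > comm_cost

assert should_communicate(0.8, 10.0, 1.0)       # high risk: send the message
assert not should_communicate(0.05, 10.0, 1.0)  # low risk: stay silent
```

RULE1 generates the candidate communicative goals from the BDI plan structure; this filter is the quantitative layer the COM-MTDP framework evaluates.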

20 BDI + POMDP Hybrid I
Proxies execute domain team plans via proxy algorithms (communication, role allocation, adjustable autonomy); a distributed POMDP model sits alongside them.
- Heuristic π_A: fixed domain-action policy
- COM-MTDP: evaluate alternate communication policies; derive locally and globally optimal communication policies π_Σ
- Feedback for modifying the proxies' communication algorithms

21 Compare Communication Policies
Given a domain, for different observability conditions & communication costs: evaluate Teamcore (Rule1+Rule2), Jennings, and other policies, and compare with the optimal communication policy (evaluation complexity O((|S|·|Ω|)^T)).

22 Outline
- Task allocation: agent-human, offline (adjustable autonomy)
- Communication: multiagent, explicit & implicit
- Task allocation: multiagent, offline. Key result: synergistic interaction: BDI+POMDP
- Task allocation: multiagent, online

23 BDI+POMDP II: Role Allocation
Task: urgently move supplies from X to refugees at Y. Three routes of varying length; scouts make a route safe for transports.
Uncertainty in actions and observations: scouts may fail along a route (and transports may replace scouts); a scout's failure may not be observable to the transports.
How many scouts? On which routes?

24 BDI+POMDP II: Role Allocation RoboCup Rescue

25 Hybrid BDI+POMDP for Role Allocation (Nair & T, JAIR 05)
Domain team plans (role allocation, communication, adjustable autonomy, proxy algorithms) combined with a distributed POMDP model.
MTDP: evaluate alternate role-taking policies; search the policy space for the optimal role-taking policy; feed results back as specific role allocations in the team plans.
POMDPs optimize the team plan's role allocation; team plans constrain the POMDP policy search: significant speedup.

26 BDI-POMDP Hybrids: Advantage II
Belief-based policy evaluation: not entire observation histories, only the beliefs required by the BDI plans.

27 BDI-POMDP Hybrids: Advantage III
Humans specify BDI team plans: an organization hierarchy (SctTeam A, B, C & Transport) plus a plan hierarchy.
Best role allocation: how many helos in SctTeam A, B, C & Transport? Exploit the BDI team-plan structure for more efficient policy search.

28 BDI-POMDP Hybrid: Advantage III
Hierarchical policy groups: with 6 helos in the task force, the root branches into scout/transport splits (1-5, 2-4, 3-3, 4-2, ...), each branching further into per-team allocations. Group values (e.g. 4167, 1926, 2773, 3420, 2926) guide the search; the group [2 scouts, 4 transports] evaluates to 3420.

29 BDI-POMDP Hybrid: Advantage III
Obtaining an upper bound on a policy group's value: take the max for each plan component (DoScouting, DoTransport, RemainScouts) over all start states & observation histories.
Dependence: the start of the next component is based on the end state of the previous one.
Why the speedup: avoids duplicating start states & observation histories. E.g., for the group [Scout 2; Transport 4], component bounds [84] + [3300] + [36] give 3420, covering allocations such as (Team-A=2, Team-B=0, Team-C=0, Transport=4), (Team-A=1, Team-B=1, Team-C=0, Transport=4), ...
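The pruning scheme these two slides describe is a branch-and-bound over policy groups: compute a cheap optimistic upper bound per group, and only run the expensive exact evaluation when the bound can still beat the incumbent. A sketch, with toy bound/value numbers (the 3420 value for the 2-scout/4-transport split echoes the slide; the rest are assumptions):

```python
# Branch-and-bound over role-allocation policy groups (toy numbers).
def branch_and_bound(groups, upper_bound, evaluate):
    best_val, best_alloc = float("-inf"), None
    # Expand the most promising policy groups first.
    for g in sorted(groups, key=upper_bound, reverse=True):
        if upper_bound(g) <= best_val:
            continue                 # prune: this group cannot beat incumbent
        val = evaluate(g)            # expensive exact evaluation (the MTDP)
        if val > best_val:
            best_val, best_alloc = val, g
    return best_alloc, best_val

# 6 helos split into (scouts, transports); bounds dominate values.
GROUPS = [(1, 5), (2, 4), (3, 3), (4, 2)]
BOUNDS = {(1, 5): 1926, (2, 4): 4167, (3, 3): 2773, (4, 2): 2926}
VALUES = {(1, 5): 1500, (2, 4): 3420, (3, 3): 2500, (4, 2): 2000}
alloc, value = branch_and_bound(GROUPS, BOUNDS.get, VALUES.get)
```

With these numbers only one group is evaluated exactly: once [2 scouts, 4 transports] scores 3420, every other group's bound falls below the incumbent and is pruned, which is where the claimed speedup comes from.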

30 Helicopter Domain: BDI + POMDP Synergy. BDI helps POMDPs; POMDPs help BDI.

31 RoboCup Rescue: BDI helps POMDPs; POMDPs help BDI (distributed POMDP).

32 Outline
- Task allocation: agent-human, offline (adjustable autonomy)
- Communication: multiagent, explicit & implicit
- Task allocation: multiagent, offline
- Task allocation: multiagent, online: distributed constraint optimization + distributed POMDPs; distributed constraint optimization + graphical-game perspective

33 ADOPT Algorithm for DCOPs (Modi, Shen, T, Yokoo, AIJ 05)
Constraint graph over variables x1 ... x4 with pairwise cost functions f(di, dj); e.g., one joint assignment has cost 0, another cost 7.
ADOPT: the first asynchronous complete DCOP algorithm; its asynchrony yields significant speedups.
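The objective ADOPT optimizes can be shown concretely with a tiny DCOP solved by centralized brute force (ADOPT itself performs this search asynchronously and distributedly, which is its whole point). The edge set and the pairwise cost table here are assumptions in the style of the slide's figure, not its exact instance:

```python
import itertools

# A 4-variable binary DCOP: minimize the sum of pairwise constraint costs.
DOMAIN = [0, 1]
EDGES = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x2", "x4")]

def f(di, dj):
    # Assumed pairwise cost: agreeing on 1 is free, agreeing on 0 costs 1,
    # disagreeing costs 2.
    if di == dj:
        return 0 if di == 1 else 1
    return 2

def solve(variables):
    """Exhaustive search over joint assignments (exponential, for clarity)."""
    best = None
    for values in itertools.product(DOMAIN, repeat=len(variables)):
        assign = dict(zip(variables, values))
        cost = sum(f(assign[i], assign[j]) for i, j in EDGES)
        if best is None or cost < best[1]:
            best = (assign, cost)
    return best

assignment, cost = solve(["x1", "x2", "x3", "x4"])   # all-ones, cost 0
```

The brute-force loop makes the exponential joint search space explicit; ADOPT's contribution is reaching the same optimal cost with asynchronous, local message passing among the agents owning each variable.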

34 Speeding up ADOPT via Preprocessing (Maheswaran et al, AAMAS’04; Ali, Koenig, T, AAMAS’05)

35 Hybrid DCOP-POMDP (Nair, T, Varakantham, Yokoo, AAAI’05)
Add uncertainty to DCOP; add interaction structure to distributed POMDPs: networked distributed POMDPs.
Exploits network-interaction locality: locally interacting distributed JESP finds a locally optimal joint policy, with significant speedups over JESP.

36 Hybrid: DCOP and Graphical Games (Pearce, Maheswaran, T, PDCS’04)
Incomplete, i.e. locally optimal, algorithms. k-optimal algorithm: groups of up to k agents maximize local utility; no k-subset of agents can deviate to improve solution quality.
Key features: multiple diverse k-optimal solutions, each of high relative quality; robust against k failures; higher global quality.
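The k-optimality condition is directly checkable: an assignment is k-optimal if no coalition of at most k agents can change values and raise the global objective. A sketch with a toy chain utility (the utility function is an assumption for illustration):

```python
import itertools

# Check k-optimality: no subset of <= k agents can deviate and improve.
def is_k_optimal(assign, k, domain, utility):
    agents = list(assign)
    base = utility(assign)
    for size in range(1, k + 1):
        for subset in itertools.combinations(agents, size):
            for values in itertools.product(domain, repeat=size):
                trial = dict(assign)
                trial.update(zip(subset, values))
                if utility(trial) > base:
                    return False       # an improving coalition exists
    return True

# Toy team utility: reward matching neighbors on the chain a - b - c.
def utility(a):
    return sum(1 for x, y in [("a", "b"), ("b", "c")] if a[x] == a[y])

ok = is_k_optimal({"a": 0, "b": 0, "c": 0}, 1, [0, 1], utility)   # 1-optimal
bad = is_k_optimal({"a": 0, "b": 1, "c": 0}, 1, [0, 1], utility)  # not
```

The all-zeros assignment survives every unilateral deviation, while the mismatched one is fixed by flipping b alone; larger k tightens the criterion toward global optimality, at exponentially growing verification cost.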

37 Summary
Game theory / auctions; Distributed Constraint Optimization (DCOP); Belief-Desire-Intention (BDI): logics and folk psychology, e.g. (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p)); distributed POMDPs.
Hybrid techniques: first-class citizenship. Synergies: build on each other's strengths.
Key future research: this is just the beginning of hybrid techniques. Toward a science of hybrid techniques in multiagent systems: which interactions among techniques are positive (or negative)? How can we exploit (or avoid) them? Exploit hybrids in large-scale multiagent systems.

38 Humans in Agent Worlds
Virtual environments for training, entertainment, and education: large numbers of agents, realistic interactions with humans. Example: Los Angeles Fire Dept, large-scale disaster simulations (Schurr, Marecki, T, Scerri, IAAI’05).

39 Agents in Human Worlds
The world is growing flat & interconnected: command and control are disappearing, geography is history, and collaborations that cross regional and national boundaries matter.
Research on agent teams provides an "agents net" infrastructure to allow rapid virtual organizations.

40 Thank You. Mentors and collaborators: Prof. Lesser, Prof. Grosz, Prof. Yokoo, Prof. Kraus.

41 Thank You! TEAMCORE@USC Spring’05

42 CONTACT Milind Tambe tambe@usc.edu http://teamcore.usc.edu THANK YOU!

43 Thank You! TEAMCORE@USC

