Published by Ezra Crawford. Modified over 9 years ago.
1
Building Practical Agent Teams: A hybrid perspective
Milind Tambe, tambe@usc.edu
Computer Science Dept, University of Southern California
Joint work with the TEAMCORE GROUP, http://teamcore.usc.edu
SBIA 2004
2
Long-Term Research Goal
Building large-scale heterogeneous teams
  Types of entities: agents, people, sensors, resources, robots, ...
  Scale: 1000s or more
  Domains: highly uncertain, real-time, dynamic
  Activities: form teams, persist for long durations, coordinate, adapt, ...
Some applications:
  Large-scale disaster rescue
  Agent-facilitated human orgs
  Large-area security
3
Domains and Motivations
[Figure: chart with axes "Team scale & complexity" (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous) and "Task & domain complexity" (low, medium, high)]
4
Motivation: BDI+POMDP Hybrids
BDI approach
  Frameworks: Teamcore/Machinetta, GPGP, ...
  +ve: Ease of use for human developers; coordinates large-scale teams
  -ve: Quantitative team evaluations difficult (given uncertainty/cost)
Distributed POMDP approach
  Compute optimal policy using distributed Partially Observable Markov Decision Processes (POMDPs)
  Frameworks: MTDP, DEC-MDP/DEC-POMDP, POIPSG, ...
  +ve: Quantitative evaluation of team performance easy (with uncertainty)
  -ve: Scale-up difficult; difficult for human developers to program policies
[Diagram labels: Teamcore proxy; Team proxy; TOP: team plans, organizations, agents; Execute Rescue; Extinguish Fires; Rescue civilians; Clear Roads; [RAP team]; [Fire company]; [Ambulance team]]
5
BDI + POMDP Synergy
Combine “traditional” TOP approaches with distributed POMDPs
  Distributed POMDPs for TOP & proxy analysis and refinement
  POMDPs improve TOP/proxies: e.g., improve role allocation
  TOP constrains POMDP policy search: orders-of-magnitude speedup
[Diagram labels: Teamcore proxy; Team proxy; Execute Rescue; Extinguish Fires; Rescue civilians; Clear Roads; [RAP team]; [Fire company]; [Ambulance]]
6
Overall Research Framework
Teamwork proxy infrastructure
Role allocation algorithms
  Agent-agent
    Offline, optimal: Adopt DCOP: asynchronous, complete distributed constraint optimization (Modi et al, 03; Maheswaran et al, 04)
    On-line, approximate: Equilibrium, threshold (Okamoto, 03; Maheswaran et al, 04)
  Agent-human (adjustable autonomy)
    Offline, optimal: Optimal transfer-of-control strategies via MDPs/POMDPs (Scerri et al, ’02)
    On-line, approximate: ?
Communication algorithms
  Agent-agent
    Explicit: BDI teamwork theories + decision-theoretic filter (Pynadath/Tambe, 03)
    Implicit (plan recognition): Socially attentive monitoring (Kaminka et al, 00)
  Agent-human
    Explicit: ?
    Implicit (plan recognition): Monitoring by overhearing (Kaminka et al, 02)
Distributed POMDP analysis: Multiagent Team Decision Problem (MTDP) (Nair et al 03b, Nair et al 04, Paruchuri et al 04)
7
Electric Elves: 24/7 from 6/00 to 12/00 (Chalupsky et al, IAAI’2001)
Tasks: reschedule meetings, decide presenters, order our meals
“More & more computers are ordering food, ... we need to think about marketing [to these computers]” (local Subway owner)
[Diagram labels: Papers; Meet Maker; Scheduler agent; Interest Matcher; Teamcore proxy (repeated)]
8
Modules within the Proxies: AA (Scerri, Pynadath and Tambe, JAIR’2002)
Proxy algorithms: communication, role allocation, adjustable autonomy
Adj. autonomy: MDPs for transfer-of-control policies
  Meeting role: user arrives on time
  MDP policies: planned sequence of transfers of control, coordination changes
  E.g., ADAH: Ask, delay, ask, cancel
[Diagram labels: Reschedule meetings; Teamcore proxy; Team-oriented Program]
9
Back to Hybrid BDI-POMDP Frameworks
10
Motivation: Communication in Proxies
Example of the proxy’s heuristic “BDI” communication rules:
RULE1 (“joint intentions” {Levesque et al 90}):
  If (fact F ∈ agent’s private state) AND F matches goal of team’s plan AND (F ∉ team state)
  Then possible communicative goal CG to communicate F
RULE2:
  If possible communicative goal CG AND (miscoordination-cost > communication-cost)
  Then communicate CG
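The two rules can be read as a filter followed by a cost test. Below is a minimal sketch of that reading, not the Teamcore implementation: it assumes RULE1 means "F is in the agent's private state, matches the team plan's goal, and is not yet in the team state". The class `ProxyState` and every field and function name are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyState:
    """Hypothetical container for what one proxy knows (illustrative names)."""
    private_facts: set = field(default_factory=set)    # facts the agent holds privately
    team_facts: set = field(default_factory=set)       # facts already in the team state
    team_goal_facts: set = field(default_factory=set)  # facts matching the team plan's goal


def communicative_goals(state):
    """RULE1 (as assumed here): private, goal-relevant facts the team lacks."""
    return {
        f for f in state.private_facts
        if f in state.team_goal_facts and f not in state.team_facts
    }


def should_communicate(miscoordination_cost, communication_cost):
    """RULE2: communicate only if miscoordination costs more than the message."""
    return miscoordination_cost > communication_cost


def messages_to_send(state, miscoordination_cost, communication_cost):
    """Apply RULE1 to find candidate facts, then RULE2 to decide whether to send."""
    if should_communicate(miscoordination_cost, communication_cost):
        return sorted(communicative_goals(state))
    return []


state = ProxyState(
    private_facts={"enemy_sighted", "fuel_low"},
    team_facts=set(),
    team_goal_facts={"enemy_sighted"},
)
print(messages_to_send(state, miscoordination_cost=10.0, communication_cost=2.0))
print(messages_to_send(state, miscoordination_cost=1.0, communication_cost=2.0))
```

In this sketch the costs are plain numbers supplied by the caller; how a proxy would estimate them under uncertainty is exactly the question the slides go on to raise.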
11
Motivation: Earlier BDI Evaluation
Testing teamwork in RoboCup (Tambe et al, IJCAI’99)

                      Comm     NoComm
  ISIS97-CMUnited97    3.27     1.73
  ISIS97-Andhill97    -3.38    -4.36
  ISIS98-CMUnited97    4.04     3.91
  ISIS98-Andhill97    -1.53    -2.13

Helicopter domain: testing communication selectivity (Pynadath & Tambe, JAAMAS’03)
Quantitative analysis of optimality, or of the complexity of the optimal response, is difficult
Challenge in domains with significant uncertainty and costs
12
Distributed POMDPs
COM-MTDP (Pynadath and Tambe, 02); RMTDP (Nair, Tambe, Marsella 03)
  S: states of the world (e.g., helicopter position, enemy position)
  A_i: actions (communication action, domain action)
  P: state transition probabilities
  R: reward; sub-divided based on action types
[Figure: world STATE diagram]
13
COM-MTDP: Analysis of Communication
  Ω_i: observations (e.g., E = enemy-on-radar, NE = enemy-not-on-radar)
  O: probability of observation given destination state & past action
  Belief state B_i: each agent's history of observations and messages
  Individual policies:
    π_iA : B_i → A_i (domain action)
    π_iΣ : B_i → Σ_i (communication)
  Goal: find joint policies π_A and π_Σ that maximize total expected reward
Observation table (one table per state and previous action):

  Joint observation   (E,E)   (E,NE)   (NE,NE)   (NE,E)
  Probability          0.1     0.4      0.1       0.4

[Figure labels: STATE; Landmark1, Landmark2, ...; E, NE, ...]
14
Complexity Results in COM-MTDP

                          Individual       Collective       Collective partial   No
                          observability    observability    observability        observability
  No communication        P-complete       NEXP-complete    NEXP-complete        NP-complete
  General communication   P-complete       NEXP-complete    NEXP-complete        NP-complete
  Full communication      P-complete       P-complete       PSPACE-complete      NP-complete

Approaches to cope with this complexity:
  I. Locally optimal solution (no global team optimality)
  II. Hybrid approach: POMDP + BDI
15
Approach I: Locally Optimal Policy (Nair et al 03)
Repeat until convergence to local equilibrium, for each agent K:
  Fix policy for all except agent K
  Find optimal response policy for agent K
Finding the optimal response policy for agent K, given fixed policies for the others:
  Problem becomes finding an optimal policy for a single-agent POMDP
  “Extended” state defined as [expression not preserved] rather than as [expression not preserved]
  Define new transition function
  Define new observation function
  Define multiagent belief state
  Dynamic programming over belief states
Significant speedup over exhaustive search, but problem size limited
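The outer loop of Approach I (fix everyone else, best-respond for agent K, repeat until nothing changes) can be illustrated on a toy problem. The sketch below is a deliberate simplification and not the algorithm of Nair et al: each "policy" is a single action in a one-shot cooperative game, so the single-agent POMDP solve is replaced by a lookup in a joint reward table. The reward values and function names are made up for illustration.

```python
def best_response(reward, policies, k, n_actions):
    """Best action for agent k when every other agent's choice is held fixed."""
    def value(action):
        trial = list(policies)
        trial[k] = action
        return reward[tuple(trial)]
    return max(range(n_actions), key=value)


def alternating_best_response(reward, initial, n_actions, max_sweeps=100):
    """Sweep over agents until no agent can strictly improve the joint reward."""
    policies = list(initial)
    for _ in range(max_sweeps):
        changed = False
        for k in range(len(policies)):
            current = reward[tuple(policies)]
            candidate = list(policies)
            candidate[k] = best_response(reward, policies, k, n_actions)
            if reward[tuple(candidate)] > current:
                policies = candidate
                changed = True
        if not changed:
            break  # local equilibrium: no unilateral change helps
    return tuple(policies), reward[tuple(policies)]


# Hypothetical joint reward: two agents, two actions each.
REWARD = {(0, 0): 5, (0, 1): 0, (1, 0): 0, (1, 1): 10}

print(alternating_best_response(REWARD, (0, 0), n_actions=2))  # stuck at a local equilibrium
print(alternating_best_response(REWARD, (0, 1), n_actions=2))  # reaches the better joint choice
```

The two starting points end in different equilibria, which is the sense in which the slide says the solution is locally optimal with no guarantee of global team optimality.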
16
II: Hybrid BDI + POMDP
COM-MTDP: evaluate alternate communication policies
  π_A: fixed action policy; vary communication policies π_Σ
  Derive locally, globally optimal communication policy
  Feedback for modifying proxy communication algorithms
[Diagram labels: Domain; Team-oriented Program; Proxy algorithms (communication, role allocation, adjustable autonomy); Distributed POMDP model (exploit TOP); Optimal policy]
17
Compare Communication Policies over Different Domains
Given a domain, for different observability conditions & communication costs:
  Evaluate Teamcore (rule1+rule2), Jennings, and others; compare with optimal
Complexity:
  Optimal: [expression not preserved]
  TEAMCORE: O((|S||Ω|)^T)
18
Distributed POMDPs to Analyze Role Allocations: RMTDP
19
Role Allocation: Illustration
Task: move cargo from X to Y; large reward for cargo at destination
  Three routes with varying length and failure rates
  Scouts make a route safe for transports
Uncertainty: in actions and observations
  Scouts may fail along a route (and transports may replace scouts)
  A scout's failure rate decreases when more scouts are assigned to a route
  Scouts' failure may not be observable to transports
20
Team-Oriented Program
[Figure: organization hierarchy and plan hierarchy]
Best initial role allocation: how many helos in SctTeam A, B, C & Transport?
TOP: almost the entire RMTDP policy is fixed
Policy gap only on step 1: best role allocation in the initial state for each agent
Assume six helicopter agents: 84 combinations (84 RMTDP policies)
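The figure of 84 is consistent with one simple counting: treat the six helicopters as interchangeable and let any of the four groups (SctTeam A, B, C, Transport) receive zero or more of them, which gives C(6+3, 3) = 84 splits. The sketch below enumerates them under that assumption; the function name and role labels are illustrative, and the slides do not state that this is how the count was derived.

```python
from itertools import product
from math import comb


def role_allocations(n_agents, roles):
    """Yield every split of n_agents interchangeable agents across the roles."""
    for counts in product(range(n_agents + 1), repeat=len(roles)):
        if sum(counts) == n_agents:
            yield dict(zip(roles, counts))


ROLES = ("SctTeamA", "SctTeamB", "SctTeamC", "Transport")
allocations = list(role_allocations(6, ROLES))

print(len(allocations))   # 84
print(comb(6 + 3, 3))     # 84, the stars-and-bars count for the same assumption
print(allocations[0])     # one candidate initial allocation
```

Each dictionary corresponds to one candidate step-1 decision; since the TOP fixes the rest of the policy, evaluating 84 allocations means evaluating 84 RMTDP policies.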
21
Analyzing Role Allocation in Teamwork
R-MTDP: evaluate alternate role-taking policies
  Search policy space for optimal role-taking policy
  Fill in gaps in policies
  Feedback for specific role allocation in TOP
[Diagram labels: Domain; Team-oriented Program; Proxy algorithms (role allocation, communication, adjustable autonomy); Distributed POMDP model; Opt; role-taking and role-execution policy over states S1, S2, S3, S4, S5, ..., with "?" marking the gaps]
22
RMTDP Policy Search: Efficiency Improvements
Belief-based policy evaluation
  Not entire observation histories, only beliefs required by TOP
  [Example observation histories at T=1 and T=2, and the corresponding belief at T=2, are not preserved]
Form hierarchical policy groups for branch-&-bound search
  Obtain upper bound on values of policies within a policy group
  If an individual policy is valued higher than a group's upper bound, prune the group
  Exploit TOP for generating policy groups, and for upper bounds
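The pruning rule (discard a whole policy group once some already-evaluated policy beats the group's upper bound) can be sketched as a generic branch-and-bound loop. This is an illustration under stated assumptions, not the RMTDP search code: the group names, policy names, bounds, and values below are all made up, and each bound is assumed to be valid (at least as large as every policy value in its group).

```python
def branch_and_bound(groups):
    """Search policy groups best-bound-first, skipping groups that cannot win.

    `groups` maps a group name to {"upper_bound": float, "policies": {name: value}}.
    In a real search the value would come from an expensive policy evaluation;
    here it is a stored number so the pruning logic is easy to follow.
    """
    best_policy, best_value, evaluated = None, float("-inf"), 0
    ordered = sorted(groups.items(), key=lambda kv: kv[1]["upper_bound"], reverse=True)
    for _name, group in ordered:
        if group["upper_bound"] <= best_value:
            continue  # prune: nothing in this group can beat the incumbent
        for policy, value in group["policies"].items():
            evaluated += 1
            if value > best_value:
                best_policy, best_value = policy, value
    return best_policy, best_value, evaluated


# Hypothetical policy groups with hypothetical bounds and values.
GROUPS = {
    "g1": {"upper_bound": 100.0, "policies": {"p1": 90.0, "p2": 60.0}},
    "g2": {"upper_bound": 80.0, "policies": {"p3": 75.0}},
    "g3": {"upper_bound": 95.0, "policies": {"p4": 70.0}},
}

print(branch_and_bound(GROUPS))  # ('p1', 90.0, 3): group g2 is pruned without evaluation
```

The saving grows with how tight the bounds are: a looser bound for g2 (say 95) would have forced its policy to be evaluated as well.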
23
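MaxExp: Hierarchical Policy Groups
[Figure: tree of hierarchical policy groups. The root (6 helicopters) branches into groups by scout/transport split (1/5, 0/6, 2/4, 3/3, 4/2); groups expand into leaf allocations over the three scout teams (e.g., 1/5 into 1,0,0 and 0,0,1; 2/4 into 1,1,0 and 0,0,2; ...). Values shown on the nodes: 4167, 0, 1926, 2773, 3420, 2926]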
24
MaxExp: Upper Bound on Policy Group Value
Obtain max for each component over all start states & observation histories
  If each component is independent: can evaluate each separately
  Dependence: start of next component is based on end state of previous
Why speedup:
  No duplicate start states: multiple paths of previous component merge
  No duplicate observation histories
[Figure: policy group 6 with split [Scout 2; Transport 4], value 3420, divided into components DoScouting [Scout 2; Transport 4], DoTransport [Transport from previous], RemainScouts [Scout from previous], with component values [84], [3300], [36]. Example start states: Team-A=2, Team-B=0, Team-C=0, Transport=4; Team-A=1, Team-B=1, Team-C=0, Transport=4; ... and SafeRoute=1, Transport=3; SafeRoute=2, Transport=4; ...]
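The numbers on this slide fit a simple reading of the bound: 84 + 3300 + 36 = 3420, that is, the group value equals the sum of one maximum per component. The sketch below computes a bound that way. It assumes the bound is the sum of per-component maxima, which the slide implies but does not state outright; which start state attains each maximum, and every value other than 84, 3300, and 36, is made up for illustration.

```python
def maxexp_upper_bound(component_values):
    """Sum, over components, the best expected value across all start conditions.

    `component_values` maps a component name to a dict from a start condition
    (start state plus observation history, flattened to a label here) to the
    expected value of running that component from it.
    """
    return sum(max(values.values()) for values in component_values.values())


# Component and start-state labels follow the slide; the non-maximal values
# (60.0, 2500.0, 0.0) and the assignment of maxima to labels are hypothetical.
COMPONENT_VALUES = {
    "DoScouting": {
        "Team-A=2,Team-B=0,Team-C=0,Transport=4": 84.0,
        "Team-A=1,Team-B=1,Team-C=0,Transport=4": 60.0,
    },
    "DoTransport": {
        "SafeRoute=1,Transport=3": 2500.0,
        "SafeRoute=2,Transport=4": 3300.0,
    },
    "RemainScouts": {
        "Scout=1": 36.0,
        "Scout=0": 0.0,
    },
}

print(maxexp_upper_bound(COMPONENT_VALUES))  # 3420.0
```

Taking each component's maximum independently ignores the dependence between components, so the result can overestimate any single policy's value, which is what makes it usable as an upper bound for pruning.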
25
Helicopter Domain: Computational Savings
  NOPRUNE-OBS: no pruning, maintain full observation history
  NOPRUNE: no pruning, maintain beliefs rather than observation histories
  MAXEXP: pruning using the MAXEXP heuristic, using beliefs
  NOFAIL: MAXEXP enhanced with “no failure” for quicker upper bound
26
Does RMTDP Improve Role Allocation?
27
RoboCup Rescue: Computational Savings
28
RoboCupRescue: RMTDP Improves Role Allocation
29
SUMMARY
COM-MTDP & R-MTDP: distributed POMDPs for analysis
Combine “traditional” TOP approaches with distributed POMDPs
  Exploit POMDPs to improve TOP/Teamcore proxies
  Exploit TOP to constrain POMDP policy search
Key policy evaluation complexity results
[Diagram labels: Team proxy; TOP: team plans, organizations, agents]
30
Future Work Agent-based Simulation technology Visualization Trainee
31
Thank You Contact: Milind Tambe tambe@usc.edu http://teamcore.usc.edu/tambe http://teamcore.usc.edu
32
Key Papers cited in this Presentation
Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed Algorithms for DCOP: A Graphical Game-Based Approach. Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004).
Praveen Paruchuri, Milind Tambe, Fernando Ordonez, Sarit Kraus. Towards a formalization of teamwork with resource constraints. International Joint Conference on Autonomous Agents and Multiagent Systems, 2004.
Ranjit Nair, Maayan Roth, Makoto Yokoo and Milind Tambe. Communication for Improving Policy Computation in Distributed POMDPs. In Proceedings of The Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04), 2004.
Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, Pradeep Varakantham. Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Event Scheduling. In Proceedings of the third International Joint Conference on Agents and Multi Agent Systems, AAMAS-2004.
Modi, P.J., Shen, W., Tambe, M., Yokoo, M. Solving Distributed Constraint Optimization Problems Optimally, Efficiently and Asynchronously. Artificial Intelligence Journal (accepted).
D.V. Pynadath and M. Tambe. Automated teamwork among heterogeneous software agents and humans. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 7:71-100, 2003.
Nair, R., Tambe, M., Yokoo, M., Pynadath, D. and Marsella, S. Taming Decentralized POMDPs: Towards efficient policy computation for multiagent settings. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003.
Nair, R., Tambe, M., and Marsella, S. Role allocation and reallocation in multiagent teams: Towards a practical analysis. Proceedings of the second International Joint Conference on Agents and Multiagent Systems (AAMAS), 2003.
Scerri, P., Johnson, L., Pynadath, D., Rosenbloom, P., Si, M., Schurr, N. and Tambe, M. A prototype infrastructure for distributed robot, agent, person teams. Proceedings of the second International Joint Conference on Agents and Multiagent Systems (AAMAS), 2003.
Scerri, P., Pynadath, D. and Tambe, M. Towards adjustable autonomy for the real-world. Journal of AI Research (JAIR), 2002, Volume 17, Pages 171-228.
Pynadath, D. and Tambe, M. The communicative multiagent team decision problem: Analyzing teamwork theories and models. Journal of AI Research (JAIR), 2002.
Kaminka, G., Pynadath, D. and Tambe, M. Monitoring teams by overhearing: A multiagent plan-recognition approach. Journal of AI Research (JAIR), 2002.
33
All the Co-authors