Building Practical Agent Teams: A Hybrid Perspective. Milind Tambe, Computer Science Dept, University of Southern California. Joint work with the TEAMCORE group. SBIA 2004.

Long-Term Research Goal: building large-scale heterogeneous teams.
Types of entities: agents, people, sensors, resources, robots, ...
Scale: 1000s or more
Domains: highly uncertain, real-time, dynamic
Activities: form teams, persist for long durations, coordinate, adapt, ...
Some applications: large-scale disaster rescue, agent-facilitated human organizations, large-area security

Domains and Motivations. [Slide chart: team scale and complexity (small-scale homogeneous, small-scale heterogeneous, large-scale heterogeneous) plotted against task and domain complexity (low, medium, high).]

Motivation: BDI+POMDP Hybrids. [Slide figure: a TOP (team plans, organizations, agents) with plans such as Rescue Civilians, Extinguish Fires, and Clear Roads assigned to an ambulance team, a fire company, and a RAP team, executed through Teamcore proxies; alternatively, an optimal policy computed using distributed partially observable Markov decision processes (POMDPs).]
BDI approach. Frameworks: Teamcore/Machinetta, GPGP, ... Pro: ease of use for human developers; coordinates large-scale teams. Con: quantitative team evaluations are difficult (given uncertainty and costs).
Distributed POMDP approach. Frameworks: MTDP, DEC-MDP/DEC-POMDP, POIPSG, ... Pro: quantitative evaluation of team performance is easy (with uncertainty). Con: scale-up is difficult, and it is difficult for human developers to program policies.

BDI + POMDP Synergy. Combine "traditional" TOP approaches with distributed POMDPs, using distributed POMDPs for TOP and proxy analysis and refinement. POMDPs improve the TOP and proxies (e.g., improved role allocation), while the TOP constrains the POMDP policy search (orders-of-magnitude speedup).

Overall Research Framework (teamwork proxy infrastructure).
Role allocation algorithms:
- Agent-agent, offline optimal: Adopt DCOP, asynchronous complete distributed constraint optimization (Modi et al. 03, Maheswaran et al. 04); on-line approximate: equilibrium and threshold methods (Okamoto 03, Maheswaran et al. 04).
- Agent-human (adjustable autonomy), offline optimal: optimal transfer-of-control strategies via MDPs/POMDPs (Scerri et al. 02); on-line approximate: open (?).
Communication algorithms:
- Agent-agent, explicit: BDI teamwork theories plus a decision-theoretic filter (Pynadath/Tambe 03); implicit (plan recognition): socially attentive monitoring (Kaminka et al. 00).
- Agent-human, explicit: open (?); implicit: monitoring by overhearing (Kaminka et al. 02).
Distributed POMDP analysis: the Multiagent Team Decision Problem (MTDP) (Nair et al. 03b, Nair et al. 04, Paruchuri et al. 04).

Electric Elves: ran 24/7 from 6/00 to 12/00 (Chalupsky et al., IAAI 2001). Teamcore proxies connected people with a scheduler agent and an interest matcher to reschedule meetings, decide presenters, and order meals. "More and more computers are ordering food, ... we need to think about marketing [to these computers]" - local Subway owner.

Modules within the Proxies: Adjustable Autonomy (Scerri, Pynadath and Tambe, JAIR 2002). Each Teamcore proxy runs a team-oriented program through proxy algorithms for communication, role allocation, and adjustable autonomy. Adjustable autonomy uses MDPs to compute transfer-of-control policies: for a meeting role such as "user arrives on time," the MDP policy is a planned sequence of transfers of control and coordination changes, e.g., ADAH: ask, delay, ask, cancel.
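A minimal sketch of how the expected utility of one transfer-of-control sequence might be scored, assuming discrete time steps, a fixed per-step response probability and decision quality per controller, and a linear waiting cost. The function, names, and numbers are illustrative assumptions, not the JAIR 2002 formulation.

```python
def strategy_expected_utility(strategy, steps_per_segment, p_respond, quality, wait_cost):
    """strategy: sequence of controllers, e.g. ['user', 'delay', 'user', 'agent']."""
    eu = 0.0
    p_undecided = 1.0      # probability the decision is still pending
    elapsed = 0
    for controller in strategy:
        for _ in range(steps_per_segment):
            elapsed += 1
            p_now = p_undecided * p_respond.get(controller, 0.0)
            eu += p_now * (quality[controller] - wait_cost * elapsed)
            p_undecided -= p_now
    # Assumption: if nobody has decided by the end, the agent acts autonomously.
    eu += p_undecided * (quality['agent'] - wait_cost * elapsed)
    return eu

# Compare an "ask, delay, ask, then act" strategy against acting immediately.
p_respond = {'user': 0.3, 'delay': 0.0, 'agent': 1.0}
quality = {'user': 10.0, 'delay': 0.0, 'agent': 4.0}
print(strategy_expected_utility(['user', 'delay', 'user', 'agent'], 3, p_respond, quality, 0.2))
print(strategy_expected_utility(['agent'], 1, p_respond, quality, 0.2))
```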

Back to Hybrid BDI-POMDP Frameworks

Motivation: Communication in Proxies. The proxy's heuristic "BDI" communication rules, for example:
Rule 1 ("joint intentions", Levesque et al. 90): If a fact F is in the agent's private state, AND F matches a goal of the team's plan, AND F is not in the team state, then there is a possible communicative goal CG to communicate F.
Rule 2: If there is a possible communicative goal CG, AND the expected miscoordination cost exceeds the communication cost, then communicate CG.
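A minimal sketch of these two rules in code, assuming facts, goals, and the team state are plain sets and that the miscoordination probability and costs are supplied by the caller; the names and numbers are illustrative, not the Teamcore proxy implementation.

```python
def rule1_candidate_goals(private_facts, team_plan_goals, team_state):
    """Rule 1: a privately known fact that matches a team-plan goal but is not
    yet in the mutually believed team state becomes a candidate communicative goal."""
    return [f for f in private_facts if f in team_plan_goals and f not in team_state]

def rule2_should_communicate(p_miscoordination, miscoordination_cost, communication_cost):
    """Rule 2: communicate only if the expected cost of staying silent exceeds
    the cost of communicating."""
    return p_miscoordination * miscoordination_cost > communication_cost

private_facts = {"fire_at_station_3_extinguished"}
team_plan_goals = {"fire_at_station_3_extinguished"}
team_state = set()
for goal in rule1_candidate_goals(private_facts, team_plan_goals, team_state):
    if rule2_should_communicate(p_miscoordination=0.6, miscoordination_cost=10.0, communication_cost=1.0):
        print("communicate:", goal)
```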

Motivation: Earlier BDI Evaluation. Communication selectivity was tested empirically (Pynadath & Tambe, JAAMAS 2003), as was teamwork in RoboCup (Tambe et al., IJCAI 1999), e.g., Comm vs. NoComm configurations of ISIS97 and ISIS98 against CMUnited and Andhill, and in the helicopter domain. However, quantitative analysis of optimality, or of the complexity of computing the optimal response, remained difficult - a challenge in domains with significant uncertainty and costs.

Distributed POMDPs: COM-MTDP (Pynadath and Tambe, 02) and RMTDP (Nair, Tambe, Marsella 03).
S: states of the world (e.g., helicopter position, enemy position)
A_i: actions of agent i (communication actions and domain actions)
P: state transition probabilities
R: reward, sub-divided based on action types
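A minimal sketch of these model components gathered into one container, assuming a small tabular (dictionary-based) representation; the field layout is an illustrative assumption, not the formal COM-MTDP/RMTDP definition (the observation function O is introduced on the next slide).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

JointAction = Tuple[str, ...]   # one action per agent
JointObs = Tuple[str, ...]      # one observation per agent

@dataclass
class DistributedPOMDPModel:
    states: FrozenSet[str]                                             # S
    actions: Dict[str, FrozenSet[str]]                                 # A_i per agent (domain + communication)
    transition: Dict[Tuple[str, JointAction], Dict[str, float]]        # P(s' | s, joint action)
    observation: Dict[Tuple[str, JointAction], Dict[JointObs, float]]  # O(joint obs | s', past joint action)
    reward: Dict[Tuple[str, JointAction], float]                       # R(s, joint action), split by action type
```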

COM-MTDP: Analysis of Communication.
Ω_i: observations of agent i (e.g., E = enemy-on-radar, NE = enemy-not-on-radar)
O: probability of an observation given the destination state and the past action
B_i: belief state of agent i (a history of its observations and messages)
Individual policies: a domain-action policy π_iA : B_i → domain action, and a communication policy π_iΣ : B_i → communication
Goal: find joint policies π_A and π_Σ that maximize the total expected reward
[Slide figure: an observation table per state and previous action, e.g., states Landmark1 and Landmark2 with joint entries over {E, NE}.]
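Stated compactly, the objective on this slide can be written as below; this uses generic Dec-POMDP notation over a finite horizon T and an initial belief b_0, which may differ in detail from the notation of the COM-MTDP paper.

```latex
\max_{\pi_A,\;\pi_\Sigma} \;
\mathbb{E}\!\left[\sum_{t=0}^{T-1} R\big(s_t, a_t, \sigma_t\big) \,\middle|\, \pi_A, \pi_\Sigma, b_0\right],
\qquad a_{i,t} = \pi_{iA}(B_{i,t}), \quad \sigma_{i,t} = \pi_{i\Sigma}(B_{i,t}).
```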

Complexity Results in COM-MTDP, by observability (columns) and communication (rows):

                         Individual obs.   Collective obs.   Collective partial obs.   No observability
No communication         P-complete        NEXP-complete     NEXP-complete             NP-complete
General communication    P-complete        NEXP-complete     NEXP-complete             NP-complete
Full communication       P-complete        P-complete        PSPACE-complete           NP-complete

Given this complexity, two approaches are pursued: I. locally optimal solutions (no guarantee of global team optimality); II. a hybrid approach: POMDP + BDI.

Approach I: Locally Optimal Policy (Nair et al. 03). Repeat until convergence to a local equilibrium: for each agent K, fix the policies of all agents except K and find the optimal response policy for agent K.
Finding the optimal response policy for agent K, given fixed policies for the others, becomes the problem of solving a single-agent POMDP:
- define an "extended" state as the world state together with the other agents' observation histories, rather than the world state alone;
- define a new transition function and a new observation function over extended states;
- define a multiagent belief state and apply dynamic programming over belief states.
This gives a significant speedup over exhaustive search, but the problem sizes that can be handled remain limited.
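A minimal sketch of the alternating best-response loop just described (in the spirit of Nair et al. 03). The functions evaluate_joint and best_response_policy are assumed to be supplied by the caller; the latter stands in for solving the single-agent POMDP over extended states and is not implemented here.

```python
def joint_equilibrium_search(agents, initial_policies, evaluate_joint,
                             best_response_policy, max_iters=100):
    """Alternate best responses until no single agent can improve the joint value."""
    policies = dict(initial_policies)
    value = evaluate_joint(policies)
    for _ in range(max_iters):
        improved = False
        for k in agents:
            # Fix every other agent's policy and compute agent k's best response.
            others = {a: p for a, p in policies.items() if a != k}
            candidate = best_response_policy(k, others)
            new_policies = dict(policies)
            new_policies[k] = candidate
            new_value = evaluate_joint(new_policies)
            if new_value > value:          # keep strict improvements only
                policies, value, improved = new_policies, new_value, True
        if not improved:                   # local equilibrium reached
            break
    return policies, value
```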

Approach II: Hybrid BDI + POMDP. Build a distributed POMDP model (COM-MTDP) of the domain while exploiting the TOP: the domain-action policy π_A is fixed by the team-oriented program and its proxy algorithms (communication, role allocation, adjustable autonomy), and alternative communication policies π_Σ are varied. The COM-MTDP is used to evaluate alternative communication policies, to derive locally and globally optimal communication policies, and to feed the results back for modifying the proxy communication algorithms.

Compare Communication Policies over Different Domains. Given a domain, for different observability conditions and communication costs, evaluate Teamcore (Rule 1 + Rule 2), Jennings' approach, and others, and compare them with the optimal communication policy; evaluating the optimal policy is exponential in the horizon, on the order of O((|S|·|Ω|)^T).

Distributed POMDPs to Analyze Role Allocations: RMTDP

Role Allocation: Illustration. Task: move cargo from X to Y; there is a large reward for cargo reaching the destination. There are three routes with varying lengths and failure rates, and scouts make a route safe for transports. Uncertainty arises in both actions and observations: scouts may fail along a route (and transports may replace failed scouts), the failure rate decreases when more scouts are assigned to a route, and a scout's failure may not be observable to the transports.

Team-Oriented Program: organization hierarchy and plan hierarchy. The question is the best initial role allocation: how many helicopters go to SctTeam A, B, C and to Transport? Given the TOP, almost the entire RMTDP policy is already fixed; the only policy gap is at step 1, the best role allocation in the initial state for each agent. Assuming six helicopter agents, there are 84 possible allocations (84 candidate RMTDP policies).
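As a quick sanity check on that count, a small sketch that enumerates the ways to split six interchangeable helicopters across the three scout teams and the transport team (the role names follow the slide; the enumeration is ordinary stars-and-bars counting):

```python
from itertools import product

roles = ("SctTeamA", "SctTeamB", "SctTeamC", "Transport")
# All non-negative assignments of 6 helicopters to the four roles.
allocations = [alloc for alloc in product(range(7), repeat=len(roles)) if sum(alloc) == 6]
print(len(allocations))   # 84 candidate initial role allocations
```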

Analyzing Role Allocation in Teamwork. Build a distributed POMDP model (RMTDP) from the domain and the team-oriented program (with its proxy algorithms for role allocation, communication, and adjustable autonomy). RMTDP evaluates alternative role-taking policies and searches the policy space for the optimal one: the role-execution policy is fixed by the TOP, and only the gaps in the role-taking policy are filled in. The result is fed back as a specific role allocation in the TOP.

RMTDP Policy Search: Efficiency Improvements.
- Belief-based policy evaluation: maintain not entire observation histories but only the beliefs required by the TOP.
- Form hierarchical policy groups for branch-and-bound search: obtain an upper bound on the values of the policies within a policy group, and if an individual policy is valued higher than a group's upper bound, prune that group.
- Exploit the TOP both for generating the policy groups and for computing the upper bounds.

MaxExp: Hierarchical Policy Groups

MaxExp: Upper Bound on Policy-Group Value. Obtain the maximum value of each plan component over all of its start states and observation histories. If the components were independent, each could be evaluated separately; dependence is handled by starting the next component from the end state of the previous one. Why this speeds things up: there are no duplicate start states (multiple paths through the previous component merge) and no duplicate observation histories. [Slide figure: components DoScouting (e.g., 2 scouts, 4 transports; 84 policy groups), DoTransport (transports carried over from the previous component; 3300), and RemainScouts (scouts carried over; 36), with example allocations such as Team-A=2, Team-B=0, Team-C=0, Transport=4.]
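A minimal sketch of the resulting branch-and-bound search over hierarchical policy groups, assuming caller-supplied functions for the MaxExp-style upper bound, for expanding a group into concrete policies, and for exact (belief-based) policy evaluation; these placeholders are not the RMTDP implementation.

```python
def branch_and_bound(policy_groups, upper_bound, expand, evaluate):
    """Return the best concrete policy, pruning whole groups by their upper bound."""
    best_policy, best_value = None, float("-inf")
    # Visit the most promising groups first so pruning kicks in early.
    for group in sorted(policy_groups, key=upper_bound, reverse=True):
        if upper_bound(group) <= best_value:
            continue                      # prune: nothing in this group can beat the incumbent
        for policy in expand(group):
            value = evaluate(policy)      # exact, belief-based evaluation
            if value > best_value:
                best_policy, best_value = policy, value
    return best_policy, best_value
```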

Helicopter Domain: Computational Savings. Methods compared:
- NOPRUNE-OBS: no pruning, maintaining full observation histories
- NOPRUNE: no pruning, maintaining beliefs rather than observation histories
- MAXEXP: pruning with the MaxExp heuristic, using beliefs
- NOFAIL: MaxExp enhanced with a "no failure" assumption for a quicker upper bound

Does RMTDP Improve Role Allocation?

RoboCup Rescue: Computational Savings

RoboCupRescue: RMTDP Improves Role Allocation

Summary. COM-MTDP and RMTDP: distributed POMDPs for analysis. Combine "traditional" TOP approaches (team plans, organizations, agents, proxies) with distributed POMDPs: exploit POMDPs to improve the TOP and Teamcore proxies, and exploit the TOP to constrain the POMDP policy search. Key policy-evaluation complexity results were established along the way.

Future Work: agent-based simulation technology, visualization, and trainee interaction.

Thank You Contact: Milind Tambe

Key Papers Cited in this Presentation:
- Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed Algorithms for DCOP: A Graphical Game-Based Approach. Proceedings of the 17th International Conference on Parallel and Distributed Computing Systems (PDCS), 2004.
- Praveen Paruchuri, Milind Tambe, Fernando Ordonez, and Sarit Kraus. Towards a Formalization of Teamwork with Resource Constraints. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004.
- Ranjit Nair, Maayan Roth, Makoto Yokoo, and Milind Tambe. Communication for Improving Policy Computation in Distributed POMDPs. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004.
- Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, and Pradeep Varakantham. Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Event Scheduling. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2004.
- P. J. Modi, W. Shen, M. Tambe, and M. Yokoo. Solving Distributed Constraint Optimization Problems Optimally, Efficiently and Asynchronously. Artificial Intelligence Journal (accepted).
- D. V. Pynadath and M. Tambe. Automated Teamwork Among Heterogeneous Software Agents and Humans. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 7, 2003.
- R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella. Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003.
- R. Nair, M. Tambe, and S. Marsella. Role Allocation and Reallocation in Multiagent Teams: Towards a Practical Analysis. Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2003.
- P. Scerri, L. Johnson, D. Pynadath, P. Rosenbloom, M. Si, N. Schurr, and M. Tambe. A Prototype Infrastructure for Distributed Robot, Agent, Person Teams. Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2003.
- P. Scerri, D. Pynadath, and M. Tambe. Towards Adjustable Autonomy for the Real World. Journal of AI Research (JAIR), Volume 17, 2002.
- D. Pynadath and M. Tambe. The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models. Journal of AI Research (JAIR), 2002.
- G. Kaminka, D. Pynadath, and M. Tambe. Monitoring Teams by Overhearing: A Multiagent Plan-Recognition Approach. Journal of AI Research (JAIR), 2002.

All the Co-authors