1 Multiagent Teamwork: Analyzing the Optimality and Complexity of Key Theories and Models
David V. Pynadath and Milind Tambe
Information Sciences Institute and Department of Computer Science, University of Southern California

2 Agent Teamwork
Agents, robots, sensors, spacecraft, etc.:
- Performing a common task
- Operating in an uncertain environment
- Distinct, uncertain observations
- Distinct actions with uncertain effects
- Limited, costly communication
[Images of example domains: Battlefield Simulation, Satellite Clusters, Disaster Rescue]

3 Motivation
[Figure: performance vs. complexity, locating optimal theoretical approaches, no-communication approaches, and practical systems, with question marks over the unexplored space the new algorithm targets]
Outline of Results:
1) Unified teamwork framework
2) Complexity of optimal teamwork
3) New coordination algorithm
4) Optimality-complexity evaluation of existing methods

4 Example Domain: Helicopter Team
[Figure: helicopters flying toward a goal past an enemy radar; speech bubbles: "Did they see that?" and "I destroyed the enemy radar."]

5 Communicative Multiagent Team Decision Problem (COM-MTDP)
S: states of the world
- e.g., position of helicopters, position of the enemy
A: domain-level actions
- e.g., fly below radar, fly normal altitude
P: transition probability function
- e.g., world dynamics, effects of actions
Σ: communication capabilities, possible "speech acts"
- e.g., "I have destroyed enemy radar."

6 COM-MTDPs (cont'd)
Ω: observations
- e.g., enemy radar, position of other helicopter
O: probability (for each agent) of observation
- Maps state and actions into a distribution over observations (e.g., sensor noise model)
R: reward (over states, actions, messages)
- e.g., good if we reach the destination, better if we reach it earlier
- e.g., saying, "I have destroyed enemy," has a cost
Teamwork Definition:
- All members share the same preferences (i.e., R)
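To make the tuple concrete, here is a minimal Python sketch of a COM-MTDP ⟨S, A, Σ, P, Ω, O, R⟩ as a data structure. All class and field names are illustrative choices for exposition, not taken from the paper or its released source code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

State = str        # e.g., "radar_alive_helos_en_route"
Action = str       # domain-level action, e.g., "fly_below_radar"
Message = str      # speech act, e.g., "I have destroyed enemy radar."
Observation = str  # e.g., "saw_enemy_radar"

@dataclass(frozen=True)
class COMMTDP:
    """The COM-MTDP tuple <S, A, Sigma, P, Omega, O, R> (names illustrative)."""
    states: FrozenSet[State]                          # S: states of the world
    actions: Dict[str, FrozenSet[Action]]             # A: each agent's domain-level actions
    messages: Dict[str, FrozenSet[Message]]           # Sigma: each agent's speech acts
    transition: Callable[[State, Tuple[Action, ...]], Dict[State, float]]  # P(s' | s, joint action)
    observations: Dict[str, FrozenSet[Observation]]   # Omega: each agent's possible observations
    obs_prob: Callable[[str, State, Tuple[Action, ...]], Dict[Observation, float]]  # O, per agent
    reward: Callable[[State, Tuple[Action, ...], Tuple[Message, ...]], float]  # R, shared by the team
```

The single `reward` function shared by every agent is exactly what the slide's teamwork definition requires: all members optimize the same R.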

7 Problem Complexity
[Figure/table: complexity of solving COM-MTDPs as communication ranges from none to free, and as observability ranges from collectively observable to individually observable]

8 To Communicate or Not To Communicate
Local decision of one agent at a single point in time:
- "I have achieved a joint goal."
- "Should I tell my teammate?"
Joint intentions theory: "I must attain mutual belief."
- Always communicate [Jennings]
STEAM: "I must communicate if the expected cost of miscoordination outweighs the cost of communication." [Tambe]
- Each cost is a fixed parameter specified by the designer (see the sketch below)
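A minimal sketch of the STEAM-style decision rule just described, assuming the designer supplies the miscoordination probability and the two fixed costs as parameters; the function and parameter names here are hypothetical, not from the STEAM implementation.

```python
def steam_should_communicate(p_miscoordination: float,
                             cost_miscoordination: float,
                             cost_communication: float) -> bool:
    """Communicate when the expected cost of miscoordination
    (probability times cost) outweighs the fixed cost of talking."""
    return p_miscoordination * cost_miscoordination > cost_communication

# Example: likely miscoordination and cheap messages => communicate.
assert steam_should_communicate(0.8, 10.0, 1.0)
```

Because the costs are fixed design-time parameters rather than quantities derived from the current state, the rule is cheap to evaluate but can over- or under-communicate relative to the optimal policy, which motivates the criterion on the next slide.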

9 Locally Optimal Criterion for Communication
Communicate if and only if:
E[R | communicate] > E[R | do not communicate]
where:
- the expectation is taken over possible histories of states and beliefs up to the current time;
- E[R | communicate] is the expected reward over future trajectories of states and beliefs WITH communication, less the expected cost of communicating;
- E[R | do not communicate] is the expected reward over future trajectories of states and beliefs WITHOUT communication.
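A minimal sketch of this locally optimal test, assuming the agent has a belief (a distribution over states or state histories) and a `future_value` oracle giving the team's expected future reward with and without the message. All names are illustrative; a real implementation would compute `future_value` by evaluating the team's policy over future trajectories of states and beliefs.

```python
from typing import Callable, Dict, Hashable

State = Hashable  # a world state (or a history of states and beliefs)

def expected_reward(belief: Dict[State, float],
                    future_value: Callable[[State, bool], float],
                    communicate: bool,
                    comm_cost: float) -> float:
    """E[R | communicate?]: weight each state's future value by its
    belief probability, charging the message cost if we communicate."""
    value = sum(p * future_value(s, communicate) for s, p in belief.items())
    return value - (comm_cost if communicate else 0.0)

def should_communicate(belief: Dict[State, float],
                       future_value: Callable[[State, bool], float],
                       comm_cost: float) -> bool:
    """Locally optimal rule: communicate iff
    E[R | communicate] > E[R | do not communicate]."""
    return (expected_reward(belief, future_value, True, comm_cost)
            > expected_reward(belief, future_value, False, comm_cost))
```

Unlike the STEAM rule, the comparison here depends on the agent's current belief, so the same message may be worth sending in one history and not in another.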

10 Empirical Results
[Figure: V_opt − V (loss relative to the optimal policy) as a function of communication cost and observability]

11 Empirical Results
[Figure only]

12 Empirical Results
[Figure only]

13 Optimality vs. Complexity
[Figure: optimality (E[R]) vs. run time in seconds (log scale) for the Globally Optimal, Locally Optimal, STEAM, Jennings, and Silent policies; Observability = 0.2, Comm. Cost = 0.7]

14 Optimality vs. Complexity
[Figure: optimality (E[R]) vs. run time in seconds (log scale) for the Globally Optimal, Locally Optimal, STEAM, Jennings, and Silent policies; Observability = 0.2, Comm. Cost = 0.3]

15 Summary
COM-MTDPs provide a unified framework for agent teamwork:
- Representation subsumes many existing agent models
- Policy space subsumes many existing prescriptive theories
This framework supports deeper analyses of teamwork problems:
- Quantitative characterization of the optimality-efficiency tradeoff, for different policies, in different domains
- Derivation of novel coordination algorithms
Also available: detailed proofs, source code, and the JAIR article.