Multi-Agent Exploration Matthew E. Taylor http://teamcore.usc.edu/taylorm/
DCOPs: Distributed Constraint Optimization Problems
Domains: multi-agent plan coordination, sensor networks, meeting scheduling, traffic light coordination, RoboCup soccer
Properties: distributed, robust to failure, scalable, (in)complete algorithms with quality bounds
DCOP Framework
[Figure: three agents a1, a2, a3 connected by constraints, each constraint with a reward table, e.g., rewards of 10 and 6 for different joint assignments]
Different "levels" of coordination are possible, from 1-opt up to fully centralized (note: this is not graph coloring). A minimal encoding is sketched below.
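A minimal sketch of how a DCOP like the one in the figure can be encoded, assuming a simple dict-based representation; the agent names, domains, and reward values are illustrative stand-ins for the 10/6 tables above.

```python
from itertools import product

# Each agent controls one variable; domains are kept tiny for illustration.
variables = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}

# Pairwise reward tables: reward for each joint assignment of the two agents.
constraints = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 6, (1, 0): 6, (1, 1): 10},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 6, (1, 0): 6, (1, 1): 10},
}

def total_reward(assignment):
    """Sum the reward of every constraint under a full assignment."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in constraints.items())

# Centralized exhaustive search: the fully coordinated end of the spectrum.
best = max((dict(zip(variables, vals)) for vals in product(*variables.values())),
           key=total_reward)
print(best, total_reward(best))
```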
Motivation: DCOP Extension
Standard DCOPs are often unrealistic: the environment is not fully known! Agents need to learn while maximizing total reward.
Real-world applications: mobile ad-hoc networks, sensor networks.
Mobile wireless sensor networks are increasingly deployed in the real world, e.g., autonomous underwater vehicles collecting oceanic data or robots deployed in urban environments, such as robots establishing communication in a disaster scenario. These settings pose different challenges: rewards are unknown (a robot does not know in advance whether moving from one grid cell to another will help), the sensor network has limited time, which rules out extensive exploration, and anytime performance matters: if the experiment runs for 2 hours, the cumulative reward over those 2 hours must be good.
Problem Statement
DCEE: Distributed Coordination of Exploration & Exploitation
Challenges addressed: local communication, a network of (known) interactions, cooperative agents, unknown rewards, maximizing on-line reward, a limited time horizon, and an (effectively) infinite reward matrix.
Mobile Ad-Hoc Network
Rewards: signal strength between agents, on [1, 200]
Goal: maximize total signal strength over time
Assumes: small-scale fading dominates; the network topology is fixed
[Figure: four agents a1-a4 with example link strengths such as a1-a2 = 100, a2-a3 = 50, a3-a4 = 75]
A sketch of this reward model follows.
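A hedged sketch of the DCEE reward model described above: each link's signal strength is an unknown value on [1, 200], revealed only when the agents visit a pair of locations. The class name and the uniform sampling are assumptions for illustration; the real distribution is whatever the environment provides.

```python
import random

class Link:
    """One constraint (link) between two neighboring mobile agents."""

    def __init__(self):
        self._hidden = {}  # true signal strength per pair of locations, initially unknown

    def observe(self, loc_i, loc_j):
        """Reveal (and cache) the signal strength for this pair of locations."""
        key = (loc_i, loc_j)
        if key not in self._hidden:
            self._hidden[key] = random.randint(1, 200)  # assumed uniform on [1, 200]
        return self._hidden[key]

link = Link()
print(link.observe("cell_3", "cell_7"))  # first visit: the reward becomes known
print(link.observe("cell_3", "cell_7"))  # revisiting gives the same value (fixed topology)
```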
MGM Review Ideas?
Static Estimation: SE-Optimistic
Rewards are on [1, 200], so each agent optimistically assumes "If I move, I'd get R = 200" on every link.
[Figure: chain a1-a2-a3-a4 with current link rewards 100, 50, 75]
Static Estimation: SE-Optimistic
Each agent bids the optimistic gain of moving, summed over its links: "If I move, I'd gain..." 100 for a1, 250 for a2, 275 for a3, and 125 for a4 (see the sketch below).
[Figure: chain a1-a2-a3-a4 with current link rewards 100, 50, 75]
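A minimal sketch of how the SE-Optimistic bids above can be computed, assuming the chain a1-a2-a3-a4 with link rewards 100, 50, 75 from the figure; the dict encoding is an assumption for illustration.

```python
R_MAX = 200  # top of the reward range [1, 200]
links = {("a1", "a2"): 100, ("a2", "a3"): 50, ("a3", "a4"): 75}

def optimistic_gain(agent):
    """Optimistically assume every incident link would pay R_MAX after a move."""
    return sum(R_MAX - r for (i, j), r in links.items() if agent in (i, j))

for a in ("a1", "a2", "a3", "a4"):
    print(a, optimistic_gain(a))  # a1: 100, a2: 250, a3: 275, a4: 125
```

As in MGM, each agent then shares its bid with its neighbors, and only the highest bidder in a neighborhood actually moves that round.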
Results: Simulation
Maximize total reward: the area under the curve.
[Plot: SE-Optimistic vs. No Movement]
Balanced Exploration Techniques: BE-Backtrack
Balanced Exploration with Backtracking makes a decision-theoretic calculation of exploration: it assumes the reward distribution is known and uses the current reward, the time left, and the distribution information to estimate the utility of exploring.
Each agent tracks its previous best location (with reward Rb) and can backtrack to it; it bids to explore for some number of steps te. BE techniques are more complicated than SE techniques: they require more computation and are harder to implement. Whereas SE agents re-evaluate every time step, a BE-Backtrack agent can commit to an action for more than one round.
The agent compares two actions, Backtrack and Explore. E.U.(explore) is the sum of three terms (see the sketch below):
1. the reward while exploring
2. the reward while exploiting × P(improve on Rb), i.e., the utility of finding a better reward than the current Rb
3. the reward while exploiting × P(NOT improve on Rb), i.e., the utility of failing to find a better reward than Rb
Because agents that explore can always backtrack afterwards, exploration can never reduce the overall reward.
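A hedged sketch of the three-term comparison, assuming the reward distribution's CDF and mean are known. `expected_max_given_improve` is a hypothetical helper standing in for the expected value of the best reward found when it does beat Rb; the exact expectations in the published algorithm may differ.

```python
def eu_backtrack(Rb, T):
    """Backtrack now and exploit the best known reward Rb for all T remaining rounds."""
    return Rb * T

def eu_explore(Rb, T, te, mean_reward, cdf, expected_max_given_improve):
    """Explore for te rounds, then exploit the best reward found (or backtrack to Rb)."""
    # 1) reward while exploring
    explore_term = te * mean_reward
    # probability that the best of te samples improves on Rb
    p_improve = 1.0 - cdf(Rb) ** te
    # 2) reward while exploiting x P(improve reward)
    win_term = p_improve * expected_max_given_improve(Rb, te) * (T - te)
    # 3) reward while exploiting x P(NOT improve reward): backtrack to Rb
    lose_term = (1.0 - p_improve) * Rb * (T - te)
    return explore_term + win_term + lose_term

# Example with a uniform distribution on [1, 200]; the crude (Rb + 200) / 2
# stand-in for the conditional expected maximum is purely illustrative.
print(eu_backtrack(120, 10))
print(eu_explore(120, 10, 3, 100.5, lambda r: (r - 1) / 199,
                 lambda Rb, te: (Rb + 200) / 2))
```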
Results: Simulation
Maximize total reward: the area under the curve.
[Plot: BE-Backtrack vs. SE-Optimistic vs. No Movement]
Omniscient Algorithm
(Artificially) convert the DCEE to a DCOP with known rewards, then run the MGM algorithm [Pearce & Tambe, 2007] to quickly find a local optimum. This establishes an upper bound, but only works in simulation.
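For reference, a minimal sketch of one synchronous MGM round over known reward tables, which is what the omniscient bound runs; the function-parameter encoding and the strict tie-breaking are simplifying assumptions rather than the paper's exact formulation.

```python
def mgm_round(assignment, neighbors, gain_of_best_move, best_move):
    """One MGM step: every agent computes its best local gain, and only agents
    whose gain strictly beats all of their neighbors' gains actually move."""
    gains = {a: gain_of_best_move(a, assignment) for a in assignment}
    new_assignment = dict(assignment)
    for a in assignment:
        if gains[a] > 0 and all(gains[a] > gains[n] for n in neighbors[a]):
            new_assignment[a] = best_move(a, assignment)
    return new_assignment
```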
Results: Simulation
Maximize total reward: the area under the curve.
[Plot: Omniscient vs. BE-Backtrack vs. SE-Optimistic vs. No Movement]
Balanced Exploration Techniques: BE-Rebid
Balanced Exploration with Backtracking and Rebidding: agents can backtrack, but re-evaluate the explore/backtrack decision every time step [Montemerlo04], allowing for on-the-fly reasoning (see the sketch below).
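A sketch of the difference from BE-Backtrack, assuming the expected-utility helpers from the earlier sketch (here passed in as callables that already close over the distribution information): the agent still picks the best exploration horizon, but commits to only one round of it and rebids at the next time step. All names are illustrative.

```python
def be_rebid_step(Rb, rounds_left, eu_backtrack, eu_explore, explore, backtrack):
    """Re-run the explore-vs-backtrack comparison from scratch this round."""
    # Best exploration horizon available right now (eu_explore(Rb, T, te) assumed).
    best_te = max(range(1, rounds_left + 1),
                  key=lambda te: eu_explore(Rb, rounds_left, te))
    if eu_explore(Rb, rounds_left, best_te) > eu_backtrack(Rb, rounds_left):
        return explore()   # explore for just this round, then rebid next round
    return backtrack()     # return to (and exploit) the best known location
```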
Balanced Exploration Techniques: BE-Stay
Agents are unable to backtrack, which is true for some types of robots. BE-Stay takes a dynamic-programming approach; as before, we assume that no neighbors move while these values are calculated (a small sketch follows).
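A small dynamic-programming sketch of the BE-Stay value calculation for a single agent, assuming a discrete uniform reward distribution on [1, 200] and, as noted above, that neighbors hold still while the values are computed; the recursion structure is an illustration, not the paper's exact formulation.

```python
from functools import lru_cache
import statistics

REWARDS = range(1, 201)            # possible per-step rewards, assumed uniform
MEAN = statistics.mean(REWARDS)    # expected reward of one exploration step

@lru_cache(maxsize=None)
def value(current, rounds_left):
    """Best expected total reward when the agent cannot backtrack."""
    if rounds_left == 0:
        return 0.0
    stay = current * rounds_left   # keep the current reward for every remaining round
    # Explore: earn an expected MEAN this round, then face the same decision
    # from whatever reward the new location turns out to give.
    explore = MEAN + sum(value(r, rounds_left - 1) for r in REWARDS) / len(REWARDS)
    return max(stay, explore)

print(value(120, 5))   # e.g., current reward 120 with 5 rounds remaining
```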
Results (simulation): 10 agents, random graphs with 15-20 links
Results (simulation): chain topology, 100 rounds
Results (simulation): 20 agents, 100 rounds
Also Tested on Physical Robots
Used iRobot Creates (unfortunately, they don't vacuum), Cengen hardware, etc.
Sample Robot Results
k-Optimality
Increased coordination: find pairs of agents to jointly change their variables (locations), at the cost of higher communication overhead (see the sketch below).
Algorithm variants: SE-Optimistic, SE-Optimistic-2, SE-Optimistic-3, SE-Mean, SE-Mean-2, BE-Rebid, BE-Rebid-2, BE-Stay, BE-Stay-2
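A sketch of the "-2" idea for SE-Optimistic, assuming the same chain and dict encoding as before: a pair of neighboring agents computes one joint optimistic bid over every link incident to either of them, with their shared link counted once. The extra pairwise messaging is where the higher communication overhead comes from.

```python
R_MAX = 200
links = {("a1", "a2"): 100, ("a2", "a3"): 50, ("a3", "a4"): 75}

def incident(agent):
    """All links touching this agent."""
    return {key: r for key, r in links.items() if agent in key}

def joint_optimistic_gain(a, b):
    """Optimistic gain if agents a and b move together (shared link counted once)."""
    touched = {**incident(a), **incident(b)}
    return sum(R_MAX - r for r in touched.values())

print(joint_optimistic_gain("a2", "a3"))   # (200-100) + (200-50) + (200-75) = 375
```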
Confirm Previous DCOP Results: when rewards are (artificially) provided, k=2 outperforms k=1.
Sample coordination results
[Plots: full graph, chain graph]
Surprising Result: Increased Coordination can Hurt
Regular Graphs