Towards a Theoretic Understanding of DCEE Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe Lafayette College http://teamcore.usc.edu
Forward Pointer: When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty. Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, & Milind Tambe. Wednesday, 8:30 – 10:30, Coordination and Cooperation 1
Teamwork: Foundational MAS Concept. Joint actions improve the outcome, but increase communication & computation. Over two decades of work. This paper: increased teamwork can harm the team, even without considering communication & computation, and considering only team reward. Multiple algorithms, multiple settings. But why?
DCOPs: Distributed Constraint Optimization Problems. Multiple domains: meeting scheduling, traffic light coordination, RoboCup soccer, multi-agent plan coordination, sensor networks. Properties: distributed, robust to failure, scalable, (in)complete algorithms, quality bounds.
DCOP Framework: three agents a1, a2, a3 connected in a chain, with a pairwise reward table for each constraint, (a1, a2) and (a2, a3); each table contains rewards 10 and 6 depending on the joint values chosen. (Note: this is not graph coloring.) Different “levels” of teamwork are possible, from 1-opt up to fully centralized. Finding the complete (optimal) solution is NP-hard.
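A minimal sketch of the chain DCOP above, assuming binary-valued agents and illustrative reward tables built from the two rewards (10 and 6) shown on the slide; the agent and value names are hypothetical:

```python
from itertools import product

# Hypothetical encoding of the chain DCOP sketched above: three agents,
# each choosing a value in {0, 1}, with one reward table per constraint edge.
agents = ["a1", "a2", "a3"]
values = [0, 1]

# Illustrative reward tables (the slide only shows rewards 10 and 6).
rewards = {
    ("a1", "a2"): {(0, 0): 10, (0, 1): 6, (1, 0): 6, (1, 1): 6},
    ("a2", "a3"): {(0, 0): 10, (0, 1): 6, (1, 0): 6, (1, 1): 6},
}

def team_reward(assignment):
    """Total team reward: sum of rewards over all constraint edges."""
    return sum(table[(assignment[i], assignment[j])]
               for (i, j), table in rewards.items())

# Brute-force the optimal joint assignment (feasible only for tiny DCOPs;
# the general problem is NP-hard, as noted on the slide).
best = max((dict(zip(agents, combo)) for combo in product(values, repeat=len(agents))),
           key=team_reward)
print(best, team_reward(best))
```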
DCEE: Distributed Coordination of Exploration and Exploitation. The environment may be unknown. Maximize on-line reward over some number of rounds: exploration vs. exploitation. Demonstrated on a mobile ad-hoc network, in simulation [released] & on robots [released soon].
DCOP: Distributed Constraint Optimization Problem
DCOP → DCEE Distributed Coordination of Exploration and Exploitation
DCEE Algorithm: SE-Optimistic (will build upon later). Rewards on [1, 200]. Each agent optimistically assumes: “If I move, I’d get R = 200.” Example: chain of agents a1-a2-a3-a4 with current link rewards 99, 50, 75.
DCEE Algorithm: SE-Optimistic. Rewards on [1, 200]. On the chain a1-a2-a3-a4 with current link rewards 99, 50, 75, each agent estimates its optimistic gain from moving: a1 would gain 101, a2 would gain 251, a3 would gain 275, and a4 would gain 125. Explore or Exploit?
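A small sketch of the SE-Optimistic gain computation, assuming the chain and link rewards shown above and a maximum possible reward of 200; the names and structure are illustrative:

```python
# SE-Optimistic (sketch): each agent optimistically assumes that moving would
# raise every one of its constraint rewards to the maximum possible value.
MAX_REWARD = 200

# Chain a1-a2-a3-a4 with the current link rewards from the slide.
links = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}
agents = ["a1", "a2", "a3", "a4"]

def optimistic_gain(agent):
    """Estimated gain if `agent` moves: (MAX_REWARD - r) summed over its links."""
    return sum(MAX_REWARD - r for pair, r in links.items() if agent in pair)

for a in agents:
    print(a, optimistic_gain(a))
# Expected output: a1 101, a2 251, a3 275, a4 125 -- matching the slide.
```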
Balanced Exploration Techniques: BE-Rebid (Balanced Exploration with Backtracking). A decision-theoretic calculation of exploration: BE techniques assume knowledge of the reward distribution, and use the current reward, the time left, and that distribution to estimate the utility of exploring. Agents track their previous best location Rb and can backtrack to it. Unlike SE methods, where agents re-evaluate every time step, an agent can commit to explore for some number of steps te. Each round an agent compares two actions, backtrack or explore. E.U.(explore) is the sum of three terms: the reward accrued while exploring, the reward while exploiting the new best value × P(improve on Rb), and the reward while exploiting Rb × P(NOT improve on Rb). After agents explore and then backtrack, they cannot have reduced the overall reward. BE techniques are more complicated: they require more computation and are harder to implement.
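A hedged rendering of that three-term expected utility, assuming a horizon of T rounds, current round t, exploration length t_e, and previous best reward R_b; the slide gives only the decomposition, not the exact algebra:

```latex
% Sketch only (assumed notation): horizon T, current round t,
% exploration length t_e, previous best reward R_b.
EU(\mathrm{explore}, t_e) \approx
  \underbrace{\mathbb{E}\!\left[\text{reward accrued during the } t_e \text{ exploration steps}\right]}_{\text{reward while exploring}}
  + P(\text{improve on } R_b)\cdot
    \mathbb{E}\!\left[\text{reward exploiting the new best for } T - t - t_e \text{ steps}\right]
  + P(\text{no improvement})\cdot R_b\,(T - t - t_e)
```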
Success! [ATSN-09][IJCAI-09]. Both classes of (incomplete) algorithms, in simulation and on robots, on an ad-hoc wireless network (an improvement whenever performance > 0). ATSN: Third International Workshop on Agent Technology for Sensor Networks (at AAMAS-09).
k-Optimality: increased coordination (originally a DCOP formulation). In DCOP, increased k = increased team reward. Find groups of agents to change variables (joint actions); neighbors of a moving group cannot move. Defines the amount of teamwork (higher communication & computation overheads).
“k-Optimality” in DCEE: groups of size k form, and those with the most to gain move (change the values of their variables). A group can only move if no other agents in its neighborhood move.
Example: SE-Optimistic-2. Rewards on [1, 200]; chain a1-a2-a3-a4 with link rewards 99, 50, 75 and individual optimistic gains 101, 251, 275, 125 as before. Pairs of neighbors now bid their joint gain, subtracting the shared link’s gain so it is not counted twice: (a1, a2): 101 + 251 - 101 = 251; (a2, a3): 251 + 275 - 150 = 376; (a3, a4): 275 + 125 - 125 = 275.
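A sketch of the pairwise (k = 2) gain computation, reusing the hypothetical chain from the SE-Optimistic sketch; the shared link’s gain is subtracted once so it is not double-counted:

```python
# SE-Optimistic-2 (sketch): neighboring pairs bid their joint optimistic gain.
MAX_REWARD = 200
links = {("a1", "a2"): 99, ("a2", "a3"): 50, ("a3", "a4"): 75}

def optimistic_gain(agent):
    return sum(MAX_REWARD - r for pair, r in links.items() if agent in pair)

def pair_gain(i, j):
    """Joint gain for a neighboring pair: sum of individual gains minus the
    gain on the shared link, which both individual estimates counted."""
    shared = MAX_REWARD - links[(i, j)]
    return optimistic_gain(i) + optimistic_gain(j) - shared

for pair in links:
    print(pair, pair_gain(*pair))
# Expected: (a1, a2) -> 251, (a2, a3) -> 376, (a3, a4) -> 275, as on the slide.
```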
Sample coordination results [figure: artificially supplied rewards (DCOP), complete graph and chain graph]. Omniscient: confirms the DCOP result, as expected.
Physical Implementation: Create robots, mobile ad-hoc wireless network.
Confirms Team Uncertainty Penalty [figure: total gain on chain and complete graphs]. Averaged over 10 trials each. Trend confirmed! (Huge standard error.)
Problem with “k-Optimal”: with unknown rewards, an agent cannot know whether it can increase reward by moving! Define a new term, L-Movement: the number of agents that can change variables per round. Independent of the exploration algorithm; graph dependent. An alternate measure of teamwork.
General DCOP Analysis Tool? L-Movement example: k = 1 algorithms. For k = 1, L is the size of the maximum independent set of the graph, which is NP-hard to compute for a general graph (and harder still for higher k). Consider ring & complete graphs, both with 5 vertices: the maximum independent set of the ring graph has size 2, and of the complete graph size 1. So for k = 1, L = 1 for a complete graph, while the maximum independent set of a ring graph with n vertices has size ⌊n/2⌋.
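A brute-force check of the maximum-independent-set sizes quoted above, for the 5-vertex ring and complete graphs (illustrative only; the general problem is NP-hard):

```python
from itertools import combinations

def max_independent_set_size(n, edges):
    """Largest set of vertices with no edge between any two (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return size
    return 0

n = 5
ring = [(i, (i + 1) % n) for i in range(n)]
complete = [(i, j) for i in range(n) for j in range(i + 1, n)]

print(max_independent_set_size(n, ring))      # 2 -> L for k = 1 on the ring
print(max_independent_set_size(n, complete))  # 1 -> L for k = 1 on the complete graph
```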
Configuration Hypercube. No (partial) assignment is believed to be better than another, so, wlog, agents can select their next value in a fixed order when exploring. Define the configuration hypercube C: each agent is a dimension, and C[v1, ..., vn] is the total team reward when agent i takes its value vi. C cannot be calculated without exploration; its values are drawn from the known reward distribution. Moving along an axis in the hypercube corresponds to an agent changing its value. Example: 3 agents (C is 3-dimensional); changing from C[a, b, c] to C[a, b, c’] means agent A3 changes from c to c’.
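A minimal sketch of a configuration hypercube for 3 agents, assuming rewards drawn i.i.d. from a known distribution (here uniform on [1, 200], matching the earlier slides); the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Configuration hypercube C (sketch): one dimension per agent, s candidate
# values per agent; C[v1, v2, v3] is the total team reward for that joint
# configuration, drawn here from a known reward distribution.
n_agents, s = 3, 5
C = rng.uniform(1, 200, size=(s,) * n_agents)

# Moving along one axis = one agent changing its value.
a, b, c = 0, 0, 0
print(C[a, b, c])      # reward at configuration [a, b, c]
print(C[a, b, c + 1])  # agent A3 changed its value from c to c'
```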
How many agents can move? (1/2) In a ring graph with 5 nodes: k = 1 → L = 2; k = 2 → L = 3. In a complete graph with 5 nodes: k = 1 → L = 1; k = 2 → L = 2.
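A brute-force sketch of L for these small graphs, under the assumption (taken from the k-optimality slides) that moving groups are connected sets of at most k agents and that no neighbor of a moving group may itself move; the group semantics beyond that are my reading, not stated on the slide:

```python
from itertools import combinations

def neighbors(v, edges):
    return {u for e in edges for u in e if v in e} - {v}

def is_connected(group, edges):
    group = set(group)
    frontier, seen = [next(iter(group))], set()
    while frontier:
        v = frontier.pop()
        if v in seen:
            continue
        seen.add(v)
        frontier.extend(neighbors(v, edges) & group)
    return seen == group

def L_movement(n, edges, k):
    """Max number of agents that can move in one round (brute force): pick
    disjoint connected groups of size <= k such that no chosen group
    touches the neighborhood of another chosen group."""
    verts = list(range(n))
    groups = [set(g) for size in range(1, k + 1)
              for g in combinations(verts, size) if is_connected(g, edges)]
    best = 0

    def closed(g):  # group plus its neighborhood
        return g | {u for v in g for u in neighbors(v, edges)}

    def search(moving, blocked, start):
        nonlocal best
        best = max(best, len(moving))
        for idx in range(start, len(groups)):
            g = groups[idx]
            if g & blocked or closed(g) & moving:
                continue
            search(moving | g, blocked | closed(g), idx + 1)

    search(set(), set(), 0)
    return best

n = 5
ring = [(i, (i + 1) % n) for i in range(n)]
complete = [(i, j) for i in range(n) for j in range(i + 1, n)]
for k in (1, 2):
    print(k, L_movement(n, ring, k), L_movement(n, complete, k))
# Expected (per the slide): ring k=1 -> 2, k=2 -> 3; complete k=1 -> 1, k=2 -> 2.
```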
How many agents can move? (2/2) A configuration C[c1, ..., cn] is reachable by an algorithm with movement L in s steps if and only if every coordinate satisfies ci ≤ s and the coordinates sum to at most L·s. Example: C[2, 2] is reachable for L = 1 only if s ≥ 4.
L-Movement Experiments. For various DCEE problems, distributions, and L: for steps s = 1...30, construct a hypercube with s values per dimension, find M, the maximum reward achievable in s steps given L, and return the average over 50 runs. Example: a 2-D hypercube; only half of it is reachable if L = 1, but all locations are reachable if L = 2.
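A sketch of this experiment for two agents, assuming the reachability condition above (every coordinate at most s, coordinate sum at most L·s) and a uniform reward distribution; the parameter values are illustrative:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def max_reachable_reward(s, L, n_agents=2, low=1, high=200):
    """Best reward in a fresh s-per-dimension hypercube that an algorithm
    with movement L can reach within s steps (sketch of the experiment)."""
    C = rng.uniform(low, high, size=(s,) * n_agents)
    best = -np.inf
    for coords in product(range(s), repeat=n_agents):
        if max(coords) <= s and sum(coords) <= L * s:  # reachability condition
            best = max(best, C[coords])
    return best

for L in (1, 2):
    avg = np.mean([max_reachable_reward(s=10, L=L) for _ in range(50)])
    print(L, round(avg, 1))
# With L = 2 all of C is reachable, so the average maximum reward is higher.
```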
Restricting to L-Movement: Complete. [figure: average maximum reward discovered on the complete graph] k = 1: L = 1; k = 2: L = 2 (L = 1 → 2).
Restricting to L-Movement: Ring. [figure: average maximum reward discovered on the ring graph] k = 1: L = 2; k = 2: L = 3.
Uniform distribution of rewards [figures: ring and complete graphs, 4 agents]; also results with a different normal distribution.
k and L: 5-agent graphs. [table: for k = 1 through 5, the corresponding L value on the ring graph and on the complete graph] Increasing k changes L less in the ring than in the complete graph. The configuration hypercube is an upper bound; posit a consistent negative effect of teamwork. This suggests why increasing k has different effects: a larger improvement in the complete graph than in the ring as k increases.
L-Movement May Help Explain the Team Uncertainty Penalty. An algorithm with L = 2 will be able to explore more of C than an algorithm with L = 1. This is independent of the exploration algorithm: L is determined by k and the graph structure. C is an upper bound; positing a constant negative effect of teamwork, any algorithm experiences diminishing returns as k increases, consistent with the DCOP results. The L-Movement difference between k = 1 and k = 2 algorithms is larger in graphs with more agents: for k = 1, L = 1 for a complete graph, while L increases with the number of vertices in a ring graph.
Thank you Towards a Theoretic Understanding of DCEE Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe http://teamcore.usc.edu