Distributed Algorithms for DCOP: A Graphical-Game-Based Approach

Distributed Algorithms for DCOP: A Graphical-Game-Based Approach. In MGM, each agent broadcasts a gain message to all its neighbors, representing the maximum change in its local utility if it were allowed to act under the current context. An agent is then allowed to act if its gain message is larger than all the gain messages it receives from its neighbors (ties can be broken through variable ordering or another method). So what?
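Below is a minimal sketch of one MGM round from a single agent's point of view. The Agent class, the constraint representation, and tie-breaking by agent id are illustrative assumptions, not details taken from the slides.

```python
# Minimal sketch of one MGM round for a single agent. Names (Agent,
# constraints, context) are illustrative, not from the slides.

class Agent:
    def __init__(self, agent_id, domain, constraints):
        self.id = agent_id
        self.domain = domain            # possible values for this agent's variable
        self.constraints = constraints  # {neighbor_id: f(my_value, their_value) -> utility}
        self.value = domain[0]

    def local_utility(self, value, context):
        # Utility of `value` given the neighbors' current values in `context`.
        return sum(f(value, context[nid]) for nid, f in self.constraints.items())

    def best_gain(self, context):
        # Maximum change in local utility if this agent alone were allowed to act.
        current = self.local_utility(self.value, context)
        best_value = max(self.domain, key=lambda v: self.local_utility(v, context))
        return self.local_utility(best_value, context) - current, best_value

    def mgm_round(self, context, neighbor_gains):
        # Gain messages are exchanged elsewhere; act only if my gain strictly
        # beats every neighbor's gain, breaking ties by agent id (one possible ordering).
        my_gain, best_value = self.best_gain(context)
        wins = all(my_gain > g or (my_gain == g and self.id < nid)
                   for nid, g in neighbor_gains.items())
        if wins and my_gain > 0:
            self.value = best_value
```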

Collaborators: Manish Jain, Prateek Tandon

Sample coordination results: Full Graph, Chain Graph

Regular Graphs

Team Coordination Penalty: Increased coordination hurts in some graphs (low density) and for some algorithms (SE-Optimistic & BE-Rebid). Intuition: with few neighbors, agents are less selective; with many neighbors, an agent won't move unless it has a high gain.

Average # Constraints Changed: k=1 and k=2 increase over time

Average Reward Improvement: k=2 improves relative to k=1

Low density: k=1 better. At higher density, more constraints are changed for k=2, so k=2 does relatively better.

BE-Rebid-2: Low bids have less gain; low density does even worse.

SE-Optimistic-2: A low bid means low gain. Lower-density graphs have lower bids (and a higher chance of a mistake).

Sanity Check

In contrast: Fewer constraints changed but higher improvement; little change in the number of constraints from k=1 to k=2.

Summary of Team Uncertainty Penalty: The relative performance of k=1 and k=2 changes with density, favoring k=2 as density increases. Having few neighbors curtails k=2 performance: low bids in low-density graphs hurt, and low density gives a wider range of bids and allows larger mistakes. In contrast, Mean and Stay are conservative: fewer constraints are changed, but overall performance is worse.

Solutions: (1) Discourage joint actions: SE-Threshold-2 & BE-Threshold-2 only form a pair if the bid > (t × #constraints); unless the bid is high, "play it safe" with k=1. (2) Discount all bids: SE-i-2 generalizes SE-Optimistic and SE-Mean; BE-i-2 discounts the explore utility (biasing towards stay / backtrack). Both add an extra parameter to tune (t or i).
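A minimal sketch of the SE-Threshold-2 / BE-Threshold-2 pair-formation test described above; the function name and the way the bid and constraint count are obtained are illustrative assumptions.

```python
# Sketch of the threshold test: form a k=2 pair only when the joint bid
# clears t * #constraints; otherwise "play it safe" and act alone (k=1).
# The bid computation itself is assumed to happen elsewhere.

def should_form_pair(joint_bid, num_constraints, t):
    return joint_bid > t * num_constraints

# Example: with t = 0.5 and 6 shared constraints, a bid of 2.4 is rejected
# (act alone), while a bid of 4.0 is accepted (coordinate as a pair).
print(should_form_pair(2.4, 6, 0.5))   # False
print(should_form_pair(4.0, 6, 0.5))   # True
```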

Improved SE Algorithms

Improved BE Algorithms

(Selected) Open Questions: Different amounts of coordination: often increasing coordination helps, but it can hurt due to uncertainty in the environment. Many open theoretical questions (current work with Scott Alfeld): How close are we to optimal? Can we predict how well an algorithm will perform? Is multi-armed bandit to MDP as DCEE is to MMDP? TODO: backup slides on this!

Exploration vs. Exploitation: the multi-armed bandit. How to choose an arm? ε-greedy, confidence intervals.
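As a quick illustration, here is a minimal sketch of the two bandit strategies named on this slide, ε-greedy and a confidence-interval (UCB1-style) rule; the data structures and parameter values are illustrative.

```python
import math
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """With probability epsilon explore a random arm; otherwise exploit the
    arm with the highest current estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def ucb1(estimates, counts, total_pulls):
    """Confidence-interval choice: pick the arm with the highest upper
    confidence bound (assumes every arm has been pulled at least once)."""
    return max(range(len(estimates)),
               key=lambda a: estimates[a]
               + math.sqrt(2.0 * math.log(total_pulls) / counts[a]))

def update(estimates, counts, arm, reward):
    """Incremental running-average update of the chosen arm's value estimate."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```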

Possible Class Projects: New algorithms (a general-k implementation); enhance the simulator (not just small-scale fading); incorporate prior knowledge; how to set parameter i in SE-i-2 and BE-i-2; learn to change graph topologies; different objectives (maximize the minimum, or get all constraints above a threshold).

(Quick) Simulator Demo

Pseudocode walkthrough: lines 1-3: get neighbors' info; lines 4-11: make a pair; lines 12-14: no pair; lines 15-26: can I/we move?; lines 27-30: move, if able.
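For orientation only, a structural sketch mirroring the phases listed above for one round of a pairing (k=2) algorithm; every helper on `agent` (choose_partner, best_solo_move, best_joint_move, allowed_to_act) is a hypothetical placeholder, and this is not the actual pseudocode those line numbers refer to.

```python
def pairing_round(agent, neighbors):
    # Lines 1-3: get neighbors' info (current values and bids).
    context = {n.id: n.value for n in neighbors}
    bids = {n.id: n.current_bid for n in neighbors}

    # Lines 4-11: try to make a pair with the most promising neighbor.
    partner = agent.choose_partner(bids)

    if partner is None:
        # Lines 12-14: no pair formed, so fall back to acting alone (k=1).
        proposal = agent.best_solo_move(context)
    else:
        # Lines 15-26: can I/we move? Evaluate the pair's joint proposal.
        proposal = agent.best_joint_move(partner, context)

    # Lines 27-30: move, if able (i.e., if this agent/pair wins the right to act).
    if agent.allowed_to_act(proposal, bids):
        agent.apply(proposal)
```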