Distributed Algorithms for DCOP: A Graphical-Game-Based Approach
In MGM, each agent broadcasts a gain message to all of its neighbors, representing the maximum change in its local utility if it were allowed to act under the current context. An agent is then allowed to act only if its gain is larger than all of the gain messages it receives from its neighbors (ties can be broken by variable ordering or another method).
So what?
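The MGM decision rule above can be sketched as follows. This is a minimal illustration, not MGM's published pseudocode: the function name, the message representation, and the use of agent ids as the variable ordering for tie-breaking are all assumptions.

```python
# Sketch of the MGM "am I allowed to act?" rule described above.
# neighbor_gains maps a neighbor's id to the gain message received from it.
# Ties are broken by agent id (one possible variable ordering); this detail
# is an assumption for illustration.

def mgm_round(my_id, my_gain, neighbor_gains):
    """Return True if this agent wins the right to act this round."""
    for nid, gain in neighbor_gains.items():
        # A neighbor wins outright with a strictly larger gain,
        # or wins a tie if it comes earlier in the ordering.
        if gain > my_gain or (gain == my_gain and nid < my_id):
            return False  # stay put this round
    return True  # best gain in the neighborhood: allowed to move
```

For example, an agent whose gain strictly exceeds every message it received moves; an agent that ties with a lower-id neighbor waits.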
Collaborators: Manish Jain, Prateek Tandon
Sample coordination results: Full Graph, Chain Graph
Regular Graphs
Team Coordination Penalty
Increased coordination hurts:
- in some graphs (low density)
- for some algorithms (SE-Optimistic & BE-Rebid)
Intuition:
- with few neighbors, agents are less selective
- with many neighbors, agents won't move unless the gain is high
Average # Constraints Changed: both k=1 and k=2 increase over time
Average Reward Improvement: k=2 improves relative to k=1
Low density: k=1 is better. At higher density, more constraints are changed for k=2, so k=2 becomes relatively better.
BE-Rebid-2: low bids have less gain, and low-density graphs do even worse
SE-Optimistic-2: a low bid means low gain; lower-density graphs have lower bids (and a higher chance of mistakes)
Sanity Check
In Contrast: fewer constraints changed but higher improvement; little change in # constraints between k=1 and k=2
Summary of Team Uncertainty Penalty
- Relative performance of k=1 and k=2 changes with density: favors k=2 as density increases
- Having few neighbors curtails k=2 performance
- Low bids in low-density graphs hurt: low density means a wider range of bids, so mistakes can be larger
- In contrast, Mean and Stay are conservative: fewer constraints changed, but worse overall
Solutions
- Discourage joint actions: SE-Threshold-2 & BE-Threshold-2
  - Only form a pair if bid > (t × #constraints)
  - Unless the bid is high, "play it safe" with k=1
- Discount all bids: SE-i-2 & BE-i-2
  - SE-i-2 generalizes SE-Optimistic and SE-Mean
  - BE-i-2: exploration utility is discounted (bias towards stay / backtrack)
- Both approaches add an extra parameter to tune (t or i)
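The thresholded pairing rule above can be written as a one-line check. A minimal sketch, assuming a scalar joint bid and constraint count; the function and parameter names are illustrative, not from the actual algorithms.

```python
# Sketch of the SE-Threshold-2 / BE-Threshold-2 pairing rule:
# only form a k=2 pair if the joint bid clears t * #constraints.

def should_form_pair(joint_bid, num_constraints, t):
    """Return True to form a pair; otherwise 'play it safe' with k=1."""
    return joint_bid > t * num_constraints
```

Raising t makes pairing rarer, pushing the algorithm back towards pure k=1 behavior.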
Improved SE Algorithms
Improved BE Algorithms
(Selected) Open Questions
- Different amounts of coordination: often increasing coordination helps, but it can hurt due to uncertainty in the environment
- Many open theoretical questions (current work with Scot Alfeld):
  - How close are we to optimal?
  - Can we predict how well an algorithm will perform?
  - Multi-armed bandit is to MDP as DCEE is to MMDP?
- TODO: backup slides on this!
Exploration vs. Exploitation
- Multi-armed Bandit
- How to choose? ε-greedy, confidence intervals
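As a concrete example of the ε-greedy strategy named above, here is a minimal bandit sketch. The function names and the incremental-mean update are standard textbook choices, not anything specific to this work.

```python
import random

# Minimal ε-greedy action selection for a multi-armed bandit:
# with probability epsilon explore a random arm, otherwise exploit
# the arm with the best current value estimate.

def epsilon_greedy_pick(estimates, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])  # exploit

def update_estimate(estimates, counts, arm, reward):
    """Incremental-mean update of the pulled arm's value estimate."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

With ε = 0 the agent purely exploits; larger ε trades immediate reward for information, which is the same tension DCEE algorithms face when deciding whether to keep exploring constraint values.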
Possible Class Projects
- New algorithms: a general-k implementation
- Enhance the simulator: not just small-scale fading
- Incorporate prior knowledge
- How to set parameter i in SE-i-2 and BE-i-2
- Learn to change graph topologies
- Different objectives: maximize the minimum; get all constraints above a threshold
(Quick) Simulator Demo
Lines 1-3: get neighbors' info
Lines 4-11: make pair
Lines 12-14: no pair
Lines 15-26: can I/we move?
Lines 27-30: move, if able
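One way to read the step ranges above is as one round of a paired-move (k=2) protocol. The sketch below is a toy rendering of that control flow with an illustrative stub agent; the class, method names, and single-neighbor model are assumptions, not the simulator's actual code.

```python
# Toy sketch of one k=2 round following the step outline above.
# StubAgent is a hypothetical single-neighbor agent used only to
# exercise the control flow; the real simulator's classes differ.

class StubAgent:
    def __init__(self, solo_gain, joint_gain, neighbor_bid, wants_pair):
        self.solo_gain = solo_gain        # gain from moving alone (k=1)
        self.joint_gain = joint_gain      # gain from a paired move (k=2)
        self.neighbor_bid = neighbor_bid  # competing bid heard from the neighbor
        self.wants_pair = wants_pair      # whether pairing succeeds this round
        self.moved = False

    def collect_neighbor_info(self):      # steps 1-3: get neighbors' info
        return {"neighbor_bid": self.neighbor_bid}

    def propose_pair(self, info):         # steps 4-11: make pair (or fail)
        return "neighbor" if self.wants_pair else None

    def my_bid(self, partner):            # no pair (steps 12-14) -> solo bid
        return self.joint_gain if partner else self.solo_gain

    def wins_bid(self, bid, info):        # steps 15-26: can I/we move?
        return bid > info["neighbor_bid"]

def k2_round(agent):
    info = agent.collect_neighbor_info()
    partner = agent.propose_pair(info)
    bid = agent.my_bid(partner)
    if agent.wins_bid(bid, info):
        agent.moved = True                # steps 27-30: move, if able
    return agent.moved
```

The point of the sketch is the branch structure: a failed pairing silently degrades the round to a k=1 bid rather than aborting it.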