Presentation on theme: "Sequential Decision Making in Repeated Coalition Formation under Uncertainty" — Presentation transcript:

Sequential Decision Making in Repeated Coalition Formation under Uncertainty
Georgios Chalkiadakis, School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
Craig Boutilier, Department of Computer Science, University of Toronto, Toronto, Canada

Ongoing and Future Work
• We have recast RL algorithms / sequential decision-making ideas within a computational trust framework (we beat the winner of the international ART competition!). Paper in this AAMAS: W. T. L. Teacy, Georgios Chalkiadakis, A. Rogers and N. R. Jennings, "Sequential Decision Making with Untrustworthy Service Providers".
• We have also applied VPI in a disaster management setting.
• We investigate overlapping coalition formation models.

Coalition Formation Reasoning under Type Uncertainty
[Figure: agents p1, p2, p3, c1, c2, c3, e1, e2, e3 grouped into coalitions C0, C1, C2; one agent asks: "I believe that some guys are better than my current partners… but is there any possible coalition that can guarantee me a higher payoff share?"]
• Beliefs are over types; types reflect capabilities (private information).
• Agents have to: decide whom to join, decide how to act, and decide how to share the coalitional value / utility.
• Coalition structure CS = <C0, C1, C2>, with, e.g., coalition C0 = {p1, c2, e1}.
• Action vector a = <a_C0, a_C1, a_C2>; the resulting coalitional value, e.g. u(C0 | a_C0) = 30, is then allocated among the coalition's members.

Type Uncertainty: It Matters!
• Coalition structure CS = <C0, C1, C2>
  + action-related uncertainty
  + action outcomes are stochastic
  + no superadditivity assumptions.
• Agents have their own beliefs about the types (capabilities) of others.
• Type uncertainty then translates into value uncertainty: according to i, what is the value (quality) of a given coalition?

A Bayesian Coalition Formation Model
• N agents; each agent i has a type t_i ∈ T_i.
• Set of type profiles T = T_1 × … × T_N; for any coalition C, T_C is the set of its members' type profiles.
• Agent i has beliefs B_i over the types of the members of any coalition C of agents: B_i ∈ Δ(T_C).
• Coalitional actions (i.e., choice of task) for C: A_C.
• An action's outcome s ∈ S (given the actual members' types) occurs with probability Pr(s | a_C, t_C).
• Each s results in some reward R(s).
• Each i has a (possibly different) estimate of the value of any coalition C (see the sketch below):
    V_i(C) = max over a_C ∈ A_C of  Σ_{t_C ∈ T_C} B_i(t_C) Σ_{s ∈ S} Pr(s | a_C, t_C) R(s)
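As a concrete illustration of this value estimate, here is a minimal Python sketch assuming discrete types, actions and outcomes; the data structures and names (beliefs, outcome_prob, reward) are ours, not the poster's:

# Minimal sketch (illustrative names) of agent i's Bayesian value estimate
# for a coalition C, assuming discrete types, actions and outcomes.

def coalitional_q(beliefs, action, outcomes, outcome_prob, reward):
    """Expected reward of coalitional action `action` under agent i's beliefs.

    beliefs      : dict mapping a partners' type profile t_C -> probability B_i(t_C)
    outcomes     : list of possible outcomes s
    outcome_prob : function (s, action, t_C) -> Pr(s | a_C, t_C)
    reward       : function s -> R(s)
    """
    return sum(
        p_t * sum(outcome_prob(s, action, t_C) * reward(s) for s in outcomes)
        for t_C, p_t in beliefs.items()
    )

def coalitional_value(beliefs, actions, outcomes, outcome_prob, reward):
    """Agent i's estimate V_i(C): the best expected reward over coalitional actions."""
    return max(coalitional_q(beliefs, a, outcomes, outcome_prob, reward)
               for a in actions)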
Optimal Repeated Coalition Formation
• Belief-state MDP formulation to address the induced exploration-exploitation problem, i.e. the equations account for the sequential value of coalitional agreements. Schematically:
    Q(<C, a_C>, b) = Σ_{t_C} b(t_C) Σ_s Pr(s | a_C, t_C) [ R(s) + γ V(b') ]
  The R(s) term takes into account the immediate reward from forming a coalition and executing an action; the γ V(b') term takes into account the long-term impact of a coalitional agreement (i.e., the value of information, through belief-state updating b → b' and incorporation of the belief-state value into the calculations; a sketch of the belief update appears further below).

Approximation Algorithms
• One-step lookahead (OSLA): performs a one-step lookahead in belief space.
• VPI exploration: estimates the Value of Perfect Information regarding coalitional agreements.
• VPI-over-OSLA: combines VPI with OSLA.
• Maximum a Posteriori (MAP): uses the most likely type vector given the beliefs.
• Myopic: calculates the expectation while disregarding the sequential value of formation decisions.

Example experiment: The Good, the Bad, and the Ugly
[Plots: discounted accumulated rewards; total actual rewards gathered during the "Big Crime" phase.]

VPI is a winner!
• Balances the expected gain against the expected cost of executing a suboptimal action:
  - use the current model to myopically evaluate the actions' EU;
  - assume an action results in perfect information regarding its Q-value; this perfect information has non-zero value only if it results in a change of policy;
  - EVPI is calculated and accounted for in action selection (act greedily towards EU + EVPI; see the sketch below).
• a) Bayesian, and yet b) efficient: uses the myopic evaluation of actions, but boosts their desirability with EVPI estimates.
• Consistently outperforms the other approximation algorithms.
• Scales to dozens / hundreds of agents, unlike the lookahead approaches.
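A minimal sketch of this EU + EVPI selection rule (Python; it assumes that, for each candidate coalitional agreement, we can draw samples of its Q-value under the current type beliefs, e.g. by sampling partner type profiles; all names are illustrative):

import numpy as np

def evpi(q_samples, eu_best, eu_second, is_best):
    """Estimated value of perfect information about one agreement's Q-value.

    The information has non-zero value only if it would change the policy:
    for the currently best agreement, learning its value helps when the true
    value drops below the second-best EU; for any other agreement, it helps
    when the true value rises above the best EU.
    """
    q = np.asarray(q_samples)
    if is_best:
        gain = np.maximum(eu_second - q, 0.0)
    else:
        gain = np.maximum(q - eu_best, 0.0)
    return float(gain.mean())

def select_agreement(q_samples_per_agreement):
    """Act greedily with respect to EU + EVPI over candidate agreements."""
    eus = {a: float(np.mean(qs)) for a, qs in q_samples_per_agreement.items()}
    ranked = sorted(eus, key=eus.get, reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else ranked[0]
    scores = {
        a: eus[a] + evpi(qs, eus[best], eus[second], is_best=(a == best))
        for a, qs in q_samples_per_agreement.items()
    }
    return max(scores, key=scores.get)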

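For completeness, the belief-state update b → b' that the sequential formulation above relies on (re-weighting the candidate partner type profiles by the likelihood of the observed outcome, i.e. Bayes' rule) can be sketched in the same style; the names are again ours:

def update_beliefs(beliefs, action, observed_outcome, outcome_prob):
    """Bayesian update of beliefs over partners' type profiles.

    beliefs      : dict mapping a type profile t_C -> prior probability
    outcome_prob : function (s, action, t_C) -> Pr(s | a_C, t_C)
    returns      : dict mapping a type profile t_C -> posterior probability
    """
    posterior = {
        t_C: p_t * outcome_prob(observed_outcome, action, t_C)
        for t_C, p_t in beliefs.items()
    }
    z = sum(posterior.values())
    if z == 0.0:                      # observation impossible under the prior
        return dict(beliefs)          # keep the prior (degenerate case)
    return {t_C: p / z for t_C, p in posterior.items()}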
