
Aspiration-based Learning


1 Aspiration-based Learning
Lecture 2 Supplementary Materials
Zhu Han
Thanks to Chuanyin Li for preparing the slides.

2 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

3 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

4 Automotive traffic routing
Objective: minimize the congestion experienced
Without knowledge of: other drivers' selections; the structure of the congestion function
A similar problem is automotive traffic routing, in which drivers seek to minimize the congestion they experience in reaching a desired destination. Drivers observe the congestion on their selected routes, which depends on the routes selected by other drivers, but they do not know the structure of the congestion function. For example, commuters in a city choose which routes to take to work. Their choices affect congestion on the roads, which in turn determines the payoffs of other commuters. But no single commuter can be expected to know the others' commuting strategies or how those strategies influence his own commuting time.
[Illustration of automotive traffic routing]

5 Firms' competition
Objective: maximize profits
Without knowledge of: other firms' pricing strategies; the structure of the profit function
Similarly, in a market with many competing firms, no single firm is likely to know precisely what the other firms' marketing and pricing strategies are, or how those strategies affect its own profits (even though this assumption is routinely invoked in textbook models of competition).

6 Wind farm
Objective: maximize total power production
Challenge: controlling an array of turbines
Without knowledge of: other turbines' actions; the functional relationship between total power generated and the actions
One example of a system that exhibits these challenges is the control of a wind farm to maximize total power production. Controlling an array of turbines in a wind farm is fundamentally more challenging than controlling a single turbine, because the aerodynamic interactions among the turbines render many single-turbine control algorithms highly inefficient for optimizing total power production. Here, the goal is to establish a distributed control algorithm that enables the individual turbines to adjust their behavior based on local conditions, so as to maximize total system performance. One way to handle this large-scale coordination problem is to model the interactions of the turbines in a game-theoretic environment. However, the space of admissible utility functions for the individual turbines is limited by the following informational constraints:
- No turbine has access to the actions of other turbines, due to the lack of a suitable communication system.
- No turbine has access to the functional relationship between the total power generated and the actions of the other turbines, because the aerodynamic interaction between turbines is poorly understood from an engineering standpoint.

7 Distributed routing
Objective: minimize cost, i.e., delay
Without knowledge of: the overall network structure; the functional dependence of delay on routing strategies
In distributed routing for ad hoc data networks, routing nodes seek to route packets to neighboring nodes based on packet destinations, without knowledge of the overall network structure. The objective is to minimize the delay of packets to their destinations. This delay must be learned through trial and error, since the functional dependence of delay on routing strategies is not known. For example, in network formation, nodes need to choose their immediate links so that connectivity is achieved with the minimum possible communication cost, i.e., the minimum number of links.

8 Medium access control
Objective: fair sharing
Without knowledge of: other users' decisions
Similarly, in medium access control, users need to establish a fair schedule for accessing a shared communication channel so that collisions (situations in which two or more users access the common resource simultaneously) are avoided.

9 Common characteristics
Multi-agent cooperative control problems with a strategy adjustment process.
Known information: own action; own payoff
Unknown information: other players' actions; structural form of the payoff functions
The objective in distributed cooperative control for multi-agent systems is to enable a collection of "self-interested" agents to achieve a desirable "collective" objective. Agents may know nothing about the structure of their utility functions, or how their own utility depends on the actions of other agents (whether local or far away).

10 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

11 Weakly Acyclic Games
Players set: N = {1, 2, ..., n}
Finite action set of player i: A_i
Action profile set: A = A_1 × A_2 × ... × A_n
Action profile: a = (a_1, ..., a_n) = (a_i, a_-i)
Utility function: u_i : A → R
Pure Nash equilibrium: a profile a* such that u_i(a_i*, a_-i*) ≥ u_i(a_i, a_-i*) for every action a_i ∈ A_i and every player i.
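The pure Nash equilibrium condition above can be checked by brute force on small finite games. The sketch below (helper names are illustrative, not from the lecture) enumerates all action profiles and tests every unilateral deviation:

```python
import itertools

def pure_nash_equilibria(action_sets, utilities):
    """Enumerate pure Nash equilibria of a finite game.

    action_sets: list of lists, action_sets[i] = actions of player i.
    utilities: function (i, profile) -> payoff of player i at profile.
    """
    equilibria = []
    for profile in itertools.product(*action_sets):
        is_ne = True
        for i in range(len(action_sets)):
            current = utilities(i, profile)
            for alt in action_sets[i]:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if utilities(i, deviated) > current:
                    is_ne = False  # a profitable unilateral deviation exists
                    break
            if not is_ne:
                break
        if is_ne:
            equilibria.append(profile)
    return equilibria

# A 2x2 coordination game: payoff 1 when actions match, else 0.
coord = lambda i, a: 1 if a[0] == a[1] else 0
print(pure_nash_equilibria([[0, 1], [0, 1]], coord))  # [(0, 0), (1, 1)]
```

The check is exponential in the number of players, so it is only a definitional illustration, not a practical algorithm.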

12 Weakly Acyclic Games
Identical interest games: all players share one utility, u_i = u for every i.
Potential games: there exists a potential function φ : A → R such that u_i(a_i', a_-i) − u_i(a_i, a_-i) = φ(a_i', a_-i) − φ(a_i, a_-i) for all i and all a_i, a_i', a_-i.
Better reply path: a sequence of action profiles a^1, a^2, ..., a^L in which each a^{k+1} differs from a^k in the action of exactly one player, and ∃ i (the deviating player) such that u_i(a^{k+1}) > u_i(a^k).
Weakly Acyclic Games: from every action profile there exists a better reply path that does not cycle back on itself and terminates at a pure Nash equilibrium.
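Weak acyclicity can likewise be verified exhaustively on small games: from every profile, search the better-reply graph for a profile with no profitable deviation (a pure Nash equilibrium). A minimal sketch, with illustrative function names:

```python
import itertools

def is_weakly_acyclic(action_sets, utilities):
    """Brute-force check: from every action profile, some better-reply
    path must reach a profile with no profitable unilateral deviation."""
    profiles = list(itertools.product(*action_sets))

    def better_replies(profile):
        # Profiles reachable by one player strictly improving unilaterally.
        out = []
        for i in range(len(action_sets)):
            for alt in action_sets[i]:
                dev = profile[:i] + (alt,) + profile[i + 1:]
                if utilities(i, dev) > utilities(i, profile):
                    out.append(dev)
        return out

    for start in profiles:
        seen, frontier, found = {start}, [start], False
        while frontier:
            cur = frontier.pop()
            nxt = better_replies(cur)
            if not nxt:        # no profitable deviation: pure NE reached
                found = True
                break
            for p in nxt:
                if p not in seen:
                    seen.add(p)
                    frontier.append(p)
        if not found:
            return False
    return True

coord = lambda i, a: 1 if a[0] == a[1] else 0
print(is_weakly_acyclic([[0, 1], [0, 1]], coord))  # True
```

Matching pennies, whose better-reply graph is a single cycle with no pure Nash equilibrium, makes the check return False.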

13 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

14 Benchmark-based Dynamics
1. Initialization: each player i selects an action; this becomes its baseline action, and the received payoff becomes its baseline utility.
2. Action selection: with probability 1 − ε, player i plays its baseline action; with probability ε, it experiments with an action chosen randomly over A_i.
3. Baseline action and utility update: if player i experimented and the received payoff exceeds its baseline utility, the experimental action and payoff become the new baseline; otherwise the baseline action is kept. If player i did not experiment, it keeps its baseline action and sets the baseline utility to the received payoff.
4. Return to Step 2 and repeat.
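The four steps above can be sketched in a short simulation. This is a simplified illustration of the benchmark-based loop, not the exact algorithm of the cited paper; the parameter names and tie-handling are assumptions:

```python
import random

def benchmark_dynamics(action_sets, utilities, eps=0.05, steps=5000, seed=0):
    """Sketch of benchmark-based dynamics: each player usually plays its
    baseline action, occasionally experiments, and adopts the experiment
    as the new baseline only if it yielded a strictly higher payoff."""
    rng = random.Random(seed)
    n = len(action_sets)
    baseline = [rng.choice(A) for A in action_sets]
    base_util = [utilities(i, tuple(baseline)) for i in range(n)]
    for _ in range(steps):
        experimented = [rng.random() < eps for _ in range(n)]
        played = tuple(rng.choice(action_sets[i]) if experimented[i]
                       else baseline[i] for i in range(n))
        for i in range(n):
            u = utilities(i, played)
            if experimented[i] and u > base_util[i]:
                # Adopt the experimental action: it beat the baseline payoff.
                baseline[i], base_util[i] = played[i], u
            elif not experimented[i]:
                # Refresh the baseline payoff against others' current play.
                base_util[i] = u
    return tuple(baseline)

# In a 2x2 coordination game the baselines settle on a matched profile.
coord = lambda i, a: 1 if a[0] == a[1] else 0
a = benchmark_dynamics([[0, 1], [0, 1]], coord)
print(a)
```

Once both baselines match, no experiment can beat the baseline payoff, so the matched profile (a pure Nash equilibrium of this game) is absorbing.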

15 Convergence: Nash equilibrium
Theorem: Given any probability p < 1, if the exploration rate ε > 0 is sufficiently small, then for all sufficiently large steps t, a(t) is a Nash equilibrium with probability at least p.
Proof technique: resistance trees for perturbed Markov chains.
J. Marden, H. Young, G. Arslan, and J. Shamma, "Payoff-based dynamics for multiplayer weakly acyclic games," SIAM Journal on Control and Optimization, vol. 48, no. 1, pp. 373–396, 2009.

16 Illustration: transportation
In a congestion game, a payoff-based learning algorithm means that drivers have access only to the congestion they actually experience; they are unaware of the congestion level on any alternative route.
1000 vehicles need to traverse the network.
The cost incurred on a road segment depends only on the total number of drivers sharing that road, so drivers are anonymous.

17 Illustration: transportation
A learning rule is uncoupled if it does not require information about the opponents' payoffs, though it may depend on their actions (Hart and Mas-Colell, 2003).
The vehicles' collective behavior does indeed approach that of the Nash equilibrium.
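A quick simulation illustrates the claim for the simplest possible network: two parallel roads whose delay equals the number of drivers on them (this two-route setup and the switching rule are illustrative assumptions, not the lecture's exact example). Selfish switching drives the split toward the balanced Nash equilibrium:

```python
import random

def congestion_sim(n_drivers=1000, rounds=20, seed=1):
    """Anonymous congestion game on two parallel routes: the delay of a
    route is the number of drivers on it. Each step, one random driver
    switches routes only if that strictly lowers the delay it experiences."""
    rng = random.Random(seed)
    route = [rng.randrange(2) for _ in range(n_drivers)]
    counts = [route.count(0), route.count(1)]
    for _ in range(rounds * n_drivers):
        i = rng.randrange(n_drivers)
        here, there = route[i], 1 - route[i]
        # After switching, the driver's delay would be counts[there] + 1.
        if counts[there] + 1 < counts[here]:
            route[i] = there
            counts[here] -= 1
            counts[there] += 1
    return counts[0], counts[1]

print(congestion_sim())  # roughly an even split of the 1000 drivers
```

At equilibrium no driver can gain by switching, which pins the two route loads to within one driver of each other.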

18 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

19 Trial-and-error Dynamics
The state of player i consists of:
- Current mood: content, discontent, watchful, or hopeful
- Current benchmark action
- Current benchmark payoff

20 Trial-and-error Dynamics
Content: experiment with probability ε; do not experiment with probability 1 − ε.
Watchful: play the benchmark strategy.

21 Trial-and-error Dynamics
Hopeful: play the benchmark strategy.
Discontent: choose an action uniformly at random; φ is an acceptance-probability function that is larger for higher payoffs.
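The mood transitions sketched on the last two slides can be written as a small state machine. This is a much-simplified caricature of the full rules (it ignores experimentation, benchmark actions, and the acceptance function φ), intended only to show how persistent gains and losses move a player between moods:

```python
def next_mood(mood, payoff, benchmark):
    """Simplified sketch of the mood transitions (not the full dynamics):
    returns the next mood and the (possibly updated) benchmark payoff."""
    if payoff == benchmark:
        return "content", benchmark
    if payoff > benchmark:
        # An unexpected gain makes the player hopeful; if the gain
        # persists, it is accepted as the new benchmark.
        if mood == "hopeful":
            return "content", payoff
        return "hopeful", benchmark
    # An unexpected loss makes the player watchful; if the loss
    # persists, the player becomes discontent.
    if mood == "watchful":
        return "discontent", benchmark
    return "watchful", benchmark

# A persistent payoff drop: content -> watchful -> discontent.
state = ("content", 5)
for p in (3, 3):
    state = next_mood(state[0], p, state[1])
print(state)  # ('discontent', 5)
```

A single bad round only makes a content player watchful; it takes a repeated shortfall to trigger the random search of the discontent mood, which is what makes the dynamics robust to noise.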

22 Trial-and-error Dynamics
Definition: A game is interdependent if any proper subset S of players can influence the payoff of at least one player not in S by some (joint) choice of actions.
Theorem: If the game is interdependent and has at least one pure Nash equilibrium, then for any δ > 0 and sufficiently small experimentation probability ε, a pure Nash equilibrium is played at least a 1 − δ fraction of the time.
Proof technique: perturbed Markov chains.
H. P. Young, "Learning by trial and error," Games and Economic Behavior, vol. 65, no. 2, pp. 626–643, 2009.

23 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

24 Mood-based Dynamics
Welfare of an action profile a ∈ A: W(a) = Σ_i u_i(a).
An action profile is efficient if it maximizes the welfare W(a).
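Computing the welfare and the efficient profiles is a simple enumeration on small games. The helper below (function names are illustrative) uses the prisoner's dilemma as a test case, since its welfare-maximizing profile differs from its Nash equilibrium:

```python
import itertools

def efficient_profiles(action_sets, utilities):
    """Return the action profiles maximizing total welfare
    W(a) = sum_i u_i(a)."""
    profiles = list(itertools.product(*action_sets))
    welfare = {p: sum(utilities(i, p) for i in range(len(action_sets)))
               for p in profiles}
    best = max(welfare.values())
    return [p for p in profiles if welfare[p] == best]

# Prisoner's dilemma with assumed payoffs: (C, C) maximizes welfare
# even though (D, D) is the unique pure Nash equilibrium.
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
      ("D", "C"): (4, 0), ("D", "D"): (1, 1)}
u = lambda i, a: pd[a][i]
print(efficient_profiles([["C", "D"], ["C", "D"]], u))  # [('C', 'C')]
```

This gap between efficiency and equilibrium is exactly what the stochastic-stability result for mood-based dynamics targets: the dynamics select the welfare-maximizing profile, not merely a Nash equilibrium.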

25 Mood-based Dynamics
Each agent is either content or discontent.
Agent dynamics: a content agent repeats its benchmark action with high probability and experiments rarely; a discontent agent chooses an action uniformly at random.
State dynamics: if the realized action and payoff match the benchmark, the agent remains (or probabilistically becomes) content; else it becomes discontent.

26 Mood-based Dynamics
A state is stochastically stable iff:
- the action profile optimizes the welfare W(a);
- the benchmark actions and payoffs are aligned, i.e., the benchmark payoff equals u_i(a) for all i;
- the mood of each agent is content, for all i.
J. R. Marden, H. P. Young, and L. Y. Pao, "Achieving Pareto optimality through distributed learning," in IEEE Conference on Decision and Control (CDC), Dec. 2012, pp. 7419–7424.

27 Illustration: Prisoner's Dilemma
(B, B) is the unique pure Nash equilibrium.

28 Contents
Applications and Use cases
Brief introduction of Weakly Acyclic Games (WAG)
Aspiration-based Learning
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics
Conclusions

29 Aspiration-based Dynamics
Given a step-size constant ε ∈ (0, 1], at every time t each agent i ∈ N updates its aspiration level ρ_i as a discounted running average of its received payoffs:
ρ_i(t + 1) = ρ_i(t) + ε (u_i(t) − ρ_i(t))

30 Aspiration-based Dynamics
Action update: an agent whose received payoff meets its aspiration level repeats its action; an agent whose payoff falls short of its aspiration switches actions with positive probability.
G. C. Chasparis, J. S. Shamma, and A. Arapostathis, "Aspiration learning in coordination games," in IEEE Conference on Decision and Control (CDC), Dec. 2010, pp. 5756–5761.
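The aspiration and action updates can be combined into a short simulation. This is an illustrative simplification; the switching rule, the optimistic initialization, and the constants below are assumptions, not the exact rule from the cited paper:

```python
import random

def aspiration_learning(action_sets, utilities, eps=0.05, switch_prob=0.5,
                        init_aspiration=1.0, steps=20000, seed=0):
    """Sketch of aspiration learning: each agent tracks an aspiration level
    as a slow running average of its payoffs, and switches actions with some
    probability whenever its payoff falls short of that aspiration."""
    rng = random.Random(seed)
    n = len(action_sets)
    action = [rng.choice(A) for A in action_sets]
    # Optimistic initial aspirations drive early search (an assumption here).
    aspiration = [init_aspiration] * n
    for _ in range(steps):
        profile = tuple(action)
        for i in range(n):
            u = utilities(i, profile)
            if u < aspiration[i] and rng.random() < switch_prob:
                action[i] = rng.choice(action_sets[i])
            # Aspiration drifts toward the payoffs actually received.
            aspiration[i] += eps * (u - aspiration[i])
    return tuple(action)

# In a 2x2 coordination game the agents settle on a matched profile:
# once matched, payoffs meet aspirations and nobody switches again.
coord = lambda i, a: 1 if a[0] == a[1] else 0
print(aspiration_learning([[0, 1], [0, 1]], coord))
```

Satisfied agents stop moving, so a profile where every payoff meets every aspiration is absorbing; in the coordination game above that is exactly the set of matched profiles.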

31 Conclusions
Multi-agent cooperative control problems:
Known information: own action; own payoff
Unknown information: other players' actions; structural form of the payoff functions
Aspiration-based Learning:
- Benchmark-based dynamics
- Trial-and-error dynamics
- Mood-based dynamics
- Aspiration-based dynamics

32 Thank you for your attention.

