Context-Specific Multiagent Coordination and Planning with Factored MDPs
Carlos Guestrin, Shobha Venkataraman, Daphne Koller
Stanford University

Multiagent Coordination Examples
Search and rescue, factory management, supply chain, firefighting, network routing, air traffic control. These problems share three difficulties:
• Multiple, simultaneous decisions
• Limited observability
• Limited communication

Construction Crew Problem: Dynamic Resource Allocation
WANTED: Agents that coordinate to build and maintain houses, but only when necessary!
• Agent 1: Foundation, Electricity, Plumbing
• Agent 2: Plumbing, Painting
• Agent 3: Electricity, Painting
• Agent 4: Decoration
Task dependencies: Foundation → {Electricity, Plumbing} → Painting → Decoration. Tasks last multiple time steps, failures cause chain reactions, and there are multiple houses.

Joint Decision Space
Represent the problem as an MDP:
• Action space: joint action a for all agents
• State space: joint state x of all agents
• Reward function: total reward r
The action space is exponential, since an action is an assignment a = {a_1, …, a_n}. The state space is exponential in the number of variables. A global decision requires complete observation.

Long-Term Utility = Value of MDP
The value is computed by linear programming:
• One variable V(x) for each state
• One constraint for each state x and action a
The number of states and actions is exponential!

Local Q-function Approximation
[Figure: agents A1 to A4 (M1 to M4), with the local term Q3 highlighted.]
$$Q(A_1,\dots,A_4,X_1,\dots,X_4) \approx Q_1(A_1,A_4,X_1,X_4) + Q_2(A_1,A_2,X_1,X_2) + Q_3(A_2,A_3,X_2,X_3) + Q_4(A_3,A_4,X_3,X_4)$$
Q_3 is associated with Agent 3, which observes only X_2 and X_3. Limited observability: agent i only observes the variables in Q_i. The agents must choose the joint action that maximizes Σ_i Q_i.

Problems with Coordination Graph
• Table size is exponential in the number of variables
• Messages are tables
• Agents communicate even if not necessary
• Fixed coordination structure
What we want:
• Use structure in tables
• Variable coordination structure
Exploit context-specific independence!

Context-Specific Coordination Structure
Local value rules represent context-specific structure: a set of rules Q_i for each agent. The agents must coordinate to maximize the total value, using rule-based variable elimination [Zhang and Poole '99], e.g., maximizing out A_1.
Rule-based coordination graph for finding the optimal action:
A. Simplification on instantiation of the state
B. Simplification when passing messages
C. Simplification on maximization
Further simplification is possible by approximation. The agent communication structure is variable: the coordination structure is dynamic.

Summary: Context-Specific Coordination
Summary of algorithm:
1. Pick local rule-based basis functions h_i.
2. A single LP algorithm for factored MDPs obtains the Q_i's.
3. A variable coordination graph computes the maximizing action.

SysAdmin: Rule-based vs. Table-based
[Figure: panels Bidirectional Ring, Server, Reverse Star.]

Comparing to Apricodd [Boutilier et al. '96-'99]
[Figure: Optimal vs. Apricodd vs. Rule-based on problems Expon06, Expon08, Expon10, Linear06, Linear08, Linear10. Data labels: Expon06 530.9, Expon08 77.09, Expon10 0.034, Linear06 531.4, Linear08 430.5, Linear10 348.7.]

Conclusions and Extensions
Multiagent planning algorithm with:
• Variable coordination structure
• Limited context-specific communication
• Limited context-specific observability
Solve large MDPs! Extensions to hierarchical and relational models.
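The first rule-based simplification (A, instantiating the current state) and the resulting variable coordination structure can be sketched in a few lines. This is a minimal sketch under stated assumptions: Boolean variables, a hypothetical state variable x, hypothetical action variables a1 to a4, and made-up rule values; the helper names are ours. The final maximization is brute-force enumeration used only to check the toy example, not the rule-based variable elimination of [Zhang and Poole '99].

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# A value rule <context : value> contributes `value` when every variable in
# `context` takes the listed value. Boolean variables are an assumption here.
Rule = Tuple[Dict[str, bool], float]

def instantiate(rules: List[Rule], state: Dict[str, bool]) -> List[Rule]:
    """Simplification A: drop rules whose context contradicts the observed state
    and remove the (now known) state variables from the surviving contexts."""
    kept = []
    for context, value in rules:
        if all(state[v] == b for v, b in context.items() if v in state):
            kept.append(({v: b for v, b in context.items() if v not in state}, value))
    return kept

def coordination_edges(rules: List[Rule], agents: List[str]) -> Set[Tuple[str, str]]:
    """Two agents need to communicate only if some rule mentions both of their actions."""
    edges = set()
    for context, _ in rules:
        involved = sorted(v for v in context if v in agents)
        for i, a in enumerate(involved):
            for b in involved[i + 1:]:
                edges.add((a, b))
    return edges

def total_value(rules: List[Rule], action: Dict[str, bool]) -> float:
    """Sum of the rules whose contexts hold; call after instantiate() so that
    contexts mention only action variables."""
    return sum(value for context, value in rules
               if all(action[v] == b for v, b in context.items()))

def best_action(rules: List[Rule], agents: List[str]) -> Tuple[Dict[str, bool], float]:
    """Brute-force maximization over all joint actions (exponential; check only)."""
    best = None
    for values in product([False, True], repeat=len(agents)):
        action = dict(zip(agents, values))
        v = total_value(rules, action)
        if best is None or v > best[1]:
            best = (action, v)
    return best

# Hypothetical example: 4 agents, one Boolean state variable x.
AGENTS = ["a1", "a2", "a3", "a4"]
RULES: List[Rule] = [
    ({"x": True, "a1": True, "a2": True}, 4.0),   # a1, a2 coordinate only when x holds
    ({"x": False, "a1": True, "a3": True}, 5.0),  # a1, a3 coordinate only when x is false
    ({"a3": True, "a4": True}, 2.0),
    ({"a1": True}, -1.0),
]

if __name__ == "__main__":
    print(sorted(coordination_edges(RULES, AGENTS)))  # structure before instantiation
    for x in (True, False):
        simplified = instantiate(RULES, {"x": x})
        print(x, sorted(coordination_edges(simplified, AGENTS)), best_action(simplified, AGENTS))
```

Before instantiation the graph links a1 with both a2 and a3. With x = true, the rule linking a1 and a3 disappears, so a1 only needs to communicate with a2; with x = false, the link between a1 and a2 disappears instead.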
Outline
Sections: Introduction; Context-Specific Coordination, Given Q_i's; Long-Term Planning, Computing Q_i's; Experimental Results.
• Given long-term utilities Σ_i Q_i(x,a):
  ◦ Local message passing computes the maximizing action
  ◦ Variable coordination structure
• Long-term planning to obtain Σ_i Q_i(x,a):
  ◦ Linear programming approach
  ◦ Exploit context-specific structure

Computing Maximizing Action: Coordination Graph
Use a coordination graph [Guestrin et al. '01] and variable elimination for maximization [Bertele & Brioschi '72]:
$$\begin{aligned}
&\max_{A_1,A_2,A_3,A_4} Q_1(A_1,A_2) + Q_2(A_1,A_3) + Q_3(A_3,A_4) + Q_4(A_2,A_4) \\
&= \max_{A_1,A_2,A_3} Q_1(A_1,A_2) + Q_2(A_1,A_3) + \max_{A_4}\big[Q_3(A_3,A_4) + Q_4(A_2,A_4)\big] \\
&= \max_{A_1,A_2,A_3} Q_1(A_1,A_2) + Q_2(A_1,A_3) + g_1(A_2,A_3)
\end{aligned}$$
Here g_1(A_2, A_3) gives, for every action of A_2 and A_3, the maximum value for A_4. This needs only 23 sum operations instead of 63. Communication for optimal action choice is limited: the communication bandwidth equals the induced width of the coordination graph.
[Figure: rule-based coordination graph over agents A1 to A6. Panel A: instantiate the current state, x = true. Panel B: eliminate variable A1. Panel C: local maximization.]

Decomposable Value Function
The factored value function V = Σ_i w_i h_i is a linear combination of restricted-domain basis functions [Bellman et al. '63], [Tsitsiklis & Van Roy '96], [Koller & Parr '99, '00], [Guestrin et al. '01]. Each h_i is a rule over a small part of a complex system, for example:
• the value of having two agents in the same house
• the value of two agents painting a house together
We must find weights w that give a good approximate value function. The factored Q-function is Q = Σ_i Q_i.

Factored MDP
[Figure: factored MDP for Agent 2 (A2) with variables Plumbing_i, Painting_i, Plumbing_i', Painting_i' and reward R; legend: Required Tasks, Dependent Tasks.]

Single LP Solution for Factored MDPs [Schweitzer and Seidmann '85], [Guestrin et al. '01]
• One variable w_i for each basis function, so polynomially many LP variables
• One constraint for every state and action
Because h_i and Q_i depend on small sets of variables and actions, rule-based variable elimination lets a polynomial-time algorithm generate a compact LP, exponentially smaller than the table-based LP!

Experimental Results: Construction Crew Problem
Example 1: 2 agents, 1 house
• Agent 1 = {Foundation, Electricity, Plumbing}
• Agent 2 = {Plumbing, Painting, Decoration}
Example 2: 4 agents, 2 houses
• Agent 1 = {Painting, Decoration}; moves
• Agent 2 = {Foundation, Electricity, Plumbing, Painting}; house 1
• Agent 3 = {Foundation, Electricity}; house 2
• Agent 4 = {Plumbing, Decoration}; house 2
[Figure: actual value of resulting policies.]

Comparison with Apricodd:
                                    Our rule-based approach          Apricodd
Algorithm based on                  Linear programming               Value iteration
Types of independence exploited     Additive and context-specific    Only context-specific
"Basis function" representation     Specified by user                Determined by algorithm
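The variable elimination used to compute the maximizing action on the coordination graph can be sketched as a generic max-sum elimination over table factors. This is a minimal sketch assuming binary actions (0/1) and hypothetical local Q-functions Q1 to Q4 chosen only for illustration, with the same scopes as the derivation (Q1(A1,A2), Q2(A1,A3), Q3(A3,A4), Q4(A2,A4)). The function names make_factor, eliminate, max_sum and brute_force are ours, and this is the plain table-based version, not the rule-based one. Eliminating A4 first produces the g_1(A_2, A_3) term.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# A table factor: (scope, table), where table maps an assignment tuple over scope to a value.
Factor = Tuple[Tuple[str, ...], Dict[Tuple[int, ...], float]]

def make_factor(scope: Tuple[str, ...], fn: Callable[..., float]) -> Factor:
    """Tabulate fn over all binary assignments to `scope`."""
    return scope, {vals: fn(*vals) for vals in product((0, 1), repeat=len(scope))}

def eliminate(factors: List[Factor], var: str):
    """Max out `var`: replace every factor that mentions it by one new factor g
    over the remaining variables, and record the best choice of `var` (argmax)."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = tuple(sorted({v for s, _ in touching for v in s if v != var}))
    table, argmax = {}, {}
    for vals in product((0, 1), repeat=len(scope)):
        ctx = dict(zip(scope, vals))
        best_value, best_choice = None, None
        for choice in (0, 1):
            ctx[var] = choice
            total = sum(t[tuple(ctx[v] for v in s)] for s, t in touching)
            if best_value is None or total > best_value:
                best_value, best_choice = total, choice
        table[vals], argmax[vals] = best_value, best_choice
    return rest + [(scope, table)], (scope, argmax)

def max_sum(factors: List[Factor], order: List[str]):
    """Eliminate variables in `order` (which must list every variable), then
    recover the maximizing joint action by substituting back in reverse order."""
    steps = []
    for var in order:
        factors, policy = eliminate(factors, var)
        steps.append((var, policy))
    value = sum(t[()] for _, t in factors)  # all remaining factors have empty scope
    action: Dict[str, int] = {}
    for var, (scope, argmax) in reversed(steps):
        action[var] = argmax[tuple(action[v] for v in scope)]
    return value, action

def brute_force(factors: List[Factor], variables: List[str]):
    """Enumerate every joint action (exponential); used only as a check."""
    best = None
    for vals in product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, vals))
        total = sum(t[tuple(assign[v] for v in s)] for s, t in factors)
        if best is None or total > best[0]:
            best = (total, assign)
    return best

# Hypothetical local Q-functions over binary actions.
Q1 = make_factor(("A1", "A2"), lambda a1, a2: 2 * (a1 == a2))
Q2 = make_factor(("A1", "A3"), lambda a1, a3: 3 * a1 * a3)
Q3 = make_factor(("A3", "A4"), lambda a3, a4: 1 * (a3 != a4))
Q4 = make_factor(("A2", "A4"), lambda a2, a4: 2 * (a2 == a4))
FACTORS = [Q1, Q2, Q3, Q4]

if __name__ == "__main__":
    value, action = max_sum(FACTORS, ["A4", "A3", "A2", "A1"])
    print(value, action)  # → 7 {'A1': 1, 'A2': 1, 'A3': 1, 'A4': 1}
```

Each elimination touches only the factors that mention the eliminated agent, which is why the agents only need to exchange messages along the edges of the coordination graph.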