Published by Olivia McLaughlin. Modified over 9 years ago.
Context-Specific Multiagent Coordination and Planning with Factored MDPs
Carlos Guestrin, Shobha Venkataraman, Daphne Koller
Stanford University → CMU

Multiagent Coordination Examples
Search and rescue, factory management, supply chain, firefighting, network routing, air traffic control. These problems involve:
- multiple, simultaneous decisions;
- limited observability;
- limited communication.

Construction Crew Problem: Dynamic Resource Allocation
WANTED: Agents that coordinate to build and maintain houses, but only when necessary!
Task dependencies: Foundation → {Electricity, Plumbing} → Painting → Decoration
- Agent 1: Foundation, Electricity, Plumbing
- Agent 2: Plumbing, Painting
- Agent 3: Electricity, Painting
- Agent 4: Decoration
[Figure: agents $A_1$–$A_4$]

Joint Decision Space
Represent as an MDP:
- Action space: joint action $a$ for all agents
- State space: joint state $x$ of all agents
- Reward function: total reward $r$
The action space is exponential: an action is an assignment $a = \{a_1, \dots, a_n\}$. The state space is exponential in the number of variables. A global decision requires complete observation.

Local Q-function Approximation
$$Q(A_1,\dots,A_4, X_1,\dots,X_4) \approx Q_1(A_1, A_4, X_1, X_4) + Q_2(A_1, A_2, X_1, X_2) + Q_3(A_2, A_3, X_2, X_3) + Q_4(A_3, A_4, X_3, X_4)$$
[Figure: nodes $M_1$–$M_4$ with the local term $Q_3$ marked]
$Q_3$ is associated with Agent 3, which observes only $X_2$ and $X_3$.
Limited observability: agent $i$ only observes the variables in $Q_i$. The agents must choose the action that maximizes $\sum_i Q_i$.

Problems with Coordination Graph
- Table size is exponential in the number of variables
- Messages are tables
- Agents communicate even when it is not necessary
- Fixed coordination structure
What we want:
- Use structure in tables
- Variable coordination structure
- Exploit context-specific independence!

Context-Specific Coordination Structure
[Figure: agents $A_1$–$A_4$]
Local value rules represent context-specific structure: a set of rules $Q_i$ for each agent. The agents must coordinate to maximize the total value, using rule-based variable elimination [Zhang and Poole ’99].
Rule-based coordination graph for finding the optimal action (illustrated by maximizing out $A_1$):
A - Simplification on instantiation of the state
B - Simplification when passing messages
C - Simplification on maximization
Simplification by approximation
Variable agent communication structure: the coordination structure is dynamic.

Long-term Utility = Value of MDP
The value is computed by linear programming:
- one variable $V(x)$ for each state;
- one constraint for each state $x$ and action $a$.
The number of states and actions is exponential!

Construction Crew Problem
Tasks last multiple time steps, failures cause chain reactions, and there are multiple houses.

[Figure: SysAdmin, rule-based vs. table-based; panels: Bidirectional Ring, Server, Reverse Star]

Comparing to Apricodd [Boutilier et al. ’96–’99]
Columns: Optimal, Apricodd, Rule-based
Expon06     530.9
Expon08     77.09
Expon10     0.034
Linear06    531.4
Linear08    430.5
Linear10    348.7

Summary: Context-Specific Coordination
Summary of Algorithm:
1. Pick local rule-based basis functions $h_i$.
2. A single LP algorithm for factored MDPs obtains the $Q_i$'s.
3. A variable coordination graph computes the maximizing action.

Conclusions and Extensions
Multiagent planning algorithm with:
- variable coordination structure;
- limited context-specific communication;
- limited context-specific observability.
Solve large MDPs!
Extensions to hierarchical and relational models.
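The rule-based coordination panel lists "simplification on instantiation of the state" and a dynamic coordination structure. A minimal Python sketch of that one step, under assumptions that are not from the poster: a rule is a hypothetical `Rule(context, value)` over Boolean variables, the rule values are invented, and `instantiate`, `coordination_edges`, and `best_action` are illustrative names. The maximization here is brute force over joint actions, not rule-based variable elimination.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Rule:
    """Hypothetical local value rule <context : value> (assumed representation).

    `context` is a sorted tuple of (variable, bool) pairs over state
    variables (e.g. "x") and agent actions (e.g. "A1"); the rule adds
    `value` whenever every assignment in its context holds.
    """
    context: tuple
    value: float


def rule(value, **assignments):
    return Rule(tuple(sorted(assignments.items())), value)


def instantiate(rules, state):
    """Simplification A: drop rules that contradict the observed state and
    remove the observed state variables from the remaining contexts."""
    result = []
    for r in rules:
        ctx = dict(r.context)
        if any(var in state and ctx[var] != state[var] for var in ctx):
            continue  # rule can never fire in this state
        kept = tuple((var, val) for var, val in r.context if var not in state)
        result.append(Rule(kept, r.value))
    return result


def coordination_edges(rules, agents):
    """Two agents must coordinate only if some rule mentions both actions."""
    edges = set()
    for r in rules:
        involved = sorted(var for var, _ in r.context if var in agents)
        edges.update(combinations(involved, 2))
    return edges


def total_value(rules, action):
    """Sum the values of all rules whose context is satisfied by `action`."""
    return sum(r.value for r in rules
               if all(action.get(var) == val for var, val in r.context))


def best_action(rules, agents):
    """Brute-force argmax over joint actions (only sensible for tiny examples)."""
    names = sorted(agents)
    joint = max(
        (dict(zip(names, bits))
         for bits in product([False, True], repeat=len(names))),
        key=lambda act: total_value(rules, act),
    )
    return joint, total_value(rules, joint)


# Invented example rules; the values are illustrative only.
AGENTS = {"A1", "A2", "A3"}
RULES = [
    rule(2.0, x=True, A1=True, A2=True),   # A1 and A2 interact only when x holds
    rule(1.0, x=False, A1=True, A3=True),  # A1 and A3 interact only when x fails
    rule(0.5, A2=True),
    rule(-0.2, A3=True),
]
```

With $x$ = true, only $A_1$ and $A_2$ share a rule after instantiation, so they are the only pair that needs to communicate; with $x$ = false the pair becomes $A_1$ and $A_3$.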
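The Long-term Utility panel lists the ingredients of the LP without writing it out. In the standard exact formulation it reads as follows; the discount factor $\gamma$, transition model $P$, reward $R$, and positive state-relevance weights $\alpha(x)$ are notation assumed here rather than taken from the poster:

```latex
\begin{aligned}
\min_{V}\quad & \sum_{x} \alpha(x)\, V(x) \\
\text{s.t.}\quad & V(x) \ge R(x, a) + \gamma \sum_{x'} P(x' \mid x, a)\, V(x') \qquad \forall\, x, a
\end{aligned}
```

Each state contributes one variable $V(x)$ and each state-action pair one constraint, which is why both counts grow exponentially with the number of state variables and agents.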
Outline
Sections: Introduction; Context-Specific Coordination, Given $Q_i$'s; Long-Term Planning, Computing $Q_i$'s; Experimental Results.
Given long-term utilities $\sum_i Q_i(x, a)$:
- local message passing computes the maximizing action;
- variable coordination structure.
Long-term planning to obtain $\sum_i Q_i(x, a)$:
- linear programming approach;
- exploit context-specific structure.
Factored value function $V = \sum_i w_i h_i$; factored Q-function $Q = \sum_i Q_i$ [Bellman et al. ’63], [Tsitsiklis & Van Roy ’96], [Koller & Parr ’99, ’00], [Guestrin et al. ’01].

Computing Maximizing Action: Coordination Graph
Use a coordination graph [Guestrin et al. ’01] and variable elimination for maximization [Bertele & Brioschi ’72]:
[Figure: coordination graph over agents $A_1$–$A_4$]
$$\max_{A_1,A_2,A_3,A_4} Q_1(A_1,A_2) + Q_2(A_1,A_3) + Q_3(A_3,A_4) + Q_4(A_2,A_4)$$
$$= \max_{A_1,A_2,A_3} Q_1(A_1,A_2) + Q_2(A_1,A_3) + \max_{A_4}\big[Q_3(A_3,A_4) + Q_4(A_2,A_4)\big]$$
$$= \max_{A_1,A_2,A_3} Q_1(A_1,A_2) + Q_2(A_1,A_3) + g_1(A_2,A_3)$$
Here $g_1(A_2,A_3)$ stores, for every action of $A_2$ and $A_3$, the maximum value over $A_4$. In this example we need only 23, instead of 63, sum operations.
Limited communication for optimal action choice: the communication bandwidth equals the induced width of the coordination graph.

Decomposable Value Function
Linear combination of restricted-domain basis functions:
$$V(x) \approx \sum_i w_i h_i(x)$$
Each $h_i$ is a rule over a small part (or parts) of a complex system, for example:
- the value of having two agents in the same house;
- the value of two agents painting a house together.
We must find weights $w$ that give a good approximate value function.

Factored MDP
[Figure: factored MDP over Plumbing$_i$, Painting$_i$ and next-step Plumbing$_i'$, Painting$_i'$, with reward $R$ and action $A_2$; required tasks and dependent tasks marked]

Single LP Solution for Factored MDPs [Schweitzer and Seidmann ’85], [Guestrin et al. ’01]
- One variable $w_i$ for each basis function: polynomially many LP variables.
- One constraint for every state and action.
- Because $h_i$ and $Q_i$ depend on small sets of variables and actions, a polynomial-time algorithm generates a compact LP.
- Rule-based variable elimination: an exponentially smaller LP than the table-based one!

[Figure: rule-based coordination graph over agents $A_1$–$A_6$. (A) Instantiate the current state: $x$ = true. (B) Eliminate variable $A_1$. (C) Local maximization, leaving $A_2$–$A_6$.]

Example 1: 2 agents, 1 house
- Agent 1 = {Foundation, Electricity, Plumbing}
- Agent 2 = {Plumbing, Painting, Decoration}
Example 2: 4 agents, 2 houses
- Agent 1 = {Painting, Decoration}; moves
- Agent 2 = {Foundation, Electricity, Plumbing, Painting}; house 1
- Agent 3 = {Foundation, Electricity}; house 2
- Agent 4 = {Plumbing, Decoration}; house 2
[Figure: actual value of resulting policies]

Our rule-based approach vs. Apricodd
                                  Our rule-based approach         Apricodd
Algorithm based on                Linear programming              Value iteration
Types of independence exploited   Additive and context-specific   Only context-specific
"Basis function" representation   Specified by user               Determined by algorithm
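The variable-elimination steps for the coordination graph can be checked on a small instance. A minimal Python sketch, assuming binary actions and invented payoff tables for $Q_1$ to $Q_4$ (the poster gives no numbers): it builds $g_1(A_2, A_3)$ by maximizing out $A_4$, maximizes the reduced three-agent problem, recovers $A_4$ from the stored argmax, and compares the result with brute-force enumeration.

```python
from itertools import product

ACTIONS = (0, 1)  # assumption: every agent chooses between two actions

# Invented local payoff tables (the poster gives no numbers).
Q1 = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 4.0}  # Q1(A1, A2)
Q2 = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}  # Q2(A1, A3)
Q3 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}  # Q3(A3, A4)
Q4 = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}  # Q4(A2, A4)


def total(a1, a2, a3, a4):
    """Global payoff Q1 + Q2 + Q3 + Q4 for one joint action."""
    return Q1[a1, a2] + Q2[a1, a3] + Q3[a3, a4] + Q4[a2, a4]


def brute_force():
    """Enumerate all 2^4 joint actions."""
    best = max(product(ACTIONS, repeat=4), key=lambda a: total(*a))
    return best, total(*best)


def variable_elimination():
    """Maximize out A4 first, then solve the reduced three-agent problem."""
    # g1(A2, A3) = max_{A4} [Q3(A3, A4) + Q4(A2, A4)]; keep the argmax for A4.
    g1, best_a4 = {}, {}
    for a2, a3 in product(ACTIONS, repeat=2):
        a4 = max(ACTIONS, key=lambda b: Q3[a3, b] + Q4[a2, b])
        best_a4[a2, a3] = a4
        g1[a2, a3] = Q3[a3, a4] + Q4[a2, a4]
    # max_{A1,A2,A3} Q1(A1, A2) + Q2(A1, A3) + g1(A2, A3)
    a1, a2, a3 = max(
        product(ACTIONS, repeat=3),
        key=lambda a: Q1[a[0], a[1]] + Q2[a[0], a[2]] + g1[a[1], a[2]],
    )
    value = Q1[a1, a2] + Q2[a1, a3] + g1[a2, a3]
    # Back-substitution: A4's best response to (A2, A3) was stored above.
    return (a1, a2, a3, best_a4[a2, a3]), value


print(variable_elimination(), brute_force())
```

Only $A_2$ and $A_3$ appear alongside $A_4$, so $g_1$ depends on just those two agents; in the coordination-graph view it is the message produced once $A_4$ has been eliminated.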
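The Single LP panel replaces the per-state variables $V(x)$ with one weight $w_i$ per basis function. A minimal sketch of the resulting approximate LP in the Schweitzer and Seidmann form: minimize $\sum_x \alpha(x) \sum_i w_i h_i(x)$ subject to $\sum_i w_i \big[h_i(x) - \gamma \sum_{x'} P(x' \mid x, a)\, h_i(x')\big] \ge R(x, a)$ for all $x, a$. The MDP (two Boolean state variables, two actions), the basis functions, rewards, transitions, $\gamma = 0.9$, and uniform $\alpha$ are all invented for illustration. The sketch enumerates every state, so it shows the LP's shape (one column per $h_i$, one row per state-action pair), not the factored constraint generation that makes the LP compact.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor
STATES = list(product([False, True], repeat=2))  # hypothetical state x = (x1, x2)
ACTIONS = ["wait", "work"]  # hypothetical joint actions

# Hypothetical basis functions h_i: a constant plus one indicator rule per variable.
BASIS = [
    lambda x: 1.0,
    lambda x: 1.0 if x[0] else 0.0,
    lambda x: 1.0 if x[1] else 0.0,
]


def reward(x, a):
    """Invented reward R(x, a): +1 per true variable, cost 0.5 for working."""
    return float(x[0]) + float(x[1]) - (0.5 if a == "work" else 0.0)


def transition(x, a):
    """Invented factored transition P(x' | x, a), returned as {x': prob}.

    x1' is likely true if x1 already holds or the agents work;
    x2' is likely true if x1 holds (a task dependency).
    """
    p1 = 0.9 if (x[0] or a == "work") else 0.1
    p2 = 0.9 if x[0] else 0.1
    return {
        nx: (p1 if nx[0] else 1 - p1) * (p2 if nx[1] else 1 - p2)
        for nx in STATES
    }


def build_lp(alpha=None):
    """Return (c, A, b) for the LP: minimize c . w subject to A w >= b.

    One column per basis function (LP variable w_i) and one row per
    state-action pair (x, a).
    """
    if alpha is None:
        alpha = {x: 1.0 / len(STATES) for x in STATES}
    c = [sum(alpha[x] * h(x) for x in STATES) for h in BASIS]
    A, b = [], []
    for x, a in product(STATES, ACTIONS):
        probs = transition(x, a)
        A.append([h(x) - GAMMA * sum(p * h(nx) for nx, p in probs.items())
                  for h in BASIS])
        b.append(reward(x, a))
    return c, A, b


c, A, b = build_lp()
```

To solve it, one option is SciPy's `scipy.optimize.linprog`, which expects constraints of the form `A_ub @ w <= b_ub` and defaults to non-negative variables, so pass the negated rows and right-hand sides together with `bounds=(None, None)`.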