Context-Specific Multiagent Coordination and Planning with Factored MDPs
Carlos Guestrin, Shobha Venkataraman, Daphne Koller (Stanford University)

Presentation transcript:

Construction Crew Problem: Dynamic Resource Allocation

WANTED: Agents that coordinate to build and maintain houses, but only when necessary!
- Agent 1: Foundation, Electricity, Plumbing
- Agent 2: Plumbing, Painting
- Agent 3: Electricity, Painting
- Agent 4: Decoration
Task ordering: Foundation → {Electricity, Plumbing} → Painting → Decoration

Multiagent Coordination Examples

- Search and rescue
- Factory management
- Supply chain
- Firefighting
- Network routing
- Air traffic control
Common challenges: multiple, simultaneous decisions; limited observability; limited communication.

Joint Decision Space

Represent the problem as an MDP:
- Action space: joint action a for all agents
- State space: joint state x of all agents
- Reward function: total reward r
The difficulty:
- The action space is exponential: an action is an assignment a = {a_1, ..., a_n}.
- The state space is exponential in the number of variables.
- A global decision requires complete observation.

Summary: Context-Specific Coordination

Summary of the algorithm:
1. Pick local rule-based basis functions h_i.
2. A single LP algorithm for factored MDPs obtains the Q_i's.
3. A variable coordination graph computes the maximizing action.

Experimental results (previewed): Construction Crew Problem; SysAdmin, rule-based vs. table-based; comparison to Apricodd [Boutilier et al. '96-'99].
[Figures: running times on Bidirectional Ring, Server, and Reverse Star topologies, comparing Optimal, Apricodd, and the rule-based approach (exponential vs. linear growth).]

Conclusions and Extensions

A multiagent planning algorithm with:
- a variable coordination structure;
- limited, context-specific communication;
- limited, context-specific observability.
It solves large MDPs! Extensions to hierarchical and relational models. (Stanford University → CMU.)

Local Q-function Approximation

Q(A_1, ..., A_4, X_1, ..., X_4) ≈ Q_1(A_1, A_4, X_1, X_4) + Q_2(A_1, A_2, X_1, X_2) + Q_3(A_2, A_3, X_2, X_3) + Q_4(A_3, A_4, X_3, X_4)
- Q_3 is associated with Agent 3, which observes only X_2 and X_3.
- Limited observability: agent i only observes the variables appearing in Q_i.
- The agents must choose a joint action that maximizes Σ_i Q_i.

Context-Specific Coordination Structure

Problems with the table-based representation:
- Table size is exponential in the number of variables.
- Messages are tables.
- Agents communicate even when it is not necessary.
- The coordination structure is fixed.
What we want: use the structure in the tables and a variable coordination structure, i.e., exploit context-specific independence!

Local value rules represent context-specific structure: a set of rules Q_i for each agent. The agents must coordinate to maximize the total value, using rule-based variable elimination [Zhang and Poole '99] (e.g., maximizing out A_1). In the rule-based coordination graph for finding the optimal action:
A. Simplification on instantiation of the state.
B. Simplification when passing messages.
C. Simplification on maximization.
Further simplification by approximation is possible. The agent communication structure is variable: the coordination structure is dynamic.

Problems with the Coordination Graph

- Tasks last multiple time steps.
- Failures cause chain reactions.
- There are multiple houses.
These require long-term planning.

Long-Term Utility = Value of the MDP

The value can be computed by linear programming:
- one variable V(x) for each state;
- one constraint for each state x and action a.
But the number of states and actions is exponential!
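To make that blow-up concrete, here is a minimal Python sketch (not from the original slides) of exactly this table-based LP for a small, flat MDP with enumerated states and actions. The function name solve_mdp_lp and the uniform state-relevance weights are illustrative assumptions.

    import numpy as np
    from scipy.optimize import linprog

    def solve_mdp_lp(P, R, gamma=0.95):
        """Exact LP for a small, flat MDP (illustrative sketch, not the paper's code).

        P: transitions, shape (num_actions, num_states, num_states), P[a, x, y] = P(y | x, a).
        R: rewards, shape (num_states, num_actions).
        Returns the optimal value function V, one entry per enumerated state.
        """
        num_actions, num_states, _ = P.shape
        c = np.ones(num_states)                 # minimize sum_x V(x), uniform state weights
        A_ub, b_ub = [], []
        for a in range(num_actions):
            for x in range(num_states):
                # V(x) >= R(x,a) + gamma * sum_y P(y|x,a) V(y), rewritten as A_ub @ V <= b_ub
                row = gamma * P[a, x].copy()
                row[x] -= 1.0
                A_ub.append(row)
                b_ub.append(-R[x, a])
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(None, None)] * num_states, method="highs")
        return res.x

With n binary state variables the table has 2^n rows, so this LP needs 2^n variables and 2^n constraints per joint action, which is exactly the blow-up the factored formulation below avoids.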
Outline

- Introduction
- Context-specific coordination, given the Q_i's:
  - local message passing computes the maximizing action;
  - variable coordination structure.
- Long-term planning, computing the Q_i's:
  - a linear programming approach;
  - exploit context-specific structure.
- Experimental results

Computing the Maximizing Action: Coordination Graph

Use the coordination graph [Guestrin et al. '01] and variable elimination for the maximization [Bertele & Brioschi '72]. Limited communication suffices for the optimal action choice; the communication bandwidth is the induced width of the coordination graph. For example, eliminating A_4 first (for every action of A_2 and A_3, compute the maximum value over A_4):

max_{A_1,...,A_4} [Q_1(A_1,A_2) + Q_2(A_2,A_4) + Q_3(A_1,A_3) + Q_4(A_3,A_4)]
  = max_{A_1,A_2,A_3} [Q_1(A_1,A_2) + Q_3(A_1,A_3) + max_{A_4} (Q_2(A_2,A_4) + Q_4(A_3,A_4))]
  = max_{A_1,A_2,A_3} [Q_1(A_1,A_2) + Q_3(A_1,A_3) + g_1(A_2,A_3)]

Here we need only 23 instead of 63 sum operations.

[Figure: rule-based coordination graph over agents A_1 to A_6. (A) Instantiate the current state, e.g. x = true; (B) eliminate variable A_1; (C) local maximization.]

Decomposable Value Function

A linear combination of restricted-domain basis functions: V = Σ_i w_i h_i. Each h_i is a rule over a small part of a complex system, for example:
- the value of having two agents in the same house;
- the value of two agents painting a house together.
We must find weights w that give a good approximate value function.
Factored value function: V = Σ_i w_i h_i. Factored Q-function: Q = Σ_i Q_i. [Bellman et al. '63; Tsitsiklis & Van Roy '96; Koller & Parr '99, '00; Guestrin et al. '01]

Factored MDP

[Figure: a DBN fragment in which agent A_2's action influences Plumbing_i and Painting_i at the next time step (Plumbing_i', Painting_i') and the reward R; required tasks vs. dependent tasks.]

Single LP Solution for Factored MDPs [Schweitzer and Seidmann '85; Guestrin et al. '01]

- One variable w_i for each basis function, so there are only polynomially many LP variables.
- One constraint for every state and action, but h_i and Q_i depend only on small sets of variables and actions, so a polynomial-time algorithm generates a compact LP.
- With rule-based variable elimination, the LP is exponentially smaller than the table-based one!

Experimental Results

Task ordering: Foundation → {Electricity, Plumbing} → Painting → Decoration.
- Example 1: 2 agents, 1 house. Agent 1 = {Foundation, Electricity, Plumbing}; Agent 2 = {Plumbing, Painting, Decoration}.
- Example 2: 4 agents, 2 houses. Agent 1 = {Painting, Decoration}, moves between houses; Agent 2 = {Foundation, Electricity, Plumbing, Painting}, house 1; Agent 3 = {Foundation, Electricity}, house 2; Agent 4 = {Plumbing, Decoration}, house 2.
[Plot: actual value of the resulting policies.]

Comparing to Apricodd

                                     Our rule-based approach          Apricodd
  Algorithm based on                 Linear programming               Value iteration
  Types of independence exploited    Additive and context-specific    Only context-specific
  "Basis function" representation    Specified by user                Determined by algorithm
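The "Single LP Solution for Factored MDPs" slide (step 2 of the algorithm summary) can be illustrated with a small sketch of the approximate LP over basis-function weights. This is a simplified, hypothetical version: it enumerates the state-action constraints of a tiny flat MDP explicitly, whereas the approach in the talk generates an equivalent, exponentially smaller constraint set with rule-based variable elimination. The name factored_lp_weights and the uniform state-relevance weights are illustrative assumptions.

    import numpy as np
    from scipy.optimize import linprog

    def factored_lp_weights(P, R, basis, gamma=0.95):
        """Approximate LP: find weights w for V(x) ~= sum_i w_i * h_i(x) (illustrative sketch).

        P:     transitions, shape (num_actions, num_states, num_states)
        R:     rewards, shape (num_states, num_actions)
        basis: shape (num_basis, num_states), basis[i, x] = h_i(x)
        Constraints are enumerated per (state, action) here for clarity only.
        """
        num_actions, num_states, _ = P.shape
        num_basis = basis.shape[0]
        c = basis.sum(axis=1)                     # minimize sum_x V_w(x), uniform state weights
        A_ub, b_ub = [], []
        for a in range(num_actions):
            backprojected = P[a] @ basis.T        # entry [x, i] = E[h_i(x') | x, a]
            for x in range(num_states):
                # sum_i w_i * (h_i(x) - gamma * E[h_i(x')|x,a]) >= R(x,a)
                row = -(basis[:, x] - gamma * backprojected[x])
                A_ub.append(row)
                b_ub.append(-R[x, a])
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(None, None)] * num_basis, method="highs")
        return res.x

The LP now has one variable per basis function rather than one per state, which is the "polynomially many LP variables" point from the slide; the remaining exponential factor sits in the constraint set, which the rule-based method compresses.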
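Step 3 of the algorithm, the coordination-graph maximization from the "Computing the Maximizing Action" slide, can be sketched as variable elimination over the local Q_i functions. The sketch below is table-based for clarity; the rule-based version in the talk additionally represents each Q_i as value rules and prunes them by context, which is omitted here. The function eliminate_max, the agent names, and the numeric Q values are illustrative assumptions.

    import itertools

    def eliminate_max(local_qs, order, actions=(0, 1)):
        """Variable elimination on a coordination graph: compute max_a sum_i Q_i(a)
        and one maximizing joint action (illustrative, table-based sketch).

        local_qs: list of (scope, table) pairs; scope is a tuple of agent names and
                  table maps an action tuple (in scope order) to a value.
        order:    elimination order covering every agent appearing in some scope.
        """
        factors = list(local_qs)
        backpointers = []                               # (agent, remaining scope, best-action table)
        for agent in order:
            related = [f for f in factors if agent in f[0]]
            factors = [f for f in factors if agent not in f[0]]
            new_scope = tuple(sorted({v for s, _ in related for v in s if v != agent}))
            new_table, best_action = {}, {}
            for assign in itertools.product(actions, repeat=len(new_scope)):
                ctx = dict(zip(new_scope, assign))
                best = None
                for a in actions:
                    ctx[agent] = a
                    val = sum(t[tuple(ctx[v] for v in s)] for s, t in related)
                    if best is None or val > best[0]:
                        best = (val, a)
                new_table[assign], best_action[assign] = best
            factors.append((new_scope, new_table))
            backpointers.append((agent, new_scope, best_action))
        best_value = sum(t[()] for _, t in factors)     # all remaining scopes are empty
        # Back-substitute in reverse elimination order to recover the joint action.
        joint_action = {}
        for agent, scope, best_action in reversed(backpointers):
            joint_action[agent] = best_action[tuple(joint_action[v] for v in scope)]
        return best_value, joint_action

    # The four local Q-functions from the slide's example, with made-up values:
    Q1 = (("A1", "A2"), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 1.5})
    Q2 = (("A2", "A4"), {(0, 0): 0.5, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.0})
    Q3 = (("A1", "A3"), {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 3.0, (1, 1): 0.0})
    Q4 = (("A3", "A4"), {(0, 0): 0.0, (0, 1): 2.5, (1, 0): 1.0, (1, 1): 1.0})
    value, action = eliminate_max([Q1, Q2, Q3, Q4], order=["A4", "A1", "A2", "A3"])

With the elimination order from the slide (A_4 first), the returned joint action maximizes Q_1 + Q_2 + Q_3 + Q_4 without enumerating the exponential joint action space, and each elimination step only touches the agents that share a factor, mirroring the limited-communication property of the coordination graph.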