An Accelerated Gradient Method for Multi-Agent Planning in Factored MDPs. Sue Ann Hong, Geoff Gordon. Carnegie Mellon University.

Presentation transcript:

An Accelerated Gradient Method for Multi-Agent Planning in Factored MDPs. Sue Ann Hong, Geoff Gordon. Carnegie Mellon University.

Multi-agent planning: optimize each agent's individual objective, subject to its individual constraints and to shared constraints on resources.

Want: an efficient, distributed solver. Each agent is modeled as a factored MDP [Guestrin et al., 2002]: maximize a linear reward, subject to piecewise-linear constraints on the shared resources. Fast individual solver: value iteration.
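As a reference point for the "fast solver" mentioned above, here is a minimal sketch of standard tabular value iteration for a single agent's MDP; the array shapes and names (P, R, gamma) are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Tabular value iteration for a single agent's MDP.

    P: transition tensor, shape (A, S, S), with P[a, s, s2] = Pr(s2 | s, a).
    R: reward matrix, shape (S, A).
    Returns the optimal state values and a greedy deterministic policy.
    """
    S, A = R.shape
    V = np.zeros(S)
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
        Q = R + gamma * np.einsum('asn,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```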

Distributed optimization via Lagrangian relaxation: attach prices to the shared resources so that each agent can plan independently, and the joint problem is solved in a distributed fashion. How to set the prices? Gradient-based methods.
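A minimal sketch of the price-setting step under Lagrangian relaxation: a projected (sub)gradient ascent update on the resource prices, assuming each agent exposes a planner that returns its expected resource usage at the current prices. The interface (`agents`, `capacity`, `step`) is hypothetical and only meant to illustrate the idea.

```python
import numpy as np

def dual_price_step(agents, capacity, prices, step=0.1):
    """One projected (sub)gradient ascent step on the resource prices.

    agents:   list of callables; agents[i](prices) plans for agent i with the
              price term subtracted from its reward and returns its vector of
              expected resource usage, shape (R,).
    capacity: available amount of each shared resource, shape (R,).
    prices:   current Lagrange multipliers (one per resource), shape (R,).
    """
    usage = sum(agent(prices) for agent in agents)  # subgradient of the dual
    prices = prices + step * (usage - capacity)     # raise prices on over-used resources
    return np.maximum(prices, 0.0)                  # multipliers stay nonnegative
```

Plain (sub)gradient updates of this kind converge slowly, which motivates the accelerated (FISTA) variant on the next slide.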

FISTA (an accelerated proximal gradient method) for factored MDPs: the objective is linear, so we augment it with a strongly convex function, the causal entropy [Ziebart et al., 2010].
– Usually acts as regularization toward a more uniform policy
– Retains a fast individual planner (softmax value iteration)
– Introduces smoothing error into the (originally linear) objective
We show that the gain in convergence speed can outweigh the approximation (smoothing) error.
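A minimal sketch of an entropy-smoothed ("softmax") value iteration of the kind referred to above, in which the hard max over actions in the Bellman backup is replaced by a temperature-scaled log-sum-exp; the names, shapes, and fixed iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.special import logsumexp

def softmax_value_iteration(P, R, gamma=0.95, temp=1.0, iters=500):
    """Smoothed value iteration: log-sum-exp backup instead of a hard max.

    The temperature temp controls the entropy regularization; as temp -> 0
    the backup approaches the standard (unsmoothed) Bellman backup.
    P: transitions, shape (A, S, S); R: rewards, shape (S, A).
    Returns the smoothed state values and the corresponding softmax policy.
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * np.einsum('asn,n->sa', P, V)
        V = temp * logsumexp(Q / temp, axis=1)   # smooth max over actions
    policy = np.exp((Q - V[:, None]) / temp)     # each row sums to 1
    return V, policy
```

The gap between this smoothed backup and the hard-max backup is the smoothing error mentioned on the slide; a smaller temperature shrinks that gap at the cost of weaker strong convexity.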