An Accelerated Gradient Method for Multi-Agent Planning in Factored MDPs
Sue Ann Hong, Geoff Gordon
Carnegie Mellon University
Multi-agent planning: each agent optimizes its own objective under individual constraints, coupled to the other agents through shared resource constraints.
Want: an efficient, distributed solver.
Factored MDPs [Guestrin et al., 2002]: each agent's subproblem is an MDP maximizing a linear reward; the shared resources enter as piecewise-linear constraints. Fast individual solver: value iteration.
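The fast individual solver referenced above is standard value iteration. A minimal sketch on a randomly generated toy MDP (all sizes and numbers here are illustrative, not from the poster):

```python
import numpy as np

# Toy MDP: 4 states, 2 actions, discount 0.9 (illustrative values).
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state dist.
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # linear (expected) rewards

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * P @ V        # Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
    V_new = Q.max(axis=1)        # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=1)        # greedy policy w.r.t. the converged values
```

The contraction property of the Bellman backup is what makes each agent's subproblem cheap to re-solve whenever the resource prices change.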
Distributed optimization via Lagrangian relaxation: put prices on the shared resources, so that each agent plans independently given the current prices (an agent buys a resource only when it is worth the quoted price). How to set the prices? Gradient-based methods, which solve the problem in a distributed fashion.
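The price-setting loop can be sketched as projected subgradient ascent on the dual: raise the price when the resource is over-subscribed, lower it otherwise. The best-response function and all numbers below are illustrative stand-ins for the agents' planners:

```python
# Hedged sketch of Lagrangian price updates for one shared resource.
def best_response_usage(reward_per_unit, lam, max_units=10):
    # With a linear utility, the agent consumes the resource only when
    # it is profitable at the current price lam.
    return max_units if reward_per_unit > lam else 0

capacity = 12
rewards = [3.0, 5.0, 1.0]   # one marginal reward per agent (made up)
lam = 0.0                   # resource price (dual variable)
for t in range(1, 201):
    usage = sum(best_response_usage(r, lam) for r in rewards)
    # Projected subgradient ascent on the dual with a 1/t step size;
    # the max(0, .) projection keeps the price nonnegative.
    lam = max(0.0, lam + (1.0 / t) * (usage - capacity))
```

With diminishing step sizes the price oscillates with shrinking amplitude around the market-clearing value; plain (sub)gradient ascent like this converges slowly, which is the motivation for the accelerated method on the next slide.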
FISTA for factored MDPs: the objective is linear, so we augment it with a strongly convex function: causal entropy [Ziebart et al., 2010]
– Acts as regularization towards a more uniform policy
– Retains a fast individual planner (softmax value iteration)
– Introduces smoothing error (relative to the linear objective)
We show that the gain in convergence speed can outweigh the approximation (smoothing) error.
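The fast individual planner retained under entropy smoothing replaces the max in the Bellman backup with a log-sum-exp. A sketch of softmax value iteration on a toy MDP (the temperature tau and all MDP parameters are illustrative assumptions):

```python
import numpy as np

def softmax_value_iteration(P, R, gamma=0.9, tau=0.1, iters=1000):
    """Entropy-regularized (soft) value iteration; tau is the smoothing level."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * P @ V
        # Soft Bellman backup: tau * log sum_a exp(Q/tau) replaces max_a Q.
        m = Q.max(axis=1, keepdims=True)  # shift for numerical stability
        V = (m + tau * np.log(np.exp((Q - m) / tau).sum(axis=1, keepdims=True))).ravel()
    # Softmax policy: pi(a|s) proportional to exp(Q(s,a)/tau); rows sum to 1.
    policy = np.exp((Q - V[:, None]) / tau)
    return V, policy

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # toy transition model
R = rng.uniform(0.0, 1.0, size=(3, 2))       # toy linear rewards
V, pi = softmax_value_iteration(P, R)
```

As tau shrinks the soft backup approaches the hard max, so the smoothing error can be traded off against the strong convexity that FISTA's accelerated rate relies on.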