Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions by S. Mahadevan & M. Maggioni Discussion led by Qi An ECE, Duke University
Outline Introduction; Approximate policy iteration; Value function approximation; Laplacian eigenfunctions approximation; Diffusion wavelets approximation; Experimental results; Conclusions
Introduction In MDP models, it is often desirable, or even necessary, to approximate the value function, for example when the state space is large or in a reinforcement learning setting. This paper explores two novel approaches to value function approximation on state space graphs.
Approximate policy iteration In an RL problem modeled as an MDP, value function approximation is one step of approximate policy iteration, which solves the RL problem iteratively.
Approximate policy iteration Sample (s, a, r, s’)
Value function approximation A variety of linear and nonlinear architectures have been widely studied, as they offer many advantages for value function approximation. However, many of them are hand-coded by a human designer through an ad hoc trial-and-error process.
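Value function approximation A finite MDP can be defined as $M = (S, A, P^a_{ss'}, R^a_{ss'})$. Any policy $\pi$ defines a unique value function $V^\pi$, which satisfies the Bellman equation $V^\pi(s) = \sum_{s'} P^{\pi(s)}_{ss'} \left( R^{\pi(s)}_{ss'} + \gamma V^\pi(s') \right)$. We want to project the value function onto a lower-dimensional space.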
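To make the Bellman equation concrete, here is a minimal sketch of exact policy evaluation for a small MDP. It assumes the standard matrix form $V^\pi = R^\pi + \gamma P^\pi V^\pi$, where $P^\pi$ is the state transition matrix under a fixed policy and $R^\pi$ the expected one-step reward; the 3-state chain, the reward vector, and the name `evaluate_policy` are hypothetical and not taken from the paper.

```python
import numpy as np

def evaluate_policy(P_pi, R_pi, gamma):
    """Solve the linear system (I - gamma * P_pi) V = R_pi for V."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

# Hypothetical 3-state chain under a fixed policy (rows sum to 1).
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.0, 1.0]])
R_pi = np.array([0.0, 0.0, 1.0])   # reward only in the last (absorbing) state
gamma = 0.9

V = evaluate_policy(P_pi, R_pi, gamma)
print(V)
```

The exact solve costs on the order of $|S|^3$ operations, which is one reason a low-dimensional approximation becomes attractive for large state spaces.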
Value function approximation In the approximation $\hat{Q}^\pi = \Phi w^\pi$, $\Phi$ is a $|S||A| \times k$ matrix whose columns are the basis functions evaluated at the $(s, a)$ points, $k$ is the number of basis functions selected, and $w^\pi$ is a weight vector. The problem is how to construct those basis functions efficiently and effectively.
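As an illustration of the linear architecture, the sketch below fits a weight vector by ordinary least squares so that a basis matrix times the weights approximates a target vector. It is only a stand-in: the target values are synthetic, the polynomial basis is a hand-picked example of the kind of hand-coded basis these slides want to replace, and `fit_weights` is a hypothetical helper name.

```python
import numpy as np

def fit_weights(Phi, q):
    """Least-squares weights w minimizing ||Phi @ w - q||_2."""
    w, *_ = np.linalg.lstsq(Phi, q, rcond=None)
    return w

n, k = 50, 4                          # n sample points, k basis functions
x = np.linspace(0.0, 1.0, n)          # hypothetical 1-D index over the sample points
q = np.exp(-3.0 * x)                  # synthetic target values, not from the paper
Phi = np.vander(x, k, increasing=True)  # columns: 1, x, x^2, x^3

w = fit_weights(Phi, q)
q_hat = Phi @ w                       # projection of q onto the column span of Phi
residual = q - q_hat
print(np.linalg.norm(residual))
```

The residual is orthogonal to every column of the basis matrix, which is what "projecting onto a lower-dimensional space" means in this setting.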
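Laplacian eigenfunctions We model the state space as a finite undirected weighted graph $(G, E, W)$. The combinatorial Laplacian $L$ is defined as $L = D - W$, where $D$ is the diagonal matrix of the row sums of $W$. The normalized Laplacian is $\mathcal{L} = D^{-1/2} (D - W) D^{-1/2}$. We use the eigenfunctions of $\mathcal{L}$ as an orthonormal basis.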
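A minimal sketch of building such a basis, assuming the standard definitions $L = D - W$ and $\mathcal{L} = D^{-1/2} (D - W) D^{-1/2}$ and a graph with no isolated vertices. The 20-state path graph and the helper names `normalized_laplacian` and `laplacian_basis` are hypothetical choices for illustration.

```python
import numpy as np

def normalized_laplacian(W):
    """Return D^{-1/2} (D - W) D^{-1/2} for a symmetric weight matrix W."""
    d = W.sum(axis=1)                 # vertex degrees (assumed strictly positive)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W                # combinatorial Laplacian
    return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

def laplacian_basis(W, k):
    """Return the k smallest eigenvalues and their orthonormal eigenvectors."""
    lam, vecs = np.linalg.eigh(normalized_laplacian(W))  # ascending eigenvalues
    return lam[:k], vecs[:, :k]

# Hypothetical state space: a path graph with 20 states and unit edge weights.
n = 20
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0

lam, Phi = laplacian_basis(W, k=5)
print(lam)
```

Taking the eigenvectors with the smallest eigenvalues selects the smoothest functions on the graph; the columns of `Phi` can then serve as the basis functions in a linear approximation.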
Diffusion wavelets Diffusion wavelets generalize wavelet analysis and associated signal processing techniques to functions on manifolds and graphs. They allow fast and accurate computation of high powers of a Markov chain $P$ on the graph, including direct computation of the Green's function of the Markov chain, $(I - P)^{-1}$, for solving Bellman's equation.
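The link between high powers and the Green's function can be seen from the identity $(I - T)^{-1} = \prod_{k \ge 0} (I + T^{2^k})$, which holds whenever the Neumann series $\sum_i T^i$ converges. The sketch below applies it with dense matrices and $T = \gamma P$; the discount factor $\gamma < 1$ is an assumption added here to guarantee convergence, and the 4-state ring graph is hypothetical. It shows only the dyadic-powers idea, not the compressed multiscale representation that diffusion wavelets actually build.

```python
import numpy as np

def green_dyadic(P, gamma, levels):
    """Approximate (I - gamma*P)^{-1} as a product of (I + T^(2^k)), T = gamma*P."""
    n = P.shape[0]
    T = gamma * P                     # holds T^(2^k), starting at k = 0
    G = np.eye(n)
    for _ in range(levels):
        G = G @ (np.eye(n) + T)       # G now equals the sum of T^i for i < 2^(k+1)
        T = T @ T                     # square to move to the next dyadic power
    return G

# Hypothetical random walk on a 4-state ring.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
P = W / W.sum(axis=1, keepdims=True)
gamma = 0.9

G_dyadic = green_dyadic(P, gamma, levels=10)   # covers 2^10 terms of the series
G_exact = np.linalg.inv(np.eye(4) - gamma * P)
print(np.max(np.abs(G_dyadic - G_exact)))
```

Ten squarings account for 1024 terms of the series, so the number of matrix products grows only logarithmically with the number of terms covered.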
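Diffusion wavelets Markov random walk: $P = D^{-1} W$. We symmetrize $P$ and take powers: $T = D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2}$ and $T^t = \sum_i (1 - \lambda_i)^t \, \xi_i \xi_i^{\top}$, where $\lambda_i$ and $\xi_i$ are the eigenvalues and eigenfunctions of the normalized Laplacian.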
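A small numerical check of the symmetrization step, assuming the standard definitions $P = D^{-1} W$, $T = D^{1/2} P D^{-1/2}$, and normalized Laplacian $I - T$. The 4-vertex graph and the exponent `t = 5` are hypothetical choices.

```python
import numpy as np

# Hypothetical undirected graph on 4 vertices with unit weights.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = W.sum(axis=1)
P = W / d[:, None]                          # random walk P = D^{-1} W

D_half = np.diag(np.sqrt(d))
D_inv_half = np.diag(1.0 / np.sqrt(d))
T = D_half @ P @ D_inv_half                 # symmetric conjugate of P

lam, xi = np.linalg.eigh(np.eye(4) - T)     # spectrum of the normalized Laplacian

t = 5
T_t_spectral = (xi * (1.0 - lam) ** t) @ xi.T   # sum_i (1 - lam_i)^t xi_i xi_i^T
T_t_direct = np.linalg.matrix_power(T, t)
P_t = D_inv_half @ T_t_spectral @ D_half        # undo the conjugation to get P^t

print(np.max(np.abs(T_t_spectral - T_t_direct)))
```

Because $T$ and $P$ are related by a similarity transform, powers of $P$ can be recovered from powers of $T$, and the symmetric form makes an orthonormal eigenbasis available.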
Diffusion wavelets A diffusion wavelet tree consists of orthogonal diffusion scaling functions $\Phi_j$ and orthogonal wavelets $\Psi_j$. The scaling functions $\Phi_j$ span a subspace $V_j$ with the property $V_{j+1} \subseteq V_j$, and the span of the wavelets $\Psi_j$, denoted $W_j$, is the orthogonal complement of $V_{j+1}$ in $V_j$.
Diffusion wavelets
The detail subspaces. Downsampling, orthogonalization, and operator compression. [Figure: diffusion maps. X is the data set; A is the diffusion operator; G is Gram-Schmidt orthonormalization; M is A G.]
Diffusion wavelets
Experimental results
Conclusions Two novel value function approximation methods are explored. The underlying representation and the policies are learned simultaneously. Diffusion wavelets are a powerful tool for applying signal processing techniques to functions on manifolds and graphs.