
Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions by S. Mahadevan & M. Maggioni Discussion led by Qi An ECE, Duke University.


1 Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions by S. Mahadevan & M. Maggioni Discussion led by Qi An ECE, Duke University

2 Outline: Introduction; Approximate policy iteration; Value function approximation; Laplacian eigenfunctions approximation; Diffusion wavelets approximation; Experimental results; Conclusions

3 Introduction: In Markov decision process (MDP) models, it is desirable, and often necessary, to approximate the value function when the state space is large or in the reinforcement learning setting. The paper explores two novel approaches to value function approximation on state-space graphs.

4 Approximate policy iteration: When a reinforcement learning (RL) problem is modeled as an MDP, value function approximation is one step of the approximate policy iteration process, which solves the RL problem iteratively.

5 Approximate policy iteration (diagram): Sample (s, a, r, s')

6 Value function approximation: A variety of linear and non-linear architectures have been widely studied because they offer many advantages for value function approximation. However, many of them are hand-coded by a human designer in an ad hoc, trial-and-error process.

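7 Value function approximation: A finite MDP can be defined as a tuple $M = (S, A, P^a_{ss'}, R^a_{ss'})$. Any policy $\pi$ defines a unique value function $V^\pi$, which satisfies the Bellman equation $V^\pi(s) = \sum_{s'} P^{\pi(s)}_{ss'} \left( R^{\pi(s)}_{ss'} + \gamma V^\pi(s') \right)$. We want to project the value function onto a lower-dimensional space.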
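For a fixed policy on a small finite MDP, the Bellman equation is a linear system and can be solved directly. The sketch below is a minimal illustration under assumptions that are not in the slides: a discount factor `gamma`, a hypothetical three-state chain, and a reward vector `R_pi` that already holds the expected one-step reward in each state under the policy.

```python
import numpy as np

def evaluate_policy(P_pi, R_pi, gamma=0.9):
    """Solve V = R_pi + gamma * P_pi @ V for V (exact policy evaluation)."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

# Hypothetical 3-state chain; the last state is absorbing and rewarding.
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.9, 0.1],
                 [0.0, 0.0, 1.0]])
R_pi = np.array([0.0, 0.0, 1.0])   # expected one-step reward per state (assumed)
V = evaluate_policy(P_pi, R_pi)
```

The exact solve costs on the order of |S|^3 operations, which is why a low-dimensional approximation becomes attractive once the state space grows.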

8 Value function approximation: In the approximation $\hat{Q} = \Phi w$, $\Phi$ is a $|S||A| \times k$ matrix, each column of which is a basis function evaluated at the (s, a) points, k is the number of basis functions selected, and $w$ is a weight vector. The problem is how to construct those basis functions efficiently and effectively.
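The linear architecture on this slide amounts to a least-squares fit of a weight vector to a target vector. The sketch below is illustrative only: the polynomial basis is a hand-coded stand-in of the kind slide 6 criticizes, the target `q` is a made-up function, and each row of `Phi` stands in for one (s, a) pair.

```python
import numpy as np

def fit_weights(Phi, q):
    """Least-squares weights w minimizing ||Phi @ w - q||."""
    w, *_ = np.linalg.lstsq(Phi, q, rcond=None)
    return w

n, k = 50, 4                              # n sample points, k basis functions
x = np.linspace(0.0, 1.0, n)              # hypothetical 1-D coordinate per (s, a) row
Phi = np.vander(x, k, increasing=True)    # columns: 1, x, x^2, x^3
q = np.exp(-3.0 * x)                      # hypothetical target values
w = fit_weights(Phi, q)
q_hat = Phi @ w                           # projection of q onto span(Phi)
residual = q - q_hat
```

At the least-squares solution the residual is orthogonal to every column of `Phi`, which is what "projecting" the value function onto the span of the basis means.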

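9 Laplacian eigenfunctions: We model the state space as a finite undirected weighted graph (G, E, W). The combinatorial Laplacian L is defined as $L = D - W$, where $D$ is the diagonal matrix of vertex degrees (the row sums of $W$). The normalized Laplacian is $\mathcal{L} = D^{-1/2} (D - W) D^{-1/2}$. We use the eigenfunctions of the Laplacian as the orthonormal basis.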
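As a rough illustration of this construction, the sketch below builds the normalized Laplacian of a small path graph and keeps the eigenvectors with the smallest eigenvalues as basis functions. The graph, the unit edge weights, and the choice of k are assumptions made for the example, and the code assumes every vertex has at least one edge.

```python
import numpy as np

def normalized_laplacian(W):
    """D^{-1/2} (D - W) D^{-1/2} for a symmetric weight matrix W."""
    d = W.sum(axis=1)                       # vertex degrees (must be > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W                      # combinatorial Laplacian
    return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

def laplacian_basis(W, k):
    """Return the k smallest eigenvalues and their orthonormal eigenvectors."""
    vals, vecs = np.linalg.eigh(normalized_laplacian(W))  # ascending order
    return vals[:k], vecs[:, :k]

# Hypothetical state space: a 10-state chain with unit edge weights.
n = 10
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0
vals, basis = laplacian_basis(W, k=4)
```

The columns of `basis` can play the role of the columns of the basis matrix in the linear architecture; the low-order eigenvectors are the smoothest functions on the graph.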

10 Diffusion wavelets: Diffusion wavelets generalize wavelet analysis and the associated signal processing techniques to functions on manifolds and graphs. They allow fast and accurate computation of high powers of a Markov chain P on the graph, including direct computation of the Green's function of the Markov chain, $(I - P)^{-1}$, for solving Bellman's equation.
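The link between high powers and the Green's function can be seen from the identity $\sum_{n=0}^{2^K-1} A^n = \prod_{k=0}^{K-1} (I + A^{2^k})$, which needs only the dyadic powers $A, A^2, A^4, \dots$. The sketch below checks this with dense matrices; it does not use the compressed wavelet representation, and it assumes a discounted operator `gamma * P` (the discount factor is an assumption added here so that the series converges).

```python
import numpy as np

def green_dyadic(A, levels):
    """Approximate (I - A)^{-1} as prod_{k < levels} (I + A^(2^k))."""
    n = A.shape[0]
    G = np.eye(n)
    power = A.copy()                      # holds A^(2^k) at step k
    for _ in range(levels):
        G = G @ (np.eye(n) + power)
        power = power @ power             # square to reach the next dyadic power
    return G

# Hypothetical 3-state Markov chain and discount factor.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
gamma = 0.9
G = green_dyadic(gamma * P, levels=10)    # sums the first 2^10 terms of the series
```

Ten squarings cover 1024 terms of the Neumann series, so the number of matrix products grows only logarithmically with the number of terms.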

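11 Diffusion wavelets: The Markov random walk is $P = D^{-1} W$. We symmetrize P and take powers: $T = D^{1/2} P D^{-1/2}$, $T^t = \sum_{i \ge 0} (1 - \lambda_i)^t \, \xi_i(\cdot) \xi_i(\cdot)$, where $\lambda_i$ and $\xi_i$ are the eigenvalues and eigenfunctions of the normalized Laplacian.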
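A small numerical check can make the symmetrization concrete. The sketch below assumes a hypothetical weighted path graph, forms the random walk as the degree-normalized weight matrix, symmetrizes it by conjugating with the square roots of the degrees, and compares a direct matrix power against the spectral expansion in the eigenpairs of the normalized Laplacian.

```python
import numpy as np

# Hypothetical weighted path graph on 6 vertices.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0 + i

d = W.sum(axis=1)                          # vertex degrees
sqrt_d = np.sqrt(d)
P = W / d[:, None]                         # random walk P = D^{-1} W
T = sqrt_d[:, None] * P / sqrt_d[None, :]  # T = D^{1/2} P D^{-1/2}, symmetric

# Eigenpairs of the normalized Laplacian I - T.
lam, xi = np.linalg.eigh(np.eye(n) - T)

def spectral_power(t):
    """T^t from the expansion sum_i (1 - lam_i)^t xi_i xi_i^T."""
    return (xi * (1.0 - lam) ** t) @ xi.T

T5 = spectral_power(5)
P5 = (T5 / sqrt_d[:, None]) * sqrt_d[None, :]   # undo the conjugation
```

Because T and P are similar matrices, powers of P are recovered from powers of T by undoing the conjugation, as in the last line.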

12 Diffusion wavelets: A diffusion wavelet tree consists of orthogonal diffusion scaling functions $\Phi_j$ and orthogonal wavelets $\Psi_j$. The scaling functions $\Phi_j$ span a subspace $V_j$ with the property $V_{j+1} \subseteq V_j$, and the span of the wavelets $\Psi_j$, $W_j$, is the orthogonal complement of $V_{j+1}$ in $V_j$.

13 Diffusion wavelets

14 The detail subspaces: Downsampling, orthogonalization, and operator compression. Notation for the diffusion maps: X is the data set, A is the diffusion operator, G is Gram-Schmidt orthonormalization, and M is the composition of A and G.
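The downsample, orthogonalize, compress cycle can be sketched numerically. The code below is a simplified stand-in, not the algorithm from the paper: it uses an SVD in place of Gram-Schmidt orthonormalization, builds only the scaling-function side (no wavelets), and assumes a lazy symmetrized random walk on a hypothetical path graph so that all eigenvalues lie in [0, 1]. The names `compress_level`, `eps`, and `dims` are illustrative.

```python
import numpy as np

def compress_level(T, eps=1e-3):
    """One level: orthonormal basis Q for the numerical range of T,
    and T^2 re-expressed (compressed) in that basis."""
    U, s, _ = np.linalg.svd(T)
    keep = s > eps * s[0]            # drop directions T has nearly annihilated
    Q = U[:, keep]                   # orthonormal columns (SVD replaces Gram-Schmidt)
    T2 = Q.T @ (T @ T) @ Q           # compressed representation of T^2
    return Q, T2

# Hypothetical operator: lazy symmetrized walk on a 32-vertex path graph.
n = 32
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0
d = W.sum(axis=1)
T = 0.5 * (np.eye(n) + W / np.sqrt(np.outer(d, d)))

dims = []                            # dimension of the subspace kept at each level
op = T
for _ in range(4):
    Q, op = compress_level(op)       # op becomes the next dyadic power, compressed
    dims.append(Q.shape[1])
```

Each pass squares the operator, so fewer directions stay above the precision `eps`, and the matrices that represent the higher powers become smaller.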

15 Diffusion wavelets

16

17 Experimental results

18 Conclusions: Two novel value function approximation methods are explored. The underlying representation and the policies are learned simultaneously. Diffusion wavelets are a powerful tool for applying signal processing techniques to functions on manifolds and graphs.

