1
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, and Michael L. Littman
2
A Walk Through Our Paper
Features: Φ    Training Data: (s,r,s'), (s,r,s'), (s,r,s'), …
– Linear Model: P_Φ (k × k), R_Φ (k × 1). Project the dynamics into feature space (minimizing L2 error in predicted next features).
– Linear Value Function: V ≈ Φw. Solve for the exact value function given P_Φ, R_Φ, or solve for the linear fixed point using linear TD, LSTD, etc.
3
A Walk Through Our Paper
Features: Φ    Training Data: (s,r,s'), (s,r,s'), (s,r,s'), …
– Linear Model: P_Φ (k × k), R_Φ (k × 1). Project the dynamics into feature space (minimizing L2 error in predicted next features).
– Linear Value Function: V ≈ Φw. Solve for the exact value function given P_Φ, R_Φ, or solve for the linear fixed point using linear TD, LSTD, etc.
Bellman error of the linear fixed point solution = reward error + per-feature error: insight into feature selection!
[Plots: Total Bellman Error, Reward Error, and Feature Error vs. number of basis functions]
4
Outline Terminology/notation review Linear model, linear fixed point equivalence Bellman error as function of model error Feature selection insights Experimental results
5
Basic Terminology
Markov Reward Process (MRP)*
– States: S = [s_1 … s_n]
– Reward: R : S → ℝ
– Transition matrix: P[i,j] = P(s_j | s_i)
– Discount: 0 ≤ γ < 1
– Value: V(s) = expected, discounted value of state s
True Value Function: V* = (I − γP)⁻¹R
*Ask about MDPs later.
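A minimal sketch of the closed-form value function above, using numpy and a made-up 3-state MRP (the transition matrix, rewards, and discount are arbitrary illustrations, not from the talk):

```python
import numpy as np

# Hypothetical 3-state MRP (illustrative values only)
P = np.array([[0.9, 0.1, 0.0],   # P[i, j] = probability of moving from state i to state j
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])    # reward of each state
gamma = 0.95                     # discount, 0 <= gamma < 1

# True value function: V* = (I - gamma*P)^(-1) R
V_star = np.linalg.solve(np.eye(3) - gamma * P, R)
print(V_star)
```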
6
Linear Value Function Approximation
|S| is typically quite large
Pick linearly independent features Φ = (φ_1 … φ_k) (basis functions)
Desire weights w = (w_1 … w_k) s.t. V ≈ Φw
7
Bellman Operator
Used in, e.g., value iteration: TV = R + γPV
Defines a fixed point: V* = TV*
Bellman error (residual): BE(V) = TV − V
BE bounds the actual error: ||V − V*||_∞ ≤ ||TV − V||_∞ / (1 − γ)
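A small numerical check of these definitions, reusing the made-up 3-state MRP from the previous sketch (an illustration, not code from the paper):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.95
V_star = np.linalg.solve(np.eye(3) - gamma * P, R)

def T(V):
    """Bellman operator for an MRP: T V = R + gamma * P V."""
    return R + gamma * P @ V

# V* is the fixed point of T
assert np.allclose(T(V_star), V_star)

# The Bellman error bounds the actual error: ||V - V*||_inf <= ||T V - V||_inf / (1 - gamma)
V = np.zeros(3)                      # an arbitrary value estimate
bellman_error = T(V) - V
assert np.max(np.abs(V - V_star)) <= np.max(np.abs(bellman_error)) / (1 - gamma) + 1e-10
```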
8
Linear Fixed Point
V ≈ Φw: the weights of the projection of V into span(Φ)
LSTD, linear TD, etc. solve for the linear fixed point: Φw = Π(R + γPΦw), where Π is the L2 projection onto span(Φ)
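A sketch of the linear fixed point computed directly from the model (P and R known); LSTD and linear TD estimate the same solution from sampled (s, r, s') data instead. The features below are arbitrary choices for illustration:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.95
Phi = np.array([[1.0, 0.0],      # n x k feature matrix (k = 2 hand-picked features)
                [1.0, 1.0],
                [1.0, 2.0]])

# Linear fixed point: Phi w = Pi(R + gamma * P Phi w), Pi = L2 projection onto span(Phi).
# Solving for w gives w = (Phi^T (Phi - gamma * P Phi))^(-1) Phi^T R.
A = Phi.T @ (Phi - gamma * P @ Phi)
b = Phi.T @ R
w_td = np.linalg.solve(A, b)
V_td = Phi @ w_td                # approximate value function
print(w_td, V_td)
```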
9
Outline Terminology/notation review Linear model, linear fixed point equivalence Bellman error as function of model error Feature selection insights Experimental results
10
Linear Model Approximation
Linearly independent features Φ = (φ_1 … φ_k)   (n × k)
Want R_Φ = reward model (k × 1) with smallest L2 error: R_Φ = argmin_r ||Φr − R||_2
Want P_Φ = feature-to-feature model (k × k) with smallest L2 error: P_Φ = argmin_Q ||ΦQ − PΦ||_2, where PΦ is the matrix of expected next feature values (n × k)
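A sketch of the projected linear model for the same made-up MRP and features, using ordinary least squares (np.linalg.lstsq):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

# Reward model R_Phi (k x 1): minimize ||Phi r - R||_2
R_Phi, *_ = np.linalg.lstsq(Phi, R, rcond=None)

# Expected next feature values P Phi (n x k), then feature model P_Phi (k x k):
# minimize ||Phi Q - P Phi||_2 column by column
P_Phi, *_ = np.linalg.lstsq(Phi, P @ Phi, rcond=None)

print(R_Phi, P_Phi)
```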
11
Value Function of the Linear Model
The value function is in span(Φ): can express value functions as Φw
If V is bounded, then: w = (I − γP_Φ)⁻¹R_Φ   (k × k, k × 1)
Note the similarity to the conventional solution: V* = (I − γP)⁻¹R   (n × n, n × 1)
12
Linear Model, Linear Fixed Point Equivalence
Theorem: For features Φ, the linear model's exact value function and the linear fixed point solution are identical.
Proof sketch: start from the definition of the linear fixed point solution,
w_TD = (ΦᵀΦ − γΦᵀPΦ)⁻¹ΦᵀR = (I − γ(ΦᵀΦ)⁻¹ΦᵀPΦ)⁻¹(ΦᵀΦ)⁻¹ΦᵀR = (I − γP_Φ)⁻¹R_Φ
The approximate model appears in the linear fixed point definition!
Note: Preliminary observations along these lines by Boyan [99]
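A numerical check of the theorem on the made-up MRP and features from the earlier sketches: the exact value-function weights of the projected linear model coincide with the linear fixed point weights.

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.95
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

R_Phi, *_ = np.linalg.lstsq(Phi, R, rcond=None)
P_Phi, *_ = np.linalg.lstsq(Phi, P @ Phi, rcond=None)

w_model = np.linalg.solve(np.eye(2) - gamma * P_Phi, R_Phi)           # linear model's exact solution
w_td = np.linalg.solve(Phi.T @ (Phi - gamma * P @ Phi), Phi.T @ R)    # linear fixed point
assert np.allclose(w_model, w_td)
```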
13
Linear Model, Linear Fixed Point Equivalence (s,a,s’), (s,a,s’), … Training Data Given: Linearly independent features =( 1 … k ) Linear Model P – (k × k) R – (k × 1) Project dynamics into feature space (minimizing L 2 error in predicted next features) Linear Value Function V= w Solve for exact value Function given P , R Solve for linear fixed Point using linear TD, LSTD, etc.
14
Outline Terminology/notation review Linear model, linear fixed point equivalence Bellman error as function of model error Feature selection insights Experimental results
15
Model Error
Linearly independent features Φ = (φ_1 … φ_k)
Error in reward: Δ_R = R − ΦR_Φ   (n × 1)
Error in predicted next features (per-feature error): Δ_Φ = PΦ − ΦP_Φ   (n × k), where PΦ is the expected next feature values and ΦP_Φ is the predicted next feature values
16
Bellman Error
Theorem: the Bellman error of the linear fixed point solution is BE = Δ_R + γΔ_Φ w, i.e., the reward error plus the (discounted) per-feature error.
Punch line: the Bellman error decomposes into a function of the model errors!
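A numerical check of the decomposition BE = Δ_R + γΔ_Φ w on the same made-up MRP and features (illustration only):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.95
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

R_Phi, *_ = np.linalg.lstsq(Phi, R, rcond=None)
P_Phi, *_ = np.linalg.lstsq(Phi, P @ Phi, rcond=None)
w = np.linalg.solve(np.eye(2) - gamma * P_Phi, R_Phi)   # linear fixed point weights

Delta_R = R - Phi @ R_Phi            # reward error (n x 1)
Delta_Phi = P @ Phi - Phi @ P_Phi    # per-feature error (n x k)

be = (R + gamma * P @ Phi @ w) - Phi @ w   # Bellman error of the fixed point solution
assert np.allclose(be, Delta_R + gamma * Delta_Phi @ w)
```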
17
Outline Terminology/notation review Linear model, linear fixed point equivalence Bellman error as function of model error Feature selection insights Experimental results
18
Insights into Feature Selection I Features should model the reward. The reward itself is a useful feature!
19
Insights into Feature Selection II
Features should "predict themselves": when Δ_Φ = 0,
– BE = Δ_R, with no dependence on γ
– Value function approximation and feature selection reduce to regression problems on R
– Φw = (I − γP)⁻¹ΦR_Φ: approximate reward, true P!
20
Achieving Zero Feature Error (Δ_Φ = 0)
When are features sufficient for zero error in the expected next feature values?
– Rank(Φ) = |S|
– Φ is composed of eigenvectors of P
– Φ spans an invariant subspace of P
Invariant subspace: PΦ = ΦP_Φ exactly, i.e., span(PΦ) ⊆ span(Φ)
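A sketch of the eigenvector case: building Φ from eigenvectors of P makes the feature error exactly zero. A symmetric, hypothetical random-walk P is used here so the eigenvectors are real:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],   # symmetric random walk on a 3-cycle (illustrative)
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

eigvals, eigvecs = np.linalg.eigh(P)   # symmetric P -> real eigenvalues/eigenvectors
Phi = eigvecs[:, :2]                   # any subset of eigenvectors of P

P_Phi, *_ = np.linalg.lstsq(Phi, P @ Phi, rcond=None)
Delta_Phi = P @ Phi - Phi @ P_Phi
assert np.allclose(Delta_Phi, 0.0)     # P Phi stays in span(Phi): zero feature error
```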
21
Insight into Adding Features
Methods for adding features:
– Add the Bellman error as a feature (BEBF) [Wu & Givan, 2004; Sanner & Boutilier, 2005; Keller et al., 2006; Parr et al., 2007]
– Add the model errors (Δ_Φ, Δ_R) as features (MEBF)
– Add P^k R for increasing k (Krylov basis) [Petrik, 2007]
Theorem: BEBF, MEBF, and the Krylov basis are equivalent when initialized with Φ = {}
Note: Special thanks to Marek Petrik for demonstrating Krylov = BEBF
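A sketch of the Krylov basis {R, PR, P²R, …}, which the theorem above equates with BEBF/MEBF when feature construction starts from an empty Φ; the helper name and the made-up MRP are purely illustrative:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
R = np.array([0.0, 0.0, 1.0])

def krylov_basis(P, R, k):
    """Return the n x k matrix whose columns are R, P R, ..., P^(k-1) R."""
    cols = [R]
    for _ in range(k - 1):
        cols.append(P @ cols[-1])
    return np.column_stack(cols)

Phi = krylov_basis(P, R, k=3)
print(Phi)
```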
22
Insight into Proto-Value Functions
Proto-value functions (PVFs) compute eigenvectors of a modified adjacency graph (the graph Laplacian) [Mahadevan & Maggioni]
The adjacency graph approximates P, so PVFs ≈ eigenvectors of P
∴ PVFs approximate subspace-invariant features (empirically, the closeness of this approximation varies)
Note: Similar observations were made by Petrik, who considered a version of PVFs that used P instead of the Laplacian.
23
Outline Terminology/notation review Linear model, linear fixed point equivalence Bellman error as function of model error Feature selection insights Experimental results
24
Experimental Results
Four algorithms:
– PVFs (in order of "smoothness")
– PVF-MP (matching pursuits with a PVF dictionary)
– eig-MP (matching pursuits with eigenvectors of P)
– BEBF (a.k.a. MEBF, Krylov basis)
Measured (in L2) as a function of the number of basis functions added:
– Total Bellman error
– Reward error Δ_R
– Total feature error Δ_Φ w
Three problems:
– Chain [Lagoudakis & Parr] (talk, paper, poster)
– Two Room [Mahadevan & Maggioni] (paper, poster)
– Blackjack [Sutton & Barto] (paper, poster)
25
Chain Results
[Plots: Total Bellman Error, Reward Error, and Feature Error vs. number of basis functions, for PVF, PVF-MP, eig-MP, and BEBF]
50-state chain from Lagoudakis & Parr
Ask about blackjack or the two-room domain – or come to our poster!
26
Conclusions From Experiments
– eig-MP will always have Δ_Φ = 0
– PVFs sometimes approximate subspace invariance (potentially useful because of stability issues with eig-MP)
– PVF-MP dominates PVF because PVF ignores R
– BEBF will always have Δ_R = 0
– BEBF has a more steady/predictable reduction in BE
Don't ignore R!
27
Ground Covered
Features: Φ    Training Data: (s,r,s'), (s,r,s'), (s,r,s'), …
– Linear Model: P_Φ (k × k), R_Φ (k × 1). Project the dynamics into feature space (minimizing L2 error in predicted next features).
– Linear Value Function: V ≈ Φw. Solve for the exact value function given P_Φ, R_Φ, or solve for the linear fixed point using linear TD, LSTD, etc.
Bellman error of the linear fixed point solution = reward error + per-feature error: insight into feature selection!
[Plots: Total Bellman Error, Reward Error, and Feature Error vs. number of basis functions]
28
Thank you! Also, special thanks to Jeff Johns, Sridhar Mahadevan, and Marek Petrik