Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)
Overview Why? –MDPs need value functions –Value function approximation –“Good” approximate value functions How –Approximation architecture –Dot products in large state spaces –Expectation in large state spaces –Orthogonal projection –MAX in large state spaces
Why You Need Value Functions Given current configuration: Expected value of all widgets produced by factory Expected number of steps before failure
DBN-MDPs [Figure: two-slice DBN with state variables X, Y, Z at times t and t+1, plus an Action node]
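For concreteness, a minimal sketch (not from the talk) of how such a factored transition model could be encoded: each next-slice variable gets a small CPT over its parents at time t, and the full transition probability is the product of CPT entries. The variable names and probabilities are invented; a real DBN-MDP would also index the CPTs by action.

```python
import itertools

# Hypothetical DBN-MDP over three binary state variables X, Y, Z.
# Each next-slice variable depends on a small set of parents at time t,
# so the joint transition model is a product of small CPTs.
CPTS = {
    # var: (parents at time t, {parent_assignment: P(var'=1 | parents)})
    "X": (("X",),      {(0,): 0.1, (1,): 0.9}),
    "Y": (("X", "Y"),  {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.95}),
    "Z": (("Y", "Z"),  {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.8}),
}
VARS = ("X", "Y", "Z")

def transition_prob(state, next_state):
    """P(next_state | state) as a product of per-variable CPT entries."""
    p = 1.0
    for var, (parents, table) in CPTS.items():
        pa = tuple(state[VARS.index(v)] for v in parents)
        p1 = table[pa]
        p *= p1 if next_state[VARS.index(var)] == 1 else 1.0 - p1
    return p

# Sanity check: outgoing probabilities from every state sum to 1.
for s in itertools.product((0, 1), repeat=3):
    total = sum(transition_prob(s, t) for t in itertools.product((0, 1), repeat=3))
    assert abs(total - 1.0) < 1e-9
```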
Adding Rewards [Figure: DBN over X, Y, Z from t to t+1 with reward nodes R1 and R2] Rewards have small sets of parent variables too. Total reward adds sub-rewards: R = R1 + R2
Computing Values Value function V = expected discounted sum of rewards, computed from the symbolic transition model (DBN). Q: Does V have a convenient, compact form?
Compact Models = Compact V? [Figure: DBN unrolled over time slices t, t+1, t+2, t+3 with variables X, Y, Z and reward R = +1]
Enter Value Function Approximation Not enough structure for exact, symbolic methods in many domains Our approach: –Combine symbolic methods with VFA –Define a restricted class of value functions –Find the “best” in that class –Bound error
Linearly Decomposable Value Functions Approximate high-dimensional functions with a combination of lower-dimensional functions Motivation: Multi-attribute utility theory (Keeney & Raiffa) Note: Overlapping is allowed!
Decomposable Value Functions Each h_i has as its domain a small set of variables Each h_i is a feature of a complex system –status of a machine –inventory of a store Also: think of each h_i as a basis function Linear combination of functions: V(s) ≈ Σ_i w_i h_i(s)
Matrix Form A = [ h_1(s_1) h_2(s_1) … ; h_1(s_2) h_2(s_2) … ], one row per state and one column per basis function (k basis functions, |S| states). Aw assigns a value to every state. Note for linear algebra fans: Aw is a linear function in the column space of h_1 … h_k
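As an illustration (with made-up basis functions and weights), here is the matrix form built explicitly for a tiny state space. Explicit enumeration like this is exactly what the symbolic methods below avoid, but it makes the notation concrete.

```python
import itertools
import numpy as np

VARS = ("X", "Y", "Z")
STATES = list(itertools.product((0, 1), repeat=len(VARS)))  # all 2^n states

# Restricted-domain basis functions: each touches only one or two variables.
def h1(s): return 1.0                                        # constant feature
def h2(s): return float(s[VARS.index("X")])                  # depends on X only
def h3(s): return float(s[VARS.index("Y")] and s[VARS.index("Z")])  # depends on Y, Z

BASIS = [h1, h2, h3]

# Matrix form: A[i, j] = h_j(s_i); the approximation is V_hat = A @ w,
# a linear function in the column space of h_1 ... h_k.
A = np.array([[h(s) for h in BASIS] for s in STATES])

w = np.array([1.0, 2.0, -0.5])          # some weights
V_hat = A @ w                            # assigns a value to every state
print(dict(zip(STATES, V_hat)))
```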
Defining a fixed point Standard fixed point equation: V = R + γPV Projection operator Π (orthogonal projection onto the span of the basis) With approximation, the fixed point becomes Aw = Π(R + γPAw) We use orthogonal projection to force V to have the desired form.
Solving for the fixed point Theorem: the fixed-point equation for w has a solution for all but finitely many discount factors [Koller & Parr 00] Note: The existence of a solution is a weaker condition than the contraction property required for iterative, value-iteration-based methods. LSTD [Bradtke & Barto 96], O(k²n)
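A flat, explicit-state sketch of the fixed point being solved, assuming orthogonal projection onto the column space of A; the k×k system below is the standard reduction, and the talk's contribution is computing its entries symbolically rather than by enumerating states. P, R, A, and γ here are toy values.

```python
import numpy as np

def fixed_point_weights(A, P, R, gamma):
    """Solve Aw = Pi(R + gamma * P * A * w), with Pi the orthogonal
    projection onto the column space of A.  Equivalent k x k system:
        A^T (A - gamma * P A) w = A^T R
    """
    M = A.T @ (A - gamma * (P @ A))   # k x k matrix
    b = A.T @ R                       # k x 1 vector
    return np.linalg.solve(M, b)

# Tiny explicit example: 3 states, 2 basis functions.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7]])
R = np.array([0.0, 1.0, 2.0])
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
w = fixed_point_weights(A, P, R, gamma=0.95)
print(w, A @ w)   # weights and the induced approximate value function
```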
Key Operations Backprojection of a basis function: P h_i Dot product of two restricted-domain basis functions: ⟨h_i, h_j⟩ If these two operations can be done efficiently, the solution cost for k basis functions is building a k×k matrix and a k×1 vector, plus a k×k matrix inversion
Backprojection = 1-step Expectation (P h)(s) = Σ_{s'} P(s'|s) h(s') Important: Single step of lookahead only - no more [Figure: DBN over X, Y, Z]
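A minimal sketch of the one-step expectation, assuming a hypothetical CPT for Y' with parents (X, Y): backprojecting a function of Y alone yields a function of Y's parents only, so the result stays restricted-domain and never requires the full state space.

```python
import itertools

# Hypothetical CPT: next-slice Y' depends only on parents (X, Y) at time t.
Y_PARENTS = ("X", "Y")
P_Y1 = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.95}  # P(Y'=1 | X, Y)

def h(y):                       # basis function over the single variable Y
    return 3.0 if y == 1 else 1.0

def backproject_h():
    """Return (Ph) as a table over Y's parents (X, Y) only.

    (Ph)(x, y) = sum_{y'} P(Y'=y' | x, y) * h(y')
    """
    g = {}
    for pa in itertools.product((0, 1), repeat=len(Y_PARENTS)):
        p1 = P_Y1[pa]
        g[pa] = p1 * h(1) + (1.0 - p1) * h(0)
    return g

print(backproject_h())   # {(0, 0): 1.4, (0, 1): 2.2, (1, 0): 2.0, (1, 1): 2.9}
```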
Efficient dot product Need to compute: ⟨h_1, h_2⟩ = Σ_s h_1(s) h_2(s), e.g. h_1 = f(x), h_2 = f(y) Because each function depends on only a few variables, the sum over all states reduces to a sum over the joint assignments of those variables.
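A sketch under an assumed setting of n binary variables where h_1 depends only on x and h_2 only on y: the dot product over all 2^n states is computed from a four-term sum, with every remaining variable contributing a factor of 2.

```python
import itertools

n = 40                              # number of binary state variables (2^40 states)

def h1(x):                          # depends on variable x only
    return 2.0 * x + 1.0

def h2(y):                          # depends on variable y only
    return 5.0 if y == 1 else 0.0

def dot_product_restricted():
    """<h1, h2> = sum over all states of h1(s) * h2(s), computed by summing
    over the joint domain of {x, y} only; the other n - 2 variables each
    contribute a factor of 2."""
    total = 0.0
    for x, y in itertools.product((0, 1), repeat=2):
        total += h1(x) * h2(y)
    return total * 2 ** (n - 2)

print(dot_product_restricted())     # exact value, no enumeration of 2^40 states
```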
Symbolic Linear VFA Incurs only one step's worth of representation blowup –Solve directly for fixed point –No a priori quality guarantees, but a posteriori quality guarantees Contrast with bisimulation/structured DP: exact, but iterative; representation grows with each step
Error Bounds How are we doing? Compute the max one-step error ||Aw - (R + γPAw)||_∞, the gap between the approximation and its one-step lookahead expected value. Claim: Equivalent to maximizing a sum of restricted domain functions Use a cost network (Dechter 99)
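For reference, the standard max-norm argument behind the a posteriori guarantee (stated here for completeness; the slide may phrase the exact bound differently):

```latex
\varepsilon \;=\; \bigl\| \hat V - (R + \gamma P \hat V) \bigr\|_\infty
\qquad \text{(max one-step error, } \hat V = Aw\text{)}
\\[4pt]
V - \hat V \;=\; (I - \gamma P)^{-1}\bigl(R + \gamma P \hat V - \hat V\bigr)
\;\;\Longrightarrow\;\;
\bigl\| V - \hat V \bigr\|_\infty \;\le\; \frac{\varepsilon}{1 - \gamma},
\\[4pt]
\text{using } \bigl\|(I - \gamma P)^{-1}\bigr\|_\infty \le \tfrac{1}{1-\gamma}
\text{ for stochastic } P.
```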
Cost Networks Can use variable elimination to maximize over the state space [Bertele & Brioschi '72] [Figure: cost network over variables A, B, C, D] As in Bayes nets, maximization is exponential in the size of the largest factor; NP-hard in general. Here we need only 16, instead of 64, sum operations.
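A small sketch of variable elimination maximizing a sum of restricted-domain functions (a cost network); the factors, scopes, and elimination order below are made up for illustration, not the example from the slide.

```python
import itertools

# Cost network: maximize f1(A,B) + f2(B,C) + f3(C,D) over binary A, B, C, D.
# Each factor is a table keyed by assignments to its scope.
def table(scope, fn):
    return (scope, {vals: fn(*vals) for vals in itertools.product((0, 1), repeat=len(scope))})

factors = [
    table(("A", "B"), lambda a, b: 2.0 * a - b),
    table(("B", "C"), lambda b, c: 1.0 if b == c else 0.0),
    table(("C", "D"), lambda c, d: 3.0 * c * d),
]

def eliminate(factors, var):
    """Max out `var`: combine all factors mentioning it, keep the rest."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    new_scope = tuple(sorted({v for sc, _ in touching for v in sc} - {var}))
    new_tab = {}
    for vals in itertools.product((0, 1), repeat=len(new_scope)):
        assign = dict(zip(new_scope, vals))
        new_tab[vals] = max(
            sum(t[tuple((assign | {var: x})[v] for v in sc)] for sc, t in touching)
            for x in (0, 1)
        )
    return rest + [(new_scope, new_tab)]

for v in ("D", "C", "B", "A"):          # elimination order
    factors = eliminate(factors, v)
# All variables eliminated: the remaining factor over () holds the maximum.
print(factors[0][1][()])                 # max over all 16 states, never enumerated jointly
```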
Checkpoint Starting with: –Factored model (DBN) –Restricted value function space (restricted domain basis functions) Find fixed point in restricted space Bound solution quality a posteriori But: Fixed point may not have lowest max norm error
Max-norm Error Minimization Two pieces: –General max-norm error minimization –Symbolic operations over large state spaces
General Max-norm Error Minimization Algorithm for finding: w* = argmin_w ||Aw - b||_∞ (b = target function) Solve by Linear Programming [Cheney '82]
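A sketch of the standard Chebyshev (max-norm) LP, written with explicit per-state constraints for a small example; scipy.optimize.linprog is used here only as a convenient solver and is an assumption of this sketch, not something from the slides. The talk's point is that the same LP can be posed without enumerating states when A and b are factored.

```python
import numpy as np
from scipy.optimize import linprog

def minimize_max_norm(A, b):
    """Find w minimizing ||A w - b||_inf via linear programming.

    Variables: (w_1 ... w_k, phi).  Minimize phi subject to
        A w - b <= phi * 1   and   -(A w - b) <= phi * 1.
    """
    n, k = A.shape
    c = np.zeros(k + 1)
    c[-1] = 1.0                                   # objective: minimize phi
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),      #  A w - phi <=  b
                      np.hstack([-A, -ones])])    # -A w - phi <= -b
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * k + [(0, None)])
    return res.x[:k], res.x[-1]                   # weights and achieved max error

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 4.0])
w, err = minimize_max_norm(A, b)
print(w, err)
```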
Symbolic max-norm minimization For fixed weights w, compute the max-norm: ||Aw - b||_∞ = max_s |Σ_i w_i h_i(s) - b(s)| However, if basis and target are functions of only a few variables, we can do it efficiently! Cost Networks can maximize over large state spaces efficiently when the function is factored.
Representing the Constraints Explicit representation is exponential (|S| = 2^n): one pair of constraints per state, φ ≥ ±(Σ_i w_i h_i(s) - b(s)) If basis and target are factored, can use Cost Networks to represent the constraints compactly.
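A simplified sketch of why factored structure compresses the constraint set, assuming the basis and target functions jointly mention only the variables in RELEVANT (a hypothetical scope chosen for illustration). The general construction in the talk uses a cost network and variable elimination to get a compact representation even when no such small joint scope exists; that construction is not reproduced here.

```python
import itertools

# Simplified setting: n binary variables, but the basis functions h_i and the
# target b only mention variables in RELEVANT; every other variable is
# irrelevant to the constraint  phi >= |sum_i w_i h_i(s) - b(s)|.
n = 30
RELEVANT = ("X", "Y")                         # hypothetical scopes

def h1(x, y): return float(x)
def h2(x, y): return float(x and y)
def target(x, y): return 2.0 * x + y         # the function b being approximated

def constraint_rows():
    """One pair of max-norm LP constraints per assignment of RELEVANT,
    instead of one pair per state (2^n of them)."""
    rows = []
    for x, y in itertools.product((0, 1), repeat=len(RELEVANT)):
        feats, tgt = [h1(x, y), h2(x, y)], target(x, y)
        rows.append((feats, tgt))   # encodes  sum_i w_i feats[i] - tgt <= phi
        # (together with the mirrored constraint  tgt - sum_i w_i feats[i] <= phi)
    return rows

print(len(constraint_rows()), "constraint pairs instead of", 2 ** n)
```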
Conclusions Value function approximation w/error bounds Symbolic operations (no sampling!) Methods over large state spaces –Orthogonal Projection –Max-norm error minimization Tools over large state spaces –Expectation –Dot product –Max