1 A Stochastic Pursuit-Evasion Game with No Information Sharing
Ashitosh Swarup, Jason Speyer, Johnathan Wolfe
School of Engineering and Applied Science, UCLA
2 Introduction
The game considered here is the linear-quadratic-Gaussian (LQG) stochastic pursuit-evasion game.
A deterministic version of this game was studied by Ho, Bryson and Baron.
The case in which both players process their own noisy measurements was studied by Willman.
We continue the investigation of this class of games.
3 Willman's Approach
Willman attempted to find strategies in which each player's control is assumed to be a linear function of his entire observation history.
Optimizing the cost function resulted in a set of implicit equations for the control gains.
No closed-form solution of these implicit equations was shown; results were obtained numerically for up to 3 stages.
4 Our Objective
Examine the conditions under which closed-form linear and/or nonlinear optimal solutions exist.
Willman sets up an LQG problem and states an optimality result without proof. We use dynamic programming to derive conditions for optimal controllers.
If possible, eliminate the need to smooth over each player's entire observation sequence (the dimensionality constraint).
5 Problem Setup
The system dynamics are given by:
$x(i+1) = x(i) + G_p u(i) - G_e v(i) + q(i)$
Subscripts $p$ and $e$ refer to the pursuer and the evader respectively.
The pursuer's and evader's controls are $u$ and $v$ respectively.
$q$ is zero-mean Gaussian white noise with covariance $Q$, and $x(0)$ is Gaussian with mean $x_0$ and covariance $P_0$. The statistics of $q$ and $x(0)$ are known a priori to both players.
6 Problem Setup (contd.)
The players receive noisy measurements:
$z_p(i) = H_p x(i) + w_p(i)$
$z_e(i) = H_e x(i) + w_e(i)$
Each player has no information about his opponent's observations, but knows his opponent's noise statistics.
$w_p$ is zero-mean Gaussian white noise with covariance $R_p$.
$w_e$ is zero-mean Gaussian white noise with covariance $R_e$.
Both players start with a common a priori estimate of the initial state $x(0)$.
7 Problem Setup (contd.)
Observation histories:
$Z_p(i) = \{ z_p(j),\ j = 0, \dots, i \}$
$Z_e(i) = \{ z_e(j),\ j = 0, \dots, i \}$
Cost function:
$J(u,v) = E\left[ [S_f x(n), x(n)] + \sum_{i=0}^{n-1} \left( [B u(i), u(i)] - [C v(i), v(i)] \right) \right]$
The pursuer minimizes the cost function while the evader maximizes it.
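
The setup above can be sketched in Python for the scalar case. This is an illustrative sketch only: the names `simulate` and `expected_cost`, the parameter dictionary, and the policy signature `policy(i, history)` are assumptions introduced here, and the bracket $[a, b]$ is read as an inner product, which reduces to the product $ab$ for scalars.

```python
import random

def simulate(n, u_policy, v_policy, params, rng):
    """Run one n-stage scalar game and return the realized (sample) cost.

    Each policy sees only its own observation history, mirroring the
    no-information-sharing assumption.
    """
    # Draw the initial state from N(x0, P0); gauss() takes a standard deviation.
    x = rng.gauss(params["x0"], params["P0"] ** 0.5)
    Zp, Ze = [], []  # private observation histories of pursuer and evader
    cost = 0.0
    for i in range(n):
        Zp.append(params["Hp"] * x + rng.gauss(0.0, params["Rp"] ** 0.5))
        Ze.append(params["He"] * x + rng.gauss(0.0, params["Re"] ** 0.5))
        u = u_policy(i, Zp)
        v = v_policy(i, Ze)
        # Running cost: pursuer's effort is penalized, evader's is rewarded.
        cost += params["B"] * u * u - params["C"] * v * v
        # State update x(i+1) = x(i) + Gp*u(i) - Ge*v(i) + q(i).
        x = x + params["Gp"] * u - params["Ge"] * v + rng.gauss(0.0, params["Q"] ** 0.5)
    return cost + params["Sf"] * x * x  # add the terminal cost

def expected_cost(n, u_policy, v_policy, params, runs=2000, seed=0):
    """Monte Carlo estimate of J(u, v) by averaging sample costs."""
    rng = random.Random(seed)
    return sum(simulate(n, u_policy, v_policy, params, rng) for _ in range(runs)) / runs
```

A policy here is any function of the stage index and the player's own history, for example `lambda i, Z: -0.5 * Z[-1]` for a pursuer that reacts only to the latest measurement.
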
8 Saddle Point Condition
Finding the optimal controls involves solving the following saddle-point inequality:
$J(u, v^o) \ge J(u^o, v^o) \ge J(u^o, v)$
Optimize person-by-person by solving the following inequalities:
$J(u^o, v^o) \ge J(u^o, v)$
$J(u, v^o) \ge J(u^o, v^o)$
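
To make the saddle-point inequality concrete, here is a small Python sketch for a deterministic one-stage scalar game with cost $J(u,v) = s\,(x + g_p u - g_e v)^2 + b u^2 - c v^2$. The game, the default parameter values, and the function names are assumptions chosen for illustration; the noisy game on these slides is harder because each player must average over what the other might have observed.

```python
def cost(u, v, x=1.0, s=1.0, gp=1.0, ge=1.0, b=1.0, c=2.0):
    """Deterministic one-stage cost J(u, v) for a scalar state x."""
    y = x + gp * u - ge * v  # next state x(1)
    return s * y * y + b * u * u - c * v * v

def saddle_point(x=1.0, s=1.0, gp=1.0, ge=1.0, b=1.0, c=2.0):
    """Solve dJ/du = 0 and dJ/dv = 0 simultaneously.

    Requires c > s*ge^2 so that J is concave in v (the evader's
    maximization is well posed); J is convex in u whenever b > 0.
    """
    if c <= s * ge * ge:
        raise ValueError("need c > s*ge**2 for a finite evader optimum")
    # With y = x(1): u = -s*gp*y/b and v = -s*ge*y/c, which gives y in closed form.
    y = x / (1.0 + s * gp * gp / b - s * ge * ge / c)
    return -s * gp * y / b, -s * ge * y / c

def is_saddle(u_o, v_o, deltas=(-1.0, -0.1, 0.1, 1.0), tol=1e-12):
    """Check J(u, v_o) >= J(u_o, v_o) >= J(u_o, v) on a few perturbations."""
    j = cost(u_o, v_o)
    return all(
        cost(u_o + d, v_o) >= j - tol and j >= cost(u_o, v_o + d) - tol
        for d in deltas
    )
```

Calling `is_saddle(*saddle_point())` checks both person-by-person inequalities at the computed pair for the default parameters.
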
9 The One-Stage Game
Cost function:
$J(u,v) = E\left[ [S_f x(1), x(1)] + [B u(0), u(0)] - [C v(0), v(0)] \right]$
Optimize to get expressions for $u^o(0)$ and $v^o(0)$.
Assume a linear functional form of the controls, with $\alpha$ denoting the constant term, $\beta$ the gain on the prior mean $x_0$, and $\gamma$ the gain on the measurement:
$u^o(0) = \alpha_u + \beta_u x_0 + \gamma_u z_p(0)$
$v^o(0) = \alpha_v + \beta_v x_0 + \gamma_v z_e(0)$
Solving for the coefficients using the equations derived previously gives $\alpha_u = \alpha_v = 0$, and nonzero values for the other matrix gains.
An assumed nonlinear form of the optimal controls degenerates into the above linear controllers.
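
The one-stage result can be illustrated for the scalar case with the Python sketch below. It is not the derivation from the slides: it assumes scalar parameters, positive gains, and the person-by-person stationarity conditions $u = -\mu_p\,(E[x|z_p] - g_e E[v|z_p])$ and $v = -\mu_e\,(E[x|z_e] + g_p E[u|z_e])$, then matches coefficients of the linear forms. The names `alpha`, `beta`, `gamma` (constant term, gain on $x_0$, gain on the measurement) and the function name are assumptions introduced here.

```python
def one_stage_gains(S, B, C, gp, ge, hp, he, P0, Rp, Re):
    """Scalar one-stage gains for u = alpha_u + beta_u*x0 + gamma_u*z_p
    and v = alpha_v + beta_v*x0 + gamma_v*z_e (illustrative sketch)."""
    if C <= S * ge * ge:
        raise ValueError("need C > S*ge**2 so the evader's problem is concave")
    mu_p = S * gp / (B + S * gp * gp)
    mu_e = S * ge / (C - S * ge * ge)
    # Gains of each player's conditional mean E[x | z] = (1 - K*h)*x0 + K*z.
    Kp = P0 * hp / (hp * hp * P0 + Rp)
    Ke = P0 * he / (he * he * P0 + Re)
    # The two measurement gains are coupled only to each other:
    #   gamma_u = -mu_p*Kp*(1 - ge*he*gamma_v)
    #   gamma_v = -mu_e*Ke*(1 + gp*hp*gamma_u)
    D = mu_p * mu_e * Kp * Ke * gp * ge * hp * he
    gamma_u = -mu_p * Kp * (1.0 + mu_e * Ke * ge * he) / (1.0 + D)
    gamma_v = -mu_e * Ke * (1.0 + gp * hp * gamma_u)
    # The prior-mean gains depend on the measurement gains:
    #   beta_u = -mu_p*(a_p - ge*beta_v),  beta_v = -mu_e*(a_e + gp*beta_u)
    a_p = (1.0 - Kp * hp) * (1.0 - ge * he * gamma_v)
    a_e = (1.0 - Ke * he) * (1.0 + gp * hp * gamma_u)
    beta_u = -mu_p * (a_p + mu_e * ge * a_e) / (1.0 + mu_p * mu_e * gp * ge)
    beta_v = -mu_e * (a_e + gp * beta_u)
    # The constants satisfy alpha_u = mu_p*ge*alpha_v and
    # alpha_v = -mu_e*gp*alpha_u, whose only solution is zero.
    return {"alpha_u": 0.0, "beta_u": beta_u, "gamma_u": gamma_u,
            "alpha_v": 0.0, "beta_v": beta_v, "gamma_v": gamma_v}
```

Under these assumptions the constant terms vanish, in line with the statement above, and the measurement gains can be solved for before the prior-mean gains.
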
10 The Two-Stage Game
The cost function in this case is
$J_1(u,v) = E\left[ [S_f x(2), x(2)] + \sum_{i=0}^{1} \left( [B_i u(i), u(i)] - [C_i v(i), v(i)] \right) \right]$
Assume a linear form of the controls:
$u^o(0) = k_0 + k_0^x x_0 + k_0^0 z_p(0)$; $v^o(0) = l_0 + l_0^x x_0 + l_0^0 z_e(0)$
$u^o(1) = k_1 + k_1^x x_0 + k_1^0 z_p(0) + k_1^1 z_p(1)$
$v^o(1) = l_1 + l_1^x x_0 + l_1^0 z_e(0) + l_1^1 z_e(1)$
Here the subscript is the stage, a superscript $x$ marks the gain on the prior mean $x_0$, and a numeric superscript $j$ marks the gain on the $j$-th measurement.
Optimize the cost function using dynamic programming to get expressions for $u^o(0)$, $v^o(0)$, $u^o(1)$ and $v^o(1)$.
Use the expressions derived for the optimal controls to get 14 equations for the 14 unknown control-coefficient matrices.
11 The Two-Stage Problem: Analytical Constraint
Solving the equations for the control gains involves inverting a matrix with unknown elements.
This results in polynomial equations in the unknowns.
Consider the scalar case first to extract properties of the system.
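
A toy example shows why inverting a matrix with unknown entries leads to polynomial equations. The matrix $M(k)$ and the constants $a$, $b$, $d_1$, $d_2$ below are hypothetical and are not the equations of the two-stage game; they only illustrate the mechanism.

```latex
% Hypothetical 2x2 example: the unknown gain k appears inside the matrix to be inverted.
M(k) = \begin{pmatrix} a & k \\ k & b \end{pmatrix},
\qquad
M(k)^{-1} = \frac{1}{ab - k^2} \begin{pmatrix} b & -k \\ -k & a \end{pmatrix}.

% An implicit gain equation of the form k = e_1^T M(k)^{-1} d becomes
k = \frac{b\,d_1 - k\,d_2}{ab - k^2}
\quad\Longrightarrow\quad
k^3 - (ab + d_2)\,k + b\,d_1 = 0 .
```

Clearing the determinant turns the implicit equation into a cubic in $k$, which is the kind of polynomial structure the scalar case is meant to expose.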
12 The Two-Stage Game: Properties of the Scalar Equations
The six gains on the measurements, $k_0^0$, $l_0^0$, $k_1^0$, $l_1^0$, $k_1^1$ and $l_1^1$, are mutually dependent and do not depend on the other variables.
This reduces the number of equations we have to solve simultaneously from 14 to 6.
The other eight variables, the constant terms $k_0$, $l_0$, $k_1$, $l_1$ and the gains on the prior mean $k_0^x$, $l_0^x$, $k_1^x$, $l_1^x$, depend on the above 6 variables and can be solved for after solving the above 6 equations.
13 The Two-Stage Game: Solving the Scalar Equations
The first-stage measurement gains $k_0^0$ and $l_0^0$ can be eliminated by solving:
$k_0^0 = \lambda_p (k_{p1} + k_{p2}\, l_0^0)$
$l_0^0 = \lambda_e (k_{e1} + k_{e2}\, k_0^0)$
$\lambda_p$, $\lambda_e$, $k_{p1}$, $k_{p2}$, $k_{e1}$ and $k_{e2}$ are functions of the final-stage measurement gains $k_1^0$, $l_1^0$, $k_1^1$ and $l_1^1$.
We thus need to solve 4 equations for the 4 variables from the final stage.
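
Because the two relations above are linear in the first-stage gains once the final-stage quantities are fixed, they can be solved in closed form. In the worked equation below, $k$ and $l$ stand for the pursuer's and evader's first-stage measurement gains, and $\lambda_p$, $\lambda_e$ are stand-in names for the two scalar factors multiplying the brackets; the result holds provided $\lambda_p \lambda_e k_{p2} k_{e2} \neq 1$.

```latex
% Coupled pair:  k = \lambda_p (k_{p1} + k_{p2} l),   l = \lambda_e (k_{e1} + k_{e2} k)
% Substitute the second relation into the first and collect terms in k:
k \left( 1 - \lambda_p \lambda_e k_{p2} k_{e2} \right)
  = \lambda_p \left( k_{p1} + \lambda_e k_{p2} k_{e1} \right)

% Hence, and by symmetry for l:
k = \frac{\lambda_p \left( k_{p1} + \lambda_e k_{p2} k_{e1} \right)}
         {1 - \lambda_p \lambda_e k_{p2} k_{e2}},
\qquad
l = \frac{\lambda_e \left( k_{e1} + \lambda_p k_{e2} k_{p1} \right)}
         {1 - \lambda_p \lambda_e k_{p2} k_{e2}} .
```

The first-stage gains are therefore explicit rational functions of the final-stage gains, which is what leaves only the 4 final-stage equations to be solved simultaneously.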
14 The Two-Stage Game: Solving the Scalar Equations (contd.)
Moving on to the final stage, we encounter polynomial equations of the form:
$k_1^0 = f_p(l_1^0, k_1^1, l_1^1)$
$l_1^0 = f_e(k_1^0, l_1^1, k_1^1)$
Eliminate $k_1^0$ and $l_1^0$ from these equations, then solve the remaining pair of equations for $k_1^1$ and $l_1^1$.
Back-substitute the values of $k_1^1$ and $l_1^1$ into the previous equations to solve for the remaining 4 variables.
We thus have a dynamic-programming-like approach for these 6 variables: the gains on the last measurement are found first, and the remaining gains follow by back-substitution.
15 Conclusion and Future Work
Even seemingly simple linear structures result in complex polynomial equations.
If analytical linear solutions exist in the scalar case, do nonlinear solutions exist?
Is it possible to find analytical closed-form solutions for the vector case?
Can the need to smooth over the entire observation sequence be eliminated?