Published by Howard Banks. Modified over 9 years ago.
1
Optimal Min-max Pursuit Evasion on a Manhattan Grid
Krishna Kalyanam (Infoscitex Corp.)
In collaboration with S. Darbha (TAMU), P. P. Khargonekar (UF, E-ARPA), M. Pachter (AFIT/ENG), P. Chandler and D. Casbeer (AFRL/RQCA)
AFRL/RQCA UAV Team Meeting, Oct 31, 2012
2
Scenario
[Figure: scenario map; legend: UGS sensor range, UGS communication range, UAV communication range, valid intruder path, BASE]
10/31/12 RQCA Conf. Rm. 2
3
Pursuit-Evasion Framework
- Pursuer is engaged in search and capture of an intruder on a Manhattan road network
- Intersections in the road network are instrumented with Unattended Ground Sensors (UGSs)
- Pursuer has a 2x speed advantage over the evader
- Pursuer has no on-board sensing capability
- Evader triggers an UGS; the event is time-stamped and stored in the UGS
- Pursuer interrogates UGSs to get evader location information
- Capture occurs when pursuer and evader are co-located at an UGS location
4
Manhattan Grid (3 row corridor)
- All edges of the grid are of the same length
- Pursuer arrives at node (t/c/b,0) with delay D>0 (time steps) behind the evader
- Evader dynamics: move North, East or South, but cannot re-visit a node
- Pursuer actions: move North, East or South, or Loiter/Wait at current location
- Pursuer has a 2x speed advantage over the evader
[Figure: 3-row grid with rows t, c, b and columns 0, 1, 2, ..., n; pursuer delay D]
5
Governing Equations
6
Problem Framework
- Pose the problem as a Partially Observable Markov Decision Process (POMDP)
  - Unconventional POMDP, since observations give delayed intruder location information with random time delays!
  - Use observations to compute the set of possible intruder locations
- Dual control problem
  - Pursuer's action, in addition to aiding capture, also affects the future uncertainty associated with the evader's location (exploration vs. exploitation)
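The step "use observations to compute the set of possible intruder locations" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the evader dynamics from the Manhattan Grid slide (North, East or South, no node re-visited) and adds assumptions that are not in the slides: the evader makes exactly one move per evader step, only the observed node is known to be visited, and rows are indexed 0 (top), 1 (center), 2 (bottom). The names `possible_locations`, `EVADER_MOVES` and `ROWS` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Node = Tuple[int, int]  # (row, col); assumed indexing: row 0 = top, 1 = center, 2 = bottom

ROWS = 3  # three-row corridor, as in the slides

# Evader moves from the slides: North (row - 1), East (col + 1), South (row + 1).
EVADER_MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0)}


def possible_locations(start: Node, moves: int) -> Set[Node]:
    """Return every node the evader could occupy after `moves` moves from `start`.

    Assumption (not from the slides): the evader moves at every step and only
    `start` is known to have been visited before the enumeration begins.
    """
    results: Set[Node] = set()

    def extend(node: Node, visited: FrozenSet[Node], remaining: int) -> None:
        if remaining == 0:
            results.add(node)
            return
        row, col = node
        for d_row, d_col in EVADER_MOVES.values():
            nxt = (row + d_row, col + d_col)
            # Stay inside the corridor and never re-visit a node.
            if 0 <= nxt[0] < ROWS and nxt not in visited:
                extend(nxt, visited | {nxt}, remaining - 1)

    extend(start, frozenset({start}), moves)
    return results


if __name__ == "__main__":
    # Evader last seen at the center row, column 0.
    print(sorted(possible_locations((1, 0), 1)))
    print(sorted(possible_locations((1, 0), 2)))
```

The enumeration grows quickly with the number of moves, which is one way to see why the pursuer's choice of which UGS to interrogate affects the size of the uncertainty set.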
7
Partial and delayed state information
8
Optimization Problem
[Figure: 3-row grid with rows t, c, b and columns 0, 1, 2, ...; pursuer delay D]
9
Bellman recursion
10
Induction - Motivation
- Single row: capture in exactly D steps. T(D) = 1 + T(D-1); T(1) = 1 => T(D) = D
- Two rows: capture in exactly D+2 steps. T(D) = 1 + T(D-1); T(1) = 3 => T(D) = D+2
[Figure: single-row (c) and two-row (t, b) grids showing pursuer and evader, with delays D, D-1, D-2, ..., 1, 0 along the columns]
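The two recursions above can be checked numerically. The sketch below unrolls T(D) = 1 + T(D-1) from a given base case and compares the result with the closed forms on the slide. The function name `steps_to_capture` and the parameter `base_steps` are illustrative and do not come from the presentation.

```python
def steps_to_capture(delay: int, base_steps: int) -> int:
    """Unroll T(D) = 1 + T(D - 1) with base case T(1) = base_steps."""
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    steps = base_steps           # T(1)
    for _ in range(delay - 1):   # each extra unit of delay costs one more step
        steps += 1
    return steps


if __name__ == "__main__":
    for d in range(1, 8):
        assert steps_to_capture(d, base_steps=1) == d        # single row: T(D) = D
        assert steps_to_capture(d, base_steps=3) == d + 2    # two rows: T(D) = D + 2
    print("closed forms match the recursion for D = 1..7")
```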
11
A Feasible Policy (upper bound)
[Figure: 3-row grid with rows t, c, b and columns 0, 1, 2, ...; pursuer delay D]
12
Bottom/Top row - delay 1
[Figure: pursuer and evader paths on the grid, annotated with observed delays]
13
Bottom/Top row - delay 2
[Figure: pursuer and evader paths on the grid, annotated with observed delays]
14
Center row - delay 1
[Figure: pursuer and evader paths on the grid, annotated with observed delays]
15
Center row - delay 2
[Figure: pursuer and evader paths on the grid, annotated with observed delays]
16
Bottom row - delay 3; Center row - delay 3
[Figure: two panels on the 3-row grid with rows t, c, b and columns 0, 1, 2, ...; pursuer delay D]
17
Specification of the policy μ

Bottom row:
Delay (D) | Sequence | Max Steps
1         | ENLNL    | 5
2         | EN^2 L   | 6
3         | EN^2     | 13
≥4        | EN^2 ?   | D+10

Center row:
Delay (D) | Sequence | Max Steps
1         | ENLS^2   | 11
2         | ENS^2    | 12
3         | ENSES    | 13
≥4        | ??       | D+10
18
Induction argument for D>=4
- Basic step: T_μ(r,3) = 13
- Induction hypothesis:
19
Specification of the policy μ

Bottom row:
Delay (D) | Sequence         | Min-Max Steps
1         | ENLNL            | 5
2         | EN^2 L           | 6
≥3        | EN^2             | D+10

Center row:
Delay (D) | Sequence         | Min-Max Steps
1         | ENLS^2           | 11
2         | ENS^2            | 12
3         | ENSES            | 13
≥4        | E^(D-3) NSE^2 S  | D+10
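The policy table can be read as a lookup from (starting row, delay) to an action sequence and a step bound. The sketch below encodes one reading of the table and is not the authors' code. The exponents (for example, E repeated D-3 times) are reconstructed from a garbled transcript and should be treated as an assumption. E, N, S and L stand for East, North, South and Loiter, following the pursuer actions listed earlier. The names `policy_sequence` and `min_max_steps` are hypothetical.

```python
def policy_sequence(row: str, delay: int) -> str:
    """Action sequence from the table for a start row ('bottom' or 'center').

    Assumption: superscripts in the table denote repetition, e.g. EN^2 = "ENN".
    """
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    if row == "bottom":
        if delay == 1:
            return "ENLNL"
        if delay == 2:
            return "ENNL"
        return "ENN"                          # D >= 3
    if row == "center":
        if delay == 1:
            return "ENLSS"
        if delay == 2:
            return "ENSS"
        if delay == 3:
            return "ENSES"
        return "E" * (delay - 3) + "NSEES"    # D >= 4: E^(D-3) N S E^2 S
    raise ValueError("row must be 'bottom' or 'center'")


def min_max_steps(row: str, delay: int) -> int:
    """Min-max steps to capture, as listed in the table."""
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    if row == "bottom":
        return {1: 5, 2: 6}.get(delay, delay + 10)
    if row == "center":
        return delay + 10                     # 11, 12, 13, then D+10
    raise ValueError("row must be 'bottom' or 'center'")


if __name__ == "__main__":
    for d in (1, 2, 3, 4, 6):
        print("center", d, policy_sequence("center", d), min_max_steps("center", d))
```

Every center-row entry in the table, including D = 1, 2 and 3, equals D+10, so the center-row bound reduces to a single expression in this sketch.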
20
Center row, delay D>=4
[Figure: pursuer and evader paths on the grid; columns 0, 1, ..., D-4, D-3, D-2, D-1; delay D; time labels k=D, k=D+1, k=2D-4, k=2D-2, k=2D, k=2D+2; annotation: (D-3) moves E]
21
Center row, delay D>=4 (contd.)
[Figure: pursuer and evader paths on the grid; columns 0, 1, ..., D-4, D-3, D-2, D-1; delay D; time labels k=0, k=2, k=4, k=D, k=D+1, k=2D-4, k=2D-2, k=2D, k=2D+2; annotation: (D-3) moves E]
22
Center row, delay D>=4 (contd.)
[Figure: pursuer and evader paths on the grid; columns 0, 1, ..., D-4, D-3, D-2, D-1; delay D; time labels k=0, k=D, k=D+1, k=2D-4, k=2D-2, k=2D, k=2D+2]
23
Center row, delay D>=4; Bottom row, delay D>=4
[Figure: two panels; columns 0, 1, ...; delays D, D-2; time labels k=0, k=4, k=D, k=D+1, k=D+2]
24
Lower Bound on Steps to Capture
[Figure: 3-row grid with rows t, c, b and columns 0, 1, 2, ...; pursuer delay D]
25
Lower bound on optimal time to capture
26
Optimal (min-max) Steps to Capture
27
East is optimal at red UGS
Sketch of proof:
28
Optimal trajectory
- There is an optimal trajectory, referred to as a "turnpike", which both the pursuer and the evader strive to reach and stay on for most of the encounter.
- Here, the turnpike is the center row of the symmetric 3-row grid.
- The pursuer, after initially going east, immediately heads towards the turnpike if not already on it.
- The evader initially heads to the turnpike, unless it is already on it, and stays there until the "end game", when it swerves off the turnpike to avoid immediate capture.
- The pursuer stays on the turnpike, monitoring the delays, until it observes delay 1. At this point, it also executes the "end game" maneuver and captures the evader in exactly 11 more steps.
29
Summary
Advantages
- Policy depends only on the delay at, and time elapsed since, the last red UGS (sufficient statistic?)
- Policy is optimal despite not relying on the pursuer's entire information history
Disadvantages
- Policy is not in analytical form, i.e., a function from information state to action space (and so is not extendable to other graphs)
- What is the intuition (exploration vs. exploitation; does separation exist)?
Extension(s)
- Can the policy be approximated by a feedback policy that minimizes a suitable norm of the error (distance to evader + size of uncertainty)?
- Capture can no longer be guaranteed (by a single pursuer) if the number of rows exceeds 3
- With 2 pursuers, capture can be guaranteed in D+4 steps on any number of rows (including infinity)!
30
Extras
31
Center row, delay D>=4 (contd.)
[Figure: pursuer and evader paths on the grid; columns 0, 1, ..., D-4, D-3, D-2, D-1; delay D; time labels k=0, k=D, k=D+1, k=2D-4, k=2D-2, k=2D, k=2D+2]
- Conservative bound: D-1+11 = D+10 (see extra slide)
32
[Figure: pursuer and evader paths on the grid; columns 0, 1, ..., D-4, D-3, D-2, D-1; delay D; time labels k=0, k=2, k=4, k=D, k=D+1, k=2D-4, k=2D-2, k=2D]
- Steps to capture: D-1+3 = D+2
- Conservative bound (per policy): D-1+11 = D+10