Optimal Min-max Pursuit Evasion on a Manhattan Grid
Krishna Kalyanam (Infoscitex Corp.), in collaboration with S. Darbha (TAMU), P. P. Khargonekar (UF, ARPA-E), M. Pachter (AFIT/ENG), P. Chandler and D. Casbeer (AFRL/RQCA)
AFRL/RQCA UAV Team Meeting, Oct 31, 2012

Scenario (figure: base, UAV communication range, UGS sensor and communication ranges, and a valid intruder path)

Pursuit-Evasion Framework
- Pursuer engaged in search and capture of an intruder on a Manhattan road network
- Intersections in the road network are instrumented with Unattended Ground Sensors (UGSs)
- Pursuer has a 2x speed advantage over the evader
- Pursuer has no on-board sensing capability
- Evader triggers a UGS and the event is time-stamped and stored in the UGS
- Pursuer interrogates UGSs to get evader location information
- Capture occurs when pursuer and evader are co-located at a UGS location
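To make the sensing model concrete, here is a minimal Python sketch of the UGS time-stamping and interrogation described above (my own illustration, not the authors' code; names such as UGS, trigger, and interrogate are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UGS:
    """Unattended Ground Sensor at a road intersection (grid node)."""
    node: Tuple[int, int]               # (row, column) of the intersection
    last_trigger: Optional[int] = None  # time step of the last evader visit, if any

    def trigger(self, t: int) -> None:
        """Evader passes the UGS: the event is time-stamped and stored."""
        self.last_trigger = t

    def interrogate(self, t: int) -> Optional[int]:
        """A co-located pursuer reads the stored time stamp and learns the
        delay, i.e. how many steps ago the evader was at this node."""
        return None if self.last_trigger is None else t - self.last_trigger


def captured(pursuer_node: Tuple[int, int], evader_node: Tuple[int, int]) -> bool:
    """Capture: pursuer and evader co-located at a node."""
    return pursuer_node == evader_node
```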

Manhattan Grid (3-row corridor)
- All edges of the grid are of the same length
- Pursuer arrives at node (t/c/b, 0) with delay D > 0 (time steps) behind the evader
- Evader dynamics: move North, East or South, but cannot re-visit a node
- Pursuer actions: move North, East or South, or Loiter/Wait at the current location
- Pursuer has a 2x speed advantage over the evader
(figure: grid with rows t, c, b and columns 0, 1, 2, …; pursuer D steps behind)

Governing Equations
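The equations on this slide are not reproduced in the transcript. As a hedged reconstruction of the dynamics described on the previous slide (the symbols e_k, p_k, u_k and N(·) are my own notation, not necessarily the slide's):

    e_{k+1} ∈ N(e_k) \ {e_0, …, e_k}          (evader: one North/East/South move per step, no node revisits)
    p_{k+1} = f(p_k, u_k),  u_k ∈ {N, E, S, Loiter}   (pursuer: two such moves per evader step, i.e. the 2x speed advantage)
    T = min { k : p_k = e_k }                  (capture time, with co-location occurring at a UGS node)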

Problem Framework
- Pose the problem as a Partially Observable Markov Decision Process (POMDP)
  - an unconventional POMDP, since observations give delayed intruder location information with random time delays!
  - use observations to compute the set of possible intruder locations
- Dual control problem
  - the pursuer's action, in addition to aiding capture, also affects the future uncertainty associated with the evader's location (exploration vs. exploitation)
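As an illustration of the "set of possible intruder locations" computed from a delayed UGS observation, here is a small Python sketch (my own, with hypothetical names; it assumes the 3-row grid and the North/East/South, no-revisit evader model from the earlier slide):

```python
def evader_moves(node, rows=3, cols=10):
    """Nodes reachable from `node` by one North, East or South move."""
    r, c = node
    candidates = [(r + 1, c), (r - 1, c), (r, c + 1)]  # N, S, E (orientation is illustrative)
    return [(rr, cc) for rr, cc in candidates if 0 <= rr < rows and 0 <= cc < cols]


def possible_evader_locations(trigger_node, delay, rows=3, cols=10):
    """Set of nodes the evader could occupy `delay` steps after triggering the
    UGS at `trigger_node`, moving N/E/S each step and never revisiting a node."""
    frontier = {(trigger_node, (trigger_node,))}   # (current node, visited nodes so far)
    for _ in range(delay):
        frontier = {
            (nb, path + (nb,))
            for node, path in frontier
            for nb in evader_moves(node, rows, cols)
            if nb not in path
        }
    return {node for node, _ in frontier}


# e.g. all evader locations two steps after a trigger at center-row node (1, 0):
# possible_evader_locations((1, 0), delay=2)
```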

Partial and delayed state information

Optimization Problem (figure: the 3-row grid with rows t, c, b)

Bellman recursion
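The recursion itself is not reproduced in this transcript. Generically, a min-max (worst-case evader) time-to-capture recursion over information states I would take a form like the following (a hedged sketch in my own notation, not the slide's):

    T*(I) = min_{u ∈ U(I)} max_{w ∈ W(I)} [ 1 + T*( f(I, u, w) ) ],   with T*(I) = 0 whenever I implies co-location,

where u is the pursuer action, w an evader move consistent with the information state, and f(I, u, w) the information-state update (new UGS readings and elapsed time).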

Induction - Motivation
- Single row: capture in exactly D steps. T(D) = 1 + T(D-1); T(1) = 1 => T(D) = D
- Two rows: capture in exactly D+2 steps. T(D) = 1 + T(D-1); T(1) = 3 => T(D) = D+2
(figure: pursuer and evader positions on single-row and two-row grids)
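Unrolling the recursion makes both closed forms immediate:

    T(D) = 1 + T(D-1) = 2 + T(D-2) = … = (D-1) + T(1),

so T(1) = 1 gives T(D) = D (single row) and T(1) = 3 gives T(D) = D + 2 (two rows).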

A Feasible Policy (upper bound)

Bottom/Top row - delay 1 (figure: pursuer and evader trajectories)

Bottom/Top row - delay 2 (figure: pursuer and evader trajectories)

Center row - delay 1 (figure: pursuer and evader trajectories)

Center row - delay 2 (figure: pursuer and evader trajectories)

Bottom row - delay 3; Center row - delay 3 (figure: trajectories on the 3-row grid)

Specification of the policy μ

Bottom row:
Delay (D)   Sequence    Max Steps
1           E N L N L   5
2           E N² L      6
3           E N²        13
≥4          E N² ?      D+10

Center row:
Delay (D)   Sequence    Max Steps
1           E N L S
2           E N S
3           E N S E S   13
≥4          ? ?         D+10

Induction argument for D ≥ 4
Base step: T_μ(r, 3) = 13
Induction hypothesis:
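The induction hypothesis itself is not shown in the transcript; given the base step T_μ(r, 3) = 13 = 3 + 10 and the D + 10 bound being proved, it is presumably of the form

    T_μ(r, D-1) ≤ (D-1) + 10  ⟹  T_μ(r, D) ≤ D + 10,

with the extra leading East move of the delay-D policy accounting for the additional step. This reading is my inference from the surrounding slides, not a statement taken from the slide itself.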

Specification of the policy μ

Bottom row:
Delay (D)   Sequence           Min-Max Steps
1           E N L N L          5
2           E N² L             6
≥3          E N²               D+10

Center row:
Delay (D)   Sequence           Min-Max Steps
1           E N L S
2           E N S
3           E N S E S          13
≥4          E^(D-3) N S E² S   D+10
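For the rows of the table that are unambiguous in this transcript, the move sequences can be encoded directly; a minimal Python sketch (my own encoding; function names are hypothetical):

```python
def bottom_row_sequence(D: int) -> list:
    """Opening pursuer moves for the bottom/top-row policy in the table above.
    Min-max steps to capture: 5 for D=1, 6 for D=2, and D+10 for D >= 3."""
    if D == 1:
        return ["E", "N", "L", "N", "L"]
    if D == 2:
        return ["E", "N", "N", "L"]
    return ["E", "N", "N"]          # D >= 3: opening E N², capture within D + 10


def center_row_sequence(D: int) -> list:
    """Center-row policy for D >= 4: E^(D-3) N S E² S, capture within D + 10."""
    assert D >= 4, "the D = 1, 2, 3 center-row entries are listed in the table"
    return ["E"] * (D - 3) + ["N", "S", "E", "E", "S"]
```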

Center row, delay D ≥ 4 (figure: trajectory annotated with time stamps k = D, D+1, …, 2D+2 and with (D-3) East moves)

Center row, delay D ≥ 4 (contd.) (figure: trajectory annotated with time stamps k = 0 through k = 2D+2 and with (D-3) East moves)

Center row, delay D ≥ 4 (contd.) (figure: trajectory annotated with time stamps k = 0 through k = 2D+2)

Center row, delay D ≥ 4; Bottom row, delay D ≥ 4 (figure: trajectories annotated with time stamps k = 0 through k = D+2)

Lower Bound on Steps to Capture (figure: the 3-row grid with rows t, c, b)

Lower bound on optimal time to capture

Optimal (min-max) Steps to Capture

East is optimal at red UGS
Sketch of proof:

Optimal trajectory
There is an optimal trajectory, referred to as a "turnpike", which both the pursuer and the evader strive to reach and stay on for most of the encounter. Here, the turnpike is the center row of the symmetric 3-row grid. The pursuer, after initially going East, immediately heads toward the turnpike if not already on it. The evader likewise heads to the turnpike, unless it is already on it, and stays there until the "end game", when it swerves off the turnpike to avoid immediate capture. The pursuer stays on the turnpike, monitoring the delays, until it observes delay 1. At this point, it too executes the "end game" maneuver and captures the evader in exactly 11 more steps.

Summary
Advantages
- Policy depends only on the delay at, and the time elapsed since, the last red UGS (a sufficient statistic?)
- Policy is optimal despite not relying on the pursuer's entire information history
Disadvantages
- Policy is not in analytical form, i.e., a function from the information state to the action space (and so is not extendable to other graphs)
- What is the intuition (exploration vs. exploitation; does a separation exist)?
Extension(s)
- Can the policy be approximated by a feedback policy that minimizes a suitable norm of the error (distance to evader + size of uncertainty)?
- Capture can no longer be guaranteed (by a single pursuer) if the number of rows exceeds 3
- With 2 pursuers, capture can be guaranteed in D+4 steps on any number of rows (including infinity)!

Extras

Center row, delay D ≥ 4 (contd.) (figure: trajectory annotated with time stamps k = 0 through k = 2D+2)
Conservative bound: D-1+11 = D+10 (see extra slide)

(figure: trajectory annotated with time stamps k = 0 through k = 2D)
Steps to capture: D-1+3 = D+2
Conservative bound (per policy): D-1+11 = D+10