Adaptation of the Simulated Risk Disambiguation Protocol to a Discrete Setting
ICAPS Workshop on POMDP, Classification and Regression: Relationships and Joint Utilization
June 8, 2006, Cumbria, England
Al Aksakalli, Donniell Fishkind, Carey Priebe
Department of Applied Mathematics and Statistics, Johns Hopkins University

Outline:
- Problem Description
- MDP and POMDP Formulations
- Adaptation of the Simulated Risk Protocol
- Computational Experiments

Problem Description: A spatial arrangement of detections, some of which are true detections and some of which are false detections.

Problem Description (cont.): We only see the detections, not which are true and which are false. Assume, for every detection x, that ρ(x) is the known probability that x is a true detection. [Figure: detections in the plane, each labeled with its probability ρ of being true.]

Problem Description (cont.): We are given a start point s and a destination point t. [Figure: the detection field with start s and destination t marked.]

Problem Description (cont.): About each detection there is a hazard region, an open disk of fixed radius. [Figure: a disk of fixed radius drawn about each detection.]

Problem Description (cont.): We seek a continuous curve from s to t, avoiding every true hazard region, of shortest achievable arclength. [Figure: candidate routes marked '??' where they would cross ambiguous disks.]

Problem Description (cont.): …and we assume the ability to disambiguate detections from the boundary of their hazard regions.

Problem Description (cont.): A disambiguation reveals the detection to be true… [Figure: a disk, disambiguated from its boundary, revealed to be a true hazard.]

Problem Description (cont.): …or false, in which case its disk may be traversed. [Figure: the disambiguated disk revealed to be false.]

Problem Description (cont.): …and the rest of the traversal proceeds. [Figure: the completed traversal.]

Definition: A disambiguation protocol is a function that, from the decision maker's current knowledge, determines which detection is disambiguated next, and from where on its boundary the disambiguation is performed. Throughout, K denotes the number of disambiguations allowed and c the cost per disambiguation.
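As an illustration only (our sketch, not notation from the talk), the ingredients of this definition can be typed out as follows; Point, Knowledge, and Protocol are hypothetical names:

```python
# Illustrative typing of a disambiguation protocol; all names here are
# placeholders, not notation from the slides.
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Knowledge:
    """The decision maker's current information."""
    remaining_disambiguations: int   # K: number of disambiguations allowed
    cost_per_disambiguation: float   # c: cost per disambiguation
    ambiguous_disks: tuple           # disks not yet disambiguated

# A protocol maps current knowledge to (index of the detection to
# disambiguate next, boundary point from which to disambiguate it).
Protocol = Callable[[Knowledge], Tuple[int, Point]]
```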

Example 1: The protocol gives rise to the RDP shown: length 707.97 with probability .89670, or length 1116.19 with probability .10330. [Figure: the two possible traversals.]
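As a worked check (our arithmetic; disambiguation costs are not included here), the expected length of this RDP is the probability-weighted average of the two outcomes:

```latex
\mathbb{E}[\mathrm{length}] = .89670 \times 707.97 + .10330 \times 1116.19 \approx 634.84 + 115.30 = 750.14
```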

Example 2: The protocol gives rise to the RDP shown (a superimposed composite of the possible traversals).

Random Disambiguation Paths (RDP) Problem: Given the detections, their probabilities ρ, the start s, the destination t, the number of disambiguations K, and the cost c, find a protocol of minimum expected traversal length.

Related work: the Canadian Traveller Problem (CTP), a graph-theoretic analogue of RDP. Given a finite graph whose edges have specified probabilities of being traversable, together with a start vertex and a destination vertex, each edge's status is revealed only when one of its endpoints is visited; the objective is to minimize the expected traversal length. The CTP has been shown to be #P-hard.

Markov Decision Process (MDP) formulation: Let the information vector record the decision maker's current knowledge, and let V be the set of all possible disambiguation points. The RDP problem can then be cast as a K-stage finite-horizon MDP with:
- States: the current vertex together with the information vector
- Actions: pairs (v, i), where v is a disambiguation point and i is a hazard region index
- Rewards: the negative of the shortest-path distance between the state vertex and the action vertex, minus c, if not entering d; here d is an absorbing state that yields a one-time, very large reward upon entry
- Transitions: governed by the ρ's
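To make the solution machinery concrete, here is a generic backward-induction (value iteration) routine for a K-stage finite-horizon MDP. It is our illustrative sketch: states, actions, reward, and transition stand in for the construction above and are not code from the talk.

```python
# Generic K-stage finite-horizon value iteration (backward induction).
# The MDP ingredients are supplied by the caller:
#   reward(s, a)     -> immediate reward for action a in state s
#   transition(s, a) -> list of (next_state, probability) pairs
#   actions(s)       -> iterable of actions available in state s
def value_iteration(states, actions, reward, transition, K):
    V = {0: {s: 0.0 for s in states}}   # V[k][s]: value with k stages to go
    policy = {}
    for k in range(1, K + 1):
        V[k], policy[k] = {}, {}
        for s in states:
            best_a, best_val = None, float("-inf")
            for a in actions(s):
                val = reward(s, a) + sum(p * V[k - 1][s2]
                                         for s2, p in transition(s, a))
                if val > best_val:
                    best_a, best_val = a, val
            # Absorbing states with no available actions keep value 0.
            V[k][s] = best_val if best_a is not None else 0.0
            policy[k][s] = best_a
    return V, policy
```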

Partially Observable Markov Decision Process (POMDP) formulation: The RDP problem can be cast as a POMDP by trimming the information vector and folding the ambiguity of the hazards into ambiguity of the information vector itself; hence the partial observability of the state.

Risk Simulation Protocol: For the purpose of deciding the next disambiguation point, we pretend that the ambiguous disks are riskily traversable. [Figure: a traversal passing through disks marked '?'.]

Risk Simulation Protocol (cont.): For a candidate traversal π, u(π) is the usual Euclidean length of π, and v(π) is the surprise length of π, which is the negative logarithm of the probability that π is traversable in actuality.
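Under an independence assumption across disks (our reconstruction; the slide states only the negative-logarithm definition), the surprise length decomposes additively over the set I(π) of ambiguous disks that π enters:

```latex
v(\pi) = -\ln \Pr(\pi \text{ is traversable})
       = -\ln \prod_{i \in I(\pi)} (1 - \rho_i)
       = \sum_{i \in I(\pi)} -\ln(1 - \rho_i)
```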

Given an undesirability function G(u, v) (henceforth, monotonically non-decreasing in its arguments) and, say, a particular choice of G, let the risk-simulated traversal be the curve minimizing the undesirability of its Euclidean and surprise lengths.

Definition: The simulated risk protocol dictates that the next disambiguation be performed at the first ambiguous point of this traversal. [Figure: the traversal crossing disks marked '?'.]

How to proceed once this disambiguation is performed: update the knowledge and the lengths, decrement K, and set the new start s to be the disambiguation point y.

How to navigate in this continuous setting: the Tangent Arc Graph (TAG) is the superimposition/subdivision of the visibility graphs generated by all subsets of disks. For any undesirability function, the risk-simulated traversal is an s,t-path in TAG!

Linear undesirability functions: Because of the efficiency of their realization, we consider simulated risk protocols generated by linear undesirability functions with a chosen parameter α. As a further shorthand, denote such a protocol by its parameter α.
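The transcript drops the formula; one plausible convex-combination form (an assumption on our part) is

```latex
G_\alpha(u, v) = (1 - \alpha)\, u + \alpha\, v, \qquad \alpha \in [0, 1],
```

under which α = 0 ignores risk entirely and α = 1 minimizes surprise alone.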

How (during the simulation-of-risk phase) the traversal can be affected by α: [Series of figures: the risk-simulated traversal for several values of α.]

Example 1 (revisited): The protocol gives rise to the RDP shown: length 707.97 with probability .89670, or length 1116.19 with probability .10330.

Example 2 (revisited): The protocol gives rise to the RDP shown (superimposed composite).

A discrete version of RDP (DRDP): discretize via a subgraph of the integer lattice with unit edge lengths. [Figure: the lattice discretization of the hazard field.]

Adapting the simulated risk protocol to the lattice discretization: Again, consider a linear undesirability function, where u is the Euclidean length and v is the surprise length. Each edge of G is weighted using the indicator function 1 and comp(), the number of connected components of its argument: each time a hazard region intersects an edge, half of that region's surprise length is added to the edge's weight.
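A minimal runnable sketch of this weighting on the lattice, assuming the convex-combination form of G_α above, at most one intersection component per unit edge, and independent disks (all helper names are ours):

```python
# Lattice discretization with simulated-risk edge weights (a sketch).
import math
import networkx as nx

def segment_intersects_disk(p, q, center, r):
    """True if the open disk of radius r about `center` meets segment pq."""
    (px, py), (qx, qy), (cx, cy) = p, q, center
    dx, dy = qx - px, qy - py
    # Parameter of the point on pq nearest the center, clamped to [0, 1].
    t = max(0.0, min(1.0,
            ((cx - px) * dx + (cy - py) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px + t * dx - cx, py + t * dy - cy) < r

def edge_weight(p, q, disks, alpha):
    """Combine Euclidean length u and surprise length v linearly; each
    intersecting ambiguous disk contributes half its surprise length."""
    u = 1.0  # unit lattice edge
    v = sum(0.5 * -math.log(1.0 - rho)
            for (cx, cy, r, rho) in disks
            if segment_intersects_disk(p, q, (cx, cy), r))
    return (1.0 - alpha) * u + alpha * v

def risk_weighted_lattice(width, height, disks, alpha=0.5):
    """Integer-lattice subgraph with unit edges and simulated-risk weights."""
    G = nx.grid_2d_graph(width + 1, height + 1)
    for p, q in G.edges():
        G[p][q]["weight"] = edge_weight(p, q, disks, alpha)
    return G

# One ambiguous disk of radius 5.5 with rho = .5 on the 40-by-20 lattice:
disks = [(20.0, 10.0, 5.5, 0.5)]
G = risk_weighted_lattice(40, 20, disks)
print(nx.dijkstra_path_length(G, (0, 10), (40, 10), weight="weight"))
```

The risk-simulated traversal is then the minimum-weight s,t-path in this graph, and the protocol disambiguates at the first ambiguous disk boundary along it.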

Example: the simulated risk protocol and its RDP are computed effortlessly on the lattice. [Figure.]

Computational experiments:
- A 40-by-20 integer lattice is used.
- Each hazard region is a disk of radius 5.5.
- Disk centers are sampled from a uniform distribution of integers.
- The ρ's are sampled from the uniform distribution on (0, 1).
- The cost of a disambiguation is taken as c = 1.5.
- For each (N, K) combination, 50 different instances were sampled.
- Optimal solutions are found by solving the MDP model via value iteration.
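A sketch of the instance-sampling scheme described above; the exact integer range for the disk centers is not legible in the transcript, so drawing them from the lattice itself is our assumption:

```python
# Sample random DRDP instances per the experimental setup (a sketch).
import random

def sample_instance(n_disks, width=40, height=20, radius=5.5):
    """One instance: n_disks hazard disks (center x, center y, radius, rho)."""
    return [(float(random.randint(0, width)),
             float(random.randint(0, height)),
             radius,
             random.uniform(0.0, 1.0))   # rho: probability the disk is true
            for _ in range(n_disks)]

# 50 different instances for a given (N, K) combination:
instances = [sample_instance(n_disks=7) for _ in range(50)]
```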

Illustration with N = 7, K = 1. [Figure: the sampled instance and its optimal solution, with the expected length reported on the slide.]

Comparison of optimal versus simulated risk:
- [Table: runtime to find the overall optimal; the SR-RDP runtime is negligible.]
- Simulated risk found the optimal solution 74% of the time.
- The overall mean percentage error of the simulated risk solutions was less than 1%.
- For N = 7, K = 3, value iteration took more than an hour; for N = 10, K = 1, value iteration did not run due to insufficient memory.

Q & A