1 Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking. Presented by Shihao Ji, Duke University Machine Learning Group, June 10, 2005. Authors: Vikram Krishnamurthy & Robin Evans.

2 Outline: Motivation; Overview; Multiarmed Bandits; HMM Multiarmed Bandits; Experimental Results.

3 Motivation. An ESA (electronically scanned array) has only one steerable beam. The coordinates of each target evolve according to a finite-state Markov chain. Question: which single target should the tracker choose to observe at each time instant in order to optimize a specified cost function?

4 Overview: How does it work?

5 Multiarmed Bandits: The Model. One has N parallel projects, indexed i = 1, 2, ..., N, and at each instant of discrete time can work on only a single project. Let the state of project i at time k be denoted $x_k^i$. If one works on project i at time k, one pays an immediate expected cost of $c(x_k^i)$. The state changes to $x_{k+1}^i$ by a Markov transition rule (which may depend upon i, but not upon k), while the states of the projects one has not touched remain unchanged: $x_{k+1}^j = x_k^j$ for $j \neq i$. The problem is how to allocate one's effort over projects sequentially in time so as to minimize the expected total discounted cost.
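To make the setup concrete, here is a minimal simulation sketch of these dynamics (the random costs, transition matrices, and the uniform random policy are illustrative placeholders, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, S = 3, 4                                   # N projects, each with S states
P = rng.dirichlet(np.ones(S), size=(N, S))    # P[i, s] = transition row of project i in state s
c = rng.uniform(0.0, 1.0, size=(N, S))        # c[i, s] = immediate cost of working project i
x = rng.integers(0, S, size=N)                # current state of every project
beta, total = 0.9, 0.0

for k in range(50):
    i = int(rng.integers(0, N))               # placeholder policy; the Gittins rule replaces this
    total += beta**k * c[i, x[i]]             # pay the cost of the project worked on
    x[i] = rng.choice(S, p=P[i, x[i]])        # only the touched project moves;
    # the states of all other projects stay frozen

print(f"discounted cost of the random policy: {total:.3f}")
```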

6 Gittins Index. This is the simplest non-trivial problem of its kind, and a classic; there was no essential solution until Gittins and his co-workers. They proved that to each project i one can attach an index $\gamma^i(x_k^i)$ such that the optimal action at time k is to work on the project for which the current index is smallest. The index is calculated by solving the problem of allocating one's effort optimally between project i and a standard project which yields a constant cost. Gittins' result thus reduces the case of general N to the case N = 2.
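As a rough illustration of this calibration idea for a fully observed project, the sketch below finds, by bisection, the constant cost of the "standard project" at which one is indifferent (Whittle's retirement form of the index; all numbers and the value-iteration settings are illustrative assumptions):

```python
import numpy as np

def gittins_index(s0, c, P, beta=0.9, sweeps=500, tol=1e-5):
    """Cost-version Gittins index of state s0: the constant cost lam of a
    'standard project' making one indifferent between it and project (c, P)."""
    def continue_is_better(lam):
        V = np.zeros(len(c))
        for _ in range(sweeps):                   # value iteration with a retire option
            V = np.minimum(lam / (1 - beta),      # switch to the standard project forever
                           c + beta * P @ V)      # keep working this project
        return c[s0] + beta * P[s0] @ V < lam / (1 - beta)

    lo, hi = c.min(), c.max()                     # the index lies between the extreme costs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continue_is_better(mid):               # continuing wins, so the index is below mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(4), size=4)             # illustrative 4-state project
c = rng.uniform(0.0, 1.0, size=4)
print([round(gittins_index(s, c, P), 3) for s in range(4)])
```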

7 HMM Multiarmed Bandits. The "standard" multiarmed bandit problem involves a fully observed finite-state Markov chain and is simply an MDP with a rich structure. In multitarget tracking, due to measurement noise at the sensor, the states are only partially observable. Thus the multitarget tracking problem must be formulated as a multiarmed bandit involving HMMs (with the HMM filter estimating the information state). It could be solved by brute force as a single POMDP, but that involves a Markov chain of much higher (enormous) dimension; the bandit assumption decouples the problem.
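The dimensionality gap is easy to quantify (the numbers are illustrative): the joint chain over P targets with S states each has S^P states, while the decoupled bandit problems deal with P separate S-state chains:

```python
S, P = 10, 6                                      # e.g. 10 quantized distances, 6 targets
print("joint POMDP underlying state space:", S ** P)   # 1,000,000 joint states
print("decoupled bandit problems:", P, "chains of", S, "states each")
```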

8 Bandit Assumption. The information state of the currently observed target p is updated by the HMM filter: $\pi_{k+1}^p = \dfrac{B^p(y_{k+1}^p)\,(A^p)'\,\pi_k^p}{\mathbf{1}'\,B^p(y_{k+1}^p)\,(A^p)'\,\pi_k^p}$, where $A^p$ is the transition matrix of target p and $B^p(y)$ is the diagonal matrix of observation likelihoods. For the other P - 1 unobserved targets, the information states are kept frozen: $\pi_{k+1}^q = \pi_k^q$ if target q is not observed.
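A minimal sketch of this update (the matrices A and B and their dimensions are illustrative; A is the transition matrix, B holds per-state observation likelihoods):

```python
import numpy as np

def hmm_filter(pi, A, B, y):
    """One HMM filter step: pi_{k+1} = B(y) A' pi / (1' B(y) A' pi)."""
    unnorm = B[:, y] * (A.T @ pi)     # elementwise product = diag(B(y)) A' pi
    return unnorm / unnorm.sum()

A = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])       # A[s, s'] = P(x_{k+1}=s' | x_k=s)
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])            # B[s, y] = P(y | x=s)

pi = np.full(3, 1.0 / 3.0)            # prior information state
pi = hmm_filter(pi, A, B, y=1)        # observed target: filter update
# an unobserved target q keeps its pi unchanged (frozen), per the bandit assumption
print(pi)
```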

9 Why is it Valid? Slow dynamics: slowly moving targets have an (approximate) bandit structure; the transition matrix is close to the identity, $A^p = I + \epsilon Q^p$, where $Q^p$ is a generator matrix and $\epsilon$ is small. Decoupling approximation: without the bandit assumption, the optimal solution is intractable; the bandit model is perhaps the only reasonable approximation that leads to a computationally tractable solution. Reinitialization: a compromise. Reinitialize the HMM multiarmed bandit at regular intervals with updated estimates from all targets.

10 Some Details. Finite-state Markov assumption: $x_k^p$ denotes the quantized distance of the p-th target from the base station, and the target distance evolves according to a finite-state Markov chain. Cost structure: the cost typically depends on the distance of the p-th target to the base station; i.e., targets close to the base station pose a greater threat and are given higher priority by the tracking algorithm. Objective function: minimize the expected total discounted cost $J = \mathbb{E}\left\{\sum_{k=0}^{\infty} \beta^k\, c(x_k^{u_k}, u_k)\right\}$ over scheduling policies, where $u_k$ is the target observed at time k and $0 \le \beta < 1$ is the discount factor.

11 Optimal Solution. Under the bandit assumption, the optimal solution has an indexable (decoupling) rule: the optimization decouples into P independent optimization problems. For each target p there is a function $\gamma^p(\pi^p)$ (the Gittins index), computed by POMDP algorithms (see the next slides). The optimal scheduling policy at time k is to steer the beam toward the target with the smallest Gittins index: $u_k = \arg\min_p \gamma^p(\pi_k^p)$.
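Putting the pieces together, a sketch of the resulting scheduling loop (the per-target HMM parameters, the simulated measurements, and the stand-in index function are all illustrative; the real $\gamma^p$ comes from the return-to-state computation on the next slides):

```python
import numpy as np

rng = np.random.default_rng(2)
P_targets, S, Y = 3, 4, 2                      # targets, states, observation symbols
A = [rng.dirichlet(np.ones(S), size=S) for _ in range(P_targets)]   # transition matrices
B = [rng.dirichlet(np.ones(Y), size=S) for _ in range(P_targets)]   # observation likelihoods
infos = [np.full(S, 1.0 / S) for _ in range(P_targets)]             # information states

def gittins(p, pi):
    # stand-in for gamma^p(pi): a dummy expected-cost proxy so the loop runs;
    # the actual index is the solution of the return-to-state POMDP
    cost = np.linspace(1.0, 0.1, S)
    return float(cost @ pi)

for k in range(10):
    p = min(range(P_targets), key=lambda q: gittins(q, infos[q]))   # smallest index gets the beam
    y = int(rng.integers(0, Y))                                     # simulated measurement of p
    unnorm = B[p][:, y] * (A[p].T @ infos[p])                       # HMM filter numerator
    infos[p] = unnorm / unnorm.sum()                                # update observed target only
    # the other targets' information states stay frozen this step
```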

12 Gittins Index. For an arbitrary multiarmed bandit problem, the Gittins index can be calculated by solving an associated infinite-horizon discounted control problem called the "return-to-state" problem. For target p, given the information state $\pi_k^p$ at time k, there are two actions: 1) continue, which incurs a cost $c(\pi_k^p)$ (the expected immediate cost) while the information state evolves according to the HMM filter; 2) restart, which moves the process to a fixed information state $\hat{\pi}$, incurs the cost $c(\hat{\pi})$, and then evolves according to the HMM filter.

13 The Gittins index of the information state $\hat{\pi}$ of target p is given by $\gamma^p(\hat{\pi}) = V^p(\hat{\pi}, \hat{\pi})$, where $V^p(\pi, \hat{\pi})$ satisfies the Bellman equation of the return-to-state problem: $V^p(\pi, \hat{\pi}) = \min\Big\{\, c(\pi) + \beta \sum_{y} \sigma(\pi, y)\, V^p\big(T(\pi, y), \hat{\pi}\big),\;\; c(\hat{\pi}) + \beta \sum_{y} \sigma(\hat{\pi}, y)\, V^p\big(T(\hat{\pi}, y), \hat{\pi}\big) \Big\}$, with $T(\pi, y)$ the HMM filter update and $\sigma(\pi, y) = \mathbf{1}' B(y) A' \pi$ the probability of observing y.
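A crude numerical sketch of this fixed point: value iteration for V(pi, pihat) on a finite grid over the information-state simplex, projecting each filter update back onto the grid (the grid, parameters, and nearest-point projection are rough illustrative choices; the paper instead exploits exact POMDP structure, next slide):

```python
import numpy as np

S, Y, beta = 3, 2, 0.8
rng = np.random.default_rng(3)
A = rng.dirichlet(np.ones(S), size=S)            # illustrative transition matrix
B = rng.dirichlet(np.ones(Y), size=S)            # illustrative observation likelihoods
c = np.array([1.0, 0.5, 0.1])                    # per-state costs; c(pi) = c . pi

# grid over the 2-simplex of information states
grid = np.array([[i, j, 10 - i - j] for i in range(11) for j in range(11 - i)]) / 10.0
G = len(grid)
nearest = lambda pi: int(np.argmin(np.abs(grid - pi).sum(axis=1)))

# precompute sigma(pi_g, y) and the grid index of T(pi_g, y) for every grid point
sigma = np.zeros((G, Y)); succ = np.zeros((G, Y), dtype=int)
for g, pi in enumerate(grid):
    pred = A.T @ pi
    for y in range(Y):
        unnorm = B[:, y] * pred
        sigma[g, y] = unnorm.sum()
        succ[g, y] = nearest(unnorm / max(unnorm.sum(), 1e-12))

cont_cost = grid @ c                              # c(pi_g) on the grid
V = np.zeros((G, G))                              # V[g, h] ~ V(pi_g, pihat_h)
for _ in range(300):                              # iterate the Bellman recursion above
    Vnew = np.empty_like(V)
    for h in range(G):
        Q_cont = cont_cost + beta * (sigma * V[succ, h]).sum(axis=1)  # continue from pi_g
        Vnew[:, h] = np.minimum(Q_cont, Q_cont[h])   # restart = continue from pihat_h
    V = Vnew

gittins = np.diag(V)   # gamma(pi) is read off V(pi, pi); only its ordering drives the policy
print(gittins[:5])
```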

14 POMDP Solver. Defining new parameters (see eq. 15), the return-to-state problem can be solved by any standard POMDP solver, such as Sondik's algorithm, the witness algorithm, incremental pruning, or suboptimal (approximate) algorithms.

15 Experimental Results

