Online Multi-camera Tracking with a Switching State-Space Model. Wojciech Zajdel, A. Taylan Cemgil, and Ben Kröse. ICPR 2004.

Presentation transcript:

Online Multi-camera Tracking with a Switching State-Space Model. Wojciech Zajdel, A. Taylan Cemgil, and Ben Kröse. ICPR 2004

Networks of Non-Overlapping Cameras
Each observation is Yk = {Ok, Dk}, where:
Ok = description of the observation's appearance (a colour vector in this case), assumed noisy
Dk = camera number and time of the observation, assumed noise-free

Appearance is a Noisy Observation
Assume the observed appearance is a random sample from some distribution over the possible (probable) appearances of an object. Represent that distribution by its latent mean and covariance, Xk = {mk, Vk}. We have a prior over the parameters of this model (a Normal-Inverse-Wishart distribution).
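A hedged sketch of the appearance model this implies (a Gaussian with a Normal-Inverse-Wishart prior; the hyperparameter names below are assumptions, not taken from the paper):

```latex
% Requires amsmath. Appearance model (sketch): each observation Ok is a noisy
% sample from a Gaussian whose parameters Xk = {mk, Vk} are themselves latent.
\begin{align}
O_k \mid X_k &\sim \mathcal{N}(m_k, V_k) \\
X_k = (m_k, V_k) &\sim \mathrm{NIW}(\mu_0, \kappa_0, \Lambda_0, \nu_0)
\end{align}
```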

Appearance Model

Tracking is Just Association
Tracking is just associating our Ds (camera, time) with a particular object, i.e. {D1(n), D2(n), D3(n), …} defines the sequence of observations of person n over time. We also represent this information (redundantly) as Sk, the label of the person to which observation Yk is assigned. N.B. For K observations there is a maximum of K possible people! (We don't know the people in advance; each new observation defines a potential new person.)

But how many actual people are there? We've said the maximum for a sequence of K observations is K people. Ck is the actual number of trajectories (people), with Ck <= K. Related concept: Zk = index of the last time person k was observed (can be NULL if this is the first time that person was observed).
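A minimal sketch (not the paper's code) of the bookkeeping these variables imply: each new observation gets a label Sk, Ck counts the distinct trajectories so far, and Zk records the index of the previous observation of the same person:

```python
# Sketch of the association bookkeeping (illustrative only, not the paper's code).
# labels[k] = S_k : person label assigned to observation k
# last_seen = Z   : for each person, index of their most recent observation (or None)

def update_associations(labels, last_seen, new_label):
    """Assign `new_label` to the next observation and update the bookkeeping."""
    k = len(labels)                      # index of the new observation
    z_k = last_seen.get(new_label)       # None if this person has not been seen before
    labels.append(new_label)
    last_seen[new_label] = k
    c_k = len(last_seen)                 # C_k: number of distinct trajectories so far
    return z_k, c_k

labels, last_seen = [], {}
for lbl in [0, 1, 0, 2]:                 # a hypothetical association sequence
    z, c = update_associations(labels, last_seen, lbl)
    print(f"S_k={lbl}, Z_k={z}, C_k={c}")
```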

Camera Network Topology
Topology defines valid (or likely) paths through the network. It is defined (in a Markov-like way) as P(Di+1(n) | Di(n)), i.e. the probability that observation Di+1 comes from object n given that observation Di does. In this paper it is uniform over possible paths (and 0 for impossible paths), but others have done more complex things.
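A sketch of such a uniform topology prior (the camera names and adjacency list below are a hypothetical example; the paper only states that valid transitions get equal probability and invalid ones get zero):

```python
# Uniform topology prior over camera transitions (hypothetical network).
allowed = {
    "cam1": ["cam1", "cam2"],            # from cam1 a person can reappear at cam1 or cam2
    "cam2": ["cam1", "cam2", "cam3"],
    "cam3": ["cam2", "cam3"],
}

def transition_prob(cam_from, cam_to):
    """P(next observation at cam_to | previous observation at cam_from)."""
    reachable = allowed[cam_from]
    return 1.0 / len(reachable) if cam_to in reachable else 0.0

print(transition_prob("cam1", "cam2"))   # 0.5
print(transition_prob("cam1", "cam3"))   # 0.0 (not directly reachable)
```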

A Predictive Model
Rather than search through the space of possible associations and optimise some fitness measure, it is sometimes easier to define a predictive model and work backwards from the observations to estimate probabilities over the association variables Hk = {Sk, Ck, Zk(1), …, Zk(k)}.
[Figure: graphical model linking the association variables, the appearance distributions, and the observations]

Tracking as Filtering
Once we have a predictive model we can filter the data with a predict / observe / update cycle. This is (in some sense) an alternative to searching through possible latent-variable values to maximise the posterior probability (e.g. MCMC, as introduced by Krishna)… It is only usually tractable under simplifying conditions, e.g.:
Kalman filter: Gaussian probabilities
Particle filter: probabilities represented as a finite number of samples
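To make the cycle concrete, here is a minimal predict / observe / update loop on a toy discrete two-state example (this only illustrates the filtering pattern; nothing in it is specific to the paper):

```python
# Minimal discrete predict/observe/update filter on a toy 2-state example.
import numpy as np

T = np.array([[0.9, 0.1],        # transition model p(x_k | x_{k-1}), rows = from-state
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],        # observation model p(y_k | x_k), rows = states
              [0.1, 0.9]])

belief = np.array([0.5, 0.5])    # p(x_0)
for y in [0, 1, 1]:              # a toy observation sequence
    predicted = belief @ T               # predict: propagate belief through the dynamics
    weighted = predicted * E[:, y]       # observe: multiply by the likelihood p(y_k | x_k)
    belief = weighted / weighted.sum()   # update: renormalise to get the filtered density
    print(belief)
```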

Predictive Model: Predict Step
The predictive density is the joint probability of the latent variables (i.e. the unknowns) without considering Yk, the latest observation (*possibly it should also be conditioned on the past observations?). On the slide it is annotated as a product of three factors:
current associations given the previous ones, defined a priori from the topology;
current appearance given the previous appearance and the associations, defined by assuming a person's appearance does not change and sampling new people from a prior;
the result from the previous iteration (N.B. t0 is easy as there are no people yet).
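A hedged reconstruction of the equation the slide image presumably showed, written in standard switching state-space form and matching the three factor labels above (the notation is an assumption; the paper's exact symbols may differ):

```latex
% Requires amsmath. Predict step (sketch): the slide's three annotated factors.
\begin{equation}
p(H_k, X_k \mid Y_{1:k-1}) =
  \sum_{H_{k-1}} \int
    \underbrace{p(H_k \mid H_{k-1})}_{\text{associations, from topology}}\,
    \underbrace{p(X_k \mid X_{k-1}, H_k)}_{\text{appearance persistence / prior}}\,
    \underbrace{p(H_{k-1}, X_{k-1} \mid Y_{1:k-1})}_{\text{previous filtered density}}
  \, dX_{k-1}
\end{equation}
```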

Filtered Density
The filtered density is the probability of the latent variables given the current observation. On the slide it is built from: the prediction from the previous slide (the probability of the latent variables), multiplied by the probability of the observation given the latent parameters (i.e. associations + appearances), divided by a normalising factor. N.B. The latent variables H are discrete, whereas the variables X are continuous. BUT: the result is a mixture of O(k!) density functions => intractable.
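Written out, this is just Bayes' rule applied to the predictive density above (a hedged reconstruction of the missing slide equation, in the same assumed notation):

```latex
% Requires amsmath. Update step (sketch): Bayes' rule on the predictive density.
\begin{equation}
p(H_k, X_k \mid Y_{1:k}) =
  \frac{\overbrace{p(Y_k \mid H_k, X_k)}^{\text{observation likelihood}}\;
        \overbrace{p(H_k, X_k \mid Y_{1:k-1})}^{\text{prediction}}}
       {\underbrace{p(Y_k \mid Y_{1:k-1})}_{\text{normalising factor}}}
\end{equation}
```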

How to Filter?
If all the latent variables were discrete (which they are not), we could maintain probabilities for all combinations of latent-variable values (but that might be a lot!). We could instead use something like a particle filter to approximate the densities (others have done this, but it is not what these authors do).

Their Solution
Reformulate the filtered density using an approximation that is more tractable: rather than maintaining a distribution over all of H (the possible associations, potentially quite a big set), a set of simpler distributions is maintained over S, C and Z at the current step (remember: S = label of Xk, C = number of trajectories, Z = time of last observation). The product of these simpler distributions approximates the true filtered density. We can go back to the original problem (i.e. estimating the complete H) by forming the product of marginals (more later!). On the slide the factors are annotated as: labels and count at the current step (discrete); appearance (continuous, but assumed to follow a simple distribution); time of last observation of person k (discrete).
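A hedged sketch of the factorised, assumed-density-filtering style approximation suggested by those annotations (the paper's exact grouping of the variables may differ slightly):

```latex
% Requires amsmath. Assumed-density approximation (sketch): the joint filtered
% density is replaced by a product of simpler per-variable factors.
\begin{equation}
p(H_k, X_k \mid Y_{1:k}) \approx
  \underbrace{q(S_k, C_k)}_{\text{labels and count (discrete)}}\;
  \underbrace{q(X_k)}_{\text{appearance (continuous, simple form)}}\;
  \prod_{n=1}^{k}\underbrace{q\!\left(Z_k^{(n)}\right)}_{\text{time of last observation (discrete)}}
\end{equation}
```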

Their Solution, Presented Differently (Technical Report Version)
ftp://ftp.wins.uva.nl/pub/computer-systems/aut-sys/reports/IAS-UVA pdf (the same thing, in slightly different notation). N.B. The appearance is conditioned on theta, which has the same form as the parameters of the prior on appearance (an Inverse-Wishart density).

An Aside: Marginals and Product of Marginals
Imagine a joint density p(x,y) over two variables x and y. If x and y are (reasonably) independent, then we can marginalise out one variable (or the other) by summing over all of its values. Representing the joint P(X,Y) takes n*m bins, whereas the two marginals P(X) and P(Y) take only n+m bins => we've removed the dependency and can work with the variables separately…

Marginals and Product of Marginals (continued)
We can then go back to the original representation by taking the product for each pair of values of x and y: P(x,y) = P(x)*P(y).
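A small numerical illustration of this aside (toy numbers, deliberately constructed so that x and y are independent and the product of marginals reproduces the joint exactly):

```python
# Marginals and product of marginals on a toy 2x3 joint distribution.
import numpy as np

# Joint P(X, Y): 2 values of x, 3 values of y -> 2*3 = 6 bins.
# Built as an outer product so that x and y really are independent here.
joint = np.outer([0.4, 0.6], [0.2, 0.3, 0.5])

p_x = joint.sum(axis=1)              # marginal P(X): 2 bins
p_y = joint.sum(axis=0)              # marginal P(Y): 3 bins (2 + 3 = 5 numbers instead of 6)

reconstructed = np.outer(p_x, p_y)   # product of marginals: P(x)*P(y) for every (x, y) pair
print(np.allclose(joint, reconstructed))   # True, because x and y are independent
```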

Results
The method is compared to: i) MCMC (a similar idea to Krishna's presentation last week); ii) Multiple Hypothesis Tracking (i.e. a hypothesis-pruning-based method). It does better (the others over-estimate the number of trajectories).

Drawbacks… and a Solution
K grows with the number of observations, and memory usage is O(k^2), although complexity is only O(k) [I think]. Pruning is used to keep this down (removing the observations least likely to be a trajectory end point).
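A hedged sketch of that pruning idea, keeping only the candidate trajectory end points with the highest probability (the data structure, probabilities, and cut-off below are assumptions for illustration; the paper's exact criterion may differ):

```python
# Sketch of end-point pruning (illustrative only, not the paper's code).
def prune_endpoints(endpoints, max_kept):
    """Keep only the `max_kept` candidate trajectory end points with highest probability.

    `endpoints` maps observation index -> probability that this observation is
    the current end point of some trajectory.
    """
    ranked = sorted(endpoints.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:max_kept])

candidates = {0: 0.05, 1: 0.40, 2: 0.02, 3: 0.35, 4: 0.18}   # hypothetical probabilities
print(prune_endpoints(candidates, max_kept=3))                # keeps observations 1, 3, 4
```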

Summary
There is more than one way to skin a cat:
ADF (this paper): approximate the problem, then solve it exactly.
MCMC: the exact problem, but approximating the solution (stochastically).
MHT: the exact problem, but approximating the solution (via hypothesis pruning).