Using Multi-Modality to Guide Visual Tracking
Jaco Vermaak, Cambridge University Engineering Department
Patrick Pérez, Michel Gangnet, Andrew Blake, Microsoft Research Cambridge
Paris, December 2002

Introduction
- Visual tracking is difficult: changes in pose and illumination, occlusion, clutter, inaccurate models, high-dimensional state spaces, etc.
- Tracking can be aided by combining information from multiple measurement modalities
- Illustrated here on head tracking using:
  - Sound and contour measurements
  - Colour and motion measurements

General Tracking

Tracking Equations
- Objective: recursive estimation of the filtering distribution
- General solution (the standard recursion is reproduced below):
  - Prediction step
  - Filtering / update step
- Problem: generally no analytic solutions available
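The equations on this slide appear as images in the original transcript. For reference, the standard Bayesian filtering recursion that the bullets refer to, with state x_t and measurements y_{1:t}, is:

```latex
% Filtering distribution to be estimated recursively
p(x_t \mid y_{1:t})

% Prediction step (Chapman-Kolmogorov equation)
p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, \mathrm{d}x_{t-1}

% Filtering / update step (Bayes' rule)
p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})
```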

Particle Filter Tracking
- Monte Carlo implementation of the general recursions
- Filtering distribution represented by samples / particles with associated importance weights
- Proposal step: new particles proposed from a suitable proposal distribution
- Reweighting step: particles reweighted with importance weights
- Resampling step: multiply particles with high importance weights and eliminate those with low importance weights
(A generic sketch of this loop is given below.)
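A minimal sketch of the proposal / reweighting / resampling loop, assuming generic placeholder callables (`propose`, `proposal_pdf`, `dynamics_pdf`, `likelihood`) that stand in for the model-specific components described in the rest of the talk:

```python
import numpy as np

def particle_filter_step(particles, weights, y, propose, proposal_pdf,
                         dynamics_pdf, likelihood, rng=np.random.default_rng()):
    """One proposal / reweighting / resampling step of a generic particle filter."""
    n = len(particles)

    # Proposal step: draw new particles from the proposal q(x_t | x_{t-1}, y_t)
    new_particles = np.array([propose(x, y, rng) for x in particles])

    # Reweighting step: w ∝ w_prev * p(y|x) * p(x|x_prev) / q(x|x_prev, y)
    new_weights = np.array([
        w * likelihood(y, x_new) * dynamics_pdf(x_new, x_old) / proposal_pdf(x_new, x_old, y)
        for w, x_new, x_old in zip(weights, new_particles, particles)
    ])
    new_weights /= new_weights.sum()

    # Resampling step: multiply high-weight particles, eliminate low-weight ones
    idx = rng.choice(n, size=n, p=new_weights)
    return new_particles[idx], np.full(n, 1.0 / n)
```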

Particle Filter Building Blocks
- Sampling from a conditional density
- Resampling
- Reweighting with a positive function

Particle Filter Implementation
Requires specification of:
- System configuration and state space
- Likelihood model
- Dynamical model for state evolution
- State proposal distribution
- Particle filter architecture

Head Tracking using Sound and Contour Measurements

Problem Formulation
- Objective: track the head of a person in a video sequence using audio and image cues
- Audio: time delay of arrival (TDOA) measurements at a microphone pair orthogonal to the optical axis of the camera
- Image: edge events along normal lines to a hypothesised contour
- Complementary modalities: audio good for (re)initialisation; image good for fine localisation

System Configuration
(Diagram: camera, image plane, and microphone pair.)

Model Ingredients
- Low-dimensional state space: similarity transform applied to a reference template
- Dynamical prior: integrated Langevin equation, i.e. a second-order Markov kernel
- Multi-modal data likelihoods:
  - Sound-based likelihood: TDOA at the microphone pair
  - Contour-based likelihood: edge events
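The integrated Langevin prior amounts to a second-order autoregression on each state component. A discretised sketch is given below; the damping and noise constants are purely illustrative, not the values used in the talk.

```python
import numpy as np

def langevin_step(x, v, tau=1.0, beta=0.5, sigma=0.1, rng=np.random.default_rng()):
    """One step of discretised integrated Langevin dynamics (second-order Markov).

    x: a state component (e.g. translation or scale of the reference template)
    v: its velocity; beta is a damping factor, sigma the process-noise scale.
    All constants are illustrative placeholders.
    """
    v_new = np.exp(-beta * tau) * v + sigma * rng.standard_normal()
    x_new = x + tau * v_new
    return x_new, v_new
```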

Contour Likelihood
- Input: maxima of the projected luminance gradient along measurement normals (several such edge events may occur on each normal)

Contour Likelihood
- Advantages:
  - Low computational cost
  - Robust to illumination changes
- Drawbacks:
  - Fragile because of narrow support (especially with only a similarity transform on a fixed shape space)
  - Sensitive to background clutter
- Extension:
  - Multiply the gradient by the inter-frame difference to reduce the influence of background clutter
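A sketch of the measurement search these bullets imply: luminance gradients are sampled along a normal to the hypothesised contour, optionally modulated by the inter-frame difference, and local maxima are kept as edge events. The sampling details (search range, peak threshold) are placeholder assumptions.

```python
import numpy as np

def edge_events_on_normal(gray, prev_gray, p, n_hat, search=10, use_frame_diff=True):
    """Collect edge-event offsets along one normal to the hypothesised contour.

    gray, prev_gray: current and previous greyscale frames (2-D arrays)
    p: point on the contour (row, col); n_hat: unit normal at that point.
    """
    offsets = np.arange(-search, search + 1)
    rows = np.clip((p[0] + offsets * n_hat[0]).astype(int), 0, gray.shape[0] - 1)
    cols = np.clip((p[1] + offsets * n_hat[1]).astype(int), 0, gray.shape[1] - 1)

    # Projected luminance gradient along the normal (finite differences of samples)
    samples = gray[rows, cols].astype(float)
    grad = np.abs(np.gradient(samples))

    if use_frame_diff:
        # Extension from the slide: modulate by the inter-frame difference to
        # suppress responses caused by static background clutter
        grad *= np.abs(samples - prev_gray[rows, cols].astype(float))

    # Edge events: local maxima of the (modulated) gradient above its mean
    is_peak = (grad[1:-1] > grad[:-2]) & (grad[1:-1] > grad[2:]) & (grad[1:-1] > grad.mean())
    return offsets[1:-1][is_peak]
```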

Inter-Frame Difference
(Comparison of tracking results without and with the frame-difference modulation.)

Audio Likelihood
- Input: positions of peaks in the generalised cross-correlation function (GCCF)
- Reverberation leads to multiple peaks
(Figure: GCCF plotted against TDOA.)

Audio Likelihood
- Deterministic mapping from time delay of arrival (TDOA) to bearing angle (microphone calibration), and then to the X-coordinate in the image plane (camera calibration)
- The audio likelihood follows in a similar manner to the contour likelihood
- The likelihood assumes a uniform clutter model
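A geometric sketch of this deterministic mapping, assuming the microphone pair is centred on the camera and lies parallel to the image X-axis; `mic_spacing`, `focal_px`, and `cx` are placeholder calibration constants, not values from the talk.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def tdoa_to_image_x(tdoa, mic_spacing=0.3, focal_px=800.0, cx=320.0):
    """Map a TDOA measurement (seconds) to an X-coordinate in the image plane.

    Microphone calibration: TDOA -> bearing angle (far-field approximation).
    Camera calibration: bearing -> pixel column via a pinhole model.
    """
    # Far-field bearing relative to the broadside direction (the optical axis)
    sin_theta = np.clip(SPEED_OF_SOUND * tdoa / mic_spacing, -1.0, 1.0)
    theta = np.arcsin(sin_theta)

    # Pinhole projection of that bearing onto the image plane
    return cx + focal_px * np.tan(theta)
```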

Particle Filter Architecture
- Layered sampling: first the X-position with the sound likelihood, then the rest of the state
- X-position proposal: mixture of diffusion dynamics and a sound-based proposal (sketched below)
- To admit "jumps" from the proposal, the X-dynamics have to be augmented with a uniform component
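A sketch of the X-position proposal described on this slide: with some probability the X-coordinate is proposed around an audio-derived candidate (a "jump"), otherwise it diffuses under the dynamics. The mixture weight and noise scales are illustrative assumptions.

```python
import numpy as np

def propose_x(x_prev, sound_x_candidates, alpha=0.3, diffusion_std=5.0,
              sound_std=10.0, rng=np.random.default_rng()):
    """Mixture proposal for the X-position (pixels).

    sound_x_candidates: image X-coordinates obtained from the audio (GCCF) peaks.
    With probability alpha, propose around a randomly chosen audio candidate;
    otherwise propose by diffusion around the previous position.
    """
    if sound_x_candidates and rng.random() < alpha:
        centre = rng.choice(sound_x_candidates)
        return centre + sound_std * rng.standard_normal()
    return x_prev + diffusion_std * rng.standard_normal()
```

When the importance weights are evaluated, the X-dynamics prior must include the broad uniform component mentioned on the slide, so that particles proposed far from their previous position do not receive zero weight.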

Examples
- Effect of the inter-frame difference
- Conversational ping-pong

Examples
- Conversational ping-pong and sound-based reinitialisation

Head Tracking using Colour and Motion Measurements

Problem Formulation
- Objective: detect and track the head of a single person in a video sequence taken from a stationary camera
- Modality fusion:
  - Motion and colour measurements are complementary
  - Motion: when the object is moving, colour is unreliable
  - Colour: when the object is stationary, motion information disappears
- Automatic object detection and tracker initialisation using motion measurements
- Individualisation of the colour model to the object:
  - Initialised with a generic skin colour model
  - Adapted to the object's colour during periods of motion: the motion model acts as an "anchor"

Object Description and Motion
- Head modelled as an ellipse that is free to translate and scale in the image
- A binary indicator variable signals whether the object is present in the image or not, so the object state comprises the indicator together with the ellipse position and scale
- State components assumed to have independent motion models:
  - Indicator: discrete Markov chain
  - Position and scale: Langevin motion with uniform initialisation

Image Measurements
- Measurements taken on a regular filter grid
- Measurement vector at each gridpoint: responses of isotropic Gaussian filters applied to the hue, saturation, and frame-difference images

Observation Likelihood Model
- Measurements at the gridpoints are assumed to be independent
- A unique background (object absent) likelihood model for each gridpoint
- All gridpoints covered by the object share the same foreground likelihood model
- At each gridpoint the individual measurements are also assumed to be independent
- Note that the background motion model is shared by all the gridpoints
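Under these independence assumptions the observation likelihood factorises over gridpoints: covered gridpoints use the shared foreground model, the rest their per-site background models. A schematic log-likelihood computation, with `fg_loglik` and `bg_logliks` as placeholder callables:

```python
def observation_loglik(measurements, covered, fg_loglik, bg_logliks):
    """Log-likelihood of all gridpoint measurements for one object hypothesis.

    measurements: list of per-gridpoint measurement vectors
    covered:      booleans, True where the hypothesised ellipse covers the gridpoint
    fg_loglik:    shared foreground log-likelihood, called as fg_loglik(z)
    bg_logliks:   per-gridpoint background log-likelihoods, bg_logliks[i](z)
    """
    total = 0.0
    for i, (z, inside) in enumerate(zip(measurements, covered)):
        # The colour and motion components of z are themselves treated as independent
        total += fg_loglik(z) if inside else bg_logliks[i](z)
    return total
```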

Colour Likelihood Model
- Normalised histograms for both the foreground and background colour likelihood models
- Background models trained on a sequence without objects
- Foreground models trained on a set of labelled face images
- Histogram models are supplied with a small uniform component to prevent numerical problems associated with empty bins
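A sketch of a histogram colour likelihood with the small uniform floor mentioned in the last bullet; the bin count and floor weight are placeholder assumptions.

```python
N_BINS = 32           # hue/saturation bins (illustrative)
UNIFORM_FLOOR = 0.01  # small uniform component to avoid zero probability at empty bins

def colour_lik(hue, sat, hue_hist, sat_hist):
    """Likelihood of a (hue, saturation) measurement under a histogram model.

    hue and sat are assumed normalised to [0, 1); hue_hist and sat_hist are
    normalised histograms of length N_BINS, each summing to one.
    """
    hb = min(int(hue * N_BINS), N_BINS - 1)
    sb = min(int(sat * N_BINS), N_BINS - 1)
    p_hue = (1 - UNIFORM_FLOOR) * hue_hist[hb] + UNIFORM_FLOOR / N_BINS
    p_sat = (1 - UNIFORM_FLOOR) * sat_hist[sb] + UNIFORM_FLOOR / N_BINS
    return p_hue * p_sat
```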

Motion Likelihood Model
- Background frame-difference measurements empirically found to be gamma distributed
- The foreground frame-difference depends on the magnitude of motion, the number and orientation of foreground edges, etc.
- Modelling these effects accurately is difficult
- In general, if the object is moving, the foreground frame-difference measurements are substantially larger than those for the background
- Thus a two-component uniform distribution is adopted for the foreground frame-difference measurements (outlier model)
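A sketch of the two frame-difference likelihoods described here: a gamma density for the background and a two-component uniform "outlier" mixture for the foreground. The shape, scale, split point, and mixture weight are placeholders, not the fitted values from the talk.

```python
from scipy.stats import gamma, uniform

# Background frame-difference model: gamma distribution (parameters illustrative)
BG_SHAPE, BG_SCALE = 2.0, 3.0

# Foreground model: two uniform components over the frame-difference range,
# with most mass on large values (moving object)
FG_SPLIT, FG_MAX, FG_HIGH_WEIGHT = 20.0, 255.0, 0.8

def bg_motion_lik(d):
    """Likelihood of frame-difference d under the background model."""
    return gamma.pdf(d, a=BG_SHAPE, scale=BG_SCALE)

def fg_motion_lik(d):
    """Likelihood of frame-difference d under the foreground outlier model."""
    low = (1.0 - FG_HIGH_WEIGHT) * uniform.pdf(d, loc=0.0, scale=FG_SPLIT)
    high = FG_HIGH_WEIGHT * uniform.pdf(d, loc=FG_SPLIT, scale=FG_MAX - FG_SPLIT)
    return low + high
```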

Particle Proposal
- Three stages of operation:
  - Birth: the object first enters the scene; the proposal should detect the object and spawn particles in the object region
  - Alive: the object persists in the scene; the proposal should allow the object to be tracked, whether it is stationary or moves around
  - Death: the object leaves the scene; the proposal should kill the particles associated with the object
- Form of the particle proposal: governed by the empirical probability of the object being alive

Particle Proposal
- Indicator proposal:
  - Birth is only allowed if there is no object currently in the scene
  - All live particles are subjected to a fixed death probability
- State proposal:
  - Langevin dynamics if the object is alive
  - Gaussian birth proposal, with parameters taken from the detection module
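A sketch of the indicator proposal logic described above; the birth and death probabilities are illustrative placeholders.

```python
import numpy as np

def propose_indicator(alive, any_object_in_scene, p_birth=0.1, p_death=0.05,
                      rng=np.random.default_rng()):
    """Propose the next value of the binary 'object present' indicator.

    alive: current indicator of this particle
    any_object_in_scene: True if any particle currently carries a live object
    """
    if not alive:
        # Birth is only allowed if no object is currently in the scene
        if not any_object_in_scene and rng.random() < p_birth:
            return True  # newborn: its state is drawn from the Gaussian birth proposal
        return False
    # Live particles are subject to a fixed death probability
    return rng.random() >= p_death
```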

Object Detection
- The object region is detected by probabilistic segmentation of the horizontal and vertical projections of the frame-difference measurements
- The region location and size determine the parameters of the birth proposal distribution

Colour Model Adaptation
- Why:
  - A generic skin colour model may be too broad for accurate localisation
  - The model is sensitive to colour changes due to changes in pose and illumination
- When:
  - Object present and moving: the largest variations in colour are expected
  - The motion likelihood "anchors" the particles around the moving object
- How:
  - Gradual, to avoid fitting to the background: enforced with a prior
  - Stochastic EM: the contribution of each particle is proportional to its likelihood

Colour Model Adaptation
- Unknown parameters: the normalised bin values of the object hue and saturation histograms
- An EM Q-function is used for MAP estimation
- There is no analytic solution, but a particle approximation yields a tractable update
- The Monte Carlo approximation is only performed over particles that are currently alive

Colour Model Adaptation
- A Dirichlet prior is used for the parameter updates:
  - The prior is centred on the old parameter values
  - The variance is controlled by a multiplicative constant
- The resulting update rule for the normalised bin counts appears as an equation on the original slide; a sketch is given below
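The update rule itself is not reproduced in the transcript. The sketch below is an assumption-laden reconstruction consistent with the description: a MAP-style update of the normalised bin values under a Dirichlet prior centred on the old histogram, with weighted counts gathered from the particles that are currently alive. The prior-strength constant is a placeholder.

```python
import numpy as np

def update_histogram(old_hist, particle_counts, particle_weights, prior_strength=50.0):
    """MAP-style update of a normalised colour histogram under a Dirichlet prior.

    old_hist:         current normalised bin values (sums to one)
    particle_counts:  per-particle bin counts of pixels inside the hypothesised
                      head region (only particles that are currently alive)
    particle_weights: the particles' (likelihood-proportional) weights
    prior_strength:   multiplicative constant controlling the prior variance;
                      larger values make the adaptation more gradual
    """
    # Weighted data counts: each live particle contributes in proportion to its weight
    data = sum(w * c for w, c in zip(particle_weights, particle_counts))

    # Dirichlet prior centred on the old histogram, plus the weighted counts
    pseudo = prior_strength * old_hist + data
    return pseudo / pseudo.sum()
```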

What Happens?
(Figure: per-particle histograms are combined into a weighted average histogram.)

Implementation
- The colour model adaptation iterations occur between particle prediction and particle reweighting in the standard particle filter
- The stochastic EM algorithm is initialised with the parameters from the previous time step
- A single stochastic EM iteration is sufficient at each time step
- The number of particles is fixed to 100
- The non-optimised algorithm runs at 15 fps on a standard desktop PC

Examples
- No adaptation: the tracker gets stuck on the skin-coloured carpet in the background
- Adaptation: the tracker successfully adapts to changes in pose and illumination, and lock is maintained
- No motion likelihood: the tracker fails, illustrating the need for the "anchor" likelihood

Examples
- Tracking is successful despite substantial variations in pose and illumination and the subject temporarily leaving the scene
- Particles are killed when the subject leaves the scene; upon re-entering, the individualised colour model allows lock to be re-established within a few frames

The End