Stochastic Tracking of Humans Michael J. Black Department of Computer Science Brown University.


Collaborators * Hedvig Sidenbladh, Royal Institute of Technology (KTH), Sweden * Dirk Ormoneit and Trevor Hastie, Dept. of Statistics, Stanford University * David Fleet, Xerox PARC and Queen’s University * Allan Jepson, University of Toronto

Goal: 3D Human Motion * 3D articulated model * Perspective projection * Monocular sequence * Unknown, cluttered environment * Infer 3D human motion from 2D image motion.

Overview * Why is 3D human motion important? * Why is recovering it hard? * A Bayesian approach - generative model - robust likelihood function - temporal prior model (learning) - stochastic search (particle filtering) * Where are we going? - Recent advances & state of the art. - What remains to be done?

Why is it Important? Applications * Human-Computer Interaction * Surveillance * Motion capture (games and animation) * Video search/annotation * Work practice analysis. Example: a social display of puzzlement. To analyze it we must * detect moving regions * estimate motion * model articulated objects * model temporal patterns of activity * interpret the motion

Why is it Hard? The appearance of people can vary dramatically. Bones and joints are unobservable (muscle, skin, and clothing hide the underlying structure), so the structure must be inferred.

Why is it hard? People can appear in arbitrary poses. They can deform in complex ways. Occlusion results in ambiguities and multiple interpretations.

Why is it hard? Geometrically under-constrained.

Other Problems * non-linear dynamics of limbs * similarity of appearance of different limbs (matching ambiguities) * image noise * outliers: our models are approximations, so image changes that are not modeled (e.g., clothing deformation) will be outliers.

Common Assumptions * Multiple Cameras (additional constraints, help with occlusion) * Color Images (locate face and hands) * Known Background (background subtraction to locate the person) * Batch processing of an entire sequence * Known Initialization (to be avoided)

Requirements 1. Represent uncertainty and multiple hypotheses. 2. Model non-linear dynamics of the body. 3. Exploit image cues in a robust fashion. 4. Integrate information over time. 5. Combine multiple image cues.

Bayesian Inference Build models of human form and motion. Learn priors over model parameters: p(model). Exploit cues in the images. Model robust likelihoods: p(image cue | model). Represent the posterior distribution: p(model | cue) ∝ p(cue | model) p(model).
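A toy numeric illustration of the posterior rule above, using two hypothetical candidate arm poses and made-up probabilities (not from the talk). When the image cue cannot distinguish the poses, the posterior simply returns the prior; an informative likelihood shifts it.

```python
def posterior(prior, likelihood):
    """Return p(model | cue) for a finite set of candidate models.

    prior[m] is p(model = m); likelihood[m] is p(cue | model = m).
    """
    unnormalized = {m: likelihood[m] * prior[m] for m in prior}
    evidence = sum(unnormalized.values())  # p(cue), the normalizing constant
    return {m: u / evidence for m, u in unnormalized.items()}


# Hypothetical numbers for two candidate arm poses.
prior = {"elbow_bent": 0.7, "elbow_straight": 0.3}

# An ambiguous cue: both poses explain the image equally well.
ambiguous = posterior(prior, {"elbow_bent": 0.2, "elbow_straight": 0.2})

# An informative cue that favors the straight arm.
informative = posterior(prior, {"elbow_bent": 0.1, "elbow_straight": 0.6})

print(ambiguous)
print(informative)
```

The ambiguous case is one way to see why weak image cues make the prior matter so much.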

Problems A simple articulated human model may have 30+ parameters (e.g., joint angles; 60+ with velocities). Models of human action are non-linear, and likelihood models will be multi-modal. Key challenges (common to other domains): representation, learning, and search in high-dimensional spaces.
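To see why search is the hard part, assume (hypothetically) that each parameter is discretized into only 10 values. A full grid over the pose space then has

```latex
\underbrace{10 \times 10 \times \cdots \times 10}_{30\ \text{joint angles}} = 10^{30}\ \text{cells},
\qquad
10^{60}\ \text{cells with velocities added}.
```

Even this coarse grid is far too large to evaluate exhaustively, which motivates sampling-based search.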

Bayesian Formulation Represent a distribution over 3D poses. * define a generative model of image appearance * multi-modal posterior over model parameters - sampled representation - particle filtering approach * focus on image motion as a cue (adding edges,…)

Generative Model: Shape * 3D Articulated Body Model * pinhole camera * parameter vector of pose parameters (e.g., joint angles)

Generative Model: Motion (Figure: frames t-1 and t.) * projection of the image texture onto the 3D model * projection of the model appearance into image coordinates

Appearance Model The appearance model could be many things: * template (Cham & Rehg ’99) * eigen-model (Sidenbladh et al. ’00) * texture model * filter responses (edges, ridges, …) * learned over time. Simple probabilistic model: Markov assumption.

Noise Model Generative model: a mixture of a Gaussian and a uniform outlier distribution, which is a function of surface orientation.

Generative Model: Temporal First-order Markov assumption on the joint angles and the angular velocity, V. Explore two models of human motion: * general smooth motion, or * action-specific motion (walking)

Bayesian Formulation The posterior over shape, velocity, and appearance given an image sequence combines three terms: * the likelihood of observing the image given the shape and appearance parameters * the temporal model * the posterior from the previous time instant
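One standard way to write this recursion, using assumed notation that does not appear in the text: φ_t for shape (joint angles), V_t for velocity (as in the temporal model), A_t for appearance, I_t for the image at time t, and \vec{I}_t = (I_1, …, I_t) for the sequence.

```latex
p(\phi_t, V_t, A_t \mid \vec{I}_t) \;\propto\;
  p(I_t \mid \phi_t, A_t)
  \int p(\phi_t, V_t, A_t \mid \phi_{t-1}, V_{t-1}, A_{t-1})\,
       p(\phi_{t-1}, V_{t-1}, A_{t-1} \mid \vec{I}_{t-1})\,
       d\phi_{t-1}\, dV_{t-1}\, dA_{t-1}
```

The first factor is the likelihood, the integrand pairs the temporal model with the previous posterior, and the integral is what a particle filter approximates by sampling.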

Robust Likelihood For n random pixels from limb j, compute the likelihood of the brightness residuals under the robust noise model.
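A minimal sketch of a robust per-limb likelihood, assuming the Gaussian-plus-uniform mixture from the noise model and treating the n sampled pixels as independent (so their log-likelihoods add). The parameter values (`sigma`, `outlier_prob`, `intensity_range`, `n_samples`) are hypothetical, and the dependence on surface orientation is not modeled here.

```python
import numpy as np


def robust_limb_log_likelihood(residuals, sigma=10.0, outlier_prob=0.05,
                               intensity_range=256.0, n_samples=50, rng=None):
    """Log-likelihood of one limb under a Gaussian + uniform outlier mixture.

    residuals: 1-D array of brightness differences for the limb's pixels.
    All noise-model parameters are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = min(n_samples, residuals.size)
    picked = rng.choice(residuals, size=n, replace=False)  # n random pixels

    # Gaussian inlier density and flat outlier density over the intensity range.
    gauss = np.exp(-0.5 * (picked / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    uniform = 1.0 / intensity_range
    per_pixel = (1.0 - outlier_prob) * gauss + outlier_prob * uniform

    # Independent pixels: the product of densities becomes a sum of logs.
    return float(np.sum(np.log(per_pixel)))
```

Because the uniform term puts a floor under each pixel's density, a single gross outlier lowers the log-likelihood by a bounded amount instead of dominating it, which is the point of the robust mixture.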

Temporal Model: Smooth Motion * individual angles and velocities assumed independent
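A sketch of one sampling step of a smooth-motion prior, assuming a constant-velocity form (angle plus velocity, with velocity following a random walk) and diagonal Gaussian noise so that every angle and velocity is perturbed independently. The noise scales are hypothetical, and time is measured in frames.

```python
import numpy as np


def sample_smooth_motion(angles, velocities, sigma_angle=0.01, sigma_vel=0.05, rng=None):
    """Draw the next (angles, velocities) from an assumed first-order smooth-motion prior.

    angles, velocities: 1-D arrays (radians, radians/frame).
    Each component gets its own independent Gaussian perturbation.
    """
    rng = np.random.default_rng() if rng is None else rng
    new_angles = angles + velocities + rng.normal(0.0, sigma_angle, size=angles.shape)
    new_velocities = velocities + rng.normal(0.0, sigma_vel, size=velocities.shape)
    return new_angles, new_velocities
```

With both noise scales set to zero the step reduces to deterministic constant-velocity prediction, which is a convenient sanity check.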

What does the posterior look like? (Figure: arm with a 3-DOF shoulder and a 1-DOF elbow; the posterior as the elbow bends.)

Particle Filtering * large literature (Gordon et al. ’93, Isard & Blake ’96,…) * non-Gaussian posterior approximated by N discrete samples * explicitly represent the ambiguities * exploit stochastic sampling for tracking

Particle Filter (Figure: posterior at t-1 → sample → temporal dynamics → sample → likelihood → normalize → posterior at t.)
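A minimal NumPy sketch of one sample, predict, weight, and normalize cycle of a basic particle filter, plus the weighted mean used to display results. `predict` and `likelihood` are hypothetical callables standing in for the temporal dynamics and the image likelihood; this is a generic sampling-importance-resampling step, not the talk's exact implementation.

```python
import numpy as np


def particle_filter_step(particles, weights, predict, likelihood, rng):
    """One cycle of a basic particle filter.

    particles: (N, D) array of states; weights: (N,) normalized weights.
    predict(states, rng) -> states samples the temporal dynamics.
    likelihood(states) -> (N,) nonnegative image likelihoods.
    """
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)     # sample from the posterior at t-1
    predicted = predict(particles[idx], rng)   # sample from the temporal dynamics
    w = likelihood(predicted)                  # weight by the image likelihood
    w = w / w.sum()                            # normalize to get the posterior at t
    return predicted, w


def expected_state(particles, weights):
    """Weighted mean of the samples, e.g., the expected joint angles."""
    return weights @ particles
```

For example, with a 1-D state, a random-walk `predict`, and a Gaussian likelihood centered at 1.0, a few iterations pull the expected state toward 1.0 while the samples keep a spread that reflects the remaining uncertainty.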

Arm Tracking: Smooth motion prior Particle filter * represents ambiguity * propagates information over time Display: expected value of joint angles.

Full-Body Tracking * The parameter space is too large: - constrain the posterior to valid 3D human motions - learn generative models automatically from training data. 3D motion-capture data (figure: joint angles over time, from M. Gleicher): * segment into “movemes” (Bregler) * train a probabilistic model.

Learning Temporal Models * Motion capture data is noisy, data is missing, activities are performed differently. * For cyclic motion (important but special class): 1. Detect cycles and segment 2. Account for missing data 3. Preserve continuity of cycles 4. Statistical model of variation * Approaches should generalize to non-cyclic motion. (Dirk Ormoneit & Trevor Hastie)

Detecting Cycles Automatically detect the length of cycles; automatically segment and align cycles.
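The text does not say how cycle length is detected. One common, generic way to estimate the period of a roughly cyclic joint-angle signal is to find the first peak of its autocorrelation after lag zero; the sketch below shows that technique, which is a stand-in and not necessarily the method used in this work. `min_lag` is a hypothetical parameter that skips the trivial peak at lag 0.

```python
import numpy as np


def estimate_cycle_length(signal, min_lag=2):
    """Estimate the period (in frames) of a roughly cyclic 1-D signal.

    Returns the lag of the first local maximum of the normalized
    autocorrelation at or after min_lag, or None if there is none.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # lags 0 .. len-1
    ac = ac / ac[0]
    for lag in range(min_lag, x.size - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    return None
```

On noisy motion-capture data the first local maximum can be spurious, so a real system would likely smooth the signal or require a minimum peak height.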

Modeling Cyclic Motion Automatically align 3D data with a reference curve represented using periodically constrained regression splines.

Modeling Cyclic Motion * Iterative SVD method (from gene expression work): - compute the SVD in the Fourier domain - construct a rank-q approximation and take the inverse Fourier transform - impute missing data from the approximation - repeat until convergence * Segment into cycles, compute the mean curve, and represent variation by performing PCA on the data. * The SVD must enforce periodicity and cope with missing data.
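A minimal NumPy sketch of the iterative SVD imputation loop, under simplifying assumptions: it works directly in the time domain and omits the Fourier transform and the periodicity constraint, and the initial fill value (the mean of the observed entries) is an arbitrary choice. With a unitary transform, a rank-q truncation is the same in either domain, so the Fourier step presumably matters through the periodicity constraint rather than the truncation itself.

```python
import numpy as np


def svd_impute(data, mask, rank, n_iter=100, tol=1e-6):
    """Fill missing entries of `data` with a rank-`rank` SVD approximation.

    data: (m, n) array; entries where mask is False are ignored.
    mask: boolean (m, n) array, True where an entry was observed.
    """
    filled = np.where(mask, data, data[mask].mean())  # initial guess for missing entries
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]  # rank-q reconstruction
        updated = np.where(mask, data, approx)        # keep observed, impute missing
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol * max(np.linalg.norm(filled), 1.0):
            break                                     # converged
    return filled
```

Observed entries are never altered; only the missing ones are repeatedly re-estimated from the low-rank structure of everything else.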

Action-Specific Model The joint angles at time t are the mean curve plus a linear combination of the basis curves, all evaluated at the current phase of the cycle. (Figure: mean curve and basis curves.)
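Written out with assumed symbols (not taken from the slides): θ_t for the joint angles, ψ_t for the phase, μ for the mean curve, b_i for the q basis curves, and c_{i,t} for the coefficients.

```latex
\theta_t = \mu(\psi_t) + \sum_{i=1}^{q} c_{i,t}\, b_i(\psi_t)
```

Setting every coefficient to zero gives the mean walker; spreading the coefficients out produces increasingly unusual walks.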

Temporal Model: Walking The parameters of the generative model are now the walking-model parameters (e.g., phase and basis coefficients), with a probabilistic temporal model over them.

Learned Walking Model * mean walker

Learned Walking Model * sample with small 

Learned Walking Model * sample with moderate 

Learned Walking Model * sample with very large 
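A sketch of how walkers like these can be generated from a learned cyclic model, assuming the mean and basis curves are stored as samples over one cycle and the coefficients are drawn from a zero-mean Gaussian whose spread is set by a hypothetical `scale` parameter (small, moderate, or very large). All names here are illustrative.

```python
import numpy as np


def joint_angle_at_phase(phase, mean_curve, basis_curves, coeffs):
    """Evaluate mean + sum_i c_i * b_i at a phase in [0, 1) for one joint angle.

    mean_curve: (K,) samples of the mean curve over one cycle.
    basis_curves: (q, K) samples of the basis curves over one cycle.
    coeffs: (q,) linear-combination coefficients.
    """
    k = mean_curve.size
    grid = np.arange(k + 1) / k                # periodic grid; phase 1.0 wraps to 0.0
    curve = mean_curve + coeffs @ basis_curves
    curve = np.append(curve, curve[0])         # close the cycle for interpolation
    return float(np.interp(phase % 1.0, grid, curve))


def sample_coefficients(num_bases, scale, rng):
    """Draw hypothetical coefficients c ~ N(0, scale^2 I)."""
    return rng.normal(0.0, scale, size=num_bases)
```

With `scale` near zero every sample looks like the mean walker; increasing it moves samples further from the training data, which is what the progression from small to very large illustrates.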

Stochastic 3D Tracking * manual initialization * use motion information to update and track the distribution over time

Stochastic 3D Tracking * significant changes in view and depth. * template-based methods will fail.

No likelihood * how strong is the walking prior? (or is our likelihood doing anything?)

Issues * Large parameter space * approx samples * sparsely represented * not real time * Flow-based models can drift * Requires initialization

Lessons Learned * Probabilistic (Bayesian) framework allows - integration of information over time - modeling of priors - explicit generative image model * Particle filtering allows - multi-modal distributions - tracking with ambiguities and non-linear models * Weak image cues necessitate strong priors and many samples.

Work to be done * better appearance model - other cues (color, edges, appearance,…) * automatic initialization using 2D models * learn more general models of motion * better occlusion model (new) * model of the background motion (new) * better representations of the posterior (Fleet & Chou) * better sampling methods (Fleet & Ormoneit) * adapt shape of limbs

Very preliminary work…

The Statistics of People in Images and Video How do people appear in natural scenes? We want a general model. (Figure: edge filters and ridge filters.)

Statistics of Images Ruderman. Lee, Mumford, Huang. Portilla and Simoncelli. Olshausen & Field. Xu, Wu, & Mumford. … Learning P_on and P_off for edge detection and road following: Geman and Jedynak; Konishi, Yuille, and Coughlan.

Example Training Images

Distribution of Filter Responses

Ratios for different limbs

Local Contrast Normalization

Likelihood (Figure: foreground pixels vs. background pixels.)

Benefits * Generic model of appearance. * Principled way to choose filters. * Models of foreground and background are incorporated into the tracking framework: the likelihood exploits the ratio between foreground and background likelihoods, which improves tracking. * The same has been done for ridges and motion.
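A minimal sketch of the foreground/background ratio idea: learn histograms of filter responses on limb (foreground) and background pixels, store log(p_on / p_off) per bin, and score a limb by summing the log-ratios of its sampled responses. The binning, the small smoothing constant, and the independence assumption behind the sum are all illustrative choices, not details from the talk.

```python
import numpy as np


def learn_log_ratio(fg_responses, bg_responses, bins):
    """Histogram-based lookup table of log(p_on / p_off) for filter responses."""
    p_on, _ = np.histogram(fg_responses, bins=bins)
    p_off, _ = np.histogram(bg_responses, bins=bins)
    eps = 1e-6  # hypothetical smoothing so empty bins do not give log(0)
    p_on = (p_on + eps) / (p_on + eps).sum()
    p_off = (p_off + eps) / (p_off + eps).sum()
    return np.log(p_on) - np.log(p_off)


def limb_log_ratio(responses, log_ratio, bins):
    """Sum of per-pixel log-ratios, assuming pixels are independent."""
    idx = np.clip(np.digitize(responses, bins) - 1, 0, len(bins) - 2)
    return float(log_ratio[idx].sum())
```

A positive score means the responses look more like a limb than like background; using the ratio rather than the foreground likelihood alone penalizes hypotheses that would fit equally well on background clutter.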

Outlook 5 years: - Relatively reliable people tracking in monocular video. - Path is pretty clear. … solve the vision problem. Next step: Beyond person-centric - people interacting with object/world Beyond that: Recognizing action - goals, intentions,... … solve the AI problem.

Some Related Work * Bregler & Malik: image motion, single hypothesis, full-body required multiple cameras, scaled orthographic projection. * Ju, Black, Yacoob: cardboard person model, image motion, 2D * Deutscher et al.: Condensation, edge cues, background subtraction. * Cham & Rehg: known templates, 2D (SPM), particle filter. * Wachter & Nagel: nicely combines motion and edges, single hypothesis (Kalman filter). * Leventon & Freeman: assumes 2D tracking, probabilistic formulation, learned temporal model (full body, monocular, articulated)

Conclusions Bayesian formulation for tracking 3D human figures using monocular image information. * Generative model of image appearance. * Non-linear model represents ambiguities, singularities, occlusion, etc. - sampled representation of the posterior. * Particle filtering for incremental estimation. * Automatic learning of cyclic motion prior. Rich framework for modeling the complexity of human motion.

Initialization Using 2D Model * Full-body walking model. * Constructed from 3D mocap data. * 2D, view-based (every 30 degrees) * 4 subjects, 14 cycles

2D, View-Based Walker * Construct a linear optical flow basis * Use a similar Bayesian framework for tracking (Black CVPR’99) * Coarse estimate of 3D parameters * Automatic initialization (Figure: example bases at 0 degrees and 90 degrees.)

Recent Results * Box indicates mean position and scale. * Recovers distribution over phase and 3D scale.

Contrast Normalization * local contrast normalization: locally weight the image derivatives * global contrast normalization (Lee, Mumford & Huang)
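The exact local weighting is not given in the text. A minimal sketch, assuming one plausible choice: divide the image derivatives by the mean gradient magnitude in a small window around each pixel, so that responses become roughly invariant to local contrast. `radius` and `eps` are hypothetical parameters; global normalization would instead divide by a single image-wide statistic.

```python
import numpy as np


def box_mean(a, radius):
    """Mean over a (2r+1) x (2r+1) window at every pixel, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="edge")
    c = np.cumsum(np.cumsum(padded, axis=0), axis=1)  # summed-area table
    c = np.pad(c, ((1, 0), (1, 0)))                   # leading zeros for window differences
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)


def local_contrast_normalize(image, radius=2, eps=1e-3):
    """Divide x/y derivatives by the local mean gradient magnitude (assumed weighting)."""
    dy, dx = np.gradient(image.astype(float))
    contrast = box_mean(np.hypot(dx, dy), radius)
    return dx / (contrast + eps), dy / (contrast + eps)
```

Scaling the image by a constant leaves the normalized derivatives almost unchanged (exactly so as `eps` goes to zero), which is the property contrast normalization is meant to provide.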

Optimizing the Filters Choose the contrast normalization that maximizes detection accuracy, measured by the ROC curve, the Bhattacharyya distance, and the Kullback-Leibler divergence.
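A sketch of the two distribution-separation measures, applied to discrete (histogram) estimates of the foreground and background response distributions. The idea, as a hedged reading of the slide, is that a normalization which pushes p_on and p_off further apart gives a more discriminative filter; the `eps` guard in the KL divergence is an illustrative choice.

```python
import numpy as np


def bhattacharyya_distance(p, q):
    """-log of the Bhattacharyya coefficient between two histograms."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return float(-np.log(np.sum(np.sqrt(p * q))))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for histograms; bins with p = 0 contribute nothing."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (q[nz] + eps))))
```

Both measures are zero for identical distributions and grow as they separate; KL is asymmetric, while the Bhattacharyya distance is symmetric.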


Representing the Posterior The posterior is represented by a discrete set of N samples, each weighted by its normalized likelihood.

Condensation 1. Selection: sample from the posterior at t-1; the most probable states are selected most often. 2. Prediction. 3. Updating.

Condensation 1. Selection 2. Prediction/Diffusion: sample the new state from the temporal prior (itself done as a sequence of three sampling sub-steps) 3. Updating

Condensation 1. Selection 2. Prediction/Diffusion: sample from the temporal prior, which models the dynamics 3. Updating (Figure: probability p over the states.)

Condensation 1. Selection 2. Prediction 3. Updating the distribution: evaluate the new likelihood. Repeat until N new samples have been generated, then compute the normalized probability distribution.
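A sketch of the selection step using cumulative weights and binary search, which is how Condensation selection is commonly described (Isard & Blake); the function name and signature are hypothetical. States with larger weights occupy longer intervals of the cumulative sum, so they are selected more often, and each draw costs O(log N).

```python
import bisect
import random


def condensation_select(samples, weights, n, rng=random):
    """Draw n samples with probability proportional to their (unnormalized) weights."""
    cumulative = []
    total = 0.0
    for w in weights:
        total += w
        cumulative.append(total)
    picks = []
    for _ in range(n):
        r = rng.random() * total                   # uniform point in [0, total)
        i = bisect.bisect_right(cumulative, r)     # first index whose cumulative weight exceeds r
        picks.append(samples[min(i, len(samples) - 1)])
    return picks
```

Zero-weight samples are never chosen, because `bisect_right` skips past intervals of zero length.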

Visualizing Results Display the expected value of the state parameters.

Likelihood * To cope with occluded limbs or those viewed at narrow angles, we introduce a probability of occlusion. * The likelihood of observing limb j then accounts for this occlusion probability. * The likelihood of the model is the product of the limb likelihoods.
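A minimal sketch, assuming one simple form for the occlusion term: each limb's likelihood mixes its visible-limb likelihood with a constant "occluded" likelihood, weighted by the occlusion probability. The mixture form and the constant `p_occluded` are assumptions; the product over limbs is accumulated in log space to avoid underflow.

```python
import math


def limb_likelihood(p_visible, occlusion_prob, p_occluded):
    """Assumed mixture of the visible-limb likelihood and a constant occluded likelihood."""
    return (1.0 - occlusion_prob) * p_visible + occlusion_prob * p_occluded


def model_log_likelihood(limb_terms):
    """Log of the product of limb likelihoods (sum of logs for numerical stability)."""
    return sum(math.log(p) for p in limb_terms)
```

The benefit of the mixture is that a hidden limb whose image evidence is very poor cannot drive the whole model's likelihood to zero.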

Indexing/Search The crux of the problem: the parameter space is huge. Brute-force search is infeasible (as is discretely sampling the space), so we need to index into the correct part of the space. Use a hierarchy of models of increasing complexity (figure: images → generic models (expansion, rotation,…) → coarse object models (EigenPeople) → detailed models (shape & activity), with indexing and likelihood computation linking the levels; work with Jepson & Fleet).

Initialization * new spatially constrained mixture model * find appropriate mid-level representations * initialize high-level models using mid-level cues

Digital Video Analysis Example: a social display of puzzlement. To automatically analyze such a sequence we must * detect moving regions * estimate and interpret the motion * model complex articulated objects such as humans * model temporal patterns of activity

Tracking Moving Structure Next steps * split/merge/kill/initialize/grow/shrink operations * probabilistic search for best interpretation of the scene * detect more complex structures (articulation)

Generative Model

Mouth Training Data * 3000 image training sequence * motion estimated between pairs of frames * utterances: “center”, “print”, “track”, “release”

Learned Spatial Model * 3 basis flow fields account for 85% of the variance. * fewer bases are needed for recognition than for accurate estimation.
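A sketch of how a figure like "3 bases account for 85% of the variance" is computed: stack the training flow fields as rows (each flattened to a vector), center them, and count how many principal components are needed to reach the target fraction of variance. The data layout is an assumption.

```python
import numpy as np


def components_for_variance(data, fraction=0.85):
    """Smallest number of principal components explaining `fraction` of the variance.

    data: (num_samples, num_features), e.g., one flattened flow field per row.
    """
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)   # cumulative explained-variance ratio
    return int(np.searchsorted(explained, fraction) + 1)
```

Because singular values come out in decreasing order, the cumulative ratio is monotone, so `searchsorted` finds the first component count that reaches the target.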

Mouth Temporal Models

Mouth Results

Results


Bayesian Formulation Consider the image measurements at time t, the sequence of measurements from 0 to t, and a state. We want the posterior over the state given the sequence of measurements; it is not Gaussian. The measurement likelihood can’t be represented in closed form; the temporal prior can be sampled.

Generative Model (Brightness Constancy) Optical flow
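Under brightness constancy, the image at time t is the image at t-1 displaced by the optical flow, so the residual between them should be near zero for the correct motion. A minimal sketch for a single integer displacement (u, v), using `np.roll`, which wraps around at the borders; real flow is sub-pixel and spatially varying and would need interpolated warping instead.

```python
import numpy as np


def brightness_constancy_residual(prev, curr, u, v):
    """Residual I_t(x, y) - I_{t-1}(x - u, y - v) for an integer flow (u, v)."""
    shifted = np.roll(np.roll(prev, v, axis=0), u, axis=1)  # shifted[y, x] = prev[y - v, x - u]
    return curr - shifted
```

Residuals of this kind are the quantities a Gaussian-plus-uniform noise model would score: near zero for the correct motion, large where the motion hypothesis is wrong or the image changes for unmodeled reasons.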

Representing the Posterior The posterior is represented by a discrete set of S samples.

Stochastic Search * Particle filtering (Condensation): 1. Sample from the posterior at time t-1. 2. Predict using the temporal prior. 3. Evaluate the likelihood. * Predict a non-Gaussian distribution over time. * Update the posterior with new measurements. * Allocate computational resources to effectively explore the space.


Learned Walking Model * sample with large 