A General Framework for Tracking Multiple People from a Moving Camera Wongun Choi, Caroline Pantofaru, Silvio Savarese IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2013
Overview Motivation Related Work Introduction Proposed Method Experimental Result Conclusion
Motivation 1. The final goal is to track multiple people from a moving camera, in both outdoor and indoor video scenes. 2. Several challenges must be solved: people appear in a variety of poses; the motion patterns of multiple people in the same scene are complex; the scene and the illumination change.
Related work
Tracking by online learning: learning an appearance model [10],[5],[34],[7],[26]; color histogram and mean shift [10]
Tracking with a moving camera: probabilistic framework with multiple detectors [42],[43]; stereo and graphical model [12],[13]
[5] S. Avidan. Ensemble tracking. PAMI, 2007.
[7] C. Bibby and I. Reid. Robust real-time visual tracking using pixelwise posteriors. In ECCV, 2008.
[10] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. PAMI, 2002.
[12] A. Ess, B. Leibe, K. Schindler, and L. van Gool. A mobile vision system for robust multi-person tracking. In CVPR, 2008.
[13] A. Ess, B. Leibe, K. Schindler, and L. van Gool. Robust multi person tracking from a mobile platform. PAMI, 2009.
[26] S. Kwak, W. Nam, B. Han, and J. Han. Learning occlusion with likelihoods for visual tracking. In ICCV, 2011.
[34] D. Ramanan, D. Forsyth, and A. Zisserman. Tracking people by learning their appearance. PAMI, Jan. 2007.
[42] C. Wojek, S. Walk, S. Roth, and B. Schiele. Monocular 3D scene understanding with explicit occlusion reasoning. In CVPR, 2011.
[43] C. Wojek, S. Walk, and B. Schiele. Multi-cue onboard pedestrian detection. In CVPR, 2009.
Introduction(1) To address these challenges, the proposed method does the following. Variety of poses: fuse multiple person-detection methods and several observation cues. Complex motion patterns of multiple people in the same scene: build a motion model that captures the interaction between targets. Changing scene and illumination: propose a novel 3D model that explains the process of video generation.
Introduction(2) Observation cues:
Introduction(3) Build 3D Model:
Introduction(4) Particle filter: 1. Definition: a posterior density estimation algorithm that estimates the posterior density of the state space by directly implementing the Bayesian recursion equations. 2. It uses sampling to represent the posterior state distribution and resampling to reconstruct the new distribution.
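The predict, weight, and resample cycle described above can be sketched with a toy 1D example. This is a minimal bootstrap particle filter, not the tracker from the paper: the random-walk motion model, the Gaussian observation model, and all names and constants (`particle_filter_step`, `run_demo`, the noise levels) are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std, obs_std, rng):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: sample each particle from the (assumed) random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: evaluate the (assumed) Gaussian observation likelihood.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # Degenerate case: every weight underflowed, fall back to uniform.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def run_demo(seed=0, steps=30, n_particles=500):
    """Track a hypothetical target that drifts by 0.5 units per step."""
    rng = random.Random(seed)
    true_state = 0.0
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    for _ in range(steps):
        true_state += 0.5
        observation = true_state + rng.gauss(0.0, 0.5)
        particles = particle_filter_step(particles, observation, 1.0, 0.5, rng)
    estimate = sum(particles) / len(particles)  # posterior mean estimate
    return true_state, estimate


if __name__ == "__main__":
    truth, est = run_demo()
    print(f"true state = {truth:.2f}, estimate = {est:.2f}")
```

The resampling step is what keeps the particle set concentrated on plausible states; without it, most weights collapse toward zero after a few frames.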
Introduction(5) Reversible-Jump Markov Chain Monte Carlo (RJ-MCMC): a class of algorithms for sampling from probability distributions by constructing a Markov chain that allows the dimensionality of the state to change.
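To make the idea of a chain whose state changes dimension concrete, here is a toy birth/death sampler. It is only a sketch and not the sampler used in the paper: the target (a Poisson-distributed number of points placed uniformly on [0, 1]), the function name `birth_death_sampler`, and the parameter `rate` are assumptions for illustration. In a tracking setting, a birth or death move would correspond to a person entering or leaving the scene.

```python
import random


def birth_death_sampler(rate, n_iters, seed=0):
    """Trans-dimensional Metropolis-Hastings with birth and death moves.

    Assumed target: the number of points is Poisson(rate) and each point
    is uniform on [0, 1]. The Jacobian of both moves is 1, so the
    acceptance ratios reduce to rate/(n+1) for birth and n/rate for death.
    Returns the number of points after every iteration.
    """
    rng = random.Random(seed)
    points = []   # current state; its length (dimension) can change
    counts = []
    for _ in range(n_iters):
        n = len(points)
        if rng.random() < 0.5:
            # Birth move: propose adding one point drawn uniformly on [0, 1].
            if rng.random() < min(1.0, rate / (n + 1)):
                points.append(rng.random())
        elif n > 0:
            # Death move: propose removing one uniformly chosen point.
            if rng.random() < min(1.0, n / rate):
                points.pop(rng.randrange(n))
        counts.append(len(points))
    return counts


if __name__ == "__main__":
    counts = birth_death_sampler(rate=3.0, n_iters=50000)
    kept = counts[1000:]  # discard burn-in
    print("mean number of points:", sum(kept) / len(kept))
```

After burn-in, the average dimension of the state should settle near `rate`, which is a quick sanity check that the birth and death acceptance ratios balance each other.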
Proposed Method System overview: 1. Use observation cues to generate detection hypotheses and an observation model. 2. Build a motion model that accounts both for people's unexpected motions and for interactions between people. 3. Run the sampling procedure of the RJ-MCMC tracker, which includes evaluation (resampling).
Proposed Method Model representation:
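Proposed Method Treat the camera, the targets, and the geometric features as random variables and model their relationship with a joint posterior probability. The tracking problem can then be formulated as finding the maximum a posteriori (MAP) solution. The posterior combines three terms: the observation likelihood, the motion model (transition model), and the posterior at time t-1.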
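The three terms combine through the standard Bayesian recursion, which can be written as follows. The notation is assumed here for illustration: \Omega_t denotes the joint state at time t, I_t the observation at time t, and \chi_t all observations up to time t.

```latex
\underbrace{P(\Omega_t \mid \chi_t)}_{\text{posterior at time } t}
\;\propto\;
\underbrace{P(I_t \mid \Omega_t)}_{\text{observation likelihood}}
\int
\underbrace{P(\Omega_t \mid \Omega_{t-1})}_{\text{motion model}}
\,
\underbrace{P(\Omega_{t-1} \mid \chi_{t-1})}_{\text{posterior at time } t-1}
\, d\Omega_{t-1},
\qquad
\hat{\Omega}_t = \arg\max_{\Omega_t} P(\Omega_t \mid \chi_t)
```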
Proposed Method Observation likelihood: Camera projection function:
Proposed Method Target Observation Likelihood: j: detector index; w_j: weight for detector j
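One common way to use per-detector weights w_j is a weighted sum of per-detector log-likelihood scores. The sketch below assumes that form; the function name `target_log_likelihood`, the detector names, and the example scores and weights are hypothetical.

```python
import math


def target_log_likelihood(detector_scores, weights):
    """Weighted sum over detectors j of w_j * score_j.

    detector_scores: dict mapping detector name -> log-likelihood-like score
    weights:         dict mapping detector name -> weight w_j
    Both dictionaries must cover the same detectors.
    """
    if detector_scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same detectors")
    return sum(weights[j] * detector_scores[j] for j in detector_scores)


if __name__ == "__main__":
    # Hypothetical scores for one target hypothesis in one frame.
    scores = {"pedestrian": 0.8, "upper_body": 0.5, "face": 0.0}
    w = {"pedestrian": 1.0, "upper_body": 0.5, "face": 0.25}
    log_lik = target_log_likelihood(scores, w)
    print("log-likelihood:", log_lik)
    print("unnormalized likelihood:", math.exp(log_lik))
```

Setting a weight to zero removes that detector, which is one way a cue can be dropped when the matching sensor is unavailable.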
Proposed Method Target Observation Likelihood: 1) pedestrian detector 2) upper body detector 3) target-specific detector based on appearance model 4) detector based on upper-body shape from depth 5) face detector 6) skin detector 7) motion detector
Proposed Method Pedestrian and upper body detector using HOG:
Proposed Method Face detector using the OpenCV Viola-Jones face detector:
Proposed Method Skin color detector using a threshold in HSV color space:
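Thresholding in HSV space can be sketched as below. The threshold values (`H_MAX`, `S_MIN`, `S_MAX`, `V_MIN`) and the helper names are hypothetical and chosen only for illustration; the values used in the paper are not given here.

```python
import colorsys

# Hypothetical thresholds; hue, saturation, and value are all in [0, 1].
H_MAX = 50.0 / 360.0
S_MIN, S_MAX = 0.15, 0.70
V_MIN = 0.35


def is_skin(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the HSV skin range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h <= H_MAX and S_MIN <= s <= S_MAX and v >= V_MIN


def skin_mask(image):
    """Apply the threshold to an image given as rows of (r, g, b) tuples."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    image = [
        [(224, 172, 105), (0, 0, 255)],     # skin-like tone, pure blue
        [(255, 255, 255), (10, 10, 10)],    # white, near black
    ]
    for row in skin_mask(image):
        print(row)
```

HSV separates hue from brightness, so a fixed hue range is less sensitive to illumination changes than a threshold applied directly to RGB values.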
Proposed Method Depth shape detector using world coordinate system:
Proposed Method Motion detector: project motion points onto the image plane and threshold:
Proposed Method Geometric Feature likelihood by interest point detector: is the uniform distribution
Proposed Method Motion prior:
Proposed Method Camera motion prior:
Proposed Method Target motion prior:
Proposed Method Existence prior:
Proposed Method Motion prior: Independent Interacting
Proposed Method Independent Motion prior : update
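A simple way to picture an independent motion update is a constant-velocity prediction with Gaussian perturbation of the velocity. The sketch below assumes that form; the function name `propagate_independent` and the parameters `dt` and `accel_std` are hypothetical and not taken from the slide.

```python
import random


def propagate_independent(position, velocity, dt, accel_std, rng):
    """Predict one target's next state without looking at other targets.

    The velocity is perturbed by zero-mean Gaussian noise (an assumed
    noise model), then the position is advanced with the new velocity.
    """
    new_velocity = tuple(v + rng.gauss(0.0, accel_std) * dt for v in velocity)
    new_position = tuple(p + v * dt for p, v in zip(position, new_velocity))
    return new_position, new_velocity


if __name__ == "__main__":
    rng = random.Random(0)
    pos, vel = (0.0, 0.0), (1.0, 2.0)
    for _ in range(3):
        pos, vel = propagate_independent(pos, vel, dt=0.5, accel_std=0.2, rng=rng)
        print(pos, vel)
```

With `accel_std` set to zero the update reduces to pure constant-velocity motion, which makes the role of the noise term easy to check.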
Proposed Method Interacting Motion prior: Mode variable
Proposed Method Repulsion: Group motion: Repulsion force
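The two interaction modes can be illustrated with pairwise potentials: a repulsion term that penalizes two targets being very close, and a group term that rewards similar velocities. The functional forms and the constants `c_r` and `c_g` below are assumptions chosen for illustration, not the definitions from the paper.

```python
import math


def repulsion_potential(pos_i, pos_j, c_r=2.0):
    """Near 0 when two targets almost coincide, approaches 1 when far apart."""
    r2 = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    if r2 == 0.0:
        return 0.0  # identical positions are treated as maximally penalized
    return math.exp(-1.0 / (c_r * r2))


def group_potential(vel_i, vel_j, c_g=1.0):
    """Equals 1 for identical velocities and decays as they diverge."""
    dv2 = sum((a - b) ** 2 for a, b in zip(vel_i, vel_j))
    return math.exp(-c_g * dv2)


if __name__ == "__main__":
    print("repulsion, close:", repulsion_potential((0.0, 0.0), (0.5, 0.0)))
    print("repulsion, far:  ", repulsion_potential((0.0, 0.0), (3.0, 0.0)))
    print("group, same vel: ", group_potential((1.0, 0.0), (1.0, 0.0)))
    print("group, diff vel: ", group_potential((1.0, 0.0), (-1.0, 0.0)))
```

A mode variable per pair of targets would then select which of the two potentials applies to that pair.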
Proposed Method Tracking by Reversible Jump Markov Chain Monte Carlo Particle filtering: Sampling: Convert posterior problem:
Experimental result Using the ETH dataset [12]; video frame rate ~14 Hz; resolution 640 × 480 pixels
Experimental result Single-frame detection accuracy is measured via the overlap ratio between the ground-truth bounding box and the tracked bounding box.
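An overlap ratio between two boxes is commonly computed as intersection over union. The sketch below assumes that definition and an (x1, y1, x2, y2) corner format; the function name `overlap_ratio` is hypothetical, and the matching threshold used in the experiments is not stated here.

```python
def overlap_ratio(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection, clamped at zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0, 0, 2, 2)
    tracked = (1, 1, 3, 3)
    print("overlap ratio:", overlap_ratio(ground_truth, tracked))
```

A tracked box is then typically counted as a correct detection when its overlap ratio with the ground truth exceeds a chosen threshold.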
Experimental result
Conclusion A probabilistic model over joint variables captures the relationship between the camera, the targets, and the geometric features. Multiple cues are combined, which makes the method adaptable to different sensor configurations and different environments. The model allows people to interact. People are detected automatically.