Tracking Pedestrians Using Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes Louis Kratz and Ko Nishino IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012
Outline Motivation Introduction Proposed method Experimental results Conclusion
Motivation Goal: tracking single or multiple pedestrians in crowded scenes Solve conventional tracking problems -Occlusion problem -Pedestrians move in different directions -Appearance change
Introduction(1) Observe a phenomenon
Observation Instantaneous motions within a small area tend to repeat -Temporally -Spatially
Introduction(2) Spatio-temporal motion pattern -Describe crowd motion -Build a Spatial and temporal statistical model -Use to predict movement of individuals
Spatio-temporal motion pattern (figure: spatio-temporal cuboid with axes x, y, and t)
3D gradient vector: Calculate the mean motion vector or build a statistical model at each cuboid
Introduction(3) Hidden Markov Model: -States are not directly visible -Comprised of three components: observation probabilities, transition probabilities, initial probabilities
Introduction(4) Posterior distribution: given the observed evidence X, find the probability of the parameters
Introduction(5) Particle filter: a filter that can be used to predict the next state -Different from the Kalman filter: robust to nonlinear systems and can handle non-Gaussian noise -Measurement:
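To make the particle filter idea concrete, here is a minimal sketch of one predict / weight / resample cycle. It is a generic bootstrap particle filter on scalar states, not the tracker from the paper; the function name `particle_filter_step` and the callback parameters `transition` and `likelihood` are hypothetical names chosen for this illustration.

```python
import random

def particle_filter_step(particles, transition, likelihood, rng):
    """One predict / weight / resample cycle for scalar states (illustrative only)."""
    # Predict: push every particle through the (possibly nonlinear) motion model.
    predicted = [transition(p, rng) for p in particles]
    # Weight: score each predicted particle against the current measurement.
    weights = [likelihood(p) for p in predicted]
    total = sum(weights)
    if total > 0:
        weights = [w / total for w in weights]
    else:
        # Degenerate case: no particle explains the measurement, so use uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate
```

Because the motion model and the likelihood are arbitrary callables, nothing in this loop requires linear dynamics or Gaussian noise, which is the contrast with the Kalman filter drawn on the slide.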
Proposed method
Flow chart
(a) Divide the training video into spatio-temporal cuboids, calculate motion vectors, and build a statistical model for each motion pattern (b) Train a collection of hidden Markov models (c) Use observed local motion patterns to predict the motion pattern at each location (d) Use the predicted motion patterns to track individuals
Step (a)-statistical model for motion patterns 1. First, calculate the motion vector at each pixel as a 3D gradient vector 2. Next, build a statistical model of these vectors as a 3D Gaussian distribution
3. Define the local spatio-temporal pattern at location n and frame t
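Step (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: gradients are approximated with central differences at interior pixels only, the covariance is the population covariance, and the function name `cuboid_motion_pattern` is hypothetical.

```python
def cuboid_motion_pattern(cuboid):
    """Fit a 3D Gaussian (mean, covariance) to the spatio-temporal gradients of a cuboid.

    cuboid[t][y][x] holds pixel intensities. Central differences and the
    interior-only sampling are assumptions made for this sketch.
    """
    T, H, W = len(cuboid), len(cuboid[0]), len(cuboid[0][0])
    grads = []
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                gx = (cuboid[t][y][x + 1] - cuboid[t][y][x - 1]) / 2.0
                gy = (cuboid[t][y + 1][x] - cuboid[t][y - 1][x]) / 2.0
                gt = (cuboid[t + 1][y][x] - cuboid[t - 1][y][x]) / 2.0
                grads.append((gx, gy, gt))
    n = len(grads)
    # Mean 3D gradient of the cuboid.
    mean = [sum(g[k] for g in grads) / n for k in range(3)]
    # 3x3 covariance of the gradients around that mean.
    cov = [
        [sum((g[i] - mean[i]) * (g[j] - mean[j]) for g in grads) / n for j in range(3)]
        for i in range(3)
    ]
    return mean, cov
```

For a 10×10×10 cuboid this yields one mean vector and one 3×3 covariance matrix, which together describe the local motion pattern at that location and time.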
Step (b)-train hidden Markov models 1. Using a clustering algorithm, divide the motion patterns into S clusters 2. Define states {s = 1, …, S}, where S is the number of clusters 3. For a specific hidden state s, the probability of an observed motion pattern is: Calculate variance between two distributions
Step (c)-predict motion patterns Take the expected value of the predictive distribution: Solve with the forward-backward algorithm Reference: [23] L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2, pp , Feb
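The prediction in step (c) can be illustrated with a small sketch. It uses only the forward pass of the forward-backward procedure to filter the hidden state, then takes the expected value of a representative vector per state. The names `forward_filter`, `predict_next`, and `state_means`, and the reduction of each state to a single mean vector, are assumptions made for this example.

```python
def forward_filter(init, trans, emission_likelihoods):
    """Return P(state_t | observations 1..t) after the last observation.

    init[s]: initial probability of state s.
    trans[r][s]: probability of moving from state r to state s.
    emission_likelihoods[t][s]: likelihood of the observation at step t under state s.
    """
    S = len(init)
    alpha = [init[s] * emission_likelihoods[0][s] for s in range(S)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for lik in emission_likelihoods[1:]:
        # Propagate through the transition matrix, then reweight by the observation.
        pred = [sum(alpha[r] * trans[r][s] for r in range(S)) for s in range(S)]
        alpha = [pred[s] * lik[s] for s in range(S)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha

def predict_next(alpha, trans, state_means):
    """Predictive state distribution for the next step and its expected mean vector."""
    S = len(alpha)
    next_state = [sum(alpha[r] * trans[r][s] for r in range(S)) for s in range(S)]
    dim = len(state_means[0])
    expected = [sum(next_state[s] * state_means[s][k] for s in range(S)) for k in range(dim)]
    return next_state, expected
```

Normalizing `alpha` at every step keeps the recursion numerically stable on long sequences without changing the filtered distribution.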
Step (d)-track individuals Use a particle filter to maximize the posterior distribution: Compare to: posterior ∝ likelihood × prior P(x_f)
x_{f-1} = [x, y, w, h]^T in frame f-1 Figure: the state vector x_{f-1} defines a target window at frame f-1
Past and current measurements: z_f is the frame at time f
Priors We use the motion pattern at the center of the tracked target to estimate priors on the distribution of the next state x_f
Transition distribution P(x_f | x_{f-1}) is the transition distribution We model it by a normal distribution: the mean is the 2D optical flow vector from the predicted motion pattern [27]; the covariance is the covariance matrix from the predicted motion pattern distribution Reference: [27] J. Wright and R. Pless, “Analysis of Persistent Motion Patterns Using the 3D Structure Tensor,” Proc. IEEE Workshop Motion and Video Computing, pp , 2005
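Drawing a particle from such a transition distribution might look like the sketch below. It assumes the displacement of the window position is Gaussian with the optical flow as its mean, and that the window width and height are simply carried over; both choices, and the name `sample_transition`, are assumptions for illustration rather than details stated on the slide.

```python
import math
import random

def sample_transition(state, flow, cov, rng):
    """Sample x_f given x_{f-1} = [x, y, w, h] (illustrative sketch).

    flow: (u, v) 2D optical flow used as the mean displacement.
    cov: 2x2 covariance of the displacement, as nested lists.
    Width and height are kept fixed, which is an assumption of this sketch.
    """
    x, y, w, h = state
    # Cholesky factor of the 2x2 covariance, guarded against zero variance.
    a = math.sqrt(max(cov[0][0], 0.0))
    b = cov[0][1] / a if a > 0 else 0.0
    c = math.sqrt(max(cov[1][1] - b * b, 0.0))
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    dx = flow[0] + a * z0
    dy = flow[1] + b * z0 + c * z1
    return [x + dx, y + dy, w, h]
```

With a tight covariance the particles follow the predicted flow closely; a broad covariance spreads them out, which corresponds to locations where the predicted motion pattern is less certain.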
Likelihood distribution T: template of the human object R: region of the bounding box at frame f Z: constant Variance term: with respect to appearance change
Define distance measure: t_i: template gradient vector r_i: region gradient vector M: number of pixels in template If the distance is large, the likelihood is small If the distance is small, the likelihood is large
Add weight information to adjust for appearance change The error accounts for appearance change -Pixels from an occluded region have a large angle between t and r, thus the error E_i is large -When E_i is large, the weight becomes small
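The slide's formulas are not reproduced here, so the following is only a sketch of one distance with the stated behavior. It assumes the distance is the mean angle between corresponding template and region gradient vectors, and that the likelihood decays exponentially with that distance; the exact forms, the handling of zero-length gradients, and the names `template_distance` and `appearance_likelihood` are assumptions.

```python
import math

def gradient_angle(u, v):
    """Angle in radians between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        # Assumption: a zero gradient carries no direction, so treat it as orthogonal.
        return math.pi / 2
    cos_angle = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.acos(cos_angle)

def template_distance(template, region):
    """Mean angle over the M pixels of the template (assumed form of the distance)."""
    M = len(template)
    return sum(gradient_angle(t, r) for t, r in zip(template, region)) / M

def appearance_likelihood(template, region, variance=1.0, Z=1.0):
    """Assumed exponential form: large distance gives small likelihood, and vice versa."""
    return math.exp(-template_distance(template, region) / variance) / Z
```

A larger `variance` flattens the likelihood, so a given mismatch between template and region is penalized less.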
Experimental results Implementation: -Intel Xeon X GHz processor -10 frames per second -Cuboid size 10×10×10
Datasets
From the UCF Crowd data set: 300, 350, 300, and 120 frames, respectively (a) train station’s concourse (b) ticket gate (c) sidewalk (d) intersection
Experiment 1 White indicates high error High error indicates little texture or a noisy area; in the intersection scene it is due to the small amount of training data
Experiment 2
When occlusion is severe, the variance of the likelihood increases, at frames 56, 112, and 201
Experiment 3
Experiment 4 Errors are caused by the initial states not containing this direction
Experiment 5
Experiment 6
Conclusion We proposed an efficient method for tracking individuals in crowded scenes We address the errors caused by occlusion, appearance change, and movement in different directions