Multicamera People Tracking with a Probabilistic Occupancy Map Francois Fleuret, Jerome Berclaz, Richard Lengagne (EPFL) and Pascal Fua (IEEE Senior Member) PAMI 2008
Outline Introduction Problem definition Modeling & Probabilistic Occupancy Map Result Conclusion
Survey of previous work Monocular approaches – Blob-based methods – Color-based methods Multiview approaches – Blob-based methods – Color-based methods – Occupancy map methods
Introduction(1/2) Goal: – Keeping track of multiple persons in a complex environment(occlusion, lighting changes, etc.)
Introduction(2/2) Algorithmic steps: 1. Building a color model and a motion model. 2. Estimating the probabilities of occupancy of the ground plane. 3. Combining these probabilities with the color model and the motion model, and using the Viterbi algorithm to track individuals.
Problem formulation(1/4) Computing the optimal trajectories – Processing the video sequences by batches of T=100 frames, each of which includes images from C cameras. – Discretizing the visible part of the ground plane into a finite number G of locations. – Introducing a virtual hidden location H, which represents entrances into and departures from the visible area.
Problem formulation(2/4) – Let $L_t = (L_t^1, \dots, L_t^{N^*})$ be the hidden stochastic process standing for the locations of individuals, whether visible or not. – N* stands for the maximum allowable number of individuals in our world. – The $L_t^n$ variables take values in {1,…,G, H}. – We are given $I_t = (I_t^1, \dots, I_t^C)$, the images acquired at time t for 1 ≤ t ≤ T.
Problem formulation(3/4) Task: find the values of L_1,…,L_T that maximize $P(L_1, \dots, L_T \mid I_1, \dots, I_T)$ (1). Constraint: – No individual can be at a visible location occupied by an individual who has already been processed.
Problem formulation(4/4) This greedy constraint could lead to undesirable local minima – e.g. connecting the trajectories of two separate people. Remedy: processing individual trajectories in an order that depends on a reliability score – The most reliable ones are computed first, thereby reducing the potential for confusion when processing the remaining ones.
Computation of the trajectories(1/2) Maximize the conditional probability: where – Simultaneous optimization of all the L i s would be intractable. (2)
Computation of the trajectories(2/2) Optimizing one trajectory after the other instead: – This ensures that each trajectory does not pass through an already occupied location. (3) (4) (5)
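The occupancy constraint used when trajectories are optimized one after the other can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `mask_occupied`, the layout of `log_obs` (one row of per-location log-scores per frame), and the `hidden` index for H are all hypothetical.

```python
NEG_INF = float("-inf")

def mask_occupied(log_obs, earlier_paths, hidden):
    """Return a copy of log_obs in which visible locations already used by
    previously optimized trajectories are forbidden at the matching frame.

    log_obs[t][k]  : log-score of location k at frame t (hypothetical layout)
    earlier_paths  : list of trajectories, each a list of locations per frame
    hidden         : index standing for the hidden location H
    """
    masked = [row[:] for row in log_obs]
    for path in earlier_paths:
        for t, k in enumerate(path):
            # Several individuals may share the hidden location H,
            # so only visible locations are blocked.
            if k != hidden:
                masked[t][k] = NEG_INF
    return masked
```

Feeding the masked scores to the per-trajectory optimizer makes it impossible for a later trajectory to pass through a cell that an earlier, more reliable one already occupies.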
Modeling single trajectory(1/3) Considering the trajectory of individual n over T temporal frames, we seek to maximize: – Since the denominator is constant with respect to l n, we simply maximize the numerator. (6)
Modeling single trajectory(2/3) Introducing the maximum of the probability of both the observations and the trajectory ending up at location k at time t : Modeling the processes and I t jointly with a hidden Markov model: (7) (9) (8)
Modeling single trajectory(3/3) Under such a model, we have the recursive expression: – Performing a global search with dynamic programming – Yielding the classic Viterbi algorithm (10)
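The dynamic-programming search named above can be sketched as a generic log-domain Viterbi pass. This is a textbook version under assumed inputs (`log_init`, `log_trans`, `log_obs` are hypothetical names for the initial, motion, and observation log-probabilities); it is not tied to the exact factorization of equations (7) to (10).

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Most probable location sequence of an HMM, computed in the log domain.

    log_init[k]     : log P(L_1 = k)
    log_trans[j][k] : log P(L_t = k | L_{t-1} = j)
    log_obs[t][k]   : log P(I_t | L_t = k)
    """
    n_states = len(log_init)
    # phi[k]: best log-score of any path ending at location k at the current frame.
    phi = [log_init[k] + log_obs[0][k] for k in range(n_states)]
    back = []
    for t in range(1, len(log_obs)):
        new_phi, pointers = [], []
        for k in range(n_states):
            best_j = max(range(n_states), key=lambda j: phi[j] + log_trans[j][k])
            new_phi.append(phi[best_j] + log_trans[best_j][k] + log_obs[t][k])
            pointers.append(best_j)
        phi = new_phi
        back.append(pointers)
    # Backtrack from the best final location.
    k = max(range(n_states), key=lambda s: phi[s])
    best_score = phi[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, best_score

# Toy usage with two locations and three frames.
log = math.log
path, score = viterbi(
    [log(0.5), log(0.5)],
    [[log(0.9), log(0.1)], [log(0.1), log(0.9)]],
    [[log(0.9), log(0.1)], [log(0.4), log(0.6)], [log(0.9), log(0.1)]],
)
```

Working with log-probabilities avoids numerical underflow over a batch of 100 frames, and forbidden transitions or locations can be encoded as negative infinity.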
Motion model(1/2) We choose: – ρ: constant, tuned to the average human walking speed – c: constant, limiting the maximum allowable speed (set to almost 12 mph). – Z: a normalization factor – The transition probability decreases with the distance from location k and is zero beyond a constant maximum distance. (11)
Motion model(2/2) – Defining the probability of transitions to the parts of the scene that are connected to the hidden location H. – Entrance and departure of individuals are naturally taken into account by the estimation of the maximum a posteriori trajectories.
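A motion model with the properties listed above (decreasing with distance, zero beyond a maximum distance, normalized by Z) can be sketched as below. The exponential decay is an assumption of this sketch, since equation (11) is not reproduced in the slide text; distances are measured in grid cells, and the hidden location H is left out for brevity.

```python
import math

def motion_model(grid_w, grid_h, rho, c):
    """Transition matrix P(L_t = k | L_{t-1} = j) on a grid_w x grid_h grid.

    rho : decay rate (assumed exponential form), larger means slower walkers
    c   : maximum distance, in grid cells, that can be covered in one frame
    """
    cells = [(x, y) for y in range(grid_h) for x in range(grid_w)]
    trans = []
    for xj, yj in cells:
        row = []
        for xk, yk in cells:
            d = math.hypot(xk - xj, yk - yj)
            # Zero probability beyond the maximum allowed displacement.
            row.append(math.exp(-rho * d) if d <= c else 0.0)
        z = sum(row)  # normalization factor Z for this source location
        trans.append([w / z for w in row])
    return trans

# Toy usage: 5 x 5 grid, at most 2 cells per frame.
trans = motion_model(5, 5, rho=1.0, c=2.0)
```

Because the kernel is truncated at distance c, most entries of the matrix are zero, which keeps the dynamic-programming step cheap on large grids.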
Appearance model – Background subtraction produces binary masks B_t from the input images I_t. – Denoting by T_t the colors of the pixels inside the blobs, we generate: (12) where X_k^t is a Boolean random variable standing for the presence of an individual at location k of the grid at time t.
Color model(1/3) – : The image composed of 1s inside a rectangle standing for the silhouette of an individual at location k seen from camera c. – : The pixels taken at the intersection of the binary image of camera c at time t and the rectangle corresponding to location k.
Color model(2/3) We assume that if someone is present at a certain location k, their presence influences the color of the pixels T_t corresponding to k. We model this dependency as if the pixels were independent and identically distributed and followed a density in RGB space associated with the individual.
Color model(3/3) Let be the color distributions of N * individuals present in the scene at the beginning of the batch of T frames, we have: where would be flat if individual n is at location H. (13) (14)
Probabilistic occupancy map (POM) The ground plane occupancy term in (12) is the probability that somebody is standing at location k at time t. To solve the occupancy problem, we represent humans as simple rectangles and approximate the occupancy probabilities as the marginals of a product law Q.
Independence assumptions (1) 1.Individuals in the scene do not take into account the presence of other individuals in their vicinity when moving around. –This can be formalized as: (15)
Independence assumptions (2) 2. All statistical dependencies between views are due to the presence of individuals in the room. – This is equivalent to treating the views as functions of the vector X=(X_1,…,X_G) plus some independent noise. – It implies that as soon as the presence of all individuals is known, the views become independent. – The assumption can be written as: (16)
Generative image model(1/3) We model B given X so as to relate B and the X_k. Let A_c be the synthetic image obtained by putting rectangles at the locations where X_k = 1. Thus, $A_c = \oplus_k X_k A_k^c$, where $\oplus$ denotes the “union” between two images. Figure: A_c with three X_k equal to 1.
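The construction of the synthetic image can be sketched as a union of rectangle masks. Images are plain nested lists of 0/1 values, and the rectangle tuples `(left, top, right, bottom)` are a hypothetical encoding chosen for this illustration.

```python
def rectangle_image(width, height, left, top, right, bottom):
    """Binary image with 1s inside the half-open rectangle [left, right) x [top, bottom)."""
    return [[1 if left <= x < right and top <= y < bottom else 0
             for x in range(width)]
            for y in range(height)]

def union(a, b):
    """Pixelwise union of two binary images."""
    return [[max(pa, pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def synthetic_image(occupancy, rectangles, width, height):
    """Union of the rectangles of all locations k with occupancy[k] == 1."""
    image = [[0] * width for _ in range(height)]
    for x_k, rect in zip(occupancy, rectangles):
        if x_k == 1:
            image = union(image, rectangle_image(width, height, *rect))
    return image

# Toy usage: three candidate locations, the first two occupied and overlapping.
rects = [(0, 0, 3, 4), (2, 0, 5, 4), (6, 1, 8, 4)]
a_c = synthetic_image([1, 1, 0], rects, width=8, height=4)
```

Overlapping rectangles are counted once, which is why a union rather than a sum is used to build the image.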
Generative image model(2/3) Define a normalized pseudo-distance Ψ by: – σ accounts for the quality of the background subtraction, and it was fixed arbitrarily to (17)
Generative image model(3/3) We model the conditional distribution P(B_c | X) of the background subtraction image, given the true hidden state, as a density that decreases as the pseudo-distance Ψ(B_c, A_c) between B_c and A_c increases. (18) (19) The normalization factor is only there to make the probability distribution sum to 1.
Relation between the q k s(1/6) E_Q denotes the expectation under X~Q, where Q is the product law used to approximate the real posterior distribution for a fixed t. We want to minimize the Kullback-Leibler (K-L) divergence between the approximation Q and the true posterior.
Relation between the q k s(2/6) We use the form of the derivative of the K-L divergence with respect to the unknown q_k: (20) where ε_k is the prior probability of presence at location k, P(X_k = 1).
Relation between the q k s(3/6) Setting this derivative to zero gives an expression for q_k (21), but the expectation of the pseudo-distance it contains is intractable. Since, under X~Q, the image A_c is concentrated around B_c, we approximate: (22)
Relation between the q k s(4/6) Leading to the main result: (23) Figure: the average image E_Q(A_c) when all q_k are zero except four of them, which are equal to 0.2, and the corresponding occupancy probabilities q_k.
Relation between the q k s(5/6) Figure: the evolution of the POM q_k and of the average images E_Q(A_c).
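Relation between the q k s(6/6) Intuitively, if putting the rectangular shape for position k in the image improves the fit with the actual images, the pseudo-distance computed with the rectangle present decreases and the one computed without it increases, so their difference becomes negative and q_k becomes larger.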
Fast estimation of the q k s(1/3) We estimate the q_k by first giving them a uniform value and using them to compute the average synthetic images, then re-estimating every q_k in turn, one location k at a time.
Fast estimation of the q k s(2/3) Remaining issue: computation of the conditional average images E_Q(A_c | X_k = ξ). – It has to be done G times per iteration. – Convergence requires on the order of 100 iterations. – E_Q(A_c) and E_Q(A_c | X_k = ξ) differ only in the rectangle A_k, where 1 − E_Q(A_c) is multiplied by a constant factor.
Fast estimation of the q k s(3/3) Finally, we use the following at each iteration and for every c: (27) (28) (29) (30) (31) Notation: |I| is the sum of the pixels of an image I; the two image operators are the union of two binary images and the pixelwise product.
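The iterative estimation of the q_k can be sketched as a plain fixed-point loop. Several things here are assumptions, because the slide text omits equations (17), (23) and (27) to (31): the pseudo-distance is taken as Ψ(B, A) = |B⊗(1−A) + (1−B)⊗A| / (σ|A|), the update as q_k = 1 / (1 + exp(λ_k + Σ_c [Ψ(B_c, E_Q(A_c | X_k=1)) − Ψ(B_c, E_Q(A_c | X_k=0))])) with λ_k = log((1−ε_k)/ε_k), and the default `sigma` is a hypothetical value. All q_k are updated together from the previous iterate, and Ψ is recomputed over the whole image rather than with the fast scheme of the slide.

```python
import math

def psi(b, a, sigma):
    """Assumed pseudo-distance between a binary mask b and an average image a."""
    mismatch = sum(pb * (1 - pa) + (1 - pb) * pa
                   for rb, ra in zip(b, a) for pb, pa in zip(rb, ra))
    area = sum(pa for ra in a for pa in ra)
    return mismatch / (sigma * max(area, 1e-9))  # guard against an empty image

def average_image(q, rects, width, height):
    """E_Q(A): pixel value is 1 - prod_k (1 - q_k) over rectangles covering it."""
    prod = [[1.0] * width for _ in range(height)]
    for q_k, (left, top, right, bottom) in zip(q, rects):
        for y in range(top, bottom):
            for x in range(left, right):
                prod[y][x] *= 1.0 - q_k
    return [[1.0 - p for p in row] for row in prod]

def conditional_image(avg, q_k, rect, xi):
    """E_Q(A | X_k = xi): inside rectangle k, 1 - E_Q(A) is scaled by a constant."""
    left, top, right, bottom = rect
    factor = (1.0 - xi) / (1.0 - q_k)
    out = [row[:] for row in avg]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 1.0 - factor * (1.0 - avg[y][x])
    return out

def estimate_occupancy(masks, rects_per_camera, prior, sigma=0.01, n_iter=100):
    """Fixed-point iteration for q_k; masks and rectangles are given per camera."""
    n_loc = len(rects_per_camera[0])
    height, width = len(masks[0]), len(masks[0][0])
    lam = math.log((1.0 - prior) / prior)
    q = [prior] * n_loc  # uniform initial value
    for _ in range(n_iter):
        averages = [average_image(q, rects, width, height)
                    for rects in rects_per_camera]
        new_q = []
        for k in range(n_loc):
            s = lam
            for b, rects, avg in zip(masks, rects_per_camera, averages):
                a1 = conditional_image(avg, q[k], rects[k], 1.0)
                a0 = conditional_image(avg, q[k], rects[k], 0.0)
                s += psi(b, a1, sigma) - psi(b, a0, sigma)
            s = max(-20.0, min(20.0, s))  # keeps q_k strictly inside (0, 1)
            new_q.append(1.0 / (1.0 + math.exp(s)))
        q = new_q
    return q

# Toy usage: one camera, three side-by-side locations, a blob on the middle one.
rects = [(0, 0, 2, 2), (2, 0, 4, 2), (4, 0, 6, 2)]
mask = [[0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0]]
q = estimate_occupancy([mask], [rects], prior=0.1, n_iter=20)
```

In this toy case the estimate concentrates on the middle location. The clamp on the exponent is a numerical convenience of the sketch and not part of the model.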
Testing environment Frame rate: 25 fps Background subtraction: Visiowave 2 indoor sequences: – 2 cameras at about 1.80 m and 2 cameras at about 2.30 m – Discretized locations G = 28 × 28 = 784 – Area of interest: 5.5 m × 5.5 m ≈ 30 m² – 4 people 4 outdoor sequences: – All 3 or 4 cameras are at head level (≈ 1.80 m) – Discretized locations G = 40 × 40 = 1600 or 30 × 50 = 1500 – Area of interest: 10 m × 10 m or 6 m × 10 m – 6 people
Correct result
Failed result The algorithm detects a person at the wrong place because of poor-quality background subtraction.
Global trajectory estimation
Global trajectory estimation - indoor If we pick 100 frames at random from a sequence, the error between the estimated location and the ground truth is less than 31 cm in more than 90% of cases. The results remain robust even when 20% of the images are blanked out.
Rectangle size influence The results of the POM algorithm are almost unchanged for model sizes between 1.7 m and 2.2 m. A smaller rectangle makes the algorithm more sensitive. The algorithm can detect an individual 100 cm tall.
Limitations of the algorithm Computing the POM – Poor background subtraction – People covered by only one camera – Excessive proximity of several individuals Other situations: – Jumping – Bending – Obstacles and long sharp shadows – Small kids
Conclusion and future work Presented an algorithm that can reliably track multiple persons in a complex environment. Providing metrically accurate position estimates. Extensions of the work – Improvements of the stochastic model – To use the scheme to automatically estimate trajectories and learn sophisticated appearance and behavior models.
Appearance model proof Ref: Appendix B
Color model proof Ref: Appendix C
Relation between the q k s Ref: Appendix A