
1 VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING
PhD thesis by Alex Leykin, Indiana University

2 Motivation
– Automated tracking and activity recognition are missing from marketing research
– The camera hardware is already in place
– Visual information can reveal a lot about how humans interact with each other
– Helps in making intelligent marketing decisions

3 Goals
– Extract semantic information from the tracks (Activity Analysis)
– Process visual information to obtain a formal representation of human locations (Visual Tracking)

4 Related Work: Detection and Tracking
– Yacoob and Davis, "Learned models for estimation of rigid and articulated human motion from stationary or moving camera", IJCV 2000
– Zhao and Nevatia, "Tracking multiple humans in crowded environment", CVPR 2004
– Haritaoglu, Harwood, and Davis, "W4: Real-time surveillance of people and their activities", PAMI 2000
– Deutscher, North, Bascle, and Blake, "Tracking through singularities and discontinuities by random sampling", ICCV 1999
– Elgammal and Davis, "Probabilistic framework for segmenting people under occlusion", ICCV 2001
– Isard and MacCormick, "BraMBLe: a Bayesian multiple-blob tracker", ICCV 2001

5 Related Work: Activity Recognition
– Haritaoglu and Flickner, "Detection and tracking of shopping groups in stores", CVPR 2001
– Oliver, Rosario, and Pentland, "A Bayesian computer vision system for modeling human interactions", PAMI 2000
– Buzan, Sclaroff, and Kollios, "Extraction and clustering of motion trajectories in video", ICPR 2004
– Hongeng, Nevatia, and Bremond, "Video-based event recognition: activity representation and probabilistic recognition methods", CVIU 2004
– Bobick and Ivanov, "Action recognition using probabilistic parsing", CVPR 1998

6 System Components
– Low-level Processing: camera model, obstacle model, foreground segmentation, head detection
– Tracking: jump-diffuse transitions, priors and likelihoods, accept/reject candidate
– Event Detection: actor distances, deterministic agglomerative clustering, validity index
– Activity Detection: event distances, fuzzy agglomerative clustering, adaptive removal of weak clusters

7 Background Modeling
[Diagram: each pixel maintains a codebook, a list of codewords; every codeword stores a color mean μ_RGB and brightness bounds I_low, I_hi]

8 Adaptive Background Update
Match pixel p to the codebook b using the conditions: I(p) > I_low, I(p) < I_high, (RGB(p) · μ_RGB) < T_RGB, t(p)/t_high > T_t1, t(p)/t_low > T_t2
– If there is no match:
  – if the codebook is saturated, the pixel is foreground
  – else create a new codeword
– Else update the matching codeword with the new pixel information
– If there is more than one match, merge the matching codewords
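
A minimal Python sketch of the per-pixel codebook logic above. It is not the thesis implementation: the saturation limit, the color test (a plain Euclidean distance against T_RGB), and the fixed slack on the brightness bounds are simplified placeholder assumptions.

```python
import numpy as np

class Codeword:
    def __init__(self, rgb, intensity):
        self.mu_rgb = np.asarray(rgb, dtype=float)   # running color mean
        self.i_low = self.i_hi = float(intensity)    # observed brightness bounds
        self.hits = 1                                # how often this codeword matched

class PixelCodebook:
    """Per-pixel codebook for adaptive background modeling (sketch)."""
    MAX_CODEWORDS = 8    # assumed saturation limit
    T_RGB = 20.0         # assumed color distance threshold

    def __init__(self):
        self.codewords = []

    def _matches(self, cw, rgb, intensity):
        return (0.7 * cw.i_low <= intensity <= 1.3 * cw.i_hi and
                np.linalg.norm(cw.mu_rgb - rgb) < self.T_RGB)

    def update(self, rgb, intensity):
        """Return True if the pixel is foreground, False if background."""
        rgb = np.asarray(rgb, dtype=float)
        hits = [cw for cw in self.codewords if self._matches(cw, rgb, intensity)]
        if not hits:
            if len(self.codewords) >= self.MAX_CODEWORDS:
                return True                          # saturated codebook: foreground
            self.codewords.append(Codeword(rgb, intensity))
            return False
        cw = hits[0]                                 # update the first match ...
        cw.mu_rgb = (cw.mu_rgb * cw.hits + rgb) / (cw.hits + 1)
        cw.i_low, cw.i_hi = min(cw.i_low, intensity), max(cw.i_hi, intensity)
        cw.hits += 1
        for extra in hits[1:]:                       # ... and merge any other matches into it
            self.codewords.remove(extra)
        return False
```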

9 Background Subtraction

10 Head Detection
– Vanishing Point Projection (VPP) histogram
– Vanishing point in the Z direction

11 Camera Setup
– Two camera types: perspective and spherical
– Mixture of indoor and outdoor scenes
– Color and thermal image sensors
– Varying lighting conditions (daylight, cloud cover, incandescent, etc.)

12 Camera Modeling
Perspective projection: recover X, Y, Z from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3x4 projection matrix. Assumption: floor plane Z_f = 0.
Spherical projection: with the camera at [X_c, Y_c, Z_c] and a ray defined by longitude θ and latitude φ,
X = cos(θ) tan(π - φ)(Z_c - Ż)
Y = sin(θ) tan(π - φ)(Z_c - Ż)
Z = Ż
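
As a hedged illustration of the spherical-projection equations above, the sketch below maps a ray direction (θ, φ) to floor coordinates; variable names follow the slide, the origin is assumed at the camera's nadir, and the reference height Ż defaults to the floor plane.

```python
import math

def spherical_to_floor(theta, phi, z_camera, z_ref=0.0):
    """World (X, Y, Z) where the camera ray (longitude theta, latitude phi)
    meets the horizontal plane Z = z_ref, per the slide's equations."""
    reach = math.tan(math.pi - phi) * (z_camera - z_ref)  # horizontal distance along the ray
    return math.cos(theta) * reach, math.sin(theta) * reach, z_ref

# Example with assumed values: camera 3 m above the floor
print(spherical_to_floor(math.radians(45), math.radians(120), z_camera=3.0))
```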

13 Tracking
Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame. Apply Markov chain Monte Carlo (MCMC) to estimate the next state.
[Diagram: previous state x_{t-1}, current state x_t, observation z_t]
Jump-diffuse moves: add body, delete body, recover deleted, change size, move.

14 Tracking
The location of each pedestrian is estimated probabilistically based on:
– the current image
– the previous state of the system
– physical constraints
The goal of the tracking system is to find the candidate state x' (a set of bodies along with their parameters) that, given the last known state x, best fits the current observation z:
P(x' | z, x) = L(z | x') · P(x' | x)
where L(z | x') is the observation likelihood and P(x' | x) is the state prior probability.

15 Tracking: Priors
Constraints on the body parameters:
– N(h_μ, h_σ²) and N(w_μ, w_σ²): body height and width
– U_R(x) and U_R(y): body coordinates are weighted uniformly within the rectangular region R of the floor map
– N(μ_door, σ_door): distance to the closest door (for new bodies)
Temporal continuity:
– d(w_t, w_{t-1}) and d(h_t, h_{t-1}): variation from the previous size
– d(x_t, x'_{t-1}) and d(y_t, y'_{t-1}): variation from the Kalman-predicted position

16 Tracking Likelihoods: Distance Weight Plane
Problem: blob trackers ignore the blob's position in 3D (see Zhao and Nevatia, CVPR 2004).
Solution: employ a "distance weight plane" D_xy = |P_xyz, C_xyz|, where P and C are the world coordinates of the camera and of the reference point, respectively.
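
A simple sketch of how such a distance weight plane could be built, assuming a calibrated backprojection callable pixel_to_world (for example, the perspective or spherical model from the camera-modeling slide); the function name and the per-pixel loop are illustrative, not the thesis code.

```python
import numpy as np

def distance_weight_plane(height, width, pixel_to_world, camera_xyz):
    """For every pixel (x, y), store the Euclidean distance between the camera
    and the world point that pixel sees (the reference point on the floor)."""
    cam = np.asarray(camera_xyz, dtype=float)
    D = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            D[y, x] = np.linalg.norm(np.asarray(pixel_to_world(x, y), dtype=float) - cam)
    return D
```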

17 Tracking Likelihoods: Z-buffer
0 = background, 1 = farthest body, 2 = next closest body, etc.
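
A hedged sketch of how such a z-buffer could be rasterized from tracked bodies; bodies are drawn as axis-aligned rectangles purely for illustration (the thesis uses its own body model), painted far-to-near so that closer bodies overwrite farther ones.

```python
import numpy as np

def render_zbuffer(shape, bodies):
    """bodies: list of dicts with 'bbox' = (x0, y0, x1, y1) in image coordinates
    and 'depth' = distance from the camera.
    Returns an integer map: 0 = background, 1 = farthest body, 2 = next closest, ..."""
    z = np.zeros(shape, dtype=np.int32)
    order = sorted(range(len(bodies)), key=lambda i: bodies[i]['depth'], reverse=True)
    for label, i in enumerate(order, start=1):       # far-to-near painter's order
        x0, y0, x1, y1 = bodies[i]['bbox']
        z[y0:y1, x0:x1] = label
    return z
```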

18 Tracking Likelihoods: Color Histogram
The z-buffer (Z) and the distance weight plane (D) make it possible to evaluate a multiple-body configuration in one computationally efficient step.
Let I be the set of all blob pixels and O the set of body pixels.
The color observation likelihood is based on the Bhattacharyya distance between the candidate and observed color histograms.
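
For reference, a small sketch of the Bhattacharyya distance between two color histograms; the bin count and normalization in the example are assumptions, not the thesis's exact settings.

```python
import numpy as np

def bhattacharyya_distance(hist_p, hist_q, eps=1e-12):
    """Bhattacharyya distance between two histograms (normalized internally)."""
    p = np.asarray(hist_p, dtype=float); p /= p.sum() + eps
    q = np.asarray(hist_q, dtype=float); q /= q.sum() + eps
    bc = np.sum(np.sqrt(p * q))           # Bhattacharyya coefficient in [0, 1]
    return np.sqrt(max(0.0, 1.0 - bc))    # common distance form sqrt(1 - BC)

# Example: two 8-bin hue histograms (toy data)
print(bhattacharyya_distance([5, 9, 2, 0, 0, 1, 3, 4], [4, 8, 3, 1, 0, 0, 2, 5]))
```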

19 Tracking: Anisotropic Weighted Mean Shift
[Figure: classic mean shift vs. our anisotropic weighted mean shift, frames t-1 and t]
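
The sketch below runs one plain weighted mean-shift search over foreground pixels, using a per-pixel weight image (for example, the distance weight plane masked by the z-buffer); the anisotropic kernel and the obstacle model from the thesis are omitted, so this is only a simplified stand-in.

```python
import numpy as np

def weighted_mean_shift(fg_mask, weights, start_xy, bandwidth=20, iters=10, tol=0.5):
    """Shift a window center toward the weighted centroid of foreground pixels."""
    ys, xs = np.nonzero(fg_mask)
    w_all = weights[ys, xs]
    cx, cy = map(float, start_xy)
    for _ in range(iters):
        inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= bandwidth ** 2
        if not inside.any():
            break                                    # no support under the kernel
        new_cx = np.average(xs[inside], weights=w_all[inside])
        new_cy = np.average(ys[inside], weights=w_all[inside])
        if np.hypot(new_cx - cx, new_cy - cy) < tol:
            return new_cx, new_cy                    # converged
        cx, cy = new_cx, new_cy
    return cx, cy
```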

20

21 Actors and Events
Shopper groups are formed by individual shoppers who shop together for some amount of time:
– more than a fleeting crossing of paths
– dwelling together
– splitting and reuniting after a period of time

22 Swarming
Shopper groups are detected based on the "swarming" idea in reverse:
– Swarming is used in graphics to generate flocking behaviour in animations.
– Simple rules define flocking behaviour:
  – avoid collisions with the neighbors
  – maintain a fixed distance from the neighbors
  – coordinate the velocity vector with the neighbors

23 Tracking Customer Groups
We treat customers as swarming agents acting according to simple rules (e.g., stay together with swarm members).
[Figure: detected customer groups]

24 Terminology
Actors: shoppers (bodies detected in tracking), each represented as (x, y, id).
Swarming events: short-time activity sequences of multiple agents interacting with each other.
– They could be fleeting (crossing paths).
– Later analysis sorts this out and ignores chance encounters.

25 Swarming
The actors that best fit this model signal a swarming event.
Multiple swarming events are further clustered with fuzzy weights to identify shoppers that stay in the same group over long periods.

26 Event Detection
Two actors come sufficiently close according to a distance measure built from:
– the relative position p_i = (x_i, y_i) of actor i on the floor
– the body orientation α_i
– the dwelling state δ_i = {T, F}
The distance between two agents is a linear combination of co-location, co-ordination, and co-dwelling (see the sketch below).
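
A hedged sketch of such a pairwise actor distance as a linear combination of the three terms; the weights and the exact per-term forms are placeholders, not the thesis's definitions.

```python
import math

def actor_distance(a, b, w=(1.0, 0.5, 0.5)):
    """Distance between actors a and b, each a dict with 'pos' = (x, y),
    'alpha' = body orientation in radians, and 'dwell' = bool.
    Smaller values mean the pair is more likely to form an event."""
    co_location = math.dist(a['pos'], b['pos'])
    # smallest angular difference between the two orientations, in [0, pi]
    co_ordination = abs((a['alpha'] - b['alpha'] + math.pi) % (2 * math.pi) - math.pi)
    co_dwelling = 0.0 if a['dwell'] == b['dwell'] else 1.0
    return w[0] * co_location + w[1] * co_ordination + w[2] * co_dwelling
```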

27 Event Detection
Perform agglomerative clustering of actors a into clusters C (a simplified loop is sketched below):
– Initialize: N singleton clusters
– Do: merge the two closest clusters
– While not: the validity index I reaches its maximum
The validity index I consists of an isolation term I_ni and a compactness term I_nc.
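
A simplified sketch of that clustering loop; the validity callable is left abstract (standing in for the isolation/compactness index), and average linkage between clusters is an assumption.

```python
def agglomerative_events(actors, dist, validity):
    """Greedily merge the two closest clusters, keeping the clustering with the
    best validity score. dist: pairwise actor distance (e.g. actor_distance above);
    validity: callable scoring a list of clusters (higher is better)."""
    clusters = [[a] for a in actors]                       # N singleton clusters
    best_score, best = validity(clusters), [c[:] for c in clusters]
    while len(clusters) > 1:
        # average-linkage distance for every cluster pair
        pairs = [(sum(dist(a, b) for a in ci for b in cj) / (len(ci) * len(cj)), i, j)
                 for i, ci in enumerate(clusters) for j, cj in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        clusters[i] += clusters[j]
        del clusters[j]
        score = validity(clusters)
        if score > best_score:
            best_score, best = score, [c[:] for c in clusters]
    return best
```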

28 Event Detection
[Figure: clustering progress over iterations, with the final events marked]

29 Activity Detection
Shopper group detection is accomplished by clustering the short-term events over long time periods.
– Events can be separated in time but still belong to the same shopper group if the actors are the same (the first term of the event similarity measure).

30 Activity Detection
Higher-level activities (shopper groups) are detected using these events as building blocks over longer time periods.
Some definitions:
– B_ei = {b ∈ e_i}: the set of all bodies taking part in an event e_i
– τ_ei and τ_ej: the average times at which events e_i and e_j happen

31 Activity Detection
Define a measure of similarity between two events with two terms: the overlap between the two sets of actors and their separation in time.
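
The formula itself is not reproduced on the slide, so the sketch below is only an assumed form: a Jaccard-style overlap of the two actor sets B_ei and B_ej combined with an exponential decay in the gap between the average event times τ_ei and τ_ej.

```python
import math

def event_similarity(bodies_i, bodies_j, tau_i, tau_j, time_scale=30.0):
    """Assumed similarity between events e_i and e_j.
    bodies_*: sets of body ids; tau_*: average event times (seconds)."""
    overlap = len(bodies_i & bodies_j) / max(1, len(bodies_i | bodies_j))
    time_term = math.exp(-abs(tau_i - tau_j) / time_scale)   # decays with separation in time
    return overlap * time_term

# Example: two events sharing actors 5 and 6, occurring 40 seconds apart
print(event_similarity({5, 6, 10}, {5, 6}, tau_i=120.0, tau_j=160.0))
```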

32 Activity Detection
Perform fuzzy agglomerative clustering: minimize an objective function in which the w_ij are fuzzy weights.
ρ(.) is the loss function from robust statistics and ψ(.) is the weight function; both are asymmetric variants of Tukey's biweight estimators.
– Adaptively keep only the strong fuzzy clusters
– Label the remaining clusters as activities
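
For reference, a sketch of the standard (symmetric) Tukey biweight loss ρ and weight ψ functions; the asymmetric variants used in the thesis are not reproduced here.

```python
def tukey_biweight_loss(r, c=4.685):
    """Tukey's biweight loss rho(r): roughly quadratic near zero, constant beyond c."""
    if abs(r) >= c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)

def tukey_biweight_weight(r, c=4.685):
    """Corresponding weight function: down-weights large residuals to zero."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2
```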

33 Results: Swarming Activities Detected in Space-Time
– Dot location: average event location
– Dot size: validity
– Dots of the same color: belong to the same activity

34 Group Detection Results

35 Quantitative Results

36 Tracking
Sequence | Frames | People | People missed | False hits | Identity switches
1 | 1054 | 15 | 3 | 1 | 3
2 | 601 | 8 | 0 | 0 | 0
3 | 1700 | 16 | 5 | 1 | 2
4 | 1506 | 3 | 0 | 0 | 0
5 | 2031 | 2 | 0 | 0 | 0
6 | 1652 | 4 | 0 | 0 | 0
Total (%) | 8544 | 48 | 12.5% | 4.1% | 10.4%

37 Group Detection
Sequence | Groups | P+ | P− | Partial
1 | 20 | 0 | 7 | 0
2 | 17 | 1 | 3 | 1
3 |  | 0 | 7 | 0
Total | 54 | 1 | 12 | 2
Percent | 100 | 1.8 | 22.2 | 3.7
Groups: ground truth (manually determined). P+: false positives. P−: false negatives (groups missed). Partial: partially identified groups (at least 2 people in the group correctly identified).

38 Qualitative Assessments
– Longer paths provide better group detection (p-value << 1)
– Two-person groups are the easiest to detect
– Simple one-step clustering of trajectories is not sufficient for long-term group detection
– Employee tracks pose a significant problem and have to be excluded
– Several groups were missed by the operator in the initial ground truth; after inspection of the results, the system caught groups missed by the human expert

39 Contributions
– Background subtraction based on a codebook model (RGB + thermal)
– Introduced a head candidate selection method based on the VPP histogram
– Resolved track initialization ambiguity and non-unique body-blob correspondence
– Informed jump-diffuse transitions in the MCMC tracker
– Weight plane and z-buffer improve likelihood estimation
– Anisotropic mean shift with an obstacle model
– Two-layer formal framework for high-level activity detection
– Implemented robust fuzzy clustering to group events into activities

40 Future Work
– Improved tracking (via feature points)
– Demographic analysis
– Focus of attention
– Sensor fusion
– Other types of swarming activities

41 Questions? Thank you!

42 Measuring Focus of Attention
By exploiting a number of visual cues, such as walking speed, shoulder position, and facial color, we approximate the angle of the customer's gaze and orientation (the angle α used in event detection).
[Figure: purple cones indicate gaze direction (not very reliable)]

43 Background Subtraction

44 Tracking: Accepting the State
– x' and x: the candidate and current states
– P(x): the stationary distribution of the Markov chain
– m_t: the proposal distribution
The candidate state x' is drawn with probability m_t(x' | x) and then accepted with probability α(x, x').
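
A minimal sketch of the standard Metropolis-Hastings accept/reject step using the quantities above; the target density and the proposal are passed in as abstract callables rather than the thesis's body-configuration model.

```python
import math
import random

def metropolis_hastings_step(x, log_target, propose, log_proposal_density):
    """Draw x' ~ m_t(.|x) and accept it with probability
    alpha(x, x') = min(1, P(x') m_t(x|x') / (P(x) m_t(x'|x)))."""
    x_new = propose(x)
    log_alpha = (log_target(x_new) + log_proposal_density(x, x_new)
                 - log_target(x) - log_proposal_density(x_new, x))
    if log_alpha >= 0 or random.random() < math.exp(log_alpha):
        return x_new      # accept the candidate state
    return x              # reject: keep the current state
```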

45 Related Work: Background Modelling
– Horprasert, Harwood, and Davis, "A statistical approach for real-time robust background subtraction and shadow detection", ICCV 1999
– Stauffer and Grimson, "Adaptive background mixture models for real-time tracking", CVPR 1999
– Kim, Chalidabhongse, Harwood, and Davis, "Background modeling and subtraction by codebook construction", ICIP 2004
– Wren, Azarbayejani, Darrell, and Pentland, "Pfinder: Real-time tracking of the human body", PAMI 1997
– Cucchiara, Grana, Piccardi, and Prati, "Detecting moving objects, ghosts, and shadows in video streams", PAMI 2003

46 Experiments
We ran the algorithm on three one-hour videos collected in a real retail store with the panoramic camera:
– typical store traffic on three different days, recorded between 4 PM and 5 PM
Ground truth: shopper groups marked in the videos by a human marketing expert.
– The total number of customers appearing in the scene is 254.
– The total number of groups is 50; 9 groups consist of 3 or more people.

47 Activity Detection
Both weighting functions are sigmoids that saturate beyond certain distances.
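
A hedged illustration of one such saturating sigmoid weight; the midpoint and steepness values are arbitrary placeholders.

```python
import math

def saturating_sigmoid(distance, midpoint=2.0, steepness=3.0):
    """Maps a distance to (0, 1); values well beyond `midpoint` saturate near 1,
    so very large distances stop adding extra penalty."""
    return 1.0 / (1.0 + math.exp(-steepness * (distance - midpoint)))
```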

48 Background Modelling
Two stacks of codeword values (codebooks) per pixel: a color codebook (μ_RGB, I_low, I_hi) and a thermal codebook (t_low, t_high).

49

50 Tracking: Jump-Diffuse Transitions
– Add a new body
– Delete a body
– Recover a recently deleted body
– Change body dimensions
– Change body position (optimized with mean shift)

51 Shortcomings
– No mean shift for the spherical model
– Ad hoc thresholds in several places:
  – body mutation weights
  – fuzzy clustering termination criteria
  – number of MCMC iterations
– Imprecise obstacle model
– Not real-time (a problem for some security applications)

