Marked Point Processes for Crowd Counting

Slides:

Advertisements

Similar presentations

Part 2: Unsupervised Learning

Advertisements

Applications of one-class classification

Learning deformable models Yali Amit, University of Chicago Alain Trouvé, CMLA Cachan.

Human Identity Recognition in Aerial Images Omar Oreifej Ramin Mehran Mubarak Shah CVPR 2010, June Computer Vision Lab of UCF.

Bayesian Estimation in MARK

電腦視覺 Computer and Robot Vision I Chapter2: Binary Machine Vision: Thresholding and Segmentation Instructor: Shih-Shinh Huang 1.

Learning to estimate human pose with data driven belief propagation Gang Hua, Ming-Hsuan Yang, Ying Wu CVPR 05.

Low Complexity Keypoint Recognition and Pose Estimation Vincent Lepetit.

3D Human Body Pose Estimation from Monocular Video Moin Nabi Computer Vision Group Institute for Research in Fundamental Sciences (IPM)

Semantic Structure from Motion Paper by: Sid Yingze Bao and Silvio Savarese Presentation by: Ian Lenz.

Complex Feature Recognition: A Bayesian Approach for Learning to Recognize Objects by Paul A. Viola Presented By: Emrah Ceyhan Divin Proothi Sherwin Shaidee.

Real-Time Human Pose Recognition in Parts from Single Depth Images Presented by: Mohammad A. Gowayyed.

Robust Foreground Detection in Video Using Pixel Layers Kedar A. Patwardhan, Guillermoo Sapire, and Vassilios Morellas IEEE TRANSACTION ON PATTERN ANAYLSIS.

Foreground Modeling The Shape of Things that Came Nathan Jacobs Advisor: Robert Pless Computer Science Washington University in St. Louis.

Pedestrian Detection in Crowded Scenes Dhruv Batra ECE CMU.

December 5, 2013Computer Vision Lecture 20: Hidden Markov Models/Depth 1 Stereo Vision Due to the limited resolution of images, increasing the baseline.

Bayesian Robust Principal Component Analysis Presenter: Raghu Ranganathan ECE / CMR Tennessee Technological University January 21, 2011 Reading Group (Xinghao.

Visual Recognition Tutorial

Modeling Pixel Process with Scale Invariant Local Patterns for Background Subtraction in Complex Scenes (CVPR’10) Shengcai Liao, Guoying Zhao, Vili Kellokumpu,

Segmentation and Tracking of Multiple Humans in Crowded Environments Tao Zhao, Ram Nevatia, Bo Wu IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,

1 On the Statistical Analysis of Dirty Pictures Julian Besag.

Rodent Behavior Analysis Tom Henderson Vision Based Behavior Analysis Universitaet Karlsruhe (TH) 12 November /9.

CS485/685 Computer Vision Prof. George Bebis

Object Detection and Tracking Mike Knowles 11 th January 2005

Effective Gaussian mixture learning for video background subtraction Dar-Shyang Lee, Member, IEEE.

Visual Recognition Tutorial

A Novel 2D To 3D Image Technique Based On Object- Oriented Conversion.

Computer vision: models, learning and inference Chapter 10 Graphical Models.

Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques

(1) A probability model respecting those covariance observations: Gaussian Maximum entropy probability distribution for a given covariance observation.

Radial Basis Function Networks

Super-Resolution of Remotely-Sensed Images Using a Learning-Based Approach Isabelle Bégin and Frank P. Ferrie Abstract Super-resolution addresses the problem.

Image Analysis and Markov Random Fields (MRFs) Quanren Xiong.

Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.

BraMBLe: The Bayesian Multiple-BLob Tracker By Michael Isard and John MacCormick Presented by Kristin Branson CSE 252C, Fall 2003.

A General Framework for Tracking Multiple People from a Moving Camera

Object Stereo- Joint Stereo Matching and Object Segmentation Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on Michael Bleyer Vienna.

Course 9 Texture. Definition: Texture is repeating patterns of local variations in image intensity, which is too fine to be distinguished. Texture evokes.

1 Howard Schultz, Edward M. Riseman, Frank R. Stolle Computer Science Department University of Massachusetts, USA Dong-Min Woo School of Electrical Engineering.

ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: ML and Simple Regression Bias of the ML Estimate Variance of the ML Estimate.

Markov-Chain Monte Carlo CSE586 Computer Vision II Spring 2010, Penn State Univ.

18 th August 2006 International Conference on Pattern Recognition 2006 Epipolar Geometry from Two Correspondences Michal Perďoch, Jiří Matas, Ondřej Chum.

Crowd Analysis at Mass Transit Sites Prahlad Kilambi, Osama Masound, and Nikolaos Papanikolopoulos University of Minnesota Proceedings of IEEE ITSC 2006.

Expectation-Maximization (EM) Case Studies

CVPR2013 Poster Detecting and Naming Actors in Movies using Generative Appearance Models.

A Flexible New Technique for Camera Calibration Zhengyou Zhang Sung Huh CSPS 643 Individual Presentation 1 February 25,

Computer Vision Lecture 6. Probabilistic Methods in Segmentation.

Paper Reading Dalong Du Nov.27, Papers Leon Gu and Takeo Kanade. A Generative Shape Regularization Model for Robust Face Alignment. ECCV08. Yan.

1 Tree Crown Extraction Using Marked Point Processes Guillaume Perrin Xavier Descombes – Josiane Zerubia ARIANA, joint research group CNRS/INRIA/UNSA INRIA.

Human Activity Recognition at Mid and Near Range Ram Nevatia University of Southern California Based on work of several collaborators: F. Lv, P. Natarajan,

 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.

Text From Corners: A Novel Approach to Detect Text and Caption in Videos Xu Zhao, Kai-Hsiang Lin, Yun Fu, Member, IEEE, Yuxiao Hu, Member, IEEE, Yuncai.

Presented by: Idan Aharoni

Introduction to Sampling Methods Qi Zhao Oct.27,2004.

CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov

CSC321: Introduction to Neural Networks and Machine Learning Lecture 17: Boltzmann Machines as Probabilistic Models Geoffrey Hinton.

Portable Camera-Based Assistive Text and Product Label Reading From Hand-Held Objects for Blind Persons.

Marked Point Processes for Crowd Counting

A Forest of Sensors: Using adaptive tracking to classify and monitor activities in a site Eric Grimson AI Lab, Massachusetts Institute of Technology

Ch3: Model Building through Regression

Real-Time Human Pose Recognition in Parts from Single Depth Image

Dynamical Statistical Shape Priors for Level Set Based Tracking

estimated tracklet partition

Marked Point Processes for Crowd Counting

Vehicle Segmentation and Tracking in the Presence of Occlusions

Eric Grimson, Chris Stauffer,

Marked Point Processes for Crowd Counting

PRAKASH CHOCKALINGAM, NALIN PRADEEP, AND STAN BIRCHFIELD

A Block Based MAP Segmentation for Image Compression

Outline Texture modeling - continued Markov Random Field models

Presentation transcript:

Marked Point Processes for Crowd Counting Weina Ge and Robert T. Collins CVPR 2009. Presented by, Boddeti MohanVaraKrishna M.E.(SSA)

INTRODUCTION A Bayesian marked point process (MPP) model is developed to detect and count people in crowded scenes. The model couples a spatial stochastic process governing number and placement of individuals with a conditional mark process for selecting body shape. It automatically learns the mark (shape) process from training video by estimating a mixture of Bernoulli shape prototypes along with an extrinsic shape distribution describing the orientation and scaling of these shapes for any given image location. The reversible jump Markov Chain Monte Carlo framework is used to efficiently search for the maximum a posteriori configuration of shapes, leading to an estimate of the count, location and pose of each person in the scene.

Motivation Detecting and counting people in video of a crowded scene a challenging problem. Problems like the spatial overlap between people makes it difficult to delineate individuals as connected component blobs within a background subtraction image Why Crowd Counting? to increase situational awareness for crowd control and public safely by providing real-time estimates of the number of people entering or exiting a venue.

Related Work A. Marana, L. Costa, R. Lotufo, and S. Velastin. On the efficacy of texture analysis for crowd monitoring. In Proc. Computer Graphics, Image Processing and Vision, pages 354–361, 1998. D. Kong, D. Gray, and H. Tao. A viewpoint invariant approach for crowd counting. In International Conference on Pattern Recognition, pages 1187–1190, 2006. V. Rabaud and S. Belongie. Counting crowded moving objects. In IEEE Computer Vision and Pattern Recognition,pages 705–711, 2006 T. Zhao and R. Nevatia. Bayesian human segmentation in crowded situations. In IEEE Conference on Computer Vision and Pattern Recognition, pages 459–466, June 2003

CONTRIBUTIONS A conditional mark process to model known correlations between bounding box size/orientation and image location. The mark process parameters are decomposed into extrinsic appearance (geometry) and intrinsic appearance (shape and posture), learned separately. Bayesian formulation for learning weighted Bernoulli shape masks using the EM algorithm, which gives the flexibility to model a variety of shapes within the rectangular MPP framework.

MARKED POINT PROCESSES FOR CROWD COUNTING Consider counting pedestrians in a frame of video given a foreground mask image produced by background subtraction. The goal is to determine the number and configuration of binary pedestrian shapes that best explains the foreground mask data. One can think of this process as trying to place a set of cutout shapes over the foreground mask to “cover” the foreground pixels while trying to avoid covering the background pixels. If we assume rectangular shapes, leading to the problem of finding a rectangular cover.

PRILIMINARIES A spatial point process is a stochastic process that is suitable for modeling prior knowledge on the spatial distribution of an unknown number of objects. A realization of the process consists of a random countable set of points {s1, . . . , sn} in a bounded region S Є Rd. An MPP couples a spatial point process Z with a second process defined over a “mark” space M of shapes such that a random mark mp ∊ M is associated with each point p ∊ Z.

Contd… For example, a 2D point process of rectangular marks has elements of the form si = (pi, (wi, hi, ϴi)) specifying the location, width, height and orientation of a specific rectangle in the image. In this paper, a novel marked point process is proposed that weaves the prior knowledge of the spatial pattern, extrinsic size and transformation of objects, with intrinsic geometric shape information modeled by a mixture of Bernoulli distributions. Thus, the realization of the MPP in this paper consists of an image location p defined on a bounded subset of R2, together with a mark m defining a geometric shape to place at point p.

MODEL A Bayesian approach to model the objects in the scene as a set of configurations from an MPP that incorporates prior knowledge such as expected sizes of people in the image or knowledge about image regions where people will not appear. The prior term for an object as ∏(si) , and assume independence among the objects. Priors in MPPs are typically factored so that the mark process is independent from the spatial point process, that is ∏(si) = ∏(pi) ∏(mi) . However, this common approach ignores obvious and strong correlations between the size and orientation of projected objects and their 2D image locations in views taken by a static camera.

Introduce a conditional mark process for rectangles representing the shape and orientation of a 2D bounding box, conditioned on spatial location, leading to a factored prior of the form ∏(si) = ∏(pi) ∏(wi , hi , ϴ i|pi) The prior for the point process ∏(p i) is chosen as a homogeneous Poisson point process. This means that the total number of objects follows a Poisson distribution, and given the number of objects, the locations are i.i.d. and each uniformly distributed in the bounded region.

The conditional mark process: We represent the prior for ∏(wi , hi , ϴ i|pi) as independent Gaussian distributions on the width, height and orientation of a pedestrian bounding box centered at a given image location pi. The spatially varying mean and variance parameters for each random mark are stored in lookup tables indexed by the image location.

Likelihood Formally, let y i be the binary value of pixel i in the observed foreground mask data, with 1 = foreground, 0 = background. To compute the goodness of fit of a proposed configuration of shapes to the data, a common way is to first map the configuration into a label image where pixels are labeled foreground if any of the shapes cover it, and background otherwise, so that each pixel in the label image has a one-to-one counterpart in the observed foreground mask.

Let x i be the values in the label image. Since both x i and y i are binary variables, Bernoulli distributions are used to characterize p(y i |x i ). A mixture of Bernoulli Distributions is used instead of modelling the foreground and background separately. The resulting x i in the label image is therefore no longer a binary variable, but a continuous variable ranging from [0, 1] , the mean parameter of the Bernoulli distribution p(y i |x i).

Assuming conditional independence among the pixels,the joint likelihood function can be written as

This likelihood function is biased towards MAP solutions with multiple overlapping rectangles that claim almost the same set of foreground pixels. Many authors address this problem by including a “hardcore” penalty that disallows any overlap by adding an infinite penalty when overlap occurs .This is too strict for the present application of people counting, who may overlap in the view. Another principled approach is to add pairwise interaction terms into the likelihood function to penalize the area of overlap between each pair of shapes. The later approach is followed in the paper.

ESTIMATING EXTRINSIC SHAPE PARAMETERS Orientation Estimation: Since we know pedestrians will be oriented vertically, it suffices to determine the vertical vanishing point of the scene, which completely determines the 2D image orientation of a vertical object at any pixel. Conversely, we can estimate the vertical vanishing point from the measured major axis of elliptical blobs extracted from foreground masks of walking people. Often automatically generated foreground masks are noisy, requiring robust estimation techniques.

HEIGHT AND WIDTH ESTIMATION A reasonable first-order model of many scenes can assume that people are walking or standing on a planar ground surface. The planarity assumption regularizes the computation of size, by constraining the relative depth of people in the scene as a function of image location. Simplified by considering views where the vertical vanishing point is along the y axis of the image coordinate system, which can be achieved with an in-plane image rotation. If the vanishing point is far from the image, i.e. for small tilt angles, such as from an elevated camera looking down a hallway, size in the image is dominated by depth from the camera, and is linearly proportional to row number in the image.

Learning intrinsic shape classes “soft” segmentation of shape by representing the probability of each pixel being foreground use a mixture of Bernoulli distributions to model the learned shape prototypes that are rectangular patches of spatially varying μ(xi) values, one per pixel, learned from a training dataset of observed foreground masks. High μ values implies foreground

select a random subset of frames labeled with ground truth bounding boxes then run background subtraction to get fg/bg masks, which are overlaid with the bounding boxes to extract a set of binary shape patterns, each scaled to a standard size The training samples are a set of binary variables.

X = {xi, . . . , xN} as the collection of N training shape patterns, where xi = (xi1, . . . , xiD)T (D being the size of the shape pattern). Modellin X by a mixture of Bernoulli distributions

INFERENCE To perform Bayesian inference of the best configuration of person shapes in the image, a prior term is defined for the marked point process to combine with the likelihood. Finding the mode of the resulting posterior then provides a MAP estimate over the configuration space. The decision to go to a new configuration is decided based on the Metropolis-Hastings acceptance ratio

RJMCMC based Sampling procedure Birth Proposal Death Proposal Update Proposal The RJMCMC procedure is iterated between 500 and 3000 times, with the larger number of iterations being needed when there are more people in the scene. The move probability for birth, death and update proposals are set to be 0.4, 0.2 and 0.4, respectively

Experiments The sampling procedure takes 1.6 seconds to process 500 iterations on a 720×480 frame, 9 people per frame on average, using unoptimized matlab code.

Summary Proposed a marked point process model to detect and count people in crowds. Our model captures the correlations between the mark process (i.e., bounding box size/orientation) and the spatial point process by automatically estimating an extrinsic shape mapping. The model augmented with intrinsic shape information modeled by a weighted mixture of Bernoulli distributions. The learned shape prototypes are more realistic than simple geometric shapes, which leads to more accurate foreground fitting.

THANK YOU