Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching Zhe Lin, Member, IEEE, and Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2010
Overview Introduction Previous Work Proposed Approach – Hierarchical Part-Template Matching – Pose-Adaptive Descriptors – Combining With Calibration And Background Subtraction Experiment Result Conclusion
Introduction Robust human tracking and identification depend heavily on reliable human detection and segmentation. The problem remains challenging because of variations in body posture, illumination, occlusion, and viewpoint. Goal: develop a robust and efficient approach to human detection and segmentation. Method: shape-based part-template matching.
Previous Work Shape Feature extraction schemes – Model human shapes globally [1],[2],[3] – Model shapes using sparse local features [9],[10],[11] Learning Perspective – Generative approach – tree-based data structure [6],[7],[8] – Discriminative approach – using SVMs as the test classifiers [3] Surveillance scenarios – Motion blob information [35],[36]
Proposed Approach A hierarchical part-template matching approach combined with discriminative learning.
Hierarchical Part-Template Matching Generating the part-template tree model – Synthesizing global shape models – Generating parts by decomposition – Constructing an initial tree model using parts Learning the part-template tree Hierarchical part-template matching
Synthesizing Global Shape Models The articulation of the human body is analyzed in terms of six regions – Head, torso, pair of upper legs, pair of lower legs – The parameters of these regions are quantized into {3,2,3,3,3,3} levels
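The quantization above can be illustrated by enumerating every combination of parameter levels. This is a minimal sketch, not the authors' synthesis code: it assumes the six levels map, in order, to head, torso, the two upper legs, and the two lower legs, and that each combination corresponds to one candidate global shape. The function name is hypothetical.

```python
from itertools import product

# Quantization levels per articulation parameter, as quoted on the slide.
# Assumed order: head, torso, upper leg 1, upper leg 2, lower leg 1, lower leg 2.
LEVELS = (3, 2, 3, 3, 3, 3)


def enumerate_shape_parameters(levels=LEVELS):
    """Return every combination of quantized parameter indices."""
    return list(product(*(range(n) for n in levels)))


combos = enumerate_shape_parameters()
print(len(combos))   # product of the six level counts
print(combos[:3])    # first few index tuples
```

The count printed is simply the product of the six level counts; it is arithmetic on the quoted levels rather than a figure taken from the slides.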
Generating Parts by Decomposition Binarize (a) to obtain (b), then extract the boundaries of the silhouettes to get (c). Silhouettes are decomposed into three parts (head-torso, upper legs, and lower legs). The silhouette parameters are denoted by θ_j and consist of an index and a location.
Constructing an Initial Tree Model Using Parts A part-template tree is constructed by placing each decomposed part region or fragment into a tree. Four layers, L_0 to L_3, denote the root, head-torso, upper legs, and lower legs, respectively. The tree consists of 186 part-templates (6 ht models, 18 ul models, and 162 ll models). A much larger set improves performance only slightly. A fast hierarchical shape matching scheme is applied.
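The layer counts above can be reproduced with a small tree builder. This is a sketch under a stated assumption: branching is taken to be uniform, so the factors (6, 3, 9) are inferred from the counts as 6, 18/6, and 162/18 and are not quoted from the source. The `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    layer: int   # 0 = root, 1 = head-torso, 2 = upper legs, 3 = lower legs
    index: int   # position of the node within its layer
    children: list = field(default_factory=list)


def build_tree(branching=(6, 3, 9)):
    """Build a tree with a fixed number of children per node in each layer.

    The default branching factors are an assumption inferred from the
    slide's counts (6 ht, 18 ul, 162 ll).
    """
    root = Node(0, 0)
    frontier = [root]
    for layer, k in enumerate(branching, start=1):
        next_frontier = []
        for parent in frontier:
            for _ in range(k):
                child = Node(layer, len(next_frontier))
                parent.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root


def count_per_layer(root):
    """Count nodes in each layer with an explicit stack."""
    counts = {}
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.layer] = counts.get(node.layer, 0) + 1
        stack.extend(node.children)
    return counts


tree = build_tree()
layer_counts = count_per_layer(tree)
print(layer_counts)
```

Summing layers 1 to 3 gives the total number of part-templates, since the root carries no template.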
Learning the Part-Template Tree The tree doesn’t contain any prior statistics from real human silhouettes. The learning is performed by matching the tree to a set of real human silhouette images. The goal is to explicitly estimate branching probability distributions (conditional probability distributions).
Learning the Part-Template Tree Learning method: – Each training silhouette is passed through the tree from the root to estimate the matching score and find the optimal path. – Based on the set of paths, a branching probability distribution is estimated for each node. – Each node contains a binary image of the part-template, its sample point coordinates, and a branching probability.
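The second learning step can be sketched as a relative-frequency estimate over the optimal paths. This is an assumption about the estimator: the slides say only that a distribution is estimated from the set of paths, so plain counting without smoothing is used here for illustration. Node names such as `ht0` and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_branching(paths):
    """Estimate P(child | parent) as a relative frequency.

    paths: iterable of tuples of node names from layer L1 to L3;
    the root is implicit and named "root".
    """
    counts = defaultdict(Counter)
    for path in paths:
        parent = "root"
        for node in path:
            counts[parent][node] += 1
            parent = node
    return {
        parent: {child: c / sum(ctr.values()) for child, c in ctr.items()}
        for parent, ctr in counts.items()
    }


# Hypothetical optimal paths found for four training silhouettes.
paths = [
    ("ht0", "ul0", "ll0"),
    ("ht0", "ul0", "ll1"),
    ("ht0", "ul1", "ll3"),
    ("ht1", "ul3", "ll9"),
]
probs = estimate_branching(paths)
print(probs["root"])
print(probs["ht0"])
```

Nodes never visited by a training path get no entry at all, which is one reason a practical estimator might add smoothing.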
Hierarchical Part-Template Matching The matching model is similar to the one used for tree learning. The overall matching score for a detection window is modeled as the sum of the scores of all nodes along the path. The score of a node is the product of its part-template matching score and the probability of the node. The matching method is similar to Chamfer matching [6]. – The matching score of a sample point on the contour is measured by edge-orientation matching to find the optimal human pose. [6] D.M. Gavrila and V. Philomin, “Real-Time Object Detection for SMART Vehicles,” Proc. IEEE
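The path score described above (a sum over nodes of matching score times node probability) can be sketched as follows. The search here is exhaustive for clarity, which is an assumption; the source applies a fast hierarchical matching scheme. The dictionary layout, node names, and score values are all hypothetical.

```python
def best_path(children, match_score):
    """Return (score, path) for the best root-to-leaf path.

    Each node is a dict with "name", "prob", and "children".
    A node contributes prob * match_score(name) to the path score.
    """
    best = (float("-inf"), [])
    for child in children:
        own = child["prob"] * match_score(child["name"])
        if child["children"]:
            tail_score, tail = best_path(child["children"], match_score)
        else:
            tail_score, tail = 0.0, []
        total = own + tail_score
        if total > best[0]:
            best = (total, [child["name"]] + tail)
    return best


# Hypothetical two-layer tree below the root.
tree = [
    {"name": "ht0", "prob": 0.7, "children": [
        {"name": "ul0", "prob": 0.6, "children": []},
        {"name": "ul1", "prob": 0.4, "children": []},
    ]},
    {"name": "ht1", "prob": 0.3, "children": [
        {"name": "ul2", "prob": 1.0, "children": []},
    ]},
]
# Hypothetical part-template matching scores for one detection window.
scores = {"ht0": 0.5, "ht1": 0.9, "ul0": 0.2, "ul1": 0.8, "ul2": 0.5}

score, path = best_path(tree, scores.get)
print(score, path)
```

In this toy case the less probable head-torso node wins because its own match and its child's match are both strong, which shows how the probabilities weight the matching scores without overriding them.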
Pose-Adaptive Descriptors A pose-adaptive feature computation method is introduced for detecting humans in images using an SVM. Candidate detection windows are obtained in a manner similar to the HOG descriptor [3]. Given a candidate detection window, hierarchical part-template matching is performed to estimate the optimal pose. After the pose is estimated, the block features closest to each pose contour point are collected. [3] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf.
Low-Level Features Similar to [3]. Given an image, calculate the gradient magnitudes |G| and edge orientations O. Quantize the image into 8x8 nonoverlapping cells, each represented by a histogram of edge orientations.
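The low-level feature computation can be sketched in plain Python. Several choices here are assumptions and are not stated on the slide: central-difference gradients, orientation taken as the gradient direction folded into [0, π), nine bins, and magnitude-weighted votes. The function name is hypothetical.

```python
import math


def cell_histograms(img, cell=8, bins=9):
    """Per-cell orientation histograms for a grayscale image.

    img: list of rows of floats. Returns hist[cell_row][cell_col][bin].
    """
    h, w = len(img), len(img[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)             # |G|
            if mag == 0.0:
                continue
            ori = math.atan2(gy, gx) % math.pi   # unsigned orientation O
            b = min(int(ori / (math.pi / bins)), bins - 1)
            cy, cx = y // cell, x // cell
            if cy < rows and cx < cols:
                hist[cy][cx][b] += mag           # magnitude-weighted vote
    return hist


# A 16x16 test image with a vertical step edge between columns 7 and 8.
img = [[0.0 if x < 8 else 1.0 for x in range(16)] for y in range(16)]
hist = cell_histograms(img)
print(hist[0][0])
```

For the step-edge image, all gradient energy lands in the first orientation bin, since the gradient is purely horizontal.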
Pose Inference on the Low-Level Features An optimal tree path is estimated based on the matching score. Within the matching score, the part-template score is measured as an average of gradient magnitudes. The matching score of a sample point t is given by (1), where B(t) = [O(t)/(π/9)] maps the orientation O(t) to a bin of width π/9 and h is the orientation histogram. The average score of the part-template is given by (2).
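Equations (1) and (2) are not reproduced in these slides, so the following is only one plausible reading and should not be taken as the authors' formulas. It assumes the per-point score is the histogram value h at bin B(t) in the cell containing the point, that the brackets in B(t) denote a floor, and that the part-template score is the mean over sample points. All function names are hypothetical.

```python
import math


def orientation_bin(o, bins=9):
    """B(t): orientation bin index, with [.] read as floor (an assumption)."""
    return min(int((o % math.pi) / (math.pi / bins)), bins - 1)


def part_template_score(points, cell_hist, cell=8):
    """Mean over sample points of h[B(t)] in the point's cell.

    points: list of (x, y, orientation) for the template's sample points.
    cell_hist: cell_hist[cell_row][cell_col][bin], e.g. magnitude-weighted.
    """
    total = 0.0
    for x, y, o in points:
        total += cell_hist[y // cell][x // cell][orientation_bin(o)]
    return total / len(points)


# One hypothetical 8x8 cell whose histogram has energy in bins 0 and 4.
cell_hist = [[[1.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]]]
points = [(1, 1, 0.0), (2, 2, math.pi / 2)]
avg_score = part_template_score(points, cell_hist)
print(avg_score)
```

Looking up one histogram bin per sample point keeps the cost of scoring a template proportional to its number of sample points.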
Representation Using Pose-Adaptive Descriptors The global shape models are represented as a set of boundary points with corresponding edge orientations.
Scene-to-Camera Calibration To obtain a mapping between head points and foot points in the image, estimate the homography between the head plane and the foot plane in the image. The head point is then p_h = f(p_f), where p_f is an arbitrary foot point.
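Once a homography is available, applying the mapping p_h = f(p_f) amounts to a matrix product in homogeneous coordinates followed by a division. This is a minimal sketch: the matrix `H` below holds made-up values for illustration only, and estimating it from data is outside the scope of the example.

```python
def apply_homography(H, p):
    """Map an image point p = (x, y) through a 3x3 homography H."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)


# Hypothetical foot-to-head homography (illustrative values, not calibrated).
H = [
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 10.0],
    [0.0, 0.0, 1.0],
]
p_f = (100.0, 200.0)            # a foot point in image coordinates
p_h = apply_homography(H, p_f)  # the corresponding head point
print(p_h)
```

With a mapping like this, every candidate foot pixel yields an expected head position, and therefore an expected human height in the image at that location.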
Combining With Background Subtraction Find the foot candidate regions R_foot = {x | γ_x ≥ ξ}. Use part-template matching to find regions that may be legs. Given the estimated human vertical axis v_x and an adaptive rectangular window W(x, (w_0, h_0)), obtain the human detections, then the human segmentation.
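The thresholding that defines R_foot can be sketched as below. The slide does not define the quantity being thresholded, so this example assumes it is the fraction of foreground pixels inside a window whose bottom center sits at the candidate foot pixel. It also uses a fixed window size and an upright axis, whereas the source uses an adaptive window along the estimated vertical axis. The function name is hypothetical.

```python
def foot_candidates(mask, w0, h0, xi):
    """Return pixels whose window has a foreground fraction >= xi.

    mask: binary foreground mask as a list of rows of 0/1.
    The window is w0 wide (w0 odd) and h0 tall, with its bottom
    center at the candidate pixel; windows leaving the image are skipped.
    """
    h, w = len(mask), len(mask[0])
    region = []
    for fy in range(h):
        for fx in range(w):
            x0, x1 = fx - w0 // 2, fx + w0 // 2 + 1
            y0, y1 = fy - h0 + 1, fy + 1
            if x0 < 0 or x1 > w or y0 < 0:
                continue
            fg = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
            if fg / ((x1 - x0) * (y1 - y0)) >= xi:
                region.append((fx, fy))
    return region


# A 5x6 mask with a 3x4 foreground blob (columns 1..3, rows 1..4).
mask = [[1 if 1 <= x <= 3 and 1 <= y <= 4 else 0 for x in range(5)]
        for y in range(6)]
r_foot = foot_candidates(mask, w0=3, h0=4, xi=1.0)
print(r_foot)
```

Lowering `xi` admits pixels whose windows are only partly covered by foreground, which is one way such a test can tolerate inaccurate background subtraction.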
Combining With Calibration and Background Subtraction
Experiment Result Results of the human detector are presented on two public pedestrian data sets (INRIA and MIT-CBCL). Results of the multiple occluded human detector are presented on three crowded image and video data sets. The approach is compared with other approaches using DET curves.
Experiment of Detection Result
The approach performs better than HOG-SVM. It not only detects humans but also segments human poses. It can be improved further because it can be extended to cover more poses or articulations. It successfully detected difficult poses that the HOG-based detector missed.
Experiment of Segmentation Result The pose model and the probabilistic hierarchical part-template matching algorithm give very accurate segmentation on the MIT-CBCL and INRIA data sets.
Experiment Without Subtraction
Experiment With Subtraction Data sets – Caviar Benchmark data set – Munich Airport data set collected by Siemens Corporate Research Good results are obtained even with poor and inaccurate background subtraction.
Conclusion A hierarchical part-template matching approach is employed to match human shapes with images, detecting and segmenting humans simultaneously. Many misdetections are due to pose estimation failures. Future work – Investigating the addition of color and texture statistics to the local contextual descriptor to improve detection and segmentation performance.