Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.

Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching
Zhe Lin, Member, IEEE, and Larry S. Davis, Fellow, IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2010

Overview
Introduction
Previous Work
Proposed Approach
– Hierarchical Part-Template Matching
– Pose-Adaptive Descriptors
– Combining With Calibration And Background Subtraction
Experimental Results
Conclusion


Introduction
Robust human tracking and identification depend heavily on reliable human detection and segmentation.
Detection remains challenging because of variation in body posture, illumination, occlusion, and viewpoint.
Goal: develop a robust and efficient approach to human detection and segmentation.
Method: shape-based part-template matching.


Previous Work
Shape feature extraction schemes
– Model human shapes globally [1],[2],[3]
– Model shapes using sparse local features [9],[10],[11]
Learning perspective
– Generative approach: tree-based data structures [6],[7],[8]
– Discriminative approach: SVMs as the test classifiers [3]
Surveillance scenarios
– Motion blob information [35],[36]


Proposed Approach
A hierarchical part-template matching approach combined with discriminative learning.


Hierarchical Part-Template Matching
Generating the part-template tree model
– Synthesizing global shape models
– Generating parts by decomposition
– Constructing an initial tree model using parts
Learning the part-template tree
Hierarchical part-template matching

Synthesizing Global Shape Models
The articulation of the human body is analyzed in terms of six regions: head, torso, a pair of upper legs, and a pair of lower legs.
The parameters of these six regions are quantized into {3,2,3,3,3,3} values, respectively.
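The size of the resulting model set follows directly from the quantization. The sketch below enumerates the index tuples, assuming every combination of quantized parameter values yields one global shape model; the name `LEVELS` and the tuple layout are illustrative, not taken from the paper.

```python
from itertools import product

# Number of quantized values per region, in the order given on the slide.
LEVELS = (3, 2, 3, 3, 3, 3)

# One index tuple per combination of quantized region parameters
# (assumption: each combination corresponds to one global shape model).
shape_indices = list(product(*(range(n) for n in LEVELS)))

print(len(shape_indices))  # 3 * 2 * 3 * 3 * 3 * 3 = 486
```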

Generating Parts by Decomposition
Binarize (a) to obtain (b), then extract the boundaries of the silhouettes to get (c).
Each silhouette is decomposed into three parts: head-torso, upper legs, and lower legs.
The silhouette parameters are denoted by θ_j, each consisting of an index and a location.

Constructing an Initial Tree Model Using Parts
A part-template tree is constructed by placing the decomposed part regions or fragments into a tree.
The four layers L_0 to L_3 denote the root, head-torso, upper legs, and lower legs, respectively.
The tree consists of 186 part-templates: 6 head-torso (ht) models, 18 upper-leg (ul) models, and 162 lower-leg (ll) models.
A much larger set of templates improves performance only slightly.
A fast hierarchical shape matching scheme is applied.


Learning the Part-Template Tree
The initial tree does not contain any prior statistics from real human silhouettes.
Learning is performed by matching the tree to a set of real human silhouette images.
The goal is to explicitly estimate the branching probability distributions (conditional probability distributions).

Learning the Part-Template Tree
Learning method:
– Each training silhouette is passed through the tree from the root to estimate matching scores and find its optimal path.
– Based on the set of paths, a branching probability distribution is estimated for each node.
– Each node contains a binary image of the part-template, its sample point coordinates, and a branching probability.
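The second learning step can be sketched as a counting procedure. The slides do not state which estimator is used, so the relative-frequency estimate below is an assumption, and the node names in `paths` are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_branching(paths):
    """Estimate P(child | parent) from the optimal paths of training silhouettes.

    Each path is a sequence of node ids from the root to a leaf.
    Assumption: probabilities are plain relative frequencies of observed branches.
    """
    counts = defaultdict(Counter)
    for path in paths:
        for parent, child in zip(path, path[1:]):
            counts[parent][child] += 1
    return {
        parent: {child: n / sum(children.values()) for child, n in children.items()}
        for parent, children in counts.items()
    }

# Hypothetical optimal paths found for four training silhouettes.
paths = [
    ("root", "ht0", "ul1", "ll3"),
    ("root", "ht0", "ul1", "ll4"),
    ("root", "ht0", "ul2", "ll7"),
    ("root", "ht1", "ul5", "ll9"),
]
probs = estimate_branching(paths)
print(probs["root"])  # {'ht0': 0.75, 'ht1': 0.25}
```

With real data, branches never taken by any training silhouette would get probability zero under this estimate; whether the paper smooths such cases is not stated in the slides.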

Hierarchical Part-Template Matching
The matching model is similar to the one used for tree learning.
The overall matching score for a detection window is modeled as the sum of the scores of all nodes along the path.
The score of a node is the product of its part-template matching score and the probability of the node.
The matching method is similar to Chamfer matching [6].
– The matching score of a sample point on the contour is measured by edge-orientation matching, and these scores are used to find the optimal human pose.
[6] D.M. Gavrila and V. Philomin, “Real-Time Object Detection for SMART Vehicles,” Proc. IEEE
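The path score described above can be illustrated with a small tree search. This is a minimal sketch, not the paper's fast matching scheme: it searches all root-to-leaf paths exhaustively, and the `Node` class, the node names, and the numeric scores are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    prob: float = 1.0  # branching probability of this node
    children: List["Node"] = field(default_factory=list)

def best_path(node, match_score):
    """Return (score, path) maximizing the sum of prob * match_score along a root-to-leaf path."""
    own = node.prob * match_score(node.name)
    if not node.children:
        return own, [node.name]
    score, path = max(
        (best_path(child, match_score) for child in node.children),
        key=lambda result: result[0],
    )
    return own + score, [node.name] + path

# Hypothetical two-layer tree and part-template matching scores for one window.
tree = Node("root", 1.0, [
    Node("ht0", 0.75, [Node("ul1", 0.5), Node("ul2", 0.5)]),
    Node("ht1", 0.25, [Node("ul3", 1.0)]),
])
scores = {"root": 0.0, "ht0": 0.8, "ht1": 0.9, "ul1": 0.4, "ul2": 0.6, "ul3": 0.7}

score, path = best_path(tree, scores.get)
print(path)  # ['root', 'ht1', 'ul3']
```

In this example the less probable head-torso branch wins because its child matches strongly, which shows how the template score and the branching probability trade off in the summed path score.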


Pose-Adaptive Descriptors
A pose-adaptive feature computation method is introduced for detecting humans in images using an SVM.
Candidate detection windows are obtained by a method similar to that of the HOG descriptor [3].
Given a candidate detection window, hierarchical part-template matching is performed to estimate the optimal pose.
After the pose is estimated, the block features closest to each pose contour point are collected.
[3] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf.


Low-Level Features
Similar to [3].
Given an image, calculate the gradient magnitudes |G| and edge orientations O.
Divide the image into nonoverlapping 8x8 cells, each represented by a histogram of edge orientations.
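The low-level feature computation can be sketched with NumPy. Several details here are assumptions in the spirit of HOG [3] rather than facts from the slides: nine unsigned orientation bins over [0, π), votes weighted by gradient magnitude, and central-difference gradients. The function name `cell_histograms` is hypothetical.

```python
import numpy as np

def cell_histograms(image, cell=8, bins=9):
    """Per-cell histograms of unsigned edge orientation, weighted by gradient magnitude."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)                 # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)                # |G|
    orientation = np.arctan2(gy, gx) % np.pi    # O, folded into [0, pi)
    bin_idx = np.minimum((orientation / (np.pi / bins)).astype(int), bins - 1)

    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            block = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(
                bin_idx[block].ravel(),
                weights=magnitude[block].ravel(),
                minlength=bins,
            )
    return hist

# A horizontal intensity ramp: unit gradient along x, so every vote lands in bin 0.
ramp = np.tile(np.arange(16.0), (16, 1))
hist = cell_histograms(ramp)
print(hist.shape)  # (2, 2, 9)
```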

Pose Inference on the Low-Level Features
An optimal tree path is estimated based on the matching score.
Within the matching score, the part-template score is measured by an average of gradient magnitudes.
The matching score is given by (1), where B(t) = [O(t)/(π/9)] and h is the orientation histogram.
The average score of the part-template is given by (2).
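Equations (1) and (2) are not reproduced in this transcript, so the following is only one plausible reading, not the paper's formula. It assumes [.] in B(t) means floor, that the score of a sample point is the histogram value h in the point's own orientation bin, and that the part-template score is the plain mean over sample points. The `samples` layout is hypothetical.

```python
import math

N_BINS = 9  # bin width pi/9, as in B(t) = [O(t) / (pi/9)]

def orientation_bin(o):
    """B(t): index of the bin containing orientation O(t); [.] is read as floor (assumption)."""
    return min(int((o % math.pi) / (math.pi / N_BINS)), N_BINS - 1)

def part_template_score(samples, cell_hist):
    """Mean over sample points of the histogram value in each point's orientation bin.

    samples: (cell_row, cell_col, orientation) per template sample point (hypothetical layout).
    cell_hist: nested sequences indexed as [row][col][bin].
    """
    scores = [cell_hist[r][c][orientation_bin(o)] for r, c, o in samples]
    return sum(scores) / len(scores)

# One cell whose histogram has mass in bins 0 and 4 only.
h = [0.0] * N_BINS
h[0], h[4] = 4.0, 2.0
cell_hist = [[h]]
samples = [(0, 0, 0.0), (0, 0, math.pi / 2)]
print(part_template_score(samples, cell_hist))  # 3.0
```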

Representation Using Pose-Adaptive Descriptors
The global shape models are represented as a set of boundary points with corresponding edge orientations.


Scene-to-Camera Calibration
To obtain a mapping between head points and foot points in the image, estimate the homography between the head plane and the foot plane in the image.
The head point is then p_h = f(p_f), where p_f is an arbitrary foot point.
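Applying such a mapping to a single foot point can be sketched as follows, assuming f is represented by a 3x3 homography matrix acting on homogeneous image coordinates. The matrix `H` below is a made-up example (a 100-pixel upward shift); a real one would come from the calibration step.

```python
import numpy as np

def foot_to_head(H, p_foot):
    """Map a foot point (x, y) to its head point with a 3x3 homography H."""
    x, y, w = H @ np.array([p_foot[0], p_foot[1], 1.0])
    return np.array([x / w, y / w])  # back from homogeneous coordinates

# Hypothetical homography: the head lies 100 pixels above the foot everywhere.
H = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, -100.0],
    [0.0, 0.0, 1.0],
])
p_head = foot_to_head(H, (320.0, 400.0))
print(p_head)  # [320. 300.]
```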

Combining With Background Subtraction
Find the foot regions R_foot = {x | ϒ_x ≥ ξ}.
Use part-template matching to find regions that may be legs.
Given the estimated human vertical axis v_x and an adaptive rectangular window W(x, (w_0, h_0)), obtain the human detection.
Obtain the human segmentation.
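The foot-region definition is a simple threshold test. The slides do not define the per-location quantity ϒ_x, so the sketch below treats it as an already computed score map; `score_map` and its values are hypothetical.

```python
import numpy as np

def foot_candidates(score_map, xi):
    """Return the (row, col) locations x whose score is at least the threshold xi."""
    return np.argwhere(score_map >= xi)

# Hypothetical 2x2 score map and threshold.
score_map = np.array([
    [0.1, 0.7],
    [0.9, 0.3],
])
r_foot = foot_candidates(score_map, xi=0.5)
print(r_foot.tolist())  # [[0, 1], [1, 0]]
```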



Experimental Results
Results are presented for the human detector on two public pedestrian data sets (INRIA and MIT-CBCL).
Results are presented for the multiple occluded human detector on three crowded image and video data sets.
The approach is compared with other approaches using DET curves.

Experiment of Detection Result

The method performs better than HOG-SVM.
It not only detects humans but also segments human poses.
Performance can be further improved because the model can be extended to cover more poses or articulations.
It successfully detected difficult poses that the HOG-based detector missed.


Experiment of Segmentation Result
Using the pose model and the probabilistic hierarchical part-template matching algorithm gives very accurate segmentation on the MIT-CBCL and INRIA data sets.

Experiment Without Subtraction

Experiment With Subtraction
Data sets
– CAVIAR benchmark data set
– Munich Airport data set collected by Siemens Corporate Research
Good results can be obtained even with poor and inaccurate background subtraction.



Conclusion
A hierarchical part-template matching approach is employed to match human shapes with images, detecting and segmenting humans simultaneously.
Many of the misdetections are due to pose estimation failures.
Future work
– Investigate adding color and texture statistics to the local contextual descriptor to improve detection and segmentation performance.