Human Upper Body Pose Recognition Using Adaboost Template for Natural Human Robot Interaction
Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu
Institute for Infocomm Research (I2R), Singapore
Outline
Introduction
Related Works
The Method: The Problem; Template Modeling; Adaboost Template; Recognition & Segmentation
Experiments and Evaluations
Conclusions
Introduction
Motivations: Upper body pose is an important cue to human social behavior in natural conversation, especially when multiple persons are involved. A social robot has to be aware of various cues from the human body for intelligent and natural human-robot interaction. In our social robots, upper body pose is one of the cues used for attention estimation and engagement management (direction, distance, motion state, upper body pose, face pose, gaze, etc.).
Difficulties for approaches on 2D images: pose ambiguity due to lost depth information and self-occlusion; limited view of the human body during face-to-face interaction; variations in human shape, scale, clothing, and pose; complexity of visual features due to lighting conditions, cluttered backgrounds, and crowded scenes.
Related Works
Human body pose recognition in computer vision: 2D silhouette-based approaches (e.g., Gavrila and Philomin, ICCV’09; Mittal et al., IEEE AVSS’03; Dimitrijevic et al., ICCV Workshop’05); 2D pictorial models (e.g., Ju et al., FG’96; Felzenszwalb and Huttenlocher, IJCV 2005; Andriluka et al., CVPR’09; Ferrari et al., CVPR’09); 3D structure models (e.g., Taylor, CVIU 2000; Lee and Cohen, ECCV’04).
Template matching: deformable template matching (e.g., Cootes, Edwards, and Taylor, Active Appearance Models); object tracking (Yilmaz et al., ACM CS 2006, survey); face detection (Yang et al., IEEE T-PAMI 2002, survey); image registration (Zitova and Flusser, IVC 2003, survey).
Adaboost learning in vision: face detection (Viola and Jones, CVPR’01, cascade classifiers); multi-view face detection (Huang et al., IEEE T-PAMI 2007, vector boosting algorithm); multiclass object detection with shared features (Torralba et al., CVPR’04, joint boosting algorithm); online tracking (e.g., Avidan, “Ensemble Tracking,” IEEE T-PAMI 2007).
Method: The Problem
Problem formulation: classify upper body poses into seven categories, namely views of 0°, ±30°, ±60°, and ±90° relative to the camera.
Challenges: the depth measures from disparity images are not accurate; intra-class variations due to differences in human size, shape, pose, and clothing; inter-class variations due to the person's position relative to the camera; incomplete disparity measurements on the body due to the lack of texture features.
Method: Template Modeling
Learning the basic templates: for each category, learn the mean template, the variance template, and the percentage template.
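The slide's formulas are not available here, so the following is only a minimal sketch of how the three basic templates could be computed for one category. It assumes each training sample is an equally sized grid of normalized disparity values with `None` marking pixels that have no valid measurement, and it assumes the percentage template is the fraction of samples with a valid value at each pixel. The function name `learn_templates` is hypothetical.

```python
def learn_templates(samples):
    """Per-pixel mean, variance and percentage templates for one category.

    samples: list of equally sized 2D lists; None marks a missing value.
    Assumption: "percentage" = share of samples with a valid value at a pixel.
    """
    rows, cols = len(samples[0]), len(samples[0][0])
    mean = [[0.0] * cols for _ in range(rows)]
    var = [[0.0] * cols for _ in range(rows)]
    pct = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the valid measurements observed at this pixel.
            vals = [s[r][c] for s in samples if s[r][c] is not None]
            pct[r][c] = len(vals) / len(samples)
            if vals:
                m = sum(vals) / len(vals)
                mean[r][c] = m
                var[r][c] = sum((v - m) ** 2 for v in vals) / len(vals)
    return mean, var, pct


# Two toy 1x2 samples; the second pixel is missing in the first sample.
mean, var, pct = learn_templates([[[1.0, None]], [[3.0, 2.0]]])
```

Pixels that are never observed keep a mean and variance of zero in this sketch; a real implementation would likely mark them as undefined instead.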
Method: Adaboost Template
Definition of positive and negative regions (R+ and R−)
Design of weak classifiers
Method: Learning
Adaboost learning algorithm
Given N_c training samples for category c.
Initialize:
For t = 1, …, T:
  For each pixel x in the template: compute the error with respect to the distribution D_t
  Choose
  Tune the template boundary
  Update the distribution
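The loop on this slide follows the general shape of boosting: evaluate one candidate per pixel under the current distribution, pick the best, then reweight the samples. The sketch below is textbook discrete AdaBoost with one thresholded weak classifier per pixel, not the authors' procedure: it uses labeled samples (+1 / -1), whereas the conclusions state that the Adaboost template needs no negative samples, and it omits the boundary-tuning step. The names `adaboost_pixels` and `predict` and the fixed `threshold` are assumptions made for illustration.

```python
import math


def adaboost_pixels(samples, labels, rounds, threshold=0.5):
    """Discrete AdaBoost where each weak classifier tests a single pixel.

    samples: list of flat pixel lists; labels: +1 or -1 per sample.
    Returns a list of (pixel_index, polarity, alpha) triples.
    """
    n = len(samples)
    weights = [1.0 / n] * n  # D_1: uniform distribution over samples
    model = []
    for _ in range(rounds):
        best = None
        for x in range(len(samples[0])):
            for polarity in (1, -1):
                preds = [polarity if s[x] >= threshold else -polarity
                         for s in samples]
                # Weighted error with respect to the current distribution D_t.
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, x, polarity, preds)
        err, x, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((x, polarity, alpha))
        # Update the distribution: misclassified samples gain weight.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, sample, threshold=0.5):
    """Sign of the alpha-weighted vote of the selected pixel classifiers."""
    score = sum(alpha * (pol if sample[x] >= threshold else -pol)
                for x, pol, alpha in model)
    return 1 if score >= 0 else -1


# Toy data: pixel 0 alone separates the two classes.
toy_samples = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_labels = [1, 1, -1, -1]
toy_model = adaboost_pixels(toy_samples, toy_labels, rounds=2)
```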
Method: Recognition & Segmentation
Adaptive model-driven segmentation: estimate the quality level of the disparity measurements; adaptively compensate for the missing disparity measurements.
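The two segmentation steps can be illustrated with a small sketch. The slides do not give the quality measure or the adaptive rule, so this version makes two assumptions: quality is the fraction of pixels where the body is expected (percentage template at or above a threshold) that actually carry a disparity value, and missing pixels in that region are filled with the mean template. The names `quality_level`, `compensate` and `body_thresh` are hypothetical, and the fixed threshold stands in for whatever adaptive rule the authors use.

```python
def quality_level(disparity, pct, body_thresh=0.5):
    """Fraction of expected-body pixels that have a valid disparity value."""
    expected = valid = 0
    for d_row, p_row in zip(disparity, pct):
        for d, p in zip(d_row, p_row):
            if p >= body_thresh:
                expected += 1
                if d is not None:
                    valid += 1
    return valid / expected if expected else 0.0


def compensate(disparity, mean, pct, body_thresh=0.5):
    """Fill missing pixels with the template mean where the body is expected."""
    filled = []
    for d_row, m_row, p_row in zip(disparity, mean, pct):
        filled.append([
            m if d is None and p >= body_thresh else d
            for d, m, p in zip(d_row, m_row, p_row)
        ])
    return filled


# Toy 2x2 example: three pixels are expected to lie on the body.
disparity = [[1.0, None], [None, None]]
mean = [[1.0, 2.0], [3.0, 4.0]]
pct = [[1.0, 0.9], [0.8, 0.1]]
quality = quality_level(disparity, pct)
filled = compensate(disparity, mean, pct)
```

In the toy example only one of the three expected body pixels is measured, and the pixel with a low percentage value stays missing after compensation.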
Experiments and Evaluations
A new benchmarking data set
Camera: Videre Design STOC stereo camera.
Data set: 430 images from 19 individuals.
Training samples: 93 images of 8 persons, randomly selected from the data set: 28 for the 0° view, 13 for +30°, 10 for -30°, 11 for +60°, 10 for -60°, 11 for +90°, and 10 for -90°.
Baseline algorithm: 3D surface template matching (Breitenstein et al., “Real-Time Face Pose Estimation from Single Range Images”, CVPR’08).
Distance: Let T(x) be a normalized input sample
Recognition
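The baseline's distance and recognition formulas are not reproduced in these slides. As a rough illustration of nearest-template recognition in general, the sketch below assumes a mean squared difference over the pixels that are valid in the input sample and picks the category with the smallest distance. This is a stand-in, not the distance used by the baseline; `template_distance` and `recognize` are hypothetical names.

```python
def template_distance(sample, template):
    """Mean squared difference over pixels that are valid in the sample."""
    diffs = [(s - t) ** 2
             for s_row, t_row in zip(sample, template)
             for s, t in zip(s_row, t_row)
             if s is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def recognize(sample, templates):
    """Return the category label whose template is closest to the sample."""
    return min(templates,
               key=lambda label: template_distance(sample, templates[label]))


# Toy templates keyed by view angle in degrees; None marks a missing pixel.
templates = {0: [[1.0, 1.0]], 30: [[0.0, 2.0]]}
label = recognize([[0.1, None]], templates)
```

A sample with no valid pixels gets an infinite distance to every template, so callers may want to reject such inputs before calling `recognize`.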
Experiments and Evaluations
Results on recognition: on average, the accuracy rate increased from 67.4% to 90.7%.
Template Matching
-90° -60° -30° 0° +30° +60° +90°
100% 60% 37.1% 2.9% 2.6% 31.6% 65.8% 1.75% 14.3% 78.6% 3.6% 61.0% 9.7% 29.3% 27.8% 72.2%
Adaboost Template
-90° -60° -30° 0° +30° +60° +90°
100% 3.4% 88% 6.9% 1.7% 3.3% 81.7% 15% 9.5% 87.3% 3.2% 5.2% 77.6% 17.2% 98.3%
Experiments and Evaluations
Results on segmentation: pose recognition, quality estimation, top-down segmentation.
Application: Attention Estimation Deployed in a robot receptionist for attention estimation
Conclusions
A new approach to human upper body pose recognition for human-robot interaction.
A new template model, the Adaboost template: easy to train (no negative samples needed); achieves a good balance between the generality and the specialties of the training samples.
Handles both recognition and segmentation.
Deployed and tested on a robot receptionist for attention estimation and for managing engagement in dialogs that may involve multiple participants.
Thank You!