Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations ZUO ZHEN 27 SEP 2011.



Outline Introduction Related work Methods Experiments Conclusion and future work

Introduction The proposed poselet classifiers are directly trained to handle the visual variation associated with a common underlying semantics.

Introduction
What is a poselet? A poselet describes a particular part of the human pose under a given viewpoint. It is defined by a set of examples that are close in 3D configuration space.
Two criteria for a “good” poselet:
1. It is easy to find the poselet given the input image (the examples are tightly clustered in appearance space).
2. It is easy to localize the 3D configuration of the person conditioned on the detection of the poselet (the examples are tightly clustered in configuration space).
Contributions:
1. A new notion of part, the “poselet”, and an algorithm for selecting good poselets.
2. A novel dataset, H3D (Humans in 3D), annotated with 3D configuration information.

Related work
1. Work in the pictorial structure tradition. Disadvantage: the parts are those most natural for constructing kinematic simulations of a moving person, which may not correspond to the most salient features for visual recognition.
2. Work in the appearance-based window classification tradition. Disadvantage: not suitable for pose extraction or for localizing anatomical body parts or joints.
3. Hybrid approaches, which have stages of one type followed by a stage of the other type. Disadvantage: the parts themselves are not jointly optimized with respect to combined appearance and configuration space criteria.

Method This paper uses keypoints to annotate the joints, eyes, nose, etc. of people so that correspondences can be found at training time. [Figure: example annotation with labeled keypoints, including Left Hip and Left Shoulder.]

Method (H3D dataset)
H3D dataset:
- 2000 human annotations
- Images from Flickr with a Creative Commons Attribution license
- Annotations of 15 types of regions of a person and 19 types of keypoints

Method (H3D dataset) Why 3D and not 2D?
Criterion | 3D | 2D
Share of annotations that contribute to the statistics | Every annotation | Only frontal-view annotations
Sensitivity to foreshortening | Not strongly affected | Strongly affected
Allows decomposing the camera viewpoint | Yes | NA
Allows querying for the appearance of poselets | Yes | NA

Method (H3D dataset) Left: H3D can generate conditional region probability masks. Right: H3D can generate scatter plots of the 2D screen locations of the right elbow and left ankle given the locations of both shoulders.

Method (Finding Candidates) Define the (asymmetric) distance in configuration space from example s to example r as:
d_s(r) = \sum_i w_s(i) \, \| x_s(i) - x_r(i) \|_2^2 \, (1 + h_{s,r}(i))
where x_s(i) = [x, y, z] are the normalized 3D coordinates of the i-th keypoint of example s. The weight term w_s(i) is a Gaussian with mean at the center of the patch. The term h_{s,r}(i) is a penalty based on the visibility mismatch of keypoint i in the two examples.
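
As an illustration, the configuration-space distance can be sketched in a few lines of Python. This is a minimal sketch, assuming the distance is a sum over keypoints of a Gaussian weight times the squared 3D keypoint distance, scaled up by a visibility-mismatch penalty. The function name `config_distance`, the Gaussian width `sigma`, and the penalty value `mismatch_penalty` are hypothetical choices made for this example, not values taken from the slides.

```python
import math


def config_distance(seed, other, sigma=0.5, mismatch_penalty=1.0):
    """Asymmetric configuration-space distance from a seed example to another.

    Each example is a list of (xyz, visible) pairs, one per keypoint, with xyz
    in normalized coordinates whose origin is the seed patch center.
    sigma and mismatch_penalty are illustrative assumptions.
    """
    total = 0.0
    for (xs, vis_s), (xr, vis_r) in zip(seed, other):
        # Gaussian weight centered on the patch center (the origin), computed
        # from the SEED keypoint only; this is what makes the distance asymmetric.
        weight = math.exp(-sum(c * c for c in xs) / (2.0 * sigma ** 2))
        # Squared Euclidean distance between corresponding keypoints.
        sq_dist = sum((a - b) ** 2 for a, b in zip(xs, xr))
        # Penalty applies only when the keypoint is visible in exactly one example.
        penalty = 0.0 if vis_s == vis_r else mismatch_penalty
        total += weight * sq_dist * (1.0 + penalty)
    return total
```

Because the weight depends only on the seed's keypoints, swapping the two arguments generally gives a different value, which matches the "asymmetric" qualifier on the slide.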

Method (Generate Poselet Candidates) Example query regions (left column) and the corresponding closest matches in configuration space generated by H3D.

Method (Training Poselet classifiers)
1. Start from a seed patch.
2. Find the closest patches (search by running a scanning window over all positions and scales of all annotations).
3. Sort them by residual error.
4. Threshold them.
5. Select a small set of poselets that are individually effective and complementary.
6. Use the retained patches as positive training examples to train a linear SVM with HOG features.
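
The sort-and-threshold part of this procedure (steps 3 and 4) can be sketched as follows. This is a minimal sketch that assumes the residual error of each candidate patch against the seed has already been computed; the function name `select_positive_examples` and its parameters are hypothetical, and the HOG extraction and SVM training of step 6 are not shown.

```python
def select_positive_examples(residuals, max_residual, max_examples=None):
    """Pick candidate patches that match a seed patch closely enough.

    residuals: dict mapping a patch id to its residual error against the seed.
    max_residual: hypothetical threshold; patches above it are discarded.
    max_examples: optional cap on how many patches to keep.
    """
    # Step 3: sort candidates by residual error, best match first.
    ranked = sorted(residuals.items(), key=lambda item: item[1])
    # Step 4: keep only candidates whose error is within the threshold.
    kept = [patch_id for patch_id, error in ranked if error <= max_residual]
    return kept if max_examples is None else kept[:max_examples]
```

The returned patch ids would then serve as the positive examples for the classifier.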

Method (For Detection & Localization) The probability of detecting the object O at position x is:
P(O | x) \propto \sum_i w_i a_i(x)
where a_i(x) is the score that poselet classifier i assigns to location x and w_i is the weight of the poselet. The authors use the Max Margin Hough Transform to learn the weights.
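
A minimal sketch of this weighted voting, assuming the detection score at a location is the weighted sum of the poselet classifier scores there. The function names are hypothetical, the weights are taken as given (learning them with the Max Margin Hough Transform is not shown), and the result is an unnormalized score rather than a calibrated probability.

```python
def detection_score(poselet_scores, weights):
    """Weighted sum of per-poselet classifier scores at one location."""
    return sum(w * a for w, a in zip(weights, poselet_scores))


def best_location(scores_by_location, weights):
    """Return the candidate location with the highest combined score.

    scores_by_location: dict mapping a location (e.g. an (x, y) tuple) to the
    list of scores that each poselet classifier assigns to that location.
    """
    return max(
        scores_by_location,
        key=lambda loc: detection_score(scores_by_location[loc], weights),
    )
```

For example, with weights `[0.5, 2.0]`, a location scored `[0.2, 0.4]` beats one scored `[1.0, 0.0]`, because the second poselet is weighted more heavily.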

Experiments (1) Detecting Human Torsos ROC curves comparing the proposed torso detector with other published detectors on the H3D test set.

Experiments Examples of torso detections using poselets

Experiments (2) Detecting People on PASCAL VOC 2007 The poselet detector outperforms the part-based deformable detector on H3D but achieves only comparable performance on VOC 2007.

Experiments (3) Detecting Keypoints Detection rate of some keypoints conditioned on true positive torso detection.

Conclusion & Future Work
Conclusion: The authors propose a two-layer classification/regression model for detecting people and localizing body components. The 3D annotations guide the search for good parts.
Future work: Use H3D more widely.

Birdlets: Subordinate Categorization Using Volumetric Primitives and Pose-Normalized Appearance

Outline Introduction Related work Methods Experiments Conclusion

Introduction
Application background
Current research focuses on two extremes: individuals and basic-level categories. Little research addresses subordinate categorization.
What is subordinate categorization? Distinguishing categories by the differing properties of their parts.

Introduction Overview of the Proposed approach

Introduction
Contributions:
1. A framework for detecting volumetric part models
2. A pose-normalized appearance model for comparing part appearance
3. A classification model for aggregating information about part properties

Related work
Image features. Disadvantages: view-dependent, sensitive to pose variation
Part model. Disadvantages: high intra-class variability, significant articulation
Hierarchy model. Disadvantages: subordinate categories have both subtle and drastic appearance variation
Attribute model. Disadvantages: insufficient to model subtle differences between parts

Method Why birds?
1. The largest subordinate-level dataset (CUB-200) is a bird dataset.
2. Birds conform to the definition of subordinate-level categories (they share a common structure and parts, with many subtle part distinctions).
3. Birds involve highly variable appearances and articulations, which makes the task challenging.

Method (PNAD) Pose-normalized appearance descriptor (PNAD)
1. Map points on a unit sphere onto the ellipsoid's surface for patch sampling.
2. Project the patches on the ellipsoid surface onto the original image plane.
3. Extend the projected patches to extract SIFT descriptors.
4. Concatenate the location and appearance information to form the PNAD descriptor.
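
Steps 1 and 2 are geometric and can be sketched directly. This is a minimal sketch, assuming the ellipsoid is described by a center, three radii, and a 3x3 rotation matrix, and assuming an orthographic camera for the projection (the slides do not state the projection model). The function names are hypothetical, and the SIFT extraction of step 3 is not shown.

```python
def sphere_to_ellipsoid(unit_point, center, radii, rotation):
    """Map a point on the unit sphere onto the surface of an ellipsoid.

    unit_point: (x, y, z) with x^2 + y^2 + z^2 == 1.
    center, radii: 3-element sequences; rotation: 3x3 nested list.
    """
    # Scale each axis of the unit sphere by the ellipsoid radius along that axis.
    scaled = [r * c for r, c in zip(radii, unit_point)]
    # Rotate into the ellipsoid's 3D orientation.
    rotated = [sum(rotation[i][j] * scaled[j] for j in range(3)) for i in range(3)]
    # Translate to the ellipsoid's location.
    return [c + v for c, v in zip(center, rotated)]


def project_orthographic(point_3d):
    """Project a 3D point onto the image plane by dropping depth (an assumption)."""
    return (point_3d[0], point_3d[1])
```

Sampling the same unit-sphere points for every detected ellipsoid is what would make patch locations comparable across poses, since each sample refers to the same position on the part surface.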

Method (PNAD)

Method (Birdlet) Volumetric primitive templates
1. Two parts (head and body)
2. Two ellipsoids (parameters: location center, 3D orientation, scale)
3. Alignment (assisted by visible point features: beak tips, eyes, wing tips, feet and tails)

Method (Training & Testing)
1. Obtain selection windows for detecting objects and parts in the test image (using both positive and negative examples for the SVM classifier).
2. Obtain birdlets for integrated classification.

Method (Integrated Classification) Stacked Evidence Trees model
The Stacked Evidence Tree takes a test feature, finds a set of training features that are similar in both appearance and surface location, and returns the class label distribution across this similar set.
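
The input/output behavior described here can be illustrated with a brute-force nearest-neighbor stand-in. This is not the Stacked Evidence Tree itself: the tree structure is replaced by an exhaustive search, and the function name, the neighbor count `k`, and `location_weight` are hypothetical choices for this sketch. It only shows the idea of returning a label distribution over training features that are close in both appearance and surface location.

```python
from collections import Counter


def label_distribution(test_feature, training_set, k=3, location_weight=1.0):
    """Class label distribution over the k most similar training features.

    test_feature: (appearance_vector, surface_location) pair.
    training_set: list of (appearance_vector, surface_location, label) triples.
    """
    test_app, test_loc = test_feature

    def distance(item):
        app, loc, _label = item
        # Similarity is judged jointly on appearance and surface location.
        d_app = sum((a - b) ** 2 for a, b in zip(test_app, app))
        d_loc = sum((a - b) ** 2 for a, b in zip(test_loc, loc))
        return d_app + location_weight * d_loc

    neighbours = sorted(training_set, key=distance)[:k]
    counts = Counter(label for _app, _loc, label in neighbours)
    # Normalize the counts into a distribution over class labels.
    return {label: n / len(neighbours) for label, n in counts.items()}
```

Raising `location_weight` makes the match depend more on where the feature sits on the part surface than on how it looks.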

Experiments Classification Confusion Matrices (a) the PHOW/SVM Baseline (37.12% MAP), (b) the PNAD-RF performance on the top 20% of detections (40.25% MAP), and (c) the PNAD-RF performance on the ground truth part locations (66.58% MAP).

Experiments Example Volumetric Primitive Detections
Top two images: the bird is detected and localized with reasonable accuracy.
Bottom two images: false positive detections.

Experiments Classification of Volumetric Detections. For the k top ranked detections, this plots the corresponding PNAD-RF classification performance (using mean-average precision)

Conclusion This paper presented an approach for subordinate categorization using a pose-normalized appearance representation founded upon a volumetric part model.