Describing People: A Poselet-Based Approach to Attribute Classification Lubomir Bourdev 1,2 Subhransu Maji 1 Jitendra Malik 1 1 EECS U.C. Berkeley 2 Adobe.

Presentation transcript:

Describing People: A Poselet-Based Approach to Attribute Classification
Lubomir Bourdev (1,2), Subhransu Maji (1), Jitendra Malik (1)
(1) EECS, U.C. Berkeley; (2) Adobe Systems Inc.

Goal: Extract attributes from images of people

Who has long hair?

Who has short pants?

Male or female?

Prior work on poselets and on attributes

Prior work on Poselets
Introduced by [Bourdev and Malik, ICCV09]
Detection with poselets [Bourdev et al, ECCV10]
Applications:
  Segmentation [Brox et al, ECCV10] [Maire et al, ICCV11]
  Actions [Yang et al, CVPR10] [Maji et al, CVPR11] [Yao et al, ICCV11]
  Human parsing [Wang et al, CVPR11]
  Semantic contours [Hariharan et al, ICCV11]
  Subordinate level categorization [Farrell et al, ICCV11]

Prior work on Attributes
Topics:
  Attributes as intermediate parts
  Discovering attributes from text
  Discovering attributes from images
  Attributes from motion capture
  Joint learning of classes & attributes
  Image retrieval with attributes
  Attributes and actions
  Active learning with attributes
  Attributes of people
  Gender attribute
References: [Cottrell and Metcalfe, NIPS90] [Golomb et al, NIPS90] [Moghaddam & Yang, PAMI02] [Ferrari & Zisserman, NIPS07] [Kumar et al, ECCV08] [Gallagher and Chen, CVPR08] [Cao et al, ACM08] [Lampert et al, CVPR09] [Farhadi et al, CVPR09] [Wang et al, BMVC09] [Wang and Forsyth, ICCV09] [Kumar et al, ICCV09] [Farhadi et al, CVPR10] [Berg et al, ECCV10] [Wang and Mori, ECCV10] [Sigal et al, ECCV10] [Branson et al, ECCV10] [Hwang et al, CVPR11] [Parikh and Grauman, CVPR11] [Douze et al, CVPR11] [Kovashka et al, ICCV11] [Liu et al, CVPR11] [Qiu et al, ICCV11] [Yao et al, ICCV11] [Dhar et al, CVPR11] [Parikh and Grauman, ICCV11] [Siddiquie et al, CVPR11]

Poselets for Attribute Classification

Male or female?

Gender recognition is easier if we factor out the pose

Poselets [Bourdev & Malik ICCV09]

Poselets: examples may differ visually but have common semantics

How do we train a poselet?

Finding correspondences at training time
Given part of a human pose, how do we find a similar pose configuration in the training set?
We use keypoints to annotate the joints, eyes, nose, etc. of people (for example, Left Hip and Left Shoulder).
[Figure: candidate patches shown with their residual error]

Training poselet classifiers
1. Given a seed patch
2. Find the closest patch for every other person
3. Sort them by residual error
4. Threshold them
5. Use them as positive training examples to train a linear SVM with HOG features
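Steps 2 to 4 above can be sketched as follows. The slides do not say how the residual error is computed, so this sketch assumes 2D keypoints and a least-squares similarity alignment (translation, rotation, uniform scale). The function names `alignment_residual` and `select_positives` and the `max_residual` parameter are hypothetical, and each person is represented by a single candidate keypoint set for simplicity.

```python
def alignment_residual(seed_kp, cand_kp):
    """Squared error left after aligning cand_kp to seed_kp.

    Assumption: keypoints are (x, y) pairs in matching order, and the
    alignment is a similarity transform (no reflection). Treating points as
    complex numbers makes rotation plus scale a single complex factor c.
    """
    a = [complex(x, y) for x, y in seed_kp]
    b = [complex(x, y) for x, y in cand_kp]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a = [p - mean_a for p in a]          # remove translation
    b = [p - mean_b for p in b]
    energy_b = sum(abs(q) ** 2 for q in b)
    if energy_b == 0:                    # degenerate candidate: all points coincide
        return sum(abs(p) ** 2 for p in a)
    # Least-squares optimal rotation + scale: c = <b, a> / <b, b>
    c = sum(q.conjugate() * p for p, q in zip(a, b)) / energy_b
    return sum(abs(p - c * q) ** 2 for p, q in zip(a, b))


def select_positives(seed_kp, people_kp, max_residual):
    """Rank candidates by residual error and keep those under the threshold."""
    scored = sorted(
        (alignment_residual(seed_kp, kp), i) for i, kp in enumerate(people_kp)
    )
    return [i for err, i in scored if err <= max_residual]


seed = [(0, 0), (2, 0), (0, 1)]
same_pose = [(5, 5), (5, 11), (2, 5)]    # seed rotated 90 degrees, scaled 3x, shifted
other_pose = [(0, 0), (1, 0), (5, 5)]
kept = select_positives(seed, [other_pose, same_pose], max_residual=0.5)
```

The indices in `kept` would then identify the patches used as positive examples in step 5.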

Attribute Classification Algorithm at Test Time

Goal: Extract attributes of this person

Input: target person bounds; bounds of other nearby people

Step 1: Detect poselet activations [Bourdev et al, ECCV10]

Step 2: Cluster the activations [Bourdev et al, ECCV10]

Step 3: Predict person bounds [Bourdev et al, ECCV10]

Step 4: Identify the correct cluster Max-flow in bipartite graph
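Step 4 can be illustrated with a small bipartite matching between the given person bounds and the bounds predicted from each cluster. The slide only says "max-flow in bipartite graph"; the rule for drawing edges (box overlap above `min_iou`) and all names below are assumptions. With unit capacities, max-flow on a bipartite graph reduces to maximum matching, which this sketch finds with augmenting paths.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_clusters(person_boxes, cluster_boxes, min_iou=0.5):
    """Return {person index: cluster index} for a maximum bipartite matching."""
    # Edge person -> cluster whenever the predicted bounds overlap enough.
    adj = [
        [j for j, c in enumerate(cluster_boxes) if iou(p, c) >= min_iou]
        for p in person_boxes
    ]
    owner = {}  # cluster index -> person index currently matched to it

    def augment(i, seen):
        # Try to give person i a cluster, re-routing earlier matches if needed.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or augment(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(person_boxes)):
        augment(i, set())
    return {i: j for j, i in owner.items()}


people = [(0, 0, 10, 10), (4, 0, 14, 10)]      # target person first, then a neighbor
clusters = [(2, 0, 12, 10), (0, 0, 10, 10)]
assignment = match_clusters(people, clusters)
```

Here the target person overlaps both clusters but the neighbor overlaps only the first, so the matching gives the target person the second cluster.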

Poselet Activations
Start with its poselet activations (those of the cluster identified in Step 4)

Features (computed for each poselet activation)
Pyramid HOG
LAB histogram
Skin features: hands-skin, legs-skin
[Figure: poselet patch; B .* C, the elementwise product of the skin mask and the arms mask]
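The "B .* C" product on the slide is an elementwise multiplication of two masks. A minimal sketch of turning it into a hands-skin style feature is shown below; normalizing by the part-mask area is an assumption, and `masked_fraction` is a hypothetical name.

```python
def masked_fraction(skin_mask, part_mask):
    """Fraction of the part mask covered by skin.

    Both masks are equally sized 2D lists of 0/1 values. The elementwise
    product keeps only pixels that are both skin and inside the part.
    """
    product = [
        [s * p for s, p in zip(skin_row, part_row)]
        for skin_row, part_row in zip(skin_mask, part_mask)
    ]
    part_area = sum(map(sum, part_mask))
    if part_area == 0:
        return 0.0
    return sum(map(sum, product)) / part_area


skin = [[1, 1, 0],
        [0, 1, 0]]
arms = [[1, 0, 0],
        [0, 1, 1]]
hands_skin = masked_fraction(skin, arms)  # 2 of the 3 arm pixels are skin
```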

Attribute Classification Overview
Poselet Activations
Features
Poselet-level Attribute Classifiers
Person-level Attribute Classifiers
Context-level Attribute Classifiers
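The data flow through the three classifier layers can be sketched as a stacked scoring pipeline. The slides only name the layers, so everything specific here is an assumption: the logistic form, the use of 0.0 for poselets that did not fire, and all weights and function names are purely illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def poselet_level(features, weights):
    """Layer 1: one attribute score per poselet; None means the poselet did not fire."""
    return [
        sigmoid(dot(weights[k], f)) if f is not None else 0.0
        for k, f in enumerate(features)
    ]


def person_level(poselet_scores, weights, bias=0.0):
    """Layer 2: combine the poselet-level scores into one score for the person."""
    return sigmoid(dot(weights, poselet_scores) + bias)


def context_level(all_attribute_scores, weights, bias=0.0):
    """Layer 3: rescore one attribute using the person-level scores of all attributes."""
    return sigmoid(dot(weights, all_attribute_scores) + bias)


# Toy run: two poselets (the second did not fire), two attributes.
layer1 = poselet_level([[1.0, 0.0], None], [[2.0, 0.0], [1.0, 1.0]])
long_hair = person_level(layer1, [1.5, 1.5])
female = context_level([long_hair, 0.2], [2.0, -1.0])
```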

Results

Our dataset
Source: VOC 2010 trainval for Person + H3D
~8000 annotations (~4000 train, ~4000 test)
9 binary attributes specified by 5 independent annotators via AMT
Ground truth label: assigned if 4 of the 5 annotators agree
Dataset will be made publicly available
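The ground-truth rule can be written as a small vote-aggregation function. This is a sketch under assumptions: votes are `True`, `False`, or `None` (annotator did not specify), "4 of the 5 agree" is read as at least four matching votes, and examples without agreement are left unlabeled.

```python
def ground_truth(votes, min_agree=4):
    """Return True/False if enough annotators agree, otherwise None (no label)."""
    yes = sum(1 for v in votes if v is True)
    no = sum(1 for v in votes if v is False)
    if yes >= min_agree:
        return True
    if no >= min_agree:
        return False
    return None


label_a = ground_truth([True, True, True, True, False])    # four agree on True
label_b = ground_truth([True, True, True, False, False])   # only three agree
label_c = ground_truth([False, False, None, False, False]) # four agree on False
```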

Visual search on our test set
[Figure: top results for the queries “Female”, “Wears hat”, “Has long hair”, “Wears glasses”, “Wears shorts”, “Has long sleeves”, “Doesn’t have long sleeves”]

Our baseline
Canny-modulated HOG with SPM kernel [Lazebnik et al CVPR06]
To help the baseline, we trained a separate SPM for each of four viewpoints: full view, head zoom, upper body, legs
For each attribute we pick the best SPM as our baseline

Precision/recall on our test set
[Figure: precision/recall curves per attribute. Legend: label frequency (dashed), SPM, No context, Full Model]

State-of-the-art Gender Recognition
We outperform Cognitec (top-notch face recognizer)
We outperform any gender recognizer based on frontal faces (are there others?)
61% of our test examples have frontal faces. Even with perfect classification of frontal faces, max AP=80.5% vs. our AP of 82.4%
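For reference, average precision (AP) over a ranked list can be computed as in the sketch below. The slides do not state which AP variant was used (for example, the interpolated PASCAL VOC version), so this non-interpolated form is an assumption, and the scores are made-up toy values.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at each positive in the ranking."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank   # precision at this recall point
    return precision_sum / hits if hits else 0.0


# Positives ranked 1st and 3rd: AP = (1/1 + 2/3) / 2
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```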

Confusions
Men most confused as women; women most confused as men
[Figure labels: baseball hat, long hair, hair hidden]

Confusions
Short pants most confused to be long pants; non-T-shirt most confused to be T-shirt
[Figure labels: annotation errors, “Are these pants short?”, wrong person, occlusion]

Best poselets per attribute
[Figure: best poselets for Gender, Long Hair, Wears glasses]

We can describe a picture of a person
“A woman with long hair, glasses and long pants” (??)

Conclusion

How poselets help in high-level vision
The image is a complex function of the viewpoint, pose, appearance, etc.
Poselets decouple pose and camera view from appearance

Google “poselets” to get:
The set of published poselet papers
H3D data set + Matlab tools
Java3D annotation tool + video tutorial
Matlab code to detect people using poselets
Our latest trained poselets

Describing people
“A man with short hair, glasses, short sleeves and shorts”
“A man with short hair and long sleeves”
“A person with short hair, no hat and long sleeves”
“A woman with long hair, glasses, short sleeves and long pants”
“A person with long pants”
“A computer vision professor who likes machine learning”
[One of the examples is labeled “Failure mode” on the slide.]