Ankit Gupta CSE 590V: Vision Seminar. Goal Articulated pose estimation ( ) recovers the pose of an articulated object which consists of joints and rigid.

Slides:

Advertisements

Similar presentations

Using Strong Shape Priors for Multiview Reconstruction Yunda SunPushmeet Kohli Mathieu BrayPhilip HS Torr Department of Computing Oxford Brookes University.

Advertisements

POSE–CUT Simultaneous Segmentation and 3D Pose Estimation of Humans using Dynamic Graph Cuts Mathieu Bray Pushmeet Kohli Philip H.S. Torr Department of.

Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický 1 Phil Torr 2 Andrew Zisserman 1 1 University of Oxford 2 Oxford.

Mean-Field Theory and Its Applications In Computer Vision1 1.

1 Hierarchical Part-Based Human Body Pose Estimation * Ramanan Navaratnam * Arasanathan Thayananthan Prof. Phil Torr * Prof. Roberto Cipolla * University.

Indoor Segmentation and Support Inference from RGBD Images Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus.

Attributes for Classifier Feedback Amar Parkash and Devi Parikh.

Presenter: Duan Tran (Part of slides are from Pedro’s)

Learning Shared Body Plans Ian Endres University of Illinois work with Derek Hoiem, Vivek Srikumar and Ming-Wei Chang.

The chains model for detecting parts by their context Leonid Karlinsky, Michael Dinerstein, Daniel Harari, and Shimon Ullman.

Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV.

Articulated People Detection and Pose Estimation: Reshaping the Future

Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Part Based Models Andrew Harp. Part Based Models Physical arrangement of features.

Víctor Ponce Miguel Reyes Xavier Baró Mario Gorga Sergio Escalera Two-level GMM Clustering of Human Poses for Automatic Human Behavior Analysis Departament.

Ľubor Ladický1 Phil Torr2 Andrew Zisserman1

Learning to Combine Bottom-Up and Top-Down Segmentation Anat Levin and Yair Weiss School of CS&Eng, The Hebrew University of Jerusalem, Israel.

- Recovering Human Body Configurations: Combining Segmentation and Recognition (CVPR’04) Greg Mori, Xiaofeng Ren, Alexei A. Efros and Jitendra Malik -

Learning to estimate human pose with data driven belief propagation Gang Hua, Ming-Hsuan Yang, Ying Wu CVPR 05.

Mixture of trees model: Face Detection, Pose Estimation and Landmark Localization Presenter: Zhang Li.

Face Detection, Pose Estimation, and Landmark Localization in the Wild

Internet Vision - Lecture 3 Tamara Berg Sept 10. New Lecture Time Mondays 10:00am-12:30pm in 2311 Monday (9/15) we will have a general Computer Vision.

Activity Recognition Computer Vision CS 143, Brown James Hays 11/21/11 With slides by Derek Hoiem and Kristen Grauman.

Bangpeng Yao and Li Fei-Fei

Steerable Part Models Hamed Pirsiavash and Deva Ramanan

Articulated Pose Estimation with Flexible Mixtures of Parts

Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct

Yuanlu Xu Advisor: Prof. Liang Lin Person Re-identification by Matching Compositional Template with Cluster Sampling.

Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,

2.Our Framework 2.1. Enforcing Temporal Consistency by Post Processing  Human Detection from Yang and Ramanan [1] Articulated Pose Estimation using Flexible.

Robust Object Tracking via Sparsity-based Collaborative Model

2D Human Pose Estimation in TV Shows Vittorio Ferrari Manuel Marin Andrew Zisserman Dagstuhl Seminar July 2008.

Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.

Model: Parts and Structure. History of Idea Fischler & Elschlager 1973 Yuille ‘91 Brunelli & Poggio ‘93 Lades, v.d. Malsburg et al. ‘93 Cootes, Lanitis,

Computer Vision Group University of California Berkeley Shape Matching and Object Recognition using Shape Contexts Jitendra Malik U.C. Berkeley (joint.

Cascaded Models for Articulated Pose Estimation

Parsing Human Motion with Stretchable Models

Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.

Good morning, everyone, thank you for coming to my presentation.

Object Recognition: History and Overview Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce.

Lecture 17: Parts-based models and context CS6670: Computer Vision Noah Snavely.

What, Where & How Many? Combining Object Detectors and CRFs

Proxemics Recognition

A Scale and Rotation Invariant Approach to Tracking Human Body Part Regions in Videos Yihang BoHao Jiang Institute of Automation, CAS Boston College.

Efficient Algorithms for Matching Pedro Felzenszwalb Trevor Darrell Yann LeCun Alex Berg.

Multiscale Symmetric Part Detection and Grouping Alex Levinshtein, Sven Dickinson, University of Toronto and Cristian Sminchisescu, University of Bonn.

Yao, B., and Fei-fei, L. IEEE Transactions on PAMI(2012)

Object Detection with Discriminatively Trained Part Based Models

Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.

Efficient Matching of Pictorial Structures By Pedro Felzenszwalb and Daniel Huttenlocher Presented by John Winn.

Efficient Discriminative Learning of Parts-based Models M. Pawan Kumar Andrew Zisserman Philip Torr

Human Re-identification by Matching Compositional Template with Cluster Sampling Yuanlu Xu 1, Liang Lin 1, Wei-Shi Zheng 1, Xiaobai Liu 2 Abstract This.

Layered Object Detection for Multi-Class Image Segmentation UC Irvine Yi Yang Sam Hallman Deva Ramanan Charless Fowlkes.

Grouplet: A Structured Image Representation for Recognizing Human and Object Interactions Bangpeng Yao and Li Fei-Fei Computer Science Department, Stanford.

O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.

Discussion of Pictorial Structures Pedro Felzenszwalb Daniel Huttenlocher Sicily Workshop September, 2006.

Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.

Probabilistic Inference Lecture 2 M. Pawan Kumar Slides available online

Tracking Hands with Distance Transforms Dave Bargeron Noah Snavely.

Zhuode Liu 2016/2/13 University of Texas at Austin CS 381V: Visual Recognition Discovering the Spatial Extent of Relative Attributes Xiao and Lee, ICCV.

1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.

Learning Mid-Level Features For Recognition

Recognizing Deformable Shapes

ISOMAP TRACKING WITH PARTICLE FILTERING

Bilinear Classifiers for Visual Recognition

Learning to Combine Bottom-Up and Top-Down Segmentation

“The Truth About Cats And Dogs”

Brief Review of Recognition + Context

Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu

High-Level Vision Object Recognition II.

Presentation transcript:

Ankit Gupta CSE 590V: Vision Seminar

Goal Articulated pose estimation ( ) recovers the pose of an articulated object which consists of joints and rigid parts Slide taken from authors, Yang et al.

Classic Approach Part Representation Head, Torso, Arm, Leg Location, Rotation, Scale Marr & Nishihara 1978 Slide taken from authors, Yang et al.

Classic Approach Part Representation Head, Torso, Arm, Leg Location, Rotation, Scale Fischler & Elschlager 1973 Felzenszwalb & Huttenlocher 2005 Marr & Nishihara 1978 Pictorial Structure Unary Templates Pairwise Springs Slide taken from authors, Yang et al.

Classic Approach Part Representation Head, Torso, Arm, Leg Location, Rotation, Scale Fischler & Elschlager 1973 Felzenszwalb & Huttenlocher 2005 Marr & Nishihara 1978 Andriluka etc Eichner etc Johnson & Everingham 2010 Sapp etc Singh etc Tran & Forsyth 2010 Epshteian & Ullman 2007 Ferrari etc Lan & Huttenlocher 2005 Ramanan 2007 Sigal & Black 2006 Wang & Mori 2008 Pictorial Structure Unary Templates Pairwise Springs Slide taken from authors, Yang et al.

Pictorial Structures for Object Recognition Pedro F. Felzenszwalb, Daniel P. Huttenlocher IJCV, 2005

Pictorial structure for Face

Pictorial Structure Model Slide taken from authors, Yang et al.

Pictorial Structure Model Slide taken from authors, Yang et al.

Pictorial Structure Model Slide taken from authors, Yang et al.

Using this Model

Test phaseTrain phase

Given: - Images (I) - Known locations of the parts (L) Need to learn - Unary templates - Spatial features Test phaseTrain phase

Given: - Images (I) - Known locations of the parts (L) Need to learn - Unary templates - Spatial features Test phaseTrain phase Standard Structural SVM formulation - Standard solvers available (SVMStruct)

Given: - Images (I) - Known locations of the parts (L) Need to learn - Unary templates - Spatial features Test phaseTrain phase Standard Structural SVM formulation - Standard solvers available (SVMStruct) Given: - Image (I) Need to compute - Part locations (L) Algorithm - L* = arg max (S(I,L))

Given: - Images (I) - Known locations of the parts (L) Need to learn - Unary templates - Spatial features Test phaseTrain phase Standard Structural SVM formulation - Standard solvers available (SVMStruct) Given: - Image (I) Need to compute - Part locations (L) Algorithm - L* = arg max (S(I,L)) Standard inference problem - For tree graphs, can be exactly computed using belief propagation

Yi Yang & Deva Ramanan University of California, Irvine

Problems with previous methods: Wide Variations In-plane rotationForeshortening ScalingOut-of-plane rotation Intra-category variationAspect ratio Slide taken from authors, Yang et al.

Problems with previous methods: Wide Variations In-plane rotationForeshortening ScalingOut-of-plane rotation Intra-category variationAspect ratio Naïve brute-force evaluation is expensive Slide taken from authors, Yang et al.

Our Method – “Mini-Parts” Key idea: “mini part” model can approximate deformations Slide taken from authors, Yang et al.

Example: Arm Approximation Slide taken from authors, Yang et al.

Pictorial Structure Model Slide taken from authors, Yang et al.

The Flexible Mixture Model Slide taken from authors, Yang et al.

Our Flexible Mixture Model Slide taken from authors, Yang et al.

Our Flexible Mixture Model Slide taken from authors, Yang et al.

Our Flexible Mixture Model Slide taken from authors, Yang et al.

Co-occurrence “Bias” Slide taken from authors, Yang et al.

Co-occurrence “Bias”: Example Let part i : eyes, mixture m i = {open, closed} part j : mouth,mixture m j = {smile, frown}

Co-occurrence “Bias”: Example b (closed eyes, smiling mouth) b (open eyes, smiling mouth) vs Let part i : eyes, mixture m i = {open, closed} part j : mouth,mixture m j = {smile, frown}

Co-occurrence “Bias”: Example b (closed eyes, smiling mouth) b (open eyes, smiling mouth) < learnt Let part i : eyes, mixture m i = {open, closed} part j : mouth,mixture m j = {smile, frown}

Using this Model

Test phaseTrain phase

Given: - Images (I) - Known locations of the parts (L) Need to learn - Unary templates - Spatial features - Co-occurrence Test phaseTrain phase Standard Structural SVM formulation - Standard solvers available (SVMStruct)

Given: - Images (I) - Known locations of the parts (L) Need to learn - Unary templates - Spatial features - Co-occurrence Test phaseTrain phase Standard Structural SVM formulation - Standard solvers available (SVMStruct) Given: - Image (I) Need to compute - Part locations (L) - Part mixtures (M) Algorithm - (L*,M*) = arg max (S(I,L,M)) Standard inference problem - For tree graphs, can be exactly computed using belief propagation

Results

Achieving articulation Slide taken from authors, Yang et al.

Achieving articulation Slide taken from authors, Yang et al.

Qualitative Results

Failure cases To be put

Benchmark Datasets PARSE Full-body nan/papers/parse/index.html BUFFY Upper-body gg/data/stickmen/index.html Slide taken from authors, Yang et al.

Quantitative Results on PARSE 1 second per image Image Parse Testset MethodHeadTorsoU. LegsL. LegsU. ArmsL. ArmsTotal Ramanan Andrikluka Johnson Singh Johnson Our Model % of correctly localized limbs Slide taken from authors, Yang et al.

Quantitative Results on BUFFY All previous work use explicitly articulated models Subset of Buffy Testset MethodHeadTorsoU. ArmsL. ArmsTotal Tran Andrikluka Eichner Sapp 2010a Sapp 2010b Our Model % of correctly localized limbs Slide taken from authors, Yang et al.

More Parts and Mixtures Help 14 parts (joints)27 parts (joints + midpoints) Slide taken from authors, Yang et al.

Discussion Possible limitations? Something other than human pose estimation? Can do more useful things with the Kinect? Can this encode occlusions well?

References Code and benchmark datasets available