Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV.

Slides:

Advertisements

Similar presentations

POSE–CUT Simultaneous Segmentation and 3D Pose Estimation of Humans using Dynamic Graph Cuts Mathieu Bray Pushmeet Kohli Philip H.S. Torr Department of.

Advertisements

O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.

Human Identity Recognition in Aerial Images Omar Oreifej Ramin Mehran Mubarak Shah CVPR 2010, June Computer Vision Lab of UCF.

- Recovering Human Body Configurations: Combining Segmentation and Recognition (CVPR’04) Greg Mori, Xiaofeng Ren, Alexei A. Efros and Jitendra Malik -

Stereo Many slides adapted from Steve Seitz. Binocular stereo Given a calibrated binocular stereo pair, fuse it to produce a depth image Where does the.

Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.

Parsing Clothing in Fashion Photographs

Learning to estimate human pose with data driven belief propagation Gang Hua, Ming-Hsuan Yang, Ying Wu CVPR 05.

Mixture of trees model: Face Detection, Pose Estimation and Landmark Localization Presenter: Zhang Li.

3/5/2002Phillip Saltzman Video Motion Capture Christoph Bregler Jitendra Malik UC Berkley 1997.

Robust Object Tracking via Sparsity-based Collaborative Model

1 P. Arbelaez, M. Maire, C. Fowlkes, J. Malik. Contour Detection and Hierarchical image Segmentation. IEEE Trans. on PAMI, Student: Hsin-Min Cheng.

Lecture 8: Stereo.

Ghunhui Gu, Joseph J. Lim, Pablo Arbeláez, Jitendra Malik University of California at Berkeley Berkeley, CA

Contour Based Approaches for Visual Object Recognition Jamie Shotton University of Cambridge Joint work with Roberto Cipolla, Andrew Blake.

Biased Normalized Cuts 1 Subhransu Maji and Jithndra Malik University of California, Berkeley IEEE Conference on Computer Vision and Pattern Recognition.

Detecting Pedestrians by Learning Shapelet Features

More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.

São Paulo Advanced School of Computing (SP-ASC’10). São Paulo, Brazil, July 12-17, 2010 Looking at People Using Partial Least Squares William Robson Schwartz.

Student: Yao-Sheng Wang Advisor: Prof. Sheng-Jyh Wang ARTICULATED HUMAN DETECTION 1 Department of Electronics Engineering National Chiao Tung University.

Models for Scene Understanding – Global Energy models and a Style-Parameterized boosting algorithm (StyP-Boost) Jonathan Warrell, 1 Simon Prince, 2 Philip.

Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.

Robust Higher Order Potentials For Enforcing Label Consistency

Stereo & Iterative Graph-Cuts Alex Rav-Acha Vision Course Hebrew University.

Object Detection using Histograms of Oriented Gradients

The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects By John Winn & Jamie Shotton CVPR 2006 presented by Tomasz.

Stereo Computation using Iterative Graph-Cuts

Perceptual Organization: Segmentation and Optical Flow.

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 11, NOVEMBER 2011 Qian Zhang, King Ngi Ngan Department of Electronic Engineering, the Chinese university.

Cue Integration in Figure/Ground Labeling Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley We present a model of edge and region grouping.

What, Where & How Many? Combining Object Detectors and CRFs

Segmentation and tracking of the upper body model from range data with applications in hand gesture recognition Navin Goel Intel Corporation Department.

Manhattan-world Stereo Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.

Automatic User Interaction Correction via Multi-label Graph-cuts Antonio Hernández-Vela, Carlos Primo and Sergio Escalera Workshop on Human Interaction.

Generic object detection with deformable part-based models

Graph-based Segmentation

The Three R’s of Vision Jitendra Malik.

Autonomous Learning of Object Models on Mobile Robots Xiang Li Ph.D. student supervised by Dr. Mohan Sridharan Stochastic Estimation and Autonomous Robotics.

Structure from images. Calibration Review: Pinhole Camera.

Mutual Information-based Stereo Matching Combined with SIFT Descriptor in Log-chromaticity Color Space Yong Seok Heo, Kyoung Mu Lee, and Sang Uk Lee.

Watch, Listen and Learn Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney -Pratiksha Shah.

A General Framework for Tracking Multiple People from a Moving Camera

MRFs and Segmentation with Graph Cuts Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 02/24/10.

Graph Cut Algorithms for Binocular Stereo with Occlusions

Object Stereo- Joint Stereo Matching and Object Segmentation Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on Michael Bleyer Vienna.

Object Detection with Discriminatively Trained Part Based Models

Stereo Many slides adapted from Steve Seitz.

Supervised Learning of Edges and Object Boundaries Piotr Dollár Zhuowen Tu Serge Belongie.

CS 4487/6587 Algorithms for Image Analysis

Tracking People by Learning Their Appearance Deva Ramanan David A. Forsuth Andrew Zisserman.

Layered Object Detection for Multi-Class Image Segmentation UC Irvine Yi Yang Sam Hallman Deva Ramanan Charless Fowlkes.

O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.

Gaussian Mixture Models and Expectation-Maximization Algorithm.

Non-Ideal Iris Segmentation Using Graph Cuts

Solving for Stereo Correspondence Many slides drawn from Lana Lazebnik, UIUC.

Object Recognition by Integrating Multiple Image Segmentations Caroline Pantofaru, Cordelia Schmid, Martial Hebert ECCV 2008 E.

A global approach Finding correspondence between a pair of epipolar lines for all pixels simultaneously Local method: no guarantee we will have one to.

More sliding window detection: Discriminative part-based models

Photoconsistency constraint C2 q C1 p l = 2 l = 3 Depth labels If this 3D point is visible in both cameras, pixels p and q should have similar intensities.

Strong Supervision From Weak Annotation Interactive Training of Deformable Part Models ICCV /05/23.

Energy minimization Another global approach to improve quality of correspondences Assumption: disparities vary (mostly) smoothly Minimize energy function:

CSE 185 Introduction to Computer Vision Stereo 2.

Object detection with deformable part-based models

Nonparametric Semantic Segmentation

Counting In High Density Crowd Videos

Counting in High-Density Crowd Videos

PRAKASH CHOCKALINGAM, NALIN PRADEEP, AND STAN BIRCHFIELD

Automatic User Interaction Correction via Multi-label Graph-cuts

“Traditional” image segmentation

Presentation transcript:

Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV 2013 Presented by Yao Lu

Motivation We already have person detectors Also quite good pose detectors Challenges: Pixel-wise segmentation of multiple people Partial occlusion and depth ordering of people. Depth from stereo video is much noisier.

Target Supervised pixel-wise segmentation and pose estimation of multiple people Input: disparity RGB camera video Dataset: from 2 stereoscopic movies

General framework Given an image, use Felzenswalb’s deformable part model to locate bounding box of human For each bounding box detected, run an articulated pose detector. [Y. Yang and D. Ramanan, CVPR 2011] Do pixel-wise segmentation using different cues.

Model for pixel-wise segmentation Label {0, 1, …, L} for each pixel i. L denotes background Cost of assigning a pixel : pose parameters : disparity parameters

Model : Unary term. Cost of pixel i taking label xi : Spatial smoothness cost of assigning label xi and xj to neighboring pixels i and j. : Temporal smoothness cost. Goal: Estimate pose parameters first. Then:

Estimation of Ɵ To simplify the optimization of the whole cost function Articulated pose mask Step 1. Obtain candidate bounding boxes of people using Felzenswalb’s part-based person detector, with HOG feature on grayscale image and disparity maps. Step 2. Estimate pose within each bounding box using a pose detector [Y. Yang and D. Ramanan, CVPR 2011]

Estimation of Ɵ (cont) After step 1+2, 18 body parts for each person are located. 10 annotated joints, head, neck, 2 shoulders, 2 elbows, 2 wrists, 2 hips Each part is characterized by a set of mixtures. An average mask is obtained for each mixture component from pixel-wise annotation The value at pixel i: frequency of that pixel belongs to the person Just limit the detection to the upper body Then given an estimated pose at test time

Model : Unary term. Cost of pixel i taking label xi : Spatial smoothness cost of assigning label xi and xj to neighboring pixels i and j. : Temporal smoothness cost. Estimate pose parameters first. Then:

Unary term Occlusion-based unary costs: : likelihood of pixel i taking label l. Use a depth-ordering term Sufficient confidence for label l Low evidence for other labels m

Unary term Define: : Articulated pose mask already described. : Disparity potential to encode depth ordering Set for the background pixels.

Model : Unary term. Cost of pixel i taking label xi : Spatial smoothness cost of assigning label xi and xj to neighboring pixels i and j. : Temporal smoothness cost. Estimate pose parameters first. Then:

Spatial smoothness cost Spatial smoothness cost of assigning xi, xj to pixel i, j d: disparity value v: motion vector pb: Pb feature Pb feature: difference of brightness, color and texture gradients histograms (oriented) P. Arbelaez, M. Maire, C. Fowlkes, J. Malik. Contour detection and hierarchical image segmentation. PAMI 2011 Motion vector describes the 2D transformation from a frame to the next. Use block matching.

Temporal smoothness cost Define temporal smoothness cost as difference of Pb features between pixel i and k connected temporally by the motion vector vi Put temporal and spatial smoothness cost together we get:

Inference To infer Step 1. Estimate optimal disparity parameters for all the people in a frame Approximate by only using unary term. Disparity is inversely related to depth and determine front-to-back order of people. Using this cue to limit searching set. Step 2. Compute with fixed. for each pixel using α-expansion algorithm. Y. Boykov, O. Veksler, R. Zabih. Fast approximate energy minimization via graph cuts. PAMI 2001.

Experiments