Humanising GrabCut: Learning to segment humans using the Kinect. Varun Gulshan, Victor Lempitsky and Andrew Zisserman, Dept. of Engineering Science, University of Oxford, UK.

Presentation transcript:

Humanising GrabCut: Learning to segment humans using the Kinect
Varun Gulshan, Victor Lempitsky and Andrew Zisserman
Dept. of Engineering Science, University of Oxford, UK

1. Introduction

Idea: Learn to segment humans in RGB images, using a large dataset of RGB-D images during training.

Segmentation pipeline: original image, bounding box detection, top-down segmentation, bottom-up refinement.

Dataset acquisition: RGB image, Kinect scene labels, cleaned-up ground truth.

To train the top-down segmentation, a large training corpus that captures variations in human poses, clothing and backgrounds is captured using the Kinect. OpenNI libraries are used for segmenting humans from depth images and for registration with the RGB camera. The dataset is limited to indoor locations, as the Kinect works indoors only.

Dataset: 3386 segmented humans in total (1930 for training and 1456 for testing). 10 human subjects across 4 indoor locations for training, and 6 human subjects across 4 indoor locations for testing. Human subjects and locations are disjoint between the train and test sets. Available at: (URL missing from the transcript)

[Figure: Sample images and segmentations from our dataset]

2. Top-Down Learning

The bounding box around the object is divided into dense overlapping local regions, and independent per-pixel classifiers are trained to predict the label of every pixel within the local region.

[Figure: local HOG, local image, local ground truth mask]

Non-linear mapping via sparse coding [2]: Local regions are described using HOG descriptors h_i, which are sparsely coded onto a dictionary to give our feature vector x_i. The sparse coding toolbox of [2] is used to learn a dictionary with 2500 elements. A linear classifier is trained to predict the segmentation mask at each location, given the sparsely coded local HOG descriptor. LibLinear is used for training.

Training statistics:
Dimensionality of h_i = 325, x_i = [value missing].
The local mask y_i is scaled to 40x40 pixels, so 1600 independent SVM classifiers are trained.
Each w_l consists of 2500 parameters, giving a total of 1600 x 2500 = 4 million learned parameters.
A total of 180,000 pairs (h_i, y_i) are extracted from the training set (training images are also flipped left-right to generate more data).
All 1600 SVMs are trained in approximately 1 hour in total using LibLinear.

Training spatial SVM: Separate SVMs are learnt for each vertical level (Level 1 to Level 4), to make the training task easier.

Testing: At test time, predicted segmentations in overlapping local regions are combined using majority voting.

3. Bottom up refinement: Local GrabCut

[Figure panels: top-down segmentation, local color model window, after graph-cut segmentation, unary terms]

Similar to SnapCut [4], local windows of size 61x61 pixels are slid over the image. Windows that intersect the segmentation boundary are used to estimate color models. Unary terms from overlapping local windows are averaged.

Energy function terms: a unary term obtained using local color models, and a unary term penalising deviations from the top-down segmentation.

Repeat 4 times.

4. Evaluation

Results using ground truth bounding boxes (overlap score with GT):

Method               Train (%)    Test (%)
Box+GC               76.5 ± …     … ± 0.6
Box+LocalGC          78.0 ± …     … ± 0.6
LinSVM               73.9 ± …     … ± 0.3
SpSVM                86.1 ± …     … ± 0.2
SpSVM+Pos            89.8 ± …     … ± 0.2
SpSVM+Pos+GC         87.3 ± …     … ± 0.3
SpSVM+Pos+LocalGC    91.8 ± …     … ± 0.2

Results using bounding box detectors of [3]:

Method               Train (%)    Test (%)
SpSVM+Pos+LocalGC    80.6 ± …     … ± 0.6
Ladicky [1]          70.4 ± …     … ± 0.5
Ladicky+Detection    71.4 ± …     … ± 0.5

[Figure: Qualitative results, with columns Box+GC, SpSVM+Pos, SpSVM+Pos+LocalGC]

References:
[1] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In Proceedings, ICCV.
[2] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings, ICML, 2009.
[3] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE PAMI.
[4] X. Bai, J. Wang, D. Simons, and G. Sapiro. Video SnapCut: Robust video object cutout using localized classifiers. In Proc. ACM SIGGRAPH.
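As an illustration of the top-down stage, the sketch below shows how a sparse code x_i could be mapped to a 40x40 local mask by 1600 independent linear classifiers, and how overlapping local masks could then be merged by majority voting. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, bias terms are omitted, local masks are pasted at their 40x40 resolution without rescaling to the region size, and ties in the vote go to background.

```python
import numpy as np

MASK_SIDE = 40      # local masks are scaled to 40x40 pixels (from the poster)
FEATURE_DIM = 2500  # one weight per dictionary element (from the poster)


def predict_local_mask(W, x):
    """Apply one linear classifier per mask pixel to a sparse code x.

    W has shape (1600, 2500): row l holds the weights w_l for mask pixel l.
    Returns a 40x40 boolean mask (True = human). Bias terms are omitted
    here, which is an assumption of this sketch.
    """
    scores = W @ x
    return (scores > 0).reshape(MASK_SIDE, MASK_SIDE)


def combine_by_majority(image_shape, local_masks, top_left_corners):
    """Merge overlapping local masks by per-pixel majority voting."""
    votes_fg = np.zeros(image_shape, dtype=np.int64)
    votes_all = np.zeros(image_shape, dtype=np.int64)
    for mask, (row, col) in zip(local_masks, top_left_corners):
        h, w = mask.shape
        votes_fg[row:row + h, col:col + w] += mask
        votes_all[row:row + h, col:col + w] += 1
    # Foreground only when strictly more than half of the votes agree;
    # pixels covered by no local region stay background.
    return 2 * votes_fg > votes_all
```

The weight matrix has 1600 x 2500 = 4,000,000 entries, which matches the parameter count quoted in the training statistics.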
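The evaluation reports an overlap score against the ground truth, but the poster text does not define it. A common choice for segmentation is intersection over union, and the sketch below assumes that definition; the function name and the convention for two empty masks are illustrative choices, not taken from the source.

```python
import numpy as np


def overlap_score(pred, gt):
    """Intersection over union of two binary masks (assumed definition)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)
```

Under this definition a score of 100% means the predicted and ground truth masks coincide exactly, so the percentages in the results tables would be read as average mask agreement per segmented human.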