Presenter: Jae Sung Park

Presentation transcript:

COMP 790-133 Human Pose Recognition
Real-Time Human Pose Recognition in Parts from Single Depth Images
Presenter: Jae Sung Park

Outline: Introduction, Data Acquisition, Body Part Inference, Joint Position Proposal, Results.
In today's talk, I am going to start with an introduction, then go over the details of the paper, and finally show some of its results.

Motivation
Human body tracking has many applications: gaming (e.g. Kinect games), telepresence, security, and more. Why not an RGB or intensity camera? Higher computational cost.
Human joint tracking can be used in many applications, gaming being one of them. The authors argue that an RGB or intensity camera is not appropriate for real-time applications because of its higher computational cost. One example of human pose recognition using an RGB camera is the Poselets algorithm [Bourdev & Malik, 2009].

Motivation: Kinect
The Kinect sensor captures color and depth images with a color sensor and a depth sensor at a frame rate of about 30 Hz; together they yield a point cloud with a position and color for each point. The roughly 30 frames per second makes the sensor well suited to real-time applications.

Goal
Propose body joint positions from a single depth image. The goal of this paper is to find the 3D joint positions from a single depth image. For training, depth images are either synthesized or captured with a real Kinect, body parts are labeled by per-pixel operations, and finally the 3D positions are proposed using the mean shift algorithm.

Outline: Introduction, Data Acquisition, Body Part Inference, Joint Position Proposal, Results.
The first step of this algorithm is data acquisition.

Need for Data Synthesis
There is a lack of real-world data, and appearance varies with clothing, hair, body shape, and camera position. It is generally difficult to build a huge dataset from a real Kinect: it would be a very tedious task for a human to label each body part in every image. Moreover, variations in clothing, hair style, body shape, and camera position all produce different-looking depth images, so to cover every aspect the dataset must be large.

Generation of Depth Images
Pipeline: CMU Mocap motions are retargeted to character models, then rendered from varied virtual camera positions to produce depth images and, via texture mapping, color renderings carrying the body-part labels.
The authors therefore synthesized depth images and body-part labels from 3D character models and motions. CMU motion capture data (CMU Mocap), in which motions are defined by human joint angles, was used for motion generation. Character models with various clothing and hair styles were retargeted to the mocap data, and different virtual camera locations were used to render the depth images and the body-part-labeled color images.

Outline: Introduction, Data Acquisition, Body Part Inference, Joint Position Proposal, Results.
Now I am going to talk about the body part inference method.

Body Part Labeling
31 body parts: 5 for the head and neck, 16 for the upper body, 10 for the lower body. The scheme can vary depending on the application.
The authors use a labeling with 31 different body parts, though the scheme can differ by application. The parts should be sufficiently small to localize the body joints, yet the number of labels should not be too large, because too many would waste the capacity of the classifier.

Depth Image Features
f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x)),  with θ = (u, v),
where f_θ is the feature function, d_I(x) is the depth value at pixel x in image I, u and v are two offsets in pixel space, and 1/d_I(x) is a normalization factor.
The feature is defined by this equation: d_I(x) is the depth value of the depth image at 2D pixel location x, u and v are 2D offset vectors in pixel space, and 1/d_I(x) is used for normalization.
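
To make the feature concrete, here is a minimal sketch in Python/NumPy of how f_θ could be evaluated on a depth map. The function names, the background sentinel value BIG_DEPTH, and the handling of out-of-bounds probes are my own assumptions for illustration, not details taken verbatim from the slides.

```python
import numpy as np

BIG_DEPTH = 1e6  # large constant for background / out-of-bounds probes (assumption)

def probe(depth, px, py):
    """Read depth at column px, row py; probes that fall outside the image
    or on background pixels (depth <= 0) return a large constant, so the
    feature responds strongly at silhouette edges."""
    h, w = depth.shape
    if px < 0 or py < 0 or px >= w or py >= h or depth[py, px] <= 0:
        return BIG_DEPTH
    return depth[py, px]

def depth_feature(depth, x, y, u, v):
    """f_theta(I, x) = d_I(x + u/d_I(x)) - d_I(x + v/d_I(x)),
    where u and v are pixel-space offset pairs and 1/d_I(x) normalizes for depth."""
    d = depth[y, x]
    ux, uy = int(round(x + u[0] / d)), int(round(y + u[1] / d))
    vx, vy = int(round(x + v[0] / d)), int(round(y + v[1] / d))
    return probe(depth, ux, uy) - probe(depth, vx, vy)
```

Note how the sketch matches the claimed cost: three pixel reads (the center plus two probes) and a handful of arithmetic operations.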

Depth Image Features: Examples
For example, the yellow cross denotes x, and the two arrows for θ1 and θ2 denote the offset vectors u and v. The feature measures the depth difference between the two circled probe points. The left figure shows large responses, because the probes measure the depth difference between the human and the background; the right figure shows small responses, because the difference between two human-body pixels, or between two background pixels, is small.

Depth Image Features: Properties
1. The factor 1/d_I(x) ensures depth invariance. One property of this feature is that it is depth invariant: to achieve this, the offsets u and v are multiplied by the normalization factor 1/d_I(x). As you can see in the figure, the pixel-space offset length changes with depth.
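
A one-line pinhole-camera argument (my own, not from the slides) makes the invariance concrete. With focal length f, a pixel offset p at depth d corresponds to a world offset of p · d / f. Substituting the scaled offset p = u / d_I(x) gives

(u / d_I(x)) · d_I(x) / f = u / f,

which is independent of depth: the feature always probes the same world-space offset from the body, no matter how far away the person stands.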

Depth Image Features: Properties
2. Translation invariant: the feature measures a depth difference and uses two offset vectors defined relative to x.
3. Computationally efficient: 3 image pixel reads and 5 arithmetic operations, with direct implementation on the GPU.
The feature is also translation invariant, first because it measures a depth difference, and second because it uses two offset vectors relative to the pixel. Most importantly, it is computationally efficient: only 3 image pixel fetches and 5 arithmetic operations, which can be implemented directly on the GPU.

Randomized Decision Forests
Each internal node has a feature θ and a threshold τ; each leaf node has a distribution over body-part labels.
Randomized decision forests are used for classification. At each internal node, evaluate the feature f_θ: if the value is less than the threshold, move to the left child, otherwise move to the right child, and repeat recursively until a leaf node is reached. The leaf node holds the probability distribution over body-part labels given pixel x.

Randomized Decision Forests: Learning
1. Pick a random subset of pixels.
2. Pick a random subset of splitting candidates φ = (θ, τ).
3. Partition the pixels into left and right subsets: Q_l(φ) = {(I, x) | f_θ(I, x) < τ}, with Q_r(φ) the remainder.
The learning process is as follows. First, pick a random subset of pixel locations; second, pick random candidate pairs of θ and τ; third, partition the pixels into left and right subsets.

Randomized Decision Forests: Learning
4. Compute the φ giving the largest information gain:
G(φ) = H(Q) − Σ_{s ∈ {l, r}} (|Q_s(φ)| / |Q|) H(Q_s(φ)),
where H(Q) is the Shannon entropy of the body-part label distribution in Q.
5. Recurse for the left and right subsets.
6. Repeat steps 1-5 to generate several trees.
The next step is to find the best pair of θ and τ, i.e. the one giving the largest information gain. These steps are repeated until a terminating condition is met, for example reaching the maximum tree depth; finally, multiple trees are generated.
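
As a sanity check on the gain formula, here is a small sketch of scoring one candidate split, in the same hypothetical Python setting as the earlier snippet; the helper names and the array representation of labels are my own.

```python
import numpy as np

def entropy(labels, n_parts=31):
    """Shannon entropy of the empirical body-part label distribution."""
    counts = np.bincount(labels, minlength=n_parts)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature_values, labels, tau):
    """G(phi) = H(Q) - sum over s in {l, r} of |Q_s|/|Q| * H(Q_s),
    where the split sends pixels with f_theta < tau to the left subset."""
    left = feature_values < tau
    gain = entropy(labels)
    for mask in (left, ~left):
        if mask.any():
            gain -= mask.mean() * entropy(labels[mask])  # |Q_s|/|Q| * H(Q_s)
    return gain
```

Training evaluates this gain for every sampled candidate φ = (θ, τ) and keeps the maximizer at each node.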

Randomized Decision Forests: Inference
Starting from the root of each tree, move to the left or right child according to the feature and threshold until a leaf node is reached, then average the distributions over all trees.
Body-part label inference proceeds as described: for each tree, evaluate the feature value at each node and decide which direction to move; then average the probabilistic distributions over all the trees.
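
Below is a minimal sketch of that traversal under an assumed data layout (an internal node as a tuple (u, v, tau, left_child, right_child), a leaf as a NumPy probability vector); it reuses the hypothetical depth_feature from the earlier sketch and is not the authors' GPU implementation.

```python
import numpy as np

# Assumed tree layout: a leaf is a length-31 probability vector (np.ndarray);
# an internal node is the tuple (u, v, tau, left_child, right_child).

def classify_pixel(forest, depth, x, y):
    """Average the per-tree leaf distributions P(c | I, x) over the forest."""
    dists = []
    for node in forest:                            # one root per tree
        while not isinstance(node, np.ndarray):    # descend until a leaf
            u, v, tau, left, right = node
            node = left if depth_feature(depth, x, y, u, v) < tau else right
        dists.append(node)
    return np.mean(dists, axis=0)                  # forest average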

Inference Results
Some inference results on synthetic data and real data: the colors show, for each pixel, the label with the highest probability.

Inference Results

Outline: Introduction, Data Acquisition, Body Part Inference, Joint Position Proposal, Results.
The final step is joint position proposal.

Joint Position Proposal
A “local mode-finding approach based on mean shift with a weighted Gaussian kernel.” Mean shift is an algorithm for locating the maxima of a density function; here the density for body part c is
f_c(x̂) ∝ Σ_i w_ic · exp(−‖(x̂ − x̂_i) / b_c‖²),
where x̂_i is the 3D world position of pixel i, w_ic is a pixel weight, and b_c is a pre-learned per-part bandwidth.
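
Mean shift itself is simple to sketch: starting from an initial estimate, repeatedly move to the kernel-weighted mean of the weighted points until convergence. The following is a minimal illustration of this fixed-point iteration on the weighted Gaussian-kernel density; the argument names, starting point, and stopping rule are my own placeholders, not the paper's exact procedure.

```python
import numpy as np

def mean_shift_mode(points, weights, bandwidth, start, iters=50, tol=1e-4):
    """Climb the weighted Gaussian-kernel density
    f(x) ~ sum_i w_i * exp(-||(x - x_i)/b||^2) from `start` to a local mode.
    points: (N, 3) world positions; weights: (N,) per-pixel weights."""
    x = start.astype(float)
    for _ in range(iters):
        diff = (points - x) / bandwidth
        k = weights * np.exp(-np.sum(diff * diff, axis=1))   # kernel responses
        x_new = (k[:, None] * points).sum(axis=0) / k.sum()  # weighted mean
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```

Running this from several starting points on the pixels assigned to one body part yields the density modes that serve as joint proposals.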

Joint Position Proposal
The modes lie on the surface of the body, so each is pushed back in the z direction by a pre-learned parameter (e.g. 0.039 m or 0.065 m, depending on the part).

Outline: Introduction, Data Acquisition, Body Part Inference, Joint Position Proposal, Results.

Speed
Learning: 1 day on a 1000-core cluster to train 3 trees of depth 20 from 1 million images.
Inference: under 5 ms per frame on the Xbox 360 GPU.

Depth of trees

Maximum Probe Offset
The maximum probe offset is the maximum allowed magnitude for the offsets u and v.

Comparison with Other Methods
Comparison with [Ganapathi et al., 2010].