1. Introduction
2. Article [1]: Real Time Motion Capture Using a Single TOF Camera (2010)
3. Article [2]: Real-Time Human Pose Recognition in Parts from Single Depth Images (2011)

Fig From [2]

Why do we need this?
 Robotics
 Smart surveillance
 Virtual reality
 Motion analysis
 Gaming – Kinect

Microsoft Xbox 360 console – “You are the controller.” Launched 04/11/10; in its first 60 days on the market it sold over 8M units (a Guinness world record).

 Mocap using markers – expensive.
 Multi-view camera systems – limited applicability.
 Monocular – simplified problems.

Time-Of-Flight (TOF) camera:
 Dense depth
 High frame rate (100 Hz)
 Robust to lighting, shadows, and other problems

2.1 Previous work 2.2 What’s new? 2.3 Overview 2.4 Results 2.5 Limitations & future work 2.6 Evaluation

Many, many articles… (the Moeslund et al. 2006 survey alone covered 350 articles); related surveys from 2006, 2006, and 1998.

 TOF technology  Propagating information up the kinematic chain.  Probabilistic model using the unscented transform.  Multiple GPUs.

1. Probabilistic Model 2. Algorithm Overview:  Model Based Hill Climbing Search  Evidence Propagation  Full Algorithm

15 body parts. DAG – Directed Acyclic Graph over pose, speed, and the range scan. DBN – Dynamic Bayesian Network.

Dynamic Bayesian network (DBN) assumptions:  Use ray casting to evaluate the distance from the measurement.  Goal: find the most likely state, given the previous frame's MAP estimate, i.e.: Fig From [1]
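The MAP objective itself did not survive transcription. In standard filtering notation (a reconstruction, assuming $x_t$ denotes the pose state and $z_t$ the depth measurement), the per-frame goal is:

```latex
\hat{x}_t \;=\; \operatorname*{arg\,max}_{x_t} \; p\left(x_t \mid \hat{x}_{t-1},\, z_t\right)
```

i.e. the most likely current pose given the previous frame's MAP estimate and the new range scan.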

1. Hill climbing search (HC) 2. Evidence propagation (EP)

Fig From [1] Sample a grid around the current estimate, evaluate the likelihood at each sample, and choose the best point. Coarse-to-fine grids.
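Not the authors' code: a minimal sketch of the coarse-to-fine search described above, with randomized grid samples and a caller-supplied likelihood standing in for the ray-casting measurement model (all names here are hypothetical).

```python
import random

def hill_climb(likelihood, x0, spans=(1.0, 0.3, 0.1), samples=25, iters=20):
    """Coarse-to-fine hill climbing: at each scale, sample points around
    the current best estimate, evaluate the likelihood, keep the best."""
    best, best_ll = list(x0), likelihood(x0)
    for span in spans:                      # coarse grids first, then finer
        for _ in range(iters):
            improved = False
            for _ in range(samples):        # randomized grid around `best`
                cand = [b + random.uniform(-span, span) for b in best]
                ll = likelihood(cand)
                if ll > best_ll:
                    best, best_ll, improved = cand, ll, True
            if not improved:                # local optimum at this scale
                break
    return best, best_ll
```

With a unimodal likelihood this converges quickly; its failure modes are exactly the local-optimum and fast-motion issues the slides point out.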

The good:
 Simple
 Fast – runs in parallel on GPUs
The bad:
 Local optima: ridges, plateaus, alleys
 Can lose track when motion is fast or occlusions occur

Also has 3 stages: 1. Body part detection [3] (C. Plagemann et al. 2010) 2. Probabilistic inverse kinematics 3. Data association and inference

Bottom-up approach:
1. Locate interest points with AGEX – Accumulative Geodesic Extrema.
2. Find their orientation.
3. Classify the head, feet, and hands using local shape descriptors.
Fig From [3]
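The steps above can be sketched as follows. This is a simplified stand-in for AGEX, not the authors' implementation: treat the depth points as a graph, then repeatedly take the point with the largest geodesic distance from everything selected so far (unit edge weights and the function names are assumptions).

```python
from collections import deque

def geodesic_dists(adj, sources):
    """Multi-source BFS: hop distance from the nearest source node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def agex(adj, root, k):
    """Accumulative geodesic extrema: greedily add the node farthest
    (geodesically) from the root and all previously chosen extrema."""
    chosen = [root]
    for _ in range(k):
        dist = geodesic_dists(adj, chosen)
        chosen.append(max(dist, key=dist.get))
    return chosen[1:]           # the k extrema, excluding the root seed
```

On a surface graph of a person, the first extrema tend to be the head, hands, and feet, which is what makes them good interest points for the classification step.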

Results: Fig From [3]

 Assume correspondence.
 Need a new MAP estimate conditioned on it.
 Problem – the model isn't linear!
 Solution: linearize with the unscented Kalman filter.
 Easy to determine.
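The unscented transform behind that linearization, sketched in one dimension (the paper works on the full pose state; this toy version only shows the sigma-point mechanics, and `kappa` is an assumed tuning constant):

```python
import math

def unscented_transform(mean, var, g, kappa=2.0):
    """Propagate a Gaussian (mean, var) through a nonlinear function g
    using 2n+1 deterministic sigma points (here n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    sigmas = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa)] + [1.0 / (2 * (n + kappa))] * 2
    ys = [g(x) for x in sigmas]
    out_mean = sum(w * y for w, y in zip(weights, ys))
    out_var = sum(w * (y - out_mean) ** 2 for w, y in zip(weights, ys))
    return out_mean, out_var
```

For a linear g the transform is exact, which is what makes it a drop-in linearization inside a Kalman-style MAP update.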

X′ > X_best? Accept the EP proposal only if its likelihood beats the current best.

Experiments: 28 real depth image sequences. Ground truth – tracked markers: real marker positions vs. estimated positions, counting perfect tracks and faulty tracking. Compared 3 algorithms: EP, HC, HC+EP.

Best – HC+EP; worst – EP. Runs close to real time. HC: 6 frames per second. HC+EP: 4–6 frames per second. Fig From [1]

Extreme case – sequence 27: HC loses track; HC+EP does not. Fig From [1]

Limitations:
 Manual initialization.
 Tracks only one person at a time.
 Using temporal data consumes more time and creates a reinitialization problem.
Future work:
 Improving the speed.
 Combining with color cameras.
 Fully automatic model initialization.
 Tracking more than one person.

Pros:
 Well written
 Self contained
 Novel combination of existing parts
 New technology
 Achieves its goals (real time)
 Extensively validated: data set and code available
Cons:
 Missing examples for the probabilistic model
 Not clear how … is defined
 Not enough visual examples in the article
 No comparison to other algorithms

2.1 Previous work 2.2 What’s new? 2.3 Overview 2.4 Results 2.5 Limitations & future work 2.6 Evaluation

 Same as Article [1].

 Uses no temporal information – robust and fast (200 frames per second).  Object recognition approach.  Per-pixel classification.  Large and highly varied training dataset. Fig From [2]

1. Database construction 2. Body part inference and joint proposals: Goals: computational efficiency and robustness

Pose estimation often has to overcome a lack of training data… why? Huge color and texture variability; computer simulation doesn't produce the range of volitional motions of a human subject.

Fig From [2]

1. Body part labeling 2. Depth image features 3. Randomized decision forests 4. Joint position proposals

31 body parts are labeled. The problem can now be solved by efficient classification algorithms. Fig From [2]

Simple depth comparison features, equation (1): d_I(x) is the depth at pixel x in image I; normalizing the offsets by the depth makes the feature depth invariant. Computational efficiency: no preprocessing. Fig From [2]
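A sketch of that feature, in the published form f_θ(I, x) = d_I(x + u/d_I(x)) − d_I(x + v/d_I(x)). The `BACKGROUND` constant and list-of-lists image layout are assumptions for illustration:

```python
BACKGROUND = 1e6  # probe pixels off the body / outside the image read as "far"

def depth(img, x, y):
    """Depth at (x, y); background and out-of-image probes return a large value."""
    h, w = len(img), len(img[0])
    if 0 <= y < h and 0 <= x < w and img[y][x] > 0:
        return img[y][x]
    return BACKGROUND

def depth_feature(img, x, y, u, v):
    """Depth-comparison feature: offsets u and v are divided by the depth
    at x, which makes the response invariant to the subject's distance."""
    d = depth(img, x, y)
    du = depth(img, x + int(u[0] / d), y + int(u[1] / d))
    dv = depth(img, x + int(v[0] / d), y + int(v[1] / d))
    return du - dv
```

Each feature is just two lookups and a subtraction, which is where the "no preprocessing" efficiency comes from.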

How does it work? Each internal node tests a feature against a threshold; classify pixel x by walking it down the tree to a leaf. Fig From [2]
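A minimal sketch of that per-pixel traversal (dict-based nodes are an assumption; in the real forest each node's feature is a depth comparison with node-specific offsets):

```python
def classify_pixel(tree, x):
    """Walk a decision tree: each internal node holds a feature function
    and a threshold; leaves hold a distribution over body-part labels."""
    node = tree
    while "leaf" not in node:
        branch = "left" if node["feature"](x) < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def classify_forest(trees, x):
    """Average the per-tree label distributions (randomized decision forest)."""
    dists = [classify_pixel(t, x) for t in trees]
    return {l: sum(d[l] for d in dists) / len(dists) for l in dists[0]}
```

Averaging the leaf distributions across the trees of the forest gives the final per-pixel distribution over body parts.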

Training algorithm: 1M images, 2000 example pixels per image; each node's (feature, threshold) split is chosen to maximize information gain (entropy H). Training 3 trees of depth 20 on 1M images takes about 1 day on a 1000-core cluster. Candidate evaluations: 1M images × 2000 pixels × 2000 features × 50 thresholds = 2×10^14.
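The per-node split score implied by the entropy criterion, as a small self-contained sketch (function names assumed):

```python
import math

def entropy(labels):
    """Shannon entropy H of a multiset of labels."""
    n = len(labels)
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, left, right):
    """Gain of splitting `labels` into `left` + `right` -- the quantity a
    candidate (feature, threshold) pair is scored by during training."""
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)
```

During training, each of the ~2000 × 50 candidate (feature, threshold) pairs partitions the node's pixels into left/right sets, and the pair with the largest gain is kept.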

Fig From [2] Trained tree:

Local mode finding approach based on mean shift with a weighted Gaussian kernel. Density estimator: Fig From [4]
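A one-dimensional sketch of that mode finding (the actual joint proposals work in 3D world space, with each pixel's weight derived from its class probability and depth; everything here is a simplified assumption):

```python
import math

def density(x, points, weights, bandwidth):
    """Weighted Gaussian kernel density estimate at x (1D sketch)."""
    return sum(w * math.exp(-((x - p) / bandwidth) ** 2)
               for p, w in zip(points, weights))

def mean_shift_mode(x, points, weights, bandwidth, iters=50):
    """Mean shift: repeatedly move x to the kernel-weighted average of
    the points, converging to a local mode of the density."""
    for _ in range(iters):
        ks = [w * math.exp(-((x - p) / bandwidth) ** 2)
              for p, w in zip(points, weights)]
        x = sum(k * p for k, p in zip(ks, points)) / sum(ks)
    return x
```

Starting one mean-shift run per body part, with pixels of that part as the weighted points, yields the joint position proposals.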

Experiments: 8800 frames of real depth images plus synthetic depth images; also evaluated on the Article [1] dataset. Measures: 1. Classification accuracy – confusion matrix. 2. Joint accuracy – mean Average Precision (mAP); a proposal within D = 0.1 m of ground truth counts as a true positive (TP).
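The true-positive rule can be made concrete with a small helper (hypothetical name; the paper's mAP additionally averages precision over joints and penalizes extra proposals):

```python
def joint_accuracy(pred, gt, D=0.1):
    """Fraction of joints whose predicted 3D position lies within D metres
    of ground truth -- a simplified stand-in for the paper's mAP metric."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    hits = sum(1 for p, g in zip(pred, gt) if dist(p, g) <= D)
    return hits / len(gt)
```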

Fig From [2]

High correlation between real and synthetic results. Tree depth is the most effective parameter. Fig From [2]

Comparing the algorithm on: real set (red) – mAP …; ground truth set (blue) – mAP …; upper-body mAP …. Fig From [2]

Comparing algorithm to ideal Nearest Neighbor matching, and realistic NN - Chamfer NN. Fig From [2]

Comparison to Article [1]: run on the same dataset; better results (even without temporal data); runs 10× faster. Fig From [2]

Full rotations and multiple people: right–left ambiguity; mAP of … (good for our uses). Result video. Fig From [2]

Faster proposals – using simple bottom-up clustering instead of mean shift: mean shift: 50 fps, mAP …; simple clustering: 200 fps, mAP ….

Future work: a better synthesis pipeline. Is there an efficient approach that directly regresses joint positions? (Already done in follow-up work: efficient offset regression of body joint positions.)

Pros:
 Well written
 Self contained
 Novel combination of existing parts
 New technology
 Achieves its goals (real time)
 Extensively validated: used in a real console; many result graphs and examples (plus a supplementary-material PDF); broad comparison to other algorithms
Cons:
 Data set and code not available

[1] V. Ganapathi et al., Real Time Motion Capture Using a Single Time-Of-Flight Camera, 2010.
[2] Shotton et al. & Xbox Incubation, Real-Time Human Pose Recognition in Parts from Single Depth Images, 2011.
[3] C. Plagemann et al., Real-time identification and localization of body parts from depth images, 2010.
[4] Computer Graphics course (046746), Technion.