Learning to be a Depth Camera for Close-Range Human Capture and Interaction
Sean Ryan Fanello*^ (and 9 co-authors*)
*Microsoft Research   ^iCub Facility, Istituto Italiano di Tecnologia
Presented by Supasorn Suwajanakorn

2D Camera -> Depth Sensing: turning an ordinary 2D camera into a real-time depth sensor.

Camera Modification

Machine Learning
Learn a per-pixel mapping from IR intensity to depth value.
IR reflection is skin-tone invariant [Simpson et al. 1998].

Multi-layered Decision Forest
Layer 1: a classification forest maps each IR image pixel to a probability over quantized depth bins (0-25 cm, 25-50 cm, 50-75 cm, 75-100 cm).
Layer 2: one regression forest per bin predicts a continuous depth (e.g. 23 cm, 40 cm, 60 cm, 76 cm).
The final output blends the per-bin regressors by the layer-1 posterior:
Output = 23·p(0-25) + 40·p(25-50) + 60·p(50-75) + 76·p(75-100)
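A minimal numeric sketch of this blending step in Python, using the slide's example bin depths; the posterior values are made up for illustration, standing in for the two forests' outputs:

    import numpy as np

    # Layer-1 posterior over the 4 quantized depth bins for one pixel
    # (illustrative values; the classification forest produces these)
    posterior = np.array([0.7, 0.2, 0.1, 0.0])

    # Layer-2 per-bin regression outputs for the same pixel, in cm
    # (the slide's example values)
    bin_depths = np.array([23.0, 40.0, 60.0, 76.0])

    # Final depth: per-bin regressors blended by the classification posterior
    depth = float(np.dot(posterior, bin_depths))  # 0.7*23 + 0.2*40 + 0.1*60 = 30.1 cm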

Features
Each split node tests pixel x with offsets (u, v) as its splitting parameter: if the test passes ("yes"), the sample goes to the left branch, otherwise ("no") to the right branch.
The difference test on the intensities at x+u and x+v is invariant to additive illumination, since a constant added to the whole image cancels out.
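A sketch of such a split test, assuming a grayscale IR image stored as a NumPy array; the function name and the bounds-clamping behavior are illustrative, not the paper's exact code:

    import numpy as np

    def split_goes_left(ir, x, u, v, threshold):
        # Difference test I(x+u) - I(x+v) < threshold; probe offsets are
        # clamped to the image bounds.
        h, w = ir.shape
        def probe(o):
            return float(ir[min(max(x[0] + o[0], 0), h - 1),
                            min(max(x[1] + o[1], 0), w - 1)])
        return probe(u) - probe(v) < threshold

    # Adding a constant to the whole image does not change the decision:
    ir = np.random.default_rng(0).random((64, 64))
    assert split_goes_left(ir, (32, 32), (3, -2), (-4, 1), 0.05) == \
           split_goes_left(ir + 0.5, (32, 32), (3, -2), (-4, 1), 0.05)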

Training
Layer 1 (outputs the quantized-depth posterior): splits are chosen to maximize information gain, measured with Shannon entropy over the depth bins.
Layer 2 (outputs a continuous depth value): splits are chosen to minimize differential entropy, computed by fitting a Gaussian to the depth values reaching each node.
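The two split-quality criteria are standard; a sketch in Python (the slides do not show the training code, so these are the textbook formulas):

    import numpy as np

    def shannon_entropy(bin_labels, num_bins=4):
        # Entropy of the empirical distribution over quantized depth bins
        p = np.bincount(bin_labels, minlength=num_bins) / len(bin_labels)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def information_gain(parent, left, right):
        # Gain of a candidate split: parent entropy minus the
        # size-weighted entropies of the two children
        n = len(parent)
        return (shannon_entropy(parent)
                - (len(left) / n) * shannon_entropy(left)
                - (len(right) / n) * shannon_entropy(right))

    def gaussian_differential_entropy(depths_cm):
        # Differential entropy of a 1D Gaussian fit: 0.5 * log(2*pi*e*sigma^2)
        var = np.var(depths_cm) + 1e-12  # guard against log(0) on constant sets
        return 0.5 * np.log(2.0 * np.pi * np.e * var)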

Training Data
Real: photos taken with a calibrated depth camera + IR camera.
Synthetic: 100K renders of hands and faces.
- Models illumination fall-off, noise, vignetting, and subsurface scattering.
- Face variations through 100+ blendshapes; hand variations through a 26-DoF model.
- Random global transformations.
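A rough sketch of the synthetic augmentation step, assuming the renderer has already produced a clean IR image plus its ground-truth depth map; the fall-off model (intensity dropping roughly as 1/depth^2 for an on-camera illuminant), the vignetting strength, and the noise level are illustrative guesses, not the paper's values:

    import numpy as np

    def augment_render(ir_render, depth_cm, rng):
        h, w = ir_render.shape
        # Illumination fall-off: light from the on-camera LEDs dims with depth
        falloff = 1.0 / np.maximum(depth_cm / 25.0, 1.0) ** 2
        # Radial vignetting: darker toward the image corners
        yy, xx = np.mgrid[0:h, 0:w]
        r2 = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / ((h / 2) ** 2 + (w / 2) ** 2)
        vignette = 1.0 - 0.3 * r2
        # Additive sensor noise
        noise = rng.normal(0.0, 0.01, size=(h, w))
        return np.clip(ir_render * falloff * vignette + noise, 0.0, 1.0)

    rng = np.random.default_rng(0)
    ir = np.full((120, 160), 0.8)       # stand-in for a clean render
    depth = np.full((120, 160), 40.0)   # flat surface at 40 cm
    noisy = augment_render(ir, depth, rng)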

Hyper-parameters
[Plot: error vs. number of quantized depth bins]
Optimal: 4 quantized depth bins, 3 trees per forest, maximum tree depth 25.

3D Fused (per-frame depth maps fused into a 3D reconstruction)

Generalization
- Across different people
- Across different cameras

vs. Single-Layer Regressor
Baseline: a single regression forest used over the whole depth range.
Claimed benefits of the multi-layer design:
- It infers useful intermediate results (the quantized-depth posterior) that simplify the primary task, which in turn increases the model's accuracy.
- Comparable accuracy with shallower trees.
- Lower memory usage.

Video Demo

Conclusion
- A low-cost technique to convert a 2D camera into a real-time depth sensor.
- The discriminative approach implicitly handles variations in light physics, skin color, and geometry, such as changes in shape, skin tone, and ambient illumination.
- The multi-layer forest achieves accuracy comparable to a deeper single-layer forest.

Limitations & Discussion
- Fails to generalize beyond faces and hands: the forest carries a strong shape prior / bias.
- Works only on near-uniform albedo (skin).
- Are the features too simple? Why does it even work?
- Output is noisy even when the IR image is smooth: no spatial or temporal smoothness is enforced.
- Is the baseline method too simple? It would be interesting to compare against single-view modeling.