Po-Hsiang Chen Advisor: Sheng-Jyh Wang 2/13/2012.


Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.
Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/ A1. PrimeSense patent.

Outline: What is Kinect?; Kinect Architecture; From IR to depth image (History of Structured Light; PrimeSense Invented Structured Light); From depth image to joint positions (Body Part Inference; Joint Proposals); Experiments and Results; Conclusion; References.


Kinect is a motion-sensing input device by Microsoft. The depth camera technology was developed by PrimeSense (invented in 2005). The software technology was developed by Rare. First announced at E3 as "Project Natal". Windows SDK releases: /en-us/kinectforwindows/discover/features.aspx



Kinect architecture: IR → (structured light) → depth image → (random decision forest) → body parts → (mean shift) → joint positions.



Main problem: to recover shape from multiple views, we need correspondences between the images. The matching (correspondence) problem is hard because of occlusions, texture, colors, etc. Solution: structured light. Idea: simplify matching. Strategy: use illumination to create your own correspondences.

Basic principle: use a projector to create unambiguous correspondences. Light projection: if we project a single point, matching is unique.

Line projection (line scan): for calibrated cameras, the epipolar geometry is known, so we can project a line instead of a single point.

Projecting multiple stripes or grids: which stripe matches which? The correspondence problem appears again.

Answer 1: assume surface continuity (ordering constraint).

Answer 2: coloured stripes (De Bruijn sequence). Difficult to use on coloured surfaces.
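As a rough illustration of why a De Bruijn sequence helps here: in such a sequence over k colours, every window of n consecutive stripes is unique, so a camera that sees any n neighbouring stripes can tell where in the pattern they are. The sketch below uses the standard recursive construction; the function name and the choice of three colours are assumptions for illustration, not taken from the slides. Practical stripe patterns usually add further constraints, such as forbidding two equal neighbouring colours.

```python
def de_bruijn_sequence(k, n):
    """Return a cyclic De Bruijn sequence over symbols 0..k-1 with window length n.

    Every length-n window (taken cyclically) occurs exactly once,
    so the sequence has k**n symbols.
    """
    a = [0] * (k * n)
    sequence = []

    def extend(t, p):
        if t > n:
            # Emit the current prefix only when its length divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for symbol in range(a[t - p] + 1, k):
                a[t] = symbol
                extend(t + 1, t)

    extend(1, 1)
    return sequence


# Hypothetical palette: map symbols to stripe colours.
COLOURS = ["red", "green", "blue"]
stripes = [COLOURS[s] for s in de_bruijn_sequence(len(COLOURS), 2)]
```

With 3 colours and windows of 2 stripes this gives 9 stripes in which every ordered colour pair appears exactly once.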

Answer 2 (continued): coloured dots (M-array). Difficult to use on coloured surfaces.

Answer 3: pattern dots (M-array). Difficult for industrial manufacturing.

Answer 4: time-coded light patterns (time multiplexing). Use a sequence of binary patterns: N stripes need only about log2 N images. Each stripe has a unique binary illumination code.
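The time-multiplexing idea can be sketched in a few lines. This is a minimal sketch assuming plain binary codes (real systems often prefer Gray codes) and an idealised camera that reads each bit without error; the function names are hypothetical.

```python
def binary_stripe_patterns(num_stripes):
    """Return a list of patterns; each pattern is a list of 0/1 values, one per stripe.

    Pattern 0 carries the most significant bit of every stripe index.
    """
    # (num_stripes - 1).bit_length() equals ceil(log2(num_stripes)) for num_stripes >= 2.
    num_patterns = max(1, (num_stripes - 1).bit_length())
    return [
        [(stripe >> bit) & 1 for stripe in range(num_stripes)]
        for bit in range(num_patterns - 1, -1, -1)
    ]


def decode_stripe(bits):
    """Recover the stripe index from the on/off sequence observed at one pixel."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


patterns = binary_stripe_patterns(8)   # 3 patterns for 8 stripes
codes = list(zip(*patterns))           # one bit sequence per stripe
recovered = [decode_stripe(code) for code in codes]
```

Each pixel only needs to record whether it was lit in each projected pattern; the resulting bit sequence identifies the stripe directly, so no spatial neighbourhood matching is needed.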

All of the above are categorized as discrete methods. There are also many continuous structured light methods, such as phase shifting. Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8):

All of the above are human-designed patterns. Random speckle: structured light using randomly generated patterns. It may obtain denser depth information by solving the correspondence problem.

A projector is just the inverse of a camera, so one projector and one camera are enough for triangulation. Calibration is needed.
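For intuition, triangulation with one projector and one camera can be sketched with the usual rectified pinhole relation Z = f · b / d, where d is the shift (disparity) of a pattern element between where it was projected and where the camera sees it. This is a sketch under assumptions: a rectified, calibrated projector-camera pair, focal length in pixels, baseline in metres. The function names and the numbers are illustrative and not taken from the slides or from the Kinect itself.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (metres) of a point whose pattern element is shifted by disparity_px."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the device")
    return focal_length_px * baseline_m / disparity_px


def disparity_from_depth(depth_m, focal_length_px, baseline_m):
    """Inverse relation: the shift (pixels) expected for a point at depth_m."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * baseline_m / depth_m


# Illustrative values only: 500 px focal length, 10 cm baseline.
z = depth_from_disparity(50.0, 500.0, 0.1)   # about 1 metre
d = disparity_from_depth(2.0, 500.0, 0.1)    # about 25 pixels
```

Because disparity is inversely proportional to depth, the same change in depth produces a smaller shift for far objects than for near ones, which is why depth resolution degrades with distance.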

US 2010/: projector-camera system. The structure is already calibrated. A depth change δZ results in a shift δX in element 32 of the patent figure.

US 2010/: Structured Light 1. Pseudo-random distribution: locally random, while globally the gray level decreases. This allows a rough estimate to be made from a low-resolution image.

US 2010/: Structured Light 2. Quasi-periodic pattern with five-fold symmetry. It produces distinct peaks in the frequency domain and contains no unit cell that repeats over the spatial domain. Used to reduce noise and ambient light from the environment.


US 2010/: uses a special ("astigmatic") lens with different focal lengths in the x and y directions. The orientation of the projected circle, which the lens stretches into an ellipse, indicates depth.


Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation. Body segmentation is treated as a per-pixel classification task (no pairwise term or CRF is used). The algorithm runs in 5 ms per frame on the Xbox GPU. Novelty: an intermediate body part representation.

Body part labeling: 31 body parts. Distinct parts for left and right allow the classifier to disambiguate the left and right sides of the body.

Depth image features: d_I(x) is the depth at pixel x in image I, and θ = (u, v) describes the offsets u and v. Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations.
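The feature formula itself did not survive in this transcript. In the cited Shotton et al. paper it is a depth comparison, f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x)), where dividing the offsets by the depth at x makes the feature roughly depth invariant. The sketch below is one way to write that down; the list-of-rows image layout, the rounding to the nearest pixel, and the LARGE_DEPTH constant used for probes that fall outside the image are assumptions for illustration.

```python
LARGE_DEPTH = 1e6  # assumed stand-in for the large value given to off-image probes


def depth_at(depth, x, y):
    """Depth at integer pixel (x, y); off-image pixels read as very far away."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return LARGE_DEPTH


def depth_feature(depth, pixel, u, v):
    """Difference of two depth probes whose offsets are scaled by 1 / depth at pixel.

    depth is a list of rows of positive depth values; pixel, u and v are (x, y) pairs.
    Reads three pixels: the centre and the two probes.
    """
    x, y = pixel
    d = depth_at(depth, x, y)
    probe_u = depth_at(depth, round(x + u[0] / d), round(y + u[1] / d))
    probe_v = depth_at(depth, round(x + v[0] / d), round(y + v[1] / d))
    return probe_u - probe_v


# Tiny 3x3 depth image with one farther pixel to the right of the centre.
depth = [[2.0, 2.0, 2.0],
         [2.0, 2.0, 4.0],
         [2.0, 2.0, 2.0]]
value = depth_feature(depth, (1, 1), (2.0, 0.0), (0.0, 0.0))
```

Here the offset (2, 0) is divided by the centre depth 2.0, so the first probe lands one pixel to the right; a person standing twice as far away would be probed at half the pixel offset.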

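Random decision forests: a fast and effective multi-class classifier. Each split node consists of a feature f_θ and a threshold τ. At the leaf node reached in tree t, a learned distribution over body part labels is given. The final classification combines the distributions from all trees.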
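A minimal sketch of how such a forest classifies one pixel, assuming the per-tree leaf distributions are simply averaged (the committee idea) and that a sample goes left when its feature value is below the threshold. The Node class, its field names, and the toy labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Node:
    # Split nodes set feature, threshold, left and right; leaves set distribution.
    feature: Optional[Callable[[object], float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distribution: Optional[Dict[str, float]] = None


def leaf_distribution(tree, sample):
    """Walk from the root to a leaf and return the distribution stored there."""
    node = tree
    while node.distribution is None:
        node = node.left if node.feature(sample) < node.threshold else node.right
    return node.distribution


def forest_posterior(trees, sample):
    """Average the leaf distributions reached in every tree."""
    averaged = {}
    for tree in trees:
        for label, prob in leaf_distribution(tree, sample).items():
            averaged[label] = averaged.get(label, 0.0) + prob / len(trees)
    return averaged


# Toy forest: one single-split tree and one tree that is just a leaf.
tree_a = Node(feature=lambda s: s, threshold=0.5,
              left=Node(distribution={"hand": 1.0}),
              right=Node(distribution={"head": 1.0}))
tree_b = Node(distribution={"hand": 0.5, "head": 0.5})
posterior = forest_posterior([tree_a, tree_b], 0.2)
```

Classification cost per pixel is one short root-to-leaf walk per tree, which is what makes the approach suitable for per-pixel use at frame rate.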

Multiple classifiers working together. Committees: e.g. averaging the predictions of a set of individual models, or majority voting. Boosting: classifiers are trained in sequence, e.g. AdaBoost. Decision tree: binary selections corresponding to the traversal of a tree.

A decision tree has three major aspects: a splitting criterion, a stop-splitting rule, and a rule to assign each leaf to a specific class. Decision forests: a committee of decision trees.

How to train the random decision forest?

Training: each tree is trained on different images, and 2000 example pixels are picked from each image. Algorithm [listing not preserved].

Algorithm (cont.): Shannon entropy given Z on Y [formula not preserved].
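Since the algorithm listing is missing from this transcript, the following is only a generic sketch of entropy-based split selection as used in standard decision tree training: among a set of candidate (feature, threshold) pairs, keep the one with the largest information gain. The function names and the toy data are assumptions, not a reconstruction of the slide.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy in bits of the empirical label distribution."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(samples, labels, feature, threshold):
    """Entropy reduction achieved by splitting on feature(sample) < threshold."""
    left = [lab for s, lab in zip(samples, labels) if feature(s) < threshold]
    right = [lab for s, lab in zip(samples, labels) if feature(s) >= threshold]
    n = len(labels)
    return (shannon_entropy(labels)
            - len(left) / n * shannon_entropy(left)
            - len(right) / n * shannon_entropy(right))


def best_split(samples, labels, candidates):
    """Pick the (feature, threshold) candidate with the largest information gain."""
    return max(candidates, key=lambda c: information_gain(samples, labels, c[0], c[1]))


def identity(s):
    return s


samples = [0.1, 0.2, 0.8, 0.9]
labels = ["a", "a", "b", "b"]
candidates = [(identity, 0.15), (identity, 0.5)]
chosen = best_split(samples, labels, candidates)
```

A tree is then grown by applying the chosen split and repeating the same selection on the left and right subsets until a stopping rule is met.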

Algorithm (cont.): training takes a lot of effort. Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster. Where does the training data come from?

Depth imaging simplifies the task of background subtraction. Most important: depth images are easy to synthesize! Take real images, learn the synthesis parameters, and generate lots of training data.


Joint proposals: starting from the result of the previous section, use mean shift with a weighted Gaussian kernel.

Mean shift: a kernel density estimator turns discrete points into a continuous function. Calculate the gradient at the initial point and shift, then iterate until convergence.
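The mean shift iteration can be sketched as follows. This is a minimal sketch assuming a standard Gaussian kernel exp(−‖x − x_i‖² / (2b²)) with a single bandwidth b and one weight per point; the exact kernel and weighting in the cited paper may differ, and the function name and toy data are hypothetical.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, tol=1e-6, max_iter=100):
    """Climb the weighted Gaussian kernel density from start to a nearby mode."""
    current = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(current)
        denominator = 0.0
        for point, weight in zip(points, weights):
            sq_dist = sum((a - b) ** 2 for a, b in zip(point, current))
            k = weight * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            denominator += k
            for i, coord in enumerate(point):
                numerator[i] += k * coord
        if denominator == 0.0:
            break  # no point has any influence here; stay put
        # The kernel-weighted mean is the shifted estimate.
        shifted = [n / denominator for n in numerator]
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(shifted, current)))
        current = shifted
        if step < tol:
            break
    return current


# Toy data: a tight cluster around the origin plus one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (10.0, 10.0)]
weights = [1.0] * len(points)
mode = mean_shift_mode(points, weights, start=(1.0, 1.0), bandwidth=1.0)
```

Moving to the kernel-weighted mean is equivalent to taking a step along the density gradient, so the outlier barely affects the result, unlike a plain average of all points.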


Results on synthetic and real data [figures not preserved].

Failure cases [figures not preserved].

Training parameters vs. classification accuracy [plots not preserved].

Comparisons [plots not preserved].


Conclusion: depth images may contain enough information to solve human pose problems. Depth images are color and texture invariant, which simplifies the problem considerably. A deep combined model with sufficient training data can become a good classifier even with simple features. Buy a Kinect for the lab.


Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation.
Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/ A1.
Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, US 2010/ A1.
Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8):
Albitar, I., P. Graebling, et al. (2007). "Robust structured light coding for 3D reconstruction." IEEE.
Scharstein, D. and R. Szeliski (2003). "High-accuracy stereo depth maps using structured light." IEEE.
Breiman, L. (2001). "Random forests." Machine learning 45(1):
Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural computation 9(7):
John MacCormick, "How does the Kinect work?" users.dickinson.edu/~jmac/selected-talks/kinect.pdf
"Structured Light", structured.pdf
the-anandtech-review/2
Chen, Y. S. and B. T. Chen (2003). "Measuring of a three-dimensional surface by use of a spatial distance computation." Applied optics 42(11):