Po-Hsiang Chen Advisor: Sheng-Jyh Wang 2/13/2012
Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.
Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/ A1. PrimeSense patent.
Outline: What is Kinect?; Kinect Architecture; From IR to depth image (History of Structured Light; PrimeSense Invented Structured Light); From depth image to joint positions (Body Part Inference; Joint Proposals); Experiments and Results; Conclusion; References
What is Kinect?
Kinect is a motion-sensing input device by Microsoft. The depth camera technology was developed by PrimeSense (invented in 2005). The software technology was developed by Rare. First announced at E3 as “Project Natal”. Windows SDK releases: /en-us/kinectforwindows/ discover/features.aspx
Kinect Architecture
Pipeline: IR → [structured light] → depth image → [random decision forest] → body parts → [mean shift] → joint positions
From IR to depth image
Main problem: To recover shape from multiple views, we need correspondences between the images. The matching (correspondence) problem is hard because of occlusions, texture, colors, etc. Solution: structured light. Idea: simplify matching. Strategy: use illumination to create your own correspondences.
Basic principle: Use a projector to create unambiguous correspondences. Light projection: if we project a single point, matching is unique.
Line projection (line scan): For calibrated cameras, the epipolar geometry is known, so we can project a line instead of a single point.
Project multiple stripes or grids: Which stripe matches which? The correspondence problem appears again.
Answer 1: Assume surface continuity (ordering constraint).
Answer 2: Coloured stripes (De Bruijn). Difficult to use for coloured surfaces.
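A minimal Python sketch of why a De Bruijn colouring helps, assuming k stripe colours and a window of n neighbouring stripes. In a De Bruijn sequence every window of n consecutive symbols occurs exactly once (cyclically), so seeing n adjacent stripe colours is enough to identify which stripe is which. The function names and the choice k = 3, n = 3 are illustrative assumptions, not taken from the slides.

```python
def de_bruijn(k, n):
    """Return a De Bruijn sequence over the alphabet {0..k-1} with window length n.

    Standard recursive construction by concatenating Lyndon words.
    """
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def build_window_lookup(seq, n):
    """Map every cyclic window of n colours to the index of its first stripe."""
    extended = seq + seq[:n - 1]
    return {tuple(extended[i:i + n]): i for i in range(len(seq))}


# Illustrative choice: 3 colours, windows of 3 stripes -> 27 distinguishable stripes.
stripe_colours = de_bruijn(3, 3)
lookup = build_window_lookup(stripe_colours, 3)
observed = tuple(stripe_colours[5:8])   # colours seen by the camera at some location
stripe_index = lookup[observed]         # which projected stripe this is
```

The lookup only works if the observed colours are read correctly, which is the difficulty the slide notes for coloured surfaces.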
Answer 2: Coloured dots (M-array). Difficult to use for coloured surfaces.
Answer 3: Pattern dots (M-array). Difficult for industrial manufacturing.
Answer 4: Time-coded light patterns (time multiplexing). Use a sequence of binary patterns → about log2 N images for N stripes. Each stripe has a unique binary illumination code.
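The time-multiplexing idea can be sketched in a few lines of Python. This is a minimal illustration assuming plain binary codes (practical systems often use Gray codes instead, which is not covered on the slide). The names `binary_patterns` and `decode_stripe` are hypothetical.

```python
def binary_patterns(num_stripes):
    """Return one on/off pattern per bit; patterns[b][s] is stripe s in image b.

    Only ceil(log2(num_stripes)) patterns are needed (most significant bit first).
    """
    num_bits = max(1, (num_stripes - 1).bit_length())
    return [[(s >> (num_bits - 1 - b)) & 1 for s in range(num_stripes)]
            for b in range(num_bits)]


def decode_stripe(bits):
    """Recover the stripe index from the on/off sequence seen at one camera pixel."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


patterns = binary_patterns(8)                  # 8 stripes -> 3 projected images
seen_at_pixel = [p[5] for p in patterns]       # a pixel lit by stripe 5 over time
stripe = decode_stripe(seen_at_pixel)
```

The cost of this scheme is that the scene must stay still while all the patterns are projected.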
All of the above are categorized as discrete methods. There are also many continuous structured light methods, such as phase shifting. Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8):
All of the above are human-designed patterns. Random speckle: structured light using randomly generated patterns. It may obtain denser depth information by solving the correspondence problem.
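One generic way to solve the correspondence problem for a random pattern is local block matching: compare a small window of the observed image with shifted windows of the known reference pattern and keep the best match. The sketch below does this along a single image row with normalized cross-correlation. It is an assumption-laden illustration, not the method used in Kinect or in the PrimeSense patents, and all names and parameters are hypothetical.

```python
import math
import random


def ncc(a, b):
    """Normalized cross-correlation of two equally long windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def best_shift(reference_row, observed_row, center, half_window, max_shift):
    """Find the shift d such that observed[center] matches reference[center - d]."""
    patch = observed_row[center - half_window:center + half_window + 1]
    best_d, best_score = None, -2.0
    for d in range(-max_shift, max_shift + 1):
        lo = center - d - half_window
        hi = center - d + half_window + 1
        if lo < 0 or hi > len(reference_row):
            continue  # candidate window falls outside the reference row
        score = ncc(patch, reference_row[lo:hi])
        if score > best_score:
            best_d, best_score = d, score
    return best_d


# Synthetic example: a random speckle row, observed with a shift of 7 pixels.
random.seed(0)
reference = [random.random() for _ in range(200)]
observed = [reference[(i - 7) % 200] for i in range(200)]
shift = best_shift(reference, observed, center=100, half_window=8, max_shift=15)
```

Because the pattern is random, a small window is very likely to be unique along the row, which is what makes a dense match at every pixel possible.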
A projector is just the inverse of a camera. One projector and one camera are enough for triangulation. Calibration is needed.
US 2010/: Projector-camera system. The structure is already calibrated. A depth change δZ results in a transverse shift δX in 32.
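The relation between a depth change and the observed pattern shift can be illustrated with a simple pinhole triangulation model. The sketch below assumes a reference pattern recorded at a known distance, a projector-camera baseline, and a focal length in pixels. The model, the sign convention, and the numeric values (580 px, 7.5 cm, 2 m) are illustrative assumptions and do not come from the slides or the patent.

```python
def shift_from_depth(depth_m, focal_px, baseline_m, ref_depth_m):
    """Pattern shift (pixels) relative to the reference plane for a point at depth_m.

    Pinhole model: shift = f * b * (1/Z - 1/Z_ref); closer points give positive shift.
    """
    return focal_px * baseline_m * (1.0 / depth_m - 1.0 / ref_depth_m)


def depth_from_shift(shift_px, focal_px, baseline_m, ref_depth_m):
    """Invert the relation above to recover depth from a measured shift."""
    inverse_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    if inverse_depth <= 0:
        raise ValueError("shift is not consistent with a point in front of the camera")
    return 1.0 / inverse_depth


# Assumed, illustrative calibration values.
FOCAL_PX, BASELINE_M, REF_DEPTH_M = 580.0, 0.075, 2.0
shift = shift_from_depth(1.0, FOCAL_PX, BASELINE_M, REF_DEPTH_M)
depth = depth_from_shift(shift, FOCAL_PX, BASELINE_M, REF_DEPTH_M)
```

Under this model the shift per unit of depth shrinks roughly as 1/Z², so depth resolution degrades with distance.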
US 2010/: Structured Light-1. Pseudo-random distribution. Local: random. Global: gray level decreases. A rough estimate can be made in a low-resolution image.
US 2010/: Structured Light-2. Quasi-periodic pattern with five-fold symmetry. It results in distinct peaks in the frequency domain and contains no unit cell that repeats over the spatial domain. It is used to reduce noise and ambient light from the environment.
US 2010/: Uses a special (“astigmatic”) lens with different focal lengths in the x- and y-directions. A projected circle then becomes an ellipse whose orientation indicates depth.
From depth image to joint positions
Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation. Treats body segmentation as a per-pixel classification task (no pairwise term or CRF is used). The algorithm runs in 5 ms per frame on the Xbox GPU. Novelty: an intermediate body part representation.
Body part labeling: 31 body parts. Distinct parts for left and right allow the classifier to disambiguate the left and right sides of the body.
Depth image features: d_I(x) is the depth at pixel x in image I, and θ = (u, v) describes the offsets u and v. Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations.
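In Shotton et al. (2011) the feature is a depth difference between two probe pixels whose offsets are scaled by the depth at x: f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x)). The Python sketch below follows that formula on a depth image stored as a list of rows. The out-of-image constant, the rounding of probe positions, and the function names are assumptions made for illustration.

```python
LARGE_DEPTH = 1e6  # assumed constant returned for probes that fall outside the image


def probe(depth, x, y):
    """Depth at (x, y), or a large constant if the probe leaves the image."""
    height, width = len(depth), len(depth[0])
    if 0 <= y < height and 0 <= x < width:
        return depth[y][x]
    return LARGE_DEPTH


def depth_feature(depth, x, y, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x)).

    Dividing the offsets by the depth at (x, y) makes the feature
    approximately invariant to how far the person is from the camera.
    """
    d = depth[y][x]
    ux, uy = x + int(round(u[0] / d)), y + int(round(u[1] / d))
    vx, vy = x + int(round(v[0] / d)), y + int(round(v[1] / d))
    return probe(depth, ux, uy) - probe(depth, vx, vy)


# Toy 5x5 depth image (metres): flat at 2.0 except one farther pixel.
image = [[2.0] * 5 for _ in range(5)]
image[2][3] = 4.0
value = depth_feature(image, x=1, y=2, u=(4, 0), v=(0, 0))
```

The three pixel reads are d_I(x) and the two probes, which matches the cost stated on the slide.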
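Randomized decision forests: a fast and effective multi-class classifier. Each split node consists of a feature f_θ and a threshold τ. The leaf node reached in tree t stores a learned distribution P_t(c | I, x) over body part labels c. Final classification: average over the T trees, P(c | I, x) = (1/T) Σ_t P_t(c | I, x).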
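A minimal Python sketch of per-pixel inference with such a forest, assuming each tree is stored as nested dictionaries and that samples with a feature value below τ go left. The data layout, the toy feature function, and all names are hypothetical; only the structure (split on f_θ versus τ, read the leaf distribution, average over trees) follows the description above.

```python
def leaf_distribution(node, feature_fn, image, pixel):
    """Walk one tree from the root to a leaf and return the leaf's label distribution."""
    while node["kind"] == "split":
        value = feature_fn(image, pixel, node["theta"])
        node = node["left"] if value < node["tau"] else node["right"]
    return node["dist"]


def classify_pixel(forest, feature_fn, image, pixel):
    """Average the leaf distributions of all trees for one pixel."""
    averaged = {}
    for tree in forest:
        dist = leaf_distribution(tree, feature_fn, image, pixel)
        for label, prob in dist.items():
            averaged[label] = averaged.get(label, 0.0) + prob / len(forest)
    return averaged


# Toy forest with two trees and two labels (purely illustrative numbers).
tree_a = {"kind": "split", "theta": 1.0, "tau": 0.5,
          "left": {"kind": "leaf", "dist": {"hand": 0.8, "head": 0.2}},
          "right": {"kind": "leaf", "dist": {"hand": 0.1, "head": 0.9}}}
tree_b = {"kind": "leaf", "dist": {"hand": 0.5, "head": 0.5}}


def toy_feature(image, pixel, theta):
    """Stand-in for the depth feature: scales the pixel value by theta."""
    return image[pixel] * theta


posterior = classify_pixel([tree_a, tree_b], toy_feature, [0.2, 0.9], 0)
label = max(posterior, key=posterior.get)
```

Every pixel is classified independently, which is what makes the method easy to parallelize on a GPU.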
Multiple classifiers work together. Committees: e.g. averaging the predictions of a set of individual models, or majority votes. Boosting: classifiers trained in sequence, e.g. AdaBoost. Decision tree: binary selections corresponding to the traversal of a tree.
Three major aspects of a decision tree: a splitting criterion, a stop-splitting rule, and a rule to assign each leaf to a specific class. A decision forest is a committee of decision trees.
Recap: each split node consists of a feature f_θ and a threshold τ, and each leaf stores a learned distribution over body part labels. How to train?
Training: each tree is trained on a different set of images, and 2000 example pixels are picked from each image. Algorithm (continued on the next slides).
Algorithm (cont.): Shannon entropy given Z on Y
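In Shotton et al. (2011) a candidate split φ = (θ, τ) is scored by its information gain, G(φ) = H(Q) − Σ_{s ∈ {l, r}} (|Q_s(φ)| / |Q|) H(Q_s(φ)), where H is the Shannon entropy of the body part labels in a set. The sketch below evaluates that score for one feature and a handful of candidate thresholds. The sample layout and function names are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label histogram of a set of samples."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, tau):
    """Gain of splitting (feature_value, label) samples at threshold tau."""
    left = [label for value, label in samples if value < tau]
    right = [label for value, label in samples if value >= tau]
    parent = [label for _, label in samples]
    n = len(samples)
    children = sum(len(s) / n * shannon_entropy(s) for s in (left, right))
    return shannon_entropy(parent) - children


def best_threshold(samples, candidates):
    """Pick the candidate threshold with the largest information gain."""
    return max(candidates, key=lambda tau: information_gain(samples, tau))


# Toy data: feature values with their body part labels.
samples = [(0.1, "hand"), (0.2, "hand"), (0.8, "head"), (0.9, "head")]
tau = best_threshold(samples, [0.15, 0.5, 0.85])
```

In the full algorithm the same scoring is applied to randomly proposed (θ, τ) pairs at every node, and the tree recurses on the two resulting subsets.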
Algorithm (cont.): Training takes a lot of effort. Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster. Where does all that training data come from?
Depth imaging simplifies the task of background subtraction. Most important: depth images are easy to synthesize! Take real images → learn synthesis parameters → generate lots of training data.
Joint proposals: body parts → [mean shift] → joint positions
From the previous section: use mean shift with a weighted Gaussian kernel.
Kernel density estimator: discrete points → continuous function. Calculate the gradient at the initial point and shift; iterate until convergence.
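A minimal Python sketch of mean shift with a weighted Gaussian kernel, assuming the density Σ_i w_i exp(−‖(x − x_i) / b‖²) over 3D points. For this kernel, following the gradient amounts to repeatedly moving x to the kernel-weighted mean of the points. The bandwidth, the toy data, and the function name are illustrative assumptions. In Shotton et al. (2011) the weights combine the per-pixel body part probability with the squared depth, which is not modelled here.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, max_iter=100, tol=1e-6):
    """Climb from `start` to a nearby mode of the weighted Gaussian density."""
    x = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(x)
        denominator = 0.0
        for point, weight in zip(points, weights):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, point))
            k = weight * math.exp(-sq_dist / bandwidth ** 2)
            denominator += k
            for i, coord in enumerate(point):
                numerator[i] += k * coord
        if denominator == 0.0:
            break  # no point has any influence at this location
        new_x = [value / denominator for value in numerator]
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if moved < tol:
            break
    return x


# Toy data: a cluster around (0, 0, 2) plus one low-weight outlier.
points = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (-0.1, 0.0, 2.0),
          (0.0, 0.1, 2.0), (0.0, -0.1, 2.0), (5.0, 5.0, 2.0)]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 0.2]
mode = mean_shift_mode(points, weights, start=(0.3, 0.3, 2.0), bandwidth=0.5)
```

Starting from several initial points and keeping the modes with the largest accumulated weight gives a set of joint proposals rather than a single estimate.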
Experiments and Results
Results on synthetic and real data
Failure cases
Training parameters vs. classification accuracy
Comparisons
Conclusion
Depth images may contain enough information to solve human pose problems. Depth images are color and texture invariant, which greatly simplifies the correspondence problem. A deep ensemble model with sufficient training data can become a good classifier even with simple features. Buy a Kinect for the lab.
References
Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation.
Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/ A1.
Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, US 2010/ A1.
Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8):
Albitar, I., P. Graebling, et al. (2007). "Robust structured light coding for 3D reconstruction." IEEE.
Scharstein, D. and R. Szeliski (2003). "High-accuracy stereo depth maps using structured light." IEEE.
Breiman, L. (2001). "Random forests." Machine Learning 45(1):
Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural Computation 9(7):
MacCormick, J. "How does the Kinect work?" users.dickinson.edu/~jmac/selected-talks/kinect.pdf
"Structured Light." structured.pdf
the-anandtech-review/2
Chen, Y. S. and B. T. Chen (2003). "Measuring of a three-dimensional surface by use of a spatial distance computation." Applied Optics 42(11):