How the Kinect Works Computational Photography

Slides:



Advertisements
Similar presentations
875: Recent Advances in Geometric Computer Vision & Recognition
Advertisements

11/06/14 How the Kinect Works Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from:
Stereo Vision Reading: Chapter 11
Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923.
Gratuitous Picture US Naval Artillery Rangefinder from World War I (1918)!!
Stereo Many slides adapted from Steve Seitz. Binocular stereo Given a calibrated binocular stereo pair, fuse it to produce a depth image Where does the.
Stereo and Projective Structure from Motion
Real-Time Human Pose Recognition in Parts from Single Depth Images Presented by: Mohammad A. Gowayyed.
Recap from Previous Lecture Tone Mapping – Preserve local contrast or detail at the expense of large scale contrast. – Changing the brightness within.
Lecture 8: Stereo.
Stereo.
776 Computer Vision Jan-Michael Frahm, Enrique Dunn Spring 2012.
Kinect Case Study CSE P 576 Larry Zitnick
Last Time Pinhole camera model, projection
CS6670: Computer Vision Noah Snavely Lecture 17: Stereo
Multiple View Geometry : Computational Photography Alexei Efros, CMU, Fall 2005 © Martin Quinn …with a lot of slides stolen from Steve Seitz and.
Contents Description of the big picture Theoretical background on this work The Algorithm Examples.
Computer Vision : CISC 4/689 Adaptation from: Prof. James M. Rehg, G.Tech.
Stereopsis Mark Twain at Pool Table", no date, UCR Museum of Photography.
The plan for today Camera matrix
3D from multiple views : Rendering and Image Processing Alexei Efros …with a lot of slides stolen from Steve Seitz and Jianbo Shi.
Stereo and Structure from Motion
Stereo Computation using Iterative Graph-Cuts
CSE473/573 – Stereo Correspondence
Announcements PS3 Due Thursday PS4 Available today, due 4/17. Quiz 2 4/24.
Project 1 artifact winners Project 2 questions Project 2 extra signup slots –Can take a second slot if you’d like Announcements.
Multiple View Geometry : Computational Photography Alexei Efros, CMU, Fall 2006 © Martin Quinn …with a lot of slides stolen from Steve Seitz and.
3-D Scene u u’u’ Study the mathematical relations between corresponding image points. “Corresponding” means originated from the same 3D point. Objective.
Epipolar Geometry and Stereo Vision Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/12/11 Many slides adapted from Lana Lazebnik,
Computer Vision Spring ,-685 Instructor: S. Narasimhan WH 5409 T-R 10:30am – 11:50am Lecture #15.
Stereo Readings Trucco & Verri, Chapter 7 –Read through 7.1, 7.2.1, 7.2.2, 7.3.1, 7.3.2, and 7.4, –The rest is optional. Single image stereogram,
12/01/11 How the Kinect Works Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from:
Epipolar Geometry and Stereo Vision Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/05/15 Many slides adapted from Lana Lazebnik,
Computational Photography lecture 19 – How the Kinect 1 works? CS 590 Spring 2014 Prof. Alex Berg (Credits to many other folks on individual slides)
Structure from images. Calibration Review: Pinhole Camera.
Lecture 12 Stereo Reconstruction II Lecture 12 Stereo Reconstruction II Mata kuliah: T Computer Vision Tahun: 2010.
Recap from Monday Image Warping – Coordinate transforms – Linear transforms expressed in matrix form – Inverse transforms useful when synthesizing images.
Stereo Vision Reading: Chapter 11 Stereo matching computes depth from two or more images Subproblems: –Calibrating camera positions. –Finding all corresponding.
Stereo Readings Szeliski, Chapter 11 (through 11.5) Single image stereogram, by Niklas EenNiklas Een.
Stereo Many slides adapted from Steve Seitz.
Project 2 code & artifact due Friday Midterm out tomorrow (check your ), due next Fri Announcements TexPoint fonts used in EMF. Read the TexPoint.
Stereo Many slides adapted from Steve Seitz. Binocular stereo Given a calibrated binocular stereo pair, fuse it to produce a depth image image 1image.
Computer Vision, Robert Pless
Lec 22: Stereo CS4670 / 5670: Computer Vision Kavita Bala.
Computer Vision Stereo Vision. Bahadir K. Gunturk2 Pinhole Camera.
CSE 185 Introduction to Computer Vision Stereo. Taken at the same time or sequential in time stereo vision structure from motion optical flow Multiple.
Bahadir K. Gunturk1 Phase Correlation Bahadir K. Gunturk2 Phase Correlation Take cross correlation Take inverse Fourier transform  Location of the impulse.
Geometric Transformations
Lecture 16: Stereo CS4670 / 5670: Computer Vision Noah Snavely Single image stereogram, by Niklas EenNiklas Een.
stereo Outline : Remind class of 3d geometry Introduction
776 Computer Vision Jan-Michael Frahm Spring 2012.
Solving for Stereo Correspondence Many slides drawn from Lana Lazebnik, UIUC.
11/05/15 How the Kinect Works Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from:
RGB-D Images and Applications
Project 2 due today Project 3 out today Announcements TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAA.
John Morris Stereo Vision (continued) Iolanthe returns to the Waitemata Harbour.
Multiple View Geometry and Stereo. Overview Single camera geometry – Recap of Homogenous coordinates – Perspective projection model – Camera calibration.
Presenter: Jae Sung Park
CSE 185 Introduction to Computer Vision Stereo 2.
Stereo CS4670 / 5670: Computer Vision Noah Snavely Single image stereogram, by Niklas EenNiklas Een.
Epipolar Geometry and Stereo Vision
제 5 장 스테레오.
CS4670 / 5670: Computer Vision Kavita Bala Lec 27: Stereo.
Stereo and Structure from Motion
Real-Time Human Pose Recognition in Parts from Single Depth Image
EECS 274 Computer Vision Stereopsis.
Thanks to Richard Szeliski and George Bebis for the use of some slides
Computer Vision Stereo Vision.
Stereo vision Many slides adapted from Steve Seitz.
Presentation transcript:

How the Kinect Works Computational Photography 11/09/17 T2 Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from: http://www.blisteredthumbs.net/2010/11/dance-central-angry-review

Kinect Device

Kinect Device illustration source: primesense.com

What the Kinect does Get Depth Image Application (e.g., game) Estimate Body Pose

How Kinect Works: Overview IR Projector IR Sensor Projected Light Pattern Stereo Algorithm Segmentation, Part Prediction Depth Image Body Pose

Part 1: Stereo from projected dots IR Projector IR Sensor Projected Light Pattern Stereo Algorithm Segmentation, Part Prediction Depth Image Body Pose

Part 1: Stereo from projected dots Overview of depth from stereo How it works for a projector/sensor pair Stereo algorithm used by Primesense (Kinect)

Depth from Stereo Images Dense depth map Some of following slides adapted from Steve Seitz and Lana Lazebnik

Depth from Stereo Images Goal: recover depth by finding image coordinate x’ that corresponds to x X x x' f x x’ Baseline B z O O’ X

Depth from disparity X z x x’ f f O Baseline B O’ Disparity is inversely proportional to depth.

Stereo and the Epipolar constraint X X X x x’ x’ x’ Potential matches for x have to lie on the corresponding line l’. Potential matches for x’ have to lie on the corresponding line l.

Simplest Case: Parallel images Image planes of cameras are parallel to each other and to the baseline Camera centers are at same height Focal lengths are the same Then, epipolar lines fall along the horizontal scan lines of the images

Basic stereo matching algorithm If necessary, rectify the two stereo images to transform epipolar lines into scanlines For each pixel x in the first image Find corresponding epipolar scanline in the right image Examine all pixels on the scanline and pick the best match x’ Compute disparity x-x’ and set depth(x) = fB/(x-x’)

Correspondence search Left Right scanline Matching cost disparity Slide a window along the right scanline and compare contents of that window with the reference window in the left image Matching cost: SSD or normalized correlation

Correspondence search Left Right scanline SSD

Correspondence search Left Right scanline Norm. corr

Results with window search Data Window-based matching Ground truth

Add constraints and solve with graph cuts Before Graph cuts Ground truth Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001 For the latest and greatest: http://www.middlebury.edu/stereo/

Failures of correspondence search Occlusions, repetition Textureless surfaces Non-Lambertian surfaces, specularities

Dot Projections http://www.youtube.com/watch?v=28JwgxbQx8w

Depth from Projector-Sensor Only one image: How is it possible to get depth? Scene Surface Projector Sensor

Same stereo algorithms apply Projector Sensor

Example: Book vs. No Book Source: http://www.futurepicture.org/?p=97

Example: Book vs. No Book Source: http://www.futurepicture.org/?p=97

Region-growing Random Dot Matching Detect dots (“speckles”) and label them unknown Randomly select a region anchor, a dot with unknown depth Windowed search via normalized cross correlation along scanline Check that best match score is greater than threshold; if not, mark as “invalid” and go to 2 Region growing Neighboring pixels are added to a queue For each pixel in queue, initialize by anchor’s shift; then search small local neighborhood; if matched, add neighbors to queue Stop when no pixels are left in the queue Repeat until all dots have known depth or are marked “invalid” http://www.wipo.int/patentscope/search/en/WO2007043036

Projected IR vs. Natural Light Stereo What are the advantages of IR? Works in low light conditions Does not rely on having textured objects Not confused by repeated scene textures Can tailor algorithm to produced pattern What are advantages of natural light? Works outside, anywhere with sufficient light Uses less energy Resolution limited only by sensors, not projector Difficulties with both Very dark surfaces may not reflect enough light Specular reflection in mirrors or metal causes trouble

Uses of Kinect (part 1) 3D Scanner: http://www.youtube.com/watch?v=V7LthXRoESw IllumiRoom: http://research.microsoft.com/apps/video/default.aspx?id=191304 https://www.youtube.com/watch?v=nGGMv9RnJIA

To learn more Warning: lots of wrong info on web Great site by Daniel Reetz: http://www.futurepicture.org/?p=97 Kinect patents: http://www.faqs.org/patents/app/20100118123 http://www.faqs.org/patents/app/20100020078 http://www.faqs.org/patents/app/20100007717

How does the Kinect v2 work? Time of flight sensor Turn on and off two co-located pixels in alternation very quickly (e.g., 10 GHz) Pulse laser just as quickly Ratio of light achieved by the two pixels tells travel time (i.e., distance) of laser By using two pairs of pixels flipping at different rates, you get a low-res and high-res estimate of travel time http://www.gamasutra.com/blogs/DanielLau/20131127/205820/The_Science_Behind_Kinects_or_Kinect_10_versus_20.php

Part 2: Pose from depth IR Projector IR Sensor Projected Light Pattern Stereo Algorithm Segmentation, Part Prediction Depth Image Body Pose

Goal: estimate pose from depth image Real-Time Human Pose Recognition in Parts from a Single Depth Image Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake CVPR 2011

Goal: estimate pose from depth image RGB Depth Part Label Map Joint Positions http://research.microsoft.com/apps/video/default.aspx?id=144455

Challenges Lots of variation in bodies, orientation, poses Needs to be very fast (their algorithm runs at 200 FPS on the Xbox 360 GPU) Pose Examples Examples of one part

Extract body pixels by thresholding depth

Basic learning approach Very simple features Lots of data Flexible classifier

Features Difference of depth at two offsets Offset is scaled by depth at center

Get lots of training data Capture and sample 500K mocap frames of people kicking, driving, dancing, etc. Get 3D models for 15 bodies with a variety of weight, height, etc. Synthesize mocap data for all 15 body types

Body models

Part prediction with random forests Randomized decision forests: collection of independently trained trees Each tree is a classifier that predicts the likelihood of a pixel belonging to each part Node corresponds to a thresholded feature The leaf node that an example falls into corresponds to a conjunction of several features In training, at each node, a subset of features is chosen randomly, and the most discriminative is selected

Joint estimation Joints are estimated using mean-shift (a fast mode-finding algorithm) Observed part center is offset by pre-estimated value

Results Ground Truth

More results

Accuracy vs. Number of Training Examples

Uses of Kinect (part 2) Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg Robot Control: https://www.youtube.com/watch?v=7vq-1TiXi3g Capture for holography: http://www.youtube.com/watch?v=4LW8wgmfpTE Virtual dressing room: http://www.youtube.com/watch?v=1jbvnk1T4vQ Fly wall: http://vimeo.com/user3445108/kiwibankinteractivewall

Note: pose from rgb video works amazingly well now https://www.youtube.com/watch?v=pW6nZXeWlGM&t=77s

Coming up Project 5 due Mon Detecting fake photographs (Tues) Computational approaches to cameras (Thurs) Final project proposal due after Thanksgiving break