Evaluation of the stability of SIFT keypoint correspondence across cameras
or: “can we put a ‘C’ in SIFT?”
max van kleek
6.869: learning and interfaces
thursday, may 11, 2005
ubiquitous computing: computers (and cameras) are everywhere!
little sister: follow-me-around user modeling. object correspondence across varying cameras, lighting, and scenes is an important subgoal
applications: building a personal life log for yourself, interest profiling, social network mining, health care
identifying objects with local features: the SIFT transform
Orientation histogram = SIFT feature vector
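To make "orientation histogram = SIFT feature vector" concrete, here is a minimal sketch of a SIFT-style descriptor. It assumes a square grayscale patch already centered on the keypoint and rotated to its dominant orientation, splits it into 4x4 quadrants with 8 orientation bins each (the 128-D layout from Lowe), weights gradients by a Gaussian whose sigma is half the window width, and normalizes, clips at 0.2, then renormalizes. The function names are hypothetical, and this simplification omits Lowe's trilinear interpolation between neighboring cells and bins.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def sift_like_descriptor(patch, cells=4, bins=8, clip=0.2):
    """Simplified SIFT-style descriptor for a square grayscale patch.

    Assumes the patch side length is divisible by `cells`. Each pixel votes
    into exactly one cell and one orientation bin (no interpolation).
    """
    n = len(patch)
    cell_size = n // cells
    sigma = n / 2.0  # Gaussian dropoff: sigma = half the window width
    center = (n - 1) / 2.0
    hist = [0.0] * (cells * cells * bins)
    # Skip the 1-pixel border, where central differences are undefined.
    for r in range(1, n - 1):
        for c in range(1, n - 1):
            dx = patch[r][c + 1] - patch[r][c - 1]
            dy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            weight = math.exp(-((r - center) ** 2 + (c - center) ** 2) / (2 * sigma ** 2))
            b = int(theta / (2 * math.pi) * bins) % bins
            cell = (r // cell_size) * cells + (c // cell_size)
            hist[cell * bins + b] += weight * magnitude
    # Normalize, clip large components (reduces sensitivity to lighting), renormalize.
    return _normalize([min(v, clip) for v in _normalize(hist)])


# A horizontal intensity ramp: every gradient points along +x, so only bin 0 fills.
patch = [[float(c) for c in range(16)] for _ in range(16)]
desc = sift_like_descriptor(patch)
```

For a 16x16 patch this yields a 128-entry unit vector; on the ramp above, every nonzero entry sits in orientation bin 0 of some quadrant.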
recognizing objects: identified and oriented keypoints vote for object pose in a Hough-transform pose-and-scale space
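The voting step can be sketched as follows, assuming each match pairs a model keypoint with an image keypoint, each given as (x, y, scale, orientation in radians). Every match implies one similarity transform (rotation, scale, translation), which is quantized into a bin. The bin sizes (12 orientation bins, one octave of scale, 8-pixel location bins) and the name `hough_pose_vote` are illustrative assumptions. Lowe also votes into the two nearest bins in each dimension, which this sketch skips.

```python
import math
from collections import defaultdict


def hough_pose_vote(matches, ori_bins=12, loc_bin=8.0):
    """Return (bin_key, match_indices) for the most-voted pose bin.

    matches: list of (model_kp, image_kp), each kp = (x, y, scale, orientation).
    """
    votes = defaultdict(list)
    for idx, ((mx, my, ms, mo), (kx, ky, ks, ko)) in enumerate(matches):
        rotation = (ko - mo) % (2 * math.pi)
        scale = ks / ms
        cos_r, sin_r = math.cos(rotation), math.sin(rotation)
        # Translation that maps the rotated, scaled model point onto the image point.
        tx = kx - scale * (cos_r * mx - sin_r * my)
        ty = ky - scale * (sin_r * mx + cos_r * my)
        key = (
            int(rotation / (2 * math.pi) * ori_bins) % ori_bins,
            round(math.log2(scale)),  # one bin per octave of scale
            math.floor(tx / loc_bin),
            math.floor(ty / loc_bin),
        )
        votes[key].append(idx)
    return max(votes.items(), key=lambda kv: len(kv[1]))


# Three matches agree on a pure shift of (10, 5); the fourth is an outlier.
matches = [((x, y, 1.0, 0.0), (x + 10, y + 5, 1.0, 0.0)) for x, y in [(0, 0), (3, 4), (7, 1)]]
matches.append(((0, 0, 1.0, 0.0), (50, 50, 2.0, math.pi / 2)))
key, members = hough_pose_vote(matches)
```

The consistent matches land in the same bin and outvote the outlier, which is what makes the pose estimate robust to individual mismatches.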
keypoint correspondence:
1. % keypoints detected
2. stability of orientation histogram
factors:
- object orientation (away from fronto-parallel)
- object deformation
- lighting direction, intensity, shading
- lens distortion, sharpness, CCD “quality”, noise, capture artifacts
Mikolajczyk, K., C. Schmid, “A Performance Evaluation of Local Descriptors”, CVPR ’03
Lowe, D.G., “Distinctive Image Features from Scale-Invariant Keypoints”, IJCV ’04
Me! (well, sort of...)
???
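Measure 1 (% keypoints detected) can be made concrete with a repeatability-style score. This is a minimal sketch under assumptions: the geometric mapping from image A to image B is known (for example, the identity for a static camera and scene), and a keypoint counts as re-detected if some keypoint in B lies within a pixel tolerance of its mapped position. The function name and the default tolerance are hypothetical.

```python
import math


def repeatability(kps_a, kps_b, transform, tol=1.5):
    """Return (count_in_common, fraction_of_A) for (x, y) keypoint positions.

    transform maps a position in image A to image B. One-to-one matching is
    not enforced: a single B keypoint may account for several A keypoints.
    """
    if not kps_a:
        return 0, 0.0
    in_common = 0
    for pa in kps_a:
        mapped = transform(pa)
        if any(math.dist(mapped, pb) <= tol for pb in kps_b):
            in_common += 1
    return in_common, in_common / len(kps_a)


def shift(p):
    """Hypothetical known mapping: B is A shifted one pixel to the right."""
    return (p[0] + 1, p[1])


count, frac = repeatability([(0, 0), (10, 10), (20, 20)],
                            [(1, 0), (11, 10.5), (50, 50)], shift, tol=1.0)
```

Here two of the three keypoints in A reappear in B, so the score is 2/3. The "in common" counts on the result slides play the role of `count_in_common`.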
the experiment: cameras vary widely in size, configuration, capability, and price
- Logitech QC Express: 640x bit color; YUV 4:2:2; AGC, auto exposure; manual focus; USB interface; $15
- Logitech QC Pro 3000: CCD by Philips; 640x bit color; YUV 4:2:2, RGB; AGC, auto exposure, auto WB; manual focus; USB interface; $50
- Sony EVI-D30: steerable NTSC camera with DV capture card; 720x480 luminance, less for color; raw DV; AGC, auto exposure; auto focus; ~$300 + $200
- Nikon Coolpix 990: digital still camera; 2048x1536 RGB; auto gain, auto WB, auto exposure; auto focus; $1000 -> $500
experiment setup: 5 incandescent lights; 12 ft between camera and subject
acquiring image sets, for each camera:
- background: 10 images
- stationary subject: 2 front, 2 facing right, 2 facing left
= 16 images/cam * 4 cameras = 64 images
320x240 (or the camera's standard resolution, downsampled afterwards); RGB colorspace; JPEG quality 100; default camera settings except AGC and AE disabled (locked to optimal settings)
algorithm for keypoint correspondence:
1. contrast-stretch the source image over the whole image set
2. build a background model (mean) from the contrast-stretched background images
3. foreground = find(p(img) < epsilon)
4. compute_sift_points
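Steps 1 to 3 can be sketched as below. The slide only says the background model is a mean and that foreground pixels satisfy p(img) < epsilon, so the per-pixel Gaussian likelihood, the variance floor `min_var`, and the `epsilon` default are assumptions, and all function names are hypothetical. Images are plain lists of rows of grayscale values; step 4 (`compute_sift_points`) is not reproduced here.

```python
import math


def contrast_stretch(images):
    """Map intensities linearly to [0, 1] using the min/max over the whole set."""
    lo = min(min(min(row) for row in img) for img in images)
    hi = max(max(max(row) for row in img) for img in images)
    scale = (hi - lo) or 1.0
    return [[[(p - lo) / scale for p in row] for row in img] for img in images]


def background_model(bg_images, min_var=1e-4):
    """Per-pixel mean and (floored) variance over the background images."""
    n = len(bg_images)
    h, w = len(bg_images[0]), len(bg_images[0][0])
    mean = [[sum(img[r][c] for img in bg_images) / n for c in range(w)] for r in range(h)]
    var = [[max(sum((img[r][c] - mean[r][c]) ** 2 for img in bg_images) / n, min_var)
            for c in range(w)] for r in range(h)]
    return mean, var


def foreground_mask(img, mean, var, epsilon=1e-3):
    """True where the pixel is unlikely (< epsilon) under its background Gaussian."""
    mask = []
    for r, row in enumerate(img):
        out = []
        for c, p in enumerate(row):
            v = var[r][c]
            likelihood = math.exp(-(p - mean[r][c]) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            out.append(likelihood < epsilon)
        mask.append(out)
    return mask


bg = [[[10, 10, 10], [10, 10, 10]] for _ in range(3)]
src = [[10, 10, 10], [10, 200, 10]]
stretched = contrast_stretch(bg + [src])          # stretch over the whole set
mean, var = background_model(stretched[:-1])
mask = foreground_mask(stretched[-1], mean, var)
```

Stretching over the whole set, rather than per image, keeps background and source intensities on a common scale before they are compared.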
dilate the foreground mask with a disc structuring element (strel), then filter out background keypoints: intersecting the keypoints with the dilated foreground mask keeps only the relevant ones
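The dilate-and-intersect step can be sketched in plain Python, with the mask as a list of boolean rows and keypoints as (x, y, ...) tuples. The disc radius default and the function names are hypothetical; the slide does not give the radius used.

```python
def disc(radius):
    """Offsets (dr, dc) of a disc-shaped structuring element."""
    return [(dr, dc) for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1) if dr * dr + dc * dc <= radius * radius]


def dilate(mask, radius):
    """Binary dilation: every True pixel stamps the disc around itself."""
    h, w = len(mask), len(mask[0])
    offsets = disc(radius)
    out = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = True
    return out


def keep_foreground_keypoints(keypoints, mask, radius=2):
    """Keep keypoints whose rounded (x, y) position lies in the dilated mask."""
    grown = dilate(mask, radius)
    h, w = len(grown), len(grown[0])
    kept = []
    for kp in keypoints:
        x, y = int(round(kp[0])), int(round(kp[1]))
        if 0 <= y < h and 0 <= x < w and grown[y][x]:
            kept.append(kp)
    return kept


mask = [[False] * 7 for _ in range(7)]
mask[3][3] = True
kept = keep_foreground_keypoints([(3.0, 5.0), (5.0, 5.0), (0.0, 0.0)], mask, radius=2)
```

Dilating first keeps keypoints that sit just outside the raw mask, which matters because SIFT keypoints often lie on object boundaries where the background test is least reliable.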
figure: image A with SIFT keypoints and an orientation histogram for each keypoint in A; image B with SIFT keypoints and an orientation histogram for each keypoint in B
matching: each keypoint in A is matched to the keypoint in B whose orientation histogram is its nearest neighbor in SIFT space
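Nearest-neighbor matching in SIFT space can be sketched with Euclidean distance between descriptor vectors. The optional `ratio` argument adds Lowe's distance-ratio test (reject a match unless the nearest distance is clearly below the second nearest); that test goes beyond plain nearest-neighbor matching and is off by default. The function name is hypothetical.

```python
import math


def nearest_neighbor_matches(desc_a, desc_b, ratio=None):
    """Match each descriptor in A to its nearest (Euclidean) descriptor in B.

    Returns (i, j) index pairs. With `ratio` set, a match is dropped when
    nearest_distance >= ratio * second_nearest_distance.
    """
    matches = []
    if not desc_b:
        return matches
    for i, da in enumerate(desc_a):
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best_dist, best_j = ranked[0]
        if ratio is not None and len(ranked) > 1 and best_dist >= ratio * ranked[1][0]:
            continue  # ambiguous: two candidates in B are nearly equally close
        matches.append((i, best_j))
    return matches


a = [(0.0, 0.0), (10.0, 10.0)]
b = [(9.0, 9.0), (1.0, 0.0), (0.0, 1.0)]
a_to_b = nearest_neighbor_matches(a, b)
b_to_a = nearest_neighbor_matches(b, a)
```

Matching is not symmetric: here A -> B gives [(0, 1), (1, 0)], while B -> A sends two of B's points to the same point in A. This is one reason results can differ between the two matching directions.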
1.1 <-> 1.2 sanity check: same (DV) camera, slightly different pose
1.1: 15 keypoints detected; 1.2: 11 keypoints detected
1.1 -> 1.2: 6 properly assigned, 8 in common
1.2 -> 1.1: 5 properly assigned, 8 in common
4.1 <-> 1.1: nikon coolpix versus sony steerable
4.1: 18 keypoints detected; 1.1: 15 keypoints detected
4.1 -> 1.1: 5 properly assigned, 10 in common
1.1 -> 4.1: 5 properly assigned, 10 in common
4.1 <-> 3.1: nikon coolpix versus qc pro
4.1: 18 keypoints detected; 3.1: 17 keypoints detected
4.1 -> 3.1: 2 properly assigned, 10 in common
3.1 -> 4.1: 5 properly assigned, 10 in common
4.1 <-> 2.1: nikon coolpix versus qc express
4.1: 18 keypoints detected; 2.1: 22 keypoints detected
4.1 -> 2.1: 2 properly assigned, 10 in common
2.1 -> 4.1: 0 properly assigned, 10 in common
other results:
qc pro vs qc express (3.1 <-> 2.1):
- 3.1 -> 2.1: 17/22 keypoints; 4 correct out of 6 in common
- 2.1 -> 3.1: 22/17 keypoints; 0 correct out of 6 in common
- poor reproducibility with the QCs?
qc express test (is the QC Express just too noisy?):
- 2.1 -> 2.2: 22/20 keypoints; 1 correct out of 8 in common
- 2.2 -> 2.1: 20/22 keypoints; 2 correct out of 8 in common
- yes.
qc pro reproducibility:
- 3.1 -> 3.2: 17/24 keypoints; 6 correct out of 9 in common
- 3.2 -> 3.1: 24/17 keypoints; 7 correct out of 9 in common
angle test using qc pro:
- 3.1 -> 3.4: 0 correct out of 0 in common; sensitive to out-of-plane rotation
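The "N correct out of M in common" figures can be computed per matching direction as sketched below. It assumes a hand-labeled ground truth mapping each keypoint index in A to its true counterpart in B, covering only keypoints visible in both images; the function name and the data format are hypothetical. Accuracy is undefined when nothing is in common, as in the 3.1 -> 3.4 angle test.

```python
def score_direction(matches, ground_truth):
    """Score one matching direction, e.g. 3.1 -> 2.1.

    matches: (i, j) pairs from nearest-neighbor matching A -> B.
    ground_truth: {i: j} for keypoints present in both images ("in common").
    Returns (correct, in_common, accuracy); accuracy is None if in_common == 0.
    """
    correct = sum(1 for i, j in matches if ground_truth.get(i) == j)
    in_common = len(ground_truth)
    accuracy = correct / in_common if in_common else None
    return correct, in_common, accuracy


result = score_direction([(0, 1), (1, 0), (2, 2)], {0: 1, 1: 2, 2: 2})
```

Scoring each direction separately is what produces pairs such as 4 out of 6 for 3.1 -> 2.1 but 0 out of 6 for 2.1 -> 3.1: the set in common is shared, but the nearest-neighbor assignments are not.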
experiment setup: 5 incandescent lights; 3 ft between camera and robot
320x240 (or the camera's standard resolution, downsampled afterwards); RGB colorspace; JPEG quality 100; default camera settings except AGC disabled (locked to optimal settings)
parameters:
- bins / histogram
- pixels / quadrant
- quadrants / keypoint
- Gaussian dropoff covariance
- keypoint splitting / multiple primary gradient directions
- source keypoint merging
- histogram blurring
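Several of these parameters can be collected into one configuration object, shown here with keypoint splitting as an example. The defaults follow the values reported by Lowe (IJCV '04: 8 bins per histogram, 4x4 samples per quadrant, 4x4 quadrants, Gaussian sigma of half the descriptor window, a 36-bin orientation histogram, extra keypoints for peaks within 80% of the maximum). They are an assumed baseline, not necessarily the settings used in these experiments. The class and function names are hypothetical; source keypoint merging and histogram blurring are not sketched.

```python
import math
from dataclasses import dataclass


@dataclass
class DescriptorParams:
    """Assumed baseline values for the tunable parameters."""
    bins_per_histogram: int = 8
    pixels_per_quadrant: int = 4       # each quadrant covers 4x4 samples
    quadrants_per_side: int = 4        # 4x4 quadrants per keypoint
    gaussian_sigma_frac: float = 0.5   # sigma = half the descriptor window width
    orientation_bins: int = 36         # histogram used to assign keypoint orientation
    split_peak_ratio: float = 0.8      # extra keypoints for peaks >= 80% of the max

    @property
    def descriptor_length(self) -> int:
        return self.quadrants_per_side ** 2 * self.bins_per_histogram


def split_keypoint(orientation_hist, peak_ratio=0.8):
    """Keypoint splitting: one orientation (radians) per strong local peak.

    Returns the bin center of every local maximum that is at least
    peak_ratio times the global maximum. Lowe also refines each peak with a
    parabola fit; this sketch uses bin centers only.
    """
    n = len(orientation_hist)
    top = max(orientation_hist)
    angles = []
    for k, value in enumerate(orientation_hist):
        left = orientation_hist[k - 1]          # index -1 wraps around at k = 0
        right = orientation_hist[(k + 1) % n]
        if value > left and value > right and value >= peak_ratio * top:
            angles.append((k + 0.5) * 2 * math.pi / n)
    return angles


params = DescriptorParams()
hist = [0.0] * params.orientation_bins
hist[5], hist[20], hist[30] = 10.0, 9.0, 5.0
angles = split_keypoint(hist, params.split_peak_ratio)
```

With these values the secondary peak (9.0) survives the 80% cutoff and spawns a second keypoint, while the weaker peak (5.0) does not.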