Published by Bennett Dawson. Modified over 9 years ago.
1
Epitomic Location Recognition
A generative approach for location recognition
K. Ni, A. Kannan, A. Criminisi and J. Winn
In Proc. CVPR 2008, Anchorage, Alaska.
3
Location Recognition
Where am I?
Instance recognition.
Category recognition (more difficult): Lobby? Cubicle? Hallway? Kitchen?
5
Geometry Based Recognition
SLAM and structure from motion.
Why do we need metric reconstruction? It loses the flexibility to do class recognition.
Figure labels: Training Images, Testing Image, Geometry & Labels, Features, Local Feature Database.
References: F. Schaffalitzky and A. Zisserman; G. Schindler, M. Brown, R. Szeliski.
6
Appearance Based Recognition
Captures global appearance information.
Gaussian mixture model used by A. Torralba et al.
Figure labels: Training Images, Preprocessing (e.g. PCA), Image Vectors, Training, Appearance Model.
References: A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin; M. Cummins and P. Newman.
7
Appearance or Geometry?
Can we do better by fusing both kinds of information?
A small example with 2 location labels: cubicle and corridor.
8
The Simplest Model
Nearest neighbor classification.
Naive but still effective with enough samples.
A small shift may disrupt the recognition.
Does not capture uncertainty.
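The shift sensitivity of nearest neighbor classification can be illustrated with a minimal sketch. The toy striped images, the squared L2 distance, and the function name `nearest_neighbor_label` are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def nearest_neighbor_label(test_image, train_images, train_labels):
    """Return the label of the training image closest in squared L2 distance."""
    diffs = train_images - test_image[None]
    dists = np.sum(diffs.reshape(len(train_images), -1) ** 2, axis=1)
    return train_labels[int(np.argmin(dists))]

# Hypothetical toy data: one 8x8 image per location.
cubicle = np.tile(np.arange(8) % 2, (8, 1)).astype(float)  # columns alternate 0, 1
corridor = cubicle.T.copy()                                # rows alternate 0, 1
train_images = np.stack([cubicle, corridor])
train_labels = ["cubicle", "corridor"]

# The unshifted cubicle image matches its own training image exactly.
label_original = nearest_neighbor_label(cubicle, train_images, train_labels)

# A one-pixel horizontal shift inverts every stripe, so the shifted cubicle
# image is now closer to the corridor image than to the cubicle image.
shifted = np.roll(cubicle, 1, axis=1)
label_shifted = nearest_neighbor_label(shifted, train_images, train_labels)

print(label_original, label_shifted)
```

The classifier also returns a hard label only, which is the sense in which it does not capture uncertainty.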
9
How to Incorporate Translation Invariance?
We need something better than a “bag of frames” model.
Figure labels: Training images, Testing image.
10
Panorama
Models both appearance and geometry.
Adapts to camera rotation and focal length change (M. Brown and D. G. Lowe).
Generative: an image is a patch “extracted” from the panorama.
11
Cons of Panoramas
Not easy to build because of parallax.
Do not capture uncertainty.
Only work for location instance recognition.
No compact representation for repetitive scenes.
12
Gaussian Mixture Model
Six mixture components trained as in Torralba et al.'s paper.
Handles uncertainty but has no translation invariance.
Figure labels: Means, Variances, Remove boundaries, Much more blurred.
13
A Weak Panorama
3D motions can be roughly modeled by 2D translation + scaling.
Figure labels: 2D translation, Scaling.
14
Epitome = Panorama + GMM
Epitome: a generative model for image patches / video frames.
Captures repetitive patterns in the original image.
Mapping = 2D translation + scaling.
Figure labels: A source image, Image patches, Epitome.
References: N. Jojic et al., ICCV 2003; N. Petrovic et al., CVPR 2006.
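A minimal sketch of how a patch could be scored under one mapping into an epitome, assuming per-pixel Gaussian means and variances, wrap-around indexing, and an integer scale implemented as a sampling stride. The helper names `mapped_indices` and `patch_log_likelihood` are hypothetical, and the exact parameterization of scaling in the original work may differ.

```python
import numpy as np

def mapped_indices(top, left, patch_size, scale, epitome_shape):
    """Epitome coordinates hit by a square patch under translation + integer scale."""
    rows = (top + scale * np.arange(patch_size)) % epitome_shape[0]
    cols = (left + scale * np.arange(patch_size)) % epitome_shape[1]
    return np.ix_(rows, cols)

def patch_log_likelihood(patch, mean, var, top, left, scale=1):
    """Sum of per-pixel Gaussian log-densities of the patch under one mapping."""
    idx = mapped_indices(top, left, patch.shape[0], scale, mean.shape)
    m, v = mean[idx], var[idx]
    return float(-0.5 * np.sum(np.log(2 * np.pi * v) + (patch - m) ** 2 / v))

# Hypothetical 8x8 epitome with random means and a constant variance.
rng = np.random.default_rng(0)
epitome_mean = rng.random((8, 8))
epitome_var = np.full((8, 8), 0.01)

# Generate a 4x4 patch from a known mapping that wraps around the border.
patch = epitome_mean[mapped_indices(6, 5, 4, 1, epitome_mean.shape)]

# Score every translation at scale 1 and pick the most likely one.
scores = np.array([[patch_log_likelihood(patch, epitome_mean, epitome_var, r, c)
                    for c in range(8)] for r in range(8)])
best_mapping = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
print(best_mapping)
```

The variance array plays the role of the GMM side of the model, and the translation search plays the role of the panorama side.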
15
Epitome as Probabilistic Panorama
Model 3D scenes rather than a single 2D image.
Environment = virtual panorama.
Figure labels: Location Epitome (Means, Variances).
16
Learning the Location Epitome
Initialize the epitome randomly.
EM iterations:
E-step: infer the posteriors over all mappings.
M-step: use the posteriors as weights to update the mean and variance of the epitome pixels.
Plot: free energy against EM iterations.
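The E-step and M-step can be sketched as follows. This is a simplified illustration under stated assumptions: grayscale square patches, translation-only mappings with wrap-around, a uniform prior over mappings, and a variance floor `min_var`. The function name `learn_epitome` and all parameter values are hypothetical, not the authors' implementation.

```python
import numpy as np

def learn_epitome(patches, size, n_iters=10, min_var=1e-2, seed=0):
    """EM for a translation-only epitome; patches is (n, p, p) with p <= size.

    Returns the epitome mean, the epitome variance, and the free energy
    recorded at each iteration.
    """
    rng = np.random.default_rng(seed)
    n, p, _ = patches.shape
    offsets = np.arange(p)
    # Random initialization around the global patch statistics.
    mean = patches.mean() + patches.std() * rng.standard_normal((size, size))
    var = np.full((size, size), patches.var() + min_var)
    free_energy = []
    for _ in range(n_iters):
        # E-step: log-likelihood of every patch under every mapping.
        log_lik = np.empty((n, size, size))
        for r in range(size):
            for c in range(size):
                idx = np.ix_((r + offsets) % size, (c + offsets) % size)
                log_lik[:, r, c] = -0.5 * np.sum(
                    np.log(2 * np.pi * var[idx]) + (patches - mean[idx]) ** 2 / var[idx],
                    axis=(1, 2))
        peak = log_lik.max(axis=(1, 2), keepdims=True)
        post = np.exp(log_lik - peak)
        norm = post.sum(axis=(1, 2), keepdims=True)
        post /= norm
        # With exact posteriors the free energy is the negative log-likelihood.
        free_energy.append(float(-(peak + np.log(norm)).sum() + n * np.log(size * size)))
        # M-step: posterior-weighted statistics accumulated per epitome pixel.
        s0 = np.zeros((size, size))
        s1 = np.zeros((size, size))
        s2 = np.zeros((size, size))
        for r in range(size):
            for c in range(size):
                idx = np.ix_((r + offsets) % size, (c + offsets) % size)
                w = post[:, r, c]
                s0[idx] += w.sum()
                s1[idx] += np.tensordot(w, patches, axes=1)
                s2[idx] += np.tensordot(w, patches ** 2, axes=1)
        covered = s0 > 1e-12  # pixels with no posterior mass keep their old values
        safe = np.where(covered, s0, 1.0)
        new_mean = s1 / safe
        new_var = np.maximum(s2 / safe - new_mean ** 2, min_var)
        mean = np.where(covered, new_mean, mean)
        var = np.where(covered, new_var, var)
    return mean, var, free_energy

# Hypothetical training data: overlapping 4x4 patches from a random 12x12 image.
image = np.random.default_rng(1).random((12, 12))
patches = np.stack([image[r:r + 4, c:c + 4]
                    for r in range(0, 9, 2) for c in range(0, 9, 2)])
mean, var, free_energy = learn_epitome(patches, size=6)
print(free_energy[0], free_energy[-1])
```

The nested loops over mappings are the slow part of this sketch; a practical implementation would replace them with correlations.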
17
Model Comparison
The epitome is a smart mixture-of-Gaussians model with parameter sharing among components.
For the same number of parameters, the epitome generalizes better.
19
Build Label Maps
The label maps are the posterior of the label given the mapping.
Figure labels: Epitome, Label maps, Corridor label map, Cubicle label map.
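One way to read "posterior of the label given the mapping" is to accumulate, for each mapping, the posterior mass contributed by training frames of each label and then normalize over labels. The sketch below does this on toy data. The function name `build_label_maps`, the smoothing constant `eps`, and the indexing of maps by mapping position rather than by epitome pixel are assumptions.

```python
import numpy as np

def build_label_maps(posteriors, labels, n_labels, eps=1e-6):
    """posteriors: (n_frames, H, W) mapping posteriors; labels: (n_frames,) ints.

    Returns an (n_labels, H, W) array that sums to 1 over the label axis.
    """
    counts = np.full((n_labels,) + posteriors.shape[1:], eps)
    for q, y in zip(posteriors, labels):
        counts[y] += q  # credit this frame's posterior mass to its label
    return counts / counts.sum(axis=0, keepdims=True)

# Toy posteriors over a 2x2 grid of mappings for three training frames.
posteriors = np.array([
    [[1.0, 0.0], [0.0, 0.0]],  # corridor frame maps to the top-left
    [[0.5, 0.0], [0.0, 0.5]],  # cubicle frame split between two mappings
    [[0.0, 0.0], [0.0, 1.0]],  # cubicle frame maps to the bottom-right
])
labels = np.array([0, 1, 1])   # 0 = corridor, 1 = cubicle
label_maps = build_label_maps(posteriors, labels, n_labels=2)
print(label_maps[0])  # corridor label map
print(label_maps[1])  # cubicle label map
```

Mappings that no training frame used fall back to a uniform label posterior because of `eps`.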
20
Recognition from Location Epitomes
Fast correlation: infer the best mapping region.
Sum the pixel-wise votes.
Temporal smoothing using an HMM.
Figure labels: Input testing image, Best matching patch, Corridor label map, Cubicle label map, Location epitome.
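The temporal smoothing step can be sketched as the forward (filtering) pass of an HMM over per-frame label probabilities. The transition model with a single `stay_prob`, the use of normalized votes as observation likelihoods, and the function name `smooth_labels` are assumptions; the slides do not specify these details.

```python
import numpy as np

def smooth_labels(frame_probs, stay_prob=0.9):
    """Forward pass of an HMM; frame_probs is (n_frames, n_labels), n_labels >= 2."""
    n_frames, n_labels = frame_probs.shape
    # Transition matrix: stay with probability stay_prob, else switch uniformly.
    trans = np.full((n_labels, n_labels), (1.0 - stay_prob) / (n_labels - 1))
    np.fill_diagonal(trans, stay_prob)
    belief = np.full(n_labels, 1.0 / n_labels)
    smoothed = np.empty_like(frame_probs)
    for t in range(n_frames):
        belief = (trans.T @ belief) * frame_probs[t]  # predict, then weight by evidence
        belief /= belief.sum()
        smoothed[t] = belief
    return smoothed

# Hypothetical per-frame votes for (corridor, cubicle); frame 2 is a noisy outlier.
frame_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1]])
raw_labels = frame_probs.argmax(axis=1)
smoothed_labels = smooth_labels(frame_probs).argmax(axis=1)
print(raw_labels, smoothed_labels)
```

In this toy sequence the isolated cubicle vote is overridden by the surrounding corridor evidence.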
22
Color is not always the best feature
Other features besides RGB can be used.
For example, a stereo feature captures depth information.
High stereo accuracy is not needed (efficient DP here).
Figure labels: Corridor, Cubicle, Kitchen.
23
Integrating Multiple Features
Stack multiple feature “channels”.
Figure labels: B, G, R, Stereo.
24
Local Histograms
Enable better translation invariance and more generalization.
Error rate: 0.49 → 0.36 in a test on a 4-class dataset.
Improve efficiency dramatically: 30 times speed-up.
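A minimal sketch of why local histograms add translation invariance: each cell of the image is summarized by a histogram of quantized values, so a feature that moves within a cell leaves the descriptor unchanged. The cell size, the number of bins, and the function name `local_histograms` are assumptions chosen for a small example.

```python
import numpy as np

def local_histograms(image, cell=4, n_bins=5):
    """Per-cell normalized histograms of an image with values in [0, 1].

    Returns an array of shape (H // cell, W // cell, n_bins).
    """
    h, w = image.shape
    h, w = h - h % cell, w - w % cell  # crop to a whole number of cells
    bins = np.minimum((image[:h, :w] * n_bins).astype(int), n_bins - 1)
    blocks = (bins.reshape(h // cell, cell, w // cell, cell)
                  .transpose(0, 2, 1, 3)
                  .reshape(h // cell, w // cell, cell * cell))
    one_hot = blocks[..., None] == np.arange(n_bins)
    return one_hot.mean(axis=2)

# A single bright pixel, then the same pixel shifted by one column.
image = np.zeros((8, 8))
image[1, 1] = 0.9
shifted = np.roll(image, 1, axis=1)

features = local_histograms(image)
features_shifted = local_histograms(shifted)
same_pixels = np.array_equal(image, shifted)
same_features = np.allclose(features, features_shifted)
print(features.shape, same_pixels, same_features)
```

The descriptor is also spatially smaller than the image by a factor of `cell` in each dimension, which is where the speed-up comes from.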
25
Supervised Learning
Incorporates training image labels.
Helps discriminate images with similar features but different location labels.
Discriminative features: a monitor in the cubicle, a microwave in the kitchen.
Figure labels: An example epitome, An example label feature.
27
MIT Image Database
Created by Antonio Torralba et al.
17 sequences, 62 locations, 7 categories, 72077 images.
28
Results on Recognizing Location Instances
Location epitome vs. GMM: 10% better on average.
29
Results on Recognizing Location Classes
Location epitome vs. GMM: 10%-20% better.
30
MSRC Data Set
Captured with a stereo camera.
5409 images collected at 4 fps.
11 sequences and 7 classes: corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st.
31
Integrate Depth Cues
Figure panels: corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st.
32
Instance Recognition with Multiple Features
RGB & stereo outperforms the other features.
Learning: 5.7 fps.
Recognition: 116 fps = 29 times the capture speed.
33
Summary
A generative model for the recognition of both location instances and classes.
Fast: capable of real-time applications.
Flexible: capable of integrating various features.
Probabilistic: capable of capturing uncertainties.
Future applications:
Navigation for visually impaired people.
Appearance-based loop closing for SLAM problems.
34
Epitomic Location Recognition
A generative approach for location recognition
K. Ni, A. Kannan, A. Criminisi and J. Winn
Thank you!
35
Local Histograms (2)
Improves efficiency (both training and testing).
The bottleneck: convolving the epitome with the images.
Compression rate: 3 (C_1 C_2)^2 / 50 = 2400.
Learning: 3 hours → 6 mins, 30 times faster.
Before: convolve 3-dimensional RGB features, an M × N image with an M_e × N_e epitome.
After: convolve 50-dimensional local histograms, an (M / C_1) × (N / C_2) image with an (M_e / C_1) × (N_e / C_2) epitome.
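The compression rate can be checked with a short calculation, assuming the cost of a direct correlation scales with image area times epitome area times the number of channels. The slide gives only the result 2400, which implies C_1 C_2 = 200; the split C_1 = 10, C_2 = 20 and the image and epitome sizes below are hypothetical values chosen so the divisions are exact.

```python
def correlation_cost(height, width, ep_height, ep_width, channels):
    """Multiply-accumulate count of a direct correlation (assumed cost model)."""
    return height * width * ep_height * ep_width * channels

def compression_rate(c1, c2, rgb_channels=3, hist_channels=50):
    """Cost ratio of RGB correlation to local-histogram correlation."""
    return rgb_channels * (c1 * c2) ** 2 / hist_channels

# Hypothetical sizes: a 240x320 image and a 400x600 epitome.
M, N, Me, Ne = 240, 320, 400, 600
C1, C2 = 10, 20  # hypothetical split with C1 * C2 = 200

cost_rgb = correlation_cost(M, N, Me, Ne, channels=3)
cost_hist = correlation_cost(M // C1, N // C2, Me // C1, Ne // C2, channels=50)

ratio_from_costs = cost_rgb / cost_hist
ratio_from_formula = compression_rate(C1, C2)
print(ratio_from_costs, ratio_from_formula)
```

The ratio does not depend on M, N, M_e, or N_e, since they cancel between the two costs.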