Published by Sheila Lewis. Modified over 8 years ago.
1
Goggle: Gist on the Google Phone
A content-based image retrieval system for the Google phone
Manu Viswanathan, Chin-Kai Chang, Ji Hyun Moon
2
MESA BRIDGES
Project Outline
- A content-based image retrieval system on an Android phone
- Finding images similar to the image captured on the cell phone
- Gist algorithm
3
Gist & Scene Categorization
Requirements
- Accuracy: should retain enough information to make broad categorizations
- Speed: should quickly perform the gist transformation and exemplar matching
[Figure: source image, 160 x 120 pixels = 19,200 numbers (grayscale) → gist vector of ~100 numbers; some new scene is compared against category exemplars]
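To make the 19,200 → ~100 reduction concrete, here is a minimal sketch that turns a 160 x 120 grayscale image into a 96-number vector and assigns a new scene to the category of its nearest exemplar. It is a hypothetical stand-in, not the project's gist estimator: the gradient-orientation pooling, the 4 x 4 grid, the 6 orientation bins and the helper names `toy_gist` and `nearest_exemplar` are all assumptions chosen only to land near "~100 numbers".

```python
import numpy as np

def toy_gist(image: np.ndarray, grid: int = 4, bins: int = 6) -> np.ndarray:
    """Hypothetical gist: gradient-orientation energy pooled over a grid x grid layout.

    For a 160 x 120 image (19,200 pixels) this returns grid*grid*bins = 96 numbers.
    """
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)                      # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantized into `bins` channels.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    channel = np.minimum((orientation / np.pi * bins).astype(int), bins - 1)

    h, w = img.shape
    features = np.zeros((grid, grid, bins))
    for r in range(grid):
        for c in range(grid):
            rows = slice(r * h // grid, (r + 1) * h // grid)
            cols = slice(c * w // grid, (c + 1) * w // grid)
            # Sum gradient magnitude per orientation channel inside this cell.
            features[r, c] = np.bincount(channel[rows, cols].ravel(),
                                         weights=magnitude[rows, cols].ravel(),
                                         minlength=bins)
    vec = features.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def nearest_exemplar(gist: np.ndarray, exemplars: dict) -> str:
    """Return the category whose exemplar gist is closest in Euclidean distance."""
    return min(exemplars, key=lambda name: np.linalg.norm(gist - exemplars[name]))

# Usage with synthetic 120-row x 160-column scenes (not real data).
rows, cols = np.mgrid[0:120, 0:160]
vertical_stripes = ((cols // 8) % 2).astype(float)    # gradients along x only
horizontal_stripes = ((rows // 8) % 2).astype(float)  # gradients along y only
exemplars = {"vertical": toy_gist(vertical_stripes),
             "horizontal": toy_gist(horizontal_stripes)}
new_scene = ((cols // 6) % 2).astype(float)           # another vertical-stripe scene
print(toy_gist(new_scene).size, nearest_exemplar(toy_gist(new_scene), exemplars))  # 96 vertical
```

Nearest-exemplar matching is only one way to use the vector; the project itself classifies with an SVM on the server.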
4
Project Design
Client-server application
[Diagram: Camera, Image Recorder, Gist Estimator, HTTP Handler and User Interface exchange HTTP requests and HTTP responses with a Web Server backed by a PHP handler, Perl module, C++ SVM classifier and Image Database]
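The request/response split can be sketched as a pure server-side function that receives a gist vector in the HTTP request body and returns a category in the response body. Everything here is hypothetical: the JSON field names (`gist`, `category`), the toy per-category linear models in `MODELS` and the `handle_request` helper. The slide only states that a PHP handler, a Perl module and a C++ SVM classifier sit behind the web server.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical per-category linear models (weights, bias); a real deployment would
# load these from the server-side training step, not hard-code them.
MODELS: Dict[str, Tuple[List[float], float]] = {
    "Faces": ([1.0, 0.0, 0.0], 0.0),
    "Motorbikes": ([0.0, 1.0, 0.0], 0.0),
    "Pizza": ([0.0, 0.0, 1.0], 0.0),
}

def score(weights: List[float], bias: float, gist: List[float]) -> float:
    """Linear decision value w . x + b."""
    return sum(w * x for w, x in zip(weights, gist)) + bias

def handle_request(body: bytes) -> bytes:
    """Decode a JSON request, classify its gist vector, and encode the JSON response."""
    try:
        payload = json.loads(body.decode("utf-8"))
        gist = [float(v) for v in payload["gist"]]
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": "expected {\"gist\": [numbers]}"}).encode("utf-8")
    # One-versus-all decision: the category with the highest score wins.
    best = max(MODELS, key=lambda name: score(*MODELS[name], gist))
    return json.dumps({"category": best}).encode("utf-8")

# Client side: what the phone's HTTP handler would place in the POST body.
request_body = json.dumps({"gist": [0.1, 0.7, 0.2]}).encode("utf-8")
print(handle_request(request_body))  # b'{"category": "Motorbikes"}'
```

Keeping the handler free of socket code makes the protocol easy to test and to port behind any web server, which is the portability argument the deck makes for the client-server design.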
5
Lazebnik Algorithm
Compute SIFT grid (feature extraction) → spatial pyramid (spatial histogram) → compute gist vector → SVM classification
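The spatial-pyramid step of this pipeline can be sketched as follows: given the visual-word index and (x, y) position of each local feature, count words in a 1 x 1, 2 x 2, 4 x 4, ... grid and concatenate the level histograms, weighted as in Lazebnik et al. (1/2^L for level 0 and 1/2^(L-l+1) for level l ≥ 1). The function name and the choice L = 2 in the usage are assumptions; the slides do not say how many levels the project used.

```python
import numpy as np

def spatial_pyramid(words, xs, ys, width, height, vocab_size, levels=2):
    """Weighted spatial pyramid histogram in the style of Lazebnik et al. (2006).

    words: visual-word index of each feature; xs, ys: feature positions in pixels.
    Returns a vector of length vocab_size * (4**(levels+1) - 1) // 3.
    """
    words = np.asarray(words, dtype=int)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level                                   # cells per side at this level
        cx = np.minimum((xs / width * cells).astype(int), cells - 1)
        cy = np.minimum((ys / height * cells).astype(int), cells - 1)
        # One vocab_size-bin histogram per cell, flattened cell by cell.
        index = (cy * cells + cx) * vocab_size + words
        hist = np.bincount(index, minlength=cells * cells * vocab_size).astype(float)
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        parts.append(weight * hist)
    return np.concatenate(parts)

vec = spatial_pyramid(words=[0, 1, 1], xs=[10, 150, 150], ys=[10, 110, 20],
                      width=160, height=120, vocab_size=2, levels=2)
print(vec.size)  # 2 * (1 + 4 + 16) = 42
```

The histogram intersection of two such weighted vectors equals the pyramid match kernel summed over words, which is why this concatenated vector can be fed straight to the SVM stage.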
6
Lazebnik Algorithm: Feature Extraction
- Weak features: edge points at 8 orientations and 2 scales. These channels are the vocabulary (vocabulary size M = 16).
- Strong features: SIFT on 16 x 16 pixel patches, with the vocabulary from K-means on the SIFT descriptors. Typically, M = 200 or 400.
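The strong-feature vocabulary comes from K-means on SIFT descriptors. Below is a minimal NumPy sketch of Lloyd's K-means plus nearest-word assignment. Real SIFT descriptors are 128-dimensional; the usage clusters random 2-D points so the result is easy to check. The deterministic farthest-point initialization and the names `build_vocabulary` and `assign_words` are assumptions, not the project's code.

```python
import numpy as np

def assign_words(descriptors: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Index of the nearest centre (squared Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_vocabulary(descriptors: np.ndarray, m: int, iterations: int = 20) -> np.ndarray:
    """Lloyd's K-means: returns an (m, d) array of cluster centres (the visual words)."""
    data = np.asarray(descriptors, dtype=float)
    # Deterministic farthest-point initialization (an assumption; k-means++ is common).
    centres = [data[0]]
    for _ in range(1, m):
        dist = np.min([np.sum((data - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(data[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(iterations):
        labels = assign_words(data, centres)
        for k in range(m):
            members = data[labels == k]
            if len(members):                 # keep the old centre if a cluster empties
                centres[k] = members.mean(axis=0)
    return centres

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
vocab = build_vocabulary(np.vstack([blob_a, blob_b]), m=2)
words = assign_words(np.array([[0.0, 0.0], [5.0, 5.0]]), vocab)
print(words[0] != words[1])  # the two test points map to different visual words
```

Assignment compares every descriptor with every word, so its cost grows linearly with M; this is the knob the speed/accuracy discussion turns.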
7
Lazebnik Algorithm: Spatial Matching
- The idea is to “contextualize” the visual words by performing a kind of geometric match.
- X_m and Y_m are the sets of 2D vectors giving the positions of visual word m in the input and training images.
- For each word, we apply the pyramid match kernel K^L to these position vectors.
- Categorization is done with an SVM trained using the one-versus-all rule.
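The spatial matching step can be sketched directly. For each visual word m, build grid histograms of the positions X_m and Y_m at every pyramid level, take the histogram intersections I^l, and combine them with the pyramid match weights; summing over words gives K^L. This is an illustrative NumPy version with positions normalized to [0, 1) and hypothetical function names, not the project's C++ implementation.

```python
import numpy as np

def level_histogram(points: np.ndarray, level: int) -> np.ndarray:
    """Counts of 2D points (normalized to [0, 1)) in a 2**level x 2**level grid."""
    cells = 2 ** level
    if len(points) == 0:
        return np.zeros(cells * cells)
    idx = np.minimum((points * cells).astype(int), cells - 1)
    return np.bincount(idx[:, 1] * cells + idx[:, 0], minlength=cells * cells)

def pyramid_match(x: np.ndarray, y: np.ndarray, levels: int) -> float:
    """kappa^L(X, Y) = I^L + sum_{l<L} (I^l - I^{l+1}) / 2^(L-l)."""
    inter = [np.minimum(level_histogram(x, l), level_histogram(y, l)).sum()
             for l in range(levels + 1)]
    value = float(inter[levels])
    for l in range(levels):
        # Matches first found at level l (not at the finer level l+1) get weight 1/2^(L-l).
        value += (inter[l] - inter[l + 1]) / 2 ** (levels - l)
    return value

def spatial_pyramid_kernel(x_by_word, y_by_word, levels: int = 2) -> float:
    """K^L = sum over visual words m of kappa^L(X_m, Y_m)."""
    words = set(x_by_word) | set(y_by_word)
    empty = np.zeros((0, 2))
    return sum(pyramid_match(np.asarray(x_by_word.get(m, empty), dtype=float),
                             np.asarray(y_by_word.get(m, empty), dtype=float), levels)
               for m in words)

# Word 0 sits in the same corner of both images; word 1 sits in opposite corners.
x = {0: [[0.1, 0.1]], 1: [[0.9, 0.9]]}
y = {0: [[0.1, 0.1]], 1: [[0.1, 0.1]]}
print(spatial_pyramid_kernel(x, y, levels=2))  # 1.25
```

Word 0 matches at every level and contributes 1; word 1 only matches at the coarsest level and contributes 1/4, so spatial agreement is rewarded.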
8
Lazebnik Algorithm: Pyramid Matching
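For reference, the pyramid match kernel from Lazebnik et al. (2006) can be written as below. Here H_X^l and H_Y^l are the histograms of the 2D point sets X and Y at pyramid level l (2^l cells per side, so 4^l cells), I^l is their histogram intersection, and the per-word kernels are summed over the M visual words.

```latex
\mathcal{I}^{l} = \sum_{i=1}^{4^{l}} \min\big(H_X^{l}(i),\, H_Y^{l}(i)\big)

\kappa^{L}(X, Y) = \mathcal{I}^{L} + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}}\big(\mathcal{I}^{l} - \mathcal{I}^{l+1}\big)
                 = \frac{1}{2^{L}}\,\mathcal{I}^{0} + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}}\,\mathcal{I}^{l}

K^{L}(X, Y) = \sum_{m=1}^{M} \kappa^{L}(X_m, Y_m)
```

Matches found only at coarse levels are weighted down because their cells are larger, so they say less about spatial agreement.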
9
Experimental Setup
- Dataset: Caltech 101
- Data splits: 100%-0%, 75%-25%, 50%-50%
- 8 categories: Car Side, Cellphone, Chair, Cup, Faces, Laptop, Motorbikes, Pizza
- Vocabulary size: 25, 50, 100, 200
- Training is done on the server side
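Since training happens on the server, the one-versus-all rule can be sketched as one binary SVM per category, with a test vector assigned to the category whose classifier scores highest. To stay short and self-contained, this sketch swaps in a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. The project's C++ SVM classifier is not reproduced, and the hyperparameters (`lam`, `lr`, `epochs`) and toy data are arbitrary assumptions.

```python
import numpy as np

def train_binary_svm(x, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM via full-batch subgradient descent on lam/2*|w|^2 + mean hinge loss.

    y must contain +1 / -1 labels. Returns (w, b).
    """
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (x @ w + b)
        active = margins < 1                      # points inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_one_vs_all(x, labels, classes):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {c: train_binary_svm(x, np.where(labels == c, 1.0, -1.0)) for c in classes}

def predict(models, x):
    """Assign each row of x to the class whose SVM gives the largest score."""
    names = list(models)
    scores = np.stack([x @ models[c][0] + models[c][1] for c in names], axis=1)
    return np.array(names)[scores.argmax(axis=1)]

# Toy 2-D feature vectors: three well-separated groups (synthetic, not Caltech 101).
centres = {"Faces": (0.0, 4.0), "Motorbikes": (4.0, -2.0), "Pizza": (-4.0, -2.0)}
offsets = np.array([[0.3, 0.0], [-0.3, 0.0], [0.0, 0.3], [0.0, -0.3]])
x = np.vstack([np.array(c) + offsets for c in centres.values()])
labels = np.repeat(list(centres), len(offsets))
models = train_one_vs_all(x, labels, list(centres))
print(predict(models, x))
```

With a kernel SVM the per-class decision values would come from the pyramid match kernel instead of a dot product, but the one-versus-all argmax stays the same.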
10
Testing Result
25% training, 75% testing; vocabulary size 200
57.3% average per-category classification accuracy
Confusion matrix (rows: ground truth; columns: predicted category)
Ground truth  Car Side  Cellphone  Chair  Cup  Faces  Laptop  Motorbikes  Pizza  Unknown
Car Side            87          0      0    0      0       0           0      0        5
Cellphone            0         40      0    0      0       0           0      0        4
Chair                0          0      0    0      0       0           2      0       44
Cup                  1          0      0    0      4       0           0      0       37
Faces                0          0      0    0    321       0           0      0        5
Laptop               0          1      0    0      0      16           1      0       42
Motorbikes           0          0      0    0      0       0         576      0       22
Pizza                0          0      0    0      1       0           1     20       17
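Taking this slide's confusion matrix as nine counts per row (rows are ground truth, columns are predictions, "Unknown" counted as an error), the reported 57.3% is reproduced as the mean of the eight per-category accuracies, while the pooled accuracy over all 1,247 test images is about 85%. The split of each flattened row into nine counts is a reconstruction, chosen because it matches the 57.3% figure; the short script below does the arithmetic.

```python
# Confusion matrix: rows = ground truth, columns = predicted class (+ "Unknown").
CLASSES = ["Car Side", "Cellphone", "Chair", "Cup", "Faces",
           "Laptop", "Motorbikes", "Pizza"]
CONFUSION = [
    [87, 0, 0, 0, 0, 0, 0, 0, 5],      # Car Side
    [0, 40, 0, 0, 0, 0, 0, 0, 4],      # Cellphone
    [0, 0, 0, 0, 0, 0, 2, 0, 44],      # Chair
    [1, 0, 0, 0, 4, 0, 0, 0, 37],      # Cup
    [0, 0, 0, 0, 321, 0, 0, 0, 5],     # Faces
    [0, 1, 0, 0, 0, 16, 1, 0, 42],     # Laptop
    [0, 0, 0, 0, 0, 0, 576, 0, 22],    # Motorbikes
    [0, 0, 0, 0, 1, 0, 1, 20, 17],     # Pizza
]

def per_class_accuracy(matrix):
    """Diagonal count divided by the row total; 'Unknown' counts as an error."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def mean_per_class_accuracy(matrix):
    acc = per_class_accuracy(matrix)
    return sum(acc) / len(acc)

def pooled_accuracy(matrix):
    correct = sum(row[i] for i, row in enumerate(matrix))
    return correct / sum(sum(row) for row in matrix)

for name, acc in zip(CLASSES, per_class_accuracy(CONFUSION)):
    print(f"{name:<11} {acc:6.1%}")
print(f"mean per-class: {mean_per_class_accuracy(CONFUSION):.1%}")  # 57.3%
print(f"pooled:         {pooled_accuracy(CONFUSION):.1%}")          # 85.0%
```

The gap between the two numbers comes from the class imbalance: Faces and Motorbikes dominate the test set and are recognized well, while Chair and Cup are never classified correctly.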
11
Speed vs. Accuracy
[Figure: speed vs. accuracy results, panels 1, 2, 3]
12
Result 3: Pyramid Matching
13
Discussion & Conclusions
- The client-server design makes the application easy to port to different embedded systems.
- Computing the gist vector is an expensive process on an embedded system.
- Reducing the vocabulary size improves processing speed at the cost of some accuracy.
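A back-of-the-envelope calculation shows why a smaller vocabulary speeds things up: assigning each SIFT descriptor to its nearest visual word costs one 128-dimensional distance per word, and the spatial pyramid vector grows linearly with M. The sketch assumes a dense grid of 16 x 16 patches with an 8-pixel step over the 160 x 120 image and L = 2 pyramid levels (common choices in Lazebnik et al.); the slides do not say which grid spacing or pyramid depth the project used.

```python
SIFT_DIM = 128          # length of a SIFT descriptor
PATCH, STEP = 16, 8     # assumed dense-grid patch size and spacing (pixels)
WIDTH, HEIGHT = 160, 120
LEVELS = 2              # assumed pyramid depth L

def grid_positions(size: int) -> int:
    """Number of patch positions along one image axis."""
    return (size - PATCH) // STEP + 1

def assignment_cost(m: int) -> int:
    """Per-dimension distance operations to compare every descriptor with every word."""
    n_desc = grid_positions(WIDTH) * grid_positions(HEIGHT)
    return n_desc * m * SIFT_DIM

def pyramid_length(m: int) -> int:
    """Spatial pyramid vector length: M * (1 + 4 + ... + 4**L)."""
    return m * (4 ** (LEVELS + 1) - 1) // 3

for m in (25, 50, 100, 200):
    print(m, assignment_cost(m), pyramid_length(m))
```

Under these assumptions the image yields 19 x 14 = 266 descriptors, so going from M = 200 to M = 25 cuts word assignment from about 6.8 million to about 0.85 million operations and the pyramid vector from 4,200 to 525 entries.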
14
Lazebnik, S., Schmid, C., Ponce, J. "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," CVPR, 2006.
Gordon, I., Lowe, D. G. "Scene modelling, recognition and tracking with invariant image features," International Symposium on Mixed and Augmented Reality (ISMAR), 2004.
http://ilab.usc.edu/wiki/index.php/Goggle or http://ilab.usc.edu/~mviswana/Goggle or http://ilab.usc.edu/~kai/Goggle