Landmark Classification in Large-scale Image Collections
Yunpeng Li, David J. Crandall, Daniel P. Huttenlocher
ICCV 2009

Outline
Introduction
Building Internet-Scale Datasets
Image Classification
Experiments
Conclusion

Introduction
Goal: image classification on much larger datasets, featuring millions of images and hundreds of categories.
Classifier: multiclass SVM.
Data: Flickr photos of landmarks; geotags are used to label the photos, and text tags provide additional features.

Introduction
Comparison with existing datasets:
Dataset              | Number of images | Number of categories
PASCAL VOC 2008 [7]  |                  |
LabelMe [13]         |                  |
Tiny Images [16]     | Millions         | none

Building Internet-Scale Datasets
Long-term goal: to create large labeled datasets.
Retrieved 60 million geotagged photos from Flickr, each with x, y (longitude, latitude) coordinates.
Eliminated photos whose geotag precision is worse than about a city block, leaving about 30 million photos.
Clustered the photo locations with mean shift, using a disc of radius about 100 m [3].
The resulting clusters are peaks in the photo density distribution [5]. When counting photos towards a peak, at most 5 photos from any given Flickr user are counted, so that no single prolific user can dominate a peak.
The top 500 peaks are used as categories; the 500th peak has 585 photos and the 1000th peak has 284 photos.
Final dataset: about 1.9 million photos.
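The peak-finding and per-user counting steps above can be sketched as follows. This is a minimal, brute-force illustration under stated assumptions, not the authors' code: photo positions are assumed to be already projected to planar coordinates in metres, the flat-kernel mean shift uses the roughly 100 m radius from the slide, and the mode-merging distance (50 m), the toy data, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def mean_shift_modes(points, radius, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: repeatedly move a start point to the mean of all
    photo locations within `radius` until it stops moving. Returns one mode
    per input point. Brute force O(n^2) per iteration; fine for a toy example."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            nbrs = [(qx, qy) for qx, qy in points
                    if math.hypot(qx - x, qy - y) <= radius]
            nx = sum(q[0] for q in nbrs) / len(nbrs)
            ny = sum(q[1] for q in nbrs) / len(nbrs)
            moved = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if moved < tol:
                break
        modes.append((x, y))
    return modes

def group_modes(modes, merge_dist):
    """Merge modes closer than `merge_dist` (hypothetical threshold) into one
    peak; return the peak centres and the peak index of every photo."""
    peaks, labels = [], []
    for mx, my in modes:
        for k, (cx, cy) in enumerate(peaks):
            if math.hypot(mx - cx, my - cy) <= merge_dist:
                labels.append(k)
                break
        else:
            peaks.append((mx, my))
            labels.append(len(peaks) - 1)
    return peaks, labels

def rank_peaks(labels, users, max_per_user=5):
    """Rank peaks by photo count, counting at most `max_per_user` photos
    from any single user towards any single peak."""
    used = defaultdict(int)    # (user, peak) -> photos counted so far
    kept = defaultdict(list)   # peak -> indices of counted photos
    for i, (peak, user) in enumerate(zip(labels, users)):
        if used[(user, peak)] < max_per_user:
            used[(user, peak)] += 1
            kept[peak].append(i)
    return sorted(kept.items(), key=lambda kv: len(kv[1]), reverse=True)

# Hypothetical toy data: positions in metres, already projected to a plane.
points = [(float(i), 0.0) for i in range(13)] + [(1000.0 + i, 0.0) for i in range(6)]
users = ["a"] * 10 + ["b", "c", "d"] + ["e", "f", "g", "h", "i", "j"]

modes = mean_shift_modes(points, radius=100.0)
peaks, labels = group_modes(modes, merge_dist=50.0)
ranked = rank_peaks(labels, users)
for rank, (peak, photos) in enumerate(ranked, start=1):
    print(rank, peaks[peak], len(photos))
```

User "a" contributes 10 photos to the first location, but only 5 of them count, so that peak ranks first with 8 counted photos rather than 13; the second peak has 6.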

Top 5 categories

Image Features (visual)
Visual words: SIFT descriptors from photos in the training set are clustered with k-means.
Each descriptor is assigned to its nearest visual word using approximate nearest neighbor (ANN) search [1].
Each image is represented by a frequency vector that counts the number of occurrences of each visual word in the image, normalized to an L2 norm of 1.
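A minimal sketch of the visual bag-of-words pipeline, assuming toy 2-D vectors in place of 128-D SIFT descriptors. Exact nearest-neighbor search stands in for the ANN library [1], k-means is plain Lloyd iteration initialized from the first k descriptors (a simplification), and all names are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_word(desc, vocab):
    """Exact nearest-neighbour search (standing in for ANN)."""
    return min(range(len(vocab)), key=lambda j: sq_dist(desc, vocab[j]))

def kmeans(descriptors, k, iters=20):
    """Plain Lloyd's k-means; the cluster centres are the visual words."""
    centres = [list(d) for d in descriptors[:k]]  # naive init: first k descriptors
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_word(d, centres)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster goes empty
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres

def bovw_vector(image_descriptors, vocab):
    """Count occurrences of each visual word, then scale to unit L2 norm."""
    counts = [0.0] * len(vocab)
    for d in image_descriptors:
        counts[nearest_word(d, vocab)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts

# Toy 2-D "descriptors" from hypothetical training images.
train = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
vocab = kmeans(train, k=2)
vec = bovw_vector([(0, 0), (0.5, 0.5), (10, 10)], vocab)
print(vec)  # counts [2, 1] scaled by 1/sqrt(5): about [0.894, 0.447]
```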

Image Features (text tags)
Tag vocabulary: tags used by at least 3 different users.
Each image is represented by a binary vector indicating the presence or absence of each tag, normalized to an L2 norm of 1.
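The tag features can be sketched the same way. The input format (a list of (user, tag set) pairs), the sample tags, and the function names are hypothetical; only the at-least-3-users rule and the binary, unit-L2-norm encoding come from the slide.

```python
import math
from collections import defaultdict

def tag_vocabulary(photos, min_users=3):
    """Keep tags used by at least `min_users` different users.
    `photos` is a list of (user_id, set_of_tags) pairs."""
    users_per_tag = defaultdict(set)
    for user, tags in photos:
        for t in tags:
            users_per_tag[t].add(user)
    return sorted(t for t, us in users_per_tag.items() if len(us) >= min_users)

def tag_vector(tags, vocab):
    """Binary presence/absence vector over the vocabulary, unit L2 norm."""
    v = [1.0 if t in tags else 0.0 for t in vocab]
    norm = math.sqrt(sum(v))  # entries are 0 or 1, so sum(v) equals the sum of squares
    return [x / norm for x in v] if norm > 0 else v

photos = [
    ("u1", {"eiffel", "paris"}),
    ("u2", {"eiffel", "paris", "night"}),
    ("u3", {"eiffel", "tower"}),
    ("u1", {"night"}),
    ("u1", {"tower"}),  # "tower" still has only 2 distinct users (u3, u1)
    ("u4", {"paris", "night"}),
]
vocab = tag_vocabulary(photos)
print(vocab)  # ['eiffel', 'night', 'paris']
vec = tag_vector({"eiffel", "paris", "louvre"}, vocab)  # "louvre" is out of vocabulary
```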

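Image Features (combination)
The visual-word vector and the tag vector are each normalized to an L2 norm of 1, then concatenated.

Before normalization:
Visual word | A   | B   | C   | D
Frequency   | 2   | 1   | 0   | 2
Tag         | 1   | 2   | 3   | 4
Presence    | 1   | 1   | 1   | 1

After normalization:
Visual word | A   | B   | C   | D
Frequency   | 2/3 | 1/3 | 0   | 2/3
Tag         | 1   | 2   | 3   | 4
Presence    | 1/2 | 1/2 | 1/2 | 1/2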
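The arithmetic in the tables can be checked with a few lines of Python. This sketch assumes, as the numbers indicate, that the two parts are normalized separately and then concatenated; it does not renormalize the concatenated vector, which therefore has L2 norm √2.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def combined_feature(word_counts, tag_presence):
    """Normalize each part to unit L2 norm separately, then concatenate."""
    return l2_normalize(word_counts) + l2_normalize(tag_presence)

# Numbers from the slide: visual words A-D and tags 1-4.
x = combined_feature([2, 1, 0, 2], [1, 1, 1, 1])
print(x)  # [2/3, 1/3, 0, 2/3, 1/2, 1/2, 1/2, 1/2] as floats
```

The visual part has norm √(4 + 1 + 0 + 4) = 3 and the tag part has norm √4 = 2, which gives the fractions in the second table.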

Image Classification
Predict the class with the highest score:
$$\hat{y} = \arg\max_{y \in \{1, \dots, m\}} \mathbf{w}_y \cdot \mathbf{x}$$
where m is the number of classes, x is the feature vector of an image, w = (w_1, ..., w_m) is the weight model, and w_y · x is the score for class y under w.
This is by nature a multiway (as opposed to binary) classification problem.
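A minimal sketch of the decision rule, assuming the linear score w_y · x. The weights and feature vector are made-up numbers, and classes are indexed from 0 rather than 1.

```python
def score(w_y, x):
    """Linear score of class y: the dot product w_y . x."""
    return sum(a * b for a, b in zip(w_y, x))

def predict(W, x):
    """Return the index (0..m-1) of the highest-scoring class.
    W is a list of m weight vectors, one per class."""
    return max(range(len(W)), key=lambda y: score(W[y], x))

W = [[1.0, 0.0, -0.5],   # class 0
     [0.0, 1.0, 0.0],    # class 1
     [-0.5, 0.2, 1.0]]   # class 2
x = [0.1, 0.3, 0.9]
print(predict(W, x))  # 2  (scores: -0.35, 0.3, 0.91)
```

All m classes are scored in one pass, so a single model handles the multiway problem directly instead of combining separate binary classifiers.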

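Image Classification
A multiclass SVM [4] is used to learn the model w, using an existing SVM software package [9].
Given a set of training examples (x_1, y_1), ..., (x_n, y_n), the multiclass SVM learns w by optimizing a regularized, margin-based objective function.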
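The slide does not reproduce the objective. Assuming [4] refers to the Crammer and Singer multiclass SVM, a standard form of its objective over training examples (x_i, y_i), i = 1, ..., n, is

$$\min_{\mathbf{w}, \boldsymbol{\xi}} \; \frac{1}{2}\sum_{y=1}^{m} \|\mathbf{w}_y\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad \mathbf{w}_{y_i} \cdot \mathbf{x}_i - \mathbf{w}_y \cdot \mathbf{x}_i \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i, \; \forall y \ne y_i$$

The sketch below minimizes the equivalent unconstrained hinge-loss form with plain batch subgradient descent. It is an illustration only, not the SVM package [9] used in the work; C, the learning rate, the number of epochs, and the toy data are arbitrary assumptions.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def train_multiclass_svm(X, Y, m, C=1.0, lr=0.05, epochs=300):
    """Batch subgradient descent on
    0.5 * sum_y ||w_y||^2 + C * sum_i max(0, 1 + max_{y != y_i} w_y.x_i - w_{y_i}.x_i).
    X: list of feature vectors; Y: labels in 0..m-1. Returns m weight vectors."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(m)]
    for _ in range(epochs):
        G = [row[:] for row in W]  # gradient of the regularizer is W itself
        for x, y in zip(X, Y):
            scores = [dot(w, x) for w in W]
            # Most-violating wrong class for this example.
            rival = max((k for k in range(m) if k != y), key=lambda k: scores[k])
            if 1 + scores[rival] - scores[y] > 0:  # margin violated: hinge is active
                for j in range(d):
                    G[y][j] -= C * x[j]
                    G[rival][j] += C * x[j]
        for k in range(m):
            for j in range(d):
                W[k][j] -= lr * G[k][j]
    return W

# Hypothetical, linearly separable toy data with 3 classes.
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
     [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
Y = [0, 0, 1, 1, 2, 2]
W = train_multiclass_svm(X, Y, m=3)
preds = [max(range(3), key=lambda k: dot(W[k], x)) for x in X]
print(preds)
```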

Experiments (1/6)
Dataset: about 2 million images.
Each experiment divided the dataset evenly into training and test image sets.
Table: number of images used in each m-way classification experiment.
For an m-way classification experiment, the baseline probability of a correct random guess is 1/m.
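A sketch of this evaluation protocol, assuming the even split is done separately within each category (the slide only says the dataset was divided evenly) and using hypothetical labels. The last helper expresses an accuracy as a multiple of the 1/m random-guess baseline.

```python
import random
from collections import defaultdict

def even_split(labels, seed=0):
    """Split example indices into equal-sized training and test halves,
    separately within each category (a stratified split)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return train, test

def random_guess_baseline(m):
    """Probability that a uniformly random guess among m classes is correct."""
    return 1.0 / m

def times_baseline(accuracy, m):
    """Accuracy expressed as a multiple of the 1/m baseline."""
    return accuracy / random_guess_baseline(m)

labels = [y for y in range(4) for _ in range(10)]  # hypothetical: 4 classes x 10 photos
train, test = even_split(labels)
print(len(train), len(test))     # 20 20
print(random_guess_baseline(4))  # 0.25
```

For example, a hypothetical accuracy of 50% on a 500-way task would be 250 times the 0.2% baseline, which is how accuracies "hundreds of times the baseline" arise.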

Experiments (2/6)

Experiments (3/6)

Experiments (4/6)
Human comparison: 20 well-traveled people were each asked to label 50 photos taken at the world's top ten landmarks. Textual tags were also shown for a random subset of the photos.
The average human classification accuracy was 68.0% without textual tags and 76.4% when both the image and its tags were shown.
Thus the humans performed better than the automatic classifier when using visual features alone (68.0% versus 57.55%), but about the same (slightly below the classifier) when both text and visual features were available (76.4% versus 80.91%).

Experiments (5/6)
(Figure: visual vocabulary size K; series labeled 20% and 50%.)

Experiments (6/6)
Running time on a single 2.66 GHz CPU: classifying an image takes 2.4 s in total, most of which is consumed by SIFT interest point detection.
Once SIFT features have been extracted, classification requires only 3.06 ms for 200 categories and 0.15 ms for 20 categories.

Conclusion
Large labeled image datasets can be created from geotagged image collections; in this work, nearly 2 million images are labeled.
Multiclass SVM classifiers using SIFT-based bag-of-words features achieve quite good classification rates on large-scale problems, with accuracy that in some cases is comparable to that of humans on the same task.
With text features from tags, the accuracy can be hundreds of times the random-guess baseline.