Tour the World: building a web-scale landmark recognition engine ICCV 2009 Yan-Tao Zheng1, Ming Zhao2, Yang Song2, Hartwig Adam2 Ulrich Buddemeier2, Alessandro.

Slides:

Advertisements

Similar presentations

James Hays and Alexei A. Efros Carnegie Mellon University CVPR IM2GPS: estimating geographic information from a single image Wen-Tsai Huang.

Advertisements

Yan-Tao Zheng1, Ming Zhao2, Yang Song2, Hartwig Adam2 Ulrich Buddemeier2, Alessandro Bissacco2, Fernando Brucher2 Tat-Seng Chua1, and Hartmut Neven2 1.

Clustering Basic Concepts and Algorithms

Presented by Xinyu Chang

Detecting Faces in Images: A Survey

Large-Scale Entity-Based Online Social Network Profile Linkage.

Rapid Object Detection using a Boosted Cascade of Simple Features Paul Viola, Michael Jones Conference on Computer Vision and Pattern Recognition 2001.

Rapid Object Detection using a Boosted Cascade of Simple Features Paul Viola, Michael Jones Conference on Computer Vision and Pattern Recognition 2001.

VisualRank: Applying PageRank to Large-Scale Image Search Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE.

Large-Scale Image Retrieval From Your Sketches Daniel Brooks 1,Loren Lin 2,Yijuan Lu 1 1 Department of Computer Science, Texas State University, TX, USA.

Query Specific Fusion for Image Retrieval

Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,

Data-driven Visual Similarity for Cross-domain Image Matching

Neurocomputing,Neurocomputing, Haojie Li Jinhui Tang Yi Wang Bin Liu School of Software, Dalian University of Technology School of Computer Science,

Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)

Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.

Fast High-Dimensional Feature Matching for Object Recognition David Lowe Computer Science Department University of British Columbia.

Robust and large-scale alignment Image from

HCI Final Project Robust Real Time Face Detection Paul Viola, Michael Jones, Robust Real-Time Face Detetion, International Journal of Computer Vision,

Object retrieval with large vocabularies and fast spatial matching

ADVISE: Advanced Digital Video Information Segmentation Engine

A Study of Approaches for Object Recognition

Week 9 Data Mining System (Knowledge Data Discovery)

6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.

Automatic Discovery and Classification of search interface to the Hidden Web Dean Lee and Richard Sia Dec 2 nd 2003.

CS335 Principles of Multimedia Systems Content Based Media Retrieval Hao Jiang Computer Science Department Boston College Dec. 4, 2007.

Scale Invariant Feature Transform (SIFT)

Smart Traveller with Visual Translator for OCR and Face Recognition LYU0203 FYP.

Spatial Pyramid Pooling in Deep Convolutional

Predicting Matchability - CVPR 2014 Paper -

Dynamic Cascades for Face Detection 第三組馮堃齊、莊以暘. 2009/01/072 Outline Introduction Dynamic Cascade Boosting with a Bayesian Stump Experiments Conclusion.

Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.

© 2013 IBM Corporation Efficient Multi-stage Image Classification for Mobile Sensing in Urban Environments Presented by Shashank Mujumdar IBM Research,

Foundations of Computer Vision Rapid object / face detection using a Boosted Cascade of Simple features Presented by Christos Stoilas Rapid object / face.

POTENTIAL RELATIONSHIP DISCOVERY IN TAG-AWARE MUSIC STYLE CLUSTERING AND ARTIST SOCIAL NETWORKS Music style analysis such as music classification and clustering.

Large-Scale Content-Based Image Retrieval Project Presentation CMPT 880: Large Scale Multimedia Systems and Cloud Computing Under supervision of Dr. Mohamed.

Research Area B Leif Kobbelt. Communication System Interface Research Area B 2.

Data Mining on the Web via Cloud Computing COMS E6125 Web Enhanced Information Management Presented By Hemanth Murthy.

Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos.

Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:

SWARM INTELLIGENCE IN DATA MINING Written by Crina Grosan, Ajith Abraham & Monica Chis Presented by Megan Rose Bryant.

Using Statistic-based Boosting Cascade Weilong Yang, Wei Song, Zhigang Qiao, Michael Fang 1.

Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.

Keypoint-based Recognition Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/04/10.

Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.

Learning Based Hierarchical Vessel Segmentation

Learning Visual Bits with Direct Feature Selection Joel Jurik 1 and Rahul Sukthankar 2,3 1 University of Central Florida 2 Intel Research Pittsburgh 3.

EADS DS / SDC LTIS Page 1 7 th CNES/DLR Workshop on Information Extraction and Scene Understanding for Meter Resolution Image – 29/03/07 - Oberpfaffenhofen.

Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.

PageRank for Product Image Search Kevin Jing (Googlc IncGVU, College of Computing, Georgia Institute of Technology) Shumeet Baluja (Google Inc.) WWW 2008.

“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)

11 A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval Reporter: 林佳宜 /10/17.

Sign Classification Boosted Cascade of Classifiers using University of Southern California Thang Dinh Eunyoung Kim

Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic and Andrew Zisserman.

Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison February 2, 2010 Acknowledgments:

CANONICAL IMAGE SELECTION FROM THE WEB ACM International Conference on Image and Video Retrieval, 2007 Yushi Jing Shumeet Baluja Henry Rowley.

Features-based Object Recognition P. Moreels, P. Perona California Institute of Technology.

MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.

Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.

UWMS Data Mining Workshop Content Analysis: Automated Summarizing Prof. Marti Hearst SIMS 202, Lecture 16.

Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval Yin-Hsi Kuo1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang.

Presented by David Lee 3/20/2006

Computer Vision Group Department of Computer Science University of Illinois at Urbana-Champaign.

SIFT Scale-Invariant Feature Transform David Lowe

Deeply learned face representations are sparse, selective, and robust

Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science

Finding Clusters within a Class to Improve Classification Accuracy

Applying Key Phrase Extraction to aid Invalidity Search

CSc4730/6730 Scientific Visualization

Topic 5: Cluster Analysis

Presentation transcript:

Tour the World: building a web-scale landmark recognition engine ICCV 2009 Yan-Tao Zheng1, Ming Zhao2, Yang Song2, Hartwig Adam2 Ulrich Buddemeier2, Alessandro Bissacco2, Fernando Brucher2 Tat-Seng Chua1, and Hartmut Neven2

Outline Introduction Methods Experiments & Results Conclusion

Introduction

Motivation – the vast amount of landmark images in the Internet Applications – clean landmark images for building virtual tourism – content understanding and geo-location detection – provide tour guide recommendation and visualization Issues – no readily available list of landmarks in the world – even if there were such a list, it is still challenging to collect true landmark images – efficiency for such a large-scale system.

Discovering Landmarks in the World Comprehensive and well-organized list of landmarks Two sources on the Internet – Geographically calibrated images in photo sharing websites(ex: Picasa) Geo-tag, text-tag, popularity – Travel guide articles from websites(ex: wikitravel) Text-based

Frameworks Agglomerative hierarchical clustering Google image search & Google map Extract landmark names from the city tour guide corpus[16] [16] J. Yi and N. Sundaresan. A classifier for semi-structured documents. In Proc. of Conf. on Knowledge Discovery and Data Mining, pages 340–344, New York, NY, USA, , 3 Validation criterion: The number of authors in cluster > threshold

Landmarks mined from GPS-tagged

Landmarks extracted from tour guide

Frameworks SIFT-128 -> PCA -> feature dimension 40 Affine transformation[9] [9] D. Lowe. Distinctive image features from scale-invariant keypoints. In International Journal of Computer Vision, volume 20, pages 91–110, 2003.

Graph cluster

Frameworks Validation criterion: 1. Landmark name too long or most of its words are not capitalized 2. The number of authors in cluster > threshold Cleaning: 1. Photographic vs non-photographic classifier(Adaboost algorithm) 2. Face detector[15] [15] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. of Conf. on Computer Vision and Pattern Recognition, volume 1, pages I–511–I–518 vol.1, 2001.

Photographic vs non-photographic classifier

Efficiency Issues Parallel computing to mine true landmark images Efficiency in hierarchical clustering Indexing local feature for matching – Query, k-d tree

Experiments & Results ~20 million GPS-tagged photos from picasa and panoramio to construct geo-cluster Mined from tour guide corpus in Google Image Search to construct noisy image set from first 200 returned images. The total number of images ~21.4 million.

Statistics of mined landmarks GPS-tagged photos delivers – 2240 validated landmarks, from 812 cities in 104 countries. The tour guide corpus yields – 3246 validated landmarks, from 626 cities in 130 ountries. Only 174 landmarks in both list

Statistics of mined landmarks The combined list of landmarks consists of 5312 unique landmarks from 1259 cities in 144 countries.

Statistics of mined landmarks Our processing language focuses on English only. The number of landmarks in China amounts to 101 only -> a multi-lingual data mining task

Evaluation of landmark image mining set the minimum cluster size to 4 The visual clustering yields – ~14k visual clusters with ~800k images for landmarks mined from GPS-tagged photos – ~12k clusters with ~110k images for landmarks mined from tour guide corpus visual clusters are randomly selected to evaluate the correctness – Outlier cluster rate 0.68% ( 68 out of 1000) – By validation and cleaning, drops to 0.37% (37 out of 1000).

Evaluation of landmark recognition Positive and Negative query images Positive testing image set consists of 728 images from 124 randomly selected landmarks – manually annotated from images that range from 201 to 300 in the Google Image Search

Evaluation of landmark recognition Negative testing set consists of (Caltech-256) (Pascal VOC 07) = images in total The recognition by local feature matching of query image against model images – nearest neighbor (NN) principle.

Evaluation of landmark recognition Positive testing image set, 417 images are detected by the system to be landmarks, of which 337 are correctly identified. The accuracy of identification is 80.8%, which is fairly satisfactory, considering the large number of landmark models in the system. This high accuracy enables our system to provide landmark recognition to other applications, like image content analysis and geo-location detection. The identification rate (correctly identified / positive testing images) is 46.3% (337/728)

Evaluation of landmark recognition

For the negative testing set, 463 out of images The false acceptance rate is only 1.1% After careful examination, we find that most false matches occur in two scenarios: – (1) the match is technically correct, but the match region is not representative to the landmark; and – (2) the match is technically false, due to the visual similarity between negative images and landmark.

Conclusion We build a world-scale landmark recognition engine with a multi-source and multi-modal data mining task. We have employed the GPS-tagged photos and online tour guide corpus to generate a worldwide landmark list. We then utilize ~21.4 M images to build up landmark visual models, in an unsupervised fashion. The experiments demonstrate that the engine can deliver satisfactory recognition performance, with high efficiency.