Published by Muriel Booker. Modified over 9 years ago.
1
Collective Vision: Using Extremely Large Photograph Collections
Mark Lenz
CameraNet Seminar, University of Wisconsin – Madison
January 26, 2010
Acknowledgments: These slides combine and modify slides provided by Yantao Zheng et al. (National University of Singapore/Google)
2
Introduction
– Distributed Collaboration
– Google Goggles
    – Personal object recognition
– World-Wide Landmark Recognition
– Building Rome in a Day
    – Distributed matching and reconstruction
3
Distributed Collaboration
– Disaster or emergency
    – Time is of the essence
    – Telecommunication networks are down
    – No maps or GPS
– What can we do to help ourselves and those around us?
4
Mobile Phones for Distributed Collaboration
– Camera for collecting visual information
– Ad-hoc wireless networking, e.g. Bluetooth
– Goals:
    – Determine location, exits and hazardous paths
    – Have I or someone else been here before?
5
Model Scenarios
– Firefighters
– Trapped miners
– Natural disasters
    – Large population exodus
    – Building collapse
– Multiple agents collaborating to traverse an unknown environment
6
Google Goggles: visual search using a picture as the query
– Combination of algorithms:
    – Object recognition
    – Optical character recognition
    – Geo-location (GPS & compass)
– Identifies:
    – Books and products
    – Businesses and landmarks
7
A World-Wide Landmark Recognition Engine with Web Learning
Goal: build a landmark recognition engine at Earth scale
8
Challenge I
– There is no list of the world's landmarks
– We only have noisy data on the Internet:
    – Tourist web articles
    – Tourist photos
    – Geographical location
9
Challenge II
– How to learn visual models of landmarks?
    – Image search engines
    – Photo-sharing websites
10
Challenge III
– Efficiency
    – Learning from enormous amounts of data
    – Recognizing with a huge model
11
Discovering landmarks in the world
Two approaches:
– Photos on photo-sharing websites: geo-tagged
– Online tourist articles: landmark names
12
Learning landmarks from GPS-tagged photos
Premise: landmark photos are
– geographically adjacent,
– visually similar, and
– uploaded by different users.
Pipeline:
– 20M images from picasa.com and panoramio.com
– Geo-clustering: is each geo-cluster a landmark? Validate by photo authors
– Noisy image pool → visual clustering: graph clustering based on local features; validate by photo authors
– Analyzing text tags: compute the frequency of n-grams of the text tags
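The photo pipeline above can be sketched in a few functions. This is a minimal illustration, not the authors' implementation. Single-linkage geo-clustering by great-circle distance stands in for whatever geo-clustering they used. The 200 m radius, the three-author minimum, and the photo record layout (`gps`, `author`, `tags`) are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict

EARTH_RADIUS_M = 6_371_000

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def geo_clusters(photos, radius_m=200.0):
    """Single-linkage clustering with union-find: photos within radius_m are chained together.
    O(n^2) pairwise comparisons, fine for a sketch; radius_m is a hypothetical value."""
    parent = list(range(len(photos)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(photos)):
        for j in range(i + 1, len(photos)):
            if haversine_m(photos[i]["gps"], photos[j]["gps"]) <= radius_m:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i, photo in enumerate(photos):
        groups[find(i)].append(photo)
    return list(groups.values())

def validated(cluster, min_authors=3):
    """Keep a cluster only if enough distinct users uploaded photos of it."""
    return len({p["author"] for p in cluster}) >= min_authors

def tag_ngrams(cluster, n_max=3):
    """Count tag n-grams (each counted at most once per photo) as candidate landmark names."""
    counts = Counter()
    for p in cluster:
        words = p["tags"].lower().split()
        grams = {" ".join(words[k:k + n])
                 for n in range(1, n_max + 1)
                 for k in range(len(words) - n + 1)}
        counts.update(grams)
    return counts
```

Requiring several distinct authors encodes the "uploaded by different users" premise. A spot that a single tourist photographed hundreds of times is therefore not promoted to a landmark.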
13
Landmarks from GPS-tagged photos
– ~20 million GPS-tagged photos
– 140k geo-clusters and 14k visual clusters
– 2240 landmarks from 812 cities in 104 countries
    – Biased distribution, mostly in Europe
14
Learning landmarks from tourist web articles
– Explore the article corpus on wikitravel.com
– Assume a geographical hierarchy
– Landmark mining = named entity extraction
– An HTML page is a structure tree:
    – Node: an HTML tag
    – Value: its text
– Classify each tree node based on semantic clues embedded in the document structure
15
Learning landmarks from tourist web articles
– Heuristic rules:
    – Nodes are in a "To See" or "See" section
    – Nodes are children of "bullet list" nodes
    – Nodes are in bold font
– Extract all named entities as landmark candidates
– Validate by visual models
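The three heuristic rules can be sketched with Python's standard `html.parser`. The sketch assumes simplified markup: headings `<h1>` to `<h4>` whose text is "See" or "To See", bullet items in `<li>`, and bold text in `<b>` or `<strong>`. Real Wikitravel pages have richer markup, and the class name `SeeSectionExtractor` is hypothetical.

```python
from html.parser import HTMLParser

class SeeSectionExtractor(HTMLParser):
    """Collect bold text inside list items that appear under a 'See'/'To See' heading."""
    HEADINGS = {"h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.in_see = False     # currently inside a "See"/"To See" section
        self.heading = None     # tag name of the heading being read, if any
        self.heading_text = []
        self.li_depth = 0       # > 0 while inside a bullet-list item
        self.bold_depth = 0     # > 0 while inside <b>/<strong>
        self.current = []
        self.candidates = []    # landmark name candidates

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.heading, self.heading_text = tag, []
        elif tag == "li":
            self.li_depth += 1
        elif tag in ("b", "strong"):
            self.bold_depth += 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == self.heading:
            # A new heading starts a new section; only "See"/"To See" sections count.
            title = "".join(self.heading_text).strip().lower()
            self.in_see = title in ("see", "to see")
            self.heading = None
        elif tag == "li":
            self.li_depth = max(0, self.li_depth - 1)
        elif tag in ("b", "strong") and self.bold_depth:
            self.bold_depth -= 1
            text = "".join(self.current).strip()
            # Rules: in a See section, inside a bullet item, in bold font.
            if self.in_see and self.li_depth and text:
                self.candidates.append(text)

    def handle_data(self, data):
        if self.heading:
            self.heading_text.append(data)
        elif self.bold_depth:
            self.current.append(data)
```

The candidates are only names. The slide's last step, validating them with visual models, is what filters out bold entries that are not actually landmarks.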
16
Learning landmarks from tourist web articles
– ~7000 landmarks from 787 cities in 145 countries
– More evenly distributed
17
Unsupervised learning of landmark images
Premise: photos of the same landmark should be visually similar
– Geo-clusters and landmarks from tourist articles → noisy image pool
– Visual clustering based on local features
– Validate and clean the models: the visual model validates the landmarks!
– Photo vs. non-photo classifier to filter out noisy images
18
Local Feature Detection
– Find invariant and robust features
– Create distinctive feature descriptions
19
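Laplacian-of-Gaussian (LoG)
– Detects edges and blob-like interest points; scale invariance comes from searching over multiple scales
– Gaussian filter smooths the image to suppress noise
– Laplacian filter finds areas of rapid intensity change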
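Below is a minimal NumPy sketch of scale-normalized LoG interest-point detection. It filters the image with σ²·LoG at several scales and keeps points whose absolute response is a local maximum over position and scale. The scale list, the threshold, and the 3×3×3 neighbourhood are illustrative assumptions, not the parameters of the original system.

```python
import numpy as np

def log_kernel(sigma):
    """Laplacian-of-Gaussian kernel sampled on a grid of half-width ceil(3*sigma)."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x ** 2 + y ** 2) / (2 * sigma ** 2)
    k = -(1 / (np.pi * sigma ** 4)) * (1 - r2) * np.exp(-r2)
    return k - k.mean()  # force a zero DC response so flat regions give 0

def filter2d(img, kernel):
    """Same-size 2-D correlation with zero padding (the LoG is symmetric, so this equals convolution)."""
    h = kernel.shape[0] // 2
    padded = np.pad(img.astype(float), h)
    out = np.zeros(img.shape)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def detect_blobs(img, sigmas, threshold):
    """Return (response, row, col, sigma) for local maxima of |sigma^2 * LoG|, strongest first."""
    stack = np.stack([s ** 2 * np.abs(filter2d(img, log_kernel(s))) for s in sigmas])
    found = []
    n_s, rows, cols = stack.shape
    for s in range(n_s):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                v = stack[s, r, c]
                if v < threshold:
                    continue
                # Compare against neighbours in position and in the adjacent scales.
                neighbourhood = stack[max(s - 1, 0):s + 2, r - 1:r + 2, c - 1:c + 2]
                if v >= neighbourhood.max():
                    found.append((float(v), r, c, sigmas[s]))
    return sorted(found, reverse=True)
```

For a bright disk of radius r, the strongest normalized response lies at the disk's centre at σ ≈ r/√2. The selected scale therefore grows with the size of the structure, which is the source of the scale invariance.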
20
Local Feature Description
– Invariant and distinctive description
– Texture described by a 118-dimensional Gabor wavelet feature
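The slides give only the descriptor size (118 dimensions), not the filter-bank parameters. The following NumPy sketch therefore uses a hypothetical bank of 2 wavelengths × 6 orientations, giving 12 values. Each entry is the magnitude of a complex Gabor filter's response at the patch centre. The patch mean is removed and the vector is L2-normalized, so brightness offsets and uniform contrast changes leave it unchanged.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid along direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # coordinate along the wave direction
    yr = -x * np.sin(theta) + y * np.cos(theta)    # coordinate across it
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * xr / wavelength)

def gabor_descriptor(patch, wavelengths=(4, 8), n_orient=6):
    """Hypothetical filter bank: one response magnitude per (wavelength, orientation).
    patch must be a square array with an odd side length."""
    patch = patch - patch.mean()                   # invariance to brightness offset
    size = patch.shape[0]
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            g = gabor_kernel(size, sigma=0.56 * lam, theta=theta, wavelength=lam)
            feats.append(abs(np.sum(patch * g)))
    v = np.array(feats)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v             # invariance to contrast scaling
```

A striped patch responds most strongly to the filter whose orientation and wavelength match its stripes. This orientation and frequency selectivity is what makes Gabor responses a texture descriptor.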
21
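Object matching based on local features
– Image representation:
    – Interest points: Laplacian-of-Gaussian (LoG) filter
    – Local features: Gabor wavelets
– The image match score Sim(I_a, I_b) is based on the probability that the match of I_a and I_b is a false positive, i.e. the probability that at least m out of n features match by chance:
$$P_{FP}(I_a, I_b) = \sum_{j=m}^{n} \binom{n}{j} p^{j} (1-p)^{n-j}$$
where p is the probability that a single feature matches by chance. The lower P_FP, the more reliable the match.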
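The false-positive probability is a binomial upper tail, so it takes one line to compute. In this sketch the function name and the example numbers are hypothetical; the slides do not give the values of n, m or p that were used.

```python
from math import comb

def false_positive_prob(n, m, p):
    """P(at least m of n features match by chance), where each feature independently
    matches by chance with probability p (binomial upper tail)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(m, n + 1))

# Hypothetical example: 20 candidate features, 5% chance of a spurious single match.
for m in (1, 3, 6):
    print(m, false_positive_prob(20, m, 0.05))
```

As the number of agreeing features m grows, P_FP falls toward zero very quickly. A handful of consistent feature matches is therefore already strong evidence that two images show the same object.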
22
Constructing the match region graph
– Image matching produces match regions; each node is a match region
– Two types of edges:
    – Match edge: measures match confidence
    – Overlap region edge: measures spatial overlap
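Here is a sketch of the two edge types as a weighted adjacency dictionary, where a shorter edge means a stronger link. Several details are assumptions rather than the authors' choices:

- regions are axis-aligned boxes;
- a match edge gets length 1 − confidence;
- an overlap edge links two regions in the same image with length 1 − intersection-over-union;
- `min_overlap` is a hypothetical cut-off.

```python
from itertools import combinations

def iou(a, b):
    """Intersection-over-union of boxes given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def build_region_graph(regions, matches, min_overlap=0.1):
    """regions: {region_id: (image_id, box)}; matches: [(r1, r2, confidence in (0, 1])].
    Returns {node: {neighbour: edge_length}}; shorter edges mean stronger links."""
    graph = {r: {} for r in regions}

    def link(u, v, length):
        # If a pair gets both kinds of edge, keep the shorter one.
        if length < graph[u].get(v, float("inf")):
            graph[u][v] = graph[v][u] = length

    for u, v, conf in matches:                  # match edges (across images)
        link(u, v, 1.0 - conf)
    for u, v in combinations(regions, 2):       # overlap edges (within one image)
        (img_u, box_u), (img_v, box_v) = regions[u], regions[v]
        if img_u == img_v:
            ov = iou(box_u, box_v)
            if ov >= min_overlap:
                link(u, v, 1.0 - ov)
    return graph
```

Overlap edges are what let matches propagate. Suppose region A matches region B, and B overlaps region C in the same photo. Then A and C end up close in the graph even though they were never matched directly.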
23
Graph clustering on match regions
– Distance between any two regions = length of the shortest path connecting them
– Why hierarchical agglomerative clustering rather than K-means, GMM, etc.?
    – We have no a priori knowledge of the number of clusters
    – Intuitively, each cluster should correspond to one aspect of a landmark
– Match region graph → agglomerative hierarchical clustering → visual clusters
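The clustering step can be sketched as Dijkstra shortest paths followed by a naive agglomerative loop. The slides do not say which linkage was used, so average linkage and the stopping threshold are assumptions. The threshold replaces a fixed number of clusters, which matches the argument against K-means.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra: shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                            # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def agglomerative(graph, threshold):
    """Average-linkage HAC on shortest-path distances; stops when the closest pair exceeds threshold."""
    nodes = sorted(graph)
    dist = {u: shortest_paths(graph, u) for u in nodes}
    inf = float("inf")
    clusters = [[u] for u in nodes]

    def linkage(a, b):
        # Unreachable pairs count as infinitely far apart.
        return sum(dist[u].get(v, inf) for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, i, j = min((linkage(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

The loop is cubic in the number of regions, so it only suits small examples. At the scale on the slides, the "efficient hierarchical clustering" mentioned under efficiency issues would be needed.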
24
Visual cluster examples
– Corcovado, Rio de Janeiro, Brazil
– Acropolis, Athens, Greece
25
Visual cluster validation and cleaning
– Validate by the authors or hosting websites of the images, which reflect the popular appeal of a landmark
– Filter out non-photographic images such as maps and logos:
    – Train an AdaBoost classifier
    – Features: color histogram, Hough transform, etc.
– Clean clusters by detecting human faces that cover a large area of the image
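The photo vs. non-photo filter can be sketched as discrete AdaBoost with decision stumps. This is a standard instance of AdaBoost, not necessarily the authors' configuration. Each row of `X` holds scalar image features; in the test data the two columns are hypothetical stand-ins for a colour-histogram statistic and a Hough-transform statistic. Labels are +1 for photos and −1 for non-photos.

```python
import numpy as np

def stump_predict(X, feature, threshold, sign):
    """Decision stump: +sign where X[:, feature] >= threshold, -sign elsewhere."""
    return np.where(X[:, feature] >= threshold, sign, -sign)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error; labels y are in {-1, +1}."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                err = w[stump_predict(X, f, t, s) != y].sum()
                if err < best[0]:
                    best = (err, f, t, s)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, f, t, s = fit_stump(X, y, w)
        if err >= 0.5:
            break                               # no stump beats chance; stop
        err = max(err, 1e-10)                   # avoid log(1/0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, f, t, s))
        w /= w.sum()
        model.append((alpha, f, t, s))
        if err <= 1e-10:
            break                               # training set already separated perfectly
    return model

def predict(model, X):
    """Sign of the alpha-weighted vote of all stumps (+1 = photo, -1 = non-photo)."""
    score = np.zeros(X.shape[0])
    for alpha, f, t, s in model:
        score += alpha * stump_predict(X, f, t, s)
    return np.where(score >= 0, 1, -1)
```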
26
Efficiency issues
– Issue 1: learning landmark images
    – 21.4M photos
    – Parallel computing to learn true landmark images
    – Efficient hierarchical clustering
– Issue 2: recognizing landmarks
    – Recognition engine: ~5000 landmarks
    – Query image: index local features for matching (kd-tree indexing)
    – Query time: ~0.2 sec on a P4 computer
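Here is a minimal kd-tree for nearest-neighbour lookup of local features. It is an exact search in pure Python and illustrates the indexing idea only. The slides do not describe the authors' kd-tree variant.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split on the median along axes cycled by depth; node = (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Exact nearest neighbour: returns (point, distance)."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[1]:
        best = (point, d)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the current best match.
    if abs(diff) < best[1]:
        best = nearest(far, query, best)
    return best
```

Exact kd-tree search degrades toward brute force as dimensionality grows. For descriptors with around a hundred dimensions, practical systems usually switch to approximate variants such as best-bin-first search, which caps the number of leaves examined.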
27
Experiments: statistics of learned landmarks
               From photos   From articles   Total
Landmarks      2240          3246            5486
Cities         812           626             1259
Countries      104           130             144
– Small overlap: 174 landmarks are shared by the two sources
– China: 101 landmarks. Under-counted! Why?
– U.S.: high Internet penetration rate and enormous tour sites
28
Evaluation of landmark image learning
– Randomly select 1000 visual clusters
– 68 of their images (0.68%) are outliers: maps, logos, photos of people
– After applying the photographic vs. non-photographic classifier, 37 outliers remain: 0.68% → 0.37%
29
Evaluation of landmark recognition
– Positive testing images: 728 images from 124 landmarks
– Negative testing images: Caltech-256 (30,524) + Pascal VOC 07 (9,986) = 40,510 images
– For positive images:
    – 417 images detected as landmarks
    – 337/417 (80.8%) are correct
    – Identification rate: 337/728 (46.3%)
– For negative images:
    – 463 images detected as landmarks
    – False acceptance rate: 463/40,510 (1.1%)
– Landmarks can be similar!
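The three rates follow directly from the counts on the slide. The arithmetic is shown below for reference; the variable names are ours.

```python
positives, detected_pos, correct = 728, 417, 337
negatives = 30524 + 9986          # Caltech-256 + Pascal VOC 07
false_accepts = 463

precision = correct / detected_pos              # share of positive detections that are correct
identification_rate = correct / positives       # share of all positive test images identified
false_acceptance_rate = false_accepts / negatives
print(f"{precision:.1%} {identification_rate:.1%} {false_acceptance_rate:.1%}")  # prints 80.8% 46.3% 1.1%
```

Precision and identification rate differ because 311 of the 728 positive images were never detected as landmarks at all.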
30
Falsely detected images
– The match is technically correct, but the matched region is not the landmark: a problem of model generation
– The match is technically false, due to visual similarity: a problem of the image features and matching mechanism