Collective Vision: Using Extremely Large Photograph Collections
Mark Lenz
CameraNet Seminar
University of Wisconsin – Madison
January 26, 2010
Acknowledgments: These slides combine and modify slides provided by Yantao Zheng et al. (National University of Singapore/Google)

Introduction
Distributed Collaboration
Google Goggles
  – Personal object recognition
World-Wide Landmark Recognition
Building Rome in a Day
  – Distributed matching and reconstruction

Distributed Collaboration
Disaster or emergency
  – Time is of the essence
Telecommunication networks down
No maps or GPS
What can we do to help ourselves and those around us?

Mobile Phones for Distributed Collaboration
Camera for collecting visual information
Ad-hoc wireless LAN
  – e.g. Bluetooth
Goals:
  – Determine location, exits and hazardous paths
Have I or someone else been here before?

Model Scenarios
Firefighters
Trapped miners
Natural disasters
  – Large population exodus
  – Building collapse
Multiple agents collaborating to traverse an unknown environment

Visual search using a picture as the query
Combination of algorithms
  – Object recognition
  – Optical character recognition
  – Geo-location (GPS & compass)
Identify
  – Books and products
  – Businesses and landmarks

A World-Wide Landmark Recognition Engine with Web Learning
Goal: Build a landmark recognition engine at earth-scale

Challenge I
No list of landmarks in the world
We only have noisy data on the Internet:
  – Tourist web articles
  – Tourist photos
  – Geographical location

Challenge II
How to learn landmark visual models
  – Image search engines
  – Photo-sharing websites

Challenge III
Efficiency
  – Learning from enormous data
  – Recognizing from a huge model

Discovering landmarks in the world
Two approaches:
  – Photos in photo-sharing websites (geo-tagged)
  – Online tourist articles (landmark name)

Learning landmarks from GPS-tagged photos
20M images from picasa.com and panoramio.com
Geo-clustering
  – geo cluster = landmark? Validate by photo authors
Noisy image pool
Visual clustering
  – Graph clustering based on local features
  – Validate by photo authors
Analyzing text tags
  – Compute frequency of n-grams of text tags
Premise: landmark photos are
  – geographically adjacent
  – visually similar
  – uploaded by different users
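The geo-clustering, author validation, and tag n-gram steps on this slide can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the slides: the 0.5 km linking radius, the two-author threshold, the union-find grouping, the photo record layout, and the rule that prefers the longest of the most frequent n-grams.

```python
import math
from collections import Counter

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def geo_clusters(photos, radius_km=0.5):
    """Group photos linked by chains of GPS tags closer than radius_km (assumed radius)."""
    parent = list(range(len(photos)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(photos)):
        for j in range(i + 1, len(photos)):
            if haversine_km(photos[i]["gps"], photos[j]["gps"]) <= radius_km:
                parent[find(i)] = find(j)

    groups = {}
    for i, photo in enumerate(photos):
        groups.setdefault(find(i), []).append(photo)
    return list(groups.values())

def validated_by_authors(cluster, min_authors=2):
    """Keep a cluster only if enough distinct users uploaded its photos."""
    return len({p["author"] for p in cluster}) >= min_authors

def top_ngram(cluster, max_n=3):
    """Most frequent tag n-gram in a cluster; ties go to the longer n-gram."""
    counts = Counter()
    for p in cluster:
        words = p["tags"].lower().split()
        for n in range(1, max_n + 1):
            for k in range(len(words) - n + 1):
                counts[" ".join(words[k:k + n])] += 1
    if not counts:
        return None
    return max(counts, key=lambda g: (counts[g], len(g.split())))

# Hypothetical sample records, not data from the study.
photos = [
    {"gps": (48.8584, 2.2945), "author": "a", "tags": "eiffel tower paris"},
    {"gps": (48.8583, 2.2950), "author": "b", "tags": "Eiffel Tower"},
    {"gps": (48.8586, 2.2941), "author": "a", "tags": "eiffel tower night"},
    {"gps": (40.6892, -74.0445), "author": "c", "tags": "holiday"},
]
clusters = geo_clusters(photos)
largest = max(clusters, key=len)
```

The pairwise loop is quadratic, so a collection of about 20 million photos would need a spatial index or a distributed job; the sketch only shows the logic of each step.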

Landmarks from GPS-tagged photos
~20 million GPS-tagged photos
140k geo-clusters and 14k visual clusters
2240 landmarks from 812 cities in 104 countries
  – biased distribution, mostly in Europe

Learning landmarks from tourist web articles
Explore article corpus in wikitravel.com
Assume a geographical hierarchy
Landmark mining = named entity extraction
HTML is a structure tree
  – Node: an HTML tag
  – Value: text
Classify each tree node, based on semantic clues embedded in the document structure

Learning landmarks from tourist web articles
Heuristic rules
  – Nodes are in the "To See" or "See" section
  – Nodes are children of "bullet list" nodes
  – Nodes indicate bold font format
Extract all named entities as landmark candidates
Validate by visual models
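The three heuristic rules can be sketched with Python's standard `html.parser` module. This is an illustration only: the class name, the choice of `h2` as the section heading tag, and the sample page are assumptions, and the sketch expects well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

class LandmarkCandidateParser(HTMLParser):
    """Collect bold text inside list items of a "See" / "To See" section."""
    SEE_HEADINGS = {"see", "to see"}

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.heading_text = ""
        self.in_see_section = False
        self.li_depth = 0
        self.bold_depth = 0
        self.current = ""
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":  # assumed section-heading tag
            self.in_heading, self.heading_text = True, ""
        elif tag == "li":
            self.li_depth += 1
        elif tag in ("b", "strong"):
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False
            title = self.heading_text.strip().lower()
            self.in_see_section = title in self.SEE_HEADINGS
        elif tag == "li":
            self.li_depth = max(0, self.li_depth - 1)
        elif tag in ("b", "strong"):
            self.bold_depth = max(0, self.bold_depth - 1)
            if self.bold_depth == 0:
                text = self.current.strip()
                if text:
                    self.candidates.append(text)
                self.current = ""

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text += data
        elif self.in_see_section and self.li_depth > 0 and self.bold_depth > 0:
            self.current += data

# Hypothetical page fragment, not taken from wikitravel.com.
page = """
<h2>Understand</h2>
<ul><li><b>History</b> of the city</li></ul>
<h2>See</h2>
<ul>
  <li><b>Eiffel Tower</b>, Champ de Mars</li>
  <li><b>Louvre Museum</b></li>
  <li>An unnamed street market</li>
</ul>
<p><b>Note</b>: opening hours vary.</p>
<h2>Do</h2>
<ul><li><b>Boat tour</b></li></ul>
"""
parser = LandmarkCandidateParser()
parser.feed(page)
candidates = parser.candidates
```

Only text that satisfies all three rules is kept. The candidates would still need the visual-model validation the slide mentions.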

Learning landmarks from tourist web articles
~7000 landmarks from 787 cities in 145 countries
More evenly distributed

Unsupervised learning of landmark images
Geo-clusters
Landmarks from tour articles
Noisy image pool
Visual clustering
  – Premise: photos from a landmark should be similar
  – Clustering based on local features
Validate and clean models
  – Visual model validates landmarks!
  – Photo vs. non-photo classifier to filter out noisy images
……

Local Feature Detection
Find invariant and robust features
Create distinctive feature descriptions

Laplacian-of-Gaussian (LoG)
Scale-invariant edge detection
Gaussian image filter to remove noise
Laplacian filter to find areas of rapid change
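The Gaussian smoothing and the Laplacian can be folded into a single LoG kernel. The sketch below samples the standard continuous LoG formula on a small grid and applies it to a toy image with one bright pixel. The kernel radius, the value of sigma, the zero-mean adjustment, and the pure-Python filtering loop are illustrative assumptions, not details from the slides.

```python
import math

def log_kernel(sigma, radius):
    """Sampled Laplacian-of-Gaussian kernel, shifted so its entries sum to zero."""
    size = 2 * radius + 1
    s2 = sigma * sigma
    kernel = [[0.0] * size for _ in range(size)]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r2 = dx * dx + dy * dy
            kernel[dy + radius][dx + radius] = (
                -(1.0 / (math.pi * s2 * s2))
                * (1.0 - r2 / (2.0 * s2))
                * math.exp(-r2 / (2.0 * s2))
            )
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter2d(image, kernel):
    """Correlate image with kernel, treating pixels outside the image as zero."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out

# Toy 9x9 image with a single bright pixel at the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
kernel = log_kernel(sigma=1.0, radius=3)
response = filter2d(image, kernel)

# With this sign convention a bright spot gives the most negative response.
strongest = min((response[y][x], y, x) for y in range(9) for x in range(9))
```

Running the same filter at several values of sigma and keeping the extrema across scales is the usual way such a detector becomes scale-invariant.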

Local Feature Description
Invariant and distinctive description
Texture from a 118-dimensional Gabor wavelet descriptor

Object matching based on local features
Sim(I_i, I_j) = image match score
Image representation
  – Interest points: Laplacian-of-Gaussian (LoG) filter
  – Local feature: Gabor wavelets
Match score is based on P_FP, the probability that the match of I_i and I_j is a false positive
  – P_FP = probability that at least m out of n features match, if p = probability of a feature match by chance
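The phrase "probability of at least m out of n features match" describes a binomial tail if each chance match is treated as independent. A minimal sketch under that assumption follows. The symbol p, the function names, and the choice of 1 − P_FP as the score are illustrative and are not confirmed by the slide, whose formula images did not survive.

```python
from math import comb

def false_positive_probability(m, n, p):
    """P(at least m of n features match by chance), each with probability p.

    Assumes independent chance matches, i.e. a binomial tail:
    sum over k = m..n of C(n, k) * p^k * (1 - p)^(n - k).
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))

def match_score(m, n, p):
    """One plausible score (an assumption): confidence the match is not chance."""
    return 1.0 - false_positive_probability(m, n, p)

# Hypothetical numbers: 20 of 100 features match, 1% chance-match rate.
score = match_score(m=20, n=100, p=0.01)
```

With these hypothetical numbers the chance of 20 accidental matches is vanishingly small, so the score is essentially 1. With only one or two matching features it drops sharply.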

Constructing match region graph
Image matching
Node is a match region
2 types of edges:
  – match edge: measures match confidence
  – overlap region edge: measures spatial overlap

Graph clustering on match regions
Distance between any two regions = shortest path connecting them
Agglomerative hierarchical clustering: match region graph => visual clusters
Why hierarchical agglomerative clustering, and not K-means, GMM, etc.?
  – Because we don't have a priori knowledge of the number of clusters
  – Intuitively, each cluster should correspond to one aspect of a landmark
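A small sketch of clustering on shortest-path distances: Dijkstra's algorithm gives the distance between regions, and clusters are merged bottom-up until the closest pair is farther apart than a threshold, so the number of clusters never has to be fixed in advance. The single-linkage rule, the threshold value, the edge lengths, and the toy graph are assumptions for illustration. The slides do not say how match confidence or overlap is converted into an edge length.

```python
import heapq
from itertools import combinations

def shortest_paths(graph, source):
    """Dijkstra: shortest-path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def agglomerative_clusters(graph, max_distance):
    """Merge the two closest clusters until they are farther than max_distance."""
    dist = {u: shortest_paths(graph, u) for u in graph}
    clusters = [{u} for u in graph]

    def linkage(a, b):
        # Single linkage (assumed): distance of the closest pair of members.
        return min(dist[u].get(v, float("inf")) for u in a for v in b)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Hypothetical match-region graph: edge length is small for confident matches
# or strongly overlapping regions, large for weak links.
graph = {
    "a": {"b": 0.1},
    "b": {"a": 0.1, "c": 0.2},
    "c": {"b": 0.2, "d": 0.9},
    "d": {"c": 0.9, "e": 0.1},
    "e": {"d": 0.1},
}
visual_clusters = agglomerative_clusters(graph, max_distance=0.5)
```

The weak c–d link stays above the threshold, so the result is two clusters, which mirrors the idea that each cluster captures one aspect of a landmark.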

Visual cluster example
Corcovado, Rio de Janeiro, Brazil
Acropolis, Athens, Greece

Visual cluster validation and cleaning
Validate by authors or hosting websites of images
  – these reflect the popular appeal of landmarks
Filter out non-photographic images, like maps and logos
  – train an AdaBoost classifier
  – features: color histogram, Hough transform, etc.
Clean clusters by detecting large-area human faces

Efficiency issues
Issue 1: learning landmark images from 21.4M photos
  – Parallel computing to learn true landmark images
  – Efficient hierarchical clustering
Issue 2: recognizing the landmark in a query image (recognition engine: ~5000 landmarks)
  – Indexing local features for matching: kd-tree indexing
  – Query time: ~0.2 sec on a P4 computer
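The idea behind kd-tree indexing of feature vectors can be shown with a minimal exact nearest-neighbour search. This is a generic textbook kd-tree, not the engine's actual index. The 4-dimensional random vectors stand in for the much higher-dimensional descriptors, and all names are hypothetical.

```python
import random

def build_kdtree(points, depth=0):
    """Recursively split points on the median of one coordinate per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Return the stored point closest to query (exact search with pruning)."""
    if node is None:
        return best
    if best is None or sq_dist(query, node["point"]) < sq_dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best

# Hypothetical stand-in for indexed local features.
rng = random.Random(0)
features = [tuple(rng.random() for _ in range(4)) for _ in range(200)]
tree = build_kdtree(features)
query = tuple(rng.random() for _ in range(4))
match = nearest(tree, query)
```

Exact kd-tree search loses much of its pruning power as the dimension grows, so a system working with descriptors of over a hundred dimensions would likely use an approximate variant; that detail is not stated in the slides.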

Experiments: statistics of learned landmarks
              From photos   From articles   Total
Landmark #    2240          3246            5486
City #        812           626             1259
Country #     104           130             144
Small overlap: 174 landmarks shared
China: 101 landmarks. Under-counted! Why?
U.S.: high Internet penetration rate & enormous tour sites

Evaluation of landmark image learning
Randomly select 1000 visual clusters
  – 68 (6.8%) are outliers: maps, logos, human photos
Apply photographic vs. non-photographic classifier
  – 37 outliers remain: 6.8% => 3.7%

Evaluation of landmark recognition
Positive testing images:
  – 728 images from 124 landmarks
Negative testing images:
  – Caltech-256 (30524) + Pascal VOC 07 (9986) = 40,510 images
For positive images:
  – 417 images detected to be landmarks
  – 337/417 (80.8%) are correct
  – Identification rate: 337/728 (46.3%)
For negative images:
  – 463 images detected to be landmarks
  – False acceptance rate: 1.1%
Landmarks can be similar!
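The three rates on this slide follow directly from the reported counts. The short check below recomputes them; the variable names (for example `precision` for the share of detections that are correct) are labels chosen here, not terms from the slides.

```python
# Counts as reported on the slide.
positives_total = 728          # positive test images
detected = 417                 # positives detected as landmarks
correct = 337                  # detections with the right landmark
negatives_total = 30524 + 9986 # Caltech-256 + Pascal VOC 07
false_accepts = 463            # negatives detected as landmarks

precision = correct / detected                     # share of detections that are correct
identification_rate = correct / positives_total    # share of positives identified
false_acceptance_rate = false_accepts / negatives_total

print(f"{precision:.1%} {identification_rate:.1%} {false_acceptance_rate:.1%}")
```

The gap between the first two rates comes from the 311 positive images that were never detected as landmarks at all.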

False detected images
Match is technically correct, but match region is not a landmark
  – A problem of model generation
Match is technically false, due to visual similarity
  – A problem of image feature and matching mechanism
