1 Proximate Sensing: Inferring What-Is-Where From Georeferenced Photo Collections. Daniel Leung and Shawn Newsam, Electrical Engineering & Computer Science, University of California at Merced. CVPR 2010, June 17th, 2010.

2 Remote sensing: using overhead images of distant scenes to derive geographic information. Figures: satellite image (Google Maps); National Land Cover Database (USGS).

3 Proximate sensing: use ground-level images of close-by objects and scenes. Figures: Land Cover Map 2000 (UK Centre for Ecology & Hydrology); community-contributed photos (Geograph Britain and Ireland project). Study area: 100x100 km region in southeastern UK (region TQ in the National Grid).

4 Proximate sensing: use ground-level images of close-by objects and scenes. Figure: community-contributed photos (Geograph Britain and Ireland project).

5 Proximate Sensing We conjecture that the visual content of georeferenced images can be used to derive maps of what-is-where on the surface of the earth. Motivation: –Such collections are becoming increasingly available, e.g. Flickr (100+ million geotagged images), Panoramio, Picasa, Geograph, TrekEarth. –Derive geographic information not possible through other means, e.g. land-use classification. –Exciting new application of CV that not only provides another context to apply/revisit standard techniques but stands to motivate novel problems.

6 Proximate Sensing: Context. Volunteered Geographic Information (Wikipedia): –VGI is the harnessing of tools to create, assemble, and disseminate geographic data provided voluntarily by individuals (Goodchild, 2007). –Goodchild, M. 2007. Citizens as Sensors: The World of Volunteered Geography. Diagram: proximate sensing, citizen science, volunteered geographic information.

7 VGI: Flickr. 103,679,986 geotagged items; 2.8 million things geotagged this month.

8 VGI: Geograph “The Geograph Britain and Ireland project aims to collect geographically representative photographs and information for every square kilometre of Great Britain and Ireland, and you can be part of it.” 9,973 users have contributed 1,897,042 images covering 255,904 grid squares, or 77.1% of the total. “Railway bridge crossing R. Rother This is now a dismantled railway, further east it becomes the Kent & East Sussex Railway.”

9 Objective Eventual goal is to use the visual content of georeferenced photos to produce land use/cover maps. Initial focus on simpler problem of binary classification into developed and undeveloped regions.

10 Related Work Other researchers have leveraged location information in georeferenced photo collections: –To annotate novel images [Quack et al., CIVR 2008; Moxley et al., MIR 2008]. –To geolocate novel images [Hays and Efros, CVPR 2008]. –To organize the collections themselves [Crandall et al., WWW 2009]. However, ours is the first work (to the best of our knowledge) to use the collections to infer what-is-where on the surface of the earth on a large scale.

11 Overview. Training: training images → label images → feature extraction → train classifier. Mapping: target images → feature extraction → classify target images → aggregate labels in 1x1 km tiles → fraction developed map → binary classification map.

12 Ground Truth (1) Land Cover Map 2000 (UK Centre for Ecology & Hydrology). Map legend: LCM AC 1: Broad-leaved / mixed woodland; LCM AC 2: Coniferous woodland; LCM AC 3: Arable and horticulture; LCM AC 4: Improved grassland; LCM AC 5: Semi-natural grass; LCM AC 6: Mountain, heath, bog; LCM AC 7: Built up areas and gardens; LCM AC 8: Standing open water; LCM AC 9: Coastal; LCM AC 10: Oceanic Seas.

13 Ground Truth (2) Aggregate 10 land cover classes into 2 superclasses: –Developed: LCM AC:7 Built up areas and gardens –Undeveloped: other 9 classes Derive 2 ground truth maps: –Fraction map: percent developed for each 1x1 km tile. –Binary classification map: apply 50% threshold to fraction map. Ground truth fraction map indicating percent developed for each 1x1km tile. Ground truth binary classification map indicating tiles labelled as developed (white) or undeveloped (black).
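
The aggregation on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each 1x1 km tile is available as a flat list of LCM AC codes (one per land-cover cell), and the function names and toy tile are hypothetical. The slide does not say how a tile at exactly 50% is labelled; the sketch counts it as developed.

```python
DEVELOPED_CLASS = 7  # LCM AC 7: built up areas and gardens


def fraction_developed(tile_classes):
    """Fraction of land-cover cells in one tile that fall in the developed superclass."""
    if not tile_classes:
        raise ValueError("tile has no land-cover cells")
    developed = sum(1 for code in tile_classes if code == DEVELOPED_CLASS)
    return developed / len(tile_classes)


def binary_label(fraction, threshold=0.5):
    """True means developed. Assumption: a tie at the threshold counts as developed."""
    return fraction >= threshold


# Hypothetical tile with 8 land-cover cells, 4 of them built up.
toy_tile = [7, 7, 7, 3, 4, 4, 1, 7]
toy_fraction = fraction_developed(toy_tile)
print(toy_fraction, binary_label(toy_fraction))
```

Applying `fraction_developed` to every tile would give the fraction map, and `binary_label` the binary classification map.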

14 Datasets (1) Flickr: downloaded 920K Flickr images for the TQ region. Distribution for 1x1 km tiles shown to left (log10 scale). 5,420 tiles contain no Flickr images. The other 4,580 tiles contain an average of 200, a median of 10, and a maximum of 53,840 images.

15 Datasets (2) Geograph: downloaded 120K images from the Geograph Britain and Ireland project. Distribution for 1x1 km tiles shown to left (log10 scale). Only 614 tiles are without images. The other 9,386 tiles contain an average of 13, a median of 5, and a maximum of 1,458 images.
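
Per-tile statistics like those on the two dataset slides could be computed along these lines. The sketch assumes each photo position is already expressed as an offset in metres from the south-west corner of the 100x100 km study area; the function names and sample positions are hypothetical.

```python
from collections import Counter
from statistics import mean, median

TILE_SIZE_M = 1000    # 1x1 km tiles
TILES_PER_SIDE = 100  # 100x100 km study area


def tile_index(x_m, y_m):
    """Return (column, row) of the tile containing the point, or None if it lies outside."""
    col, row = int(x_m // TILE_SIZE_M), int(y_m // TILE_SIZE_M)
    if 0 <= col < TILES_PER_SIDE and 0 <= row < TILES_PER_SIDE:
        return col, row
    return None


def images_per_tile(positions):
    """Count images per tile, ignoring positions outside the study area."""
    counts = Counter()
    for x_m, y_m in positions:
        index = tile_index(x_m, y_m)
        if index is not None:
            counts[index] += 1
    return counts


# Hypothetical photo positions (metres east, metres north of the SW corner).
positions = [(120.0, 950.0), (999.9, 10.0), (1000.0, 10.0), (45250.0, 78100.0), (-5.0, 20.0)]
counts = images_per_tile(positions)
empty_tiles = TILES_PER_SIDE ** 2 - len(counts)
print(empty_tiles, mean(counts.values()), median(counts.values()), max(counts.values()))
```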

16 Image Features Extract simple five-dimensional edge histogram features for each image. Motivated by the observation that images of developed scenes typically have a higher proportion of horizontal and vertical edges than images of undeveloped scenes.
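
The slide does not define the five bins. The sketch below assumes an MPEG-7-style edge histogram (vertical, horizontal, two diagonals, non-directional) computed over non-overlapping 2x2 pixel blocks. The filter coefficients, the threshold value, the bin names, and the normalisation are all assumptions made for illustration, not the descriptor used in the talk.

```python
import math

EDGE_TYPES = ("vertical", "horizontal", "diagonal_45", "diagonal_135", "non_directional")


def edge_histogram(gray, threshold=10.0):
    """Return a 5-bin histogram of the dominant edge type per 2x2 block.

    gray is a list of equal-length rows of grayscale values. Each bin holds the
    fraction of blocks whose strongest filter response is of that edge type and
    exceeds the threshold; blocks with no strong response are counted in no bin.
    """
    counts = [0] * len(EDGE_TYPES)
    n_blocks = 0
    for r in range(0, len(gray) - 1, 2):
        for c in range(0, len(gray[r]) - 1, 2):
            tl, tr = gray[r][c], gray[r][c + 1]
            bl, br = gray[r + 1][c], gray[r + 1][c + 1]
            responses = (
                abs(tl - tr + bl - br),        # vertical edge
                abs(tl + tr - bl - br),        # horizontal edge
                math.sqrt(2) * abs(tl - br),   # one diagonal
                math.sqrt(2) * abs(tr - bl),   # other diagonal
                2 * abs(tl - tr - bl + br),    # non-directional
            )
            n_blocks += 1
            strongest = max(range(len(responses)), key=lambda k: responses[k])
            if responses[strongest] > threshold:
                counts[strongest] += 1
    if n_blocks == 0:
        return [0.0] * len(EDGE_TYPES)
    return [count / n_blocks for count in counts]


# Toy image with vertical stripes: every block is dominated by a vertical edge.
stripes = [[0, 255, 0, 255] for _ in range(4)]
print(dict(zip(EDGE_TYPES, edge_histogram(stripes))))
```

Under the slide's observation, images of developed scenes would tend to put more mass in the first two bins.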

17 Image Classification Perform image level binary classification: –Developed. –Undeveloped. SVM classifier with Gaussian RBF kernel, five-fold cross validation, and grid search for optimal parameter selection.
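
One way to set up such a classifier is sketched below with scikit-learn. The slides do not name the SVM software; the parameter grid and the synthetic 5-D features are also assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for five-dimensional edge histogram features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.2, 0.05, size=(50, 5)),  # hypothetical "undeveloped" images
    rng.normal(0.6, 0.05, size=(50, 5)),  # hypothetical "developed" images
])
y = np.array([0] * 50 + [1] * 50)  # 0 = undeveloped, 1 = developed

# Grid search over the RBF-kernel SVM parameters with five-fold cross validation.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

After fitting, `search.predict` applies the best parameter setting, refit on all training data, to new feature vectors.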

18 Experiments (1) Training: training images → label images → feature extraction → train classifier. Mapping: target images → feature extraction → classify target images → aggregate labels in 1x1 km tiles → fraction developed map → binary classification map.

19 Experiments (2) Fraction developed map: the fraction of images classified as developed in each tile. Binary classification map: threshold applied to fraction map. Explore two types of thresholds: –Fixed at 0.5. –Adaptive so that 38.9% of the tiles are labelled as developed (this represents prior knowledge on the distribution of developed vs. undeveloped regions).
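
The two map types and the two thresholds can be sketched as below. The data layout (a dict from tile id to per-image predictions), the function names, and the tie handling are assumptions; the slide does not say how ties at a threshold are resolved.

```python
def fraction_map(tile_predictions):
    """Map each tile to the fraction of its images classified as developed.

    tile_predictions: dict of tile id -> list of booleans (True = developed).
    Tiles without images are skipped.
    """
    return {tile: sum(preds) / len(preds) for tile, preds in tile_predictions.items() if preds}


def fixed_threshold_map(fractions, threshold=0.5):
    """Assumption: a tile exactly at the threshold is labelled developed."""
    return {tile: frac >= threshold for tile, frac in fractions.items()}


def adaptive_threshold_map(fractions, developed_share=0.389):
    """Label the highest-fraction tiles as developed until the target share is reached."""
    n_developed = round(developed_share * len(fractions))
    ranked = sorted(fractions, key=fractions.get, reverse=True)
    developed = set(ranked[:n_developed])
    return {tile: tile in developed for tile in fractions}


# Hypothetical per-image predictions for five tiles; tile "D" has no images.
predictions = {
    "A": [True, True, False],
    "B": [False, False, False, True],
    "C": [True, False],
    "D": [],
    "E": [False],
}
fractions = fraction_map(predictions)
print(fixed_threshold_map(fractions))
print(adaptive_threshold_map(fractions))
```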

20 Experiments (3) Results are qualitatively evaluated by visually comparing predicted maps with ground truth maps. Results are quantitatively evaluated using ground truth: –Binary classification: number of tiles with same label. –Fraction developed: correlation coefficient over tiles. Also, mean absolute difference (MAD) and root mean squared difference (RMSD). Quantitative results computed over 4,553 tiles for which there are both Flickr and Geograph images. –38.9% of these tiles are developed in the ground truth, so a chance binary classification rate of 61.1% is achievable by labelling all tiles as undeveloped.
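
The quantitative measures named on this slide could be computed as follows. The sketch assumes the correlation coefficient is Pearson's and uses a made-up four-tile example; the function names are hypothetical.

```python
import math


def overall_rate(predicted, truth):
    """Fraction of tiles whose predicted binary label matches the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_diff(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def rms_diff(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical predicted and ground-truth fraction-developed values for four tiles.
pred_frac = [0.9, 0.1, 0.6, 0.3]
true_frac = [0.8, 0.0, 0.4, 0.5]
pred_bin = [f >= 0.5 for f in pred_frac]
true_bin = [f >= 0.5 for f in true_frac]

print(overall_rate(pred_bin, true_bin))
print(pearson(pred_frac, true_frac), mean_abs_diff(pred_frac, true_frac), rms_diff(pred_frac, true_frac))

# Chance rate from the slide: label every tile undeveloped when 38.9% are developed.
chance_rate = 1 - 0.389
print(chance_rate)
```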

21 Experiments (4) Manual vs. weakly-supervised labelling of training set. Effect of photographer intent. Relative importance of training vs. target set. Filtering out non-informative images. Training set size. Training set quality.

22 Results—Manually Labelled Training Set (1) Training set contains 2,740 Flickr images which have been manually labelled as depicting a scene that is developed or undeveloped. Developed: containing constructed materials such as those used in houses, buildings, etc.

23 Results—Manually Labelled Training Set (2) Ground Truth Maps Maps Generated Using Flickr Images

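24 Results—Manually Labelled Training Set (4)
Binary maps: overall and average classification rates (%), each with fixed and adaptive thresholds. Fraction maps: correlation coefficient, MAD, RMSD.
Training set | Target set | Training set size | Overall, fixed | Overall, adaptive | Avg., fixed | Avg., adaptive | Corr. coeff. | MAD | RMSD
Manual (Flickr) | Flickr | 2740 (0.51) | 66.4 | 64.9 | 68.8 | 63 | 0.374 | 0.287 | 0.383
The value in parentheses after the training set size is the fraction of images labelled as developed in the training set. Performance is better than chance (61.1%).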

25 Results—Weakly-Supervised Training (1) Labelled training set constructed in fully automated fashion: –Select 2 images at random from tiles with 4 or more images. –Label them with the majority label of the tile in the ground truth map.
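
The two-step construction on this slide can be sketched as below. The data layout (dicts keyed by tile id), the function name, and the fixed random seed are assumptions for illustration.

```python
import random


def build_weak_training_set(images_by_tile, tile_is_developed, per_tile=2, min_images=4, seed=0):
    """Return (image_id, label) pairs sampled from sufficiently populated tiles.

    images_by_tile: dict of tile id -> list of image ids.
    tile_is_developed: dict of tile id -> bool, the tile's majority ground-truth label.
    """
    rng = random.Random(seed)
    training = []
    for tile, images in sorted(images_by_tile.items()):
        if len(images) < min_images or tile not in tile_is_developed:
            continue  # too few images, or no ground truth for this tile
        for image_id in rng.sample(images, per_tile):
            training.append((image_id, tile_is_developed[tile]))
    return training


# Hypothetical tiles: "t2" has fewer than 4 images and is skipped.
images_by_tile = {
    "t1": ["a", "b", "c", "d"],
    "t2": ["e", "f", "g"],
    "t3": ["h", "i", "j", "k", "l"],
}
tile_is_developed = {"t1": True, "t2": False, "t3": False}
weak_set = build_weak_training_set(images_by_tile, tile_is_developed)
print(weak_set)
```

Each sampled image inherits its tile's label regardless of what it depicts, which is why the labelling is only weak supervision.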

26 Results—Weakly-Supervised Training (2)
Binary maps: overall and average classification rates (%), each with fixed and adaptive thresholds. Fraction maps: correlation coefficient, MAD, RMSD.
Training set | Target set | Training set size | Overall, fixed | Overall, adaptive | Avg., fixed | Avg., adaptive | Corr. coeff. | MAD | RMSD
Manual (Flickr) | Flickr | 2740 (0.51) | 66.4 | 64.9 | 68.8 | 63 | 0.374 | 0.287 | 0.383
Weakly (Flickr) | Flickr | 5872 (0.52) | 67.2 | 66.9 | 68.7 | 65.2 | 0.380 | 0.279 | 0.373
Weakly-labelled training set outperforms manually-labelled one. –Suggests training sets can be generated from regions for which maps exist and then used to train classifiers for mapping unmapped regions.

27 Results—Photographer Intent (1) Compare Flickr vs. Geograph results.

28 Results—Photographer Intent (2) Ground Truth Maps; Maps Generated Using Flickr Images; Maps Generated Using Geograph Images.

29 Results—Photographer Intent (4)
Binary maps: overall and average classification rates (%), each with fixed and adaptive thresholds. Fraction maps: correlation coefficient, MAD, RMSD.
Training and target set | Training set size | Overall, fixed | Overall, adaptive | Avg., fixed | Avg., adaptive | Corr. coeff. | MAD | RMSD
Flickr | 5872 (0.52) | 67.2 | 66.9 | 68.7 | 65.2 | 0.380 | 0.279 | 0.373
Geograph | 10576 (0.26) | 68.2 | 74.0 | 60.8 | 72.6 | 0.520 | 0.271 | 0.358
Photographer intent is a significant factor.

30 Results—Importance of Training vs. Target Set (1) Geograph training+target set outperforms Flickr training+target set. Investigate whether improvement is due to training or target set. Training and target sets from different collections.

31 Results—Importance of Training vs. Target Set (2)
Binary maps: overall and average classification rates (%), each with fixed and adaptive thresholds. Fraction maps: correlation coefficient, MAD, RMSD.
Training set | Target set | Training set size | Overall, fixed | Overall, adaptive | Avg., fixed | Avg., adaptive | Corr. coeff. | MAD | RMSD
Flickr good | Flickr | 5070 (0.49) | 67.0 | 68.1 | 67.4 | 66.6 | 0.329 | 0.285 | 0.374
Geograph good | Flickr | 5603 (0.47) | 60.7 | 68.3 | 53.8 | 66.6 | 0.330 | 0.294 | 0.381
Geograph good | Geograph | 5603 (0.47) | 74.2 | 74.6 | 71.5 | 73.1 | 0.551 | 0.231 | 0.308
Flickr good | Geograph | 5070 (0.49) | 69.9 | 73.1 | 71.5 | 71.7 | 0.496 | 0.254 | 0.331
Photographer intent is more important for target than training set.

32 Results—Filtering Out Non-informative Images (1) Investigate whether removing images with faces improves results. Motivation: photographs of people are less likely to be geographically informative, especially close-in portraits.

33 Results—Filtering Out Non-informative Images (2)
Binary maps: overall and average classification rates (%), each with fixed and adaptive thresholds. Fraction maps: correlation coefficient, MAD, RMSD.
Training set | Target set | Training set size | Overall, fixed | Overall, adaptive | Avg., fixed | Avg., adaptive | Corr. coeff. | MAD | RMSD
Flickr | Flickr | 5872 (0.52) | 67.2 | 66.9 | 68.7 | 65.2 | 0.380 | 0.279 | 0.373
Flickr | Flickr no faces | 5872 (0.52) | 66.8 | 66.7 | 66.8 | 64.2 | 0.367 | 0.301 | 0.414
Geograph | Flickr | 5603 (0.47) | 60.7 | 68.3 | 53.8 | 66.6 | 0.330 | 0.294 | 0.381
Geograph | Flickr no faces | 5603 (0.47) | 59.9 | 68.0 | 52.0 | 65.2 | 0.312 | 0.321 | 0.428
Filtering out images with faces from the target set does not result in improved performance.

34 Discussion (1) Demonstrated that georeferenced community-contributed photo collections can be considered as a form of VGI. Maps of developed/undeveloped regions automatically generated using Flickr and Geograph images shown to be similar to ground truth maps. –Despite simple image features.

35 Discussion (2) Weakly-labelled training set outperforms manually-labelled training set. –Clear benefits for training classifiers. Photographer intent is significant, especially for target set. –Restricts what can be used as target sets. –Poses interesting research challenges such as how to use the Geograph dataset to filter the “noisy” Flickr dataset. Initial results on filtering out images with faces are inconclusive.

36 Extensions Improved image features. –Gist. Integrate textual annotations. –Flickr tags. –Geograph descriptive text. Additional land-cover/use classes. Spatial models: –Tobler’s first law of geography: all things are related, but nearby things are more related than distant things.

37 Come to our poster this afternoon

38 Thank you! and questions? Acknowledgements: This work was funded in part by the following grants: –DOE Early Career Scientist and Engineer Award/PECASE –NSF 0917069: IIS Core Thanks to Nathan Graves for implementing the edge histogram descriptors.
