Data-driven Visual Similarity for Cross-domain Image Matching


1 Data-driven Visual Similarity for Cross-domain Image Matching
Abhinav Shrivastava, Tomasz Malisiewicz*, Abhinav Gupta, Alexei A. Efros
Carnegie Mellon University, *MIT
To appear in SIGGRAPH Asia, 2011

2 Outline
Introduction
Approach
    Data-driven Uniqueness
    Algorithm Description
Experimental Validation
    Sketch-to-Image Matching
    Painting-to-Image Matching
Applications
Limitations

3 Introduction

4 Visual matching approaches
Exact matching: these methods usually fail when tasked with finding similar, but not identical, objects (e.g., try using the Google Goggles app to find a cup or a chair).

5 (image-only slide)

6 Approximate matching:
Most approaches employ image representations that aim to capture the important, salient parts of the image (e.g., GIST, HoG). In Content-Based Image Retrieval (CBIR), the aim is to retrieve semantically relevant images, even if they do not appear visually similar.

7 Cross-domain matching:
Particular domains: sketches to photographs [Chen et al. 2009; Eitz et al. 2010]; photos under different illuminants [Chong et al. 2008].
Across multiple domains: matching local self-similarities across images and videos, the work of Shechtman and Irani [2007].

8 Each query image decides the best way to weight its constituent parts.

9 Approach
There are two requirements for a good visual similarity function:
1. It has to focus on the content of the image (the "what"), rather than the style (the "how").
2. It should be scene-dependent.

10 Data-driven Uniqueness:
If we re-weight the different elements of an image based on how unique they are, the resulting similarity function will focus on the image's most distinctive content. We compute uniqueness in a data-driven way, against a very large dataset of randomly selected images: the goal is to find the features that best discriminate this image (the positive sample) from the rest of the data (the negative samples).

11 Given the learned, query-dependent weight vector w_q, the visual similarity between a query image I_q and any other image/sub-image I_i can be defined simply as the dot product

S(I_q, I_i) = w_q^T x_i

where x_i is I_i's extracted feature vector. We employ a linear Support Vector Machine (SVM) to learn the feature weight vector. Image feature: the Histogram of Oriented Gradients (HOG) template descriptor. A minimal scoring sketch follows.
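The following sketch makes the scoring step concrete, assuming HOG features from scikit-image; the template size and HOG parameters here are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch: scoring a candidate image against a query-specific weight
# vector w_q. Assumes grayscale input and HOG features from scikit-image;
# all parameter values below are assumptions for illustration.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_descriptor(image, shape=(80, 80)):
    """Rigid grid-like HOG template over a resized grayscale image."""
    image = resize(image, shape, anti_aliasing=True)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def similarity(w_q, image):
    """S(I_q, I_i) = w_q^T x_i: dot product of learned weights and features."""
    x_i = hog_descriptor(image)
    return float(np.dot(w_q, x_i))
```

Ranking an entire retrieval set then reduces to one matrix-vector product: stack the feature vectors x_i into a matrix X and sort the scores X @ w_q in descending order.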

12 To visualize how the SVM captures the notion of data-driven uniqueness:

13 (image-only slide)

14 Algorithm Description:
Learning the weight vector w_q amounts to minimizing the following convex objective function:

min over w_q of  ||w_q||^2 + λ [ Σ_{x_i ∈ P} h(w_q^T x_i) + Σ_{x_i ∈ N} h(-w_q^T x_i) ]

Each query image I_q is represented with a rigid grid-like HOG feature template x_q. Because of image misalignment, we create a set of extra positive data points, P, by applying small transformations (shift, scale, and aspect ratio) to the query image I_q and generating x_i for each sample. The SVM classifier is learned using I_q and P as positive samples, and a set N containing millions of sub-images (extracted from 10,000 randomly selected Flickr images) as negatives. We use LIBSVM [Chang and Lin 2011] for learning w_q, with a common regularization parameter λ = 100 and the standard hinge loss function h(x) = max(0, 1 - x). A training sketch follows.
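As a rough illustration of this training step, here is a sketch using scikit-learn's LinearSVC in place of LIBSVM; the mapping of the paper's λ onto sklearn's C parameter and the feature shapes are assumptions.

```python
# Sketch of learning the query-specific weight vector w_q with a linear SVM.
# The paper uses LIBSVM; scikit-learn's LinearSVC stands in here. P and N are
# assumed precomputed feature matrices (jittered positives / Flickr negatives).
import numpy as np
from sklearn.svm import LinearSVC

def learn_query_weights(x_q, P, N, lam=100.0):
    """x_q: HOG feature of the query image I_q, shape (d,).
    P: (n_pos, d) features of shift/scale/aspect perturbations of I_q.
    N: (n_neg, d) features of random negative sub-images."""
    X = np.vstack([x_q[None, :], P, N])
    y = np.hstack([np.ones(1 + len(P)), -np.ones(len(N))])
    # Treating the paper's lambda = 100 as sklearn's C is an assumption;
    # the two libraries parameterize the same trade-off differently.
    clf = LinearSVC(C=lam, loss='hinge', fit_intercept=False, dual=True)
    clf.fit(X, y)
    return clf.coef_.ravel()  # w_q
```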

15 Experimental Validation
To demonstrate our approach, we performed a number of image-matching experiments on different image datasets, comparing against the following popular baseline methods:
Tiny Images
GIST
Bag of Words (BoW)
Spatial Pyramid
Normalized-HoG (N-HoG)
A sketch of the simplest baseline follows this list.
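As an illustration of what these baselines look like in practice, here is a sketch of the Tiny Images baseline: downsample each image to a tiny patch and rank by normalized correlation. The 32x32 resolution follows the usual Tiny Images convention; everything else is an assumption.

```python
# Sketch of the Tiny Images baseline: rank a retrieval set by normalized
# correlation between tiny grayscale versions of the query and each image.
import numpy as np
from skimage.transform import resize

def tiny_descriptor(image, size=(32, 32)):
    """Downsample, flatten, and zero-mean/unit-norm normalize."""
    patch = resize(image, size, anti_aliasing=True).ravel()
    patch = patch - patch.mean()
    norm = np.linalg.norm(patch)
    return patch / norm if norm > 0 else patch

def rank_by_tiny(query, images):
    """Return indices of `images` sorted from best to worst match."""
    q = tiny_descriptor(query)
    scores = [float(np.dot(q, tiny_descriptor(im))) for im in images]
    return np.argsort(scores)[::-1]
```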

16 (image-only slide)

17 Sketch-to-Image Matching
We collected a dataset of 50 sketches (25 cars and 25 bicycles) to be used as queries. The sketches were used to query the PASCAL VOC dataset [Everingham et al. 2007].

18 For quantitative evaluation, we compared how many car and bicycle images were retrieved in the top-K images for car and bicycle sketches, respectively. We used the bounded mean Average Precision (mAP) metric of [Jégou et al. 2008], sketched below.
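A minimal sketch of the per-query evaluation, assuming binary relevance labels for the ranked list; the exact bounding convention of [Jégou et al. 2008] is not spelled out here, so the normalization below is an assumption.

```python
# Sketch of the quantitative evaluation: average precision over a bounded
# (top-K) ranked list, in the spirit of the mAP metric of [Jegou et al. 2008].
import numpy as np

def average_precision_at_k(ranked_relevant, k, n_relevant):
    """ranked_relevant: 1 if the i-th retrieved image matches the query's
    class (car/bicycle), else 0. n_relevant: total relevant images."""
    rel = np.asarray(ranked_relevant[:k], dtype=float)
    if n_relevant == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_i * rel).sum() / min(n_relevant, k))

# mAP is the mean of this quantity over all 50 sketch queries.
```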

19 Painting-to-Image Matching
We collected a dataset of 50 paintings of outdoor scenes in a diverse set of painting styles and geographical locations. The retrieval set was sub-sampled from the 6.4M GPS-tagged Flickr images of [Hays and Efros 2008]. For each query, we created a set of 5,000 images randomly sampled within a 50-mile radius of the painting's location, as sketched below.
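Building each per-query retrieval set amounts to a geographic filter followed by random sampling. A sketch follows, where the dataset layout (a list of (lat, lon) GPS tags) is an assumption.

```python
# Sketch of sampling a per-query retrieval set: keep images whose GPS tags
# fall within a 50-mile radius of the painting's location, then subsample.
import math
import random

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two GPS coordinates."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sample_retrieval_set(painting_gps, image_gps_list, n=5000, radius=50.0):
    """Return indices of up to n images within `radius` miles of the painting."""
    lat0, lon0 = painting_gps
    nearby = [i for i, (lat, lon) in enumerate(image_gps_list)
              if haversine_miles(lat0, lon0, lat, lon) <= radius]
    return random.sample(nearby, min(n, len(nearby)))
```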

20 (image-only slide)

21 Applications: Internet Re-photography

22 Painting2GPS

23 Visual Scene Exploration

24 (image-only slide)

25 Limitations: Two main failure modes:
1. We fail to find a good match, due to the relatively small size of our dataset (10,000 images) compared to Google's billions of indexed images.
2. The query scene is so cluttered that it is difficult for any algorithm to decide which parts of the scene (the car, the people on the sidewalk, the building in the background) it should focus on.

26 (image-only slide)

27 Thank you!

