
1 BMVC 2010. Sung Ju Hwang and Kristen Grauman, University of Texas at Austin.

2 Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross-Modal Search. International Journal of Computer Vision, 2011. Sung Ju Hwang and Kristen Grauman.

3 Relative importance of objects. An image can contain many different objects, but some are more "important" than others. (Example tags: sky, water, mountain, architecture, bird, cow.)

4 Relative importance of objects. Some objects are background. (Example tags: sky, water, mountain, architecture, bird, cow.)

5 Relative importance of objects. Some objects are less salient. (Example tags: sky, water, mountain, architecture, bird, cow.)

6 Relative importance of objects. Some objects are more prominent, or perceptually define the scene. (Example tags: sky, water, mountain, architecture, bird, cow.)

7 Our goal: retrieve those images that share important objects with the query image. How can we learn a representation that accounts for this?

8 Idea: image tags as an importance cue. The order in which a person assigns tags (e.g., TAGS: Cow, Birds, Architecture, Water, Sky) provides implicit cues about each object's importance to the scene.

9 Approach overview: building the image database. From tagged training images (whose tag lists include words such as Cow, Grass, Horse, Car, House, Sky), extract visual and tag-based features, then learn projections from each feature space into a common "semantic space".

10 Approach overview: retrieval from the database. Three tasks: image-to-image retrieval (untagged query image → retrieved images), tag-to-image retrieval (tag-list query, e.g., "Cow, Tree, Grass" → retrieved images), and image-to-tag auto annotation (untagged query image → retrieved tag list, e.g., "Cow, Tree").

11 Visual features. Color histogram: captures the HSV color distribution. Gist [Torralba et al.]: captures the total scene structure. Visual words: capture local appearance (k-means on DoG+SIFT).
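As an illustrative sketch of the first of these descriptors, here is an HSV color histogram in Python with OpenCV; the 8x8x8 binning and L2 normalization are assumptions for illustration, not the paper's settings.

```python
# Hypothetical HSV color-histogram feature (OpenCV); 8x8x8 bins assumed.
import cv2

def hsv_histogram(image_bgr, bins=(8, 8, 8)):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    # Joint 3-D histogram over H (0-180 in OpenCV), S, and V channels.
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()  # L2-normalized 512-D vector
```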

12 Tag features: word frequency, the traditional bag-of-(text)words. For the tag list "Cow, Bird, Water, Architecture, Mountain, Sky", the tag/count pairs are: Cow 1, Bird 1, Water 1, Architecture 1, Mountain 1, Sky 1, Car 0, Person 0.

13 Tag features: absolute rank, based on the word's absolute rank in this image's tag list. For the same tag list, the tag/value pairs are: Cow 1, Bird 0.63, Water 0.50, Architecture 0.43, Mountain 0.39, Sky 0.36, Car 0, Person 0 (the values decay with rank, consistent with 1/log2(1+rank)).

14 Tag features: relative rank, the percentile rank obtained from the rank distribution of that word in all tag lists. For the same tag list, the tag/value pairs are: Cow 0.9, Bird 0.6, Water 0.8, Architecture 0.5, Mountain 0.8, Sky 0.8, Car 0, Person 0.
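A sketch of how the three tag encodings might be computed over a fixed vocabulary. The 1/log2(1+rank) decay is inferred from the example values on the absolute-rank slide, and the percentile rule below (fraction of the word's observed ranks no earlier than this one) is an assumption, not the paper's exact definition.

```python
# Sketch of the three tag encodings; formulas for the rank features are
# assumptions inferred from the slides, not confirmed definitions.
import numpy as np

def tag_features(tag_list, all_tag_lists, vocab):
    idx = {w: i for i, w in enumerate(vocab)}
    freq = np.zeros(len(vocab))       # slide 12: bag-of-words counts
    abs_rank = np.zeros(len(vocab))   # slide 13: decays with rank in this list
    rel_rank = np.zeros(len(vocab))   # slide 14: percentile vs. word's usual rank
    for r, w in enumerate(tag_list, start=1):
        freq[idx[w]] += 1
        abs_rank[idx[w]] = 1.0 / np.log2(1 + r)
        # Ranks this word receives across all training tag lists.
        ranks = [tl.index(w) + 1 for tl in all_tag_lists if w in tl]
        # Fraction of those occurrences ranked no earlier than here (assumed).
        rel_rank[idx[w]] = np.mean([rr >= r for rr in ranks]) if ranks else 0.0
    return freq, abs_rank, rel_rank
```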

15 Learning mappings to the semantic space. Canonical Correlation Analysis (CCA): choose projection directions that maximize the correlation of the two views (view 1: visual features; view 2: tag features) projected from the same instance. The semantic space is the new common feature space.
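A minimal numpy sketch of linear CCA (top direction pair only); the ridge term `reg` is a standard stabilizer added here for illustration, not a setting from the paper.

```python
# Minimal linear CCA sketch: find one pair of directions (wx, wy) that
# maximizes the correlation between the projected views.
import numpy as np

def cca(X, Y, reg=1e-4):
    """X (n x dx) and Y (n x dy) are the two views of the same n instances."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # view-1 covariance
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])   # view-2 covariance
    Cxy = X.T @ Y / n                              # cross-covariance
    # Leading eigenvector of Cxx^-1 Cxy Cyy^-1 Cyx gives the view-1 direction.
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    wx = np.real(vecs[:, np.argmax(np.real(vals))])
    wy = np.linalg.solve(Cyy, Cxy.T @ wx)          # matching view-2 direction
    return wx / np.linalg.norm(wx), wy / np.linalg.norm(wy)
```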

16 Kernel Canonical Correlation Analysis [Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]. Linear CCA: given paired data $\{(x_i, y_i)\}_{i=1}^{N}$, select directions $w_x, w_y$ so as to maximize
$$\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{(w_x^\top C_{xx} w_x)\,(w_y^\top C_{yy} w_y)}}.$$
Kernel CCA: given a pair of kernel functions $k_x(\cdot,\cdot)$ and $k_y(\cdot,\cdot)$, the objective is the same, but the projections live in kernel space, $w_x = \sum_i \alpha_i \phi_x(x_i)$ and $w_y = \sum_i \beta_i \phi_y(y_i)$, giving
$$\rho = \max_{\alpha, \beta} \frac{\alpha^\top K_x K_y \beta}{\sqrt{(\alpha^\top K_x^2 \alpha)\,(\beta^\top K_y^2 \beta)}}.$$
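A sketch of regularized KCCA in the style of Hardoon et al. 2004, operating on precomputed Gram matrices; the regularization form and eigen-solution below are one common variant and may differ from the paper's exact formulation.

```python
# Regularized kernel CCA sketch over precomputed Gram matrices Kx, Ky.
import numpy as np

def kcca(Kx, Ky, reg=0.1):
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kx, Ky = H @ Kx @ H, H @ Ky @ H
    # Solve (Kx^2 + reg I)^-1 Kx Ky (Ky^2 + reg I)^-1 Ky Kx a = rho^2 a.
    A = np.linalg.solve(Kx @ Kx + reg * np.eye(n), Kx @ Ky)
    B = np.linalg.solve(Ky @ Ky + reg * np.eye(n), Ky @ Kx)
    vals, vecs = np.linalg.eig(A @ B)
    alpha = np.real(vecs[:, np.argmax(np.real(vals))])
    beta = B @ alpha                               # matching tag-side coefficients
    return alpha, beta
```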

17 Recap: building the image database. The visual feature space and the tag feature space are both mapped into the common semantic space.

18 Experiments. We compare the retrieval performance of our method (KCCA semantic space) against two baselines: a visual-only baseline, and a words+visual baseline [Hardoon et al. 2004, Yakhenenko et al. 2009]. (Figure: query image and 1st retrieved image for each method.)

19 Evaluation. We use Normalized Discounted Cumulative Gain at top K (NDCG@K) [Kekalainen & Jarvelin, 2002] to evaluate retrieval performance:
$$\mathrm{NDCG@K} = \frac{1}{Z}\sum_{p=1}^{K} \frac{2^{s(p)} - 1}{\log(1 + p)},$$
where $2^{s(p)}-1$ is the reward term, $s(p)$ is the score for the $p$-th retrieved example, and $Z$ is the normalizer (the sum of all the scores under the ideal ranking). The $\log(1+p)$ discount means doing well in the top ranks is more important.
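A small sketch of the metric under the reconstruction above (a base-2 logarithm is assumed); `scores` holds the reward score s(p) of each retrieved example, in ranked order.

```python
# NDCG@K sketch matching the formula above.
import numpy as np

def ndcg_at_k(scores, k):
    scores = np.asarray(scores, dtype=float)
    top = scores[:k]                                   # scores in retrieved order
    discounts = np.log2(np.arange(2, top.size + 2))    # log(1+p), p = 1..K
    dcg = ((2.0 ** top - 1) / discounts).sum()
    ideal = np.sort(scores)[::-1][:k]                  # best achievable ordering
    z = ((2.0 ** ideal - 1) / discounts).sum()         # normalizer Z
    return dcg / z if z > 0 else 0.0
```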

20 Evaluation. We present the NDCG@k score using two different reward terms: (1) object presence/scale, which rewards similarity between the query's objects and scales and those in the retrieved image(s) (combining presence and scale terms); and (2) ordered tag similarity, which rewards similarity between the query's ground-truth tag ranks and those in the retrieved image(s) (combining relative-rank and absolute-rank terms; e.g., comparing tag lists "Cow, Tree, Grass, Person" and "Cow, Tree, Fence, Grass").

21 Datasets. LabelMe: 6352 images (database: 3799, query: 2553), ~23 tags/image. PASCAL: 9963 images (database: 5011, query: 4952), ~5.5 tags/image.

22 Image-to-image retrieval. We want to retrieve the images most similar to the given query image in terms of object importance: the untagged query is mapped through the visual kernel space into the semantic space (learned jointly with the tag-list kernel space), and the closest database images are retrieved.
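A sketch of this lookup. Here `A` (n x d) stacks the top-d visual-side KCCA coefficient vectors over the n training images, and the cosine-similarity ranking rule is an assumption for illustration.

```python
# Retrieval in the learned semantic space via KCCA coefficients.
import numpy as np

def semantic_projection(K_new, A):
    """K_new (m x n): kernel values of m examples vs. the n training images."""
    return K_new @ A                                   # (m x d) embeddings

def image_to_image(Kq, Kdb, A, topk=5):
    q = semantic_projection(Kq, A)                     # (1 x d) query embedding
    db = semantic_projection(Kdb, A)                   # (m x d) database embeddings
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    return np.argsort(-(db @ q.T).ravel())[:topk]      # indices of best matches
```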

23 Image-to-image retrieval results (qualitative). For each query image, the top retrievals of our method, the words+visual baseline, and the visual-only baseline are shown.

24 Image-to-image retrieval results (additional qualitative examples, same three-way comparison).

25 Image-to-image retrieval results. Our method better retrieves images that share the query's important objects, by both measures: retrieval accuracy measured by object+scale similarity, and by ordered tag-list similarity (a 39% improvement).

26 Tag-to-image retrieval. We want to retrieve the images that are best described by the given tag list (e.g., query tags: Cow, Person, Tree, Grass): the query is mapped through the tag-list kernel space into the semantic space, and matching database images are retrieved.
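Tag-to-image retrieval is the cross-modal version of the same lookup: the query is embedded with the tag-side coefficients and compared against visually embedded database images. A sketch, with all names (`B_tag`, `A_visual`) hypothetical and the cosine ranking assumed as above.

```python
# Cross-modal lookup: tag-list query vs. visually embedded database images.
import numpy as np

def tag_to_image(Kq_tag, Kdb_visual, B_tag, A_visual, topk=5):
    q = Kq_tag @ B_tag                                 # tag list -> semantic space
    db = Kdb_visual @ A_visual                         # images   -> semantic space
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    return np.argsort(-(db @ q.T).ravel())[:topk]
```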

27 Tag-to-image retrieval results. Our method better respects the importance cues implied by the user's keyword query (a 31% improvement).

28 Image-to-tag auto annotation. We want to annotate the query image with ordered tags that best describe the scene: the untagged query is mapped into the semantic space, and output tag lists (e.g., "Cow, Tree, Grass"; "Cow, Grass, Field"; "Cow, Fence") are produced from the database.
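A sketch of one plausible annotation scheme: transfer the ordered tag lists of the k nearest database images in the semantic space (consistent with the next slide's note that k is the number of nearest neighbors used). The neighbor-transfer rule itself is an assumption.

```python
# Annotation by tag-list transfer from semantic-space nearest neighbors.
import numpy as np

def image_to_tags(Kq_visual, Kdb_visual, A_visual, db_tag_lists, k=3):
    q = Kq_visual @ A_visual                           # query    -> semantic space
    db = Kdb_visual @ A_visual                         # database -> semantic space
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    nn = np.argsort(-(db @ q.T).ravel())[:k]           # k nearest database images
    return [db_tag_lists[i] for i in nn]               # their ordered tag lists
```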

29 Image-to-tag auto annotation results (k = number of nearest neighbors used). Example output tag lists: "Boat, Person, Water, Sky, Rock"; "Bottle, Knife, Napkin, Light, Fork"; "Person, Tree, Car, Chair, Window"; "Tree, Boat, Grass, Water, Person".


