1 Sung Ju Hwang and Kristen Grauman University of Texas at Austin

2 Image retrieval: content-based retrieval from an image database. A query image is matched against the database to return Image 1, Image 2, …, Image k.

3 Relative importance of objects: given a query image and an image database, which image is more relevant to the query?

4 Relative importance of objects: the query image contains {cow, bird, water}; the candidate database images contain {cow, bird, water, sky} and {cow, fence, mud}. Which image is more relevant to the query?

5 Relative importance of objects: an image can contain many different objects (e.g., sky, water, mountain, architecture, bird, cow), but some are more "important" than others.

6 Relative importance of objects: some objects are background.

7 Relative importance of objects: some objects are less salient.

8 Relative importance of objects: some objects are more prominent or perceptually define the scene.

9 Our goal: retrieve those images that share important objects with the query image. How can we learn a representation that accounts for this?

10 Idea: image tags as an importance cue. The order in which a person assigns tags provides implicit cues about object importance to the scene (example TAGS: Cow, Birds, Architecture, Water, Sky).

11 Idea: image tags as an importance cue. The order in which a person assigns tags provides implicit cues about object importance to the scene (example TAGS: Cow, Birds, Architecture, Water, Sky). We learn this connection to improve cross-modal retrieval and CBIR, then query with untagged images to retrieve the most relevant images or tags.

12 Related work. Previous work using tagged images focuses on the noun ↔ object correspondence: Duygulu et al. 02, Berg et al. 04, Fergus et al. 05, Li et al. 09; also Lavrenko et al. 2003, Monay & Gatica-Perez 2003, Barnard et al. 2004, Schroff et al. 2007, Gupta & Davis 2008, … Related work builds richer image representations from "two-view" text+image data: Hardoon et al. 04, Bekkerman & Jeon 07, Quattoni et al. 07, Quack et al. 08, Gupta et al. 08, Blaschko & Lampert 08, Qi et al. 09, Yakhnenko & Honavar 09, …

13 Approach overview: building the image database. From tagged training images (e.g., "Cow, Grass"; "Horse, Grass"; "Car, House, Grass, Sky"), extract visual and tag-based features, then learn projections from each feature space into a common "semantic space".

14 Approach overview: retrieval from the database. An untagged query image or a tag-list query (e.g., "Cow, Tree, Grass") is projected into the semantic space to support image-to-image retrieval, tag-to-image retrieval, and image-to-tag auto-annotation (e.g., retrieved tag-list "Cow, Tree").

15 Dual-view semantic space: visual features and tag-lists are two views generated by the same underlying concept, and both map into a shared semantic space.

16 Learning mappings to the semantic space: Canonical Correlation Analysis (CCA) chooses projection directions that maximize the correlation of the two views (View 1, View 2) projected from the same instance. The semantic space is the resulting common feature space.

17 Kernel Canonical Correlation Analysis [Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]. Linear CCA: given paired data $\{(x_i, y_i)\}_{i=1}^{n}$, select directions $w_x, w_y$ so as to maximize the correlation of the projections:
$$\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{(w_x^\top C_{xx} w_x)\,(w_y^\top C_{yy} w_y)}}$$
Kernel CCA: given a pair of kernel functions $k_x(\cdot,\cdot)$, $k_y(\cdot,\cdot)$, the objective is the same, but the projections live in kernel space: $w_x = \sum_i \alpha_i\, \phi_x(x_i)$, $w_y = \sum_i \beta_i\, \phi_y(y_i)$.
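The linear CCA objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the small ridge term `reg` and the whitening-plus-SVD solver are standard practical choices assumed here.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-6):
    """First pair of CCA directions (w_x, w_y) and their correlation.

    X: (n, d1) view-1 features; Y: (n, d2) view-2 features, rows paired.
    reg: small regularizer for numerical stability (an assumption; the
    KCCA papers cited on the slide use a similar device).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each view via Cholesky, then an SVD of the whitened
    # cross-covariance yields the maximally correlated directions.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # Wx Cxx Wx.T = I
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    w_x = Wx.T @ U[:, 0]   # map back to the original feature spaces
    w_y = Wy.T @ Vt[0, :]
    return w_x, w_y, S[0]
```

Kernel CCA replaces `X` and `Y` with kernel matrices and solves the analogous eigenproblem over the expansion coefficients α, β.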

18 Building the kernels for each view: word-frequency and rank kernels for the tag view, visual kernels for the image view; both feed the semantic space.

19 Visual features: a color histogram captures the HSV color distribution; Gist [Torralba et al.] captures the total scene structure; visual words capture local appearance (k-means on DoG+SIFT). Average the component χ² kernels to build a single visual kernel.
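The kernel averaging on this slide can be sketched as follows. The exponentiated χ² form and the `gamma` parameter are common conventions for comparing histograms, assumed here rather than taken from the slides.

```python
import numpy as np

def chi2_kernel(A, B, gamma=1.0):
    """Exponentiated chi-squared kernel between histogram rows of A and B.

    k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)),
    a standard choice for comparing normalized histograms.
    """
    A = A[:, None, :]          # (nA, 1, d)
    B = B[None, :, :]          # (1, nB, d)
    num = (A - B) ** 2
    den = A + B
    # bins where both histograms are zero contribute nothing
    d2 = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0).sum(axis=-1)
    return np.exp(-gamma * d2)

def combined_visual_kernel(kernels):
    """Average the per-feature kernels (color, gist, visual words)
    into the single visual kernel described on the slide."""
    return sum(kernels) / len(kernels)
```

Each feature channel (color histogram, Gist, visual-word histogram) gets its own χ² kernel matrix; the average is again a valid positive-definite kernel.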

20 Tag features: Word Frequency, the traditional bag-of-(text)words. For the tag-list "Cow, Bird, Water, Architecture, Mountain, Sky":

tag           count
Cow           1
Bird          1
Water         1
Architecture  1
Mountain      1
Sky           1
Car           0
Person        0

21 Tag features: Absolute Rank, based on the tag's absolute rank in this image's tag-list ("Cow, Bird, Water, Architecture, Mountain, Sky"):

tag           value
Cow           1
Bird          0.63
Water         0.50
Architecture  0.43
Mountain      0.39
Sky           0.36
Car           0
Person        0

22 Tag features: Relative Rank, the percentile rank compared to the word's typical rank in all tag-lists. For the same tag-list "Cow, Bird, Water, Architecture, Mountain, Sky":

tag           value
Cow           0.9
Bird          0.6
Water         0.8
Architecture  0.5
Mountain      0.8
Sky           0.8
Car           0
Person        0
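The three tag features above can be sketched together. The absolute-rank form 1/log2(1+r) reproduces the slide's numbers (Cow 1, Bird 0.63, Water 0.50, …), but treat that formula, and the exact percentile definition used for the relative rank, as assumptions recovered from the slides rather than the paper's verbatim definitions.

```python
import math

def tag_features(tag_list, vocab, rank_distributions):
    """Word-frequency, absolute-rank, and relative-rank vectors for one image.

    tag_list: tags in the order the annotator gave them.
    vocab: the full tag vocabulary of the database.
    rank_distributions: dict tag -> list of ranks that tag received
    across all training tag-lists (needed for the percentile rank).
    """
    rank = {t: r + 1 for r, t in enumerate(tag_list)}
    # 1) bag-of-words presence
    word_freq = [1.0 if t in rank else 0.0 for t in vocab]
    # 2) absolute rank, discounted as 1/log2(1+r) (assumed form)
    abs_rank = [1.0 / math.log2(1 + rank[t]) if t in rank else 0.0
                for t in vocab]
    # 3) relative rank: fraction of this tag's observed ranks that are
    # no earlier (no smaller) than its rank in this image
    rel_rank = []
    for t in vocab:
        if t not in rank:
            rel_rank.append(0.0)
            continue
        hist = rank_distributions.get(t, [])
        frac = (sum(1 for r in hist if r >= rank[t]) / len(hist)
                if hist else 1.0)
        rel_rank.append(frac)
    return word_freq, abs_rank, rel_rank
```

A tag placed unusually early relative to its typical position (e.g., "Cow" listed first) gets a high relative-rank value, which is exactly the importance signal the method exploits.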

23 Building the kernels for each view (recap): word-frequency and rank kernels for the tag view, visual kernels for the image view; both feed the semantic space.

24 Experiments: we compare the retrieval performance of our method (the KCCA semantic space) with two baselines, Visual-Only and Words+Visual [Hardoon et al. 2004, Yakhenenko et al. 2009], each returning its 1st retrieved image for a given query image.

25 Evaluation: we use Normalized Discounted Cumulative Gain at top K (NDCG@K) [Kekäläinen & Järvelin, 2002] to evaluate retrieval performance:
$$\mathrm{NDCG@}K = \frac{1}{Z} \sum_{p=1}^{K} \frac{s(p)}{\log_2(1+p)}$$
where $s(p)$ is the reward-term score for the $p$-th ranked example and $Z$ is the sum of all the scores for the perfect ranking (normalization). Doing well in the top ranks is more important.
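The evaluation measure above can be sketched directly. This follows the generic Kekäläinen & Järvelin discounted-gain form described on the slide; the exact reward variant used in the paper's experiments may differ.

```python
import math

def ndcg_at_k(scores, k):
    """NDCG@K for a ranked retrieval list.

    scores: reward s(p) for the item retrieved at rank p (p = 1..n).
    The log2(1+p) discount makes the top ranks matter most; dividing by
    the ideal (sorted) ranking's DCG normalizes the result into [0, 1].
    """
    def dcg(vals):
        return sum(s / math.log2(1 + p) for p, s in enumerate(vals, start=1))
    ideal = dcg(sorted(scores, reverse=True)[:k])
    return dcg(scores[:k]) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; pushing a high-reward item down the list is penalized more the closer it should have been to the top.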

26 Evaluation: we present NDCG@K scores using two different reward terms. (1) Object presence/scale: rewards similarity between the query's objects and scales and those in the retrieved image(s). (2) Ordered tag similarity (relative and absolute rank): rewards similarity between the query's ground-truth tag ranks and those in the retrieved image(s). Example ordered tag-lists: "Cow, Tree, Grass, Person" vs. "Cow, Tree, Fence, Grass".

27 Datasets.
LabelMe: 6,352 images (database: 3,799; query: 2,553); scene-oriented; ordered tag-lists come from the order in which labels were added; 56 unique taggers; ~23 tags/image.
PASCAL: 9,963 images (database: 5,011; query: 4,952); object-centric; tag-lists obtained on Mechanical Turk; 758 unique taggers; ~5.5 tags/image.

28 Image-to-image retrieval: we want to retrieve the images most similar to an untagged query image in terms of object importance, matching query and database images through the tag-list and visual kernel spaces.

29 Image-to-image retrieval results: the query image alongside the top retrievals from our method, Words+Visual, and Visual-only.

30 Image-to-image retrieval results (continued): the query image alongside the top retrievals from our method, Words+Visual, and Visual-only.

31 Image-to-image retrieval results: our method better retrieves images that share the query's important objects, by both measures: retrieval accuracy measured by object+scale similarity and by ordered tag-list similarity (up to a 39% improvement).

32 Tag-to-image retrieval: we want to retrieve the images best described by a given tag list (e.g., "Cow, Person, Tree, Grass"), matching the query tags to database images through the tag-list and visual kernel spaces.

33 Tag-to-image retrieval results: our method better respects the importance cues implied by the user's keyword query (31% improvement).

34 Image-to-tag auto-annotation: we want to annotate an untagged query image with ordered tags that best describe the scene (example output tag-lists: "Cow, Tree, Grass"; "Cow, Grass, Field"; "Cow, Fence").

35 Image-to-tag auto-annotation results (example predicted tag-lists: "Boat, Person, Water, Sky, Rock"; "Bottle, Knife, Napkin, Light, Fork"; "Person, Tree, Car, Chair, Window"; "Tree, Boat, Grass, Water, Person"):

Method       k=1     k=3     k=5     k=10
Visual-only  0.0826  0.1765  0.2022  0.2095
Word+Visual  0.0818  0.1712  0.1992  0.2097
Ours         0.0901  0.1936  0.2230  0.2335

(k = number of nearest neighbors used)

36 Implicit tag cues as a localization prior [Hwang & Grauman, CVPR 2010]. Training: learn an object-specific connection between localization parameters and implicit tag features, i.e., P(location, scale | tags); the same object (e.g., "Mug") appears at different positions in tag-lists such as "Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it", "Computer, Poster, Desk, Screen, Mug", "Woman, Table, Mug, Ladder", and "Mug, Eiffel, Desk". Testing: given a novel image, localize objects based on both tags and appearance, combining an object detector with the implicit tag features.

37 Conclusion: we want to learn what is implied (beyond which objects are present) by how a human provides tags for an image. The approach requires minimal supervision to learn the connection between the importance conveyed by tags and visual features, and yields consistent gains over both content-based visual search and a tag+visual approach that disregards importance.

