Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale. Department of Information and Computer Science, University of Trento.



2 Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale. Department of Information and Computer Science, University of Trento, Italy.

3 Background: approaches to improving image ranking that take user annotation, time, and location into account. Proposal: define a novel multimodal similarity measure that combines visual features, annotated concepts, and geotagging, together with a learning approach based on SVMs (Support Vector Machines).

4 Image-graph-based techniques: vertices represent images, including their visual and semantic information. Probabilistic models: the PLSA (Probabilistic Latent Semantic Analysis) methodology. Modalities: visual features, annotation, GPS coordinates. SVMs are able to learn from the data the weight to be assigned to each. Training procedure: take a random set of image queries; for each, retrieve the set of images with the highest similarity; have human annotators judge which ones are relevant; train the SVMs with these examples.

5 PLSA is applied to user-generated multimedia content (visual content, image tags, geolocation), producing corresponding topic spaces with reduced dimensions. The model is estimated with Expectation Maximization. The reduced representation allows fast online retrieval on very large datasets.
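To make the Expectation Maximization step concrete, here is a minimal pure-Python sketch of PLSA on a small image-by-word count matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `plsa`, the random initialization, and the fixed iteration count are all hypothetical choices.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-by-word count matrix (list of lists).

    Returns (p_w_z, p_z_d):
      p_w_z[z][w] = P(word w | topic z)
      p_z_d[d][z] = P(topic z | document d), the reduced topic-space vector
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialization (an assumption; other schemes are possible).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Accumulators start at a tiny floor so every row stays normalizable.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                # M-step accumulation: expected counts weighted by n(d, w).
                for z in range(n_topics):
                    r = n_dw * post[z] / total
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d
```

Each row of `p_z_d` is the low-dimensional topic vector of one image; comparing such vectors instead of raw histograms is what keeps retrieval cheap at query time.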

6 PLSA with 100 topics. Visual features: SIFT (Scale Invariant Feature Transform) 128-element descriptors, with 2500 salient points (K-Means, training set of 5000 images). A bag-of-words model associates a feature vector with each image. Image annotation: the vocabulary consists of all the tags in the dataset, except words used just once or by a single user; total number: 5500 words. GPS coordinates: computed as the distance between the GPS coordinates of the query and those of the retrieved images.
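The two feature computations on this slide can be sketched as follows. This is a hedged illustration: the slide does not say which distance formula is used for GPS, so the haversine great-circle distance is an assumption, and the names `bow_histogram` and `haversine_km` are hypothetical. The codebook stands in for the K-Means cluster centers.

```python
import math


def bow_histogram(descriptors, codebook):
    """Bag-of-words vector: count how many descriptors fall nearest to each
    codebook entry (squared Euclidean distance)."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])),
        )
        hist[nearest] += 1
    return hist


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two GPS points given in
    degrees (assumed formula; the slide only says "distance")."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

With real SIFT data each descriptor would have 128 elements and the codebook 2500 entries; the toy 2-D vectors used to try the function out behave the same way.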

7 Improving retrieval accuracy relies on a Development Set (DS) of images annotated by users as relevant or irrelevant. SVMs are proposed because of two important properties. First, they are robust to overfitting, offering the possibility to trade off generalization against empirical error, so the model can be tuned to a more general setting. Second, they make it possible to include additional features in the parameter vector.

8 SVMs: Multimodal 2 (MM2)
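As a rough sketch of how an SVM can learn the weight of each modality from relevant and irrelevant examples, the code below trains a linear soft-margin SVM by batch subgradient descent. Everything specific here is an assumption made for illustration: the feature vector (visual similarity, tag similarity, GPS similarity), the solver, the hyperparameters, and the function names. The slides do not give MM2's actual features or kernel.

```python
def train_linear_svm(X, y, C=1.0, lr=0.005, epochs=3000):
    """Minimize 0.5*||w||^2 + C * sum(hinge losses) by batch subgradient descent.

    X: list of feature vectors, y: labels in {+1, -1}.
    C sets the trade-off between generalization (small ||w||) and
    empirical error (hinge loss on the training examples).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5*||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # example violates the margin: hinge subgradient
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_score(w, b, x):
    """Signed score; higher means more likely relevant, so it can rank images."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical development set: (visual, tag, GPS) similarities to the query.
X_dev = [[0.9, 0.8, 0.9], [0.8, 0.7, 0.95], [0.7, 0.9, 0.8],   # judged relevant
         [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]]    # judged irrelevant
y_dev = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X_dev, y_dev)
```

The learned `w` plays the role of the per-modality weights, and adding another feature only means appending one more component to each vector.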

9 Dataset: 100,000 images of Paris from Flickr. 2500 SIFT / 50,000 images. 5,500 tags / 50,000 images. Maximum of two images per user, to avoid similar images taken by the same photographer. Evaluation: 100 query images, with the 9 top-ranked images retrieved for each. Relevance judgment: an image counts as relevant when at least half of the 72 annotators consider it relevant.

10 Results on the 900 retrieved images. VS: 305 relevant images. TS: 218 relevant images. VS+TS: 308 relevant images. MM1 (with GPS coordinates): 641 relevant images. MM2: accuracy of 72% and MAP of 0.78.
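The counts above translate into accuracy figures by simple division over the 900 retrieved images. The sketch below does that arithmetic and also shows one common way to compute mean average precision (MAP). The slides do not define MAP, so the normalization used here (dividing by the number of relevant images found in the retrieved list) is an assumption, as is reading the figure 641 as MM1's count of relevant images.

```python
def precision(n_relevant, n_retrieved):
    """Fraction of retrieved images judged relevant."""
    return n_relevant / n_retrieved


def average_precision(relevance):
    """Average precision of one ranked list of 0/1 relevance judgments,
    normalized by the number of relevant items in the list (assumed definition)."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Counts taken from the slide: relevant images out of 900 retrieved.
relevant_counts = {"VS": 305, "TS": 218, "VS+TS": 308, "MM1": 641}
for method, n_relevant in relevant_counts.items():
    print(f"{method}: {precision(n_relevant, 900):.1%}")
```

Under these assumptions the visual-only and tag-only baselines land at roughly a third and a quarter of the retrieved images, while MM1 is at about 71%, close to the 72% reported for MM2.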


12 Figures 4-8: the approach improves on the basic model when the tag annotation is not reliable. It also improves the diversification of the retrieval results (fewer near-duplicate pictures of the same scene that differ only by night or day, perspective, or point of view).


14 Presented a novel way to combine visual information with tags and GPS. Proposed a supervised machine learning approach (MM2) based on Support Vector Machines. Results confirm that the approaches improve accuracy.

15 Presented by Ivan Chiou
