
1 Multimedia Retrieval
Arjen P. de Vries, Centrum voor Wiskunde en Informatica (Note: a lot of input from Thijs Westerveld)

2 Outline
Overview; Indexing Multimedia; Generative Models & MMIR; Probabilistic Retrieval (language models, GMMs); Experiments (Corel experiments)

3 Indexing Multimedia

4 A Wealth of Information
Speech, audio, images, temporal composition, database

5 Associated Information
Player Profile (id, gender, name, picture, country); Biography; history

6 User Interaction
The user poses a query (query text), gives examples (video segments), views results, and evaluates them (feedback into the database)

7 Indexing Multimedia
Manually added descriptions (‘metadata’); analysis of associated data (speech, captions, OCR, auto-cues, …); content-based retrieval: approximate retrieval and domain-specific techniques

8 Limitations of Metadata
Vocabulary problem: different people describe different aspects (‘dark’ vs. ‘somber’, ‘dark’ vs. ‘evening’)

9 Limitations of Metadata
Encoding specificity problem: a single person describes different aspects in different situations. Many aspects of multimedia simply cannot be expressed unambiguously (left-brain processes are analytic and verbal; right-brain processes are aesthetic, synthetic, nonverbal)

10 Approximate Retrieval
Based on similarity: find all objects that are similar to this one, using a distance function. Representations capture some (syntactic) meaning of the object. This is the ‘Query by Example’ paradigm (see the sketch below).
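A minimal sketch of the ‘Query by Example’ idea, assuming Euclidean distance over hypothetical colour-histogram feature vectors (the talk does not fix a particular feature model or distance function):

import numpy as np

def rank_by_similarity(query_vec, collection):
    # Rank objects by the distance of their feature vectors to the query's
    dists = [(obj_id, np.linalg.norm(query_vec - vec))
             for obj_id, vec in collection.items()]
    return sorted(dists, key=lambda pair: pair[1])  # closest first

# Hypothetical 3-bin colour histograms for three images
collection = {"img1": np.array([0.2, 0.5, 0.3]),
              "img2": np.array([0.1, 0.1, 0.8]),
              "img3": np.array([0.25, 0.45, 0.3])}
print(rank_by_similarity(np.array([0.2, 0.5, 0.3]), collection))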

11 Feature extraction → N-dimensional space → Ranking → Display

12 Low-level Features

13 Low-level Features

14 Query image

15 So, … Retrieval?!

16 ‘IR is about satisfying vague information needs provided by users (imprecisely specified in ambiguous natural language) by matching them approximately against information provided by authors (specified in the same ambiguous natural language).’ (Smeaton)

17 No ‘Exact’ Science!
Evaluation is not done analytically, but experimentally: real users (specifying requests), test collections (real document collections), benchmarks (TREC: text retrieval conference), precision, recall, …

18 Known Item

19 Query

20 Results

21 Query

22 Results

23 Observation
Automatic approaches are successful in two situations: when the query example is derived from the same source as the target objects, or when a domain-specific detector is at hand

24 Semantic gap… raw multimedia data → features → ? → concepts

25 1. Generic Detectors

26 Retrieval Process
Pipeline: Query Parsing → Detector / Feature selection → Filtering → Ranking, against the Database. The query type (nouns, adjectives) selects detectors and features: camera operations; people, names; natural/physical objects; monologues; invariant color spaces

27 Parameterized detectors
Example results for Topic 41, ‘Shots with at least 8 people’: a people detector with parameter <1, 2, 3, many>

28 Detectors: The universe and everything
Ordered from generic to domain-specific (decreasing focus): camera operations (pan, zoom, tilt, …); people (face based); names (VideoOCR); natural objects (color space selection); physical objects (color space selection); monologues (specifically designed); press conferences (specifically designed); interviews (specifically designed)

29 2. Domain knowledge

30 Player Segmentation
[Figure: original image → initial segmentation → final segmentation]

31 Advanced Queries
‘Show clips from tennis matches, starring Sampras, playing close to the net’

32 3. Get to know your users

33 Mirror Approach
Gather the user’s knowledge: introduce semi-automatic processes for the selection and combination of feature models. Local information: relevance feedback from a user. Global information: thesauri constructed from all users

34 Feature extraction → N-dimensional space → Clustering → Ranking → Display, linked to Concepts / Thesauri

35 Low-level Features

36 Identify Groups

37 Representation
Groups of feature vectors are conceptually equivalent to words in text retrieval. So, techniques from text retrieval can now be applied to multimedia data as if it were text (see the sketch below)!
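A sketch of this idea, assuming k-means to form the groups (the talk does not prescribe the clustering algorithm): each pixel block maps to the id of its nearest cluster, and per-image counts of these ‘visual words’ can then be fed to standard text-retrieval machinery.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
block_features = rng.random((500, 8))   # hypothetical colour/texture features per block

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(block_features)

def visual_word_histogram(image_blocks):
    # Map blocks to cluster ids and count them like term frequencies
    words = kmeans.predict(image_blocks)
    return np.bincount(words, minlength=kmeans.n_clusters)

print(visual_word_histogram(rng.random((40, 8))))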

38 Query Formulation
Clusters are internal representations, not suited for user interaction. Use automatic query formulation based on global information (thesaurus) and local information (user feedback)

39 Interactive Query Process
Select relevant clusters from the thesaurus; search the collection; improve results by adapting the query: remove clusters occurring in irrelevant images, add clusters occurring in relevant images (see the sketch below)
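A minimal sketch of that loop, with the query held as a set of thesaurus cluster ids (the data structures are illustrative assumptions):

def update_query(query_clusters, relevant, irrelevant, clusters_of):
    for img in relevant:        # add clusters occurring in relevant images
        query_clusters |= clusters_of[img]
    for img in irrelevant:      # remove clusters occurring in irrelevant images
        query_clusters -= clusters_of[img]
    return query_clusters

clusters_of = {"a": {3, 7}, "b": {7, 9}, "c": {1}}
print(update_query({3}, relevant=["b"], irrelevant=["c"], clusters_of=clusters_of))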

40 Assign Semantics

41 Visual Thesaurus
Correct cluster representing ‘Tree’, ‘Forest’ (Glcm_47); ‘incoherent’ cluster (Fractal_23); mis-labeled cluster (Gabor_20)

42 Learning
Short-term: adapt the query to better reflect this user’s information need. Long-term: adapt the thesaurus and clustering to improve the system for all users

43 Thesaurus Only After Feedback

44 4. Nobody is unique!

45 Collaborative Filtering
Also known as social information filtering: compare user judgments and recommend the differences between similar users. People’s tastes are not randomly distributed: you are what you buy (Amazon). (See the sketch below.)
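A sketch of user-user collaborative filtering, assuming cosine similarity over rating vectors (one common choice, not taken from the talk):

import numpy as np

ratings = np.array([[5., 4., 0., 1.],   # user 0
                    [4., 5., 0., 2.],   # user 1, similar tastes to user 0
                    [1., 0., 5., 4.]])  # user 2

def recommend(user, ratings):
    sims = ratings @ ratings[user]
    sims /= np.linalg.norm(ratings, axis=1) * np.linalg.norm(ratings[user]) + 1e-9
    sims[user] = 0.0                       # ignore self-similarity
    scores = sims @ ratings                # similarity-weighted ratings
    scores[ratings[user] > 0] = -np.inf    # only recommend unseen items
    return int(np.argmax(scores))

print(recommend(0, ratings))   # item 2, the only item user 0 hasn't rated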

46 Collaborative Filtering
Benefits over the content-based approach: overcomes the problem of finding suitable features to represent e.g. art or music; serendipity; an implicit mechanism for qualitative aspects like style. Problems: large groups, broad domains

47 5. Ask for help

48 Query Articulation
How to articulate the query? Feature extraction → N-dimensional space

49 What is the query semantics?

50 Details matter

51 Problem Statement
Feature vectors capture ‘global’ aspects of the whole image, so overall image characteristics dominate the feature vectors. Hypothesis: users are interested in details

52 Irrelevant Background
[Figure: query image and a result that matches only on the background]

53 Image Spots
Image-spots articulate desired image details: foreground/background colors, colors forming ‘shapes’, enclosure of shapes by background colors. Multi-spot queries define the spatial relations between a number of spots

54 Query Images / Results
[Table: retrieval ranks of the target images under Hist 16, Hist, Spot, and Spot+Hist; the spot-based queries reach ranks 1–14 where histogram search ranks in the thousands (e.g. 5968, 6563, 192, 14)]

55 A: Simple Spot Query: ‘Black sky’

56 B: Articulated Multi-Spot Query
‘Black sky’ above ‘Monochrome ground’

57 C: Histogram Search in ‘Black Sky’ images
[Results shown at ranks 2–4 and 14]

58 Complicating Factors
What are good feature models? What are good ranking functions? Queries are subjective!

59 Probabilistic Approaches

60 Generative Models
(Incoherent sample text, as if drawn word-by-word from a statistical model:) video of Bayesian model to that present the disclosure can a on for retrieval in have is probabilistic still of for of using this In that is to only queries queries visual combines visual information look search video the retrieval based search. Both get decision (a visual generic results (a difficult We visual we still needs, search. talk what that to do this for with retrieval still specific retrieval information a as model still LM abstract

61 Generative Models… aka ‘Language Modelling’
A statistical model for generating data: a probability distribution over samples in a given ‘language’.
P(q1 q2 q3 q4 | M) = P(q1 | M) · P(q2 | M, q1) · P(q3 | M, q1 q2) · P(q4 | M, q1 q2 q3) © Victor Lavrenko, Aug. 2002

62 … in Information Retrieval
Basic question: what is the likelihood that this document is relevant to this query?
P(rel | I, Q) = P(I, Q | rel) P(rel) / P(I, Q), where P(I, Q | rel) = P(Q | I, rel) P(I | rel)

63 ‘Language Modelling’
Not just ‘English’, but also the language of an author, a newspaper, a text document, an image. Hiemstra or Robertson? ‘Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing.’ (See the sketch below.)
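A sketch of the standard two-level smoothing that quote refers to: interpolating a document model with a collection model (Jelinek-Mercer interpolation; the parsimonious model goes further, but the levels involved are the same):

import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.8):
    d, c = Counter(doc), Counter(collection)
    return sum(math.log(lam * d[w] / len(doc) +
                        (1 - lam) * c[w] / len(collection))
               for w in query)

doc = "tennis match sampras net tennis".split()
collection = ("tennis match sampras net tennis football goal "
              "keeper match stadium crowd").split()
print(query_log_likelihood(["sampras", "net"], doc, collection))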

64 ‘Language Modelling’
Not just ‘English’, but also the language of an author or a newspaper: Guardian or Times?

65 ‘Language Modelling’
Not just English! Also the language of an author, a newspaper, a text document, an image. [Figure: which of two example images?]

66 Unigram and higher-order models
Unigram models: P(q1 q2 q3 q4) = P(q1) P(q2) P(q3) P(q4). N-gram models (e.g. bigram): P(q1 q2 q3 q4) = P(q1) P(q2 | q1) P(q3 | q2) P(q4 | q3). Other models: grammar-based models, mixture models, etc. © Victor Lavrenko, Aug. 2002

67 The fundamental problem
Usually we don’t know the model M, but we have a sample representative of that model. So first estimate a model from the sample, then compute the observation probability: P(query | M(sample)). © Victor Lavrenko, Aug. 2002

68 Indexing: determine models
Docs → Models. Estimate Gaussian Mixture Models from images using EM, based on feature vectors with color, texture and position information from pixel blocks; fixed number of components

69 Retrieval: use query likelihood
Which of the models is most likely to generate these 24 samples?

70 Probabilistic Image Retrieval
[Figure: which image model generated the query?]

71 Rank by P(Q|M)
Score the query Q against each document model: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)

72 Topic Models
[Diagram: documents ↔ models, query ↔ query model; rank by P(Q|M1) (query likelihood) or P(D1|QM) (document likelihood)]

73 Probabilistic Retrieval Model
Text: rank using the probability of drawing the query terms from document models. Images: rank using the probability of drawing the query blocks from document models. Multi-modal: rank using the joint probability of drawing the query samples from document models (see the sketch below)
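Under an independence assumption the joint query likelihood factorises, so log-probabilities from the text model and the image model simply add. A toy sketch with stand-in models (a unigram table and a 1-D Gaussian density; the real system uses full GMMs):

import math

def joint_log_likelihood(terms, image_samples, text_lm, image_density):
    # Joint probability of drawing the query terms AND the query blocks
    return (sum(math.log(text_lm[w]) for w in terms) +
            sum(math.log(image_density(x)) for x in image_samples))

text_lm = {"tennis": 0.3, "net": 0.1}
image_density = lambda x: math.exp(-(x - 0.5)**2 / 0.08) / math.sqrt(0.08 * math.pi)
print(joint_log_likelihood(["tennis", "net"], [0.4, 0.6], text_lm, image_density))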

74 Text Models
Unigram Language Models (LM), urn metaphor: P(Q | M) ≈ P(q1|M) P(q2|M) P(q3|M) P(q4|M) = 4/9 · 2/9 · 4/9 · 3/9 © Victor Lavrenko, Aug. 2002

75 Generative Models and IR
Rank models (documents) by the probability of generating the query Q:
P(Q | M1) = 4/9 · 2/9 · 4/9 · 3/9 = 96/9^4
P(Q | M2) = 3/9 · 3/9 · 3/9 · 3/9 = 81/9^4
P(Q | M3) = 2/9 · 3/9 · 2/9 · 4/9 = 48/9^4
P(Q | M4) = 2/9 · 5/9 · 2/9 · 2/9 = 40/9^4
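The example above, computed directly (doc1’s 96/9^4 ranks first):

from math import prod

doc_models = {"doc1": [4, 2, 4, 3],   # counts out of 9 for the four query draws
              "doc2": [3, 3, 3, 3],
              "doc3": [2, 3, 2, 4],
              "doc4": [2, 5, 2, 2]}
scores = {d: prod(c / 9 for c in counts) for d, counts in doc_models.items()}
for d, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(d, round(s, 5))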

76 Image Models
The urn metaphor is not useful here: drawing pixels is useless (pixels carry no semantics), and drawing pixel blocks is not effective (the chances of drawing the exact query blocks from a document are slim). Instead, use Gaussian Mixture Models (GMM) with a fixed number of Gaussian components/clusters/concepts (see the sketch below)
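A sketch of this image model, using scikit-learn’s GaussianMixture as an illustrative implementation of the EM fit described on the next slides (the features here are random stand-ins): fit a fixed-size mixture per image, then score a query’s blocks by their log-likelihood.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
doc_blocks = rng.normal(size=(200, 6))      # per-block colour/texture/position features
gmm = GaussianMixture(n_components=8, random_state=0).fit(doc_blocks)

query_blocks = rng.normal(size=(24, 6))     # e.g. the '24 samples' of a query
print(gmm.score_samples(query_blocks).sum())  # log P(query blocks | image model)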

77 Key-frame representation
Query model: split colour channels (Y, Cb, Cr), take samples of DCT coefficients plus position from pixel blocks, then fit with the EM algorithm. [Figure: example DCT coefficient vectors per block]

78 Image Models
The Expectation-Maximisation (EM) algorithm iteratively: estimates component assignments (E-step), then re-estimates component parameters (M-step)
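A minimal 1-D illustration of the two steps (two components with fixed unit variance, toy data; the real models are multivariate):

import numpy as np

x = np.array([0.9, 1.1, 1.0, 4.8, 5.2, 5.0])
mu = np.array([0.0, 1.0])                    # rough initial means
for _ in range(20):
    # E-step: soft component assignments under current parameters
    resp = np.exp(-0.5 * (x[:, None] - mu[None, :])**2)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate the component means from the assignments
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
print(mu)   # ends up near the two clusters, ~1.0 and ~5.0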

79 Expectation Maximization
[Figure: samples being assigned to Components 1, 2, 3]

80 Expectation Maximization
[Animation: alternating E and M steps over Components 1, 2, 3]

81 Corel Experiments

82 No ‘Exact’ Science!
Evaluation is not done analytically, but experimentally: real users (specifying requests), test collections (real document collections), benchmarks (TREC: text retrieval conference), precision, recall, …

83 Testing the Model on Corel
39 classes, ~100 images each. Build models from all images; use each image as a query; rank the full collection; compute MAP (mean average precision). AP = average of the precision values after each relevant image is retrieved; MAP is the mean of AP over multiple queries. ‘Relevant’ = from the query’s class. (See the sketch below.)
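The AP/MAP computation as defined above, on toy relevance judgments:

def average_precision(ranking, relevant):
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)   # precision after each relevant hit
    return sum(precisions) / len(relevant)

ranking = ["img3", "img7", "img1", "img9"]
aps = [average_precision(ranking, {"img3", "img1"}),   # (1/1 + 2/3) / 2
       average_precision(ranking, {"img9"})]           # 1/4
print(sum(aps) / len(aps))                             # MAP over the two queries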

84 Example results Query: Top 5:

85 MAP per Class (mean: .12)
English Pub Signs .36, English Country Gardens .33, Arabian Horses .31, Dawn & Dusk .21, Tropical Plants .19, Land of the Pyramids .19, Canadian Rockies .18, Lost Tribes .17, Elephants .17, Tigers .16, Tropical Sea Life .16, Exotic Tropical Flowers .16, Lions .15, Indigenous People .15, Nesting Birds .13, Sweden .07, Ireland .07, Wildlife of the Galapagos .07, Hawaii .07, Rural France .07, Zimbabwe .07, Images of Death Valley .07, Nepal .07, Foxes & Coyotes .06, North American Deer .06, California Coasts .06, North American Wildlife .06, Peru .05, Alaskan Wildlife .05, Namibia .05

86 Class confusion
Query from class A, relevant = from class B. Queries mostly retrieve images from their own class. Interesting mix-ups, caused by similar backgrounds: Beaches – Greek Islands; Indigenous People – Lost Tribes; English Country Gardens – Tropical Plants – Arabian Horses

87 Background Matching Query: Top 5:

88 Background Matching Query: Top 5:

89 Care for more?
T. Westerveld and A.P. de Vries, ‘Generative probabilistic models for multimedia retrieval: query generation versus document generation’, IEE Proceedings - Vision, Image and Signal Processing.

