
1 Multimedia Retrieval

2 Outline Overview Indexing Multimedia Generative Models & MMIR –Probabilistic Retrieval –Language models, GMMs Experiments –Corel experiments –TREC Video benchmark

3 Indexing Multimedia

4 A Wealth of Information [diagram: speech, audio, images and their temporal composition, all stored in a database]

5 Associated Information [diagram: a player profile (id, gender, name, country, history, picture) linked to a biography]

6 User Interaction [diagram: the user poses a query (query text), gives examples (video segments), views results, evaluates them, and returns feedback to the database]

7 Indexing Multimedia Manually added descriptions –‘Metadata’ Analysis of associated data –Speech, captions, OCR, … Content-based retrieval –Approximate retrieval –Domain-specific techniques

8 Limitations of Metadata Vocabulary problem –Dark vs. somber Different people describe different aspects –Dark vs. evening

9 Limitations of Metadata Encoding Specificity Problem –A single person describes different aspects in different situations Many aspects of multimedia simply cannot be expressed unambiguously –Processes in left (analytic, verbal) vs. right brain (aesthetics, synthetic, nonverbal)

10 Approximate Retrieval Based on similarity –Find all objects that are similar to this one –Distance function –Representations capture some (syntactic) meaning of the object ‘Query by Example’ paradigm
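A minimal sketch of the ‘Query by Example’ idea above: rank database objects by their distance to the query in feature space. This is illustrative code, not from the original deck; the feature vectors are random stand-ins and feature extraction is assumed to happen elsewhere.

    import numpy as np

    def rank_by_similarity(query_vec, db_vectors):
        """Return database indices ordered from most to least similar."""
        dists = np.linalg.norm(db_vectors - query_vec, axis=1)  # Euclidean distance
        return np.argsort(dists)  # smallest distance = most similar

    db = np.random.rand(100, 64)   # 100 objects, 64-dim feature vectors
    query = np.random.rand(64)
    print(rank_by_similarity(query, db)[:10])  # top-10 most similar objects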

11 [diagram: feature extraction maps objects into an N-dimensional space; ranking; display]

12 Low-level Features


14 Query image

15 So, … Retrieval?!

16 IR is about satisfying vague information needs provided by users (imprecisely specified in ambiguous natural language) by matching them approximately against information provided by authors (specified in the same ambiguous natural language). (Smeaton)

17 No ‘Exact’ Science! Evaluation is not done analytically, but experimentally –real users (specifying requests) –test collections (real document collections) –benchmarks (TREC: text retrieval conference) –Precision –Recall –...
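For concreteness, a small sketch of the two measures named above, computed over a hypothetical ranked result list and a hypothetical set of relevant documents:

    def precision_recall(retrieved, relevant):
        hits = len(set(retrieved) & set(relevant))
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    p, r = precision_recall(retrieved=[3, 7, 1, 9], relevant={1, 2, 3})
    print(p, r)  # precision 0.5 (2 of 4 retrieved), recall ~0.667 (2 of 3 relevant)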

18 Known Item

19 Query

20 Results …

21 Query

22 Results …

23 Semantic gap… raw multimedia data → features → concepts?

24 Observation Automatic approaches are successful under two conditions: –the query example is derived from the same source as the target objects –a domain-specific detector is at hand

25 1. Generic Detectors

26 Retrieval Process [diagram: Query Parsing (query type, nouns, adjectives) → Detector / Feature selection (camera operations; people, names; natural/physical objects; invariant color spaces; …) → Filtering → Ranking, all against the Database]

27 Parameterized detectors [example: the query text of Topic 41 drives a people detector; results shown]

28 Query Parsing Example topic: ‘Other examples of overhead zooming in views of canyons in the Western United States’ [annotated with nouns, adjectives, the query type, and find + names]

29 Detectors Camera operations (pan, zoom, tilt, …) People (face based) Names (VideoOCR) Natural objects (color space selection) Physical objects (color space selection) Monologues (specifically designed) Press conferences (specifically designed) Interviews (specifically designed) [the slide arranges these on a FOCUS axis, from domain-specific detectors up to ‘the universe and everything’]

30 2. Domain knowledge

31 Player Segmentation [images: original image, initial segmentation, final segmentation]

32 Advanced Queries ‘Show clips from tennis matches, starring Sampras, playing close to the net’

33 3. Get to know your users

34 Mirror Approach Gather User’s Knowledge –Introduce semi-automatic processes for selection and combination of feature models Local Information –Relevance feedback from a user Global Information –Thesauri constructed from all users

35 [diagram: feature extraction into an N-dimensional space; clustering into concepts linked to thesauri; ranking; display]

36 Low-level Features

37 Identify Groups

38 Representation Groups of feature vectors are conceptually equivalent to words in text retrieval So, techniques from text retrieval can now be applied to multimedia data as if these were text!
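A sketch of that equivalence, assuming cluster centres from an earlier clustering step: each feature vector is quantised to the id of its nearest cluster, turning an image into a ‘bag of visual words’ that standard text-retrieval machinery (term frequencies, idf, …) can index. All arrays here are illustrative.

    import numpy as np

    def to_visual_words(feature_vectors, cluster_centres):
        """Map each feature vector to the id of its nearest cluster centre."""
        dists = np.linalg.norm(
            feature_vectors[:, None, :] - cluster_centres[None, :, :], axis=2)
        return dists.argmin(axis=1)  # one 'word' id per feature vector

    centres = np.random.rand(50, 8)          # 50 clusters over 8-dim features
    image_features = np.random.rand(200, 8)  # 200 local features from one image
    words = to_visual_words(image_features, centres)
    print(np.bincount(words, minlength=50))  # term-frequency vector for the image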

39 Query Formulation Clusters are internal representations, not suited for user interaction Use automatic query formulation based on global information (thesaurus) and local information (user feedback)

40 Interactive Query Process Select relevant clusters from thesaurus Search collection Improve results by adapting query –Remove clusters occurring in irrelevant images –Add clusters occurring in relevant images
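A minimal sketch of that adaptation step, with images represented simply as sets of cluster ids (all values illustrative):

    def adapt_query(query_clusters, relevant_images, irrelevant_images):
        for img in relevant_images:
            query_clusters |= img      # add clusters occurring in relevant images
        for img in irrelevant_images:
            query_clusters -= img      # remove clusters from irrelevant ones
        return query_clusters          # note: mutates the passed-in set

    query = {3, 17}
    relevant = [{3, 17, 42}, {17, 8}]
    irrelevant = [{3, 99}]
    print(adapt_query(query, relevant, irrelevant))  # {8, 17, 42}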

41 Assign Semantics

42 Visual Thesaurus [cluster examples: Glcm_47, a correct cluster representing ‘Tree’/‘Forest’; Fractal_23, an ‘incoherent’ cluster; Gabor_20, a mis-labeled cluster]

43 Learning Short-term: Adapt query to better reflect this user’s information need Long-term: Adapt thesaurus and clustering to improve system for all users

44 [results compared: ‘Thesaurus Only’ vs. ‘After Feedback’]

45 4. Nobody is unique!

46 Collaborative Filtering Also: social information filtering –Compare user judgments –Recommend the differences between similar users (items one has rated and the other has not seen) People’s tastes are not randomly distributed You are what you buy (Amazon)
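A toy sketch of user-user filtering in this spirit: find the most similar user by comparing judgment vectors, then recommend what they liked that this user has not seen. The ratings matrix is invented for illustration.

    import numpy as np

    ratings = np.array([   # rows = users, columns = items, 0 = unseen
        [5, 4, 0, 1],
        [5, 5, 2, 0],
        [1, 0, 5, 4],
    ])

    def recommend(user, ratings):
        seen = ratings[user] > 0
        sims = [np.dot(ratings[user], ratings[u]) if u != user else -1
                for u in range(len(ratings))]
        neighbour = int(np.argmax(sims))             # most similar other user
        candidates = np.where(~seen & (ratings[neighbour] > 0))[0]
        return neighbour, candidates                 # items to recommend

    print(recommend(0, ratings))  # user 1 is closest; item 2 is recommended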

47 Collaborative Filtering Benefits over content-based approach –Overcomes problems with finding suitable features to represent e.g. art, music –Serendipity –Implicit mechanism for qualitative aspects like style Problems: large groups, broad domains

48 5. Ask for help

49 Query Articulation [diagram: feature extraction into an N-dimensional space] How to articulate the query?

50 What is the semantics of the query?

51 Details matter

52 Problem Statement Feature vectors capture ‘global’ aspects of the whole image Overall image characteristics dominate the feature vectors Hypothesis: users are interested in details

53 Irrelevant Background [images: Query vs. Result]

54 Image Spots Image-spots articulate desired image details –Foreground/background colors –Colors forming ‘shapes’ –Enclosure of shapes by background colors Multi-spot queries define the spatial relations between a number of spots
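A hedged sketch of a single spot test, assuming a spot is specified as an image region plus a target colour; the tolerance and coverage thresholds are illustrative, not the original system’s values. A multi-spot query would combine several such tests with spatial constraints such as ‘above’.

    import numpy as np

    def spot_matches(image, region, target_rgb, tol=40, min_fraction=0.8):
        """region = (y0, y1, x0, x1); True if >=80% of its pixels are near target_rgb."""
        y0, y1, x0, x1 = region
        patch = image[y0:y1, x0:x1].reshape(-1, 3).astype(float)
        close = np.linalg.norm(patch - np.array(target_rgb, float), axis=1) < tol
        return close.mean() >= min_fraction

    img = np.zeros((100, 100, 3), dtype=np.uint8)         # all-black test image
    print(spot_matches(img, (0, 50, 0, 100), (0, 0, 0)))  # 'black sky' in top half: True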

55 Results [table: for each query image, the rank achieved by four methods (Hist16, Hist, Spot, Spot+Hist)]

56 A: Simple Spot Query `Black sky’

57 B: Articulated Multi-Spot Query `Black sky’ above `Monochrome ground’

58 C: Histogram Search in `Black Sky’ images [results 2-4 and 14 shown]

59 Complicating Factors What are Good Feature Models? What are Good Ranking Functions? Queries are Subjective!

60 Probabilistic Approaches

61 Generative Models… aka ‘Language Modelling’ A statistical model for generating data –Probability distribution over samples in a given ‘language’ M: P(e1 e2 | M) = P(e1 | M) · P(e2 | M, e1) © Victor Lavrenko, Aug. 2002

62 … in Information Retrieval Basic question: –What is the likelihood that this document is relevant to this query? P(rel | I, Q) = P(I, Q | rel) · P(rel) / P(I, Q), where P(I, Q | rel) = P(Q | I, rel) · P(I | rel)

63 ‘Language Modelling’ Not just ‘English’ But also, the language of –author –newspaper –text document –image Example: Hiemstra or Robertson? ‘Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing.’

64 ‘Language Modelling’ Not just ‘English’ But also, the language of –author –newspaper –text document –image Example: Guardian or Times?

65 ‘Language Modelling’ Not just English! But also, the language of –author –newspaper –text document –image [example: which of two pictures?]

66 Unigram and higher-order models Unigram Models: P(e1 e2 e3 e4) = P(e1) P(e2) P(e3) P(e4) N-gram Models (e.g. bigrams): P(e1 e2 e3 e4) = P(e1) P(e2|e1) P(e3|e2) P(e4|e3) Other Models –Grammar-based models, etc. –Mixture models © Victor Lavrenko, Aug. 2002
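A toy contrast of the two model families, with maximum-likelihood estimates from one tiny ‘document’ (smoothing is deliberately omitted here; it comes up on a later slide):

    from collections import Counter

    doc = "the cat sat on the mat".split()
    unigrams = Counter(doc)
    bigrams = Counter(zip(doc, doc[1:]))
    N = len(doc)

    def p_unigram(seq):                 # P(e1..en) = product of P(ei)
        p = 1.0
        for w in seq:
            p *= unigrams[w] / N
        return p

    def p_bigram(seq):                  # P(e1..en) = P(e1) * product of P(ei|ei-1)
        p = unigrams[seq[0]] / N
        for prev, w in zip(seq, seq[1:]):
            p *= bigrams[(prev, w)] / unigrams[prev]
        return p

    print(p_unigram(["the", "cat"]))  # (2/6)*(1/6)
    print(p_bigram(["the", "cat"]))   # (2/6)*(1/2): 'the' is followed by 'cat' once out of twice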

67 The fundamental problem Usually we don’t know the model M –But have a sample representative of that model First estimate a model from a sample, then compute the observation probability: P(observation | M(sample)) © Victor Lavrenko, Aug. 2002

68 Indexing: determine models Indexing –Estimate Gaussian Mixture Models from images using EM –Based on feature vectors with colour, texture and position information from pixel blocks –Fixed number of components [diagram: Docs → Models]
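A sketch of this indexing step, assuming scikit-learn’s GaussianMixture as the EM implementation and random vectors standing in for the real colour/texture/position features:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    blocks = rng.normal(size=(500, 14))   # e.g. 500 pixel blocks, 14-dim features

    model = GaussianMixture(n_components=8, covariance_type='diag')  # fixed C=8
    model.fit(blocks)                     # EM: estimate means, covariances, weights
    print(model.weights_.round(3))        # mixing weights of the 8 components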

69 Retrieval: use query likelihood Query: Which of the models is most likely to generate these 24 samples?

70 Probabilistic Image Retrieval

71 Query Rank by P(Q|M): P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)
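A sketch of that ranking: score the query’s samples under each fitted mixture and sort by the total log-likelihood, i.e. log P(Q|M). The two toy ‘document’ models are invented; the 24 query samples echo slide 69.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    models = [GaussianMixture(n_components=3).fit(rng.normal(loc=m, size=(300, 4)))
              for m in (0.0, 5.0)]        # two toy 'document' models

    query = rng.normal(loc=5.0, size=(24, 4))                 # 24 query samples
    scores = [m.score_samples(query).sum() for m in models]   # log P(Q | M_i)
    print(np.argsort(scores)[::-1])       # best-matching model first: [1 0]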

72 Topic Models [diagram: the query is scored against topic models by P(Q|M1) … P(Q|M4); the selected query model then scores documents by P(D1|Q,M) … P(D4|Q,M)]

73 Probabilistic Retrieval Model Text –Rank using probability of drawing query terms from document models Images –Rank using probability of drawing query blocks from document models Multi-modal –Rank using joint probability of drawing query samples from document models
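Under an independence assumption across modalities, the joint score is simply the sum of the per-modality log-likelihoods (each computed as in the text and image sketches elsewhere in this deck). A minimal sketch with invented numbers:

    def joint_log_likelihood(log_p_text_query, log_p_image_query):
        return log_p_text_query + log_p_image_query   # log P(Qtext|M) + log P(Qimage|M)

    # rank documents by combined score (illustrative values)
    docs = {"d1": (-12.1, -350.2), "d2": (-10.4, -420.9)}
    ranked = sorted(docs, key=lambda d: joint_log_likelihood(*docs[d]), reverse=True)
    print(ranked)  # ['d1', 'd2']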

74 Text Models Unigram Language Models (LM) –Urn metaphor: the probability of drawing the query from a document’s urn is the product of the single-draw probabilities, e.g. P(query) = 4/9 * 2/9 * 4/9 * 3/9 for four draws from an urn of nine balls © Victor Lavrenko, Aug. 2002

75 Generative Models and IR Rank models (documents) by probability of generating the query Q: P(Q | D1) = 4/9 * 2/9 * 4/9 * 3/9 = 96/9^4 P(Q | D2) = 3/9 * 3/9 * 3/9 * 3/9 = 81/9^4 P(Q | D3) = 2/9 * 3/9 * 2/9 * 4/9 = 48/9^4 P(Q | D4) = 2/9 * 5/9 * 2/9 * 2/9 = 40/9^4
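The same computation in code, with colour letters standing in for the slide’s pictured balls:

    from collections import Counter

    def p_query(query, doc):
        counts, n = Counter(doc), len(doc)
        p = 1.0
        for ball in query:
            p *= counts[ball] / n     # probability of drawing this ball
        return p

    doc1 = list("rrrrggbbb")          # 4 red, 2 green, 3 blue (9 balls)
    print(p_query("rgrb", doc1))      # 4/9 * 2/9 * 4/9 * 3/9 = 96/6561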

76 The Zero-frequency Problem Suppose some event is not in our example –The model will assign zero probability to that event –And to any set of events involving the unseen event This happens frequently with language It is incorrect to infer zero probabilities –Especially when dealing with incomplete samples

77 Smoothing Idea: shift part of the probability mass to unseen events Interpolation with background (General English) –Reflects expected frequency of events –Plays the role of IDF: P(e) = λ · P(e | doc) + (1 − λ) · P(e | collection)
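A sketch of this interpolation (Jelinek-Mercer style smoothing); lambda = 0.8 is an arbitrary choice, and the counts are invented:

    def p_smoothed(term, doc_counts, doc_len, coll_counts, coll_len, lam=0.8):
        p_doc = doc_counts.get(term, 0) / doc_len
        p_coll = coll_counts.get(term, 0) / coll_len
        return lam * p_doc + (1 - lam) * p_coll   # non-zero if term occurs anywhere

    doc = {"black": 2, "sky": 1}
    coll = {"black": 40, "sky": 25, "canyon": 10}
    print(p_smoothed("canyon", doc, 3, coll, 75))  # unseen in doc, still > 0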

78 Hierarchical Language Model Smoothed over multiple levels: α · P(T | Shot) + β · P(T | ‘Scene’) + γ · P(T | Video) + (1 − α − β − γ) · P(T | Collection) Also common in XML retrieval –Element score smoothed with the containing article
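The same idea as a small function; the weights are arbitrary examples (alpha + beta + gamma must stay below 1) and the four level models are passed in as callables:

    def p_hierarchical(t, p_shot, p_scene, p_video, p_coll,
                       alpha=0.4, beta=0.3, gamma=0.2):
        return (alpha * p_shot(t) + beta * p_scene(t)
                + gamma * p_video(t) + (1 - alpha - beta - gamma) * p_coll(t))

    # toy models: the term is frequent in the shot, rare in the collection
    print(p_hierarchical("canyon",
                         p_shot=lambda t: 0.10, p_scene=lambda t: 0.05,
                         p_video=lambda t: 0.02, p_coll=lambda t: 0.001))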

79 Image Models Urn metaphor not useful –Drawing pixels is useless: pixels carry no semantics –Drawing pixel blocks is not effective: the chance of drawing the exact query blocks from a document is slim Use Gaussian Mixture Models (GMM) –Fixed number of Gaussian components/clusters/concepts

80 Key-frame representation [diagram: split the key-frame into colour channels (Y, Cb, Cr), take samples of pixel blocks, compute DCT coefficients plus position per block, and feed the resulting feature vectors to the EM algorithm to estimate the query model; the slide lists example feature-vector values]

81 Image Models Expectation-Maximisation (EM) algorithm –iteratively: estimate component assignments (E-step), then re-estimate component parameters (M-step)
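A from-scratch sketch of the two steps for a one-dimensional mixture of two Gaussians (toy data, fixed number of iterations; the constant 1/sqrt(2*pi) is omitted because it cancels in the E-step normalisation):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

    mu, sigma, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
    for _ in range(30):
        # E-step: soft assignment of each sample to each component
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        w = nk / len(x)

    print(mu.round(2), sigma.round(2), w.round(2))  # approx [-2, 3], [1, 1], [0.5, 0.5]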

82 Expectation Maximization [figure: E and M steps alternating over Components 1-3]

83 [animation: the E and M steps iterating over Components 1-3]

