Multimedia Information Retrieval

Presentation transcript:

Multimedia Information Retrieval
  Modern Information Retrieval Course
  Computer Engineering Department, Sharif University of Technology
  Spring 2006

Outline
  Introduction
  Text-Based MMIR
  Content-Based Retrieval
  Multimedia IR Model
  Image Retrieval
  Audio Retrieval
  Video Retrieval
  Conclusions


Support for a variety of data
  Different kinds of media
    Image: graph, …
    Audio: music, speech, …
    Video

MMIR Motivations
  Content, content, and more content …
  How to get what is needed?
  Increasing availability of multimedia information
  Difficult to find, select, filter, and manage AV content
  More and more situations where it is necessary to have ‘information about the content’

Key Issues in MMIR

Goals
  Make multimedia content as searchable as text information, because the value of content depends on how easy it is to find, filter, manage, and use.
  This needs a content description method beyond simple text annotation.

MMIR Approaches
  Text-Based MMIR
  Content-Based MMIR


Text-Based
  Retrieval based on text associated with the file
  URL: http://www.host.com/animals/dogs/poodle.gif
  Alt text: <img src=URL alt="picture of poodle">
  Hyperlink text: <a href=URL>Sally the poodle</a>
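
A minimal sketch of how a text-based engine might gather index terms for an image from the three sources listed on this slide (URL, alt text, hyperlink text). It uses only Python's standard html.parser module. The names ImageTextCollector, url_keywords and index_terms are illustrative assumptions and are not taken from any real search engine.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageTextCollector(HTMLParser):
    """Collect alt text and hyperlink text, keyed by the URL they refer to."""

    def __init__(self):
        super().__init__()
        self.text_by_url = {}    # URL -> list of associated strings
        self._open_link = None   # href of the <a> element currently being read

    def _add(self, url, text):
        if url and text.strip():
            self.text_by_url.setdefault(url, []).append(text.strip())

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self._add(attrs.get("src"), attrs.get("alt") or "")
        elif tag == "a":
            self._open_link = attrs.get("href")

    def handle_data(self, data):
        if self._open_link:
            self._add(self._open_link, data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_link = None


def url_keywords(url):
    """Split the URL path into lowercase tokens, dropping the file extension."""
    path = urlparse(url).path.rsplit(".", 1)[0]
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]


def index_terms(html):
    """Map each referenced URL to the sorted set of words associated with it."""
    collector = ImageTextCollector()
    collector.feed(html)
    terms = {}
    for url, texts in collector.text_by_url.items():
        words = url_keywords(url)
        for text in texts:
            words += [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
        terms[url] = sorted(set(words))
    return terms


POODLE = "http://www.host.com/animals/dogs/poodle.gif"
page = (
    f'<img src="{POODLE}" alt="picture of poodle"> '
    f'<a href="{POODLE}">Sally the poodle</a>'
)
print(index_terms(page)[POODLE])
```

A real engine would also weight the sources differently and drop stop words such as "of" and "the"; both steps are left out here to keep the sketch short.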

Text-based Search Engines
  Indexing based on text in the container webpage
  http://www.google.com
  http://www.ditto.com
  …

Keyword-based System
  [Diagram labels: User; Video Database; Automatic Annotation; Keyword; Information Need; including filename, video title, caption, related web page]

Why does this happen?
  Most of these search engines are keyword-based
  You have to represent your idea in keywords
  These keywords are expected to appear in the filename or the corresponding webpage

Image: The Google Approach
  How does image search work?
  Google analyzes the text on the page adjacent to the image, the image caption and dozens of other factors to determine the image content.
  Google also uses sophisticated algorithms to remove duplicates and ensure that the highest quality images are presented first in your results.
  Examples: Campanile tcd; Cliffs of Moher
  Recall may not be great…

Google image search


Problems with Text-Based
  The text in the ALT tag has to be written manually
    Expensive
    Time-consuming
  It is incomplete and subjective
  Some features, such as texture or object shape, are difficult to define in text

Therefore…
  Unable to handle the semantic meaning of images
  Unable to handle visual position
  Unable to handle time information
  Unable to use images as queries
  …

So …
  Better for simple concepts, e.g. a picture of a giraffe
  Does not work for complex queries, e.g. a picture of a brick home with black shutters and white pillars, with a pickup truck in front of it
  (image)


Architecture for Multimedia Retrieval
  [Diagram labels: AV Description; Feature extraction (manual / automatic); Storage; Transmission; Encoding (for transmission); Decoding; Conf. points; Search / query; Pull; Browse; Filter; Push; Human or machine]

Query-retrieval matrix
  Query types: humming, examples, speech, sketch, sound, stills, text
  Document types: text, video, images, speech, music, sketches, multimedia
  Examples
    conventional text retrieval
    you roar and get a wildlife documentary
    type “floods” and get BBC radio news
    hum a tune and get a music piece

Main Components
  Feature Extraction & Analysis
  Description Schemes
  Searching & Filtering
  Examples
    IBM’s Query By Image Content (QBIC)
    Virage’s VIR Image Engine
    Online: http://collage.nhil.com/

Internal representation
  Using attributes is not sufficient
  Feature: information extracted from objects
  A multimedia object is represented as a set of features
  Features can be assigned manually, automatically, or using a hybrid approach

Features for MMIR
  High-level features: words and phrases from text, speech recognition
  Medium-level features: face detector, region classifiers, outdoor, etc.
  Low-level features: Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives

Internal representation
  Values of some specific features are assigned to an object by comparing the object with previously classified objects
  Feature extraction cannot be precise
  A weight is usually assigned to each feature value, representing the uncertainty of assigning that value to the feature
    Example: 80% sure that a shape is a square


MMIR Model’s Main Components
  Query Language
  Indexing and Searching

Query languages
  In designing a multimedia query language, two main aspects require attention
    How the user enters his/her request to the system
    Which conditions on multimedia objects can be specified in the user request

Request specification
  Interfaces
    Browsing and navigation
    Specifying the conditions the objects of interest must satisfy, by means of queries
  Queries can be specified in two different ways
    Using a specific query language
    Query by example: using actual data (object example)

Conditions on multimedia data
  Query predicates
    Attribute predicates
      Concern the attributes for which an exact value is supplied for each object
      Exact-match retrieval
    Structural predicates
      Concern the structure of multimedia objects
      Can be answered by metadata and information about the database schema
      “Find all multimedia objects containing at least one image and a video clip”

Conditions on multimedia data
  Semantic predicates
    Concern the semantic content of the required data, depending on the features that have been extracted and stored for each multimedia object
    “Find all the red houses”
    Exact match cannot be applied
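
The three kinds of predicate (attribute, structural, semantic) can be contrasted in a small sketch. Everything here is an assumption made for illustration: the MediaObject record, the storage of each feature as a (value, confidence) pair, and the choice to combine confidences by multiplication. The slides do not prescribe a data model or a scoring rule.

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    name: str
    attributes: dict     # exact-valued metadata, e.g. {"author": "x"}
    components: list     # media types the object contains, e.g. ["image", "video"]
    features: dict = field(default_factory=dict)  # feature -> (value, confidence)


def attribute_match(obj, attribute, value):
    """Attribute predicate: exact match on a stored attribute value."""
    return obj.attributes.get(attribute) == value


def structural_match(obj, required_types):
    """Structural predicate: the object contains every required media type."""
    return all(t in obj.components for t in required_types)


def semantic_score(obj, feature, value):
    """Semantic predicate: confidence that the extracted feature has this value."""
    stored_value, confidence = obj.features.get(feature, (None, 0.0))
    return confidence if stored_value == value else 0.0


def rank_semantic(objects, conditions):
    """Rank objects by the product of confidences over all conditions."""
    scored = []
    for obj in objects:
        score = 1.0
        for feature, value in conditions.items():
            score *= semantic_score(obj, feature, value)
        if score > 0:
            scored.append((score, obj))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


objects = [
    MediaObject("a", {"author": "x"}, ["image", "video"],
                {"colour": ("red", 0.9), "object": ("house", 0.8)}),
    MediaObject("b", {"author": "y"}, ["image"],
                {"colour": ("red", 0.6), "object": ("house", 0.5)}),
    MediaObject("c", {}, ["audio"], {"colour": ("blue", 0.9)}),
]

# "Find all multimedia objects containing at least one image and a video clip"
with_image_and_video = [o.name for o in objects if structural_match(o, ["image", "video"])]

# "Find all the red houses": a ranked list rather than an exact match
red_houses = [(o.name, s) for s, o in rank_semantic(objects, {"colour": "red", "object": "house"})]
```

The attribute and structural predicates return a yes/no answer, while the semantic predicate returns a ranked list, which is why exact match cannot be applied to it.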

Indexing and searching
  Searching similar patterns
    Distance function
      Given two objects, O1 and O2, the distance (= dissimilarity) of the two objects is denoted by D(O1, O2)
    Similarity queries
      Whole match
      Sub-pattern match
      Nearest neighbors
      All pairs
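
A brute-force sketch of three of the similarity query types, assuming each object has already been reduced to a fixed-length numeric feature vector and that D is the Euclidean distance. Both assumptions are illustrative, since the slide leaves D unspecified. Sub-pattern matching is omitted because it depends on the media type.

```python
import heapq
import math
from itertools import combinations


def distance(o1, o2):
    """Euclidean distance D(O1, O2) between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o1, o2)))


def range_query(query, collection, epsilon):
    """Whole match: names of all objects within distance epsilon of the query."""
    return [name for name, vec in collection.items()
            if distance(query, vec) <= epsilon]


def nearest_neighbors(query, collection, k):
    """Names of the k objects closest to the query, nearest first."""
    return heapq.nsmallest(k, collection,
                           key=lambda name: distance(query, collection[name]))


def all_pairs(collection, epsilon):
    """All pairs of objects that lie within distance epsilon of each other."""
    return [(a, b) for a, b in combinations(sorted(collection), 2)
            if distance(collection[a], collection[b]) <= epsilon]


collection = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 0.0)}
print(range_query((0.0, 0.0), collection, 1.0))
print(nearest_neighbors((0.0, 0.0), collection, 2))
print(all_pairs(collection, 1.5))
```

Each query scans the whole collection, which is what the spatial access methods on the following slides try to avoid.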

Spatial access methods
  Map objects into points in f-dimensional space, and use multiattribute access methods (also referred to as spatial access methods or SAMs) to cluster them and to search for them
  Methods
    R*-trees and the rest of the R-tree family
    Linear quadtrees
    Grid files
  Linear quadtrees and grid files explode exponentially with the dimensionality

R-tree
  Represent a spatial object by its minimum bounding rectangle (MBR)
  Data rectangles are grouped to form parent nodes (recursively grouped)
  The MBR of a parent node completely contains the MBRs of its children
  MBRs are allowed to overlap
  Nodes of the tree correspond to disk pages
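
The MBR properties listed on this slide can be illustrated with a hand-built two-level tree. This is a sketch of the containment and pruning idea only, restricted to 2-D for brevity. It does not implement R-tree insertion, node splitting or disk paging, and the dict-based node layout is an assumption made for the example.

```python
def union(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def contains(outer, inner):
    """True if rectangle `outer` completely contains rectangle `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def leaf(identifier, rect):
    return {"id": identifier, "mbr": rect}


def parent(children):
    """Group child nodes under a parent whose MBR encloses all of them."""
    return {"mbr": union([c["mbr"] for c in children]), "children": children}


def search(node, window):
    """Ids of data rectangles intersecting `window`, pruning by parent MBR."""
    if not intersects(node["mbr"], window):
        return []          # nothing below this node can intersect the window
    if "children" not in node:
        return [node["id"]]
    hits = []
    for child in node["children"]:
        hits.extend(search(child, window))
    return hits


left = parent([leaf("r1", (0, 0, 1, 1)), leaf("r2", (2, 2, 3, 3))])
right = parent([leaf("r3", (10, 10, 11, 11)), leaf("r4", (12, 12, 13, 13))])
root = parent([left, right])
print(search(root, (0.5, 0.5, 2.5, 2.5)))
```

In the example the whole right subtree is skipped after a single rectangle test, which is the point of grouping data rectangles under parent MBRs.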



Visual Features ...
  Colour
  Shape
  Texture

Histograms
  Greyscale histogram of image A
  Assuming 256 intensity levels, $h_A(l)$ is defined for $l = 1, \dots, 256$ as
  $h_A(l) = \#\{(i,j) \mid A(i,j) = l,\ i = 1, \dots, m,\ j = 1, \dots, n\}$
  i.e. a count of the number of pixels at each level
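
The definition translates almost directly into code. This is a minimal sketch that assumes the image is a list of rows of integer intensities. It indexes levels from 0 to 255, as is usual for 8-bit images, rather than from 1 to 256 as on the slide.

```python
def grey_histogram(image, levels=256):
    """hist[l] = number of pixels (i, j) with image[i][j] == l."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


image = [
    [0, 0, 255],
    [1, 0, 255],
]
hist = grey_histogram(image)
print(hist[0], hist[1], hist[255])
```

The entries of the histogram always sum to the number of pixels, m × n.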

Colour Histogram
  Describes the colours in an image and their percentages.
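
A sketch of a colour histogram expressed as percentages, assuming pixels are (r, g, b) tuples with 8-bit channels and that colours are coarsely quantised into a few bins per channel. The bin count is an arbitrary choice for this example. Histogram intersection is shown as one common way to compare two such histograms; the slide itself does not name a comparison measure.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Fraction of pixels falling into each quantised (r, g, b) bin."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {colour_bin: n / total for colour_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap of two normalised histograms."""
    return sum(min(share, h2.get(colour_bin, 0.0))
               for colour_bin, share in h1.items())


mostly_red = colour_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
all_red = colour_histogram([(250, 10, 10)] * 5)
print(mostly_red)
print(histogram_intersection(mostly_red, all_red))
```

Because the histogram stores fractions rather than raw counts, images of different sizes can be compared directly.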

Texture Matching
  Texture characterizes small-scale regularity
    Color describes pixels, texture describes regions
  Described by several types of features
    e.g., smoothness, periodicity, directionality
  Perform weighted vector space matching
    Usually in combination with a color histogram

Texture Test Patterns

Image Retrieval using low-level features
  See IBM demos at:
    http://wwwqbic.almaden.ibm.com/
    http://mp7.watson.ibm.com/ (video)
  Hermitage Museum: www.hermitagemuseum.org

Berkeley Blobworld


But…
  Low-level features do not work in all cases

Solution: Regional Low-level Image Feature
  Segmentation into objects
  Extract low-level features from each region

Solution: High-level Image Feature
  Objects: persons, roads, cars, skies…
  Scenes: indoors, outdoors, cityscape, landscape, water, office, factory…
  Events: parade, explosion, picnic, playing soccer…
  Generated from low-level features


Audio Genres
  Important types of audio data
    Speech-centered
      Radio programs
      Telephone conversations
      Recorded meetings
    Music-centered
      Instrumental, vocal
    Other sources
      Alarms, instrumentation, surveillance, …

Speech-based Documents
  Radio/TV news retrieval
  Search archival radio/news broadcasts
  Video and audio email
  Knowledge management: transfer of tacit knowledge to others
  Search audio archives of meetings, lectures, etc.

Preamble
  Two utterances of the same words by the same person under the same conditions generate very different waveforms.
  Loudness, pitch, brightness, bandwidth, harmonicity, and other sources of variation are all continuous variables; they play a role equivalent to that of color and texture in images.

Detectable Speech Features
  Content: phonemes, one-best word recognition, n-best
  Identity: speaker identification, speaker segmentation
  Language: language, dialect, accent
  Other measurable parameters: time, duration, channel, environment

How Speech Recognition Works
  Three stages
    What sounds were made?
      Convert from waveform to subword units (phonemes)
    How could the sounds be grouped into words?
      Identify the most probable word segmentation points
    Which of the possible words were spoken?
      Based on likelihood of possible multiword sequences
  All three stages are learned from training data
    Using a “Hidden Markov Model” whose parameters are fitted by hill climbing

Speech Recognition
  [Diagram labels: Phoneme n-grams; One-best phoneme transcription; Phoneme Detection; N-best phoneme sequences; Phoneme lattices; Phoneme transcription dictionary; Word Construction; One-best word transcript; Word n-gram language model; Word Selection; Words]

Music and audio analysis
  Music is a large and extremely variable audio class.
  The range of sounds is large, from music genres to animal cries to synthesizer samples.
  Any of the above can and will occur in combination.

Audio retrieval-by-content
  Requires some measure of audio similarity.
  Most approaches to general audio retrieval take a perceptual approach, using measures such as loudness.
  A neural net can map a sound clip to a text description; an obvious drawback is the subjective nature of audio description.

Sample system: Muscle Fish
  Analyzes sound files for a specific set of psychoacoustic features.
  This results in a vector of attributes that include loudness, pitch, bandwidth and harmonicity.
  Given enough training samples, a Gaussian classifier can be constructed; the same vectors can also be used for retrieval.

  For retrieval, a Euclidean distance is used as a measure of similarity.
  The distance is computed between a given sound example and all other sound examples (about 400 in the demonstration).
  Sounds are ranked by distance, with the closer ones being more similar.

Music and MIDI retrieval
  Uses archives of MIDI files, which are score-like representations of music intended for musical synthesizers or sequencers.
  Given a melodic query, the MIDI files can be searched for similar melodies.

Polyphonic Music Indexing
  Technique
    n-grams
    Encode music as text strings using pitch and onsets
    Index text words with a text search engine
    Process the query in the same way
  Application: e.g., Query by Humming

Monophonic pitch n-gramming
  Interval: 0 +7 0 +2 0 -2 0 -2 0
    [0 +7 0 +2]  ZGZB
    [+7 0 +2 0]  GZBZ
    [0 +2 0 -2]  ZBZb
  Example: musical strings with interval-only representation
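
A sketch of interval-only n-gramming that reproduces the strings on this slide. The letter code is inferred from the example rather than stated in the source: Z for a repeated note, an upper-case letter for an upward step (A = +1, B = +2, …), and the matching lower-case letter for a downward step. Treat that scheme, the clamping of large leaps, and the function names as assumptions.

```python
def intervals(pitches):
    """Semitone steps between consecutive pitches (e.g. MIDI note numbers)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def encode_interval(step):
    """Map one interval to a letter: Z = 0, A..Y = up, a..y = down."""
    if step == 0:
        return "Z"
    # Clamp large leaps to 25 semitones so every interval maps to a letter.
    letter = chr(ord("A") + min(abs(step), 25) - 1)
    return letter if step > 0 else letter.lower()


def pitch_ngrams(steps, n=4):
    """Overlapping n-grams ("musical words") of the encoded interval string."""
    symbols = "".join(encode_interval(s) for s in steps)
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


slide_intervals = [0, 7, 0, 2, 0, -2, 0, -2, 0]
words = pitch_ngrams(slide_intervals)
print(words[:3])  # the three n-grams shown on the slide: ZGZB, GZBZ, ZBZb
```

The resulting words can be indexed by an ordinary text search engine, and a hummed query can be turned into words the same way once its pitches have been estimated.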


Application
  Increasing demand for visual information retrieval
    Retrieve useful information from databases
    Sharing and distributing video data through computer networks
  Example: BBC
    The BBC archive has over 500k queries plus 1M new items … per year
    From the BBC …
      Police car with blue light flashing
      Government plan to improve reading standards
      Two shot of Kenneth Clarke and William Hague

Video Search: Active Research Area

Video Search: Features
  Texture
    One of the earliest image features [Haralick et al., 70s]
    Co-occurrence matrix
      Orientation and distance on gray-scale pixels
      Contrast, inverse difference moment, and entropy [Gotlieb & Kreyszig]
    Human visual texture properties: coarseness, contrast, directionality, line-likeness, regularity and roughness [Tamura et al]
    Wavelet transforms [90s]
      [Smith & Chang] extracted mean and variance from wavelet subbands
    Gabor filters
    And so on
  Region Segmentation
    Partition image into regions
    Strong segmentation: object segmentation is difficult
    Weak segmentation: region segmentation based on some homogeneity criteria
  Scene Segmentation
    Shot detection, scene detection
      Look for changes in color, texture, brightness
    Context-based scene segmentation applied to certain categories such as broadcast news
  Color
    Robust to background
    Independent of size, orientation
    Color histogram [Swain & Ballard]
      “Sensitive to noise and sparse”: cumulative histograms [Stricker & Orengo]
    Color moments
    Color sets: map RGB color space to Hue Saturation Value, and quantize [Smith, Chang]
    Color layout: local color features by dividing image into regions
    Color autocorrelograms

Video Search: Features
  Face
    Face detection is highly reliable
      Neural networks [Rowley]
      Wavelet-based histograms of facial features [Schneiderman]
    Face recognition for video is still a challenging problem
      EigenFaces: extract eigenvectors and use as feature space
  OCR
    OCR is a fairly successful technology
    Accurate, especially with good matching vocabularies
    Script recognition is still an open problem
  ASR
    Automatic speech recognition is fairly accurate for medium to large vocabulary broadcast-type data
    Large number of available speech vendors
    Still open for free conversational speech in noisy conditions
  Shape
    Outer boundary based vs. region based
    Fourier descriptors
    Moment invariants
    Finite Element Method (stiffness matrix: how each point is connected to others; eigenvectors of the matrix)
    Turning function based (similar to Fourier descriptor), convex/concave polygons [Arkin et al]
    Wavelet transforms leverage multiresolution [Chuang & Kao]
    Chamfer matching for comparing 2 shapes (linear dimension rather than area)
    3-D object representations using similar invariant features
    Well-known edge detection algorithms

Video Structures
  Image structure: absolute positioning, relative positioning
  Object motion: translation, rotation
  Camera motion: pan, zoom, perspective change
  Shot transitions: cut, fade, dissolve, …

Typical Retrieval Framework
  User: provides query information that represents his or her information needs
  Database: stores a large collection of video data
  Goal: find the most relevant shots from the database
  Shots: the “paragraphs” of video, typically 20 – 40 seconds, which are the basic unit of video retrieval

Bridging the Gap
  [Diagram labels: Video Database; User; Result]

Automatically Structure Video Data
  The first step for video retrieval: video “programmes” are structured into logical scenes and physical shots
  If dealing with text, then the structure is obvious: paragraph, section, topic, page, etc.
  All text-based indexing, retrieval, linking, etc. builds upon this structure
  Automatic shot boundary detection and selection of representative keyframes is usually the first step
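
A toy sketch of shot boundary detection and keyframe selection, under several simplifying assumptions that are not from the slides: each frame is a flat, non-empty list of 8-bit grey values, a cut is declared when the histogram difference between consecutive frames exceeds a fixed threshold, and the middle frame of each shot serves as its keyframe. Gradual transitions such as fades and dissolves are not handled.

```python
def histogram(frame, bins=8):
    """Normalised grey-level histogram of one frame."""
    step = 256 // bins
    counts = [0] * bins
    for value in frame:
        counts[value // step] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Indices i at which frame i starts a new shot (hard cuts only)."""
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Half the L1 distance between normalised histograms lies in [0, 1].
        difference = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if difference > threshold:
            cuts.append(i)
        previous = current
    return cuts


def keyframes(frames, cuts):
    """Index of the middle frame of each shot, used as its representative."""
    starts = [0] + cuts
    ends = cuts + [len(frames)]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]


dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright]
cuts = shot_boundaries(frames)
print(cuts, keyframes(frames, cuts))
```

The threshold of 0.5 is arbitrary; in practice it would have to be tuned on the material being indexed.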

Typical automatic structuring of video
  A video document
  A set of shots
  Keyframe browser combined with transcript or object-based search

Ideal solution
  [Diagram labels: Video Database; User; Information Need; Video Structure; Understanding the semantic meaning and retrieve; Result]

Ideal solution
  However,
    It is hard to represent a query in natural language, and hard for a computer to understand it
    Computers have no experience
    Other representation restrictions, such as position and time
  [Diagram labels: Video Database; User; Information Need; Video Structure; Understanding the semantic meaning and retrieve; Result]

Alternative Solution
  [Diagram labels: Video Database; User; Provide evidence of relevant information (text, image, audio); Information Need; Video Structure; Match and combine; Result]

Evidence-based Retrieval System
  General framework for current video retrieval systems
  Video retrieval based on the evidence from both users and database, including
    Text information
    Image information
    Motion information
    Audio information
  Return a relevance score for each piece of evidence
  Combine the scores
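
The slide does not say how the per-evidence scores are combined. A weighted linear sum is one simple option, sketched here with made-up modality names, scores and weights; none of these values come from the source.

```python
def combine_scores(evidence_scores, weights):
    """Rank shots by a weighted sum of their per-modality relevance scores.

    evidence_scores maps modality -> {shot id -> score in [0, 1]}.
    A shot missing from a modality contributes 0 for that modality.
    """
    combined = {}
    for modality, scores in evidence_scores.items():
        weight = weights.get(modality, 0.0)
        for shot, score in scores.items():
            combined[shot] = combined.get(shot, 0.0) + weight * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


evidence = {
    "text": {"s1": 0.9, "s2": 0.2},
    "image": {"s1": 0.1, "s2": 0.8, "s3": 0.5},
}
ranking = combine_scores(evidence, {"text": 0.7, "image": 0.3})
print([shot for shot, _ in ranking])
```

With these weights the text evidence dominates, so s1 ranks first even though its image score is low; changing the weights changes the ranking.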

Keyword-based System
  [Diagram labels: Video Database; User; Automatic Annotation (including filename, video title, caption, related web page); Keyword; Information Need; Video Structure]

Keyword-based System
  [Diagram labels: Video Database; User; Automatic Annotation; Keyword; Information Need; Video Structure; Manual Annotation]

Manual Annotation
  Manually creating annotations/keywords for image / video data
  Examples: Gettyimage.com (image retrieval)
  Pros
    Represents the semantic meaning of video
  Cons
    Time-consuming, labor-intensive
    Keywords are not enough to represent the information need

Speech and OCR transcription
[Diagram: User, Information Need, Keyword, Video Database, Video Structure, Annotation, Speech Transcription, OCR Transcription]

Query using speech/OCR information
Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST
Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …
OCR: H,arry Hertz a Director aro 7 wa-,i,,ty Program ,Harry Hertz a Director
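Because OCR output like the line above is noisy, exact keyword matching would miss "H,arry". A minimal sketch of tolerant matching follows, assuming we strip punctuation and then compare words approximately with Python's standard `difflib`. The helper names `normalize` and `term_recall` and the 0.8 threshold are illustrative assumptions, not part of the original system.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, split on whitespace, and drop non-alphanumeric characters."""
    tokens = (re.sub(r"[^a-z0-9]", "", t) for t in text.lower().split())
    return [t for t in tokens if t]


def term_recall(query, transcript, threshold=0.8):
    """Fraction of query words that approximately occur in the transcript.

    A query word counts as found if some transcript word has a
    SequenceMatcher similarity ratio of at least `threshold` (hypothetical value).
    """
    q_tokens = normalize(query)
    d_tokens = normalize(transcript)
    if not q_tokens or not d_tokens:
        return 0.0
    hits = sum(
        1
        for q in q_tokens
        if any(SequenceMatcher(None, q, d).ratio() >= threshold for d in d_tokens)
    )
    return hits / len(q_tokens)


# The OCR string from the slide, copied verbatim.
ocr = "H,arry Hertz a Director aro 7 wa-,i,,ty Program ,Harry Hertz a Director"

print(term_recall("Harry Hertz", ocr))      # both names are recovered
print(term_recall("quality program", ocr))  # "quality" is too damaged to match
```

The second query shows the limit of this approach: when OCR destroys most of a word, approximate matching cannot recover it, which is one reason to combine OCR with speech and other evidence.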

What do we lack?
[Diagram: User, Information Need, Keyword, Video Database, Video Structure, Annotation, Speech Transcription, OCR Transcription, Image Information]

Image-based Retrieval
[Diagram: User, Information Need, Keyword, Query Images, Video Database, Video Structure, Text Information, Image Feature]

Image-based Retrieval
[Diagram: User, Information Need, Keyword, Query Images, Video Database, Video Structure, Text Information, Image Feature (Low-level Feature, High-level Feature)]

More Evidence in Video Retrieval
[Diagram: User, Information Need, Video Database, Video Structure; query evidence paired with database evidence: Keyword / Text Information, Query Images / Image Information, Motion / Motion Information, Audio / Audio Information]

MPEG-7: The Objective
Standardize object-based description tools for various types of audiovisual information, allowing fast and efficient content searching, filtering and identification, and addressing a large range of applications.
New objective for MPEG:
MPEG-1, -2 and -4 represent the content itself (‘the bits’).
MPEG-7 should represent information about the content (‘the bits about the bits’).

This is the scope of MPEG-7
[Diagram: Description creation, description, Description consumption]
Not the description creation.
Not the description consumption.
Just the description!
The goal is to define the minimum that enables interoperability.

MPEG-7 Terminology: Descriptor
Descriptor (D): A Descriptor is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
Examples:
Feature            Descriptor
Color              Histogram of Y, U, V components
Shape              ART moments
Motion             Motion field, coefficients of a model
Audio frequency    Average frequency components
Title              Text
Annotation         Text
Genre              Text, index in a thesaurus

Outline
Introduction
Text-Based MMIR
Content-Based Retrieval
Multimedia IR Model
Image Retrieval
Audio Retrieval
Video Retrieval
Conclusions

Conclusions
Simple image retrieval is commercially available: color histograms, texture, limited shape information.
Segmentation-based retrieval is still in the lab. Keep an eye on the Berkeley group.
Limited audio indexing is practical now: audio feature matching, answering machine detection.

Conclusions
Multimedia IR
Text: good solutions exist.
Video, image, sound: a lot of work to do.

Conclusions
The goal of content-based video retrieval is to build a more intelligent video retrieval engine via semantic meaning.
Many applications in daily life.
Combine evidence from different aspects.
Hot research topic, but few business systems.
State-of-the-art performance is still unacceptable for normal users; there is room to improve.

Conclusions
Problems with Content-Based MMIR:
Must have an example image.
The example image is 2-D, hence only that view of the object will be returned. For example, if we give a side-view image of a horse, it will not return images from the front or behind.
Large amount of image data.
Similar colour histogram does not equal similar image.
Usually the best results come from a combination of both text and content searching.
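The point that a similar colour histogram does not imply a similar image can be shown with a toy example. The sketch below assumes images are small nested lists of RGB tuples and uses histogram intersection as the similarity measure; both choices, and the names `color_histogram` and `histogram_intersection`, are illustrative assumptions rather than the method of any particular system.

```python
from collections import Counter


def color_histogram(image):
    """Count how often each colour occurs, ignoring where the pixels are."""
    return Counter(pixel for row in image for pixel in row)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: shared colour mass divided by the size of h1."""
    total = sum(h1.values())
    if total == 0:
        return 0.0
    return sum(min(count, h2[color]) for color, count in h1.items()) / total


RED, BLUE = (255, 0, 0), (0, 0, 255)

# Two 2x4 toy images with the same colours but a different spatial layout.
left_right = [[RED, RED, BLUE, BLUE],
              [RED, RED, BLUE, BLUE]]
checkerboard = [[RED, BLUE, RED, BLUE],
                [BLUE, RED, BLUE, RED]]

similarity = histogram_intersection(
    color_histogram(left_right), color_histogram(checkerboard)
)
print(similarity)                  # histograms are identical
print(left_right == checkerboard)  # yet the images differ
```

Because the histogram discards pixel positions, the two layouts are indistinguishable to it, which is why histogram matching is usually combined with texture, shape, or text evidence.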

Conclusions
Combination of multi-modal results
Different modalities have different characteristics:
Text-based information: better for middle- and high-level queries.
Image-based information: better for low- and middle-level queries.
Combination of multi-modal information.

Conclusions
Challenging research questions.
Draws on computer vision, audio processing, natural language analysis, unstructured document analysis, information retrieval, information visualisation, computer human interaction, artificial intelligence.