DL: Lesson 11 – Multimedia Search. Luca Dini (dini@celi.it)
MPEG-4: Content-based Encoding
– Encodes objects that can be tracked from frame to frame.
– Video frames are layered into video object planes (VOPs).
– Each VOP is segmented and coded separately throughout the shot; the background is encoded only once.
– Objects are not defined by what they represent, only by their motion, shape, color and texture, which allows them to be tracked through time.
– Objects and their backgrounds are brought together again by the decoder (a toy recomposition sketch follows below).
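To make the decoder-side recomposition concrete, here is a minimal sketch in NumPy: a background plane decoded once for the shot and one object plane pasted back through its segmentation mask. It illustrates the idea only; the function name, array shapes and mask representation are assumptions, not the actual MPEG-4 bitstream machinery.

```python
import numpy as np

def composite(background, object_planes):
    """Recombine a decoded background with decoded video object planes (VOPs).

    background:    H x W x 3 array, decoded once for the whole shot
    object_planes: list of (pixels, mask) pairs, where pixels is H x W x 3
                   and mask is an H x W boolean array (the object's segmented shape)
    """
    frame = background.copy()
    for pixels, mask in object_planes:
        frame[mask] = pixels[mask]   # paste the object over the background
    return frame

# toy example: a grey background and one bright rectangular "object"
bg = np.full((240, 320, 3), 128, dtype=np.uint8)
obj = np.zeros_like(bg)
obj[100:140, 150:200] = 255
mask = np.zeros(bg.shape[:2], dtype=bool)
mask[100:140, 150:200] = True

frame = composite(bg, [(obj, mask)])
```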
MPEG-4: Content-based encoding. Figure from Ghanbari, M. (1999), Video Coding: An Introduction to Standard Codecs: the video object planes (VOPs), with the background encoded only once.
AMOS: Tracking Objects Beyond the Frame http://www.ctr.columbia.edu/~dzhong/rtrack/demo.htm
“Are We Doing Multimedia?”* – Multimodal Indexing
Ramesh Jain: “To solve multimedia problems, we should use as much context as we can.”
– Visual (frames, shots, scenes)
– Audio (soundtrack: speech recognition)
– Text (closed captions, subtitles)
– Context (hyperlinks, etc.)
*IEEE Multimedia, Oct.-Nov. 2003. http://jain.faculty.gatech.edu/media_vision/doing_mm.pdf
Snoek, C. & Worring, M. (2005). Multimodal Indexing: A Review of the State-of-the-art. Multimedia Tools & Applications, January 2005.
Settings, objects, people; modalities: video, audio, text.
Building Video Indexes
Same as any indexing process: decide
– What to index: granularity
– How to index: modalities (images, audio, etc.)
– Which features?
Discover spatial and temporal structure: deconstructing the authoring process.
Construct data models for access (an illustrative shot-level index structure is sketched below).
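The sketch below shows one way such a data model could be laid out at shot granularity, with fields for the visual, audio and text modalities; the class and field names are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class Shot:
    """One indexing unit at shot granularity, carrying per-modality features."""
    shot_id: str
    start_frame: int
    end_frame: int
    keyframes: list = field(default_factory=list)        # visual modality: keyframe numbers
    transcript: str = ""                                  # audio modality: speech recognition output
    closed_captions: str = ""                              # text modality
    visual_features: dict = field(default_factory=dict)   # e.g. {"color_hist": [...], "texture": [...]}

@dataclass
class Scene:
    scene_id: str
    shots: list = field(default_factory=list)              # temporal structure recovered from the video

@dataclass
class VideoIndex:
    video_id: str
    scenes: list = field(default_factory=list)

# usage: one shot indexed by its keyframe, transcript, and captions
index = VideoIndex("lecture01", scenes=[
    Scene("s1", shots=[Shot("s1_sh1", 0, 248, keyframes=[120],
                            transcript="welcome to the course",
                            closed_captions="WELCOME TO THE COURSE")])
])
```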
Building Video Indexes: Structured Modeling
Predict relationships between shots via pattern recognition:
– Hidden Markov Models
– SVMs (support vector machines)
– Neural networks
– Relevance feedback via machine learning
(A toy SVM example follows below.)
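As a hedged illustration of the SVM option (not the approach of any particular system), the sketch below trains scikit-learn's SVC on hand-made features for adjacent shot pairs and predicts whether they belong to the same scene; the feature choice and labels are assumptions invented for the example.

```python
import numpy as np
from sklearn.svm import SVC

# toy features for adjacent shot pairs:
# [color-histogram distance, time gap in seconds, audio-class match]
X = np.array([
    [0.10,  1.0, 1.0],   # similar color, short gap, same audio class -> same scene
    [0.15,  2.0, 1.0],
    [0.80, 30.0, 0.0],   # very different color, long gap -> scene boundary
    [0.70, 25.0, 0.0],
])
y = np.array([1, 1, 0, 0])   # 1 = same scene, 0 = new scene

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)

print(clf.predict([[0.12, 1.5, 1.0]]))   # expected: same scene (1)
```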
Data Models for Video IR
– Based on text (DBMS, MARC)
– Semi-structured (video + XML or hypertext): MPEG-7, SMIL
– Based on context: Yahoo Video, Blinkx, Truveo
– Multimodal: Marvel, Virage
Virage VideoLogger™
– SMPTE timecode
– Keyframes
– Text or audio extracted automatically
– Mark & annotate clips
Annotation: Metadata Schemes
– MPEG-7
– MPEG-21
– METS
– SMIL
IBM MPEG-7 Annotation Tool
MPEG-7 Output from the IBM Annotation Tool (example): a segment description with media time points T00:00:27:20830F30000 and T00:00:31:23953F30000, a shot duration of 248 frames, the annotation “Indoors”, and a spatial locator given as 14 15 351 238 (location and dimensions in pixels).
The MPEG Group
– Moving Picture Experts Group
– Established by ISO (the International Organization for Standardization) in 1988
– Four standards: MPEG-1, MPEG-2, MPEG-4 and MPEG-7
MPEG-1
– Standardized in 1992
– Gave good-quality audio and video
– Usually low-resolution video at around 30 frames per second
– Three audio layers
MPEG-2
– Standardized in 1996
– The codec of DVD
– Very good quality audio and video
– Uses high resolution and high bit rates
MPEG-4
– Standardized in 1998
– Based on MPEG-1, MPEG-2 and QuickTime
– First real multimedia representation standard
– Initially intended for videoconferencing
– Several different versions
MPEG-7
– Standardized in 2001
– Not a video codec
– Formally the “Multimedia Content Description Interface”
– Builds on the earlier MPEG standards
– Developed to simplify searching for media elements
Standardization Progress (ITU-T, ISO/IEC and joint standards)
– H.261 (1990): videophone over PSTN/B-ISDN; low quality, 64 kbps ~ 1.5 Mbps
– JPEG (1992), MPEG-1 (1992): Video CD, Internet; VHS quality, < 1.5 Mbps, stereo audio
– MPEG-2 (1994), H.263 (1995): digital broadcasting, DVD, digital camcorders; high quality, 1.5 ~ 80 Mbps, 5.1-channel audio
– MPEG-4 (1999), H.26L (2001): content production, Internet, multimedia broadcast; various quality levels, synthetic audio/video, user interactivity
– MPEG-7 (2001): content search, Internet, DSM, broadcasting; user interactivity, data compression, content manipulation
MPEG-7 Scope
Diversity of applications: multimedia, music/audio, graphics, video
Descriptors (Ds)
– Describe basic characteristics of audiovisual content
– Examples: shape, color, texture, … (a simplified color descriptor is sketched below)
Description Schemes (DSs)
– Describe combinations of descriptors
– Example: Spoken Content
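To give a feel for what a low-level color descriptor captures, here is a much-simplified sketch in NumPy: a coarse, normalized RGB histogram computed from one frame. It is an assumed illustration, not the normative MPEG-7 ScalableColor or ColorLayout computation.

```python
import numpy as np

def color_descriptor(frame, bins=4):
    """Coarse RGB histogram: quantize each channel into `bins` levels,
    count the joint bins, and normalize so the values sum to 1."""
    quantized = (frame // (256 // bins)).reshape(-1, 3).astype(int)
    idx = (quantized[:, 0] * bins + quantized[:, 1]) * bins + quantized[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
d = color_descriptor(frame)                 # 64-dimensional descriptor value
print(d.shape, round(float(d.sum()), 3))    # (64,) 1.0
```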
Scope of the standard: description production (extraction) → standard description → description consumption. Only the standard description itself is the normative part of MPEG-7.
MPEG-7 does not specify:
– How to extract descriptions
– How to use descriptions
– How similarity between contents is measured (one possible, non-normative matching function is sketched below)
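Since matching is left to applications, any similarity measure over descriptor values can be used; the sketch below shows one common, assumed choice (histogram intersection between two normalized histogram descriptors), purely as an example of what the standard deliberately leaves open.

```python
import numpy as np

def histogram_intersection(d1, d2):
    """One possible (non-normative) similarity between two normalized
    histogram descriptors: 1.0 = identical, 0.0 = completely disjoint."""
    return float(np.minimum(d1, d2).sum())

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
print(histogram_intersection(a, b))   # 0.9
```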
Descriptions
Annotations – information that cannot be deduced from the content
– recording date & conditions, author, copyright, viewing age, etc.
Features – information that is present in the content
– low-level features: color, texture, shape, key, mood, tempo, etc.
– high-level features: composition, event, action, situation, etc.
MPEG-7 Terminology
Data – audiovisual information that will be described using MPEG-7
Feature – a distinctive part or characteristic of the data (e.g. color, shape, ...)
Descriptor – associates a representation value with one or more features
Description Scheme – defines the structure and semantics of descriptors and of their relationships, in order to model data content
Description Definition Language (DDL) – a language for specifying Description Schemes
Coded description – a representation of a description that allows efficient storage and transmission
Components
1) MPEG-7 Systems
2) MPEG-7 Description Definition Language
3) MPEG-7 Visual
4) MPEG-7 Audio
5) MPEG-7 Multimedia DSs
6) MPEG-7 Reference Software
7) MPEG-7 Conformance
Visual Descriptors
– Color descriptors
– Texture descriptors
– Shape descriptors
– Motion descriptors for video (a rough motion-activity sketch follows below)
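In the same spirit as the color sketch earlier, a motion descriptor can be approximated very roughly by how much consecutive frames differ; the sketch below is an assumed stand-in for an intensity-of-motion measure (in the vein of MPEG-7 Motion Activity), not the standardized computation.

```python
import numpy as np

def motion_activity(frames):
    """Mean absolute luminance difference between consecutive frames (0..255 scale)."""
    gray = frames.mean(axis=-1)               # T x H x W luminance approximation
    diffs = np.abs(np.diff(gray, axis=0))     # frame-to-frame change
    return float(diffs.mean())

# toy clip: 10 random frames -> high "activity"; 10 identical frames -> zero
noisy = np.random.randint(0, 256, size=(10, 120, 160, 3)).astype(float)
still = np.repeat(noisy[:1], 10, axis=0)
print(motion_activity(noisy), motion_activity(still))
```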
Colors
Etc… http://mp7.watson.ibm.com/marvel/