DL: Lesson 11 – Multimedia Search – Luca Dini
MPEG-4: Content-based Encoding
Encodes objects that can be tracked from frame to frame. Video frames are built from layers of video object planes (VOPs). Each VOP is segmented and coded separately throughout the shot; the background is encoded only once. Objects are not labelled with what they represent – only their motion, shape, color and texture are coded, which is enough to track them through time. Objects and their background are recomposed by the decoder.
MPEG-4: Content-based encoding – figure from Ghanbari, M. (1999), Video Coding: An Introduction to Standard Codecs, showing a video object plane (VOP) extracted from a frame and a background that is encoded only once.
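The composition step performed by the decoder can be illustrated with a few lines of code. The sketch below is not from any MPEG-4 implementation; it simply pastes object planes (image + binary shape mask + position) over a background that was decoded once, which is the essence of content-based composition.

```python
# Minimal sketch of decoder-side composition of video object planes (VOPs).
# Names, shapes and values are illustrative, not from an MPEG-4 reference decoder.
import numpy as np

def composite(background, vops):
    """Paste each VOP (image + binary shape mask + position) onto the background."""
    frame = background.copy()
    for image, mask, (y, x) in vops:
        h, w = mask.shape
        region = frame[y:y+h, x:x+w]
        # Keep background pixels where the mask is 0, object pixels where it is 1.
        frame[y:y+h, x:x+w] = np.where(mask[..., None] == 1, image, region)
    return frame

# Usage: one static background reused for every frame, only the VOPs change.
background = np.zeros((120, 160, 3), dtype=np.uint8)          # decoded once per shot
ball_img  = np.full((20, 20, 3), 255, dtype=np.uint8)         # a moving white square
ball_mask = np.ones((20, 20), dtype=np.uint8)
frame_10 = composite(background, [(ball_img, ball_mask, (50, 30))])
frame_11 = composite(background, [(ball_img, ball_mask, (50, 34))])  # object moved, background unchanged
```

Because only the VOPs change between frames, the background bits are spent once per shot.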
AMOS: Tracking Objects Beyond the Frame
“Are We Doing Multimedia?”* – Multimodal Indexing
Ramesh Jain: “To solve multimedia problems, we should use as much context as we can.”
– Visual (frames, shots, scenes)
– Audio (soundtrack: speech recognition)
– Text (closed captions, subtitles)
– Context (hyperlinks, etc.)
*IEEE Multimedia, Oct-Nov
Snoek, C. & Worring, M. (2005). Multimodal Video Indexing: A Review of the State-of-the-art. Multimedia Tools and Applications, January 2005. Video content: settings, objects, people. Modalities: video, audio, text.
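One simple way to “use as much context as we can” is late fusion: score a query against each modality’s index separately, then combine the scores into one ranking. The weights, shot identifiers and score values below are invented for the example; real systems learn the weights or use more elaborate fusion schemes.

```python
# Illustrative late fusion of per-modality relevance scores for one query.
# Shot IDs, scores and weights are invented for the example.
def fuse(scores_by_modality, weights):
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights.get(modality, 0.0)
        for shot_id, score in scores.items():
            fused[shot_id] = fused.get(shot_id, 0.0) + w * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

scores = {
    "visual": {"shot_12": 0.8, "shot_40": 0.3},   # e.g. color/texture match
    "audio":  {"shot_12": 0.2, "shot_40": 0.9},   # e.g. speech-recognition transcript match
    "text":   {"shot_40": 0.7},                   # e.g. closed captions / subtitles
}
weights = {"visual": 0.5, "audio": 0.3, "text": 0.2}
print(fuse(scores, weights))   # ranked list of shots combining all modalities
```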
Building Video Indexes
Same as any indexing process…decide:
– What to index: granularity
– How to index: modalities (images, audio, etc.)
– Which features?
Discover spatial and temporal structure: deconstructing the authoring process
Construct data models for access
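As a concrete (and purely illustrative) data model, the granularity decision often yields a hierarchy such as video → scene → shot → keyframe, with features attached at the shot or keyframe level. The field names below are assumptions made for the sketch, not part of any particular standard or product.

```python
# A sketch of one possible access structure: video -> scenes -> shots -> keyframes,
# with per-shot features from several modalities. Field names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Keyframe:
    timecode: str                    # e.g. SMPTE "00:01:23:10"
    color_histogram: List[float] = field(default_factory=list)

@dataclass
class Shot:
    start_frame: int
    end_frame: int
    keyframes: List[Keyframe] = field(default_factory=list)
    transcript: str = ""             # from speech recognition / closed captions
    annotations: List[str] = field(default_factory=list)   # e.g. ["Indoors"]

@dataclass
class Scene:
    shots: List[Shot] = field(default_factory=list)

@dataclass
class Video:
    title: str
    scenes: List[Scene] = field(default_factory=list)
```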
Building Video Indexes: Structured Modeling
Predict relationships between shots using pattern recognition:
– Hidden Markov Models
– SVMs (support vector machines)
– Neural networks
Relevance feedback via machine learning
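As a minimal sketch of the SVM option, the toy example below classifies a pair of adjacent shots as “same scene” or “scene change” from hand-made similarity features. It assumes scikit-learn is available; in practice, real similarities (color, audio, transcript) would replace the invented numbers.

```python
# Sketch: classifying the relationship between adjacent shots with an SVM.
# Feature values and labels here are toy data.
from sklearn.svm import SVC

# Each row: [color similarity, audio similarity, transcript overlap] for a shot pair.
X_train = [
    [0.9, 0.8, 0.6],   # same scene
    [0.8, 0.7, 0.5],   # same scene
    [0.2, 0.1, 0.0],   # scene change
    [0.3, 0.2, 0.1],   # scene change
]
y_train = ["same_scene", "same_scene", "scene_change", "scene_change"]

clf = SVC(kernel="rbf", probability=True)
clf.fit(X_train, y_train)
print(clf.predict([[0.85, 0.75, 0.4]]))   # likely "same_scene"
```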
Data Models for Video IR
– Based on text (DBMS, MARC)
– Semi-structured (video + XML or hypertext): MPEG-7, SMIL
– Based on context: Yahoo Video, Blinkx, Truveo
– Multimodal: Marvel, Virage
Virage VideoLogger™
– SMPTE timecode
– Keyframes
– Text or audio extracted automatically
– Mark & annotate clips
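Aligning keyframes and annotations against SMPTE timecode usually comes down to converting HH:MM:SS:FF into an absolute frame count. The sketch below assumes non-drop-frame timecode at a fixed frame rate; it is not VideoLogger’s own API.

```python
# Sketch: converting a non-drop-frame SMPTE timecode (HH:MM:SS:FF) to an absolute
# frame number, as a logger might do when aligning keyframes with annotations.
def smpte_to_frames(timecode: str, fps: int = 25) -> int:
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

print(smpte_to_frames("00:00:31:12", fps=25))   # 787
```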
Annotation: Metadata Schemes
– MPEG-7
– MPEG-21
– METS
– SMIL
IBM MPEG-7 Annotation Tool
MPEG-7 Output from IBM Annotation Tool (example description): a shot spanning T00:00:27:20830F to T00:00:31:23953F, annotated “Indoors”. Callouts in the figure mark the duration of the shot in frames, the location and dimensions of the spatial locator in pixels, and the free-text annotation.
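MPEG-7 expresses such time points with the mediaTimePoint datatype, conventionally written as Thh:mm:ss:nnnFNNN, where nnn counts fractions of a second out of NNN fractions per second. The parser below is a sketch under that assumption about the string layout; the example value is invented rather than taken from the tool’s output.

```python
# Sketch: turning an MPEG-7 mediaTimePoint of the form "Thh:mm:ss:nnnFNNN"
# (nnn = fraction count, NNN = fractions per second) into seconds.
# The example value below is invented.
import re

def media_time_point_to_seconds(value: str) -> float:
    match = re.fullmatch(r"T(\d{2}):(\d{2}):(\d{2}):(\d+)F(\d+)", value)
    if not match:
        raise ValueError(f"unexpected mediaTimePoint: {value}")
    hh, mm, ss, frac, per_sec = (int(g) for g in match.groups())
    return hh * 3600 + mm * 60 + ss + frac / per_sec

print(media_time_point_to_seconds("T00:00:27:208F30000"))   # ~27.007 s
```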
The MPEG Group
– Moving Picture Experts Group
– Established within ISO (International Organization for Standardization) in 1988
– Four standards so far: MPEG-1, MPEG-2, MPEG-4 and MPEG-7
MPEG-1
– Standardized in 1992
– Good quality audio and video
– Usually low-resolution video at around 30 frames per second
– Three audio layers
MPEG-2
– Standardized in 1996
– The codec of DVD
– Very good quality audio and video
– Uses high resolution and high bit rates
MPEG-4
– Standardized in 1998
– Builds on MPEG-1, MPEG-2 and QuickTime
– First real multimedia representation standard
– Originally aimed at low-bit-rate applications such as videoconferencing
– Several different versions
MPEG-7
– Standardized in 2001
– Not a video codec: the “Multimedia Content Description Interface”
– Complements the earlier MPEG standards
– Developed to simplify search for media elements
Standardization Progress
– ITU-T: H.261 (1990), H.263 (1995), H.26L (2001)
– ISO/IEC: MPEG-1 (1992), MPEG-4 (1999), MPEG-7 (2001)
– Joint ITU-T / ISO/IEC: JPEG (1992), MPEG-2 (1994)
Application areas and features:
– H.261/H.263: videophone over PSTN and B-ISDN; low quality; 64 kbps ~ 1.5 Mbps
– MPEG-1: Video CD, Internet; VHS quality; < 1.5 Mbps; stereo audio
– MPEG-2: digital broadcasting, DVD, digital camcorder; high quality; 1.5 ~ 80 Mbps; 5.1-channel audio
– MPEG-4: content production, Internet, multimedia broadcast; various quality; synthetic audio/video; user interactivity
– MPEG-7: content search, Internet, DSM, broadcasting; user interactivity
Emphasis shifts from data compression (earlier standards) toward content manipulation and search (MPEG-4, MPEG-7).
MPEG-7 Scope
Diversity of applications – multimedia, music/audio, graphics, video
Descriptors (Ds)
– Describe basic characteristics of audiovisual content
– Examples: shape, color, texture, …
Description Schemes (DSs)
– Describe combinations of descriptors
– Example: spoken content
MPEG-7 Scope (diagram): description production (feature extraction) → standard description → description consumption (search, filtering). Only the standard description is the normative part of the MPEG-7 standard. MPEG-7 does not specify:
– How to extract descriptions
– How to use descriptions
– How to measure similarity between contents
Descriptions
Annotations – cannot be deduced from the content itself: recording date & conditions, author, copyright, viewing age, etc.
Features – information present in the content:
– low-level features: color, texture, shape, key, mood, tempo, etc.
– high-level features: composition, event, action, situation, etc.
MPEG-7 Terminology
Data – audiovisual information that will be described using MPEG-7
Feature – a distinctive characteristic of the data (e.g. color, shape, …)
Descriptor – associates a representation value with one or more features
Description Scheme – defines the structure and semantics of descriptors and their relationships, to model data content
Description Definition Language (DDL) – a language for specifying Description Schemes (and new Descriptors)
Coded description – a representation of a description allowing efficient storage and transmission
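To make the terminology concrete, the fragment below builds a tiny MPEG-7-style description in XML: a Description (an instance of a Description Scheme) containing a text-annotation descriptor for a video. The element names mirror common MPEG-7 usage, but the fragment is illustrative, not schema-validated.

```python
# Sketch: how the terminology maps to XML in practice. Element names below follow
# common MPEG-7 usage (Mpeg7 / Description / Video / TextAnnotation), but this is an
# illustrative fragment, not a conformant, schema-validated MPEG-7 document.
import xml.etree.ElementTree as ET

mpeg7 = ET.Element("Mpeg7")
description = ET.SubElement(mpeg7, "Description")            # instance of a Description Scheme
video = ET.SubElement(description, "Video")                   # the data being described
annotation = ET.SubElement(video, "TextAnnotation")           # a descriptor
ET.SubElement(annotation, "FreeTextAnnotation").text = "Indoors"   # the representation value

print(ET.tostring(mpeg7, encoding="unicode"))
```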
Components
1) MPEG-7 Systems
2) MPEG-7 Description Definition Language
3) MPEG-7 Visual
4) MPEG-7 Audio
5) MPEG-7 Multimedia DSs
6) MPEG-7 Reference Software
7) MPEG-7 Conformance
Visual Descriptors
– Color descriptors
– Texture descriptors
– Shape descriptors
– Motion descriptors for video
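As a stand-in for a color descriptor, the sketch below computes a coarsely quantized RGB histogram for a frame. The normative MPEG-7 color descriptors (e.g. ScalableColor, ColorLayout) are defined differently, so this only shows what a low-level color feature vector looks like and how it could be compared across keyframes.

```python
# Sketch: a simple quantized RGB histogram as a stand-in for an MPEG-7 color descriptor
# (the normative ScalableColor descriptor, for example, uses an HSV histogram with a
# Haar transform; this toy version is for illustration only).
import numpy as np

def color_histogram(image: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """image: HxWx3 uint8 array; returns a normalized histogram with bins_per_channel**3 bins."""
    quantized = (image.astype(np.uint16) * bins_per_channel) // 256            # 0..bins-1 per channel
    codes = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) * bins_per_channel + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)          # stand-in for a decoded keyframe
descriptor = color_histogram(frame)
print(descriptor.shape)   # (64,) -- comparable across keyframes with e.g. L1 distance
```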
Colors
Etc…