Content-based Video Indexing and Retrieval

Motivation There has been tremendous growth in the amount of digital video data in recent years, yet there is a lack of tools to classify and retrieve video content. A gap exists between low-level features and high-level semantic content. Making machines understand video is both important and challenging.

Motivation Necessity of a Video Database Management System: the amount of video data captured keeps increasing, and we need an efficient way to handle multimedia data. Traditional databases vs. video databases: a traditional database has the tuple as its basic unit of data, whereas a video database has the shot as its basic unit of data.

Video Management Video consists of text, audio, and images, all of which change over time.

Video Data Management Metadata-based method Text-based method Audio-based method Content-based method Integrated approach

Metadata-based Method Video is indexed and retrieved based on structured metadata information using a traditional DBMS. Examples of metadata are the title, author, producer, director, date, and type of video.

Text-based Method Video is indexed and retrieved based on associated subtitles (text) using traditional IR techniques for text documents. Transcripts and subtitles already exist in many types of video, such as news and movies, eliminating the need for manual annotation.

Text-based Method The basic method is to use human annotation. Indexing can be done automatically where subtitles or transcriptions exist (e.g., the BBC aimed for 100% of its output to be subtitled by 2008), and speech recognition can be applied to archive material.

Text-based Method Keyword search based on subtitles; content based. Live demo: http://km.doc.ic.ac.uk/vse/

Audio-based Method Video is indexed and retrieved based on associated soundtracks using the methods for audio indexing and retrieval. Speech recognition is applied if necessary.

Content-based Method There are two approaches to content-based video retrieval: treat the video as a collection of images, or divide video sequences into groups of similar frames.

Integrated Approach Two or more of the above techniques are used as a combination in order to provide more flexibility in video retrieval.

Video Data Management Video Parsing: manipulation of the whole video to break it down into key frames. Video Indexing: retrieving information about the frames for indexing in a database. Video Retrieval and Browsing: users access the database through queries or through interactions.

Video Parsing Scene: a single dramatic event taken by a small number of related cameras. Shot: a sequence taken by a single camera. Frame: a still image.

Video Parsing Detection and identification of meaningful segments of video: Video → (obvious cuts) → Scenes → (shot boundary analysis) → Shots → (key frame analysis) → Frames.

System overview

Video Shot Definition A shot is a contiguous recording of one or more video frames depicting a continuous action in time and space. During a shot, the camera may remain fixed, or it may exhibit motions such as panning, tilting, zooming, and tracking.

Video Shot Detection Segmentation is a process for dividing a video sequence into shots. Consecutive frames on either side of a camera break generally display a significant quantitative change in content. We need a suitable quantitative measure that captures the difference between two frames.

Video Shot Detection Pixel differences: tend to be very sensitive to camera motion and minor illumination changes. Global histogram comparisons: produce relatively accurate results. Local histogram comparisons: produce the most accurate results. Motion vectors: produce more false positives than histogram-based methods. DCT coefficients from MPEG files: also produce more false positives than histogram-based methods.

Shot Boundary Detection Frame dissimilarity: the normalized color histogram difference is adopted as the measure of dissimilarity (distance) between frames: D(f_i, f_j) = Σ_b |h_ib − h_jb| / N, where h_ib is bin b of the histogram of frame i and N is the total number of pixels in each frame. Shot dissimilarity: the minimum dissimilarity between any two frames of the two shots: D(S_i, S_j) = min_{k,l} D(f_i^k, f_j^l), where f_i^k is frame k of shot i.
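The two measures above can be sketched as follows (a minimal NumPy sketch; per-frame color histograms are assumed to be precomputed, with each shot represented as a list of histograms):

```python
import numpy as np

def frame_distance(hist_i, hist_j, num_pixels):
    """D(f_i, f_j) = sum_b |h_ib - h_jb| / N, the normalized histogram difference."""
    return np.abs(hist_i - hist_j).sum() / num_pixels

def shot_distance(shot_i, shot_j, num_pixels):
    """D(S_i, S_j) = min over all frame pairs (k, l) of D(f_i^k, f_j^l)."""
    return min(frame_distance(hi, hj, num_pixels)
               for hi in shot_i for hj in shot_j)
```

Note that shot dissimilarity is the minimum over all frame pairs, so two shots sharing even one similar frame are considered close.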

Shot boundary detection Split the video into meaningful segments. Traditional approaches look at inter-frame differences. Common problems: gradual changes and rapid motion. Our solution, inspired by Pye et al. and Zhang et al.: a moving average over a greater range.

Shot boundary detection At each frame, compute four distance measures (d2, d4, d8, d16) across ranges of 2, 4, 8, and 16 frames respectively. Coincident peaks indicate shot boundaries; the d4 difference is used to find transition start/end times.
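The multi-range scheme above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: `hists` is an assumed list of per-frame histograms, and "coincident peaks" is simplified here to all four measures exceeding a common threshold at the same frame.

```python
import numpy as np

def multi_range_distances(hists, ranges=(2, 4, 8, 16)):
    """For each frame t, histogram distance to the frame r steps back (d2..d16)."""
    n = len(hists)
    d = {r: np.zeros(n) for r in ranges}
    for r in ranges:
        for t in range(r, n):
            d[r][t] = np.abs(hists[t] - hists[t - r]).sum()
    return d

def boundary_candidates(d, threshold):
    """Frames where all range measures exceed the threshold at once
    (a crude stand-in for detecting coincident peaks)."""
    ranges = sorted(d)
    n = len(d[ranges[0]])
    return [t for t in range(n)
            if all(d[r][t] > threshold for r in ranges)]
```

The wider ranges (d8, d16) smooth out rapid motion within a shot, while requiring agreement across all ranges suppresses isolated spikes.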

SBD examples: a cut and a gradual transition.

Video Indexing and Retrieval Based on representative frames Based on motion information Based on objects

Representative Frames The most common way of creating a shot index is to use a representative frame to represent each shot. Features of this frame are extracted and indexed based on color, shape, and texture (as in image retrieval).

Representative Frames If shots are quite static, any frame within the shot can be used as a representative. Otherwise, more effective methods should be used to select the representative frame.

Representative Frames Two issues: how many frames must be selected from each shot, and how to select these frames.

Representative Frames How many frames per shot? Three methods: (1) One frame per shot; this does not consider the shot's length or content changes. (2) The number of representatives depends on the length of the shot; content is still not handled properly. (3) Divide shots into subshots and select one representative frame from each subshot; both length and content are taken into account.

Representative Frames Now we know the number of representative frames per shot. The next step is to determine HOW to select these frames.

Representative Frames Definition: a SEGMENT is a shot, a second of video, or a subshot.

Representative Frames Method I The first frame is selected from the segment. This is based on the observation that a segment is usually described by using the first few frames.

Representative Frames Method II An average frame is defined so that each pixel in this frame is the average of pixel values at the same grid point in all the frames of the segment. Then the frame within the segment that is most similar to the average frame is selected as the representative.

Representative Frames Method III The histograms of all the frames in the segment are averaged. The frame whose histogram is closest to this average histogram is selected as the representative frame of the segment.
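Method III can be sketched in a few lines (per-frame histograms are assumed precomputed; "closest" is taken here as the smallest L1 distance, consistent with the frame-distance measure used earlier):

```python
import numpy as np

def representative_frame(hists):
    """Method III: return the index of the frame whose histogram is
    closest (L1 distance) to the segment's average histogram."""
    avg = np.mean(hists, axis=0)                # average histogram of the segment
    dists = [np.abs(h - avg).sum() for h in hists]
    return int(np.argmin(dists))
```

Unlike Method II, this never averages pixel values directly, so it avoids producing a blurred, non-existent "average image" as an intermediate.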

Representative Frames Method IV Each frame is divided into background and foreground objects. A large background is then constructed from the background of all frames, and then the main foreground objects of all frames are superimposed onto the constructed background.

Foreground and Background Variance Method Overview Videos are divided into categories along with their shots. We calculate the foreground variance, background variance, and average color of each shot, and store them in the database. Shots are retrieved by comparing these foreground variance, background variance, and average color values. What are the background and foreground? The background is the area outside the primary object; the foreground is the area where the primary object can be found.

Foreground and Background Variance Method Choosing foreground and background: a border of width W = C × (1/10), one tenth of the frame dimension C, around the frame is taken as background; the interior region is the foreground.

Foreground and Background Variance Method Steps for calculating foreground variance values: Take each pixel of the foreground area and access its individual red, green, and blue values. Calculate the average red, average green, and average blue values for the foreground. Repeat this process for all frames of the shot. Using these per-frame foreground averages, calculate the variance of red, green, and blue.

Foreground and Background Variance Method The formula for the variance of the red channel for the foreground is VFgRed = Σ (Xi − Mean)² / (N − 1), where Xi are the average red values of the foreground over all frames and N is the total number of frames. The same process is repeated for green and blue to find VFgGreen and VFgBlue, and along the same lines we find the background variance values VBgRed, VBgGreen, and VBgBlue.
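The border split and the variance computation above can be sketched as follows (an illustrative NumPy sketch; the exact border geometry in the slides' figure is an assumption, here a one-tenth-width border on all four sides):

```python
import numpy as np

def split_fg_bg(frame):
    """Split a frame (H x W x 3 RGB array) into foreground (interior) and
    background (a border one tenth of each dimension wide) pixel lists."""
    h, w = frame.shape[:2]
    bh, bw = max(h // 10, 1), max(w // 10, 1)
    fg = frame[bh:h - bh, bw:w - bw].reshape(-1, 3)
    mask = np.ones((h, w), dtype=bool)
    mask[bh:h - bh, bw:w - bw] = False      # True only on the border
    return fg, frame[mask]

def region_variances(frames, region="fg"):
    """Per-channel variance (divisor N-1, as in the slides' formula) of the
    per-frame average R, G, B values of the chosen region."""
    means = []
    for frame in frames:
        fg, bg = split_fg_bg(frame)
        pixels = fg if region == "fg" else bg
        means.append(pixels.mean(axis=0))   # this frame's average R, G, B
    return np.array(means).var(axis=0, ddof=1)   # (VFgRed, VFgGreen, VFgBlue) or VBg*
```

`ddof=1` makes NumPy divide by N − 1 rather than N, matching the formula in the text.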

Foreground and Background Variance Method Steps for calculating average color values: Access each pixel of each frame and read its individual color values. Sum the red, green, and blue values of all pixels separately. To calculate the average red for one frame, divide the sum of the red pixel values by the total number of pixels in the frame. To calculate the average red for the entire shot (AvgRed), divide the sum of the per-frame red averages by the total number of frames.

Foreground and Background Variance Method Similarly we calculate the AvgGreen and AvgBlue values for the entire shot. In total we have nine different values, which we store in the database. To retrieve similar shots, we compare these nine database values to the corresponding values of the query shot.

Foreground and Background Variance Method We compare the foreground, background, and average color values using the formula Ri = √((∆1−∂1)² + (∆2−∂2)² + (∆3−∂3)²), where ∆1, ∆2, ∆3 are database values and ∂1, ∂2, ∂3 are the query shot's values. We add up the Ri values from comparing the foreground, background, and average color groups. If the sum is less than 100, the shot is retrieved and displayed. Shots are displayed in increasing order of their distance from the query shot.
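The retrieval comparison can be sketched as below, assuming the nine values are ordered as three groups of three (foreground variances, background variances, average colors), which is an assumption about the storage layout:

```python
import numpy as np

def shot_similarity(db_values, query_values, threshold=100.0):
    """Sum the per-group Euclidean distances Ri over the three groups of
    three values; the shot matches if the sum is below the threshold."""
    total = 0.0
    for i in range(0, 9, 3):
        diff = np.asarray(db_values[i:i + 3]) - np.asarray(query_values[i:i + 3])
        total += np.sqrt((diff ** 2).sum())     # Ri for this group
    return total, total < threshold
```

Ranking then amounts to sorting the matching shots by the returned total, smallest first.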

Motion Information Motivation Indexing and retrieval based on representative frames ignores the motion information contained in a video segment.

Motion Information The following parameters are used: Motion content Motion uniformity Motion panning Motion tilting

Motion Information (content) This is a measure of the total amount of motion within a given video; it measures the action content. For example, a video of a talking person has very low motion content, while a violent car explosion typically has high motion content.

Motion Information (uniformity) This is a measure of the smoothness of the motion within a video as a function of time.

Motion Information (panning) This measure captures the panning motion (left to right, right to left motion of a camera).

Motion Information (tilting) This is a measure of the vertical component of the motion within a video. Panning shots have a lower value than videos with a large amount of vertical motion.

Motion Information The above measures are associated either with the entire video or with each shot of the video.
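The slides do not fix formulas for these measures, but motion content can be approximated very simply from frame differences. The sketch below is illustrative only (it uses raw pixel differences rather than the motion vectors a real system would likely use):

```python
import numpy as np

def motion_content(frames):
    """A simple proxy for motion content: the mean absolute pixel difference
    between consecutive frames, averaged over the whole clip. A static
    talking-head shot scores near zero; an explosion scores high."""
    diffs = [np.abs(frames[t].astype(float) - frames[t - 1].astype(float)).mean()
             for t in range(1, len(frames))]
    return float(np.mean(diffs))
```

Motion uniformity could then be derived from the variance of the per-step differences over time, and panning/tilting from the dominant horizontal/vertical components of estimated motion vectors.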

Object-based Retrieval Motivation The major drawback of shot-based video indexing is that while the shot is the smallest unit in the video sequence, it does not lend itself directly to content-based representation.

Object-based Retrieval Any given scene is a complex collection of parts or objects. The location and physical qualities of each object, as well as the interaction with others, define the content of the scene. Object-based techniques try to identify objects and relationships among these objects.

Object-based Retrieval In a still image, object segmentation and identification is normally a difficult task. In a video sequence, however, an object moves as a whole, so we can group pixels that move together into an object. Object segmentation based on this idea is quite accurate.

Object-based Retrieval Object-based video indexing and retrieval can be performed easily when video is compressed using the MPEG-4 object-based coding standard. An MPEG-4 video is composed of one or more video objects (VOs). A VO consists of one or more video object layers (VOLs).

An Architecture for Video Database Systems The architecture is layered: a Raw Video Database holds the sequence of frames, with temporal abstraction into individual frames; a Raw Image Database holds indexed frames and their image features, with spatial abstraction; a Physical Object Database supports object identification and tracking, intra/inter-frame (motion) analysis, and inter-object movement analysis; on top, the spatial semantics of objects (human, building, ...), semantic associations (President, Capitol, ...), and object definitions (events/concepts) provide a formal spatio-temporal specification of events, activities, and episodes for content-based retrieval.

Conclusion Video indexing and retrieval is very important in multimedia database management. Video contains more information than other media types (text, audio, images). Methods include representative frames, motion information, and object-based retrieval.