Video Google – A google approach to Video Retrieval


Introduction
Problem: Retrieve the key frames and shots of a video that contain a particular object or scene, with the ease and accuracy of Google.
Approach: Effectively precompute matches, using a textual analogy.

Architecture
Visual Word, User-End, Storage, Indexing

Video Google – Visual Words
Dhruvan Dileep Nishant Pradeep Pramod Sunil

MSER: Maximally Stable Extremal Regions
A Maximally Stable Extremal Region (MSER) is a connected component of an appropriately thresholded image.

SA: Shape Adapted regions
The Shape Adapted (SA) regions are invariant to affine transformations.
The SA regions tend to be centered on corner-like features.

SIFT: Scale Invariant Feature Transform
Invariant to image scaling and rotation.
Partially invariant to changes in illumination and viewpoint.
128-dimensional descriptor.

Clustering
Aim: To vector quantize descriptors into clusters to be used as visual words.
Clustering techniques:
Agglomerative: O(n^2) space.
K-means: O(n+k) space, O(n*k*e) time complexity.
Fast K-means: Uses the triangle inequality. O(n*k) space. Distance calculations reduced to roughly n rather than n*k*e.
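The vector quantization step above can be sketched with plain K-means. This is a minimal illustration, not the project's implementation: the 2-D descriptors are made-up stand-ins for 128-dimensional SIFT descriptors, the evenly spaced initialization is an assumption chosen to keep the run deterministic, and the names `kmeans` and `quantize` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, centers):
    """Return the visual word id, i.e. the index of the nearest cluster center."""
    return min(range(len(centers)),
               key=lambda c: squared_distance(descriptor, centers[c]))


def kmeans(points, k, iterations=10):
    """Plain K-means; returns the centers and the labels of the last assignment step."""
    # Assumption: deterministic initialization from evenly spaced input points.
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every point goes to its nearest center.
        labels = [quantize(p, centers) for p in points]
        # Update step: every center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Made-up 2-D descriptors forming two well-separated groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, labels = kmeans(descriptors, k=2)
print(labels)
print(quantize((5.1, 5.0), centers))
```

Once the centers are fixed, `quantize` is the only function needed at indexing time: each new descriptor is replaced by the id of its nearest center.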

Statistics
19 half-hour videos. Classification of points: 9 hours.
Region type   Points   Clusters   Time
SA            102823   4000       3 hr 45 min
MSER          508191   2000       2 hr 50 min

Clustering Evaluation

DB and API
Modules: Visual Words, Indexing/Retrieval, UI
Fields: Vid_id, Frame_id, Pos_x, Pos_y, V_id

Results

Future Work
Vocabulary tree for interest point classification.
Increase the visual vocabulary through efficient clustering.

Indexing and Retrieval in Vgoogle
D Pavan Kumar B Rakesh Babu B Naveen Kumar Ankur Jaiswal V Sreekanth P Kowshik J Shashank

Overview
Visual Words, Indexing, Results, Query

Input format
Pre-processing: Video Id, Frame Id, pos_x, pos_y, Visual word Id
Query: Set of visual words in the query rectangle
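As a rough sketch of how the query set could be derived from the pre-processed records, the snippet below keeps the visual words whose positions fall inside the query rectangle. The `Record` type, the function name, and the sample values are hypothetical; only the field names come from the input format above.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One pre-processed record: a visual word at a position in a frame."""
    video_id: int
    frame_id: int
    pos_x: float
    pos_y: float
    visual_word_id: int


def words_in_rectangle(records, video_id, frame_id, xmin, ymin, xmax, ymax):
    """Set of visual word ids located inside the rectangle of the given frame."""
    return {
        r.visual_word_id
        for r in records
        if r.video_id == video_id and r.frame_id == frame_id
        and xmin <= r.pos_x <= xmax and ymin <= r.pos_y <= ymax
    }


# Made-up records for illustration.
records = [
    Record(1, 10, 50.0, 60.0, 17),
    Record(1, 10, 300.0, 200.0, 42),
    Record(1, 10, 70.0, 80.0, 99),
    Record(1, 11, 55.0, 65.0, 5),
]
query_words = words_in_rectangle(records, video_id=1, frame_id=10,
                                 xmin=40, ymin=40, xmax=100, ymax=100)
print(sorted(query_words))
```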

Output format
Retrieved results: Rank, Video Id, Frame Id

Objectives
Efficient indexing
Fast retrieval time
Good recall

Approach …
Removing the common words
Reverse indexing
Ranking of results

Indexing and Retrieval in Document Retrieval
Stop list: Used to remove the common words.
Inverse file structure: An entry for each word in the corpus, followed by a list of all the documents in which it appears.
Spatial consistency ranking: Uses the ordering and separation of words to calculate the relevance of a document.

Stop list
In textual context
Words are extracted from text. Words are filtered based on their level of usefulness. For instance, words that are independent of the subject or event being described are filtered out. Removing such words has no effect on the results.
E.g.: "The way to the school is long and hard when walking in the rain." Removing "the" will have no effect on the result.

Stop list (contd.)
In the current context
Stop list: a list of visual words that occur very often or very rarely.
Determine the stop list boundaries empirically.
Advantages:
Reduces the number of mismatches
Reduces the size of the inverted file
Gives a more meaningful visual vocabulary
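A visual stop list of this kind can be sketched by ranking the visual words by how often they occur and cutting off both ends. The two fractions are parameters because the slides say the boundaries are set empirically; the function names and the sample data are hypothetical.

```python
from collections import Counter


def build_stop_list(frames, top_fraction, bottom_fraction):
    """Visual words to drop: the most frequent and the least frequent ones.

    frames maps a frame id to the list of visual word ids found in that frame.
    """
    counts = Counter(w for words in frames.values() for w in words)
    ranked = [w for w, _ in counts.most_common()]  # most frequent first
    n_top = int(len(ranked) * top_fraction)
    n_bottom = int(len(ranked) * bottom_fraction)
    stop = set(ranked[:n_top])
    if n_bottom:
        stop |= set(ranked[-n_bottom:])
    return stop


def remove_stop_words(frames, stop):
    """Copy of frames with every stop-listed visual word removed."""
    return {fid: [w for w in words if w not in stop]
            for fid, words in frames.items()}


# Made-up data: word 7 occurs 4 times, 3 three times, 5 twice, 9 once.
frames = {"f1": [7, 7, 3, 5], "f2": [7, 3, 9], "f3": [7, 3, 5]}
stop = build_stop_list(frames, top_fraction=0.25, bottom_fraction=0.25)
filtered = remove_stop_words(frames, stop)
print(sorted(stop))
print(filtered)
```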

Stop list (contd…)

Inverse File Structure
Inverted file structure for indexing
Popular data structure in document retrieval.
Maps words to documents.
Lower query time than forward indexing.
Forward indexing: sequential.
Inverted indexing: random.

[Figure: inverted file example in which words (Movie, Spain, Table, Song) each point to a list of document ids such as D1, D3, D8, D23, D25, …]

Visual Analogy
Words ~ Visual words
Documents ~ Frames
Query vector ~ Visual words in a sub-part of a frame

[Figure: inverted file for visual words, in which V1, V2, V3, …, Vn each point to a list of document ids such as D1, D3, D8, D23, D25, …]
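The inverted file for visual words can be sketched as a dictionary from each visual word id to the set of frames that contain it. This is an in-memory illustration under assumed names (`build_inverted_index`, `candidate_frames`) and made-up frame contents, not the storage format used by the project.

```python
from collections import defaultdict


def build_inverted_index(frames):
    """Map each visual word id to the set of frame keys in which it appears.

    frames maps a (video_id, frame_id) key to that frame's visual word ids.
    """
    index = defaultdict(set)
    for frame_key, words in frames.items():
        for word in words:
            index[word].add(frame_key)
    return index


def candidate_frames(index, query_words):
    """Frames sharing at least one visual word with the query."""
    result = set()
    for word in query_words:
        result |= index.get(word, set())
    return result


# Made-up frames keyed by (video_id, frame_id).
frames = {("v1", 1): [3, 7, 7], ("v1", 2): [7, 9], ("v2", 5): [3]}
index = build_inverted_index(frames)
print(sorted(index[7]))
print(sorted(candidate_frames(index, [3, 9])))
```

A query only touches the lists of its own visual words, which is why lookup is faster than scanning every frame as forward indexing would.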

Ranking the results - tf-idf
Each document is a vector of word frequencies.
Each component of the vector is given a weight.
Standard weighting method: TF-IDF.

Ranking the results - tf-idf
Each document is represented as a vector < t1, t2, t3, …, ti, …, tk-1, tk >, where:
nid: number of occurrences of the ith word in document d.
nd: total number of words in document d.
ni: number of occurrences of the ith visual word in the whole database.
N: number of documents in the whole database.
IDF down-weights the most frequent words.
Results are ranked by the cosine of the angle between the query vector and each document vector.
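With the symbols defined above, the standard tf-idf weight is t_i = (n_id / n_d) * log(N / n_i). The sketch below applies that weighting and ranks frames by cosine similarity. It assumes the query is weighted the same way as a document; the function names and the sample frames are hypothetical. Because n_i counts occurrences rather than documents, log(N / n_i) turns negative for a word that occurs more than N times, so many implementations use the document frequency instead.

```python
import math
from collections import Counter


def tfidf_vector(words, occurrences, num_docs):
    """Sparse tf-idf vector: t_i = (n_id / n_d) * log(N / n_i)."""
    counts = Counter(words)
    n_d = len(words)
    return {w: (n_id / n_d) * math.log(num_docs / occurrences[w])
            for w, n_id in counts.items() if occurrences.get(w)}


def cosine(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_frames(frames, query_words):
    """List of (frame_id, score) pairs sorted by decreasing cosine similarity."""
    num_docs = len(frames)
    occurrences = Counter(w for words in frames.values() for w in words)
    query = tfidf_vector(query_words, occurrences, num_docs)
    scores = [(fid, cosine(query, tfidf_vector(words, occurrences, num_docs)))
              for fid, words in frames.items()]
    return sorted(scores, key=lambda item: -item[1])


# Made-up frames, each a list of visual word ids.
frames = {"f1": [1, 2], "f2": [2, 3, 3], "f3": [4, 5], "f4": [6, 7]}
ranking = rank_frames(frames, query_words=[1, 2])
for frame_id, score in ranking:
    print(frame_id, round(score, 3))
```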

Ranking the results – Spatial Consistency
There it is. That’s what I ….. been .... have ….
I have been there once, while ……..
"Google increases the probability of documents having all the search words close to one another"

Spatial Consistency Ranking
Uses the spatial arrangement of objects in images.
The spatial consistency measure is used to re-rank the results.
Neighboring matches in the query region should lie in a surrounding area in the retrieved image.

Spatial Consistency Ranking
The search area is defined by the 15 nearest neighbors.
A neighbor in the surrounding area in the retrieved image counts as a vote.
A match with no support (no votes) is rejected.
Repeat this for every match.
The total number of votes decides the rank.
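The voting procedure above can be sketched as follows. This is one possible reading, stated as an assumption: the search area of a match is taken to be the k nearest regions around it in the query and in the retrieved frame, and another match casts a vote when it falls inside both areas. The function names and coordinates are made up, and the example uses k=2 rather than 15 so that the small data set shows a rejection.

```python
def k_nearest(index, points, k):
    """Indices of the k points closest to points[index], excluding itself."""
    px, py = points[index]
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2)
    return set(others[:k])


def spatial_consistency(query_regions, frame_regions, matches, k=15):
    """Return (total votes, supported matches) for one retrieved frame.

    matches is a list of (query_index, frame_index) pairs whose regions
    were assigned the same visual word.
    """
    total_votes = 0
    supported = []
    for qi, fi in matches:
        near_q = k_nearest(qi, query_regions, k)
        near_f = k_nearest(fi, frame_regions, k)
        # Another match votes if it is a neighbor in both images.
        votes = sum(1 for qj, fj in matches
                    if (qj, fj) != (qi, fi) and qj in near_q and fj in near_f)
        if votes:  # a match with no support is rejected
            supported.append((qi, fi))
            total_votes += votes
    return total_votes, supported


# Made-up region positions: three matches keep their layout, one does not.
query_regions = [(0, 0), (1, 0), (0, 1), (1, 1)]
frame_regions = [(10, 10), (11, 10), (10, 11), (50, 50), (51, 50), (50, 51)]
matches = [(0, 0), (1, 1), (2, 2), (3, 3)]
total, kept = spatial_consistency(query_regions, frame_regions, matches, k=2)
print(total, kept)
```

Frames would then be re-ranked by `total`, with the rejected matches ignored.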

[Figure: spatial consistency voting example. Number of votes = 3]

[Figure: table of values for visual words V1, V2, V3, …, Vn against Frame 1, Frame 2, Frame 3, …, Frame N]

[Figure panels: Initial Match, After Stoplist, After Spatial Consistency]

Future Work
More efficient implementation of spatial consistency.
Improve the retrieval time.

USER INTERFACE
Chetan Chhaya Nishant Revanth Sandeep Sheetal

Objective
Build a web interface for retrieving shots from a news video database that match a given image query.
Display the ranked list of shots, e.g. by Date, Channel, Maximum match, Month.

Input & Output

About The Interface…
The interface consists of the following three parts:
Database Schema
Data Directories
Source Code Files

Database Schema
All the videos and their metadata are stored in an SQL database that can be queried using MySQL. The following two tables are used:
Table 1: videos (videoId, channelId, Date)
Table 2: channels (channelId, channelName)
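The two tables can be sketched as below. Python's built-in sqlite3 module is used here only as a stand-in for MySQL so that the example runs without a server. The column types, the key constraints, the sample rows, and the `videos_for` helper are assumptions; only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channels (
        channelId   INTEGER PRIMARY KEY,
        channelName TEXT NOT NULL
    );
    CREATE TABLE videos (
        videoId   INTEGER PRIMARY KEY,
        channelId INTEGER NOT NULL REFERENCES channels(channelId),
        "Date"    TEXT NOT NULL
    );
""")

# Made-up sample rows for illustration.
conn.executemany("INSERT INTO channels VALUES (?, ?)",
                 [(1, "Channel A"), (2, "Channel B")])
conn.executemany("INSERT INTO videos VALUES (?, ?, ?)",
                 [(10, 1, "2006-01-15"), (11, 2, "2006-01-15"),
                  (12, 1, "2006-01-16")])


def videos_for(connection, date, channel_name):
    """Video ids for one date and channel, as chosen in the combo boxes."""
    rows = connection.execute(
        '''SELECT v.videoId
           FROM videos v JOIN channels c ON v.channelId = c.channelId
           WHERE v."Date" = ? AND c.channelName = ?
           ORDER BY v.videoId''',
        (date, channel_name),
    )
    return [row[0] for row in rows]


print(videos_for(conn, "2006-01-15", "Channel A"))
```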

Data Directories
The data is stored in the following five directories:
Thumbnails
Keyframes
Stories
Shots
videos

Source Files
The interface part consists of 8 files:
index.cgi
server.cgi
shots.cgi
keyframes.cgi
SelectRect.js
display.cgi
play.cgi
conf.py
Each file is a module.

index.cgi
Home page of the interface.
This page lists today's videos, each shown as the thumbnail of the first keyframe of the first shot of the video. It also lets the user select specific videos by date and channel through combo boxes.

server.cgi
The user can reach this page from any of the other pages, since all of them offer the combo boxes. This page lists the results of the user's selection from the combo boxes (based on the criteria of date and channel). Each result is shown as the thumbnail of the first keyframe of the video.

shots.cgi
This page displays the shots of the video selected on the previous page. The stories that make up the video are displayed one after another. For each story, the thumbnails of the keyframes of all the shots in that story are displayed.

keyframes.cgi
This page displays the keyframe of the selected shot in its original size (352 x 288 here). The user can select a rectangular region of the frame to query the database. The query, consisting of the rectangle coordinates of the selected region (xmin, ymin, xmax, ymax) together with videoId and shotId, is passed to display.cgi.

SelectRect.js
This module takes care of user interaction. It is a selection tool for selecting a part of the image. It is JavaScript code that works with just two clicks on the required part of the image: the first click gives the start coordinates and the second click gives the end coordinates.
Input to the module: the loaded image.
Output: the selected coordinates.

display.cgi
This page displays the results of the query.
For each result, the matching keyframe and its adjacent keyframes are shown as thumbnails. The input for this file comes from the indexing module.

play.cgi
This page plays the video starting from the matching frame. An embedded QuickTime player is used to play the video. The player supports seek, play, and pause, which are controlled by buttons using JavaScript.

conf.py
This file contains the paths of the data directories and the database details. It is imported by all the CGI files.

Thank You
