Video Fingerprinting: Features for Duplicate and Similar Video Detection and Query-based Video Retrieval
Anindya Sarkar, Pratim Ghosh, Emily Moxley and B. S. Manjunath
Presented by: Anindya Sarkar
Vision Research Lab, Department of Electrical & Computer Engineering, University of California, Santa Barbara
January 30, 2008

June 27, 2015
Problem Definition:
Duplicate video and similar video detection
– We represent each video compactly (a “fingerprint”) for efficient storage and faster search, without compromising retrieval accuracy
Query-based video retrieval
– Input: a short query video (1-2% of the length of the big video)
– Output: the actual “big” video from which the query was taken

Generation of Duplicate Videos
Dataset: BBC rushes dataset, provided for the TRECVID video summarization task
Operations performed:
– Image processing based (per frame):
  Blurring using 3x3 and 5x5 windows
  Gamma correction by 20% and -20%
  Gaussian noise addition at SNRs of -20, 0, 10, 20, 30 and 40 dB
  JPEG compression at quality factors (QF) of 10, 30, 50, 70 and 90
– Frame-drop based errors: dropping 20%, 40% and 60% of the frames of the original video, for both random and bursty drops
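Two of the degradations above can be sketched in Python/NumPy: Gaussian noise at a target SNR and random vs bursty frame drops. This is not the authors' code. The SNR definition (mean squared pixel value over noise variance) and the reading of “bursty” as one contiguous run of dropped frames are assumptions; blurring, gamma correction and JPEG compression are left out.

```python
import numpy as np

def add_gaussian_noise(frame, snr_db, rng):
    """Add zero-mean Gaussian noise at the requested SNR (in dB).

    Assumption: SNR = mean(frame**2) / noise variance.
    """
    frame = frame.astype(float)
    signal_power = np.mean(frame ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return frame + rng.normal(0.0, np.sqrt(noise_power), size=frame.shape)

def drop_frames(num_frames, drop_fraction, bursty, rng):
    """Return the sorted indices of the frames that survive the drop."""
    n_drop = int(round(drop_fraction * num_frames))
    if bursty:
        # Assumption: a bursty drop is one contiguous run at a random offset.
        start = int(rng.integers(0, num_frames - n_drop + 1))
        dropped = np.arange(start, start + n_drop)
    else:
        # Random drop: frames chosen uniformly without replacement.
        dropped = rng.choice(num_frames, size=n_drop, replace=False)
    keep = np.ones(num_frames, dtype=bool)
    keep[dropped] = False
    return np.flatnonzero(keep)

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, size=(64, 64))
noisy = add_gaussian_noise(frame, snr_db=20, rng=rng)
kept_random = drop_frames(100, 0.4, bursty=False, rng=rng)
kept_bursty = drop_frames(100, 0.4, bursty=True, rng=rng)
```

With 100 frames and a 40% drop, both modes keep 60 frames; only the pattern of dropped indices differs.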

Interpretation of Similar Videos
Different takes of the same scene are considered “similar” videos
These videos are similar in content
– However, due to human variability at both the cameraman and the actor level (camera angles, cuts and actor performance), the videos may look alike yet still differ
The BBC rushes dataset contains the unedited footage of the different retakes, hence it is ideally suited for generating similar videos

Keyframe-based Video Fingerprint
Video summarization and keyframe extraction: the N frames of the actual video are reduced to K keyframes
A d-dimensional signature is computed per keyframe, giving a K x d video fingerprint
Features used for fingerprint creation:
1. Compact Fourier Mellin Transform (CFMT)
2. Scale Invariant Feature Transform (SIFT)
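The shape of a keyframe-based fingerprint can be sketched as follows. The slides do not say how keyframes are chosen, so uniform temporal sampling is an assumption here, and `feature_fn` is a hypothetical stand-in for the per-keyframe CFMT or SIFT signature.

```python
import numpy as np

def keyframe_fingerprint(frames, K, feature_fn):
    """Stack one d-dimensional signature per keyframe into a K x d fingerprint.

    Assumption: keyframes are sampled uniformly in time; feature_fn is a
    hypothetical stand-in for CFMT or SIFT.
    """
    N = len(frames)
    keyframe_idx = np.linspace(0, N - 1, num=K).round().astype(int)
    return np.stack([feature_fn(frames[i]) for i in keyframe_idx])

# Toy example: 100 random 32x32 frames and a 2-D "feature" (mean, std).
frames = np.random.default_rng(0).uniform(size=(100, 32, 32))
fp = keyframe_fingerprint(frames, K=8, feature_fn=lambda f: np.array([f.mean(), f.std()]))
```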

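Log-Polar Transformation
Any 2-D matrix can be resampled on a log-polar grid about the origin; first fix the values of M and N:
– M is the number of concentric circles, N is the number of diverging radial lines
– R is the maximum radius of the in-circle
$$\Delta r = \frac{\log R}{M}, \qquad \Delta\theta = \frac{2\pi}{N}$$
$$x = e^{m\Delta r}\cos(n\Delta\theta), \qquad y = e^{m\Delta r}\sin(n\Delta\theta), \qquad m = 0,\dots,M-1,\; n = 0,\dots,N-1$$
Each log-polar sample (m, n) takes the value of the original matrix at the Cartesian point (x, y).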
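A minimal NumPy sketch of this resampling, using the slide's definitions of Δr, Δθ, x and y. Placing the origin at the array centre, taking R as half the shorter side, and using nearest-neighbour lookup are assumptions; the slides do not state the interpolation.

```python
import numpy as np

def log_polar(img, M, N):
    """Resample a 2-D array on an M x N log-polar grid (nearest neighbour).

    Rows index the circles m, columns index the radial lines n.
    Assumptions: origin at the array centre, R = in-circle radius.
    """
    H, W = img.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    R = min(H, W) / 2.0                      # radius of the inscribed circle
    dr = np.log(R) / M
    dtheta = 2 * np.pi / N
    m = np.arange(M)[:, None]                # circle index (rows)
    n = np.arange(N)[None, :]                # radial-line index (columns)
    rho = np.exp(m * dr)
    x = cx + rho * np.cos(n * dtheta)
    y = cy + rho * np.sin(n * dtheta)
    # Nearest-neighbour lookup, clipped to stay inside the array.
    xi = np.clip(np.rint(x).astype(int), 0, W - 1)
    yi = np.clip(np.rint(y).astype(int), 0, H - 1)
    return img[yi, xi]

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
lp = log_polar(img, M=32, N=64)
```

In this layout a rotation of the input about the origin becomes (approximately) a circular shift along the columns, and a scaling becomes a shift along the rows.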

CFMT Feature Extraction
Log-polar image (m = 0, ..., M-1; n = 0, ..., N-1) → |FFT| → low-frequency block with indices from -(K-1) to K-1 and from -(V-1) to V-1 → normalization & vectorization → 50% A.C. energy → PCA → quantization
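The first stages of this pipeline can be sketched in NumPy. This is not the authors' implementation: assigning K to the radial axis and V to the angular axis, and normalizing by the DC magnitude, are assumptions, and the 50% A.C. energy step and the quantizer are omitted because the slides do not define them. The |FFT| magnitude is unchanged by circular shifts, which is why a rotation (a shift along θ in the log-polar image) leaves the feature unchanged.

```python
import numpy as np

def cfmt_block(lp, K, V):
    """|FFT| of a log-polar image, cropped to frequency indices -(K-1)..K-1
    (radial axis) and -(V-1)..V-1 (angular axis), normalized and vectorized."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(lp)))
    cm, cn = lp.shape[0] // 2, lp.shape[1] // 2   # DC position after fftshift
    block = mag[cm - (K - 1):cm + K, cn - (V - 1):cn + V]
    # Assumption: normalize by the DC magnitude (centre of the block).
    return (block / block[K - 1, V - 1]).ravel()

def pca_project(features, n_components):
    """Project row vectors onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
lp = rng.uniform(size=(32, 64))            # stand-in for a log-polar image
vec = cfmt_block(lp, K=4, V=4)             # 7 x 7 = 49 values
rotated = np.roll(lp, 5, axis=1)           # rotation = circular shift along theta
reduced = pca_project(rng.normal(size=(20, 49)), n_components=12)
```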

SIFT Feature
Generally used for object recognition, hence it can serve as an image similarity measure
Distance between SIFT features: the number of descriptor comparisons makes direct matching computationally prohibitive
Speed-up: quantize the descriptors to a finite vocabulary (consisting of “words”)
– Each image is then a weighted vector of word frequencies
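The quantization step can be sketched as nearest-centroid assignment followed by a word histogram. The vocabulary is assumed to be given (for example, cluster centres from k-means), and plain normalized counts stand in for the weighting, which the slides do not specify.

```python
import numpy as np

def bow_vector(descriptors, vocabulary):
    """Assign each descriptor to its nearest word and return word frequencies.

    descriptors: (n, 128) SIFT descriptors; vocabulary: (k, 128) cluster centres.
    Assumption: plain normalized counts stand in for the unspecified weighting.
    """
    sq_dist = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = sq_dist.argmin(axis=1)                       # one word per descriptor
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return counts / counts.sum()

rng = np.random.default_rng(0)
vocabulary = rng.normal(size=(12, 128))
descriptors = vocabulary[[0, 0, 3]] + 0.01 * rng.normal(size=(3, 128))
vec = bow_vector(descriptors, vocabulary)
```

Comparing two images then costs one comparison of k-dimensional vectors instead of all pairwise descriptor comparisons.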

[Figure: vocabulary tree with levels of M = 1, M = 3 and M = 9 nodes; image descriptors are mapped to words, from more general words near the root to the most specific words at the leaves]
– Straight vocabulary: created by clustering, e.g. a 12-dimensional feature needs 12 clusters
– Vocabulary tree: created using hierarchical k-means on SIFT features; final vocabulary size = 3 + 9 = 12
– Each feature belongs to one “word” at each level
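A two-level vocabulary tree with branching factor 3 (3 + 9 = 12 words) can be sketched with a small hand-written Lloyd's k-means. The initialization, iteration count, raw (unweighted) counts, and handling of empty nodes are assumptions made only to keep the example self-contained.

```python
import numpy as np

def nearest(x, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row of x."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(x, k, rng, iters=20):
    """Plain Lloyd's k-means on the rows of x; returns a (k, dim) centroid array."""
    # Sample initial centres with replacement only if there are fewer points than k.
    centroids = x[rng.choice(len(x), size=k, replace=len(x) < k)].astype(float)
    for _ in range(iters):
        labels = nearest(x, centroids)
        for c in range(k):
            members = x[labels == c]
            if len(members):  # an emptied cluster keeps its previous centre
                centroids[c] = members.mean(axis=0)
    return centroids

def build_tree(descriptors, branch, rng):
    """Two-level tree: `branch` words at level 1, branch**2 words at level 2."""
    top = kmeans(descriptors, branch, rng)
    labels = nearest(descriptors, top)
    children = []
    for c in range(branch):
        members = descriptors[labels == c]
        if len(members) == 0:  # empty node: its children collapse onto the parent
            children.append(np.repeat(top[c][None, :], branch, axis=0))
        else:
            children.append(kmeans(members, branch, rng))
    return top, children

def tree_vector(descriptors, top, children):
    """Concatenate word counts at level 1 (branch) and level 2 (branch**2)."""
    branch = len(top)
    level1 = nearest(descriptors, top)
    # Each descriptor descends only into the children of its level-1 word.
    level2 = np.array([c * branch + nearest(d[None, :], children[c])[0]
                       for d, c in zip(descriptors, level1)])
    counts1 = np.bincount(level1, minlength=branch)
    counts2 = np.bincount(level2, minlength=branch * branch)
    return np.concatenate([counts1, counts2])

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))
top, children = build_tree(descriptors, branch=3, rng=rng)
vec = tree_vector(descriptors, top, children)   # 3 + 9 = 12 entries
```

Because each descriptor lands in exactly one word per level, every level-1 count equals the sum of its three level-2 children.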

Straight Vocabulary vs Vocabulary Tree
Straight vocabulary:
– Does not consider the relationship between words, i.e. it ignores that certain words are closer to each other than to other words
– At a very coarse level (dictionary size ~10-20), additional words are more descriptive than the relationships among words; therefore the straight vocabulary outperforms the vocabulary tree
In our experiments, low-dimensional SIFT features obtained using a straight vocabulary perform much better as “fingerprints” than tree-based features

Non-keyframe-based Video Fingerprint
The N frames are split into K windows of P = N/K consecutive frames each
For each of the K windows, a 125-dimensional histogram in YCbCr space is computed from the P consecutive frames, thus avoiding keyframe extraction; the result is a K x 125 video fingerprint
The whole color space is quantized into 125 bins (5 bins for each of Y, Cb and Cr)
Feature used for fingerprint creation: YCbCr histogram-based feature
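A sketch of the K x 125 YCbCr fingerprint follows. The RGB to YCbCr conversion is assumed to be the full-range BT.601 (JFIF) one, the 5 bins per channel are assumed to be uniform over 0-255, and leftover frames when N is not divisible by K are simply ignored.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JFIF) RGB -> YCbCr; input and output in [0, 255]."""
    rgb = rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def window_histogram(frames, bins_per_channel=5):
    """125-bin YCbCr histogram pooled over P consecutive RGB frames (P x H x W x 3)."""
    ycc = rgb_to_ycbcr(frames).reshape(-1, 3)
    # Uniform quantization of each channel into bins_per_channel levels.
    q = np.clip((ycc * bins_per_channel / 256).astype(int), 0, bins_per_channel - 1)
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()

def ycbcr_fingerprint(video, K):
    """K x 125 fingerprint: split N frames into K windows of P = N // K frames."""
    P = len(video) // K
    return np.stack([window_histogram(video[k * P:(k + 1) * P]) for k in range(K)])

rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(20, 8, 8, 3))   # 20 random RGB frames
fp = ycbcr_fingerprint(video, K=4)
```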

Signature Distance Computation
For two (K x d) fingerprints X and Y, where X(i) is the i-th feature vector of X:
$$d(X;Y) = \sum_{i=1}^{K} \min_{1 \le j \le K} \lVert X(i) - Y(j) \rVert_1$$
Properties of this distance function:
– d(X;Y) = 0 is possible even if X ≠ Y (for example, when every row of X also appears as a row of Y)
– d(X;Y) ≠ d(Y;X) in general
Such a distance relation is called a “quasi-distance”

Motivation Behind the Distance Function
This closest-overlap-based distance is robust to:
– Frame reordering: the temporal sequence need not be maintained between two signatures, e.g. a video consisting of reordered scenes from the same video is still regarded as a duplicate
– Frame drops: if frames are dropped or some frames are corrupted by noise, the distance between duplicate videos should still be small
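The quasi-distance d(X;Y) translates directly into NumPy. The toy fingerprints below are made up to illustrate the two properties and the robustness to reordering; the sketch also accepts Y with a different number of rows than X, a small generalization of the K x d setting above.

```python
import numpy as np

def quasi_distance(X, Y):
    """d(X;Y) = sum_i min_j ||X(i) - Y(j)||_1 for fingerprints X (K x d) and Y (K' x d)."""
    l1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)   # pairwise L1 distances
    return l1.min(axis=1).sum()

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
d_xy = quasi_distance(X, Y)               # 0.0 although X != Y
d_yx = quasi_distance(Y, X)               # 8.0, so d is not symmetric
d_reordered = quasi_distance(X[::-1], X)  # 0.0: row (frame) order does not matter
```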

Experiments and Results
We present precision-recall plots for both similarity and duplicate detection over 3888 videos:
– CFMT for dimensions 36/24/20/12/4
– SIFT for dimensions 781/341/33/21/12
– CFMT vs best-performing SIFT for duplicate detection
– SIFT vs best-performing CFMT for similarity detection
CFMT performs better for duplicate detection
SIFT performs better for similarity detection
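The slides do not say how the precision-recall points were computed. One generic way is to rank database videos by increasing fingerprint distance to a query and evaluate every cut-off of that ranking, as sketched below with made-up distances and relevance labels.

```python
import numpy as np

def precision_recall(distances, is_relevant):
    """Precision and recall at every cut-off of a ranking by increasing distance."""
    order = np.argsort(distances, kind="stable")
    rel = np.asarray(is_relevant, dtype=bool)[order]
    hits = np.cumsum(rel)                      # relevant items retrieved so far
    n_retrieved = np.arange(1, len(rel) + 1)
    return hits / n_retrieved, hits / rel.sum()

precision, recall = precision_recall([0.1, 0.4, 0.2, 0.9], [True, False, True, False])
```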

[Figure: precision-recall curves for CFMT at different dimensions, for duplicate detection and for similarity detection]

[Figure: precision-recall curves for SIFT at different dimensions, for duplicate detection and for similarity detection]

[Figure: precision-recall curves comparing different descriptors, for similarity detection and for duplicate detection]

Full-length Video Retrieval with Clip Querying
Generation of the short query:
– We put together 4 different scenes from a full-length video to create the input query
– Each individual scene is represented by 8 keyframes, so a single query has 4 x 8 = 32 keyframes
– We experiment with different features for the query representation
The repository consists of full-length video signatures (65 videos):
– The number of keyframes used to create the signature of a “large” video is varied from 1% to 4% of the video length

Algorithm
Step 1: The input query signature X_query is a (32 x d) matrix
Step 2: Its distance from every stored “large” video signature X_large is computed as
$$\Delta(i) = \min_{j} \lVert X_{\text{query}}(i) - X_{\text{large}}(j) \rVert_1, \qquad 1 \le i \le 32 \tag{1}$$
$$D(X_{\text{query}}; X_{\text{large}}) = \frac{1}{32}\sum_{i=1}^{32} \Delta(i) \tag{2}$$
Step 3: The best-matched video (smallest D) is returned
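Steps 1-3 can be sketched in a few lines. The repository here is a hypothetical dict from made-up video names to random signatures, and the query is built from 32 rows of one of them, so its distance D to that video is exactly zero.

```python
import numpy as np

def query_distance(X_query, X_large):
    """Eq. (2): mean over query rows of the closest L1 match in the large-video signature."""
    l1 = np.abs(X_query[:, None, :] - X_large[None, :, :]).sum(axis=2)
    delta = l1.min(axis=1)              # Eq. (1): Delta(i) for each query keyframe
    return delta.sum() / len(X_query)   # len(X_query) = 32 in the slides

def retrieve(X_query, repository):
    """Step 3: return the name of the stored video with the smallest D."""
    return min(repository, key=lambda name: query_distance(X_query, repository[name]))

rng = np.random.default_rng(0)
# Hypothetical repository: 5 "large" videos with 50 x 12 signatures each.
repository = {f"video_{i}": rng.normal(size=(50, 12)) for i in range(5)}
X_query = repository["video_3"][rng.choice(50, size=32, replace=False)]
best = retrieve(X_query, repository)
```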

[Tables: retrieval results for 1% and 4% summary lengths for “large” videos; columns: Video name, CFMT-36, CFMT-20, CFMT-12, YCbCr-125, SIFT-781, SIFT-31, SIFT-21; rows: query videos]

Conclusions
CFMT features provide quick and accurate retrieval of duplicate videos
SIFT features perform better for similar video detection
Future work:
– Expanding the domain of “similar” videos (videos that are not retakes yet still similar?)
– Importance of an efficient summary for creating the video signature (strategic vs random keyframes?)

Thanks for your patience. Questions?