Multiple Feature Hashing for Real-time Large Scale

Slides:

Advertisements

Similar presentations

A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.

Advertisements

Song Intersection by Approximate Nearest Neighbours Michael Casey, Goldsmiths Malcolm Slaney, Yahoo! Inc.

Aggregating local image descriptors into compact codes

Three things everyone should know to improve object retrieval

Presented by Xinyu Chang

Kien A. Hua Division of Computer Science University of Central Florida.

VisualRank: Applying PageRank to Large-Scale Image Search Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE.

A NOVEL LOCAL FEATURE DESCRIPTOR FOR IMAGE MATCHING Heng Yang, Qing Wang ICME 2008.

Query Specific Fusion for Image Retrieval

Activity Recognition Aneeq Zia. Agenda What is activity recognition Typical methods used for action recognition “Evaluation of local spatio-temporal features.

Optimal Design Laboratory | University of Michigan, Ann Arbor 2011 Design Preference Elicitation Using Efficient Global Optimization Yi Ren Panos Y. Papalambros.

A KLT-Based Approach for Occlusion Handling in Human Tracking Chenyuan Zhang, Jiu Xu, Axel Beaugendre and Satoshi Goto 2012 Picture Coding Symposium.

Special Topic on Image Retrieval Local Feature Matching Verification.

A Novel Scheme for Video Similarity Detection Chu-Hong Hoi, Steven March 5, 2003.

Multimedia Indexing and Retrieval Kowshik Shashank Project Advisor: Dr. C.V. Jawahar.

Tour the World: building a web-scale landmark recognition engine ICCV 2009 Yan-Tao Zheng1, Ming Zhao2, Yang Song2, Hartwig Adam2 Ulrich Buddemeier2, Alessandro.

1 Jun Wang, 2 Sanjiv Kumar, and 1 Shih-Fu Chang 1 Columbia University, New York, USA 2 Google Research, New York, USA Sequential Projection Learning for.

Supervised by Prof. LYU, Rung Tsong Michael Department of Computer Science & Engineering The Chinese University of Hong Kong Prepared by: Chan Pik Wah,

1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman

Project IST_1999_ ARTISTE – An Integrated Art Analysis and Navigation Environment Review Meeting N.1: Paris, C2RMF, November 28, 2000 Workpackage.

A fuzzy video content representation for video summarization and content-based retrieval Anastasios D. Doulamis, Nikolaos D. Doulamis, Stefanos D. Kollias.

Distributed and Efficient Classifiers for Wireless Audio-Sensor Networks Baljeet Malhotra Ioanis Nikolaidis Mario A. Nascimento University of Alberta Canada.

Large Scale Recognition and Retrieval. What does the world look like? High level image statistics Object Recognition for large-scale search Focus on scaling.

Large-Scale Content-Based Image Retrieval Project Presentation CMPT 880: Large Scale Multimedia Systems and Cloud Computing Under supervision of Dr. Mohamed.

Image Based Positioning System Ankit Gupta Rahul Garg Ryan Kaminsky.

1 Zi Yang, Wei Li, Jie Tang, and Juanzi Li Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University, China {yangzi,

School of Information Technology & Electrical Engineering Multiple Feature Hashing for Real-time Large Scale Near-duplicate Video Retrieval Jingkuan Song*,

Multimedia Databases (MMDB)

Problem Statement A pair of images or videos in which one is close to the exact duplicate of the other, but different in conditions related to capture,

Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.

Click to edit Present’s Name Xiaoyang Zhang 1, Jianbin Qin 1, Wei Wang 1, Yifang Sun 1, Jiaheng Lu 2 HmSearch: An Efficient Hamming Distance Query Processing.

Curtis Kelsey University of Missouri A FINGERPRINTING SYSTEM MOBILE MODEL FOR VIDEO COPY PROTECTION.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

A Statistical Approach to Speed Up Ranking/Re-Ranking Hong-Ming Chen Advisor: Professor Shih-Fu Chang.

PMLAB Finding Similar Image Quickly Using Object Shapes Heng Tao Shen Dept. of Computer Science National University of Singapore Presented by Chin-Yi Tsai.

80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.

Online Kinect Handwritten Digit Recognition Based on Dynamic Time Warping and Support Vector Machine Journal of Information & Computational Science, 2015.

Similarity Searching in High Dimensions via Hashing Paper by: Aristides Gionis, Poitr Indyk, Rajeev Motwani.

Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.

2005/12/021 Fast Image Retrieval Using Low Frequency DCT Coefficients Dept. of Computer Engineering Tatung University Presenter: Yo-Ping Huang ( 黃有評 )

Query Sensitive Embeddings Vassilis Athitsos, Marios Hadjieleftheriou, George Kollios, Stan Sclaroff.

Gang WangDerek HoiemDavid Forsyth. INTRODUCTION APROACH (implement detail) EXPERIMENTS CONCLUSION.

March 31, 1998NSF IDM 98, Group F1 Group F Multi-modal Issues, Systems and Applications.

Kylie Gorman WEEK 1-2 REVIEW. CONVERTING AN IMAGE FROM RGB TO HSV AND DISPLAY CHANNELS.

Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality Piotr Indyk, Rajeev Motwani The 30 th annual ACM symposium on theory of computing.

Chittampally Vasanth Raja 10IT05F vasanthexperiments.wordpress.com.

Chittampally Vasanth Raja vasanthexperiments.wordpress.com.

Outline Problem Background Theory Extending to NLP and Experiment

Query by Image and Video Content: The QBIC System M. Flickner et al. IEEE Computer Special Issue on Content-Based Retrieval Vol. 28, No. 9, September 1995.

Similarity Measurement and Detection of Video Sequences Chu-Hong HOI Supervisor: Prof. Michael R. LYU Marker: Prof. Yiu Sang MOON 25 April, 2003 Dept.

Cross-modal Hashing Through Ranking Subspace Learning

Scalability of Local Image Descriptors Björn Þór Jónsson Department of Computer Science Reykjavík University Joint work with: Laurent Amsaleg (IRISA-CNRS)

Naifan Zhuang, Jun Ye, Kien A. Hua

Managing Massive Trajectories on the Cloud

Introduction Multimedia initial focus

A review of audio fingerprinting (Cano et al. 2005)

Fast nearest neighbor searches in high dimensions Sami Sieranoja

Saliency-guided Video Classification via Adaptively weighted learning

Multimedia Content-Based Retrieval

Personalized Social Image Recommendation

Real-time Large Scale Near-duplicate Web Video Retrieval

SoC and FPGA Oriented High-quality Stereo Vision System

Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science

Spatio-Temporal Databases

Improving Retrieval Performance of Zernike Moment Descriptor on Affined Shapes Dengsheng Zhang, Guojun Lu Gippsland School of Comp. & Info Tech Monash.

Locality Sensitive Hashing

Multimedia Information Retrieval

CS5112: Algorithms and Data Structures for Applications

Chapter 17 Designing Databases

Topological Signatures For Fast Mobility Analysis

Presentation transcript:

Multiple Feature Hashing for Real-time Large Scale Near-duplicate Video Retrieval Jingkuan Song*, Yi Yang**, Zi Huang*, Heng Tao Shen*, Richang Hong*** *University of Queensland, Australia **Carnegie Mellon University, USA ***Hefei University of Technology, China 1 December 2011

Definitions of Near-Duplicate Wu et al. TMM 2009 Shen et al. VLDB 2009 Basharat et al. CVIU 2008 Cherubini et al.ACM MM 2009 Firstly I give a backgroudn introduction to MSS. Here is one of the most representative definition. From the defintion we can notice three keypoints, firstly, MSS puts focuses on multmeidia data. Secondly, retrieve process of MSS is based visual feature, which is distinct from some retrieval process based on associated metadata. The third one we may define some similarity measure to calculate the similarity between two multimedia objects. Because we have lots of similarity measures, like those based on Euclidean distance, or cosine distance. Here is an example of MSS. Suppose we have an image dataset which contains thousands to millions of images, which are represented by their visual feature. Variants: duplicate, copy, (partial) near-duplicate

Applications Copyright protection Database cleansing Recommendation Video Monitoring Video Thread Tracking Multimedia reranking Multimedia tagging Border Security, … Some of the applications are listed here. Multimedia tagging technique can automatically associate an image or video with tags to facilitate text –based search. Using MSS, we can find the similar objects of a new multiimedia data, and then transfer the tags from the existing ones to the new one. Multimedia Recommendation aims to automatic recommend new images or videos to the users according to the watching history. This is based on the assumption that when a users likes a video Near-duplicate

Objective Near-duplicate retrieval Effectiveness Efficiency Given a large-scale video dataset and a query video, find its near-duplicate videos in real-time

A generic framework

Frame-level Local Signatures Frame-level Global Signatures SIFT Feature, Local Binary Pattern (LBP), etc Frame-level Global Signatures Color Histogram, Bag of Words (BoW), etc Video-level Global Signatures Accumulative Histogram Bounded Coordinate System (BCS), etc Spatio-temporal Signature Video Distance Trajectory (VDT), Spatio-temporal LBP (SP_LBP), etc Represent a video stream as a continuous one-dimensional distance trajectory to a reference point VDT

One-dimensional Transformation Hashing Indexing Methods Tree Structures R-tree, M-tree, etc One-dimensional Transformation iDistance, Z-order, etc Hashing LSB-forest Locality Sensitive Hashing (LSH) Spectral Hashing Self-taught Hashing (STH), etc using object approximations to compress the vector data, and thus reduces the amount of data that must be read during similarity searches. Use only i bits per dimension In hash table or hash map is a data structure that uses a hash function to map the key into the index The hashing methods for multimedia data follow the similar idea. That is each data point is mapped to a hash value. And organize the data set in the form of hash table.

Related work – a Hierarchical approach Wu et al. outlines ways to filter out the near-duplicate video using a hierarchical approach Initial triage is fast performed using compact global signatures derived from colour histograms Only when a video cannot be clearly classified as novel or near-duplicate using global signatures, a more expensive local feature based near-duplicate detection is then applied to provide accurate near-duplicate analysis through more costly computation Matching window for keyframes between two videos X. Wu, A. G. Hauptmann, and C.-W. Ngo. Practical elimination of near-duplicates from web video search. In ACM Multimedia, pages 218–227, 2007.

Related work – Spatiotemporal feature approach L. Shang introduce a compact spatiotemporal feature to represent videos and construct an efficient data structure to index the feature to achieve real-time retrieving performance. Ordinal measure can be rewritten in the form a 36-dimensional ( ) binary feature vector CE-based Spatiotemporal Feature LBP-based Spatiotemporal Feature L. Shang, L. Yang, F. Wang, K.-P. Chan, and X.-S. Hua. Real-time large scale near-duplicate web video retrieval. In ACM Multimedia, pages 531–540, 2010.

Related work - Locality sensitive HASHING A set of data points Xi N h r1…rk Random Projection Search the bucket with the same hash code << N h r1…rk Q Hash table 111101 110111 110101 Query Data Point Q results M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253–262, 2004.

Our Proposal - Multiple Feature Hashing (MFH) Problems & Motivations Single feature may not fully characterize the multimedia content Exiting NDVR concerns more about accuracy rather than efficiency Methodology We present a novel approach - Multiple Feature Hashing (MFH) to tackle both the accuracy and the scalability issues MFH preserves the local structure information of each individual feature and also globally consider the local structures for all the features to learn a group of hash functions With a particular focus on near duplicate video retrieval

MFH – The framework With the rapid development of the Internet and multimedia technologies over the last decade, a huge amount of data has become available, from text corpus, to collections of online images and videos. Cheap storage cost and modern database technologies have made it possible to accumulate large-scale datasets.

MFH – Learning hashing functions Target: Similar items have similar hash codes KNN Graph 1: Sum the distance of nearby hash codes 2: Sum the distance in each feature type, and globally consider the overall hash code 3: Add the regression model to learn the hash functions The final objective function Reformulating

MFH - Experiments We evaluate our approach on 2 video datasets CC_WEB_VIDEO Consisting of 13,129 video clips UQ_VIDEO Consisting of 132,647 videos, which was collected from YouTube by ourselves We present an extensive comparison of the proposed method with a set of existing algorithms, such as Self-taught Hashing, Spectral Hashing and so on Both efficiency and accuracy are reported to compare the performance Compared with first work, compared with SP, STH

MFH - results Results On CC_WEB_VIDEO

MFH - results Results On UQ_VIDEO

MFH - Contributions As far as we know, it is the first hashing algorithm on indexing multiple features We propose a new framework to exploit multiple local and global features of video data for NDVR We have constructed a large scale video dataset UQ_VIDEO consisting of 132,647 videos which have 2,570,554 keyframes

THANK YOU ?