Query Based Video Summarization

Jacob Laurel (jslaurel@uab.edu), University of Alabama at Birmingham
Aidean Sharghi (aidean@knights.ucf.edu), University of Central Florida

I. Problem & Motivation

Automatically generate a comprehensive video summary for a set of egocentric videos using a Determinantal Point Process (DPP) algorithm. The algorithm should generate the summary based on a user-specified query (e.g., summarizing the video for concepts such as “Car”).

Applications:
- Surveillance
- Law enforcement
- Consumer electronics (GoPro®, Google Glass)

II. Data Set Overview

Dataset: No query-specific video summary dataset exists, so we augment an existing set for various user-input queries and provide ground truths for all cases.

Data Overview:
- 4 egocentric videos (each about 3 hr long), originally obtained from the UTE data set [1]
- Each video contains shot-level annotations from a list of 48 concepts
- We define a shot to be a set of 5 frames, sampled at 1 fps
- Each video is summarized for pairs of queries for the following 4 cases

Case 1) A pair of queries whose shot-level intersection is above a threshold T (both concepts are present together in several shots)
Case 2) A pair of queries whose shot-level intersection is ∅ (the two concepts are never found in the same shot), but each concept is found in the video more than T times
Case 3) A pair of queries such that one occurs fewer than T times, but the other occurs more than T times
Case 4) A pair of queries such that both are missing from the video (both occur fewer than T times)

Concept and Query Pair Selection:
Concepts were selected by taking the most popular YouTube and Vine queries and comparing them with SentiBank [2] concepts to select a comprehensive set (48 were ultimately chosen).

Query pairs were selected as follows:

Case 1) Pairs were sampled with weighting given to the intersection count:

\[ \mathrm{Count}(Q_1 \cap Q_2) = \sum_{\{\mathrm{Shot}_1 \ldots \mathrm{Shot}_N\}} \mathbb{1}(Q_1)\,\mathbb{1}(Q_2) \]

\[ P(\mathrm{Query}_1, \mathrm{Query}_2) \propto \mathrm{Count}(Q_1 \cap Q_2) \]

Case 2) Pairs were sampled with weighting given as follows:

\[ \mathrm{Count}(Q_1) = \sum_{\{\mathrm{Shot}_1 \ldots \mathrm{Shot}_N\}} \mathbb{1}(Q_1), \qquad \mathrm{Count}(Q_2) = \sum_{\{\mathrm{Shot}_1 \ldots \mathrm{Shot}_N\}} \mathbb{1}(Q_2) \]

\[ P(\mathrm{Query}_1, \mathrm{Query}_2) \propto \frac{\mathrm{Count}(Q_1) \cdot \mathrm{Count}(Q_2)}{\mathrm{Count}(Q_1) + \mathrm{Count}(Q_2)} \]

Cases 3 and 4) Pairs were sampled uniformly.

III. Data Collection and Results

Data Collection Procedure:
- Each video was sampled, and every set of 5 frames was used to create a “shot” given to MTurk workers
- All shots were annotated through a custom Amazon Mechanical Turk (AMT) user interface, and each shot was given to 3 separate MTurk workers
- Shots were sorted by their respective concept annotations
- A ground-truth summary for the entire video was generated for each of the 4 videos
- For each query pair, the union of the ground-truth video summary with the shots from each concept pair was taken
- This union was then provided to AMT workers for summarization

Data Results:
Average inter-user agreement across all videos was found to be good:

\[ \frac{|\mathrm{User}_1 \cap \mathrm{User}_2 \cap \mathrm{User}_3|}{|\mathrm{User}_1 \cup \mathrm{User}_2 \cup \mathrm{User}_3|} \approx 0.33 \]

[Figure: Annotation Counts]

[Figure: Example Ground-Truth Summary for “Car” query]

IV. Algorithm Overview

- Tested our data set with a k-DPP algorithm, using k = 2
- The video is broken up into sets of 10 frames, and the DPP kernel is computed using SentiBank features for each set
- From each set, 2 representative and distinct frames are chosen based on frame dissimilarity as well as SentiBank [2] features relating to the query

V. Algorithm Results

- We provide extensive baseline results for our dataset
- For each of the 4 cases, 15 different query pairs were selected (60 per video)

[Figure: Example of a generated summary for the query pair “Men” + “Computer”. Panels: Input Video + Query; User’s Query: Men + Computer; Automatically Generated Summary; User-selected Summary Shots]

VII. Future Work

Expand the data set to include non-egocentric videos, and expand upon other video-summarization algorithms to improve benchmarks.

VIII. References

[1] Y. J. Lee, J. Ghosh, and K. Grauman, “Discovering important people and objects for egocentric video summarization,” 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[2] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, “Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs,” ACM Multimedia Conference, Barcelona, Oct 2013.

IX. Acknowledgements

We wish to thank Dr. Shah, Dr. Gong, and Dr. Lobo for their support and mentorship throughout the Summer 2016 REU Program.
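
The query-pair weighting (Cases 1 and 2) and the three-way inter-user agreement described in this poster can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each shot is represented as a set of concept labels and each user's summary as a set of shot indices. The function names are hypothetical, and the Case 2 weight is read as the product of the two counts divided by their sum.

```python
def concept_count(shots, query):
    """Number of shots annotated with the query concept (Count(Q))."""
    return sum(1 for shot in shots if query in shot)


def intersection_count(shots, q1, q2):
    """Number of shots annotated with both concepts (Count(Q1 ∩ Q2))."""
    return sum(1 for shot in shots if q1 in shot and q2 in shot)


def case1_weight(shots, q1, q2):
    """Case 1: unnormalized weight proportional to the intersection count."""
    return float(intersection_count(shots, q1, q2))


def case2_weight(count_q1, count_q2):
    """Case 2 (assumed reading): product of counts over their sum."""
    total = count_q1 + count_q2
    return count_q1 * count_q2 / total if total else 0.0


def normalize(weights):
    """Turn unnormalized pair weights into sampling probabilities."""
    total = sum(weights.values())
    if total == 0:
        return {pair: 0.0 for pair in weights}
    return {pair: w / total for pair, w in weights.items()}


def inter_user_agreement(user1, user2, user3):
    """Three-way agreement: |U1 ∩ U2 ∩ U3| / |U1 ∪ U2 ∪ U3|."""
    union = user1 | user2 | user3
    if not union:
        return 0.0  # convention chosen here for three empty summaries
    return len(user1 & user2 & user3) / len(union)


if __name__ == "__main__":
    # Toy annotations (made up for illustration): one set of concepts per shot.
    shots = [
        {"Car", "Men"},
        {"Car"},
        {"Men", "Computer"},
        {"Computer"},
        {"Car", "Men"},
    ]
    pairs = [("Car", "Men"), ("Car", "Computer"), ("Men", "Computer")]
    case1 = normalize({p: case1_weight(shots, *p) for p in pairs})
    print("Case 1 probabilities:", case1)
    print("Case 2 weight (Car, Computer):",
          case2_weight(concept_count(shots, "Car"),
                       concept_count(shots, "Computer")))
    print("Agreement:", inter_user_agreement({1, 2, 3}, {2, 3, 4}, {3, 5}))
```

The sketch only computes the weights. Drawing 15 distinct pairs per case, as the poster reports, would need a weighted sampling step without replacement on top of these probabilities.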