Query Based Video Summarization
Jacob Laurel, University of Alabama at Birmingham; Aidean Sharghi, University of Central Florida

I. Problem & Motivation
Goal: automatically generate a comprehensive summary of a set of egocentric videos using a Determinantal Point Process (DPP) algorithm. The summary should be driven by a user-specified query (e.g., summarizing the video for a concept such as "Car").
Applications:
- Surveillance
- Law enforcement
- Consumer electronics (e.g., GoPro, Google Glass)

III. Data Collection and Results
Dataset: no query-specific video-summary dataset exists, so we augment an existing data set with various user-input queries and provide ground truths for all cases.
Data collection procedure:
- Each video was sampled, and every set of 5 frames was grouped into a "shot" given to MTurk workers.
- All shots were annotated through a custom Amazon Mechanical Turk user interface; each shot was given to 3 separate MTurk workers.
- Shots were sorted by their respective concept annotations.
- A ground-truth summary of the entire video was generated for each of the 4 videos.
- For each query pair, the union of the ground-truth video summary with the shots from each concept pair was taken and then provided to AMT workers for summarization.

IV. Algorithm Overview
We tested our data set with a k-DPP algorithm, using k = 2. Each video is broken into sets of 10 frames, and the DPP kernel is computed from SentiBank [2] features for each set. From each set, 2 representative and distinct frames are chosen based on frame dissimilarity as well as SentiBank features relating to the query.

V. Algorithm Results
We provide extensive baseline results for our data set. For each of the 4 cases, 15 different query pairs were selected (60 per video).
[Figure: example of a generated summary for the query pair "Men" + "Computer".]
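The frame-selection step described in Section IV can be illustrated with a small sketch. This is a minimal NumPy illustration, not the authors' implementation: it assumes a quality-diversity kernel L = diag(q) · S · diag(q) built from hypothetical per-frame feature vectors (standing in for SentiBank descriptors) and query-relevance scores q, and it uses a simple greedy determinant-maximization loop in place of exact k-DPP sampling.

```python
import numpy as np

def select_k_frames(features, query_scores, k=2):
    """Greedily pick k frames maximizing the determinant of the
    DPP kernel submatrix: high query relevance (diagonal) and
    low mutual similarity (off-diagonal) both raise the score."""
    # Cosine similarity between frame feature vectors.
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = F @ F.T
    # Quality-diversity decomposition of the kernel: L = diag(q) S diag(q).
    q = np.asarray(query_scores, dtype=float)
    L = q[:, None] * S * q[None, :]
    selected = []
    for _ in range(k):
        best_i, best_det = None, -np.inf
        for i in range(len(q)):
            if i in selected:
                continue
            idx = selected + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best_i, best_det = i, det
        selected.append(best_i)
    return selected

# Three frames: two near-duplicates relevant to the query, one distinct.
frames = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
picked = select_k_frames(frames, query_scores=[1.0, 0.9, 0.5], k=2)
```

Even though the second frame scores higher on the query than the third, the determinant penalizes its near-duplication of the first frame, so the greedy pass selects the two dissimilar frames.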
II. Data Set Overview
Data overview:
- 4 egocentric videos (each ~3 hr long), originally obtained from the UTE data set [1].
- Each video contains shot-level annotations drawn from a list of 48 concepts.
- We define a shot to be a set of 5 frames, sampled at 1 fps.
- Each video is summarized for pairs of queries under the following 4 cases:
  - Case 1) A pair of queries whose shot-level intersection is above a threshold T (both concepts are present together in several shots).
  - Case 2) A pair of queries whose shot-level intersection is empty (the two concepts never appear in the same shot), but each concept appears in the video more than T times.
  - Case 3) A pair of queries where one concept occurs fewer than T times and the other occurs more than T times.
  - Case 4) A pair of queries where both concepts are missing from the video (each occurs fewer than T times).

Data results:
Average inter-user agreement across all videos was found to be good:
|User1 ∩ User2 ∩ User3| / |User1 ∪ User2 ∪ User3| ≈ 0.33
[Table: annotation counts.]
[Figure: automatically generated summary for the user's query "Men" + "Computer".]

Concept and query-pair selection:
- Concepts were selected by taking the most popular YouTube and Vine queries and comparing them against SentiBank [2] concepts to obtain a comprehensive set (48 were ultimately chosen).
- Query pairs were selected as follows:
  - Case 1) Pairs were sampled with weight proportional to the intersection count:
      Count(Q1 ∩ Q2) = Σ over shots {Shot_1 … Shot_N} of 1(Q1) · 1(Q2)
      P(Query1, Query2) ∝ Count(Q1 ∩ Q2)
  - Case 2) Pairs were sampled with the following weighting:
      Count(Q1) = Σ over shots {Shot_1 … Shot_N} of 1(Q1)
      Count(Q2) = Σ over shots {Shot_1 … Shot_N} of 1(Q2)
      P(Query1, Query2) ∝ Count(Q1) · Count(Q2) / (Count(Q1) + Count(Q2))
  - Cases 3 and 4) Pairs were sampled uniformly.

[Figure: example ground-truth summary for the "Car" query.]
[Figure: system pipeline, from the input video and user query to the generated summary.]

VII. Future Work
Expand the data set to include non-egocentric videos, and evaluate additional video-summarization algorithms to improve the benchmarks.
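The Case 1 weighting above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the per-shot annotation format (a set of concept names per shot) and the helper names `case1_weights` and `sample_pair` are assumptions introduced here.

```python
import random
from itertools import combinations

def case1_weights(shot_annotations, concepts):
    """Count(Q1 ∩ Q2): number of shots annotated with both concepts."""
    return {
        (q1, q2): sum(1 for shot in shot_annotations
                      if q1 in shot and q2 in shot)
        for q1, q2 in combinations(concepts, 2)
    }

def sample_pair(weights, rng=random):
    """Sample one query pair with P(Q1, Q2) ∝ Count(Q1 ∩ Q2)."""
    pairs = [p for p, w in weights.items() if w > 0]
    return rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]

# Four toy shots as annotated: "Car" and "Men" co-occur in two shots.
shots = [{"Car", "Men"}, {"Car"}, {"Car", "Men"}, {"Computer"}]
w = case1_weights(shots, ["Car", "Men", "Computer"])
pair = sample_pair(w)
```

Cases 3 and 4 need no such weighting (pairs are sampled uniformly), and the Case 2 weight can be computed analogously from the two per-concept counts.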
VIII. References
[1] Y. J. Lee, J. Ghosh, and K. Grauman, "Discovering important people and objects for egocentric video summarization," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[2] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, "Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs," ACM Multimedia Conference, Barcelona, Oct. 2013.

IX. Acknowledgements
We wish to thank Dr. Shah, Dr. Gong, and Dr. Lobo for their support and mentorship throughout the Summer 2016 REU Program.