Published by Caroline Pierce. Modified over 9 years ago.
Understanding and Predicting Interestingness of Videos
Yu-Gang Jiang, Yanran Wang, Rui Feng, Hanfang Yang, Yingbin Zheng, Xiangyang Xue
School of Computer Science, Fudan University, Shanghai, China
AAAI 2013, Bellevue, USA

The problem
Can a computational model automatically analyze video content and predict the interestingness of videos? We conduct a pilot study on this problem and demonstrate a simple method for identifying the more interesting videos.

Applications: web video search; video recommendation systems
Related Work: There are a few studies on predicting the aesthetics and interestingness of images.

Key Idea
Build a computational model that predicts, given two videos, which one is more interesting.
[Figure: two example videos compared ("VS.")]

Contributions:
Conducted a pilot study on video interestingness
Built two new datasets to support this study
Evaluated a large number of features and obtained interesting observations

Two New Datasets

Flickr Dataset:
Source: Flickr.com
Video Type: Consumer Videos
Video Number: 1200
Categories: 15 (basketball, beach…)
Duration: 20 hrs in total
Label: Top 10% as interesting videos; bottom 10% as uninteresting

YouTube Dataset:
Source: YouTube.com
Video Type: Advertisements
Video Number: 420
Categories: 14 (food, drink…)
Duration: 4.2 hrs in total
Label: 10 human assessors compare video pairs

Prediction & Evaluation

Computational Framework:
Aim: Train a model to compare the interestingness of two videos
Feature: Visual features, audio features, and high-level attribute features
Prediction: Adopt Joachims’ Ranking SVM (Joachims 2003) to train prediction models. For both datasets, we use 2/3 of the videos for training and 1/3 for testing. Multiple features are combined by kernel-level fusion with equal weights.
Evaluation: Accuracy (the percentage of correctly ranked test video pairs)
[Figure: framework diagram showing a pair of videos ("VS."), visual, audio, and high-level attribute features, multi-modal fusion, Ranking SVM, and results]
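The fusion and evaluation steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the poster does not say which kernels were used, so the linear kernel here is an assumption, and the names linear_kernel, fuse_kernels, and pairwise_accuracy are hypothetical. The Ranking SVM training step itself is omitted; the sketch only shows equal-weight fusion of per-feature kernel matrices and the pairwise accuracy measure.

```python
def linear_kernel(X):
    """Gram matrix of dot products between feature vectors (assumed kernel choice)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]


def fuse_kernels(kernels):
    """Kernel-level fusion with equal weights: element-wise average of the matrices."""
    n = len(kernels[0])
    w = 1.0 / len(kernels)
    return [[w * sum(K[i][j] for K in kernels) for j in range(n)] for i in range(n)]


def pairwise_accuracy(scores, pairs):
    """Fraction of pairs (i, j), where video i is labeled more interesting
    than video j, for which the model also scores i above j."""
    correct = sum(1 for i, j in pairs if scores[i] > scores[j])
    return correct / len(pairs)


# Toy data: three videos, one 2-D "visual" and one 1-D "audio" feature each.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[2.0], [0.0], [1.0]]
K = fuse_kernels([linear_kernel(visual), linear_kernel(audio)])

# Hypothetical model scores and ground-truth pairs (first index is more interesting).
scores = [0.9, 0.2, 0.5]
pairs = [(0, 1), (0, 2), (1, 2)]
acc = pairwise_accuracy(scores, pairs)  # two of the three pairs are ranked correctly
```

A fused matrix like K could then be passed to any kernel-based ranker that accepts a precomputed kernel. With equal weights, no per-feature weight has to be tuned on the training set.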
Multi-modal feature extraction
Visual features: Color Histogram, SIFT, HOG, SSIM, GIST
Audio features: MFCC, Spectrogram SIFT, Audio-Six
High-level attribute features: Classemes, ObjectBank, Style

Results

Visual Feature Results: Overall, the visual features achieve strong performance on both datasets. Among the five features, SIFT and HOG are the most effective, and their combination performs best.

Audio Feature Results: The three audio features are effective and complementary. Combining them gives the best performance.

Attribute Feature Results: Attribute features do not work as well as we expected, and style in particular performs poorly. This is an interesting observation, since style has been claimed to be effective for predicting image interestingness.

Visual+Audio+Attribute Fusion Results: Fusing visual and audio features leads to substantial performance gains, with a 2.6% increase on Flickr and a 5.4% increase on YouTube. Adding attribute features is less effective.

[Figure: bar charts of prediction accuracy (%) on Flickr and YouTube. Values recovered from the charts, with bar labels lost in extraction: 76.6, 68.0, 74.5, 67.0, 67.1, 65.7, 64.8, 74.7, 64.5, 56.8, 71.7, 78.6, 76.6, 68.0. Annotated gains: 2.6%, 5.4%.]

Datasets are available at: www.yugangjiang.info/research/interestingness

Conclusion
We conducted a study on predicting video interestingness and built two new datasets. A large number of features have been evaluated, leading to interesting observations:
Visual and audio features are effective in predicting video interestingness
A few features useful for image interestingness do not extend to the video domain (Style…)
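The poster does not say whether the 2.6% and 5.4% fusion gains are absolute or relative. The arithmetic below checks one reading that happens to match the chart values: relative improvement over a baseline. The pairing of 76.6 with 78.6 for Flickr and 68.0 with 71.7 for YouTube is an assumption, since the bar labels did not survive extraction, and relative_gain is a hypothetical helper name.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


# Assumed pairing of chart values (baseline accuracy, fused accuracy), in percent.
flickr = relative_gain(76.6, 78.6)
youtube = relative_gain(68.0, 71.7)

print(f"Flickr: {flickr:.1f}%  YouTube: {youtube:.1f}%")
# prints "Flickr: 2.6%  YouTube: 5.4%"
```

Under this reading, the absolute differences would be 2.0 and 3.7 accuracy points respectively.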