Published by Pierce Glenn. Modified over 9 years ago.
1
Beauty is Here! Evaluating Aesthetics in Videos Using Multimodal Features and Free Training Data
Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang
School of Computer Science, Fudan University, Shanghai, China
ACM MM, Barcelona, Catalunya, Spain, 2013
2
Overview
Task: Design a system that automatically identifies aesthetically more appealing videos
Contribution:
– Propose to use free training data
– Use and evaluate various kinds of features
Result: Attain a Spearman's rank correlation coefficient of 0.41 on the NHK dataset
3
Free Training Data
Construct two annotation-free training datasets by assuming that images/videos on certain websites are mostly beautiful
Positive (+): DPChallenge images, Flickr videos
Negative (-): Dutch documentary videos
4
Free Training Data
The first training set
– Positive samples: images from DPChallenge
– Negative samples: frames from the Dutch documentary videos
The second training set
– Positive samples: videos from Flickr
– Negative samples: the Dutch documentary videos
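The labeling scheme above can be sketched in a few lines: each sample gets its label purely from the source it came from, and the labeled set is then handed to a classifier. This is a minimal illustration, not the authors' implementation. The 2-D feature vectors are hypothetical stand-ins for real descriptors, and the linear SVM trained by subgradient descent (with its `epochs`, `lr`, and `lam` settings) is an assumption, since the slides do not state which SVM variant or kernel was used.

```python
def label_by_source(positive_features, negative_features):
    """Label samples by origin only: +1 for the 'mostly beautiful' source, -1 otherwise."""
    return [(x, 1) for x in positive_features] + [(x, -1) for x in negative_features]


def train_linear_svm(samples, epochs=50, lr=0.1, lam=0.01):
    """Train a linear SVM with hinge loss via plain subgradient descent (assumed variant)."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample is inside the margin: move the hyperplane toward classifying it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def decision_value(model, x):
    """Signed score; larger values mean 'more like the positive source'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical toy features; real inputs would be Color, SIFT, Classemes, etc.
positives = [[2.0, 2.0], [3.0, 2.5]]      # e.g. DPChallenge images
negatives = [[-2.0, -2.0], [-3.0, -1.5]]  # e.g. Dutch documentary frames
training_set = label_by_source(positives, negatives)
model = train_linear_svm(training_set)
```

No human annotation enters this process: the only supervision is the assumption that one source is mostly beautiful and the other is not.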
5
Multimodal Features
– Traditional Visual Features: Color, LBP, SIFT, HOG
– Mid-level Semantic Attributes: Classemes [ECCV'10]
– Style Descriptor
– Video Motion Feature: Dense Trajectory [CVPR'11]
6
Framework
Input Videos → Feature Extraction → Classifiers → Ranking List
Feature Extraction
– Image Low-Level Features (Color, LBP, SIFT, HOG)
– Mid-Level Semantic Attributes (Classemes)
– Style Descriptor
– Video Motion Feature (Dense Trajectory)
Classifiers
– SVM Models (Image Training Data)
– SVM Models (Video Training Data)
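The last stage of the framework, turning classifier outputs into a ranking list, might look roughly like the sketch below. Several details are assumptions, because the slides do not specify them: frame-level scores are averaged into one score per video, per-feature scores are fused with equal weights, and all video names and score values are hypothetical.

```python
from statistics import mean


def video_score_from_frames(frame_scores):
    """Collapse per-frame classifier scores into one video score (averaging is assumed)."""
    return mean(frame_scores)


def fuse_scores(per_feature_scores):
    """Equal-weight average of each video's scores across features (assumed fusion rule)."""
    if not per_feature_scores:
        return {}
    score_maps = list(per_feature_scores.values())
    # Only fuse videos that every feature has scored.
    video_ids = set(score_maps[0]).intersection(*score_maps[1:])
    return {v: mean(scores[v] for scores in score_maps) for v in video_ids}


def rank_videos(fused):
    """Order videos from most to least appealing; ties are broken by name."""
    return sorted(fused, key=lambda v: (-fused[v], v))


# Hypothetical outputs of a model trained on image data, applied frame by frame.
color_frames = {
    "video_a": [0.8, 0.6],
    "video_b": [-0.2, 0.0],
    "video_c": [0.1, 0.3],
}
color = {v: video_score_from_frames(s) for v, s in color_frames.items()}

# Hypothetical video-level scores from a second feature.
classemes = {"video_a": 0.5, "video_b": 0.25, "video_c": -0.5}

ranking = rank_videos(fuse_scores({"color": color, "classemes": classemes}))
```

Equal-weight averaging is only the simplest choice; weights tuned on held-out data would drop into `fuse_scores` without changing the rest of the pipeline.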
7
Result
– Using training data from Flickr & Dutch documentary videos
– Evaluated on a subset labeled by ourselves
[Chart: Spearman's rank correlation per feature, with the best single feature marked]
Dense Trajectory, which is very powerful in human action recognition, performs poorly, indicating that motion is less related to beauty
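For reference, Spearman's rank correlation compares the ordering produced by the system with the ordering given by human labels. The sketch below computes it as the Pearson correlation of the two rank vectors, giving tied values the average of their ranks. The tie handling is an assumption, since the slides do not say how ties were treated in the evaluation, and the example scores are made up.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(vx * vy)


# Hypothetical system scores and human aesthetic labels for four videos.
system_scores = [0.9, 0.1, 0.5, 0.7]
human_labels = [4, 1, 3, 2]
rho = spearman(system_scores, human_labels)  # 0.8 for this toy data
```

A value of 1 means the two orderings agree exactly, -1 means they are reversed, and values near 0 mean the system's ranking carries little information about the human one.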
8
Result
– Using training data from DPChallenge & Dutch documentary images/frames
– Evaluated on a subset labeled by ourselves
[Chart: Spearman's rank correlation per feature, with "The best single feature" and "The best result" marked; values shown: 0.41, 0.43]
Image-based training is more suitable for the NHK dataset, because most NHK videos focus on scenes.
9
Result
Official evaluation results from NHK, on the entire test set
– We submitted 5 runs
– Evaluated on NHK's official labels, which are not publicly available
Observations
– Image training data is more effective, similar to observations on the small subset
– Color and Classemes are complementary; SIFT is not
NOTE: These submitted runs were selected before annotating the subset, which was done later to provide more insights in the paper!
10
Demo A collection of clips from the top 10 videos identified by our system
11
Thank you!