Beauty is Here! Evaluating Aesthetics in Videos Using Multimodal Features and Free Training Data. Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang, School of Computer Science, Fudan University.

Presentation transcript:

Beauty is Here! Evaluating Aesthetics in Videos Using Multimodal Features and Free Training Data Yanran Wang, Qi Dai, Rui Feng, Yu-Gang Jiang School of Computer Science, Fudan University, Shanghai, China ACM MM, Barcelona, Catalunya, Spain, 2013

Overview
Task: Design a system to automatically identify aesthetically more appealing videos
Contributions:
– Propose the use of free training data
– Use and evaluate various kinds of features
Result: Attain a Spearman's rank correlation coefficient of 0.41 on the NHK dataset

Free Training Data
Construct two annotation-free training datasets by assuming images/videos on certain websites are mostly beautiful:
– DPChallenge images (positive)
– Flickr videos (positive)
– Dutch documentary videos (negative)

Free Training Data
The first training set:
– Images from DPChallenge as positive samples
– Frames from the Dutch documentary videos as negative samples
The second training set:
– Videos from Flickr as positive samples
– The Dutch documentary videos as negative samples
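The labeling idea above can be sketched in a few lines: samples are pseudo-labeled by their source website rather than by human annotators. This is a minimal illustration, not the paper's code; the `build_training_set` helper and the file names are hypothetical.

```python
# Annotation-free labeling sketch: the source website determines the label.
def build_training_set(positive_sources, negative_sources):
    """Assign pseudo-labels (+1 / -1) based on where each sample came from."""
    samples = []
    for source, items in positive_sources.items():
        samples += [(item, +1, source) for item in items]
    for source, items in negative_sources.items():
        samples += [(item, -1, source) for item in items]
    return samples

# First training set: DPChallenge images (+) vs. documentary frames (-)
image_set = build_training_set(
    {"DPChallenge": ["img_001.jpg", "img_002.jpg"]},
    {"DutchDocumentary": ["frame_001.jpg"]},
)

# Second training set: Flickr videos (+) vs. documentary videos (-)
video_set = build_training_set(
    {"Flickr": ["vid_001.mp4"]},
    {"DutchDocumentary": ["doc_001.mp4"]},
)
```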

Multimodal Features
– Traditional visual features: Color, LBP, SIFT, HOG
– Mid-level semantic attributes: Classemes [ECCV'10]
– Style descriptor
– Video motion feature: Dense Trajectory [CVPR'11]
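As a toy illustration of the simplest feature in the list, here is a per-channel color histogram on an RGB frame; the other descriptors (LBP, SIFT, HOG, Classemes, Dense Trajectory) come from dedicated implementations, so this sketch stands in only for the "Color" entry.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Concatenate per-channel intensity histograms into one feature vector."""
    feats = []
    for c in range(frame.shape[2]):
        hist, _ = np.histogram(frame[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize so frames of any size compare
    return np.concatenate(feats)

# Synthetic 64x64 RGB frame, for illustration only
frame = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
feat = color_histogram(frame)  # length = 3 channels * 8 bins = 24
```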

Framework
Pipeline: input videos → feature extraction → classifiers → ranking list
Feature extraction:
– Image: low-level features (Color, LBP, SIFT, HOG), mid-level semantic attributes (Classemes), style descriptor
– Video: motion feature (Dense Trajectory)
Classifiers: SVM models trained on the image training data and on the video training data, whose scores produce the final ranking list
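The classification-and-ranking stage can be sketched as below, assuming features have already been extracted as fixed-length vectors. Following the slide, an SVM is trained on the pseudo-labeled data and test videos are ranked by their decision scores; the synthetic data and the single-feature setup are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Pseudo-labeled training features: +1 = "beautiful" source, -1 = negative source
X_train = rng.normal(size=(40, 16))
y_train = np.array([+1] * 20 + [-1] * 20)
X_train[:20] += 1.0  # shift the positive class so the toy problem is learnable

clf = LinearSVC(C=1.0).fit(X_train, y_train)

# Score unseen videos and rank them: higher decision value = more appealing
X_test = rng.normal(size=(5, 16))
scores = clf.decision_function(X_test)
ranking = np.argsort(-scores)  # indices of test videos, best first
```

With multiple features, one SVM per feature would be trained and their scores fused (e.g. averaged) before ranking.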

Result
Using training data from Flickr & Dutch documentary videos; evaluated on a subset labeled by ourselves.
[Chart: Spearman's rank correlation for each feature, with the best single feature highlighted]
Dense Trajectory, which is very powerful in human action recognition, performs poorly here, indicating that motion is less related to beauty.
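The evaluation metric on these slides, Spearman's rank correlation, compares the system's ranking against the human-labeled ranking. A minimal sketch with `scipy.stats.spearmanr` and toy rankings (the numbers are illustrative, not the paper's data):

```python
from scipy.stats import spearmanr

human_ranks = [1, 2, 3, 4, 5]       # ground-truth aesthetic ranking of 5 videos
predicted_ranks = [2, 1, 3, 5, 4]   # the system's ranking of the same videos

# spearmanr returns a coefficient in [-1, 1]; 1 means identical rankings
rho, p_value = spearmanr(human_ranks, predicted_ranks)
print(round(rho, 2))  # 0.8 for this toy example
```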

Result
The best result: using training data from DPChallenge & Dutch documentary images/frames; evaluated on a subset labeled by ourselves.
[Chart: Spearman's rank correlation for each feature, with the best single feature highlighted]
Image-based training is more suitable on the NHK dataset, because most NHK videos focus on scenes.

Result
Official evaluation results from NHK, on the entire test set:
– We submitted 5 runs
– Evaluated on NHK's official labels, which are not publicly available
Observations:
– Image training data is more effective, consistent with the observations on the small subset
– Color and Classemes are complementary; SIFT is not
Note: the submitted runs were selected before annotating the subset, which was done later to provide more insights in the paper.

Demo A collection of clips from the top 10 videos identified by our system

Thank you!