A Pool of Deep Models for Event Recognition

Presentation transcript:

A Pool of Deep Models for Event Recognition
K. Ahmad, M. L. Mekhalfi, N. Conci, G. Boato, F. Melgani and F. G. B. De Natale
Multimedia Signal Processing and Understanding Lab, DISI, University of Trento, Italy

Abstract
This paper proposes a novel two-stage framework for event recognition in still images. First, for a generic event image, we extract deep features from several pre-trained models and feed them into an ensemble of classifiers; the resulting posterior classification probabilities are fused by an order-induced scheme that penalizes each score according to the model's confidence on the image at hand, and then averages them. Second, we combine the fusion result with a reverse matching paradigm to draw the final output of the pipeline. We evaluate the approach on three challenging datasets and show that it outperforms the state of the art (SoA) in this area.

Overall Scheme
[Poster figure: overall pipeline diagram, with panels "Induced Ordered Weighted Averaging (IOWA)" and "Reverse Matching Strategy".]

Motivation and Approach
Most existing approaches fuse object-level and scene-level information with equal contributions. This is sub-optimal, as event-related images show significantly diverse chromatic and spatial contexts, often characterized by large inter-class and low intra-class similarities. We therefore assume that weights should be allocated to each model based on its capacity to represent the pieces of information and features that are characteristic of the underlying event/image.

The main contributions of this work can be summarized as follows:
- A novel fusion scheme, inspired by the concept of Induced Ordered Weighted Averaging (IOWA) operators [1].
- The application of a reverse matching concept, which has proven useful in other computer vision domains [2].
- A thorough investigation of the performance of different CNN architectures, applied to object-level and scene-level information.

Methodology
The proposed approach builds upon five processing steps (minimal sketches of steps 3 to 5 follow this list):
1. Feature Extraction: deep features from three pre-trained models: AlexNet, GoogleNet, and VggNet with 16 layers.
2. Classification: SVM-based classification.
3. Fusion of Deep Models: IOWA-based late fusion.
4. Reverse Matching: based on the approach proposed in [2].
5. Final Score Computation: simple multiplication of the results of steps 3 and 4.
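The poster gives no equations for the IOWA fusion, but the idea of ordering the model outputs by an inducing confidence variable and averaging them with position-dependent weights can be captured in a short sketch. This is a minimal illustration, assuming the inducing variable is each model's maximum posterior and that the positional weights are fixed in advance; iowa_fuse and its arguments are illustrative names, not the authors' code.

    import numpy as np

    def iowa_fuse(posteriors, owa_weights):
        """IOWA-style late fusion of per-model class posteriors.

        posteriors:  (n_models, n_classes) array, one row of SVM
                     posterior probabilities per deep model.
        owa_weights: (n_models,) weights attached to confidence
                     positions (most confident first); sum to 1.
        """
        posteriors = np.asarray(posteriors, dtype=float)
        # Inducing variable (assumed): each model's confidence on this
        # image, taken as its maximum posterior probability.
        confidence = posteriors.max(axis=1)
        # Reorder rows from most to least confident, then average with
        # the position-dependent weights, penalizing unsure models.
        order = np.argsort(-confidence)
        return np.average(posteriors[order], axis=0, weights=owa_weights)

    # Example with three models and decreasing positional weights.
    fused = iowa_fuse([[0.7, 0.2, 0.1],
                       [0.4, 0.4, 0.2],
                       [0.2, 0.5, 0.3]],
                      owa_weights=[0.5, 0.3, 0.2])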
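Steps 4 and 5 can be sketched in the same spirit, under the loose assumption that reverse matching reduces to scoring the query from the class side with one prototype feature per class; the actual system follows the bidirectional ranking of [2], and final_scores, cosine, and class_prototypes are hypothetical names.

    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two feature vectors."""
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def final_scores(fused_posterior, query_feat, class_prototypes):
        """Steps 4 and 5: reverse matching, then multiplication.

        fused_posterior:  (n_classes,) output of the IOWA fusion.
        query_feat:       (d,) deep feature of the test image.
        class_prototypes: (n_classes, d) one representative feature
                          per class, standing in for the exemplar
                          ranking of [2].
        """
        # Step 4 (simplified): score the query as seen from each class.
        reverse = np.array([cosine(p, query_feat) for p in class_prototypes])
        # Step 5: the final score is the simple product of both stages.
        return fused_posterior * reverse

    # predicted_event = int(np.argmax(final_scores(fused, feat, protos)))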
Datasets and Experimental Setup
Datasets: for the experimental validation of the proposed approach we use three datasets: WIDER, USED, and the UIUC Sports dataset.
Experimental setup: we conducted the following three experiments:
- Experiment 1: investigate the performance of the individual models pre-trained on the ImageNet and Places datasets.
- Experiment 2: assess the performance of IOWA-fused models on the same and on different architectures.
- Experiment 3: include the reverse matching strategy in the framework to raise the classification rates.

Experimental Results

Table 1: Classification results with individual models.
    Network                       Avg. Acc.
    VggNet ImageNet features      45.78%
    VggNet Places features        43.46%
    GoogleNet ImageNet features   39.83%
    GoogleNet Places features     39.71%
    AlexNet ImageNet features     42.78%
    AlexNet Places features       40.05%

Table 2: Classification results of different combinations with IOWA (O = object/ImageNet features, S = scene/Places features).
    Network                                                            Avg. Acc.
    VggNet (O+S)                                                       54.22%
    GoogleNet (O+S)                                                    49.28%
    AlexNet (O+S)                                                      51.00%
    GoogleNet (O) + AlexNet (O)                                        52.00%
    VggNet (O) + AlexNet (O)                                           53.37%
    VggNet (O) + GoogleNet (O)                                         53.80%
    VggNet (S) + AlexNet (S)                                           52.93%
    VggNet (O+S) + AlexNet (O+S)                                       56.86%
    VggNet (O+S) + GoogleNet (O+S)                                     57.02%
    VggNet (O+S) + AlexNet (O+S) + GoogleNet (O+S)                     58.05%
    VggNet (O+S) + AlexNet (O+S) + GoogleNet (O+S) + Reverse Matching  59.49%

Table 3: Comparison against the state of the art on the USED dataset.
    Method                               Avg. Acc. (Subset 1)  Avg. Acc. (Subset 2)
    AlexNet pre-trained on USED dataset  70.03%                65.96%
    Spatial Pyramid                      72.00%                79.92%
    Our Approach                         79.13%                87.02%

Table 4: Comparison with the SoA on WIDER.
    Method               Avg. Acc.
    Baseline Method      39.70%
    Deep Channel Fusion  42.40%
    SoA1                 44.06%
    SoA2                 53.00%
    Our Approach         59.49%

Table 5: Comparison with the SoA on the UIUC Sports dataset.
    Method                         Avg. Acc.
    Baseline Method                73.40%
    CNN Places                     94.00%
    GoogleNet GAP                  95.00%
    Transferring Object and Scene  98.80%
    Our Approach                   99.02%

Conclusions
We have proposed a novel framework for event recognition in single still pictures. Experimental results demonstrate that:
- Fusing different deep models, from the same as well as from different architectures, outperforms the individual models.
- IOWA-based fusion succeeds in giving more importance to the models with higher confidence, in a learning-free fashion.
- As proven in other computer vision domains, reverse matching can boost event recognition performance.

Selected References
[1] R. R. Yager and D. P. Filev, "Induced ordered weighted averaging operators," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 29, no. 2, pp. 141-150, 1999.
[2] Q. Leng, R. Hu, C. Liang, Y. Wang, and J. Chen, "Bidirectional ranking for person re-identification," in IEEE ICME, 2013, pp. 1-6.