Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Source: IEEE Transactions on Image Processing, Vol. 28, No. 4, pp. 1720-1731, April 2019.
Authors: Yu-Lei Niu, Zhi-Wu Lu, Ji-Rong Wen, Tao Xiang, and Shih-Fu Chang
Speaker: Chih-Lung Chen
Date: 2019/05/23

Outline: Introduction, Preliminaries, Proposed scheme, Experiments, Conclusions

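Introduction (1/2): Image annotation
[Figure: images labeled "Cat" and "Dog", alongside images marked "?" whose labels must be predicted; an annotation application.]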

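Introduction (2/2): Single-label vs. top-k labeling vs. the proposed method
Image 1: ground truth = person, water, mountain, reflection, sky, leaf; top-3 = person, water, mountain; proposed = person, water, mountain, reflection, sky, leaf.
Image 2: ground truth = thunder, cloud, tree; top-3 = thunder, cloud, tree; proposed = thunder, cloud, tree.
Image 3: ground truth = flower; top-3 = flower, fire, sky; proposed = flower.
A fixed top-k rule always outputs k labels, so it misses labels when an image has more than k (image 1) and adds wrong ones when it has fewer (image 3); the proposed method instead outputs a number of labels suited to each image.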
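To make the comparison concrete, the sketch below computes per-image precision and recall for the three example images in the comparison above, treating each label list as a set. Precision and recall are a common way to score image annotation, but these metric definitions are an illustration chosen here, not necessarily the exact ones used in the paper's experiments.

```python
# Label sets transcribed from the Introduction (2/2) slide.
examples = [
    {
        "ground_truth": {"person", "water", "mountain", "reflection", "sky", "leaf"},
        "top3": {"person", "water", "mountain"},
        "proposed": {"person", "water", "mountain", "reflection", "sky", "leaf"},
    },
    {
        "ground_truth": {"thunder", "cloud", "tree"},
        "top3": {"thunder", "cloud", "tree"},
        "proposed": {"thunder", "cloud", "tree"},
    },
    {
        "ground_truth": {"flower"},
        "top3": {"flower", "fire", "sky"},
        "proposed": {"flower"},
    },
]


def precision_recall(predicted, ground_truth):
    """Per-image precision (correct / predicted) and recall (correct / ground truth)."""
    hits = len(predicted & ground_truth)
    return hits / len(predicted), hits / len(ground_truth)


for i, ex in enumerate(examples, start=1):
    for method in ("top3", "proposed"):
        p, r = precision_recall(ex[method], ex["ground_truth"])
        print(f"image {i} {method:>8}: precision={p:.2f} recall={r:.2f}")
```

Top-3 loses recall on image 1 (3 of 6 labels) and precision on image 3 (1 of 3 labels correct), which is exactly the failure an adaptive label count is meant to avoid.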

Preliminaries (1/5) - NN
A neural network (NN) maps an input to an output: for example, the sentence "How are you?" to the reply "I'm fine.", or an image to the label "Cat".

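Preliminaries (2/5) - NN
Basic classifier: $y = wx + b$, where the input $x$ (e.g., an image) is multiplied by a weight $w$ and shifted by a bias $b$ to produce the output $y$ (e.g., the label "Cat").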
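The basic classifier $y = wx + b$ extends to several classes by giving each class its own weight vector and bias, then picking the class with the highest score. A minimal sketch; the feature vector, weights, and biases are made-up numbers for illustration only.

```python
def linear_scores(W, x, b):
    """Compute y = Wx + b: one score per class (W is a list of weight rows)."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def classify(W, x, b, classes):
    """Return the highest-scoring class and all scores."""
    scores = linear_scores(W, x, b)
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best], scores


# Hypothetical 3-dimensional input feature and weights for two classes.
classes = ["Cat", "Dog"]
W = [[0.5, -0.2, 0.1],   # weights for "Cat"
     [-0.3, 0.4, 0.2]]   # weights for "Dog"
b = [0.1, -0.1]
x = [1.0, 0.5, 2.0]

label, scores = classify(W, x, b, classes)
print(label, scores)
```

Here "Cat" scores about 0.7 and "Dog" about 0.2, so the classifier outputs "Cat".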

Preliminaries (3/5) - CNN
Convolutional neural network (CNN). [Figure: CNN illustration with steps numbered 1 to 3.]

Preliminaries (4/5) - CNN
[Figure: a neuron (filter) with numeric weights applied to a patch of an input image.]

Preliminaries (5/5) - CNN
[Figure: numeric example continuing the filter-and-image illustration.]
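The CNN slides illustrate a neuron (filter) applied to patches of an image. A minimal pure-Python sketch of that operation: the filter slides over the image with stride 1 and no padding, and each output value is the weighted sum of the patch under it. As in most CNN libraries this is technically cross-correlation (the filter is not flipped). The 4x4 image and 2x2 filter are illustrative values, not the numbers from the slides.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # One "neuron": weighted sum of the kh x kw patch under the filter.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


image = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
kernel = [
    [1, -1],
    [-1, 1],
]
print(conv2d(image, kernel))  # a 3x3 feature map
```

Large positive responses appear where the patch matches the filter's diagonal pattern and large negative ones where it matches the opposite diagonal.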

Proposed scheme (1/2) - MS-CNN
Multi-scale convolutional neural network (MS-CNN). [Figure: network with convolutional stages Conv_1 to Conv_5 and fusion modules Fusion_1 to Fusion_4.]
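The slide names five convolutional stages and four fusion modules but does not say how the fusion works. The sketch below shows one simple way to combine features from all five scales, assuming each Fusion_k concatenates the globally average-pooled output of Conv_{k+1} onto the output of Fusion_{k-1} (or of Conv_1 for k = 1). The pooling, the concatenation, and the helper names are hypothetical choices for illustration, not the MS-CNN design from the paper.

```python
def global_avg_pool(feature_map):
    """Average each channel (a 2-D list) of a feature map to a single number."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]


def fuse(running, stage_vector):
    """Hypothetical fusion module: concatenate the new scale onto the running vector."""
    return running + stage_vector


def multi_scale_features(stage_outputs):
    """stage_outputs[k] is the feature map of Conv_{k+1} (channels x height x width)."""
    fused = global_avg_pool(stage_outputs[0])  # Conv_1
    for stage in stage_outputs[1:]:            # Fusion_1 to Fusion_4
        fused = fuse(fused, global_avg_pool(stage))
    return fused


# Toy feature maps: 5 stages, 2 channels each, shrinking spatial size.
# Every value in stage s equals s, so each pooled vector is [s, s].
sizes = [8, 4, 2, 2, 1]
stages = [
    [[[float(s)] * size for _ in range(size)] for _ in range(2)]
    for s, size in zip(range(1, 6), sizes)
]
print(multi_scale_features(stages))
```

Pooling before fusing lets stages with different spatial sizes be combined into one fixed-length vector; a real network would more likely use learned layers inside each fusion module.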

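Proposed scheme (2/2)
Visual feature extraction: the image is passed through MS-CNN, and an NN performs multi-class classification, scoring candidate labels such as Cat, Dog, Rug, and Grass. Textual feature extraction: an NN encodes the image's tags (e.g., Pet, Home). Label quantity prediction: the visual and textual features are concatenated and fed to an NN that predicts how many labels the image should receive (3 in the example), so the output is the three top-scoring labels Cat, Dog, and Rug.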
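The final step of the pipeline, choosing how many labels to keep, can be sketched as follows. This is a minimal sketch under stated assumptions: the label-quantity head is taken to be a single linear layer with a softmax over the possible counts 1 to 4, and all scores, features, and weights are made up so that the output reproduces the slide's Cat/Dog/Rug example. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label_count(visual, textual, W, b):
    """Hypothetical label-quantity head: concatenate the two feature vectors,
    apply one linear layer, and return the most probable count (counts start at 1)."""
    x = visual + textual  # list concatenation = feature concatenation
    logits = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    probs = softmax(logits)
    return 1 + max(range(len(probs)), key=probs.__getitem__)


def annotate(label_scores, n):
    """Return the n highest-scoring labels."""
    ranked = sorted(label_scores, key=label_scores.get, reverse=True)
    return ranked[:n]


# All values below are made up for illustration.
label_scores = {"Cat": 0.92, "Dog": 0.88, "Rug": 0.61, "Grass": 0.15}
visual = [0.8, 0.1]   # pooled visual feature from the image branch
textual = [1.0, 1.0]  # indicators for the tags "Pet" and "Home"
W = [[0, 0, 0, 0],    # one row per possible count: 1, 2, 3, 4
     [0, 0, 0, 0],
     [1, 0, 1, 1],
     [0, 0, 0, 0]]
b = [0, 0, 0, 0]

n = predict_label_count(visual, textual, W, b)
print(n, annotate(label_scores, n))
```

Separating "which labels" (the classifier scores) from "how many labels" (the count head) is what lets the output length vary per image instead of being fixed at top-k.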

Experiments (1/5): Datasets
NUS-WIDE: T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, "NUS-WIDE: A real-world Web image database from National University of Singapore", Proc. CIVR, pp. 48, Jan. 2009.
MSCOCO: O. Vinyals, A. Toshev, S. Bengio, D. Erhan, "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 4, pp. 652-663, Apr. 2017.

Experiments (2/5): NUS-WIDE
[Figure: example annotation results on NUS-WIDE, with label sets including "Cat, Dog, Rug", "Dog, Chair, Refrigerator", "Dog, Chair, Door", and "Dog, Blanket".]

Experiments (3/5): MSCOCO
[Figure: example annotation results on MSCOCO, with label sets including "Cat, Dog, Rug", "Dog, Chair, Refrigerator", "Dog, Chair, Door", and "Dog, Blanket".]

Experiments (4/5): NUS-WIDE and MSCOCO
[Figure/table: results on the NUS-WIDE and MSCOCO datasets.]

Experiments (5/5)

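Conclusions
Multi-scale: MS-CNN extracts and fuses features from multiple convolutional stages.
Adaptive label: the number of labels per image is predicted rather than fixed at top-k.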

Thanks for listening
