Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Presentation transcript:

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Source: IEEE Transactions on Image Processing, Vol. 28, No. 4, pp. 1720-1731, April 2019.
Authors: Yu-Lei Niu, Zhi-Wu Lu, Ji-Rong Wen, Tao Xiang, and Shih-Fu Chang
Speaker: Chih-Lung Chen
Date: 2019/05/23

Outline: Introduction, Preliminaries, Proposed scheme, Experiments, Conclusions

Introduction (1/2) Annotation, Application [Figure: images labeled Cat and Dog alongside unlabeled images marked "?"]

Introduction (2/2) Single label, Top-𝑘 label
Ground truth: (1) person, water, mountain, reflection, sky, leaf; (2) thunder, cloud, tree; (3) flower
Top-𝟑: (1) person, water, mountain; (2) thunder, cloud, tree; (3) flower, fire, sky
Proposed: (1) person, water, mountain, reflection, sky, leaf; (2) thunder, cloud, tree; (3) flower

Preliminaries (1/5) - NN Neural network: Input → NN → Output. Example: "How are you?" → "I'm fine." Example output label: Cat

Preliminaries (2/5) - NN Basic classifier: 𝑦=𝑤𝑥+𝑏. Input → Output (Cat)
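
The basic classifier 𝑦=𝑤𝑥+𝑏 on this slide can be illustrated with a minimal sketch. The feature vector, weights, biases, and the two-class label set below are made-up values for illustration, not numbers from the presentation; the sketch scores each class with its own 𝑤 and 𝑏 and picks the highest score.

```python
def linear_score(w, x, b):
    """Return y = w . x + b for a single class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, weights, biases, labels):
    """Score every class with its own (w, b) and return the best label."""
    scores = [linear_score(w, x, b) for w, b in zip(weights, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores


# Hypothetical two-feature input and two-class model (illustrative values only).
labels = ["Cat", "Dog"]
weights = [[1.0, -1.0], [-1.0, 1.0]]
biases = [0.0, 0.5]

label, scores = classify([2.0, 0.5], weights, biases, labels)
print(label, scores)
```

A neural network stacks many such units with non-linearities between them; this sketch covers only the single linear step.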

Preliminaries (3/5) - CNN Convolutional neural network

Preliminaries (4/5) - CNN [Figure: a neuron applying a small filter of weights to an image]

Preliminaries (5/5) - CNN [Figure: the filter applied across the image, producing a grid of output values]
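
The filtering operation pictured on the two CNN slides can be sketched as follows. The 4x4 image and 2x2 filter are illustrative values, not the numbers from the slides, and the function name `conv2d_valid` is a hypothetical helper. As in most CNN libraries, the filter is slid over the image without flipping (strictly a cross-correlation), with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the
    grid of weighted sums, one per filter position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Illustrative input: a diagonal line, and a filter that responds to diagonals.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
kernel = [
    [1, -1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The largest responses fall where the image patch matches the filter pattern, which is the intuition behind a neuron detecting a local feature.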

Proposed scheme (1/2) - MS-CNN Multi-scale convolutional neural network: Conv_1, Conv_2, Conv_3, Conv_4, Conv_5 with Fusion_1, Fusion_2, Fusion_3, Fusion_4
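
The slide names the Conv and Fusion blocks but the transcript does not say how features are fused. As a rough illustration of the general idea of combining features from several scales, the sketch below uses one common and simple choice, global average pooling of each scale followed by concatenation. This is an assumption made for illustration, not the fusion operation of the presented method, and the feature-map values are made up.

```python
def global_average_pool(feature_map):
    """Collapse one 2D feature map to its mean value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def fuse_scales(feature_maps_per_scale):
    """Pool every channel of every scale and concatenate the results.

    `feature_maps_per_scale` is a list of scales; each scale is a list of
    channels; each channel is a 2D list. Scales may differ in resolution.
    """
    fused = []
    for channels in feature_maps_per_scale:
        fused.extend(global_average_pool(ch) for ch in channels)
    return fused


# Hypothetical outputs of two convolutional stages at different resolutions.
finer_scale = [[[1.0, 3.0], [5.0, 7.0]]]   # 1 channel, 2x2
coarser_scale = [[[2.0]], [[6.0]]]         # 2 channels, 1x1

fused = fuse_scales([finer_scale, coarser_scale])
print(fused)
```

Pooling before concatenation sidesteps the mismatch in spatial size between scales, at the cost of discarding spatial layout.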

Proposed scheme (2/2)
Visual feature extraction: Image → MS-CNN
Textual feature extraction: Tags (Pet, Home) → NN
Concatenate visual and textual features
Multi-class classification: NN → Cat, Dog, Rug, Grass, ...
Label quantity prediction: NN → 3
Output: Cat, Dog, Rug
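
The way the two outputs on this slide combine, class scores and a predicted label quantity, can be sketched as below. The scores are invented for illustration and `annotate` is a hypothetical helper name; the sketch only shows the final selection step, assuming the classifier yields one score per label and the quantity branch yields an integer.

```python
def annotate(scores, label_quantity):
    """Return the `label_quantity` highest-scoring labels, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:label_quantity]


# Hypothetical classifier scores for the labels named on the slide.
scores = {"Cat": 0.9, "Dog": 0.8, "Rug": 0.6, "Grass": 0.1}

# Quantity predicted per image, 3 in the slide's example.
predicted_quantity = 3
print(annotate(scores, predicted_quantity))

# A fixed top-k would return k labels for every image regardless of content;
# with a predicted quantity of 1 the same scores yield a single label.
print(annotate(scores, 1))
```

Predicting the quantity per image lets the number of returned labels vary, unlike a fixed top-𝑘 cutoff applied to every image.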

Experiments (1/5) Datasets: NUS-WIDE, MSCOCO
T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, "NUS-WIDE: A real-world Web image database from National University of Singapore", Proc. CIVR, pp. 48, Jan. 2009.
O. Vinyals, A. Toshev, S. Bengio, D. Erhan, "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 4, pp. 652-663, Apr. 2017.

Experiments (2/5) NUS-WIDE [Figure: example images with label sets such as "Cat, Dog, Rug", "Dog, Chair, Refrigerator", "Dog, Blanket", and "Dog, Chair, Door"]

Experiments (3/5) MSCOCO [Figure: example images with label sets such as "Cat, Dog, Rug", "Dog, Chair, Refrigerator", "Dog, Blanket", and "Dog, Chair, Door"]

Experiments (4/5) NUS-WIDE MSCOCO

Experiments (5/5)

Conclusions Multi-scale Adaptive label

Thanks for listening
