Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Source: IEEE Transactions on Image Processing, Vol. 28, No. 4, pp. 1720-1731, April 2019.
Authors: Yu-Lei Niu, Zhi-Wu Lu, Ji-Rong Wen, Tao Xiang, and Shih-Fu Chang
Speaker: Chih-Lung Chen
Date: 2019/05/23
Outline: Introduction, Preliminaries, Proposed scheme, Experiments, Conclusions
Introduction (1/2) Application: annotation [Figure: labelled example images (Cat, Dog) and unlabelled images ("?") to be annotated as Cat or Dog]
Introduction (2/2) Single label vs. top-𝑘 labels vs. proposed
Image 1: ground truth: person, water, mountain, reflection, sky, leaf; top-3: person, water, mountain; proposed: person, water, mountain, reflection, sky, leaf
Image 2: ground truth: thunder, cloud, tree; top-3: thunder, cloud, tree; proposed: thunder, cloud, tree
Image 3: ground truth: flower; top-3: flower, fire, sky; proposed: flower
Preliminaries (1/5) - NN Neural network (NN): Input -> NN -> Output. Examples: "How are you?" -> "I’m fine."; image -> "Cat".
Preliminaries (2/5) - NN Basic classifier: 𝑦 = 𝑤𝑥 + 𝑏, mapping an input 𝑥 to an output 𝑦 (e.g. "Cat").
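The basic classifier on this slide can be illustrated with a small sketch. It assumes one weight vector and one bias per class and picks the class with the highest score; the function names, the two-feature input, and the hand-picked weights are hypothetical and are not taken from the paper.

```python
def linear_score(x, w, b):
    """Return y = w . x + b for one input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, classes):
    """Pick the label whose linear score is highest.

    classes maps each label to its (w, b) pair.
    """
    return max(classes, key=lambda label: linear_score(x, *classes[label]))


# Hypothetical weights, chosen by hand for illustration only.
classes = {
    "cat": ([1.0, -1.0], 0.0),
    "dog": ([-1.0, 1.0], 0.0),
}

prediction = classify([2.0, 0.5], classes)
print(prediction)
```

In a trained network the weights and bias would be learned from data rather than written by hand.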
Preliminaries (3/5) - CNN Convolutional neural network [Figure: CNN overview with three numbered steps (1., 2., 3.)]
Preliminaries (4/5) - CNN [Figure: a neuron's weights applied across an image, producing output values between -3 and 3]
Preliminaries (5/5) - CNN [Figure: continuation of the same example, showing the resulting grid of output values]
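The operation pictured on the two CNN slides, a neuron's weights sliding across an image, can be sketched as a plain 2-D convolution. This is a minimal illustration assuming stride 1 and no padding; the 4x4 image and the 3x3 diagonal-detecting kernel are made-up examples, not the values shown on the slides.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding).

    Each output value is the weighted sum of the image patch
    currently covered by the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x4 binary image containing a diagonal line.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Hypothetical kernel that responds strongly to a diagonal pattern.
kernel = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[3, -2], [-2, 3]]
```

The output is largest where the patch matches the kernel's pattern, which is why the two positions on the diagonal score 3.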
Proposed scheme (1/2) - MS-CNN Multi-scale convolutional neural network (MS-CNN) [Figure: convolutional stages Conv_1 to Conv_5 combined through fusion stages Fusion_1 to Fusion_4]
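One possible reading of the diagram is that each fusion stage merges the features accumulated so far with the output of the next convolutional stage. The sketch below illustrates that idea only; the fusion rule (2x2 average pooling followed by element-wise addition), the single-channel maps, and all function names are assumptions made for illustration, not the layers defined in the paper.

```python
def avg_pool_2x2(fmap):
    """Halve height and width by averaging each non-overlapping 2x2 block."""
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, len(fmap[0]), 2)
        ]
        for r in range(0, len(fmap), 2)
    ]


def fuse(running, next_map):
    """Assumed fusion rule: downsample the running map, then add element-wise."""
    pooled = avg_pool_2x2(running)
    return [
        [p + q for p, q in zip(prow, qrow)]
        for prow, qrow in zip(pooled, next_map)
    ]


def multi_scale_features(conv_maps):
    """Chain the fusions: Fusion_i combines the running map with Conv_(i+1)."""
    fused = conv_maps[0]
    fusion_outputs = []
    for next_map in conv_maps[1:]:
        fused = fuse(fused, next_map)
        fusion_outputs.append(fused)
    return fusion_outputs


# Toy stand-ins for Conv_1..Conv_5: constant maps whose side halves each stage.
conv_maps = [[[1.0] * s for _ in range(s)] for s in (16, 8, 4, 2, 1)]

fusion_outputs = multi_scale_features(conv_maps)
print([len(f) for f in fusion_outputs])  # [8, 4, 2, 1]
```

Five convolutional stages yield four fusion outputs, matching the Fusion_1 to Fusion_4 count in the diagram; real feature maps would have many channels and learned fusion weights.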
Proposed scheme (2/2)
Visual feature extraction: Image -> MS-CNN
Textual feature extraction: Tags (e.g. Pet, Home) -> NN
Concatenate the visual and textual features, then:
Multi-class classification: NN -> Cat, Dog, Rug, Grass, ...
Label quantity prediction: NN -> 3
Output: Cat, Dog, Rug
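The last step of this pipeline, combining the classification scores with the predicted label quantity, can be sketched as follows. The scores are made-up numbers, the function name is hypothetical, and clamping the count to at least one label is an assumption of this sketch rather than something stated on the slide.

```python
def annotate(label_scores, predicted_count):
    """Return the highest-scoring labels, as many as the predicted quantity.

    Assumption: the count is rounded and clamped to the range
    [1, number of candidate labels].
    """
    n = max(1, min(int(round(predicted_count)), len(label_scores)))
    ranked = sorted(label_scores, key=label_scores.get, reverse=True)
    return ranked[:n]


# Hypothetical outputs of the multi-class classification branch.
scores = {"Cat": 0.91, "Dog": 0.85, "Rug": 0.60, "Grass": 0.05}

# The label quantity branch predicts 3 for this image.
labels = annotate(scores, 3)
print(labels)  # ['Cat', 'Dog', 'Rug']
```

Compared with a fixed top-𝑘 rule, the number of returned labels changes per image: `annotate(scores, 1)` would return only `['Cat']`.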
Experiments (1/5) Datasets: NUS-WIDE, MSCOCO
T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y. Zheng, "NUS-WIDE: A real-world Web image database from National University of Singapore", Proc. CIVR, pp. 48, Jan. 2009.
O. Vinyals, A. Toshev, S. Bengio, D. Erhan, "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 4, pp. 652-663, Apr. 2017.
Experiments (2/5) NUS-WIDE [Figure: example images with annotations such as "Cat, Dog, Rug", "Dog, Chair, Refrigerator", "Dog, Blanket", and "Dog, Chair, Door"]
Experiments (3/5) MSCOCO [Figure: example images with annotations such as "Cat, Dog, Rug", "Dog, Chair, Refrigerator", "Dog, Blanket", and "Dog, Chair, Door"]
Experiments (4/5) NUS-WIDE MSCOCO
Experiments (5/5)
Conclusions: multi-scale feature extraction; adaptive number of labels per image
Thanks for listening
References
[22] H. Hu, G.-T. Zhou, Z. Deng, Z. Liao, G. Mori, "Learning structured inference neural networks with label relations", Proc. CVPR, pp. 2960-2968, Jun. 2016.
[23] J. Johnson, L. Ballan, L. Fei-Fei, "Love thy neighbors: Image annotation by exploiting image metadata", Proc. ICCV, pp. 4624-4632, Dec. 2015.
[24] F. Liu, T. Xiang, T. M. Hospedales, W. Yang, C. Sun, "Semantic regularisation for recurrent image annotation", 2016, [online] Available: https://arxiv.org/abs/1611.05490.
[25] J. Jin, H. Nakayama, "Annotation order matters: Recurrent image annotator for arbitrary length image tagging", Proc. ICPR, pp. 2452-2457, Dec. 2016.
[26] J. Wang, Y. Yang, J. Mao, Z. Huang, C. Huang, W. Xu, "CNN-RNN: A unified framework for multi-label image classification", Proc. CVPR, pp. 2285-2294, Jun. 2016.
[30] Y. Gong, Y. Jia, T. Leung, A. Toshev, S. Ioffe, "Deep convolutional ranking for multilabel image annotation", 2013, [online] Available: https://arxiv.org/abs/1312.4894.