CNN-RNN: A Unified Framework for Multi-label Image Classification

1 CNN-RNN: A Unified Framework for Multi-label Image Classification
Xueying Bai, Jiankun Xu

2 Multi-label Image Classification
Co-occurrence dependency: certain labels tend to appear together in an image
Higher-order correlation: a label can be predicted from the labels predicted before it
Semantic redundancy: labels have overlapping meanings (e.g., cat and kitten)

3 Previous Models
Multiple single-label classification: fails to model the dependency between multiple labels
Graphical models: large number of parameters; cannot model higher-order correlations

4 CNN-RNN Model
Learns the semantic redundancy and the co-occurrence dependencies
Has an end-to-end training process
Predicts more objects that need context (higher-order correlation)

5 CNN-RNN Framework

6 Joint Embedding Model
Label embedding: an embedding vector in a low-dimensional Euclidean space in which embeddings of semantically similar labels are close to each other
Image embedding: an embedding vector close to those of its associated labels in the same space
Exploits semantic redundancy: semantically similar labels share classification parameters
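The joint embedding idea can be illustrated with a very small sketch: each label has a vector, the image is mapped into the same space, and labels are ranked by Euclidean distance to the image vector. The 2-d vectors, the label names, and the helper names `euclidean` and `rank_labels` below are made up for illustration; nothing here is learned or taken from the paper.

```python
import math

# Hypothetical 2-d label embeddings. "cat" and "kitten" are placed
# close together on purpose to mimic semantic redundancy.
LABEL_EMBEDDINGS = {
    "cat": (1.0, 0.0),
    "kitten": (0.9, 0.1),
    "car": (-1.0, 1.0),
}

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_labels(image_embedding):
    """Return label names sorted from nearest to farthest embedding."""
    return sorted(
        LABEL_EMBEDDINGS,
        key=lambda name: euclidean(image_embedding, LABEL_EMBEDDINGS[name]),
    )

# A made-up image embedding that lies near "cat".
ranking = rank_labels((0.95, 0.0))
```

In this toy space an image embedded near "cat" is also near "kitten", which is the sense in which semantically similar labels end up sharing parameters.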

7 Model Diagram
Output of CNN: image embedding
Output of RNN (o(t)): a new embedding that includes information from the previous labels (to model higher-order correlations)
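One way to write down the data flow in the diagram is sketched below. Apart from o(t), which the slide names, the notation is an assumption of this sketch and may differ from the paper's: e_k(t) is the one-hot vector of the label predicted at step t, U_l is the label embedding matrix, r(t) is the recurrent hidden state, I is the CNN image embedding, U_o^x and U_I^x are projection matrices, h, h_o and h_r are nonlinear functions, and s(t) holds the label scores.

```latex
\begin{align}
w_k(t) &= U_l \, e_k(t) \\
o(t) &= h_o\big(r(t-1),\, w_k(t)\big), \qquad r(t) = h_r\big(r(t-1),\, w_k(t)\big) \\
x_t &= h\big(U_o^x \, o(t) + U_I^x \, I\big) \\
s(t) &= U_l^{\top} x_t
\end{align}
```

Under this reading, the recurrent output and the image embedding are projected into the same low-dimensional space as the label embeddings, and a label's score is its dot product with the projected vector x_t.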

8 LSTM

9 Recurrent Neural Network

10 Inference
Prediction path
Beam search
Find the top N labels at each time step as candidates
Find the top N prediction paths for each time step (t+1)

11 Beam Search
When a path reaches the ‘End’ label, add it to the candidate path set
Termination condition: the probability of every current intermediate path is smaller than that of all candidate paths
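The two beam search slides can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_next_probs` is a hand-written probability table standing in for the CNN-RNN model, the label names are invented, and the rule that a label is never predicted twice is an assumption of this sketch.

```python
import math

END = "End"

def beam_search(next_probs, beam_size, max_steps=10):
    """Return finished (labels, log_prob) paths, most probable first."""
    beams = [((), 0.0)]   # intermediate paths that have not reached END
    candidates = []       # finished paths, stored without the END label
    for _ in range(max_steps):
        expanded = []
        for path, logp in beams:
            probs = next_probs(path)
            # Top-N next labels for this intermediate path.
            top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
            for label, p in top[:beam_size]:
                # Assumption: skip labels already on the path.
                if p > 0.0 and label not in path:
                    expanded.append((path + (label,), logp + math.log(p)))
        # Keep the top-N extended paths for time step t+1.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for path, logp in expanded[:beam_size]:
            if path[-1] == END:
                candidates.append((path[:-1], logp))
            else:
                beams.append((path, logp))
        candidates.sort(key=lambda item: item[1], reverse=True)
        candidates = candidates[:beam_size]
        if not beams:
            break
        # Termination: the best intermediate path is already less
        # probable than the worst candidate path.
        if candidates and beams[0][1] < candidates[-1][1]:
            break
    return candidates

def toy_next_probs(path):
    """Hypothetical next-label distribution given the labels so far."""
    if not path:
        return {"sky": 0.6, "car": 0.3, END: 0.1}
    if path[-1] == "sky":
        return {"cloud": 0.7, END: 0.2, "car": 0.1}
    if path[-1] == "cloud":
        return {END: 0.9, "car": 0.1}
    return {END: 0.8, "sky": 0.1, "cloud": 0.1}

results = beam_search(toy_next_probs, beam_size=2)
```

With this toy table and a beam size of 2, the search finishes after three steps and returns the paths (sky, cloud) and (car), in that order. Log-probabilities are summed instead of multiplying raw probabilities to avoid underflow on longer paths.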

12 Experiments
CNN module uses the 16-layer VGG network
Dimension of label embedding is 64
Dimension of LSTM RNN layer is 512
Tested on datasets: NUS-WIDE, MS COCO and PASCAL VOC 2007

13 Evaluation Metric
Precision: correctly annotated labels / generated labels
Recall: correctly annotated labels / ground-truth labels
C-P, O-P; C-R, O-R: per-class and overall precision and recall
C-F1, O-F1: geometrical average
MAP
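The per-class and overall metrics can be computed as in the sketch below. The function name `multilabel_metrics` and the tiny data set are hypothetical. Two assumptions are worth flagging: a class with no generated (or no ground-truth) labels contributes 0 to the per-class average, and F1 is computed with the conventional harmonic-mean formula 2PR/(P+R), although the slide describes it as a geometrical average.

```python
def multilabel_metrics(predictions, ground_truth, classes):
    """Per-class (C-) and overall (O-) precision, recall and F1.

    predictions and ground_truth are lists of label sets, one per image.
    """
    correct = {c: 0 for c in classes}    # correctly annotated labels
    generated = {c: 0 for c in classes}  # labels the model produced
    truth = {c: 0 for c in classes}      # ground-truth labels
    for pred, gt in zip(predictions, ground_truth):
        for c in classes:
            generated[c] += c in pred
            truth[c] += c in gt
            correct[c] += c in pred and c in gt

    def ratio(num, den):
        # Assumption: an undefined ratio counts as 0.
        return num / den if den else 0.0

    def f1(p, r):
        # Conventional harmonic-mean F1 (an assumption, see lead-in).
        return ratio(2 * p * r, p + r)

    n = len(classes)
    c_p = sum(ratio(correct[c], generated[c]) for c in classes) / n
    c_r = sum(ratio(correct[c], truth[c]) for c in classes) / n
    o_p = ratio(sum(correct.values()), sum(generated.values()))
    o_r = ratio(sum(correct.values()), sum(truth.values()))
    return {
        "C-P": c_p, "C-R": c_r, "C-F1": f1(c_p, c_r),
        "O-P": o_p, "O-R": o_r, "O-F1": f1(o_p, o_r),
    }

# Made-up predictions and ground truth for three images.
metrics = multilabel_metrics(
    predictions=[{"cat"}, {"cat", "dog"}, {"dog"}],
    ground_truth=[{"cat"}, {"dog"}, {"dog"}],
    classes=["cat", "dog"],
)
```

The per-class numbers average the ratios over classes, so rare classes weigh as much as frequent ones; the overall numbers pool the counts first, so frequent classes dominate.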

14 NUS-WIDE
A web image dataset containing 269648 images and 5018 tags.
Tested on the versions with 1000 tags and with 81 tags.

15

16

17 MS COCO
Contains 123 thousand images of 80 object types.
Training data has [count missing] images and testing data has [count missing] images.
Most images have multiple objects.

18

19

20 PASCAL VOC 2007
Training data has 5011 images and testing data has 4952 images.
Evaluated with average precision (AP) and mean average precision (mAP).
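For readers unfamiliar with AP and mAP, the sketch below computes a simple non-interpolated average precision per class and averages it over classes. The function names and the scores are invented for illustration. Note that the official PASCAL VOC 2007 protocol uses an 11-point interpolated variant, so this sketch would not reproduce benchmark numbers exactly.

```python
def average_precision(scores, relevant):
    """Non-interpolated AP for one class.

    scores[i] is the model's confidence that image i contains the class;
    relevant[i] is True when the class is really present in image i.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precision_sum += hits / rank   # precision at this recall point
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_class):
    """Mean of the per-class APs; per_class is a list of (scores, relevant)."""
    aps = [average_precision(s, r) for s, r in per_class]
    return sum(aps) / len(aps)

# Made-up scores for two classes.
ap_first = average_precision([0.9, 0.8, 0.3], [True, False, True])
map_value = mean_average_precision([
    ([0.9, 0.8, 0.3], [True, False, True]),
    ([0.2, 0.7], [False, True]),
])
```

AP rewards rankings that place positive images ahead of negative ones, so it evaluates the confidence scores directly instead of a fixed set of predicted labels.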

21 Label Embedding
The model effectively learns a joint label embedding

22

23 Attention Visualization

24 Conclusion and Future Work
Combines the advantages of the joint image/label embedding and label co-occurrence models by employing a CNN and an RNN
Experimental results on several datasets show good performance
Predicting small objects is still a challenge

25 Reference
Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu. CNN-RNN: A Unified Framework for Multi-label Image Classification.
Questions?

26 Thank you all!

