CNN-RNN: A Uniﬁed Framework for Multi-label Image Classiﬁcation

CNN-RNN: A Uniﬁed Framework for Multi-label Image Classiﬁcation
Xueying Bai, Jiankun Xu

Multi-label Image Classification
Co-occurrence dependency Higher-order correlation: one label can be predicted using the previous label Semantic redundancy: labels have overlapping meanings (cat and kitten)

Previous Models Multiple single-label classification
Fail to model the dependency between multiple labels Graphic model Large amount of parameters; Can not model higher-order correlation

RNN-CNN Model Learn the semantic redundancy and the co-occurrence dependencies Have an end-to-end training process Predict more objects that need contexts (higher-order correlation)

CNN-RNN Framework

Joint Embedding Model Label embedding: the embedding vector in a low-d Euclidian space in which embeddings of semantically similar labels are close to each other Image embedding: the embedding vector close to that of its associated labels in the same space Exploit semantic redundancy problem: share classification parameters

Model Diagram Output of CNN: Image embedding
Output of RNN (o(t)): new embedding including the information from previous label (to model higher order correlations)

Recurrent Neural Network

Inference Prediction Path Beam Search
Find top N labels in each time step as candidates Find top N prediction paths for each time (t+1)

Beam Search When comes to ‘End’: add to the candidate path set
Termination condition: probability of current intermediate paths is smaller than that of all candidate paths.

Experiments CNN module uses the 16 layers VGG network
Dimension of label embedding is 64 Dimension of LSTM RNN layer is 512 Test on Datasets: NUS-WIDE, MS COCO and VOC PASCAL 2007

Precision: correctly annotated labels/ generated labels
Evaluation Metric Precision: correctly annotated labels/ generated labels Recall: correctly annotated labels/ ground-truth labels C-P, O-P; C-R, O-R C-Fl, O-Fl: geometrical average MAP

NUS-WIDE A web image dataset contains 269648 images and 5018 tags.
Test on dataset with 1000 tags and 81 tags.

MS COCO It contains 123 thousand images of 80 objects types.
Training data has images and testing data has images. Most images have multiple objects.

PASCAL VOC 2007 Training data has 5011 images and testing data has 4952 images. Use AP and mAP to evaluate.

Label embedding The model effectively learns a joint label embedding

Attention Visualization

Conclusion and Future Work
Combines the advantages of the joint image/label embedding and label co-occurrence models by employing CNN and RNN Experimental results on several datasets show good performance Predicting small objects is still a challenge.

Reference: CNN-RNN: A Unified Framework for Multi-label Image Classification — Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu Questions?

Thank you all！

CNN-RNN: A Uniﬁed Framework for Multi-label Image Classiﬁcation

Similar presentations

Presentation on theme: "CNN-RNN: A Uniﬁed Framework for Multi-label Image Classiﬁcation"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CNN-RNN: A Uniﬁed Framework for Multi-label Image Classiﬁcation

Similar presentations

Presentation on theme: "CNN-RNN: A Uniﬁed Framework for Multi-label Image Classiﬁcation"— Presentation transcript:

Similar presentations

About project

Feedback