Learning Object Context for Dense Captioning



Presentation on theme: "Learning Object Context for Dense Captioning"— Presentation transcript:

1 Learning Object Context for Dense Captioning
Xiangyang Li1, Shuqiang Jiang1, Jungong Han2
1Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, CAS; 2School of Computing and Communications, Lancaster University

2 Learning Object Context for Dense Captioning
Example region captions: "a boy rides a horse", "the man wears a hat", "a young boy wearing a red shirt", "the man’s shirt is blue", "the saddle pad is red", "the man holds a cutting tool", "there is a dog". The task has two parts: locate image regions, then describe these regions with natural language.

3 Learning Object Context for Dense Captioning
To generate rich and accurate descriptions for target caption regions, it is important to be aware of the contextual content. [Figure: without context, the generated caption "a field of green grass" only seems right; with context, it should be "grass near the horse".]

4 Learning Object Context for Dense Captioning
To generate rich and accurate descriptions for target caption regions, it is important to be aware of the contextual content. [Figure: without context, the generated caption "a man wearing blue shirt" only seems right; with context, it should be "catcher crouching behind hitter".]

5 Learning Object Context for Dense Captioning
Previous approaches: (1) utilize region appearance alone [1]; (2) leverage region appearance together with the whole image as context [2]. [Figure: both pipelines perform box regression and caption generation, e.g. producing "the man’s shirt is blue".]
[1] Johnson et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. In CVPR 2016. [2] Yang et al. Dense Captioning with Joint Inference and Visual Context. In CVPR 2017.

6 Learning Object Context for Dense Captioning
Objects in images provide valuable cues for predicting locations and generating the target descriptions.

7 Learning Object Context for Dense Captioning
First, caption regions and objects have high overlaps in spatial locations.
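To make the "high overlap" point concrete, spatial agreement between a caption region and a detected object box is usually measured with intersection-over-union (IoU). A minimal, self-contained sketch (my illustration, not the authors' code; the corner-coordinate box format is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A caption region whose IoU with some object box is high is exactly the case where that object's features are informative context.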

8 Learning Object Context for Dense Captioning
Second, the descriptions for caption regions and objects have commonalities in semantic concepts. [Figure: (a) object instances; (b) vocabularies in the descriptions of caption regions.]
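One simple way to quantify this commonality is the fraction of distinct caption words that also appear as detected-object labels. The sketch below is an illustration I constructed, not a metric from the paper:

```python
def shared_concept_fraction(object_labels, caption_words):
    """Fraction of distinct caption words that also name a detected object."""
    objects = {w.lower() for w in object_labels}
    words = {w.lower() for w in caption_words}
    return len(objects & words) / len(words) if words else 0.0
```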

9 Learning Object Context for Dense Captioning
Modeling complementary object context for each caption region:

10 Learning Object Context for Dense Captioning
Object Context Encoding:
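The object context encoding step can be pictured as a recurrent pass over detected-object features whose final hidden state summarizes object context for a caption region. A toy NumPy sketch under assumptions of mine (single-layer LSTM with stacked gate weights; the paper's exact parameterization and feature extractor may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_object_context(object_feats, W, U, b):
    """Run one LSTM layer over object feature vectors and return the final
    hidden state as the (complementary) object context vector.
    W: (4d, x_dim), U: (4d, d), b: (4d,) hold the i, f, o, g gates stacked."""
    d = b.shape[0] // 4
    h, c = np.zeros(d), np.zeros(d)
    for x in object_feats:
        z = W @ x + U @ h + b                  # all four gates in one product
        i, f, o = sigmoid(z[:d]), sigmoid(z[d:2*d]), sigmoid(z[2*d:3*d])
        g = np.tanh(z[3*d:])
        c = f * c + i * g                      # cell-state update
        h = o * np.tanh(c)                     # hidden state
    return h
```

Feeding the object features in sequence lets the encoder weigh each object against those already seen, rather than pooling them independently.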

11 Learning Object Context for Dense Captioning
Region caption generation and localization: the complementary object context is either decoded with an LSTM (COCD) or used as guidance for the caption LSTM (COCG).
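How the context is consumed can also be sketched. In a COCG-style setup the context vector guides generation by being fed to the caption decoder at every step; the toy below uses a plain tanh RNN with random weights purely for shape bookkeeping (the actual model uses LSTMs, and every name, size, and token convention here is an assumption of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 8, 12                              # toy feature size and vocabulary size
Wx = rng.normal(size=(D, 3 * D)) * 0.1    # input weights over [word; region; context]
Wh = rng.normal(size=(D, D)) * 0.1        # recurrent weights
Wv = rng.normal(size=(V, D)) * 0.1        # hidden-to-vocabulary projection
E = rng.normal(size=(V, D)) * 0.1         # toy word embeddings

def generate_caption_cocg(region_feat, context, max_len=5):
    """Greedy decoding in which the object context guides every step by being
    concatenated to the decoder input (a COCD-style variant would instead
    decode the context with a separate LSTM beforehand)."""
    h, x, tokens = np.zeros(D), E[1], []  # assume token 1 is <start>
    for _ in range(max_len):
        h = np.tanh(Wx @ np.concatenate([x, region_feat, context]) + Wh @ h)
        tok = int(np.argmax(Wv @ h))
        tokens.append(tok)
        if tok == 0:                      # assume token 0 is <end>
            break
        x = E[tok]
    return tokens
```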

12 Learning Object Context for Dense Captioning
Datasets: VG-COCO is the intersection of VG (Visual Genome) V1.2 and MS COCO.

Dataset   Training  Validation  Test
VG V1.0    77,398     5,000     5,000
VG V1.2    77,398     5,000     5,000
VG-COCO    38,080     2,489     2,476

13 Learning Object Context for Dense Captioning
Results on VG-COCO:

14 Learning Object Context for Dense Captioning
Results on VG-COCO:

15 Learning Object Context for Dense Captioning
Qualitative results:

16 Learning Object Context for Dense Captioning
Qualitative results:

17 Learning Object Context for Dense Captioning
Results on VG:

18 Learning Object Context for Dense Captioning
Visualization of the object context encoding LSTM:

19 Learning Object Context for Dense Captioning
Conclusion: We introduce a framework with an object context encoding LSTM module to explicitly learn complementary object context for locating and describing each caption region. The experimental results show the effectiveness of our proposed method, which transfers knowledge from detected objects to caption regions.
Future work: Exploit high-level semantic information from objects to obtain more useful cues for locating and describing caption regions.

20 Thank you!

