Motivation
- It can effectively and automatically mine multi-modal knowledge, with structured textual and visual relationships, from the web.
- We propose the BC-DNN method to project different modalities into a common knowledge vector space for a unified knowledge representation.
- We construct a large-scale multi-modal relationship library.
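The shared-space idea above can be sketched in a few lines. This is a minimal illustration, not the BC-DNN architecture itself: the projection matrices below are random stand-ins, whereas in the actual method they would be learned so that matched text/image pairs land close together in the common space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensionalities for the two modalities and the shared space.
D_TEXT, D_IMAGE, D_SHARED = 300, 2048, 128
W_text = rng.standard_normal((D_SHARED, D_TEXT)) * 0.01   # stand-in, would be learned
W_image = rng.standard_normal((D_SHARED, D_IMAGE)) * 0.01  # stand-in, would be learned

def project(features, W):
    """Map modality-specific features into the shared knowledge space
    and L2-normalize so cosine similarity reduces to a dot product."""
    z = W @ features
    return z / np.linalg.norm(z)

text_vec = project(rng.standard_normal(D_TEXT), W_text)
image_vec = project(rng.standard_normal(D_IMAGE), W_image)

# Both embeddings now live in the same 128-d space and are directly comparable.
similarity = float(text_vec @ image_vec)
```

Once both modalities are unit vectors in one space, a single similarity measure serves for text-to-image, image-to-text, and relationship-level comparisons.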
Motivation
Framework
Bi-enhanced cross-modal knowledge representation
Visual Relationship Recognition
The input of this experiment is the image region containing a visual relationship, and the output is its relationship type. We extract knowledge vectors from all relationship regions and use a multi-class SVM to train the visual relationship recognition model.
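The classification step can be sketched as follows. This is a toy sketch, not the paper's pipeline: the knowledge vectors and the relationship type names (`ride`, `hold`, `wear`) are synthetic stand-ins, and scikit-learn's `LinearSVC` plays the role of the multi-class SVM.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
N_PER_CLASS, DIM = 50, 32
classes = ["ride", "hold", "wear"]  # hypothetical relationship types

# Synthetic "knowledge vectors": one Gaussian cluster per relationship type.
X, y = [], []
for label in range(len(classes)):
    center = rng.standard_normal(DIM) * 3.0
    X.append(center + rng.standard_normal((N_PER_CLASS, DIM)))
    y += [label] * N_PER_CLASS
X = np.vstack(X)
y = np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# LinearSVC trains one-vs-rest linear SVMs, giving a multi-class classifier.
clf = LinearSVC().fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

With well-separated clusters the SVM recovers the types almost perfectly; on real knowledge vectors the margin between relationship types is what the shared representation is meant to provide.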
Zero-shot Multi-modal Retrieval
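A shared knowledge space makes zero-shot retrieval a nearest-neighbor search: a query embedding for a relationship never seen at training time can still be matched against image embeddings by cosine similarity. The sketch below uses synthetic unit vectors as stand-ins for real embeddings; the query is constructed as a slightly perturbed copy of gallery item 7, so retrieval should return it first.

```python
import numpy as np

rng = np.random.default_rng(2)

def normalize(v):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

gallery = normalize(rng.standard_normal((100, 128)))  # stand-in image embeddings
# Stand-in query embedding: near gallery item 7, to simulate a matching pair.
query = normalize(gallery[7] + 0.05 * rng.standard_normal(128))

scores = gallery @ query               # cosine similarity against every image
top3 = np.argsort(scores)[::-1][:3]    # indices of the three best matches
```

The same search works regardless of which modality produced the query, which is what makes the retrieval cross-modal and, for unseen relationship types, zero-shot.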