Motivation
Our framework can automatically and effectively mine multi-modal knowledge with structured textual and visual relationships from the web. We propose the BC-DNN method to project different modalities into a common knowledge vector space for a unified knowledge representation. We construct a large-scale multi-modal relationship library.
Framework
Bi-enhanced cross-modal knowledge representation
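As a concrete illustration of projecting different modalities into a common knowledge vector space, below is a minimal sketch of a two-branch projector trained with a cosine alignment loss. The branch architectures, feature dimensions, and loss are illustrative assumptions, not the exact BC-DNN design.

```python
# Sketch: project image and text features into one knowledge vector space.
# All dimensions and the alignment loss are assumptions for illustration.
import torch
import torch.nn as nn

class CrossModalProjector(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=300, knowledge_dim=256):
        super().__init__()
        # One encoder per modality, each mapping into the shared knowledge space.
        self.img_encoder = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.ReLU(), nn.Linear(1024, knowledge_dim))
        self.txt_encoder = nn.Sequential(
            nn.Linear(txt_dim, 512), nn.ReLU(), nn.Linear(512, knowledge_dim))

    def forward(self, img_feat, txt_feat):
        return self.img_encoder(img_feat), self.txt_encoder(txt_feat)

def alignment_loss(z_img, z_txt):
    # Pull paired image/text knowledge vectors together (cosine distance).
    return (1 - nn.functional.cosine_similarity(z_img, z_txt)).mean()

# Toy usage: one gradient step on random paired features.
model = CrossModalProjector()
img = torch.randn(8, 4096)   # e.g. CNN features of 8 relationship regions
txt = torch.randn(8, 300)    # e.g. word-vector features of the paired phrases
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = alignment_loss(*model(img, txt))
loss.backward()
opt.step()
```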
Visual Relationship Recognition
The input of this experiment is an image region containing a visual relationship, and the output is its relationship type. We extract knowledge vectors from all relationship regions and use a multi-class SVM to train the visual relationship recognition model.
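The recognition step lends itself to a short sketch: knowledge vectors stand in for the extracted relationship-region features, and scikit-learn's SVC provides the multi-class SVM. The feature dimension, class count, and random data below are placeholders.

```python
# Sketch: train a multi-class SVM on knowledge vectors of relationship
# regions. Dimensions, labels, and data are placeholders for illustration.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))    # knowledge vectors of relationship regions
y = rng.integers(0, 10, size=500)  # relationship-type labels (10 toy classes)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# SVC handles the multi-class case internally via one-vs-one voting.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```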
Zero-shot Multi-modal Retrieval
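Assuming retrieval is performed by nearest-neighbor search in the unified knowledge space (an assumption; the section gives no procedural details), a minimal sketch of cross-modal retrieval by cosine similarity might look like this, with random vectors standing in for BC-DNN outputs.

```python
# Sketch: retrieve items of one modality from a query of another by
# cosine similarity in the shared knowledge space. Vectors are placeholders.
import numpy as np

def retrieve(query_vec, candidate_vecs, top_k=5):
    # Normalize, then rank candidates by cosine similarity to the query.
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:top_k]  # indices of the top-k matches

rng = np.random.default_rng(0)
text_query = rng.normal(size=256)            # knowledge vector of a text query
image_gallery = rng.normal(size=(100, 256))  # knowledge vectors of images
print(retrieve(text_query, image_gallery))
```

Because both modalities live in the same space, unseen (zero-shot) classes can be retrieved without retraining: any new query only needs to be projected once and compared against the gallery.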