Deep Cross-media Knowledge Transfer

Presentation transcript:

1 Deep Cross-media Knowledge Transfer
CVPR 2018 Deep Cross-media Knowledge Transfer Xin Huang and Yuxin Peng* Institute of Computer Science and Technology, Peking University, Beijing, China Hello everyone! I am very glad to present our work at CVPR 2018, Deep Cross-media Knowledge Transfer. The authors are Xin Huang and Professor Yuxin Peng, from Peking University. I'm Xin Huang.

2 Cross-media Retrieval
Cross-media retrieval aims to perform retrieval across different media types, such as image, text, video, audio, and 3D model Results of various media types Image Text Video Audio 3D Model Multimedia data Query of any media type The topic of this paper is cross-media retrieval, which aims to perform retrieval across different media types. For example, given an image query of the Golden Gate Bridge, the system can return relevant data of image, text, video, audio, and 3D model types. Image (Golden Gate Bridge)

3 Motivation Key problem of cross-media retrieval: Different media types have inconsistent representations (i.e., the heterogeneity gap) Deep network Common representation Color Texture Shape Image Cross-media similarity cannot be computed Heterogeneous The key problem of cross-media retrieval is that different media types have inconsistent representations, so their similarity cannot be directly computed. Current mainstream methods use deep learning to generate a cross-media common representation, so that cross-media similarity can be computed. Cross-media similarity can be computed Text Word Frequency
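The idea above, heterogeneous features mapped into a shared space where similarity becomes directly computable, can be sketched minimally. The dimensions and random linear projections below are illustrative stand-ins for the paper's actual deep networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions: raw image and text features are
# heterogeneous (different sizes); the common space is shared.
IMG_DIM, TXT_DIM, COMMON_DIM = 4096, 300, 128

# Stand-in "deep networks": one random linear projection per media type.
W_img = rng.standard_normal((IMG_DIM, COMMON_DIM)) * 0.01
W_txt = rng.standard_normal((TXT_DIM, COMMON_DIM)) * 0.01

def embed(x, W):
    """Project a raw media feature into the common space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z)

img_feat = rng.standard_normal(IMG_DIM)   # e.g. a CNN feature of an image
txt_feat = rng.standard_normal(TXT_DIM)   # e.g. a word-frequency feature of a text

z_img = embed(img_feat, W_img)
z_txt = embed(txt_feat, W_txt)

# Cross-media similarity is now a plain dot product in the common space.
similarity = float(z_img @ z_txt)
print(similarity)
```

With unit-normalized embeddings, the dot product is cosine similarity, so ranking retrieval results reduces to sorting these scores.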

4 Motivation Common challenge: Insufficient labeled cross-media data
Retrieval accuracy Usually relies on data amount Deep Network Needs large-scale data Cross-media data labeling Extremely labor-consuming Read text Watch video See image Listen to audio Observe 3D model Contradiction Existing methods face the challenge of insufficient labeled cross-media data. The retrieval accuracy usually relies on the amount of training data. However, it is extremely labor-consuming to label cross-media data. Imagine you are labeling data for "Golden Gate Bridge": you have to see the image, read the text, watch the video, listen to the audio, and even observe the 3D model. This limits the performance of current cross-media retrieval methods. Limits the performance of existing cross-media retrieval methods

5 Motivation
Transfer learning is usually used to relieve the problem of insufficient training data However, existing transfer methods are mostly designed within the same media type Source domain Target domain Transfer Intra-media semantic knowledge transfer Inter-media correlation knowledge transfer Correlation Source domain Target domain Transfer Transfer learning is usually used to relieve the problem of insufficient training data. However, existing transfer methods are mostly designed within the same media type. They pay little attention to knowledge transfer between two cross-media domains, including intra-media semantic knowledge and inter-media correlation knowledge. Therefore, we propose Deep Cross-media Knowledge Transfer (DCKT) to transfer knowledge from a cross-media source domain to a target domain. We propose deep cross-media knowledge transfer (DCKT), to transfer from cross-media source to target domains

6 Network Architecture An end-to-end architecture with 2 transfer levels: Media-level transfer and Correlation-level transfer Media-level Transfer MMD loss Separate Representation Adaptation Pairwise constraint loss Coexistence Information This is the overall architecture of DCKT, which has two transfer levels: First, media-level transfer is proposed to transfer intra-media knowledge between two domains, with MMD loss and pairwise constraint loss.
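As a rough sketch of what the MMD loss in the media-level transfer computes, here is a biased empirical estimate of squared Maximum Mean Discrepancy between source and target feature batches. The RBF kernel and its `gamma` bandwidth are assumptions for illustration, not necessarily DCKT's exact settings:

```python
import numpy as np

def mmd_loss(Xs, Xt, gamma=1.0):
    """Biased empirical estimate of squared MMD with an RBF kernel.

    Xs: (n_s, d) source-domain features; Xt: (n_t, d) target-domain features.
    A small value means the two feature distributions are well aligned.
    """
    def rbf(A, B):
        # Pairwise squared distances, then k(a, b) = exp(-gamma * ||a - b||^2)
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    return rbf(Xs, Xs).mean() + rbf(Xt, Xt).mean() - 2.0 * rbf(Xs, Xt).mean()

rng = np.random.default_rng(0)
same = mmd_loss(rng.standard_normal((64, 8)), rng.standard_normal((64, 8)))
shifted = mmd_loss(rng.standard_normal((64, 8)), rng.standard_normal((64, 8)) + 2.0)
print(same, shifted)  # the MMD estimate grows when the two distributions differ
```

Minimizing such a term during training pulls the source and target feature distributions toward each other, which is the "representation adaptation" role it plays in the architecture.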

7 Network Architecture An end-to-end architecture with 2 transfer levels: Media-level transfer and Correlation-level transfer Correlation-level Transfer MMD loss Common Representation Adaptation Semantic loss Supervised Label Information Second, correlation-level transfer is proposed to align the inter-media correlation of the two domains, including MMD loss and semantic loss.
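The semantic loss here uses supervised label information on the common representation. A minimal sketch is softmax cross-entropy with a linear classifier head; the head parameters `W` and `b` and all shapes below are illustrative, not taken from the paper:

```python
import numpy as np

def semantic_loss(common_repr, labels, W, b):
    """Softmax cross-entropy on common representations.

    common_repr: (n, d) common representations; labels: (n,) class ids;
    W: (d, n_classes), b: (n_classes,) -- an illustrative classifier head.
    """
    logits = common_repr @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
n, d, n_classes = 16, 8, 4
reprs = rng.standard_normal((n, d))
labels = rng.integers(0, n_classes, size=n)
W = rng.standard_normal((d, n_classes)) * 0.1
b = np.zeros(n_classes)

loss = semantic_loss(reprs, labels, W, b)
print(loss)
```

Because the same label space supervises both media types, minimizing this loss encourages an image and a text of the same category to land near each other in the common space.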

8 Network Architecture
An end-to-end architecture with 2 transfer levels: Media-level transfer and Correlation-level transfer Jointly transfer to boost retrieval accuracy of the target domain In this way, knowledge in the source domain can be jointly transferred to the target domain to boost retrieval accuracy. Media-level Transfer Correlation-level Transfer

9 Progressive Transfer Mechanism
Problem: Some hard samples in the target domain are not very relevant to the source domain, leading to a negative transfer effect Network pre-training Source data Source cross-media retrieval model Domain consistency metric Target data Correlation prediction Source model For robust transfer and higher retrieval accuracy The cross-media domain gap is very large, and hard transfer samples lead to a negative transfer effect, so we further propose a progressive transfer mechanism. We let the model trained in the source domain serve as a teacher, and automatically select samples in the target domain during the transfer process. In this way, we gradually transfer from easy to hard samples, for robust transfer and higher retrieval accuracy. Transfer sample selection Easy samples (high consistency) Hard samples (low consistency) Domain consistency ranking Sample selection probability: Prob_q = α[1 − log₂((max(AP) − AP_q) / (max(AP) × iter) + 1)]
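The selection probability can be sketched directly from the formula above. Here `ap` holds each target sample's average precision under the source model (the domain consistency metric), and `alpha` and `iteration` stand for α and iter; all values are illustrative:

```python
import numpy as np

def selection_prob(ap, alpha=1.0, iteration=1):
    """Prob_q = alpha * [1 - log2((max(AP) - AP_q) / (max(AP) * iter) + 1)].

    ap: (n,) average-precision scores of target samples under the source
    model. A higher AP means an easier sample, which gets a higher
    selection probability; as `iteration` grows, the ratio shrinks and
    harder samples are gradually included too.
    """
    ap = np.asarray(ap, dtype=float)
    ratio = (ap.max() - ap) / (ap.max() * iteration)
    return alpha * (1.0 - np.log2(ratio + 1.0))

ap = np.array([0.9, 0.7, 0.4, 0.1])
p = selection_prob(ap)
print(p)
```

Note that the sample with the maximum AP always gets probability α, and later iterations raise every probability toward α, which realizes the easy-to-hard schedule described above.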

10 Experiment
Dataset Source domain: XMediaNet (self-constructed): the first large-scale cross-media dataset with 5 media types and more than 100,000 instances Target (retrieval) domains: Wikipedia: 2,866 image/text pairs NUS-WIDE-10k: 10,000 image/text pairs Pascal Sentences: 1,000 image/text pairs Next, the experiments. The self-constructed XMediaNet dataset is used as the source domain; it is a large-scale dataset with more than 100 thousand instances. We adopt 3 widely-used but small-scale datasets as target domains. Compared with methods trained only on the target domain, DCKT obtains an inspiring improvement. DCKT achieves the highest accuracy compared with state-of-the-art methods.

11 Multimedia Information Processing Lab (MIPL)
CVPR 2018 Lab Homepage Github Homepage Thank you for listening. Here are the QR codes of our lab homepage and Github homepage. You are welcome to visit our websites for more information.

