Deep Cross-media Knowledge Transfer

Presentation transcript:

Deep Cross-media Knowledge Transfer (CVPR 2018) Xin Huang and Yuxin Peng* Institute of Computer Science and Technology, Peking University, Beijing 100871, China {huangxin_14@pku.edu.cn, pengyuxin@pku.edu.cn} Hello everyone! I'm very glad to present our work at CVPR 2018, Deep Cross-media Knowledge Transfer. The authors are Xin Huang and Professor Yuxin Peng, from Peking University. I'm Xin Huang.

Cross-media Retrieval: Cross-media retrieval aims to perform retrieval across different media types, such as image, text, video, audio, and 3D model. [Slide figure: a query of any media type retrieves results of various media types from multimedia data.] The topic of this paper is cross-media retrieval, which performs retrieval across different media types. For example, given an image query of the Golden Gate Bridge, the system can return relevant images, text, videos, audio, and 3D models.

Motivation: The key problem of cross-media retrieval is that different media types have inconsistent representations (i.e., the heterogeneity gap). [Slide figure: heterogeneous features such as color, texture, and shape for images versus word frequency for text cannot be compared directly; a deep network maps them into a common representation.] Because the representations are heterogeneous, cross-media similarity cannot be computed directly. The current mainstream approach is to use deep learning to generate a cross-media common representation, so that cross-media similarity can be computed.
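To make the common-representation idea concrete, here is a minimal sketch (not the authors' network; the dimensions and layer sizes are illustrative assumptions) of projecting heterogeneous image and text features into one shared space where cosine similarity becomes meaningful:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionNet(nn.Module):
    """Maps a modality-specific feature vector into the common space."""
    def __init__(self, in_dim: int, common_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, common_dim),
        )

    def forward(self, x):
        # L2-normalize so dot products equal cosine similarity
        return F.normalize(self.net(x), dim=-1)

image_net = ProjectionNet(in_dim=4096)   # e.g. CNN image features (illustrative)
text_net = ProjectionNet(in_dim=300)     # e.g. word-frequency/embedding features

img = image_net(torch.randn(8, 4096))    # batch of image features
txt = text_net(torch.randn(8, 300))      # batch of text features
similarity = img @ txt.t()               # 8x8 cross-media similarity matrix
```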

Motivation: A common challenge is insufficient labeled cross-media data. Retrieval accuracy usually depends on the amount of training data, and deep networks need large-scale data, yet labeling cross-media data is extremely labor-consuming; this contradiction limits existing methods. Imagine labeling the data for "Golden Gate Bridge": you have to see the images, read the text, watch the videos, listen to the audio, and even observe the 3D models. This limits the performance of current cross-media retrieval methods.

Motivation: Transfer learning is usually used to relieve the problem of insufficient training data. However, existing transfer methods are mostly designed within the same media type. [Slide figure: transfer from a source domain to a target domain, covering both intra-media semantic knowledge and inter-media correlation knowledge.] Such methods pay little attention to knowledge transfer between two cross-media domains, which involves both intra-media semantic knowledge and inter-media correlation knowledge. Therefore, we propose Deep Cross-media Knowledge Transfer (DCKT) to transfer knowledge from a cross-media source domain to a target domain.

Network Architecture: DCKT is an end-to-end architecture with two transfer levels: media-level transfer and correlation-level transfer. Media-level transfer performs separate-representation adaptation with an MMD loss, and exploits coexistence information with a pairwise constraint loss. This is the overall architecture of DCKT. First, media-level transfer moves intra-media knowledge between the two domains using the MMD loss and the pairwise constraint loss, as sketched below.
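A hedged illustration of the media-level objective, not the paper's exact formulation: an RBF-kernel MMD term aligns source and target feature distributions per media type, while a pairwise constraint keeps co-occurring image/text pairs close. The kernel bandwidths and the MSE form of the pairwise term are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def mmd_rbf(x, y, sigmas=(1.0, 5.0, 10.0)):
    """Biased MMD^2 estimate between two batches using a mixture of RBF kernels."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def pairwise_constraint(img_feat, txt_feat):
    """Coexisting image/text pairs should have similar representations."""
    return F.mse_loss(img_feat, txt_feat)

# Separate (per-media) representations from the network, source and target
src_img, tgt_img = torch.randn(16, 256), torch.randn(16, 256)
src_txt, tgt_txt = torch.randn(16, 256), torch.randn(16, 256)

media_level_loss = (mmd_rbf(src_img, tgt_img) + mmd_rbf(src_txt, tgt_txt)
                    + pairwise_constraint(src_img, src_txt)
                    + pairwise_constraint(tgt_img, tgt_txt))
```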

Network Architecture (continued): Correlation-level transfer performs common-representation adaptation with an MMD loss, and exploits supervised label information with a semantic loss. Second, correlation-level transfer aligns the inter-media correlation of the two domains, combining the MMD loss with the semantic loss; a sketch follows.
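A matching sketch of the correlation-level terms, under the same caveats: a linear-kernel MMD (the cheapest variant, used here only to keep the snippet self-contained) aligns the common representations across domains, while a cross-entropy semantic loss uses the supervised labels. The classifier head and class count are illustrative:

```python
import torch
import torch.nn as nn

def mmd_linear(x, y):
    """Linear-kernel MMD: squared distance between the two batch means."""
    return (x.mean(0) - y.mean(0)).pow(2).sum()

num_classes = 10                       # illustrative; depends on the dataset
classifier = nn.Linear(256, num_classes)
ce = nn.CrossEntropyLoss()

src_common = torch.randn(16, 256)      # common representations, source domain
tgt_common = torch.randn(16, 256)      # common representations, target domain
src_labels = torch.randint(0, num_classes, (16,))
tgt_labels = torch.randint(0, num_classes, (16,))

semantic_loss = (ce(classifier(src_common), src_labels)
                 + ce(classifier(tgt_common), tgt_labels))
correlation_level_loss = mmd_linear(src_common, tgt_common) + semantic_loss
```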

Network Architecture (continued): The two levels, media-level transfer and correlation-level transfer, are trained jointly to boost retrieval accuracy on the target domain. In this way, knowledge in the source domain is jointly transferred to the target domain.

Progressive Transfer Mechanism: Problem: some hard samples in the target domain are not very relevant to the source domain, leading to a negative transfer effect. Because the cross-media domain gap is vast, we further propose a progressive transfer mechanism for robust transfer and higher retrieval accuracy. The network is pre-trained on source data to obtain a source cross-media retrieval model, which serves as a teacher: its correlation predictions on target data define a domain consistency metric. During transfer, samples are ranked by domain consistency and selected automatically, moving gradually from easy samples (high consistency) to hard samples (low consistency). The selection probability of sample $q$ is $Prob_q = \alpha\left[1-\log_2\left(\frac{\max(AP)-AP_q}{\max(AP)\times iter}+1\right)\right]$, where $AP_q$ is the source model's average precision for sample $q$ and $iter$ is the current training iteration. A sketch of this rule follows.
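A sketch of the selection rule as reconstructed from the slide (the clamping and the Bernoulli sampling step are our illustrative additions): samples on which the source model is confident (high AP, high domain consistency) are selected first, and the penalty on hard samples decays as the iteration count grows, so hard samples are gradually admitted:

```python
import torch

def selection_prob(ap: torch.Tensor, iteration: int, alpha: float = 1.0):
    """Prob_q = alpha * [1 - log2((max(AP) - AP_q) / (max(AP) * iter) + 1)]."""
    gap = (ap.max() - ap) / (ap.max() * iteration)
    # Clamp to [0, 1] so the values are valid probabilities (our addition)
    return (alpha * (1.0 - torch.log2(gap + 1.0))).clamp(min=0.0, max=1.0)

ap = torch.rand(100)                      # per-sample AP from the source model
probs = selection_prob(ap, iteration=1)   # early iteration: easy samples favored
selected = torch.bernoulli(probs).bool()  # stochastically keep easy samples first
```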

Experiment: Datasets. Source domain: XMediaNet (self-constructed), the first large-scale cross-media dataset with 5 media types and more than 100,000 instances. Target (retrieval) domains: Wikipedia (2,866 image/text pairs), NUS-WIDE-10k (10,000 image/text pairs), and Pascal Sentences (1,000 image/text pairs). The self-constructed XMediaNet dataset is used as the source domain, and we adopt these three widely-used but small-scale datasets as target domains. Compared with methods trained only on the target domain, DCKT obtains an inspiring improvement, and it achieves the highest accuracy compared with state-of-the-art methods.
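For reference, retrieval accuracy in this literature is typically reported as mean average precision (MAP). A minimal evaluation sketch, assuming ranking by cosine similarity and treating gallery items sharing the query's class label as relevant (the paper's exact protocol may differ):

```python
import torch
import torch.nn.functional as F

def mean_average_precision(query_feat, gallery_feat, query_lbl, gallery_lbl):
    """MAP over queries, ranking the gallery by cosine similarity."""
    sim = F.normalize(query_feat, dim=1) @ F.normalize(gallery_feat, dim=1).t()
    aps = []
    for i in range(sim.size(0)):
        order = sim[i].argsort(descending=True)
        rel = (gallery_lbl[order] == query_lbl[i]).float()
        if rel.sum() == 0:
            continue  # no relevant items: skip this query
        prec_at_k = rel.cumsum(0) / torch.arange(1, rel.numel() + 1).float()
        aps.append((prec_at_k * rel).sum() / rel.sum())
    return torch.stack(aps).mean()

# e.g. 50 image queries against a 200-item text gallery (random toy data)
map_score = mean_average_precision(
    torch.randn(50, 256), torch.randn(200, 256),
    torch.randint(0, 10, (50,)), torch.randint(0, 10, (200,)))
```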

Multimedia Information Processing Lab (MIPL), CVPR 2018. Thank you for listening. Here are the QR codes of our lab homepage and GitHub homepage; welcome to visit our websites for more information. Lab homepage: http://www.icst.pku.edu.cn/mipl