Progressive Cross-media Correlation Learning

Presentation transcript:

Progressive Cross-media Correlation Learning (IGTA 2018) Xin Huang and Yuxin Peng* Institute of Computer Science and Technology, Peking University, Beijing 100871, China {huangxin_14@pku.edu.cn, pengyuxin@pku.edu.cn}

Outline Introduction Method Experiment Conclusion

Introduction What is cross-media retrieval? Single-media retrieval: retrieve relevant results of the same media type as the query (e.g., image → image and text → text). Cross-media retrieval: retrieve relevant results of a different media type from the query (e.g., image → text and text → image). Recently, deep neural networks have achieved great success in many artificial intelligence tasks. The deep convolutional neural network, usually trained on an image classification task, has proven to be a powerful building block for many multimedia applications. For example, such a network can be used to tag images for image retrieval, and its features can support object detection in images or videos.

Introduction Problem: the "heterogeneity gap": different modalities have inconsistent representations. Cross-modal common representation learning is the current research hot spot: learn projections to represent data of different modalities (image, text, video, audio) with the same type of "feature". Existing methods: traditional methods (mainly linear projections) and DNN-based methods; DNN-based methods are now the mainstream. [Slide figure: example image, text, video and audio data mapped into a common representation.]
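To make the common-representation idea concrete, here is a minimal PyTorch sketch of a two-branch projection network. It is not the paper's exact architecture; the feature dimensions and layer sizes are assumptions for illustration.

import torch
import torch.nn as nn

class CommonRepresentation(nn.Module):
    # One projection branch per modality: "learn projections to represent data
    # of different modalities with the same type of feature".
    def __init__(self, img_dim=4096, txt_dim=300, common_dim=128):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(), nn.Linear(1024, common_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU(), nn.Linear(1024, common_dim))
    def forward(self, img_feat, txt_feat):
        return self.img_proj(img_feat), self.txt_proj(txt_feat)

# Paired image/text items should end up close in the common space, so cross-media
# retrieval reduces to nearest-neighbour search over one feature type.
model = CommonRepresentation()
z_img, z_txt = model(torch.randn(8, 4096), torch.randn(8, 300))
print(z_img.shape, z_txt.shape)  # both torch.Size([8, 128])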

Introduction Motivation Key problem: cross-media correlation learning from data. Existing methods indiscriminately take all data for training, despite the complex correlation among samples. [Slide figure: example image/text pairs illustrating easy and hard samples.] Easy samples: correlation is easy to capture, with clear cues. Hard samples: semantically rich, but with misleading and noisy information. Hard samples bring a negative effect, especially in the early period of model training!

Outline Introduction Method Experiment Conclusion Next, we will present the details of our method.

Method Progressive Cross-media Correlation Learning: Core idea: train gradually from easy samples to hard samples, guided by a large-scale dataset. Step 1: Reference Model Training. Large-scale data with general knowledge is much more reliable; the reference model acts as a teacher to guide sample selection.

Method Step 1: Reference Model Training. Hierarchical correlation learning architecture. Pairwise constraint: coexistence cue. Semantic constraint: semantic consistency.
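A hedged sketch of what these two constraints could look like as losses. The exact loss forms in PCCL are not given on this slide, so the choices below (MSE for the pairwise constraint, cross-entropy for the semantic constraint, a shared classifier head, 200 categories) are assumptions.

import torch
import torch.nn.functional as F

def pairwise_loss(z_img, z_txt):
    # Coexistence cue: the i-th image and i-th text co-occur as a pair,
    # so their common representations should be close.
    return F.mse_loss(z_img, z_txt)

def semantic_loss(logits_img, logits_txt, labels):
    # Semantic consistency: both modalities should predict the same category.
    return F.cross_entropy(logits_img, labels) + F.cross_entropy(logits_txt, labels)

# Hypothetical usage with the projections from the earlier sketch and a shared
# classifier head over the common space (200 categories, as in XMediaNet).
classifier = torch.nn.Linear(128, 200)
z_img, z_txt = torch.randn(8, 128), torch.randn(8, 128)
labels = torch.randint(0, 200, (8,))
loss = pairwise_loss(z_img, z_txt) + semantic_loss(classifier(z_img), classifier(z_txt), labels)
loss.backward()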

Method Step 2: Relevance Significance Metric. Use the reference model to generate common representations for the target data; perform intra-media and inter-media retrieval in the target data; evaluate the relevance significance (intra-media and inter-media) for each pair. Step 3: Difficulty Assignment for Target Data. Intuitively, high relevance significance means easy samples.
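As an illustration of the relevance-significance idea (the paper's exact metric is not reproduced here; this top-5, label-agreement version is an assumption), one could project the target data with the reference model and score each pair by how consistent its intra-media and inter-media retrieval results are.

import numpy as np

def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def relevance_significance(z_img, z_txt, labels, k=5):
    # Higher score = easier pair: its retrieved neighbours share its label.
    inter = cosine_sim(z_img, z_txt)   # inter-media: image queries vs. texts
    intra = cosine_sim(z_img, z_img)   # intra-media: image queries vs. images
    scores = []
    for i in range(len(labels)):
        top_inter = np.argsort(-inter[i])[:k]
        top_intra = np.argsort(-intra[i])[1:k + 1]  # skip the query itself
        scores.append(np.mean(labels[top_inter] == labels[i]) +
                      np.mean(labels[top_intra] == labels[i]))
    return np.array(scores)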

Method Step 4: Progressive Training of Target Model. Select the top-k instances with the largest relevance significance; initialize Iter as 1; as Iter becomes larger, more samples are selected. In the late period of training, hard samples are also considered to preserve diversity.
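A minimal sketch of this progressive selection loop; the linear growth schedule below is an assumption, not the paper's exact rule, and train_target_model is a hypothetical helper.

import numpy as np

def select_for_iteration(scores, it, total_iters):
    # Top-k easiest pairs; k grows with the iteration counter, so hard samples
    # only enter in the late period of training.
    n = len(scores)
    k = max(1, int(n * it / total_iters))
    return np.argsort(-scores)[:k]  # largest relevance significance first

scores = np.random.rand(2173)       # e.g. the Wikipedia training pairs
for it in range(1, 6):              # Iter initialized as 1
    idx = select_for_iteration(scores, it, total_iters=5)
    # train_target_model(target_data[idx]) would go here (hypothetical helper)
    print(f"Iter {it}: training on {len(idx)} pairs")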

Outline Introduction Method Experiment Conclusion

Experiment Compared methods: a total of 9 state-of-the-art methods: CCA [Hotelling, Biometrika 1936] CFA [Li et al., ACM MM 2003] KCCA [Hardoon et al., Neural Computation 2004] Corr-AE [Feng et al., ACM MM 2014] JRL [Zhai et al., IEEE TCSVT 2014] LGCFL [Kang et al., IEEE TMM 2015] DCCA [Yan et al., CVPR 2015] CMDN [Peng et al., IJCAI 2016] Deep-SM [Wei et al., IEEE TCYB 2017]

Experiment Datasets. Reference data: XMediaNet dataset (constructed by our laboratory), the first large-scale cross-media dataset with 5 media types (text, image, audio, video and 3D model), 200 categories, and 100,000 instances. We focus on the scenario of image and text, so we choose the training set of image and text data from XMediaNet with 32,000 pairs. Target data: Wikipedia dataset: 2,866 image/text pairs with 10 high-level semantic categories, randomly split into a training set with 2,173 pairs, a testing set with 462 pairs, and a validation set with 231 pairs. NUS-WIDE-10k dataset: a subset of the NUS-WIDE dataset, containing 10,000 image/text pairs of 10 semantic categories, split into a training set with 8,000 pairs and a testing set with 1,000 pairs.

Experiment Results MAP scores on the 2 datasets compared with existing methods. PCCL outperforms all the compared methods on both datasets.

Outline Introduction Method Experiment Conclusion

Conclusion Proposed approach: Progressive Cross-media Correlation Learning (PCCL). Idea: use a large-scale cross-media dataset to guide progressive sample selection on another small-scale dataset. 4 steps: 1: Reference Model Training; 2: Relevance Significance Metric; 3: Difficulty Assignment for Target Data; 4: Progressive Training of Target Model. Achieves accuracy improvement on 2 widely-used cross-media datasets.

Cross-media Retrieval Beyond this work, we have released the XMedia dataset with 5 media types. This dataset and the source codes of our related works: http://www.icst.pku.edu.cn/mipl/xmedia Interested in cross-media retrieval? We hope our recent overview is helpful: Yuxin Peng, Xin Huang, and Yunzhen Zhao, "An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges", IEEE TCSVT, 2017. arXiv: 1704.02223.

IGTA 2018 Thank you! GitHub Homepage (Source Codes)