Cross Domain Distribution Adaptation via Kernel Mapping


Erheng Zhong†, Wei Fan‡, Jing Peng*, Kun Zhang#, Jiangtao Ren†, Deepak Turaga‡, Olivier Verscheure‡
†Sun Yat-Sen University ‡IBM T. J. Watson Research Center *Montclair State University #Xavier University of Louisiana

Can We?

Standard Supervised Learning

[Slide diagram: training (labeled) → Classifier → test (unlabeled); both from the New York Times; accuracy 85.5%.]

In the traditional supervised learning setting, labeled and unlabeled data are assumed to come from the same domain. For example, in text categorization, the task is to tell whether a New York Times article comes from the Business or the Science section. A classifier “learns” from a collection of Business and Science articles from the New York Times website, then classifies unseen Business and Science articles from the same website. In such a setting, high classification accuracy (e.g. 85.5%) can be achieved.
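As a rough illustration of this setting (not from the slides), a standard supervised text-classification pipeline might look as follows; the 20 Newsgroups corpus stands in for the New York Times data, and the category pair is an arbitrary choice:

```python
# Minimal sketch of standard supervised learning: train and test
# data come from the same distribution. The corpus and categories
# are illustrative stand-ins, not the ones on the slide.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

cats = ["sci.space", "misc.forsale"]  # stand-ins for Science / Business
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer()
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)   # same vocabulary, same domain

clf = LinearSVC().fit(X_train, train.target)
print("accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```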

In Reality…

[Slide diagram: training (labeled, Reuters) → Classifier → test (unlabeled, New York Times); accuracy 64.1%. Labeled target-domain data is not available!]

However, in reality things can be different. It may be too costly to label the large number of New York Times articles needed to train a good classifier, so one may want to use already available text data to classify the New York Times articles at hand, for example Business and Science articles from Reuters. This is practical, since the Reuters corpus is a well-known text classification data set. However, due to domain differences, some terms in Reuters may not appear in the New York Times, and even where the two corpora share terms, the distributions of those terms can differ.

Domain Difference → Performance Drop

- Ideal setting: train on New York Times, test on New York Times → 85.5%
- Realistic setting: train on Reuters, test on New York Times → 64.1%

Such differences can lead to a dramatic performance drop. Though the idea of using a related domain to help classification in another domain is appealing, it is characterized by several inherent difficulties.

Synthetic Example

“two moons” and “two circles” have significantly different distributions.

Synthetic Example

1. If we only use the labeled examples (highlighted in squares) to construct a model (an SVM with a polynomial kernel in this case), most unlabeled data are misclassified. [left figure]
2. If we simply borrow the labeled data from “two moons” to help learn a model on “two circles”, most of the unlabeled data are still misclassified. [right figure]
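This setup is easy to reproduce. Below is a minimal sketch of the experiment, assuming scikit-learn's make_moons and make_circles as stand-ins for the slide's data and a handful of labeled target points; it is an illustration, not the authors' code:

```python
# Sketch of the synthetic example: a polynomial-kernel SVM trained on
# a few labeled "two circles" points, with and without naively added
# "two moons" source data, performs poorly on the target either way.
import numpy as np
from sklearn.datasets import make_moons, make_circles
from sklearn.svm import SVC

Xs, ys = make_moons(n_samples=200, noise=0.05, random_state=0)    # source
Xt, yt = make_circles(n_samples=200, noise=0.05, factor=0.5,
                      random_state=0)                             # target

# A few labeled target points (5 per class), like the highlighted squares.
labeled = np.hstack([np.where(yt == 0)[0][:5], np.where(yt == 1)[0][:5]])

# (1) polynomial-kernel SVM on the few target labels only
clf1 = SVC(kernel="poly", degree=3).fit(Xt[labeled], yt[labeled])
# (2) the same, but naively adding all labeled source data
clf2 = SVC(kernel="poly", degree=3).fit(
    np.vstack([Xt[labeled], Xs]), np.hstack([yt[labeled], ys]))

for name, clf in [("target labels only", clf1), ("naive transfer", clf2)]:
    print(name, "accuracy:", (clf.predict(Xt) == yt).mean())
```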

Main Challenge → Motivation

Both the marginal and the conditional distributions of the target domain and the source domain can be significantly different in the original feature space! How do we get rid of these differences?

- Could we remove the useless source-domain data?
- Could we find other feature spaces?

Main Flow

[Slide diagram of the overall flow; the kernel mapping step is based on Kernel Discriminant Analysis.]

Kernel Mapping

Although the data are very different in the original feature space, if one can find a proper mapping to serve as a bridge then, at least, the marginal distributions can become reasonably close in the new feature space.
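The slide does not spell out the mapping here (the paper's mapping is based on kernel discriminant analysis, per the flow slide). As a hedged stand-in, kernel PCA fit on the pooled source and target data already illustrates how a shared kernel feature space puts both domains into common coordinates:

```python
# Hedged sketch: map source and target into a shared kernel feature
# space. Kernel PCA on the pooled data is an illustrative stand-in;
# the paper's mapping additionally uses label information (KDA).
import numpy as np
from sklearn.datasets import make_moons, make_circles
from sklearn.decomposition import KernelPCA

Xs, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
Xt, _ = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=2.0)
Z = kpca.fit_transform(np.vstack([Xs, Xt]))   # pooled fit -> shared space
Zs, Zt = Z[:len(Xs)], Z[len(Xs):]

# Compare the per-coordinate domain means before and after mapping.
print("mean gap, original:", np.abs(Xs.mean(axis=0) - Xt.mean(axis=0)))
print("mean gap, mapped:  ", np.abs(Zs.mean(axis=0) - Zt.mean(axis=0)))
```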

Instance Selection

1. Not all examples from “two moons” are useful for “two circles”; only those with a similar functional relation, i.e. similar conditional probabilities, can transfer knowledge across domains (see the sketch below).
2. Each mapping has its own intrinsic bias, and it is difficult to decide which mapping is optimal. If we combine the predictions from different feature spaces, the result is expected to be better than using any single mapping.
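A hedged sketch of point 1, assuming a simple selection rule of my own choosing (the paper's cluster-based criterion may differ): cluster the pooled data, then keep a source example only if its cluster also contains target points and its label agrees with its cluster's majority source label.

```python
# Hedged sketch of cluster-based instance selection: under the
# cluster assumption, points in one cluster share a label. Keep a
# source example only if its cluster also contains target points
# and its label matches the cluster's majority source label.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons, make_circles

Xs, ys = make_moons(n_samples=200, noise=0.05, random_state=0)
Xt, _ = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

km = KMeans(n_clusters=10, n_init=10, random_state=0)
c = km.fit_predict(np.vstack([Xs, Xt]))
cs, ct = c[:len(Xs)], c[len(Xs):]
target_clusters = set(ct)

keep = []
for i, (ci, yi) in enumerate(zip(cs, ys)):
    if ci not in target_clusters:          # cluster has no target data
        continue
    majority = np.bincount(ys[cs == ci]).argmax()
    if yi == majority:                     # label consistent with cluster
        keep.append(i)
print(f"kept {len(keep)} of {len(Xs)} source instances")
```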

Ensemble
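The ensemble step is shown as a diagram on the slide. A minimal sketch of the idea, assuming one classifier per kernel mapping and majority-vote combination; the kernels and the base classifier are illustrative choices, not the paper's exact set:

```python
# Minimal ensemble sketch: one classifier per kernel mapping,
# predictions combined by majority vote across feature spaces.
import numpy as np
from sklearn.datasets import make_moons, make_circles
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier

Xs, ys = make_moons(n_samples=200, noise=0.05, random_state=0)
Xt, yt = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

votes = []
for kernel, gamma in [("rbf", 0.5), ("rbf", 2.0), ("poly", None)]:
    kpca = KernelPCA(n_components=2, kernel=kernel, gamma=gamma)
    Z = kpca.fit_transform(np.vstack([Xs, Xt]))   # one mapping per kernel
    clf = KNeighborsClassifier(n_neighbors=5).fit(Z[:len(Xs)], ys)
    votes.append(clf.predict(Z[len(Xs):]))

ensemble = (np.mean(votes, axis=0) >= 0.5).astype(int)  # majority vote
print("ensemble accuracy on target:", (ensemble == yt).mean())
```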

Properties

- Kernel mapping reduces the difference between the marginal distributions of the source and target domains; both domains are approximately Gaussian after kernel mapping. [Theorem 2]
- Cluster-based instance selection selects the source-domain data with similar conditional probabilities. [Cluster Assumption, Theorem 1]
- The error rate of the proposed approach can be bounded. [Theorem 3]
- The ensemble further reduces the transfer risk. [Theorem 4]
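The claimed reduction of the marginal-distribution difference can be checked empirically. One common yardstick (my choice here, not necessarily the paper's) is the Maximum Mean Discrepancy; a sketch:

```python
# Sketch: measure the marginal-distribution gap between two samples
# with a (biased) MMD^2 estimate under an RBF kernel. Apply it to
# the domains before and after any mapping to compare.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma=gamma).mean()
            + rbf_kernel(Y, Y, gamma=gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma=gamma).mean())

# Toy check: two Gaussians with different means.
rng = np.random.RandomState(0)
Xs = rng.normal(0.0, 1.0, size=(100, 2))   # "source"
Xt = rng.normal(1.0, 1.0, size=(100, 2))   # "target"
print("MMD^2 before any mapping:", mmd2_rbf(Xs, Xt))
```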

Experiment – Data Sets

- Reuters-21578: 21,578 Reuters news articles.
- 20 Newsgroups: 20,000 newsgroup articles. Cross-domain splits use the top-level category (e.g. comp vs rec) as the class label, with different subcategories (comp.sys / comp.graphics, rec.sport / rec.auto) assigned to the target and source domains.
- SyskillWebert: HTML source of web pages plus the ratings of a user on those pages, from 4 different subjects. Target domain: Sheep, Biomedical, Bands-recording; source domain: Goats.

All of them are high-dimensional (>1000 features). The procedure: first fill up the “GAP” between the domains, then use a kNN classifier to do the classification.
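As a rough illustration of such a split and of the kNN step without any gap-filling, a hedged sketch using scikit-learn's copy of 20 Newsgroups (the exact subcategory assignment is my assumption, not taken from the paper):

```python
# Hedged sketch of a 20 Newsgroups cross-domain split: the top-level
# category (comp vs rec) is the class label; different subcategories
# form the source and target domains.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

src_cats = ["comp.sys.ibm.pc.hardware", "rec.autos"]     # source subcategories
tgt_cats = ["comp.graphics", "rec.sport.baseball"]       # target subcategories
src = fetch_20newsgroups(subset="all", categories=src_cats)
tgt = fetch_20newsgroups(subset="all", categories=tgt_cats)

vec = TfidfVectorizer(max_features=2000)
Xs, Xt = vec.fit_transform(src.data), vec.transform(tgt.data)
# Labels 0=comp, 1=rec in both Bunches (categories sort alphabetically).
ys, yt = src.target, tgt.target

# Baseline with no distribution adaptation: kNN straight across domains.
clf = KNeighborsClassifier(n_neighbors=5).fit(Xs, ys)
print("cross-domain kNN accuracy:", (clf.predict(Xt) == yt).mean())
```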

Experiment – Baseline Methods

- Non-transfer single classifiers
- Transfer learning algorithm: TrAdaBoost
- Base classifiers: kNN, SVM, Naive Bayes

Experiment – Overall Performance

kMapEnsemble: 24 wins, 3 losses! (data sets 1–9)

Conclusion

Domain transfer when both the marginal and the conditional distributions differ between the two domains. The flow:

- Step 1, Kernel mapping: bring the two domains' marginal distributions closer.
- Step 2, Cluster-based instance selection: make the conditional distributions transferable.
- Step 3, Ensemble: further reduce the transfer risk.

Code and data are available from the authors.
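Putting the three steps together on the synthetic data, a hedged end-to-end sketch that composes the stand-in components from the earlier slides (illustrative only; not the code available from the authors):

```python
# End-to-end sketch of the three-step flow on the synthetic data:
# kernel mapping -> cluster-based instance selection -> ensemble.
# Every component is an illustrative stand-in for the paper's.
import numpy as np
from sklearn.datasets import make_moons, make_circles
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

Xs, ys = make_moons(n_samples=200, noise=0.05, random_state=0)
Xt, yt = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

votes = []
for gamma in (0.5, 1.0, 2.0):                    # one mapping per kernel width
    Z = KernelPCA(n_components=2, kernel="rbf", gamma=gamma).fit_transform(
        np.vstack([Xs, Xt]))
    Zs, Zt = Z[:len(Xs)], Z[len(Xs):]
    c = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
    cs, ct = c[:len(Xs)], c[len(Xs):]
    ct_set = set(ct)
    keep = [i for i in range(len(Zs))            # instance selection
            if cs[i] in ct_set
            and ys[i] == np.bincount(ys[cs == cs[i]]).argmax()]
    clf = KNeighborsClassifier(n_neighbors=5).fit(Zs[keep], ys[keep])
    votes.append(clf.predict(Zt))

pred = (np.mean(votes, axis=0) >= 0.5).astype(int)  # majority vote
print("target accuracy:", (pred == yt).mean())
```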