Presentation on theme: "Cross Domain Distribution Adaptation via Kernel Mapping Erheng Zhong † Wei Fan ‡ Jing Peng* Kun Zhang # Jiangtao Ren † Deepak Turaga ‡ Olivier Verscheure."— Presentation transcript:

1 Cross Domain Distribution Adaptation via Kernel Mapping Erheng Zhong†, Wei Fan‡, Jing Peng*, Kun Zhang#, Jiangtao Ren†, Deepak Turaga‡, Olivier Verscheure‡ († Sun Yat-Sen University; ‡ IBM T. J. Watson Research Center; * Montclair State University; # Xavier University of Louisiana)

2 Standard Supervised Learning Training (labeled) and test (unlabeled) data are both drawn from the New York Times; the classifier reaches 85.5% accuracy.

3 In Reality… Labeled New York Times data are not available, so the classifier is trained on labeled Reuters data and tested on unlabeled New York Times data; accuracy drops to 64.1%.

4 Domain Difference -> Performance Drop Ideal setting: train on NYT, test on NYT, 85.5% accuracy. Realistic setting: train on Reuters, test on NYT, 64.1% accuracy.
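The accuracy drop on slides 2-4 is easy to reproduce on synthetic data. Below is a minimal sketch, not the paper's setup: Gaussian toy data stand in for the NYT/Reuters documents, and all function names (`make_domain`, `knn_predict`) are hypothetical. A kNN classifier trained on a source domain whose discriminative direction differs from the target's falls to near-chance accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(centers, n=150):
    """Two-class Gaussian data with the given per-class centres."""
    X = np.vstack([rng.normal(c, 0.5, size=(n, 2)) for c in centers])
    y = np.repeat([0, 1], n)
    return X, y

def knn_predict(Xtr, ytr, Xte, k=5):
    """Plain k-nearest-neighbour majority vote."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)

# Source domain separates classes along axis 0 ...
Xs, ys = make_domain([(-1, 0), (1, 0)])
# ... but in the target domain the discriminative direction has rotated.
Xt, yt = make_domain([(0, -1), (0, 1)])
# Held-out test set from the SAME distribution as the source.
Xt2, yt2 = make_domain([(-1, 0), (1, 0)])

acc_ideal = (knn_predict(Xs, ys, Xt2) == yt2).mean()  # ideal setting
acc_real = (knn_predict(Xs, ys, Xt) == yt).mean()     # realistic setting
```

With a fixed seed, `acc_ideal` is high while `acc_real` sits near chance, mirroring the 85.5% vs 64.1% gap on the slide.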

5 Synthetic Example

6 (figure-only slide)

7 Main Challenge / Motivation Both the marginal and conditional distributions of the target and source domains can differ significantly in the original feature space. How can these differences be removed? Could another feature space be found? Could the unhelpful source-domain data be discarded?

8 Main Flow Kernel Discriminant Analysis

9 Kernel Mapping
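The slide does not spell out the mapping itself; as one plausible instantiation, the sketch below embeds the pooled source and target data with kernel PCA under a Gaussian kernel. The function names and the `gamma` value are assumptions of this illustration, not the authors' code.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_map(X, n_components=2, gamma=0.5):
    """Embed X via the top components of the centred Gaussian kernel
    matrix (kernel PCA): one way to realise a Gaussian-RKHS mapping."""
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J            # double centring
    w, V = np.linalg.eigh(Kc)                     # ascending eigenvalues
    idx = np.argsort(w)[::-1][:n_components]      # keep the largest
    return V[:, idx] * np.sqrt(np.clip(w[idx], 1e-12, None))

rng = np.random.default_rng(1)
Xs = rng.normal(0.0, 1.0, size=(100, 5))          # source sample
Xt = rng.normal(1.5, 1.0, size=(100, 5))          # shifted target sample
Z = kernel_map(np.vstack([Xs, Xt]))               # pooled embedding
```

Mapping source and target jointly, as here, is what lets the new coordinates be shared across domains.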

10 Instance Selection
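A rough sketch of cluster-based instance selection under the cluster assumption (nearby points share conditionals): cluster the pooled data and keep only the source instances that fall in clusters also containing target instances. The minimal k-means and the helper names are my own, not the paper's implementation.

```python
import numpy as np

def kmeans_labels(X, k=4, iters=100, seed=0):
    """Minimal k-means; returns the cluster label of each row of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
    return labels

def select_source(Xs, Xt, k=4):
    """Keep source instances in clusters that also contain target
    instances (cluster assumption: nearby points share labels)."""
    labels = kmeans_labels(np.vstack([Xs, Xt]), k=k)
    src_lab, tgt_lab = labels[:len(Xs)], labels[len(Xs):]
    return np.flatnonzero(np.isin(src_lab, np.unique(tgt_lab)))

rng = np.random.default_rng(2)
Xs = np.vstack([rng.normal(0, 0.3, (50, 2)),   # overlaps the target
                rng.normal(8, 0.3, (50, 2))])  # far from the target
Xt = rng.normal(0, 0.3, (50, 2))
idx = select_source(Xs, Xt)
```

In this toy run the far-away half of the source set (indices 50-99) lands in target-free clusters and is discarded.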

11 Ensemble
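The combination step can be as simple as majority voting over the base learners (e.g. models built from different kernel mappings). A minimal sketch; the function name is hypothetical:

```python
import numpy as np

def majority_vote(predictions):
    """Combine {0,1} predictions from several base models by majority."""
    P = np.asarray(predictions)        # shape (n_models, n_samples)
    return (P.mean(axis=0) > 0.5).astype(int)

# Three base models disagree on samples 0 and 2; the vote settles them.
votes = majority_vote([[0, 1, 1],
                       [0, 1, 0],
                       [1, 1, 0]])
```

With an odd number of voters there are no ties, and the combined prediction follows whichever label the majority of base models chose per sample.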

12 Properties Kernel mapping reduces the difference between the marginal distributions of the source and target domains [Theorem 2]. Cluster-based instance selection keeps the source-domain data whose conditional probabilities are similar to the target's [Cluster Assumption]. The error rate of the proposed approach can be bounded [Theorem 3]. The ensemble further reduces the transfer risk [Theorem 4]. Assumptions: both the target and source domain data follow appropriate Gaussian distributions in a Gaussian RKHS, and if two points x1 and x2 are close in the intrinsic geometry of q(x), then the conditionals r(y|x1) and r(y|x2) are similar.
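The claim of Theorem 2, that the mapping shrinks the marginal gap, can be checked empirically with a discrepancy measure. The sketch below uses the biased empirical Maximum Mean Discrepancy (MMD) under a Gaussian kernel; MMD is my choice of yardstick for this illustration, not necessarily the quantity bounded in the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=0.5):
    """Biased empirical estimate of the squared Maximum Mean
    Discrepancy between the samples X and Y."""
    return (gaussian_kernel(X, X, gamma).mean()
            + gaussian_kernel(Y, Y, gamma).mean()
            - 2.0 * gaussian_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(3)
A = rng.normal(0.0, 1.0, size=(200, 3))
B = rng.normal(0.0, 1.0, size=(200, 3))   # same distribution as A
C = rng.normal(2.0, 1.0, size=(200, 3))   # shifted distribution
```

`mmd2(A, B)` stays near zero while `mmd2(A, C)` is clearly larger, so the statistic can serve as a before/after probe of the marginal difference the mapping is meant to reduce.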

13 Experiment – Data Sets Reuters-21578: 21,578 Reuters news articles. 20 Newsgroups: 20,000 newsgroup articles. SyskillWebert: HTML source of web pages plus one user's ratings of those pages, from 4 different subjects. All are high-dimensional (>1000). Setup: first fill the distribution "gap", then classify with a kNN classifier. Example split for 20 Newsgroups (similarly for Reuters): under top categories comp and rec, the target domain is comp.sys and rec.sport, the source domain is comp.graphics and rec.auto. Example split for SyskillWebert: target domain Sheep, Biomedical, Bands-recording; source domain Goats.

14 Experiment – Baseline Methods Non-transfer single classifiers and the transfer learning algorithm TrAdaBoost. Base classifiers: kNN, SVM, Naive Bayes.

15 Experiment – Overall Performance Across datasets 1-9, kMapEnsemble wins 24 comparisons and loses 3.

16 Conclusion Domain transfer when the marginal and conditional distributions differ between two domains. Flow: Step 1, kernel mapping brings the two domains' marginal distributions closer; Step 2, cluster-based instance selection makes the conditional distribution transferable; Step 3, the ensemble further reduces the transfer risk. Code and data are available from the authors.

17 (figure-only slide)

