Cross Validation Framework to Choose Amongst Models and Datasets for Transfer Learning. Erheng Zhong¶, Wei Fan‡, Qiang Yang¶, Olivier Verscheure‡, Jiangtao Ren†


Cross Validation Framework to Choose Amongst Models and Datasets for Transfer Learning
Erheng Zhong¶, Wei Fan‡, Qiang Yang¶, Olivier Verscheure‡, Jiangtao Ren†
‡IBM T. J. Watson Research Center  ¶Hong Kong University of Science and Technology  †Sun Yat-Sen University

Transfer Learning: What Is It
Definition: use "source domains" to improve learning in a "target domain" that is short of labeled information; it sits alongside supervised, unsupervised, and semi-supervised learning.
Applications:
1. WiFi-based localization tracking [Pan et al.'08]
2. Collaborative filtering [Pan et al.'10]
3. Activity recognition [Zheng et al.'09]
4. Text classification [Dai et al.'07]
5. Sentiment classification [Blitzer et al.'07]
6. Image categorization [Shi et al.'10]

Application: Indoor WiFi Localization Tracking
AP denotes an access point of a device; (Lx, Ly) denotes the coordinates of a location. The task is to transfer the localization model across settings.

Application: Collaborative Filtering

Transfer Learning: How It Works
1. Data selection
2. Model selection

Re-cast: Model and Data Selection
(1) How to select the right transfer learning algorithm?
(2) How to tune the optimal parameters?
(3) How to choose the most helpful source domain from a large pool of datasets?

Model & Data Selection: Traditional Methods
1. Analytical techniques: AIC, BIC, SRM, etc.
2. k-fold cross validation
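For reference, plain k-fold cross validation, the second traditional method above, can be sketched in a few lines. This is a minimal pure-Python sketch; the `fit`/`predict` callables and the toy fixed-threshold model are illustrative stand-ins, not anything from the paper:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=5):
    """Standard k-fold CV: train on k-1 folds, test on the held-out fold."""
    folds = k_fold_indices(len(X), k)
    accs = []
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accs.append(hits / len(test_idx))
    return sum(accs) / k

# toy 1-D data: the label is 1 exactly when x > 0
X = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, -0.7, 0.7]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
fit = lambda Xs, ys: 0.0                 # "training" returns a fixed threshold
predict = lambda thr, x: int(x > thr)
acc = cross_val_accuracy(X, y, fit, predict, k=5)   # perfect toy model -> 1.0
```

As the following slides argue, applying this procedure directly on the source domain is misleading under distribution shift, which is exactly what TrCV is designed to correct.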

Model & Data Selection: Issues
1. The estimation is not consistent: a model that approximates the source conditional distribution well is not necessarily close to the target conditional distribution, i.e., the ideal hypothesis.
2. The number of labeled examples in the target domain is limited, so direct estimation of the target error is not reliable.

Model & Data Selection: Model Selection Example
Source vs. target: if we choose the wrong model, it may fit the source domain yet fail on the target domain.

Model & Data Selection: Data Selection Example
If we choose the wrong source domain, the transferred model fails on the target domain.

Transfer Cross-Validation (TrCV)
A new criterion for transfer learning that accounts for:
1. the density ratio between the two domains;
2. the difference between the conditional distribution estimated by the model and the true conditional distribution.
Both terms are hard to calculate in practice: how can this difference be estimated with limited labeled data? Practical method: TrCV combines density ratio weighting for (1) with reverse validation for (2).

Key properties and components:
– The selected model is an unbiased estimator of the ideal model; the criterion combines the expected loss of the approximation with the model complexity.
– We adopt an existing method, KMM [Huang et al.'07], for density ratio weighting.
– Reverse validation estimates Pt(y|x) − P(y|x, f) (next slide).
– Together, these allow TrCV to choose the right model even when P(x) and P(y|x) differ between domains.
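KMM solves a quadratic program to match kernel means between domains, which is too heavy to reproduce here. As a hedged stand-in, the crude histogram estimate below conveys what a density ratio weight w(x) ≈ Pt(x)/Ps(x) looks like on 1-D data; the function name and binning scheme are illustrative only, not the paper's KMM:

```python
def density_ratio_1d(source, target, bins=5):
    """Crude histogram estimate of w(x) = Pt(x) / Ps(x) for 1-D samples.
    A toy stand-in for KMM [Huang et al.'07], which the paper actually uses."""
    lo, hi = min(source + target), max(source + target)
    width = (hi - lo) / bins or 1.0
    bucket = lambda x: min(int((x - lo) / width), bins - 1)
    def hist(data):
        h = [0] * bins
        for x in data:
            h[bucket(x)] += 1
        return [c / len(data) for c in h]
    ps, pt = hist(source), hist(target)
    # weight each *source* point by the estimated target/source density ratio
    return [pt[bucket(x)] / ps[bucket(x)] if ps[bucket(x)] > 0 else 0.0
            for x in source]

# source points lying in the target-dense region get up-weighted
weights = density_ratio_1d([0.1, 0.2, 0.3, 0.6, 0.7, 0.8],
                           [0.6, 0.7, 0.8], bins=2)
```

These weights then multiply each source instance's loss inside the cross-validation sum, so validation error is assessed as if the data were drawn from the target marginal distribution.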

Reverse Validation
Notation (symbols shown in the slide figure): the source-domain data in the i-th fold; the remaining source data; the predicted label in the i-th fold; the true label in the i-th fold; the unlabeled and labeled target-domain data.
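A rough sketch of one reverse-validation fold follows. The `fit(X, y) -> model` and `predict(model, x) -> label` callables are generic placeholders, and the nearest-class-mean classifier is a toy stand-in, not the paper's learner; for brevity the sketch also ignores the few labeled target examples the full method can additionally exploit:

```python
def reverse_validation_loss(src_train, src_val, tgt_unlabeled, fit, predict):
    """One fold of reverse validation:
    1. train a forward model on the source training split;
    2. pseudo-label the unlabeled target data with it;
    3. train a reverse model on the pseudo-labeled target data;
    4. score the reverse model on the held-out source fold -- if the
       forward model approximated the target conditional distribution
       well, the reverse model should predict the source fold accurately."""
    Xs, ys = src_train
    fwd = fit(Xs, ys)
    pseudo = [predict(fwd, x) for x in tgt_unlabeled]
    rev = fit(tgt_unlabeled, pseudo)
    Xv, yv = src_val
    return sum(predict(rev, x) != t for x, t in zip(Xv, yv)) / len(Xv)

# toy stand-in learner: nearest class mean in 1-D
def fit(X, y):
    c0 = [x for x, t in zip(X, y) if t == 0]
    c1 = [x for x, t in zip(X, y) if t == 1]
    return (sum(c0) / len(c0), sum(c1) / len(c1))

def predict(means, x):
    return int(abs(x - means[1]) < abs(x - means[0]))

loss = reverse_validation_loss(
    src_train=([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]),
    src_val=([-1.5, 1.5], [0, 1]),
    tgt_unlabeled=[-3.0, -2.0, 2.0, 3.0],
    fit=fit, predict=predict)
```

A low reverse-validation loss signals that the model's pseudo-labels on the target domain were consistent with the source labels, i.e., that the model's conditional estimate transfers well.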

Properties
– [Lemma 1] The selected model is an unbiased estimator of the ideal one.
– [Theorem 1] The model selected by the proposed method has a generalization bound over the target-domain data.
– The value of reverse validation is related to the difference between the true conditional probability and the model's approximation.
– The confidence of TrCV is bounded: the bound relates the accuracy estimated by TrCV to the true accuracy through a quantile point of the standard normal distribution.
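The confidence statement has the familiar normal-approximation shape. Purely to illustrate that form (the function name and the use of 1.96 are my choices, not the paper's exact bound), here is an interval for an accuracy estimated from n labeled examples:

```python
import math

def accuracy_confidence_interval(acc_hat, n, z=1.96):
    """Normal-approximation confidence interval for an estimated accuracy.
    z is the quantile point of the standard normal distribution
    (1.96 gives roughly 95% confidence); n is the number of test examples."""
    half = z * math.sqrt(acc_hat * (1 - acc_hat) / n)
    return max(0.0, acc_hat - half), min(1.0, acc_hat + half)

# with 100 labeled target examples and estimated accuracy 0.8,
# the true accuracy plausibly lies in roughly (0.72, 0.88)
low, high = accuracy_confidence_interval(0.8, 100)
```

The interval shrinks as n grows, which is why the limited labeled target data flagged earlier makes direct target-domain validation unreliable.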

Experiment: Data Sets
Wine Quality: two subsets corresponding to the red and white variants of the Portuguese "Vinho Verde" wine. Used for algorithm and parameter selection.

Experiment: Data Sets
Reuters-21578: the primary benchmark for text categorization, formed from news articles organized in a hierarchical category structure. Used for algorithm and parameter selection.

Experiment: Data Sets
SyskillWebert: a standard dataset for web-page rating, generated from the HTML source of web pages plus user ratings. We randomly reserve "Bands - recording artists" as the source domain and the three other topics as target-domain data. Used for algorithm and parameter selection.

Experiment: Data Sets
20-Newsgroups: a primary benchmark for text categorization, similar to Reuters. Used for source-domain selection.

Experiment: Baseline Methods
SCV: standard k-fold CV on the source domain.
TCV: standard k-fold CV on the labeled data from the target domain.
STV: build a model on the source-domain data and validate it on the labeled target-domain data.
WCV: use density ratio weighting to reduce the difference in marginal distributions between the two domains, ignoring the difference in conditional probability.

Experiment: Other Settings
Algorithms:
– Naive Bayes (NB), SVM, C4.5, k-NN, and NNge (Ng)
– TrAdaBoost (TA): instance weighting [Dai et al.'07]
– LatentMap (LM): feature transformation [Xie et al.'09]
– LWE: model-weighting ensemble [Gao et al.'08]
Evaluation: the correlation between each criterion's value (e.g., TrCV, SCV) and the actual accuracy, measured over pairwise comparisons between models; a criterion that selects the better model in a comparison gains a higher measure value.

Results: Algorithm Selection
6 wins and 2 losses!

Results: Parameter Tuning
13 wins and 3 losses!

Results: Source-Domain Selection
No losses!

Results: Parameter Analysis
TrCV achieves the highest correlation value across different numbers of folds, from 5 to 30 in steps of 5.

Results: Parameter Analysis
When only a few labeled examples (< 0.4 × |T|) are available in the target domain, TrCV performs much better than both STV and TCV.

Conclusion
Model and data selection when the marginal and conditional distributions differ between two domains.
Key points:
– Density ratio weighting reduces the difference between the marginal distributions of the two domains.
– Reverse validation measures how well a model approximates the true conditional distribution of the target domain.
Code and data are available from the authors.