Transfer Learning From Multiple Source Domains via Consensus Regularization Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He
Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions
Ping Luo CIKM 08 Research Motivation (1)
–How to exploit the distribution differences among multiple source domains to boost the learning performance in a target domain?
–How to deal with the situation where the source domains are geographically separated and carry privacy concerns?
Ping Luo CIKM 08 Research Motivation (2) Motivating Examples
–Web page classification: label Web pages from multiple different universities to find course main pages by text classification; different universities use different terms to describe the course metadata
–Video concept detection: generalize models that detect semantic concepts across multiple sources of video data
Common Features
1. Multiple source domains with different data distributions
2. Separated source domains
Ping Luo CIKM 08 Challenges and Contributions
New Challenges
- How to make good use of the distribution mismatch among multiple source-domains to improve the prediction performance on the target-domain
- How to extend consensus regularization to a distributed implementation that modestly preserves privacy
Contributions
- Propose a consensus regularization based algorithm for transfer learning from multiple source-domains
- Perform it in a distributed and modestly privacy-preserving manner
Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions
Ping Luo CIKM 08 Consensus Measuring (1)
Example: a three-class classification problem where three classifiers predict an instance x; agreeing predictions give minimal entropy and maximal consensus, while conflicting predictions give maximal entropy and minimal consensus.
Ping Luo CIKM 08 Consensus Measuring (2)
Example: a two-class classification problem where three classifiers predict an instance x. Due to the computational cost of the entropy, for 2-entry probability distribution vectors we can simplify the consensus measure as follows:
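As a concrete illustration of both measures, here is a small Python sketch; the exact functional forms (entropy of the averaged prediction vector, and a squared offset from 1/2 for the two-class case) are assumptions consistent with the slides, not necessarily the paper's definitions.

```python
import numpy as np

def entropy_consensus(pred_vectors):
    """Consensus of a set of class-probability vectors, measured as the
    negative Shannon entropy of their average (assumed form: lower entropy
    of the averaged prediction = higher consensus)."""
    avg = np.mean(pred_vectors, axis=0)
    avg = np.clip(avg, 1e-12, 1.0)
    return np.sum(avg * np.log(avg))  # equals -H(avg); larger means more consensus

def simple_consensus(p_positive):
    """Assumed simplified two-class measure C_s: squared distance of the
    averaged positive-class probability from 1/2."""
    return (np.mean(p_positive) - 0.5) ** 2

# Three classifiers, three classes: full agreement vs. full disagreement,
# reproducing the example on the previous slide.
agree    = np.array([[1.0, 0.0, 0.0]] * 3)                # minimal entropy, maximal consensus
disagree = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1.0]])  # maximal entropy, minimal consensus
print(entropy_consensus(agree), entropy_consensus(disagree))

# Two-class case: agreeing vs. conflicting positive-class probabilities.
print(simple_consensus(np.array([0.9, 0.95, 0.85])),
      simple_consensus(np.array([0.9, 0.1, 0.5])))
```

The simplified measure avoids the logarithms of the entropy while preserving its ordering: it is maximal when all classifiers agree confidently and zero when their averaged prediction is uninformative.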
Ping Luo CIKM 08 Logistic Regression [Davie et al., 2000]
Logistic regression is an approach to learning a classification model for discrete outputs.
Given: a training data set X, where each x is a vector of discrete or continuous random variables, and discrete outputs Y.
Maximize the following objective to obtain the model w, and classify using the resulting conditional probabilities:
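The formulas are not reproduced above; a standard regularized log-likelihood and decision rule for binary logistic regression (an assumed reconstruction, with labels y_i in {+1, -1} and an assumed L2 weight λ) would read:

```latex
% Standard (assumed) regularized log-likelihood for binary logistic regression:
\max_{w}\; L(w) \;=\; \sum_{i=1}^{n} \ln p\!\left(y_i \mid x_i, w\right) \;-\; \lambda \lVert w \rVert^{2},
\qquad
p(y \mid x, w) \;=\; \frac{1}{1 + \exp\!\left(-y\, w^{\top} x\right)}.
% Classification: predict the label with the larger conditional probability,
% i.e., y = \operatorname{sign}(w^{\top} x).
```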
Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions
Ping Luo CIKM 08 Problem Formulation (1)
Given: m source-domains of labeled data and one unlabeled target-domain; assume that all of these domains have different but closely related distributions.
Find: m classifiers such that each classifier covers the knowledge from its corresponding source domain and all classifiers achieve a high degree of consensus on their prediction results on the target domain.
Ping Luo CIKM 08 Problem Formulation (2)
Formulation: adapt the supervised learning framework with consensus regularization. Output m models, which maximize an objective with two parts: the probability of each hypothesis given its observed source data, and the consensus degree of the prediction results of these classifiers on the target domain.
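A plausible reconstruction of this two-part objective follows; the notation (h_l for the l-th model, D^s_l for the l-th source domain, D^t for the target domain, p_l(x) for classifier l's prediction, and the trade-off parameter θ) is assumed, since the original formula is not shown.

```latex
% Assumed reconstruction of the consensus-regularized objective:
\max_{h_1,\dots,h_m}\;
\underbrace{\sum_{l=1}^{m} \log p\!\left(h_l \mid D^s_l\right)}_{\text{fit of each hypothesis to its source data}}
\;+\;
\theta\,\underbrace{\sum_{x \in D^t} C\!\left(p_1(x),\dots,p_m(x)\right)}_{\text{consensus of the } m \text{ predictions on the target domain}}
```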
Ping Luo CIKM 08 Why Consensus Regularization (1)
In this study we focus on binary classification problems with labels 1 and -1, and the number of classifiers m = 3. The non-trivial classifier assumption can be restated as requiring each classifier to perform better than random guessing, i.e., to have an error rate below 1/2.
Ping Luo CIKM 08 Why Consensus Regularization (2)
Thus, under this assumption, minimizing the disagreement among the classifiers also decreases the classification error.
Ping Luo CIKM 08 Consensus Regularization by Logistic Regression (1)
The proposed consensus regularization framework outputs m logistic models, which minimize the objective below. For the binary classification problem, the entropy-based consensus measure C_e is equivalent to the simplified measure C_s, so the objective function can be rewritten in terms of C_s:
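One plausible instantiation of this minimized objective with m logistic models is sketched below; the specific form of the simplified measure C_s, the L2 weight λ, and the notation are assumptions consistent with the slides rather than the paper's exact formula.

```latex
% Assumed instantiation: p_l(x) = \sigma(w_l^{\top} x) is classifier l's
% positive-class probability; \theta trades off the consensus term.
\min_{w_1,\dots,w_m}\;
\sum_{l=1}^{m}\Bigl[\,\sum_{(x,y)\in D^s_l} \ln\!\bigl(1+e^{-y\,w_l^{\top}x}\bigr)
\;+\;\lambda \lVert w_l\rVert^{2}\Bigr]
\;-\;\theta \sum_{x\in D^t} C_s(x),
\qquad
C_s(x)=\Bigl(\frac{1}{m}\sum_{l=1}^{m} p_l(x)-\frac{1}{2}\Bigr)^{2}
```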
Ping Luo CIKM 08 Consensus Regularization by Logistic Regression (2)
The partial derivative of the objective splits into two parts:
–a function of a local classifier and the data from the corresponding source domain, which can therefore be computed locally on each source domain;
–a function of all the local classifiers and the data from the target domain, which can therefore be computed on the target domain given all the classifiers.
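Under the instantiation sketched on the previous slide, the partial derivative with respect to w_l indeed splits into a local term and a global term, roughly as follows (again an assumed reconstruction, not the paper's exact formula):

```latex
% Gradient of the assumed objective J with respect to w_l:
\frac{\partial J}{\partial w_l} \;=\;
\underbrace{\sum_{(x,y)\in D^s_l} -\,y\,\bigl(1-p(y\mid x,w_l)\bigr)\,x \;+\; 2\lambda\, w_l}_{\text{local: only classifier } l \text{ and its own source data}}
\;-\;
\underbrace{\frac{2\theta}{m}\sum_{x\in D^t}
\Bigl(\frac{1}{m}\sum_{k=1}^{m} p_k(x)-\frac{1}{2}\Bigr)\,
p_l(x)\bigl(1-p_l(x)\bigr)\,x}_{\text{global: all classifiers and the target data}}
```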
Ping Luo CIKM 08 Distributed Implementation of Consensus Regularization (1)
In the distributed setting, the data nodes containing source-domain data are used as slave nodes, and the node containing the target-domain data is used as the master node.
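The following is a minimal, single-process Python sketch of the protocol implied above: slave nodes hold the source data and compute the local part of the gradient, the master node holds the target data and computes the consensus part, and only model parameters and gradient statistics are exchanged. The functional forms, the plain gradient-descent update, and all parameter values are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_grad(w, X, y, lam=1e-2):
    """Slave node: gradient of the regularized negative log-likelihood on
    one source domain (labels y in {+1, -1})."""
    p = sigmoid(y * (X @ w))                      # P(y | x, w) for the true label
    return X.T @ (-y * (1.0 - p)) + 2 * lam * w

def consensus_grads(ws, Xt, theta=0.1):
    """Master node: gradient of the (negated) consensus term on the target
    domain, one gradient vector per classifier (assumed simplified C_s)."""
    m = len(ws)
    P = np.column_stack([sigmoid(Xt @ w) for w in ws])   # n_t x m positive-class probs
    s = P.mean(axis=1) - 0.5                             # averaged confidence offset
    return [-(2 * theta / m) * Xt.T @ (s * P[:, l] * (1 - P[:, l])) for l in range(m)]

def train(source_domains, Xt, n_iter=100, lr=0.01):
    """One round trip per iteration: slaves send w_l to the master, the master
    returns the consensus part, each slave applies the combined step locally."""
    d = Xt.shape[1]
    ws = [np.zeros(d) for _ in source_domains]
    for _ in range(n_iter):
        cons = consensus_grads(ws, Xt)                         # computed at the master
        for l, (X, y) in enumerate(source_domains):
            ws[l] -= lr * (local_grad(ws[l], X, y) + cons[l])  # computed at slave l
    return ws

# Toy usage with synthetic data: three source domains, one target domain.
rng = np.random.default_rng(0)
sources = [(rng.normal(size=(50, 5)), rng.choice([-1.0, 1.0], size=50)) for _ in range(3)]
target = rng.normal(size=(80, 5))
models = train(sources, target)
```

Note that only the model parameters w_l and the aggregated consensus gradients cross domain boundaries, never the raw instances, which is what underlies the modest privacy preservation claimed for the distributed implementation.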
Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions
Ping Luo CIKM 08 Experimental Preparation (1)
Data Preparation
–Three source domains (A1, B1), (A2, B2), (A3, B3); one target domain (A4, B4)
–96 problem instances can be constructed for the experimental evaluation
Baseline Algorithms
–Distributed approaches: Distributed Ensemble (DE), Distributed Consensus Regularization (DCR)
–Centralized approaches: Centralized Training (CT), Centralized Consensus Regularization (CCR) (e.g., CCR1 means m = 1), CoCC [Dai et al., KDD'07], TSVM [Joachims, ICML'99], SGT [Joachims, ICML'03]
A1 sci.crypt, A2 sci.electronics, A3 sci.med, A4 sci.space; B1 talk.guns, B2 talk.mideast, B3 talk.misc, B4 talk.religion
Ping Luo CIKM 08 Experimental Parameters and Metrics
Note that when the parameter θ = 0, DE is equivalent to DCR, and CT is equivalent to CCR1.
Parameter setting
–The range of θ is [0, 0.25]
–The parameters of CoCC, TSVM and SGT are the same as in [Dai et al., KDD'07]
Experimental metrics: Accuracy, Convergence
Ping Luo CIKM 08 Experimental Results (1)
Comparison of CCR3, CCR1, DE and CT: each algorithm's best performance when θ is sampled in [0, 0.25]
Ping Luo CIKM 08 Experimental Results (2)
The average performance comparison of CCR3, CCR1, DE and CT on the 96 problem instances
Comparison of TSVM, SGT, CoCC and CCR3
Ping Luo CIKM 08 Experimental Results on Algorithm Convergence
The algorithm nearly converges after 20 iterations, which indicates that it has good convergence behavior.
Ping Luo CIKM 08 More experiments (1)
Note that the original source-domains have a much larger distribution mismatch; after merging them, the mismatch is greatly alleviated.
Ping Luo CIKM 08 More experiments (2) The experiments on image classification are also very promising
Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions
Ping Luo CIKM 08 Related Work (1)
Transfer Learning: solve the fundamental problem of different distributions between the training and testing data.
–Assume there are some labeled data from the target domain
  Estimation of the mismatch degree by Liao et al. [ICML'05]
  Boosting-based learning by Dai et al. [ICML'07]
  Building generative classifiers by Smith et al. [KDD'07]
  Constructing informative priors from the source-domain and then encoding them into the model, by Raina et al. [ICML'06]
–The data in the target-domain are totally unlabeled
  Co-clustering based Classification by Dai et al. [KDD'07]
  Transductive Bridged-Refinement by Xing et al. [PKDD'07]
Ping Luo CIKM 08 Related Work (2)
Self-Taught Learning: use a large amount of unlabeled data to improve the performance of a given classification task
–Apply sparse coding to construct higher-level features from the unlabeled data, by Raina et al. [ICML'07]
Semi-supervised Classification
–Entropy minimization by Grandvalet et al. [NIPS'05], which is a special case of our regularization framework when m = 1
Multi-View Learning
–Co-training by Blum et al. [COLT'98]
–Boosting mixture models by Grandvalet et al. [ICANN'01]
–Co-regularization by Sindhwani et al. [ICML'05], which focuses on two views only and does not have the effect of entropy minimization
Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions
Ping Luo CIKM 08 Conclusions
–Propose a consensus regularization framework for transfer learning from multiple source-domains: maximize the likelihood of each model on its corresponding source domain, and maximize the consensus degree of all the trained models on the target domain
–Extend the algorithm to a distributed implementation: only some statistical values are shared between the source-domains and the target-domain, so it modestly alleviates the privacy concerns
–Experiments on real-world text data sets show the effectiveness of our consensus regularization approach
Ping Luo CIKM 08 Q. & A. Acknowledgement