Transfer Learning From Multiple Source Domains via Consensus Regularization Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He.

Presentation transcript:

Transfer Learning From Multiple Source Domains via Consensus Regularization Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He

Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions

Ping Luo CIKM 08 Research Motivation (1) How to exploit the distribution differences among multiple source domains to boost the learning performance in a target domain? How to deal with the situation where the source domains are geographically separated and subject to privacy concerns?

Ping Luo CIKM 08 Research Motivation (2) Motivating Examples –Web page classification: label Web pages from multiple universities to find course main pages by text classification; different universities use different terms to describe the course metadata –Video concept detection: generalize models to detect semantic concepts from multiple source video datasets Common Features 1. Multiple source domains with different data distributions 2. Separated source domains

Ping Luo CIKM 08 Challenges and Contributions New Challenges - How to make good use of the distribution mismatch among multiple source domains to improve the prediction performance on the target domain - How to extend consensus regularization to a distributed implementation that modestly preserves privacy Contributions - Propose a consensus regularization based algorithm for transfer learning from multiple source domains - Perform training in a distributed and modestly privacy-preserving manner

Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions

Ping Luo CIKM 08 Consensus Measuring (1) Example: a three-class classification problem in which three classifiers predict an instance x. Minimal entropy corresponds to maximal consensus; maximal entropy corresponds to minimal consensus.

Ping Luo CIKM 08 Consensus Measuring (2) Example: a two-class classification problem in which three classifiers predict an instance x. Due to the computational cost of the entropy, for 2-entry probability distribution vectors we can simplify the consensus measure as:
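As an illustration of this measure, the sketch below scores consensus as the negative Shannon entropy of the averaged prediction vector, consistent with the minimal-entropy/maximal-consensus example above; the two-class simplification shown here is only one plausible variant, since the slide's exact simplified formula may differ.

import numpy as np

def entropy_consensus(pred_vectors):
    # Consensus = negative Shannon entropy of the classifier-averaged class-probability vector.
    p = np.clip(np.mean(np.asarray(pred_vectors, dtype=float), axis=0), 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))  # larger value = lower entropy = more consensus

def binary_consensus(pos_probs):
    # Illustrative two-class simplification (assumption): squared deviation of the
    # averaged positive-class probability from 1/2.
    return (float(np.mean(pos_probs)) - 0.5) ** 2

# Three classifiers on a three-class problem: full agreement vs. full disagreement.
agree    = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]   # minimal entropy, maximal consensus
disagree = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # maximal entropy, minimal consensus
print(entropy_consensus(agree), entropy_consensus(disagree))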

Ping Luo CIKM 08 Logistic Regression [Davie et al, 2000] Logistic regression is an approach to learning a classification model for discrete outputs. Given:  a training data set X, where each instance is a vector of discrete or continuous random variables  discrete outputs Y Maximize the following formula to obtain the model w: Classification:
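For reference, the maximized conditional log-likelihood (optionally with a Gaussian prior on w, i.e. an L2 term) takes the standard form below; the slide's exact notation may differ slightly.

P(y \mid \mathbf{x}, \mathbf{w}) = \frac{1}{1 + \exp(-y\, \mathbf{w}^{\top}\mathbf{x})}, \qquad y \in \{-1, +1\}

\mathbf{w}^{*} = \arg\max_{\mathbf{w}} \; \sum_{i=1}^{n} \log P(y_i \mid \mathbf{x}_i, \mathbf{w}) \; - \; \frac{\lambda}{2}\, \lVert \mathbf{w} \rVert^{2}

\text{Classification:}\quad \hat{y} = \operatorname{sign}(\mathbf{w}^{\top}\mathbf{x})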

Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions

Ping Luo CIKM 08 Problem Formulation (1) Given:  m source domains of labeled data (one possible notation is sketched below)  an unlabeled target domain  the assumption that all these domains follow different but closely related distributions Find:  Train m classifiers such that  the i-th classifier covers the knowledge from the i-th source domain  all classifiers achieve a high degree of consensus on their prediction results on the target domain
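One possible reconstruction of the notation (the symbol names below are assumptions, introduced only so that the later formulas can be written down):

D_{l}^{s} = \{(\mathbf{x}_{i}^{(l)}, y_{i}^{(l)})\}_{i=1}^{n_{l}}, \quad l = 1, \dots, m \qquad \text{(labeled source domains)}

D^{t} = \{\mathbf{x}_{j}\}_{j=1}^{n_{t}} \qquad \text{(unlabeled target domain)}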

Ping Luo CIKM 08 Problem Formulation (2) Formulation:  Adapt the supervised learning framework with consensus regularization  Output m models that maximize an objective combining two terms: the probability of each hypothesis given its observed source-domain data, and the consensus degree of the prediction results of these classifiers on the target domain (sketched below)
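With the notation above, the maximized objective can be sketched as follows (the trade-off weight \theta and the exact form of the consensus term C are assumptions):

\max_{\mathbf{w}_{1},\dots,\mathbf{w}_{m}} \; \sum_{l=1}^{m} \log P(\mathbf{w}_{l} \mid D_{l}^{s}) \;+\; \theta \sum_{\mathbf{x} \in D^{t}} C\big(p_{1}(\cdot \mid \mathbf{x}), \dots, p_{m}(\cdot \mid \mathbf{x})\big)

Here the first term is the probability of each hypothesis given its observed source-domain data, and the second term is the consensus degree of the m classifiers' predictions on the target domain.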

Ping Luo CIKM 08 Why Consensus Regularization (1) In this study we focus on binary classification problems with the labels +1 and -1, and the number of classifiers m = 3. The non-trivial classifier condition can be restated as:

Ping Luo CIKM 08 Why Consensus Regularization (2) Thus, minimizing the disagreement means decreasing the classification error: intuitively, whenever the classifiers disagree on an instance, at least one of them must be wrong, so reducing disagreement also reduces error.

Ping Luo CIKM 08 Consensus Regularization by Logistic Regression (1) The proposed consensus regularization framework outputs m logistic models, which minimize: For the binary classification problem, the entropy-based consensus measure C e can be replaced by the equivalent simplified measure C s. Thus, the objective function can be rewritten as
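A hedged reconstruction of this minimized objective, assuming standard L2-regularized logistic losses on each source domain and the simplified consensus measure C_s on the target domain (the constants and signs on the slide may differ):

\min_{\mathbf{w}_{1},\dots,\mathbf{w}_{m}} \; \sum_{l=1}^{m} \Big[ -\sum_{(\mathbf{x}, y) \in D_{l}^{s}} \log \sigma\big(y\, \mathbf{w}_{l}^{\top}\mathbf{x}\big) + \frac{\lambda}{2}\, \lVert \mathbf{w}_{l} \rVert^{2} \Big] \;-\; \theta \sum_{\mathbf{x} \in D^{t}} C_{s}\big(p_{1}(\mathbf{x}), \dots, p_{m}(\mathbf{x})\big)

with \sigma(z) = 1/(1 + e^{-z}) and p_{l}(\mathbf{x}) = \sigma(\mathbf{w}_{l}^{\top}\mathbf{x}).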

Ping Luo CIKM 08 Consensus Regularization by Logistic Regression (2) The partial derivative of the objective consists of two parts:  a function of a local classifier and the data from its corresponding source domain, which can therefore be computed locally on each source domain  a function of all the local classifiers and the data from the target domain, which can therefore be computed on the target domain with all the classifiers

Ping Luo CIKM 08 Distributed Implementation of Consensus Regularization (1) In the distributed setting, the nodes containing source-domain data are used as slave nodes, and the node containing the target-domain data is used as the master node.
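A minimal sketch of this master/slave gradient exchange, assuming the gradient decomposition from the previous slide: each slave computes the likelihood gradient on its own source data, and the master adds the consensus gradient computed on the unlabeled target data. The function and variable names are hypothetical, the consensus term is the illustrative squared-deviation form from the earlier consensus-measuring sketch, and plain gradient ascent stands in for the actual optimizer.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_gradient(w, X, y, lam=1.0):
    # Slave node: gradient of the L2-regularized log-likelihood on one source domain
    # (labels y in {-1, +1}); only this statistic is sent out, never the raw data.
    margins = y * (X @ w)
    return (X * (y * (1.0 - sigmoid(margins)))[:, None]).sum(axis=0) - lam * w

def consensus_gradient(ws, Xt, theta, l):
    # Master node: gradient of the consensus term on the unlabeled target data with
    # respect to the l-th model, using the illustrative (p_bar - 1/2)^2 consensus.
    p_bar = np.mean([sigmoid(Xt @ w) for w in ws], axis=0)
    p_l = sigmoid(Xt @ ws[l])
    coef = 2.0 * (p_bar - 0.5) / len(ws) * p_l * (1.0 - p_l)  # chain rule through model l
    return theta * (Xt * coef[:, None]).sum(axis=0)

def train(sources, Xt, theta=0.1, lr=1e-3, iters=20):
    # sources: list of (X_l, y_l) pairs, one per slave node; Xt: target data on the master.
    d = sources[0][0].shape[1]
    ws = [np.zeros(d) for _ in sources]
    for _ in range(iters):
        for l, (X, y) in enumerate(sources):
            g = local_gradient(ws[l], X, y)            # computed on slave node l
            g += consensus_gradient(ws, Xt, theta, l)  # computed on the master node
            ws[l] += lr * g                            # ascent on the joint objective
    return ws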

Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions

Ping Luo CIKM 08 Experimental Preparation (1) Data Preparation –Three source domains (A 1, B 1), (A 2, B 2), (A 3, B 3), and one target domain (A 4, B 4) –96 problem instances can be constructed for the experimental evaluation Baseline Algorithms –Distributed approaches: Distributed Ensemble (DE), Distributed Consensus Regularization (DCR) –Centralized approaches: Centralized Training (CT), Centralized Consensus Regularization (CCR) (e.g., CCR 1 means m = 1), CoCC [Dai et al., KDD'07], TSVM [Joachims, ICML'99], SGT [Joachims, ICML'03] Data categories: A 1 = sci.crypt, A 2 = sci.electronics, A 3 = sci.med, A 4 = sci.space; B 1 = talk.guns, B 2 = talk.mideast, B 3 = talk.misc, B 4 = talk.religion

Ping Luo CIKM 08 Experimental Parameters and Metrics Note that when the parameter θ = 0, DE is equivalent to DCR, and CT is equivalent to CCR 1. Parameter setting –The range of θ is [0, 0.25] –The parameters of CoCC, TSVM, and SGT are the same as in [Dai et al., KDD'07] Experimental metrics  Accuracy  Convergence

Ping Luo CIKM 08 Experimental Results (1) Comparison of CCR 3, CCR 1, DE and CT; the reported results are the best performance obtained when θ is sampled in [0, 0.25]

Ping Luo CIKM 08 Experimental Results (2) The average performance comparison of CCR 3, CCR 1, DE and CT on the 96 problem instances; comparison of TSVM, SGT, CoCC and CCR 3

Ping Luo CIKM 08 Experimental Results on Algorithm Convergence The algorithm almost converges after 20 iterations, which indicates that it has good convergence properties.

Ping Luo CIKM 08 More experiments (1) Note that the original source domains have a much larger distribution mismatch, but after merging, the mismatch is greatly alleviated.

Ping Luo CIKM 08 More experiments (2) The experiments on image classification are also very promising

Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions

Ping Luo CIKM 08 Related Work (1) Transfer Learning Addresses the fundamental problem of different distributions between the training and testing data. –Assuming there are some labeled data from the target domain  Estimation of mismatch degree by Liao et al. [ICML'05]  Boosting based learning by Dai et al. [ICML'07]  Building generative classifiers by Smith et al. [KDD'07]  Constructing informative priors from the source domain and encoding them into the model, by Raina et al. [ICML'06] –When the data in the target domain are totally unlabeled  Co-clustering based Classification by Dai et al. [KDD'07]  Transductive Bridged-Refinement by Xing et al. [PKDD'07]

Ping Luo CIKM 08 Related Work (2) Self-Taught Learning Uses a large amount of unlabeled data to improve the performance of a given classification task –Applying sparse coding to construct higher-level features from the unlabeled data, by Raina et al. [ICML'07] Semi-supervised Classification –Entropy minimization by Grandvalet et al. [NIPS'05], which is a special case of our regularization framework when m = 1 Multi-View Learning –Co-training by Blum et al. [COLT'98] –Boosting mixture models by Grandvalet et al. [ICANN'01] –Co-regularization by Sindhwani et al. [ICML'05], which focuses on two views only and does not have the effect of entropy minimization

Ping Luo CIKM 08 Overview Introduction Preliminaries Consensus Regularization Experimental Evaluation Related Works Conclusions

Ping Luo CIKM 08 Conclusions Propose a consensus regularization framework for transfer learning from multiple source domains  Maximize the likelihood of each model on its corresponding source domain  Maximize the consensus degree of all the trained models Extend the algorithm to a distributed implementation  Only some statistical values are shared between the source domains and the target domain, so it modestly alleviates the privacy concerns Experiments on real-world text data sets show the effectiveness of our consensus regularization approach

Ping Luo CIKM 08 Q. & A. Acknowledgement