Transfer Learning Task
Problem Identification
- Dataset A (year 2000, 48 features): model 'M' is trained and tested twice, reaching 98.6% and 97% accuracy.
- Dataset B (year 2006, 96 features): the same model 'M' is tested and reaches only 60.9%. Why??
Transfer Learning
Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.
Traditional Machine Learning vs. Transfer Learning
[Diagram] Traditional machine learning: each task, even among different but related tasks, trains its own learning system in isolation. Transfer learning: knowledge extracted from the source task is passed to the learning system of the target task.
Transfer Learning Definition
Given a source domain D_S and source learning task T_S, and a target domain D_T and target learning task T_T, transfer learning aims to help improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T.
Transfer Definition
Therefore, transfer learning applies if either:
- Domain differences: D_S ≠ D_T
- Task differences: T_S ≠ T_T
Examples: Cancer Data
Domain difference: the source data describes patients by Age and Smoking, while the target data describes them by Age, Height and Smoking.
Examples: Cancer Data
Task difference:
- Source task: classify into cancer or no cancer
- Target task: classify into cancer level one, cancer level two, cancer level three
Settings of Transfer Learning

Transfer learning setting       | Labelled source data | Labelled target data | Tasks
Inductive transfer learning     | × / √                | √                    | Classification, regression, …
Transductive transfer learning  | √                    | ×                    | Classification, regression, …
Unsupervised transfer learning  | ×                    | ×                    | Clustering, …
Questions to answer when transferring
- What to transfer? Instances? Features? Model?
- How to transfer? Weight instances? Unify features? Map model?
- When to transfer? In which situations?
What to Transfer?

Transfer learning approach      | Description
Instance-transfer               | Re-weight some labelled data in the source domain for use in the target domain
Feature-representation-transfer | Find a "good" feature representation that reduces the difference between the source and target domains or minimizes the error of models
Model-transfer                  | Discover shared parameters or priors of models between the source and target domains
Relational-knowledge-transfer   | Build a mapping of relational knowledge between the source and target domains
Inductive Transfer Learning (Instance-transfer)
Assumption: the source domain and target domain data use exactly the same features and labels.
Motivation: although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.
Main idea: discriminatively adjust the weights of data in the source domain for use in the target domain.
Instance-transfer
Assumptions:
- Source and target tasks have the same feature space: X_S = X_T
- The marginal distributions are different: P(X_S) ≠ P(X_T)
Not all source data might be helpful!
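The re-weighting idea can be illustrated with importance weights w(x) = P_T(x)/P_S(x): source instances that look like target data get weight above 1, the rest get weight below 1. A minimal sketch, assuming a hypothetical 1-D setting where both marginals are known Gaussians (in practice the ratio must be estimated):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical example: same feature space, shifted marginals.
rng = np.random.default_rng(0)
x_src = rng.normal(0.5, 1.0, 1000)          # source marginal P_S = N(0.5, 1)

# Importance weight of each source instance under target marginal P_T = N(0, 1).
w = gaussian_pdf(x_src, 0.0, 1.0) / gaussian_pdf(x_src, 0.5, 1.0)
# Source points on the target side of the shift (x < 0.25) receive w > 1.
```

For these two Gaussians the ratio simplifies to w(x) = exp((0.25 − x)/2), so the weighting smoothly down-weights source instances that are unlikely under the target distribution.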
Algorithms
What to transfer? Instances. How to transfer? Weight instances.
Algorithm: TrAdaBoost
Idea: iteratively re-weight source samples so as to:
- reduce the effect of "bad" source instances
- encourage the effect of "good" source instances
Requires:
- a labelled source data set
- a very small labelled target data set
- an unlabelled target data set
- a base learner
Our Case
[Diagram] Model M is trained on dataset D1 and transferred to dataset D2; the accuracy (%) on D2 is the quantity of interest.
Self-Taught Clustering
- Unsupervised transfer learning: co-clustering, no labelled data
- Feature-based transfer learning: the features are not the same, and the tasks may not be the same
- First applied to image clustering
- Key idea: find high-level shared features, i.e. a new feature representation
Self Taught Learning
Latent Dirichlet Allocation (LDA)
LDA is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.
- Typically used for topic modelling: forums, Twitter messages, text corpora
- Does not consider word order
- Can be viewed as a dimension-reduction technique