Transfer for Supervised Learning Tasks
Haitham Bou Ammar, Maastricht University
Motivation
[Figure: a model is trained on training data and then applied to test data]
Assumptions:
1. Training and test data are drawn from the same distribution
2. Training and test data lie in the same feature space
Not true in many real-world applications!
Examples: Web-Document Classification
[Figure: a model classifies web documents into Physics, Machine Learning, and Life Science]
Examples: Web-Document Classification
[Figure: the same model applied after the documents' content changes]
Content change! Assumption violated! Learn a new model.
Learn a new model:
1. Collect new labeled data
2. Build a new model
Or: reuse and adapt the already learned model!
Examples: Image Classification
[Figure: Model One learns features for Task One]
Examples: Image Classification
[Figure: Task Two (cars vs. motorcycles): Model Two reuses the features learned for Task One]
Traditional Machine Learning vs. Transfer Learning
[Figure: in traditional machine learning, each task gets its own learning system; in transfer learning, knowledge from the source task is passed to the target task's learning system]
Questions?
Transfer Learning Definition
Notation:
Domain: $\mathcal{D} = \{\mathcal{X}, P(X)\}$
Feature space: $\mathcal{X}$
Marginal probability distribution: $P(X)$ with $X = \{x_1, \dots, x_n\} \in \mathcal{X}$
Given a domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$, a task is $\mathcal{T} = \{\mathcal{Y}, P(Y|X)\}$ with label space $\mathcal{Y}$.
Transfer Learning Definition
Given a source domain $\mathcal{D}_S$ and source learning task $\mathcal{T}_S$, and a target domain $\mathcal{D}_T$ and target learning task $\mathcal{T}_T$, transfer learning aims to improve the learning of the target predictive function using the source knowledge, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.
Transfer Definition
Therefore, transfer applies if either:
$\mathcal{D}_S \neq \mathcal{D}_T$ (domain differences)
$\mathcal{T}_S \neq \mathcal{T}_T$ (task differences)
Examples: Cancer Data
[Table: one cancer data set with features (Age, Smoking), another with features (Age, Height, Smoking): the feature spaces differ]
Examples: Cancer Data
Source task: classify into cancer or no cancer.
Target task: classify into cancer level one, cancer level two, or cancer level three.
Quiz: When doesn't transfer help? (Hint: on what should you condition?)
Questions?
Questions to answer when transferring:
What to transfer? Instances? Features? Model?
How to transfer? Weight instances? Unify features? Map model?
When to transfer? In which situations?
Algorithms: TrAdaBoost
Assumptions:
Source and target task have the same feature space: $\mathcal{X}_S = \mathcal{X}_T$
Marginal distributions are different: $P_S(X) \neq P_T(X)$
Not all source data might be helpful!
Algorithms: TrAdaBoost (Quiz)
What to transfer? Instances.
How to transfer? Weight instances.
Algorithm: TrAdaBoost
Idea: iteratively reweight the source samples so as to:
reduce the effect of "bad" source instances
encourage the effect of "good" source instances
Requires:
a labeled source-task data set
a very small labeled target-task data set
an unlabeled target data set
a base learner
Algorithm: TrAdaBoost
1. Weight initialization
2. Hypothesis learning and error calculation (the error is measured on the target instances)
3. Weight update (source and target instances are updated with different factors)
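Below is a minimal NumPy/scikit-learn sketch of this loop, assuming binary labels in {0, 1} and a decision stump as the base learner; the function name `tradaboost`, the number of iterations, and the clamping of the target error are illustrative choices, not part of the original slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(X_src, y_src, X_tgt, y_tgt, n_iters=10):
    """Sketch of the TrAdaBoost loop for binary labels in {0, 1}."""
    n, m = len(X_src), len(X_tgt)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])

    # Weight initialization: uniform over all source and target instances.
    w = np.ones(n + m) / (n + m)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_iters))

    hypotheses, betas = [], []
    for t in range(n_iters):
        p = w / w.sum()

        # Hypothesis learning on the weighted, combined data set.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        err = np.abs(h.predict(X) - y)

        # Error calculation on the target instances only.
        eps = np.sum(w[n:] * err[n:]) / np.sum(w[n:])
        eps = np.clip(eps, 1e-10, 0.49)   # keep the update well defined
        beta_t = eps / (1.0 - eps)

        # Weight update: misclassified source instances are down-weighted,
        # misclassified target instances are up-weighted.
        w[:n] *= beta_src ** err[:n]
        w[n:] *= beta_t ** (-err[n:])

        hypotheses.append(h)
        betas.append(beta_t)
    return hypotheses, betas
```

In the original algorithm the final prediction is a weighted vote over the hypotheses from the second half of the iterations; that output step is omitted here for brevity.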
Questions?
Algorithms: Self-Taught Learning
Problem targeted:
1. Little labeled data (e.g. labeled car and motorcycle images)
2. A lot of unlabeled data (e.g. natural scenes)
Build a model on the labeled data.
Algorithms: Self-Taught Learning
Assumptions:
Source and target task have different feature spaces: $\mathcal{X}_S \neq \mathcal{X}_T$
Marginal distributions are different: $P_S(X) \neq P_T(X)$
Label spaces are different: $\mathcal{Y}_S \neq \mathcal{Y}_T$
Algorithms: Self-Taught Learning (Quiz)
What to transfer? Features.
How to transfer? 1. Discover features 2. Unify features.
Algorithms: Self-Taught Learning
Framework:
Source: unlabeled data set (natural scenes)
Target: labeled data set; build a classifier for cars and motorbikes
Algorithms: Self-Taught Learning
Step One: discover high-level features from the source data by solving
$\min_{b, a} \sum_i \big\| x_S^{(i)} - \sum_j a_j^{(i)} b_j \big\|_2^2 + \beta \sum_i \| a^{(i)} \|_1 \quad \text{s.t. } \| b_j \|_2 \leq 1 \;\; \forall j$
(reconstruction error plus regularization term, with a constraint on the bases)
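As an illustration of Step One, here is a minimal sketch using scikit-learn's `DictionaryLearning` as a stand-in for the sparse-coding problem above; the random placeholder data and the hyperparameter values (`n_components`, `alpha`) are assumptions for the example, not values from the slides.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Placeholder for the unlabeled source data: k natural-scene patches,
# each flattened into a d-dimensional vector (real data loading not shown).
X_unlabeled = np.random.randn(1000, 64)

# Approximately solves min_{b,a} sum_i ||x_i - sum_j a_ij b_j||^2 + beta * ||a_i||_1
# with unit-norm bases; `alpha` plays the role of beta in the slide's objective.
dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=50)
dico.fit(X_unlabeled)
bases = dico.components_   # the learned high-level features b_1, ..., b_s
```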
Algorithm: Self-Taught Learning
[Figure: unlabeled data set mapped to learned high-level features]
Algorithm: Self-Taught Learning
Step Two: project the target data onto the attained features by solving
$\hat{a}^{(i)} = \arg\min_{a} \big\| x_T^{(i)} - \sum_j a_j b_j \big\|_2^2 + \beta \| a \|_1$
Informally, find the activations in the attained bases such that:
1. the reconstruction error is minimized
2. the attained activation vector is sparse
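Continuing the sketch above, Step Two can be performed with scikit-learn's `sparse_encode`, which solves this Lasso-style projection over the fixed bases; `X_target` is placeholder data and `bases` is reused from the previous block.

```python
from sklearn.decomposition import sparse_encode

# Placeholder for the labeled target inputs (car / motorbike images),
# flattened the same way as the unlabeled source patches.
X_target = np.random.randn(200, 64)

# For each target input, find sparse activations a_hat minimizing
# ||x_T - sum_j a_j b_j||^2 + beta * ||a||_1 over the fixed bases.
activations = sparse_encode(X_target, bases, algorithm='lasso_lars', alpha=1.0)
```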
Algorithms: Self-Taught Learning
[Figure: high-level features]
Algorithms: Self-Taught Learning
Step Three: learn a classifier with the new features.
Summary: learn new features from the source task (Step 1), project the target data onto them (Step 2), learn the target model (Step 3).
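Finally, Step Three trains an ordinary classifier on the sparse activations instead of the raw inputs; a minimal sketch with a linear SVM and placeholder labels follows (the choice of classifier is an assumption, any supervised learner would do).

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder labels for the target task (e.g. 0 = car, 1 = motorbike).
y_target = np.random.randint(0, 2, size=len(X_target))

# Learn the target model on the new feature representation.
clf = LinearSVC().fit(activations, y_target)
print("training accuracy:", clf.score(activations, y_target))
```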
Conclusions
Transfer learning re-uses source knowledge to help a target learner.
Transfer learning is not generalization.
TrAdaBoost transfers instances.
Self-Taught Learning transfers features learned from unlabeled data.