Transfer Learning Part I: Overview. Sinno Jialin Pan, Institute for Infocomm Research (I2R), Singapore.

Transfer of Learning: a psychological point of view. The study of the dependency of human conduct, learning, or performance on prior experience. [Thorndike and Woodworth, 1901] explored how individuals transfer learning in one context to another context that shares similar characteristics. Examples: C++ to Java; Maths/Physics to Computer Science/Economics.

Transfer Learning: in the machine learning community, the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains which share some commonality. Given a target task, how do we identify the commonality between the task and previous (source) tasks, and transfer knowledge from the previous tasks to the target one?

Fields of Transfer Learning: transfer learning for reinforcement learning [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009], and transfer learning for classification and regression problems [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2009], the focus of this talk.

Motivating Example I: Indoor WiFi Localization. (Figure: a mobile device receives signal strengths from several WiFi access points, e.g., -30 dBm, -70 dBm, -40 dBm, which form the feature vector for predicting location.)

Indoor WiFi Localization (cont.) Training data collected in Time Period A consist of signal-strength vectors with location labels, e.g., S = (-37 dBm, ..., -77 dBm), L = (1, 3). A localization model trained and tested within Time Period A achieves an average error distance of ~1.5 meters; the same model tested on unlabeled signal vectors from Time Period B degrades to ~6 meters. Performance drops!

Indoor WiFi Localization (cont.) The same happens across devices: a model trained on labeled signal vectors from Device A achieves ~1.5 meters average error distance on Device A, but degrades to ~10 meters when tested on data collected by Device B. Performance drops!

Difference between Tasks/Domains. (Figure: signal-strength distributions differ between Time Period A and Time Period B, and between Device A and Device B.)

Motivating Example II: Sentiment Classification

Sentiment Classification (cont.) A sentiment classifier trained on Electronics reviews achieves ~84.6% classification accuracy on Electronics test data, but only ~72.65% when tested on DVD reviews. Performance drops!

Difference between Tasks/Domains: Electronics vs. Video Games reviews.
Electronics: (1) Compact; easy to operate; very good picture quality; looks sharp! (3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp. (5) It is also quite blurry in very dark settings. I will never buy HP again.
Video Games: (2) A very good game! It is action packed and full of excitement. I am very much hooked on this game. (4) Very realistic shooting action and good plots. We played this and were hooked. (6) The game is so boring. I am extremely unhappy and will probably never buy UbiSoft again.

A Major Assumption: training and future (test) data come from the same task and the same domain. That is, they are represented in the same feature and label spaces, and they follow the same distribution.

The Goal of Transfer Learning: given plenty of labeled training data from source tasks/domains (Electronics reviews, Time Period A, Device A) and only a few labeled training data from the target task/domain (DVD reviews, Time Period B, Device B), train accurate classification or regression models for the target.

Notations. Domain: D = {X, P(X)}, a feature space X together with a marginal probability distribution P(X) over it. Task: T = {Y, f(·)}, a label space Y together with a predictive function f(·) learned from training data, which can be viewed as the conditional distribution P(Y|X).

Transfer learning settings are organized by two questions.
- Is the feature space the same? If heterogeneous, we have Heterogeneous Transfer Learning. If identical (homogeneous), ask:
- Are the tasks the same? If identical, we have Single-Task Transfer Learning, where the domain difference is caused either by feature representations (Domain Adaptation) or by sample bias (Sample Selection Bias / Covariate Shift). If different, we have Inductive Transfer Learning, where either the tasks are learned simultaneously (Multi-Task Learning) or we focus on optimizing a target task.

Single-Task Transfer Learning. Setting: the source and target tasks are identical, but the domains differ, either through feature representations or through sample bias. Assumption: the domains share the same predictive function; only the input distributions differ.

Single-Task Transfer Learning. Case 1: Sample Selection Bias / Covariate Shift, addressed by instance-based transfer learning approaches. Case 2: Domain Adaptation in NLP, addressed by feature-based transfer learning approaches.

Single-Task Transfer Learning, Case 1: Sample Selection Bias / Covariate Shift (instance-based approaches). Problem setting: labeled source-domain data {(x_i^S, y_i^S)} and unlabeled target-domain data {x_j^T}, where the marginal distributions differ, P_S(X) ≠ P_T(X). Assumption: the conditional distributions agree, P_S(Y|X) = P_T(Y|X).

Single-Task Transfer Learning: Instance-based Approaches. Recall that, given a target task, the goal is to learn the parameters θ* that minimize the expected risk under the target distribution:

θ* = argmin_θ E_{(x,y)~P_T}[ ℓ(x, y, θ) ].

Since labeled data are available only from the source distribution, rewrite the expectation over P_S with importance weights:

θ* = argmin_θ E_{(x,y)~P_S}[ (P_T(x, y) / P_S(x, y)) · ℓ(x, y, θ) ] ≈ argmin_θ Σ_i (P_T(x_i, y_i) / P_S(x_i, y_i)) · ℓ(x_i, y_i, θ).

Assumption: P_T(y|x) = P_S(y|x), so the weight reduces to the density ratio P_T(x)/P_S(x). This is exactly the sample selection bias / covariate shift setting [Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]: re-weight each source instance by an estimate of P_T(x)/P_S(x) and train on the weighted data, as in the sketch below.
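As an illustration (not from the slides): one common way to estimate the density ratio is to train a probabilistic classifier to discriminate source from target inputs and convert its predicted odds into instance weights. A minimal sketch, assuming scikit-learn; all names are illustrative.

```python
# A minimal sketch of instance re-weighting under covariate shift,
# assuming scikit-learn. Xs/ys are labeled source data, Xt is
# unlabeled target data; all names here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(Xs, Xt):
    """Estimate P_T(x)/P_S(x) by discriminating source (0) vs. target (1)."""
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(Xs)[:, 1]               # P(domain = target | x)
    return (p / (1.0 - p)) * (len(Xs) / len(Xt))  # odds, corrected for sizes

# Usage: train any learner that accepts sample weights on the source data.
# w = covariate_shift_weights(Xs, Xt)
# LogisticRegression(max_iter=1000).fit(Xs, ys, sample_weight=w)
```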

Single-Task Transfer Learning: Feature-based Approaches (Case 2). Problem setting: labeled source-domain data and unlabeled (or scarcely labeled) target-domain data, where the domain difference is caused by the feature representations. Explicit/implicit assumption: there exists a feature transformation φ such that the transformed source and target data follow similar distributions and the predictive function is preserved, i.e., P_S(Y|φ(X)) ≈ P_T(Y|φ(X)).

Single-Task Transfer Learning: Feature-based Approaches (cont.) How to learn the transformation φ? Solution 1: encode domain knowledge to learn the transformation. Solution 2: learn the transformation by designing objective functions that minimize the domain difference directly.

Single-Task Transfer Learning, Solution 1: Encode domain knowledge to learn the transformation. Revisit the Electronics vs. Video Games reviews (examples (1)-(6) above), treating multi-word cues such as never_buy as single features. Common (pivot) features appear in both domains: good, exciting, never_buy. Others are domain specific: compact, sharp, blurry for Electronics; hooked, realistic, boring for Video Games. The idea is to align the domain-specific features of the two domains through their co-occurrence with the common pivot features.

Single-Task Transfer Learning, Solution 1 (cont.) How to select good pivot features is an open problem; common heuristics are mutual information with the labels on source-domain labeled data, and term frequency on both source- and target-domain data. How to estimate correlations between pivot and domain-specific features? Structural Correspondence Learning (SCL) [Blitzer et al., 2006] and Spectral Feature Alignment (SFA) [Pan et al., 2010]; a rough sketch of the SCL idea follows.
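A rough sketch of the SCL recipe under stated assumptions, not the authors' exact implementation: on unlabeled data pooled from both domains, train a linear predictor for each pivot feature from the non-pivot features, then take an SVD of the stacked predictor weights as a cross-domain projection. Function and parameter names are illustrative.

```python
# A rough sketch of the SCL idea, not the authors' exact recipe,
# assuming scikit-learn; pivot_idx and all names are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_projection(X_unlabeled, pivot_idx, k=50):
    """X_unlabeled: (n, d) term matrix pooled from both domains."""
    nonpivot = np.setdiff1d(np.arange(X_unlabeled.shape[1]), pivot_idx)
    W = []
    for p in pivot_idx:
        y = (X_unlabeled[:, p] > 0).astype(int)  # does the pivot occur?
        clf = SGDClassifier(loss="modified_huber")
        clf.fit(X_unlabeled[:, nonpivot], y)     # predict pivot from the rest
        W.append(clf.coef_.ravel())
    U, _, _ = np.linalg.svd(np.array(W).T, full_matrices=False)
    return U[:, :k], nonpivot  # new features: X[:, nonpivot] @ U[:, :k]
```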

Single-Task Transfer Learning, Solution 2: learn the transformation without domain knowledge. (Figure: source and target WiFi data are generated from shared latent factors such as temperature, signal properties, building structure, and the power of access points. Some of these latent factors vary across domains and cause the data distributions to differ.)

Single-Task Transfer Learning, Solution 2 (cont.) Among the latent factors, some (e.g., signal properties, building structure) are principal components that remain stable across domains, while others are noisy components. Learning the transformation by only minimizing the distance between distributions may map the data onto the noisy factors.

Single-Task Transfer Learning: Transfer Component Analysis (TCA) [Pan et al., 2009]. Main idea: the learned transformation φ should map the source- and target-domain data to the latent space spanned by the factors that reduce the domain difference while preserving the original data structure. High-level optimization problem: minimize the distance between the transformed source and target distributions plus a regularizer on φ, subject to constraints that preserve properties (e.g., variance) of the original data.

Single-Task Transfer Learning: Maximum Mean Discrepancy (MMD) [Alex Smola, Arthur Gretton and Kenji Fukumizu, ICML-08 tutorial]. MMD measures the distance between two distributions as the distance between their empirical means after mapping into a reproducing kernel Hilbert space:

MMD(X_S, X_T) = || (1/n_s) Σ_{i=1..n_s} φ(x_i^S) − (1/n_t) Σ_{j=1..n_t} φ(x_j^T) ||_H,

which can be computed entirely in terms of kernel evaluations, as in the sketch below.
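A small sketch of the empirical (biased) squared MMD with an RBF kernel, expanded via the kernel trick; the bandwidth gamma and all names are illustrative.

```python
# A small sketch of the empirical (biased) squared MMD with an RBF
# kernel; the bandwidth gamma and all names are illustrative.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    """MMD^2 = mean(Kss) + mean(Ktt) - 2 * mean(Kst) via the kernel trick."""
    return (rbf_kernel(Xs, Xs, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean()
            - 2.0 * rbf_kernel(Xs, Xt, gamma).mean())
```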

Single-Task Transfer Learning: Transfer Component Analysis (cont.) An earlier formulation [Pan et al., 2008] directly learns a kernel matrix that minimizes the MMD distance between domains, maximizes the data variance, and preserves the local geometric structure. Drawbacks: it is an SDP problem, hence expensive; it is transductive, so it cannot generalize to unseen instances; and PCA is post-processed on the learned kernel matrix, which may discard useful information.

Single-Task Transfer Learning: Transfer Component Analysis (cont.) TCA instead adopts an empirical kernel map: the kernel matrix K over all source and target points is factorized, and a matrix W projects the empirical kernel features to an m-dimensional space, giving the resultant parametric kernel K̃ = K W Wᵀ K. This parametric form allows out-of-sample kernel evaluation, so the learned mapping generalizes to unseen instances.

Single-Task Transfer Learning: Transfer Component Analysis (cont.) The resulting optimization is

min_W tr(Wᵀ K L K W) + μ tr(Wᵀ W), subject to Wᵀ K H K W = I,

where the first term minimizes the MMD distance between domains (L is the MMD coefficient matrix), the second is a regularizer on W, and the constraint maximizes the data variance (H is the centering matrix). The solution W consists of the m leading eigenvectors of (K L K + μI)^{-1} K H K, an eigendecomposition rather than an SDP; a sketch follows.
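A condensed sketch of that eigen-solution under the formulation stated above; parameter names are illustrative and numerical details (kernel choice, scaling) are omitted.

```python
# A condensed sketch of the TCA eigen-solution stated above; K is the
# kernel matrix over source rows followed by target rows. Parameter
# names are illustrative and numerical details are omitted.
import numpy as np

def tca(K, n_s, n_t, m=10, mu=1.0):
    n = n_s + n_t
    e = np.vstack([np.full((n_s, 1), 1.0 / n_s),
                   np.full((n_t, 1), -1.0 / n_t)])
    L = e @ e.T                              # MMD coefficient matrix
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(M)            # (KLK + mu*I)^-1 KHK
    W = vecs[:, np.argsort(-vals.real)[:m]].real
    return K @ W                             # m-dim embedding of all points
```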

Inductive Transfer Learning. Setting: the source and target tasks are different. If the tasks are learned simultaneously, we have multi-task learning; otherwise the methods focus on optimizing the target task. Problem setting: labeled data are available for the source task(s), and a few labeled data for the target task. Assumption: the tasks are related and share some commonality.

Inductive transfer learning methods fall into three families: methods modified from multi-task learning (feature-based and parameter-based approaches), self-taught learning methods (feature-based approaches), and target-task-driven methods (instance-based approaches).

Inductive Transfer Learning: Multi-Task Learning Methods (feature-based and parameter-based approaches). Setting: labeled training data are available for several related tasks.

Inductive Transfer Learning: Multi-Task Learning Methods. Recall that, in standard supervised learning, a model is trained for each task (source or target) independently. Motivation of multi-task learning: can the related tasks be learned jointly, and what kind of commonality can be exploited across tasks?

Inductive Transfer Learning: Multi-Task Learning Methods, parameter-based approaches. Assumption: if tasks are related, they should share similar parameter vectors. For example, [Evgeniou and Pontil, 2004] decompose the parameter vector of each task t as

w_t = w_0 + v_t,

where w_0 is the common part shared by all tasks and v_t is the specific part for the individual task.

Inductive Transfer Learning: parameter-based approaches (cont.) The corresponding regularized objective, in the spirit of the original paper, is

min_{w_0, {v_t}} Σ_t Σ_i ℓ(y_{ti}, (w_0 + v_t) · x_{ti}) + λ_1 Σ_t ||v_t||² + λ_2 ||w_0||²,

where λ_1 controls how close each task stays to the shared part and λ_2 regularizes the shared part itself; see the sketch below.
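A hedged sketch of this shared-plus-specific decomposition via feature augmentation: one ridge regression over the augmented design recovers w_0 from the shared block and v_t from the per-task blocks, up to the relative regularization weights. It assumes scikit-learn, and all names are illustrative.

```python
# A hedged sketch, assuming scikit-learn: the w_t = w_0 + v_t model via
# feature augmentation. Ridge on [scale*x | per-task copies of x]
# penalizes ||w_0||^2 and all ||v_t||^2 jointly; `scale` trades off the
# common vs. task-specific parts. All names are illustrative.
import numpy as np
from sklearn.linear_model import Ridge

def augment(X, task_ids, n_tasks, scale=1.0):
    n, d = X.shape
    A = np.zeros((n, d * (1 + n_tasks)))
    A[:, :d] = scale * X                        # shared block  -> w_0
    for i, t in enumerate(task_ids):
        A[i, d * (1 + t): d * (2 + t)] = X[i]   # task-t block  -> v_t
    return A

# model = Ridge(alpha=1.0).fit(augment(X, tasks, T), y)
```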

Inductive Transfer Learning: parameter-based approaches (summary). A general framework learns the task parameters and the task relationships jointly, e.g.

min_{W, Ω} Σ_t Σ_i ℓ(y_{ti}, w_t · x_{ti}) + λ tr(W Ω^{-1} Wᵀ),

where W stacks the task parameter vectors and Ω ⪰ 0 models the covariance (relationships) between tasks [Zhang and Yeung, 2010][Saha et al., 2010].

Inductive Transfer Learning: Multi-Task Learning Methods, feature-based approaches. Assumption: if tasks are related, they should share some good common features. Goal: learn a low-dimensional representation shared across the related tasks.

Inductive Transfer Learning: feature-based approaches (cont.) [Argyriou et al., 2007] formulate this as

min_{A, U} Σ_t Σ_i ℓ(y_{ti}, a_t · Uᵀ x_{ti}) + γ ||A||²_{2,1},

where U is a feature transformation shared by all tasks and the (2,1)-norm on the coefficient matrix A forces the tasks to select only a few common rows, i.e., a shared low-dimensional set of features; a toy sketch of the key ingredient follows.
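Not the authors' alternating algorithm, but a toy sketch of the key ingredient: proximal gradient descent on a squared loss plus an l2,1 row penalty, whose group soft-threshold step zeroes out features jointly across tasks. All names are illustrative.

```python
# A toy sketch, not the authors' alternating algorithm: proximal
# gradient descent on sum_t ||X_t w_t - y_t||^2 + gamma * ||W||_{2,1}.
# The row-wise (2,1)-norm penalty zeroes out features jointly across
# tasks. All names are illustrative.
import numpy as np

def mt_feature_learning(Xs, ys, gamma=1.0, lr=1e-3, iters=500):
    """Xs, ys: lists of per-task design matrices and label vectors."""
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))                        # column t = task t's weights
    for _ in range(iters):
        for t in range(T):                      # gradient on the smooth loss
            W[:, t] -= lr * 2.0 * Xs[t].T @ (Xs[t] @ W[:, t] - ys[t])
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W *= np.maximum(0.0, 1.0 - lr * gamma / np.maximum(norms, 1e-12))
    return W                                    # all-zero rows = dropped features
```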

Inductive Transfer Learning: feature-based approaches (cont.) (Figure: illustration of the shared representation; entire rows of the coefficient matrix are zeroed out jointly across tasks, leaving a small set of common features.)

Inductive Transfer Learning: feature-based approaches (cont.) Related formulations learn a shared low-dimensional predictive structure across multiple tasks [Ando and Zhang, 2005] or extract a shared subspace for multi-label classification [Ji et al., 2008].

The remaining inductive transfer learning families are self-taught learning methods (feature-based approaches) and target-task-driven methods (instance-based approaches).

Inductive Transfer Learning: Self-taught Learning Methods (feature-based approaches). Motivation: there exist some higher-level features that can help the target learning task even when only a few labeled data are given. Steps: (1) learn higher-level features from a large amount of unlabeled data from the source tasks; (2) use the learned higher-level features to represent the data of the target task; (3) train models on the new representations of the target task with the corresponding labels.

Inductive Transfer Learning: Self-taught Learning Methods (cont.) Higher-level feature construction. Solution 1: sparse coding [Raina et al., 2007]. Solution 2: deep learning [Glorot et al., 2011]. A sketch of the sparse-coding route follows.
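A brief sketch of the three-step recipe using sparse coding, with scikit-learn's DictionaryLearning standing in for the basis-learning step (the cited papers use their own solvers); all names are illustrative.

```python
# A brief sketch of the three-step recipe with sparse coding, assuming
# scikit-learn's DictionaryLearning as the basis learner (the cited
# work uses its own solver). All names are illustrative.
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

def self_taught_features(X_unlabeled_source, X_target, n_bases=128):
    dico = DictionaryLearning(n_components=n_bases,
                              transform_algorithm="lasso_lars")
    dico.fit(X_unlabeled_source)      # step 1: learn higher-level bases
    return dico.transform(X_target)   # step 2: re-represent target data

# Step 3: train on the new representation with the few target labels, e.g.
# clf = LogisticRegression(max_iter=1000).fit(
#     self_taught_features(Xu_src, Xt_labeled), yt_labeled)
```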

Inductive Transfer Learning: Target-Task-Driven Methods (instance-based approaches). Intuition/assumption: part of the labeled data from the source domain can be reused after re-weighting. Main idea, TrAdaBoost [Dai et al., 2007]: in each boosting iteration, use the same strategy as AdaBoost to update the weights of the target-domain data, and use a new mechanism to decrease the weights of misclassified source-domain data; see the sketch below.
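A compact sketch of those weight updates under stated assumptions (binary labels in {0, 1}, a decision stump as the weak learner, and the final weighted vote over learners omitted); it follows the update rules described above rather than every detail of the paper.

```python
# A compact sketch of the TrAdaBoost updates described above, assuming
# binary labels in {0, 1} and a decision stump as the weak learner;
# the final weighted vote is left out. Names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(Xs, ys, Xt, yt, rounds=20):
    n, m = len(Xs), len(Xt)
    X, y = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
    w = np.ones(n + m)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / rounds))
    learners = []
    for _ in range(rounds):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        err = np.abs(h.predict(X) - y)               # 0/1 loss per instance
        eps = (p[n:] * err[n:]).sum() / p[n:].sum()  # error on target only
        eps = float(np.clip(eps, 1e-10, 0.499))
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta_src ** err[:n]   # shrink misclassified source weights
        w[n:] *= beta_t ** (-err[n:])  # boost misclassified target weights
        learners.append((h, np.log(1.0 / beta_t)))
    return learners
```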

Summary. When the source and target tasks are identical, single-task transfer learning applies, with instance-based and feature-based approaches. When the tasks are different, inductive transfer learning applies, with instance-based, feature-based, and parameter-based approaches.

Some Research Issues. How to avoid negative transfer: given a target domain/task, how do we select source domains/tasks to ensure positive transfer? How can transfer learning be combined with active learning? And given a specific application, which kind of transfer learning method should be used?

References:
- [Thorndike and Woodworth, The Influence of Improvement in One Mental Function upon the Efficiency of Other Functions, 1901]
- [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
- [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2009]
- [Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
- [Blitzer et al., Domain Adaptation with Structural Correspondence Learning, EMNLP 2006]
- [Pan et al., Cross-Domain Sentiment Classification via Spectral Feature Alignment, WWW 2010]
- [Pan et al., Transfer Learning via Dimensionality Reduction, AAAI 2008]

References (cont.):
- [Pan et al., Domain Adaptation via Transfer Component Analysis, IJCAI 2009]
- [Evgeniou and Pontil, Regularized Multi-Task Learning, KDD 2004]
- [Zhang and Yeung, A Convex Formulation for Learning Task Relationships in Multi-Task Learning, UAI 2010]
- [Saha et al., Learning Multiple Tasks using Manifold Regularization, NIPS 2010]
- [Argyriou et al., Multi-Task Feature Learning, NIPS 2007]
- [Ando and Zhang, A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, JMLR 2005]
- [Ji et al., Extracting Shared Subspace for Multi-label Classification, KDD 2008]

References (cont.):
- [Raina et al., Self-taught Learning: Transfer Learning from Unlabeled Data, ICML 2007]
- [Dai et al., Boosting for Transfer Learning, ICML 2007]
- [Glorot et al., Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, ICML 2011]

Thank You