Actively Transfer Domain Knowledge. Xiaoxiao Shi, Wei Fan, Jiangtao Ren. Sun Yat-sen University; IBM T. J. Watson Research Center. Transfer when you can, otherwise ask and don't stretch it.

Presentation transcript:

Actively Transfer Domain Knowledge. Xiaoxiao Shi, Wei Fan, Jiangtao Ren. Sun Yat-sen University; IBM T. J. Watson Research Center. Transfer when you can, otherwise ask and don't stretch it.

2 Standard Supervised Learning. Training (labeled) and test (unlabeled) data are both drawn from the New York Times; the learned classifier reaches 85.5% accuracy.

3 In Reality… Training (labeled) and test (unlabeled) data both come from the New York Times, but the labeled data are insufficient: accuracy drops to 47.3%. How can we improve performance?

4 Solution I: Active Learning. Training (labeled) and test (unlabeled) data from the New York Times; a domain expert labels selected instances at a labeling cost ($). Accuracy: 83.4%.

5 Solution II: Transfer Learning. Out-of-domain training data (labeled) from Reuters; in-domain test data (unlabeled) from the New York Times. There is no guarantee that transfer learning helps: when the domains differ significantly, accuracy can drop to 43.5% instead of the hoped-for 82.6%.

6 Motivation. Active learning incurs labeling cost; transfer learning incurs domain-difference risk. Both have disadvantages, so which should we choose?

7 Proposed Solution (AcTraK). Out-of-domain labeled training data (Reuters) and unlabeled in-domain training data feed a transfer classifier. A decision function judges each prediction: if it is reliable, the instance is labeled by the classifier; if it is unreliable, the active learner asks a domain expert for the label. The newly labeled instances join the labeled training set used to build the classifier applied to the test data.

8 Transfer Classifier. Train Mo on the labeled out-of-domain dataset; it predicts the out-of-domain labels L+ and L-. Then, using the very few labeled in-domain examples, train two mapping classifiers: M_L+ on L+ = { (x, y = +/-) | Mo(x) = L+ } and M_L- on the examples Mo assigns to L- (in either set, the true in-domain label may be + or -). To classify an unlabeled in-domain instance X:
1. Classify X with Mo: P(L+|X, Mo) and P(L-|X, Mo).
2. Classify X with the mapping classifiers M_L+ and M_L-: P(+|X, M_L+) and P(+|X, M_L-).
3. The probability that X is + is then T(X) = P(+|X) = P(L+|X, Mo) × P(+|X, M_L+) + P(L-|X, Mo) × P(+|X, M_L-).
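The combination above can be sketched in code. This is a minimal illustration, not the paper's implementation: the distance-to-centroid "classifiers" below are hypothetical stand-ins for whatever base learner is actually used; only the probability combination T(X) follows the slide.

```python
import numpy as np

def fit_centroid(X, y):
    """Toy probabilistic classifier: softmax over distances to class centroids."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    def predict_proba(x):
        d = np.array([np.linalg.norm(x - c0), np.linalg.norm(x - c1)])
        e = np.exp(-d)
        return e / e.sum()  # [P(class 0 | x), P(class 1 | x)]
    return predict_proba

rng = np.random.default_rng(0)

# Toy data: out-of-domain set with labels L-/L+ (0/1), plus a few
# labeled in-domain points with true labels -/+ (0/1).
X_out = rng.normal(size=(200, 2))
y_out = (X_out[:, 0] > 0).astype(int)
X_in = rng.normal(loc=0.3, size=(40, 2))
y_in = ((X_in[:, 0] + 0.5 * rng.normal(size=40)) > 0).astype(int)

Mo = fit_centroid(X_out, y_out)  # out-of-domain classifier

# Split the labeled in-domain examples by Mo's predicted out-of-domain
# label and train one mapping classifier per label.
assigned = np.array([np.argmax(Mo(x)) for x in X_in])
mappers = {}
for lbl in (0, 1):
    mask = assigned == lbl
    if len(np.unique(y_in[mask])) == 2:  # need both in-domain classes to fit
        mappers[lbl] = fit_centroid(X_in[mask], y_in[mask])

def transfer_proba(x):
    """T(x) = P(L+|x,Mo) * P(+|x,M_L+) + P(L-|x,Mo) * P(+|x,M_L-)."""
    p_out = Mo(x)  # [P(L-|x,Mo), P(L+|x,Mo)]
    # Fall back to 0.5 when a mapping classifier could not be trained.
    return sum(p_out[lbl] * (mappers[lbl](x)[1] if lbl in mappers else 0.5)
               for lbl in (0, 1))
```

Because the out-of-domain probabilities sum to one and each mapping probability lies in [0, 1], T(x) is always a valid probability.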

10 Decision Function. When the prediction of the transfer classifier is unreliable, ask the domain expert to label the instance instead of the transfer classifier. This happens in three cases: (a) conflict between the classifiers, (b) low confidence, (c) few labeled in-domain examples.

11 Decision Function. AcTraK asks the domain expert to label an instance with a probability that depends on three factors: (a) conflict, (b) confidence, and (c) labeled-set size. Here T(x) is the prediction of the transfer classifier, M_L(x) is the prediction given by the in-domain classifier, and R is a random number in [0, 1]: when R falls below the query probability, the instance is labeled by the domain expert; otherwise it is labeled by the transfer classifier.
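A sketch of this randomized decision rule follows. The three factor formulas are illustrative assumptions standing in for the paper's actual query probability; only the overall shape (query more when there is conflict, low confidence, or few labels, and compare against a random R) follows the slide.

```python
import random

def query_probability(t_x, m_x, n_labeled):
    """Illustrative query probability (not the paper's exact formula).
    High when (a) the transfer classifier T(x) and the in-domain
    classifier M_L(x) conflict, (b) T(x) is close to 0.5, or
    (c) few in-domain examples have been labeled so far."""
    conflict = 1.0 if (t_x >= 0.5) != (m_x >= 0.5) else 0.0
    low_confidence = 1.0 - 2.0 * abs(t_x - 0.5)  # 1 at t_x = 0.5, 0 at 0 or 1
    small_sample = 1.0 / (1.0 + n_labeled)
    return max(conflict, low_confidence, small_sample)

def label_source(t_x, m_x, n_labeled, rng=random.random):
    """Draw R in [0, 1]; query the expert when R falls below the probability."""
    return "expert" if rng() < query_probability(t_x, m_x, n_labeled) else "classifier"
```

For example, a confident, agreeing prediction with a large labeled set is almost always kept, while a maximally uncertain one (t_x = 0.5) is always sent to the expert.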

12 Properties
– Reduces domain-difference risk: by Theorem 2, the expected error of the transfer classifier is bounded.
– Reduces labeling cost: by Theorem 3, the query probability is bounded.

13 Theorems. (This slide presents the bounds themselves: the expected error of the transfer classifier, Theorem 2, and the query probability in terms of the maximum labeled-set size, Theorem 3. The formulas are not recoverable from the transcript.)

14 Experiment Setup
Data sets:
– Synthetic data sets
– Remote sensing: training data collected from regions with a specific ground-surface condition; test data collected from a new region
– Text classification (20 Newsgroups): same top-level classification problems, but different sub-fields in the training and test sets
Compared models:
– Inductive learning: AdaBoost, SVM
– Transfer learning: TrAdaBoost (ICML'07)
– Active learning: ERS (ICML'01)

15 Experiments on Synthetic Datasets. In-domain: 2 labeled training examples plus the unlabeled test set; out-of-domain: 4 labeled training examples.

16 Experiments on Real-World Datasets. Evaluation metrics: accuracy when comparing with transfer learning, and IEA (Integral Evaluation on Accuracy) when comparing with active learning.
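Since IEA summarizes a whole accuracy-vs-queries curve in one number, a minimal sketch of such an integral metric may help. The trapezoid-rule approximation and the normalization by the number of query steps are assumptions for illustration; the paper's exact definition of IEA may differ.

```python
def iea(accuracies):
    """Area under the accuracy-vs-number-of-queries curve, approximated
    with the trapezoid rule and normalized by the number of query steps.
    `accuracies[i]` is the test accuracy after i expert queries."""
    steps = len(accuracies) - 1
    area = sum((accuracies[i] + accuracies[i + 1]) / 2.0 for i in range(steps))
    return area / steps
```

A learner whose accuracy climbs faster with each query earns a higher score under such a metric, so it rewards label efficiency as well as final accuracy.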

17 Results. 1. Comparison with the transfer learner. 2. Comparison with the active learner ERS on 20 Newsgroups.

18 Conclusions: Actively Transfer Domain Knowledge
– Reduces domain-difference risk: transfers only useful knowledge (Theorem 2)
– Reduces labeling cost: queries domain experts only when necessary (Theorem 3)