1
Transfer Learning with Applications to Text Classification. Jing Peng, Computer Science Department.
2
Machine learning: the study of algorithms that ① improve their performance P ② at some task T ③ with experience E. A well-defined learning task is specified by the triple <P, T, E>.
3
Learning to recognize targets in images:
4
Learning to classify text documents:
5
Learning to build forecasting models:
6
Growth of Machine Learning. Machine learning is the preferred approach to ① speech processing ② computer vision ③ medical diagnosis ④ robot control ⑤ news article processing ⑥ … This machine learning niche is growing because of ① improved machine learning algorithms ② lots of available data ③ software too complex to code by hand ④ …
7
Learning. Given training data, least squares methods focus on minimizing the squared error. The resulting error decomposes into two parts: the approximation error (how well the best hypothesis in the class H can fit the target) and the estimation error (how far the learned hypothesis is from that best hypothesis in H).
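The decomposition the slide refers to can be written as follows (a standard formulation; the notation E, f*, f_H, and \hat{f} is ours, since the slide's own symbols did not survive extraction):

```latex
% Standard decomposition of the excess risk of a learned hypothesis \hat{f}
% (notation assumed; the slide's own symbols are not recoverable).
% f^* is the target function and f_H = \arg\min_{f \in H} E(f) is the best
% hypothesis in the class H.
\[
  E(\hat{f}) - E(f^*) =
  \underbrace{\,E(f_H) - E(f^*)\,}_{\text{approximation error of } H}
  + \underbrace{\,E(\hat{f}) - E(f_H)\,}_{\text{estimation error}}
\]
```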
8
Transfer Learning with Applications to Text Classification
Main Challenges:
1. Transfer learning
2. High dimensionality (more than 4,000 features)
3. Overlapping feature sets (fewer than 80% of the features are the same)
4. A solution with performance bounds
9
Standard Supervised Learning: a classifier is trained on labeled New York Times articles and tested on unlabeled New York Times articles, reaching 85.5% accuracy.
10
In reality, labeled New York Times data is not available. Training the classifier on labeled Reuters articles and testing on unlabeled New York Times articles yields only 64.1% accuracy.
11
Domain Difference → Performance Drop
Ideal setting: train on New York Times (NYT), test on NYT → 85.5%
Realistic setting: train on Reuters, test on NYT → 64.1%
12
High Dimensional Data Transfer
High dimensional data: text categorization, image classification. The number of features in our experiments is more than 4,000.
Challenges: high dimensionality, with more features than training examples, so Euclidean distance becomes meaningless.
13
Why Dimension Reduction? (Figure: maximum and minimum pairwise distances, D_MAX and D_MIN.)
14
Curse of Dimensionality (Figure: behavior as the number of dimensions grows.)
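As a rough illustration of the effect this figure points to (our own sketch, not from the slides; the sample sizes and dimensions below are arbitrary): for randomly drawn points, the largest and smallest pairwise distances become nearly indistinguishable as the dimensionality grows, which is why Euclidean distance stops being informative.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# For points drawn uniformly at random, the ratio of the largest to the smallest
# pairwise Euclidean distance (D_MAX / D_MIN) shrinks toward 1 as the number of
# dimensions grows, so distances stop telling points apart.
for dim in (2, 10, 100, 1000, 4000):
    X = rng.random((200, dim))      # 200 random points in [0, 1]^dim
    d = pdist(X)                    # all pairwise Euclidean distances
    print(f"dim={dim:5d}  D_MAX/D_MIN = {d.max() / d.min():.2f}")
```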
16
High Dimensional Data Transfer
More challenges:
Feature sets completely overlapping? No: fewer than 80% of the features are the same.
Marginal distributions not so related? This makes it harder to find transferable structures and to define a proper similarity measure.
17
PAC (Probably Approximately Correct) learning requirement: training and test distributions must be the same.
18
Transfer between high dimensional overlapping distributions
Overlapping distributions: data from the two domains may not come from the same part of the feature space; they potentially overlap at best.
19
Transfer between high dimensional overlapping distributions
Data from the two domains may not come from the same part of the feature space; they potentially overlap at best.

      x      y      z     label
A     ?      1      0.2   +1
B     0.09   ?      0.1   +1
C     0.01   ?      0.3   -1
21
Data from the two domains may not lie in exactly the same feature space, but at most in an overlapping one.
23
Transfer between high dimensional overlapping distributions
Problems with overlapping distributions: the overlapping features alone may not provide sufficient predictive power.
24
Transfer between high dimensional overlapping distributions

      f1     f2     f3    label
A     ?      1      0.2   +1
B     0.09   ?      0.1   +1
C     0.01   ?      0.3   -1
26
With these missing values, it is hard to predict correctly.
27
Transfer between high dimensional overlapping distributions
Overlapping distributions: use the union of all features and fill in the missing values with "zeros"?
28
Transfer between high dimensional overlapping distributions
Using the union of all features and filling the missing values with zeros gives:

      f1     f2     f3    label
A     0      1      0.2   +1
B     0.09   0      0.1   +1
C     0.01   0      0.3   -1
29
Does it help?
30
Transfer between high dimensional overlapping distributions
31
D²{A, B} = 0.0181 > D²{A, C} = 0.0101
32
Transfer between high dimensional overlapping distributions
D²{A, B} = 0.0181 > D²{A, C} = 0.0101, so A is misclassified into the class of C instead of the class of B.
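A small sketch of the computation behind these numbers (our own code; the feature values are the toy rows above, with missing entries zero-filled):

```python
import numpy as np

# Toy rows from the table above with missing entries zero-filled
# (columns f1, f2, f3; A and B are labeled +1, C is labeled -1).
A = np.array([0.00, 1.0, 0.2])
B = np.array([0.09, 0.0, 0.1])
C = np.array([0.01, 0.0, 0.3])

d2_AB = np.sum((A - B) ** 2)   # the zero-filled f2 column adds (1 - 0)^2 = 1 to both
d2_AC = np.sum((A - C) ** 2)
print(d2_AB > d2_AC)           # True: a nearest-neighbor rule groups A with C, not B

# Over the remaining features (f1, f3) alone, the values match the slide:
print(np.sum((A[[0, 2]] - B[[0, 2]]) ** 2))  # 0.0181
print(np.sum((A[[0, 2]] - C[[0, 2]]) ** 2))  # 0.0101
```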
33
Transfer between high dimensional overlapping distributions
When one uses the union of overlapping and non-overlapping features and replaces missing values with zeros, the distance between the two marginal distributions p(x) can become asymptotically very large as a function of the non-overlapping features: they become the dominant factor in the similarity measure.
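One way to write this down (a standard decomposition; the notation is ours, not the slide's): with zero-filling, the squared distance splits into an overlapping part and the squared magnitudes of the non-overlapping features.

```latex
% O: the overlapping features; N and N': features present only in x's / x''s domain.
% With zero-filling, every missing coordinate contributes its full squared magnitude:
\[
  d^2(x, x') = \sum_{j \in O} (x_j - x'_j)^2
             + \sum_{j \in N} x_j^2
             + \sum_{j \in N'} (x'_j)^2
\]
```

With thousands of non-overlapping features, the last two sums dominate the distance regardless of how similar the documents are on the shared features.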
34
Transfer between high dimensional overlapping distributions
High dimensionality can undermine important features.
36
The “blues” are closer to the “greens” than to the “reds”
37
LatentMap: a two-step correction
1. Missing value regression: bring the marginal distributions closer.
2. Latent space dimensionality reduction: further bring the marginal distributions closer; ignore unimportant, noisy, and "error-imported" features; identify transferable substructures across the two domains.
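A minimal sketch of these two steps (our own illustration, not the authors' implementation; the function name, shapes, and parameters are all assumptions, and for simplicity only the target side's missing block is filled, though the symmetric direction would be handled the same way):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.decomposition import TruncatedSVD

def latentmap_sketch(X_src, X_tgt, overlap_idx, missing_idx, k=50):
    """Two-step correction sketch: (1) fill the target-side missing features by
    regressing them on the overlapping features (fit on the source domain, where
    both blocks are observed); (2) reduce the combined matrix with a truncated SVD."""
    # Step 1: missing value regression (ridge is an arbitrary choice here)
    reg = Ridge(alpha=1.0).fit(X_src[:, overlap_idx], X_src[:, missing_idx])
    X_tgt_filled = X_tgt.copy()
    X_tgt_filled[:, missing_idx] = reg.predict(X_tgt[:, overlap_idx])

    # Step 2: latent space dimensionality reduction over both domains jointly
    X_all = np.vstack([X_src, X_tgt_filled])
    svd = TruncatedSVD(n_components=k, random_state=0)
    Z_all = svd.fit_transform(X_all)
    return Z_all[: len(X_src)], Z_all[len(X_src):]

# Illustrative shapes only: 300 source and 200 target documents over 1,000 features,
# of which the first 600 overlap and the last 400 are unobserved on the target side.
rng = np.random.default_rng(0)
X_src = rng.random((300, 1000))
X_tgt = rng.random((200, 1000))
X_tgt[:, 600:] = 0.0
Z_src, Z_tgt = latentmap_sketch(X_src, X_tgt,
                                overlap_idx=np.arange(600),
                                missing_idx=np.arange(600, 1000), k=50)
```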
38
Missing Value Regression: predict the missing values (recall the previous example).
40
1. Project onto the overlapping feature z.
41
2. Map from z to x: the relationship is found by regression.
43
D{img(A'), B} = 0.0109 < D{img(A'), C} = 0.0125
44
A is now correctly classified as belonging to the same class as B.
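A brief sketch of the regression fill on the toy rows (our own fit; the slide's distances of 0.0109 and 0.0125 come from the authors' regression, which is not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy rows (columns x, y, z): x is observed for B and C but missing for A,
# while z is the overlapping feature observed for all three.
z_obs = np.array([[0.1], [0.3]])            # z values of B and C
x_obs = np.array([0.09, 0.01])              # x values of B and C

reg = LinearRegression().fit(z_obs, x_obs)  # learn the z -> x relationship
x_A = reg.predict(np.array([[0.2]]))[0]     # fill A's missing x from its z = 0.2
print(x_A)  # 0.05 with these two points; distances are then taken in the filled space
```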
45
Dimensionality Reduction
46
(Diagram, built up over several slides: the word vector matrix from both domains, with overlapping features and missing-value blocks; the missing values are then filled.)
50
Dimensionality Reduction: project the word vector matrix to the most important and inherent sub-space.
52
The projection yields a low dimensional representation.
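A short sketch of this projection (our own illustration; the matrix below is random and the number of retained dimensions is arbitrary, since the slides do not give these details):

```python
import numpy as np

# X_filled: documents x terms word vector matrix, with missing values already filled.
rng = np.random.default_rng(0)
X_filled = rng.random((200, 4000))

U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
k = 50                        # number of latent dimensions kept (illustrative)
Z = U[:, :k] * s[:k]          # low dimensional representation of the documents
print(Z.shape)                # (200, 50)
```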
53
Solution (high dimensionality): recall the previous example.
55
The blues are closer to the greens than to the reds.
57
After the solution is applied, the blues are closer to the reds than to the greens.
58
Properties
It brings the marginal distributions of the two domains closer.
- Marginal distributions are brought closer in the high-dimensional space (Section 3.2).
- The distance between the two marginal distributions is further reduced in the low-dimensional space (Theorem 3.2).
It brings the two domains' conditional distributions closer.
- Nearby instances from the two domains have similar conditional distributions (Section 3.3).
It can reduce the domain transfer risk.
- The risk of the nearest neighbor classifier can be bounded in transfer learning settings (Theorem 3.3).
59
Experiment (I): Data Sets
20 Newsgroups: 20,000 newsgroup articles.
SRAA (simulated/real auto/aviation): 73,128 articles from 4 discussion groups (simulated auto racing, simulated aviation, real autos, and real aviation).
Reuters: the Reuters-21578 collection of news articles (1987).
60
First fill in the "gap" (the missing values), then use a kNN classifier for classification. Tasks are built from the category hierarchy: in 20 Newsgroups, for example, the top-level classes comp and rec define the labels, while different sub-categories (comp.sys, comp.graphics, rec.sport, rec.auto) supply the out-domain and in-domain data.
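A sketch of this fill-then-classify step (our own code with random stand-in data; the value of k and the latent representations are assumptions, not the authors' settings):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-ins for the reduced representations produced by the fill-and-reduce steps:
# labeled out-domain documents and unlabeled in-domain documents in the latent space.
Z_out = rng.standard_normal((300, 50))
y_out = rng.integers(0, 2, size=300)        # two classes, e.g. comp vs. rec
Z_in = rng.standard_normal((100, 50))

knn = KNeighborsClassifier(n_neighbors=1)   # k is illustrative; the slides do not give it
knn.fit(Z_out, y_out)
pred_in = knn.predict(Z_in)
print(pred_in[:10])
```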
61
Baseline methods: naïve Bayes, logistic regression, SVMs.
Knn-Reg: missing values filled, but no SVD.
pLatentMap: SVD, but missing values left as 0.
62
These comparisons try to justify the two steps in our framework.
63
Learning Tasks
64
Experiment (II): overall performance of 10 wins and 1 loss.
65
Experiment (III)
knnReg (missing values filled, but no SVD): compared with knnReg, 8 wins, 3 losses.
pLatentMap (SVD, but missing values not filled): compared with pLatentMap, 8 wins, 3 losses.
66
Conclusion
Problem: high dimensional overlapping domain transfer (text and image categorization).
Step 1: fill in the missing values, bringing the two domains' marginal distributions closer.
Step 2: SVD dimension reduction, which further brings the two marginal distributions closer (Theorem 3.2) and clusters points from the two domains, making the conditional distribution transferable (Theorem 3.3).