
1 Transfer Learning with Applications to Text Classification Jing Peng Computer Science Department

2 Machine learning: the study of algorithms that ① improve performance P ② on some task T ③ using experience E. A well-defined learning task:

3 Learning to recognize targets in images:

4 Learning to classify text documents:

5 Learning to build forecasting models:

6 Growth of Machine Learning. Machine learning is the preferred approach to ① speech processing ② computer vision ③ medical diagnosis ④ robot control ⑤ news article processing ⑥ … This machine learning niche is growing thanks to ① improved machine learning algorithms ② lots of available data ③ software too complex to code by hand ④ …

7 Learning. Given training data and least squares methods, learning focuses on minimizing the error, which decomposes into an approximation error (determined by the hypothesis space H) and an estimation error.
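The decomposition referred to on this slide is standard; the slide's exact formulas are not recoverable from the transcript, so the risk notation R and the predictors f-hat, f* below are my own reconstruction:

    % Excess risk of the learned hypothesis \hat{f} over the best possible risk,
    % split into what the hypothesis class H can express (approximation error)
    % and what the finite sample costs us (estimation error).
    \[
    R(\hat{f}) - R(f^{*})
      = \underbrace{\inf_{h \in H} R(h) - R(f^{*})}_{\text{approximation error (fixed by } H\text{)}}
      + \underbrace{R(\hat{f}) - \inf_{h \in H} R(h)}_{\text{estimation error (due to finite data)}}
    \]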

8 Main challenges: 1. transfer learning; 2. high dimensionality (more than 4000 features); 3. overlapping feature sets (fewer than 80% of the features are shared); 4. a solution with performance bounds. Transfer Learning with Applications to Text Classification

9 Standard Supervised Learning: the training (labeled) and test (unlabeled) documents both come from the New York Times, and the classifier reaches 85.5%.

10 In reality, labeled New York Times data is not available: the classifier is trained on labeled Reuters documents and tested on unlabeled New York Times documents, and accuracy drops to 64.1%.

11 Domain Difference -> Performance Drop

                        train     test              classifier
    ideal setting       NYT       New York Times    85.5%
    realistic setting   Reuters   New York Times    64.1%
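A minimal sketch of the comparison behind these numbers. The corpora variables (nyt_docs, nyt_labels, reuters_docs, reuters_labels) are placeholders, and the TF-IDF plus logistic regression pipeline is my own choice; the slides do not name the classifier:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    def accuracy(train_docs, train_labels, test_docs, test_labels):
        # TF-IDF features + logistic regression (illustrative classifier choice)
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(train_docs, train_labels)
        return clf.score(test_docs, test_labels)

    # Ideal setting: labeled training data and test data both from the New York Times.
    nyt_train, nyt_test, y_train, y_test = train_test_split(
        nyt_docs, nyt_labels, test_size=0.3, random_state=0)
    print("in-domain accuracy:", accuracy(nyt_train, y_train, nyt_test, y_test))

    # Realistic setting: labeled data only from Reuters, test on the New York Times.
    print("cross-domain accuracy:",
          accuracy(reuters_docs, reuters_labels, nyt_test, y_test))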

12 High Dimensional Data Transfer. High dimensional data: text categorization, image classification; the number of features in our experiments is more than 4000. Challenges: high dimensionality (more features than training examples), so Euclidean distance becomes meaningless.

13 Why Dimension Reduction? (figure: maximum and minimum pairwise distances, D_MAX and D_MIN)

14-15 Curse of Dimensionality (plot; x-axis: dimensions)
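The point of the plot: as the number of dimensions grows with a fixed number of points, the gap between the farthest and the nearest neighbor shrinks relative to the nearest-neighbor distance, so Euclidean distance loses its discriminative power. A small simulation of this effect (my own illustration, not taken from the slides; uniform random data is an assumption):

    # Distance concentration: D_MAX / D_MIN approaches 1 as dimensionality grows.
    import numpy as np

    rng = np.random.default_rng(0)
    n_points = 200
    for dim in (2, 10, 100, 1000, 4000):
        X = rng.uniform(size=(n_points, dim))
        q = rng.uniform(size=dim)                  # a query point
        d = np.linalg.norm(X - q, axis=1)          # Euclidean distances to all points
        print(f"dim={dim:5d}  D_MAX/D_MIN = {d.max() / d.min():.3f}")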

16 High Dimensional Data Transfer. High dimensional data: text categorization, image classification; the number of features in our experiments is more than 4000. Challenges: high dimensionality (more features than training examples), so Euclidean distance becomes meaningless. Are the feature sets completely overlapping? No: fewer than 80% of the features are shared. Are the marginal distributions not closely related? Then transferable structures are harder to find, and a proper similarity definition is needed.

17 PAC (Probably Approximately Correct) learning requirement: the training and test distributions must be the same.

18 Transfer between high dimensional overlapping distributions. Overlapping distributions: data from the two domains may not come from the same part of the space; at best, the spaces overlap.

19-22 Transfer between high dimensional overlapping distributions. Data from the two domains may not lie in exactly the same space, but at most in an overlapping one. A toy example ('?' marks a feature that is not observed for that instance):

            x       y       z       label
        A   ?       1       0.2     +1
        B   0.09    ?       0.1     +1
        C   0.01    ?       0.3

23-26 Transfer between high dimensional overlapping distributions. Problems with overlapping distributions: the overlapping features alone may not provide sufficient predictive power, so it is hard to predict correctly.

            f1      f2      f3      label
        A   ?       1       0.2     +1
        B   0.09    ?       0.1     +1
        C   0.01    ?       0.3

27-29 Transfer between high dimensional overlapping distributions. Overlapping distributions: use the union of all features and fill in the missing values with zeros? Does it help?

            f1      f2      f3      label
        A   0       1       0.2     +1
        B   0.09    0       0.1     +1
        C   0.01    0       0.3

30-32 Transfer between high dimensional overlapping distributions. D²{A, B} = 0.0181 > D²{A, C} = 0.0101, so A is misclassified into the class of C instead of the class of B.

33 Transfer between high dimensional overlapping distributions. When one uses the union of overlapping and non-overlapping features and replaces the missing values with zeros, the distance between the two marginal distributions p(x) can become asymptotically very large as a function of the non-overlapping features: they become the dominant factor in the similarity measure.
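A small numeric check of this effect on the toy rows A, B, C above. The computation here uses all three zero-filled coordinates and is my own illustration; it need not reproduce the slide's figures, which may be defined over a different feature subset:

    # Zero-fill the missing coordinates of A, B, C and compare squared distances.
    import numpy as np

    A = np.array([0.00, 1.00, 0.2])   # the first coordinate was missing
    B = np.array([0.09, 0.00, 0.1])   # the second coordinate was missing
    C = np.array([0.01, 0.00, 0.3])   # the second coordinate was missing

    d2_AB = np.sum((A - B) ** 2)
    d2_AC = np.sum((A - C) ** 2)
    # The zero-filled non-overlapping coordinate contributes 1.0 to both distances,
    # swamping the contribution of the shared features.
    print(d2_AB, d2_AC)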

34 Transfer between high dimensional overlapping distributions. High dimensionality can undermine important features.

35-36 (figure) The "blues" are closer to the "greens" than to the "reds".

37 LatentMap: a two-step correction. Step 1, missing value regression: bring the marginal distributions closer. Step 2, latent-space dimensionality reduction: further bring the marginal distributions closer, ignore unimportant, noisy, and "error-imported" features, and identify transferable substructures across the two domains.

38-44 Missing Value Regression: predict the missing values (recall the previous example). 1. Project onto the overlapped feature z. 2. Map from z back to x, using the relationship found by regression. With the imputed value, D{img(A'), B} = 0.0109 < D{img(A'), C} = 0.0125, so A is correctly classified into the same class as B.
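A minimal sketch of this first step as I read it from these slides, assuming two document matrices that share a block of overlapping columns; the ridge regressor and the variable names are my own choices, not the paper's:

    # Step 1 (missing value regression): learn a map from the overlapping features
    # to the features observed only in one domain, then use it to fill those
    # features in for the other domain.
    from sklearn.linear_model import Ridge

    def fill_missing(X_src_overlap, X_src_only, X_tgt_overlap):
        """Predict the source-only features for target-domain documents.

        X_src_overlap : (n_src, d_overlap)  overlapping features, source domain
        X_src_only    : (n_src, d_extra)    features observed only in the source
        X_tgt_overlap : (n_tgt, d_overlap)  overlapping features, target domain
        """
        reg = Ridge(alpha=1.0)
        reg.fit(X_src_overlap, X_src_only)   # relationship found by regression
        return reg.predict(X_tgt_overlap)    # imputed values for the target domain

Run in the other direction, the same construction fills in the target-only features for source documents, so both domains end up represented on the union of features.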

45-49 Dimensionality Reduction (diagram): the word-vector matrix is assembled from the overlapping features together with the missing values, which have now been filled in.

50-52 Dimensionality Reduction. Project the word-vector matrix onto its most important, inherent sub-space to obtain a low-dimensional representation.
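A minimal sketch of this second step, assuming the filled word-vector matrix from both domains has been stacked into a single array X_filled; the use of a truncated SVD follows the slides, while the rank k and the function name are illustrative:

    # Step 2 (dimensionality reduction): project the filled word-vector matrix onto
    # its top-k singular directions to obtain a low-dimensional representation.
    import numpy as np

    def latent_projection(X_filled, k=100):
        # economy-size SVD of the (documents x words) matrix
        U, S, Vt = np.linalg.svd(X_filled, full_matrices=False)
        return U[:, :k] * S[:k]    # low-dimensional document representation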

53-57 Solution (high dimensionality), recalling the previous example: in the original high-dimensional space the "blues" are closer to the "greens" than to the "reds"; after the correction, the "blues" are closer to the "reds" than to the "greens".

58 Properties
It brings the marginal distributions of the two domains closer:
- the marginal distributions are brought closer in the high-dimensional space (Section 3.2)
- the gap between the two marginal distributions is further reduced in the low-dimensional space (Theorem 3.2)
It brings the conditional distributions of the two domains closer:
- nearby instances from the two domains have similar conditional distributions (Section 3.3)
It can reduce the domain-transfer risk:
- the risk of the nearest neighbor classifier can be bounded in the transfer learning setting (Theorem 3.3)

59-62 Experiment (I). Data sets:
- 20 Newsgroups: 20,000 newsgroup articles
- SRAA (simulated real auto aviation): 73,128 articles from 4 discussion groups (simulated auto racing, simulated aviation, real autos, and real aviation)
- Reuters: 21,758 Reuters news articles (1987)
Protocol: first fill in the "gap", then use a kNN classifier to do the classification (a sketch follows below). In 20 Newsgroups, in-domain and out-of-domain data come from different sub-categories of the same top-level groups (e.g. comp.sys vs. comp.graphics under comp, rec.sport vs. rec.auto under rec).
Baseline methods: naive Bayes, logistic regression, SVMs; Knn-Reg (missing values filled, but without SVD) and pLatentMap (SVD, but missing values left as 0). The last two are meant to justify the two steps in our framework.
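A sketch of how this evaluation could be run once both steps have produced a shared low-dimensional representation. Z_src, Z_tgt and the label arrays are placeholders; the slides name kNN but not its parameters:

    # Train a nearest-neighbour classifier on the (labeled) source-domain rows of
    # the latent space and evaluate it on the target-domain rows.
    from sklearn.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(Z_src, y_src)                              # labeled source domain
    print("target accuracy:", knn.score(Z_tgt, y_tgt)) # held-out target labels, used only for scoring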

63 Learning Tasks

64 Experiment (II). Overall performance: 10 wins, 1 loss.

65 Experiment (III). knnReg (missing values filled, but without SVD): compared with knnReg, 8 wins, 3 losses. pLatentMap (SVD, but without filling missing values): compared with pLatentMap, 8 wins, 3 losses.

66 Conclusion. Problem: high dimensional overlapping domain transfer (text and image categorization). Step 1: fill in the missing values, which brings the two domains' marginal distributions closer. Step 2: SVD dimension reduction, which further brings the two marginal distributions closer (Theorem 3.2) and clusters points from the two domains, making the conditional distribution transferable (Theorem 3.3).

67

