Modern Topics in Multivariate Methods for Data Analysis
Outline: Semi-Supervised Learning, Transfer Learning, Active Learning, Summary
Semi-Supervised Learning. This is an extension of supervised learning. We have two sets of data: labeled and unlabeled. Motivation: labeled data is sometimes hard to obtain. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.
An Example from Mars Data Analysis. Digital Elevation Map: Martian landscape. Geomorphic Map: manually drawn geomorphic map of this landscape. The geomorphic map shows landforms chosen and defined by a domain expert.
Segmentation
Segmentation: Results. Segments that are homogeneous in slope, curvature, and flood are displayed on an elevation background.
Classification: Labeling. A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. Figure: labeled segments.
How do we approach semi-supervised learning? Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.
A Case with No Unlabeled Data. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.
A Case with Unlabeled Data. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.
Graph-Based Models. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.
Assumptions in Semi-Supervised Learning. How can we learn from unlabeled data at all? The answer lies in the set of assumptions made about the distribution of the unlabeled data. If the assumptions are right, using unlabeled data can yield an advantage. If the assumptions are incorrect, performance can decrease.
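The role of these assumptions can be illustrated with a small graph-based sketch. It is not the method from the cited tutorial: it assumes that points lying close together tend to share a label, and the function name `propagate_labels`, the 1-D toy data, and the `radius` parameter are all hypothetical choices made for illustration.

```python
def propagate_labels(points, labels, radius=1.5, n_iter=50):
    """Spread known labels over a neighborhood graph (illustrative sketch).

    points: list of floats (1-D toy data).
    labels: +1 / -1 for labeled points, None for unlabeled points.
    Returns a +1 / -1 prediction for every point.
    """
    n = len(points)
    # Build the graph: connect points that are at most `radius` apart.
    neighbors = [
        [j for j in range(n) if j != i and abs(points[i] - points[j]) <= radius]
        for i in range(n)
    ]
    # Soft scores: labeled points keep their label, unlabeled start at 0.
    scores = [float(y) if y is not None else 0.0 for y in labels]
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                new_scores.append(float(labels[i]))  # known labels stay fixed
            elif neighbors[i]:
                # An unlabeled point takes the average score of its neighbors.
                avg = sum(scores[j] for j in neighbors[i]) / len(neighbors[i])
                new_scores.append(avg)
            else:
                new_scores.append(scores[i])
        scores = new_scores
    return [1 if s >= 0 else -1 for s in scores]


# Two well-separated clusters with a single labeled point in each.
points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
labels = [1, None, None, None, None, None, None, -1]
predictions = propagate_labels(points, labels)
```

Here the clusters coincide with the classes, so the two labels spread correctly to all eight points. If the classes cut across the clusters instead, the same procedure would propagate wrong labels, which is the performance decrease described above.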
Transfer Learning. The goal is to transfer knowledge gathered from previous experience. Also called inductive transfer or learning to learn. Example: invariant transformations across tasks.
Motivation for Transfer Learning. Once a predictive model is built, there are reasons to believe the model will cease to be valid at some point in time. The difference from the traditional setting is that the source and target domains can now be completely different.
Traditional Approach to Classification. Diagram labels: DB1, DB2, DBn, Learning System.
Transfer Learning. Diagram labels: DB1, DB2, DB new, Learning System, Knowledge, Source domain, Target domain.
Transfer Learning Scenarios: 1. Labeling in a new domain is costly. DB1 (labeled): classification of patients G1. DB2 (unlabeled): classification of patients G2.
Transfer Learning Scenarios: 2. Data is outdated. A model was created with one survey, but a new survey is now available. Diagram labels: Survey 1, Learning System, Survey 2, ?
Functional Transfer: Multitask Learning. Diagram labels: input nodes, internal nodes, output nodes (Left, Straight, Right).
Train in Parallel with Combined Architecture. Figure obtained from Brazdil et al., Metalearning: Applications to Data Mining, Chapter 7, Springer, 2009.
Knowledge of Parameters. Assume a prior distribution over the parameters. Source domain: learn the parameters and adjust the prior distribution. Target domain: learn the parameters using the source prior distribution.
Assume Parameter Similarity. P(y|x) = P(x|y) P(y) / P(x). Task A has Parameter A; Task B has Parameter B ~ A. Assume a hyper-distribution with low variance.
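As a minimal sketch of learning target parameters under a source prior, consider the simplest conjugate case. The assumptions here are not from the slides: the parameter is the mean of a Gaussian with known noise variance, the prior is a Gaussian centered on the source estimate, and the name `posterior_mean` and all numbers are hypothetical.

```python
def posterior_mean(target_samples, prior_mean, prior_var, noise_var=1.0):
    """Posterior mean of a Gaussian mean under a Gaussian prior.

    prior_mean and prior_var describe the prior learned on the source task;
    a small prior_var encodes the belief that the tasks are very similar.
    """
    n = len(target_samples)
    sample_mean = sum(target_samples) / n
    # Precisions (inverse variances) of prior and data add up.
    precision = 1.0 / prior_var + n / noise_var
    return (prior_mean / prior_var + n * sample_mean / noise_var) / precision


# Source task: enough data to estimate the parameter well.
source_samples = [4.8, 5.1, 5.0, 4.9, 5.2]
source_mean = sum(source_samples) / len(source_samples)

# Target task: only two observations.
target_samples = [5.6, 5.4]

# Low-variance prior: the target estimate stays close to the source value.
tight = posterior_mean(target_samples, source_mean, prior_var=0.1)
# High-variance prior: the target data dominate.
loose = posterior_mean(target_samples, source_mean, prior_var=100.0)
```

With the low-variance prior the estimate is pulled strongly toward the source parameter; with the high-variance prior it ends up near the target sample mean, so little is transferred.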
Knowledge of Parameters. Find the coefficients w_S using SVMs. Find the coefficients w_T using SVMs, initializing the search with w_S.
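The warm-start idea can be sketched with a small linear SVM trained by subgradient descent on the regularized hinge loss. This is an illustrative stand-in, not the solver used in the slides: there is no bias term, and the function names, toy data, and hyperparameters are hypothetical.

```python
def hinge_objective(w, data, lam):
    """lam/2 * ||w||^2 + average hinge loss max(0, 1 - y * w.x)."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    loss = sum(
        max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))
        for x, y in data
    ) / len(data)
    return reg + loss


def train_linear_svm(data, w_init, lam=0.01, step=0.1, n_iter=100):
    """Full-batch subgradient descent that returns the best iterate seen."""
    w = list(w_init)
    best_w, best_obj = list(w), hinge_objective(w, data, lam)
    for _ in range(n_iter):
        grad = [lam * wi for wi in w]          # gradient of the regularizer
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1.0:                   # point violates the margin
                for k, xk in enumerate(x):
                    grad[k] -= y * xk / len(data)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
        obj = hinge_objective(w, data, lam)
        if obj < best_obj:
            best_w, best_obj = list(w), obj
    return best_w


# Hypothetical source task and a related target task with few examples.
source = [((2.0, 1.0), 1), ((1.5, 2.0), 1), ((-2.0, -1.0), -1), ((-1.0, -2.5), -1)]
target = [((2.5, 0.5), 1), ((-1.5, -0.5), -1)]

w_s = train_linear_svm(source, w_init=[0.0, 0.0])  # source coefficients
w_t = train_linear_svm(target, w_init=w_s)         # search starts from w_s
```

Because the function keeps the best iterate, the target objective at `w_t` is never worse than at the starting point `w_s`; the transfer consists only in where the search begins.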
Feature Transfer. The source domain and target domain share a representation across tasks. Minimize Loss-Function(y, f(x)). The minimization is done over multiple tasks (multiple regions on Mars).
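One way to write this multi-task minimization, under the assumption that each task t has its own predictor built on top of a representation shared by all tasks. The symbols \phi (shared representation), g_t (task-specific part), T (number of tasks), and n_t (number of examples in task t) are notation introduced here, not taken from the slides:

```latex
\min_{\phi,\; g_1,\dots,g_T} \;\;
\sum_{t=1}^{T} \frac{1}{n_t} \sum_{i=1}^{n_t}
L\bigl(y_{t,i},\, f_t(x_{t,i})\bigr),
\qquad f_t(x) = g_t\bigl(\phi(x)\bigr)
```

Since \phi appears in every term of the outer sum, it is fitted on the data of all tasks at once, while each g_t sees only its own task.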
Feature Transfer. Identify the features common to all tasks.
Instance Transfer. Samples from the source domain are filtered and added to the target domain, giving the learning system a larger target dataset. One algorithm of this kind is called TrAdaBoost.
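The filtering of source samples can be sketched as a single weight update in the style of TrAdaBoost. This is a simplified illustration under stated assumptions, not a full implementation: the base learner is omitted, only one boosting round is shown, and the function name and inputs are hypothetical.

```python
import math


def tradaboost_reweight(src_w, src_wrong, tgt_w, tgt_wrong, n_rounds):
    """One instance-weight update in the style of TrAdaBoost (simplified).

    src_w, tgt_w: current weights of source and target instances.
    src_wrong, tgt_wrong: 1 if the current hypothesis misclassified the
    instance, 0 otherwise.
    n_rounds: total number of boosting rounds planned.
    """
    # The weighted error is measured on the target instances only.
    eps = sum(w * e for w, e in zip(tgt_w, tgt_wrong)) / sum(tgt_w)
    if not 0.0 < eps < 0.5:
        raise ValueError("target error must lie strictly between 0 and 0.5")
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(src_w)) / n_rounds))
    # Misclassified source instances lose weight: they look unlike the target.
    new_src = [w * beta ** e for w, e in zip(src_w, src_wrong)]
    # Misclassified target instances gain weight, as in ordinary boosting.
    new_tgt = [w * beta_t ** (-e) for w, e in zip(tgt_w, tgt_wrong)]
    return new_src, new_tgt


# Hypothetical round: four source and four target instances, equal weights.
src_w = [1.0, 1.0, 1.0, 1.0]
src_wrong = [0, 1, 0, 1]
tgt_w = [1.0, 1.0, 1.0, 1.0]
tgt_wrong = [0, 0, 0, 1]
new_src, new_tgt = tradaboost_reweight(src_w, src_wrong, tgt_w, tgt_wrong, n_rounds=10)
```

Over many rounds, source samples that keep disagreeing with the target data fade out, which acts as the filter, while the remaining source samples effectively enlarge the target dataset.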
Active Learning. Active learning is part of the field of supervised learning. We have labeled and unlabeled data. The novel idea is that we can choose which examples to label during learning. It is also called “Query Learning”. Diagram labels: Labeled Data, Unlabeled Data, Select examples.
Types of Active Learning: 1. Query Synthesis. The learner can request an example from anywhere in the instance space. It is only appropriate with small finite domains. Some examples may have no meaning.
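A minimal sketch of query synthesis on a small finite domain: the learner constructs its own query points instead of drawing them from data. The setting is an assumption chosen for illustration (integers labeled False below a hidden threshold and True at or above it), and `learn_threshold` and `oracle` are hypothetical names.

```python
def learn_threshold(oracle, low, high):
    """Find the smallest integer in [low, high] that the oracle labels True.

    Assumes labels are False below some threshold and True at or above it,
    and that oracle(high) is True.
    Returns (threshold, number_of_queries).
    """
    queries = 0
    while low < high:
        mid = (low + high) // 2   # the learner synthesizes this query itself
        queries += 1
        if oracle(mid):
            high = mid            # threshold is at mid or below
        else:
            low = mid + 1         # threshold is above mid
    return low, queries


def oracle(x):
    """Hypothetical labeler hiding the target concept x >= 37."""
    return x >= 37


threshold, n_queries = learn_threshold(oracle, 0, 100)
```

Every integer in this domain is a meaningful query. In domains such as images or text, a synthesized point may correspond to nothing a human labeler can interpret, which is the limitation noted above.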
Types of Active Learning: 2. Stream-Based Selective Sampling. Instances are drawn one at a time from the input space according to a distribution, and the learner decides for each instance whether to query its label or discard it. For example, the learner can choose only examples that fall in regions of uncertainty.
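The keep-or-discard decision can be sketched as follows. This is a simplified illustration: the model is a fixed, hypothetical logistic function that is not retrained after each query, and the uncertainty band `[low, high]` is an arbitrary choice.

```python
import math


def selective_sampling(stream, predict_proba, oracle, low=0.3, high=0.7):
    """Query a label only when the model is uncertain about the instance."""
    queried = []
    for x in stream:
        p = predict_proba(x)      # model's estimate of P(y = 1 | x)
        if low <= p <= high:      # instance lies in the region of uncertainty
            queried.append((x, oracle(x)))
        # Otherwise the instance is discarded without asking for a label.
    return queried


def predict_proba(x):
    """Hypothetical current model: logistic function of a 1-D input."""
    return 1.0 / (1.0 + math.exp(-x))


def oracle(x):
    """Hypothetical labeler: the true class is 1 for positive inputs."""
    return 1 if x > 0 else 0


stream = [-3.0, -0.5, 0.2, 2.5, 0.6]
queried = selective_sampling(stream, predict_proba, oracle)
```

Only the three instances near the decision boundary are sent to the labeler; the two instances the model is already confident about cost nothing.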
Types of Active Learning: 3. Pool-Based Sampling. Assume a small set of labeled examples and a large set of unlabeled examples. Here we evaluate and rank the whole set of unlabeled examples; we then choose one or more examples.
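The evaluate-rank-choose step can be sketched as below. The ranking criterion (distance of the predicted probability from 0.5 for a binary model) is one possible choice, and `select_queries`, `predict_proba`, and the toy pool are hypothetical.

```python
import math


def select_queries(pool, predict_proba, k=2):
    """Rank the whole unlabeled pool and return the k most uncertain items."""
    # For a binary model, uncertainty is highest when P(y = 1 | x) is near 0.5.
    ranked = sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    return ranked[:k]


def predict_proba(x):
    """Hypothetical current model: logistic function of a 1-D input."""
    return 1.0 / (1.0 + math.exp(-x))


pool = [-4.0, -1.0, 0.1, 0.7, 3.0]
to_label = select_queries(pool, predict_proba, k=2)
```

In a full loop, the chosen examples would be labeled, moved to the labeled set, and the model retrained before the pool is ranked again.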
Sampling Based on Uncertainty. Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool. Figure labels: % accuracy, 90% accuracy.
Uncertainty: Sampling Based on Uncertainty
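Three measures commonly used to score uncertainty from a model's predicted class probabilities are least confidence, margin, and entropy. The sketch below is an assumption about what could be used here, not a reconstruction of the slide's own formula; the function names and example probabilities are hypothetical.

```python
import math


def least_confidence(probs):
    """1 minus the probability of the most likely class (higher = less sure)."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes (lower = less sure)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def entropy(probs):
    """Shannon entropy of the predicted distribution (higher = less sure)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


confident = [0.9, 0.05, 0.05]   # the model strongly prefers one class
uncertain = [0.4, 0.35, 0.25]   # the model is torn between classes
```

All three measures rank `uncertain` as the better example to label; they can disagree on less extreme cases because margin looks only at the top two classes while entropy uses the whole distribution.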
Summary
Semi-Supervised Learning: few labeled examples, labeling is expensive, many unlabeled examples.
Transfer Learning: similar classification tasks, but there is indication that the distributions have changed.
Active Learning: few training examples, labeling is expensive.