Modern Topics in Multivariate Methods for Data Analysis.

Presentation transcript:

Outline: Semi-Supervised Learning, Transfer Learning, Active Learning, Summary.

Semi-Supervised Learning. This is an extension to supervised learning. We have two sets of data: labeled and unlabeled. Motivation: labeled data is sometimes hard to obtain. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.

An Example from Mars Data Analysis. Digital elevation map: the Martian landscape. Geomorphic map: a manually drawn geomorphic map of this landscape. The geomorphic map shows landforms chosen and defined by a domain expert.

Segmentation

Segmentation: Results. Segments homogeneous in slope, curvature, and flood, displayed on an elevation background.

Classification: Labeling. A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. Figure: labeled segments.

How do we approach semi-supervised learning? Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.

A Case with No Unlabeled Data. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.

A Case with Unlabeled Data. Three successive figures obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.

Graph-Based Models. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.
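
The slide only shows a figure, so the following is a minimal sketch of one common graph-based approach (label propagation with clamped labels), not necessarily the method in the tutorial figure. The function name, the 1-D toy data, and the Gaussian similarity with `sigma=1.0` are all assumptions made for illustration.

```python
import math

def label_propagation(points, labels, sigma=1.0, n_iter=200):
    """Propagate labels over a fully connected similarity graph.

    points: list of floats (1-D features, for illustration only)
    labels: list with 0 or 1 for labeled points, None for unlabeled points
    Returns one score per point in [0, 1]; above 0.5 means class 1.
    """
    n = len(points)
    # Edge weights: Gaussian similarity between every pair of distinct points.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = (points[i] - points[j]) ** 2
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Unlabeled points start at 0.5 (no preference for either class).
    f = [0.5 if y is None else float(y) for y in labels]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            if labels[i] is not None:
                new_f.append(float(labels[i]))  # labeled points stay clamped
            else:
                # Weighted average of the neighbors' current scores.
                total = sum(w[i])
                new_f.append(sum(w[i][j] * f[j] for j in range(n)) / total)
        f = new_f
    return f

# Two well-separated clusters with a single labeled point in each.
points = [0.0, 0.5, 1.0, 1.5, 5.0, 5.5, 6.0, 6.5]
labels = [0, None, None, None, None, None, None, 1]
scores = label_propagation(points, labels)
predicted = [int(s > 0.5) for s in scores]
print(predicted)
```

In this toy setting each unlabeled point takes the label of the labeled point in its own cluster, which only works because the assumption that nearby points share a label happens to hold here.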

Assumptions in Semi-Supervised Learning. How can we learn from unlabeled data at all? The answer lies in the set of assumptions made about the unlabeled data distribution. If the assumptions are right, an advantage can be obtained from using unlabeled data. If the assumptions are incorrect, a decrease in performance is possible.

Transfer Learning. The goal is to transfer knowledge gathered from previous experience. Also called Inductive Transfer or Learning to Learn. Example: invariant transformations across tasks.

Motivation for Transfer Learning. Once a predictive model is built, there are reasons to believe the model will cease to be valid at some point in time. The difference is that now source and target domains can be completely different.

Traditional Approach to Classification. Diagram: DB1, DB2, ..., DBn; Learning System.

Transfer Learning. Diagram: DB1, DB2 (source domain); DB new (target domain); Learning System. Knowledge passes from the source domain to the target domain.

Transfer Learning Scenarios: 1. Labeling in a new domain is costly. DB1 (labeled): classification of patients G1. DB2 (unlabeled): classification of patients G2.

Transfer Learning Scenarios: 2. Data is outdated. A model was created with one survey, but a new survey is now available. Diagram: Survey 1, Learning System, Survey 2 (?).

Functional Transfer: Multitask Learning. Network diagram: input nodes, internal nodes, output nodes (Left, Straight, Right).

Train in Parallel with Combined Architecture. Figure obtained from Brazdil et al., Metalearning: Applications to Data Mining, Chapter 7, Springer, 2009.

Knowledge of Parameters. Assume a prior distribution of parameters. Source domain: learn parameters and adjust the prior distribution. Target domain: learn parameters using the source prior distribution.

Assume Parameter Similarity. P(y|x) = P(x|y) P(y) / P(x). Task A → Parameter A. Task B → Parameter B ~ A. Assume a hyper-distribution with low variance.

Knowledge of Parameters. Find coefficients w_S using SVMs. Find coefficients w_T using SVMs, initializing the search with w_S.
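
The idea of starting the target search from the source coefficients can be sketched as follows. This is an illustrative sketch only: it stands in a plain subgradient-descent linear SVM for whatever SVM solver the slides had in mind, and the toy data, the constant bias feature, and the hyperparameters (`lam`, `lr`, `epochs`) are assumptions.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(data, w_init=None, lam=0.01, lr=0.1, epochs=50):
    """Subgradient descent on the L2-regularized hinge loss.

    data: list of (x, y) pairs, x a list of floats, y in {-1, +1}
    w_init: starting weights; pass the source solution to warm-start
    """
    dim = len(data[0][0])
    w = list(w_init) if w_init is not None else [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            violated = y * dot(w, x) < 1.0  # point is inside the margin
            for k in range(dim):
                grad = lam * w[k] - (y * x[k] if violated else 0.0)
                w[k] -= lr * grad
    return w

# The last feature is a constant 1.0 that plays the role of a bias term.
source = [([2.0, 0.5, 1.0], 1), ([1.5, -0.5, 1.0], 1),
          ([-2.0, 0.3, 1.0], -1), ([-1.5, -0.4, 1.0], -1)]
target = [([2.5, 1.0, 1.0], 1), ([-1.0, 0.5, 1.0], -1)]

w_s = train_linear_svm(source)                        # source coefficients
w_t = train_linear_svm(target, w_init=w_s, epochs=5)  # warm-started target fit
print(w_s, w_t)
```

With only two target examples and five epochs, the target fit stays close to the source solution, which is the point of initializing the search with w_S.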

Feature Transfer. The source domain and the target domain use a shared representation across tasks. Minimize Loss-Function(y, f(x)). The minimization is done over multiple tasks (multiple regions on Mars).

Feature Transfer. Identify features common to all tasks.

Instance Transfer Learning. Samples from the source domain are filtered and added to the target domain, giving the learning system a larger target dataset. A new program for this is called TrAdaBoost.
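
To make the filtering idea concrete, here is a hedged sketch of a single reweighting round in the style of TrAdaBoost as it is usually described: misclassified source instances are down-weighted, misclassified target instances are up-weighted. The function name, the clipping of the error rate, and the toy inputs are assumptions; the weak learner and the final ensemble are omitted.

```python
import math

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One round of TrAdaBoost-style instance reweighting.

    w_src, w_tgt: current weights of source and target instances
    err_src, err_tgt: 0/1 lists, 1 where the weak learner was wrong
    n_rounds: total number of boosting rounds planned
    """
    n = len(w_src)
    # Fixed factor below 1 that shrinks misclassified source instances.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Weighted error of the weak learner on the target instances only.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # assumption: clip to keep beta_tgt valid
    beta_tgt = eps / (1.0 - eps)
    new_src = [w * beta_src ** e for w, e in zip(w_src, err_src)]
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(w_tgt, err_tgt)]
    return new_src, new_tgt

new_src, new_tgt = tradaboost_reweight(
    w_src=[1.0, 1.0, 1.0, 1.0], w_tgt=[1.0, 1.0, 1.0, 1.0],
    err_src=[0, 1, 0, 1], err_tgt=[0, 0, 0, 1], n_rounds=10)
print(new_src, new_tgt)
```

Source instances that keep disagreeing with the target-driven learner lose influence over the rounds, which acts as the sample filter.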

Active Learning. Active learning is part of the field of supervised learning. We have labeled and unlabeled data. The novel idea is that we can choose which examples to label during learning. It is also called “Query Learning”. Diagram: Labeled Data, Unlabeled Data → Select examples.

Types of Active Learning: 1. Query Synthesis. The learner can request an example from anywhere in the instance space. It is only appropriate with small finite domains. Some examples may have no meaning.

Types of Active Learning: 2. Stream-Based Selective Sampling. Instances are drawn from the input space according to a distribution, and for each instance the learner decides whether to query its label or discard it. For example, one can choose only examples from regions of uncertainty.
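
A minimal sketch of that per-instance decision, assuming a binary model that outputs P(y=1|x) and a hand-picked uncertainty band around 0.5. The logistic model with its boundary at 0, the `threshold` value, and the function names are hypothetical.

```python
import math

def stream_selective_sampling(stream, predict_proba, threshold=0.2):
    """Return the instances the learner would ask to have labeled.

    An instance is queried when P(y=1|x) lies within `threshold` of 0.5,
    that is, when it falls in the current region of uncertainty.
    All other instances are discarded without a label request.
    """
    queried = []
    for x in stream:
        p = predict_proba(x)
        if abs(p - 0.5) < threshold:
            queried.append(x)
    return queried

def logistic(x):
    # Hypothetical current model: decision boundary at x = 0.
    return 1.0 / (1.0 + math.exp(-x))

stream = [-3.0, -0.4, 2.5, 0.1, 1.8, -0.2]
queried = stream_selective_sampling(stream, logistic)
print(queried)
```

Only the three instances close to the decision boundary are sent for labeling; the confident ones are skipped.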

Types of Active Learning: 3. Pool-Based Sampling. Assume a small set of labeled examples and a large set of unlabeled examples. Here we evaluate and rank the whole set of unlabeled examples; we then choose one or more examples.
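
The rank-then-choose loop can be sketched on a toy 1-D problem. Everything concrete here is an assumption for illustration: the threshold learner, the oracle with its boundary at 0.4, the pool values, and the choice to query one example per round.

```python
def fit_threshold(labeled):
    """Midpoint between the largest negative and smallest positive example."""
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return (largest_neg + smallest_pos) / 2.0

def pool_based_active_learning(labeled, pool, oracle, n_queries):
    """Repeatedly rank the pool by uncertainty and label the top example."""
    labeled, pool, queried = list(labeled), list(pool), []
    for _ in range(n_queries):
        threshold = fit_threshold(labeled)
        # Rank the whole pool: closer to the boundary means more uncertain.
        ranked = sorted(pool, key=lambda x: abs(x - threshold))
        x = ranked[0]
        pool.remove(x)
        labeled.append((x, oracle(x)))  # ask the oracle for the label
        queried.append(x)
    return fit_threshold(labeled), queried

def oracle(x):
    # Hypothetical labeler; the true boundary 0.4 is unknown to the learner.
    return 1 if x >= 0.4 else 0

labeled = [(0.0, 0), (1.0, 1)]
pool = [0.05, 0.12, 0.2, 0.31, 0.36, 0.44, 0.52, 0.6, 0.71, 0.83, 0.9]
threshold, queried = pool_based_active_learning(labeled, pool, oracle, 4)
print(threshold, queried)
```

Four label requests are enough here to bracket the boundary, whereas labeling the pool in its given order would spend most labels far from it.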

Sampling Based on Uncertainty. Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool. Figure labels: % accuracy, 90% accuracy.

Uncertainty: Sampling Based on Uncertainty
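
The transcript does not give the uncertainty measure used, so the following is only a sketch of three measures that are common in the active-learning literature: least confidence, margin, and entropy. The function names and the toy probability vectors are assumptions.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the most likely class (higher = more uncertain)."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes (lower = more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def entropy(probs):
    """Shannon entropy of the predicted distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical class-probability predictions for three unlabeled examples.
pool = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.50, 0.50, 0.00],
}
pick_lc = max(pool, key=lambda k: least_confidence(pool[k]))
pick_margin = min(pool, key=lambda k: margin(pool[k]))
pick_entropy = max(pool, key=lambda k: entropy(pool[k]))
print(pick_lc, pick_margin, pick_entropy)
```

The measures need not agree: in this example margin sampling prefers the example with two tied classes, while least confidence and entropy prefer the one whose probability mass is spread over all three classes.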

Summary. Few labeled examples, labeling is expensive, many unlabeled examples → Semi-Supervised Learning. Similar classification tasks, but there is indication that the distributions have changed → Transfer Learning. Few training examples, labeling is expensive → Active Learning.