Universal Learning over Related Distributions and Adaptive Graph Transduction Erheng Zhong †, Wei Fan ‡, Jing Peng*, Olivier Verscheure ‡, and Jiangtao Ren †

Presentation transcript:

Universal Learning over Related Distributions and Adaptive Graph Transduction
Erheng Zhong †, Wei Fan ‡, Jing Peng*, Olivier Verscheure ‡, and Jiangtao Ren †
† Sun Yat-Sen University   ‡ IBM T. J. Watson Research Center   * Montclair State University
1. Go beyond transfer learning to sample selection bias and uncertainty mining
2. Unified framework
3. One single solution: the supervised case

Standard Supervised Learning
Training (labeled) and test (unlabeled) articles both come from the New York Times; a classifier trained on the labeled set reaches 85.5% accuracy on the test set.

Sample Selection Bias
Training (labeled) and test (unlabeled) articles again both come from the New York Times, but they have different word-vector distributions (August: a lot about the typhoon in Taiwan; September: a lot about the US Open). Accuracy drops from 85.5% to 78.5%.

Uncertainty Data Mining
Training data:
– Both feature vectors and class labels contain noise (usually Gaussian)
– Common for data collected from sensor networks
Test data:
– Feature vectors contain noise

Summary
Traditional supervised learning:
– Training and test data follow the identical distribution
Transfer learning:
– Training and test data come from different domains
Sample selection bias:
– Same domain, but the distributions differ (e.g., data missing not at random)
Uncertain data mining:
– Data contain noise
In other words, in all three cases the training and test data come from different distributions. Traditionally, each problem is handled separately.

Main Challenge
Could one solve these different but similar problems under a unified framework, with the same solution?
Universal Learning

A is the collection of subsets of X that are the support of some hypothesis in a fixed hypothesis space ([Blitzer et al, 2008]).
The distance between two distributions ([Blitzer et al, 2008]):
d_A(D, D') = 2 sup_{A ∈ A} | Pr_D[A] − Pr_{D'}[A] |
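One common way to turn this definition into a number (the "proxy A-distance" heuristic of Ben-David et al., not something these slides spell out) is to train a classifier to separate training from test examples: if the two samples are hard to distinguish, the distance is close to zero. A minimal, purely illustrative Python sketch, assuming scikit-learn:

# Hypothetical sketch (not from the slides): estimate d_A(D_train, D_test)
# with a domain classifier, following the proxy A-distance heuristic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_a_distance(X_train, X_test):
    X = np.vstack([X_train, X_test])
    # Domain labels: 0 = training sample, 1 = test sample.
    d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    # How well can a classifier tell the two samples apart?
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, d, cv=5).mean()
    err = 1.0 - acc
    # err ~ 0.5 (indistinguishable) -> distance ~ 0; err ~ 0 -> distance ~ 2.
    return 2.0 * (1.0 - 2.0 * err)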

How to Handle Universal Learning?
Most traditional classifiers cannot guarantee performance when the training and test distributions are different.
Could we find one classifier that works under a weaker assumption? Graph transduction?

Advantage of Graph Transduction
Weaker assumption: the decision boundary lies in the low-density regions of the unlabeled data.
(Illustration: two Gaussians vs. two arcs)
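For readers unfamiliar with graph transduction, here is a minimal label-propagation sketch in the spirit of [Zhu, 2005]: labels spread over an RBF-weighted graph of labeled plus unlabeled points, so the decision boundary tends to settle in low-density regions. The kernel choice and clamping scheme below are illustrative assumptions, not the paper's exact algorithm.

# Illustrative label propagation over an RBF graph (binary labels 0/1).
import numpy as np

def label_propagation(X_l, y_l, X_u, sigma=1.0, n_iter=100):
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    # Pairwise RBF affinities and a row-stochastic transition matrix.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)
    # Soft labels; labeled rows are clamped to their true labels each step.
    F = np.full((len(X), 2), 0.5)
    F[:n_l] = np.eye(2)[y_l]
    for _ in range(n_iter):
        F = P @ F
        F[:n_l] = np.eye(2)[y_l]
    return F[n_l:]  # class scores for the unlabeled points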

Just Graph Transduction?
Sample selection: which samples?
"Un-smooth label" (more examples in low-density regions) and "class imbalance" problems ([Wang et al, 2008]) may mislead the decision boundary into going through high-density regions.
(Illustration: the bottom part is closest to a red square; there are more red squares than blue squares)

Maximum Margin Graph Transduction
In margin terms, unlabeled data with low margin are likely misclassified!
(Illustration: bad sample vs. good sample)
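The slide does not define the margin formally; one plausible reading, used here only for illustration, is the average gap between the two largest class scores on the unlabeled points:

# Hypothetical definition of the "unlabeled data margin": mean gap between
# the two largest class scores of each unlabeled point.
import numpy as np

def unlabeled_margin(proba):
    top2 = np.sort(proba, axis=1)[:, -2:]
    return (top2[:, 1] - top2[:, 0]).mean()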

Main Flow
– Maximize the unlabeled data margin (sample selection)
– Predict the labels of the unlabeled data (label propagation)
– Lift the unlabeled data margin (ensemble)
A rough sketch of this loop is given below.
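A hedged sketch of the loop, using scikit-learn's LabelSpreading as a stand-in for the paper's graph transduction and a simple random search over labeled subsets; both are illustrative simplifications, not the published algorithm.

# Sketch of a MarginGraph-style flow: pick the labeled subset whose propagated
# labels have the largest unlabeled-data margin, then average over rounds.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

def margin_graph(X_l, y_l, X_u, n_rounds=10, n_candidates=20, subset=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_classes = len(np.unique(y_l))
    ensemble = np.zeros((len(X_u), n_classes))
    for _ in range(n_rounds):
        best_proba, best_margin = None, -np.inf
        for _ in range(n_candidates):
            idx = rng.choice(len(X_l), int(subset * len(X_l)), replace=False)
            if len(np.unique(y_l[idx])) < n_classes:
                continue  # candidate subset must contain every class
            X = np.vstack([X_l[idx], X_u])
            y = np.concatenate([y_l[idx], -np.ones(len(X_u), dtype=int)])
            proba = LabelSpreading(kernel="rbf", gamma=1.0).fit(X, y).predict_proba(X_u)
            top2 = np.sort(proba, axis=1)[:, -2:]
            margin = (top2[:, 1] - top2[:, 0]).mean()  # unlabeled-data margin
            if margin > best_margin:
                best_margin, best_proba = margin, proba
        ensemble += best_proba  # averaging tends to lift the margin further
    return ensemble / n_rounds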

Properties
The error of Adaptive Graph Transduction can be bounded by the sum of:
– the training error in terms of approximating the ideal hypothesis,
– the error of the ideal hypothesis, and
– the empirical distance between the training and test distributions.
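The slide lists the three terms but not the inequality itself; schematically, after [Blitzer et al, 2008] and with constants and complexity/confidence terms omitted (so this is a sketch, not the paper's exact statement):

\varepsilon_{\text{test}}(h) \;\le\; \hat{\varepsilon}_{\text{train}}(h, h^{*}) \;+\; d_{\mathcal{A}}\big(\hat{D}_{\text{train}}, \hat{D}_{\text{test}}\big) \;+\; \varepsilon(h^{*}),

where h^{*} denotes the ideal hypothesis.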

Properties
If one classifier has a larger unlabeled data margin, its training error will be smaller (recall the last theorem).
The averaging ensemble is likely to achieve a larger margin.

Experiment – Data Sets: Transfer Learning
– Reuters: Reuters news articles; source and target domains are built from different sub-categories of the same top categories (org: org.subA vs. org.subB; place: place.subA vs. place.subB).
– SyskillWebert: HTML source of web pages plus the ratings of a user on those web pages; source domain: Bands-recording, Biomedical, Goats; target domain: Sheep.
First fill up the "GAP", then use a kNN classifier to do the classification.

Experiment – Data Sets
Sample Selection Bias Correction
– UCI data sets: Ionosphere, Diabetes, Haberman, WDBC
1. Randomly select 50% of the features, then sort the data set according to each selected feature;
2. Take the top instances from every sorted list as the training set.
Uncertainty Mining
– Kent Ridge Biomedical Repository: high-dimensional, low-sample-size (HDLSS) data
– Generate two different Gaussian noises and add them to the training and test sets.
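A small Python sketch of these two set-ups; the number of top instances kept per feature and the noise scale are not given on the slide, so the values below are illustrative assumptions.

# Hypothetical reconstruction of the experimental protocols described above.
import numpy as np

def biased_split(X, n_top=50, frac_features=0.5, seed=0):
    """Sample-selection-bias split: sort by randomly chosen features and keep
    the top instances of each sorted list as the (biased) training set."""
    rng = np.random.default_rng(seed)
    feats = rng.choice(X.shape[1], int(frac_features * X.shape[1]), replace=False)
    train_idx = set()
    for f in feats:
        order = np.argsort(X[:, f])[::-1]          # sort by this feature
        train_idx.update(order[:n_top].tolist())   # take the top instances
    train_idx = np.array(sorted(train_idx))
    test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    return train_idx, test_idx

def add_gaussian_noise(X, scale, seed=0):
    """Uncertainty-mining set-up: perturb feature vectors with Gaussian noise
    (two different scales would be used for the training and test sets)."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, scale, size=X.shape)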

Experiment – Baseline Methods
Original graph transduction algorithm ([Zhu, 2005]):
– using the entire training data set
– variation: choosing a randomly selected sample whose size equals the one chosen by MarginGraph
CDSC, a transfer learning approach ([Ling et al, 2008]):
– finds a mapping space that optimizes a consistency measure between the out-of-domain supervision and the in-domain intrinsic structure
BRSD-BK / BRSD-DB, bias correction approaches ([Ren et al, 2008]):
– discover structure and re-balance using unlabeled data

Performance – Transfer Learning

Performs best on 5 of 6 data sets!

Performance – Sample Selection Bias
Accuracy: best on all 4 data sets!
AUC: best on 2 of 4 data sets.

Performance – Uncertainty Mining
Accuracy: best on all 4 data sets!
AUC: best on all 4 data sets!

Margin Analysis
– MarginBase is the base classifier of MarginGraph in each iteration.
– LowBase is a "minimal margin classifier", which selects samples so as to build a classifier with minimal unlabeled data margin.
– LowGraph is the averaging ensemble of LowBase.

Maximal margin is better than minimal margin.
The ensemble is better than any single classifier.

Conclusion
We cover different formulations in which the training and test sets are drawn from related but different distributions.
Flow:
– Step 1, Sample selection: select labeled data from the different distribution that maximize the unlabeled data margin
– Step 2, Label propagation: label the unlabeled data
– Step 3, Ensemble: further lift the unlabeled data margin
Code and data available from