Bridged Refinement for Transfer Learning XING Dikan, DAI Wenyuan, XUE Gui-Rong, YU Yong Shanghai Jiao Tong University

Presentation transcript:

Bridged Refinement for Transfer Learning XING Dikan, DAI Wenyuan, XUE Gui-Rong, YU Yong Shanghai Jiao Tong University

Outline Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Conclusion

Overview Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Conclusion

Motivation Spam filtering: decide whether a given mail is spam or not. – Training Data – Test Data (Figure: example mailbox containing mails about pop music, basketball, football, and classical music.)

Motivation New events always occur: news in 2006 (commercial or politics), news in 2007 (commercial or politics). Solution? – Labeling the new data again and again is costly. Therefore, we try to utilize the old labeled data while taking the shift of distribution into consideration. [Transfer useful information]

Overview Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Some other solutions

Problem We want to solve a classification problem. The set of target categories is fixed. Main difference from traditional classification: – The training data and the test data are governed by two slightly different distributions. We do not require any labeled data from the new test distribution.

Illustrative Example (Figure: mails about sports and music; +: normal mail, -: spam mail.)

Overview Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Some other solutions

Overview Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Some other solutions

Assumption P(c|d) does not change: P_train(c|d) = P_test(c|d), since – The set of target categories is fixed. – Each target category is definite. P(c|d_i) ≈ P(c|d_j) when d_i ~ d_j, where ~ means "similar", "close to each other". Consistency – Mutual Reinforcement Principle

Overview Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Some other solutions

Method: Refinement UConf_c: scores from a base classifier, coarse-grained (Unrefined Confidence score for category c). M: adjacency matrix, M_ij = 1 if d_i is a neighbor of d_j (rows are then L1-normalized). RConf_c: Refined Confidence score for category c. The mutual reinforcement principle yields: RConf_c = α M RConf_c + (1-α) UConf_c, where α is a trade-off coefficient.
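A minimal sketch of this refinement step, assuming NumPy and scikit-learn. The helper names, the Euclidean neighbor search, and the defaults (k=10, α=0.7, 50 iterations) are illustrative choices, not values taken from the paper:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_adjacency(X, k=10):
    """Row-L1-normalized adjacency matrix M over the documents in X.
    Here M[i, j] > 0 iff d_j is among d_i's k nearest neighbors, so each
    document averages the scores of its own neighbors (one plausible reading
    of the adjacency definition on the slide)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    n = X.shape[0]
    M = np.zeros((n, n))
    for i, neighbors in enumerate(idx):
        M[i, neighbors[1:]] = 1.0        # skip the document itself
    return M / M.sum(axis=1, keepdims=True)

def refine(unconf, M, alpha=0.7, n_iter=50):
    """Iterate RConf <- alpha * M @ RConf + (1 - alpha) * UConf."""
    rconf = unconf.copy()
    for _ in range(n_iter):
        rconf = alpha * (M @ rconf) + (1 - alpha) * unconf
    return rconf
```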

Method: Refinement Refinement can be regarded as reaching consistency under the mixture distribution. Why not try to reach consistency under the distribution of the test data?

Illustrative Example

Overview Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Some other solutions

Method: Bridged Refinement – First, refine towards the mixture distribution. – Then, refine towards the target (test) distribution.
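A rough sketch of the two-step procedure, reusing the knn_adjacency and refine helpers from the previous sketch. The function name, the argument layout, and the idea of initializing training-document scores from their labels are illustrative assumptions, not the authors' exact implementation:

```python
def bridged_refine(unconf_all, X_all, test_idx, k=10, alpha=0.7):
    """Two-step bridged refinement (sketch).

    unconf_all : (n_all, n_classes) base scores for every document; for the
                 labeled training documents these could simply be one-hot labels.
    X_all      : (n_all, n_features) feature vectors of training + test documents
                 (i.e., a sample of the mixture distribution).
    test_idx   : indices of the test documents inside X_all.
    """
    # Step 1: refine towards the mixture distribution (neighbors drawn from
    # training and test documents together).
    M_mix = knn_adjacency(X_all, k)
    conf_mix = refine(unconf_all, M_mix, alpha)

    # Step 2: refine towards the target distribution (neighbors drawn from the
    # test documents only), starting from the step-1 scores of the test documents.
    M_test = knn_adjacency(X_all[test_idx], k)
    conf_test = refine(conf_mix[test_idx], M_test, alpha)
    return conf_test
```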

Outline Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Conclusion

Experiment Data set Base classifiers Different refinement styles Performance Parameter sensitivity

Experiment: Data set Source – SRAA Simulated autos (simauto) Simulated aviation (simaviation) Real autos (realauto) Real aviation (realaviation) – 20 Newsgroups Top-level categories: rec, talk, sci, comp – Reuters Top-level categories: org, places, people

Experiment: Data set Re-construction – 11 data sets (Figure: each data set is split into Positive and Negative classes and into Training Data and Test Data.)

Experiment: Base classifier Supervised – Generative model: Naïve Bayes classifier – Discriminative model: Support vector machines Semi-supervised: – Transductive support vector machines
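As a rough illustration of where the unrefined scores UConf could come from, here is a toy pipeline with scikit-learn's MultinomialNB standing in for the supervised base classifier. The toy mails, TF-IDF features, and variable names are made up for the example; the paper's exact classifier configuration is not reproduced here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data: old labeled mails (training) and new unlabeled mails (test).
train_docs = ["cheap pills buy now", "meeting agenda for monday",
              "win money fast", "lunch with the project team"]
train_labels = [1, 0, 1, 0]                # 1 = spam, 0 = normal mail
test_docs = ["free money offer today", "team meeting moved to tuesday"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

base = MultinomialNB().fit(X_train, train_labels)
unconf_test = base.predict_proba(X_test)   # (n_test, n_classes) unrefined scores UConf
```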

Experiment: Refinement Style No refinement (base) One step – Refine directly on the test distribution (Test) – Refine on the mixture distribution only (Mix) Two steps – Bridged Refinement (Bridged)

Performance: On SVM (Chart: error rates of Base, Test, Mix, and Bridged.) Test (2nd), Mix (3rd) vs. Base (1st); Test (2nd) vs. Bridged (1st): – different starting point

Performance: NB and TSVM

Parameter: K Whether d_i is regarded as a neighbor of d_j is decided by checking whether d_i is in d_j's k-nearest-neighbor set.

Parameter: α (Chart: error rate vs. different values of α.)

Convergence The refinement formula can be solved in closed form or in an iterative manner.
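For reference, rearranging the update gives the closed-form fixed point RConf = (1-α)(I - αM)^{-1} UConf. Below is a minimal NumPy sketch (the function name and default α are illustrative); since M is row-normalized and 0 < α < 1, the iterative version sketched earlier converges to this same solution:

```python
import numpy as np

def refine_closed_form(unconf, M, alpha=0.7):
    """Solve RConf = alpha * M @ RConf + (1 - alpha) * UConf exactly,
    i.e. RConf = (1 - alpha) * (I - alpha * M)^{-1} @ UConf."""
    n = M.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * M, unconf)
```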

Outline Motivation Problem Solution – Assumption – Method – Improvement and Final Solution Experiment Conclusion

Task: Transfer useful information from the training data to the same classification task on the test data, where the training and test data are governed by two different distributions. Approach: Take the mixture distribution as a bridge and perform a two-step refinement.

Thank you Please ask in slow and simple English

Backup 1: Transductive The boundary after either step of refinement is actually never calculated explicitly; it is hidden in the refined labels of the data points. It is drawn explicitly in the examples only for a clearer illustration.

Backup 2: n-step One important problem we leave unsolved: – How to describe a distribution λ D_train + (1-λ) D_test? – One solution is sampling in a generative manner, but this makes the result depend on each random number drawn in the generative process, which may make the solution unstable and hard to reproduce.

Backup 3: Why the mutual reinforcement principle? If d_j has a high confidence of being in category c, then d_i, a neighbor of d_j, should also receive a high confidence score.