Relaxed Transfer of Different Classes via Spectral Partition

Xiaoxiao Shi (1), Wei Fan (2), Qiang Yang (3), Jiangtao Ren (4)
(1) University of Illinois at Chicago
(2) IBM T. J. Watson Research Center
(3) Hong Kong University of Science and Technology
(4) Sun Yat-sen University

1. Unsupervised
2. Can use data with different classes to help. How so?

What is Transfer Learning?

Standard supervised learning: a classifier is trained on labeled New York Times documents and tested on unlabeled New York Times documents, reaching 85.5% accuracy.

What is Transfer Learning? (In reality...)

Labeled training data are insufficient: trained on a small labeled set of New York Times documents and tested on unlabeled New York Times documents, the classifier reaches only 47.3% accuracy. How can we improve the performance?

Transfer learning: train on a labeled source domain (Reuters) and test on an unlabeled target domain (New York Times), reaching 82.6% accuracy. The source is not necessarily from the same domain and need not follow the same distribution.

Transfer across Different Class Labels

Since the source (Reuters) and the target (New York Times) come from different domains, they may have different class labels, in both number and meaning. For example, the source labels include Markets, Politics, Entertainment, Blogs, ..., while the target labels include World, U.S., Fashion & Style, Travel, .... How to transfer when the class labels are different?

Two Main Categories of Transfer Learning

Unsupervised transfer learning
– No labeled data from the target domain.
– Use the source domain to help learning.
– Question: is it better than clustering?

Supervised transfer learning
– A limited number of labeled examples from the target domain.
– Question: is it better than not using any source examples?

Transfer across Different Class Labels

Two sub-problems:
– (1) What and how to transfer, since we cannot explicitly use P(x|y) or P(y|x) to relate the tasks (the class labels y have different meanings)?
– (2) How to avoid negative transfer, since the tasks may come from very different domains? Negative transfer: when the tasks are too different, transfer learning may hurt learning accuracy.

The Proposed Solution

(1) What and how to transfer?
– Transfer the eigenspace, i.e., the space spanned by a set of eigenvectors. When a dataset exhibits complex cluster shapes, k-means performs poorly in the original space because of its bias toward dense, spherical clusters. In the eigenspace (the space given by the eigenvectors), the clusters become trivial to separate; this is the idea behind spectral clustering, sketched below.
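A minimal, self-contained sketch of spectral clustering on a two-moons dataset, illustrating the point above: k-means fails on these non-spherical clusters in the input space but separates them trivially in the eigenspace. The dataset, the RBF bandwidth, and all other parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# RBF affinity W and symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * 0.1**2))
d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
L = np.eye(len(X)) - d_inv_sqrt @ W @ d_inv_sqrt

# The eigenspace: eigenvectors of L with the smallest eigenvalues
_, eigvecs = np.linalg.eigh(L)          # eigh returns eigenvalues in ascending order
V = eigvecs[:, :2]                      # 2 clusters -> first 2 eigenvectors

# k-means in the eigenspace recovers the two moons
labels_spectral = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(V)
# k-means in the input space, for contrast, splits the moons incorrectly
labels_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```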


The Proposed Solution

(2) How to avoid negative transfer?
– A new clustering-based KL divergence reflects the distribution differences.
– If the distributions are too different (the KL divergence is large), automatically decrease the effect of the source domain.

The traditional KL divergence requires the densities P(x) and Q(x) for every x, which are normally difficult to obtain. The clustering-based KL divergence instead: (1) performs clustering on the combined dataset, and (2) calculates the KL divergence from basic statistical properties of the clusters, as in the example below.
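For reference, the standard discrete form of the KL divergence being avoided here; computing it directly requires the probabilities P(x) and Q(x) at every point x:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}
```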

An Example

Perform clustering on the combined dataset P ∪ Q (15 examples in total: 8 from P, 7 from Q), producing two clusters C1 and C2. S(P', C) denotes the portion of the examples in cluster C that come from P:
– C1 (6 examples, 3 from P and 3 from Q): S(P', C1) = 0.5, S(Q', C1) = 0.5
– C2 (9 examples, 5 from P and 4 from Q): S(P', C2) = 5/9, S(Q', C2) = 4/9

From these counts: E(P) = 8/15, E(Q) = 7/15, P'(C1) = 3/15, Q'(C1) = 3/15, P'(C2) = 5/15, Q'(C2) = 4/15, and the clustering-based KL divergence evaluates to KL = 0.0309.
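A sketch that reproduces the cluster statistics of this example. The point-to-cluster assignment is read off the example above (C1: 3 from P and 3 from Q; C2: 5 from P and 4 from Q). The final line combines the proportions with a standard discrete KL purely as an illustration; the paper's exact estimator is not preserved in this transcript.

```python
import numpy as np

origin  = np.array(["P"] * 3 + ["Q"] * 3 + ["P"] * 5 + ["Q"] * 4)
cluster = np.array([0] * 6 + [1] * 9)   # C1 = 0, C2 = 1
n = len(origin)                         # 15 examples in the combined dataset

for c in (0, 1):
    members = origin[cluster == c]
    s_p = np.mean(members == "P")       # share of cluster c coming from P
    s_q = np.mean(members == "Q")
    print(f"C{c + 1}: S(P',C) = {s_p:.3f}, S(Q',C) = {s_q:.3f}")
# -> C1: 0.500 / 0.500   C2: 0.556 / 0.444   (i.e. 5/9 and 4/9)

E_P, E_Q = np.mean(origin == "P"), np.mean(origin == "Q")   # 8/15, 7/15
P_prime = np.array([np.sum((origin == "P") & (cluster == c)) / n for c in (0, 1)])
Q_prime = np.array([np.sum((origin == "Q") & (cluster == c)) / n for c in (0, 1)])
# P_prime = [3/15, 5/15], Q_prime = [3/15, 4/15], matching the example

p, q = P_prime / P_prime.sum(), Q_prime / Q_prime.sum()
kl = float(np.sum(p * np.log(p / q)))   # illustrative KL over cluster proportions
```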

Objective Function

Objective: find an eigenspace that well separates the target data.
– Intuition: if the source data are similar to the target data, make good use of the source eigenspace;
– otherwise, keep the original structure of the target data.

The objective combines a traditional normalized-cut term with a penalty term: "prefer the source eigenspace" versus "prefer the original structure", balanced by R(L, U). The more similar the distributions, the smaller R(L, U) is, and the more the objective relies on the source eigenspace. A schematic form is sketched below.
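A schematic rendering of this trade-off. The notation is illustrative, not the paper's exact equation: v is the relaxed cluster indicator, L~ and D are the graph Laplacian and degree matrix of the target data, T encodes the eigenspace-transfer constraints, and λ(·) is a decreasing function of the divergence R(L, U), so the pull toward the source eigenspace weakens as the distributions grow apart.

```latex
\min_{v}\;
\underbrace{\frac{v^{\top}\tilde{L}\,v}{v^{\top}D\,v}}_{\text{traditional normalized cut}}
\;+\;
\underbrace{\lambda\!\bigl(R(L,U)\bigr)\,\lVert T v\rVert^{2}}_{\text{penalty term (prefer source eigenspace)}}
```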

How to construct the constraints TL and TU?

Principle:
– TL is derived directly from the must-link constraints: examples with the same label should be together.
– TU: (1) perform standard spectral clustering (e.g., Ncut) on U; (2) examples in the same cluster should be together.

For example: 1, 2, 4 should be together (blue) and 3, 5, 6 should be together (red); in another grouping, 1, 2, 3 should be together and 4, 5, 6 should be together.

How to construct the constraints TL and TU?

Construct the constraint matrix M = [m1, m2, ..., mr]'. For example, the must-link pairs (1, 2), (1, 4), and (3, 5) give

T_ML =
  [ 1, -1,  0,  0,  0,  0 ]   (1 and 2)
  [ 1,  0,  0, -1,  0,  0 ]   (1 and 4)
  [ 0,  0,  1,  0, -1,  0 ]   (3 and 5)
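A minimal sketch of this construction: each row encodes one pair "examples i and j should be together" as +1 in column i and -1 in column j, so T_ML @ v = 0 forces v_i = v_j. The helper name constraint_matrix is introduced here for illustration only.

```python
import numpy as np

def constraint_matrix(pairs, n):
    """One row per must-link pair (i, j), over n examples (0-indexed)."""
    T = np.zeros((len(pairs), n))
    for r, (i, j) in enumerate(pairs):
        T[r, i] = 1.0
        T[r, j] = -1.0
    return T

# The pairs from the slide, converted from 1-indexed to 0-indexed
T_ML = constraint_matrix([(0, 1), (0, 3), (2, 4)], n=6)
# [[ 1. -1.  0.  0.  0.  0.]
#  [ 1.  0.  0. -1.  0.  0.]
#  [ 0.  0.  1.  0. -1.  0.]]
```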

Experiment: data sets (table not preserved in this transcript)

Experiment: data sets, continued (table not preserved in this transcript)

Text Classification (results figures not preserved in this transcript)

– Target task: Comp1 vs Rec1. Source data sets: (1) Comp2 vs Rec2; (2) 4 classes (Graphics, etc.); (3) 3 classes (crypt, etc.)
– Target task: Org1 vs People1. Source data sets: (1) Org2 vs People2; (2) 3 classes (Places, etc.); (3) 3 classes (crypt, etc.)

Image Classification (results figures not preserved in this transcript)

– Target task: Homer vs Real Bear. Source data sets: (1) Superman vs Teddy; (2) 3 classes (Cartman, etc.); (3) 4 classes (laptop, etc.)
– Target task: Cartman vs Fern. Source data sets: (1) Superman vs Bonsai; (2) 3 classes (Homer, etc.); (3) 4 classes (laptop, etc.)

Parameter Sensitivity (figures not preserved in this transcript)

Conclusions

Problem: transfer across tasks with different class labels. Two sub-problems:
(1) What and how to transfer? Transfer the eigenspace.
(2) How to avoid negative transfer? Propose an effective clustering-based KL divergence: if the KL divergence is large, i.e., the distributions are too different, decrease the effect of the source domain.

Thanks! Datasets and code:

How Many Clusters?

Condition for Lemma 1 to be valid: in each cluster, the expected values of the target and source data are about the same. Adaptively control the number of clusters to keep Lemma 1 valid: stop the bisecting clustering when a cluster contains only target or only source data, or when the difference between the within-cluster means of the target and source data is close to 0. A sketch follows.
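A sketch of this adaptive stopping rule: bisect a cluster with 2-means and stop when it contains only source or only target examples, or when the within-cluster source and target means nearly coincide. The exact quantity tested against 0 is not preserved in the transcript, so the mean-difference criterion, the tolerance eps, and the min_size guard are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def bisect(X, is_source, eps=1e-2, min_size=4):
    """Return a list of index arrays, one per final cluster.

    X         -- (n, d) data matrix of the combined source + target data
    is_source -- (n,) boolean array, True for source examples
    """
    def recurse(idx):
        src, tgt = idx[is_source[idx]], idx[~is_source[idx]]
        if len(src) == 0 or len(tgt) == 0:          # only source or only target
            return [idx]
        gap = np.linalg.norm(X[src].mean(axis=0) - X[tgt].mean(axis=0))
        if gap < eps or len(idx) < min_size:        # Lemma 1 condition ~ holds
            return [idx]
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        return recurse(idx[halves == 0]) + recurse(idx[halves == 1])

    return recurse(np.arange(len(X)))
```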

Optimization: derivation and algorithm flow (equations not preserved in this transcript).