Self-taught Clustering – an instance of Transfer Unsupervised Learning
Wenyuan Dai†, joint work with Qiang Yang‡, Gui-Rong Xue†, and Yong Yu†
† Shanghai Jiao Tong University  ‡ Hong Kong University of Science and Technology

Outline • Motivation • Self-taught Clustering • Transfer Unsupervised Learning • Algorithm • Experiments • Conclusion

Clustering  Does clustering rely on the sufficiency of data?

When the data are sparse, …

Can sparse data be clustered well? • Sometimes, it is possible. A good data representation can make the clustering much easier.

A good representation may help (figure: transformation)

Can Transfer Learning help? (figure: auxiliary data, target data)

Transfer Learning can help (figure: auxiliary data and target data; separate the auxiliary data via transformation)

Outline • Motivation • Self-taught Clustering • Transfer Unsupervised Learning • Algorithm • Experiments • Conclusion

Problem Definition
• Target Data: a small collection; to be clustered
• Auxiliary Data: a large amount; irrelevant to the target data
• Objective: enhance the clustering performance on the target data by making use of the auxiliary unlabeled data
• Self-taught Clustering / Transfer Unsupervised Learning

Self-taught Learning (figure: training data, auxiliary data, test data) (Raina et al., ICML 2007)

Self-taught Clustering (figure: auxiliary data, target data)

Transfer Learning (figure: training data, auxiliary data, test data) (Raina et al., ICML 2006)

Transfer Unsupervised Learning (figure: auxiliary data, target data)

Outline • Motivation • Self-taught Clustering • Transfer Unsupervised Learning • Algorithm • Experiments • Conclusion

Self-taught Clustering via Co-clustering
• Target Data: a small collection; to be clustered
• Auxiliary Data: a large amount; irrelevant to the target data
• (figure: co-clustering)

Objective Function
• Notations
  - X = {x_1, ..., x_n} and Y = {y_1, ..., y_m}: the target and the auxiliary data, respectively.
  - Z = {z_1, ..., z_k}: the common feature space of the target and auxiliary data.
  - X~, Y~ and Z~: the clusterings on X, Y and Z, respectively.
• Objective function for self-taught clustering:
  J = [I(X; Z) - I(X~; Z~)] + λ [I(Y; Z) - I(Y~; Z~)]
  where I(·; ·) denotes mutual information and λ is a trade-off parameter.
• Based on Information Theoretic Co-clustering (Dhillon et al., KDD 2003)
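As a rough numerical illustration of this objective (a sketch under the notation above, not the authors' implementation; all helper names are hypothetical), the loss in mutual information of each co-clustering can be computed from an empirical joint distribution and the cluster assignments:

```python
import numpy as np

def mutual_info(p):
    # I(A; B) in bits for a joint-distribution matrix p[a, b] that sums to 1.
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (pa @ pb)[nz])).sum())

def clustered_joint(p, row_labels, col_labels, n_row, n_col):
    # Aggregate p(A, B) into p(A~, B~) given the cluster assignments.
    q = np.zeros((n_row, n_col))
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):
            q[row_labels[i], col_labels[j]] += p[i, j]
    return q

def stc_objective(p_xz, p_yz, cx, cy, cz, kx, ky, kz, lam=1.0):
    # J = [I(X;Z) - I(X~;Z~)] + lam * [I(Y;Z) - I(Y~;Z~)]
    loss_x = mutual_info(p_xz) - mutual_info(clustered_joint(p_xz, cx, cz, kx, kz))
    loss_y = mutual_info(p_yz) - mutual_info(clustered_joint(p_yz, cy, cz, ky, kz))
    return loss_x + lam * loss_y
```

Each loss term is non-negative (clustering can only discard information), and J = 0 when the clusterings preserve all mutual information; the shared feature clustering cz is what couples the two terms.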

Optimization  For each feature z, choose the feature cluster z~ that minimizes the objective function J.

Optimization  The three clustering functions (for the target data, the auxiliary data, and the features) are updated iteratively.

Outline • Motivation • Self-taught Clustering • Transfer Unsupervised Learning • Algorithm • Experiments • Conclusion

Data Sets
• Caltech-256 image corpus, 20 categories: eyeglass, sheet-music, airplane, ostrich, fern, starfish, guitar, laptop, hibiscus, ketch, cake, harp, car-side, tire, frog, cd, comet, vcr, diamond-ring, and skyscraper
• For each clustering task:
  - The data from the corresponding categories are used as the target unlabeled data.
  - The data from the remaining categories are used as the auxiliary unlabeled data.

Evaluation Criterion
• Entropy: the entropy for a cluster C is defined as H(C) = -Σ_i p_i(C) log p_i(C), where p_i(C) is the fraction of instances in C belonging to true category i.
• The total entropy is defined as the weighted sum of the per-cluster entropies: H_total = Σ_C (|C| / n) H(C), where n is the total number of instances. Lower entropy indicates better clustering.
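The entropy criterion above can be coded directly (a sketch; the base-2 logarithm and the helper name are assumptions, since the slide omits the exact formulas):

```python
import math
from collections import Counter

def total_entropy(true_labels, cluster_ids):
    # Per-cluster entropy H(C) = -sum_i p_i(C) * log2 p_i(C), where p_i(C) is the
    # fraction of instances in cluster C whose true category is i; the total is
    # the size-weighted sum over clusters. Lower is better (0 = every cluster pure).
    n = len(true_labels)
    total = 0.0
    for c in set(cluster_ids):
        members = [t for t, k in zip(true_labels, cluster_ids) if k == c]
        h = 0.0
        for count in Counter(members).values():
            p = count / len(members)
            h -= p * math.log2(p)
        total += (len(members) / n) * h
    return total
```

For example, four items split into two pure clusters give entropy 0, while two clusters that each mix both categories evenly give entropy 1 bit.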

Experimental Results

Outline • Motivation • Self-taught Clustering • Transfer Unsupervised Learning • Algorithm • Experiments • Conclusion

Conclusion
• We investigate the transfer unsupervised learning problem, called self-taught clustering:
  - using irrelevant auxiliary unlabeled data to help cluster the target data.
• We develop a co-clustering-based self-taught clustering algorithm:
  - Two co-clusterings are performed simultaneously, between the target data and the features, and between the auxiliary data and the features.
  - The two co-clusterings share a common feature clustering.
• The experiments show that our algorithm can improve clustering performance using irrelevant auxiliary unlabeled data.

Questions?