Supervised Self-taught Learning: Actively Transferring Knowledge from Unlabeled Data

Kaizhu Huang1, Zenglin Xu2, Irwin King2, Michael R. Lyu2, and Colin Campbell1
1 Department of Engineering Mathematics, University of Bristol, UK
{K.Huang, C.Campbell}@bristol.ac.uk
2 Department of Computer Science and Engineering, The Chinese University of Hong Kong
{zlxu, king, lyu}@cse.cuhk.edu.hk

Background
- Semi-supervised Learning (SSL): unlabeled data share the same set of categories as the labeled data.
- Transfer Learning (TL): supervised learning with an additional labeled data set that is closely related to the training data.
- Self-taught Learning (STL): unlabeled data can be arbitrary data that do not necessarily share the same categories as the labeled data.

Motivations & Framework

Steps of STL
1. Learn a basis by sparse coding from unlabeled data (which may even be randomly downloaded data).
2. Represent the labeled data using the basis obtained in Step 1.
3. Learn a classification function on this representation with a standard algorithm.

Problems of STL
- The three steps are conducted separately.
- Irrelevant basis vectors may be extracted and could hurt classification.

Proposed Supervised STL (SSTL) Framework
[Framework diagram: Sparse Coding -> Classifier Learning]

Advantages of SSTL
- Basis selection interacts with classifier learning in a supervised fashion.
- Only useful basis vectors are extracted.

Experiment Results
Data repositories: 4 subsets from WebKB, Reuters-21578, and Ohsumed.
Setup
1. Training data: 4 or 10 labeled samples randomly chosen for each category; the remaining data are used as test data.
2. 1000 webpages retrieved by Google are used as unlabeled data.
3. Training and testing are randomly repeated 10 times, and the average accuracies are reported as the final performance. Parameters are tuned via cross-validation.
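The three STL steps above can be sketched in code. Everything below is an illustrative assumption, not the authors' implementation: the function names, the ISTA solver for the sparse-coding subproblem, the alternating least-squares basis update, and all parameter values are choices made for this sketch.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, B, lam=0.1, n_iter=50):
    """Step 2: code each column of X over basis B by approximately solving
    min_S 0.5*||X - B S||_F^2 + lam*||S||_1 with ISTA iterations."""
    L = np.linalg.norm(B, 2) ** 2 + 1e-12   # Lipschitz constant of the gradient
    S = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = B.T @ (B @ S - X)
        S = soft_threshold(S - grad / L, lam / L)
    return S

def learn_basis(X_unlabeled, k=20, lam=0.1, n_outer=10, seed=0):
    """Step 1: learn a k-atom basis from unlabeled data by alternating
    sparse coding with a regularized least-squares basis update."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X_unlabeled.shape[0], k))
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    for _ in range(n_outer):
        S = sparse_codes(X_unlabeled, B, lam)
        B = X_unlabeled @ S.T @ np.linalg.pinv(S @ S.T + 1e-6 * np.eye(k))
        norms = np.linalg.norm(B, axis=0, keepdims=True)
        norms[norms < 1e-12] = 1.0          # guard against unused atoms
        B /= norms                           # keep atoms unit-norm
    return B
```

Step 3 would then train any off-the-shelf classifier on `sparse_codes(X_labeled, B)` instead of on the raw features; note that nothing in steps 1-2 sees the labels, which is exactly the problem SSTL addresses.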
Contribution & Conclusion
- The first study to perform self-taught learning in a supervised way.
- Able to learn the basis and the classification function simultaneously.
- Iterative optimization with guaranteed convergence.
- Significantly improves the classification accuracy of STL.

IJCNN 2009, Atlanta, U.S.A., June 14-19, 2009
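The "learn the basis and the classifier simultaneously" idea can be illustrated with a minimal alternating-minimization sketch. The joint objective below (squared reconstruction error + l1 sparsity + a squared-loss classifier term) is an assumption for illustration only; the paper's actual SSTL objective and optimizer may differ. Each of the three updates does not increase this objective, which mirrors the guaranteed-convergence claim.

```python
import numpy as np

def _soft(z, t):
    # Elementwise soft-thresholding (proximal operator of the l1 norm).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, y, B, S, w, lam=0.1, c=1.0):
    """Assumed joint objective: reconstruction + sparsity + classifier loss."""
    return (0.5 * np.sum((X - B @ S) ** 2) + lam * np.sum(np.abs(S))
            + 0.5 * c * np.sum((y - w @ S) ** 2))

def sstl_step(X, y, B, S, w, lam=0.1, c=1.0):
    """One round of alternating updates; none of them increases objective()."""
    # Codes: one proximal-gradient (ISTA) step on the smooth part,
    # which now also feels the supervised classifier loss.
    L = np.linalg.norm(B, 2) ** 2 + c * np.dot(w, w) + 1e-12
    grad = B.T @ (B @ S - X) + c * np.outer(w, w @ S - y)
    S = _soft(S - grad / L, lam / L)
    # Basis: exact least-squares minimizer for the fixed codes.
    B = X @ S.T @ np.linalg.pinv(S @ S.T)
    # Classifier weights: exact least-squares minimizer for the fixed codes.
    w = y @ S.T @ np.linalg.pinv(S @ S.T)
    return B, S, w
```

Because the label residual enters the code update, basis atoms that do not help prediction are shrunk away rather than kept, which is the supervised coupling the poster contrasts with plain STL.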