Iterative Similarity-Based Adaptation Technique for Cross-Domain Text Classification. Under: Prof. Amitabha Mukherjee. By: Narendra Roy, Roll no: 11451, Group: 6


Iterative Similarity-Based Adaptation Technique for Cross-Domain Text Classification. Under: Prof. Amitabha Mukherjee. By: Narendra Roy, Roll no: 11451, Group: 6. Published by: Himanshu Bhatt, Deepali Semwal, Shourya Roy

Introduction Supervised machine learning classifiers assume that training and test data are sampled from the same domain or distribution (i.i.d.). Performance degrades when the test data comes from a different domain: features differ across domains, and the test domain has no labelled data. The paper therefore proposes an iterative similarity-based adaptation algorithm to address these issues: a classifier learnt on one domain with sufficient labelled training data is applied to a different test domain with no labelled data.

The iterative algorithm starts with a shared feature representation of the source and target domains. To adapt, it iteratively learns domain-specific features from the unlabelled target-domain data, and the similarity between the two domains is incorporated in a similarity-aware manner.

Different stages of the proposed algorithm

Philosophy and Features of the algorithm Gradual transfer of knowledge from the source to the target domain while considering the similarity between the two domains. An ensemble of two classifiers is used. Transfer occurs within the ensemble: the classifier learned on the shared representation turns unlabelled test data into pseudo-labelled data, which is then used to learn a domain-specific classifier.

Salient Features: 1. Common Feature Space Representation We want a feature representation that minimizes both the divergence between the source and target domains and the classification error. A Structural Correspondence Learning (SCL) feature-representation-transfer approach is used, which derives a transformation matrix Q giving a shared representation between the source and target domains. SCL aims to learn the co-occurrence between features expressing similar meaning in different domains.

Common Feature Space Representation contd. ●A principal predictor space is created from the top K eigenvectors of the matrix. Features from the different domains are then projected onto this predictor space to obtain the shared feature-space representation. ●The algorithm generalizes to other shared representations.
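The projection step above can be sketched in a few lines. This is a hedged illustration of an SCL-style construction, not the paper's exact procedure; the helper name `scl_projection`, the pivot-feature choice, and `k` are assumptions made for the example:

```python
import numpy as np

def scl_projection(X_src, X_tgt, pivot_idx, k=2):
    """Sketch of an SCL-style shared representation (hypothetical helper).

    For each pivot feature, fit a least-squares predictor from the
    non-pivot features; the top-k left singular vectors of the stacked
    predictor weights span the shared ("principal predictor") space."""
    X = np.vstack([X_src, X_tgt])  # unlabeled data from both domains
    nonpivot = [j for j in range(X.shape[1]) if j not in pivot_idx]
    W = np.column_stack([
        np.linalg.lstsq(X[:, nonpivot], X[:, p], rcond=None)[0]
        for p in pivot_idx
    ])                              # one weight column per pivot predictor
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    Q = U[:, :k]                    # projection onto top-k predictor directions
    return X_src[:, nonpivot] @ Q, X_tgt[:, nonpivot] @ Q

rng = np.random.default_rng(0)
Xs, Xt = rng.random((20, 6)), rng.random((15, 6))
Zs, Zt = scl_projection(Xs, Xt, pivot_idx=[0, 1], k=2)
```

After the projection, rows from both domains live in the same low-dimensional predictor space, which is what makes a single shared classifier possible.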

2. Iteratively building target-domain labelled data Hypothesis: certain target-domain instances are more similar to the source-domain instances than the rest. Goal: create (pseudo-)labelled data in the target domain. Idea: a classifier trained on suitably chosen source-domain instances can confidently categorize the similar target-domain instances, yielding labelled data in the target domain. Only a few instances can be confidently labelled at a time, so the process is iterated; from the next iteration onwards the ensemble output is used. Each round adds to the pseudo-labelled data, and the process continues until all instances are exhausted or some stopping criterion is met.
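The loop described above can be sketched roughly as follows. This is an illustrative self-training sketch under assumed names (`iterative_pseudo_label`, a linear SVM standing in for the ensemble, and a fixed confidence `threshold`), not the paper's exact algorithm:

```python
import numpy as np
from sklearn.svm import LinearSVC

def iterative_pseudo_label(Xs, ys, Xt, threshold=0.5, max_iter=5):
    """Hedged sketch of iterative pseudo-labelling: each round labels only
    the target instances the current classifier is confident about (far
    from the decision boundary), adds them to a pseudo-labelled pool, and
    retrains on source data plus the pool."""
    pool_X, pool_y = [], []
    remaining = Xt.copy()
    for _ in range(max_iter):
        if len(remaining) == 0:
            break
        X_train = np.vstack([Xs] + pool_X) if pool_X else Xs
        y_train = np.concatenate([ys] + pool_y) if pool_y else ys
        clf = LinearSVC().fit(X_train, y_train)
        scores = clf.decision_function(remaining)
        confident = np.abs(scores) >= threshold  # keep only confident instances
        if not confident.any():
            break
        pool_X.append(remaining[confident])
        pool_y.append(clf.predict(remaining[confident]))
        remaining = remaining[~confident]
    return pool_X, pool_y, remaining

rng = np.random.default_rng(1)
Xs = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
ys = np.array([0] * 30 + [1] * 30)
Xt = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
pool_X, pool_y, remaining = iterative_pseudo_label(Xs, ys, Xt)
```

Instances that never clear the confidence threshold simply remain unlabelled, which mirrors the "until exhausted or a stopping criterion" behaviour described above.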

3. Domain-similarity-based aggregation Dissimilarity between domains hinders both the quality of the shared-space representation and classification performance. If the domains are not similar enough, the knowledge transferred by this transfer-learning technique may result in "negative transfer". Domain similarity is therefore measured using the cosine similarity measure.
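A minimal sketch of the cosine-similarity measure, assuming each domain is summarized by the mean of its feature vectors (e.g. term-frequency centroids); the centroid choice is an assumption for illustration:

```python
import numpy as np

def domain_similarity(X_src, X_tgt):
    """Cosine similarity between the mean feature vectors of two domains."""
    u, v = X_src.mean(axis=0), X_tgt.mean(axis=0)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

same = domain_similarity(np.eye(3), np.eye(3))                       # identical domains -> 1.0
diff = domain_similarity(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))  # orthogonal -> 0.0
```

A similarity near 1 indicates closely related domains where transfer should help; a value near 0 signals the risk of negative transfer.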

Notations

Algorithm

Algorithm (contd.)

Algorithm (contd.)

Algorithm in a nutshell
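The gradual weight shift within the ensemble can be illustrated with a hypothetical sketch; the update rule below (a fixed rate scaled by the domain similarity) is an assumption for illustration, not the paper's exact update:

```python
def update_weights(w_s, w_t, similarity, rate=0.1):
    """Hypothetical sketch of the gradual weight shift: each iteration
    moves ensemble mass from the shared-representation classifier (w_s)
    to the target-specific classifier (w_t); low domain similarity
    slows the shift."""
    shift = rate * similarity
    w_s, w_t = max(w_s - shift, 0.0), min(w_t + shift, 1.0)
    total = w_s + w_t
    return w_s / total, w_t / total  # keep the weights normalized

w_s, w_t = 1.0, 0.0  # start fully trusting the shared-representation classifier
for _ in range(30):
    w_s, w_t = update_weights(w_s, w_t, similarity=0.9)
```

Under this sketch the target-specific classifier ends up with most of the ensemble weight, consistent with the behaviour the results slides report.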

Confidence of Prediction: α_i for the i-th instance is measured as its distance from the decision boundary (Hsu et al., 2003): α_i = R(x_i) / ‖V‖, where ‖V‖ = √(VᵀV). Here, R is the unnormalized output of the SVM classifier and V is the weight vector over the support vectors. Each iteration shifts the ensemble weights from the classifier learnt on the shared representation to the target-domain classifier.
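With an SVM library this confidence can be sketched as below; `decision_function` provides the unnormalized output R and `coef_` the weight vector V (the toy data is purely illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy two-class data; the confidence of each prediction is taken as the
# geometric distance of the instance from the decision boundary.
X = np.array([[-2.0, 0.0], [-1.5, 0.5], [1.5, -0.5], [2.0, 0.0]])
y = np.array([0, 0, 1, 1])
clf = LinearSVC().fit(X, y)

R = clf.decision_function(X)        # unnormalized SVM output R
norm_V = np.linalg.norm(clf.coef_)  # ||V|| = sqrt(V^T V)
alpha = np.abs(R) / norm_V          # confidence alpha_i per instance
```

Instances with larger `alpha` lie farther from the boundary, which is why they are the ones pseudo-labelled first in the iterative process.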

Results and Datasets The efficacy of the proposed algorithm was evaluated on different datasets for cross-domain text classification (Blitzer et al., 2007). Performance was evaluated on two-class classification tasks and reported in terms of classification accuracy.

Datasets Amazon review data with four domains: Books, DVDs, Kitchen appliances and Electronics. Each domain comprises 1,000 positive and 1,000 negative reviews; in all experiments 1,600 labelled and 1,600 unlabelled reviews are taken from the source and target domains, and performance is reported on the non-overlapping 400 reviews from the target domain. The 20 Newsgroups dataset, a collection of approximately 20,000 documents evenly partitioned across 20 newsgroups, is also used.

Datasets contd. The third dataset is a real-world dataset of tweets about products and services: Coll1 about gaming, Coll2 about Microsoft products and Coll3 about mobile support. Each collection has 218 positive and negative tweets, collected from user-defined keywords in a listening engine that crawled social media and fetched comments matching the keywords.

Results and Analysis After the datasets were preprocessed, the experiments were conducted with an SVM with a radial basis function kernel as the constituent classifier of the ensemble. Performance was mainly affected by dissimilarity between domains, which causes negative transfer and feature divergence. In the experiments, the maximum number of iterations was set to 30. The target-specific classifier was found to gain more weight over the iterations: on average, the weights converged to w_s = 0.22 and w_t = 0.78.

Amazon dataset results and analysis: The table compares the performance of the individual classifiers and the ensemble when training on the Books domain and testing across the other domains. C_s and C_t were evaluated on the different domains before the iterative learning process. The ensemble achieves better accuracy than the individual classifiers, further validating that target-specific features are more discriminative than the shared features for classifying target-domain instances.

Effects of different components of the algorithm on the Amazon review dataset: Effect of learning target-specific features: the results show that iteratively learning the target-specific feature representation (slow transfer, as opposed to one-shot transfer) yields better performance across the different cross-domain classification tasks.

Effects of different components of the algorithm (contd.): Besides the learning of target-specific features, the effect of similarity on performance, the effect of varying the threshold, and the effect of using different shared representations were studied, and the proposed algorithm was found to perform better throughout.

Results on the 20 Newsgroups dataset For classification, the data was divided into six different datasets, with the top two categories in each picked as the two classes. The data was further segregated by combining sub-categories, where each sub-category was considered a different domain. 4/5 of the source and target data was used for learning the shared representation, and results were reported on the remaining 1/5 of test data. Since the different domains are crafted out of sub-categories of the same dataset, the domains are exceedingly similar, and therefore the baseline accuracy was relatively better than that on the other two datasets.

Results on the 20 Newsgroups dataset (contd.) The proposed algorithm still yielded an improvement of at least 10.8% over the baseline accuracy, and also outperformed the other adaptation approaches.

Results on Real-World Data The proposed algorithm iteratively learned discriminative target-specific features from the data, translating into an improvement of at least 6.4% and 3.5% over the baseline and SCL respectively in the reported experiments.

Conclusion