Subjectivity Recognition on Word Senses via Semi-supervised Mincuts Fangzhong Su and Katja Markert School of Computing, University of Leeds Human Language.

Slides:



Advertisements
Similar presentations
Latent Variables Naman Agarwal Michael Nute May 1, 2013.
Advertisements

SI/EECS 767 Yang Liu Apr 2,  A minimum cut is the smallest cut that will disconnect a graph into two disjoint subsets.  Application:  Graph partitioning.
Recognizing Human Actions by Attributes CVPR2011 Jingen Liu, Benjamin Kuipers, Silvio Savarese Dept. of Electrical Engineering and Computer Science University.
GermanPolarityClues A Lexical Resource for German Sentiment Analysis
Farag Saad i-KNOW 2014 Graz- Austria,
Distant Supervision for Emotion Classification in Twitter posts 1/17.
Data Mining and Text Analytics in Music Audi Sugianto and Nicholas Tawonezvi.
Sentiment Analysis An Overview of Concepts and Selected Techniques.
Made with OpenOffice.org 1 Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees Pacific-Asia Knowledge Discovery and Data Mining.
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts 04 10, 2014 Hyun Geun Soo Bo Pang and Lillian Lee (2004)
Self Taught Learning : Transfer learning from unlabeled data Presented by: Shankar B S DMML Lab Rajat Raina et al, CS, Stanford ICML 2007.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Semi-Supervised Learning Using Randomized Mincuts Avrim Blum, John Lafferty, Raja Reddy, Mugizi Rwebangira.
Learning Subjective Nouns using Extraction Pattern Bootstrapping Ellen Riloff, Janyce Wiebe, Theresa Wilson Presenter: Gabriel Nicolae.
Dept. of Computer Science & Engg. Indian Institute of Technology Kharagpur Part-of-Speech Tagging and Chunking with Maximum Entropy Model Sandipan Dandapat.
1 Automated Feature Abstraction of the fMRI Signal using Neural Network Clustering Techniques Stefan Niculescu and Tom Mitchell Siemens Medical Solutions,
Semi-Supervised Learning D. Zhou, O Bousquet, T. Navin Lan, J. Weston, B. Schokopf J. Weston, B. Schokopf Presents: Tal Babaioff.
Multi-view Exploratory Learning for AKBC Problems Bhavana Dalvi and William W. Cohen School Of Computer Science, Carnegie Mellon University Motivation.
Distributed Representations of Sentences and Documents
Scalable Text Mining with Sparse Generative Models
Mining and Summarizing Customer Reviews
A Joint Model of Feature Mining and Sentiment Analysis for Product Review Rating Jorge Carrillo de Albornoz Laura Plaza Pablo Gervás Alberto Díaz Universidad.
(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence
Multiclass object recognition
Mining and Summarizing Customer Reviews Minqing Hu and Bing Liu University of Illinois SIGKDD 2004.
Identifying Computer Graphics Using HSV Model And Statistical Moments Of Characteristic Functions Xiao Cai, Yuewen Wang.
Aiding WSD by exploiting hypo/hypernymy relations in a restricted framework MEANING project Experiment 6.H(d) Luis Villarejo and Lluís M à rquez.
Thien Anh Dinh1, Tomi Silander1, Bolan Su1, Tianxia Gong
Watch, Listen and Learn Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney -Pratiksha Shah.
2007. Software Engineering Laboratory, School of Computer Science S E Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying.
Boris Babenko Department of Computer Science and Engineering University of California, San Diego Semi-supervised and Unsupervised Feature Scaling.
 Text Representation & Text Classification for Intelligent Information Retrieval Ning Yu School of Library and Information Science Indiana University.
LOGO Ensemble Learning Lecturer: Dr. Bo Yuan
Xiaoxiao Shi, Qi Liu, Wei Fan, Philip S. Yu, and Ruixin Zhu
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.
A Weakly-Supervised Approach to Argumentative Zoning of Scientific Documents Yufan Guo Anna Korhonen Thierry Poibeau 1 Review By: Pranjal Singh Paper.
1 Co-Training for Cross-Lingual Sentiment Classification Xiaojun Wan ( 萬小軍 ) Associate Professor, Peking University ACL 2009.
Unsupervised Constraint Driven Learning for Transliteration Discovery M. Chang, D. Goldwasser, D. Roth, and Y. Tu.
Xiangnan Kong,Philip S. Yu Department of Computer Science University of Illinois at Chicago KDD 2010.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
Bo Pang , Lillian Lee Department of Computer Science
Opinion Holders in Opinion Text from Online Newspapers Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen.
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
Paired Sampling in Density-Sensitive Active Learning Pinar Donmez joint work with Jaime G. Carbonell Language Technologies Institute School of Computer.
1/21 Automatic Discovery of Intentions in Text and its Application to Question Answering (ACL 2005 Student Research Workshop )
Online Multiple Kernel Classification Steven C.H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang Machine Learning (2013) Presented by Audrey Cheong Electrical.
Hedge Detection with Latent Features SU Qi CLSW2013, Zhengzhou, Henan May 12, 2013.
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
CSC 594 Topics in AI – Text Mining and Analytics
Exploiting Ontologies for Automatic Image Annotation Munirathnam Srikanth, Joshua Varner, Mitchell Bowden, Dan Moldovan Language Computer Corporation SIGIR.
Network Community Behavior to Infer Human Activities.
Comparative Experiments on Sentiment Classification for Online Product Reviews Hang Cui, Vibhu Mittal, and Mayur Datar AAAI 2006.
Learning Subjective Nouns using Extraction Pattern Bootstrapping Ellen Riloff School of Computing University of Utah Janyce Wiebe, Theresa Wilson Computing.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.
SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining
1 Measuring the Semantic Similarity of Texts Author : Courtney Corley and Rada Mihalcea Source : ACL-2005 Reporter : Yong-Xiang Chen.
From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:
Text Categorization by Boosting Automatically Extracted Concepts Lijuan Cai and Tommas Hofmann Department of Computer Science, Brown University SIGIR 2003.
Virtual Examples for Text Classification with Support Vector Machines Manabu Sassano Proceedings of the 2003 Conference on Emprical Methods in Natural.
Optimal Reverse Prediction: Linli Xu, Martha White and Dale Schuurmans ICML 2009, Best Overall Paper Honorable Mention A Unified Perspective on Supervised,
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks EMNLP 2008 Rion Snow CS Stanford Brendan O’Connor Dolores.
Word Sense and Subjectivity (Coling/ACL 2006) Janyce Wiebe Rada Mihalcea University of Pittsburgh University of North Texas Acknowledgements: This slide.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
A Document-Level Sentiment Analysis Approach Using Artificial Neural Network and Sentiment Lexicons Yan Zhu.
iSRD Spam Review Detection with Imbalanced Data Distributions
Concave Minimization for Support Vector Machine Classifiers
Presentation transcript:

Subjectivity Recognition on Word Senses via Semi-supervised Mincuts Fangzhong Su and Katja Markert School of Computing, University of Leeds Human Language Technologies: North American Chapter of the ACL 2009 (NAACLHLT09)

Agenda Intrudoction Semi-supervised mincuts Experiments and evaluation Conclusion

Subjective content Any expression of a private state such as opinion or belief. Typical research topics – Identification of subjective content. – Determination of its polarity.

Identification of subjective content Automatic identification of subjective content often relies on word indicators. – Unigrams (Pang et al., 2002) – Predetermined sentiment lexica (Wilson et al., 2005) Word-based indicators can be misleading. – Irony and negation can reverse subjectivity or polarity indications. – Different word senses of a single word can actually be of different subjectivity or polarity.

Subjectivity ambiguous A word that has at least one subjective and at least one objective sense such as positive. – Positive, electropositive—having a positive electric charge. (objective) – Plus, positive—involving advantage or good. (subjective) Over 30% of words in dataset are subjective- ambiguous.

Research goal Propose a semi-supervised approach based on minimum cut in a lexical relation graph to assign subjectivity (subjective/objective) labels to word senses. This algorithm outperforms supervised minimum cuts and standard supervised, non-graph classification algorithms such as SVM. The semi-supervised approach achieves the same results as the supervised framework with less than 20% of the training data.

Minimum cuts S T

Subjectivity Recognition using Mincuts Religious—extremely scrupulous and conscientious. Scrupulous—having scruples; arising from a sense of right and wrong; principled. Flicker—a momentary flash of light.

Selecting unlabeled data Scan each sense in dataset, and collect all senses related to it via one of WordNet relations.

Weighting the edges to the classification vertices Classify the unlabeled and test vertices using SVM with the labeled data as training data.

Features Lexical features (L) – A bag-of-words representation of the sense glosses with sop word filtering. Relation features (R) – How many relations of the type the sense has to senses in the subjective or objective part of the training set. – A binary feature describe whether a noun/verb sense is a hyponym of the an element of the 7 subjective verb and noun senses which are close to the root in WordNet’s hypernym tree. Monosemous feature (M) – If a monosemous word is subjective, obviously its single sense is subjective.

Weighting the WordNet relations

Relation weights

Classification with Mincuts

Datasets Micro-WNOp – 298 words with 703 objective and 358 subjective WordNet senses. Wiebe and Mihalcea (2006) – 60 words with 236 objective and 92 subjective WordNet senses. – Only noun and verb senses.

Results

Accuracy for different part-of-speech

Accuracy with different sizes of training data

Performance with different sizes of unlabeled data Use a subset of the 10 relations to generate the unlabeled data and edges between example vertices.

Performance with different sizes of unlabeled data Use all the 10 relations to generate the unlabeled data, then randomly select subsets of them.

Learning curve with different sizes of unlabeled data

Conclusion The proposed semi-supervised approach is significantly better than a standard supervised classification framework as well as a supervised Mincut. Achieves a 40% reduction in error rates (from 25% to 15%). Achieves the same results as the supervised framework with less than 20% of the training data. Outperforms the previous state-of-the-art approaches (Wiebe & Mihalcea, 2006). – F-measure 56% vs. 52%