Towards a Robust Metric of Opinion Kamal Nigam, Matthew Hurst Intelliseek Applied Research Center AAAI Spring Symposium on Exploring Attitude and Affect in Text, 2004

Abstract
This paper describes an automated system for detecting polar expressions about a topic of interest. The two elementary components are:
- a shallow-NLP polar language extraction system
- a machine-learning-based topic classifier
Based on these components, we discuss how to measure the overall sentiment about a particular topic as expressed in online messages authored by many different people. The goal of this research is to create text analysis techniques that facilitate real-world market research over first-person commentary from the internet.

Introduction
The general approach is (a minimal end-to-end sketch follows):
- Segment the corpus into individual expressions (sentence level).
- Use a general-purpose polar language module and a topic classifier to identify individual polar expressions about the topic of interest.
- Aggregate these individual expressions into a single score.
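A minimal sketch of this three-step pipeline; the `is_topical` and `polarity_of` functions are hypothetical stand-ins for the modules developed in the following slides, not the authors' code:

```python
# High-level sketch of the approach (hypothetical helper names).
def score_topic(corpus, is_topical, polarity_of):
    """Segment into sentences, keep topical polar expressions, aggregate."""
    positive = negative = 0
    for message in corpus:
        for sentence in message.split("."):  # crude sentence segmentation
            if not sentence.strip() or not is_topical(sentence):
                continue
            polarity = polarity_of(sentence)  # '+', '-', or None
            if polarity == "+":
                positive += 1
            elif polarity == "-":
                negative += 1
    # One simplistic aggregate (revisited later): ratio of positive to negative.
    return positive / max(negative, 1)
```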

Classes of Polar Expressions
We focus on two general aspects of expression:
- Opinion: statements used by the author to communicate opinion, reflecting a personal state. "I love this car." "I hate the BrightScreen LCD's resolution." (a negative opinion)
- Evaluative factual: statements that are objective but describe what is generally held to be a desirable or undesirable state. "The screen is broken." (describing an undesirable state) "My BrightScreen LCD had many dead pixels." (negatively oriented; such judgments are context- and time-dependent)

Definition of Polar Expressions
- The first dimension is explicit versus implicit language. Explicit expressions include direct statements ("I like it.") and subjective evaluative language ("It is good."). Implicit expressions, on the other hand, involve sarcasm, irony, idiom and other deeper cultural referents. Example: "It's great if you like dead batteries."
- The next dimension is the kind of concept about which an opinion is being expressed. This may be an established 'real world' concept, or it may be a hypothetical 'possible worlds' concept. Examples: "I love my new car." versus "I am searching for the best possible deal."

Recognizing Polar Language
A third aspect that concerns us is modality, conditionals and temporal aspects of the language used:
- "I might enjoy it" has a clear evaluative element (enjoy it) but does not express a definite opinion.
- "If it were larger..." describes a condition rather than an actual state.
Attribution is also a matter of concern:
- "I think you will like it." might mean the author likes it and assumes I have the same taste, and so on.

Definition of Polar Expressions
A polar phrase extraction system was implemented. A lexicon is developed which is tuned to the domain being explored. Each item in the lexicon is a pairing of a word and its POS tag.

Recognizing Polar Language
The interpretation phase then carries out two tasks. The first is the tokenization step (a toy sketch follows):
This car is really great.
→ {this, car, is, really, great}
→ {this_DT, car_NN, is_VB, really_RR, great_JJ}
→ {this_DT, car_NN, is_VB, really_RR, great_JJ;+}
The second is the elevation of semantic information from lower constituents to higher ones, applying negation logic where appropriate, and assembling larger constituents from smaller ones. Rules are applied in a fixed order.
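A toy sketch of the tokenization and lexicon-tagging step; the lexicon and POS table here are illustrative assumptions (a real system would run a proper POS tagger):

```python
# Hedged sketch, not the authors' code: lexicon entries pair (word, POS)
# with a polarity mark, which is appended to the token as "word_POS;+".
LEXICON = {("great", "JJ"): "+", ("poor", "JJ"): "-", ("broken", "JJ"): "-"}
TOY_POS = {"this": "DT", "car": "NN", "is": "VB", "really": "RR", "great": "JJ"}

def tag_sentence(sentence):
    """Tokenize, attach POS tags, and mark lexicon hits with their polarity."""
    tagged = []
    for word in sentence.lower().rstrip(".").split():
        pos = TOY_POS.get(word, "NN")  # placeholder for a real POS tagger
        polarity = LEXICON.get((word, pos))
        tagged.append(f"{word}_{pos}" + (f";{polarity}" if polarity else ""))
    return tagged

print(tag_sentence("This car is really great."))
# ['this_DT', 'car_NN', 'is_VB', 'really_RR', 'great_JJ;+']
```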

Recognizing Polar Language
Chunking phase:
{(this DT) DET, (car NN) BNP, (is VB) BVP, (really RR, great JJ;+) BADJP}
The chunked input is processed to form higher-order groupings from a limited set of syntactic patterns. The simple syntactic patterns used to combine semantic features are:
- Predicative modification (it is good)
- Attributive modification (a good car)
- Equality (it is a good car)
- Polar clause (it broke my car)
Negations of the following types are captured by the system (a sketch of the negation logic follows the list):
- Verbal attachment (it is not good, it isn't good)
- Adverbial negatives (I never really liked it, it is never any good)
- Determiners (it is no good)
- Superordinate scope (I don't think they made their best offer)
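A minimal sketch of negation propagation over chunks; the chunk representation and negator list are assumptions for illustration, not the authors' data structures:

```python
# Hedged sketch: flip the polarity of chunks in the scope of a negator.
NEGATORS = {"not", "never", "no", "isn't", "don't"}

def apply_negation(chunks):
    """chunks: list of (text, chunk_type, polarity); polarity is '+', '-', or None."""
    in_scope, result = False, []
    for text, chunk_type, polarity in chunks:
        if any(token in NEGATORS for token in text.lower().split()):
            in_scope = True
        if polarity and in_scope:
            polarity = "-" if polarity == "+" else "+"
        result.append((text, chunk_type, polarity))
    return result

chunks = [("it", "BNP", None), ("is not", "BVP", None), ("good", "BADJP", "+")]
print(apply_negation(chunks))  # the '+' on 'good' is flipped to '-'
```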

Topic Detection in Online Messages
The Winnow classifier (Littlestone 1988; Blum 1997; Dagan, Karov, & Roth 1997) is an online learning algorithm that finds a linear separator between the class of documents that are topical and the class of documents that are irrelevant.
- Documents are modeled with the standard bag-of-words representation, which discards the ordering of words and notes only whether or not a word occurs in a document.
- c_w(x) = 1 if word w occurs in document x, and c_w(x) = 0 otherwise.
- f_w is the weight for feature w; the classifier computes h(x) = sum_w f_w * c_w(x). If h(x) > V, the classifier predicts topical; otherwise it predicts irrelevant.
The basic Winnow algorithm proceeds as follows (a sketch follows the list):
- Initialize all f_w to 1.
- For each labeled document x in the training set, calculate h(x).
- If the document is topical but Winnow predicts irrelevant, update each weight f_w where c_w(x) = 1 by f_w *= 2.
- If the document is irrelevant but Winnow predicts topical, update each weight f_w where c_w(x) = 1 by f_w /= 2.
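A compact sketch of these update rules; setting the threshold V to the vocabulary size is an assumption (a common convention for Winnow), not stated in the slides:

```python
# Hedged Winnow sketch implementing the promotion/demotion rules above.
class Winnow:
    def __init__(self, vocabulary):
        self.weights = {w: 1.0 for w in vocabulary}  # all f_w start at 1
        self.threshold = float(len(vocabulary))      # assumed value of V

    def score(self, words):
        # h(x) = sum of weights of vocabulary words present in the document.
        return sum(self.weights.get(w, 0.0) for w in set(words))

    def predict(self, words):
        return self.score(words) > self.threshold   # True = topical

    def train(self, words, is_topical):
        predicted = self.predict(words)
        if is_topical and not predicted:             # promotion: f_w *= 2
            for w in set(words) & self.weights.keys():
                self.weights[w] *= 2.0
        elif not is_topical and predicted:           # demotion: f_w /= 2
            for w in set(words) & self.weights.keys():
                self.weights[w] /= 2.0
```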

If a document is judged by the classifier to be irrelevant, we predict that all sentences in that document are also irrelevant. If a document is judged to be topical, then we further examine each sentence in that document. Given each sentence and our text classifier, we simply form a bag-of-words representation of the sentence, as if an entire document consisted of that single sentence. We then run the classifier on the derived pseudo-document: if it predicts topical, we label the sentence topical; otherwise we label the sentence irrelevant. Note the distribution mismatch between training samples (whole documents) and testing samples (single sentences). This adapts a given document classifier into a high-precision/low-recall sentence classifier (a sketch follows the list):
- The set of topical training documents is relatively small.
- Because a sentence is shorter than an average document, the words in the sentence need to be very topical in order for the classifier to predict positive.
- This leads directly to a loss in recall.
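A sketch of this document-to-sentence cascade, reusing the Winnow sketch above and a crude period-based sentence splitter (an assumption; the slides do not specify one):

```python
# Hedged sketch: cascade a document-level classifier down to sentences.
def classify_sentences(document, classifier):
    """Return (sentence, is_topical) pairs for every sentence in the document."""
    if not classifier.predict(document.lower().split()):
        # Irrelevant document: every sentence is predicted irrelevant.
        return [(s.strip(), False) for s in document.split(".") if s.strip()]
    results = []
    for sentence in document.split("."):
        if not sentence.strip():
            continue
        # Treat the sentence as a pseudo-document and rerun the classifier.
        is_topical = classifier.predict(sentence.lower().split())
        results.append((sentence.strip(), is_topical))
    return results
```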

Intersection of Topic and Polarity
We assert that a sentence judged to be polar and also judged to be topical is indeed expressing polarity about the topic.

Experimental Testbed
- Automotive-domain messages from online resources (usenet, online message boards, etc.).
- We trained the topic classifier and prepared the polarity lexicon for this domain.
- We hand-labeled a separate random sample of 822 messages for topicality; 88 (11%) were topical.
- We hand-labeled all 1298 sentences in the 88 topical messages for topicality and polarity (positive and/or negative).
- For the 7649 sentences in messages that were not topical, every sentence was automatically labeled as topically irrelevant, and thus as containing no topical polarity either.
- We evaluate the polarity module in isolation using the 1298 sentences with complete polarity labels, and use the full dataset for evaluating topic detection and its combination with polarity.
- To evaluate the difficulty of the task for humans, we had a second labeler repeat the hand-labeling on the 1298 sentences.

Experimental Results
Sentences predicted as topical positive:
- The B&W display is great in the sun.
- Although I really don't care for a cover, I like what COMPANY-A has done with the rotating screen, or even better yet, the concept from COMPANY-B with the horizontally rotating screen and large foldable keyboard.
- At that time, superior screen.
Sentences predicted as topical negative:
- Compared to the PRODUCT's screen this thing is very very poor.
- I never had a problem with the PRODUCT-A, but did encounter the "Dust/Glass Under The Screen Problem" associated with PRODUCT-B.
- broken PRODUCT screen
- It is very difficult to take a picture of a screen.
- In multimedia I think the winner is not that clear when you consider that PRODUCT-A has a higher resolution screen than PRODUCT-B and built in camera.

Experimental Results
- Precision is very similar to human agreement.
- Recall is significantly lower than human performance, largely due to indirect language (implicit polar language; see the definitions above).
- Recall of negative polarity is quite a bit lower than that of positive polarity, confirming the observation that language used for negative commentary is much more varied than language used for positive commentary.

Experimental Results
- Recall at the sentence level is lower than at the message level; adding anaphora resolution should improve recall.
- We used the truth data to test the basic assumption for correlating topic and polar language: that any expression of polarity in a topical sentence is expressing polarity about the topic itself.
- Topical sentences containing positive polarity were about the topic 91% of the time (90/99); for negative polarity, 80% (60/75). These figures are an upper bound on the precision of the combined system.

Experimental Results
The precision for both positive and negative topical extraction is lower than for positive and negative extraction in isolation, as expected when errors from the topic and polarity modules compound.

Metric for Topic and Polarity: Discussion
How can we use the polarity and topic modules to compile an aggregate score for a topic based on the expressions contained in the data? We envision an aggregate topical-orientation metric as a function of:
- the total number of expressions in a domain,
- the true frequency of expressions that are topical,
- the true frequency of topical expressions that are positively oriented for that topic, and
- the true frequency of topical expressions that are negatively oriented for that topic.
A simplistic metric might be just the ratio of positive to negative expressions about a topic. The metric should not only support a single estimate of orientation, but also include some confidence in its measure. This will allow us to compare two or more competing topics of interest and say with a quantifiable probability that one topic is more favorably received than another (a sketch of such a comparison follows).
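One way to make such comparisons quantifiable, sketched here as an assumption rather than the paper's method, is to place Beta posteriors over each topic's positive rate and estimate by Monte Carlo sampling the probability that one exceeds the other:

```python
# Hedged sketch: compare two topics' positive rates with Beta posteriors.
import random

def prob_a_more_favorable(pos_a, neg_a, pos_b, neg_b, samples=100_000):
    """Monte Carlo estimate of P(positive rate of topic A > that of topic B)."""
    wins = 0
    for _ in range(samples):
        # Beta(pos + 1, neg + 1) is the posterior under a uniform prior.
        rate_a = random.betavariate(pos_a + 1, neg_a + 1)
        rate_b = random.betavariate(pos_b + 1, neg_b + 1)
        wins += rate_a > rate_b
    return wins / samples

print(prob_a_more_favorable(90, 60, 40, 50))  # illustrative counts only
```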

Metric for Topic and Polarity: Discussion
We estimate the true frequencies of topic and polarity as an exercise in Bayesian statistics. The model we propose has a set of parameters that are fixed for each domain and topic:
- With probability P(topic), any expression will be written about the topic of interest.
- With probability P(pos|topic), any topical expression will be positive about the topic.
- With probability P(neg|topic), any topical expression will be negative about the topic.

Metric for Topic and Polarity: Discussion
We observe expressions only through the output of our topic and polarity modules, which clouds the data:
- With probability P(topic, false pos) we observe a truly irrelevant expression as a topical one.
- With probability P(topic, false neg) we observe a truly topical expression as irrelevant.
- With probability P(pos, false pos) we observe a positive topical expression when there is none.
- With probability P(pos, false neg) we fail to observe a positive topical expression when there really is one.
- With probability P(neg, false pos) we observe a negative topical expression when there is none.
- With probability P(neg, false neg) we fail to observe a negative topical expression when there really is one.
Using this explicit generative process of the data, we can use our observed data with standard statistical techniques to estimate the true P(topic), P(pos|topic), and P(neg|topic). (A minimal point-estimate sketch follows.)
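A minimal point-estimate sketch of this correction, assumed for illustration (the slides leave the estimator to "standard statistical techniques"): under the generative model, observed = true * (1 - P(false neg)) + (1 - true) * P(false pos), which inverts directly:

```python
# Hedged sketch: invert the generative model to recover a true rate from an
# observed rate, given known false-positive and false-negative probabilities.
def true_rate(observed_rate, p_false_pos, p_false_neg):
    """Solve observed = true*(1 - fn) + (1 - true)*fp for the true rate."""
    estimate = (observed_rate - p_false_pos) / (1.0 - p_false_pos - p_false_neg)
    return min(max(estimate, 0.0), 1.0)  # clamp to a valid probability

# Example: 30% of expressions flagged topical, with a 10% false-positive and
# a 20% false-negative rate, implies a true topical rate of about 28.6%.
print(true_rate(0.30, 0.10, 0.20))
```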

Conclusion and Future Work
There are a number of necessary steps required to complete the system and to improve its performance:
- improve polarity recognition,
- improve recall for topic detection, and
- implement and test the metric described above.