Sentiment and Opinion Sep 18, 2012 Analysis of Social Media Seminar William Cohen.


First assignment: due Friday Go to Create an account for yourself –use andrew id Go to your user page –Your real name & a link to your home page –Preferably a picture –Who you are and what you hope to get out of the class (Let me know if you’re just auditing) –Any special skills you have, research interests that you have, related projects you have been or might be working on, etc.

Outline Announcements Recap –With a little more on word senses More discussion: what exactly is subjectivity, sentiment and polarity? –Annotating a corpus for subjectivity –Fine-grained sentiment for reviews More distinctions: –Agreement and discourse 3

In our previous episode… 4

5 Motivations: sentiment common… Analysis : modeling & learning Communication, Language People Networks Social Media

6 …and important… Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike? Review classification: Is a review positive or negative toward the movie? Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down? Etc. [These are all ways to summarize one sort of content that is common on blogs, bboards, newsgroups, etc. –W]

…and non-trivial 7

What units do we attach sentiment to? Individual words (“nice”, “comfortable”) Phrases (“slow service”) Sentences? Documents? … ? 8

9 ICWSM Hatzivassiloglou & McKeown 1997 Build a graph of adjectives linked by the same or different semantic orientation (determined by conjunctions)… nice handsome terrible comfortable painful expensive fun scenic

10 Hatzivassiloglou & McKeown 1997 …and a clustering algorithm partitions the adjectives into two subsets nice handsome terrible comfortable painful expensive fun scenic slow +
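The two slides above can be sketched as a signed graph that is split into two groups. This is a minimal sketch, not the paper's method: the links below are hypothetical toy data, and a simple breadth-first two-coloring stands in for the clustering algorithm. The function name `partition` and the `LINKS` list are assumptions made for illustration.

```python
from collections import deque

# Hypothetical conjunction evidence: (adjective, adjective, same_orientation).
# "and" suggests the same orientation, "but" suggests the opposite one.
LINKS = [
    ("nice", "comfortable", True),        # "nice and comfortable"
    ("nice", "handsome", True),
    ("fun", "scenic", True),
    ("scenic", "nice", True),
    ("terrible", "painful", True),
    ("painful", "expensive", True),
    ("expensive", "slow", True),
    ("comfortable", "expensive", False),  # "comfortable but expensive"
    ("fun", "slow", False),
]

def partition(links):
    """Split adjectives into two groups consistent with the signed links."""
    graph = {}
    for a, b, same in links:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))
    side = {}
    for start in graph:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor, same in graph[node]:
                if neighbor not in side:
                    # Same-orientation links copy the side, others flip it.
                    side[neighbor] = side[node] if same else 1 - side[node]
                    queue.append(neighbor)
                # Conflicting evidence is simply ignored in this sketch.
    group_0 = {w for w, s in side.items() if s == 0}
    group_1 = {w for w, s in side.items() if s == 1}
    return group_0, group_1

group_a, group_b = partition(LINKS)
print(sorted(group_a))
print(sorted(group_b))
```

Note that the partition alone does not say which group is the positive one; that label has to come from somewhere else.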

Word senses Senses

Senses Is this polar?

Non-subjective senses of brilliant 1. Method for identifying brilliant material in paint - US Patent In a classic pasodoble, an opening section in the minor mode features a brilliant trumpet melody, while the second section in the relative major begins with the violins.

Subjective Sense Examples His alarm grew Alarm, dismay, consternation – (fear resulting from the awareness of danger) –Fear, fearfulness, fright – (an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight)) He was boiling with anger Seethe, boil – (be in an agitated emotional state; “The customer was seething with anger”) –Be – (have the quality of being; (copula, used with an adjective or a predicate noun); “John is rich”; “This is not a good answer”)

Objective Sense Examples The alarm went off Alarm, warning device, alarm system – (a device that signals the occurrence of some undesirable event) –Device – (an instrumentality invented for a particular purpose; “the device is small enough to wear on your wrist”; “a device intended to conserve water”) The water boiled Boil – (come to the boiling point and change from a liquid to vapor; “Water boils at 100 degrees Celsius”) –Change state, turn – (undergo a transformation or a change of position or action)

Objective Senses: Observation We don’t necessarily expect phrases/sentences containing objective senses to be objective –Will someone shut that darn alarm off? –Can’t you even boil water? Subjective, but not due to alarm and boil

Objective Sense Definition When the sense is used in a text or conversation, we don’t expect it to express subjectivity and, if the phrase/sentence containing it is subjective, the subjectivity is due to something else.

18 Hatzivassiloglou & McKeown 1997 Later/related work: –LIWC, General Inquirer, other hand-built lexicons –Turney & Littman, TOIS 2003: Similar performance with 100M word corpus and PMI; higher accuracy if you allow abstention on 25% of the “hard” cases. –Kamps et al, LREC 04: Determine orientation by graph analysis of WordNet (distance to “good”, “bad” in graph determined by synonymy relation) –SentiWordNet, Esuli and Sebastiani, LREC 06: Similar to Kamps et al, also using a BOW classifier and WordNet glosses (definitions).
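The graph-distance idea attributed to Kamps et al can be sketched roughly as follows. The synonymy graph here is a hypothetical toy, not WordNet, and the scoring formula (difference of the distances to “bad” and “good”, divided by the distance between “good” and “bad”) is my reading of the approach rather than something stated on the slide.

```python
from collections import deque

# Hypothetical synonymy graph; real WordNet links would replace this.
SYNONYMS = {
    "good": ["fine", "decent"],
    "fine": ["good", "okay"],
    "decent": ["good", "okay"],
    "okay": ["fine", "decent", "mediocre"],
    "mediocre": ["okay", "poor"],
    "poor": ["mediocre", "bad"],
    "bad": ["poor"],
}

def distance(graph, source, target):
    """Shortest synonymy path length via breadth-first search (None if unreachable)."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return None

def orientation(graph, word):
    """Score in [-1, 1]: positive means closer to "good" than to "bad".

    Assumes the word is connected to both reference words.
    """
    d_good = distance(graph, word, "good")
    d_bad = distance(graph, word, "bad")
    span = distance(graph, "good", "bad")
    return (d_bad - d_good) / span

for w in ("fine", "okay", "poor"):
    print(w, orientation(SYNONYMS, w))
```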

What units do we attach sentiment to? Individual words (“nice”, “comfortable”) Phrases (“slow service”) Sentences? Documents? … ? 19

20 Turney 2002 Goal: classify reviews as “positive” or “negative”. –Epinions “[not] recommended” as given by authors. Method: –Find (possibly) meaningful phrases from review (e.g., “bright display”, “inspiring lecture”, …), based on POS patterns, like ADJ NOUN –Estimate “semantic orientation” of each candidate phrase Based on pointwise mutual information: AltaVista counts of phrase’s cooccurrence with “excellent”, “poor” –Assign overall orientation of review by averaging orientation of the phrases in the review
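The PMI step above can be sketched like this. All hit counts are hypothetical stand-ins for search-engine queries, and the small smoothing constant is an assumption added to avoid division by zero; the names `HITS`, `NEAR_HITS`, `semantic_orientation` and `classify_review` are illustrative only.

```python
import math

# Hypothetical hit counts standing in for search-engine queries.
HITS = {"excellent": 1_000_000, "poor": 1_000_000}
NEAR_HITS = {
    ("bright display", "excellent"): 400,
    ("bright display", "poor"): 100,
    ("inspiring lecture", "excellent"): 90,
    ("inspiring lecture", "poor"): 10,
    ("slow service", "excellent"): 20,
    ("slow service", "poor"): 320,
}

def semantic_orientation(phrase, smoothing=0.01):
    """PMI(phrase, "excellent") - PMI(phrase, "poor"), from co-occurrence counts."""
    near_excellent = NEAR_HITS.get((phrase, "excellent"), 0) + smoothing
    near_poor = NEAR_HITS.get((phrase, "poor"), 0) + smoothing
    ratio = (near_excellent * HITS["poor"]) / (near_poor * HITS["excellent"])
    return math.log2(ratio)

def classify_review(phrases):
    """Average the orientation of the extracted phrases and threshold at zero."""
    average = sum(semantic_orientation(p) for p in phrases) / len(phrases)
    return "positive" if average > 0 else "negative"

print(classify_review(["bright display", "inspiring lecture"]))
print(classify_review(["slow service"]))
```

Phrase extraction by POS pattern is left out; the sketch starts from phrases that are already extracted.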

22 Pang et al EMNLP 2002

23 Pang & Lee EMNLP 2004

24 Methods: 2002 Movie review classification as pos/neg. Method one: count human-provided polar words (sort of like Turney): –E.g., “love, wonderful, best, great, superb, still, beautiful” vs “bad, worst, stupid, waste, boring, ?, !” gives 69% accuracy on 700 positive / 700 negative movie reviews Method two: plain ol’ text classification –E.g., Naïve Bayes bag of words: 78.7% accuracy; SVM-lite “set of words”: 82.9% was best result –Adding bigrams and/or POS tags doesn’t change things much.
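Method one is simple enough to sketch directly. The two word lists are the ones quoted on the slide; the tokenizer and the handling of ties are assumptions made for this illustration.

```python
import re

# Word lists as quoted on the slide.
POSITIVE = {"love", "wonderful", "best", "great", "superb", "still", "beautiful"}
NEGATIVE = {"bad", "worst", "stupid", "waste", "boring", "?", "!"}

def tokenize(text):
    """Lowercase words, plus "?" and "!" as separate tokens."""
    return re.findall(r"[a-z']+|[?!]", text.lower())

def classify(text):
    """Compare counts of positive and negative indicator tokens."""
    tokens = tokenize(text)
    positive_count = sum(token in POSITIVE for token in tokens)
    negative_count = sum(token in NEGATIVE for token in tokens)
    if positive_count == negative_count:
        return "tie"  # assumption: the slide does not say how ties are resolved
    return "positive" if positive_count > negative_count else "negative"

print(classify("A wonderful film with a superb cast."))
print(classify("What a waste of time! Boring and stupid."))
```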

25 Pang & Lee EMNLP 2004 Can you capture the discourse in the document? –Expect longish runs of subjective text and longish runs of objective text. –Can you tell which is which? Idea: –Classify sentences as subjective/objective, based on two corpora: short biased reviews, and IMDB plot summaries. –Smooth classifications to promote longish homogeneous sections. –Classify polarity based on the K “most subjective” sentences
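A rough sketch of the smooth-then-select idea follows. It assumes per-sentence subjectivity scores already exist (the numbers below are hypothetical classifier outputs), and it swaps in a plain moving average for the smoothing step; the paper itself formulates this step differently, as a graph cut.

```python
def smooth(scores, window=1):
    """Average each score with its neighbors to favor homogeneous runs."""
    smoothed = []
    for i in range(len(scores)):
        low = max(0, i - window)
        high = min(len(scores), i + window + 1)
        smoothed.append(sum(scores[low:high]) / (high - low))
    return smoothed

def top_k_subjective(sentences, scores, k):
    """Return the k sentences with the highest smoothed score, in document order."""
    smoothed = smooth(scores)
    ranked = sorted(range(len(sentences)), key=lambda i: smoothed[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

SENTENCES = [
    "The plot follows a detective.",
    "He moves to Paris.",
    "I loved every minute.",
    "The acting is superb.",
    "It is two hours long.",
]
SCORES = [0.1, 0.2, 0.9, 0.8, 0.3]  # hypothetical subjectivity scores

print(top_k_subjective(SENTENCES, SCORES, k=2))
```

The selected sentences would then be passed to a polarity classifier in place of the full review.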

What units do we attach sentiment to? Individual words (“nice”, “comfortable”) Phrases (“slow service”) Sentences? Documents? … ? 26

Outline Announcements Recap –With a little more on word senses More discussion: what exactly is subjectivity, sentiment and polarity? –Annotating a corpus for subjectivity –Fine-grained sentiment for reviews More distinctions: –Agreement and discourse 27

Manual and Automatic Subjectivity and Sentiment Analysis Jan Wiebe Josef Ruppenhofer Swapna Somasundaran University of Pittsburgh

29 Everyone knows that dragons don't exist. But while this simplistic formulation may satisfy the layman, it does not suffice for the scientific mind. The School of Higher Neantical Nillity is in fact wholly unconcerned with what does exist. Indeed, the banality of existence has been so amply demonstrated, there is no need for us to discuss it any further here. The brilliant Cerebron, attacking the problem analytically, discovered three distinct kinds of dragon: the mythical, the chimerical, and the purely hypothetical. They were all, one might say, nonexistent, but each nonexisted in an entirely different way... - Stanislaw Lem, “The Cyberiad”

30 Preliminaries What do we mean by subjectivity? The linguistic expression of somebody’s emotions, sentiments, evaluations, opinions, beliefs, speculations, etc. –Wow, this is my 4th Olympus camera. –Staley declared it to be “one hell of a collection”. –Most voters believe that he's not going to raise their taxes

31 Corpus Annotation Wiebe, Wilson, Cardie 2005 Annotating Expressions of Opinions and Emotions in Language Leaving aside what’s possible, what sort of inferences about sentiment, opinion, etc would we like to be able to make?

32 Overview Fine-grained: expression-level rather than sentence or document level –The photo quality was the best that I have seen in a camera. Annotate –expressions of opinions, evaluations, emotions –material attributed to a source, but presented objectively

33 Overview Fine-grained: expression-level rather than sentence or document level –The photo quality was the best that I have seen in a camera. Annotate –expressions of opinions, evaluations, emotions, beliefs –material attributed to a source, but presented objectively

34 Overview Opinions, evaluations, emotions, speculations are private states. They are expressed in language by subjective expressions. Private state: state that is not open to objective observation or verification. Quirk, Greenbaum, Leech, Svartvik (1985). A Comprehensive Grammar of the English Language.

35 Overview Focus on three ways private states are expressed in language –Direct subjective expressions –Expressive subjective elements –Objective speech events

36 Direct Subjective Expressions Direct mentions of private states The United States fears a spill-over from the anti-terrorist campaign. Private states expressed in speech events “I fear electoral fraud,” Tsvangirai said. Fear is a private state Fear is a private state but not of the author

37 Direct Subjective Expressions Direct mentions of private states The United States fears a spill-over from the anti-terrorist campaign. Private states expressed in speech events “We foresaw electoral fraud but not daylight robbery,” Tsvangirai said. This implies a private state, so it’s not direct. Fear is a private state

38 Expressive Subjective Elements [Banfield 1982] “We foresaw electoral fraud but not daylight robbery,” Tsvangirai said. The part of the US human rights report about China is full of absurdities and fabrications. Compare: “We foresaw difficulties with the electoral process but not to this extent,” Tsvangirai said. The part of the US human rights report about China contains many statements that we were unable to verify. Understood as implying a certain mental state

39 Objective Speech Events Material attributed to a source, but presented as objective fact The government, it added, has amended the Pakistan Citizenship Act 10 of 1951 to enable women of Pakistani descent to claim Pakistani nationality for their children born to foreign husbands. [What does this have to do with opinion? You need it to sort out who has opinions about what… -W]

An example… 40

41 Nested Sources “The report is full of absurdities,” Xirao-Nima said the next day. (Writer)

42 Nested Sources “The report is full of absurdities,” Xirao-Nima said the next day. (Writer, Xirao-Nima)

43 Nested Sources “The report is full of absurdities,” Xirao-Nima said the next day. (Writer Xirao-Nima)

44 “The report is full of absurdities,” Xirao-Nima said the next day.
Objective speech event anchor: the entire sentence source: implicit: true
Direct subjective anchor: said source: intensity: high expression intensity: neutral attitude type: negative target: report
Expressive subjective element anchor: full of absurdities source: intensity: high attitude type: negative
Attributes:
The anchor is the linguistic expression (the stretch of text) that tells us that there is a private state. [Where to ‘hang’ the annotation -W]
The source is the person to whom the private state is attributed. Note that this can be a chain of people.
The target is the content of the private state or what the private state is about.
Attitude type: If not specified, it is to be understood as neutral but can be set to positive or negative as required.
Intensity records the intensity of “the private state as a whole.”
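One way to hold these frames in code is a small record type. This is only a sketch: the class and field names are made up for illustration, and the source chains are an assumption read off the nested-sources slides, since the transcript leaves the source values blank.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Annotation:
    kind: str                                  # e.g. "direct subjective"
    anchor: str                                # stretch of text the frame hangs on
    source: Tuple[str, ...]                    # chain of nested sources
    intensity: Optional[str] = None
    expression_intensity: Optional[str] = None
    attitude_type: str = "neutral"             # neutral unless specified
    target: Optional[str] = None
    implicit: bool = False

SENTENCE = "“The report is full of absurdities,” Xirao-Nima said the next day."

# Source chains below are assumed, not taken from the slide.
frames = [
    Annotation("objective speech event", SENTENCE, ("writer",), implicit=True),
    Annotation("direct subjective", "said", ("writer", "Xirao-Nima"),
               intensity="high", expression_intensity="neutral",
               attitude_type="negative", target="report"),
    Annotation("expressive subjective element", "full of absurdities",
               ("writer", "Xirao-Nima"),
               intensity="high", attitude_type="negative"),
]

for frame in frames:
    print(frame.kind, "|", " > ".join(frame.source), "|", frame.attitude_type)
```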

Another example… 45

“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.

“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities. (Writer)

“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities. (writer, Xirao-Nima)

“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities. (writer, Xirao-Nima, US)

“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities. (writer, Xirao-Nima, US) (writer, Xirao-Nima) (Writer)

Objective speech event anchor: the entire sentence source: implicit: true Objective speech event anchor: said source: Direct subjective anchor: fears source: intensity: medium expression intensity: medium … “The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.

52 Corpus (version 2) English language versions of articles from the world press (187 news sources) Themes of the instructions: –No rules about how particular words should be annotated. –Don’t take expressions out of context and think about what they could mean, but judge them as they are used in that sentence. Kappa around 0.7 – 0.8.
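For reference, an agreement score of this kind can be computed as below. The slide does not say which kappa variant was used, so Cohen's kappa for two annotators is an assumption here, and the label sequences are invented toy data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented judgments from two annotators on ten expressions.
ANNOTATOR_A = ["subj"] * 5 + ["obj"] * 5
ANNOTATOR_B = ["subj"] * 4 + ["obj"] * 5 + ["subj"]

print(round(cohen_kappa(ANNOTATOR_A, ANNOTATOR_B), 3))
```

Here the annotators agree on 8 of 10 items, chance agreement is 0.5, and kappa comes out at about 0.6.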

53 Reasons for fine-grain annotation and analysis Turney, Pang et al: document D is about a known product P_D; sentiment refers to P_D. Life is more complicated: –“The part of the US human rights report about China is full of absurdities and fabrications”: What is “absurd & fabricated”? The part, the US, the report, or China? For sentiment about products we want to know what is good or bad: there are usually tradeoffs –Huge screen → very heavy –Very fast → really expensive

Outline Announcements Recap –With a little more on word senses More discussion: what exactly is subjectivity, sentiment and polarity? –Annotating a corpus for subjectivity –Fine-grained sentiment for reviews More distinctions: –Agreement and discourse 54

Hu & Liu 2004 Mining Opinion Features in Customer Reviews Here: explicit product features only, expressed as nouns or compound nouns Use association rule mining technique rather than symbolic or statistical approach to terminology Extract associated items (item-sets) based on support (>1%) [I think this technique basically amounts to taking frequent ngrams, after they do the pruning -W] Sample: one of many papers
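The support-based extraction can be sketched as follows. This is a brute-force count of small item-sets rather than a real association-rule miner, the noun lists are hypothetical (they assume POS tagging has already been done), and the toy threshold of 0.3 replaces the 1% on the slide because the example has only six sentences.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: nouns already extracted from each review sentence.
SENTENCE_NOUNS = [
    ["picture", "quality"],
    ["battery", "life"],
    ["picture", "quality", "camera"],
    ["camera"],
    ["battery", "life", "camera"],
    ["strap"],
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Map each item-set to its support, keeping those above min_support."""
    total = len(transactions)
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total > min_support
    }

FREQUENT = frequent_itemsets(SENTENCE_NOUNS, min_support=0.3)
for itemset, support in sorted(FREQUENT.items()):
    print(itemset, round(support, 2))
```

Item-sets such as ("picture", "quality") survive as candidate features, while one-off nouns such as "strap" are dropped.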

Hu & Liu 2004 Feature pruning –compactness “I had searched for a digital camera for 3 months.” “This is the best digital camera on the market” “The camera does not have a digital zoom” –redundancy/overlap manual; manual mode; manual setting Feature expansion –For sentences with opinion words and no features, add NP closest to each opinion word

Hu & Liu 2004 For sentences with a frequent feature, extract the nearby adjective as the “effective opinion” for that feature. Based on opinion words, gather infrequent features (N, NP nearest to an opinion adjective) –The salesman was easy going and let me try all the models on display.

Hu & Liu 2004 Semantic orientation of words –Propagate labels for a set of 30 seeds through WordNet using synonymy and antonymy Opinion sentences: opinion word + feature Semantic orientation of sentences –Flip word polarity if there are nearby negations –Go with the majority of opinion words –Break ties with majority of words that are part of “effective opinions” i.e., adjective closest to a feature
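The sentence-level rules (flip on negation, majority vote, tie-break on effective opinions) can be sketched like this. The tiny lexicon, the negation list and the two-token negation window are all assumptions for illustration; in the paper the word polarities come from the WordNet propagation step.

```python
# Hypothetical word polarities; the paper derives these from seed propagation.
OPINION_WORDS = {"great": 1, "good": 1, "amazing": 1, "poor": -1, "bad": -1, "blurry": -1}
NEGATIONS = {"not", "no", "never"}

def sentence_orientation(tokens, effective_opinions=(), window=2):
    """Majority vote over opinion words, flipping polarity after a nearby negation."""
    def signed(i):
        polarity = OPINION_WORDS[tokens[i]]
        preceding = tokens[max(0, i - window):i]
        return -polarity if any(t in NEGATIONS for t in preceding) else polarity

    opinion_positions = [i for i, t in enumerate(tokens) if t in OPINION_WORDS]
    total = sum(signed(i) for i in opinion_positions)
    if total == 0:
        # Tie: fall back to the opinion words closest to a feature.
        total = sum(signed(i) for i in opinion_positions
                    if tokens[i] in effective_opinions)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(sentence_orientation("the pictures are not good".split()))
print(sentence_orientation("great zoom but poor battery".split(),
                           effective_opinions={"poor"}))
```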

Hu & Liu 2004 Summary: –Feature identification: 72-80% recall/precision on 500 reviews from five domains. –Opinion sentence extraction (opinion word + feature): 60-80% recall/precision –Sentence-level orientation accuracy: 73-95% Comment: 80% on each step does not mean you’re done… -W
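The closing comment is easy to check with a little arithmetic. This sketch assumes, purely for illustration, that each of the three steps is right 80% of the time, that errors are independent, and that a final answer is only right when every step is right.

```python
import math

# Illustrative per-step accuracies, not figures from the paper.
STEP_ACCURACY = {
    "feature identification": 0.8,
    "opinion sentence extraction": 0.8,
    "sentence-level orientation": 0.8,
}

# Under independence, end-to-end accuracy is the product of the step accuracies.
end_to_end = math.prod(STEP_ACCURACY.values())
print(f"end-to-end accuracy: {end_to_end:.1%}")
```

Under those assumptions the pipeline as a whole is right only about half the time.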

Outline Announcements Recap –With a little more on word senses More discussion: what exactly is subjectivity, sentiment and polarity? –Annotating a corpus for subjectivity –Fine-grained sentiment for reviews More distinctions: –Agreement and discourse 62

63 Everyone knows that dragons don't exist. But... - Stanislaw Lem, “The Cyberiad”

64 (General) Subjectivity Types [Wilson 2008] Other (including cognitive) Note: similar ideas: polarity, semantic orientation, sentiment

PDTB [In that suit, the SEC accused Mr. Antar of engaging in a "massive financial fraud" to overstate the earnings of Crazy Eddie, Edison, N.J., over a three-year period. ARG1] IMPLICIT_CONTRAST [Through his lawyers, Mr. Antar has denied allegations in the SEC suit and in civil suits previously filed by shareholders against Mr. Antar and others. ARG2] Contrast between the SEC accusing Mr. Antar of something, and his denying the accusation

66 Subjectivity In that suit, the SEC [[accused SENTIMENT-NEG] Mr. Antar of engaging in a "massive financial fraud" to overstate the earnings of Crazy Eddie, Edison, N.J. ARGUING-POS], over a three-year period. Through his lawyers, Mr. Antar [has denied AGREE-NEG] allegations in the SEC suit and in civil suits previously filed by shareholders against Mr. Antar and others. Two attitudes combined into one large disagreement between two parties

Subjectivity In that suit, the SEC [[accused SENTIMENT-NEG] Mr. Antar of engaging in a "massive financial fraud" to overstate the earnings of Crazy Eddie, Edison, N.J. ARGUING-POS], over a three-year period. Through his lawyers, Mr. Antar [has denied AGREE-NEG] allegations in the SEC suit and in civil suits previously filed by shareholders against Mr. Antar and others. Subjectivity: arguing-pos and agree-neg with different sources; Hypothesis: common with contrast. Help recognize the implicit contrast.

69 George Orwell

Where do we look for…? Sentiment/Subjectivity Individual words (“nice”, “comfortable”) Phrases (“slow service”) Sentences Documents Genres –Rotten Tomatoes vs IMDB plot summaries Coherence Between words –Co-occurrence –Relations in WordNet Between sentences –Proximity –Discourse structure Between documents –Hyperlinks, references to entities –Agreement/disagreement 74