Lecture: Sentiment Analysis

Presentation transcript:

Lecture: Sentiment Analysis
Krista Lagus
Statistical Natural Language Processing course at Aalto, 8.3.2017

Concepts related to sentiments
Affect, feeling, sensation, emotion, sentiment, opinion, attitude?
MIND: sensations, emotions, moods; thoughts, opinions, beliefs, attitudes, intentions
CONCRETE WORLD: expressed sentiments (e.g. a tweet); actions (e.g. buying a product)
RELATED TO: topic, object, event, person, situation

Context of an expression
INDIVIDUAL CONTEXT
- State of mind: e.g. tired
- Objectives (of communication), expectations (regarding how to reach the objectives)
- Model of the world
- Model of the "rules" of social interaction
- Language model
- Model of the audience
SOCIAL CONTEXT
- Time (timestamp)
- Position in digital space (forum or subforum, tags in tweets)
- Relationship to other expressions (links, responses)
- Expressions of interest or of sentiment (number of likes, thumbs, retweets, shares)
- Historical context of the author and of other participants

Sentiment Analysis
Topic: the object of discussion, the general theme.
Sentiment: an expression (e.g. in writing) of one's feeling, opinion or attitude, involving some polarity.
Sentiment analysis: the analysis of sentiments in (typically written) expressions such as phrases or messages. The set of sentiments is decided in advance; typically only polarity is detected (+/–, or positive/neutral/negative). This makes sentiment analysis a CLASSIFICATION PROBLEM.

Emotion Detection
Emotion: e.g. the "basic emotions" of some emotion theory: happy, sad, angry, disgusted, surprised, fearful, …
Emotion detection: detecting the emotion conveyed by an individual expression, i.e. a phrase or a message. A CLASSIFICATION PROBLEM.
Differences from sentiment analysis:
- Involves a wider set of emotions than just polarity
- May also be based on signals other than text or speech, such as EEG, EKG, stress measurements, prosody in the speech signal, or video analysis of movements (speed of walking, style of walking)
Some challenges:
- How do language expressions relate to emotions, which may or may not be expressed, and may or may not be consciously experienced?
- Which set of emotions / which emotion theory should be used?
- How to differentiate between talking about an emotion ("When I feel angry I tend to shout at my partner") and having an emotion ("I am so angry I could hit someone", "Fuck you!")?

Opinion Mining
Opinion: one's subjective stance in relation to something: an entity, event or situation. An opinion is always about something. Related to: judgement, attitude, thoughts on some matter.
Opinion mining: the detection of opinions about something from texts. May be used as a synonym for sentiment analysis.
Slight differences from sentiment analysis:
- Often the objects of interest are given, e.g. a particular company or product, or a set of presidential candidates.
- The set of possible opinions may be unknown in advance.

Uses of Sentiment Analysis
- Understanding or diagnostic purposes: to discover or understand what the situation is
- Controlling or manipulative purposes: to help someone reach a desired result by affecting the state of another (willing or unwilling, knowing or unknowing)

Possible uses: Understanding
- Customer feedback analysis, to better understand customers' happiness and unhappiness about the products offered
- National state-of-mind analysis: e.g. the Citizen's Mindscapes initiative
- Workplace atmosphere analysis
- Citizen's democracy: "passive polling"
- Detecting individual risks (e.g. onset of depression, propensity to commit suicide, opinions about a possible future employer)
- Detecting national or international risks: e.g. school bombings, terrorist attacks, mass migration

Possible uses: Manipulation
- Affecting an individual's actions by affecting their state of mind, e.g. the study by Facebook on how the sentiments expressed in individuals' FB posts changed when the sentiment (positive/negative) of their news feed was manipulated
- Igniting a decision to buy (direct marketing)
- Igniting a decision to vote (direct political marketing)
- Igniting a decision to donate
- Igniting a terrorist attack
Control or manipulation on a general level: if we run this campaign, how do people in general react?
- Political campaigns, marketing efforts, steering national public opinion towards a desired outcome
- Igniting national unrest or international mass migration
ETHICAL CONSIDERATIONS CANNOT BE AVOIDED. WE MUST BE AWARE AND TAKE A STANCE ON WHICH USES OF THIS TECHNOLOGY ARE ETHICAL.

Ethical aspects in Sentiment Analysis
Some questions that may help uncover the ethical aspects of research on and application of these methods:
- To whom does the collected information go: to the individuals in question, to public officials, etc.?
- For what purposes will the information be used?
- In whose interests is the information collected?
- Do the individuals know that their input is being analyzed?
- Have the individuals consented to the analysis of their input?
- How is the security of the information handled (storage, unintended consequences, possible risks)?
- Are vulnerable individuals or groups affected?
- What potential outcomes are there from detecting certain information, for the individual and in general?
- What laws, policies or general social agreements related to this are in place?
- What risks are there?

Classification approach
Example: EmpaTweet
- Tries to detect 7 emotions in tweets from 14 topics
- Probabilistic topic model (Latent Dirichlet Allocation)
- A number of other features: WordNet synsets etc.

Emotions on data sets

EmpaTweet system

Steps: Data preparation
- Selection of sentiments / emotional categories / emotion theory
- Data source selection & data collection
- Deciding the length of the textual segment to classify (e.g. a tweet, a sentence, an utterance, a comment, a paragraph, a document)
- Preprocessing of text (e.g. white space removal)
- Data annotation: human annotators label the training and test sets with emotional categories; a subset of the data is cross-checked by several annotators to determine inter-annotator consistency
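
Inter-annotator consistency is often quantified with a chance-corrected agreement measure. The slide does not name one; the sketch below assumes Cohen's kappa for two annotators, and also shows the white space removal step. The function names and the example labels are made up for illustration.

```python
from collections import Counter


def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return " ".join(text.split())


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each annotator's label counts.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["happy", "sad", "happy", "angry", "happy", "sad"]
annotator_2 = ["happy", "sad", "sad", "angry", "happy", "sad"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.739, i.e. 17/23
```

Here the annotators agree on 5 of 6 tweets, but because both use "happy" and "sad" often, part of that agreement is expected by chance, so kappa is lower than the raw 83% agreement.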

EmpaTweet topics

EmpaTweet tweets

Steps: Classification
- Feature extraction for classifiers: e.g. n-grams, special characters such as ? and !, morphological analyses, POS tags, topics from topic modeling, synonyms (e.g. from WordNet), semantic categories
- Optional: feature selection, i.e. selecting the most informative features for each emotion to reduce the number of parameters the classifier has to learn
- Learning the classifier(s): e.g. Naïve Bayes, or a set of binary SVMs, one per emotion, which together form a single multi-class classifier. Use the training data set here.
- Measuring success: calculate either classification accuracy, or precision, recall and F-measure for each emotion category. Use the test data set here.
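
A minimal from-scratch sketch of the feature extraction and classifier learning steps, assuming a multinomial Naïve Bayes with add-one smoothing (the binary-SVM alternative is not shown). It uses only two of the listed feature types, word n-grams (unigrams and bigrams) and counts of ! and ?. The names `extract_features` and `NaiveBayes` and the toy training sentences are hypothetical; a real system would add the other features and use a tested library.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Bag of word unigrams, word bigrams and '!' / '?' counts."""
    words = [t.strip("!?.,").lower() for t in text.split()]
    words = [w for w in words if w]
    feats = Counter(f"uni={w}" for w in words)
    feats.update(f"bi={a}_{b}" for a, b in zip(words, words[1:]))
    for ch in "!?":
        if ch in text:
            feats[f"char={ch}"] = text.count(ch)
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(extract_features(text))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(f.values()) for c, f in self.feature_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the class
            for feat, count in extract_features(text).items():
                if feat not in self.vocab:
                    continue  # features never seen in training are ignored
                p = (self.feature_counts[label][feat] + 1) / (self.totals[label] + v)
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = ["I love this phone!", "what a great day!!", "so happy and excited",
               "I hate waiting", "this is terrible and sad", "why is it so bad?"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("I love this great day!"))
print(model.predict("I hate this, so sad?"))
```

The same class handles more than two labels, so training it on emotion categories instead of pos/neg gives a multi-class emotion classifier directly.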

EmpaTweet results
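
The classification steps call for measuring success on the test set with classification accuracy, or with precision, recall and F-measure per emotion category. A minimal sketch of these measures; the function name and the gold/predicted labels are hypothetical examples, not EmpaTweet data.

```python
def per_class_scores(gold, predicted):
    """Per-label (precision, recall, F1) and overall accuracy on a test set."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    return scores, accuracy


gold = ["joy", "joy", "anger", "fear", "joy", "anger"]
pred = ["joy", "anger", "anger", "joy", "joy", "anger"]
scores, accuracy = per_class_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label:6s} P={p:.2f} R={r:.2f} F={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Per-category scores reveal what accuracy hides: in this toy example the single "fear" tweet is never detected, which shows up as F = 0 for that category.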

Lexical heuristic (vocabulary-based) approach
Instead of having human annotators mark a training and test set with emotions, concentrate on designing the feature set using lexical heuristics:
- Start with the emotion categories, and expand each into a vocabulary of emotion words describing that emotion.
- E.g. recognize emotions in a large preliminary data set based on the emoticons used (a smiling face for happy, a frowning face for sad, etc.).
- Human interviewees, dictionaries and data mining may all be used in creating the lexical heuristics and the subsequent tentatively annotated data set.
Then EITHER
- apply statistical principles to turn these heuristics into detection features, considering the frequency of the words as well as the different senses (meanings) of each word and how common each sense is, OR
- use the heuristics to select a preliminary data set, then apply feature selection and a classifier to the tentatively tagged data to improve the set of features and the overall classifier performance.
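
A minimal sketch of both heuristics described above: labelling messages by their emoticons to build a tentatively annotated data set, and scoring messages against per-emotion vocabularies. The emoticon table, the word lists and all function names are hypothetical placeholders; the emoticons are stripped from the kept messages so that a classifier trained on them cannot simply learn the labelling rule.

```python
import re

# Hypothetical seed resources for illustration only; real ones would be built
# from interviews, dictionaries and data mining.
EMOTICONS = {":)": "happy", ":-)": "happy", ":D": "happy", ":(": "sad", ":-(": "sad"}
EMOTION_WORDS = {
    "happy": {"happy", "glad", "delighted", "love"},
    "sad": {"sad", "unhappy", "miserable", "cry"},
    "angry": {"angry", "furious", "annoyed", "hate"},
}


def label_by_emoticon(text):
    """Emotion signalled by the message's emoticons, or None if absent or mixed."""
    found = {emotion for icon, emotion in EMOTICONS.items() if icon in text}
    return found.pop() if len(found) == 1 else None


def label_by_vocabulary(text):
    """Emotion whose vocabulary has the most hits; None if no hits or a tie."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = {emotion: sum(w in vocab for w in words)
            for emotion, vocab in EMOTION_WORDS.items()}
    top = max(hits.values())
    winners = [emotion for emotion, n in hits.items() if n == top]
    return winners[0] if top > 0 and len(winners) == 1 else None


def build_tentative_dataset(texts):
    """Keep emoticon-labelled messages, with the emoticons themselves removed."""
    dataset = []
    for text in texts:
        label = label_by_emoticon(text)
        if label is None:
            continue
        for icon in EMOTICONS:
            text = text.replace(icon, "")
        dataset.append((" ".join(text.split()), label))
    return dataset


tweets = ["great game today :)", "no label here", "missed the bus :(", "ok :) but :("]
print(build_tentative_dataset(tweets))
print(label_by_vocabulary("I hate this, so annoyed"))
```

The output of `build_tentative_dataset` could serve as the tentatively tagged data for the second option, feeding a feature-selection step and a classifier.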

Comparison
Classification approach
- Few existing emotionally annotated data sets in most languages
- Data set annotation is a lot of work, and depends on the emotional awareness of the humans who do it
- Ease of measuring performance
- Ease of improving the method (different features or different classifiers)
- Can bring new knowledge about the expression of various emotions in real social contexts
- Classification relies on rather reliable, very specific data (annotation of actual expressions)
- Lack of generalizability: performance may be very specific to the particular training data set; if the training data do not match the intended application, the classifier may not generalize well
Lexical heuristic approach
- Emotional vocabularies exist for many languages
- No need for a lengthy human data annotation process
- With no annotated data, performance is more difficult to assess
- Quality of heuristics: may entail incorrect heuristics (e.g. assuming that using the word "hate" means that someone is feeling hateful)

SemEval data set
SemEval task: Sentiment Analysis in Twitter
Data sets and task description: http://alt.qcri.org/semeval2014/task9/
Task: detect polarity: positive/negative/neutral
Alternative tasks: (1) polarity of a word or phrase in context, or (2) polarity of a message
The following data sets are available for training and development:
- training: 9,728 Twitter messages
- development: 1,654 Twitter messages (can be used for training as well)
- development-test #1: 3,814 Twitter messages (CANNOT be used for training)
- development-test #2: 2,094 SMS messages (CANNOT be used for training)
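
A small sketch of the split rules above, assuming one wants the code itself to refuse test-only splits when assembling training data. The dictionary mirrors the listed sizes; the function name `training_pool` is hypothetical and no data loading is shown.

```python
# Message counts for the SemEval-2014 Task 9 splits listed above, with a flag
# recording whether the split may be used for training.
SPLITS = {
    "training": (9728, True),
    "development": (1654, True),
    "development-test #1": (3814, False),  # Twitter messages
    "development-test #2": (2094, False),  # SMS messages
}


def training_pool(chosen):
    """Total messages in the chosen splits; refuses splits reserved for testing."""
    forbidden = [name for name in chosen if not SPLITS[name][1]]
    if forbidden:
        raise ValueError(f"not allowed for training: {forbidden}")
    return sum(SPLITS[name][0] for name in chosen)


print(training_pool(["training", "development"]))  # 11382
held_out = sum(n for n, usable in SPLITS.values() if not usable)
print(held_out)  # 5908
```

Combining training and development gives 11,382 messages for learning, while the 5,908 messages in the two development-test sets stay untouched for evaluation.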

Other resources
SentiWordNet: a lexical (i.e. word-based) resource for sentiment analysis and opinion mining in English, based on WordNet. http://sentiwordnet.isti.cnr.it
Gives positive, negative and objective scores for each synset (word sense) in WordNet.
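
A sketch of how sense-level scores of the SentiWordNet kind might be turned into word- and message-level polarity. NLTK ships a reader for the SentiWordNet files (`nltk.corpus.sentiwordnet`, whose `senti_synsets()` entries expose `pos_score()`, `neg_score()` and `obj_score()`), but since that needs a data download, this sketch uses a small in-memory table with made-up scores. Weighting the k-th sense by 1/k is an assumption: one simple way to account for how common each sense is, as the lexical heuristic approach suggests. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseScore:
    """Scores for one word sense; objectivity is 1 - (pos + neg)."""
    pos: float
    neg: float


# Hypothetical numbers, NOT real SentiWordNet entries. Senses are ordered from
# most to least frequent, as WordNet orders them.
LEXICON = {
    "good": [SenseScore(0.75, 0.0), SenseScore(0.5, 0.0), SenseScore(0.0, 0.0)],
    "bad": [SenseScore(0.0, 0.625), SenseScore(0.125, 0.5)],
}


def word_polarity(word):
    """(pos, neg, obj) averaged over senses, weighting the k-th sense by 1/k."""
    senses = LEXICON.get(word.lower().strip(".,!?"))
    if not senses:
        return (0.0, 0.0, 1.0)  # unknown words are treated as fully objective
    weights = [1.0 / k for k in range(1, len(senses) + 1)]
    total = sum(weights)
    pos = sum(w * s.pos for w, s in zip(weights, senses)) / total
    neg = sum(w * s.neg for w, s in zip(weights, senses)) / total
    return (pos, neg, 1.0 - pos - neg)


def message_polarity(text, threshold=0.05):
    """positive / negative / neutral from the summed (pos - neg) of the words."""
    score = sum(pos - neg for pos, neg, _ in map(word_polarity, text.split()))
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(word_polarity("good"))
print(message_polarity("a good movie"))
print(message_polarity("Not a bad movie"))
```

The last call is labelled negative because word-level lookups ignore negation, an example of the incorrect heuristics a purely lexical approach can entail.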