Lecture: Sentiment Analysis

Krista Lagus
Statistical Natural Language Processing course at Aalto

Concepts related to sentiments
Affect, feeling, sensation, emotion, sentiment, opinion, attitude?
MIND: sensations, emotions, moods, thoughts, opinions, beliefs, attitudes, intentions
CONCRETE WORLD: sentiments (e.g. a tweet), actions (e.g. buying a product)
RELATED TO: Topic, Object, Event, Person, Situation

Context of an expression
INDIVIDUAL CONTEXT
  State of mind: e.g. tired
  Objectives (of communication)
  Expectations (regarding how to reach objectives)
  Model of the world
  Model of the “rules” of social interaction
  Language model
  Model of the audience
SOCIAL CONTEXT
  Time (timestamp)
  Position in digital space (forum or subforum, tags in tweets)
  Relationship to other expressions (links, responses)
  Expressions of interest or of sentiment (# Likes, thumbs, re-tweets, shares)
  Historical context of the author and of other participants

Sentiment Analysis
Topic: The object of discussion, the general theme.
Sentiment: An expression (e.g. in writing) of one’s feeling, opinion or attitude, involving some polarity.
Sentiment analysis: The analysis of sentiments in expressions, typically written ones: phrases or messages.
  The set of sentiments is decided in advance.
  Typically only polarity is detected: +/– or positive/neutral/negative.
  A CLASSIFICATION PROBLEM

Emotion Detection
Emotion: E.g. the “basic emotions” according to some emotional theory: happy, sad, angry, disgusted, surprised, fearful, …
Emotion detection: Detecting the emotion conveyed by an individual expression: a phrase or a message.
  A CLASSIFICATION PROBLEM
Differences from sentiment analysis:
  Involves a wider set of emotions than just polarity
  May also be based on signals other than text or speech, such as EEG, EKG, stress measurements, prosody within the speech signal, video analysis of movements (speed of walking, style of walking)
Some challenges:
  How do language expressions relate to emotions, which may or may not be expressed, and may or may not be consciously experienced?
  Which set of emotions / what emotional theory to use?
  How to differentiate between talking about an emotion (“When I feel angry I tend to shout at my partner”) and having an emotion (“I am so angry I could hit someone”, “Fuck you!”)?

Opinion Mining
Opinion: One’s subjective stance in relation to something: an entity, event or situation. An opinion is always about something. Related to: judgement, attitude, thoughts on some matter.
Opinion mining: The detection of opinions about something from texts. May be used as a synonym for sentiment analysis.
Slight differences from sentiment analysis:
  Often the objects of interest are given: e.g. a particular company or product, or a set of presidential candidates.
  The set of possible opinions may be unknown in advance.

Uses of Sentiment Analysis
Understanding or diagnostic purposes: To discover or understand what the situation is
Controlling or manipulative purposes: To help someone get to the desired result by affecting the state of another (willing or unwilling, knowing or unknowing)

Possible uses: Understanding
Customer feedback analysis, to better understand customer happiness and unhappiness about the products offered
National state of mind analysis: e.g. the Citizen’s Mindscapes initiative
Workplace atmosphere analysis
Citizen’s democracy: “passive polling”
Detecting individual risks (e.g. onset of depression, propensity to commit suicide, opinions about a possible future employer)
Detecting national or international risks: e.g. school bombings, terrorist attacks, mass migration

Possible uses: Manipulation
Affecting an individual’s actions by affecting their state of mind
  E.g. the social study made by Facebook on how the sentiments expressed in an individual’s FB posts changed when the sentiments in their news feed were altered (positive/negative)
  Igniting a decision-to-buy (direct marketing)
  Igniting a decision-to-vote (direct political marketing)
  Igniting a decision-to-donate
  Igniting a terrorist attack
Control or manipulation on a general level: If we do this campaign, how do people in general react?
  Political campaigns, marketing efforts, steering national public opinion towards a desired outcome
  Igniting national unrest or international mass migration
ETHICAL CONSIDERATIONS CANNOT BE AVOIDED. WE MUST BE AWARE AND TAKE A STANCE ON WHICH ARE ETHICAL USES OF THIS TECHNOLOGY.

Ethical aspects in Sentiment Analysis
Some questions that may help in uncovering the ethical aspects of research on and application of these methods:
  To whom does the collected information go: to the individuals in question, to public officials, etc.?
  For what purposes will the information be used?
  In whose interests is the collection of the information?
  Do the individuals know that their input is being analyzed?
  Have the individuals consented to the analysis of their input?
  How is the security of the information handled (storage, unintended consequences, possible risks)?
  Are there vulnerable individuals or groups affected?
  What potential outcomes are there from detecting certain information, for the individual and in general?
  What laws, policies or general social agreements are in place, related to this?
  What risks are there?

Classification approach
Example: EmpaTweet
  Tries to detect 7 emotions, in data from 14 topics (tweets)
  Probabilistic topic model (Latent Dirichlet Allocation)
  A number of other features: WordNet synsets etc.

Emotions on data sets

EmpaTweet system

Steps: Data preparation
Selection of sentiments / emotional categories / emotion theory
Data source selection & data collection
Decide the length of the textual segment to classify (e.g. a tweet, a sentence, an utterance, a comment, a paragraph, a document)
Preprocessing of text (e.g. white space removal)
Data annotation: Training & test sets are annotated with emotional categories by human annotators (a subset of the data is cross-checked by several annotators to determine inter-annotator consistency)
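The cross-check between annotators mentioned in the annotation step can be quantified in several ways. The sketch below uses Cohen's kappa for two annotators, which is an assumption here: the slides do not say which consistency measure was used. The annotation lists are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six tweets by two annotators.
annotator_1 = ["joy", "anger", "joy", "sad", "joy", "anger"]
annotator_2 = ["joy", "anger", "sad", "sad", "joy", "joy"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; with more than two annotators a different measure would be needed.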

EmpaTweet topics

EmpaTweet tweets

Steps: Classification
Feature extraction for classifiers: e.g. n-grams, special characters such as ? and !, morphological analyses, POS tags, topics from topic modeling, synonyms (e.g. WordNet), semantic categories
Optional: Feature selection: Select the most informative features for each emotion, to reduce the number of parameters the classifier has to learn
Learn classifier(s): E.g. Naïve Bayes, or a set of binary SVMs, one for detecting each emotion (together forming a single multi-class classifier). Use the training data set here.
Measure success by calculating either classification accuracy, or precision, recall and F-measure for each emotion category. Use the test data set here.
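The classification steps can be sketched end to end with a small multinomial Naïve Bayes classifier. This is a minimal illustration under stated assumptions, not the EmpaTweet implementation: the features are only unigrams and bigrams (with ? and ! kept as tokens), smoothing is add-one, and the training and test messages are invented.

```python
import math
from collections import Counter, defaultdict

def extract_features(text):
    """Lowercased unigrams and bigrams; '!' and '?' become separate tokens."""
    for mark in "!?":
        text = text.replace(mark, f" {mark} ")
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(extract_features(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in extract_features(text):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F1 for one emotion category."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented toy data: training set for learning, test set for measuring.
train_texts = ["i love this so happy !", "what a wonderful happy day",
               "i hate this so angry !", "this makes me furious and angry"]
train_labels = ["joy", "joy", "anger", "anger"]
test_texts = ["so happy today", "angry and furious"]
test_gold = ["joy", "anger"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
for emotion in ("joy", "anger"):
    print(emotion, precision_recall_f1(test_gold, predictions, emotion))
```

Scores on a test set this small say nothing about real performance; the point is only that training data is used in `fit` and test data in the evaluation, as the steps require.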

EmpaTweet results

Lexical heuristic (vocabulary-based) approach
Instead of having human annotators mark a training & test set with emotions, concentrate on designing the feature set using some lexical heuristics:
  Start with the emotion categories, and expand each into a vocabulary of emotion words describing that emotion
  E.g. recognize emotions in a large preliminary data set based on the emoticons used: a smiling emoticon (happy), a frowning emoticon (sad), etc.
  One may use human interviewees, dictionaries, and data mining in creating the lexical heuristics and the subsequent tentatively annotated data set
EITHER apply statistical principles to turn these heuristics into detection features. Consider the frequency of the words as well as the different senses (meanings) of each word, and how common each sense is.
OR use the heuristics to select a preliminary data set, then apply feature selection & a classifier on the tentatively tagged data to improve the set of features & overall classifier performance
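The emoticon heuristic and the vocabulary expansion can be sketched as follows. Everything here is an illustrative assumption: the emoticon list, the rule that mixed emoticons yield no label, the smoothed scoring of words, and the example messages.

```python
from collections import Counter

# Hypothetical emoticon-to-emotion heuristics; a real list would be larger.
EMOTICON_LABELS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}

def tentative_label(message):
    """Emotion implied by the emoticons, or None if absent or conflicting."""
    found = {label for emo, label in EMOTICON_LABELS.items() if emo in message}
    return found.pop() if len(found) == 1 else None

def strip_emoticons(message):
    """Remove the emoticons so they do not end up in the vocabulary."""
    for emo in EMOTICON_LABELS:
        message = message.replace(emo, " ")
    return message

def expand_vocabulary(messages, top_n=2):
    """Rank words by how strongly they co-occur with each tentative label."""
    counts = {}
    for message in messages:
        label = tentative_label(message)
        if label is None:
            continue  # leave unlabeled messages out of the preliminary set
        words = strip_emoticons(message).lower().split()
        counts.setdefault(label, Counter()).update(words)
    overall = Counter()
    for label_counts in counts.values():
        overall.update(label_counts)
    vocabulary = {}
    for label, label_counts in counts.items():
        # Add-one smoothed share of a word's occurrences under this label;
        # ties are broken alphabetically.
        def score(word):
            return (label_counts[word] + 1) / (overall[word] + len(counts))
        ranked = sorted(label_counts, key=lambda w: (-score(w), w))
        vocabulary[label] = ranked[:top_n]
    return vocabulary

# Invented preliminary data set.
messages = ["great day :)", "great news :)", "awful day :(",
            "so awful :-(", "mixed :) :(", "no emoticon here"]
vocabulary = expand_vocabulary(messages)
print(vocabulary)
```

The tentatively labeled messages could equally be passed to feature selection and a classifier, which is the second option described above.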

Comparison: Classification approach vs. lexical heuristic approach
Classification approach
  Few existing emotionally annotated data sets in most languages
  Data set annotation is a lot of work, and depends on the emotional awareness of the humans who do it
  Ease of measuring performance
  Ease of improving the method (different features or different classifiers)
  Can bring new knowledge about the expression of various emotions in real social contexts
  Classification relies on rather reliable, very specific data (annotation of actual expressions)
  Lack of generalizability: Performance may be very specific to the particular training data set. If the training data does not match the intended application, the classifier may not generalize well
Lexical heuristic approach
  Emotional vocabularies exist for many languages
  No need for a lengthy human data annotation process
  With no annotated data, it is more difficult to assess performance
  Quality of heuristics: May entail incorrect heuristics (e.g. that using the word “hate” means that someone is feeling hateful)

SemEval data set
SemEval task: Sentiment Analysis in Twitter
Data sets and task description:
Task: Detect polarity: positive/negative/neutral
Alternative tasks: (1) polarity of a word or phrase in context, or (2) polarity of a message
The following datasets are available for training and development:
  training: 9,728 Twitter messages
  development: 1,654 Twitter messages (can be used for training as well)
  development-test #1: 3,814 Twitter messages (CANNOT be used for training)
  development-test #2: 2,094 SMS messages (CANNOT be used for training)

Other resources
SentiWordNet: Lexical (i.e. word-based) resource for sentiment analysis and opinion mining in English. Based on WordNet. Gives polarity values (positive/negative/objective) for each word sense (synset) in WordNet.
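One way to turn sense-level polarity values into a word-level score is to weight each sense by how common it is. The sketch below does not read SentiWordNet: the words, sense frequencies and scores are invented, and the weighting scheme is an assumption made for illustration.

```python
# Hypothetical entries: (sense frequency, positive score, negative score).
# The numbers are invented and are NOT taken from SentiWordNet.
SENSES = {
    "cool": [(8, 0.50, 0.00), (2, 0.00, 0.25)],
    "hate": [(10, 0.00, 0.75)],
}

def word_polarity(word):
    """Frequency-weighted average of (positive - negative) over the senses."""
    senses = SENSES.get(word, [])
    total = sum(freq for freq, _, _ in senses)
    if total == 0:
        return 0.0  # unknown words are treated as neutral
    return sum(freq * (pos - neg) for freq, pos, neg in senses) / total

def message_polarity(message):
    """Sum of word polarities; the sign gives a crude +/- decision."""
    return sum(word_polarity(w) for w in message.lower().split())

print(word_polarity("cool"), message_polarity("I hate it"))
```

Such a score ignores context, so it inherits the weakness of lexical heuristics in general: a word with negative polarity can occur in a message that is not negative.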
