1
Distant Supervision for Emotion Classification in Twitter Posts
2
- Natural language and text processing to identify and extract subjective information
- Classifying the polarity of a given text as positive, negative, or neutral
- In general: discovering how people feel about a particular topic
3
- Customers: to research products before purchasing
- Marketers: to research public opinion of their company or products, and to analyze customer satisfaction
- Organizations: to gather critical feedback on newly released products
4
- Earlier studies relied on predefined datasets, typically keyword-based
- Determining the emotion is subjective
- The words can be ambiguous
5
- An attempt to exploit the widespread use of emoticons and other emotional content
- Emoticons are treated as noisy labels to obtain very large training sets
- Machine learning algorithms (Naïve Bayes, MaxEnt, and SVM) achieve accuracy above 80% when trained with emoticon data
6
A web application whose purpose is to discover the sentiment toward a brand, product, or topic on Twitter
7
Classifiers:
- Keyword-based (baseline)
- Naive Bayes
- MaxEnt
- SVM

Feature extractors:
- Unigrams
- Bigrams
- Unigrams and bigrams
- Unigrams with part-of-speech tags
8
- As a baseline, a publicly available list of keywords is used
- For each tweet, the number of positive and negative keywords is counted
- The classifier returns the polarity with the higher count
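The baseline above can be sketched as follows. The word sets here are illustrative stand-ins, not the publicly available keyword list the slides refer to.

```python
# Keyword-based baseline: count positive vs. negative keywords in a tweet
# and return the polarity with the higher count.
# NOTE: these tiny word sets are placeholders for the real keyword list.
POSITIVE = {"good", "great", "love", "happy", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}

def keyword_polarity(tweet: str) -> str:
    tokens = tweet.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie or no keywords found
```

Ties (including tweets with no keywords at all) fall through to "neutral", one reasonable way to resolve the case the slide leaves unspecified.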
9
A multinomial Naïve Bayes model is used. Class c* is assigned to tweet d, where

  c* = argmax_c P(c | d),  with  P(c | d) ∝ P(c) · Π_{i=1..m} P(f_i | c)^{n_i(d)}

In this formula, f_i represents a feature and n_i(d) represents the count of feature f_i found in tweet d. There are a total of m features.
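A minimal sketch of this classifier over token counts, assuming add-one (Laplace) smoothing, which the slides do not specify:

```python
import math
from collections import Counter, defaultdict

# Multinomial Naive Bayes sketch: pick argmax_c P(c) * prod_i P(f_i|c)^n_i(d),
# computed in log space. Add-one smoothing is an assumption of this sketch.
def train_nb(docs):
    """docs: list of (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        feat_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, feat_counts, vocab

def classify_nb(tokens, class_counts, feat_counts, vocab):
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / total)          # log P(c)
        denom = sum(feat_counts[c].values()) + len(vocab)
        for t in tokens:                                 # one term per occurrence
            lp += math.log((feat_counts[c][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```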
10
MaxEnt is a feature-based model, so features like bigrams and phrases can be added. The class probability is

  P(c | d, λ) = exp(Σ_i λ_i f_i(c, d)) / Σ_{c'} exp(Σ_i λ_i f_i(c', d))

In this formula, c is the class, d is the tweet, and λ is a weight vector. The weights determine the significance of a feature in classification.
11
- Input data are two sets of vectors of size m, where each entry in the vector corresponds to the presence of a feature
- E.g. the unigram feature extractor: a feature is a word found in a tweet
- If the feature is present: value 1; if not: value 0
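The binary unigram vectors described above can be built like this (the helper names are mine):

```python
# Binary unigram feature vectors: entry i is 1 if vocabulary word i
# appears in the tweet, else 0.
def build_vocab(tweets):
    return sorted({w for t in tweets for w in t.lower().split()})

def unigram_vector(tweet, vocab):
    words = set(tweet.lower().split())
    return [1 if w in words else 0 for w in vocab]
```

Vectors of this shape are what the SVM (and the other classifiers) consume as input.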
12
- Training data is collected using the Twitter API
- In the API, a query for ":)" returns tweets with positive emotion, and a query for ":(" returns tweets with negative emotion
13
The training data is post-processed with filters:
- Emoticons are stripped off for training purposes; MaxEnt and SVM have better accuracy without them
- Tweets with both positive and negative emoticons are removed, e.g. "I'm turning 30 today :( but I still get birthday presents! :)"
- Retweets are removed: the same tweet shouldn't be counted twice
- Tweets with ":P" are removed: they usually don't represent any distinct emotion
- Replicated tweets are removed
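The filters above can be sketched as a single keep/reject pass. The emoticon sets are deliberately tiny stand-ins, and treating any tweet starting with "RT" as a retweet is a simplifying assumption:

```python
# Sketch of the training-data filters listed above.
POS_EMO = {":)", ":-)"}   # placeholder emoticon sets; the real lists are larger
NEG_EMO = {":(", ":-("}

def keep_tweet(tweet, seen):
    has_pos = any(e in tweet for e in POS_EMO)
    has_neg = any(e in tweet for e in NEG_EMO)
    if has_pos and has_neg:       # mixed emoticons: label would be ambiguous
        return False
    if ":P" in tweet:             # ":P" carries no distinct emotion
        return False
    if tweet.startswith("RT") or tweet in seen:  # retweets and duplicates
        return False
    seen.add(tweet)
    return True

def strip_emoticons(tweet):
    """Remove the noisy-label emoticons before training."""
    for e in POS_EMO | NEG_EMO:
        tweet = tweet.replace(e, "")
    return tweet.strip()
```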
14
Unigram feature extractor:
- The simplest way to retrieve features
- Results are similar to Pang and Lee's work with different classifiers on movie reviews

Bigram feature extractor:
- Captures negation phrases like "not good" or "not bad"
- Downside: bigrams are very sparse, and accuracy can drop for both MaxEnt and SVM

Unigrams and bigrams combined:
- Accuracy improved for Naive Bayes and MaxEnt
- Decline in accuracy for SVM

Part-of-speech tags:
- The same word may have many different meanings: "over" as a verb may have a negative connotation, while "over" as a noun carries no emotion at all
- POS tags turn out not to be of much use
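The combined unigram-plus-bigram extractor discussed above amounts to concatenating the two feature lists:

```python
# Combined feature extractor: unigrams plus adjacent-word bigrams,
# so negation phrases like "not good" become single features.
def extract_features(tokens):
    unigrams = list(tokens)
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams
```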
15
Future work:
- Semantics: in "Djokovic beats Federer :)", the sentiment is positive for Djokovic but negative for Federer
- Domain-specific tweets: classifiers could perform better if limited to particular domains (such as movies)
- Handling neutral tweets
- Internationalization: there are lots of tweets about the same subject in lots of different languages
- Utilizing emoticon data in the set: emoticons are stripped out, and classifiers could perform better if they were included
16
From a tweet that says "Djokovic beats Federer", one cannot extract a single sentiment for the whole tweet; semantics could be a solution:

  if user.isFrom(Serbia) then sentiment := positive
  else if user.isFrom(Switzerland) then sentiment := negative

Using semantics, we can gather more information than by reading keywords alone.