Published by Fay Benson. Modified over 9 years ago.
1
Comparing and Combining Sentiment Analysis Methods
Pollyanna Gonçalves (UFMG, Brazil), Matheus Araújo (UFMG, Brazil), Fabrício Benevenuto (UFMG, Brazil), Meeyoung Cha (KAIST, Korea)
2
Sentiment Analysis on Social Networks
Key component of a new wave of applications that explore social network data
Summarizes public opinion about politics, products, services (e.g. a new car, a movie), etc.
Monitors social network data in real time
Commonly framed as polarity analysis (positive or negative)
3
Sentiment Analysis Methods
Which method to use?
Several methods have been proposed for different contexts, and several of them are popular
Validations are based on examples, comparisons with baselines, and limited datasets
There is no proper comparison among methods
Advantages? Disadvantages? Limitations?
4
This talk
Compare 8 popular sentiment analysis methods
Focus on the task of detecting polarity: positive vs. negative
Combine methods
Deploy the methods in a system: www.ifeel.dcc.ufmg.br
5
Methods & Methodology
Comparing & Combining
iFeel System & Conclusions
6
Emoticons
Extracted from instant messaging services (Skype, MSN, Yahoo Messages, etc.)
Grouped as positive and negative
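As a minimal sketch of how an emoticon-based classifier of this kind might work: the two emoticon sets below are hypothetical stand-ins (the slide does not list the actual emoticons), and returning no answer when both kinds appear is an assumption made here for illustration.

```python
# Hypothetical emoticon groups; the method's real lists are not given on the slide.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)", "<3"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'(", "D:"}


def emoticon_polarity(message):
    """Return 'positive', 'negative', or None.

    None means the message has no known emoticon, or has both kinds
    (treating conflicts as 'no answer' is an assumption of this sketch).
    """
    tokens = message.split()
    has_positive = any(token in POSITIVE_EMOTICONS for token in tokens)
    has_negative = any(token in NEGATIVE_EMOTICONS for token in tokens)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    return None
```

Because most messages contain no emoticon at all, a classifier like this answers for only a small share of messages, even if its answers are usually right.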
7
Linguistic Inquiry and Word Count (LIWC)
Lexical method (paid software)
Allows optimizing the lexical dictionary (we used the default)
Measures various emotional, cognitive, and structural components
We only consider sentiment-relevant categories such as positivity and negativity
8
SentiWordNet
Lexical approach based on the WordNet dictionary
Groups words into sets of synonyms
Detects positivity, negativity, and neutrality of texts
9
PANAS-t
Lexical method adapted from a psychometric scale
Consists of a dictionary of adjectives associated with sentiments
Positive: joviality, assurance, serenity, and surprise
Negative: fear, sadness, guilt, hostility, shyness, and fatigue
10
Happiness Index
Uses a well-known lexical dictionary, Affective Norms for English Words (ANEW)
Produces a happiness scale from 1 (extremely unhappy) to 9 (extremely happy)
We consider [1..5) as negative and [5..9] as positive
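The thresholds on this slide translate directly into code. The sketch below maps a score to a polarity using the [1..5) and [5..9] intervals; the tiny valence table and the plain averaging of word scores are illustrative assumptions, not the actual ANEW values or the method's exact scoring rule.

```python
# Made-up valence scores on a 1-9 scale, for illustration only (not ANEW data).
TOY_VALENCE = {"happy": 8.0, "love": 8.5, "sad": 2.0, "war": 2.0}


def happiness_polarity(score):
    """Map a 1-9 happiness score to a polarity: [1, 5) negative, [5, 9] positive."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    return "negative" if score < 5 else "positive"


def message_score(text):
    """Average the valence of known words; None if no word is in the table.

    Plain averaging is an assumption of this sketch.
    """
    values = [TOY_VALENCE[w] for w in text.lower().split() if w in TOY_VALENCE]
    if not values:
        return None
    return sum(values) / len(values)
```

Note that a score of exactly 5 falls in the positive interval, so the split is not symmetric around the midpoint of the scale.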
11
SentiStrength
Combines 9 supervised machine learning methods
Estimates the strength of positive and negative sentiment in a text
We used the trained model provided by the authors
12
SAIL/AIL Sentiment Analyzer (SASA)
Machine learning method, trained with a Naïve Bayes model
Trained model implemented as a Python library
Classifies tweets in JSON format as positive, negative, neutral, or unsure
13
SenticNet
Extracts cognitive and affective information using natural language processing techniques
Uses the Hourglass of Emotions affective categorization model
Provides an approach that classifies messages as positive or negative
14
Methodology
Comparison of coverage and prediction performance across different datasets
Dataset 1: human labeled
  About 12,000 messages labeled with Amazon Mechanical Turk: Twitter, MySpace, YouTube and Digg comments, BBC and Runners World forums
Dataset 2: unlabeled
  Complete snapshot from Twitter (collected in 2009), ~2 billion tweets
  Extracted tragedies, disasters, movie releases, and political events
Focus on the English messages
15
Methods & Methodology
Comparing & Combining
iFeel System & Conclusions
16
What is the coverage of each method?
17
Coverage vs. Prediction Performance
Emoticons: best prediction and worst coverage
SentiStrength: second in prediction and third in coverage
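The two axes of this comparison can be sketched as follows, assuming coverage means the fraction of messages for which a method returns any polarity, and prediction performance is measured with the F-measure mentioned later in the talk. Representing "no answer" as None and counting unanswered positive messages as misses are choices made for this sketch.

```python
def coverage(predictions):
    """Fraction of messages for which the method returned a polarity (not None)."""
    if not predictions:
        return 0.0
    return sum(p is not None for p in predictions) / len(predictions)


def f_measure(predictions, labels, target="positive"):
    """F-measure (harmonic mean of precision and recall) for one target class.

    A None prediction on a message labeled `target` counts as a false
    negative; that convention is an assumption of this sketch.
    """
    pairs = list(zip(predictions, labels))
    tp = sum(p == target and y == target for p, y in pairs)
    fp = sum(p == target and y != target for p, y in pairs)
    fn = sum(p != target and y == target for p, y in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with predictions `["positive", None, "negative", "positive"]` and labels `["positive", "positive", "negative", "negative"]`, coverage is 0.75 and the F-measure for the positive class is 0.5.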
18
Prediction Performance across datasets

Method           Twitter  MySpace  YouTube  BBC    Digg   Runners World
PANAS-t          0.643    0.958    0.737    0.396  0.476  0.698
Emoticons        0.929    0.952    0.948    0.359  0.939  0.947
SASA             0.750    0.710    0.754    0.346  0.502  0.744
SenticNet        0.757    0.884    0.810    0.251  0.424  0.826
SentiWordNet     0.721    0.837    0.789    0.384  0.456  0.780
SentiStrength    0.843    0.915    0.894    0.532  0.632  0.778
Happiness Index  0.774    0.925    0.821    0.246  0.393  0.832
LIWC             0.690    0.862    0.731    0.377  0.585  0.895

Strong variations across datasets
19
Prediction Performance across datasets (same table as the previous slide)
Worst performance for datasets containing formal text
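The two observations drawn from the prediction performance table can be checked with a short script. The numbers below are copied from the table on these slides; the helper names are illustrative, and averaging across methods is simply one way to summarize each dataset, not something the slides prescribe.

```python
DATASETS = ["Twitter", "MySpace", "YouTube", "BBC", "Digg", "Runners World"]

# Scores copied from the prediction performance table, one row per method.
SCORES = {
    "PANAS-t":         [0.643, 0.958, 0.737, 0.396, 0.476, 0.698],
    "Emoticons":       [0.929, 0.952, 0.948, 0.359, 0.939, 0.947],
    "SASA":            [0.750, 0.710, 0.754, 0.346, 0.502, 0.744],
    "SenticNet":       [0.757, 0.884, 0.810, 0.251, 0.424, 0.826],
    "SentiWordNet":    [0.721, 0.837, 0.789, 0.384, 0.456, 0.780],
    "SentiStrength":   [0.843, 0.915, 0.894, 0.532, 0.632, 0.778],
    "Happiness Index": [0.774, 0.925, 0.821, 0.246, 0.393, 0.832],
    "LIWC":            [0.690, 0.862, 0.731, 0.377, 0.585, 0.895],
}


def dataset_means():
    """Average score of all methods on each dataset."""
    n = len(SCORES)
    return {d: sum(row[i] for row in SCORES.values()) / n
            for i, d in enumerate(DATASETS)}


def best_method_per_dataset():
    """Name of the highest-scoring method on each dataset."""
    return {d: max(SCORES, key=lambda m: SCORES[m][i])
            for i, d in enumerate(DATASETS)}


if __name__ == "__main__":
    for dataset, mean in dataset_means().items():
        print(f"{dataset:14s} mean={mean:.3f} best={best_method_per_dataset()[dataset]}")
```

On these numbers the BBC forum has the lowest average score across methods, and no single method is best on every dataset.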
20
Polarity Analysis
Detected only positive sentiments!
Methods tend to detect more positive sentiments
Correctly classified positives (positive as positive) are usually more frequent than correctly classified negatives (negative as negative)
Even disasters were classified predominantly as positive
21
Combined Method
Combines 7 of the 8 methods analyzed
  Emoticons, SentiStrength, Happiness Index, SenticNet, SentiWordNet, PANAS-t, SASA
  Removed LIWC (paid method)
Weights are distributed according to the rank of prediction performance
  Higher weight for the method with the highest F-measure
  Emoticons received weight 7 and PANAS-t weight 1
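One way to picture a rank-weighted combination is a weighted vote, sketched below. Only the weights for Emoticons (7) and PANAS-t (1) come from the slide; the weights for the other five methods are placeholders assumed for illustration, and the voting rule itself is an assumption, since the slide states only how weights are assigned.

```python
# Emoticons=7 and PANAS-t=1 are from the slide. The other weights are
# hypothetical placeholders; the slide does not give the full ranking.
WEIGHTS = {
    "Emoticons": 7,
    "SentiStrength": 6,    # assumed
    "Happiness Index": 5,  # assumed
    "SenticNet": 4,        # assumed
    "SentiWordNet": 3,     # assumed
    "SASA": 2,             # assumed
    "PANAS-t": 1,
}


def combined_polarity(votes):
    """Weighted vote over per-method outputs.

    `votes` maps a method name to 'positive', 'negative', or None
    (None means the method gave no answer for this message).
    Returns the polarity with the larger total weight, or None on a tie
    or when no method answered.
    """
    score = 0
    for method, polarity in votes.items():
        weight = WEIGHTS[method]
        if polarity == "positive":
            score += weight
        elif polarity == "negative":
            score -= weight
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None
```

A combination of this kind answers whenever at least one method does and the weighted votes do not cancel out, which is why combining methods tends to raise coverage.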
22
Best coverage and second in prediction performance
4 methods combined are sufficient
23
Methods & Methodology
Comparing & Combining
iFeel System & Conclusions
24
iFeel (Beta version): www.ifeel.dcc.ufmg.br
Example for: “Feeling too happy today :)”
Deploys all methods except LIWC
Allows evaluating an entire file
Allows changing the parameters of the methods
25
Conclusions
We compare 8 popular sentiment analysis methods for detecting polarity
  No method had the best results in all analyses
  Prediction performance varies widely according to the dataset
  Most methods are biased towards positivity
We propose a combined method
  Achieves high coverage and high prediction performance
iFeel: methods deployed and easily available
Future work: compare other methods such as POMS and EMOLEX
26
Thank you! Questions?
www.dcc.ufmg.br/~fabricio
www.ifeel.dcc.ufmg.br
fabricio@dcc.ufmg.br