Pollyanna Gonçalves (UFMG, Brazil) Matheus Araújo (UFMG, Brazil) Fabrício Benevenuto (UFMG, Brazil) Meeyoung Cha (KAIST, Korea) Comparing and Combining.

Presentation transcript:

Comparing and Combining Sentiment Analysis Methods
Pollyanna Gonçalves (UFMG, Brazil)
Matheus Araújo (UFMG, Brazil)
Fabrício Benevenuto (UFMG, Brazil)
Meeyoung Cha (KAIST, Korea)

Sentiment Analysis on Social Networks
• Key component of a new wave of applications that explore social network data
• Summarizes public opinion about politics, products, services (e.g. a new car, a movie), etc.
• Monitors social network data (in real time)
• Commonly framed as polarity analysis (positive or negative)

Sentiment Analysis Methods
• Which method to use?
• Several methods have been proposed for different contexts
• Several of them are popular
• Validations rely on examples, comparisons with baselines, and limited datasets
• There is no proper comparison among methods
• Advantages? Disadvantages? Limitations?

This talk
• Compare 8 popular sentiment analysis methods
• Focus on the task of detecting polarity: positive vs. negative
• Combine methods
• Deploy the methods in a system

Outline: Methods & Methodology | Comparing & Combining | iFeel System & Conclusions

Emoticons
• Extracted from instant messaging services (Skype, MSN, Yahoo Messenger, etc.)
• Grouped as positive and negative
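As a rough illustration of the emoticon method, the Python sketch below labels a message by the emoticons it contains. The two emoticon sets are small hypothetical examples, not the lists used in the talk, and leaving a message unlabeled when it has no emoticon or conflicting emoticons is an assumption.

```python
# Hypothetical, deliberately tiny emoticon sets; the talk's actual lists are not reproduced here.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'(", "=(", "D:"}


def emoticon_polarity(message):
    """Return 'positive', 'negative', or None when the message is not covered."""
    tokens = message.split()
    has_pos = any(tok in POSITIVE_EMOTICONS for tok in tokens)
    has_neg = any(tok in NEGATIVE_EMOTICONS for tok in tokens)
    if has_pos == has_neg:
        # No emoticon at all, or conflicting ones: leave the message unlabeled (assumption).
        return None
    return "positive" if has_pos else "negative"
```

Messages without any emoticon return None, which is one way to see why a method of this kind can predict well on the messages it labels while covering few of them.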

Linguistic Inquiry and Word Count (LIWC)
• Lexical method (paid software)
• Allows customizing the lexical dictionary; we used the default
• Measures various emotional, cognitive, and structural components
• We only consider sentiment-relevant categories such as positivity and negativity

SentiWordNet
• Lexical approach based on the WordNet dictionary
• Groups words into sets of synonyms
• Detects positivity, negativity, and neutrality of texts

PANAS-t
• Lexical method adapted from a psychometric scale
• Consists of a dictionary of adjectives associated with sentiments
• Positive: joviality, assurance, serenity, and surprise
• Negative: fear, sadness, guilt, hostility, shyness, and fatigue

Happiness Index
• Uses a well-known lexical dictionary, Affective Norms for English Words (ANEW)
• Produces a happiness score on a scale from 1 (extremely unhappy) to 9 (extremely happy)
• We consider [1..5) as negative and [5..9] as positive
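The thresholding step above can be sketched in Python as follows. The lexicon values are hypothetical placeholders on a 1 to 9 scale, not real ANEW norms, and scoring a text by the mean valence of its known words is an assumption made for illustration. Only the [1..5) and [5..9] cut-offs come from the slide.

```python
# Hypothetical valence values on a 1-9 scale; these are NOT the real ANEW norms.
TOY_VALENCE = {"happy": 8.2, "love": 8.7, "rain": 5.1, "sad": 1.6, "war": 2.1}


def happiness_score(text, valence=TOY_VALENCE):
    """Mean valence of the words found in the lexicon, or None if none are found."""
    scores = [valence[w] for w in text.lower().split() if w in valence]
    if not scores:
        return None
    return sum(scores) / len(scores)


def polarity_from_score(score):
    """Apply the thresholds from the slide: [1, 5) is negative, [5, 9] is positive."""
    if score is None:
        return None
    if not 1 <= score <= 9:
        raise ValueError("score must lie in [1, 9]")
    return "negative" if score < 5 else "positive"
```

Note that a score of exactly 5 falls on the positive side of the split.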

SentiStrength
• Combines 9 supervised machine learning methods
• Estimates the strength of positive and negative sentiment in a text
• We used the trained model provided by the authors

SAIL/AIL Sentiment Analyzer (SASA)
• Machine learning method based on a Naïve Bayes model
• Trained model available as a Python library
• Classifies tweets (in JSON format) as positive, negative, neutral, or unsure

SenticNet
• Extracts cognitive and affective information using natural language processing techniques
• Uses the Hourglass of Emotions affective categorization model
• Provides an approach that classifies messages as positive or negative

Methodology
• Comparison of coverage and prediction performance across different datasets
• Dataset 1: human-labeled
  • About 12,000 messages labeled with Amazon Mechanical Turk
  • Sources: Twitter, MySpace, YouTube and Digg comments, BBC and Runners World forums
• Dataset 2: unlabeled
  • Complete snapshot of Twitter (collected in 2009), ~2 billion tweets
  • Extracted events: tragedies, disasters, movie releases, and political events
• Focus on English messages

What is the coverage of each method?

Coverage vs. Prediction Performance
• Emoticons: best prediction performance and worst coverage
• SentiStrength: second in prediction performance and third in coverage
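To make the two axes of this comparison concrete, here is a minimal Python sketch. It assumes that coverage is the fraction of messages for which a method returns any label, and that prediction performance is an F-measure for one class computed over the covered messages only. Both definitions, and the function names, are assumptions for illustration rather than the exact evaluation used in the talk.

```python
def coverage(predictions):
    """Fraction of messages for which the method produced a label (not None)."""
    if not predictions:
        return 0.0
    return sum(p is not None for p in predictions) / len(predictions)


def f_measure(predictions, gold, target="positive"):
    """F1 for one class, computed only over messages the method labeled."""
    pairs = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    tp = sum(p == target and g == target for p, g in pairs)
    fp = sum(p == target and g != target for p, g in pairs)
    fn = sum(p != target and g == target for p, g in pairs)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions a method can score a high F-measure while labeling only a small share of the messages, which is the trade-off the slide reports for Emoticons.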

Prediction Performance across datasets
[Table values not preserved in the transcript. Methods: PANAS-t, Emoticons, SASA, SenticNet, SentiWordNet, SentiStrength, Happiness Index, LIWC. Datasets: Twitter, MySpace, YouTube, BBC, Digg, Runners World.]
• Strong variations across datasets
• Worst performance for datasets containing formal text

Polarity Analysis
• Methods tend to detect more positive sentiments
• In some cases only positive sentiments were detected
• The rate of positive messages classified as positive is usually greater than the rate of negative messages classified as negative
• Even disasters were classified predominantly as positive

Combined Method
• Combines 7 of the 8 methods analyzed
  • Emoticons, SentiStrength, Happiness Index, SenticNet, SentiWordNet, PANAS-t, SASA
  • LIWC was removed (paid method)
• Weights are distributed according to the rank of prediction performance
  • Higher weight for the method with the highest F-measure
  • Emoticons received weight 7 and PANAS-t weight 1
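A rank-weighted combination of this kind could look like the Python sketch below. The slide only states that weights follow the F-measure ranking (7 for Emoticons down to 1 for PANAS-t). The voting rule used here (sum the weights per polarity and pick the larger total) and the function names are assumptions, and any F-measure values passed in are placeholders.

```python
def rank_weights(f_measures):
    """Map each method to a weight: 1 for the lowest F-measure up to N for the highest."""
    ordered = sorted(f_measures, key=f_measures.get)  # ascending F-measure
    return {method: rank for rank, method in enumerate(ordered, start=1)}


def combined_polarity(labels, weights):
    """Weighted vote over the methods that returned a polarity; None on no vote or a tie."""
    totals = {"positive": 0, "negative": 0}
    for method, label in labels.items():
        if label in totals:  # ignores None, 'neutral', 'unsure', etc.
            totals[label] += weights[method]
    if totals["positive"] == totals["negative"]:
        return None
    return max(totals, key=totals.get)
```

Because any single method that returns a label is enough to produce an output, a combination like this covers at least as many messages as its best-covering member, apart from the rare tie.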

• The combined method achieves the best coverage and ranks second in prediction performance
• Combining only 4 of the methods is sufficient

iFeel (Beta version)
• Example input: "Feeling too happy today :)"
• Deploys all methods except LIWC
• Allows evaluating an entire file
• Allows changing the parameters of the methods

Conclusions
• We compare 8 popular sentiment analysis methods for detecting polarity
  • No method had the best results in all analyses
  • Prediction performance varies widely with the dataset
  • Most methods are biased towards positivity
• We propose a combined method
  • Achieves high coverage and high prediction performance
• iFeel: methods deployed and easily available
• Future work: compare other methods such as POMS and EmoLex

Questions? Thank you!