Sentiment/opinion analysis
Author: Martin Mikula
Supervisor: Xiaoying Sharon Gao
Which of you has read a review before buying a new product?
Outline: motivation, tasks, domains, approaches, lexicons, my lexicon approach
Motivation: analysing shopping behaviour; analysing attitudes towards politicians and government policy; connecting people who share the same opinions (WordCupinion); medicine: analysing patients' health
Sentiment vs. emotion analysis
Sentiment/opinion analysis: is a given piece of text positive, negative or neutral? The text may be a sentence, a tweet, an SMS message, a customer review, a document and so on.
Emotion analysis: what emotion is expressed in a given piece of text? Basic emotions include joy, trust, fear and anger; other emotions include guilt, pride, frustration and optimism.
Tasks
What is the sentiment of the speaker/writer? Is the speaker explicitly expressing sentiment? What sentiment is evoked in the listener/reader? What is the sentiment towards an entity mentioned in the text?
Consider these questions with the examples below:
General Tapioca was ruthlessly executed today.
Mass-murderer General Tapioca finally killed in battle.
General Tapioca was killed in an explosion.
Domains: newspaper texts, novels, e-mails, customer reviews, blog posts, SMS messages, tweets, Facebook posts and so on
Quirks of social media texts: informal; short (140 characters for tweets, 160 for a single SMS message); abbreviations and shortenings; a wide array of topics; spelling mistakes and creative spelling; special strings (hashtags, emoticons, conjoined words); huge volume (over 500 million tweets a day); meta-information (date, location, links); they often express sentiment.
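Several of these quirks are usually handled in a preprocessing step before any lexicon lookup. The following Python sketch shows one possible normaliser. The emoticon sets, the placeholder tokens (`<url>`, `<user>`, `<emo_pos>`, `<emo_neg>`) and the function name `normalize_tweet` are illustrative assumptions and are not part of the approach presented here.

```python
import re
from typing import List

# Hypothetical emoticon sets and patterns; far from exhaustive.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ";("}
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # a character repeated three or more times


def normalize_tweet(text: str) -> List[str]:
    """Split a tweet on whitespace and normalise a few social-media quirks."""
    tokens = []
    for raw in text.split():
        if raw in POSITIVE_EMOTICONS:
            tokens.append("<emo_pos>")
        elif raw in NEGATIVE_EMOTICONS:
            tokens.append("<emo_neg>")
        elif URL_RE.fullmatch(raw):
            tokens.append("<url>")  # keep the fact that a link was there
        elif MENTION_RE.fullmatch(raw):
            tokens.append("<user>")
        else:
            word = raw.lower().lstrip("#")          # '#happy' -> 'happy'
            word = ELONGATED_RE.sub(r"\1\1", word)  # 'sooooo' -> 'soo'
            word = word.strip(".,!?\"'")            # drop surrounding punctuation
            if word:
                tokens.append(word)
    return tokens


print(normalize_tweet("Sooooo #happy today :) http://t.co/x @bob"))
# ['soo', 'happy', 'today', '<emo_pos>', '<url>', '<user>']
```

Squeezing elongations to two letters maps all variants of "soooo" to one form while keeping it distinct from "so". Splitting conjoined words, for example a hashtag such as #greatday, needs a separate word-segmentation step that this sketch does not attempt.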
Approaches
Lexicon-based approaches use lexicons, i.e. lists of positive and negative words.
Machine learning approaches typically train a classifier on labelled example texts.
Hybrid approaches combine lexicon-based approaches with machine learning techniques.
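The following minimal Python sketch illustrates the lexicon-based idea: it counts the positive and negative lexicon words in a text and compares the two counts. The two word lists are hypothetical toy lists, and mapping a zero score to "neutral" is an assumed convention.

```python
from typing import Set

# Toy word lists (hypothetical); a real system would load a published lexicon.
POSITIVE: Set[str] = {"good", "great", "excellent", "love"}
NEGATIVE: Set[str] = {"bad", "poor", "terrible", "hate"}


def classify(text: str) -> str:
    """Label a text by counting positive and negative lexicon words."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no lexicon words, or equal counts


print(classify("Great phone, but terrible battery and poor screen."))  # negative
```

Every word counts equally here, and modifiers such as "very" or "not" are ignored.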
Sentiment lexicons: lists of words; they may contain a weight for every word and may contain shifters (intensification, negation). How do you want to use the dictionary?
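Weights and shifters turn the simple word count into a weighted sum. In this hypothetical Python sketch, an intensifier multiplies the weight of the next sentiment word and a negator flips its sign. All weights, and the rule that a shifter applies only to the next sentiment word, are assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical weights and shifters, for illustration only.
WEIGHTS: Dict[str, float] = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "quite": 1.25}
NEGATORS = {"not", "never", "no"}


def score(tokens: List[str]) -> float:
    """Sum word weights, applying pending shifters to the next sentiment word."""
    total = 0.0
    multiplier = 1.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            multiplier *= INTENSIFIERS[tok]
        elif tok in WEIGHTS:
            value = WEIGHTS[tok] * multiplier
            total += -value if negate else value
            multiplier, negate = 1.0, False  # shifters apply to one word only
    return total


print(score("not very good".split()))  # -1.5
```

A pending shifter carries across neutral words, so "not a good idea" is also negated, but it can leak across clause boundaries. Flipping the sign is only one possible treatment of negation.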
Sentiment lexicons
Manually created: General Inquirer (1966, 3600 words), MPQA (2005, 8000 words), Hu and Liu lexicon (2004, 6800 words), NRC emotion lexicon (2010, 14,000 words), AFINN (2009-2011, 2400 words), MaxDiff (2014, 1500 words)
Automatically created: Turney and Littman (2003), SentiWordNet (2006, synsets), MSOL (2009, 60,000 words), Hashtag sentiment lexicon (2013, 220,000 unigrams and bigrams), Sentiment140 (2013, 330,000 unigrams and bigrams)
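Many of these lexicons are distributed as plain-text files. The following sketch assumes each line holds one term and one integer score, separated by a tab. To our knowledge this is the layout of the AFINN word list. The sample entries and scores are invented, and `load_tsv_lexicon` and `score_text` are hypothetical helper names.

```python
import io
from typing import Dict, TextIO


def load_tsv_lexicon(handle: TextIO) -> Dict[str, int]:
    """Read 'term<TAB>score' lines into a dict, skipping blank lines."""
    lexicon: Dict[str, int] = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        term, value = line.rsplit("\t", 1)
        lexicon[term] = int(value)
    return lexicon


def score_text(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the scores of single-word terms; multi-word entries are not matched."""
    return sum(lexicon.get(tok, 0) for tok in text.lower().split())


# io.StringIO stands in for an open file; these entries and scores are made up.
sample = io.StringIO("good\t3\nbad\t-3\ncan't stand\t-3\n")
lexicon = load_tsv_lexicon(sample)
print(score_text("Good food, good service", lexicon))  # 6
```

Entries such as "can't stand" are phrases, so a real lookup would need phrase matching rather than a plain whitespace split.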
My sentiment lexicon
Weights range from -3 to 3. The lexicon uses a customized Lancaster stemming algorithm.
Word    Weight  Subjectivity
good    1       p (positive)
bad     -1      n (negative)
quite   1.25    i (intensifier)
not             o (opposite)
Stemming examples: good / goodness, care / careless
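The lexicon entries above could be stored as whitespace-separated rows. This sketch parses such rows into typed entries. It accepts the decimal comma in "1,25" and checks that positive and negative weights stay within -3 to 3. The row format, the `Entry` class and `parse_lexicon` are hypothetical. The customized Lancaster stemmer is not reproduced; an unmodified Lancaster stemmer is available in NLTK as `nltk.stem.LancasterStemmer`.

```python
from dataclasses import dataclass
from typing import Dict

TAGS = {"p": "positive", "n": "negative", "i": "intensifier", "o": "opposite"}


@dataclass(frozen=True)
class Entry:
    weight: float  # sentiment weight or intensifier factor; 0.0 for 'o' rows
    tag: str       # one of the keys of TAGS


def parse_lexicon(text: str) -> Dict[str, Entry]:
    """Parse whitespace-separated 'word [weight] tag' rows (hypothetical format)."""
    entries: Dict[str, Entry] = {}
    for row in text.strip().splitlines():
        parts = row.split()
        if len(parts) not in (2, 3):
            raise ValueError(f"malformed row: {row!r}")
        word, tag = parts[0], parts[-1]
        if tag not in TAGS:
            raise ValueError(f"unknown tag {tag!r} for {word!r}")
        # Accept a decimal comma such as '1,25'.
        weight = float(parts[1].replace(",", ".")) if len(parts) == 3 else 0.0
        # Assumption: the -3..3 range applies to positive and negative weights.
        if tag in ("p", "n") and not -3 <= weight <= 3:
            raise ValueError(f"weight {weight} for {word!r} is outside [-3, 3]")
        entries[word] = Entry(weight, tag)
    return entries


LEXICON = parse_lexicon("""
good   1     p
bad    -1    n
quite  1,25  i
not          o
""")
print(LEXICON["quite"])  # Entry(weight=1.25, tag='i')
```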
Results
Dataset 1: 2500 reviews (2324 positive, 176 negative)
Dictionary            Accuracy (%)
positive + negative   86.2
intensification       85.9
negation              86.1
all together          85.7

Dataset 2: 5242 reviews (2572 positive, 2668 negative); current accuracy is around 71%
Dictionary            Accuracy (%)
positive + negative   55.8
all together          61.6

Influence of negation
Dictionary            Accuracy (%)
shift negation        60.7
switch negation       60.6
both negations        61
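The shift and switch negation results above compare two ways of negating a word weight. The following sketch follows the common reading of these terms in lexicon-based sentiment analysis. Switch negation reverses the sign. Shift negation moves the weight a fixed amount towards the opposite polarity. It is an assumption that the lexicon evaluated here uses the same definitions. The shift amount of 2 is hypothetical, and how "both negations" combines the two is not modelled.

```python
import math

# Hypothetical shift amount; the value behind the results above is not stated.
SHIFT = 2.0


def switch_negation(weight: float) -> float:
    """Switch negation: reverse the polarity, e.g. 'good' (1) becomes -1."""
    return -weight


def shift_negation(weight: float, shift: float = SHIFT) -> float:
    """Shift negation: move the weight a fixed amount toward the opposite pole."""
    if weight == 0:
        return 0.0  # neutral words are left alone
    return weight - math.copysign(shift, weight)


for w in (3.0, 1.0, -1.0):
    print(w, switch_negation(w), shift_negation(w))
```

Suppose "excellent" has a hypothetical weight of 3. Shifting then turns "not excellent" into a mildly positive 1, while switching makes it -3. This is the usual argument for shifting, since "not excellent" rarely means "terrible".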