Thumbs up? Sentiment Classification using Machine Learning Techniques Jason Lewris, Don Chesworth “Okay, I’m really ashamed of it, but I enjoyed it. I.


1 Thumbs up? Sentiment Classification using Machine Learning Techniques Jason Lewris, Don Chesworth “Okay, I’m really ashamed of it, but I enjoyed it. I mean, I admit it’s a really awful movie.”

2 Introduction
- Compared results to:
  - Simplistic, human methods
  - Topic generation
- Three machine learning classification models:
  - Naïve Bayes (NB)
  - Maximum Entropy (ME)
  - Support Vector Machines (SVM)
- Interesting feature-creating mechanisms
- Framework for future analysis

3 Framework
[Flow diagram. Box labels: Movie Reviews; Develop Features; Training Model (NB, ME, SVM); Evaluate Results; Extract Insights]

4 Prior Work
- Prior classification based on:
  - source/source style
  - genre
  - knowledge-based
  - semantic orientation

5 The Data
- Internet Movie Database (IMDB) archive
- Limited data to:
  - Reviews with author rating
  - Positive and negative reviews (no neutral)
  - At most 19 positive and 19 negative reviews per author
- Interim Dataset:
  - 752 negative reviews
  - 1301 positive reviews
  - 144 reviewers represented
- Final Dataset: 700 positive, 700 negative (uniform distribution)

6 Baseline
- Crafted word lists using independent CS grad students
- Positive vs. negative word count

| | Positive List | Negative List | Accuracy | Ties |
|---|---|---|---|---|
| Human 1 | dazzling, brilliant, phenomenal, excellent, fantastic | suck, terrible, awful, unwatchable, hideous | 58% | 75% |
| Human 2 | gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting | bad, clichéd, sucks, boring, stupid, slow | 64% | 39% |

- Frequency counts (including test data)
- Hand-picked words

| | Positive List | Negative List | Accuracy | Ties |
|---|---|---|---|---|
| Human 3 + stats | love, wonderful, best, great, superb, still, beautiful | bad, worst, stupid, waste, boring, ?, ! | 69% | 16% |
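The word-count baseline can be sketched in a few lines of Python. This is only an illustration: the word lists are the "Human 1" lists from the slide, while the tokenizer and the function names (`tokenize`, `classify_by_word_count`) are assumptions, not the presenters' code.

```python
import re

# Word lists copied from the "Human 1" row of the baseline slide.
POSITIVE = {"dazzling", "brilliant", "phenomenal", "excellent", "fantastic"}
NEGATIVE = {"suck", "terrible", "awful", "unwatchable", "hideous"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def classify_by_word_count(text, positive=POSITIVE, negative=NEGATIVE):
    """Return 'pos', 'neg', or 'tie' by comparing counts of list words."""
    tokens = tokenize(text)
    pos_hits = sum(token in positive for token in tokens)
    neg_hits = sum(token in negative for token in tokens)
    if pos_hits > neg_hits:
        return "pos"
    if neg_hits > pos_hits:
        return "neg"
    return "tie"
```

A review containing none of the list words comes out as a tie, which is consistent with the short lists showing the highest tie rates on the slide.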

7 Features
- Unigrams
  - words appearing only once, twice, or thrice removed
  - added negation tags (not, didn’t, isn’t)
- Bigrams
  - matched number of unigrams
  - no negation
- Parts of Speech
- Position within review
  - First quarter
  - Middle half
  - Last quarter
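A rough sketch of two of these feature mechanisms, negation tagging and position tagging, might look like the following. The slide only names the negation words; the `NOT_` prefix, the rule that a negation lasts until the next punctuation mark, and the function names are all assumptions made for illustration.

```python
import re

NEGATION_WORDS = {"not", "didn't", "isn't"}   # the three words named on the slide
CLAUSE_END = {".", ",", ";", ":", "!", "?"}   # assumed end of a negation's scope


def add_negation_tags(text):
    """Prefix tokens that follow a negation word with NOT_ (assumed tag format)."""
    normalized = text.lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+|[.,;:!?]", normalized)
    tagged, negating = [], False
    for token in tokens:
        if token in CLAUSE_END:
            negating = False          # punctuation closes the negated span
            tagged.append(token)
        elif token in NEGATION_WORDS:
            negating = True           # start tagging from the next token
            tagged.append(token)
        elif negating:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
    return tagged


def position_tag(index, length):
    """Label a token position as first quarter, middle half, or last quarter."""
    if index * 4 < length:
        return "first"
    if index * 4 >= 3 * length:
        return "last"
    return "middle"
```

With this scheme "like" and "NOT_like" become separate unigrams, so a classifier can weight them differently.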

8 Models Naïve Bayes

9 Models Naïve Bayes Maximum Entropy

10 Models Naïve Bayes Maximum Entropy Support Vector Machines
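Of the three models, Naïve Bayes is simple enough to sketch directly. The version below works on word-presence features and uses add-one smoothing; the smoothing choice and every function name are assumptions for illustration, not details taken from the slides.

```python
import math
from collections import Counter


def train_naive_bayes(documents, labels):
    """documents: list of token lists; labels: parallel list of class names."""
    vocabulary = {token for doc in documents for token in doc}
    class_counts = Counter(labels)
    feature_counts = {label: Counter() for label in class_counts}
    for doc, label in zip(documents, labels):
        # Presence features: each word counts once per document.
        feature_counts[label].update(set(doc))
    return vocabulary, class_counts, feature_counts


def predict(model, doc):
    """Return the class with the highest log-probability for the document."""
    vocabulary, class_counts, feature_counts = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total = sum(feature_counts[label].values())
        score = math.log(n_docs / total_docs)       # log prior
        for token in set(doc):
            if token in vocabulary:                 # unseen words are skipped
                count = feature_counts[label][token]
                # Add-one smoothing (an assumption) avoids log(0).
                score += math.log((count + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Maximum Entropy and SVM both require an iterative optimizer, so they are usually taken from an existing library rather than written by hand.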

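11 Results

| | Feature Type | # of features | Frequency / Presence | NB | ME | SVM |
|---|---|---|---|---|---|---|
| (1) | Unigrams > 3 | 16165 | Freq. | 78.7 | N/A | 72.8 |
| (2) | Unigrams > 3 | 16165 | Pres. | 81.0 | 80.4 | 82.9 |
| (3) | Unigrams > 3 + Top Bigrams | 32330 | Pres. | 80.6 | 80.8 | 82.7 |
| (4) | Top Bigrams | 16165 | Pres. | 77.3 | 77.4 | 77.1 |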

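12 Results

| | Feature Type | # of features | Frequency / Presence | NB | ME | SVM |
|---|---|---|---|---|---|---|
| (1) | Unigrams > 3 | 16165 | Freq. | 78.7 | N/A | 72.8 |
| (2) | Unigrams > 3 | 16165 | Pres. | 81.0 | 80.4 | 82.9 |
| (3) | Unigrams > 3 + Top Bigrams | 32330 | Pres. | 80.6 | 80.8 | 82.7 |
| (4) | Top Bigrams | 16165 | Pres. | 77.3 | 77.4 | 77.1 |
| (5) | Unigrams > 3 + POS | 16695 | Pres. | 81.5 | 80.4 | 81.9 |
| (6) | Adjectives | 2633 | Pres. | 77.0 | 77.7 | 75.1 |
| (7) | Top Unigrams | 2633 | Pres. | 80.3 | 81.0 | 81.4 |
| (8) | Unigrams > 3 + Position | 22430 | Pres. | 81.0 | 80.1 | 81.6 |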
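To read the results at a glance, the accuracies can be loaded into a small script that picks the top model per row. The numbers are copied from the results slide; the dictionary layout and helper names are assumptions for illustration.

```python
# Accuracy (%) per feature set, copied from the results slide: (NB, ME, SVM).
RESULTS = {
    "(1) Unigrams > 3, freq.": (78.7, None, 72.8),   # ME is N/A on the slide
    "(2) Unigrams > 3, pres.": (81.0, 80.4, 82.9),
    "(3) Unigrams > 3 + Top Bigrams": (80.6, 80.8, 82.7),
    "(4) Top Bigrams": (77.3, 77.4, 77.1),
    "(5) Unigrams > 3 + POS": (81.5, 80.4, 81.9),
    "(6) Adjectives": (77.0, 77.7, 75.1),
    "(7) Top Unigrams": (80.3, 81.0, 81.4),
    "(8) Unigrams > 3 + Position": (81.0, 80.1, 81.6),
}
MODELS = ("NB", "ME", "SVM")


def best_model(row):
    """Return the name of the most accurate model in one table row."""
    scored = [(acc, name) for acc, name in zip(row, MODELS) if acc is not None]
    return max(scored)[1]


def best_overall(results):
    """Return (feature set, model, accuracy) for the highest accuracy overall."""
    candidates = [
        (acc, feature, name)
        for feature, row in results.items()
        for acc, name in zip(row, MODELS)
        if acc is not None
    ]
    acc, feature, name = max(candidates)
    return feature, name, acc
```

Calling `best_model` on each row shows SVM ahead in five of the eight feature sets, and `best_overall(RESULTS)` selects SVM on unigram presence at 82.9.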

13 Insights
- SVM performed best, but only by 1-2%
- Results not comparable to those of topic-based categorization models
- Simple unigram presence worked best
- Presence beat frequency, unlike in topic-based categorization
- Uncovered the “thwarted expectations” narrative
  - “Okay, I’m really ashamed of it, but I enjoyed it. I mean, I admit it’s a really awful movie.”
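The difference between frequency and presence features comes down to whether a word's count or a simple yes/no flag goes into the feature vector. A minimal sketch, with hypothetical function names and a toy vocabulary:

```python
from collections import Counter


def frequency_features(tokens, vocabulary):
    """One entry per vocabulary word: how many times it occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def presence_features(tokens, vocabulary):
    """One entry per vocabulary word: 1 if it occurs at all, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


# Toy example using the quote from the slide.
quote = ("okay i'm really ashamed of it but i enjoyed it "
         "i mean i admit it's a really awful movie").split()
vocabulary = ["awful", "enjoyed", "really"]
frequency_vector = frequency_features(quote, vocabulary)
presence_vector = presence_features(quote, vocabulary)
```

Here "really" occurs twice, so the two vectors differ only in that entry.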

14 Future Work
- Features that indicate whether sentences are on topic
  - Weighted by whether they relate to the film overall
  - “the whole is not necessarily the sum of the parts”
- Important, because “thwarted-expectations” rhetoric is present in many types of text
  - “This movie was wonderful, said no one ever.” – Don

15 Conclusion
- Sentiment classification has been a growing research area, especially since 2002
- Weighting sentences is an interesting idea
- Our final project: movie scripts, anchored by reviews

