1
Thumbs up? Sentiment Classification using Machine Learning Techniques
Jason Lewris, Don Chesworth
“Okay, I’m really ashamed of it, but I enjoyed it. I mean, I admit it’s a really awful movie.”
2
Introduction
Compared results to:
- Simplistic human methods
- Topic-based categorization
Three machine learning classification models:
- Naïve Bayes (NB)
- Maximum Entropy (ME)
- Support Vector Machines (SVM)
Interesting feature-creation mechanisms
Framework for future analysis
3
Framework
Movie Reviews → Develop Features → Train Models (NB / ME / SVM) → Evaluate Results → Extract Insights
4
Prior Work
Prior classification based on:
- Source / source style
- Genre
- Knowledge-based semantic orientation
5
The Data
Internet Movie Database (IMDb) review archive
Limited the data to:
- Reviews with an explicit author rating
- Positive and negative reviews only (no neutral)
- At most 19 positive and 19 negative reviews per author
Interim dataset: 752 negative reviews, 1,301 positive reviews, 144 reviewers represented
Final dataset: 700 positive and 700 negative reviews (uniform distribution)
6
Baseline
Word lists crafted by independent CS grad students; classify each review by comparing its positive vs. negative word counts.

Human 1
  Positive list: dazzling, brilliant, phenomenal, excellent, fantastic
  Negative list: suck, terrible, awful, unwatchable, hideous
  Accuracy: 58%   Ties: 75%

Human 2
  Positive list: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting
  Negative list: bad, clichéd, sucks, boring, stupid, slow
  Accuracy: 64%   Ties: 39%

Human 3 + stats (hand-picked words, frequency counts that included test data)
  Positive list: love, wonderful, best, great, superb, still, beautiful
  Negative list: bad, worst, stupid, waste, boring, ?, !
  Accuracy: 69%   Ties: 16%
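A minimal sketch of this word-count baseline, using Human 1's lists from the table above; the tokenization, function name, and tie handling are illustrative assumptions, not the presenters' code.

```python
# Hedged sketch of the human word-list baseline: count positive vs. negative
# cue words and pick the larger count, calling everything else a tie.
import re

POSITIVE = {"dazzling", "brilliant", "phenomenal", "excellent", "fantastic"}
NEGATIVE = {"suck", "terrible", "awful", "unwatchable", "hideous"}

def word_list_baseline(review: str) -> str:
    tokens = re.findall(r"[a-z']+", review.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "tie"  # ties were frequent: 39-75% of documents for the human lists

print(word_list_baseline("A dazzling, brilliant piece of work."))  # -> positive
```

The high tie rates in the table follow directly from this design: a review that mentions none of the handful of cue words cannot be assigned a label.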
7
Features
Unigrams
- Removed unigrams appearing only once, twice, or thrice (kept those appearing more than three times)
- Added negation tags (not, didn't, isn't)
Bigrams
- Count matched to the number of unigram features
- No negation tagging
Parts of speech
Position within review
- First quarter
- Middle half
- Last quarter
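A minimal sketch of two of these feature ideas, negation tagging and presence (0/1) encoding; the tokenization and the negation-word list are assumptions, since the presenters' exact preprocessing is not shown.

```python
# Hedged sketch: prepend a NOT_ tag to words following a negation word
# (until the next punctuation), and encode unigrams by presence, not frequency.
import re

NEGATIONS = {"not", "didn't", "isn't", "no", "never"}

def tag_negation(text: str) -> list[str]:
    tokens = re.findall(r"[\w']+|[.,!?;]", text.lower())
    tagged, negating = [], False
    for tok in tokens:
        if tok in ".,!?;":        # punctuation ends the negation scope
            negating = False
            continue
        tagged.append("NOT_" + tok if negating else tok)
        if tok in NEGATIONS:
            negating = True
    return tagged

def presence_features(tokens: list[str]) -> dict[str, int]:
    return {tok: 1 for tok in tokens}  # presence (0/1), not frequency

print(tag_negation("I didn't like this movie, but the cast was great."))
# ['i', "didn't", 'NOT_like', 'NOT_this', 'NOT_movie', 'but', 'the', 'cast', 'was', 'great']
```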
8
Models Naïve Bayes
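A hedged reconstruction of the equation this slide likely displayed: the standard multinomial Naïve Bayes decision rule picks the class $c$ maximizing

\[
P_{\mathrm{NB}}(c \mid d) \;=\; \frac{P(c)\,\prod_{i=1}^{m} P(f_i \mid c)^{\,n_i(d)}}{P(d)},
\]

where $f_1, \dots, f_m$ are the features and $n_i(d)$ is the number of times feature $f_i$ appears in document $d$; the feature probabilities are estimated from training counts, typically with add-one smoothing.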
9
Models Naïve Bayes Maximum Entropy
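A hedged reconstruction of the maximum entropy model typically shown for this setup (the notation is the standard one, not necessarily the slide's):

\[
P_{\mathrm{ME}}(c \mid d) \;=\; \frac{1}{Z(d)} \exp\!\Big(\sum_i \lambda_{i,c}\, F_{i,c}(d, c)\Big),
\qquad
F_{i,c}(d, c') \;=\;
\begin{cases}
1 & \text{if } n_i(d) > 0 \text{ and } c' = c,\\
0 & \text{otherwise,}
\end{cases}
\]

where $Z(d)$ is a normalizing constant and the weights $\lambda_{i,c}$ are chosen so the model matches the feature expectations seen in training while otherwise staying as close to uniform (maximum entropy) as possible.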
10
Models Naïve Bayes Maximum Entropy Support Vector Machines
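A hedged reconstruction of the linear SVM view that usually accompanies this slide: training finds the maximum-margin hyperplane separating the two classes, whose normal vector is a weighted combination of (support) training documents,

\[
\vec{w} \;=\; \sum_j \alpha_j\, c_j\, \vec{d}_j, \qquad \alpha_j \ge 0,
\]

where $c_j \in \{1, -1\}$ is the class of training document $\vec{d}_j$; a test document is labeled according to which side of the hyperplane its feature vector falls on.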
11
Results (classification accuracy, %)

      Feature set                       # of features   Freq. or Pres.   NB     ME     SVM
(1)   Unigrams (> 3 occurrences)        16165           frequency        78.7   N/A    72.8
(2)   Unigrams (> 3 occurrences)        16165           presence         81.0   80.4   82.9
(3)   Unigrams + top bigrams            32330           presence         80.6   80.8   82.7
(4)   Top bigrams                       16165           presence         77.3   77.4   77.1
12
Results (classification accuracy, %)

      Feature set                       # of features   Freq. or Pres.   NB     ME     SVM
(1)   Unigrams (> 3 occurrences)        16165           frequency        78.7   N/A    72.8
(2)   Unigrams (> 3 occurrences)        16165           presence         81.0   80.4   82.9
(3)   Unigrams + top bigrams            32330           presence         80.6   80.8   82.7
(4)   Top bigrams                       16165           presence         77.3   77.4   77.1
(5)   Unigrams + POS                    16695           presence         81.5   80.4   81.9
(6)   Adjectives                        2633            presence         77.0   77.7   75.1
(7)   Top unigrams                      2633            presence         80.3   81.0   81.4
(8)   Unigrams + position               22430           presence         81.0   80.1   81.6
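A hedged sketch of how the row (2) setting, binary unigram presence fed to all three classifiers, could be wired up today. scikit-learn, BernoulliNB, LogisticRegression (a common stand-in for maximum entropy), LinearSVC, and 3-fold cross-validation are assumptions about tooling and protocol, not the original implementations; `reviews` and `labels` stand in for the 1,400-review dataset described earlier.

```python
# Hedged sketch: presence (0/1) unigram features compared across NB / MaxEnt / SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def evaluate_presence_unigrams(reviews, labels):
    """Compare NB / MaxEnt / SVM on binary (presence) unigram features."""
    # binary=True gives presence features; min_df=4 roughly approximates the
    # "appears more than 3 times" cutoff (it is a document-frequency threshold)
    vectorizer = CountVectorizer(binary=True, min_df=4)
    X = vectorizer.fit_transform(reviews)
    for name, model in [("Naive Bayes", BernoulliNB()),
                        ("MaxEnt (logistic regression)", LogisticRegression(max_iter=1000)),
                        ("Linear SVM", LinearSVC())]:
        scores = cross_val_score(model, X, labels, cv=3)  # assumed 3-fold CV
        print(f"{name}: mean accuracy {scores.mean():.3f}")
```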
13
Insights
- SVM performed best, but only by 1-2%
- Accuracies fall short of those typical for topic-based categorization
- Simple unigram presence was the best feature set
- Presence beats frequency, unlike in topic-based classification
- Uncovered a "thwarted expectations" narrative: “Okay, I’m really ashamed of it, but I enjoyed it. I mean, I admit it’s a really awful movie.”
14
Future Work
- Features that indicate whether a sentence is on topic
- Weight sentences by whether they relate to the film as a whole: "the whole is not necessarily the sum of the parts"
- Important because "thwarted expectations" rhetoric is present in many types of text
- “This movie was wonderful, said no one ever.” – Don
15
Conclusion
- Sentiment classification is a growing task, especially since 2002
- Weighted-sentence features are an interesting idea
- Our final project: movie scripts, anchored by reviews