Sentiment Analyzer Using a Multi-Level Classifier
Tommy Tsai and Sam Yam
CS224N Final Project, Spring 2006

Motivation
- Humans can infer a document’s sentiment orientation from high-level features
  - Sentiment of subjective sentences only
  - Local sentiments of select portions of the document
  - Length of document (domain-specific)
- Domain of documents analyzed: 2000 IMDB movie reviews
  - 1000 positive, 1000 negative (classified by hand)
  - Hard
    - Most mention both positive and negative aspects
    - Many are written in a sarcastic tone

Architecture
- Three stages
  1. Subjectivity filter
     - N-gram-based classifier
     - Input: Full review
     - Output: Filtered review
  2. Sentence-level sentiment classifier
     - Also N-gram-based
     - Input: Filtered review
     - Output: Positive/negative classification and scores for each sentence
  3. Document-level sentiment classifier
     - Support vector machine classifier
     - Inputs: Filtered review + scores and classification for each sentence
     - Output: Positive/negative classification for the entire review
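
The data flow between the three stages can be sketched as follows. This is only an illustration of the inputs and outputs listed above, not the project's code: `run_pipeline`, `SentenceResult`, and the keyword-based `toy_*` functions are hypothetical stand-ins, and the real stages use N-gram models and an SVM rather than word lists.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class SentenceResult:
    text: str
    label: str        # "pos" or "neg"
    pos_score: float
    neg_score: float

def run_pipeline(
    sentences: Sequence[str],
    is_subjective: Callable[[str], bool],
    score_sentence: Callable[[str], Tuple[float, float]],
    classify_document: Callable[[List[SentenceResult]], str],
) -> str:
    # Stage 1: subjectivity filter (full review -> filtered review)
    filtered = [s for s in sentences if is_subjective(s)]
    # Stage 2: sentence-level classification and scores
    results = []
    for s in filtered:
        pos, neg = score_sentence(s)
        label = "pos" if pos >= neg else "neg"
        results.append(SentenceResult(s, label, pos, neg))
    # Stage 3: document-level classification from the sentence results
    return classify_document(results)

# Toy stand-ins (assumptions for illustration only, not N-gram models).
POS_WORDS = {"great", "loved"}
NEG_WORDS = {"boring", "awful"}

def _words(s: str) -> List[str]:
    return [w.strip(".,!?").lower() for w in s.split()]

def toy_is_subjective(s: str) -> bool:
    return any(w in POS_WORDS or w in NEG_WORDS for w in _words(s))

def toy_score(s: str) -> Tuple[float, float]:
    ws = _words(s)
    return (float(sum(w in POS_WORDS for w in ws)),
            float(sum(w in NEG_WORDS for w in ws)))

def toy_document_classifier(results: List[SentenceResult]) -> str:
    positives = sum(r.label == "pos" for r in results)
    return "pos" if positives * 2 >= len(results) else "neg"

review = [
    "The film runs two hours.",
    "I loved the cast.",
    "The plot was boring.",
    "Great ending!",
]
verdict = run_pipeline(review, toy_is_subjective, toy_score,
                       toy_document_classifier)
```

Passing each stage in as a function keeps the sketch independent of any particular model, so an N-gram classifier or an SVM could be substituted without changing the flow.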

Stage 3 SVM Classifier
- Support vector machines
  - Known to perform well on machine-learning classification problems
  - Maximization of the “margin”: the distance between the classification hyperplane and the closest data points on either side
- Features that worked well:
  - Average sentence-level “positive” and “negative” classification scores
  - Document-wide ratio of positive sentences to all sentences (PSR)
  - Local PSRs for each of 5 buckets in the review
  - Number of sentences in the review
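
A minimal sketch of how the listed features might be assembled into one vector per review. Several details are assumptions, since the slides do not specify them: the buckets are taken to be five contiguous position ranges over the sentences, an empty bucket is given a neutral PSR of 0.5, and the names `psr` and `document_features` are hypothetical. The resulting vector would then be passed to an SVM implementation of your choice.

```python
from typing import List, Sequence, Tuple

# One entry per sentence: (is_positive, pos_score, neg_score)
Sentence = Tuple[bool, float, float]

def psr(sentences: Sequence[Sentence]) -> float:
    """Positive-sentence ratio; 0.5 for an empty span (an assumption)."""
    if not sentences:
        return 0.5
    return sum(1 for s in sentences if s[0]) / len(sentences)

def document_features(sentences: Sequence[Sentence],
                      n_buckets: int = 5) -> List[float]:
    n = len(sentences)
    avg_pos = sum(s[1] for s in sentences) / n if n else 0.0
    avg_neg = sum(s[2] for s in sentences) / n if n else 0.0
    # Assign each sentence to a contiguous position bucket.
    buckets: List[List[Sentence]] = [[] for _ in range(n_buckets)]
    for i, s in enumerate(sentences):
        buckets[i * n_buckets // n].append(s)
    return ([avg_pos, avg_neg, psr(sentences)]
            + [psr(b) for b in buckets]
            + [float(n)])

# Example: a review that starts positive and ends negative.
doc = [(True, 0.8, 0.2)] * 5 + [(False, 0.3, 0.7)] * 5
feats = document_features(doc)
# feats holds: avg pos score, avg neg score, global PSR,
# five local PSRs, and the sentence count (9 values in total).
```

The local PSRs are what let the document-level classifier see where in a review the positive or negative sentences fall, which the global PSR alone cannot express.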

Results and Conclusions
- Hybrid model worked well
  - Our multi-level model does improve the classifier’s discriminatory power
- Character N-gram models worked better than token N-gram models
- Negative reviews are harder to classify!
- 30 video game reviews: 27/30 = 90% accuracy
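
To make the distinction between character and token N-grams concrete, here is a small illustration. The helper names `char_ngrams` and `token_ngrams` are hypothetical, whitespace tokenization is an assumption, and the slides do not state which N or which tokenizer the project used.

```python
from typing import List, Tuple

def char_ngrams(text: str, n: int) -> List[str]:
    """All contiguous character substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def token_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """All contiguous runs of n whitespace-separated tokens."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

char_grams = char_ngrams("not good", 3)
token_grams = token_ngrams("not good", 2)

# Related word forms share character N-grams but are distinct tokens.
shared = set(char_ngrams("boring", 3)) & set(char_ngrams("bored", 3))
```

Here `char_grams` contains six overlapping three-character pieces, `token_grams` contains the single pair `("not", "good")`, and `shared` shows that "boring" and "bored" overlap at the character level even though a token model treats them as unrelated.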