Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales Bo Pang and Lillian Lee Cornell University Carnegie.


1 Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales
Bo Pang and Lillian Lee
Cornell University, Carnegie Mellon University
ACL 2005

2 About this problem
Goal: label reviews on a rating scale
–Differs from “thumbs up” vs. “thumbs down” classification
–Differs from identifying opinion strength
–Differs from ranking (+ classification)
Movie reviews from Rotten Tomatoes
Study on human subjects
Three algorithms

3 Problem validation and formulation (1)
Check how well humans perform, for comparison with machine performance
Use reviews by a single author to factor out the effects of cross-author divergence
A notch equals half a star on a four- or five-star scale, or 10 points on a 100-point scale
Random-choice baseline: 33%

4 Problem validation and formulation (2)
A three-class task seems like one that most people would do quite well at.
To address class imbalance, the problem is reduced from five classes to four.

5 A scale dataset
Movie reviews from four corpora
Rating indicators removed
Objective sentences removed
A total of 1,770, 902, 1,307, and 1,027 documents for the four authors, respectively

6 Algorithm (1)
Uses the SVM light package
Algorithm 1: One-vs-all (OVA)
–An SVM binary classifier distinguishes label l from label not-l
Algorithm 2: Regression
–Find the hyperplane that best fits the training data (points within distance epsilon incur no loss)
–Similar items, similar labels
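The two baseline ideas on this slide can be illustrated in a few lines. This is a minimal sketch, not the SVM light implementation: `ova_predict` and `eps_insensitive_loss` are hypothetical helper names, the per-label scores are assumed to come from binary classifiers that have already been trained, and the default `epsilon` value is arbitrary.

```python
def ova_predict(scores):
    """One-vs-all decision rule.

    `scores` maps each label l to the output of its binary classifier
    (label l vs. not-l); the classifiers themselves are assumed given.
    The predicted label is the one with the highest score.
    """
    return max(scores, key=scores.get)


def eps_insensitive_loss(predicted, target, epsilon=0.1):
    """Epsilon-insensitive loss as used in SVM regression.

    Predictions within `epsilon` of the target incur no loss;
    beyond that, the loss grows linearly with the error.
    """
    return max(0.0, abs(predicted - target) - epsilon)
```

For instance, `ova_predict({0: -1.2, 1: 0.4, 2: 0.1})` picks label 1, and a prediction that misses its target by less than `epsilon` contributes zero loss.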

7 Algorithm (2)
Algorithm 3: Metric labeling
–Algorithm 1 or 2 plus a similarity measure
–Distance metric on labels
–k nearest neighbors of item x according to sim
–Item-similarity function sim
–Locally weighted learning
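One way to read the ingredients listed on this slide is as a cost to be minimized over labelings: each item prefers the label suggested by Algorithm 1 or 2, and similar neighbors are penalized for receiving distant labels. The sketch below is an illustration under stated assumptions, not the authors' code: the function name, the trade-off weight `alpha`, and the use of absolute difference as the label distance are all assumptions.

```python
def metric_labeling_cost(items, labels, pref, neighbors, sim, alpha=1.0):
    """Cost of a candidate labeling (lower is better).

    items:     items whose labels are being chosen
    labels:    dict item -> numeric label, for `items` and their neighbors
    pref:      dict (item, label) -> initial preference score
               (e.g. from OVA or regression; assumed given)
    neighbors: dict item -> list of its k nearest neighbors under `sim`
    sim:       function (item, item) -> non-negative similarity
    alpha:     weight of the neighbor term (assumed hyperparameter)
    """
    total = 0.0
    for x in items:
        lx = labels[x]
        total -= pref[(x, lx)]  # reward labels the base classifier prefers
        for y in neighbors.get(x, []):
            # Assumed label distance: absolute difference on the scale.
            total += alpha * abs(lx - labels[y]) * sim(x, y)
    return total
```

With two mutually neighboring items of similarity 1, giving them labels two notches apart adds a penalty of 2 per item, while giving them the same label adds none.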

8 Algorithm (3)
Finding a label-correlated item-similarity function: vocabulary overlap (e.g., cosine) is not suitable.
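A small example suggests why plain vocabulary overlap is a poor proxy for label similarity. The sketch below computes cosine similarity over whitespace-tokenized term counts; the function name and the two toy sentences are made up for illustration and do not come from the dataset.

```python
import math
from collections import Counter


def cosine_overlap(doc_a, doc_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Opposite opinions, yet three of four terms are shared.
print(cosine_overlap("the plot is great", "the plot is terrible"))
```

The two toy sentences express opposite sentiment but still score 0.75, because most of their vocabulary is topical rather than evaluative.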

9 Algorithm (PSP)
Uses PSP (positive-sentence percentage)
A Naive Bayes classifier is trained on 10,662 movie-review snippets (striking extracts, usually one sentence long)
This classifier is then applied to the test data
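The PSP computation itself is simple once a sentence-level polarity classifier is available. In the sketch below, `is_positive` stands in for the trained Naive Bayes classifier (any callable returning True or False), and turning PSP into an item similarity via the cosine of the vectors (PSP, 1 - PSP) is an assumption of this illustration; both function names are hypothetical.

```python
import math


def positive_sentence_percentage(sentences, is_positive):
    """Fraction of a review's sentences classified as positive.

    `is_positive` is a stand-in for the sentence-level classifier.
    """
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if is_positive(s)) / len(sentences)


def psp_similarity(psp_x, psp_y):
    """Assumed similarity: cosine between (PSP, 1 - PSP) vectors."""
    vx, vy = (psp_x, 1.0 - psp_x), (psp_y, 1.0 - psp_y)
    dot = vx[0] * vy[0] + vx[1] * vy[1]
    return dot / (math.hypot(*vx) * math.hypot(*vy))


# Toy stand-in classifier: a sentence is "positive" if it contains "good".
review = ["good film", "bad ending", "good cast", "good score"]
print(positive_sentence_percentage(review, lambda s: "good" in s))
```

Under this similarity, two reviews with equal PSP score 1, and a fully positive review compared with a fully negative one scores 0.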

10 Algorithm (PSP)
Distinguishing terms: terms that appear more than 20 times, with 50% or more of their occurrences in a single class
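The term filter described on this slide can be sketched as follows. The function name, the input format (token list plus class label per document), and the parameter names are assumptions made for illustration; only the two thresholds come from the slide.

```python
from collections import Counter, defaultdict


def distinguishing_terms(docs, min_count=20, min_share=0.5):
    """Return terms that occur more than `min_count` times overall and
    have at least `min_share` of their occurrences in a single class.

    docs: iterable of (tokens, label) pairs.
    """
    per_class = defaultdict(Counter)  # label -> term counts in that class
    total = Counter()                 # term counts over all classes
    for tokens, label in docs:
        per_class[label].update(tokens)
        total.update(tokens)
    return {
        term
        for term, n in total.items()
        if n > min_count
        and max(counts[term] for counts in per_class.values()) / n >= min_share
    }
```

A frequent term spread evenly over three classes (share of about 33% each) is filtered out, while a frequent term concentrated in one class is kept.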

11 Experiment Results (1)

12 Experiment Results (2)
Adding PSP is useful; however, PSP by itself is not good enough.

13 Multiple authors
Comparable results are obtained

14 Future Work
Vary the kernel in the SVM
Use mixture models (combining “positive” and “negative” language models) to capture class relationships
Multi-class but non-scale-based categorization problems (positive vs. negative vs. neutral)
Transductive setting (a small amount of labeled data, exploiting relationships between unlabeled items), which is well suited to the metric-labeling approach

