Amazon review utility estimator. Overview. Goal: To determine the “usefulness” of Amazon.com reviews, using Mallet classifiers and several custom features.


1 Amazon review utility estimator

2 Overview
• Goal: To determine the “usefulness” of Amazon.com reviews
• Using Mallet classifiers
• Several custom features
• If accurate, this system could be applied beyond Amazon, including other product reviews or even Slashdot/Digg comments.

3 Reviews
• Used Amazon ECS: Collected large number of reviews over 4 categories: Textbooks, Digital Cameras, Music, DVD
• Textbooks: 24,419 reviews with over 5 votes
• Digital Cameras: 22,566
• Music: 43,328
• DVD: 132,208

4 Regression?
• All of the length features seem to have a trend when grouped in buckets
• DVD data

  Bucket     Avg Total   Avg Word   Avg Para
  0-25%      133.96      5.58       1.84
  26-50%     197.04      5.66       2.33
  51-75%     248.72      5.68       2.79
  76-100%    281.66      5.72       2.84
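The grouping behind a table like the one above can be sketched as follows. This is a minimal illustration, assuming the buckets are ranges of the helpful-vote fraction (the slide does not say so explicitly); the field names `helpful`, `total`, and `words` and the sample data are hypothetical.

```python
from statistics import mean

def bucket_label(helpful_fraction):
    """Map a helpful fraction in [0, 1] to one of four buckets (assumed meaning)."""
    if helpful_fraction <= 0.25:
        return "0-25%"
    if helpful_fraction <= 0.50:
        return "26-50%"
    if helpful_fraction <= 0.75:
        return "51-75%"
    return "76-100%"

def average_by_bucket(reviews, feature):
    """Average one numeric feature within each bucket.

    reviews: iterable of dicts with 'helpful', 'total' and the feature key.
    """
    grouped = {}
    for r in reviews:
        label = bucket_label(r["helpful"] / r["total"])
        grouped.setdefault(label, []).append(r[feature])
    return {label: mean(values) for label, values in grouped.items()}

# Made-up sample data, not taken from the presentation.
sample = [
    {"helpful": 1, "total": 10, "words": 100},
    {"helpful": 2, "total": 10, "words": 140},
    {"helpful": 9, "total": 10, "words": 300},
]
averages = average_by_bucket(sample, "words")
print(averages)
```

Averaging within buckets smooths out per-review noise, which is likely why a trend shows up here even when the raw scatter is weak.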

5 Regression
• R² ≈ 0.3
[Plot; axes: Rating, # of words]
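For reference, a least-squares line and its R² can be computed as in the sketch below. This is ordinary simple linear regression in plain Python, not necessarily the tool used for the slide; the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up points, only to exercise the functions.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(slope, intercept, r2)
```

An R² near 0.3 means the single feature explains roughly 30% of the variance in the rating, so it is a real but weak predictor on its own.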

6 Regression
[Plot; axes: Rating, Avg Sentence Length]

7 Features
• Bag of words
• Average: length, sentence length, word length
• % of words that are stop words
• # of spelling errors
• # of paragraphs
• Pronouns, articles, proper nouns, etc.
• Punctuation
• History
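A few of the surface features listed above could be extracted roughly as follows. This is a sketch under simple assumptions: the regex-based tokenization, the tiny `STOP_WORDS` set, and the feature names are illustrative and are not the presentation's actual implementation (which used Mallet).

```python
import re
import string

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "in", "this"}

def extract_features(text):
    """Compute simple length, stop-word, and punctuation features for a review."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_stop = sum(w.lower() in STOP_WORDS for w in words)
    return {
        "num_words": n_words,
        "num_paragraphs": len(paragraphs),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "stop_word_pct": 100.0 * n_stop / n_words if n_words else 0.0,
        "num_punctuation": sum(ch in string.punctuation for ch in text),
    }

features = extract_features("This is a camera. It works!\n\nBuy it.")
print(features)
```

Spelling errors, part-of-speech counts, and reviewer history need external resources (a dictionary, a tagger, per-user data) and are left out of this sketch.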

8 Stuff We Learned
• Some good reviews are hard to find
  - “e-toys has this for 19.99” rated helpful by 17/21 people.
• And some people are just stupid
  - “and there you have it. That's the secret.” 77%...
  - “On DVD, I'll buy this NOW! Not on VHS...Jezus...” 78%...
• We attempted manually classifying ~100 reviews
  - In 4 buckets around 30% accuracy
  - In 2 buckets around 55%....
  - abstract.cs.washington.edu/~kylej1/quiz.php

9 Cont.
• Trade off between Precision and Recall:
  - Many features increase precision but hurt recall
  - The range of good reviews is very broad
• Word Count / Sentence Length / % stopwords have biggest impact
  - Precision +5%, Recall -8%
• Diminishing returns..
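As a reminder of what is being traded off, precision and recall for a "helpful" class can be computed as in this small sketch. The boolean label encoding and the sample predictions are assumptions for illustration only.

```python
def precision_recall(predicted, actual):
    """Precision and recall for the positive ("helpful") class.

    predicted, actual: equal-length sequences of booleans.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: the classifier flags two reviews, one of them correctly,
# and misses two other helpful reviews.
predicted = [True, True, False, False]
actual = [True, False, True, True]
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

A feature that makes the classifier more conservative removes false positives (raising precision) but also drops true positives (lowering recall), which matches the pattern reported above.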

10 Cont.
• Precision in the high 80s with the right combination of features
  - Recall suffers, drops to between 40-50%
• Experimenting with multiple classifiers in series
  - To boost recall without destroying precision
  - Similar to Boosting
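The slides do not say how the classifiers were chained. One plausible reading, sketched here purely as an assumption, is a cascade in which each high-precision stage gets a chance to accept the reviews that earlier stages rejected. The stage functions below are hypothetical rule-based stand-ins for trained classifiers.

```python
def cascade_predict(review, stages):
    """Return True as soon as any stage in the series accepts the review."""
    for stage in stages:
        if stage(review):
            return True
    return False

# Hypothetical stand-ins for trained, high-precision classifiers.
def long_review(review):
    return review["num_words"] >= 200

def well_structured(review):
    return review["num_paragraphs"] >= 3

stages = [long_review, well_structured]
print(cascade_predict({"num_words": 250, "num_paragraphs": 1}, stages))
print(cascade_predict({"num_words": 50, "num_paragraphs": 4}, stages))
print(cascade_predict({"num_words": 50, "num_paragraphs": 1}, stages))
```

Because a review is accepted if any stage accepts it, recall can only go up as stages are added, while overall precision stays high only if every individual stage is itself precise.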

11 Future
• When should computer override customer rating?
  - Amazon has huge # of “Labeled” data…but the labels are sometimes poor
  - Review Quality is very subjective
  - Weight based on # of total votes?
    ○ Some concerns with this
• Bias detection
  - Positive or Negative impact?
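The vote-weighting idea is only posed as a question in the slide. One possible form, shown here strictly as an illustration, is to trust a review's helpfulness label more when more people voted on it. The linear scaling and the cap of 50 votes are arbitrary assumptions, as are the field names.

```python
def vote_weight(total_votes, cap=50):
    """Confidence weight in [0, 1] for a review's helpfulness label (assumed form)."""
    return min(total_votes, cap) / cap

def weighted_helpful_fraction(reviews, cap=50):
    """Vote-weighted average of helpful fractions across reviews."""
    total_weight = 0.0
    weighted_sum = 0.0
    for r in reviews:
        w = vote_weight(r["total"], cap)
        weighted_sum += w * r["helpful"] / r["total"]
        total_weight += w
    return weighted_sum / total_weight if total_weight else 0.0

# Made-up data: a 5/5 review counts far less than a 50/100 review.
sample_votes = [
    {"helpful": 5, "total": 5},
    {"helpful": 50, "total": 100},
]
result = weighted_helpful_fraction(sample_votes)
print(result)
```

A scheme like this favors heavily voted reviews, which may be one of the concerns the slide alludes to.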

12 End
• Questions?

