1 Mining the peanut gallery: Opinion extraction and semantic classification of product reviews
Kushal Dave (IBM), Steve Lawrence (Google), David M. Pennock (Overture)
Work done at NEC Laboratories
2 Problem
Many reviews spread out across the Web:
– Product-specific sites
– Editorial reviews at C|net, magazines, etc.
– User reviews at C|net, Amazon, Epinions, etc.
– Reviews in blogs, author sites
– Reviews in Google Groups (Usenet)
No easy aggregation, but would like to get numerous diverse opinions
Even at one site, synthesis is difficult
3 Solution!
Find reviews on the web and mine them:
– Filtering (find the reviews)
– Classification (positive or negative)
– Separation (identify and rate specific attributes)
4 Existing work
Classification varies in granularity/purpose:
– Objectivity: Does this text express fact or opinion?
– Words: What connotations does this word have?
– Sentiments: Is this text expressing a certain type of opinion?
5 Objectivity classification
– Best features: relative frequencies of parts of speech (Finn 02)
– Subjectivity is…subjective (Wiebe 01)
6 Word classification
Polarity vs. intensity
– Conjunctions (Hatzivassiloglou and McKeown 97)
– Collocations (Turney and Littman 02)
– Linguistic collocations (Lin 98), (Pereira 93)
7 Sentiment classification
Manual lexicons:
– Fuzzy logic affect sets (Subasic and Huettner 01)
– Directionality: block/enable (Hearst 92)
– Common Sense and emotions (Liu et al 03)
Recommendations:
– Stocks in Yahoo (Das and Chen 01)
– URLs in Usenet (Terveen 97)
– Movie reviews in IMDb (Pang et al 02)
8 Applied tools
AvaQuest’s GoogleMovies
– http://24.60.188.10:8080/demos/GoogleMovies/GoogleMovies.cgi
Clipping services
– PlanetFeedback, Webclipping, eWatch, TracerLock, etc.
– Commonly use templates or simple searches
Feedback analysis:
– NEC’s Survey Analyzer
– IBM’s Eclassifier
Targeted aggregation
– BlogStreet, onfocus, AllConsuming, etc.
– Rotten Tomatoes
9 Our approach
10 Train/test
Existing tagged corpora! C|net, two cases:
– Even, mixed: 10-fold test, 4,480 reviews, 4 categories
– Messy, by category: 7-fold test, 32,970 reviews
11 Other corpora
Preliminary work with Amazon
– (Potentially) easier problem
Pang et al IMDb movie reviews corpus
– Our simple algorithm is insignificantly worse: 80.6 vs. 82.9 (unigram SVM)
– Different sort of problem
12 Domain obstacles
Typos, user error, inconsistencies, repeated reviews
Skewed distributions
– 5x as many positive reviews
– 13,000 reviews of MP3 players, 350 of networking kits
– Fewer than 10 reviews for ½ of products
Variety of language, sparse data
– 1/5 of reviews have fewer than 10 tokens
– More than 2/3 of terms occur in fewer than 3 documents
Misleading passages (ambivalence/comparison)
– “Battery life is good, but...” “Good idea, but...”
13 Base Features
– Unigrams: great, camera, support, easy, poor, back, excellent
– Bigrams: can't beat, simply the, very disappointed, quit working
– Trigrams, substrings: highly recommend this, i am very happy, not worth it, i sent it back, it stopped working
– Near: greater would, find negative, refund this, automatic poor, company totally
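A minimal sketch of this kind of n-gram feature extraction; the tokenizer and the decision to count occurrences (rather than just note presence) are illustrative assumptions, not the exact preprocessing used in the work:

```python
import re
from collections import Counter

def ngram_features(text, max_n=3):
    """Collect unigram, bigram, and trigram features from one review."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

print(ngram_features("I am very happy and highly recommend this camera"))
```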
14 Generalization
Replacing:
– Product names (metadata): the iPod is an excellent ==> the _productname is an excellent
– Domain-specific words (statistical): excellent [picture|sound] quality ==> excellent _producttypeword quality
– Rare words (statistical): overshadowed by ==> _unique by
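A hedged sketch of these substitutions; the word sets below are stand-ins, since the actual system takes _productname terms from product metadata and derives the _producttypeword and _unique sets statistically from the corpus:

```python
# Illustrative word sets, not the actual lists.
PRODUCT_NAMES = {"ipod", "powershot"}        # from product metadata (assumed)
PRODUCT_TYPE_WORDS = {"picture", "sound"}    # statistically derived (assumed)
RARE_WORDS = {"overshadowed"}                # low-frequency terms (assumed)

def generalize(tokens):
    """Replace product names, domain words, and rare words with placeholder tokens."""
    out = []
    for t in tokens:
        if t in PRODUCT_NAMES:
            out.append("_productname")
        elif t in PRODUCT_TYPE_WORDS:
            out.append("_producttypeword")
        elif t in RARE_WORDS:
            out.append("_unique")
        else:
            out.append(t)
    return out

print(generalize("the ipod is an excellent".split()))
# ['the', '_productname', 'is', 'an', 'excellent']
```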
15 Generalization (2)
Finding synsets in WordNet
– anxious ==> 2377770
– nervous ==> 2377770
Stemming
– was ==> be
– compromised ==> compromis
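A rough sketch of the WordNet and stemming steps using NLTK (an assumed tool choice; synset offsets differ across WordNet versions, so the printed IDs will not literally match the 2377770 example):

```python
# Requires: pip install nltk; then nltk.download('wordnet')
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Near-synonyms can be collapsed onto a shared synset offset.
anxious_ids = {s.offset() for s in wn.synsets("anxious")}
nervous_ids = {s.offset() for s in wn.synsets("nervous")}
print(anxious_ids & nervous_ids)  # synset id(s) shared by the two adjectives

# Stemming / lemmatization, as in "compromised ==> compromis", "was ==> be"
print(PorterStemmer().stem("compromised"))        # compromis
print(WordNetLemmatizer().lemmatize("was", "v"))  # be
```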
16 Qualification
Parsing for collocation
– this stupid piece of garbage ==> (piece(N):mod:stupid(A))...
– this piece is stupid and ugly ==> (piece(N):mod:stupid(A))...
Negation
– not good or useful ==> NOTgood NOTor NOTuseful
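A simplified sketch of the NOT-marking step; ending the negation scope at punctuation is an assumption, and the negator itself is kept in the output here:

```python
NEGATORS = {"not", "no", "never"}          # illustrative negation triggers
END_OF_SCOPE = {".", ",", ";", "!", "?"}   # assumption: punctuation ends the scope

def mark_negation(tokens):
    """Prefix tokens that follow a negator with NOT."""
    out, negate = [], False
    for t in tokens:
        if t in NEGATORS:
            out.append(t)
            negate = True
        elif t in END_OF_SCOPE:
            out.append(t)
            negate = False
        else:
            out.append("NOT" + t if negate else t)
    return out

print(mark_negation("not good or useful".split()))
# ['not', 'NOTgood', 'NOTor', 'NOTuseful']
```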
17 Optimal features
Confidence/precision tradeoff: try to find best-length substrings
– How to find features? Marginal improvement when traversing suffix tree; information gain worked best
– How to score? [p(C|w) – p(C’|w)] * df during construction; dynamic programming or simple during use
– Still usually short (n < 5), although… “i have had no problems with”
18 Clean up
Thresholding
– Reduce space with minimal impact
– But not too much! (sparse data)
– > 3 docs
Smoothing: allocate weight to unseen features, regularize other values
– Laplace, Simple Good-Turing, Witten-Bell tried
– Laplace helps Naïve Bayes
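A small sketch of Laplace (add-one) smoothing for the class-conditional feature probabilities; the counts and vocabulary size below are toy values:

```python
from collections import Counter

def laplace_prob(counts, vocab_size, alpha=1.0):
    """Return p(f|C) with add-alpha smoothing, reserving mass for unseen features."""
    total = sum(counts.values())
    def p(feature):
        return (counts[feature] + alpha) / (total + alpha * vocab_size)
    return p

pos_counts = Counter({"great": 50, "easy": 30, "poor": 2})  # toy counts
p_pos = laplace_prob(pos_counts, vocab_size=10_000)
print(p_pos("great"), p_pos("never_seen_before"))
```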
19 Scoring
SVMlight, naïve Bayes
– Neither does better in both tests
– Our scoring is simpler, less reliant on confidence, document length, corpus regularity
Give each word a score ranging from –1 to 1
– Our metric: score(f_i) = [p(f_i|C) – p(f_i|C’)] / [p(f_i|C) + p(f_i|C’)]
– Tried Fisher discriminant, information gain, odds ratio
If sum > 0, it’s a positive review
Other probability models: presence
Bootstrapping barely helps
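A sketch of the scoring rule on this slide, reusing the kind of smoothed conditional probability functions sketched above; the classify wrapper and its names are illustrative, not the exact implementation:

```python
def feature_score(p_c, p_c_prime):
    """score(f_i) = (p(f_i|C) - p(f_i|C')) / (p(f_i|C) + p(f_i|C')), in [-1, 1]."""
    return (p_c - p_c_prime) / (p_c + p_c_prime)

def classify(features, p_pos, p_neg):
    """Call the review positive iff the summed feature scores exceed zero."""
    total = sum(feature_score(p_pos(f), p_neg(f)) for f in features)
    return "positive" if total > 0 else "negative"

# e.g. classify(ngram_features(review_text), p_pos, p_neg) with smoothed estimators
```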
20 Reweighting
Document frequency
– df, idf, normalized df, ridf
– Some improvement from log df and gaussian (arbitrary)
Analogues for term frequency, product frequency, product type frequency
Tried bootstrapping, different ways of assigning scores
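One plausible reading of the log-df reweighting, shown as a sketch; the exact functional form is not given on the slide, so this is an assumption:

```python
import math

def log_df_weight(score, df):
    """Scale a feature's score by the log of its document frequency."""
    return score * math.log(1 + df)
```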
21 Test 1 (clean): Best Results
Unigrams
– Baseline: 85.0 %
– Naïve Bayes + Laplace: 87.0 %
– Weighting (log df): 85.7 %
– Weighting (Gaussian tf): 85.7 %
Trigrams
– Baseline: 88.7 %
– + _productname: 88.9 %
22 Test 2 (messy): Best Results
Unigrams
– Baseline: 82.2 %
– Prob. after threshold: 83.3 %
– Presence prob. model: 83.1 %
– + Odds ratio: 83.3 %
Bigrams
– Baseline: 84.6 %
– + Odds ratio: 85.4 %
– + SVM: 85.8 %
Variable
– Baseline: 85.1 %
– + _productname: 85.3 %
23 Extraction Can we use the techniques from classification to mine reviews?
24 Obstacles
Words have different implications in general usage
– “main characters” is negatively biased in reviews, but has innocuous uses elsewhere
Much non-review subjective content
– Previews, sales sites, technical support, passing mentions and comparisons, lists
25 Results
Manual test set + web interface
Substrings better…bigrams too general?
Initial filtering is bad
– Make “review” part of the actual search
– Threshold sentence length, score
– Still get non-opinions and ambiguous opinions
Test corpus, with red herrings removed: 76% accuracy in top confidence tercile
28 Attribute heuristics
Look for _producttypewords
Or, just look for things starting with “the”
– Of course, some thresholding
– And stopwords
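A sketch of this “dumb heuristic” in code; the stopword list and the frequency threshold are illustrative assumptions:

```python
import re
from collections import Counter

STOPWORD_PHRASES = {"the only", "the same", "the other"}  # illustrative stoplist

def the_phrases(reviews, min_count=3):
    """Count 'the <word>' phrases across reviews as candidate attributes."""
    counts = Counter()
    for text in reviews:
        counts.update(m.group(0).lower()
                      for m in re.finditer(r"\bthe\s+[a-z]+", text, re.IGNORECASE))
    return [(p, c) for p, c in counts.most_common()
            if p not in STOPWORD_PHRASES and c >= min_count]
```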
29 Results
No way of judging accuracy; arbitrary
For example, Amstel Light:
– Ideally: taste, price, image...
– Statistical method: beer, bud, taste, adjunct, other, brew, lager, golden, imported...
– Dumb heuristic: the taste, the flavor, the calories, the best…
32 Future work
Methodology
– More precise corpus
– More tests, more precise tests
Algorithm
– New combinations of techniques
– Decrease overfitting to improve web search
– Computational complexity
– Ambiguity detection?
New pieces
– Separate review/non-review classifier
– Review context (source, time)
– Segment pages