1
Opinion Observer: Analyzing and Comparing Opinions on the Web
Bing Liu, Minqing Hu, Junsheng Cheng. Department of Computer Science, University of Illinois at Chicago. WWW 2005
2
Abstract This paper focuses on online customer reviews of products.
Two contributions: (1) a novel framework for analyzing and comparing consumer opinions of competing products, with a prototype system, “Opinion Observer”, implemented; (2) a new technique based on language pattern mining to extract product features from Pros and Cons in a particular type of review.
3
Introduction Opinion Observer
We propose a new technique to identify product features from Pros and Cons in reviews of this format: Pros, Cons, and a detailed review. Pros and Cons tend to be very brief, e.g., “heavy, bad picture quality, battery life too short”. We do not analyze the detailed reviews.
4
Related Work Hu, M., and Liu, B. 2004.
Perform the same tasks based on unsupervised itemset mining.
Morinaga, S., Yamanishi, K., Tateishi, K., and Fukushima, T. 2002: Compare information on different products in a category, using search to find the reputation of the products.
Bourigault, D. 1995; Daille, B. 1996; Jacquemin, C., Bourigault, D. 2001; Justeson, J., Katz, S. 1995: Terminology finding tasks. Noun phrases alone are not sufficient for finding product features.
Bunescu, R., Mooney, R. 2004; Etzioni et al. 2004; Freitag, D., McCallum, A. 2000; Lafferty, J., McCallum, A., Pereira, F. 2001; Rosario, B., and Hearst, M. 2004: Entity extraction tasks. Product features are usually not named entities. Also, our extraction uses short sentence segments rather than full sentences.
5
Related Work (cont.) Hearst, M. 1992; Das, S. and Chen, M. 2001; Tong, R. 2001; Turney, P. 2002; Pang, B., Lee, L., and Vaithyanathan, S. 2002; Dave, K., Lawrence, S., and Pennock, D. 2003; Agrawal, R., Rajagopalan, S., Srikant, R., Xu, Y. 2003; Hatzivassiloglou, V., and Wiebe, J. 2000; Wiebe, J., Bruce, R., and O’Hara, T. 1999: Sentiment classification tasks. They do not identify the features commented on by customers, or what customers praise or complain about.
6
System Architecture
7
Visualizing Opinion Comparison
8
Problem Statement P = {P1, P2, …, Pn} : a set of products.
Ri = {r1, r2, …, rk} : the set of reviews of product Pi. Explicit feature : a product feature that appears in rj, e.g., “Battery life is too short” → battery life. Implicit feature : a feature that does not appear in rj but is implied, e.g., “Too big” → size. In order to visually compare consumer opinions on a set of products, we need to analyze the reviews in Ri of each product Pi (1) to find all the explicit and implicit product features on which reviewers have expressed their (positive or negative) opinions, and (2) to produce the positive opinion set and the negative opinion set for each feature.
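As a rough sketch (the names and structure below are my own, not the paper's), the desired output for each product can be represented as a mapping from every feature to its positive and negative opinion sets:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureOpinions:
    """Opinions collected for one product feature across the reviews of one product."""
    feature: str
    positive: list = field(default_factory=list)   # segments expressing positive opinions
    negative: list = field(default_factory=list)   # segments expressing negative opinions

# e.g., part of the analysis result for one product Pi
opinions = {
    "battery life": FeatureOpinions("battery life", negative=["battery life too short"]),
    "size": FeatureOpinions("size", negative=["too big"]),   # implicit feature
}
print(opinions["size"].negative)   # ['too big']
```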
9
Automated Opinion Analysis
Observation: Each sentence segment contains at most one product feature. Sentence segments are separated by ‘,’, ‘.’, ‘and’, and ‘but’.
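A minimal sketch of this segmentation, assuming simple delimiter-based splitting (the exact tokenization used in the paper may differ):

```python
import re

def split_segments(text: str) -> list[str]:
    """Split a Pros/Cons line into sentence segments on ',', '.', 'and', 'but'."""
    parts = re.split(r",|\.|\band\b|\bbut\b", text.lower())
    return [p.strip() for p in parts if p.strip()]

print(split_segments("heavy, bad picture quality, battery life too short"))
# ['heavy', 'bad picture quality', 'battery life too short']
```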
10
Prepare a Training Dataset
Manually label a large number of reviews:
1. POS tagging, remove digits: “<N> Battery <N> usage”, “<V> included <N> MB <V> is <Adj> stingy”
2. Replace feature words with [feature]: “<N> [feature] <N> usage”, “<V> included <N> [feature] <V> is <Adj> stingy”
3. Use 3-grams to produce shorter segments: “<V> included <N> [feature] <V> is <Adj> stingy” → “<V> included <N> [feature] <V> is”, “<N> [feature] <V> is <Adj> stingy”
4. Distinguish duplicate tags: “<N1> [feature] <N2> usage”
5. Perform word stemming.
The resulting sentence (3-gram) segments are saved in a transaction file.
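A rough sketch of these preparation steps using NLTK (the paper uses its own tagger and tag set; the tag mapping, tokenizer, and feature-word list below are assumptions, and duplicate-tag numbering is omitted):

```python
import nltk                              # requires: nltk.download('punkt'),
from nltk.stem import PorterStemmer      #           nltk.download('averaged_perceptron_tagger')

TAG_MAP = {"NN": "<N>", "NNS": "<N>", "NNP": "<N>", "VB": "<V>", "VBZ": "<V>",
           "VBD": "<V>", "VBN": "<V>", "JJ": "<Adj>"}   # simplified Penn-to-slide tag mapping
stemmer = PorterStemmer()

def to_transactions(segment: str, feature_words: set[str]) -> list[list[tuple[str, str]]]:
    """Tag a segment, replace labelled feature words with [feature], stem, and emit 3-grams."""
    items = []
    for word, tag in nltk.pos_tag(nltk.word_tokenize(segment)):
        if word.isdigit():
            continue                                     # remove digits
        token = "[feature]" if word.lower() in feature_words else stemmer.stem(word.lower())
        items.append((TAG_MAP.get(tag, "<O>"), token))
    # every 3-gram of (tag, word) pairs becomes one transaction for the rule miner
    return [items[i:i + 3] for i in range(len(items) - 2)] or [items]

print(to_transactions("included MB is stingy", {"mb"}))
```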
11
Rule Generation Association rule mining model
I = {i1, …, in} : a set of items. D : a set of transactions, each consisting of a subset of items in I.
Association rule: X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The rule X → Y holds in D with confidence c if c% of transactions in D that support X also support Y. The rule has support s in D if s% of transactions in D contain X ∪ Y.
We use the association mining system CBA (Liu, B., Hsu, W., Ma, Y. 1998) to mine rules, with 1% as the minimum support.
Some example rules:
<N1>, <N2> → [feature]
<V>, easy, to → [feature]
<N1> → [feature], <N2>
<N1>, [feature] → <N2>
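Support and confidence as defined above are straightforward to compute; a small illustration over toy transactions (this is not the CBA system itself):

```python
def support(itemset: set[str], transactions: list[set[str]]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs: set[str], rhs: set[str], transactions: list[set[str]]) -> float:
    """Of the transactions supporting the LHS, the fraction that also support the RHS."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

tx = [{"<N1>", "<N2>", "[feature]"}, {"<N1>", "[feature]"}, {"<N1>", "<V>"}]
print(support({"<N1>", "<N2>", "[feature]"}, tx))         # 0.33...
print(confidence({"<N1>", "<N2>"}, {"[feature]"}, tx))    # 1.0
```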
12
Post-processing We only need rules that have [feature] on the RHS.
We need to consider the sequence of items in the LHS, e.g., “<V>, easy, to → [feature]” should be “easy, to, <V> → [feature]”. We check each rule against the transaction file to find the possible sequences, and remove derived rules with confidence < 50%. Finally, we generate language patterns from the remaining rules.
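A simple sketch of this filtering step (the (LHS, RHS, confidence) tuple representation is my own; the sequence derivation against the transaction file is not shown):

```python
def post_process(rules):
    """Keep rules that have [feature] on the RHS and sequence-checked confidence >= 50%."""
    return [(lhs, rhs, conf) for lhs, rhs, conf in rules
            if "[feature]" in rhs and conf >= 0.5]

rules = [(("easy", "to", "<V>"), ("[feature]",), 0.80),
         (("<N1>", "[feature]"), ("<N2>",), 0.90),   # [feature] not on the RHS: dropped
         (("<V>", "<N1>"), ("[feature]",), 0.30)]    # confidence below 50%: dropped
print(post_process(rules))   # keeps only the first rule
```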
13
Extraction of Product Features
The resulting patterns are used to match and identify candidate features from new reviews after POS tagging. A generated pattern does not need to match a part of a sentence segment with the same length as the pattern; e.g., pattern “<NN1> [feature] <NN2>” can match the segment “size of printout”. If a sentence segment satisfies multiple patterns, we normally use the pattern that gives the highest confidence. For sentence segments to which no pattern applies, we use nouns or noun phrases as features. In cases where a sentence segment has only a single word, e.g., “heavy” and “big”, we treat the single word as a candidate feature.
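A simplified sketch of matching a mined pattern against a POS-tagged segment; it only covers equal-length sub-sequence matching (not the flexible-length matching described above), and the pattern/segment encoding is my own:

```python
def match(pattern, tagged_segment):
    """Slide the pattern over the segment; a pattern item matches a token's POS tag or its
    literal word, and the token aligned with [feature] is returned as the candidate feature."""
    n = len(pattern)
    for i in range(len(tagged_segment) - n + 1):
        window = tagged_segment[i:i + n]
        if all(p in ("[feature]", tag, word) for p, (tag, word) in zip(pattern, window)):
            return next(word for p, (_, word) in zip(pattern, window) if p == "[feature]")
    return None

segment = [("<Adj>", "easy"), ("<O>", "to"), ("<V>", "use")]
print(match(["easy", "to", "[feature]"], segment))   # 'use'
```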
14
Feature Refinement Two main mistakes made during extraction:
1. Feature conflict: two or more candidate features appear in one sentence segment.
2. There is a more likely feature in the sentence segment, but it is not extracted by any pattern. e.g., in “slight hum from subwoofer when not in use”, “hum” is found to be the feature but not “subwoofer”. How to find this? “subwoofer” was found as a candidate feature in other reviews, but “hum” never was.
15
Feature Refinement (cont.)
Refinement strategies:
Frequent-noun: (1) the generated product features, together with their frequency counts, are saved in a candidate feature list; (2) for each sentence segment, if there are two or more nouns, we choose the most frequent noun in the candidate feature list.
Frequent-term: for each sentence segment, we simply choose the word/phrase (it does not need to be a noun) with the highest frequency in the candidate feature list.
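A small sketch of both strategies over a toy candidate feature list (frequencies, tags, and the segment are hypothetical):

```python
from collections import Counter

# hypothetical candidate feature list with frequency counts gathered from all reviews
candidate_freq = Counter({"subwoofer": 12, "battery": 9, "hum": 1, "slight": 1})

def frequent_noun(tagged_segment):
    """If the segment has two or more nouns, choose the most frequent noun candidate."""
    nouns = [w for tag, w in tagged_segment if tag == "<N>" and w in candidate_freq]
    return max(nouns, key=candidate_freq.__getitem__) if len(nouns) >= 2 else None

def frequent_term(words):
    """Choose the word/phrase (any part of speech) with the highest candidate frequency."""
    cands = [w for w in words if w in candidate_freq]
    return max(cands, key=candidate_freq.__getitem__) if cands else None

print(frequent_term(["slight", "hum", "from", "subwoofer"]))   # 'subwoofer'
print(frequent_noun([("<Adj>", "slight"), ("<N>", "hum"),
                     ("<O>", "from"), ("<N>", "subwoofer")]))  # 'subwoofer'
```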
16
Mapping to Implicit Features
In tagging the training data for mining rules, we also tag the mapping of candidate features to their actual features. e.g., when we tag “heavy” in the sentence segment “too heavy” as a feature word, we also record a mapping of “heavy” to <weight>. Rule mining can then be used to generate mapping rules.
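A minimal sketch of applying such a mapping at extraction time; the table below is hand-written for illustration, whereas in the paper the mapping rules are mined from the tagged data:

```python
# hypothetical indicator-to-feature mappings of the kind mined from the tagged training data
IMPLICIT_MAP = {"heavy": "weight", "big": "size", "small": "size", "expensive": "price"}

def resolve_feature(candidate: str) -> str:
    """Map an implicit feature indicator (e.g. 'heavy') to its actual feature (e.g. 'weight')."""
    return IMPLICIT_MAP.get(candidate, candidate)

print(resolve_feature("heavy"))    # 'weight'
print(resolve_feature("battery"))  # unchanged: already an explicit feature
```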
17
Grouping Synonyms Grouping features with similar meanings.
e.g., “photo”, “picture” and “image” all refer to the same feature in digital camera reviews. We employ WordNet to check whether any synonym groups/sets exist among the features, choosing only the top two most frequent senses of a word when finding its synonyms.
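A sketch of the WordNet check using NLTK, restricted to a word's top two senses as described above; the pairwise grouping criterion is my own simplification:

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def top_synonyms(word: str, senses: int = 2) -> set[str]:
    """Lemma names from the word's two most frequent noun senses."""
    lemmas = set()
    for synset in wn.synsets(word, pos=wn.NOUN)[:senses]:
        lemmas.update(l.name().replace("_", " ") for l in synset.lemmas())
    return lemmas

def same_group(a: str, b: str) -> bool:
    """Group two features if either appears among the other's top-sense synonyms."""
    return b in top_synonyms(a) or a in top_synonyms(b)

print(same_group("picture", "image"))   # True under standard WordNet sense ordering
```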
18
Experiments Training and test review data Evaluation measure
Training and test review data: we manually tagged a large collection of reviews of 15 electronic products from epinions.com; 10 of them are used as the training data to mine patterns, and the rest are used for testing.
Evaluation measures: recall (r) and precision (p), where
n : the total number of reviews of a particular product.
ECi : the number of extracted features from review i that are correct.
Ci : the number of actual features in review i.
Ei : the number of extracted features from review i.
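The recall and precision formulas themselves did not survive this transcription; a plausible reconstruction from the definitions above (per-review averages; the exact form in the paper may differ) is:

```latex
r = \frac{1}{n}\sum_{i=1}^{n}\frac{EC_i}{C_i}, \qquad
p = \frac{1}{n}\sum_{i=1}^{n}\frac{EC_i}{E_i}
```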
19
Experiments (cont.) The frequent-term strategy gives better results than the frequent-noun strategy, because some features are not expressed as nouns and the POS tagger makes mistakes.
20
Experiments (cont.) There are still some adjectives and verbs that appear as implicit features. The techniques in FBS (the feature-based summarization system of Hu and Liu 2004) are not suitable for Pros and Cons, which are mostly short phrases or incomplete sentences.
21
Experiments (cont.) The results for Pros are better than those for Cons: people tend to use similar words like ‘excellent’, ‘great’, ‘good’ in Pros, whereas the words people use to complain in Cons differ a lot. Number of patterns for Pros: 117; number of patterns for Cons: 22.
22
Conclusions We proposed a novel visual analysis system to compare consumer opinions of multiple products. We designed a supervised pattern discovery method to automatically identify product features from Pros and Cons in reviews.
Future work: improve the automatic techniques; study the strength of opinions; investigate how to extract useful information from other types of opinion sources.