MINING FEATURE-OPINION PAIRS AND THEIR RELIABILITY SCORES FROM WEB OPINION SOURCES. Presented by Sole. A. Kamal, M. Abulaish, and T. Anwar. International Conference on Web Intelligence, Mining and Semantics (WIMS), 2012.
Introduction. Opinion data is user-generated content. Opinion sources include forums, discussion groups, and blogs, covering both the customer and the manufacturer side.
Introduction. Problems with reviews: information overload, time-consuming reading, and biased information. Solution: an approach to extract feature-opinion pairs from reviews and to determine the reliability score of each pair.
Related Work. Opinion mining is a relatively new area of study. Prior work spans information retrieval (classification of positive/negative reviews) and NLP, text-mining, and probabilistic approaches that identify patterns in text to extract attribute-value pairs.
Proposed Approach. Architecture of the system (diagram on the original slide).
Pre-processing. A review crawler collects the raw reviews. Noisy reviews are removed, eliminating reviews created with no purpose or only to inflate or deflate a product's popularity. Markup language is filtered out, and the remaining content is divided into manageable units whose boundaries are determined by heuristics, e.g., granularity of words, stemming, and synonyms; a minimal sketch of this clean-up follows.
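A small sketch of the kind of clean-up this stage performs, assuming HTML-style markup and NLTK's sentence splitter as stand-ins (the slides do not name the actual tools):

    import re
    from nltk.tokenize import sent_tokenize  # needs NLTK's 'punkt' data downloaded

    def preprocess(raw_review):
        # Strip markup tags left over from the crawled pages.
        text = re.sub(r"<[^>]+>", " ", raw_review)
        # Collapse whitespace runs introduced by tag removal.
        text = re.sub(r"\s+", " ", text).strip()
        # Split the remaining content into sentence-sized units for the parser.
        return sent_tokenize(text)

    print(preprocess("<p>The screen is very attractive and bright. Battery life is poor.</p>"))
    # -> ['The screen is very attractive and bright.', 'Battery life is poor.']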
Document Parser. Text analysis assigns a part-of-speech (POS) tag to each word and converts each sentence into a set of dependency relations between pairs of words, which facilitates information extraction. Noun phrases indicate product features, adjectives indicate opinions, and adverbs indicate the degree of expressiveness of opinions.
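To make the parser output concrete, here is a small sketch using spaCy as a stand-in (the slides do not name the parser; spaCy's label set differs slightly, e.g., its 'compound' label plays the role of the nn relation used in the rules below):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English model, an illustrative choice
    doc = nlp("Nokia N95 has a pretty screen")
    for tok in doc:
        # word, POS tag, dependency label, and the head word it attaches to
        print(f"{tok.text:8} {tok.tag_:4} {tok.dep_:9} -> {tok.head.text}")
    # Expected relations include amod(screen, pretty) and dobj(has, screen).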
Feature and Opinion Learner. The feature-opinion learner analyzes the dependency relations generated by the document parser and generates all possible information components from the documents. An information component is a triple ⟨f, m, o⟩, where f refers to a feature, m to a modifier, and o to an opinion.
Feature and Opinion Learner. Rule 1. In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = JJ*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = JJ*, POS(w4) = NN*, and w3, w4 are not stop-words, then either (w1, w2) or w4 is considered as a feature and w3 as an opinion.
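A minimal sketch of how Rule 1 could be applied to parser output, assuming relations arrive as (label, head, dependent) triples and that POS tags and a stop-word list are available; this is an illustration, not the paper's implementation:

    def apply_rule1(relations, pos, stopwords):
        pairs = []  # collected (feature, opinion) pairs
        for label, w3, w4 in relations:
            # nsubj(w3, w4) with an adjectival governor w3 is the trigger.
            if label != "nsubj" or not pos[w3].startswith("JJ") or w3 in stopwords:
                continue
            # Case 1: the subject noun heads a compound noun via nn(w1, w2).
            compounds = [(h, d) for l, h, d in relations
                         if l == "nn" and h == w4
                         and pos[h].startswith("NN") and pos[d].startswith("NN")
                         and h not in stopwords and d not in stopwords]
            if compounds:
                for w1, w2 in compounds:
                    pairs.append((f"{w2} {w1}", w3))
            # Case 2: a plain noun subject is itself the feature.
            elif pos[w4].startswith("NN") and w4 not in stopwords:
                pairs.append((w4, w3))
        return pairs

    rels = [("nsubj", "attractive", "screen")]
    print(apply_rule1(rels, {"attractive": "JJ", "screen": "NN"}, set()))
    # -> [('screen', 'attractive')]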
Feature and Opinion Learner. Rule 2. In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = JJ*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = JJ*, POS(w4) = NN*, and w3, w4 are not stop-words, then either (w1, w2) or w4 is considered as a feature and w3 as an opinion. Thereafter, the relationship advmod(w3, w5) relating w3 with some adverbial word w5 is searched. If the advmod relationship is present, the information component is identified as ⟨f, w5, w3⟩; otherwise, as ⟨f, -, w3⟩.
Feature and Opinion Learner. Rule 3. In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = VB*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = VB*, POS(w4) = NN*, and w4 is not a stop-word, then we search for an acomp(w3, w5) relation. If the acomp relationship exists such that POS(w5) = JJ* and w5 is not a stop-word, then either (w1, w2) or w4 is assumed to be the feature and w5 an opinion. Thereafter, the modifier is searched and the information component is generated in the same way as in Rule 2.
Feature and Opinion Learner. Rule 4. In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = VB*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = VB*, POS(w4) = NN*, and w4 is not a stop-word, then we search for a dobj(w3, w5) relation. If the dobj relationship exists such that POS(w5) = NN* and w5 is not a stop-word, then either (w1, w2) or w4 is assumed to be the feature and w5 an opinion.
Feature and Opinion Learner. Rule 5. In a dependency relation R, if there exists an amod(w1, w2) relation such that POS(w1) = NN*, POS(w2) = JJ*, and w1 and w2 are not stop-words, then w2 is assumed to be an opinion and w1 a feature.
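Rule 5 is the simplest pattern; in the same hypothetical triple representation used above:

    def apply_rule5(relations, pos, stopwords):
        # amod(w1, w2): adjectival modifier w2 attached to noun w1.
        return [(w1, w2) for label, w1, w2 in relations
                if label == "amod"
                and pos[w1].startswith("NN") and pos[w2].startswith("JJ")
                and w1 not in stopwords and w2 not in stopwords]

    rels = [("dobj", "has", "screen"), ("amod", "screen", "pretty")]
    print(apply_rule5(rels, {"has": "VBZ", "screen": "NN", "pretty": "JJ"}, set()))
    # -> [('screen', 'pretty')], i.e., feature 'screen', opinion 'pretty'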
Feature and Opinion Learner. Rule 6. In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = VB*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = VB*, POS(w4) = NN*, and w4 is not a stop-word, then we search for a dobj(w3, w5) relation. If the dobj relationship exists such that POS(w5) = NN* and w5 is not a stop-word, then either (w1, w2) or w4 is assumed to be the feature and w5 an opinion. Thereafter, the relationship amod(w5, w6) is searched. If the amod relationship is present, POS(w6) = JJ*, and w6 is not a stop-word, then the information component is identified as ⟨f, w6, w5⟩; otherwise, as ⟨f, -, w5⟩.
Feature and Opinion Learner. Example. Consider the following opinion sentences related to the Nokia N95: "The screen is very attractive and bright." "The sound some times comes out very clear." "Nokia N95 has a pretty screen." "Yes, the push mail is the 'Best' in the business." Read against the rules above, these would plausibly yield components such as ⟨screen, very, attractive⟩ (Rule 2), ⟨sound, very, clear⟩ (Rule 3), ⟨screen, -, pretty⟩ (Rule 5), and ⟨push mail, -, best⟩ (Rule 1), though the exact output depends on the parser's analysis.
Reliability Score Generator. The reliability score removes noise due to parsing errors and addresses contradicting opinions in reviews.
Reliability Score Generator. HITS algorithm: a higher score for a pair reflects a tighter integrity of the two components of the pair. The hub and authority scores are computed iteratively and are based on the feature score and the opinion score (the slide shows a bipartite hub/authority diagram). The feature score is calculated using the term frequency and inverse sentence frequency over the sentences of the document.
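The slides do not state the feature-score formula; one plausible reading of "term frequency and inverse sentence frequency", by analogy with TF-IDF at sentence granularity (an assumption, not the paper's exact definition), is

    \mathrm{score}(f, s) = \mathrm{tf}(f, s) \cdot \log \frac{|S|}{\mathrm{sf}(f)}

where tf(f, s) is the frequency of feature f in sentence s, |S| is the number of sentences in the document, and sf(f) is the number of sentences containing f.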
Reliability Score Generator. Pseudocode (HITS):

    function HubsAndAuthorities(G):        // G is the set of pages (nodes)
        for each page p in G:
            p.auth := 1                    // p.auth is the authority score of p
            p.hub := 1                     // p.hub is the hub score of p
        for step from 1 to k:              // run the algorithm for k steps
            for each page p in G:          // update all authority values first
                p.auth := 0
                for each page q in p.incomingNeighbors:    // pages that link to p
                    p.auth += q.hub
            for each page p in G:          // then update all hub values
                p.hub := 0
                for each page r in p.outgoingNeighbors:    // pages that p links to
                    p.hub += r.auth
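A runnable sketch of HITS on a bipartite feature-to-opinion graph; the edge list, the normalization step, and scoring a pair by the product of its hub and authority values are illustrative assumptions, not the paper's exact procedure:

    from collections import defaultdict

    # Hypothetical edges: each feature points at the opinions observed with it.
    edges = [("screen", "attractive"), ("screen", "pretty"),
             ("screen", "bright"), ("sound", "clear")]

    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for f, o in edges:
        out_nbrs[f].append(o)   # features act as hubs
        in_nbrs[o].append(f)    # opinions act as authorities

    hub = {f: 1.0 for f in out_nbrs}
    auth = {o: 1.0 for o in in_nbrs}

    for _ in range(20):  # k iterations
        auth = {o: sum(hub[f] for f in in_nbrs[o]) for o in auth}
        hub = {f: sum(auth[o] for o in out_nbrs[f]) for f in hub}
        a_norm, h_norm = sum(auth.values()), sum(hub.values())
        auth = {o: v / a_norm for o, v in auth.items()}  # keep scores bounded
        hub = {f: v / h_norm for f, v in hub.items()}

    for f, o in edges:
        print(f, o, round(hub[f] * auth[o], 4))  # pair score: hub * authority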
Experimental Results. Dataset: 400 reviews, 4333 noun (or verb) and adjective pairs, and 1366 candidate features obtained after filtering. A sample list of extracted features, opinions, and modifiers is given on the slide.
Experimental Results. Metrics: true positives (TP), the number of feature-opinion pairs the system identifies correctly; false positives (FP), the number of feature-opinion pairs falsely identified by the system; false negatives (FN), the number of feature-opinion pairs the system fails to identify. The evaluation metrics follow from these counts, as shown below.
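These counts give the standard definitions (not spelled out on the slide):

    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
    F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}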
Experimental Results. Feature and opinion learner: precision 79.3%, recall 70.6%, F-measure 74.7%. Observations: direct and strong relationships between nouns and adjectives produce non-relevant feature-opinion pairs; the lack of grammatical correctness in reviews affects the results yielded by NLP parsers; recall lower than precision indicates the system's inability to extract certain feature-opinion pairs correctly.
Experimental Results. Sample results for different products (table on the slide). Observation: the lack of variation in metric values indicates that the proposed approach is applicable regardless of the domain of the review documents.
Experimental Results. Reliability score generator: the top-5 hub-scored feature-opinion pairs and their reliability scores, and sample feature-opinion pairs along with their hub and reliability scores (tables on the slide).
Conclusions. Pipeline summary: reviews plus extraction rules yield feature-opinion pairs, and the HITS algorithm yields their reliability scores. Future work: refine the rules to improve precision and identify implicit features, and handle the informal text common in reviews.