




1 MINING FEATURE-OPINION PAIRS AND THEIR RELIABILITY SCORES FROM WEB OPINION SOURCES  Presented by A. Kamal, M. Abulaish, and T. Anwar  International Conference on Web Intelligence, Mining and Semantics (WIMS) 2012

2 Introduction  Opinion Data: user-generated content  Opinion Sources: Forums, Discussion Groups, Blogs (from both customers and manufacturers)

3 Introduction  Problems with Reviews: Information Overload, Biased Information, Time Consuming  Solution: an approach to  Extract feature-opinion pairs from reviews  Determine the reliability score of each pair

4 Related Work  Relatively new area of study  Information Retrieval: classification of positive/negative reviews  NLP, text mining, and probabilistic approaches: identify patterns in text to extract attribute-value pairs

5 Proposed Approach  Architecture of the system

6 Pre-processing  Review crawler  Noisy reviews are removed: eliminates reviews created with no purpose, or created only to increase/decrease the popularity of a product  Markup language is filtered  Remaining content is divided into manageable units; boundaries are determined using heuristics, e.g., granularity of words, stemming, synonyms
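The filtering and segmentation steps above can be sketched as follows. This is a rough illustration, not the authors' implementation: a real system would use an HTML parser and a proper sentence segmenter rather than regular expressions.

```python
import re

def preprocess(review_html):
    """Filter markup and split the remaining content into sentence-sized units."""
    text = re.sub(r"<[^>]+>", " ", review_html)    # filter markup language
    text = re.sub(r"\s+", " ", text).strip()       # normalize whitespace
    # Heuristic sentence boundaries on terminal punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

units = preprocess("<p>The screen is bright. Battery life is poor!</p>")
print(units)  # ['The screen is bright.', 'Battery life is poor!']
```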

7 Document Parser  Text Analysis  Assigns Part-Of-Speech (POS) tags to each word  Converts each sentence into a set of dependency relations between pairs of words  Facilitates information extraction: noun phrases indicate product features, adjectives indicate opinions, and adverbs indicate the degree of expressiveness of opinions
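As a minimal sketch of what the parser output looks like, a parsed sentence can be represented as POS tags per word plus typed dependency relations between word pairs. The tags and relations below are hand-written for illustration, in the style of Stanford typed dependencies; they are not actual parser output.

```python
sentence = "The screen is very attractive"

# Hypothetical parser output for the sentence above.
pos_tags = {"The": "DT", "screen": "NN", "is": "VBZ",
            "very": "RB", "attractive": "JJ"}

# Each relation is (relation_name, governor, dependent).
dependencies = [
    ("det", "screen", "The"),
    ("nsubj", "attractive", "screen"),
    ("cop", "attractive", "is"),
    ("advmod", "attractive", "very"),
]

# Noun phrases -> candidate product features, adjectives -> opinions,
# adverbs -> degree of expressiveness of an opinion.
features = [w for w, t in pos_tags.items() if t.startswith("NN")]
opinions = [w for w, t in pos_tags.items() if t.startswith("JJ")]
modifiers = [w for w, t in pos_tags.items() if t.startswith("RB")]

print(features, opinions, modifiers)  # ['screen'] ['attractive'] ['very']
```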

8 Feature and Opinion Learner  Feature-opinion learner  Analyzes the dependency relations generated by the document parser  Generates all possible information components from the documents  Information component: a triple (f, m, o), where f refers to a feature, m to a modifier, and o to an opinion

9 Feature and Opinion Learner  Rule 1  In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = JJ*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = JJ*, POS(w4) = NN*, and w3, w4 are not stop-words, then either (w1, w2) or w4 is considered as a feature and w3 as an opinion.

10 Feature and Opinion Learner  Rule 2  In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = JJ*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = JJ*, POS(w4) = NN*, and w3, w4 are not stop-words, then either (w1, w2) or w4 is considered as a feature and w3 as an opinion. Thereafter, the relationship advmod(w3, w5), relating w3 with some adverbial word w5, is searched. If the advmod relationship is present, w5 is taken as the modifier in the information component; otherwise the information component is generated without a modifier.

11 Feature and Opinion Learner  Rule 3  In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = VB*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = VB*, POS(w4) = NN*, and w4 is not a stop-word, then we search for an acomp(w3, w5) relation. If the acomp relationship exists such that POS(w5) = JJ* and w5 is not a stop-word, then either (w1, w2) or w4 is assumed as the feature and w5 as an opinion. Thereafter, the modifier is searched and the information component is generated in the same way as in Rule 2.

12 Feature and Opinion Learner  Rule 4  In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = VB*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = VB*, POS(w4) = NN*, and w4 is not a stop-word, then we search for a dobj(w3, w5) relation. If the dobj relationship exists such that POS(w5) = NN* and w5 is not a stop-word, then either (w1, w2) or w4 is assumed as the feature and w5 as an opinion.

13 Feature and Opinion Learner  Rule 5  In a dependency relation R, if there exists an amod(w1, w2) relation such that POS(w1) = NN*, POS(w2) = JJ*, and w1 and w2 are not stop-words, then w2 is assumed to be an opinion and w1 a feature.
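Rule 5 is the simplest of the rules and can be sketched in code. This is an illustrative reading of the rule, assuming dependency relations are given as (name, governor, dependent) triples and POS tags as a word-to-tag mapping; the sample data is hand-written, not parser output or the paper's dataset.

```python
STOP_WORDS = {"the", "a", "an", "is", "has"}

def rule5(dependencies, pos_tags):
    """Extract (feature, opinion) pairs from amod(w1, w2) relations
    where w1 is a noun (NN*) and w2 an adjective (JJ*)."""
    pairs = []
    for rel, w1, w2 in dependencies:
        if (rel == "amod"
                and pos_tags.get(w1, "").startswith("NN")
                and pos_tags.get(w2, "").startswith("JJ")
                and w1.lower() not in STOP_WORDS
                and w2.lower() not in STOP_WORDS):
            pairs.append((w1, w2))  # w1 = feature, w2 = opinion
    return pairs

# "Nokia N95 has a pretty screen" -> amod(screen, pretty)
deps = [("nsubj", "has", "N95"), ("amod", "screen", "pretty"),
        ("dobj", "has", "screen")]
tags = {"N95": "NNP", "has": "VBZ", "screen": "NN", "pretty": "JJ"}
print(rule5(deps, tags))  # [('screen', 'pretty')]
```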

14 Feature and Opinion Learner  Rule 6  In a dependency relation R, if there exist relationships nn(w1, w2) and nsubj(w3, w1) such that POS(w1) = POS(w2) = NN*, POS(w3) = VB*, and w1, w2 are not stop-words, or if there exists a relationship nsubj(w3, w4) such that POS(w3) = VB*, POS(w4) = NN*, and w4 is not a stop-word, then we search for a dobj(w3, w5) relation. If the dobj relationship exists such that POS(w5) = NN* and w5 is not a stop-word, then either (w1, w2) or w4 is assumed as the feature and w5 as an opinion. Thereafter, the relationship amod(w5, w6) is searched. If the amod relationship is present such that POS(w6) = JJ* and w6 is not a stop-word, then w6 is taken as the modifier in the information component; otherwise the information component is generated without a modifier.

15 Feature and Opinion Learner  Example  Consider the following opinion sentences related to the Nokia N95:  "The screen is very attractive and bright"  "The sound sometimes comes out very clear"  "Nokia N95 has a pretty screen"  "Yes, the push mail is the 'Best' in the business"

16 Reliability Score Generator  Reliability Score  Removes noise due to parsing errors  Addresses contradicting opinions in reviews

17 Reliability Score Generator  HITS Algorithm  A higher score value for a pair reflects a tighter integrity of the two components in the pair  The hub and authority scores are computed iteratively, based on the feature score and opinion score  The feature score is calculated using the term frequency and inverse sentence frequency of the feature in each sentence of the document
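A term-frequency times inverse-sentence-frequency score, as the slide describes, can be sketched as below. This is a hedged illustration: the exact weighting and normalization used in the paper may differ.

```python
import math

def feature_score(feature, sentences):
    """tf-isf of a candidate feature over the sentences of a document:
    term frequency across sentences times log(N / sentences containing it)."""
    n = len(sentences)
    containing = [s for s in sentences if feature in s.lower().split()]
    if not containing:
        return 0.0
    tf = sum(s.lower().split().count(feature) for s in sentences)
    isf = math.log(n / len(containing))
    return tf * isf

sentences = [
    "The screen is very attractive and bright",
    "The sound some times comes out very clear",
    "Nokia N95 has a pretty screen",
]
# "screen" occurs twice, in 2 of 3 sentences: score = 2 * log(3/2)
score = feature_score("screen", sentences)
```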

18 Reliability Score Generator  Pseudocode

function HubsAndAuthorities(G):          // G := set of pages
    for each page p in G do
        p.auth = 1                       // p.auth is the authority score of the page p
        p.hub = 1                        // p.hub is the hub score of the page p
    for step from 1 to k do              // run the algorithm for k steps
        for each page p in G do          // update all authority values first
            p.auth = 0
            for each page q in p.incomingNeighbors do  // pages that link to p
                p.auth += q.hub
        for each page p in G do          // then update all hub values
            p.hub = 0
            for each page r in p.outgoingNeighbors do  // pages that p links to
                p.hub += r.auth
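The pseudocode above translates directly into a runnable sketch. The toy bipartite graph (sentences linking to the feature-opinion pairs they contain) is illustrative only, and the per-step normalization is a standard addition so the scores do not grow without bound.

```python
def hubs_and_authorities(graph, k=20):
    """graph: dict mapping each node to the list of nodes it links to.
    Returns (auth, hub) score dicts after k iterations of HITS."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, targets in graph.items():
        for t in targets:
            incoming[t].append(src)
    for _ in range(k):
        # Update all authority values first ...
        for n in nodes:
            auth[n] = sum(hub[q] for q in incoming[n])
        # ... then all hub values.
        for n in nodes:
            hub[n] = sum(auth[r] for r in graph.get(n, []))
        # Normalize so the scores do not overflow across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return auth, hub

# Toy graph: sentences link to the feature-opinion pairs they contain.
graph = {"s1": ["screen/attractive", "screen/bright"],
         "s2": ["sound/clear"],
         "s3": ["screen/pretty"]}
auth, hub = hubs_and_authorities(graph)
```

A sentence that contains more (and better-supported) pairs accumulates a higher hub score, and the pairs it contains accumulate higher authority scores in turn.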

19 Experimental Results  Dataset  400 reviews  4333 noun (or verb) and adjective pairs  1366 candidate features obtained after filtering  Sample list of extracted features, opinions, and modifiers

20 Experimental Results  Metrics  True positive (TP): number of feature-opinion pairs that the system identifies correctly  False positive (FP): number of feature-opinion pairs that are falsely identified by the system  False negative (FN): number of feature-opinion pairs that the system fails to identify
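These three counts determine precision, recall, and F-measure in the standard way. The counts in the example are hypothetical (the slides report only the resulting percentages), chosen merely to illustrate the computation.

```python
def metrics(tp, fp, fn):
    """Standard precision, recall, and F-measure from TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts: 150 correct pairs, 39 spurious, 62 missed.
p, r, f = metrics(150, 39, 62)
```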

21 Experimental Results  Feature and Opinion Learner  Precision: 79.3%  Recall: 70.6%  F-Measure: 74.7%  Observations  Direct and strong relationships between nouns and adjectives cause non-relevant feature-opinion pairs  Lack of grammatical correctness in reviews affects the results yielded by NLP parsers  Recall values lower than precision indicate the system's inability to extract certain feature-opinion pairs correctly

22 Experimental Results  Sample results for different products  Observations  The lack of variation in metric values across products indicates that the proposed approach is applicable regardless of the domain of the review documents

23 Experimental Results  Reliability Score Generator  Top-5 hub-scored feature-opinion pairs and their reliability scores  Sample feature-opinion pairs along with their hub and reliability scores

24 Conclusions  Future Work  Refine rules to improve precision and identify implicit features  Handle informal text common in reviews  Overall pipeline: Reviews + Rules → Feature-opinion pairs → HITS Algorithm → Reliability scores

