Published by Maude Sutton. Modified over 9 years ago.
1
Semester Work Summary
Wei Wei (魏巍)
Email: zauri@ruc.edu.cn
2
Outline
Introduction to Opinion Mining
My work:
–Product feature words extraction
–Product opinion words extraction
–Opinion words orientation identification
Conclusion and Future Work
3
· Opinion mining
4
Introduction to Opinion Mining
Why opinion mining?
–There is more and more user-generated content (user-generated media), such as BBS forums, blogs, etc.
–It is hard to find out people’s opinions on a particular thing or topic.
Opinion granularity (level):
–Document level: genre classification (subjective or objective)
–Sentence level
–Feature (word) level: objects such as products have attributes
5
Problem definition (feature-based opinion mining)
Object:
–A product, person, entity, event, etc.
Feature: explicit or implicit
–Explicit: “The battery life of this camera is too short.” (feature: battery life)
–Implicit: “It’s really too large.” (feature: size)
Opinion: adjectives near the feature
–“The battery life of this camera is too short.” (opinion: short)
–“It’s really too large.” (opinion: large)
6
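Feature-based opinion mining
The goal is to be able to form a table like the following from the reviews of an object, with one column per feature (attribute) and one row per orientation:
      Att1  Att2  Att3  …  Attn
Pos
Neg
Neu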
7
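Objective
User reviews: canon XX
R1: ------------
R2: ------------
R3: ------------
R4: ------------
…
Step 1: Feature & opinion extraction
Examples:
–“这款相机的电池寿命很短。” (“The battery life of this camera is very short.”) -- negative
–“这个相机镜头很大。” (“The lens of this camera is very large.”) -- positive
…
Step 2: Opinion orientation identification
Feature       Positive  Negative  Neutral
Battery life  40%       30%
Lens          60%       20%
…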
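The summary table of Step 2 can be sketched as a simple aggregation. This is a minimal illustration, not the author's implementation: the function name `summarize` and the labels "pos", "neg", "neu" are assumptions, and the input is assumed to be the (feature, orientation) pairs produced by Steps 1 and 2.

```python
from collections import Counter, defaultdict


def summarize(pairs):
    """Turn (feature, orientation) pairs into per-feature proportions.

    `pairs` is an iterable of (feature, orientation) tuples where
    orientation is one of "pos", "neg", "neu" (hypothetical labels).
    """
    counts = defaultdict(Counter)
    for feature, orientation in pairs:
        counts[feature][orientation] += 1

    summary = {}
    for feature, counter in counts.items():
        total = sum(counter.values())
        # A Counter returns 0 for labels that never occurred.
        summary[feature] = {o: counter[o] / total for o in ("pos", "neg", "neu")}
    return summary


pairs = [
    ("battery life", "neg"),
    ("battery life", "pos"),
    ("lens", "pos"),
    ("lens", "pos"),
]
table = summarize(pairs)
```

Each row of `table` corresponds to one feature row of the slide's table; multiplying by 100 gives percentages.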
8
· My work 1. feature & opinion extraction
9
Feature and Opinion words extraction
Input:
–Query product’s reviews (Qr)
–Relevant reviews (Rr)
–Irrelevant reviews (Ir)
Pipeline:
–Candidate feature extraction
–Prune features
–Syntax pattern extraction
–Pattern matching
–Opinion words extraction
Resulting feature lists:
–General features: …
–Specific features: …
10
Candidate feature generation
An N-gram method is used to extract single nouns and noun phrases from POS-tagged sentences.
–a. “我/r 觉得/v 清洁/a 效果/n 显著/a” (“I feel the cleaning ability is remarkable”)
–b. “泡沫/n 相当/d 丰富/a” (“The foam is very abundant”)
This step produces a candidate feature list. For each unit in the list, we keep the data structure below:
struct unit{
  string word;
  int rel_num; //how many relevant reviews contain this word
  int frq;
  int irrel_num; //how many irrelevant reviews contain this word
  int sen_num;
  int op_sen_num; //how many sentences have adjectives near this word
  int sen_id[MAX];
  …
};
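A minimal sketch of how such a candidate list could be built from POS-tagged reviews. Several points are assumptions, not stated on the slide: a candidate is an n-gram of up to `max_n` tokens that ends in a noun and contains only nouns and adjectives; `frq` is taken to be the occurrence count and `sen_num` the number of sentences containing the word, both over relevant reviews; and "adjective near" is simplified to "an adjective anywhere in the same sentence". The `Unit` class mirrors the fields of `struct unit`.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    word: str
    rel_num: int = 0      # relevant reviews containing the word
    irrel_num: int = 0    # irrelevant reviews containing the word
    frq: int = 0          # assumed: total occurrences in relevant reviews
    sen_num: int = 0      # assumed: sentences containing the word
    op_sen_num: int = 0   # sentences that also contain an adjective
    sen_ids: list = field(default_factory=list)


def parse_tagged(sentence):
    """'泡沫/n 相当/d 丰富/a' -> [('泡沫', 'n'), ('相当', 'd'), ('丰富', 'a')]"""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]


def candidate_ngrams(tokens, max_n=2):
    """N-grams that end in a noun and consist only of nouns/adjectives."""
    out = []
    for end, (_, tag) in enumerate(tokens):
        if tag != "n":
            continue
        for n in range(1, max_n + 1):
            start = end - n + 1
            if start < 0:
                break
            gram = tokens[start:end + 1]
            if all(t in ("n", "a") for _, t in gram):
                out.append("".join(w for w, _ in gram))
    return out


def build_units(relevant, irrelevant, max_n=2):
    """Each review is a list of POS-tagged sentences (strings)."""
    units = {}
    sen_id = 0
    for review in relevant:
        in_review = set()
        for sentence in review:
            tokens = parse_tagged(sentence)
            has_adj = any(tag == "a" for _, tag in tokens)
            grams = candidate_ngrams(tokens, max_n)
            for word in grams:
                units.setdefault(word, Unit(word)).frq += 1
            for word in set(grams):
                unit = units[word]
                unit.sen_num += 1
                unit.op_sen_num += int(has_adj)
                unit.sen_ids.append(sen_id)
            in_review.update(grams)
            sen_id += 1
        for word in in_review:
            units[word].rel_num += 1
    for review in irrelevant:
        in_review = set()
        for sentence in review:
            in_review.update(candidate_ngrams(parse_tagged(sentence), max_n))
        for word in in_review:
            if word in units:  # only track words seen in relevant reviews
                units[word].irrel_num += 1
    return units


relevant = [["我/r 觉得/v 清洁/a 效果/n 显著/a", "泡沫/n 相当/d 丰富/a"]]
irrelevant = [["效果/n 很/d 好/a"]]
units = build_units(relevant, irrelevant)
```

With the two example sentences from the slide, the candidates are "效果", "清洁效果" and "泡沫".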
11
Prune & Divide the feature list
Pruning rules:
Rule 1:
–Eliminate candidate features whose combination of POS tags matches certain patterns (e.g. “效果/很/好” has the tags “n/d/a”).
Rule 2:
–Eliminate candidate features according to the word’s rel_num and irrel_num values.
–Divide the features into a general feature list and a specific feature list.
Rule 3:
–Eliminate candidate features according to the proportion of sentences containing the feature word that have an adjective nearby (op_sen_num/sen_num).
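Rules 2 and 3 can be sketched as follows. The slide does not give thresholds or the exact criterion for splitting general from specific features, so everything here is hypothetical: `min_rel_ratio`, `min_op_ratio`, and the choice to call a feature "general" when it also occurs in irrelevant reviews are assumptions made only for illustration.

```python
from typing import NamedTuple


class Stats(NamedTuple):
    word: str
    rel_num: int      # relevant reviews containing the word
    irrel_num: int    # irrelevant reviews containing the word
    sen_num: int      # sentences containing the word
    op_sen_num: int   # of those, sentences with an adjective nearby


def prune_and_divide(cands, n_rel, min_rel_ratio=0.05, min_op_ratio=0.3):
    """Return (general, specific) feature lists after rules 2 and 3."""
    general, specific = [], []
    for c in cands:
        # Rule 2 (assumed form): drop words that are rare in relevant reviews.
        if c.rel_num / n_rel < min_rel_ratio:
            continue
        # Rule 3: drop words that rarely have an adjective nearby.
        if c.sen_num == 0 or c.op_sen_num / c.sen_num < min_op_ratio:
            continue
        # Assumed split: words also seen in irrelevant reviews are general.
        if c.irrel_num > 0:
            general.append(c.word)
        else:
            specific.append(c.word)
    return general, specific


cands = [
    Stats("价格", 40, 30, 50, 35),  # price
    Stats("泡沫", 20, 0, 25, 20),   # foam
    Stats("东西", 30, 25, 40, 4),   # thing: rarely near an adjective
    Stats("瓶盖", 1, 0, 1, 1),      # bottle cap: too rare
]
general, specific = prune_and_divide(cands, n_rel=100)
```

Under these assumed thresholds, "价格" ends up in the general list and "泡沫" in the specific list, while the other two candidates are pruned.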
12
Syntax pattern extraction & match
We believe that consumers tend to use the same expression patterns (syntax patterns) for different product features. E.g.:
–a. “泡沫/n 相当/d 丰富/a” (“The foam is very abundant”): feature + 相当/d + adjective
–b. “很/d 便宜/a 的/u 价格/n” (“The price is very low”): 很/d + adjective + 的/u + feature
· We keep a pattern list and use these patterns to find new features. E.g.:
–Pattern b (很/d + adjective + 的/u + feature) matches “很/d 耐用/a 的/u 电池/n” (“a very durable battery”) -> new feature “电池” (battery)
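The pattern idea can be sketched as below. The representation is an assumption: a pattern is a tuple in which the known feature becomes a `FEATURE` slot, every adjective becomes an `ADJ` slot, and all other tokens are kept literally as word/tag. The function names are hypothetical.

```python
def parse_tagged(sentence):
    """'很/d 便宜/a' -> [('很', 'd'), ('便宜', 'a')]"""
    return [tuple(tok.rsplit("/", 1)) for tok in sentence.split()]


def extract_pattern(tokens, feature):
    """Generalize a tagged sentence that contains a known feature."""
    pattern, found = [], False
    for word, tag in tokens:
        if word == feature and tag == "n":
            pattern.append("FEATURE")
            found = True
        elif tag == "a":
            pattern.append("ADJ")
        else:
            pattern.append(f"{word}/{tag}")
    return tuple(pattern) if found else None


def match_pattern(window, pattern):
    """Return the noun in the FEATURE slot if `window` fits, else None."""
    if len(window) != len(pattern):
        return None
    feature = None
    for (word, tag), slot in zip(window, pattern):
        if slot == "FEATURE":
            if tag != "n":
                return None
            feature = word
        elif slot == "ADJ":
            if tag != "a":
                return None
        elif slot != f"{word}/{tag}":
            return None
    return feature


def find_new_features(sentences, patterns, known):
    """Slide every pattern over every sentence and collect unseen nouns."""
    new = set()
    for sentence in sentences:
        tokens = parse_tagged(sentence)
        for pattern in patterns:
            for i in range(len(tokens) - len(pattern) + 1):
                feature = match_pattern(tokens[i:i + len(pattern)], pattern)
                if feature and feature not in known:
                    new.add(feature)
    return new


pattern_b = extract_pattern(parse_tagged("很/d 便宜/a 的/u 价格/n"), "价格")
found = find_new_features(["很/d 耐用/a 的/u 电池/n"], [pattern_b], {"价格"})
```

Here `pattern_b` corresponds to pattern b on the slide, and matching it against the second sentence yields the new feature "电池".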
13
Opinion words extraction
Some reviews contain an opinion but no explicit feature, so we separate feature extraction and opinion extraction into two steps and merge the results.
–Implicit feature: “It’s really too large.” (size)
Flow: Review -> Features extraction / Opinion extraction -> Merge
14
Experiment
We were unable to use the method of Liu et al. [2004].
–For each sentence, only keep the noun segments to generate feature words. We use N-grams instead.
Results:
        Pruning + Relevant/irrelevant reviews   FOXS
        Recall   Prec                           Recall   Prec
Data1   0.84     0.75                           0.92     0.76
Data2   0.8      0.53                           0.8      0.45
Data3   0.73     0.55                           0.93     0.6
Data4   0.74     0.85                           0.86     0.82
Data5   0.59, 0.81, 0.82 [one of the four values is missing]
Avg.    0.74     0.69                           0.87     0.69

        Recall   Precision
Data1   0.84     0.48
Data2   0.5 [one of the two values is missing]
Data3   0.73     0.37
Data4   0.74     0.5
Data5   0.76     0.6
Avg.    0.71     0.49
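For reference, recall and precision of an extracted feature list against a manually labeled gold list are usually computed as below. This is a generic sketch; the slide does not say how the gold lists were built, and the function name is hypothetical.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted features against a gold list."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 2 of 4 extracted features are correct, 2 of 3 gold features found.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```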
15
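Supplementary (补充)
We try to tackle implicit features.
Reviews:
–R_i: “真是太贵了。” (“It’s really too expensive.”)
–R_n: “感觉价格太贵了。” (“I feel the price is too expensive.”)
–…
Implicit feature indicators: {贵 (expensive), 大 (large), 高 (high), …}
贵 -> 价格, 价钱 (price), so R_i talks about “价格” (price).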
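One way to realize this mapping is to learn, from sentences with an explicit feature, which feature each opinion word most often accompanies, and then reuse that mapping for sentences without a feature. This is only a sketch under that assumption; the slide does not specify how the mapping is obtained, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_opinion_to_feature(explicit_pairs):
    """Map each opinion word to the feature it most often co-occurs with.

    `explicit_pairs` holds (feature, opinion_word) pairs taken from
    sentences in which the feature is stated explicitly.
    """
    counts = defaultdict(Counter)
    for feature, opinion in explicit_pairs:
        counts[opinion][feature] += 1
    return {op: c.most_common(1)[0][0] for op, c in counts.items()}


def infer_implicit_feature(opinion_words, mapping):
    """Guess the feature of a sentence that only contains opinion words."""
    for op in opinion_words:
        if op in mapping:
            return mapping[op]
    return None


pairs = [("价格", "贵"), ("价格", "贵"), ("价钱", "贵"), ("屏幕", "大")]
mapping = build_opinion_to_feature(pairs)
guess = infer_implicit_feature(["贵"], mapping)  # opinion word of "真是太贵了。"
```

Note that this sketch keeps only the single most frequent feature per opinion word, whereas the slide lists both 价格 and 价钱 for 贵.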
16
· My work 2. opinion orientation identification
17
Opinion orientation identification
Methods for English:
–Based on WordNet
–A seed list: a positive list and a negative list
–Context-dependent opinions: context rules
Diagram: words w1, w2, w3, w4, … are checked against the seed list and labeled positive or negative.
18
Opinion orientation identification (cont.)
We can’t use WordNet for Chinese. What we can use now:
–Positive sentiment word seed list (Pset), provided by HowNet
–Negative sentiment word seed list (Nset), provided by HowNet
–Context-related sentiment word list (CRset) (we suppose we have the whole set)
–Conjunction words set
–Some heuristic rules (Liu et al. [2008]): “and”, “but”, etc.
19
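Opinion orientation identification (cont.)
Example: a review S1 consisting of several clauses, with features f1, f2 and opinion words opw1, opw2.
1. Check the type of each opinion word (opw) in every sentence: positive list, negative list, context-dependent list, or unknown word list.
2. For every … but opw1 in …: save … (context-dependent list, unknown words); add to the p-list or n-list.
3. Use syntactic and other rules to decide.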
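A minimal sketch of how seed lists and conjunction rules can label unknown opinion words. It assumes the usual heuristic that words joined by "and" share an orientation and words joined by "but" have opposite orientations; the triple format, the function names and the toy words are hypothetical, and context-dependent words are not handled.

```python
def orientation(word, pset, nset):
    """+1 for positive seeds, -1 for negative seeds, 0 for unknown words."""
    if word in pset:
        return 1
    if word in nset:
        return -1
    return 0


def propagate(triples, pset, nset):
    """Grow the positive/negative lists from (opw1, conj, opw2) triples.

    `conj` is "and" (same orientation) or "but" (opposite orientation).
    """
    pset, nset = set(pset), set(nset)
    changed = True
    while changed:  # repeat until no new word can be labeled
        changed = False
        for w1, conj, w2 in triples:
            for known, unknown in ((w1, w2), (w2, w1)):
                o = orientation(known, pset, nset)
                if o == 0 or orientation(unknown, pset, nset) != 0:
                    continue
                if conj == "but":
                    o = -o
                (pset if o > 0 else nset).add(unknown)
                changed = True
    return pset, nset


triples = [
    ("好", "and", "耐用"),    # good and durable
    ("耐用", "but", "贵"),    # durable but expensive
    ("笨重", "and", "难看"),  # both unknown: stays unlabeled
]
p_list, n_list = propagate(triples, pset={"好"}, nset=set())
```

Words that never connect to a seed, like the last pair, remain in the unknown list.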
20
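Opinion orientation identification (cont.)
Analysis of the results obtained so far:
–The results are not very satisfactory; the Unknown opinion list still contains many words whose polarity has not been determined.
–This is related to the size of the initial seed word list.
–To be continued …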
21
· Future work
22
Future work
–Implicit feature identification.
–Improving opinion orientation identification.
23
Thank you! And Happy Dragon Boat Festival! Q & A