Semester Work Summary
魏巍 (Wei Wei)

Outline
Introduction to Opinion Mining
My work:
–Product feature word extraction
–Product opinion word extraction
–Opinion word orientation identification
Conclusion and Future Work

· Opinion mining

Introduction to Opinion Mining
Why opinion mining?
–There is more and more user-generated content (user-generated media), e.g. BBS forums and blogs.
–It is hard to find out what a person thinks about a specific product or topic.
Opinion granularity (level):
–Document level: genre classification (subjective or objective)
–Sentence level
–Feature (word) level: objects (e.g. products) have attributes

Problem definition (feature-based opinion mining)
Object: a product, person, entity, event, etc.
Feature: explicit or implicit
–"The battery life of this camera is too short." (explicit feature: battery life)
–"It's really too large." (implicit feature: size)
Opinion: adjectives near the feature
–"The battery life of this camera is too short." (opinion word: short)
–"It's really too large." (opinion word: large)

Feature-based opinion mining
Goal: from people's reviews, build a table like the following, with one column per feature (attribute):

Feature   Att1   Att2   Att3   …   Attn
Pos
Neg
Neu
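To make the target table concrete, here is a minimal C++ sketch that aggregates extracted (feature, orientation) pairs into per-feature Pos/Neg/Neu counts and percentages. It assumes the orientation of each pair has already been decided; the names Orientation, FeatureOpinion, FeatureSummary and summarize are hypothetical and not part of the original system.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Orientation decided for one opinion on one feature.
enum class Orientation { Positive, Negative, Neutral };

// A (feature, orientation) pair extracted from one review sentence.
struct FeatureOpinion {
    std::string feature;
    Orientation orientation;
};

// One column of the summary table: counts per orientation for a feature.
struct FeatureSummary {
    int pos = 0, neg = 0, neu = 0;

    int total() const { return pos + neg + neu; }

    // Share of this feature's opinions with orientation o, in percent.
    double percent(Orientation o) const {
        if (total() == 0) return 0.0;
        int n = (o == Orientation::Positive) ? pos
              : (o == Orientation::Negative) ? neg
                                             : neu;
        return 100.0 * n / total();
    }
};

// Aggregate all extracted pairs into the feature-based summary table.
std::map<std::string, FeatureSummary> summarize(const std::vector<FeatureOpinion>& pairs) {
    std::map<std::string, FeatureSummary> table;
    for (const FeatureOpinion& p : pairs) {
        FeatureSummary& row = table[p.feature];
        switch (p.orientation) {
            case Orientation::Positive: ++row.pos; break;
            case Orientation::Negative: ++row.neg; break;
            case Orientation::Neutral:  ++row.neu; break;
        }
    }
    return table;
}
```

For example, three opinions on battery life (two negative, one positive) yield one entry with neg == 2 and total() == 3.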

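Objective
User reviews of a product (e.g. Canon XX): R1, R2, R3, R4, …
Examples:
–这款相机的电池寿命很短。("The battery life of this camera is very short.") Extract: 电池寿命 (battery life) + 短 (short) → negative
–这个相机镜头很大。("The lens of this camera is very large.") Extract: 镜头 (lens) + 大 (large) → positive
Step 1: Feature & opinion extraction
Step 2: Opinion orientation identification

Feature                  Positive   Negative   Neutral
电池寿命 (battery life)   40%        30%
镜头 (lens)              60%        20%
…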

· My work 1. feature & opinion extraction

Feature and opinion word extraction
Input: the queried product's reviews (Qr), relevant reviews (Rr) and irrelevant reviews (Ir)
1. Candidate feature extraction
2. Prune features → general features: … / specific features: …
3. Syntax pattern extraction → pattern matching
4. Opinion word extraction

Candidate feature generation
An N-gram method is used to extract single nouns and noun phrases, e.g.:
–a. "我 /r 觉得 /v 清洁 /a 效果 /n 显著 /a" ("I feel the cleaning effect is remarkable")
–b. "泡沫 /n 相当 /d 丰富 /a" ("The foam is very abundant")
This step yields a candidate feature list. For each unit in the list we keep the following data structure:

    struct unit {
        std::string word;
        int rel_num;      // how many relevant reviews contain this word
        int frq;          // frequency of this word
        int irrel_num;    // how many irrelevant reviews contain this word
        int sen_num;      // how many sentences contain this word
        int op_sen_num;   // how many sentences have an adjective near this word
        int sen_id[MAX];  // IDs of the sentences containing this word
        …
    };
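A minimal sketch of candidate generation, assuming each sentence is already segmented and POS-tagged (noun tags start with "n", as in the examples) and assuming that a candidate is any run of one to three consecutive nouns. CandidateStats, nounNgrams and generateCandidates are hypothetical names; only rel_num, irrel_num and sen_num from struct unit are filled in here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A POS-tagged token {word, tag}, e.g. {"效果", "n"}.
using Token = std::pair<std::string, std::string>;
using Sentence = std::vector<Token>;
using Review = std::vector<Sentence>;

// The subset of struct unit's counters that this sketch fills in.
struct CandidateStats {
    int rel_num = 0;    // relevant reviews containing the candidate
    int irrel_num = 0;  // irrelevant reviews containing the candidate
    int sen_num = 0;    // sentences (in relevant reviews) containing the candidate
};

bool isNoun(const Token& t) { return !t.second.empty() && t.second[0] == 'n'; }

// Assumption: a candidate is any run of 1..maxN consecutive noun tokens.
std::set<std::string> nounNgrams(const Sentence& s, std::size_t maxN) {
    assert(maxN > 0);
    std::set<std::string> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::string phrase;  // Chinese words are concatenated without spaces
        for (std::size_t n = 0; n < maxN && i + n < s.size() && isNoun(s[i + n]); ++n) {
            phrase += s[i + n].first;
            out.insert(phrase);
        }
    }
    return out;
}

// Count, for every candidate, how many relevant / irrelevant reviews contain it.
std::map<std::string, CandidateStats>
generateCandidates(const std::vector<Review>& relevant,
                   const std::vector<Review>& irrelevant, std::size_t maxN = 3) {
    std::map<std::string, CandidateStats> stats;
    auto scan = [&](const std::vector<Review>& reviews, bool isRelevant) {
        for (const Review& r : reviews) {
            std::set<std::string> seenInReview;  // count each review at most once
            for (const Sentence& s : r) {
                for (const std::string& c : nounNgrams(s, maxN)) {
                    if (isRelevant) ++stats[c].sen_num;
                    seenInReview.insert(c);
                }
            }
            for (const std::string& c : seenInReview)
                (isRelevant ? stats[c].rel_num : stats[c].irrel_num)++;
        }
    };
    scan(relevant, true);
    scan(irrelevant, false);
    return stats;
}
```

With example a, this yields the candidate 效果 but not 清洁, because 清洁 is tagged as an adjective; consecutive nouns such as 电池/n 寿命/n also produce the phrase 电池寿命.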

Pruning and dividing the feature list
Pruning rules:
Rule 1:
–Eliminate candidate features that match certain combinations of POS tags (e.g. "效果 / 很 / 好", "the effect is very good", is tagged "n/d/a").
Rule 2:
–Eliminate candidate features according to their rel_num and irrel_num values.
–Divide the remaining features into a general feature list and a specific feature list.
Rule 3:
–Eliminate candidate features according to the proportion of sentences containing the feature word that have an adjective nearby (op_sen_num / sen_num).
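The three rules can be sketched as below. The thresholds in PruneConfig (minRelNum, generalRatio, minOpinionRatio) and the banned tag pattern list are hypothetical placeholders, since the slides do not give the real values. The general/specific split assumes that a feature that also appears often in irrelevant reviews (e.g. price) is general; it compares raw counts, so it presumes the relevant and irrelevant review sets are of similar size. Candidate, PruneConfig, FeatureLists and prune are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Statistics for one candidate: a subset of struct unit plus its POS tags.
struct Candidate {
    std::string word;
    std::string tags;    // POS tag sequence, e.g. "n/d/a"
    int rel_num = 0;     // relevant reviews containing the word
    int irrel_num = 0;   // irrelevant reviews containing the word
    int sen_num = 0;     // sentences containing the word
    int op_sen_num = 0;  // of those, sentences with an adjective nearby
};

// Hypothetical thresholds; the slides do not give the actual values.
struct PruneConfig {
    std::set<std::string> bannedTagPatterns{"n/d/a"};  // rule 1
    int minRelNum = 2;             // rule 2: drop candidates rare in relevant reviews
    double generalRatio = 0.5;     // rule 2: irrel_num / rel_num above this => general
    double minOpinionRatio = 0.3;  // rule 3: op_sen_num / sen_num below this => drop
};

struct FeatureLists {
    std::vector<std::string> general, specific;
};

FeatureLists prune(const std::vector<Candidate>& cands, const PruneConfig& cfg) {
    assert(cfg.minRelNum > 0);  // guarantees rel_num > 0 in the ratio below
    FeatureLists out;
    for (const Candidate& c : cands) {
        // Rule 1: drop whole clauses such as "效果/很/好" (n/d/a).
        if (cfg.bannedTagPatterns.count(c.tags)) continue;
        // Rule 2 (pruning part): drop candidates seen in too few relevant reviews.
        if (c.rel_num < cfg.minRelNum) continue;
        // Rule 3: drop words that rarely have an adjective nearby.
        if (c.sen_num == 0 ||
            static_cast<double>(c.op_sen_num) / c.sen_num < cfg.minOpinionRatio)
            continue;
        // Rule 2 (dividing part): words common in irrelevant reviews are general.
        double ratio = static_cast<double>(c.irrel_num) / c.rel_num;
        (ratio > cfg.generalRatio ? out.general : out.specific).push_back(c.word);
    }
    return out;
}
```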

Syntax pattern extraction & matching
We believe that consumers tend to use the same expression patterns (syntax patterns) for different product features. E.g.:
–a. "泡沫 /n 相当 /d 丰富 /a" ("The foam is very abundant"): feature + 相当/d + adjective
–b. "很 /d 便宜 /a 的 /u 价格 /n" ("The price is very low"): 很/d + adjective + 的/u + feature
· We keep a pattern list and use these patterns to find new features. E.g. pattern b (很/d + adjective + 的/u + feature) matches "很 /d 耐用 /a 的 /u 电池 /n" ("a very durable battery"), giving the new feature "电池" (battery).
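A minimal sketch of pattern matching, assuming each pattern is a sequence of slots: a literal word/tag, an adjective slot (tag "a"), or a feature slot (any noun tag). Slot, Pattern and matchPattern are hypothetical names; the sketch returns the word that fills the feature slot at each position where the whole pattern matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<std::string, std::string>;  // {word, POS tag}

// One slot of a syntax pattern.
struct Slot {
    enum Kind { Literal, Adjective, Feature } kind;
    Token literal;  // used only when kind == Literal
};
using Pattern = std::vector<Slot>;

bool slotMatches(const Slot& s, const Token& t) {
    switch (s.kind) {
        case Slot::Literal:   return t == s.literal;
        case Slot::Adjective: return t.second == "a";
        case Slot::Feature:   return !t.second.empty() && t.second[0] == 'n';
    }
    return false;
}

// Return the words that fill the Feature slot wherever the pattern matches.
std::vector<std::string> matchPattern(const Pattern& p, const std::vector<Token>& sent) {
    std::vector<std::string> found;
    if (p.empty() || p.size() > sent.size()) return found;
    for (std::size_t i = 0; i + p.size() <= sent.size(); ++i) {
        std::string feature;
        bool ok = true;
        for (std::size_t k = 0; k < p.size() && ok; ++k) {
            ok = slotMatches(p[k], sent[i + k]);
            if (ok && p[k].kind == Slot::Feature) feature = sent[i + k].first;
        }
        if (ok && !feature.empty()) found.push_back(feature);
    }
    return found;
}
```

Pattern b is written {Literal 很/d, Adjective, Literal 的/u, Feature}; pattern a would be {Feature, Literal 相当/d, Adjective} in the same way.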

Opinion word extraction
Some reviews express an opinion without an explicit feature, so we separate the two steps and merge their results afterwards:
–Implicit feature: "It's really too large." (size)
Review → feature extraction / opinion extraction → merge
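One way to realize this separate-then-merge design is sketched below, assuming that opinion words are adjectives (tag "a") and that an opinion attaches to the closest known feature within a small window (3 tokens here, a hypothetical value). Opinions with no feature nearby are kept in orphanOpinions as candidates for implicit features. extractAndMerge and SentenceResult are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<std::string, std::string>;  // {word, POS tag}

// Result of merging feature and opinion extraction for one sentence.
struct SentenceResult {
    std::vector<std::pair<std::string, std::string>> pairs;  // {feature, opinion}
    std::vector<std::string> orphanOpinions;  // opinions with no explicit feature nearby
};

// Assumption: `features` comes from the earlier feature-extraction step.
SentenceResult extractAndMerge(const std::vector<Token>& sent,
                               const std::set<std::string>& features,
                               std::size_t window = 3) {
    SentenceResult r;
    for (std::size_t i = 0; i < sent.size(); ++i) {
        if (sent[i].second != "a") continue;  // opinion extraction: adjectives only
        std::size_t best = sent.size(), bestDist = window + 1;
        for (std::size_t j = 0; j < sent.size(); ++j) {  // closest known feature
            if (!features.count(sent[j].first)) continue;
            std::size_t d = i > j ? i - j : j - i;
            if (d < bestDist) { bestDist = d; best = j; }
        }
        if (best < sent.size())  // merge: feature found within the window
            r.pairs.emplace_back(sent[best].first, sent[i].first);
        else
            r.orphanOpinions.push_back(sent[i].first);
    }
    return r;
}
```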

Experiment
Liu et al. [2004]'s method did not work for us:
–For each sentence it keeps only the noun segments to generate feature words. We use N-grams instead.
Setting: pruning + relevant/irrelevant reviews
[Tables: recall and precision on each dataset, plus averages]
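Recall and precision can be computed per dataset as below, assuming a hand-labeled gold feature list for each dataset (the slides do not say how the gold list was built); presumably the Avg row is the mean of the per-dataset values. evaluate and PR are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

struct PR { double precision, recall; };

// Precision and recall of an extracted feature set against a gold feature set.
PR evaluate(const std::set<std::string>& extracted, const std::set<std::string>& gold) {
    int hit = 0;  // extracted features that are also in the gold set
    for (const std::string& f : extracted) hit += gold.count(f) ? 1 : 0;
    PR r;
    r.precision = extracted.empty() ? 0.0 : static_cast<double>(hit) / extracted.size();
    r.recall    = gold.empty()      ? 0.0 : static_cast<double>(hit) / gold.size();
    return r;
}
```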

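Supplementary
We try to tackle implicit features:
R_i: 真是太贵了。("It's really too expensive.")
R_n: 感觉价格太贵了。("I feel the price is too expensive.")
…
Implicit feature words in reviews: {贵 (expensive), 大 (large), 高 (high), …} → the review talks about "价格" / "价钱" (price)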
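A sketch of one possible way to map implicit opinion words to features: co-occurrence counts are learned from sentences where the opinion word appears with an explicit feature (as in R_n, where 贵 co-occurs with 价格), and a review like R_i is then assigned the feature most often seen with its opinion word. This co-occurrence approach and the names CoocTable, learnMapping and implicitFeature are assumptions for illustration, not the method on the slide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Co-occurrence counts: opinion word -> (explicit feature -> count).
using CoocTable = std::map<std::string, std::map<std::string, int>>;

// Learn from {feature, opinion} pairs found in sentences with an explicit feature.
CoocTable learnMapping(const std::vector<std::pair<std::string, std::string>>& pairs) {
    CoocTable t;
    for (const auto& p : pairs) ++t[p.second][p.first];
    return t;
}

// Guess the implicit feature of an opinion word: its most frequent co-occurring
// feature, or "" if the word was never seen with an explicit feature.
std::string implicitFeature(const CoocTable& t, const std::string& opinion) {
    auto it = t.find(opinion);
    if (it == t.end()) return "";
    std::string best;
    int bestCount = 0;
    for (const auto& fc : it->second)
        if (fc.second > bestCount) { best = fc.first; bestCount = fc.second; }
    return best;
}
```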

· My work 2. opinion orientation identification

Opinion orientation identification
Methods for English:
–Based on WordNet
–A seed list: positive and negative lists
–Context-dependent opinions: context rules
Seed list → w1, w2, w3, w4, … → positive / negative
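The seed-list idea can be sketched as propagation over synonym and antonym relations: an unknown word gets the orientation of a known synonym, or the opposite of a known antonym, and the process repeats until nothing changes. In a real system the relations would come from WordNet; here they are small hand-written maps, and propagate, Lexicon and Relation are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Lexicon = std::map<std::string, int>;  // +1 positive, -1 negative
using Relation = std::map<std::string, std::vector<std::string>>;

// Grow the seed lexicon until no unknown word can be resolved any more.
Lexicon propagate(Lexicon lex, const Relation& synonyms, const Relation& antonyms,
                  const std::vector<std::string>& unknown) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const std::string& w : unknown) {
            if (lex.count(w)) continue;  // already resolved
            int orientation = 0;
            // sign = +1 for synonyms (same orientation), -1 for antonyms (opposite).
            auto look = [&](const Relation& rel, int sign) {
                auto it = rel.find(w);
                if (it == rel.end()) return;
                for (const std::string& v : it->second) {
                    auto known = lex.find(v);
                    if (orientation == 0 && known != lex.end())
                        orientation = sign * known->second;
                }
            };
            look(synonyms, +1);
            look(antonyms, -1);
            if (orientation != 0) { lex[w] = orientation; changed = true; }
        }
    }
    return lex;
}
```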

Opinion orientation identification (cont.)
We cannot use WordNet for Chinese. What we can use now:
–Positive sentiment word seed list (Pset), from HowNet
–Negative sentiment word seed list (Nset), from HowNet
–Context-related sentiment word list (CRset) (assuming we have the complete set)
–Conjunction word set
–Some heuristic rules (Liu et al. [2008]): "and", "but", etc.

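Opinion orientation identification (cont.)
Example: sentence S1 contains features f1, f2 and opinion words opw1, opw2.
1. Check the type of each opinion word (opw) in every sentence: positive list, negative list, context-dependent list, or unknown word list.
2. For every "…, but …" where opw1 is in …: save …; words from the context-dependent (C-d) list and unknown words are added to the p-list or n-list.
3. Determine orientation using syntactic and other rules.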
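Step 2 can be sketched with an intra-sentence conjunction rule in the spirit of Liu et al. [2008]: an opinion word not yet in the p-list or n-list takes the orientation of the nearest known opinion word in the same sentence, flipped when a "but"-type word lies between them. The connective list kButWords, the adjective-tag test and the name resolveSentence are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<std::string, std::string>;  // {word, POS tag}
using Lexicon = std::map<std::string, int>;  // +1 = p-list, -1 = n-list, absent = unknown

// Hypothetical "but"-type connectives; the slides only mention "and, but".
const std::set<std::string> kButWords = {"但是", "但", "可是", "却"};

// Resolve unknown opinion words (adjectives) in one sentence; new words go into lex.
void resolveSentence(const std::vector<Token>& sent, Lexicon& lex) {
    std::vector<std::size_t> opinionPos;
    for (std::size_t i = 0; i < sent.size(); ++i)
        if (sent[i].second == "a") opinionPos.push_back(i);
    Lexicon learned;  // only knowledge from before this sentence is used
    for (std::size_t i : opinionPos) {
        if (lex.count(sent[i].first)) continue;
        std::size_t bestDist = sent.size();
        int orientation = 0;
        for (std::size_t j : opinionPos) {
            auto known = lex.find(sent[j].first);
            if (known == lex.end()) continue;
            std::size_t lo = std::min(i, j), hi = std::max(i, j);
            int sign = 1;  // flip once per "but"-type word between the two opinions
            for (std::size_t k = lo + 1; k < hi; ++k)
                if (kButWords.count(sent[k].first)) sign = -sign;
            if (hi - lo < bestDist) { bestDist = hi - lo; orientation = sign * known->second; }
        }
        if (orientation != 0) learned[sent[i].first] = orientation;
    }
    lex.insert(learned.begin(), learned.end());
}
```

For example, with 漂亮 (pretty) in the p-list, "外观漂亮，但是电池续航短" puts 短 (short) into the n-list, while "屏幕清晰而且大" puts 大 into the p-list next to 清晰.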

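Opinion orientation identification (cont.)
Analysis of the current results:
–The results are not yet satisfactory: many words in the unknown opinion list still have no orientation.
–This is related to the size of the initial seed word lists.
–Work continues …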

· Future work

Future work
–Implicit feature identification
–Improving opinion orientation identification

Thank you! And Happy Dragon Boat Festival! Q & A