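Copyright © 2009 by CEBT. Meeting. Lab relocation: tentatively scheduled for March 28 (Sat) to 29 (Sun); quotes for a packing-and-moving service and for relocating and installing the heating/cooling units. KIISE database journal paper: first-round review complete; typo corrections and additional explanation of the equations requested. STFSSD presentation materials.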


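1 Copyright © 2009 by CEBT
Meeting
• Lab relocation
 – Tentatively scheduled for March 28 (Sat) to 29 (Sun)
 – Quotes for a packing-and-moving service and for relocating and installing the heating/cooling units
• KIISE database journal paper
 – First-round review complete
 – Typo corrections and additional explanation of the equations requested
• Prepare STFSSD presentation materials
Semantic Tech & Context - 1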

2 A Holistic Approach to Product Review Summarization
Jung-Yeon Yang, Jaeseok Myung, Sang-goo Lee
Department of Computer Science and Engineering, Seoul National University
Center for E-Business Technology, Seoul National University, Seoul, Korea

3 Outline
• Introduction
• Related Work
• Motivation
• Proposed Models
• Process of a Review Summarization
 – Feature Extraction
 – Sentiment Analysis
 – Feature Scoring
• Experiment
• Conclusion & Future Work

4 Introduction
• Product reviews
 – Reviews contain users' opinions about a product
 – Many customers refer to others' reviews when they buy a product
 – As the number of reviews increases, it becomes hard to read and grasp all of them
• Review summarization
 – Lets a reader see the overall opinion at a glance
 – Shows the evaluation of a product: an overall score for the product and a score for each representative feature
 – An evaluation should be given for each product feature
• Opinion mining
 – Finds users' opinions in a text
 – Finds representative features

5 Related Work
[Diagram: Review Doc. → Feature Extraction → Sentiment Analysis → Feature Scoring → Summary]
• Feature extraction
 – Frequencies of words
 – Structural information of sentences in a review
• Sentiment analysis
 – Natural Language Processing (NLP)-based approach: uses a word corpus (WordNet or SentiWordNet)
 – Computational statistics-based approach: uses Pointwise Mutual Information (PMI) between opinion words
• Feature scoring
 – Calculates an evaluation score for each feature
 – Uses a sentiment score taken from WordNet or SentiWordNet
 – Uses the rating score of a review document

6 Related Work (Cont.)
• Using NLP, sentimental polarity summation
• Using rating score, based on a specific feature
• Using term frequencies, clustering

7 Motivation
• Problems in previous work
 – Workload of extracting features: many strategies and methods
 – Using a word corpus: sentiment polarities are based on the general usage of words, so context-sensitive words (e.g. big, small, long, short, …) cannot be handled
 – Using the rating score of a review: in previous work, all features extracted from the same review receive the same evaluation score, whereas each feature should have its own evaluation score in every review
• Challenges
 – A dynamic and easy method of extracting features is needed (through tools)
 – We want to find the meaning of an opinion about a feature when it is modified by a context-sensitive word
 – A better way of scoring a product feature is needed

8 Example: using user scores of reviews
[Figure: table of reviews by rating score (★★★★★ down to ★) against the features Size, Cost, Design, Utility, Shutter speed, Battery time, A/S, Color. Features mentioned in a review are marked with O and shown with scores of 5, 4, 3, or 1 on a scale from Bad to Good.]

9 Example: Considering sentimental polarities
[Figure: table of reviews by rating score (★★★★★ down to ★) against the features Size, Cost, Design, Utility, Shutter speed, Battery time, A/S, Color, on a scale from Bad to Good, shown next to an example review.]
Example review, rating score ★★★★: "The size of camera is good to hold in one hand and comfortable. a design is so cool, nice body!!. But battery time is short. So, in outdoor, additional batteries are needed. This camera is almost perfect!!"

10 Proposed Models
• Review Model: each review R_j (j = 1, …, n) has a user score us_j and, for each feature i = 1, …, m, an entry (f_ij, o_ij, st_ij, sp_ij, e_ij)
• Review Summarization Model: for a review R_j with user score us_j, each feature-opinion pair (f_ij, o_ij) has a strength st_ij and a polarity sp_ij, which lead to e_ij; the e_ij lead to E_i
Notation:
 R: review
 us: user score
 f: feature
 o: opinion
 st: strength of an opinion
 sp: sentimental polarity of an opinion
 e: evaluation score of a feature in a review
 E: overall evaluation score of a feature
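The review model can be pictured as a small data structure. The following is only a sketch under assumptions: the class and field names, the +1/-1 encoding of polarity, and the sample values are illustrative and do not come from the slides. It shows how the per-review scores e_ij can be collected per feature, which is the input any overall score E_i would be derived from.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One (f_ij, o_ij, st_ij, sp_ij, e_ij) tuple for feature i in review j."""
    feature: str       # f_ij
    opinion: str       # o_ij
    strength: float    # st_ij
    polarity: int      # sp_ij, encoded here as +1 / -1 (an assumption)
    evaluation: float  # e_ij


@dataclass
class Review:
    """Review R_j with its user score us_j and its per-feature entries."""
    user_score: int
    entries: list[Entry] = field(default_factory=list)


def evaluations_by_feature(reviews: list[Review]) -> dict[str, list[float]]:
    """Collect every e_ij under its feature name."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for review in reviews:
        for entry in review.entries:
            grouped[entry.feature].append(entry.evaluation)
    return dict(grouped)


# Made-up sample data, for illustration only.
reviews = [
    Review(4, [Entry("size", "good", 1.0, +1, 4.5),
               Entry("battery time", "short", 1.0, -1, 2.0)]),
    Review(5, [Entry("size", "comfortable", 1.0, +1, 5.0)]),
]
grouped = evaluations_by_feature(reviews)
```

Keeping e_ij on the entry rather than on the review reflects the point of the model: two features in the same review may carry different scores.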

11 Process of a Review Summarization
[Diagram: Product Reviews → Feature extraction → Sentiment analysis → Feature scoring → Review Summary]
• Input: product reviews (title, main text, reviewer, review date, rate), processed by a review parser and a POS tagger
• Feature extraction: extract features and opinion words, using pattern rules, word frequency, and N-grams; produces feature-opinion pairs
• Sentiment analysis: classify sentiment polarity using sentiment dictionaries that are constructed automatically; produces sentiment polarities of features
• Feature scoring: derive a score for each feature from feature co-occurrence, feature frequency, and sentiment distribution; produces evaluation scores of product features

12 Feature Extraction
• PicAChoo (Pick And Choose; a text analyzing framework)
 – Reduces the manual effort needed to obtain feature and opinion words
 – Enables dynamic composition of several extraction methods: 4 primitive methods (frequency, co-occurrence, sequential pattern, plug-in) and 2 composite methods (logical and arithmetical)
 – Utilizes characteristics of textual data
[Diagram: documents → Preprocessing → Tokenized Document → Composition of primitive extraction methods (frequency, co-occurrence, pattern rules, …) → Selected Words → Opinion Mining, Summarization, User Modeling, …]
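To make the idea of composing primitive extraction methods concrete, here is a minimal sketch. It is not PicAChoo's API, which the slides do not show: every function name, the thresholds, the weights, and the toy documents are hypothetical. Two primitive methods (frequency and co-occurrence) each score words, and two composites combine them, one logically (intersection of the selected sets) and one arithmetically (weighted sum of scores).

```python
from collections import Counter

Docs = list[list[str]]  # each document is a list of tokens


def by_frequency(docs: Docs) -> Counter:
    """Primitive method: score each word by its total frequency."""
    return Counter(token for doc in docs for token in doc)


def by_cooccurrence(docs: Docs, seed: str) -> Counter:
    """Primitive method: score each word by how many documents it shares with `seed`."""
    scores: Counter = Counter()
    for doc in docs:
        words = set(doc)
        if seed in words:
            scores.update(words - {seed})
    return scores


def select(scores: Counter, threshold: int) -> set[str]:
    """Keep the words whose score reaches the threshold."""
    return {word for word, score in scores.items() if score >= threshold}


def logical_and(*selections: set[str]) -> set[str]:
    """Composite (logical): keep words selected by every method."""
    return set.intersection(*selections)


def arithmetic_sum(weighted: list[tuple[Counter, float]]) -> dict[str, float]:
    """Composite (arithmetical): weighted sum of the methods' scores."""
    total: dict[str, float] = {}
    for scores, weight in weighted:
        for word, score in scores.items():
            total[word] = total.get(word, 0.0) + weight * score
    return total


# Toy tokenized documents, for illustration only.
docs = [
    ["battery", "time", "short"],
    ["battery", "life", "short"],
    ["design", "cool"],
    ["battery", "good"],
]
freq = by_frequency(docs)
cooc = by_cooccurrence(docs, "battery")
selected = logical_and(select(freq, 2), select(cooc, 2))
combined = arithmetic_sum([(freq, 1.0), (cooc, 0.5)])
```

Because every primitive returns the same kind of score table, new methods (for example a pattern-based one) could be added without changing the composites.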

13 Sentiment Analysis
• Find the sentimental polarities of opinions in reviews
• Consider the context of an opinion word
 – SO = SA(opinion word, product category, product feature, user's evaluation)
• Pointwise Mutual Information (PMI)
 – A measure of association between two words
[Diagram: Review Doc. → POS-tagging → positive word dictionary and negative word dictionary (built automatically, using user scores) → Sentiment Analysis: (feature, opinion) → (feature, opinion, polarity)]
Dic. = {reviewID, catID, type, POS, word, userScore, s_no, w_no}
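PMI is defined as PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below shows one way it could be used to give a context-sensitive word a polarity per feature. The slides do not give the exact procedure, so the following are assumptions: observations are (feature, opinion, user score) rows, a user score of 4 or more counts as positive and 2 or less as negative, counts are smoothed by 0.5, and the orientation is PMI with the positive class minus PMI with the negative class. The sample rows are made up.

```python
import math


def pmi(joint: float, count_x: float, count_y: float, total: float) -> float:
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated from counts."""
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))


def orientation(pair, observations, smoothing=0.5):
    """PMI(pair, positive) - PMI(pair, negative) for a (feature, opinion) pair.

    Assumes the pair occurs at least once and that both positive and
    negative observations exist; thresholds 4 and 2 are illustrative.
    """
    total = len(observations)
    positive = sum(1 for _, _, score in observations if score >= 4)
    negative = sum(1 for _, _, score in observations if score <= 2)
    scores = [score for f, o, score in observations if (f, o) == pair]
    if not scores or positive == 0 or negative == 0:
        raise ValueError("not enough observations to estimate PMI")
    pair_positive = sum(1 for score in scores if score >= 4)
    pair_negative = sum(1 for score in scores if score <= 2)
    return (pmi(pair_positive + smoothing, len(scores), positive, total)
            - pmi(pair_negative + smoothing, len(scores), negative, total))


# Made-up observations: "long" is good for battery time, bad for delivery.
observations = [
    ("battery time", "long", 5),
    ("battery time", "long", 4),
    ("battery time", "short", 2),
    ("delivery", "long", 1),
    ("delivery", "long", 2),
    ("delivery", "short", 5),
]
battery_long = orientation(("battery time", "long"), observations)
delivery_long = orientation(("delivery", "long"), observations)
```

Keying the statistic on the (feature, opinion) pair rather than on the opinion word alone is what lets the same word receive opposite polarities for different features.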

14 Feature Scoring
• Scoring strategies
 – Use only the user score (as in previous work)
 – Consider the distribution of the sentimental polarities of the user's opinions
[Figure: matrix of reviews R_1 ~ R_n against features f_1 ~ f_n, with each filled cell marked P (positive opinion) or N (negative opinion)]
• Use the distribution of sentimental polarities within the same review
• Calculate the evaluation score of each feature by adjusting the rating score
Summary = {E_1, E_2, …, E_i, …, E_m}
 m = number of features
 n = number of reviews that contain the i-th feature
 F(f_i, j) = frequency of f_i in the j-th review
 sp_ij = SentimentPolarity(f_ij, o_ij)
Also defined for the j-th review: the number of opinions, the number of positive opinions, and the number of negative opinions.
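The idea of adjusting a review's rating score by the distribution of polarities within that review can be illustrated as follows. This is explicitly not the authors' formula, which is not reproduced in this transcript; the adjustment rule, the 1 to 5 score range, the function names, and the plain mean used for E_i are all hypothetical. In this sketch, a positively mentioned feature is pushed above the user score in proportion to the share of negative opinions in the review, and a negatively mentioned feature is pushed below it in proportion to the share of positive opinions.

```python
def adjusted_scores(user_score, polarities, max_score=5, min_score=1):
    """Hypothetical e_ij for one review.

    polarities maps each feature to +1 (positive) or -1 (negative).
    """
    total = len(polarities)
    positives = sum(1 for p in polarities.values() if p > 0)
    negatives = total - positives
    scores = {}
    for feature, polarity in polarities.items():
        if polarity > 0:
            # Negative opinions elsewhere dragged the rating down; compensate.
            scores[feature] = user_score + (max_score - user_score) * negatives / total
        else:
            # Positive opinions elsewhere pulled the rating up; compensate.
            scores[feature] = user_score - (user_score - min_score) * positives / total
    return scores


def overall_scores(reviews):
    """Hypothetical E_i: plain mean of e_ij over the reviews mentioning feature i."""
    collected = {}
    for user_score, polarities in reviews:
        for feature, score in adjusted_scores(user_score, polarities).items():
            collected.setdefault(feature, []).append(score)
    return {feature: sum(v) / len(v) for feature, v in collected.items()}


# First review modeled on the four-star camera example; second one is made up.
reviews = [
    (4, {"size": +1, "design": +1, "battery time": -1}),
    (5, {"size": +1}),
]
summary = overall_scores(reviews)
```

With only the user score, all three features of the first review would receive 4; here the battery time is scored well below size and design even though they come from the same review.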

15 Experiments
• Data: ePinions.com

Product category | Reviews | Positive reviews | Negative reviews | Product feature | Pair | Context-sensitive word
Hand phone | 2947 | 2196 (74.5%) | 418 (25.5%) | 48 | 734 | 124 (16.9%)
Digital camera | 12917 | 9940 (76.9%) | 1740 (23.1%) | 37 | 974 | 137 (14.1%)

• Sentiment Analysis (precision; the previous method is PMI using Web document search)

Product category | Our method: All | Our method: Context-nonsensitive | Our method: Context-sensitive | Previous method: All | Previous method: Context-nonsensitive | Previous method: Context-sensitive
Hand phone | 0.784 | 0.775 | 0.847 | 0.786 | 0.834 | 0.508
Digital camera | 0.764 | 0.758 | 0.797 | 0.817 | 0.866 | 0.515

• Feature Scoring
 – Improvement of our method in comparison with previous work: about 20%

16 Conclusion
• Proposed models
 – Product review model
 – Review summarization model
• Proposed new approaches to summarizing product reviews
 – Handling context-sensitive words in the sentiment analysis process
 – A feature scoring method that utilizes user scores and the sentimental polarities of opinions
 – A text analyzing framework for feature extraction

17 [Diagram: uKnow, iKnow, weKnow. Stages and outputs shown: Feature Extraction, Opinion Extraction, Pairs, Sentiment Clause, Sentiment Analysis, Feature Scoring, Feature Score, Score Summarization, Product Summary, Product Recommend, Product Comparison]
• NLP approach
 – Use parse trees
 – Use the sentiment dictionary (defined manually by experts)
 – Find the sentimental polarities of features
 – Derive scores of pairs
• Statistical approach
 – Use probabilities
 – Use the POS tags
 – Use the sentiment dictionaries (constructed automatically)
 – Use rating data of reviews
 – Use PMI values between feature and opinion
 – Derive the sentimental polarities
 – Use frequencies of features
 – Use a distribution of sentiments
 – Use the users' profiles
 – Use inputs from users
 – Use comparative objects

18 E-mail: jyyang@europa.snu.ac.kr
Intelligent Database Systems Lab.: http://ids.snu.ac.kr

