Rated Aspect Summarization of Short Comments
Yue Lu, ChengXiang Zhai, and Neel Sundaresan
Presented by: Sapan Shah


1 Rated Aspect Summarization of Short Comments (title slide)

2 Web 2.0 → Opinions Everywhere
Examples: Novotel, iPhone, Sushi Kame, …, each shown with an Overall Rating

3 Seller's Feedback on eBay
23,385 Feedback received
Example comment: Very fast shipping and awesome price!!!

4 Need More Specific Aspects!
Example aspects: Fast shipping, Good service
–Is this seller rated high/low mainly because of service?
–Which seller provides fast shipping?

5 Rated Aspect Summarization
From the 23,385 feedback comments received, produce a summary with the columns: Aspect, Aspect Rating, Representative Phrase, Support Information.
Challenges:
–How to identify coherent aspects? Aspects consistent with user interest?
–How to accurately rate each aspect?
–How to get meaningful phrases supporting the ratings?

6 Related Work
Review summarization
–Unsupervised feature extraction + opinion polarity identification: [Hu&Liu 04], OPINE [Popescu&Etzioni 05], …
–Supervised aspect extraction: [Zhuang et al] …
Sentiment classification
–Binary classification: [Turney02] [Pang&Lee02] [Kim&Hovy04] [Cui et al06] …
–Rating classification: [Pang&Lee05] [Snyder&Barzilay07] …
Hidden aspect discovery
–[Hofmann99] [Blei et al03] [Zhai et al04] [Li&McCallum06] [Titov&McDonald08] …

7 Overall Approach
Step 1: Aspect Discovery and Clustering
Step 2: Aspect Rating Prediction
Step 3: Extract Representative Phrases

8 Preprocessing of Short Comments
Comment 1: Very fast shipping and awesome price!!!
Comment 2: Great business, honest seller
Shallow parsing turns each comment into (head term, modifier) pairs:
Source | Head Term (feature) | Modifier (opinion)
1 | shipping | fast
1 | price | awesome
2 | seller | honest
2 | business | great

9 Step 1: Aspect Discovery & Clustering

10 Method (1): Head Term Clustering
(Head term, modifier) pairs from the comments: shipping/fast, delivery/quick, seller/reliable, seller/honest, shipping/fast
Each head term is described by the counts of its modifiers:
Head Term | Modifiers
Shipping | fast:100 speedy:80 slow:50 …
Delivery | fast:120 speedy:85 slow:70 …
Seller | honest:80 reliable:60 …
Clustering: e.g. k-means
Support = Cluster Size
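The head term clustering idea can be sketched as below. This is only an illustration under assumptions, not the authors' implementation: the vectors are L2-normalised modifier counts, the initial centroids are picked by hand, and "cluster size" is taken to mean the total number of phrases in a cluster. The counts themselves are the ones shown on the slide.

```python
import math

# Modifier counts per head term, copied from the slide's example.
counts = {
    "shipping": {"fast": 100, "speedy": 80, "slow": 50},
    "delivery": {"fast": 120, "speedy": 85, "slow": 70},
    "seller": {"honest": 80, "reliable": 60},
}
vocab = sorted({m for c in counts.values() for m in c})


def to_vector(modifier_counts):
    """L2-normalised count vector over the modifier vocabulary (an assumption)."""
    vec = [float(modifier_counts.get(m, 0)) for m in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iterations=10):
    """Plain k-means; returns a dict mapping each point name to a cluster index."""
    centroids = [list(c) for c in init_centroids]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid for every head term.
        assignment = {
            name: min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))
            for name, vec in points.items()
        }
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [points[n] for n, a in assignment.items() if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


points = {head: to_vector(c) for head, c in counts.items()}
# Hand-picked initial centroids (hypothetical choice for a deterministic demo).
assignment = kmeans(points, [points["shipping"], points["seller"]])

clusters = {}
for head, j in assignment.items():
    clusters.setdefault(j, []).append(head)

# Support = cluster size, read here as the total phrase count (assumption).
support = {j: sum(sum(counts[h].values()) for h in heads) for j, heads in clusters.items()}
print(clusters, support)
```

Shipping and delivery share the same modifiers, so they land in one cluster, while seller forms its own.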

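11 Method (2): Unstructured PLSA
Input: the same (head term, modifier) pairs, e.g. shipping/fast, delivery/quick, seller/reliable, seller/honest.
Each word $w$ in a comment $d$ is drawn from a mixture of $k$ topics $\theta_1, \theta_2, \ldots, \theta_k$ with mixing weights $\pi_{d,1}, \pi_{d,2}, \ldots, \pi_{d,k}$ [Hofmann 99].
Topic model = unigram language model = multinomial distribution
Example topics:
–$\theta_1$: shipping 0.3, delivery 0.2
–$\theta_2$: service 0.32, exchange 0.2
–$\theta_k$: email 0.25, comm. 0.22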

12 Method (2): Unstructured PLSA (estimation)
The topic word probabilities (shipping ?, delivery ?, service ?, exchange ?, email ?, comm. ?) are unknown and must be estimated from the comments [Hofmann 99].
Topic model = unigram language model = multinomial distribution
Estimation: e.g. EM with MLE
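A minimal sketch of PLSA estimation with EM and maximum likelihood follows. It assumes each comment is a bag of head terms, uses a random initialisation with a fixed seed, and runs a fixed number of iterations; the function name `plsa`, the toy comments and all parameter values are hypothetical and not taken from the slides.

```python
import random
from collections import Counter


def normalise(values):
    total = sum(values) or 1.0  # guard against an all-zero row
    return [v / total for v in values]


def plsa(docs, k, iterations=50, seed=0):
    """Return (topic_word, doc_topic) estimated by EM with MLE."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]

    # Random starting point: p(w | theta_j) and pi_{d,j}.
    topic_word = [
        dict(zip(vocab, normalise([rng.random() + 0.1 for _ in vocab])))
        for _ in range(k)
    ]
    doc_topic = [normalise([rng.random() + 0.1 for _ in range(k)]) for _ in docs]

    for _ in range(iterations):
        new_tw = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        new_dt = [[0.0] * k for _ in docs]
        for d, c in enumerate(counts):
            for w, n in c.items():
                # E-step: posterior p(z = j | d, w).
                post = normalise([doc_topic[d][j] * topic_word[j][w] for j in range(k)])
                # Accumulate expected counts for the M-step.
                for j in range(k):
                    new_tw[j][w] += n * post[j]
                    new_dt[d][j] += n * post[j]
        # M-step: re-normalise the expected counts.
        topic_word = [dict(zip(vocab, normalise([t[w] for w in vocab]))) for t in new_tw]
        doc_topic = [normalise(row) for row in new_dt]
    return topic_word, doc_topic


# Toy comments reduced to their head terms (hypothetical data).
docs = [
    ["shipping", "delivery", "shipping"],
    ["delivery", "shipping"],
    ["email", "communication"],
    ["communication", "email", "email"],
]
topic_word, doc_topic = plsa(docs, k=2)
for j, topic in enumerate(topic_word):
    print(j, sorted(topic.items(), key=lambda kv: -kv[1])[:2])
```

Each `topic_word[j]` is a multinomial over head terms and each `doc_topic[d]` holds the mixing weights of one comment.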

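13 Method (3): Structured PLSA
Same mixture model as Method (2), with topics $\theta_1, \theta_2, \ldots, \theta_k$ and mixing weights $\pi_{d,1}, \pi_{d,2}, \ldots, \pi_{d,k}$, but built on the phrase structure: each modifier is associated with the head terms it modifies.
(Head term, modifier) pairs: shipping/fast, delivery/quick, seller/reliable, seller/honest, delivery/fast
Modifier | Head Term counts
fast | shipping: 180, delivery: 30, response: 10
slow | shipping: 70, delivery: 80
The topic word probabilities (shipping ?, delivery ?, service ?, exchange ?, email ?, comm. ?) are again estimated.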

14 Methods (2) and (3): Topics → Aspects
Each estimated topic is taken as one aspect, e.g.
–$\theta_1$: shipping 0.3, delivery 0.2
–$\theta_2$: service 0.32, exchange 0.2
–$\theta_k$: email 0.25, comm. 0.22
Support = Topic Coverage

15 Methods (2) and (3): Adding Prior to PLSA
Known aspects are given as Dirichlet priors on the topics, e.g.
–$a_1$: shipping, delivery
–$a_2$: email, comm.
The topic word probabilities (shipping ?, delivery ?, service ?, exchange ?, email ?, comm. ?) are then estimated with the prior.
Estimation: e.g. EM with Maximum A Posteriori (MAP) instead of MLE
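One common way to write the MAP update with a conjugate Dirichlet prior is shown below. It is a sketch under assumptions rather than a formula taken from the slides: $c(w,d)$ is the count of word $w$ in document $d$, $p(z_{d,w}=j)$ is the E-step posterior, $p(w \mid a_j)$ is the prior word distribution for aspect $a_j$, and $\sigma$ is an assumed confidence parameter for the prior.

```latex
p(w \mid \theta_j) =
  \frac{\sum_{d} c(w,d)\, p(z_{d,w}=j) + \sigma\, p(w \mid a_j)}
       {\sum_{w'} \sum_{d} c(w',d)\, p(z_{d,w'}=j) + \sigma}
```

With $\sigma = 0$ this reduces to the usual MLE M-step; larger values pull topic $\theta_j$ toward the prior aspect $a_j$.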

16 Step 2: Aspect Rating Prediction

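17 Method (1): Local Prediction
Each phrase inherits the overall rating of the comment it comes from.
Source | Head Term | Modifier | Aspect
1 | shipping | fast | Shipping
1 | product | great | Product
2 | delivery | slow | Shipping
2 | packaged | poorly | Packaging
2 | product | fine | Product
…
Open question on the slide: What if? (shown next to the phrase "slow" under Shipping)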
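Local prediction can be sketched in a few lines. The phrases are the ones on the slide; the overall ratings attached to the two comments (1 and 0) and the function name `local_prediction` are assumptions for illustration.

```python
# Each comment carries one overall rating (1 = positive, 0 = neutral or negative)
# and the (head term, modifier) phrases extracted from it.
comments = [
    {"rating": 1, "phrases": [("shipping", "fast"), ("product", "great")]},
    {"rating": 0, "phrases": [("delivery", "slow"), ("packaged", "poorly"), ("product", "fine")]},
]


def local_prediction(comments):
    """Give every phrase the overall rating of its own comment."""
    return [
        (head, modifier, comment["rating"])
        for comment in comments
        for head, modifier in comment["phrases"]
    ]


rated_phrases = local_prediction(comments)
print(rated_phrases)
```

Note that "product fine" receives rating 0 here only because it sits in a comment with overall rating 0, which is the kind of case the "What if?" on the slide points at.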

18 Method (2): Global Prediction
Same phrases as in local prediction (Source 1: shipping/fast, product/great; Source 2: delivery/slow, packaged/poorly, product/fine; …), grouped by aspect (Shipping, Packaging, Product).
For each aspect, the modifiers are pooled into one group per rating level, e.g. for Shipping:
–fast, timely, quick, fast, slow, quickly, fast, great, bad
–slow, bad, fast, poor, slowly, unbearable, quick, poor
A language model is estimated from each group:
–Shipping: fast 0.2, timely 0.2, quick 0.2, …, slow 0.01
–Shipping: slow 0.4, bad 0.2, …, quick 0.02, fast 0.01
What if? A phrase such as "slow shipping" is rated by comparing the language models.
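A minimal sketch of global prediction: a phrase gets the rating whose language model assigns its modifier the highest probability. The probabilities are the ones shown on the slide; which model belongs to rating 1 and which to rating 0 is an assumption here, as are the `floor` value for unseen modifiers and the function name.

```python
# Rating-level language models for the aspect "Shipping".
# Mapping the first model to rating 1 and the second to rating 0 is assumed.
shipping_models = {
    1: {"fast": 0.2, "timely": 0.2, "quick": 0.2, "slow": 0.01},
    0: {"slow": 0.4, "bad": 0.2, "quick": 0.02, "fast": 0.01},
}


def global_prediction(modifier, models, floor=1e-6):
    """Pick the rating whose model gives the modifier the highest probability.

    Unseen modifiers get a small floor probability; on a tie the first
    rating in the dict wins.
    """
    return max(models, key=lambda rating: models[rating].get(modifier, floor))


for modifier in ["slow", "fast", "quick"]:
    print(modifier, "->", global_prediction(modifier, shipping_models))
```

With these numbers "slow shipping" is assigned rating 0 regardless of the overall rating of the comment it appeared in.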

19 Methods (1) and (2): Rating Aggregation
Aspect | Phrases | Aspect Rating
Shipping | slow shipping, Fast delivery, quick shipping | AVG 2.33 stars
Packaging | badly wrapped, poor packaging, well packaged | AVG 1.67 stars
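The aggregation step is a plain average of the phrase ratings within each aspect. In the sketch below the individual phrase ratings (1 star and 3 stars) are hypothetical values chosen only because they reproduce the averages shown on the slide.

```python
# Phrase-level ratings per aspect; the star values are assumed for illustration.
phrase_ratings = {
    "Shipping": {"slow shipping": 1, "fast delivery": 3, "quick shipping": 3},
    "Packaging": {"badly wrapped": 1, "poor packaging": 1, "well packaged": 3},
}


def aggregate(phrase_ratings):
    """Aspect rating = mean of the ratings of the phrases in that aspect."""
    return {
        aspect: sum(ratings.values()) / len(ratings)
        for aspect, ratings in phrase_ratings.items()
    }


aspect_ratings = aggregate(phrase_ratings)
for aspect, rating in aspect_ratings.items():
    print(f"{aspect}: {rating:.2f} stars")
```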

20 Step 3: Representative Phrases

21 Step 3: Top K Frequent Phrases
Phrases flow through Step 1, Step 2 and Step 3: Fast shipping, Timely delivery, Quickly arrived, Slow shipment, Bad shipping, Slow delivery
Phrases shown for the aspect Shipping: slow delivery, Fast delivery, quick shipping, bad shipping
Support = Phrase Freq. (e.g. 50)
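Picking the top K frequent phrases for one aspect and rating can be sketched with a counter. The phrase list and its frequencies below are hypothetical; only the idea that support equals phrase frequency comes from the slide.

```python
from collections import Counter

# Phrases assigned to the aspect "Shipping" with the same rating (hypothetical data).
shipping_phrases = [
    "fast shipping", "quick shipping", "fast shipping",
    "timely delivery", "fast shipping", "quick shipping",
]


def top_k_phrases(phrases, k):
    """Return the k most frequent phrases with their frequency as support."""
    return Counter(phrases).most_common(k)


representative = top_k_phrases(shipping_phrases, k=2)
print(representative)
```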

22 Experiments: eBay Data Set
28 eBay sellers with high feedback scores for the past year
Statistics | Mean | STD
# of comments/seller | 57,055 | 62,395
# of phrases/comment | 1.5533 | 0.0442
overall rating (positive %) | 97.9 | 0.95
Rating mapping: Positive → rating 1, Neutral → rating 0, Negative → rating 0

23 Experiments: Evaluate Step 1
Step 1: Aspect Discovery & Clustering
Gold standard: human labeled clusters
Questions:
–Is phrase structure useful?
–Is topic modeling effective?

24 Eval Step 1: Aspect Coverage
Aspect Coverage measures the percentage of covered aspects.
[Figure: Aspect Coverage against Top K Clusters, with curves for k-means, Unstructured PLSA and Structured PLSA]

25 Eval Step 1: Clustering Accuracy
Clustering Accuracy measures the cluster coherence.
Method | Clustering Accuracy
K-means | 0.36
Unstructured PLSA | 0.32
Structured PLSA | 0.52
Still much room for improvement!
Human Agreement
 | Seller1 | Seller2 | Seller3 | AVG
Annot1-2 | 0.6610 | 0.5484 | 0.6515 | 0.6203
Annot1-3 | 0.7846 | 0.6806 | 0.7143 | 0.7265
Annot2-3 | 0.7414 | 0.6667 | 0.6154 | 0.6745
AVG | 0.7290 | 0.6319 | 0.6604 | 0.6738
Low Agreement; Varies a lot

26 Experiments: Evaluate Step 2
Step 2: Aspect Rating Prediction
Questions:
–Local prediction vs. Global prediction?
–How does aspect clustering affect this?

27 Detailed Seller Ratings (DSR) as Gold Standard
Gold standard: user DSR ratings
DSR criteria as priors of aspects

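28 Eval Step 2: Correlation
Correlation measures the effectiveness of ranking the four DSRs for a given seller.
Step 1 | Step 2 | Kendall's tau | Pearson
Baseline | | 0.2892 | 0.3162
K-means | Local | 0.1106 (-62%) | 0.1735 (-45%)
K-means | Global | 0.1225 (-58%) | -0.0250 (-108%)
Unstr. PLSA | Local | 0.2815 | 0.4158
Unstr. PLSA | Global | 0.4958 (+76%) | 0.5781 (+39%)
Str. PLSA | Local | 0.1905 | 0.4517
Str. PLSA | Global | 0.4167 (+119%) | 0.6118 (+35%)
The percentages match the change relative to the Baseline for the K-means rows, and relative to the Local row of the same method for the Global PLSA rows.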
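Both correlation measures can be computed with a few lines of standard-library Python. The sketch below uses Kendall's tau-a (no tie correction), which may differ from the variant used in the experiments, and the four DSR values and predicted aspect ratings are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def pearson(x, y):
    """Pearson correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Four DSR scores of one seller and the predicted aspect ratings (hypothetical).
true_dsr = [4.8, 4.9, 4.6, 4.7]
predicted = [0.90, 0.95, 0.70, 0.60]

tau = kendall_tau(true_dsr, predicted)
r = pearson(true_dsr, predicted)
print(round(tau, 4), round(r, 4))
```

Kendall's tau only looks at the order of the four criteria, while Pearson also reflects how far apart the values are.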

29 Eval Step 2: Ranking Loss
Ranking Loss measures the distance between the true and predicted ratings (smaller → better).
Step 1 | Step 2 | AVG of 3 DSR
Baseline | | 0.2363
K-means | Local | 0.2170 (-8%)
K-means | Global | 0.6307 (+167%)
Unstr. PLSA | Local | 0.1977 (-16%)
Unstr. PLSA | Global | 0.2101 (-11%)
Str. PLSA | Local | 0.1909 (-19%)
Str. PLSA | Global | 0.1534 (-35%)
Local Pred: more robust
Global Pred: more accurate
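One simple reading of "distance between the true and predicted ratings" is the mean absolute difference, sketched below. Whether the experiments use exactly this definition is an assumption, and the ratings in the example are hypothetical.

```python
def ranking_loss(true_ratings, predicted_ratings):
    """Mean absolute difference between true and predicted ratings (assumed definition)."""
    pairs = list(zip(true_ratings, predicted_ratings))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


# True and predicted ratings for three DSR criteria (hypothetical values).
loss = ranking_loss([5, 4, 3], [4, 4, 5])
print(loss)
```

A loss of 0 means every predicted rating equals the true one, so smaller values are better.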

30 Experiments: Evaluate Step 3
Step 3: Representative Phrases
Questions:
–How do previous steps affect the phrase quality?

31 Eval Step 3: Human Labeling
Annotators fill a table with one row per DSR criterion (Item as Described, Communication, Shipping time, Shipping and Handling Charges) and one column per rating (Rating 1, Rating 0).
Candidate phrases: Fast delivery, Prompt email, Slow shipping, …, Excessive postage, As promised, …

32 Eval Step 3: Measures & Results
Information Retrieval measures:
–Human generated phrases → "relevant document"
–Computer generated phrases → "retrieved document"
Step 1 | Step 2 | Prec. | Recall
K-means | Local | 0.3055 | 0.3510
K-means | Global | 0.2635 | 0.2923
Unstr. PLSA | Local | 0.4127 | 0.4605
Unstr. PLSA | Global | 0.4008 | 0.4435
Str. PLSA | Local | 0.5925 | 0.6379
Str. PLSA | Global | 0.5611 | 0.5952
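Treating human phrases as relevant and computer phrases as retrieved, precision and recall reduce to set overlap. The sketch below assumes exact string matching between phrases, and the two phrase sets are hypothetical.

```python
def precision_recall(human_phrases, computer_phrases):
    """Precision and recall of computer phrases against human phrases."""
    relevant = set(human_phrases)
    retrieved = set(computer_phrases)
    overlap = len(relevant & retrieved)
    precision = overlap / len(retrieved) if retrieved else 0.0
    recall = overlap / len(relevant) if relevant else 0.0
    return precision, recall


# Phrases for one DSR criterion and rating (hypothetical data).
human = ["fast delivery", "prompt email", "slow shipping"]
computer = ["fast delivery", "slow shipping", "great seller", "quick response"]

precision, recall = precision_recall(human, computer)
print(precision, round(recall, 4))
```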

33 Summary
Novel problem
–Rated Aspect Summarization
General Methods
–Three steps
–Effective on eBay Feedback Comments
Future Work
–Evaluate on other data
–Three steps → One optimization framework

34 Thank you!

