1
Stylistics in Customer Reviews of Cultural Objects Xiao Hu, J. Stephen Downie The International Music Information Retrieval Systems Evaluation Lab (IMIRSEL) University of Illinois at Urbana-Champaign THE ANDREW W. MELLON FOUNDATION
2
Agenda
- Motivation
- Customer reviews in epinions.com
- Experiments
  - Genre classification
  - Rating classification
  - Usage classification
  - Feature studies
- Conclusions & Future Work
3
Motivation
- Online customer reviews on cultural objects:
  - User-generated → user-centered retrieval
  - Detailed descriptions → contextual information
  - Large amount → rich resource
  - Self-organized → ground truth
- Text mining: mature techniques and handy tools
- Review mining: a place to apply stylistic text analysis!
4
Motivation (overview diagram): customer reviews from Epinions.com, Amazon.com, etc. are classified into classes (genres, ratings, usages); user descriptions (D1, D2, D3, ...) are identified and connected to the reviewed objects through prominent features, yielding user-centered access points.
5
Customer Reviews
- Published on www.epinions.com
- Focused on books, movies, and music
- Each review is associated with:
  - a genre label
  - a numerical quality rating
  - a recommended usage (for music reviews)
6
Example review (screenshot): the numerical rating, the associated full text to be analyzed, and the recommended usage.
7
Genre Taxonomy (music)
- Jazz, Rock, Country, Classical, Blues, Gospel, Punk, ...
- Renaissance, Medieval, Baroque, Romantic, ...
- 28 major genre categories
8
Experiments
- To build and evaluate a prototype system that could automatically:
  - predict the genre of the work being reviewed
  - predict the quality rating assigned to the reviewed item
  - predict the usage recommended by the reviewer
  - discover distinctive features contributing to each of the above
9
Models and Methods
- Prediction problems:
  - Naïve Bayesian (NB) classifier: computationally efficient, empirically effective
  - Hierarchical clustering (for usage prediction only)
- Feature analysis:
  - Frequent pattern mining
  - Naïve Bayesian feature ranking
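A minimal sketch of this kind of NB prediction setup (an illustration only, assuming scikit-learn; `reviews`, `labels`, and the function name are hypothetical, and the authors' own implementation used the T2K toolkit described near the end of the talk):

```python
# Sketch: bag-of-words multinomial Naive Bayes with k-fold cross validation,
# mirroring the evaluation protocol reported on the following slides.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def evaluate_nb_classifier(reviews, labels, folds=5):
    """Cross-validate an NB classifier on raw review texts and their labels."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    # The slides report precision, so macro-averaged precision is used here.
    scores = cross_val_score(model, reviews, labels, cv=folds,
                             scoring="precision_macro")
    return scores.mean(), scores.std()
```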
10
Data Preprocessing
- HTML tags were stripped out
- Stop words were NOT stripped out
- Punctuation was NOT stripped out
  - Both may contain stylistic information
- Tokens were stemmed
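A sketch of these preprocessing steps as they might look in code (assuming NLTK for tokenization and Porter stemming and a simple regex for tag stripping; the slide does not name the tools actually used):

```python
import re
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # requires the NLTK "punkt" data

stemmer = PorterStemmer()

def preprocess(review_html):
    """Strip HTML tags, keep stop words and punctuation, stem tokens."""
    # 1. Strip HTML tags (a crude regex; a real HTML parser is safer).
    text = re.sub(r"<[^>]+>", " ", review_html)
    # 2. Tokenize WITHOUT removing stop words or punctuation,
    #    since both may carry stylistic information.
    tokens = word_tokenize(text.lower())
    # 3. Stem each token; punctuation tokens pass through unchanged.
    return [stemmer.stem(tok) for tok in tokens]
```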
11
Genre Classifications
Data set:
                                      Book      Movie     Music
  # of reviews                        1,800     1,650     1,800
  # of genres                         9         11        12
  Mean review length (words)          1,095     1,514     1,547
  Std dev of review length (words)    446       672       784
  Term list size (terms)              41,060    47,015    47,864
12
Genres Examined
Book: Action/Thriller, Juvenile Fiction, Humor, Horror, Music & Performing Arts, Science Fiction & Fantasy, Biography & Autobiography, Mystery & Crime, Romance
Movie: Action/Adventure, Children, Comedies, Horror/Suspense, Musical & Performing Arts, Science-Fiction/Fantasy, Documentary, Dramas, Education/General Interest, Japanimation (Anime), War
Music: Blues, Classical, Country, Electronic, Gospel, Hardcore/Punk, Heavy Metal, International, Jazz Instrument, Pop Vocal, R&B, Rock & Pop
13
Genre Classification Results
                                      Book      Movie     Music
  Number of genres                    9         11        12
  Reviews in each genre               200       150       150
  Term list size (terms)              41,060    47,015    47,864
  Mean review length (words)          1,095     1,514     1,547
  Std dev of review length (words)    446       672       784
  Mean precision                      72.18%    67.70%    78.89%
  Std dev of precision                1.89%     3.51%     4.11%
5-fold random cross validation for book and movie reviews; 3-fold random cross validation for music reviews.
14
Confusion: Book Reviews (rows = actual genre; columns = classified as)
            Action  Bio.  Hor.  Hum.  Juv.  Mus.  Mys.  Rom.  Sci.
  Action    0.61    0.01  0.06  0.01  0.02  0.03  0.20  0.05  0.02
  Bio.      0.04    0.70  0.01  0.05  0.03  0.13  0.01  0.03  0
  Horror    0.09    0     0.66  0     0.05  0     0.12  0.02  0.06
  Humor     0.01    0.10  0     0.74  0.03  0.08  0.01  0.01  0.03
  Juvenile  0.01    0.01  0     0.07  0.86  0.02  0     0     0
  Music     0       0.09  0     0     0.01  0.89  0     0     0.01
  Mystery   0.20    0     0.01  0     0     0     0.70  0.05  0.04
  Romance   0.06    0.01  0.01  0     0.04  0     0.08  0.78  0.03
  Science   0.03    0     0.02  0.01  0.11  0.03  0.01  0.13  0.66
15
Confusion: Movie Reviews (rows = actual genre; columns = classified as; "-" = value missing in the source)
            Act.  Ani.  Chi.  Com.  Doc.  Dra.  Edu.  Hor.  Mus.  Sci.  War
  Action    0.77  0     0     0.01  0     0     0.02  0     0     0.10  0.09
  Anime     0     0.89  0.03  0.03  0     0     0     0     0     0.05  0
  Children  0.02  0.01  0.95  0     0.01  0.01  0     0     0     0     0
  Comedy    0.09  0.01  0.06  0.52  0.03  0.17  0.06  0.01  0.03  0.01  0.02
  Docu.     0.02  0     0     0.04  0.63  0.01  0.19  0     0.09  0     0.02
  Drama     0.16  0     0     0.12  0.10  0.45  0.05  0.03  0.03  0.01  0.04
  Edu.      0     0     0.02  0.02  0.31  0.03  0.57  0     0     0.01  0.03
  Horror    0.15  0.02  0.02  0.03  0.02  0.05  -     0.69  0     0.10  0.02
  Music     0     0     0     0.01  0.18  0     0     0     0.81  0     0
  Science   0.04  0.01  0.02  0     0.06  0.01  0.02  0.03  0     0.76  0.05
  War       0.11  0     0.01  0.01  0.08  0.08  0.05  0.03  0.02  0.02  0.59
16
Confusion: Music Reviews (rows = actual genre; columns = classified as; "-" = value missing in the source)
             Blu.  Cla.  Cou.  Ele.  Gos.  Pun.  Met.  Int'l  Jazz  Pop.  R&B   Roc.
  Blues      0.61  0     0.10  0     0     0     0     0      0     0     0     0.29
  Classical  0     0.94  0     0.03  0     0     0     0      0     0     0     -
  Country    0     0     0.92  0     0.03  0     0     0      0     0     0     0.06
  Electr.    0     0     0     0.92  0     0     0.06  0      0     0     0     0.03
  Gospel     0     0     0.05  0     0.80  0     0     0      0     0     0.05  0.10
  Punk       0     0     0     0.05  0     0.71  0.05  0      0     0     0     0.19
  Metal      0     0     0     0     0     0     0.89  0      0     0     0     0.11
  Int'l      0     0.04  0.00  0.04  0     0     0     0.81   0     0     0     0.04
  Jazz       0     0     0     0.04  0     0     0     0      0.89  0.04  0     -
  Pop Vo.    0     0     0.04  0.07  0     0     0     0.04   0.07  0.68  0     0.11
  R&B        0     0     0     0     0     0     0     0      0     0.06  0.88  0.06
  Rock       0.03  0     0     0     0     0     0     0      0     0     0     0.89
17
Rating Classification
- Five-class classification: 1 star vs. 2 stars vs. 3 stars vs. 4 stars vs. 5 stars
- Binary group classification: 1 star + 2 stars vs. 4 stars + 5 stars
- Ad extremis classification: 1 star vs. 5 stars
5-fold random cross validation for book and movie review experiments; 5-fold cross validation for music review experiments.
18
Rating: Book Reviews
                                      5 classes   Binary group   Ad extremis
  Number of classes                   5           2              2
  Reviews in each class               200         400            300
  Term list size (terms)              34,123      28,339         23,131
  Mean review length (words)          1,240       1,228          1,079
  Std dev of review length (words)    549         557            612
  Mean precision                      36.70%      80.13%         80.67%
  Std dev of precision                1.15%       4.01%          2.16%
19
Rating: Movie Reviews
                                      5 classes   Binary group   Ad extremis
  Number of classes                   5           2              2
  Reviews in each class               220         440            400
  Term list size (terms)              40,235      36,620         31,277
  Mean review length (words)          1,640       1,645          1,409
  Std dev of review length (words)    788         770            724
  Mean precision                      44.82%      82.27%         85.75%
  Std dev of precision                2.27%       2.02%          1.20%
20
Rating: Music Reviews
                                      5 classes   Binary group   Ad extremis
  Number of classes                   5           2              2
  Reviews in each class               200         400            400
  Term list size (terms)              35,600      33,084         32,563
  Mean review length (words)          1,875       2,032          1,842
  Std dev of review length (words)    913         912            956
  Mean precision                      44.25%      79.75%         85.94%
  Std dev of precision                2.63%       3.59%          3.58%
21
Confusion: Book Reviews (rows = actual rating; columns = classified as)
            1 star  2 stars  3 stars  4 stars  5 stars
  1 star    0.45    0.21     0.15     0.09     0.10
  2 stars   0.24    0.36     0.19     0.12     0.09
  3 stars   0.11    0.17     0.28     0.22     0.21
  4 stars   0.05    0.06     0.17     0.41     0.31
  5 stars   0.04    0.07     0.17     0.26     0.46
22
Confusion: Movie Reviews (rows = actual rating; columns = classified as)
            1 star  2 stars  3 stars  4 stars  5 stars
  1 star    0.49    0.19     0.17     0.08     0.07
  2 stars   0.15    0.45     0.23     0.11     0.06
  3 stars   0.04    0.24     0.28     0.27     0.17
  4 stars   0.05    0.13     0.13     0.41     0.27
  5 stars   0.07    0.03     0.16     0.20     0.54
23
Confusion: Music Reviews (rows = actual rating; columns = classified as)
            1 star  2 stars  3 stars  4 stars  5 stars
  1 star    0.61    0.24     0.07     0.05     0.02
  2 stars   0.24    0.15     0.36     0.15     0.09
  3 stars   0.11    0.13     0.41     0.20     0.15
  4 stars   0.03    0.06     0.10     0.32     0.48
  5 stars   0       0        0.09     0.11     0.80
24
Usage Classification
- Each music review has one usage suggested by the reviewer
- The usage is chosen from a ready-made list of 13 usages
- The 11 most popular usages were chosen for the experiments
25
Usage Categories and Counts
  Driving (DRV)                  1,349     Waking up (WKU)            271
  Hanging With Friends (HWF)     1,215     Going to Sleep (GTS)       269
  Listening (LST)                  592     Cleaning the House (CTH)   230
  Romancing (ROM)                  492     At Work (AWK)              188
  Reading or Studying (ROS)        447     With Family                 35
  Getting ready to go out (GRG)    378     Sleeping                    15
  Exercising (EXC)                 291     TOTAL                    5,772
26
Data and Initial Result
                                      All classes
  Number of classes                   11
  Reviews in each class               180
  Term list size (terms)              36,561
  Mean review length (words)          838.75
  Std dev of review length (words)    511.39
  Mean precision                      19.55%
  Std dev of precision                2.89%
10-fold cross validation.
27
Confusion Matrix (rows = actual usage; columns = classified as AWK, CTH, DRV, EXC, GRG, GTS, HWF, LST, ROS, ROM, WKU; rows with fewer than 11 values are incomplete in the source)
  AWK  .139  .100  .067  .056  .05   .144  .078  .056  .100  .072
  CTH  .072  .283  .128  .033  .022  .094  .050  .083  .128  .083
  DRV  .106  .111  .150  .078  .089  .039  .133  .050  .083  .111
  EXC  .094  .089  .111  .028  .206  .056  .028  .078  .111
  GRG  .133  .083  .161  .083  .106  .033  .133  .067  .044  .106  .050
  GTS  .056  .072  .089  .056  .039  .194  .078  .111  .078  .133  .094
  HWF  .128  .100  .083  .039  .094  .028  .272  .050  .072  .083
  LST  .083  .067  .072  .039  .044  .100  .061  .189  .089  .167  .089
  ROS  .089  .122  .072  .022  .067  .106  .056  .100  .111  .172  .083
  ROM  .056  .128  .067  .028  .039  .028  .017  .044  .061  .500  .033
  WKU  .078  .106  .100  .067  .083  .150  .072  .094
28
Usage Super-Classes
- Frequent confusions can be read as a measure of similarity between usages
- Hierarchical clustering was applied based on the confusion matrix
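One plausible way to implement this step (a sketch, not the authors' exact procedure: the average-linkage method and the similarity-to-distance conversion are assumptions), using SciPy's hierarchical clustering on a distance matrix derived from the confusion matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_usage_classes(confusion, labels, n_clusters=2):
    """Group usage classes that are frequently confused with each other."""
    C = np.asarray(confusion, dtype=float)
    # Symmetrize: confusion in either direction counts as similarity.
    similarity = (C + C.T) / 2.0
    # Turn similarity into a distance; zero the diagonal for a valid matrix.
    distance = 1.0 - similarity
    np.fill_diagonal(distance, 0.0)
    # Average-linkage clustering on the condensed distance matrix.
    Z = linkage(squareform(distance, checks=False), method="average")
    assignments = fcluster(Z, t=n_clusters, criterion="maxclust")
    return dict(zip(labels, assignments))
```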
29
Hierarchical Clustering (dendrogram): the 11 usages (Going to sleep, Listening, Getting ready to go out, Driving, Reading or studying, Romancing, Cleaning the house, At work, Hanging out with friends, Waking up, Exercising) group into two super-classes, Relaxing and Stimulating, each further split into two sub-clusters (R1, R2 and S1, S2).
30
Classifications on Usage Super-Classes
                                      Relaxing, Stimulating   R1, R2, S1, S2
  Number of classes                   2                       4
  Reviews in each class               900                     360
  Term list size (terms)              34,759                  30,637
  Mean review length (words)          839.03                  825.45
  Std dev of review length (words)    509.96                  514.38
  Mean precision                      65.72%                  42.60%
  Std dev of precision                3.15%                   4.60%
10-fold cross validation.
31
Feature Studies
- What makes the classes distinguishable?
- What are the important features, and how important are they?
- Two techniques applied:
  - Frequent Pattern Mining
  - Naïve Bayesian Feature Ranking
- Focus on music reviews
32
Frequent Pattern Mining (FPM)
- Originally used to discover association rules
- Finds patterns consisting of items that frequently occur together in individual transactions
  - Items = candidate words (terms), depending on the specific question
  - Transactions = review sentences
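A brute-force sketch of this idea (Apriori-style pruning is omitted for brevity; the support threshold, size limit, and function name are assumptions): each review sentence is one transaction, and any set of up to three terms that co-occurs in enough sentences is reported as a frequent pattern.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(sentences, min_support=50, max_size=3):
    """Count term sets that co-occur in at least `min_support` sentence 'transactions'.

    `sentences` is a list of token lists, one per review sentence.
    """
    counts = Counter()
    for sentence in sentences:
        items = sorted(set(sentence))  # each sentence is one transaction
        for size in range(1, max_size + 1):
            for pattern in combinations(items, size):
                counts[pattern] += 1
    # Keep only the patterns that clear the support threshold.
    return {pattern: n for pattern, n in counts.items() if n >= min_support}
```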
33
Positive and Negative Descriptive Patterns
Recall: rating classification on music reviews
                                      5 classes   Binary group   Ad extremis
  Number of classes                   5           2              2
  Reviews in each class               200         400            400
  Term list size (terms)              35,600      33,084         32,563
  Mean review length (words)          1,875       2,032          1,842
  Std dev of review length (words)    913         912            956
  Mean precision                      44.25%      79.75%         85.94%
  Std dev of precision                2.63%       3.59%          3.58%
34
Positive and Negative Descriptive Patterns
Mining frequent descriptive patterns in positive and negative reviews:
                                      Positive          Negative
  Total reviews                       400               400
  Total sentences                     63,118            30,053
  Total words                         1,027,713         447,603
  Avg. (std) sentences per review     157.80 (75.49)    75.13 (41.62)
  Avg. (std) words per sentence       16.28 (14.43)     14.89 (12.24)
Terms kept: adjectives, adverbs, verbs, and negatives; no nouns, no stop words.
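The part-of-speech filter described in the last line might look like the sketch below (assumptions: NLTK's tagger, Penn Treebank tag prefixes, and an illustrative set of negation words; the slide does not name the actual tools):

```python
from nltk import pos_tag, word_tokenize  # requires NLTK tagger and tokenizer data

NEGATION_WORDS = {"not", "no", "never", "n't", "nothing"}

def descriptive_terms(sentence):
    """Keep adjectives (JJ*), adverbs (RB*), verbs (VB*) and negation words;
    drop nouns and everything else before mining descriptive patterns."""
    tagged = pos_tag(word_tokenize(sentence))
    return [
        token.lower()
        for token, tag in tagged
        if tag.startswith(("JJ", "RB", "VB")) or token.lower() in NEGATION_WORDS
    ]
```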
35
Single Term Patterns
  Positive reviews:  not – 3417 sentences;  good – 1621 sentences: 1/4 of all sentences
  Negative reviews:  not – 1915 sentences;  good – 1025 sentences: 1/3 of all sentences
Good = Bad?! Digging deeper ...
36
"good" in a negative context
- Negation: "Nothing is good." / "It just doesn't sound good."
- Song titles: "Good Charlotte, you make me so mad." / "Feels So Good is dated and reprehensibly bad."
- Rhetoric: "And this is a good ruiner: ..." / "What a waste of my good two dollars..."
- Faint praise: "...the only good thing... is the packaging."
- Expressions: "You all have heard ... the good old cliché."
37
Double Term Patterns
  Positive reviews:  not good / not realli / realli good / not listen / not great
  Negative reviews:  not good / not bad / not realli / not sound / realli good
Good = Bad?! Digging deeper and deeper ...
38
Triple Term Patterns
  Positive reviews:  sing open melod / sing smooth melod / sing fill melod / sing smooth open / not realli good / sing lead melod / sound realli good / sing plai melod / accompani sing melod / sing soft melod
  Negative reviews:  not realli good / not realli listen / bad not good / bad not sound / pretti tight spit / bad not don't / realli not don't / realli bad not / pretti bad not / not sing sound
39
Noun Patterns in Genre Classification
Recall: genre classification on music reviews
  Number of genres                    12
  Reviews in each genre               150
  Term list size (terms)              47,864
  Mean review length (words)          1,547
  Std dev of review length (words)    784
  Mean precision                      78.89%
  Std dev of precision                4.11%
40
Noun Patterns in Genre Classification
- Studied four popular genres
- Only nouns considered
                                     Classical        Country          Heavy Metal      Jazz Instrument
  Total reviews                      150              150              150              150
  Total sentences                    7,886            16,720           21,532           12,692
  Total words                        138,282          240,595          318,252          184,220
  Avg. (std) sentences per review    52.57 (32.68)    111.47 (43.77)   143.55 (71.69)   84.61 (28.60)
  Avg. (std) words per sentence      17.54 (12.25)    14.39 (11.79)    14.78 (12.33)    14.51 (10.16)
41
Single Term Patterns
  Classical:        music / record / piec / cd / work
  Country:          song / album / love / music / time
  Heavy Metal:      song / album / guitar / band / track
  Jazz Instrument:  song / album / music / solo / time
42
Double Term Patterns
  Classical:        cd music / music piec / piec piano / piano concerto / orchestra symphoni / music record / piano op / music work / music time / music compos / violin concerto / cd piec / cd record
  Country:          twain shania / dixi chick / station union / guitar steel / tim mcgraw / cash johnni / titl track / song titl / krauss alison / drum guitar / countri radio / song beat / style song / album song
  Heavy Metal:      song guitar / riff guitar / guitar bass / drum guitar / song lyric / song riff / song choru / solo guitar / song track / album track / band album / band song
  Jazz Instrument:  music jazz / liner note / drum bass / jazz album / album song / jazz song / guitar bass / tenor sax / solo song / piano bass / mile davi / solo piano / section rhythm
43
Naïve Bayesian Feature Ranking (NBFR)
- Based on the NB text categorization model
- Prediction in binary classification cases: a document d_i is scored by the log-odds
    score(d_i) = log[ P(C_j) ∏_k P(w_k | C_j) ] - log[ P(¬C_j) ∏_k P(w_k | ¬C_j) ]
  score(d_i) > 0: d_i is in C_j;  score(d_i) < 0: d_i is not in C_j
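A sketch of how terms can be ranked under this model (the Laplace smoothing constant and function name are assumptions): each term is scored by its smoothed log-likelihood ratio between C_j and not-C_j, so the highest-scoring terms are the ones that push a document's score above zero.

```python
import math
from collections import Counter

def nb_feature_ranking(docs_in_class, docs_not_in_class, alpha=1.0):
    """Rank terms by log P(w | C_j) - log P(w | not C_j), with Laplace smoothing."""
    pos = Counter(t for doc in docs_in_class for t in doc)
    neg = Counter(t for doc in docs_not_in_class for t in doc)
    vocab = set(pos) | set(neg)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    scores = {
        term: math.log((pos[term] + alpha) / (pos_total + alpha * len(vocab)))
            - math.log((neg[term] + alpha) / (neg_total + alpha * len(vocab)))
        for term in vocab
    }
    # Largest scores: terms most indicative of C_j; smallest: most indicative of not-C_j.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```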
44
Features in Usage Super-Classes
Recall: classification on usage super-classes (Relaxing vs. Stimulating)
  Number of classes                   2
  Reviews in each class               900
  Term list size (terms)              34,759
  Mean review length (words)          839.03
  Std dev of review length (words)    509.96
  Mean precision                      65.72%
  Std dev of precision                3.15%
45
Top-Ranked Terms in Super-Classes
  Relaxing:     Botti (Chris), Shelby (Lynne), Bethany (Joy), Debelah (Morgan), Mckennitt (Loreena), Ponty (Jean Luc), Shabazz (lyricist), Tru
  Stimulating:  nightwish, Tarja (Turunen), Dio (Ronnie James), Roca (Zach De La Roca), Slade (British band), Incubus (band), Edan (rap artist), Twiztid (band), KJ (KJ52), blue, Serj (Tankian), Stooges (The)
Terms in parentheses were manually added for clarity.
46
Artist-Usage Relationship
Binomial exact test on artists with > 10 reviews (p < 0.05):
  Artist          Usage           p value
  AFI             Waking Up       0.03252
  Black Sabbath   At Work         0.00028
  Celine Dion     Romancing       0.02499
  Dream Theater   Listening       0.01862
  Metallica       Waking Up       0.03252
  Nirvana_(USA)   Going to Sleep  0.01862
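A sketch of how such a test could be run (the use of SciPy, the one-sided alternative, and the base-rate definition are assumptions not stated on the slide): for each artist with more than 10 reviews, test whether a usage is recommended more often than its corpus-wide rate.

```python
from scipy.stats import binomtest  # SciPy >= 1.7; older releases expose binom_test

def artist_usage_pvalue(k_usage, n_artist_reviews, base_rate):
    """Exact binomial test: is this usage over-represented among an artist's reviews?

    k_usage          -- the artist's reviews recommending the usage
    n_artist_reviews -- all reviews of the artist (only artists with > 10 reviews tested)
    base_rate        -- overall share of reviews recommending the usage
    """
    return binomtest(k_usage, n_artist_reviews, base_rate, alternative="greater").pvalue

# Hypothetical example: 5 of an artist's 12 reviews recommend "Waking Up",
# against a corpus-wide base rate of 271 / 5,772 (counts from the usage table above).
print(artist_usage_pvalue(5, 12, 271 / 5772))
```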
47
Implementation & T2K (demo)
- Text-to-Knowledge (T2K) Toolkit
  - A text mining framework
  - Ready-to-use modules and itineraries
  - Natural Language Processing tools integrated
  - Supports fast prototyping of text mining
- Demo itinerary modules: Data Preprocessing, NB Classifier
48
Conclusions
- Text analysis of user-generated reviews of cultural objects
  - NB on genre, rating, and usage classification
  - Feature studies: FPM and NBFR
- Customer reviews are a good resource for connecting users' opinions to cultural objects, and thus for facilitating information access via novel, user-oriented facets.
49
Future Work
- More text mining techniques
- Other critical texts: blogs, wikis, etc.
- Feature studies: other kinds of features
50
Questions? IMIRSEL Thank you! THE ANDREW W. MELLON FOUNDATION