
1 Identifying Subjective Language
Janyce Wiebe, University of Pittsburgh

2 Overview
General area: acquire knowledge of evaluative and speculative language and use it in NLP applications
Primarily corpus-based work
Today: results of exploratory studies

3 Collaborators
Rebecca Bruce, Vasileios Hatzivassiloglou, Joseph Phillips, Matthew Bell, Melanie Martin, Theresa Wilson

4 Subjectivity Tagging
Recognizing opinions and evaluations (subjective sentences), as opposed to material objectively presented as true (objective sentences)
Banfield 1982, Fludernik 1993, Wiebe 1994, Stein & Wright 1995

5 Examples
"At several different levels, it's a fascinating tale." (subjective)
"Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share." (objective)

6 Subjectivity
Example subjective expressions: "complained", "you idiot!", "terrible product", "speculated", "maybe", "enthused", "wonderful!", "great product"

7 Examples
Strong addressee-oriented negative evaluation:
Recognizing flames (Spertus 1997); personal e-mail filters (Kaufer 2000)
"I had in mind your facts, buddy, not hers. Nice touch."
""Alleges" whenever facts posted are not in your persona of what is "real.""

8 Examples
Opinionated, editorial language:
IR, text categorization (Kessler et al. 1997)
Do the writers purport to be objective?
"Look, this is a man who has great numbers."
"We stand in awe of the Woodstock generation's ability to be unceasingly fascinated by the subject of itself."

9 Examples
Belief and speech reports:
Information extraction, summarization, intellectual attribution (Teufel & Moens 2000)
"Northwest Airlines settled the remaining lawsuits, a federal judge said."
""The cost of health care is eroding our standard of living and sapping industrial strength," complains Walter Maher."

10 Other Applications
Review mining (Terveen et al. 1997)
Clustering documents by ideology (Sack 1995)
Style in machine translation and generation (Hovy 1987)

11 Potential Subjective Elements
"The cost of health care is eroding standards of living and sapping industrial strength," complains Walter Maher.
"Sap" is a potential subjective element; its instance in this sentence is a subjective element.

12 Subjectivity
Multiple types, sources, and targets:
"We stand in awe of the Woodstock generation's ability to be unceasingly fascinated by the subject of itself."
"Somehow grown-ups believed that wisdom adhered to youth."

13 Outline
Data and annotation
Sentence-level classification
Individual words
Collocations
Combinations

14 Annotations
Three levels: expression level, sentence level, document level
Manually tagged + existing annotations

15 Expression Level Annotations
[Perhaps you'll forgive me] for reposting his response
They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]
(Notation, as best it can be reconstructed from the examples: e marks a subjective element; +, -, ? give the evaluation's polarity; the digit gives its strength.)

16 Expression Level Annotations
Difficult for manual and automatic tagging: detailed, and with no predetermined classification unit
To date: used for training and bootstrapping
Probably the most natural level

17 Document Level Annotations
Manual: flames in newsgroups
Existing: opinion pieces in the WSJ (editorials, letters to the editor, arts & leisure reviews); * to ***** reviews
More directly related to applications, but ...

18 Document Level Annotations
Opinion pieces contain objective sentences, and non-opinion pieces contain subjective sentences
Editorials contain facts supporting the argument
News reports present reactions (van Dijk 1988): "Critics claim ...", "Supporters argue ..."
Reviews contain information about the product

19 Document Level Annotations
In a WSJ data set:
opinion pieces: 74% subjective, 26% objective
non-opinion pieces: 43% subjective, 57% objective

20 Data in this Talk
Sentence level: 1000 WSJ sentences; 3 judges reached good agreement after multiple rounds; used for training and evaluation
Expression level: 1000 WSJ sentences (2 judges); 462 newsgroup messages (2 judges) plus 15,413 words (1 judge); single round, with promising results; used to generate features, not for evaluation

21 Data in this Talk
Document level:
Existing opinion-piece annotations used to generate features
Manually refined classifications used for evaluation: identified editorials not marked as such; only clear instances labeled; to date, 1 judge
Distinct from the other data: 3 editions, each more than 150K words

22 Sentence Level Annotations
A sentence is labeled subjective if any significant expression of subjectivity appears
"The cost of health care is eroding our standard of living and sapping industrial strength," complains Walter Maher.
"What an idiot," the idiot presumably complained.

23 Sentence Classification
Binary features: pronoun, adjective, cardinal number, modal other than "will", adverb other than "not", new paragraph
Lexical feature: good for subjective; good for objective; good for neither
Probabilistic classifier
10-fold cross validation; 51% baseline
72% average accuracy across folds
82% average accuracy on sentences rated certain
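As a rough illustration (not the talk's actual implementation), the six binary features could be extracted from a POS-tagged sentence as sketched below; the Penn Treebank tag set and the Bernoulli Naive Bayes model are assumptions, since the slide names neither.

```python
from sklearn.naive_bayes import BernoulliNB

def sentence_features(tagged, starts_paragraph):
    """tagged: list of (token, POS) pairs; returns the six binary features."""
    tags = [t for _, t in tagged]
    return [
        int(any(t.startswith("PRP") for t in tags)),       # pronoun present
        int(any(t.startswith("JJ") for t in tags)),        # adjective present
        int("CD" in tags),                                 # cardinal number present
        int(any(t == "MD" and w.lower() != "will"          # modal other than "will"
                for w, t in tagged)),
        int(any(t.startswith("RB") and w.lower() != "not"  # adverb other than "not"
                for w, t in tagged)),
        int(starts_paragraph),                             # sentence opens a paragraph
    ]

# X = [sentence_features(s, p) for s, p in tagged_corpus]
# model = BernoulliNB().fit(X, y)   # y: subjective/objective labels
```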

24 Identifying PSEs
There are few high-precision, high-frequency potential subjective elements

25 Identifying Individual PSEs
Classifications correlated with adjectives
Good subsets:
Dynamic adjectives (Quirk et al. 1985)
Positive and negative polarity, and gradability, automatically identified in corpora (Hatzivassiloglou & McKeown 1997)
Results from distributional similarity

26 Distributional Similarity
Word similarity based on the distributional patterns of words
Much work in NLP (see Lee 1999, Lee & Pereira 1999)
Purposes: improving estimates of unseen events; thesaurus and dictionary construction from corpora

27 Lin's Distributional Similarity (Lin 1998)
[Figure: dependency parse of "I have a brown dog", with relations R1-R4 linking the words, summarized as (Word, R, W) triples:]
Word    R    W
I       R1   have
have    R2   dog
brown   R3   dog
...

28 Lin's Distributional Similarity
Each word is represented by the (R, W) dependency features statistically correlated with it. The similarity of Word1 and Word2 is the information in their shared features, normalized by the total information in each word's features:

sim(Word1, Word2) = [ sum over shared features (R,W) of I(Word1,R,W) + I(Word2,R,W) ] / [ sum over features of Word1 of I(Word1,R,W) + sum over features of Word2 of I(Word2,R,W) ]
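A minimal sketch of this computation, assuming each word's (R, W) features and their mutual-information scores have already been collected into a dictionary (the data layout here is a choice for illustration, not Lin's):

```python
# Lin (1998) similarity over dependency-triple features.
# I maps each word to {(R, W): mutual_information} for its triples.
def lin_similarity(w1, w2, I):
    f1, f2 = I[w1], I[w2]
    shared = f1.keys() & f2.keys()                  # features of both words
    numerator = sum(f1[f] + f2[f] for f in shared)  # information they share
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0
```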

29 Bizarre
Words most similar to "bizarre" by this measure: strange, similar, scary, unusual, fascinating, interesting, curious, tragic, different, contradictory, peculiar, silly, sad, absurd, poignant, crazy, funny, comic, compelling, odd


32 Filtering
Seed words → words + clusters → filtered set
A word + cluster is removed if its precision on the training set is below a threshold
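A sketch of the filtering step under stated assumptions: training sentences carry gold subjective/objective labels, precision is counted over word instances, and the function names are hypothetical.

```python
def precision_of(words, sentences):
    """Fraction of instances of `words` occurring in subjective sentences.
    `sentences` is a list of (token_list, is_subjective) pairs."""
    total = subj = 0
    for tokens, is_subj in sentences:
        n = sum(1 for t in tokens if t in words)
        total += n
        if is_subj:
            subj += n
    return subj / total if total else 0.0

def filter_seeds(seeds, cluster_of, train, threshold):
    """Keep each seed word plus its similarity cluster only if the set's
    precision on the training sentences clears the threshold."""
    kept = []
    for w in seeds:
        candidate = {w} | set(cluster_of(w))
        if precision_of(candidate, train) >= threshold:
            kept.append(candidate)
    return kept
```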

33 Parameters
The same pipeline (seed words → words + clusters → filtered set), with two tunable parameters: the cluster size and the filtering threshold

34 Seeds from Annotations
1000 WSJ sentences with sentence-level and expression-level annotations
They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff].
"It's [e? 3 really] [e- 3 bizarre]," says Albert Lerman, creative director at the Wells agency.

35 Experiments
1/10 of the data used for training, 9/10 for testing
Parameters: cluster size fixed at 20; filtering threshold set to the precision of the baseline adjective feature on the training data
+7.5% on average, over 10-fold cross validation
[More improvements with other adjective features]

36 Opinion Pieces
3 WSJ data sets, over 150K words each
Skewed distribution: 13-17% of words are in opinion pieces
Baseline for comparison: (# words in opinions) / (total # words)
For measuring precision: Prec(S) = (# instances of S in opinions) / (total # instances of S)
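Prec(S) for a single n-gram might be computed as follows; the (token_list, is_opinion) document representation is an assumption for illustration.

```python
def prec(ngram, docs):
    """Prec for one n-gram (a tuple of tokens): instances inside opinion
    pieces divided by all instances. `docs` holds (tokens, is_opinion)."""
    n = len(ngram)
    total = in_op = 0
    for tokens, is_opinion in docs:
        count = sum(1 for i in range(len(tokens) - n + 1)
                    if tuple(tokens[i:i + n]) == ngram)
        total += count
        if is_opinion:
            in_op += count
    return in_op / total if total else 0.0
```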

37 Parameters
Same pipeline; parameter ranges explored: cluster size 2-40, filtering threshold 1-70%

38 Results
Performance varies with parameter settings, but there are smooth regions of the space
Here: separate training/validation/testing splits

39 Low Frequency Words
A single instance in a corpus ~ low frequency
Analysis of expression-level annotations: there are many more single-instance words inside subjective elements than outside them

40 Unique Words
Replace all words that appear once in the test data with "UNIQUE"
+5-10 percentage points
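The mapping itself is a simple two-pass operation over the corpus; a small sketch (token-list data layout assumed):

```python
from collections import Counter

def map_unique(token_lists):
    """Replace every token that occurs exactly once across the corpus
    with the single placeholder "UNIQUE"."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return [[t if counts[t] > 1 else "UNIQUE" for t in tokens]
            for tokens in token_lists]
```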

41 Collocations
Start with the observation that low-precision words often compose higher-precision collocations:
here we go again; get out of here; what a; well and good; rocket science; for the last time; just as well; ... !

42 Collocations
Identify n-gram PSEs as sequences whose precision is higher than the maximum precision of their constituents:
W1,W2 is a PSE if prec(W1,W2) > max(prec(W1), prec(W2))
W1,W2,W3 is a PSE if prec(W1,W2,W3) > max(prec(W1,W2), prec(W3)) or prec(W1,W2,W3) > max(prec(W1), prec(W2,W3))
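The criterion transcribes directly into code, reusing the prec() sketch from slide 36:

```python
def bigram_is_pse(w1, w2, docs):
    # A bigram qualifies if it beats both constituents' precision.
    return prec((w1, w2), docs) > max(prec((w1,), docs), prec((w2,), docs))

def trigram_is_pse(w1, w2, w3, docs):
    # A trigram qualifies under either constituent split.
    p123 = prec((w1, w2, w3), docs)
    return (p123 > max(prec((w1, w2), docs), prec((w3,), docs)) or
            p123 > max(prec((w1,), docs), prec((w2, w3), docs)))
```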

43 Collocations
Moderate improvements: +3-10 percentage points
But with all unique words mapped to "UNIQUE": +13-24 percentage points

44 Example Collocations with UNIQUE
highly||adverb UNIQUE||adj:
highly unsatisfactory, highly unorthodox, highly talented, highly conjectural, highly erotic

45 Example Collocations with UNIQUE
UNIQUE||verb out||IN:
farm out, chuck out, ruling out, crowd out, flesh out, blot out, spoken out, luck out

46 Collocations
UNIQUE||adj to||TO UNIQUE||verb: impervious to reason, strange to celebrate, wise to temper
UNIQUE||noun of||IN its||pronoun: sum of its, usurpation of its, proprietor of its
they||pronoun are||verb UNIQUE||noun: they are fools, they are noncontenders

47 Opinion Results: Summary
             Best (baseline 17%)   Worst (baseline 13%)
             +prec / freq          +prec / freq
Adjs         +21 / 373             +09 / 2137
Verbs        +16 / 721             +07 / 3193
2-grams      +10 / 569             +04 / 525
3-grams      +07 / 156             +03 / 148
1-U-grams    +10 / 6065            +06 / 6045
2-U-grams    +24 / 294             +14 / 288
3-U-grams    +27 / 138             +13 / 144
Disparate features have consistent performance
Collocation sets largely distinct

48 Does it add up?
Good preliminary results classifying opinion pieces using density and feature-count features

49 Future Work
Mutual bootstrapping (Riloff & Jones 1999) and co-training (Collins & Singer 1999) to learn both PSEs and contextual features
Integration into a probabilistic model
Text classification and review mining

50 References
Banfield, A. (1982). Unspeakable Sentences. Routledge and Kegan Paul.
Collins, M. & Singer, Y. (1999). Unsupervised Models for Named Entity Classification. EMNLP-VLC-99.
van Dijk, T. A. (1988). News as Discourse. Lawrence Erlbaum.
Fludernik, M. (1993). The Fictions of Language and the Languages of Fiction. Routledge.
Hovy, E. (1987). Generating Natural Language Under Pragmatic Constraints. PhD dissertation.
Kaufer, D. (2000). Flaming. www.eudora.com
Kessler, B., Nunberg, G., & Schütze, H. (1997). Automatic Detection of Text Genre. ACL-EACL-97.
Riloff, E. & Jones, R. (1999). Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. AAAI-99.

51 References
Stein, D. & Wright, S. (1995). Subjectivity and Subjectivisation. Cambridge University Press.
Terveen, L., Hill, W., Amento, B., McDonald, D., & Creter, J. (1997). Building Task-Specific Interfaces to High Volume Conversational Data. CHI-97.
Teufel, S. & Moens, M. (2000). What's Yours and What's Mine: Determining Intellectual Attribution in Scientific Texts. EMNLP-VLC-00.
Wiebe, J. (2000). Learning Subjective Adjectives from Corpora. AAAI-00.
Wiebe, J. (1994). Tracking Point of View in Narrative. Computational Linguistics 20(2).
Wiebe, J., Bruce, R., & O'Hara, T. (1999). Development and Use of a Gold Standard Data Set for Subjectivity Classifications. ACL-99.

52 References
Hatzivassiloglou, V. & McKeown, K. (1997). Predicting the Semantic Orientation of Adjectives. ACL-EACL-97.
Hatzivassiloglou, V. & Wiebe, J. (2000). Effects of Adjective Orientation and Gradability on Sentence Subjectivity. COLING-00.
Lee, L. (1999). Measures of Distributional Similarity. ACL-99.
Lee, L. & Pereira, F. (1999). Distributional Similarity Models: Clustering vs. Nearest Neighbors. ACL-99.
Lin, D. (1998). Automatic Retrieval and Clustering of Similar Words. COLING-ACL-98.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. Longman.
Sack, W. (1995). Representing and Recognizing Point of View. AAAI Fall Symposium on Knowledge Navigation and Retrieval.

53 Sentence Annotations
Average pairwise kappa scores: all data: .69; certain data: .88 (60% of the corpus)
Case study of analyzing and improving intercoder reliability: if there is symmetric disagreement resulting from bias, use the latent class model to correct disagreements
Assessed by fitting probability models (Bishop et al. 1975; CoCo): bias tested via marginal homogeneity; symmetric disagreement tested via quasi-symmetry
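For reference, the average pairwise kappa reported on the slide can be reproduced with a few lines of scikit-learn; the per-judge label arrays are assumed inputs.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def average_pairwise_kappa(labels_by_judge):
    """labels_by_judge: one equal-length sequence of subj/obj labels per
    judge; returns the mean Cohen's kappa over all judge pairs."""
    pairs = list(combinations(labels_by_judge, 2))
    return sum(cohen_kappa_score(a, b) for a, b in pairs) / len(pairs)
```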

54 Test for Bias: Marginal Homogeneity
[Figure: 4x4 contingency table of the two judges' labels over categories C1-C4; marginal homogeneity requires the row and column marginals to match, X(i+) = X(+i) for all i.]
The worse the fit, the greater the bias

55 Test for Symmetric Disagreement: Quasi-Symmetry
[Figure: the same 4x4 contingency table, with the off-diagonal cells highlighted.]
Tests relationships among the off-diagonal counts
The better the fit, the higher the correlation

56 (Potential) Subjective Elements
Same word, different types:
"Great majority": objective
"Great!": positive evaluative
"Just great.": negative evaluative

57 Review Mining
From: Hoodoo <hoodooBUGZAPPER@newnorth.net>
Newsgroups: rec.gardens
Subject: Re: Garden software
I bought a copy of Garden Encyclopedia from Sierra. Well worth the time and money.

