Learning Extraction Patterns for Subjective Expressions
Ellen Riloff, University of Utah
Janyce Wiebe, University of Pittsburgh
EMNLP03, 7/2003

2 Subjectivity
Subjective language includes opinions, speculations, and emotions. Distinguishing subjective from objective information could benefit many applications:
– Information extraction (discard subjective information or label it as uncertain)
– Question answering (find answers reflecting different opinions)
– Summarization (summarize the various views on a topic)

3-6 Goals
– Sentence-level subjectivity classification (Wiebe et al. 2001 found that 44% of sentences in news articles are subjective)
– Learning subjectivity clues from unannotated text corpora
– Learning linguistically rich patterns, represented as information extraction (IE) patterns

7 Previous Work in NLP Subjectivity Analysis in Text
– Document-level subjectivity classification (e.g., Turney 2002; Pang et al. 2002; Spertus 1997) and above the document level (Tong 2001)
– Genre classification (e.g., Karlgren and Cutting 1994; Kessler et al. 1997; Wiebe et al. 2001)
– Supervised sentence-level classification (Wiebe et al. 1999)
– Learning adjectives, adjectival phrases, verbs, nouns, and n-grams (e.g., Turney 2002; Hatzivassiloglou & McKeown 1997; Wiebe et al. 2001; Riloff et al. 2003)

8 Recent Related Work
– Yu and Hatzivassiloglou (EMNLP03): unsupervised sentence-level classification; a complementary approach and feature set
– Dave et al. (WWW03): reviews classified as positive or negative
– Agrawal et al. (WWW03): newsgroup authors partitioned into camps based on quotation links
– Gordon et al. (ACL03): manually developed grammars for some types of subjective language

9-11 Extraction Patterns
Extraction patterns are lexico-syntactic patterns that identify relevant information. Typically they represent role relationships surrounding noun and verb phrases, e.g.:
– hijacking of <np>: hijacked vehicle
– <subj> was hijacked: hijacked vehicle
– <subj> hijacked: hijacker

12 Our Method
Subjective expressions are represented as extraction patterns, e.g., "get to know", "appear to be", "was satisfied", "complained".
– Subtle variations can be significant: "The comedian bombed last night."
– Often higher precision than their sub-expressions
– More general than fixed n-grams

13-14 Our Method
– Subjective expressions represented as extraction patterns ("get to know", "appear to be", "was satisfied", "complained")
– Supervised extraction pattern learning
– Training data generated automatically
– Entire process bootstrapped

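15 [Diagram: the bootstrapping process. Sentences from an Unannotated Text Collection are labeled by a high-precision Subjective Classifier and Objective Classifier, both built from Known Subjective Vocabulary. The resulting subjective and objective sentences train an Extraction Pattern Learner, which outputs subjective patterns. A Pattern-Based Subjective Classifier applies these patterns to unlabeled sentences to find additional subjective sentences, which feed back into the learner.]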
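The loop in this diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two high-precision classifiers, the pattern learner, and the pattern matcher are passed in as hypothetical callables, and sentences already labeled objective are simply excluded from the pattern-based labeling step.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical component signatures; none of these names come from the slides.
Classifier = Callable[[str], bool]
Learner = Callable[[List[str], List[str]], Set[str]]
Matcher = Callable[[str, Set[str]], bool]


def bootstrap(
    unlabeled: List[str],
    is_subjective: Classifier,
    is_objective: Classifier,
    learn_patterns: Learner,
    matches: Matcher,
    max_cycles: int = 3,
) -> Tuple[Set[str], Set[str]]:
    """Return (subjective sentences, learned patterns) after bootstrapping."""
    # High-precision classifiers built from known subjective vocabulary.
    subjective = {s for s in unlabeled if is_subjective(s)}
    objective = {s for s in unlabeled if s not in subjective and is_objective(s)}
    patterns: Set[str] = set()
    for _ in range(max_cycles):
        # Learn subjective patterns from the automatically labeled sentences.
        patterns |= learn_patterns(sorted(subjective), sorted(objective))
        # Pattern-based classifier: label still-unlabeled sentences as subjective.
        new = {
            s for s in unlabeled
            if s not in subjective and s not in objective and matches(s, patterns)
        }
        if not new:
            break  # no new subjective sentences, so stop
        subjective |= new
    return subjective, patterns
```

For a toy run, word bigrams can stand in for "patterns"; in the real system the patterns would come from a syntactic learner such as AutoSlog-TS.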

20 Results for 1 cycle

21 Test Data
– Manual annotation for multiple-perspective QA (ARDA AQUAINT NRRC); working on copyright issues to release the corpus this summer
– Good agreement on the sentence classes used here: 0.77 average pairwise kappa, and 0.89 average pairwise kappa with borderline sentences (11% of the corpus) removed
– Wilson & Wiebe (SIGdial 2003) describe the annotation scheme and agreement study
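The agreement figures on this slide are averages of pairwise kappa scores. A short sketch of that computation, assuming Cohen's kappa for each pair of annotators and a simple mean over all pairs (the function names and label strings are illustrative, not from the annotation tools):

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators' labels over the same sentences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both used one identical label, so agreement is perfect
    return (observed - expected) / (1.0 - expected)


def average_pairwise_kappa(annotations: Dict[str, List[str]]) -> float:
    """Mean Cohen's kappa over all pairs of annotators (needs at least two)."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa(x, y) for x, y in pairs) / len(pairs)
```

To mirror the second figure, one would drop the sentences marked borderline from every annotator's list before calling `average_pairwise_kappa`.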

23 Unannotated Text Collection
– English-language versions of FBIS news articles from a variety of countries
– Size: 302,160 sentences

25-27 Known Subjective Vocabulary
From previous work:
– Manually identified (e.g., entries from Levin 1993)
– Automatically identified (e.g., nouns from Riloff et al. 2003, CoNLL03)
– Strongly subjective clues: most instances are subjective
– Weakly subjective clues: objective instances are also common
Any data used is separate from the data in this paper.

29 [Diagram: the Subjective Classifier and Objective Classifier use the Known Subjective Vocabulary to label unlabeled sentences from the Unannotated Text Collection as subjective or objective.]
– Subjective Classifier: a sentence with more than one (i.e., 2+) strongly subjective clue is labeled subjective
– Result: 91.3% precision, 31.9% recall
– Test set: 2197 sentences, 59% subjective

30 [Diagram: the Subjective and Objective Classifiers labeling unlabeled sentences, with their rules shown.]
– Subjective Classifier: 2+ strongly subjective clues
– Objective Classifier: in the previous, current, and next sentence, 0 strongly subjective clues and 0 or 1 weakly subjective clue
– Objective Classifier result: 82.6% precision, 16.4% recall
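The two high-precision rules can be sketched as a single labeling pass over a document. Two assumptions are made here that the slide does not settle: clues are single lowercase words matched token by token (real clue lists include multiword expressions), and the objective rule's clue counts are pooled over the three-sentence window. Sentences that satisfy neither rule stay unlabeled (`None`).

```python
from typing import List, Optional, Set


def count_clues(tokens: List[str], lexicon: Set[str]) -> int:
    """Number of tokens found in a (single-word, lowercase) clue lexicon."""
    return sum(1 for t in tokens if t.lower() in lexicon)


def label_sentences(
    sentences: List[List[str]], strong: Set[str], weak: Set[str]
) -> List[Optional[str]]:
    """Label each sentence 'subjective', 'objective', or None (left unlabeled)."""
    strong_counts = [count_clues(s, strong) for s in sentences]
    weak_counts = [count_clues(s, weak) for s in sentences]
    labels: List[Optional[str]] = []
    for i in range(len(sentences)):
        # Subjective rule: 2+ strongly subjective clues in the sentence itself.
        if strong_counts[i] >= 2:
            labels.append("subjective")
            continue
        # Objective rule: previous, current, and next sentence pooled together.
        window = range(max(0, i - 1), min(len(sentences), i + 2))
        strong_in_window = sum(strong_counts[j] for j in window)
        weak_in_window = sum(weak_counts[j] for j in window)
        if strong_in_window == 0 and weak_in_window <= 1:
            labels.append("objective")
        else:
            labels.append(None)
    return labels
```

Leaving most sentences unlabeled is deliberate: both rules trade recall for precision, so that the automatically labeled sentences are reliable enough to serve as training data.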

32 Extraction Pattern Learner: AutoSlog-TS (Riloff 1996)
– "Relevant texts": 17,000 subjective sentences from the Subjective Classifier
– "Irrelevant texts": 17,000 objective sentences from the Objective Classifier
– Output: subjective patterns

33-35 Step 1: Apply Syntactic Templates
– <subj> active-verb dobj: <subj> dealt blow
– <subj> verb infinitive: <subj> appear to be
– <subj> aux noun: <subj> has position
– active-verb <dobj>: endorsed <dobj>
– verb infinitive <dobj>: get to know <dobj>
– noun prep <np>: opinion on <np>
– infinitive prep <np>: to resort to <np>

36 Step 1: Apply Syntactic Templates
The template <subj> active-verb dobj yields the pattern <subj> dealt blow, which matches any sentence with a verb phrase whose head is "dealt" and a direct object whose head is "blow", e.g., "The experience certainly dealt a stiff blow to his pride."
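A minimal sketch of matching this kind of pattern, assuming a hypothetical shallow-parse record per clause (subject head, verb head, voice, direct-object head) and assuming, as in AutoSlog-style patterns, that the clause's subject is the extracted filler. Only head words are compared, so modifiers such as "certainly" and "stiff" do not affect the match. For subjectivity classification, what matters is simply whether the pattern fires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    """Hypothetical shallow-parse output for one clause (head words only)."""
    subj_head: Optional[str]
    verb_head: str
    voice: str  # "active" or "passive"
    dobj_head: Optional[str] = None


@dataclass
class ActiveVerbDobjPattern:
    """Pattern from the 'active-verb dobj' template, e.g. dealt + blow."""
    verb: str
    dobj: str

    def match(self, clause: Clause) -> Optional[str]:
        # Fires on an active verb phrase with the given verb head and
        # direct-object head; returns the subject head as the extracted filler.
        if (
            clause.voice == "active"
            and clause.verb_head == self.verb
            and clause.dobj_head == self.dobj
        ):
            return clause.subj_head
        return None


pattern = ActiveVerbDobjPattern(verb="dealt", dobj="blow")
# "The experience certainly dealt a stiff blow to his pride."
clause = Clause(subj_head="experience", verb_head="dealt", voice="active", dobj_head="blow")
print(pattern.match(clause))  # experience
```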

37 Step 2: Select Patterns
– Apply all learned patterns to the training data
– Calculate precision and frequency for each pattern: precision(pattern) = (# occurrences in subjective sentences) / (total # occurrences)
– Select patterns based on their frequency and precision on the training data (no tuning on the test set)
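The precision and frequency computation, followed by threshold-based selection, can be sketched as below. The input format (the pattern occurrences found in each training sentence, plus that sentence's automatic label) and the threshold values in the usage are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pattern_stats(
    training: Iterable[Tuple[List[str], bool]]
) -> Dict[str, Tuple[int, float]]:
    """Map each pattern to (frequency, precision).

    Each training item is (pattern occurrences in the sentence, is_subjective).
    precision = occurrences in subjective sentences / total occurrences.
    """
    total: Counter = Counter()
    in_subjective: Counter = Counter()
    for occurrences, is_subjective in training:
        total.update(occurrences)
        if is_subjective:
            in_subjective.update(occurrences)
    return {p: (total[p], in_subjective[p] / total[p]) for p in total}


def select_patterns(
    stats: Dict[str, Tuple[int, float]], min_freq: int, min_prec: float
) -> List[str]:
    """Keep patterns whose training frequency and precision meet both thresholds."""
    return sorted(
        p for p, (freq, prec) in stats.items() if freq >= min_freq and prec >= min_prec
    )
```

For example, with training items `(["was asked"], True)`, `(["was asked", "asked"], True)`, and `(["asked"], False)`, "was asked" gets frequency 2 and precision 1.0, while "asked" gets frequency 2 and precision 0.5.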

38 Examples from Training Data
Pattern | %SUBJ
was asked | 100%
asked | 63%
is talk | 100%
talk of | 90%
will talk | 71%
was expected from | 100%
was expected | 42%
is fact | 100%
fact is | 100%

43 Evaluation of Learned Patterns
Test data: 3947 sentences, 54% subjective. F = frequency and P = precision of a pattern on the training data.
Training thresholds | Test results
F >= 10, P = 100% | P = 85%, Recall = 41%
F >= 2, P >= 60% | P = 71%, Recall = 92%
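To evaluate a set of selected patterns as a classifier, one simple reading (an assumption; the slide does not spell out the decision rule) is to label a test sentence subjective when it contains at least one selected pattern, then score the labels against the gold standard:

```python
from typing import List, Set, Tuple


def classify(sentence_patterns: Set[str], selected: Set[str]) -> bool:
    """Subjective if at least one selected pattern occurs in the sentence."""
    return bool(sentence_patterns & selected)


def precision_recall(gold: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Precision and recall for the 'subjective' class."""
    true_pos = sum(g and p for g, p in zip(gold, predicted))
    predicted_pos = sum(predicted)
    gold_pos = sum(gold)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall
```

Looser thresholds select more patterns, so more sentences fire at least one pattern: recall rises and precision falls, which is the trade-off visible between the two rows of the table.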

46 [Diagram: Known Subjective Vocabulary feeds the Subjective Classifier, which labels unlabeled sentences; the resulting subjective sentences go to the Extraction Pattern Learner, which outputs subjective patterns.]

47 [Diagram: the subjective patterns learned with F >= 10 and P = 100% on the training data are fed back to the Subjective Classifier.]
– New subjective sentences: 1 old clue + 1 new clue, or >1 new clues
– Output: old + new subjective sentences
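The augmented labeling rule can be written as a small predicate. This sketch assumes that "old clue" means a strongly subjective clue from the original vocabulary, reads "1 old + 1 new" as "at least one of each", and keeps the original 2+ strong-clue rule alongside the two new conditions; counting clues and patterns in a sentence is left to the caller.

```python
def augmented_is_subjective(old_clues: int, new_patterns: int) -> bool:
    """Augmented subjective-classifier rule (counts are per sentence)."""
    original_rule = old_clues >= 2  # old subjective sentences
    one_old_one_new = old_clues >= 1 and new_patterns >= 1
    several_new = new_patterns > 1  # ">1 new"
    return original_rule or one_old_one_new or several_new
```

Under this reading, `augmented_is_subjective(1, 1)` is `True`, while a sentence with a single old clue and no new pattern, `augmented_is_subjective(1, 0)`, stays unlabeled.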

48 Evaluation on Test Data
Classifier | Recall | Precision
Original subjective classifier | 32.9% | 91.3%
Augmented subjective classifier | 40.1% | 90.2%

49 Future Work

51 [Diagram: Known Subjective Vocabulary, Objective Classifier, Extraction Pattern Learner, and a Pattern-Based Objective Classifier, labeling unlabeled sentences as objective.]
– Improve the original high-precision classifier
– Identify new objective sentences during bootstrapping

53 [Diagram: unlabeled sentences from the Unannotated Text Collection are labeled by the Subjective Classifier and Objective Classifier, each shown in an Iteration 0 and an Iteration 1+ version, using Known Subjective Vocabulary.]
– Iteration 0: use corpus-independent subjectivity clues to generate the initial training set
– Iteration 1+: use a supervised learning algorithm to tune to the corpus and combine old and new clues effectively
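The slide does not name the supervised learner for Iteration 1+. As one hypothetical choice, a Bernoulli naive Bayes classifier can be trained on the automatically labeled sentences, with old clues and newly learned patterns as binary features. The class below is written from scratch for illustration, and the feature-name prefixes in the usage ("old:", "pat:") are made up.

```python
import math
from typing import Dict, List, Set


class BernoulliNB:
    """Naive Bayes over binary (present/absent) features with add-one smoothing."""

    def fit(self, examples: List[Set[str]], labels: List[str]) -> "BernoulliNB":
        self.vocab: Set[str] = set().union(*examples)
        self.classes: List[str] = sorted(set(labels))
        self.prior: Dict[str, float] = {}
        self.p_feature: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            docs = [ex for ex, y in zip(examples, labels) if y == c]
            self.prior[c] = len(docs) / len(examples)
            # P(feature present | class), smoothed so no probability is 0 or 1.
            self.p_feature[c] = {
                f: (sum(f in d for d in docs) + 1) / (len(docs) + 2) for f in self.vocab
            }
        return self

    def predict(self, features: Set[str]) -> str:
        """Most probable class; features unseen in training are ignored."""
        def log_score(c: str) -> float:
            score = math.log(self.prior[c])
            for f in self.vocab:
                p = self.p_feature[c][f]
                score += math.log(p if f in features else 1.0 - p)
            return score
        return max(self.classes, key=log_score)
```

Because the learner weighs every clue by how it behaves in the corpus at hand, this is one way to "tune to the corpus" rather than relying on fixed counting rules.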

54 Known Subjective Vocabulary
– Build up a subjective lexicon as the process is applied to additional corpora
– Once the bootstrapping process terminates, humans review the high-precision patterns, e.g.:
  – tough act to follow: linguistic subjectivity
  – Rush Limbaugh: opinionated source
  – police: "lightning rod" topic

55 Conclusions
– High-precision subjectivity classification can be used to generate large amounts of labeled training data
– Extraction pattern learning techniques can learn linguistically rich subjective patterns
– The bootstrapping process yields higher recall with little loss in precision

56 Known Subjective Vocabulary
– Build up a subjective lexicon as the process is applied to new corpora
– Richer representation with deeper knowledge (theta roles, polarity, evaluative?, speculative?, tone, ambiguity, …)
– Human review of high-precision patterns:
  – tough act to follow: linguistic subjectivity
  – Rush Limbaugh: opinionated source
  – police: "lightning rod" topic

58 [Diagram: one bootstrapping cycle (Subjective Classifier, Objective Classifier, Extraction Pattern Learner, Pattern-Based Subjective Classifier), annotated with "17000" and "new subjective sentences".]

59 [Diagram: the bootstrapping cycle, annotated with "17000" subjective sentences and "9500 new".]
– Pattern-Based Subjective Classifier: labels a sentence subjective if it contains > 0 instances of patterns with F > 4 and P = 100% on the training data

60 [Diagram: the bootstrapping cycle, annotated with the counts 17000, 7500, and 9500 and the label "new".]
New subjective patterns:
– 4248 new patterns with P > .59 on the training data
– 308 new patterns with P = 100% on the training data

61 [Diagram: the bootstrapping cycle, annotated with the counts 17000, 7500, and 9500.]
New + old patterns on the test set: recall increased more than precision decreased, ranging from +2 recall / -0.5 precision to +4 recall / -2 precision.

62 Example
The Foreign Ministry said Thursday that it was “surprised, to put it mildly” by the U.S. State Department’s criticism of Russia’s human rights record and objected in particular to the “odious” section on Chechnya.
Nested sources: (writer, FM, FM), (writer, FM), (writer, FM, FM, SD), (writer, FM)

64 Annotation Scheme
The annotation scheme was developed as part of a U.S. government-sponsored project (ARDA AQUAINT NRRC) to investigate multiple-perspective question answering. Annotators labeled private state expressions; each private state has a strength of low, medium, or high. Our gold standard considers a sentence subjective if it contains at least one private state expression of medium or higher strength.
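The gold-standard rule maps directly onto a small function. This sketch assumes each sentence's annotations have already been reduced to the list of strength values of its private state expressions; the names are illustrative.

```python
from typing import Iterable

# Ordered strength scale from the annotation scheme.
STRENGTH_ORDER = {"low": 0, "medium": 1, "high": 2}


def gold_label(private_state_strengths: Iterable[str]) -> str:
    """Subjective iff at least one private state has medium or higher strength."""
    threshold = STRENGTH_ORDER["medium"]
    if any(STRENGTH_ORDER[s] >= threshold for s in private_state_strengths):
        return "subjective"
    return "objective"
```

A sentence whose only private state is of low strength, or which has none at all, is therefore labeled objective.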

65 Two Ways of Expressing Private States
– Explicit mentions of private states and speech events, e.g., “The United States fears a spill-over from the anti-terrorist campaign.”
– Expressive subjective elements, e.g., “The part of the US human rights report about China is full of absurdities and fabrications.”

66 Nested Sources
“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.
(writer, Xirao-Nima, US) (writer, Xirao-Nima) (writer)
“The report is full of absurdities,” he continued.
(writer, Xirao-Nima) (writer)

67 OnlyFactive
“The US fears a spill-over”, said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.
– (writer): OnlyFactive=yes
– (writer, Xirao-Nima): OnlyFactive=yes
– (writer, Xirao-Nima, US): OnlyFactive=no
