Learning Subjective Nouns using Extraction Pattern Bootstrapping Ellen Riloff, Janyce Wiebe, Theresa Wilson 2007. 10. 24 서 진 이 HPC Lab, UOS.

1 Learning Subjective Nouns using Extraction Pattern Bootstrapping Ellen Riloff, Janyce Wiebe, Theresa Wilson 2007. 10. 24 서 진 이 HPC Lab, UOS

2 Contents: Introduction; Subjectivity Data; Using Extraction Patterns to Learn Subjective Nouns; Creating Subjectivity Classifiers; Related Work

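3 1. Introduction: Bootstrap
Statistical inference is based on the sampling distribution of a sample statistic. The bootstrap is, above all, a way to find that sampling distribution, at least approximately, from a single sample.
1. The original sample is drawn at random from the population.
2. Bootstrap samples are drawn at random from the original sample.
Bootstrap procedure:
Step 1: Resample. Draw thousands of new samples from the original random sample by sampling with replacement; each one is called a bootstrap sample or resample. Each resample is the same size as the original random sample.
Step 2: Compute the bootstrap distribution. Compute the statistic for each resample. The distribution of these resample statistics is called the bootstrap distribution.
Step 3: Use the bootstrap distribution. The bootstrap distribution gives information about the shape, center, and spread of the sampling distribution of the statistic.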
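The three steps of the bootstrap procedure can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the sample values, the choice of the mean as the statistic, the 2000 resamples, and the fixed seed are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_resamples=2000, seed=0):
    """Return the statistic computed on each of n_resamples resamples."""
    rng = random.Random(seed)  # fixed seed (assumption) so the sketch is repeatable
    n = len(sample)
    # Step 1: draw resamples with replacement, each the same size as the sample.
    # Step 2: compute the statistic on every resample.
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Hypothetical original random sample (made-up numbers).
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]

boot_means = bootstrap_distribution(sample, statistics.mean)

# Step 3: use the bootstrap distribution to read off center and spread.
center = statistics.mean(boot_means)
spread = statistics.stdev(boot_means)
print(f"sample mean      = {statistics.mean(sample):.3f}")
print(f"bootstrap center = {center:.3f}, spread = {spread:.3f}")
```

The spread of `boot_means` serves as an estimate of the standard error of the sample mean; a histogram of `boot_means` would show the shape.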

4 1. Introduction: Learning Subjective Nouns
Goal: to learn subjective nouns from unannotated text.
Method: apply IE-based bootstrapping algorithms that were designed to learn semantic categories.
Hypothesis: extraction patterns can identify subjective contexts that co-occur with subjective nouns.
Example: “expressed” concern, hope, support

5 2. Subjectivity: the Annotation Scheme (1)
Mark the polarity of subjective expressions as positive, negative, both, or neutral.
African observers generally approved of his victory while Western governments denounced it.
Besides, politicians refer to good and evil …
Jerome says the hospital feels no different than a hospital in the states.
Labels: positive, negative, both, neutral

6 2. Subjectivity: the Annotation Scheme (2)
http://www.cs.pitt.edu/~wiebe/pubs/ardasummer02/
Goal: to identify and characterize expressions of private states in a sentence.
Private state = opinions, evaluations, emotions, and speculations.
Example: The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long. (Negative evaluation)
Also judge the strength of each private state: low, medium, high, extreme.
Annotation gold standard: a sentence is subjective if it contains at least one private-state expression of medium or higher strength; all other sentences are objective.
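The gold-standard rule on this slide is easy to state as code. The sketch below is only an illustration: it assumes each sentence comes with a list of strength labels for its annotated private-state expressions, and the function name `sentence_label` is hypothetical.

```python
# Strength scale from the slide, ordered from weakest to strongest.
STRENGTHS = ["low", "medium", "high", "extreme"]


def sentence_label(private_state_strengths):
    """Label a sentence from the strengths of its private-state expressions.

    A sentence is subjective if at least one expression is of medium or
    higher strength; every other sentence is objective.
    """
    threshold = STRENGTHS.index("medium")
    for strength in private_state_strengths:
        if STRENGTHS.index(strength) >= threshold:
            return "subjective"
    return "objective"


print(sentence_label(["low", "high"]))  # one expression reaches the threshold
print(sentence_label(["low"]))          # only a low-strength expression
print(sentence_label([]))               # no private-state expression at all
```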

7 2. Subjectivity: Corpus (1)
@ www.cs.pitt.edu/mpqa
English-language versions of articles from the world press (187 news sources).
Also includes contextual polarity annotations.
Themes of the instructions:
No rules about how particular words should be annotated.
Don't take expressions out of context and think about what they could mean, but judge them as they are used in that sentence.

8 2. Subjectivity: Corpus (2). Extensions (Wilson 2007)

9 2. Subjectivity: Corpus (3). Extensions (Wilson 2007)
Example: I think people are happy because Chavez has fallen.
direct subjective | span: are happy | source: | attitude:
inferred attitude | span: are happy because Chavez has fallen | type: neg sentiment | intensity: medium | target:
target span: Chavez has fallen
target span: Chavez
attitude | span: are happy | type: pos sentiment | intensity: medium | target:
direct subjective | span: think | source: | attitude:
attitude | span: think | type: positive arguing | intensity: medium | target:
target span: people are happy because Chavez has fallen

10 3. Using Extraction Patterns to Learn Subjective Nouns: Meta-Bootstrapping (1) (Riloff and Jones 1999)
Mutual bootstrapping:
Begin with a small set of seed words that represent a targeted semantic category (e.g. begin with 10 words that represent LOCATIONS) and an unannotated corpus.
Produce thousands of extraction patterns for the entire corpus (e.g. “was hired”).
Compute a score for each pattern based on the number of seed words among its extractions.
Select the best pattern; all of its extracted noun phrases are labeled as the target semantic category.
Re-score the extraction patterns (original seed words + newly labeled words).
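The mutual bootstrapping loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: patterns are represented as plain strings mapped to the set of nouns they extract, the pattern score is just the raw count of lexicon words among a pattern's extractions, and the toy extractions (borrowed from the examples elsewhere in these slides) and the function name `mutual_bootstrapping` are made up for the example.

```python
def mutual_bootstrapping(pattern_extractions, seed_words, n_iterations=3):
    """Greedy loop: pick the best-scoring pattern, absorb its extractions, re-score."""
    lexicon = set(seed_words)
    selected = []
    for _ in range(n_iterations):
        remaining = {p: nouns for p, nouns in pattern_extractions.items()
                     if p not in selected}
        if not remaining:
            break

        # Assumed score: number of current lexicon words among the extractions.
        def score(pattern):
            return len(remaining[pattern] & lexicon)

        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=score)
        if score(best) == 0:
            break  # no pattern is associated with the category any more
        selected.append(best)
        # All extractions of the best pattern are labeled as category members,
        # so the next round scores against seeds plus newly labeled words.
        lexicon |= remaining[best]
    return selected, lexicon


# Toy data (hypothetical): pattern -> nouns it extracts from a corpus.
pattern_extractions = {
    "expressed": {"condolences", "hope", "grief", "views", "worries"},
    "voiced": {"outrage", "support", "skepticism", "opposition"},
    "show of": {"support", "strength", "goodwill", "solidarity"},
    "was hired": {"manager", "engineer"},
}
patterns, lexicon = mutual_bootstrapping(pattern_extractions, {"hope", "outrage"})
print(patterns)
print(sorted(lexicon))
```

In this toy run "show of" only becomes selectable after "voiced" has added "support" to the lexicon, which is the re-scoring effect the last step on the slide describes.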

11 3. Using Extraction Patterns to Learn Subjective Nouns: Meta-Bootstrapping (2)
Meta-bootstrapping: after the normal bootstrapping,
all nouns that were put into the semantic dictionary are reevaluated;
each noun is assigned a score based on how many different patterns extracted it;
only the 5 best nouns are allowed to remain in the dictionary, and the others are discarded;
mutual bootstrapping is restarted.
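The re-evaluation step can be illustrated with a short sketch. It assumes the simplest reading of the slide, namely that a noun's score is the number of distinct patterns that extracted it; the alphabetical tie-break, the toy data, and the function name `keep_best_nouns` are assumptions for the example.

```python
from collections import Counter


def keep_best_nouns(pattern_extractions, new_nouns, k=5):
    """Rank newly learned nouns by how many different patterns extracted them."""
    counts = Counter()
    for nouns in pattern_extractions.values():
        for noun in nouns & new_nouns:
            counts[noun] += 1
    # Highest count first; ties broken alphabetically (an assumption).
    ranked = sorted(new_nouns, key=lambda noun: (-counts[noun], noun))
    return ranked[:k]


# Toy data (hypothetical): pattern -> nouns it extracts.
pattern_extractions = {
    "expressed": {"hope", "grief", "concern", "support", "views"},
    "voiced": {"support", "concern", "outrage", "hope"},
    "show of": {"support", "strength", "hope"},
    "felt": {"grief", "concern"},
}
# Nouns added to the dictionary during one round of mutual bootstrapping.
new_nouns = {"hope", "grief", "concern", "support", "views", "outrage", "strength"}

best = keep_best_nouns(pattern_extractions, new_nouns)
print(best)
```

Only the nouns in `best` would stay in the dictionary before mutual bootstrapping is restarted; the rest are discarded.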

12 3. Using Extraction Patterns to Learn Subjective Nouns: Meta-Bootstrapping (3)
[Diagram. Labels: Unannotated Texts (Ex: hope, grief, joy, concern, worries); Best Extraction Pattern (Ex: expressed); Extractions (Nouns) (Ex: happiness, relief, condolences)]

13 3. Using Extraction Patterns to Learn Subjective Nouns: Basilisk (1) (Thelen and Riloff 2002)
Begin with an unannotated text corpus and a small set of seed words for a semantic category.
Bootstrapping:
Basilisk automatically generates a set of extraction patterns for the corpus and scores each pattern based upon the number of seed words among its extractions. The best patterns go into the Pattern Pool.
All nouns extracted by a pattern in the Pattern Pool go into the Candidate Word Pool.
Basilisk scores each noun based upon the set of patterns that extracted it and their collective association with the seed words.
The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary.
Repeat the bootstrapping process.
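One Basilisk-style iteration might look like the sketch below. The slide does not give the scoring formulas, so two assumptions are made here and should be read as such: patterns are scored by the raw count of lexicon words they extract, and a candidate noun is scored by the average of log2(F + 1) over all patterns that extracted it, where F is a pattern's count of lexicon words. The toy data, pool sizes, and the function name `basilisk_iteration` are also hypothetical.

```python
import math


def basilisk_iteration(pattern_extractions, lexicon, pool_size=2, n_new=2):
    """Return the n_new best candidate nouns for one bootstrapping iteration."""
    # Score each pattern by the number of lexicon words among its extractions.
    pattern_scores = {p: len(nouns & lexicon)
                      for p, nouns in pattern_extractions.items()}

    # Pattern Pool: the best-scoring patterns (ties broken alphabetically).
    pattern_pool = sorted(pattern_scores,
                          key=lambda p: (-pattern_scores[p], p))[:pool_size]

    # Candidate Word Pool: nouns extracted by pooled patterns, not yet known.
    candidate_pool = set()
    for p in pattern_pool:
        candidate_pool |= pattern_extractions[p]
    candidate_pool -= lexicon

    # Assumed noun score: average log2(F + 1) over every pattern extracting it.
    def word_score(noun):
        extracting = [p for p, nouns in pattern_extractions.items() if noun in nouns]
        return sum(math.log2(pattern_scores[p] + 1) for p in extracting) / len(extracting)

    ranked = sorted(candidate_pool, key=lambda noun: (-word_score(noun), noun))
    return ranked[:n_new]


# Toy data (hypothetical): pattern -> nouns it extracts.
pattern_extractions = {
    "expressed": {"hope", "grief", "concern"},
    "voiced": {"outrage", "concern", "support"},
    "show of": {"support", "strength"},
    "was hired": {"manager"},
}
lexicon = {"hope", "grief", "outrage"}

new_words = basilisk_iteration(pattern_extractions, lexicon)
print(new_words)
lexicon |= set(new_words)  # the next iteration would start from the larger lexicon
```

Unlike mutual bootstrapping, the evidence for a noun is pooled over several patterns, so "concern" (extracted by two patterns tied to the lexicon) outranks "support" here.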

14 3. Using Extraction Patterns to Learn Subjective Nouns: Basilisk (2) [Thelen & Riloff 02]
[Diagram. Labels: corpus; extraction patterns and their extractions; seed words; semantic lexicon; Pattern Pool (best patterns); Candidate Word Pool (extractions); 5 best candidate words]

15 3. Using Extraction Patterns to Learn Subjective Nouns: Experimental Results (1)
Extraction Examples
expressed: condolences, hope, grief, views, worries
indicative of: compromise, desire, thinking
inject: vitality, hatred
reaffirmed: resolve, position, commitment
voiced: outrage, support, skepticism, opposition, gratitude, indignation
show of: support, strength, goodwill, solidarity
was shared: anxiety, view, niceties, feeling

16 3. Using Extraction Patterns to Learn Subjective Nouns: Experimental Results (2)
Subjective Seed Words
cowardice, embarrassment, hatred, outrage
crap, fool, hell, slander
delight, gloom, hypocrisy, sigh
disdain, grievance, love, twit
dismay, happiness, nonsense, virtue

17 3. Using Extraction Patterns to Learn Subjective Nouns: Experimental Results (3)
Examples of Strong Subjective Nouns
anguish, exploitation, pariah
antagonism, evil, repudiation
apologist, fallacies, revenge
atrocities, genius, rogue
barbarian, goodwill, sanctimonious
belligerence, humiliation, scum
bully, ill-treatment, smokescreen
condemnation, injustice, sympathy
denunciation, innuendo, tyranny
devil, insinuation, venom
diatribe, liar
exaggeration, mockery

18 3. Using Extraction Patterns to Learn Subjective Nouns: Experimental Results (4)
Examples of Weak Subjective Nouns
aberration, eyebrows, resistant
allusion, failures, risk
apprehensions, inclination, sincerity
assault, intrigue, slump
beneficiary, liability, spirit
benefit, likelihood, success
blood, peaceful, tolerance
controversy, persistent, trick
credence, plague, trust
distortion, pressure, unity
drama, promise
eternity, rejection

19 3. Using Extraction Patterns to Learn Subjective Nouns: Experimental Results (5)
Subjective Noun Results
Bootstrapping corpus: 950 unannotated news articles.
We ran both bootstrapping algorithms for several iterations.
We manually reviewed the words and labeled them as strong, weak, or not subjective.
1052 subjective nouns were learned (454 strong, 598 weak); they are included in our subjectivity lexicon @ www.cs.pitt.edu/mpqa

20 20

21 3. Using Extraction Patterns to Learn Subjective Nouns: Experimental Results (7)
The graph tracks accuracy as bootstrapping progressed. Accuracy was high during the initial iterations but tapered off as bootstrapping continued. After 20 words, both algorithms were 95% accurate. After 100 words, Basilisk was 75% accurate and MetaBoot 81%. After 1000 words, MetaBoot was 28% accurate and Basilisk 53%.

22 4. Creating Subjectivity Classifiers: Subjective Noun Features
Naïve Bayes classifier using the nouns as features.
Sets:
BA-Strong: the set of StrongSubjective nouns generated by Basilisk
BA-Weak: the set of WeakSubjective nouns generated by Basilisk
MB-Strong: the set of StrongSubjective nouns generated by Meta-Bootstrapping
MB-Weak: the set of WeakSubjective nouns generated by Meta-Bootstrapping
For each set, a three-valued feature: presence of 0, 1, or ≥2 words from that set.
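The three-valued features can be computed as in the sketch below. The membership of the four sets shown here is invented for illustration (the words come from the example lists on earlier slides, but which algorithm produced which word is not stated there), and the whitespace tokenization and function names are assumptions.

```python
# Hypothetical set contents; the real sets come from the bootstrapping runs.
NOUN_SETS = {
    "BA-Strong": {"tyranny", "injustice"},
    "BA-Weak": {"risk", "pressure"},
    "MB-Strong": {"venom", "injustice"},
    "MB-Weak": {"promise", "trust"},
}


def three_valued(tokens, word_set):
    """Return 0, 1, or 2, where 2 stands for 'two or more' words from the set."""
    hits = sum(1 for token in tokens if token.lower() in word_set)
    return min(hits, 2)


def noun_features(tokens):
    """One three-valued feature per noun set."""
    return {name: three_valued(tokens, words) for name, words in NOUN_SETS.items()}


tokens = "The report condemned tyranny and injustice despite the risk".split()
features = noun_features(tokens)
print(features)
```

A feature dictionary like this, one per sentence, is the kind of input a Naïve Bayes classifier over categorical features would take.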

23 4. Creating Subjectivity Classifiers: Previously Established Features (Wiebe, Bruce, O'Hara 1999)
Sets:
subjStems: a set of stems positively correlated with the subjective training examples
objStems: a set of stems positively correlated with the objective training examples
For each set, a three-valued feature: the presence of 0, 1, or ≥2 members of the set.
A binary feature for each of the following: presence in the sentence of a pronoun, an adjective, a cardinal number, a modal other than will, an adverb other than not.
Other features from other researchers.

24 4. Creating Subjectivity Classifiers: Discourse Features
subjClues = all sets defined before except objStems
Four features:
ClueRate_subj for the previous and following sentences
ClueRate_obj for the previous and following sentences
Feature for sentence length.
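A sketch of how these discourse features might be computed is given below. The slide does not define ClueRate, so the code assumes it is the number of clue words in a sentence divided by the sentence length in words; that definition, the zero values used at document boundaries, the toy clue sets, and the function names are all assumptions.

```python
def clue_rate(tokens, clues):
    """Assumed definition: clue occurrences divided by sentence length in words."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in clues)
    return hits / len(tokens)


def discourse_features(sentences, i, subj_clues, obj_clues):
    """Clue rates of the neighbouring sentences plus the length of sentence i."""
    # At the start or end of a document the missing neighbour counts as empty.
    previous = sentences[i - 1] if i > 0 else []
    following = sentences[i + 1] if i + 1 < len(sentences) else []
    return {
        "prev_subj": clue_rate(previous, subj_clues),
        "next_subj": clue_rate(following, subj_clues),
        "prev_obj": clue_rate(previous, obj_clues),
        "next_obj": clue_rate(following, obj_clues),
        "length": len(sentences[i]),
    }


# Toy document (hypothetical), already tokenized.
sentences = [
    ["critics", "voiced", "outrage", "today"],
    ["the", "vote", "is", "on", "tuesday"],
    ["officials", "reported", "three", "delays"],
]
subj_clues = {"voiced", "outrage"}
obj_clues = {"reported", "three"}

feats = discourse_features(sentences, 1, subj_clues, obj_clues)
print(feats)
```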

25 4. Creating Subjectivity Classifiers: Classification Results
The results of Naïve Bayes classifiers trained with different combinations of features:
Using both WBO and SubjNoun achieves better performance than either one alone.
The best results are achieved with all the features combined.
Another classification, with a higher precision, can be obtained by classifying a sentence as subjective if it contains any of the StrongSubjective nouns (26% recall, 87% precision).
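The high-precision rule in the last point, and how precision and recall are measured for it, can be sketched as follows. The four labeled sentences and the small noun set are made up for illustration and have nothing to do with the reported 26% / 87% figures; the function names are hypothetical.

```python
def is_subjective(tokens, strong_nouns):
    """Rule from the slide: subjective if any StrongSubjective noun is present."""
    return any(token.lower() in strong_nouns for token in tokens)


def precision_recall(predictions, gold):
    """Precision and recall of the 'subjective' class."""
    tp = sum(1 for p, g in zip(predictions, gold) if p and g)
    fp = sum(1 for p, g in zip(predictions, gold) if p and not g)
    fn = sum(1 for p, g in zip(predictions, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical lexicon excerpt and hand-labeled toy sentences.
strong_nouns = {"tyranny", "injustice", "venom"}
data = [
    ("The speech denounced tyranny and injustice", True),
    ("The committee meets on Tuesday", False),
    ("I really doubt this plan will work", True),
    ("Critics called the ruling a disgrace", True),
]

predictions = [is_subjective(text.split(), strong_nouns) for text, _ in data]
gold = [label for _, label in data]
precision, recall = precision_recall(predictions, gold)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

The toy run shows the same trade-off in miniature: the rule is rarely wrong when it fires, but it misses subjective sentences that contain none of the listed nouns.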

26 The Bootstrapping Era: Unannotated Texts + = KNOWLEDGE!

