Measuring the Influence of Errors Induced by the Presence of Dialogs in Reference Clustering of Narrative Text Alaukik Aggarwal, Department of Computer.


1 Measuring the Influence of Errors Induced by the Presence of Dialogs in Reference Clustering of Narrative Text
Alaukik Aggarwal, Department of Computer Science and Engineering, MAIT
Pablo Gervás, Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid
Raquel Hervás, Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid

2 Outline of the Problem
Coreference Resolution = Anaphoric + Non-anaphoric
Different genres of text studied:
▫Text without dialogues (like news articles)
▫Text consisting only of dialogues (conversations)

3 An Example
Sachin Tendulkar has been honoured with the Padma Vibhushan Award. India’s world number one batsman secured 17,000 runs on home soil. Tendulkar has put India in a strong position against Australia in the One-Day Series. The Indian responded to his critics who believed that his career was sliding with his 40th century.
Generally the kind of text found in news articles.

4 Problems in Dialogue - Why?
Pronominal reference within quoted fragments
Change in referential value of demonstratives
▫“You take these bags and I’ll take those”
Non-NP antecedents or no antecedents at all

5 Coreference in Narrative
Narrative texts contain many characters and objects
Rich in dialogues and coreferences
Cover different styles of writing from different authors and time periods

6 Another Example
The two elder sons did not delay but set off at once, and the third and youngest son began pleading. "No, my son, you mustn't leave me, an old man, all alone," said the king. "Please let me go, Father! I do so want to travel over the world and find my mother." The king reasoned with him, but, seeing that he could not stop him from going, said: "Oh, all right then, I suppose it can't be helped. Go and God be with you!"
An excerpt from Three Kingdoms (by Alexander Afanasiev)

7 Quantitatively Analyzing the Presence of Dialogs in Narrative Texts

8 Resolving Coreference in NPs
Knowledge-rich and knowledge-poor approaches
Different approaches considered by us:
▫Decision trees
▫C4.5 Machine Learning algorithm
▫Clustering
▫Hybrid

9 Corpus of narrative texts
Thirty folk tales in English
Different styles, authors and time periods
Rich in dialogs between characters
Process:
▫Identify references
▫Enrich references with semantic information
▫Coreference resolution using a clustering approach

10 Step 1: Identifying References
GATE (General Architecture for Text Engineering)
▫ANNIE Sentence Splitter
▫ANNIE English Tokeniser
▫ANNIE POS Tagger
▫CREOLE plugin
Output in XML format

11 Step 2: Feature Extraction
Position
Part of Speech (POS)
Article
Number
Semantic Class
▫WordNet (synsets)
Gender
▫A resource of gender data
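The feature list above can be pictured as one record per reference. The following is a minimal sketch, not the authors' implementation: the `Reference` class, the `extract_features` function, the two small lookup tables (standing in for WordNet synsets and the gender resource), and the rule that the last token is the head are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical lookup tables; the slides use WordNet synsets and a gender resource instead.
SEMANTIC_CLASS = {"king": "person", "son": "person", "mother": "person", "world": "location"}
GENDER = {"king": "masculine", "son": "masculine", "mother": "feminine"}
PLURAL_TAGS = {"NNS", "NNPS"}  # Penn Treebank plural noun tags
PLURAL_PRONOUNS = {"they", "them", "we", "us", "these", "those"}


@dataclass
class Reference:
    text: str
    position: int                  # index of the reference in the document
    pos: str                       # POS tag of the (assumed) head token
    article: Optional[str]         # "definite", "indefinite" or None
    number: str                    # "singular" or "plural"
    semantic_class: Optional[str]
    gender: Optional[str]


def extract_features(tokens: List[Tuple[str, str]], position: int) -> Reference:
    """Build a feature record from (word, POS tag) pairs of one noun phrase."""
    words = [word for word, _ in tokens]
    head_word, head_tag = tokens[-1]  # assumption: the last token is the head
    first = words[0].lower()
    if first == "the":
        article = "definite"
    elif first in ("a", "an"):
        article = "indefinite"
    else:
        article = None
    key = head_word.lower()
    is_plural = head_tag in PLURAL_TAGS or key in PLURAL_PRONOUNS
    return Reference(
        text=" ".join(words),
        position=position,
        pos=head_tag,
        article=article,
        number="plural" if is_plural else "singular",
        semantic_class=SEMANTIC_CLASS.get(key),  # None when the word is unknown
        gender=GENDER.get(key),
    )
```

For example, `extract_features([("the", "DT"), ("king", "NN")], 3)` yields a definite, singular, masculine reference of semantic class "person" under these toy tables.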

12 Annotated Data

13 Step 3: Algorithm and Working
Based on the clustering algorithm by Cardie and Wagstaff (1999)
\mathrm{dist}(NP_i, NP_j) = \sum_{f \in F} w_f \cdot \mathrm{incompatibility}_f(NP_i, NP_j)
Features (f): Position, Pronoun, Article, Word-substring, Number, Semantic Class, Gender
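To make the distance measure concrete, here is a simplified sketch of a weighted-incompatibility distance combined with a greedy, radius-based clustering pass in the spirit of Cardie and Wagstaff. It is not the system described in the slides: the feature subset, the weights in `WEIGHTS`, the dictionary layout of each NP, and the choice to treat unknown values as compatible are all illustrative assumptions.

```python
import math

# Illustrative weights (assumptions, not values from the slides).
# An infinite weight means a mismatch on that feature blocks any merge.
WEIGHTS = {"position": 5.0, "head": 1.0, "number": math.inf, "gender": math.inf}


def incompatibility(feature, np_i, np_j, n_nps):
    """Return a value in [0, 1]; 0 means the two NPs agree on this feature."""
    if feature == "position":
        return abs(np_i["position"] - np_j["position"]) / max(n_nps - 1, 1)
    a, b = np_i[feature], np_j[feature]
    if a is None or b is None:  # unknown value: treated as compatible
        return 0.0
    return 0.0 if a == b else 1.0


def dist(np_i, np_j, n_nps):
    """Weighted sum of per-feature incompatibilities."""
    total = 0.0
    for feature, weight in WEIGHTS.items():
        inc = incompatibility(feature, np_i, np_j, n_nps)
        if inc:  # skip zeros so that inf * 0 never produces NaN
            total += weight * inc
    return total


def cluster(nps, radius):
    """Walk the NPs from last to first and merge each one with clusters of
    preceding NPs that are closer than the radius and fully compatible."""
    n = len(nps)
    cluster_of = list(range(n))  # every NP starts in its own cluster
    members = {i: [i] for i in range(n)}
    for j in range(n - 1, -1, -1):
        for i in range(j - 1, -1, -1):
            ci, cj = cluster_of[i], cluster_of[j]
            if ci == cj or dist(nps[i], nps[j], n) >= radius:
                continue
            # Merge only if every cross-cluster pair has a finite distance.
            if all(dist(nps[a], nps[b], n) < math.inf
                   for a in members[ci] for b in members[cj]):
                for b in members[cj]:
                    cluster_of[b] = ci
                members[ci].extend(members.pop(cj))
    return sorted(sorted(m) for m in members.values())


nps = [
    {"position": 0, "head": "king", "number": "singular", "gender": "masculine"},
    {"position": 1, "head": "mother", "number": "singular", "gender": "feminine"},
    {"position": 2, "head": "king", "number": "singular", "gender": "masculine"},
    {"position": 3, "head": None, "number": "singular", "gender": "masculine"},  # "he"
]
print(cluster(nps, radius=4.0))  # [[0, 2, 3], [1]]
```

With a small radius such as 0.5 every NP stays in its own cluster, which mirrors why results are reported for several radius values.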

14 Evaluation and Results

15 Evaluation
Clustering algorithm run over the tales twice
▫With dialogs
▫Without dialogs
Hand correction of the obtained coreferences for comparison
▫Precision and recall
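The slides do not say which scoring scheme was used to obtain precision and recall. As one possible reading, the sketch below scores a predicted clustering against a hand-corrected one by comparing pairwise coreference links; the function names `links` and `precision_recall` and the link-based scheme itself are assumptions.

```python
from itertools import combinations


def links(clusters):
    """Set of unordered coreference links implied by a clustering."""
    return {frozenset(pair) for c in clusters for pair in combinations(sorted(c), 2)}


def precision_recall(predicted, gold):
    """Pairwise precision and recall of predicted clusters against gold clusters."""
    pred_links, gold_links = links(predicted), links(gold)
    correct = len(pred_links & gold_links)
    precision = correct / len(pred_links) if pred_links else 0.0
    recall = correct / len(gold_links) if gold_links else 0.0
    return precision, recall


predicted = [[0, 2, 3], [1]]   # output of the clustering step
gold = [[0, 2], [1, 3]]        # hand-corrected clusters
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.33 recall=0.50
```

Here only one of the three predicted links is also in the gold standard, and that link is one of the two gold links.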

16 Results
Precision and recall results with and without dialogues (%):

                    Precision   Recall
With Dialogue         61.10      56.57
Without Dialogue      70.49      63.15

          With Dialogues         Without Dialogues
Radius    Recall   Precision     Recall   Precision
10        36.81    50.93         41.95    62.69
20        53.77    59.26         57.01    66.77
31        56.57    61.10         63.15    70.49

17 Conclusions
Nested dialogues decrease performance by about 9 percentage points in precision and 7 in recall
But information is lost if dialogues are removed
▫Dialogs need to be treated separately
In addition, we constructed a corpus of tales annotated with coreference information for nominal phrases
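The figures in the first conclusion can be checked against the reported results. This short sketch only reuses the precision and recall values from the results slide; the `results` dictionary and `drop` helper are illustrative names.

```python
# Values copied from the results slide (percent).
results = {
    "with": {"precision": 61.10, "recall": 56.57},
    "without": {"precision": 70.49, "recall": 63.15},
}


def drop(metric):
    """Difference in percentage points between runs without and with dialogues."""
    return round(results["without"][metric] - results["with"][metric], 2)


print(drop("precision"), drop("recall"))  # 9.39 6.58
```

The differences are absolute (percentage points), which round to the 9 and 7 quoted in the conclusions.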

18 Future work
Dialogs could be extracted from the tale and considered as a separate text
▫Information about the characters involved is required
Possible improvements in different problems
▫Word Sense Disambiguation
▫Named Entity Recognition
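As a rough illustration of the first idea, quoted fragments can be separated from the narration with a regular expression. This is a minimal sketch that assumes dialog is always delimited by straight double quotes and never nested; the `split_dialog` name is hypothetical, and the sketch does not recover which character is speaking, which the slide notes would be required.

```python
import re

# Assumption: dialog is delimited by straight double quotes and is not nested.
QUOTED = re.compile(r'"([^"]*)"')


def split_dialog(text):
    """Return (list of quoted fragments, narration with the quotes removed)."""
    dialog = [m.group(1).strip() for m in QUOTED.finditer(text)]
    narration = re.sub(r"\s+", " ", QUOTED.sub(" ", text)).strip()
    return dialog, narration


tale = 'The king reasoned with him and said: "Go and God be with you!" The son left.'
dialog, narration = split_dialog(tale)
print(dialog)     # ['Go and God be with you!']
print(narration)  # The king reasoned with him and said: The son left.
```

Each part could then be clustered on its own, at the cost of losing the link between a quoted pronoun and its speaker unless that information is kept alongside the fragment.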

19 Thank You.

