Measuring the Influence of Errors Induced by the Presence of Dialogs in Reference Clustering of Narrative Text

Presentation transcript:

Measuring the Influence of Errors Induced by the Presence of Dialogs in Reference Clustering of Narrative Text
Alaukik Aggarwal, Department of Computer Science and Engineering, MAIT
Pablo Gervás, Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid
Raquel Hervás, Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid

Outline of the Problem
Coreference resolution = anaphoric + non-anaphoric
Different genres of text studied:
▫ Text without dialogues (like news articles)
▫ Text consisting only of dialogues (conversations)

An Example
Sachin Tendulkar has been honoured with Padma Vibhushan Award. India’s world number one batsman secured 17,000 runs on home soil. Tendulkar has put India in a strong position against Australia in the One-Day Series. The Indian responded to his critics who believed that his career was sliding with his 40th century.
This is the kind of text generally found in news articles.

Problems in Dialogue - Why?
Pronominal reference within quoted fragments
Change in referential value of demonstratives
▫ “You take these bags and I’ll take those”
Non-NP antecedents or no antecedents at all

Coreference in Narrative
Narratives contain many characters and objects
Rich in dialogues and coreferences
Cover different styles of writing from different authors and time periods

Another Example
The two elder sons did not delay but set off at once, and the third and youngest son began pleading. "No, my son, you mustn't leave me, an old man, all alone," said the king. "Please let me go, Father! I do so want to travel over the world and find my mother." The king reasoned with him, but, seeing that he could not stop him from going, said: "Oh, all right then, I suppose it can't be helped. Go and God be with you!"
An excerpt from Three Kingdoms (by Alexander Afanasiev)

Quantitatively Analyzing the Presence of Dialogs in Narrative Texts

Resolving Coreference in NPs
Knowledge-rich and knowledge-poor approaches
Approaches we considered:
▫ Decision trees
▫ C4.5 machine learning algorithm
▫ Clustering
▫ Hybrid

Corpus of narrative texts
Thirty folk tales in English
Different styles, authors and time periods
Rich in dialogs between characters
Process:
▫ Identify references
▫ Enrich references with semantic information
▫ Coreference resolution using a clustering approach

Step 1: Identifying References
GATE (General Architecture for Text Engineering)
▫ ANNIE Sentence Splitter
▫ ANNIE English Tokeniser
▫ ANNIE POS Tagger
▫ CREOLE plugin
Output in XML format

Step 2: Feature Extraction
Position
Part of Speech (POS)
Article
Number
Semantic Class
▫ WordNet (synsets)
Gender
▫ A resource of gender data
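The feature list above could be held in a simple per-reference record. The following is a minimal Python sketch, not the authors' implementation: the class name `Mention`, its field names, the value labels such as "definite", and the helper `article_of` are all hypothetical and chosen only for illustration. The slides do not say how each feature is encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """One noun-phrase reference with the features named on the slide (hypothetical layout)."""
    text: str
    position: int                   # index of the NP in document order
    pos_tag: str                    # part-of-speech tag of the head word
    article: Optional[str]          # "definite", "indefinite", or None
    number: Optional[str]           # "singular" or "plural"
    semantic_class: Optional[str]   # e.g. a coarse class derived from WordNet synsets
    gender: Optional[str]           # "masculine", "feminine", "neuter", or None


def article_of(np_text: str) -> Optional[str]:
    """Classify the leading article of an English noun phrase, if it has one."""
    words = np_text.strip().split()
    first = words[0].lower() if words else ""
    if first == "the":
        return "definite"
    if first in ("a", "an"):
        return "indefinite"
    return None
```

Semantic class and gender would come from external resources (WordNet and a gender list in the slides), so they are left as plain optional strings here.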

Annotated Data

Step 3: Algorithm and Working
Based on the clustering algorithm of Cardie and Wagstaff (1999):
$$\mathrm{dist}(NP_i, NP_j) = \sum_{f \in F} w_f \cdot \mathrm{incompatibility}_f(NP_i, NP_j)$$
Features (f): Position, Pronoun, Article, Word-substring, Number, Semantic Class, Gender
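The weighted distance and the greedy clustering it drives can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the weights in `WEIGHTS`, the incompatibility rules, and the dictionary layout of a mention are hypothetical, and only four of the seven features are modelled. Infinite weights stand for features that forbid coreference outright, and two clusters are merged only when the pair is closer than twice the radius and no cross pair is forbidden.

```python
import math

# Hypothetical weights; an infinite weight makes a mismatch block any merge.
WEIGHTS = {
    "position": 1.0,
    "article": 1.0,
    "number": math.inf,
    "gender": math.inf,
}


def incompatibility(feature, a, b, n_mentions):
    """Return a value in [0, 1]; 0 means the two NPs agree on this feature."""
    if feature == "position":
        return abs(a["position"] - b["position"]) / max(n_mentions - 1, 1)
    if feature == "article":
        # Assumption: an indefinite later NP tends to introduce a new entity.
        later = a if a["position"] > b["position"] else b
        return 1.0 if later["article"] == "indefinite" else 0.0
    va, vb = a[feature], b[feature]
    return 1.0 if va is not None and vb is not None and va != vb else 0.0


def dist(a, b, n_mentions):
    """Weighted sum of per-feature incompatibilities."""
    total = 0.0
    for feature, weight in WEIGHTS.items():
        inc = incompatibility(feature, a, b, n_mentions)
        if inc:  # skip zero terms so that inf * 0 never yields NaN
            total += weight * inc
    return total


def cluster(mentions, radius):
    """Greedy right-to-left clustering; returns sorted lists of mention indices."""
    n = len(mentions)
    cluster_of = {i: {i} for i in range(n)}  # every mention starts alone
    for j in range(n - 1, -1, -1):
        for i in range(j - 1, -1, -1):
            ci, cj = cluster_of[i], cluster_of[j]
            if ci is cj or dist(mentions[i], mentions[j], n) >= 2 * radius:
                continue
            # Merge only if no pair across the two clusters is forbidden.
            if all(math.isfinite(dist(mentions[x], mentions[y], n))
                   for x in ci for y in cj):
                merged = ci | cj
                for k in merged:
                    cluster_of[k] = merged
    unique = {frozenset(c) for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique)
```

The radius is the single tuning knob: a larger radius merges more aggressively, which usually trades precision for recall.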

Evaluation and Results

Evaluation
Clustering algorithm run over the tales twice
▫ With dialogs
▫ Without dialogs
Hand correction of the obtained coreferences for comparison
▫ Precision and recall
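The slides do not state which scoring scheme was used to compare system output against the hand-corrected clusters. As one possible reading, the sketch below computes pairwise (link-based) precision and recall in Python; the function names `links` and `precision_recall` are hypothetical, and other coreference metrics would give different numbers.

```python
from itertools import combinations


def links(clusters):
    """Return the set of unordered mention pairs that share a cluster."""
    return {frozenset(pair)
            for c in clusters
            for pair in combinations(sorted(c), 2)}


def precision_recall(predicted, gold):
    """Pairwise precision and recall of predicted clusters against gold clusters."""
    p, g = links(predicted), links(gold)
    correct = len(p & g)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall
```

Running this once on the clusters obtained with dialogs and once on those obtained without dialogs would give the two pairs of figures being compared.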

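Results
Precision and recall results with and without dialogues:
[Table: precision and recall, with dialogue vs. without dialogue; values not preserved in the transcript]
[Table: results by radius, with dialogues vs. without dialogues; values not preserved in the transcript]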

Conclusions
Nested dialogues decrease performance by 9% in precision and 7% in recall
But information is lost if dialogues are removed
▫ Dialogs need to be treated separately
In addition, we constructed a corpus of tales annotated with coreference information for nominal phrases

Future work
Dialogs could be extracted from the tale and considered as a separate text
▫ Information about the characters involved is required
Possible improvements in different problems
▫ Word Sense Disambiguation
▫ Named Entity Recognition
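Separating dialog from narration, as proposed above, could start from something as simple as the Python sketch below. It is only an illustration under strong assumptions: quoted speech is delimited by straight double quotes, quotes are never nested, and no quote spans a paragraph break. The name `split_dialogue` is hypothetical, and the sketch does not attempt the harder part, which is identifying the speaker and addressee of each fragment.

```python
import re

# Assumption: speech is enclosed in straight double quotes without nesting.
QUOTED = re.compile(r'"[^"]*"')


def split_dialogue(text):
    """Return (narration, dialogue): text outside quotes, and the quoted fragments."""
    dialogue = [m.group(0)[1:-1] for m in QUOTED.finditer(text)]
    narration = re.sub(r"\s+", " ", QUOTED.sub(" ", text)).strip()
    return narration, dialogue
```

The narration can then be clustered as ordinary text, while each dialogue fragment is kept for separate treatment once the characters involved are known.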

Thank You.