Published by Amy Cook. Modified over 9 years ago.
1
A multiple knowledge source algorithm for anaphora resolution
Allaoua Refoufi
Computer Science Department, University of Setif, Setif 19000, Algeria
email: refoufia@yahoo.fr
Outline: Introduction, Types of anaphora, Knowledge sources, The main algorithm, Discussion, Conclusion
2
What is anaphora?
The term anaphora relates to the presence in a text of entities (noun phrases, pronouns, etc.) which, on the one hand, refer to the same entity (are coreferential) and, on the other hand, supply additional information.
Reference to a previously mentioned entity is generally termed anaphora; the entity to which the reference is made is the antecedent (or referent), and the anaphor is the expression used to make the reference. Example: “My brother called last night, he wanted to see me.” Here “he” is the anaphor and “My brother” is its antecedent.
An anaphor is a linguistic unit that substitutes for another linguistic unit already introduced.
3
Types of anaphora
Pronominal: the most common type; the reference is made by a pronoun. “Sabrina took the apple on the table. She ate it.”
Definite noun phrase: the antecedent is referred to by a definite noun phrase. “The president visited the city. The host of the people’s palace inaugurated several projects.”
Verb phrase as antecedent: “Sarah tried to convince him to stay. The attempt was vain.”
Ordinal anaphora: the anaphor is an ordinal number such as first, second, etc. “Sarah was not satisfied by the solution. She looked for a new one.”
4
Knowledge sources
Morphology is concerned with the structure of words; it tells us how to extract the base forms from the inflected forms that occur in texts.
Syntax is concerned with the ways words combine to form phrases, and phrases combine to form sentences. It extracts the syntactic function of each word (verb, noun, pronoun, etc.). This process is known as parsing.
Semantics deals with the meaning of words, phrases and sentences.
Pragmatic knowledge uses the context to disambiguate among different settings.
5
The main algorithm
Recognition phase
–Morphosyntactic analysis
–Recognition of non-anaphoric pronouns
–Identification of focusing expressions
–Building of the data structures
Resolution phase
For each anaphor do:
–Apply the constraints in order
–Apply the preferences in order
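The resolution phase on this slide can be sketched as a small driver loop. This is only an illustration of the control flow, not the author's implementation: the `Mention` record, the `resolve` function, and the idea of passing constraints and preferences as predicates are all assumptions, and the recognition phase is assumed to have already produced the list of mentions. Falling back to the most recent surviving candidate is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical record produced by the recognition phase."""
    text: str
    position: int            # token index of the mention in the text
    is_anaphor: bool = False


# A rule takes (anaphor, candidate) and says whether the candidate passes.
Rule = Callable[[Mention, Mention], bool]


def resolve(mentions: List[Mention],
            constraints: List[Rule],
            preferences: List[Rule]) -> Dict[Mention, Optional[Mention]]:
    """For each anaphor, apply the constraints in order, then the preferences."""
    resolved: Dict[Mention, Optional[Mention]] = {}
    for anaphor in (m for m in mentions if m.is_anaphor):
        # Candidates are the non-anaphoric mentions that precede the anaphor.
        candidates = sorted(
            (m for m in mentions
             if not m.is_anaphor and m.position < anaphor.position),
            key=lambda m: m.position,
        )
        # Constraints are hard filters: a failing candidate is eliminated.
        for keep in constraints:
            candidates = [c for c in candidates if keep(anaphor, c)]
        # Preferences may be violated: narrow only if some candidate satisfies one.
        for prefers in preferences:
            preferred = [c for c in candidates if prefers(anaphor, c)]
            if preferred:
                candidates = preferred
        # Assumption: among the survivors, pick the most recent one.
        resolved[anaphor] = candidates[-1] if candidates else None
    return resolved
```

Because the rules are plain functions, a constraint or preference can be added, removed, or reordered without touching the loop.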
6
Constraints
Constraints are rules that prune the candidate antecedents stored in the structures built during parsing.
Consistency conditions: candidates are eliminated on morphological grounds (number, gender, person).
Condition on insertions: an expression included in an insertion cannot be the antecedent of an anaphor located outside that insertion.
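As a minimal sketch, the two constraints could be written as predicates over annotated mentions. The `Mention` fields, the feature codes ("f", "sg", etc.), and the representation of insertions as token spans are assumptions made for illustration; the slides do not specify the data structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention with morphological features from the parser."""
    text: str
    start: int                       # token index of the mention
    gender: Optional[str] = None     # "m", "f", or None when unknown
    number: Optional[str] = None     # "sg", "pl", or None when unknown
    person: Optional[int] = None     # 1, 2, 3, or None when unknown


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Consistency condition: reject on a known mismatch in any feature."""
    for feature in ("gender", "number", "person"):
        a = getattr(anaphor, feature)
        c = getattr(candidate, feature)
        if a is not None and c is not None and a != c:
            return False
    return True


def outside_insertions(anaphor: Mention, candidate: Mention,
                       insertions: List[Tuple[int, int]]) -> bool:
    """Condition on insertions: a candidate inside an insertion cannot be
    the antecedent of an anaphor outside it. Spans are (start, end) token
    indices with end exclusive."""
    for start, end in insertions:
        candidate_inside = start <= candidate.start < end
        anaphor_inside = start <= anaphor.start < end
        if candidate_inside and not anaphor_inside:
            return False
    return True
```

For “La dame, assise en face de Sarah, était anxieuse. Elle voulait prendre la parole.”, both “La dame” and “Sarah” agree with “Elle”, so only the insertion condition removes “Sarah” from the candidates.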
7
Preferences
Preferences, as opposed to constraints, can be violated by the antecedent candidates; they are used to rank the candidates, and those that satisfy the preferences are retained. The order in which they appear reflects their weight:
Syntactic parallelism
Antecedent not occurring in a prepositional phrase
Focus expressions
Recency
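One possible reading of "the order reflects their weight" is a lexicographic ranking: an earlier preference always outweighs any combination of later ones. The sketch below uses a tuple as the sort key. The `Candidate` fields and function labels are hypothetical, and the slides do not say that the original system ranks candidates this way.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate that already passed the constraints."""
    text: str
    position: int                 # token index; larger means more recent
    function: str                 # e.g. "subject", "object"
    in_prep_phrase: bool = False
    in_focus: bool = False


def preference_key(anaphor_function: str,
                   candidate: Candidate) -> Tuple[bool, bool, bool, int]:
    """Tuple ordered by weight; Python compares tuples left to right."""
    return (
        candidate.function == anaphor_function,  # syntactic parallelism
        not candidate.in_prep_phrase,            # not inside a prep. phrase
        candidate.in_focus,                      # introduced by a focus expression
        candidate.position,                      # recency
    )


def rank(anaphor_function: str, candidates: List[Candidate]) -> List[Candidate]:
    """Best candidate first."""
    return sorted(candidates,
                  key=lambda c: preference_key(anaphor_function, c),
                  reverse=True)
```

With “La voiture de la voisine bloque le passage, il faut la déplacer”, neither feminine candidate is an object like the pronoun “la”, so the second preference decides: “la voisine” sits inside a prepositional phrase and “La voiture” is ranked first.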
8
Some preferences
Syntactic parallelism: we prefer the antecedent that has the same syntactic function as the anaphor. “The child recognized the king, although he had never met him before.” Here the subject pronoun “he” is resolved to the subject “The child”.
Prepositional phrases: an expression included in a prepositional phrase is unlikely to be referred to, because it only brings additional information. “La voiture de la voisine bloque le passage, il faut la déplacer.” (“The neighbour’s car is blocking the way; it has to be moved.”) The pronoun “la” refers to “La voiture”, not to “la voisine”.
9
Focus expressions
They identify the main theme, the focus of attention. They are of the form:
C’est NP qui … (“It is NP who …”)
Il y a NP qui … (“There is NP who …”)
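A rough way to spot these two patterns on raw text is a regular expression, shown below as a sketch. It captures whatever stands between the opener and "qui" instead of a parsed NP, which is a simplification; the pattern, the function name, and the sample sentences in the usage note are assumptions, and a real system would rely on the parser's NP boundaries.

```python
import re
from typing import List

# Matches "C'est NP qui" and "Il y a NP qui" (straight or curly apostrophe).
FOCUS_PATTERN = re.compile(
    r"\b(?:c['’]est|il y a)\s+(?P<np>.+?)\s+qui\b",
    re.IGNORECASE,
)


def focus_expressions(sentence: str) -> List[str]:
    """Return the text spans introduced by a focus construction."""
    return [match.group("np") for match in FOCUS_PATTERN.finditer(sentence)]
```

For a made-up sentence such as "C'est la voisine qui a déplacé la voiture.", the function returns the span "la voisine", which can then be flagged as focused when the preferences are applied.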
10
Appositions
Fragments of sentences which can be eliminated without altering the main meaning. Goal: eliminate the candidates which occur inside them. Mainly three forms:
Delimited by separators “,” or “(”: “La dame, assise en face de Sarah, était anxieuse. Elle voulait prendre la parole.” (“The lady, seated opposite Sarah, was anxious. She wanted to speak.”)
Relative clauses: “La dame qui discute avec Sarah est une voisine.” (“The lady who is talking with Sarah is a neighbour.”)
Just one comma: “Caesar, the Roman emperor VERB …”
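The first form, fragments delimited by a pair of separators, can be approximated on a token list without a parser. The sketch below is an assumption-laden illustration: it pairs commas naively within a sentence (so it would mis-handle enumerations such as "a, b, c"), and it does not cover relative clauses or the single-comma form, which need syntactic information. The function name and span convention are hypothetical.

```python
from typing import List, Optional, Tuple


def insertion_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, for the material
    between a pair of commas or between parentheses within one sentence."""
    spans: List[Tuple[int, int]] = []
    open_comma: Optional[int] = None
    open_paren: Optional[int] = None
    for i, token in enumerate(tokens):
        if token == "(":
            open_paren = i
        elif token == ")" and open_paren is not None:
            spans.append((open_paren + 1, i))
            open_paren = None
        elif token == ",":
            if open_comma is None:
                open_comma = i               # possible start of an insertion
            else:
                spans.append((open_comma + 1, i))
                open_comma = None
        elif token in {".", "!", "?"}:
            # Sentence ends: an unpaired comma is not treated as an insertion.
            open_comma = None
            open_paren = None
    return spans
```

On the tokenised example “La dame , assise en face de Sarah , était anxieuse .”, the only span found covers “assise en face de Sarah”, so “Sarah” can be excluded as an antecedent for a pronoun in the next sentence.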
11
Discussion
The algorithm achieves a success rate of 68%. The evaluation has been carried out so far on more than 100 texts of reasonable size (one page each) taken from literary stories.
The results show that the resolution of pronouns such as il(s), elle(s), le, la is relatively successful (success rate of 93%).
The insertion constraint tends to add complexity to the implementation.
12
Unresolved problems
Multiple-source anaphor: “Sarah and Sofia left early this morning. They have an appointment at the university.”
Self-referring expressions: “Everyone knows it, John is a good driver.”
Reference to verb phrases or sentences: “On two wheels we are vulnerable. The problem is to forget it.”
13
Conclusion
The main idea of our work is to establish a link between nominal phrases that share a similar context with constituents in the input text.
It relies heavily on a morphosyntactic parser. The application of a set of constraints followed by a set of preferences provides an elegant, modular, easy-to-update anaphora resolution algorithm.
Unfortunately, the current state of the art in practically applicable parsing technology still falls short of delivering robust and reliable syntactic analysis of real texts at the level of detail and precision that most algorithms assume.
Shallow parsing, on the other hand, can greatly affect the performance and the efficiency of the algorithm.
14
Related work
System            Type of knowledge   Success rate   Corpus
Lappin & Leass    Robust parser       75%            Computer texts
Kennedy et al.    Shallow parser      85%            Web documents
Mitkov            P.O.S. tagger       89.7%          Manual texts