1 On the Issue of Combining Anaphoricity Determination and Antecedent Identification in Anaphora Resolution
Ryu Iida, Kentaro Inui, Yuji Matsumoto
Nara Institute of Science and Technology
{ryu-i,inui,matsu}@is.naist.jp
NLP-KE'05, October 30, 2005
2 Noun phrase anaphora resolution
Anaphora resolution is the process of determining whether two expressions in natural language refer to the same real-world entity.
It is an important process for various NLP applications: machine translation, information extraction, question answering.
Example: A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
(Figure labels on the example: antecedent, anaphor)
3 Noun phrase anaphora resolution
Anaphora resolution can be decomposed into two subprocesses:
1. Anaphoricity determination: classifying whether a given noun phrase (NP) is anaphoric or non-anaphoric
2. Antecedent identification: identifying the antecedent of a given anaphoric NP
Example: A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
(Figure labels on the example: antecedent, anaphor, non-anaphor)
4 Previous work
Early corpus-based work on anaphora resolution does not address anaphoricity determination (Hobbs '78, Lappin and Leass '94).
  It assumes that the anaphora resolution system knows a priori all the anaphoric noun phrases.
Anaphoricity determination has received attention from an increasing number of researchers (Bean and Riloff '99, Ng and Cardie '02, Uryupina '03, Ng '04).
  Determining anaphoricity is not a trivial problem.
  The overall performance of anaphora resolution crucially depends on the accuracy of anaphoricity determination.
5 Previous work (Cont'd)
Previous efforts to tackle the anaphoricity determination problem have produced two findings:
1. One useful cue for determining the anaphoricity of a given NP can be obtained by searching for an antecedent (Soon et al. 01, Ng and Cardie 02a)
2. Anaphoricity determination can be effectively carried out by a binary classifier that learns from instances of non-anaphoric NPs (Ng and Cardie 02b, Ng 04)
None of the previous models effectively combines the strengths of these two findings.
6 Aim
Improving anaphora resolution performance:
  Using better anaphoricity determination
  Combining sources of evidence from previous models
7 Proposal
Introducing a 2-step process for combining antecedent information and non-anaphoric information. We call this model the selection-and-classification model.
1. Select the most likely candidate antecedent (CA) of a target NP (TNP) using the tournament model (Iida et al. '03)
2. Classify the TNP paired with the CA: the TNP is classified as anaphoric if the CA is identified as its antecedent; otherwise the TNP is judged non-anaphoric
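The two steps can be sketched in code. This is only a minimal illustration, not the authors' implementation: the slides do not describe the internals of the tournament model, so the sketch assumes it can be approximated by a series of pairwise matches in which the winner advances. The function names, the toy token-overlap preference and scorer (which stand in for trained classifiers), and the use of "score >= theta_ana" as the decision rule are all assumptions.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins for trained models (not from the slides):
# right_wins(left, right, target) -> True if `right` beats `left` as antecedent of `target`
# score(candidate, target)        -> anaphoricity score for the pair
RightWins = Callable[[str, str, str], bool]
PairScorer = Callable[[str, str], float]


def select_candidate(candidates: List[str], target: str,
                     right_wins: RightWins) -> Optional[str]:
    """Step 1: run pairwise matches; the winner of each match advances."""
    if not candidates:
        return None
    winner = candidates[0]
    for challenger in candidates[1:]:
        if right_wins(winner, challenger, target):
            winner = challenger
    return winner


def resolve(target: str, candidates: List[str], right_wins: RightWins,
            score: PairScorer, theta_ana: float) -> Optional[str]:
    """Step 2: accept the selected candidate only if the pair scores
    at least theta_ana; otherwise judge the target non-anaphoric (None)."""
    ca = select_candidate(candidates, target, right_wins)
    if ca is None:
        return None
    return ca if score(ca, target) >= theta_ana else None


def shared_tokens(a: str, b: str) -> int:
    """Toy feature: number of lowercase tokens the two strings share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def toy_right_wins(left: str, right: str, target: str) -> bool:
    # Ties go to the later (right-hand) candidate.
    return shared_tokens(right, target) >= shared_tokens(left, target)


def toy_score(candidate: str, target: str) -> float:
    return shared_tokens(candidate, target) / max(len(target.split()), 1)


candidates = ["federal judge", "order", "USAir Group Inc", "suit"]
antecedent = resolve("USAir", candidates, toy_right_wins, toy_score, theta_ana=0.5)
print(antecedent)
```

With these toy functions, the target "USAir" is paired with "USAir Group Inc", while a target that shares no tokens with any candidate falls below the threshold and is judged non-anaphoric.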
8 2-step process for anaphora resolution
Example text: A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, …
[Figure: the tournament model takes the candidate anaphor "USAir" and its candidate antecedents (suit, USAir Group Inc, order, federal judge, …)]
9 2-step process for anaphora resolution
[Figure, continued from the previous slide: the candidate antecedents of the candidate anaphor "USAir" (suit, USAir Group Inc, order, federal judge, …) compete in the tournament model; USAir Group Inc is shown as the selected candidate]
10 2-step process for anaphora resolution
[Figure, continued: the tournament model outputs USAir Group Inc as the candidate antecedent of the candidate anaphor USAir. The anaphoricity determination model then compares the score of this pair with the threshold θ_ana]
  score < θ_ana: USAir is non-anaphoric
  score ≥ θ_ana: USAir is anaphoric and USAir Group Inc is the antecedent of USAir
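11 Training phase
[Figure: how training instances are created. NPi: candidate antecedent]
Anaphoric instances: for an anaphoric NP (ANP) with the set of candidate antecedents NP1 to NP5, the antecedent (NP4 in the figure) is paired with it, giving the instance (NP4, ANP).
Non-anaphoric instances: for a non-anaphoric NP (NANP) with the set of candidate antecedents NP1 to NP5, the tournament model selects a candidate antecedent (NP3 in the figure), giving the instance (NP3, NANP).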
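The instance creation shown on this slide could look roughly like the following. It is a sketch under assumptions: the record layout (keys "np", "candidates", "antecedent"), the function names, and the stub selector that stands in for the trained tournament model are all hypothetical and chosen only to reproduce the NP4/NP3 example from the figure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (candidate antecedent, target NP, label); label 1 = anaphoric, 0 = non-anaphoric
Instance = Tuple[str, str, int]


def build_instances(
    records: List[Dict],
    select_candidate: Callable[[List[str], str], Optional[str]],
) -> List[Instance]:
    """Create training pairs for the anaphoricity classifier.

    Anaphoric NP     -> (annotated antecedent, NP, 1)
    Non-anaphoric NP -> (candidate picked by the selector, NP, 0)
    """
    instances: List[Instance] = []
    for rec in records:
        if rec["antecedent"] is not None:
            instances.append((rec["antecedent"], rec["np"], 1))
        else:
            ca = select_candidate(rec["candidates"], rec["np"])
            if ca is not None:  # skip NPs that have no candidate at all
                instances.append((ca, rec["np"], 0))
    return instances


# Hypothetical stub for the tournament model: always picks the third candidate,
# which mirrors the figure, where NP3 is selected for the non-anaphoric NP.
def stub_selector(candidates: List[str], target: str) -> Optional[str]:
    return candidates[2] if len(candidates) > 2 else None


nps = ["NP1", "NP2", "NP3", "NP4", "NP5"]
records = [
    {"np": "ANP", "candidates": nps, "antecedent": "NP4"},
    {"np": "NANP", "candidates": nps, "antecedent": None},
]
training_instances = build_instances(records, stub_selector)
print(training_instances)
```

The point of the design is that negative examples are not arbitrary pairs: each non-anaphoric NP is paired with the candidate the selection step considers most plausible.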
12 Comparison with previous approaches
1. Search-based approach (SM) (Soon et al. '01, Ng and Cardie '02)
  Recasts anaphora resolution as binary classification problems
  Comparable to the state-of-the-art rule-based system
  Disadvantage: does not use non-anaphoric instances in training
2. Classification-and-search approach (CSM) (Ng and Cardie '02, Ng '04)
  Introduces anaphoricity determination as a classification task
  The CSM performs better than the SM if the threshold parameters are appropriately tuned
  Disadvantage: does not use contextual information (i.e. whether an appropriate antecedent appears in the context)
13 Experiments
Noun phrase anaphora resolution in Japanese
Japanese newspaper article corpus tagged with NP-anaphoric relations
  90 texts, 1,104 sentences
  Noun phrases: 876 anaphors and 6,292 non-anaphors
Recall = (# of correctly detected anaphoric relations) / (# of anaphoric NPs)
Precision = (# of correctly detected anaphoric relations) / (# of NPs classified as anaphoric)
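The two measures can be computed as in the short sketch below. The data layout (dictionaries mapping an NP identifier to its antecedent identifier) and the identifiers themselves are made up for illustration; they are not taken from the corpus.

```python
from typing import Dict, Tuple


def precision_recall(gold: Dict[str, str], predicted: Dict[str, str]) -> Tuple[float, float]:
    """gold:      anaphoric NP -> annotated antecedent
    predicted: NP classified as anaphoric -> antecedent chosen by the system
    A relation counts as correct only if the NP is truly anaphoric and the
    chosen antecedent matches the annotation."""
    correct = sum(1 for np_id, ante in predicted.items() if gold.get(np_id) == ante)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: three anaphoric NPs in the gold standard, four NPs predicted anaphoric.
gold = {"np7": "np3", "np9": "np2", "np12": "np3"}
predicted = {"np7": "np3", "np9": "np5", "np10": "np1", "np12": "np3"}
precision, recall = precision_recall(gold, predicted)
print(precision, recall)
```

Here two relations (np7, np12) are correct, so precision is 2 of 4 predicted and recall is 2 of 3 annotated.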
14 Experimental setting
Conduct 10-fold cross-validation with support vector machines
Comparison among three models:
1. Search-based model (Ng and Cardie '02)
2. Classification-and-search model (Ng and Cardie '04)
3. Selection-and-classification model (proposed model), using the tournament model (Iida et al. '03)
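As a reminder of what 10-fold cross-validation involves, here is a minimal fold generator in plain Python. The slides do not say whether folds were formed over texts or over individual instances; splitting the 90 texts into 10 folds of 9 is an assumption made only for this sketch, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(items: List[int], k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) pairs; every item is used for testing exactly once."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


text_ids = list(range(90))  # assumed: one id per text in the corpus
splits = list(k_fold_splits(text_ids, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each model would be trained on the 81 training texts of a split and evaluated on the remaining 9, with results aggregated over the 10 splits.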
15 Results of noun phrase anaphora resolution
[Figure: results for the three models. Legend: Proposed model, Search-based model, Classification-and-search model]
Search-based model (SM) vs. classification-and-search model (CSM): the performance of the CSM is significantly better than that of the SM.
16 Results of noun phrase anaphora resolution
[Figure: results for the three models. Legend: Proposed model, Search-based model, Classification-and-search model]
Classification-and-search model (CSM) vs. proposed model: the proposed model outperforms the CSM in the higher-recall portion.
17 Conclusion
Our selection-and-classification approach to anaphora resolution improves on the performance of previous learning-based models by combining their advantages:
1. Our model uses non-anaphoric instances together with anaphoric instances to induce an anaphoricity classifier
2. Our model determines the anaphoricity of a given NP by taking antecedent information into account
18 Future work
The majority of errors are caused by the difficulty of judging semantic compatibility.
  e.g. the system outputs that "ani (elder brother)" is anaphoric with "kanojo (she)"
  The lexical resource we employed in the experiments did not contain gender information.
Developing a lexical resource that includes a broad range of semantically compatible relations