Detecting Anaphoricity and Antecedenthood for Coreference Resolution Olga Uryupina Institute of Linguistics, RAS
Overview: Anaphoricity and Antecedenthood; Experiments; Incorporating A&A detectors into a CR system; Conclusion
A&A: example Shares in Loral Space will be distributed to Loral shareholders. The new company will start life with no debt and $700 million in cash. Globalstar still needs to raise $600 million, and Schwartz said that the company would try to raise the money in the debt market.
Anaphoricity. Likely anaphors: pronouns, definite descriptions. Unlikely anaphors: indefinites. Unknown: proper names. Poesio & Vieira: more than 50% of definite descriptions in a newswire text are not anaphoric!
A&A: example Shares in Loral Space will be distributed to Loral shareholders. The new company will start life with no debt and $700 million in cash. Globalstar still needs to raise $600 million, and Schwartz said that the company would try to raise the money in the debt market.
Antecedenthood: related to referentiality (Karttunen, 1976), e.g. "no debt". Antecedenthood vs. referentiality: corpus-based decision.
Experiments: Can we learn anaphoricity/antecedenthood classifiers? Do they help coreference resolution?
Methodology: MUC-7 dataset; anaphoricity/antecedenthood labels induced from the MUC annotations; learners: Ripper, SVM
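A minimal sketch of how anaphoricity/antecedenthood labels could be induced from coreference chains. The slides do not spell out the exact procedure, so the rule used here is an assumption: a markable counts as anaphoric if it is not the first mention of its chain, and as an antecedent if it is not the last. The function name `induce_labels` and the toy chains are hypothetical.

```python
from typing import Dict, List, Tuple


def induce_labels(chains: List[List[int]], n_markables: int) -> Dict[int, Tuple[bool, bool]]:
    """Map each markable index to (is_anaphoric, is_antecedent).

    Assumption (not stated in the source): markables are numbered in
    document order, and each chain lists the indices of its mentions.
    """
    # Markables outside any chain are neither anaphors nor antecedents.
    labels = {i: (False, False) for i in range(n_markables)}
    for chain in chains:
        ordered = sorted(chain)
        for pos, markable in enumerate(ordered):
            is_anaphoric = pos > 0                    # has an earlier mention
            is_antecedent = pos < len(ordered) - 1    # has a later mention
            labels[markable] = (is_anaphoric, is_antecedent)
    return labels


# Toy document with 6 markables and two hypothetical chains.
labels = induce_labels([[0, 2, 5], [1, 4]], n_markables=6)
for index, (ana, ante) in sorted(labels.items()):
    print(index, "+ana" if ana else "-ana", "+ante" if ante else "-ante")
```

Under this rule a chain-medial mention is both an anaphor and an antecedent, while a singleton such as markable 3 is neither.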
Features: surface form (12), syntax (20), semantics (3), salience (10), "same-head" (2), from Karttunen, 1976 (7). 49 features (123 boolean/continuous)
Results: anaphoricity. Columns: R, P, F. Feature groups: Baseline, All, Surface, Syntax, Semantics, Salience, Same-head, Karttunen's, Synt+SH
Results: antecedenthood. Columns: R, P, F. Feature groups: Baseline, All, Surface, Syntax, Semantics, Salience, Same-head, Karttunen's
Integrating A&A into a CR system: apply A&A prefiltering before CR starts. Benefits: saves time, improves precision. Problem: we can filter out good candidates, so we will lose some recall.
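The prefiltering idea can be sketched as follows. This is an illustration only, assuming a closest-first linking loop in the spirit of Soon et al. (2001); the function `resolve`, the toy markables, and the string-based stand-ins for the classifiers are all hypothetical and are not the system described in the talk.

```python
from typing import Callable, List, Tuple


def resolve(
    markables: List[str],
    corefer: Callable[[str, str], bool],
    is_anaphoric: Callable[[str], bool] = lambda m: True,
    is_antecedent: Callable[[str], bool] = lambda m: True,
) -> Tuple[List[Tuple[int, int]], int]:
    """Link each anaphor to its closest accepted antecedent.

    Returns (links, pairs_tested), where links holds
    (antecedent_index, anaphor_index) pairs.
    """
    links = []
    pairs_tested = 0
    for i, anaphor in enumerate(markables):
        if not is_anaphoric(anaphor):
            continue  # prefilter: do not try to resolve this markable
        for j in range(i - 1, -1, -1):  # search right to left
            if not is_antecedent(markables[j]):
                continue  # prefilter: never propose this candidate
            pairs_tested += 1
            if corefer(markables[j], anaphor):
                links.append((j, i))
                break
    return links, pairs_tested


# Toy stand-ins (assumptions): same-head matching as the pairwise classifier,
# "the ..." as likely anaphors, "no ..." as unlikely antecedents.
same_head = lambda a, b: a.split()[-1] == b.split()[-1]
docs = ["a company", "no debt", "the company", "the debt"]

print(resolve(docs, same_head))
print(resolve(docs, same_head,
              is_anaphoric=lambda m: m.startswith("the "),
              is_antecedent=lambda m: not m.startswith("no ")))
```

In this toy run the prefilters remove the spurious link from "no debt" to "the debt" and reduce the number of pairs tested; a filter that wrongly rejected "a company" would instead cost a correct link, which is the recall risk named above.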
Oracle-based A&A prefiltering: take the MUC-based A&A classifier ("gold standard"); CR system: Soon et al. (2001) with SVMs; MUC-7 validation set (3 "training" documents)
Oracle-based A&A prefiltering, results. Columns: R, P, F. Settings: no prefiltering, ±ana, ±ante, ±ana & ±ante
Automatically induced classifiers: precision is more crucial than recall; learn Ripper classifiers with different values of L (loss ratio)
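To illustrate how a loss ratio shifts a classifier toward precision, here is a small sketch. It is not Ripper: it assumes a classifier that outputs a confidence score and simply picks the decision threshold with the lowest weighted cost. The function `pick_threshold`, the cost formula, and the toy scores are hypothetical, and `loss_ratio` is taken here to mean how many times more a false positive costs than a false negative, which may differ from Ripper's own definition of L.

```python
from typing import List


def pick_threshold(scores: List[float], gold: List[bool], loss_ratio: float) -> float:
    """Return the score threshold minimising loss_ratio * FP + FN.

    An item is predicted positive when its score is >= the threshold.
    """
    best_threshold = None
    best_cost = float("inf")
    for threshold in sorted(set(scores)):
        false_pos = sum(1 for s, g in zip(scores, gold) if s >= threshold and not g)
        false_neg = sum(1 for s, g in zip(scores, gold) if s < threshold and g)
        cost = loss_ratio * false_pos + false_neg
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold


# Toy data (hypothetical): classifier scores and gold labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
gold = [True, False, True, True, False]

for ratio in (1, 3):
    print(ratio, pick_threshold(scores, gold, ratio))
```

With equal costs the chosen threshold accepts four of the five items and keeps one false positive; with a higher loss ratio the threshold rises until no false positives remain, at the price of missed positives.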
Anaphoricity prefiltering
Antecedenthood prefiltering
Conclusion. Automatically induced detectors: reliable for anaphoricity; much less reliable for antecedenthood (a corpus explicitly annotated for referentiality could help). A&A prefiltering: ideally, it should help; in practice, substantial optimization is required.
Thank You!