Improving Machine Learning Approaches to Coreference Resolution
Vincent Ng and Claire Cardie, Cornell University
ACL 2002
Slides prepared by Ralph Grishman
Goal
Improve on Soon et al. by:
• better preprocessing (chunking, names, …)
• a better search procedure for the antecedent
• better selection of positive training examples
• more features
Better search for the antecedent
• Soon et al. use the decision tree as a binary classifier and take the nearest antecedent classified as +ve
• Ng & Cardie use the same sort of classifier, but count the +ve and -ve training examples at each leaf and use those counts to compute a probability
• Ng & Cardie then take the highest-ranking antecedent (if its probability > 0.5)
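To make the contrast concrete, here is a minimal Python sketch (not the authors' code) of the two search strategies. It assumes a hypothetical leaf_counts(anaphor, candidate) hook that returns the (+ve, -ve) training counts at the decision-tree leaf reached by that mention pair's feature vector.

```python
def leaf_probability(pos: int, neg: int) -> float:
    """Fraction of positive training instances at the leaf."""
    return pos / (pos + neg) if pos + neg > 0 else 0.0

def closest_first(anaphor, candidates, leaf_counts):
    """Soon et al.: scan right-to-left and stop at the first
    candidate the classifier labels positive (probability > 0.5)."""
    for cand in reversed(candidates):          # nearest candidate first
        if leaf_probability(*leaf_counts(anaphor, cand)) > 0.5:
            return cand
    return None                                # anaphor starts a new chain

def best_first(anaphor, candidates, leaf_counts):
    """Ng & Cardie: among all candidates with probability > 0.5,
    return the one with the highest probability."""
    best, best_p = None, 0.5
    for cand in candidates:
        p = leaf_probability(*leaf_counts(anaphor, cand))
        if p > best_p:
            best, best_p = cand, p
    return best
```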
Better choice of positive examples
• Soon et al. always use the most recent antecedent
• Ng & Cardie, when the anaphor is not a pronoun, use the most recent antecedent that is not a pronoun
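A sketch of the changed training-instance generation, under the assumption that mentions are dicts carrying a gold coreference 'chain' id and an 'is_pronoun' flag (both names are illustrative):

```python
def choose_positive_antecedent(anaphor, preceding_mentions, soon_style=False):
    """Return the antecedent paired with the anaphor as the positive
    training example.  Soon et al. take the closest coreferent mention;
    Ng & Cardie additionally require a non-pronominal antecedent when
    the anaphor itself is not a pronoun."""
    for m in reversed(preceding_mentions):     # closest mention first
        if m['chain'] != anaphor['chain']:
            continue                           # not coreferent: skip
        if (not soon_style and not anaphor['is_pronoun']
                and m['is_pronoun']):
            continue                           # skip pronominal antecedent
        return m
    return None
```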
More features #1
• Soon et al. have a single 'same string' feature
• Ng & Cardie split this into 3 features: for pronominals, for nominals, and for names
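A sketch of the split, with illustrative feature names (the paper's exact names may differ) and plain case-insensitive equality standing in for whatever string normalization the system actually applies:

```python
def string_match_features(np1: str, np2: str, type1: str, type2: str):
    """Three type-specific string-match features in place of the single
    Soon et al. 'same string' feature.  Mention types are assumed to be
    one of 'pronoun', 'name', or 'nominal'."""
    same = np1.lower() == np2.lower()
    return {
        'PRO_STR': same and type1 == type2 == 'pronoun',
        'PN_STR':  same and type1 == type2 == 'name',
        'NOM_STR': same and type1 == type2 == 'nominal',
    }
```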
First improvements: F scores
More features
Added 41 more features:
• lexical
• grammatical
• semantic
Lexical features (examples)
• non-empty overlap between the words of the two NPs
• the prenominal modifiers of one NP are a subset of the prenominal modifiers of the other
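Both examples reduce to simple set operations. A sketch, assuming the words and prenominal modifiers of each NP have already been extracted and lowercased (feature names are illustrative):

```python
def lexical_features(words1, words2, premods1, premods2):
    """WORD_OVERLAP: the two NPs share at least one word.
    MOD_SUBSET: the prenominal modifiers of one NP are a subset
    of the prenominal modifiers of the other."""
    w1, w2 = set(words1), set(words2)
    m1, m2 = set(premods1), set(premods2)
    return {
        'WORD_OVERLAP': bool(w1 & w2),
        'MOD_SUBSET': m1 <= m2 or m2 <= m1,
    }
```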
Grammatical features (examples)
• the NPs are in a predicate nominal construction
• one NP spans the other
• NP1 is a quoted string
• one of the NPs is a title
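The span and quoting tests can be read straight off the mention annotations; a sketch, assuming each NP is a dict with character 'start'/'end' offsets, its raw 'text', and an 'is_title' flag from preprocessing. Whether the pair forms a predicate nominal construction is a property of the parse, so it is passed in precomputed (all names illustrative):

```python
def grammatical_features(np1, np2, pred_nom=False):
    span = ((np1['start'] <= np2['start'] and np2['end'] <= np1['end']) or
            (np2['start'] <= np1['start'] and np1['end'] <= np2['end']))
    return {
        'PRED_NOM': pred_nom,                   # "X is Y" constructions, from the parser
        'SPAN': span,                           # one NP contains the other
        'QUOTED': np1['text'].startswith('"'),  # NP1 is a quoted string
        'TITLE': np1.get('is_title', False) or np2.get('is_title', False),
    }
```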
Semantic features (examples)
For nominals with different heads:
• direct or indirect hypernym relation in WordNet
• distance of the hypernym relation
• sense number at which the hypernym relation holds
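A sketch of the WordNet lookups using NLTK (a modern stand-in; the original system queried WordNet directly, and the feature names here are illustrative). Requires the NLTK WordNet data (nltk.download('wordnet')).

```python
from nltk.corpus import wordnet as wn

def hypernym_features(head1: str, head2: str):
    """Check each sense pair of the two nominal heads for a direct or
    indirect hypernym relation; report the path distance and the sense
    number (of head1) at which the relation first holds."""
    for sense_num, s1 in enumerate(wn.synsets(head1, pos=wn.NOUN), start=1):
        ancestors1 = set(s1.closure(lambda s: s.hypernyms()))
        for s2 in wn.synsets(head2, pos=wn.NOUN):
            ancestors2 = set(s2.closure(lambda s: s.hypernyms()))
            if s2 in ancestors1 or s1 in ancestors2:
                return {'HYPERNYM': True,
                        'HYP_DISTANCE': s1.shortest_path_distance(s2),
                        'HYP_SENSE': sense_num}
    return {'HYPERNYM': False, 'HYP_DISTANCE': None, 'HYP_SENSE': None}
```

For example, hypernym_features('automobile', 'vehicle') fires, since vehicle.n.01 is an indirect hypernym of car.n.01.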
Selecting features
• the full feature set yielded very low precision on nominal anaphors
  – overfitting: too many features for too little data
• so they (manually) eliminated many features that led to low precision (on the training data)
  – no 'development set' separate from the training and test sets
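The paper's pruning was done by hand, but the underlying procedure can be sketched as greedy backward elimination driven by training-set precision (which is exactly the methodological weakness noted above: no held-out development set). Here train_and_eval is a hypothetical hook that trains on a feature subset and returns precision.

```python
def prune_features(features, train_and_eval):
    """Greedy backward elimination: repeatedly drop a feature whose
    removal improves precision, measured (as in the paper) on the
    training data itself rather than on a held-out development set."""
    current = list(features)
    best_prec = train_and_eval(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in list(current):
            prec = train_and_eval([g for g in current if g != f])
            if prec > best_prec:
                current = [g for g in current if g != f]
                best_prec, improved = prec, True
                break                  # restart the scan after each drop
    return current
```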
Adding features: F scores