Improving Machine Learning Approaches to Coreference Resolution
Vincent Ng and Claire Cardie, Cornell University, ACL 2002
Slides prepared by Ralph Grishman
Slide 2: Goal
Improve on Soon et al. through:
- better preprocessing (chunking, names, …)
- a better search procedure for the antecedent
- better selection of positive examples
- more features
Slide 3: Better search for the antecedent
- Soon et al. use a decision tree as a binary classifier and take the nearest antecedent classified as positive.
- Ng & Cardie use the same sort of classifier, but count the positive and negative training examples at each leaf and use those counts to compute a probability.
- Ng & Cardie then take the highest-ranking antecedent (if its probability > 0.5).
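A minimal sketch of the two search strategies, assuming a hypothetical classify(anaphor, candidate) function that returns the leaf-derived probability (positive count over total count at the decision-tree leaf); the mention representation is likewise assumed:

```python
def soon_search(anaphor, candidates, classify):
    """Soon et al.: scan candidates from nearest to farthest and
    return the first one the classifier labels positive."""
    for cand in reversed(candidates):  # candidates listed left-to-right
        if classify(anaphor, cand) > 0.5:
            return cand
    return None

def ng_cardie_search(anaphor, candidates, classify):
    """Ng & Cardie: score every candidate with the leaf probability
    and return the highest-ranking one, if its probability > 0.5."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: classify(anaphor, c))
    return best if classify(anaphor, best) > 0.5 else None
```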
Slide 4: Better choice of positive examples
- Soon et al. always use the most recent antecedent as the positive example.
- Ng & Cardie: if the anaphor is not a pronoun, they use the most recent antecedent that is not a pronoun.
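A sketch of this rule, assuming each mention carries an is_pronoun flag and that the anaphor's preceding coreferent mentions are listed left-to-right (both are assumed representations, not the paper's):

```python
def positive_example(anaphor, coreferent_antecedents):
    """Choose the antecedent paired with the anaphor as the
    positive training example."""
    if anaphor.is_pronoun:
        # Soon et al.'s rule, kept for pronouns: most recent antecedent.
        return coreferent_antecedents[-1]
    # Ng & Cardie: for non-pronominal anaphors, take the most
    # recent antecedent that is itself not a pronoun.
    for mention in reversed(coreferent_antecedents):
        if not mention.is_pronoun:
            return mention
    return coreferent_antecedents[-1]  # fallback (behavior here is assumed)
```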
Slide 5: More features #1
- Soon et al. have a single 'same string' feature.
- Ng & Cardie split this into three features, one each for pronominals, nominals, and names.
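A sketch of the split, assuming mentions expose their text and a mention type of 'pronoun', 'nominal', or 'name'; the feature names below are illustrative, not necessarily the paper's:

```python
def string_match_features(np1, np2):
    """Replace the single same-string feature with three
    type-specific string-match features."""
    same = np1.text.lower() == np2.text.lower()
    def both(t):
        return np1.mention_type == t and np2.mention_type == t
    return {
        "pro_str":     same and both("pronoun"),  # both pronominal
        "nominal_str": same and both("nominal"),  # both nominal
        "name_str":    same and both("name"),     # both proper names
    }
```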
Slide 6: First improvements: F scores
Slide 7: More features
Added 41 more features:
- lexical
- grammatical
- semantic
Slide 8: Lexical features (examples)
- non-empty overlap between the words of the two NPs
- the prenominal modifiers of one NP are a subset of the prenominal modifiers of the other
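Both checks reduce to simple set operations; a sketch, assuming each NP is given as a list of lowercased tokens and a list of prenominal modifiers (representation assumed):

```python
def word_overlap(tokens1, tokens2):
    """Non-empty intersection between the words of the two NPs."""
    return bool(set(tokens1) & set(tokens2))

def modifier_subset(mods1, mods2):
    """Prenominal modifiers of one NP are a subset of the other's."""
    s1, s2 = set(mods1), set(mods2)
    return s1 <= s2 or s2 <= s1
```

For example, word_overlap(["the", "president"], ["president", "clinton"]) returns True.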
Slide 9: Grammatical features (examples)
- the two NPs form a predicate nominal construction
- one NP spans the other
- NP1 is a quoted string
- one of the NPs is a title
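Two of these are easy to state over character offsets; a sketch, assuming each NP carries a (start, end) span and its raw text (both assumed):

```python
def one_spans_other(np1, np2):
    """One NP's span contains the other's, e.g. 'the president'
    inside 'the president of the company'."""
    (s1, e1), (s2, e2) = np1.span, np2.span
    return (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)

def is_quoted(np):
    """The NP is a quoted string."""
    return np.text.startswith('"') and np.text.endswith('"')
```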
Slide 10: Semantic features (examples)
For nominals with different heads:
- a direct or indirect hypernym relation in WordNet
- the distance of the hypernym relation
- the sense number for the hypernym relation
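A sketch of these three features using NLTK's WordNet interface; iterating over all noun synsets and reporting the first hypernym-related pair is a simplification, since the paper's exact sense-selection strategy is not shown here:

```python
from nltk.corpus import wordnet as wn

def hypernym_features(head1, head2):
    """For nominals with different heads: test for a direct or
    indirect hypernym relation, and report its distance and the
    sense numbers involved."""
    for i, s1 in enumerate(wn.synsets(head1, pos=wn.NOUN), start=1):
        ancestors1 = set(s1.closure(lambda s: s.hypernyms()))
        for j, s2 in enumerate(wn.synsets(head2, pos=wn.NOUN), start=1):
            ancestors2 = set(s2.closure(lambda s: s.hypernyms()))
            if s2 in ancestors1 or s1 in ancestors2:
                return {"hypernym": True,
                        "distance": s1.shortest_path_distance(s2),
                        "sense_numbers": (i, j)}
    return {"hypernym": False, "distance": None, "sense_numbers": None}
```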
Slide 11: Selecting features
- The full feature set yielded very low precision on nominal anaphors (overtraining: too many features for too little data).
- So they manually eliminated many features that led to low precision on the training data; there was no 'development set' separate from the training and test sets.
Slide 12: Adding features: F scores