Improving Machine Learning Approaches to Coreference Resolution Vincent Ng and Claire Cardie Cornell Univ. ACL 2002 slides prepared by Ralph Grishman.




1 Improving Machine Learning Approaches to Coreference Resolution Vincent Ng and Claire Cardie Cornell Univ. ACL 2002 slides prepared by Ralph Grishman

2 Goal
Improve on Soon et al. with:
- better preprocessing (chunking, names, …)
- a better search procedure for the antecedent
- better selection of positive examples
- more features

3 Better search for the antecedent
- Soon et al. use the decision tree as a binary classifier and take the nearest antecedent classified as positive.
- Ng & Cardie use the same sort of classifier, but count the positive and negative training examples at each leaf, and use those counts to compute a coreference probability.
- Ng & Cardie then take the highest-ranking antecedent (if its probability > 0.5).
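The best-first selection described above can be sketched as follows (a minimal illustration; the data shapes and function names are my own assumptions, not the paper's):

```python
def leaf_probability(pos, neg):
    # Turn the counts of positive and negative training examples
    # stored at a decision-tree leaf into a coreference probability.
    return pos / (pos + neg)

def best_first_antecedent(candidates, threshold=0.5):
    # candidates: list of (antecedent, leaf_pos_count, leaf_neg_count)
    # for each preceding NP. Instead of taking the *nearest* positive
    # candidate (Soon et al.), return the candidate with the highest
    # leaf probability, provided it exceeds the threshold; otherwise
    # return None (the anaphor starts a new entity).
    best, best_p = None, threshold
    for antecedent, pos, neg in candidates:
        p = leaf_probability(pos, neg)
        if p > best_p:
            best, best_p = antecedent, p
    return best
```

With a closest-first strategy, "it" (p = 0.6) below would win by proximity; best-first instead selects "IBM" (p = 0.9).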

4 Better choice of positive examples
- Soon et al. always use the most recent antecedent.
- Ng & Cardie, when the anaphor is not a pronoun, use the most recent antecedent that is not a pronoun.
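A sketch of that selection rule (the mention representation is my own assumption):

```python
def choose_positive_antecedent(anaphor, chain):
    # chain: earlier mentions of the same entity, most recent last;
    # each mention is (text, is_pronoun). Soon et al. would always
    # return chain[-1]; Ng & Cardie skip pronominal mentions when the
    # anaphor itself is not a pronoun.
    _, anaphor_is_pronoun = anaphor
    for mention, mention_is_pronoun in reversed(chain):
        if anaphor_is_pronoun or not mention_is_pronoun:
            return mention
    return None
```

So for the chain ["Barack Obama", "he"], the non-pronominal anaphor "the president" is paired with "Barack Obama" rather than "he".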

5 More features #1
- Soon et al. have a 'same string' feature.
- Ng & Cardie split this into three features, one each for pronouns, nominals, and names.
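One way to picture the split (the feature names here are illustrative; the paper's actual feature names differ):

```python
def string_match_features(np1, np2, anaphor_type):
    # anaphor_type: "pronoun", "nominal", or "name". Replacing one
    # generic same-string feature with three type-specific ones lets
    # the learner weight exact string match differently for each NP
    # type (exact match is near-conclusive for names, weak evidence
    # for pronouns).
    same = np1.lower() == np2.lower()
    return {
        "pronoun_match": same and anaphor_type == "pronoun",
        "nominal_match": same and anaphor_type == "nominal",
        "name_match": same and anaphor_type == "name",
    }
```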

6 First improvements: F scores

7 More features
Added 41 more features:
- lexical
- grammatical
- semantic

8 Lexical features (examples)
- non-empty overlap between the words of the two NPs
- the prenominal modifiers of one NP are a subset of the prenominal modifiers of the other
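These two lexical features might be computed roughly as follows (a sketch under the assumption that NPs arrive as strings and modifier lists):

```python
def word_overlap(np1, np2):
    # True if the two NPs share at least one word, case-insensitively,
    # e.g. "the software giant" and "the giant".
    return bool(set(np1.lower().split()) & set(np2.lower().split()))

def modifier_subset(mods1, mods2):
    # True if the prenominal modifiers of one NP are a subset of the
    # other's, in either direction (so "red car" is compatible with
    # "big red car").
    s1, s2 = set(mods1), set(mods2)
    return s1 <= s2 or s2 <= s1
```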

9 Grammatical features (examples)
- the NPs are in a predicate nominal construction
- one NP spans the other
- NP1 is a quoted string
- one of the NPs is a title
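Two of these are straightforward surface checks; a sketch, assuming NPs carry token-offset spans and a surface string:

```python
def one_spans_other(span1, span2):
    # Spans are (start, end) token offsets, end exclusive. True if one
    # NP properly contains the other, e.g. "the president" inside
    # "the president of the company" (such nested NPs are rarely
    # coreferent with each other).
    (s1, e1), (s2, e2) = span1, span2
    if span1 == span2:
        return False
    return (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)

def is_quoted_string(text):
    # True if the NP's surface form is wrapped in straight or curly
    # double quotes.
    return len(text) >= 2 and text[0] in '"\u201c' and text[-1] in '"\u201d'
```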

10 Semantic features (examples)
For nominals with different heads:
- direct or indirect hypernym relation in WordNet
- distance of the hypernym relation
- sense number used for the hypernym relation
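The first two features can be illustrated with a toy IS-A graph standing in for WordNet (in practice one would walk WordNet synset hypernym chains; the dict-based graph here is purely an assumption for the sketch):

```python
def hypernym_distance(word, candidate, hypernyms):
    # hypernyms: toy IS-A graph mapping each noun to its direct
    # hypernym (a stand-in for WordNet). Returns the number of IS-A
    # links from `word` up to `candidate`: 1 means a direct hypernym,
    # >1 an indirect one, None means `candidate` is not an ancestor.
    dist, node = 0, word
    while node is not None:
        if node == candidate:
            return dist
        node = hypernyms.get(node)
        dist += 1
    return None
```

For the graph dog → canine → animal, "canine" is a direct hypernym of "dog" (distance 1) and "animal" an indirect one (distance 2).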

11 Selecting features
- The full feature set yielded very low precision on nominal anaphors (overfitting: too many features for too little data).
- So they manually eliminated many features that led to low precision, measured on the training data; there was no development set separate from the training and test sets.
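The paper's elimination was done by hand; as an illustration only, an automated greedy backward-elimination analogue of the same idea might look like this (the scoring callback is assumed to train a model and report precision on whatever data is held out):

```python
def eliminate_features(features, precision_of):
    # Greedy backward elimination: repeatedly drop the single feature
    # whose removal most improves precision, until no removal helps.
    # precision_of(feature_set) scores a model trained on that set.
    # NB: scoring on *training* data, as the paper did for lack of a
    # development set, risks tuning to the training distribution.
    current = set(features)
    best = precision_of(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in sorted(current):
            trial = current - {f}
            p = precision_of(trial)
            if p > best:
                best, current, improved = p, trial, True
                break
    return current, best
```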

12 Adding features: F scores

