1
System Combination LING 572 Fei Xia 01/31/06
2
Papers
(Henderson and Brill, EMNLP-1999): Exploiting Diversity in NLP: Combining Parsers
(Henderson and Brill, ANLP-2000): Bagging and Boosting a Treebank Parser
3
Task (diagram): m systems produce outputs f1, f2, …, fm, which are combined into a single output f.
4
Paper #1 (diagram): m different learners ML1, ML2, …, MLm produce outputs f1, f2, …, fm, which are combined into f.
5
Paper #2: bagging (diagram): the same learner ML is trained on m different samples, producing f1, f2, …, fm, which are combined into f.
6
Combining parsers
7
Scenario (diagram): m learners ML1, ML2, …, MLm produce outputs f1, f2, …, fm, which are combined into a single output f.
8
Three parsers: Collins (1997), Charniak (1997), Ratnaparkhi (1997)
9
Major strategies
Parse hybridization: combine substructures of the input parses to produce a better parse.
Parser switching: for each input x, f(x) is one of the fi(x).
10
Parse hybridization: Method 1
Constituent voting: Include a constituent if it appears in the output of a majority of the parsers. It requires no training. All parsers are treated equally.
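A minimal sketch of constituent voting, assuming constituents are represented as (label, start, end) spans (the representation and function names are illustrative, not the paper's code):

from collections import Counter

def constituent_voting(parses, num_parsers):
    """Keep every constituent proposed by a majority of the parsers.

    Each parse is a set of (label, start, end) tuples; the result is the
    set of constituents appearing in more than half of the parses.
    """
    votes = Counter()
    for parse in parses:
        votes.update(parse)
    threshold = num_parsers / 2.0
    return {c for c, n in votes.items() if n > threshold}

# Example: three parsers, two of which agree on ("NP", 0, 2)
p1 = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
p2 = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
p3 = {("NP", 0, 1), ("VP", 2, 5), ("S", 0, 5)}
print(constituent_voting([p1, p2, p3], 3))
# majority constituents: ("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)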
11
Parse hybridization: Method 2
Naïve Bayes: Y = π(c) is a binary variable that is true when constituent c should be included in the hypothesis; Xi = Mi(c) is a binary variable that is true when parser i suggests c should be in the parse.
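Under the standard Naïve Bayes factorization (a sketch; the paper's exact parameterization may differ), a constituent c is included when its posterior probability exceeds one half:

  P(\pi(c)=1 \mid M_1(c),\ldots,M_m(c)) \;\propto\; P(\pi(c)=1)\,\prod_{i=1}^{m} P(M_i(c) \mid \pi(c)=1)

with the prior P(\pi(c)) and the likelihoods P(M_i(c) \mid \pi(c)) estimated from held-out data.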
12
Parse hybridization If the number of votes required by constituent voting is greater than half of the parsers, the resulting structure has no crossing constituents. What will happen if the input parsers disagree often?
13
Parser switching: Method 1
Similarity switching. Intuition: choose the parse that is most similar to the other parses. Algorithm: for each parse πi, create its constituent set Si. The score for πi is the number of constituents it shares with the other parses, score(πi) = Σ_{j≠i} |Si ∩ Sj|. Choose the parse with the highest score. No training is required.
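A minimal sketch of similarity switching, reusing the (label, start, end) constituent-set representation from the voting sketch above (names are illustrative):

def similarity_switching(parses):
    """Pick the parse whose constituent set overlaps most with the others.

    Each parse is a set of (label, start, end) tuples.  The score of parse i
    is the sum over j != i of |S_i & S_j|; ties go to the earlier parse.
    """
    best_i, best_score = 0, -1
    for i, si in enumerate(parses):
        score = sum(len(si & sj) for j, sj in enumerate(parses) if j != i)
        if score > best_score:
            best_i, best_score = i, score
    return parses[best_i]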
14
Parser switching: Method 2
Naïve Bayes
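One natural reading of this method (a hedged sketch, not necessarily the paper's exact scoring): score each candidate parse by the product, over its constituents, of the Naïve Bayes posterior used in hybridization, and switch to the highest-scoring parse:

  \hat{\pi} = \arg\max_{\pi_i} \prod_{c \in \pi_i} P(\pi(c)=1 \mid M_1(c),\ldots,M_m(c))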
15
Experiments
Training data: WSJ, except sections 22 and 23
Development data: section 23 (used to train the Naïve Bayes combiners)
Test data: section 22
16
Parsing results
17
Robustness testing: add a 4th parser with an F-measure of about 67.6. Performance remains essentially the same except for constituent voting. (Reported F-measures: 90.43, 90.74, 91.25.)
18
Summary of 1st paper
Combining parsers produces good results: 89.67% → 91.25%
Different methods of combining: parse hybridization (constituent voting, Naïve Bayes) and parser switching (similarity switching, Naïve Bayes)
19
Bagging and Boosting a Treebank Parser
20
Experiment settings Parser: Collins’s Model 2 (1997)
Training data: sections 01-21 Test data: Section 23
21
Bagging (diagram): draw m bootstrap samples from the training set {(s,t)}, train the same learner ML on each sample to get f1, f2, …, fm, and combine the outputs with constituent voting to produce f.
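A minimal sketch of the bagging loop, assuming a hypothetical train(corpus) function that returns a callable parser, and reusing the constituent_voting combiner sketched earlier (all names are illustrative, not the paper's code):

import random

def bag_parsers(corpus, num_bags, train):
    """Train one parser per bootstrap replicate of the (sentence, tree) corpus."""
    parsers = []
    for _ in range(num_bags):
        # Sample |corpus| pairs with replacement to form one bag.
        bag = [random.choice(corpus) for _ in range(len(corpus))]
        parsers.append(train(bag))
    return parsers

def bagged_parse(parsers, sentence):
    """Parse with every bagged parser and combine by constituent voting."""
    parses = [p(sentence) for p in parsers]
    return constituent_voting(parses, len(parsers))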
22
Experiment results
Baseline (no bagging): 88.63
Initial (one bag):
Final (15 bags): 89.17
23
Training corpus size effects
24
Boosting (diagram): train the learner ML on the original training sample, then repeatedly on re-weighted samples, producing f1, f2, …, fT, which are combined into f.
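A generic boosting-by-resampling sketch, just to show the reweighting idea in the diagram; the weight update and the per-example loss here are simplified assumptions, not the paper's actual boosting algorithm:

import random

def boost_parsers(corpus, num_rounds, train, error):
    """Generic boosting-by-resampling loop.

    corpus: list of (sentence, gold_tree) pairs
    train:  hypothetical function mapping a sample to a trained parser
    error:  hypothetical per-example loss in [0, 1] (e.g., 1 - F-measure)
    """
    weights = [1.0 / len(corpus)] * len(corpus)
    parsers = []
    for _ in range(num_rounds):
        # Resample the corpus according to the current example weights.
        sample = random.choices(corpus, weights=weights, k=len(corpus))
        f = train(sample)
        parsers.append(f)
        # Up-weight examples the new parser handles poorly, then renormalize.
        losses = [error(f, ex) for ex in corpus]
        weights = [w * (1.0 + l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return parsers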
25
Boosting results Boosting does not help: 88.84
26
Summary
Combining parsers produces good results: 89.67% → 91.25%
Bagging helps: 88.63% → 89.17%
Boosting does not help (in this case): 88.63% → 88.84%