Semi-supervised Training of Statistical Parsers CMSC 35100 Natural Language Processing January 26, 2006.


1 Semi-supervised Training of Statistical Parsers CMSC 35100 Natural Language Processing January 26, 2006

2 Roadmap
Motivation:
 – Resource bottleneck
Co-training
Co-training with different parsers:
 – CFG & LTAG
Experiments:
 – Initial seed set size
 – Parse selection
 – Domain porting
Results and discussion

3 Motivation: Issues
Current statistical parsers:
 – Many grammatical models
 – Significant progress: F-score ~93%
Issues:
 – Trained on ~1M words of the Penn WSJ treebank
   Annotation is a significant investment of time & money
 – Portability:
   Single genre: business news
   Later treebanks: smaller, still news
 – Training resource bottleneck

4 Motivation: Approach
Goal:
 – Enhance portability and performance without large amounts of additional training data
Observations:
 – "Self-training": train the parser on its own output
   Very small improvement (better counts for heads)
   Limited to slightly refining the current model
 – Ensemble methods, voting: useful
Approach: Co-training

5 Co-Training
Co-training (Blum & Mitchell 1998):
 – Weakly supervised training technique
   Successful for basic classification
 – Materials:
   Small "seed" set of labeled examples
   Large set of unlabeled examples
 – Training: evidence from multiple models
   Optimize the degree of agreement between models on the unlabeled data
   Train several models on the seed data
   Run them on unlabeled data
   Use new "reliable" labeled examples to train the other models
   Iterate (a minimal sketch of this loop follows)
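As a concrete illustration of the loop above, here is a minimal co-training sketch in Python. The parser objects, their train/parse/score methods, and select_reliable are hypothetical placeholders for this sketch, not the actual Collins-CFG or LTAG implementations.

# Minimal co-training sketch; parser objects and their train/parse/score
# methods are hypothetical placeholders, not the actual systems.
def co_train(parser_a, parser_b, seed, unlabeled, iterations=10, batch=30):
    """Each parser repeatedly labels data that is used to retrain the other."""
    labeled_a, labeled_b = list(seed), list(seed)
    parser_a.train(labeled_a)
    parser_b.train(labeled_b)
    for _ in range(iterations):
        if not unlabeled:
            break
        # Draw a small cache of unlabeled sentences.
        cache = [unlabeled.pop() for _ in range(min(batch, len(unlabeled)))]
        # Each parser labels the cache and attaches its own confidence score.
        parsed_a = [(s, parser_a.parse(s), parser_a.score(s)) for s in cache]
        parsed_b = [(s, parser_b.parse(s), parser_b.score(s)) for s in cache]
        # The most "reliable" output of each parser trains the other.
        labeled_b += select_reliable(parsed_a)   # A teaches B
        labeled_a += select_reliable(parsed_b)   # B teaches A
        parser_a.train(labeled_a)
        parser_b.train(labeled_b)
    return parser_a, parser_b

def select_reliable(parsed, top_n=20):
    """Simplest heuristic: keep the teacher's top-n highest-scoring parses."""
    ranked = sorted(parsed, key=lambda t: t[2], reverse=True)
    return [(sentence, tree) for sentence, tree, _ in ranked[:top_n]]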

6 Co-training Issues
Challenge:
 – Picking reliable novel examples
   No guaranteed, simple approach; rely on heuristics
 – Intersection: highly ranked by the other parser, low by self
 – Difference: the other parser's score exceeds self's by some margin
Possibly employ parser confidence measures (see the sketch below)
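A hedged sketch of the intersection and difference heuristics, assuming each newly parsed sentence carries a self-assigned score from the parser that produced it (the teacher) and from the parser that would consume it (the student); the field names and thresholds are illustrative, not taken from the paper.

# Illustrative reliability heuristics; "teacher" is the parser that produced
# the parse, "student" is the parser that would be trained on it.
def intersection(scored, top_n=20, bottom_n=20):
    """Sentences ranked highly by the teacher but low by the student."""
    by_teacher = sorted(scored, key=lambda x: x["teacher_score"], reverse=True)
    by_student = sorted(scored, key=lambda x: x["student_score"])
    top = {x["sentence"] for x in by_teacher[:top_n]}
    bottom = {x["sentence"] for x in by_student[:bottom_n]}
    return [x for x in scored if x["sentence"] in top and x["sentence"] in bottom]

def difference(scored, margin=0.1):
    """Sentences where the teacher's score exceeds the student's by a margin."""
    return [x for x in scored if x["teacher_score"] - x["student_score"] > margin]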

7 Experimental Structure
Approach (Steedman et al., 2003):
 – Focus here: co-training with different parsers
   Also examined reranking, and supertaggers & parsers
 – Co-train CFG (Collins) & LTAG
 – Data: Penn Treebank WSJ, Brown, NA News
Questions:
 – How to select reliable novel samples?
 – How does labeled seed size affect co-training?
 – How effective is co-training within and across genres?

8 System Architecture
Two "different" parsers:
 – "Views" can differ by feature space
   Here: Collins-CFG & LTAG
 – Comparable performance, different formalisms
Cache Manager:
 – Draws unlabeled sentences for the parsers to label
 – Selects a subset of the newly labeled sentences for the training set

9 Two Different Parsers
Both train on treebank input:
 – Lexicalized, head information percolated
Collins-CFG:
 – Lexicalized CFG parser
   "Bi-lexical": each pair of non-terminals leads to a bigram relation between a pair of lexical items
   Ph = head percolation; Pm = modifiers of the head daughter
LTAG:
 – Lexicalized TAG parser
   Bigram relations between trees
   Ps = substitution probability; Pa = adjunction probability
Differ in tree creation and in the depth of lexical relations (a rough sketch follows)
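Both models ultimately rest on bigram statistics over lexical anchors. As a loosely hedged illustration only (the real Collins-CFG and LTAG models add smoothing, distance features, and much richer conditioning), a bare relative-frequency estimate of a head-modifier dependency probability could be computed like this:

# Illustrative only: maximum-likelihood estimate of a bilexical
# head-modifier probability, ignoring smoothing and extra conditioning.
from collections import Counter

head_mod_counts = Counter()   # (head_word, modifier_word) -> count
head_counts = Counter()       # head_word -> count

def observe(head, modifier):
    """Record one head-modifier dependency seen in the treebank."""
    head_mod_counts[(head, modifier)] += 1
    head_counts[head] += 1

def p_modifier_given_head(modifier, head):
    """Relative-frequency estimate of P(modifier | head)."""
    if head_counts[head] == 0:
        return 0.0
    return head_mod_counts[(head, modifier)] / head_counts[head]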

10 Selecting Labeled Examples
Scoring the parse:
 – Ideal (true) score is impossible to compute
   F-prob: trust the parser; F-norm-prob: normalize by sentence length
   F-entropy: difference between the parse score distribution and uniform
 – Baselines: number of parses, sentence length
Selecting (newly labeled) sentences:
 – Goal: minimize noise, maximize training utility
   S-base: n highest scores (both parsers use the same set)
   Asymmetric (teacher/student):
 – S-topn: teacher's top n
 – S-intersect: sentences in the teacher's top n and the student's bottom n
 – S-diff: teacher's score higher than the student's by some amount
(A sketch of the scoring functions follows.)
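One possible reading of the scoring functions above, sketched in Python; the parse probabilities are assumed to come from the parser's own model, and the exact normalizations used in the paper may differ.

import math

def f_prob(parse_prob):
    """F-prob: trust the parser; the score is the probability of its best parse."""
    return parse_prob

def f_norm_prob(parse_prob, sentence_length):
    """F-norm-prob: normalize by length so long sentences are not penalized."""
    return parse_prob ** (1.0 / max(sentence_length, 1))

def f_entropy(nbest_probs):
    """F-entropy: distance of the n-best parse distribution from uniform;
    a peaked (low-entropy) distribution suggests a confident parser."""
    total = sum(nbest_probs)
    if total == 0 or len(nbest_probs) < 2:
        return 0.0
    probs = [p / total for p in nbest_probs if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.log(len(probs)) - entropy  # larger = further from uniform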

11 Experiments: Initial Seed Size
Typically evaluate after all training; here, consider the convergence rate
 – Initial rapid growth, tailing off with more data
 – Largest improvement: 500-1000 instances
   Collins-CFG plateaus at 40K (89.3); LTAG still improving
 – Will benefit from additional training
Co-training with 500 vs. 1000 seed instances:
 – Less data, greater benefit: enhances coverage
 – However, the 500-sentence seed doesn't reach the level of the 1000-sentence seed

12 (figure slide; no transcript text)

13 Experiments: Parse Selection
Contrast:
 – Select-all newly labeled sentences vs. S-intersect (67%)
Co-training experiments (500-sentence seed set):
 – LTAG performs better with S-intersect
   Reduces noise; LTAG is sensitive to noisy trees
 – CFG performs better with select-all
   CFG needs to increase coverage, so it benefits from more samples

14-16 (figure slides; no transcript text)

17 Experiments: Cross-domain
Train on Brown corpus (1000-sentence seed):
 – Co-train on WSJ
 – CFG with S-intersect improves, 76.6 -> 78.3
   Mostly in the first 5 iterations: lexicalizing for new-domain vocabulary
Train on Brown + 100 WSJ seed sentences:
 – Co-train on other WSJ data
 – Baseline improves to 78.7, co-training to 80
   Gradual improvement; learning new constructions?

18 Summary
Semi-supervised parser training via co-training:
 – Two different parse formalisms provide different views
 – Enhances effectiveness
   Biggest gains with small seed sets
   Cross-domain enhancement
 – Selection methods depend on the parse model and the amount of seed data

19 Findings
 – Co-training enhances parsing when trained on small datasets: 500-10,000 sentences
 – Co-training aids genre porting without labels for the new genre
 – Co-training improves further with even a small amount of labeled data for the new genre
 – Sample selection is crucial; several approaches examined

