Parsing German with Latent Variable Grammars
Slav Petrov and Dan Klein
UC Berkeley


The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
  Parent annotation [Johnson 98]
  Head lexicalization [Collins 99, Charniak 00]
  Automatic clustering?
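As an illustration of the first option, here is a minimal sketch of parent annotation. It assumes trees are nested Python lists of the form `[label, child, ...]` with plain strings as words; the `^` separator and the function name are illustrative choices, not taken from the slides. For simplicity the sketch also annotates part-of-speech tags.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of `tree` with every label suffixed by its parent's base label."""
    if isinstance(tree, str):  # a word: leave unchanged
        return tree
    label, children = tree[0], tree[1:]
    new_label = label if parent is None else f"{label}^{parent}"
    # Children see the unannotated label of this node as their parent.
    return [new_label] + [parent_annotate(child, label) for child in children]

tree = ["S",
        ["NP", ["PRP", "He"]],
        ["VP", ["VBD", "was"], ["ADJP", ["JJ", "right"]]]]
annotated = parent_annotate(tree)
```

After annotation, an NP under S (`NP^S`, typically a subject) is a different symbol from an NP under VP (`NP^VP`, typically an object), so the grammar can give them different expansion probabilities.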

Previous Work: Manual Annotation [Klein & Manning 03]
Manually split categories:
  NP: subject vs object
  DT: determiners vs demonstratives
  IN: sentential vs prepositional
Advantages:
  Fairly compact grammar
  Linguistic motivations
Disadvantages:
  Performance leveled out
  Manually annotated
Model                     F1
Naïve Treebank Grammar    72.6
Klein & Manning 03        86.3

Previous Work: Automatic Annotation Induction [Matsuzaki et al. 05, Prescher 05]
Label all nodes with latent variables. Same number k of subcategories for all categories.
Advantages:
  Automatically learned
Disadvantages:
  Grammar gets too large
  Most categories are oversplit while others are undersplit
Model                     F1
Klein & Manning 03        86.3
Matsuzaki et al. 05       86.7

Overview
[Petrov, Barrett, Thibaux & Klein in ACL06] [Petrov & Klein in NAACL07]
Learning:
  Hierarchical Training
  Adaptive Splitting
  Parameter Smoothing
Inference:
  Coarse-To-Fine Decoding
  Variational Approximation
German Analysis

Learning Latent Annotations
EM algorithm:
  Brackets are known
  Base categories are known
  Only induce subcategories
Just like Forward-Backward for HMMs.
[Figure: parse tree over "He was right." with latent nodes X1 to X7, labeled with Forward and Backward passes]

Starting Point Limit of computational resources

Refinement of the DT tag: DT split into DT-1, DT-2, DT-3, DT-4

Refinement of the DT tag DT

Hierarchical Refinement of the DT tag DT
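A minimal sketch of the splitting step behind hierarchical refinement, assuming each subcategory is represented only by its word distribution and that each split copies the parent's parameters with a small random perturbation so that EM can pull the two halves apart. The perturbation size, the naming scheme, and the toy numbers are assumptions; the EM re-training between splits is omitted.

```python
import random

def split_in_two(emissions, noise=0.01, seed=0):
    """Split every subcategory into two near-identical children.

    `emissions` maps a subcategory name to a dict of word -> probability.
    """
    rng = random.Random(seed)
    refined = {}
    for name, dist in emissions.items():
        for child in (0, 1):
            # Perturb each probability slightly to break symmetry, then renormalize.
            perturbed = {w: p * (1.0 + noise * rng.uniform(-1.0, 1.0))
                         for w, p in dist.items()}
            total = sum(perturbed.values())
            refined[f"{name}-{child}"] = {w: p / total for w, p in perturbed.items()}
    return refined

dt = {"DT": {"the": 0.6, "a": 0.3, "this": 0.1}}  # toy numbers
two = split_in_two(dt)             # DT-0, DT-1
four = split_in_two(two, seed=1)   # DT-0-0, DT-0-1, DT-1-0, DT-1-1
```

In a full training loop, EM would be run after each call, so that every level starts from the trained parameters of the level before it.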

Hierarchical Estimation Results
Model                    F1
Baseline                 87.3
Hierarchical Training    88.4

Refinement of the "," tag
Splitting all categories the same amount is wasteful.

The DT tag revisited Oversplit?

Adaptive Splitting
Want to split complex categories more.
Idea: split everything, roll back splits which were least useful.

Adaptive Splitting
Evaluate loss in likelihood from removing each split:
  $\text{loss} = \frac{\text{Data likelihood with split reversed}}{\text{Data likelihood with split}}$
No loss in accuracy when 50% of the splits are reversed.
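The roll-back step can be sketched as follows, assuming the data log-likelihood with each individual split reversed has already been computed. Working in log space, the likelihood ratio becomes a difference. The function name, the split names, and all numbers are illustrative assumptions.

```python
def splits_to_merge(log_lik_with_split, log_lik_reversed, fraction=0.5):
    """Pick the least useful splits to roll back.

    `log_lik_reversed` maps a split name to the data log-likelihood
    obtained when only that split is reversed.
    """
    # Loss in log-likelihood from removing each split (larger = more useful split).
    loss = {s: log_lik_with_split - ll for s, ll in log_lik_reversed.items()}
    ranked = sorted(loss, key=loss.get)  # least useful first
    n_merge = int(len(ranked) * fraction)
    return ranked[:n_merge], loss

log_lik_with_split = -1000.0  # toy value
log_lik_reversed = {"DT": -1012.5, ",": -1000.1, "NP": -1030.0, ".": -1000.0}
to_merge, loss = splits_to_merge(log_lik_with_split, log_lik_reversed)
```

With these toy numbers the punctuation splits cost almost nothing to reverse, so they are the ones rolled back, while the DT and NP splits are kept.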

Adaptive Splitting Results
Model               F1
Previous            88.4
With 50% Merging    89.5

Number of Phrasal Subcategories

Number of Lexical Subcategories

Smoothing
Heavy splitting can lead to overfitting.
Idea: Smoothing allows us to pool statistics.

Linear Smoothing
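The formula on this slide did not survive extraction. The following is a minimal sketch of one common form of linear smoothing, assuming that the probability of a rule under each subcategory is interpolated with the mean over all subcategories of the same base category. The weight `alpha` and the numbers are illustrative assumptions.

```python
def linear_smooth(probs, alpha=0.01):
    """Shrink each subcategory's rule probability toward the mean over subcategories."""
    mean = sum(probs) / len(probs)
    return [(1.0 - alpha) * p + alpha * mean for p in probs]

# Probability of the same rule under four subcategories of one base category (toy values).
probs = [0.40, 0.10, 0.00, 0.30]
smoothed = linear_smooth(probs, alpha=0.1)
```

The interpolation keeps the total mass unchanged while moving zero estimates away from zero, which is how statistics are pooled across sibling subcategories.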

Result Overview
Model             F1
Previous          89.5
With Smoothing    90.7

Coarse-to-Fine Parsing [Goodman 97, Charniak & Johnson 05]
[Figure: Treebank → coarse grammar (NP, VP, …) → parse → prune → refined grammar → parse. Example refined symbols: NP-dog, NP-cat, NP-apple, VP-run, NP-eat, …; NP-17, NP-12, NP-1, VP-6, VP-31, …]

Hierarchical Pruning
Consider the span 5 to 12:
  coarse:          … QP NP VP …
  split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
  split in four:   … QP1 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
  split in eight:  … … …
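One pruning pass for a single span can be sketched as below, assuming posterior probabilities of the current level's symbols over that span are available and that each symbol knows its subcategories at the next level. The threshold, the function name, and the posterior values are illustrative assumptions.

```python
def prune_and_refine(posteriors, children, threshold=1e-4):
    """Return the finer symbols still allowed for one span.

    `posteriors` maps a symbol to its posterior probability over the span
    under the current grammar; `children` maps a symbol to its subcategories
    in the next, finer grammar.
    """
    allowed = []
    for symbol, posterior in posteriors.items():
        if posterior >= threshold:  # keep only symbols that look plausible
            allowed.extend(children[symbol])
    return allowed

coarse_posteriors = {"QP": 1e-7, "NP": 0.62, "VP": 0.38}  # toy values
children = {"QP": ["QP1", "QP2"], "NP": ["NP1", "NP2"], "VP": ["VP1", "VP2"]}
allowed = prune_and_refine(coarse_posteriors, children)
```

Repeating this at every level means the finest grammar only has to score the small set of symbols that survived all coarser passes.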

Intermediate Grammars
Learning produces a sequence of grammars: X-Bar = G0, G1, G2, G3, G4, G5, G6 = G
[Figure: DT split hierarchically into DT1, DT2; then DT1 to DT4; then DT1 to DT8]

State Drift (DT tag)
[Figure: the words "the", "that", "this", "some", "these", "That", "This" grouped under DT subcategories at successive levels of EM training]

Projected Grammars
Learning: X-Bar = G0, G1, G2, G3, G4, G5, G6 = G
Projection πi: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), G

Bracket Posteriors (after G 0 )

Bracket Posteriors (after G 1 )

Bracket Posteriors (Movie)(Final Chart)

Bracket Posteriors (Best Tree)

Parse Selection
Computing most likely unsplit tree is NP-hard:
  Settle for best derivation.
  Rerank n-best list.
  Use alternative objective function / Variational Approximation.
[Figure: parses and their derivations]
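A toy illustration of why the best derivation and the best parse can differ: one unsplit tree may correspond to several derivations over subcategories, and its probability is the sum over them. The tree names and probabilities below are made up, and enumerating derivations like this is only feasible for a toy case, which is why approximations are needed in practice.

```python
from collections import defaultdict

# (unsplit tree, probability of one refined derivation of that tree); toy values.
derivations = [
    ("tree_A", 0.30),
    ("tree_B", 0.25),
    ("tree_B", 0.25),
    ("tree_B", 0.20),
]

# Best derivation: the single highest-scoring refined derivation.
best_derivation_tree = max(derivations, key=lambda d: d[1])[0]

# Best parse: sum derivation probabilities per unsplit tree, then take the max.
tree_prob = defaultdict(float)
for unsplit_tree, prob in derivations:
    tree_prob[unsplit_tree] += prob
best_parse_tree = max(tree_prob, key=tree_prob.get)
```

Here the best derivation belongs to `tree_A`, but `tree_B` has more total probability once its three derivations are summed.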

Efficiency Results
Berkeley Parser: 15 min, implemented in Java
Charniak & Johnson 05 Parser: 19 min, implemented in C

Accuracy Results
Language  Model                                 ≤ 40 words F1   all F1
ENG       Charniak & Johnson 05 (generative)    90.1            89.6
ENG       This Work                             90.6            90.1
GER       Dubey 05                              76.3            -
GER       This Work                             80.8            80.1
CHN       Chiang et al. 02                      80.0            76.6
CHN       This Work                             86.3            83.4

Parsing German Shared Task
Two Pass Parsing:
  Determine constituency structure (F1: 85/94)
  Assign grammatical functions
One Pass Approach:
  Treat categories+grammatical functions as labels


Development Set Results

Shared Task Results

Part-of-speech splits

Linguistic Candy

Conclusions
Split & Merge Learning:
  Hierarchical Training
  Adaptive Splitting
  Parameter Smoothing
Hierarchical Coarse-to-Fine Inference:
  Projections
  Marginalization
Multi-lingual Unlexicalized Parsing

Thank You! Parser is available at http://nlp.cs.berkeley.edu
