
1 Parsing German with Latent Variable Grammars
Slav Petrov and Dan Klein, UC Berkeley

2 The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
- Automatic clustering?
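Parent annotation is simple enough to show concretely. A minimal sketch, where the nested-list tree encoding and the function name are illustrative assumptions, not from the slides:

```python
# Parent annotation [Johnson 98]: each nonterminal label is augmented with
# its parent's label, so an NP under S becomes NP^S and an NP under VP
# becomes NP^VP, splitting the treebank symbol by context.

def parent_annotate(tree, parent=None):
    """tree: [label, child1, child2, ...] with strings as leaves."""
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}" if parent else label
    return [new_label] + [
        parent_annotate(c, label) if isinstance(c, list) else c
        for c in children
    ]

# (S (NP he) (VP (V saw) (NP it))) -> NP^S vs. NP^VP
print(parent_annotate(["S", ["NP", "he"], ["VP", ["V", "saw"], ["NP", "it"]]]))
```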

3 Previous Work: Manual Annotation [Klein & Manning 03]
Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional
Advantages:
- Fairly compact grammar
- Linguistic motivations
Disadvantages:
- Performance leveled out
- Requires manual annotation

Model                  | F1
Naïve Treebank Grammar | 72.6
Klein & Manning 03     | 86.3

4 Previous Work: Automatic Annotation Induction [Matsuzaki et al. 05, Prescher 05]
Advantages (automatically learned):
- Label all nodes with latent variables
- Same number k of subcategories for all categories
Disadvantages:
- Grammar gets too large
- Most categories are oversplit while others are undersplit

Model               | F1
Klein & Manning 03  | 86.3
Matsuzaki et al. 05 | 86.7

5 Overview [Petrov, Barrett, Thibaux & Klein, ACL 06; Petrov & Klein, NAACL 07]
Learning:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Inference:
- Coarse-to-Fine Decoding
- Variational Approximation
German Analysis

6 Learning Latent Annotations
EM algorithm, just like Forward-Backward for HMMs:
- Brackets are known
- Base categories are known
- Only induce subcategories
[Figure: binarized parse tree with latent-annotated nodes X1 ... X7 over the sentence "He was right."; the forward/backward (inside/outside) passes run over the tree]
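A minimal sketch of the E-step's inside pass under this setup; the Tree class, the probability-table layout, and the fixed number k of subcategories per base category are illustrative assumptions, not the authors' implementation. The outside pass and expected-count accumulation are symmetric, exactly as in Forward-Backward.

```python
# Inside pass over a fixed binarized tree whose brackets and base
# categories are known; only the subcategories are latent.
import numpy as np

class Tree:
    def __init__(self, label, children=(), word=None):
        self.label = label            # base category, e.g. "NP"
        self.children = list(children)
        self.word = word              # set only at preterminals

def inside(node, rule_probs, lex_probs, k):
    """Length-k vector of inside scores over node's subcategories."""
    if node.word is not None:
        # preterminal: P(word | tag subcategory), an assumed (k,) table
        return lex_probs[(node.label, node.word)]
    left, right = node.children
    i_l = inside(left, rule_probs, lex_probs, k)
    i_r = inside(right, rule_probs, lex_probs, k)
    # rule_probs[(A, B, C)] has shape (k, k, k):
    # parent x left-child x right-child subcategories
    R = rule_probs[(node.label, left.label, right.label)]
    return np.einsum('xyz,y,z->x', R, i_l, i_r)
```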

7 Starting Point
[Figure; annotation: "Limit of computational resources"]

8 Refinement of the DT tag
[Figure: the DT tag split into subcategories DT-1, DT-2, DT-3, DT-4]

9 Refinement of the DT tag
[Figure: the DT tag after refinement]

10 Hierarchical Refinement of the DT tag
[Figure: DT refined by repeated binary splitting into a hierarchy of subcategories]

11 Hierarchical Estimation Results

Model                 | F1
Baseline              | 87.3
Hierarchical Training | 88.4

12 Refinement of the "," tag
Splitting all categories the same amount is wasteful:
[Figure: the subcategories learned for the "," tag]

13 The DT tag revisited
[Figure: the DT subcategories after further splitting]
Oversplit?

14 Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, then roll back the splits that were least useful


16 Adaptive Splitting
Evaluate the loss in likelihood from removing each split:

    loss(split) = (data likelihood with the split reversed) / (data likelihood with the split)

There is no loss in accuracy when 50% of the splits are reversed.
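The likelihood with a split reversed can be estimated cheaply from inside/outside scores, without retraining. A hedged sketch in the spirit of the approximation in [Petrov et al. 06]; the argument names and exact bookkeeping are assumptions:

```python
# Approximate log-likelihood loss, at one node, of merging two sibling
# subcategories back together.
import math

def merge_log_loss(i1, i2, o1, o2, p1, p2, sentence_prob):
    """i1, i2 / o1, o2: inside and outside scores of the two subcategories
    at this node; p1, p2: their relative frequencies (p1 + p2 = 1);
    sentence_prob: current probability of the training sentence."""
    merged_inside = p1 * i1 + p2 * i2
    # swap the two subcategories' contribution at this node for the
    # contribution of the merged subcategory
    merged_prob = (sentence_prob
                   - (i1 * o1 + i2 * o2)
                   + merged_inside * (o1 + o2))
    return math.log(sentence_prob) - math.log(merged_prob)
```

Splits whose summed loss over the treebank is smallest are the ones rolled back.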

17 Adaptive Splitting Results

Model            | F1
Previous         | 88.4
With 50% Merging | 89.5

18 Number of Phrasal Subcategories

19 Number of Lexical Subcategories

20 Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics

21 Linear Smoothing
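Linear smoothing here interpolates each subcategory's rule probabilities with the mean over all subcategories of the same base category. A minimal sketch, with the array layout and the value of alpha assumed:

```python
# Pull each subcategory's rule distribution toward the sibling mean.
import numpy as np

def smooth(rule_probs, alpha=0.01):
    """rule_probs: shape (k, ...), one row per subcategory of a category."""
    mean = rule_probs.mean(axis=0, keepdims=True)   # pooled statistics
    return (1.0 - alpha) * rule_probs + alpha * mean
```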

22 Result Overview

Model          | F1
Previous       | 89.5
With Smoothing | 90.7

23 Coarse-to-Fine Parsing [Goodman 97, Charniak & Johnson 05]
[Figure: parse the treebank sentence with a coarse grammar (NP, VP, ...), prune, then parse with a refined grammar; refinements may be lexical (NP-dog, NP-cat, NP-apple, VP-run, ...) or latent (NP-1, NP-12, NP-17, VP-6, VP-31, ...)]

24 Hierarchical Pruning
Consider the span 5 to 12:

    coarse:         ... QP  NP  VP  ...
    split in two:   ... QP1 QP2 NP1 NP2 VP1 VP2 ...
    split in four:  ... QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
    split in eight: ...

At each level, symbols with low posterior probability are pruned before parsing with the next, finer grammar.
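A minimal sketch of one such pruning step; the chart representation, the refinement map, and the threshold value are assumptions, not the paper's exact setup:

```python
# Keep only refined chart items whose coarse projection survived pruning.
THRESHOLD = 1e-4   # assumed value; the real threshold is tuned

def prune(coarse_posteriors, refinements, threshold=THRESHOLD):
    """coarse_posteriors: {(start, end, symbol): posterior probability}
    refinements: {symbol: [its subcategories at the next, finer level]}
    Returns the set of (start, end, fine_symbol) items to keep."""
    allowed = set()
    for (start, end, sym), post in coarse_posteriors.items():
        if post >= threshold:
            for fine in refinements[sym]:
                allowed.add((start, end, fine))
    return allowed
```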

25 Intermediate Grammars
[Figure: learning produces a sequence of grammars X-Bar = G0, G1, G2, ..., G6 = G, with DT refined step by step from DT to DT1/DT2, then DT1-DT4, up to DT1-DT8]

26 State Drift (DT tag)
[Figure: over successive EM iterations the DT subcategories drift; words such as "the", "this", "that", "some", "these" migrate between subcategories]

27 Projected Grammars
[Figure: instead of reusing the intermediate grammars G1, ..., G6 from learning, each pruning grammar is a projection πi(G) of the final grammar G (i = 0 ... 5), with X-Bar = G0]
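A hedged sketch of such a projection: each refined rule's probability mass is folded into its coarse image, weighted by a distribution over the parent's subcategories. The paper derives those weights from expected symbol counts in the grammar itself; here they are simply passed in:

```python
# Project a refined binary-rule table onto a coarser symbol set.
from collections import defaultdict

def project(rules, coarse_of, parent_weight):
    """rules: {(A_x, B_y, C_z): prob} over refined symbols.
    coarse_of: refined symbol -> its coarse symbol at this level.
    parent_weight: {A_x: weight}, normalized within each coarse parent.
    Returns {(A, B, C): prob} over coarse symbols."""
    projected = defaultdict(float)
    for (a, b, c), p in rules.items():
        coarse_rule = (coarse_of[a], coarse_of[b], coarse_of[c])
        projected[coarse_rule] += parent_weight[a] * p
    return dict(projected)
```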

28 Bracket Posteriors (after G0)

29 Bracket Posteriors (after G1)

30 Bracket Posteriors (Final Chart)

31 Bracket Posteriors (Best Tree)

32 Parse Selection
Computing the most likely unsplit tree is NP-hard:
- Settle for the best derivation
- Rerank an n-best list
- Use an alternative objective function (variational approximation)
[Figure: a single unsplit parse corresponds to several derivations over subcategories; its probability is the sum of the derivation probabilities]
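With a fixed tree, summing over derivations is exactly an inside computation; a one-line illustration reusing the inside() sketch from the Learning Latent Annotations slide, with root_probs an assumed distribution over root subcategories:

```python
def parse_probability(tree, rule_probs, lex_probs, root_probs, k):
    """Sum over all derivations (subcategory assignments) of `tree`."""
    return float(root_probs @ inside(tree, rule_probs, lex_probs, k))
```

The NP-hardness arises only when maximizing this quantity over all candidate trees, which is why decoding falls back on the approximations listed above.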

33 Efficiency Results
- Berkeley Parser: 15 min (implemented in Java)
- Charniak & Johnson 05 parser: 19 min (implemented in C)

34 Accuracy Results

Lang | Model                              | F1 (≤ 40 words) | F1 (all)
ENG  | Charniak & Johnson 05 (generative) | 90.1            | 89.6
ENG  | This Work                          | 90.6            | 90.1
GER  | Dubey 05                           | 76.3            | -
GER  | This Work                          | 80.8            | 80.1
CHN  | Chiang et al. 02                   | 80.0            | 76.6
CHN  | This Work                          | 86.3            | 83.4

35 Parsing German Shared Task
Two-pass parsing:
- Determine constituency structure (F1: 85/94)
- Assign grammatical functions
One-pass approach:
- Treat category + grammatical function combinations as labels

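A hedged illustration of the one-pass label scheme: categories and grammatical functions are concatenated into single labels before training, so the parser learns both jointly. The function names follow TIGER/NEGRA conventions; the joining character and helper are assumptions:

```python
def merge_label(category, function=None):
    """e.g. merge_label("NP", "SB") -> "NP-SB", a subject NP."""
    return f"{category}-{function}" if function else category

print(merge_label("NP", "SB"))   # NP-SB (subject noun phrase)
print(merge_label("PP", "MO"))   # PP-MO (modifier prepositional phrase)
```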

37 Development Set Results

38 Shared Task Results

39 Part-of-speech splits

40 Linguistic Candy

41 Conclusions
Split & Merge Learning:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Hierarchical Coarse-to-Fine Inference:
- Projections
- Marginalization
Multilingual Unlexicalized Parsing

42 Thank You!
The parser is available at http://nlp.cs.berkeley.edu

