Parsing German with Latent Variable Grammars
Slav Petrov and Dan Klein, UC Berkeley

The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
- Automatic clustering?
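As a concrete example of annotation refining treebank symbols, here is a minimal sketch of parent annotation in the style of [Johnson 98]; it uses nltk.Tree purely for illustration, and annotating the preterminals too is a choice of this sketch, not something the slide specifies.

```python
# Parent annotation: refine each nonterminal with its parent's label,
# e.g. an NP under S becomes NP^S, separating subject from object NPs.
from nltk import Tree

def parent_annotate(tree, parent=None):
    if not isinstance(tree, Tree):        # a leaf (word): leave unchanged
        return tree
    label = tree.label() if parent is None else f"{tree.label()}^{parent}"
    return Tree(label, [parent_annotate(child, tree.label()) for child in tree])

t = Tree.fromstring("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")
print(parent_annotate(t))
# (S (NP^S (DT^NP the) (NN^NP dog)) (VP^S (VBD^VP barked)))
```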

Previous Work: Manual Annotation [Klein & Manning 03]
Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional
Advantages:
- Fairly compact grammar
- Linguistic motivations
Disadvantages:
- Performance leveled out

Model | F1
Naïve Treebank Grammar | 72.6
Klein & Manning 03 | 86.3

Previous Work: Automatic Annotation Induction [Matsuzaki et al. 05, Prescher 05]
Label all nodes with latent variables, with the same number k of subcategories for all categories.
Advantages:
- Automatically learned
Disadvantages:
- Grammar gets too large
- Most categories are oversplit while others are undersplit

Model | F1
Klein & Manning 03 | 86.3
Matsuzaki et al. 05 | 86.7

Overview [Petrov, Barrett, Thibaux & Klein, ACL 06; Petrov & Klein, NAACL 07]
Learning:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Inference:
- Coarse-to-Fine Decoding
- Variational Approximation
German Analysis

Learning Latent Annotations
EM algorithm, with a forward (upward) and backward (downward) pass over each training tree:
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
[Figure: the parse tree of "He was right." with latent subcategories X1 ... X7 at its nodes]
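A minimal sketch of the upward (inside) half of that E-step, not the Berkeley parser's actual code: trees are assumed to be (label, children) tuples whose children are either a word string or a pair of subtrees; rule_prob[(A, B, C)] is assumed to be a k x k x k array with entry [x, y, z] = P(A_x -> B_y C_z), and emit_prob[(T, w)] a length-k array over the subcategories of tag T (all names illustrative).

```python
import numpy as np

def inside(node, rule_prob, emit_prob):
    """Length-k vector of inside scores over `node`'s latent subcategories."""
    label, children = node
    if isinstance(children, str):                  # preterminal over a word
        return emit_prob[(label, children)]
    left, right = children                         # the bracketing is fixed
    i_left = inside(left, rule_prob, emit_prob)
    i_right = inside(right, rule_prob, emit_prob)
    G = rule_prob[(label, left[0], right[0])]      # k x k x k tensor
    # I(A_x) = sum_{y,z} P(A_x -> B_y C_z) * I(B_y) * I(C_z)
    return np.einsum('xyz,y,z->x', G, i_left, i_right)
```

A matching downward (outside) pass then yields posteriors over each node's subcategories, exactly as the backward pass complements the forward pass for HMMs.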

Starting Point
[Chart; annotation: limit of computational resources]

Refinement of the DT tag
[Figure: the DT tag split into subcategories DT-1, DT-2, DT-3, DT-4]

Hierarchical Refinement of the DT tag
[Figure: DT is split in two repeatedly, yielding a binary hierarchy of subcategories]
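Each refinement step splits every subcategory in two, and the duplicated parameters are perturbed slightly (the ACL 06 paper adds a small amount of random noise) so that EM can break the symmetry between the two copies. A minimal sketch for a single probability vector; the real learner splits every rule and emission distribution of the grammar.

```python
import numpy as np

def split_distribution(p, noise=0.01, seed=0):
    """Duplicate a distribution over k subcategories into one over 2k, with jitter."""
    rng = np.random.default_rng(seed)
    doubled = np.repeat(np.asarray(p), 2) / 2.0   # each copy inherits half the mass
    doubled *= 1.0 + noise * rng.uniform(-1.0, 1.0, size=doubled.size)
    return doubled / doubled.sum()                # renormalize after the jitter
```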

Hierarchical Estimation Results

Model | F1
Baseline | 87.3
Hierarchical Training | 88.4

Refinement of the ',' tag
Splitting all categories the same amount is wasteful.

The DT tag revisited: oversplit?

Adaptive Splitting
Want to split complex categories more.
Idea: split everything, then roll back the splits which were least useful.

Adaptive Splitting
Evaluate the loss in likelihood from removing each split:

    loss(split) = (data likelihood with split reversed) / (data likelihood with split)

There is no loss in accuracy when 50% of the splits are reversed.
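Equivalently, the loss can be scored in log space as the drop in data log-likelihood when a single split is undone. A minimal sketch, where loglik and loglik_with_merged are hypothetical helpers that re-score the treebank under the current grammar and under the grammar with one split reversed:

```python
def select_splits_to_keep(splits, treebank, loglik, loglik_with_merged, keep=0.5):
    """Rank splits by how much likelihood they buy; roll back the bottom half."""
    base = loglik(treebank)                       # log-likelihood, all splits kept
    losses = [(base - loglik_with_merged(treebank, s), s) for s in splits]
    losses.sort(key=lambda pair: pair[0], reverse=True)   # most useful first
    return [s for _, s in losses[:int(len(losses) * keep)]]
```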

Adaptive Splitting Results

Model | F1
Previous | 88.4
With 50% Merging | 89.5

Number of Phrasal Subcategories

Number of Lexical Subcategories

Smoothing
Heavy splitting can lead to overfitting.
Idea: smoothing allows us to pool statistics.

Linear Smoothing
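In the ACL 06 paper, linear smoothing interpolates each subcategory's rule probabilities with their mean over all subcategories of the same base category, roughly p̂_x = (1 − α) p_x + α (1/k) Σ_y p_y, so rarely observed subcategories fall back toward their siblings. A minimal sketch:

```python
import numpy as np

def smooth(rule_probs, alpha=0.01):
    """rule_probs: k x R array, one row of rule probabilities per subcategory."""
    mean = rule_probs.mean(axis=0, keepdims=True)   # pooled statistics
    return (1.0 - alpha) * rule_probs + alpha * mean
```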

Result Overview

Model | F1
Previous | 89.5
With Smoothing | 90.7

Coarse-to-Fine Parsing [Goodman 97, Charniak & Johnson 05]
[Figure: parse with a coarse grammar (NP, VP, ...), prune the chart, then parse with the refined grammar, whether lexicalized (NP-dog, NP-cat, NP-apple, VP-run, NP-eat, ...) or latent (NP-17, NP-12, NP-1, VP-6, VP-31, ...)]

Hierarchical Pruning
Consider the span 5 to 12; subcategories pruned at one level are never refined at the next:
- coarse: ... QP NP VP ...
- split in two: ... QP1 QP2 NP1 NP2 VP1 VP2 ...
- split in four: ... QP1 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
- split in eight: ...
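A minimal sketch of one pruning pass, assuming coarse_posterior maps each span to a dict of posteriors over the coarser grammar's symbols and project maps a refined symbol to its coarser ancestor (all names illustrative):

```python
def prune_pass(spans, refined_symbols, coarse_posterior, project, threshold=1e-4):
    """Return the (span, refined symbol) items worth building at the next level."""
    keep = set()
    for span in spans:
        for sym in refined_symbols:
            # build the refined item only if its coarse ancestor survived
            if coarse_posterior[span].get(project(sym), 0.0) > threshold:
                keep.add((span, sym))
    return keep
```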

Intermediate Grammars
X-Bar = G0, G = G6: learning proceeds through a sequence of grammars G0 → G1 → G2 → G3 → G4 → G5 → G6.
[Figure: the DT tag refined from DT to DT1 DT2, to DT1 ... DT4, to DT1 ... DT8]

State Drift (DT tag)
[Figure: over EM iterations, the determiners assigned to each DT subcategory (some, this, that, That, This, these, the) drift from one subcategory to another]

Projected Grammars
Instead of keeping the intermediate grammars G1 ... G5 from learning, project the final grammar G downward: πi(G) is the grammar at refinement level i, with X-Bar = G0 = π0(G).
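A minimal sketch of projecting a single refined rule onto the coarser symbol space, assuming the expected frequencies of the parent's subcategories are given as parent_weight (the NAACL 07 paper derives these weights from the grammar itself):

```python
import numpy as np

def project_rule(G, parent_weight):
    """G: k x k x k tensor with G[x, y, z] = P(A_x -> B_y C_z);
    parent_weight: length-k expected frequencies of A's subcategories.
    Returns the projected scalar probability P(A -> B C)."""
    per_parent = G.sum(axis=(1, 2))    # collapse the child subcategories y, z
    return float(parent_weight @ per_parent)
```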

Bracket Posteriors (after G0)

Bracket Posteriors (after G1)

Bracket Posteriors (Movie) (Final Chart)

Bracket Posteriors (Best Tree)

Parse Selection
Computing the most likely unsplit tree is NP-hard. Workarounds:
- Settle for the best derivation.
- Rerank an n-best list.
- Use an alternative objective function / variational approximation.
[Figure: one parse corresponds to many derivations once subcategories are introduced]
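A minimal sketch of the reranking option: group an n-best list of derivations by the unsplit tree they map to, marginalize within each group, and return the best tree.

```python
from collections import defaultdict

def best_unsplit_tree(nbest):
    """nbest: iterable of (derivation_prob, unsplit_tree) pairs."""
    totals = defaultdict(float)
    for prob, tree in nbest:
        totals[tree] += prob              # sum derivations of the same parse
    return max(totals, key=totals.get)
```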

Efficiency Results
- Berkeley Parser: 15 min (implemented in Java)
- Charniak & Johnson 05 parser: 19 min (implemented in C)

Accuracy Results
[Table: F1 on sentences of ≤ 40 words and on all sentences, for ENG (Charniak & Johnson 05, generative, vs. this work), GER (Dubey 05 vs. this work), and CHN (Chiang et al. 02 vs. this work)]

Parsing German Shared Task
Two-Pass Parsing:
1. Determine the constituency structure (F1: 85/94)
2. Assign grammatical functions
One-Pass Approach:
- Treat category + grammatical-function combinations as labels (see the sketch below)
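A minimal sketch of the one-pass label encoding, fusing each category with its grammatical function before training; the exact label format is an assumption here.

```python
def merge_label(category, function):
    """E.g. NP with function SB (subject) becomes the single label "NP-SB"."""
    return f"{category}-{function}" if function else category
```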

Development Set Results

Shared Task Results

Part-of-speech splits

Linguistic Candy

Conclusions
Split & Merge Learning:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Hierarchical Coarse-to-Fine Inference:
- Projections
- Marginalization
Multi-lingual Unlexicalized Parsing

Thank You! Parser is available at