Feature Forest Models for Syntactic Parsing
Yusuke Miyao, University of Tokyo
Probabilistic models for NLP
Widely used for disambiguation of linguistic structures
Ex.) POS tagging
[Figure: tag lattice for "A pretty girl is crying", with candidate tags DT/JJ/NN/VBZ/VBG for each word; each local step is scored by a probability such as P(NN | a/NN, pretty)]
Implicit assumption
Processing state = Primitive probability
–Efficient algorithm for searching
–Avoids exponential explosion of ambiguities
[Figure: the same tag lattice; POS tag = processing state = primitive probability]
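A minimal sketch of this assumption in code (the tags, probabilities, and the `local_prob` helper are all illustrative, not the talk's model): Viterbi decoding in which the POS tag is both the processing state and the event the local probability attaches to, so search stays polynomial.

```python
# Viterbi decoding: the POS tag is both the processing state and the
# event the local probability P(tag | prev_tag, word) attaches to, so
# the best sequence is found in O(n * |TAGS|^2) steps rather than by
# scoring all |TAGS|^n sequences.  Probabilities are made-up numbers.
TAGS = ["DT", "JJ", "NN", "VBZ", "VBG"]

def local_prob(prev_tag, tag, word):
    # Stand-in for an estimated P(tag | prev_tag, word),
    # e.g. P(NN | a/NN, pretty) on the previous slide.
    toy = {("<s>", "DT", "a"): 0.9, ("DT", "JJ", "pretty"): 0.8,
           ("JJ", "NN", "girl"): 0.9, ("NN", "VBZ", "is"): 0.9,
           ("VBZ", "VBG", "crying"): 0.9}
    return toy.get((prev_tag, tag, word), 0.01)

def viterbi(words):
    best = {"<s>": (1.0, [])}           # state -> (best prob, best tag path)
    for w in words:
        # Keeping only the best path into each state is what avoids the
        # exponential explosion of ambiguities.
        best = {t: max((p * local_prob(prev, t, w), path + [t])
                       for prev, (p, path) in best.items())
                for t in TAGS}
    return max(best.values())

print(viterbi("a pretty girl is crying".split()))
# -> (0.52488..., ['DT', 'JJ', 'NN', 'VBZ', 'VBG'])
```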
Is the assumption right?
Ex.) Shallow parsing, NE recognition
–B (Begin), I (Internal), and O (Other) tags are introduced to represent multi-word tags
[Figure: tag lattice for "A pretty girl is crying", with candidate tags NP-B, NP-I, VP-B, VP-I, O for each word]
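A small sketch of the B/I/O encoding (the helper name and the span representation are mine): chunk spans are flattened into per-word tags so that a word-level tagger can handle multi-word units.

```python
# Encode labeled chunk spans as per-word B/I/O tags (illustrative helper).
def to_bio(n_words, chunks):
    """chunks: list of (start, end_exclusive, label) spans."""
    tags = ["O"] * n_words                  # O (Other): outside any chunk
    for start, end, label in chunks:
        tags[start] = label + "-B"          # B: begins a multi-word chunk
        for i in range(start + 1, end):
            tags[i] = label + "-I"          # I: internal to the chunk
    return tags

# "A pretty girl" is an NP, "is crying" is a VP:
print(to_bio(5, [(0, 3, "NP"), (3, 5, "VP")]))
# -> ['NP-B', 'NP-I', 'NP-I', 'VP-B', 'VP-I']
```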
Is the assumption right?
Ex.) Syntactic parsing
–Non-local dependencies are not represented
[Figure: parse of "What do you want to give?"; a local probability such as P(VP | VP to give) cannot capture the long-distance dependency between "what" and "give"]
Problem of existing models
Processing state ≠ Primitive probability
How can we model the probability of ambiguous structures with more flexibility?
Possible solution
A complete structure is a primitive event
–Ex.) Shallow parsing: the probability of a complete sequence of multi-word tags
[Figure: all possible NP/VP chunk sequences for "A pretty girl is crying"]
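One way to read "a complete structure is a primitive event" is a log-linear model over whole sequences, P(y | x) = exp(w · f(x, y)) / Z(x). The sketch below (features and weights are illustrative) computes Z(x) by brute-force enumeration, which is exactly the exponential cost pointed out on the next slide.

```python
import math
from itertools import product

# A log-linear model whose primitive event is the COMPLETE tag sequence:
# P(y | x) = exp(w . f(x, y)) / Z(x).  Z(x) is computed here by brute
# force over every candidate sequence -- the exponential cost that the
# next slide points out.  Features and weights are illustrative.
TAGS = ["NP-B", "NP-I", "VP-B", "VP-I", "O"]

def features(words, tags):
    feats = [f"word={w},tag={t}" for w, t in zip(words, tags)]
    feats += [f"bigram={t1},{t2}" for t1, t2 in zip(tags, tags[1:])]
    return feats

def prob(words, tags, weights):
    def score(y):
        return math.exp(sum(weights.get(f, 0.0) for f in features(words, y)))
    z = sum(score(y) for y in product(TAGS, repeat=len(words)))  # |TAGS|**n terms
    return score(tags) / z

words = "a pretty girl is crying".split()
weights = {"word=girl,tag=NP-I": 1.5, "bigram=NP-B,NP-I": 0.7}
print(prob(words, ["NP-B", "NP-I", "NP-I", "VP-B", "VP-I"], weights))
```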
Possible solution
A complete structure is a primitive event
–Ex.) Syntactic parsing: the probability of a complete argument structure
[Figure: argument structure of "What do you want to give?" with ARG1, ARG2, and MODIFY dependencies]
Problem
Complete structures have exponentially many ambiguities
[Figure: exponentially many chunk sequences for "A pretty girl is crying"]
Proposal
Feature forest model [Miyao and Tsujii, 2002]
–Exponentially many trees are packed into a forest of conjunctive and disjunctive nodes
–Features are assigned to each conjunctive node
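A minimal data-structure sketch matching these definitions (class and field names are mine): conjunctive (AND) nodes carry the features and point to disjunctive children; disjunctive (OR) nodes list their alternative conjunctive daughters.

```python
from dataclasses import dataclass, field
from typing import List

# A feature forest is an AND/OR graph: disjunctive (OR) nodes pack the
# alternatives, conjunctive (AND) nodes carry the features, and the
# shared structure keeps exponentially many trees polynomial in size.
@dataclass
class DisjunctiveNode:
    choices: List["ConjunctiveNode"]        # alternative analyses

@dataclass
class ConjunctiveNode:
    features: List[str]                     # features live on AND nodes
    children: List[DisjunctiveNode] = field(default_factory=list)

# A toy forest: two analyses packed under one shared top node.
shared_top = ConjunctiveNode(
    features=["f:top"],
    children=[DisjunctiveNode(choices=[
        ConjunctiveNode(features=["f:analysis-1"]),
        ConjunctiveNode(features=["f:analysis-2"]),
    ])],
)
```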
Feature forest model
Feature forest models can be efficiently estimated without exponential explosion [Miyao and Tsujii, 2002]
When the forest is unpacked, the model is equivalent to maximum entropy models [Berger et al., 1996]
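The core of the efficiency claim, sketched below under the definitions above (this reuses the node classes from the previous sketch and is a simplification of the idea, not the paper's exact algorithm): the maximum entropy partition function Z = Σ_trees exp(w · f(tree)) falls out of one bottom-up pass, multiplying at conjunctive nodes and summing at disjunctive nodes, so the forest is never unpacked.

```python
import math

# Inside scores over the forest above (reusing ConjunctiveNode and
# DisjunctiveNode from the previous sketch): multiply at AND nodes, sum
# at OR nodes.  The result equals  Z = sum over all unpacked trees of
# exp(w . f(tree)),  computed in time linear in the forest size.
def inside_conj(node, weights):
    s = math.exp(sum(weights.get(f, 0.0) for f in node.features))
    for d in node.children:                 # AND node: multiply children
        s *= inside_disj(d, weights)
    return s

def inside_disj(node, weights):
    return sum(inside_conj(c, weights) for c in node.choices)  # OR: sum

weights = {"f:analysis-1": 1.0}
print(inside_conj(shared_top, weights))     # e^1 + e^0 over the two packed trees
```

An analogous inside-outside pass gives the expected feature counts needed for the maximum entropy gradient.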
Application to parsing
Applying a feature forest model to disambiguation of argument structures
–How to represent exponential ambiguities of argument structures with a feature forest?
–Argument structures are not trees, but DAGs (including reentrant structures)
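A tiny sketch of the reentrancy point (the dict representation is mine): in the example on the next slide, the node for "I" is the ARG1 of both "want" and "argue", so two paths reach the same object and the structure is a DAG, not a tree.

```python
# Argument structures as DAGs: the node for "I" is the ARG1 of both
# "want" and "argue", so two paths reach the same object (reentrancy).
# The dict representation is illustrative.
i_node = {"pred": "I"}
argue = {"pred": "argue", "ARG1": i_node}
want = {"pred": "want", "ARG1": i_node, "ARG2": argue}

assert want["ARG1"] is want["ARG2"]["ARG1"]   # the shared, reentrant node
```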
Packing argument structures
An example including reentrant structures:
She neglected the fact that I wanted to argue.
–Inactive parts: argument structures whose arguments are all instantiated
–Inactive parts are packed into conjunctive nodes
[Figure: step-by-step packing of the inactive parts, e.g. argue1(ARG1 = I), want(ARG1 = I, ARG2 = argue1), argue2(ARG1 = I, ARG2 = fact), want(ARG1 = I, ARG2 = argue2), and fact(ARG1 = want)]
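A sketch of the packing criterion (the function name and dict representation are mine): analyses whose inactive parts are identical, i.e. the same predicate with the same fully instantiated arguments, can share one conjunctive node.

```python
# Pack analyses by the signature of their inactive parts: an argument
# structure whose arguments are all instantiated behaves identically in
# any surrounding analysis, so equal signatures share one conjunctive
# node.  The representation and names are illustrative.
def signature(pred, args):
    """Inactive part = predicate plus fully instantiated arguments."""
    return (pred, tuple(sorted(args.items())))

packed = {}
for analysis in [
    {"pred": "want", "args": {"ARG1": "I", "ARG2": "argue1"}},
    {"pred": "want", "args": {"ARG1": "I", "ARG2": "argue1"}},  # duplicate
    {"pred": "want", "args": {"ARG1": "I", "ARG2": "argue2"}},
]:
    packed.setdefault(signature(analysis["pred"], analysis["args"]), analysis)

print(len(packed))                          # -> 2 conjunctive nodes, not 3
```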
Feature forest representation of argument structures
–Conjunctive nodes correspond to argument structures whose arguments are all instantiated
[Figure: the feature forest for "She neglected the fact that I wanted to argue.", packing neglect(ARG1 = she, ARG2 = fact), fact(ARG1 = want), want(ARG1 = I, ARG2 = argue1 or argue2), argue1(ARG1 = I), and argue2(ARG1 = I, ARG2 = fact)]
Experiments
Grammar: a treebank grammar of HPSG [Miyao and Tsujii, 2003]
–Extracted from the Penn Treebank [Marcus et al., 1994]
Training: Section … of the Penn Treebank
Test: sentences from Section 22 covered by the grammar
Measure: accuracy of dependencies in argument structures
Experiments
Features: combinations of
–Surface strings/POS
–Labels of dependencies (ARG1, ARG2, …)
–Labels of lexical entries (head noun, transitive, …)
–Distance
Estimation algorithm: limited-memory BFGS [Nocedal, 1980] with MAP estimation [Chen & Rosenfeld, 1999]
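A toy sketch of the estimation step (data are made-up; SciPy's L-BFGS-B stands in for the limited-memory BFGS of Nocedal 1980, and a Gaussian/L2 prior for the MAP estimation of Chen & Rosenfeld): minimize the penalized negative log-likelihood of the correct analysis among its competitors.

```python
import numpy as np
from scipy.optimize import minimize

# MAP estimation of a maximum entropy model with a Gaussian (L2) prior,
# optimized with L-BFGS.  Row i of F holds the feature values of
# candidate analysis i; candidate 0 is the correct one.  Toy numbers.
F = np.array([[1.0, 0.0, 1.0],              # correct analysis
              [0.0, 1.0, 1.0],              # competing analysis
              [0.0, 0.0, 1.0]])             # competing analysis
sigma2 = 1.0                                # prior variance (tuned in practice)

def neg_log_posterior(w):
    scores = F @ w
    logZ = np.logaddexp.reduce(scores)
    return (logZ - scores[0]) + w @ w / (2 * sigma2)

def grad(w):
    p = np.exp(F @ w - np.logaddexp.reduce(F @ w))   # P(candidate i)
    return F.T @ p - F[0] + w / sigma2      # expected - observed + prior term

w = minimize(neg_log_posterior, np.zeros(3), jac=grad, method="L-BFGS-B").x
print(np.round(w, 3))                       # weight 0 up, weight 1 down, 2 near 0
```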
Preliminary results
Estimation time: 143 min.
Accuracy (precision/recall):

                  exact           partial
Baseline          48.1 / …        … / 56.2
Unigram           77.3 / …        … / 81.3
Feature forest    85.5 / …        … / 88.2
Conclusion
Feature forest models allow probabilistic modeling of complete structures without exponential explosion
The application to syntactic parsing achieved high accuracy
Ongoing work
–Refinement of the grammar and tuning of estimation parameters
–Development of efficient algorithms for best-first/beam search