CS 388: Natural Language Processing: Neural Shift-Reduce Dependency Parsing
Raymond J. Mooney, University of Texas at Austin

Shift-Reduce Parser
Deterministically builds a parse incrementally, bottom-up and left-to-right, without backtracking.
Maintains a buffer of input words and a stack of constructed constituents.
Performs a sequence of operations (actions):
Shift: Push the next word in the buffer onto the stack.
Reduce: Replace the top elements on the stack with a constituent composed of them.
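A minimal Python sketch of this loop (the toy grammar, the greedy longest-match reduce policy, and the tree representation are illustrative assumptions, not from the slides; a real parser must choose between shifting and reducing more carefully, as discussed below):

```python
# Minimal shift-reduce parser sketch. GRAMMAR maps a tuple of
# right-hand-side labels to the constituent label they reduce to.
GRAMMAR = {
    ("Bob",): "NP", ("pasta",): "NP", ("eats",): "VB",
    ("VB", "NP"): "VP", ("NP", "VP"): "S",
}

def label(node):
    # A node is a raw word (str) or a (label, children) pair.
    return node if isinstance(node, str) else node[0]

def parse(words):
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1 or (stack and label(stack[0]) != "S"):
        for n in (2, 1):  # Reduce: try the longest matching stack suffix first.
            suffix = tuple(label(x) for x in stack[-n:]) if len(stack) >= n else None
            if suffix in GRAMMAR:
                children = stack[-n:]
                del stack[-n:]
                stack.append((GRAMMAR[suffix], children))
                break
        else:             # Shift: no reduce applies, push the next word.
            if not buffer:
                raise ValueError("no action applies: parse failure")
            stack.append(buffer.pop(0))
    return stack[0]

print(parse(["Bob", "eats", "pasta"]))
# ('S', [('NP', ['Bob']), ('VP', [('VB', ['eats']), ('NP', ['pasta'])])])
```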

Sample Parse of “Bob eats pasta”
Buffer: [Bob, eats, pasta]   Stack (bottom to top): []

Sample Parse of “Bob eats pasta”
Action: Shift
Buffer: [eats, pasta]   Stack: [Bob]

Sample Parse of “Bob eats pasta”
Action: Reduce(Bob → NP)
Buffer: [eats, pasta]   Stack: [(NP Bob)]

Sample Parse of “Bob eats pasta”
Action: Shift
Buffer: [pasta]   Stack: [(NP Bob), eats]

Sample Parse of “Bob eats pasta”
Action: Reduce(eats → VB)
Buffer: [pasta]   Stack: [(NP Bob), (VB eats)]

Sample Parse of “Bob eats pasta”
Action: Shift
Buffer: []   Stack: [(NP Bob), (VB eats), pasta]

Sample Parse of “Bob eats pasta”
Action: Reduce(pasta → NP)
Buffer: []   Stack: [(NP Bob), (VB eats), (NP pasta)]

Sample Parse of “Bob eats pasta”
Action: Reduce(VB NP → VP)
Buffer: []   Stack: [(NP Bob), (VP (VB eats) (NP pasta))]

Sample Parse of “Bob eats pasta”
Action: Reduce(NP VP → S)
Buffer: []   Stack: [(S (NP Bob) (VP (VB eats) (NP pasta)))]

Shift-Reduce Parsing
Must use “look-ahead” at the next words in the buffer to pick the correct action.
Originally introduced to parse programming languages, which are deterministic context-free languages (DCFLs).
Using it for NLP requires heuristics to pick an action at each step; due to ambiguity, the chosen action could be wrong, resulting in a “garden path.”
Can back up when an impasse is reached in order to search for a parse.

Shift-Reduce Dependency Parser
Easily adapted to dependency parsing by using reduce operators that introduce dependency arcs.
In addition to a stack and buffer, the parser maintains the set of dependency arcs created so far.

Arc-Standard System (Nivre, 2004)
Buffer b = [b1, b2, …, bn]
Stack s = [s1, s2, …, sm]
Arcs A = {label(wi, wj), …}
Configuration c = (s, b, A)
Initial configuration: ([ROOT], [w1, w2, …, wn], {})
Final configuration: ([ROOT], [], {label(wi, wj), …})

Arc-Standard Actions
Shift: move the first word b1 from the buffer onto the stack.
LeftArc(label): add the arc label(s1, s2), making the second stack item s2 a dependent of the top item s1, and remove s2 from the stack.
RightArc(label): add the arc label(s2, s1), making the top item s1 a dependent of s2, and remove s1 from the stack.
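The actions can be implemented directly; a minimal Python sketch (class and method names are illustrative assumptions, not from Nivre's paper):

```python
# Minimal arc-standard transition system. The stack and buffer hold word
# indices (0 = ROOT); arcs are (head, label, dependent) triples.
class Config:
    def __init__(self, words):
        self.stack = [0]                              # [ROOT]
        self.buffer = list(range(1, len(words) + 1))  # word indices 1..n
        self.arcs = set()

    def shift(self):
        self.stack.append(self.buffer.pop(0))

    def left_arc(self, lab):
        s1, s2 = self.stack[-1], self.stack[-2]
        self.arcs.add((s1, lab, s2))   # s2 becomes a dependent of s1
        del self.stack[-2]

    def right_arc(self, lab):
        s1, s2 = self.stack[-1], self.stack[-2]
        self.arcs.add((s2, lab, s1))   # s1 becomes a dependent of s2
        self.stack.pop()

    def is_final(self):
        return not self.buffer and self.stack == [0]
```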

Sample Parse of “He has good control”
Buffer: [He, has, good, control]   Stack (bottom to top): [ROOT]   Arcs: {}

Sample Parse of “He has good control”
Action: Shift
Buffer: [has, good, control]   Stack: [ROOT, He]   Arcs: {}

Sample Parse of “He has good control”
Action: Shift
Buffer: [good, control]   Stack: [ROOT, He, has]   Arcs: {}

Sample Parse of “He has good control”
Action: LeftArc(nsubj)
Buffer: [good, control]   Stack: [ROOT, has]   Arcs: {nsubj(has, He)}

Sample Parse of “He has good control”
Action: Shift
Buffer: [control]   Stack: [ROOT, has, good]   Arcs: {nsubj(has, He)}

Sample Parse of “He has good control”
Action: Shift
Buffer: []   Stack: [ROOT, has, good, control]   Arcs: {nsubj(has, He)}

Sample Parse of “He has good control”
Action: LeftArc(amod)
Buffer: []   Stack: [ROOT, has, control]   Arcs: {nsubj(has, He), amod(control, good)}

Sample Parse of “He has good control”
Action: RightArc(dobj)
Buffer: []   Stack: [ROOT, has]   Arcs: {nsubj(has, He), amod(control, good), dobj(has, control)}

Sample Parse of “He has good control”
Action: RightArc(root)
Buffer: []   Stack: [ROOT]   Arcs: {nsubj(has, He), amod(control, good), dobj(has, control), root(ROOT, has)}
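Replaying this action sequence with the Config sketch above reproduces the arc set (this driver is illustrative; indices are 1-based word positions with 0 = ROOT):

```python
c = Config(["He", "has", "good", "control"])
for act, lab in [("shift", None), ("shift", None), ("left_arc", "nsubj"),
                 ("shift", None), ("shift", None), ("left_arc", "amod"),
                 ("right_arc", "dobj"), ("right_arc", "root")]:
    getattr(c, act)(lab) if lab else getattr(c, act)()
assert c.is_final()
print(sorted(c.arcs))
# [(0, 'root', 2), (2, 'dobj', 4), (2, 'nsubj', 1), (4, 'amod', 3)]
```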

Stanford Neural Dependency Parser (Chen and Manning, 2014)
Trains a neural net to choose the best shift-reduce parser action to take at each step.
Uses features (words, POS tags, arc labels) extracted from the current stack, buffer, and arcs as context.
History (through the citation trail):
Neural shift-reduce parser (Mayberry & Miikkulainen, 1999)
Decision-tree shift-reduce parser (Hermjakob & Mooney, 1997)
Simple learned shift-reduce parser (Simmons & Yu, 1992)

Neural Architecture: parse action classification

Context Features Used (lc = left-child, rc = right-child)
The top 3 words on the stack and buffer: s1, s2, s3; b1, b2, b3.
The first and second leftmost/rightmost children of the top two words on the stack: lc1(si), rc1(si), lc2(si), rc2(si), for i = 1, 2.
The leftmost-of-leftmost and rightmost-of-rightmost children of the top two words on the stack: lc1(lc1(si)), rc1(rc1(si)), for i = 1, 2.
Also include the POS tag for all of these items and the parent arc label (where available) for the child items, giving 18 word features, 18 POS-tag features, and 12 arc-label features in total.

Input Embeddings
Instead of using one-hot input encodings, words and POS tags are “embedded” as 50-dimensional vectors of input features.
Embedding POS tags is unusual since there are relatively few of them; however, it allows similar tags (e.g., NN and NNS) to have similar embeddings and thereby behave similarly.

Cube Activation Function
An alternative non-linearity used in place of sigmoid or tanh: h = (W1 x + b1)^3.
Allows modeling the product terms xi xj xk for any three different input elements.
Based on previous empirical results, capturing interactions of three elements seems important for shift-reduce dependency parsing.
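A sketch of the resulting forward pass in NumPy, combining the embedding lookup with the cube activation (the dimensions follow the paper's setup of 48 features, 50-dimensional embeddings, and a 200-unit hidden layer; the vocabulary size, random initialization, and three-way action space are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, d_embed, n_hidden, n_actions = 48, 50, 200, 3

E = rng.normal(scale=0.01, size=(10000, d_embed))   # embedding table
W1 = rng.normal(scale=0.01, size=(n_hidden, n_feats * d_embed))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_actions, n_hidden))

def forward(feature_ids):
    x = E[feature_ids].reshape(-1)   # look up and concatenate 48 embeddings
    h = (W1 @ x + b1) ** 3           # cube activation: h = (W1 x + b1)^3
    z = W2 @ h
    z = z - z.max()                  # shift for numerical stability
    return np.exp(z) / np.exp(z).sum()   # softmax over parser actions

probs = forward(rng.integers(0, 10000, size=n_feats))
```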

Training Data
Automatically construct dependency parses from treebank phrase-structure parse trees.
Compute the correct sequence of “oracle” shift-reduce parse actions (transitions t_i) at each step from the gold-standard parse trees.
Determine the correct action sequence using a “shortest stack” oracle that always prefers LeftArc over Shift.
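A minimal sketch of such an oracle, assuming gold heads are given as a list (head[i] is the index of token i's head, 0 = ROOT) and arcs is the set of (head, dependent) pairs built so far (the function name is illustrative):

```python
def oracle_action(stack, head, arcs):
    """Next gold arc-standard transition; prefers arcs over Shift."""
    if len(stack) >= 2:
        s1, s2 = stack[-1], stack[-2]
        if head[s2] == s1:            # top of stack is s2's head: LeftArc
            return "left_arc"
        if head[s1] == s2 and all(    # RightArc only once s1 has already
                (s1, d) in arcs       # collected every one of its dependents
                for d in range(1, len(head)) if head[d] == s1):
            return "right_arc"
    return "shift"
```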

Training Algorithm
The training objective is to minimize the cross-entropy loss of the predicted actions, plus an L2-regularization term (see the formula below).
Initialize word embeddings to precomputed values such as Word2Vec.
Use AdaGrad with dropout to compute model parameters that approximately minimize this objective.
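The objective, as given in Chen and Manning (2014), where $t_i$ is the correct transition at step $i$ and $p_{t_i}$ is its predicted probability:

$$L(\theta) = -\sum_i \log p_{t_i} + \frac{\lambda}{2}\lVert\theta\rVert^2$$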

Evaluation Metrics for Dependency Parsing
Unlabeled Attachment Score (UAS): the percentage of tokens for which the system predicts the correct parent.
Labeled Attachment Score (LAS): the percentage of tokens for which the system predicts the correct parent with the correct arc label.
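Both metrics are simple token-level accuracies; a minimal sketch (the function name and the per-token (head, label) pair representation are illustrative):

```python
def attachment_scores(gold, pred):
    """gold, pred: one (head, label) pair per token."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

# One wrong label out of four tokens: UAS = 1.0, LAS = 0.75.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "iobj")]
print(attachment_scores(gold, pred))  # (1.0, 0.75)
```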

Sample Results on Penn WSJ Treebank

Conclusions
Shift-reduce parsing is an efficient and effective alternative to standard PCFG parsing.
It is particularly effective for dependency parsing.
It models the deterministic, left-to-right processing that seems to characterize human parsing (and is therefore subject to garden paths).
Neural methods for selecting parse actions give state-of-the-art results.