Supertagging CMSC 35100 Natural Language Processing January 31, 2006.

Roadmap
- Motivation
- Tagging, parsing & lexicalization
- Supertags: definition & examples
- Parsing as disambiguation
- Structural filters
- Unigram & n-gram models
- Results & discussion

Motivation: Good, Robust Parsing
Main approaches:
- Finite-state parsers: shallow parsing; longest match resolves ambiguity
  - Hand-crafted
  - Partitioned into domain-independent and domain-dependent parts
- Statistical parsers: assign some structure, with a probability, to any string
  - Automatically extracted from manual annotations
  - Not linguistically transparent; hard to modify

Lexicalization
- Integrates the syntax and semantics of lexical items
- Enforces subcategorization and semantic constraints
- Finite set of elementary structures (trees, strings, etc.), each anchored on a lexical item
- Each lexical item is associated with at least one elementary grammar structure
- A finite set of operations combines the elementary structures

Framework
FB-LTAG: Feature-Based, Lexicalized Tree-Adjoining Grammar
- Elementary trees:
  - Carry a lexical item (the anchor) on the frontier
  - Provide a complex description of the anchor
  - Specify the domain of locality for syntactic/semantic constraints
  - Come in two kinds: initial (non-recursive) and auxiliary (recursive)
- Derived trees are built by substitution and adjunction

Supertags
- Inspired by POS taggers:
  - Locally resolve much POS ambiguity before parsing (96-97% accurate)
  - Use limited context, e.g. trigrams
- Elementary trees localize dependencies: all and only the elements dependent on the anchor are in the tree
- A supertag is an elementary structure
- Highly ambiguous: one supertag per lexical item per distinct use (a toy lexicon is sketched below)
  - A word with a single POS probably still has many supertags, e.g. a word that is always a verb but has many subcategorization frames
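To make the "one supertag per distinct use" point concrete, here is a minimal sketch of a supertag lexicon in Python; the tree names and entries are invented for illustration, not taken from XTAG.

# Toy supertag lexicon: each word maps to one elementary tree name per
# distinct syntactic use. Tree names are invented placeholders.
SUPERTAG_LEXICON = {
    "saw": ["t_transitive_verb",    # "John saw Mary"
            "t_intransitive_verb",  # "John saw"
            "t_noun"],              # "the saw"
    "the": ["t_determiner"],
    "man": ["t_noun"],
}

# A POS tagger leaves "saw" with at most two tags (verb/noun); a
# supertagger must still choose among all of its elementary trees.
print(len(SUPERTAG_LEXICON["saw"]))  # -> 3 candidate supertags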

Extended Domain of Locality
- Each supertag must contain all and only the arguments of its anchor within the structure
- For each lexical item, the grammar must contain a supertag for every syntactic environment in which the item can appear

Factoring Recursion
- Recursive constructs are represented as auxiliary trees/supertags
- Initial supertags define the domains of locality for agreement and subcategorization
- Auxiliary trees can capture long-distance dependencies via adjunction

Supertags: Ambiguity and Parsing
- A complete parse uses exactly one supertag per word, so the supertags must be selected in order to parse
- Problem: massive ambiguity
- Solution: manage it with local disambiguation
  - Supertags localize dependencies, so local n-gram constraints can be applied before parsing
  - POS disambiguation makes parsing easier; supertag disambiguation makes it almost trivial, leaving just structure combination

Example

Structural Filtering
Simple local tests for whether a supertag can be used (a sketch follows this list):
- Span of the supertag, i.e. the minimum number of lexical items it covers, can't be larger than the input string
- Left/right span constraint: the span to the left or right of the anchor can't be longer than the corresponding side of the input
- Lexical items: a supertag can't be used if its fixed terminals don't appear in the input
- Features, etc.
Reduces ambiguity by 50% before parsing
- Verbs, especially light verbs, are the worst case: even after a 50% reduction, more than 250 supertags per POS remain
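A minimal sketch of these filters, assuming each supertag records its minimum span, the material required on each side of its anchor, and any fixed terminals; the class layout is an assumption, not the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class Supertag:
    name: str
    min_span: int        # minimum number of lexical items covered
    left_span: int       # items required to the left of the anchor
    right_span: int      # items required to the right of the anchor
    terminals: frozenset = field(default_factory=frozenset)  # fixed words

def passes_filters(tag: Supertag, sentence: list, anchor_pos: int) -> bool:
    n = len(sentence)
    if tag.min_span > n:                     # span filter
        return False
    if tag.left_span > anchor_pos:           # left span constraint
        return False
    if tag.right_span > n - anchor_pos - 1:  # right span constraint
        return False
    # lexical filter: every fixed terminal must occur in the input
    return tag.terminals <= set(sentence)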

N-gram Supertagging
- Initial approach: trigram model over (POS, supertag) pairs, trained on 5K WSJ sentences
  - 68% accuracy: small corpus, little smoothing
- Alternative: dependency model, avoiding a fixed context length
  - A dependency holds if one tree substitutes or adjoins into another
  - Limited by the size of the LTAG-parsed corpus, and too much like regular parsing

Smoothed N-gram Models: Data
Training data comes from two sources:
- XTAG parses of WSJ, IBM, and ATIS sentences: a small corpus, but clean TAG derivations
- Converted Penn Treebank WSJ sentences: each lexical item is associated with a supertag via its parse
  - Requires heuristics over local tree contexts (labels of dominating nodes, siblings, and the parent's siblings), so the resulting supertags are approximate

Smoothed N-gram Models: Unigram
Disambiguation redux: assume the structural filters have already been applied
- Unigram approach: select the supertag for each word by the word's own preference (sketched below)
  - Pr(t|w) = freq(t,w) / freq(w), i.e. the most frequent supertag for the word in training
  - Unseen word: back off to the most frequent supertag for the word's POS
- Results: 73-77% top-1 accuracy, vs. 91% for POS tagging
- Errors: verb subcategorization, PP attachment, NP head/modifier distinctions
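A sketch of the unigram model with POS backoff; the training-triple format is an assumption.

from collections import Counter, defaultdict

word_tag = defaultdict(Counter)  # freq(t, w)
pos_tag = defaultdict(Counter)   # freq(t, pos), used for backoff

def train(corpus):
    # corpus: iterable of (word, pos, supertag) triples from parsed data
    for word, pos, tag in corpus:
        word_tag[word][tag] += 1
        pos_tag[pos][tag] += 1

def unigram_supertag(word, pos):
    # Pr(t|w) = freq(t,w)/freq(w); the argmax needs only raw counts.
    if word in word_tag:
        return word_tag[word].most_common(1)[0][0]
    # Unseen word: most frequent supertag for its POS
    # (assumes the POS itself was seen in training).
    return pos_tag[pos].most_common(1)[0][0]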

Smoothed N-gram Models: Trigram
Enhances the previous models: a lexicalized trigram model over (word, supertag) pairs, adding context to the unigram's per-word preference
- Ideally: T* = argmax_T P(T1, ..., Tn) * P(W1, ..., Wn | T1, ..., Tn)
- Really: a trigram approximation with Good-Turing smoothing and Katz backoff (a decoding sketch follows)
- Results: up to 92% accuracy with 1M words of training data
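A sketch of decoding under the standard trigram approximation, P(Ti | Ti-2, Ti-1) * P(Wi | Ti); the probability functions are assumed to be smoothed already (e.g. Good-Turing counts with Katz backoff).

import math

def viterbi_trigram(words, candidates, p_trans, p_emit):
    # candidates[i]: supertags allowed for words[i] (post-filtering)
    # p_trans(t, u, v) = P(v | t, u); p_emit(w, t) = P(w | t)
    # A state is (second-to-last tag, last tag); scores in log space.
    best = {("<s>", "<s>"): (0.0, [])}
    for i, word in enumerate(words):
        new_best = {}
        for (t, u), (score, path) in best.items():
            for v in candidates[i]:
                s = (score + math.log(p_trans(t, u, v))
                           + math.log(p_emit(word, v)))
                if (u, v) not in new_best or s > new_best[(u, v)][0]:
                    new_best[(u, v)] = (s, path + [v])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]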

Supertagging & Parsing
Supertagging serves as a front end to the parser, analogous to POS tagging; the key question is how to disambiguate the supertags:
A) Pick the single most probable supertag by the trigram method
- 4 seconds per sentence to parse, vs. 120 without supertagging
- But if the tag is wrong, the parse is wrong
B) Pick the n-best supertags (see the sketch below)
- Recovers many more parses, but is slower
- Still fails on ill-formed input
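A sketch of option B: pass the n highest-scoring candidates per word to the parser instead of committing to one. The score function (e.g. a probability from the trigram model) is an assumption here.

import heapq

def n_best_supertags(candidates, score, n=3):
    # candidates: the supertags for one word; returns the n best by score
    return heapq.nlargest(n, candidates, key=score)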

Conclusion
- Integrates a lightweight n-gram approach with a rich, lexicalized representation
- N-gram supertagging produces an "almost parse"
- Good tagging accuracy and parsing effectiveness (> 90%)
- Faster than XTAG by a factor of 30
- Remaining issues: treebank conversion, ill-formed input, etc.

Applications
- Lightweight dependency analysis
- Enhanced information retrieval: exploiting the syntactic structure of the query can raise precision from 33% to 79%
- Supertagging in other formalisms, e.g. CCG

Lightweight Dependency Analysis
Heuristic, linear-time, deterministic procedure (sketched below):
- Pass 1: modifier supertags; Pass 2: non-modifier supertags
- To compute the dependencies for a supertag s (with anchor w):
  - For each frontier node d in s, connect w to a word to its left or right, chosen by position
  - Label the arc with d's node label
- Precision = 82.3, Recall = 93.8
- Robust to sentence fragments
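A highly simplified sketch of the two-pass analyzer. Each supertag is assumed to expose is_modifier() and a list of frontier slots, each a (direction, label) pair with direction "L" or "R"; the real procedure also checks that the connected word's supertag can fill the slot, which is omitted here.

def lightweight_deps(words, supertags):
    deps = []  # (head index, dependent index, slot label) triples
    for want_modifier in (True, False):    # pass 1, then pass 2
        for i, tag in enumerate(supertags):
            if tag.is_modifier() != want_modifier:
                continue
            for direction, label in tag.frontier_slots():
                # connect to the adjacent word on the indicated side
                j = i - 1 if direction == "L" else i + 1
                if 0 <= j < len(words):
                    deps.append((i, j, label))
    return deps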