Issues in Computational Linguistics: Parsing and Generation Dick Crouch and Tracy King.



Course Purpose To introduce linguistics students and theoretically-oriented LFG researchers to high-level issues in computational linguistics To introduce computer science students and researchers to computational linguistics from an LFG perspective

Course Outline Parsing and generation (Day 1) Grammar engineering (Day 2) Disambiguation and packing (Day 3) Semantics (Day 4)

Parsing and Generation Parsing –Context-free parsing –Unification Generation and reversibility

What is parsing? Have a natural language string (sentence) Want to produce a structure/analysis for it Applications use this structure as input

Types of structures for parse analyses Phrase structure trees (LFG c-structure) Feature attribute value matrices (LFG f-structure) Shallow parsing –Part of speech tagging I/PRP saw/VBD her/PRP duck/VB ./PUNCT –Chunking [when I read] [a sentence], [I read it] [a chunk] [at a time].

Phrase structure trees Constituency (and type of constituent) –NPs, VPs, Ss Dominance –S dominates NP and VP Linear order –NP precedes VP [tree diagram: S dominating NP and VP]

Context free grammar Set of rewrite rules NP --> Det Nbar NP --> ProperNoun Nbar --> Noun Nbar --> Adj Noun Lexicon of "leaves" Det --> a Det --> the Noun --> box Start symbol S

Parsing with CFG Have a string Want a tree spanning it Need: –a grammar –a way to apply the grammar to the string Top down parsing Bottom up parsing Chart parsing

Top down parsing with a CFG Start with the start symbol (e.g. S) Expand it in all possible ways (1st “ply”) Expand the constituents of the new trees Once you reach parts of speech, match them against the words Reject trees that don’t match

Example CFG rules and parse S -> NP VP V -> push S -> VP N -> push NP -> D N N -> boxes NP -> N D -> the VP -> V NP String: Push the boxes [tree diagrams: successive top-down expansions of S]
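To make the procedure concrete, here is a minimal backtracking top-down recognizer for the toy grammar above (an illustrative sketch; the names RULES, LEXICON, and parse are mine, and real parsers also build trees and handle left recursion):

```python
# Backtracking top-down recognition: expand nonterminals left-to-right,
# match preterminals against the words, backtrack on failure.
RULES = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["D", "N"], ["N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"push": {"V", "N"}, "the": {"D"}, "boxes": {"N"}}

def parse(symbols, words):
    """True if the symbol sequence derives exactly the word sequence."""
    if not symbols:
        return not words
    head, rest = symbols[0], symbols[1:]
    if head in RULES:  # nonterminal: try each expansion in turn
        return any(parse(expansion + rest, words) for expansion in RULES[head])
    # preterminal: match against the next word's possible parts of speech
    return bool(words) and head in LEXICON.get(words[0], set()) and parse(rest, words[1:])

print(parse(["S"], ["push", "the", "boxes"]))  # True (imperative "Push the boxes")
```

The dead ends it explores first (the S -> NP VP expansion, with "push" as a noun) are exactly the wasted work the problems slide points out.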

Problems: top down parsing Generate many trees inconsistent with the input Duplicate work by rebuilding constituents

Bottom up parsing with a CFG Apply rules to the words in the string First do part-of-speech rules Expand these upwards Reject any trees that do not top out with the start symbol

Example CFG rules and parse S -> NP VP V -> push S -> VP N -> push NP -> D N N -> boxes NP -> N D -> the VP -> V NP String: Push the boxes [tree diagrams: part-of-speech edges combined upwards into NPs, VPs, and S]
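The same toy grammar can be recognized bottom-up by exhaustively rewriting substrings that match a rule's right-hand side until only the start symbol remains (an illustrative, exponential sketch with names of my choosing; practical systems use CKY or a chart):

```python
# Naive bottom-up recognition: find any substring matching a rule's
# right-hand side, replace it with the left-hand side, and recurse.
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("NP", ("D", "N")), ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("V", ("push",)), ("N", ("push",)), ("N", ("boxes",)), ("D", ("the",)),
]

def bottom_up(seq, start="S"):
    """True if some sequence of rewrites reduces seq to the start symbol."""
    if seq == (start,):
        return True
    for i in range(len(seq)):
        for lhs, rhs in RULES:
            if seq[i:i + len(rhs)] == rhs:
                if bottom_up(seq[:i] + (lhs,) + seq[i + len(rhs):], start):
                    return True
    return False

print(bottom_up(("push", "the", "boxes")))  # True
```

The blind alleys it tries (e.g. tagging "push" as N when no S can result) illustrate the wasted work noted on the problems slide.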

Problems: Bottom up parsing Explore trees that cannot result in an S Duplicate work by rebuilding constituents

Chart Parsing: Earley algorithm Parallel top down search Eliminate repetitive solution of subtrees Left-right pass creates an array/chart At the end of the string, chart encodes all the possible parses –each subtree only represented once

Chart Processing Look up words in the lexicon: find all lexical entries that match substrings of the input, create an edge for each, and put it on the agenda Until the agenda is empty, remove edges from it and put them in the chart –Active edge added to chart: schedule it to combine with inactive edges to its right –Inactive edge added to chart: schedule it to combine with active edges to its left Solutions are edges that cover all the input
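A compact Earley-style recognizer shows the chart idea in code (an illustrative sketch, not XLE's implementation; the lexicon is folded into the grammar, and states are (lhs, rhs, dot position, origin) tuples):

```python
# Earley recognition: chart[i] holds dotted states after reading i words.
# Predictor expands nonterminals, scanner matches words, completer
# advances states waiting on a finished constituent -- each state is
# entered only once, so subtrees are never re-derived.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["D", "N"], ["N"]],
    "VP": [["V", "NP"]],
    "D": [["the"]],
    "N": [["push"], ["boxes"]],
    "V": [["push"]],
}

def earley(words, start="S"):
    chart = [set() for _ in range(len(words) + 1)]
    chart[0] = {(start, tuple(rhs), 0, 0) for rhs in GRAMMAR[start]}
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot == len(rhs):  # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
            elif rhs[dot] in GRAMMAR:  # predictor
                for exp in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], tuple(exp), 0, i)
                    if new not in chart[i]:
                        chart[i].add(new); agenda.append(new)
            elif i < len(words) and words[i] == rhs[dot]:  # scanner
                chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in chart[len(words)])

print(earley(["push", "the", "boxes"]))  # True
```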

[Chart diagrams stepping through the algorithm: the initial situation, steps 1–4, and the final situation (images omitted)]

The Context-free parsing paradox The number of structures a sentence may have grows exponentially with its length, but You can characterize them all in polynomial (cubic) time HOWEVER It takes exponential time to examine (read out) all the structures
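The blow-up is easy to quantify: the number of binary-branching trees over n + 1 words is the n-th Catalan number, which grows exponentially (this computation is illustrative, not from the slides):

```python
# Catalan numbers count binary-branching trees over n+1 leaves; their
# exponential growth is why reading out all parses is costly even
# though the packed chart is built in cubic time.
from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 9)])  # [1, 2, 5, 14, 42, 132, 429, 1430]
```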

UNIFICATION Feature structures –LFG f(unctional)-structures Unifying feature structures –Merging f-structure information

Feature Structures Sets of feature-value pairs –attribute-value matrices Values can be –atomic [ PERS 3 ] [ NUM sg ] –feature-structures [ SUBJ [ PERS 3 ] ] [ TNS-ASP [ TENSE present ] ]

Unifying feature structures Merge the information of two feature structures –Takes two feature structures as arguments –Returns a feature structure if successful: if the information is compatible Simple examples: ✓ [ NUM sg ] UNI [ NUM sg ] = [ NUM sg ] ✗ [ NUM sg ] UNI [ NUM pl ] fails

More unification Empty value is compatible: a value can unify with the empty value ✓ [ NUM sg ] UNI [ NUM [ ] ] = [ NUM sg ] Merger: two different features can unify to form a complex structure ✓ [ NUM sg ] UNI [ PERS 3 ] = [ NUM sg PERS 3 ]
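The behaviour described in the last two slides can be sketched over nested Python dicts (illustrative only; the function name unify and the dict encoding are mine, and real unifiers use variables and structure sharing):

```python
# Feature-structure unification over nested dicts: atomic values must
# match, the empty structure unifies with anything, and dicts merge
# attribute by attribute.
def unify(f1, f2):
    """Return the merged feature structure, or None if incompatible."""
    if f1 == f2:
        return f1
    if f1 == {}:  # empty value is compatible with anything
        return f2
    if f2 == {}:
        return f1
    if isinstance(f1, dict) and isinstance(f2, dict):
        result = dict(f1)
        for attr, val in f2.items():
            if attr in result:
                merged = unify(result[attr], val)
                if merged is None:
                    return None  # clash, e.g. NUM sg vs NUM pl
                result[attr] = merged
            else:
                result[attr] = val  # merger of different features
        return result
    return None  # atomic clash

print(unify({"NUM": "sg"}, {"PERS": "3"}))  # {'NUM': 'sg', 'PERS': '3'}
print(unify({"NUM": "sg"}, {"NUM": "pl"}))  # None
```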

Mathematics and computation LFG: simple combination of two simple theories –Context-free grammars for c-structures –Quantifier-free theory of equality for f-structures Both theories are easy to compute –Cubic CFG parsing –Linear equation solving Combination is difficult: Recognition problem is NP-complete –Exponential/intractable in the worst case

Sources of worst-case complexity Exponentially many trees –CFG parser quickly produces packed representation of all trees –But trees must be enumerated to produce f-structure constraints, determine satisfiability –CFG can be exponentially ambiguous Boolean combinations of per-tree constraints (↑ SUBJ NUM) = SG ∨ (↑ SUBJ PERS) = 3 Exponentially many exponential problems

Strategies for efficient processing Restrict the formalism –Still expressive enough for natural language? Drop the curve –Better CFG parser, unifier –Compile to (virtual) machine language –Wait Change the shape of the curve (on typical cases) –Bet on common cases, optimize for those –Polynomial representation of exponentially many ambiguities when solutions are combinations of independent subtrees

Parsing summary Have a natural language string (sentence) Want to produce a structure for it –Tree (LFG c-structure) –Attribute value matrix (LFG f-structure) Some computational issues –Efficiency –Ambiguity

Generation: Basic idea Have: –an analysis (LFG f-structure) –a grammar (reversibility issues) Want the natural language string(s) that correspond to the analysis Issues –Is there any such string? –Underspecified input

Reversibility Parsing grammar and generation grammar need not be the same However, if they are: –only have to write one grammar and lexicon (modulo minor differences) –easier maintenance

Equivalence classes Generation produces different equivalence classes than parsing For a given f-structure, there may be several c-structures that give rise to it –Grammar dependent Examples: –{ Two | 2 } { didn’t | did not } come. –{ Yesterday we fell. | We fell yesterday. }

C-structure forest [packed tree diagram covering “Two/2 did not/didn’t come”]

C-structure forest with f-structure correspondence [packed tree diagram for “Two/2 did not/didn’t come” linked to the f-structure: PRED ‘come’, SUBJ [ PRED ‘two’, NUM pl ], ADJ {[ PRED ‘not’ ]}, TNS past]

Generating the c-structure/string For each f-structure (e.g. SUBJ, ADJ), the generator produces all the c-structures consistent with it This gives a generation chart –similar to a parsing chart –different equivalence classes Completeness –everything generated at least once –semantic forms (PREDs) generated at most once: the red, red dog vs. the red dog

Generation chart: Kay/Maxwell Each edge lists the semantic facts generated beneath it Root edge must list all the semantic facts Exponential in number of Adjuncts due to optionality in the grammar

XLE generation algorithm Two phases Quickly construct a guide, a representation from which at least all the proper strings, and some incorrect ones, can be produced. –The guide is constructed on the basis of only some of the functional information. –The guide gives a finite set of possibilities. Run all (or most) of the parsing machinery on these possibilities to evaluate the functional constraints.

XLE generation issues XLE is oriented towards process efficiency. Issues: –If the generated language is infinite, then XLE must depend on some pumping mechanism to enumerate (in principle) all possible strings on the basis of a finite basis-set. –Underspecified input

Underspecified input F-structure may be underspecified (features may be missing) –machine translation –knowledge representation Generator must be able to fill in the missing values –which attributes and values are addable –how to stop the process

Bounded underspecification Problem if the input is an arbitrary underspecification of some intended f-structure (context-free results do not hold): generation fails for arbitrary extensions OK if only a bounded number of features can be added: the set of all possible extensions is then finite (the context-free result does hold)

Generation: Where do the analyses come from? To generate, need an analysis (f-structure) Producing these structures can be challenging –for a given sentence –for a discourse (connected set of sentences) F-structure rewrites –translation –sentence condensation Challenge comes with non-f-structure sources –Question answering

Conclusions Parsing: have a string, produce an analysis –LFG: produce c-structure and f-structure –Context free backbone for c-structure –Unification for f-structure Generation: have an f-structure, produce a string –Efficient, complete generation –Reversibility of the grammar –Underspecified input