May 2006 · CLINT-LN Parsing
Computational Linguistics Introduction: Parsing with Context Free Grammars



Slide 2: Chomsky Hierarchy

Slide 3: Weak Equivalence
A grammar should generate all and only the sentences of the language under investigation. Let H be the language under investigation and G the grammar we are developing.
- All: the grammar should generate every sentence in the language, i.e. for any s in H, s is also in L(G).
- Only: the grammar should generate nothing but sentences in the language, i.e. for any s in L(G), s is also in H.

Slide 4: All and Only
[Diagram: L(G) = H, the language generated by G coincides exactly with H.]

Slide 5: Overgeneration
[Diagram: H is a proper subset of L(G).]

Slide 6: Overgeneration
Basic problem: L(G) is larger than H. There are sentences generated by the grammar that are not in H, so the "only" constraint is violated: the grammar is too weak. Example: a grammar which ignores number and gender.

Slide 7: Undergeneration
[Diagram: L(G) is a proper subset of H.]

Slide 8: Undergeneration
Basic problem: H is larger than L(G). There are sentences in H that are not generated by the grammar, so the "all" constraint is violated: the grammar is too strong. Examples (for H = a natural language):
- a grammar which lacks recursion;
- a finite-state grammar.

Slide 9: Weak and Strong Equivalence
A grammar/lexicon G generates a characteristic language L(G). Grammars G1 and G2 are said to be weakly equivalent if L(G1) = L(G2). A grammar G also assigns one or more phrase structures to any s in L(G). Weakly equivalent grammars G1 and G2 are said to be strongly equivalent if, in addition, they assign identical phrase structures to every s in L(G1).

Slide 10: Weak Equivalence
G1: A → a, A → aA
G2: A → a, A → Aa
Both grammars generate the language a⁺, so they are weakly equivalent; but G1 assigns right-branching structures and G2 left-branching ones, so they are not strongly equivalent.
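
Weak equivalence of the two grammars on this slide (A → a | aA versus A → a | Aa) can be checked mechanically, at least up to a length bound. The sketch below is mine, not from the slides: it enumerates sentential forms breadth-first, treating uppercase symbols as nonterminals.

```python
# Enumerate the terminal strings of length <= max_len generated by a toy
# one-character-symbol grammar, to check weak equivalence of the
# right-recursive and left-recursive grammars from the slide.

def language(rules, start="A", max_len=4):
    """Expand sentential forms; collect the terminal strings derived."""
    results = set()
    frontier = [start]
    while frontier:
        form = frontier.pop()
        if all(sym.islower() for sym in form):   # all terminals: a sentence
            results.add(form)
            continue
        if len(form) > max_len:                  # prune: forms only grow
            continue
        # expand the leftmost nonterminal with every matching rule
        i = next(j for j, sym in enumerate(form) if sym.isupper())
        for lhs, rhs in rules:
            if lhs == form[i]:
                frontier.append(form[:i] + rhs + form[i + 1:])
    return results

right_rec = [("A", "a"), ("A", "aA")]   # A -> a | aA
left_rec  = [("A", "a"), ("A", "Aa")]   # A -> a | Aa

print(language(right_rec) == language(left_rec))  # True: weakly equivalent
```

Up to the bound, both grammars yield exactly {a, aa, aaa, aaaa}; the derivation trees, however, differ.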

Slide 11: Appropriate Structure
The structure assigned by the grammar should be appropriate. The structure should:
- be understandable;
- allow us to make generalisations;
- reflect the underlying meaning of the sentence.

Slide 12: Ambiguity
A grammar is ambiguous if it assigns two or more structures to the same sentence. The grammar should not generate too many possible structures for the same sentence. There is a tradeoff between ambiguity and clarity: too much detail can obscure the design principles; too little detail means that the grammar is undercommitted.

Slide 13: Limitations of CF Grammars
Simple CF grammars tend to overgenerate. The only mechanism available to control overgeneration is to invent new categories, and the proliferation of categories soon becomes intractable. Problems include:
- the size of the grammar;
- the understandability of the grammar.

Slide 14: Criteria for Evaluating Grammars
- Does it undergenerate?
- Does it overgenerate?
- Does it assign appropriate structures to the sentences it generates?
- Is it simple to understand? How many rules are there? Does it contain generalisations, or just special cases?
- How ambiguous is it? How many structures does it assign to a given sentence?

Slide 15: CF Phrase Structure Rules
s → np vp
np → D N
vp → V
vp → V np
(4 rules)
A nice grammar, but it overgenerates. One solution: invent more categories (nps, nppl, vps, vppl, etc.).
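
The overgeneration is easy to exhibit concretely. In the sketch below (the lexicon and the helper `expand` are my own illustration, not part of the slides), the 4-rule grammar happily produces agreement-violating strings alongside grammatical ones:

```python
# The 4-rule grammar from the slide, plus a tiny invented lexicon.
# Enumerating its output shows overgeneration: "the dogs barks" is
# generated right next to "the dog barks".
from itertools import product

grammar = {
    "s":  [["np", "vp"]],
    "np": [["D", "N"]],
    "vp": [["V"], ["V", "np"]],
}
lexicon = {
    "D": ["the"],
    "N": ["dog", "dogs"],
    "V": ["barks", "bark"],
}

def expand(symbol):
    """All terminal strings derivable from symbol (finite for this grammar)."""
    if symbol in lexicon:
        return [[w] for w in lexicon[symbol]]
    out = []
    for rhs in grammar[symbol]:
        for parts in product(*(expand(sym) for sym in rhs)):
            out.append([w for part in parts for w in part])
    return out

sentences = {" ".join(s) for s in expand("s")}
print("the dog barks" in sentences)    # True: grammatical, generated
print("the dogs barks" in sentences)   # True: agreement violation, also generated
```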

Slide 16: CF Phrase Structure Rules with Number Agreement
s → nps vps
s → nppl vppl
nps → DS NS
nppl → DPL NPL
vps → VS
vps → VS nps
vps → VS nppl
vppl → VPPL
vppl → VPPL nps
vppl → VPPL nppl
(10 rules)

Slide 17: Constraints and Information Structures
PATR-II handles this problem by augmenting CF rules with constraints between constituents. The basic idea is that each constituent of a CF rule is associated with an information structure; we then express constraints between these information structures.

Slide 18: Example of a PATR Rule with Number Constraints
Rule:
s → np vp
<np num> = <vp num>

Slide 19: Example of a Grammar with Number Constraints
s → np vp
<np num> = <vp num>
np → D N
<D num> = <N num>
vp → V
<vp num> = <V num>
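
The constraint mechanism can be mimicked in miniature. The following is a toy rendering of the idea, not the PATR-II implementation: each constituent carries a feature dictionary, and a rule only applies when the constrained features unify. The feature names, lexicon, and fixed D-N-V sentence shape are all my own simplifying assumptions.

```python
def unify_num(a, b):
    """Unify the 'num' feature of two constituents; None signals failure."""
    if a.get("num") is None:
        return b.get("num")
    if b.get("num") is None or a["num"] == b["num"]:
        return a["num"]
    return None  # clash, e.g. sg vs pl

lexicon = {
    "the":   {"cat": "D", "num": None},   # underspecified: the dog / the dogs
    "dog":   {"cat": "N", "num": "sg"},
    "dogs":  {"cat": "N", "num": "pl"},
    "barks": {"cat": "V", "num": "sg"},
    "bark":  {"cat": "V", "num": "pl"},
}

def parse_s(words):
    """s -> np vp with <np num> = <vp num>, for D N V strings only."""
    d, n, v = (lexicon[w] for w in words)
    np_num = unify_num(d, n)              # np -> D N, <D num> = <N num>
    if np_num is None:
        return False
    return unify_num({"num": np_num}, v) is not None

print(parse_s(["the", "dog", "barks"]))   # True
print(parse_s(["the", "dogs", "barks"]))  # False: agreement clash blocked
```

Two small rules plus unification do the work that ten category-multiplied CF rules did on slide 16.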

Slide 20: Summary
Pure CFGs become unwieldy when we try to constrain them to incorporate, for example, agreement information. PATR-II deals with this problem by associating information structures and constraints with each rule constituent. Information structures are often referred to as F-structures.

Slide 21: Grammar versus Parsing
A grammar is a description of a language: it abstractly associates structures with all and only the strings of the language. A parser is an implementation of an algorithm that actually discovers the structures assigned by a grammar to a sentence. Typically there are several different parsing algorithms for achieving this:
- top-down strategy;
- bottom-up strategy.

Slide 22: Parse Tree
A valid parse tree for a grammar G is a tree:
- whose root is the start symbol of G;
- whose interior nodes are nonterminals of G;
- in which the children of a node T (from left to right) correspond to the symbols on the right-hand side of some production for T in G;
- whose leaf nodes are terminal symbols of G.
Every sentence generated by a grammar has a corresponding parse tree, and every valid parse tree exactly covers a sentence generated by the grammar.
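
The validity conditions above translate directly into a checker. This sketch is mine (the nested-tuple tree encoding, toy grammar, and lexicon are invented for illustration); it checks the "book that flight" style of tree used on the next slide.

```python
# A parse tree is (label, child, child, ...); a preterminal node has a
# single string child, the word it covers.
GRAMMAR = {
    "S":   [("VP",)],
    "VP":  [("V", "NP")],
    "NP":  [("Det", "Nom")],
    "Nom": [("N",)],
}
LEXICON = {"Det": {"that"}, "N": {"flight"}, "V": {"book"}}

def valid(node, symbol="S"):
    """True iff node is a valid parse tree rooted at symbol."""
    label, children = node[0], node[1:]
    if label != symbol:
        return False
    if len(children) == 1 and isinstance(children[0], str):
        return children[0] in LEXICON.get(label, set())  # preterminal/word
    child_labels = tuple(c[0] for c in children)
    if child_labels not in GRAMMAR.get(label, []):        # must match a rule
        return False
    return all(valid(c, c[0]) for c in children)          # check subtrees

tree = ("S", ("VP", ("V", "book"),
              ("NP", ("Det", "that"), ("Nom", ("N", "flight")))))
print(valid(tree))  # True
```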

Slide 23: Parsing Problem
Given grammar G and sentence A, find all valid parse trees for G that exactly cover A.
[Parse tree for "book that flight": S over VP; VP over V ("book") and NP; NP over Det ("that") and Nom; Nom over N ("flight").]

Slide 24: Soundness and Completeness
A parser is sound if every parse tree it returns is valid. A parser is complete for grammar G if, for all s ∈ L(G):
- it terminates;
- it produces the corresponding parse tree.
For many purposes, we settle for sound but incomplete parsers.

Slide 25: Top Down
A top-down parser tries to build the tree from the root node S down to the leaves, by replacing nodes with non-terminal labels with the RHS of corresponding grammar rules. Nodes with pre-terminal (word class) labels are compared to the input words.
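
A minimal sketch of this strategy as a recogniser, under my own assumptions (toy grammar, lexicon, and example sentence): nonterminals are expanded from S downward, and preterminals are matched against the input.

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"Det": {"the"}, "N": {"dog", "cat"}, "V": {"saw"}}

def parse(symbols, words):
    """Can the symbol sequence derive exactly this word sequence?"""
    if not symbols:
        return not words                       # success iff input consumed
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:                       # preterminal: match a word
        return bool(words) and words[0] in LEXICON[first] \
            and parse(rest, words[1:])
    return any(parse(rhs + rest, words)        # nonterminal: try each rule
               for rhs in GRAMMAR.get(first, []))

print(parse(["S"], "the dog saw the cat".split()))  # True
```

Note that this only works because the toy grammar has no left recursion; slide 30 returns to that problem.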

Slide 26: Top Down Search Space
[Diagram: the search space from the start node (S) down to the goal node.]

Slide 27: Bottom Up
Each state is a forest of trees. The start node is a forest of nodes labelled with pre-terminal categories (word classes derived from the lexicon). Transformations look for places where the RHS of a rule can fit; any such place is replaced with a node labelled with the LHS of the rule.
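
The same idea as a recogniser, again a sketch with an invented toy grammar and lexicon: start from the preterminal forest for the input and exhaustively try reductions, looking for a forest consisting of a single S.

```python
GRAMMAR = [
    ("S",  ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
]
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

def bottom_up(words):
    """Exhaustive search over reductions; True iff some path reaches (S,)."""
    start = tuple(LEXICON[w] for w in words)   # preterminal forest
    seen, frontier = {start}, [start]
    while frontier:
        forest = frontier.pop()
        if forest == ("S",):
            return True
        for lhs, rhs in GRAMMAR:               # try every reduction site
            for i in range(len(forest) - len(rhs) + 1):
                if forest[i:i + len(rhs)] == rhs:
                    new = forest[:i] + (lhs,) + forest[i + len(rhs):]
                    if new not in seen:
                        seen.add(new)
                        frontier.append(new)
    return False

print(bottom_up("the dog saw the cat".split()))  # True
```

Each reduction shrinks the forest, so the search terminates; the cost is that many reduction sequences explore subtrees that can never lead to an S, exactly the weakness the next slides discuss.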

Slide 28: Bottom Up Search Space
[Diagram: the bottom-up search space.]

Slide 29: Top Down vs Bottom Up - General
Top down:
- For: never wastes time exploring trees that cannot be derived from S.
- Against: can generate trees that are not consistent with the input.
Bottom up:
- For: never wastes time building trees that cannot lead to input text segments.
- Against: can generate subtrees that can never lead to an S node.

Slide 30: Top Down Parsing - Remarks
- Top-down parsers do well if there is useful grammar-driven control: the search can be directed by the grammar.
- Left-recursive rules can cause problems.
- A top-down parser will do badly if there are many different rules for the same LHS. Consider a grammar with 600 rules for S, 599 of which start with NP but one of which starts with V, and a sentence that starts with V.
- Top-down is unsuitable for rewriting parts of speech (preterminals) as words (terminals). In practice that is always done bottom-up, as lexical lookup.
- Useless work: expands things that are possible top-down but not actually present in the input.
- Repeated work: wherever there is common substructure.
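
The left-recursion problem is worth seeing concretely. In this illustration (toy grammar invented here), a naive top-down expander never terminates on the left-recursive rule NP → NP PP, because each expansion re-introduces NP as the leftmost symbol before any word is consumed; a depth cap stands in for "runs forever".

```python
GRAMMAR = {"NP": [["NP", "PP"], ["N"]]}   # first rule is left-recursive
LEXICON = {"N": {"flights"}, "P": {"from"}}

def parse(symbols, words, depth=0):
    if depth > 50:                        # stand-in for non-termination
        raise RecursionError("left recursion: NP -> NP PP -> NP PP PP -> ...")
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        return bool(words) and words[0] in LEXICON[first] \
            and parse(rest, words[1:], depth + 1)
    return any(parse(rhs + rest, words, depth + 1)
               for rhs in GRAMMAR.get(first, []))

try:
    parse(["NP"], ["flights"])
except RecursionError as e:
    print("diverged:", e)
```

The NP → N analysis of "flights" is never reached: the parser commits to expanding NP → NP PP first and loops without consuming input.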

Slide 31: Bottom Up Parsing - Remarks
- Empty categories: there is a termination problem unless the rewriting of empty constituents is somehow restricted (but then the parser is generally incomplete).
- Inefficient when there is great lexical ambiguity (grammar-driven control might help here).
- Conversely, it is data-directed: it attempts to parse the words that are there.
- Both TD (LL) and BU (LR) parsers can do work exponential in the sentence length on NLP problems.
- Useless work: builds structures that are locally possible but globally impossible.
- Repeated work: wherever there is common substructure.

Slide 32: Development of a Concrete Strategy
Combine the best features of both strategies:
- top-down, grammar-directed control;
- bottom-up filtering.
Examining alternatives in parallel uses too much memory, so we adopt a depth-first strategy with agenda-based control.
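
The agenda-based control the slide refers to can be sketched generically (all names here are my own): parser states go on an agenda, and the same loop gives depth-first search with a LIFO stack or breadth-first search with a FIFO queue.

```python
from collections import deque

def search(start, expand, is_goal, depth_first=True):
    """Generic agenda-driven search over parser states."""
    agenda = deque([start])
    seen = {start}
    while agenda:
        state = agenda.pop() if depth_first else agenda.popleft()
        if is_goal(state):
            return state
        for nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                agenda.append(nxt)
    return None

# Toy use: top-down recognition states are (remaining symbols, remaining words)
# for the grammar S -> a S | a, i.e. the language a+.
GRAMMAR = {"S": (("a", "S"), ("a",))}

def expand(state):
    syms, words = state
    if not syms:
        return []
    head, rest = syms[0], syms[1:]
    if head.islower():                       # terminal: must match next word
        return [(rest, words[1:])] if words and words[0] == head else []
    return [(rhs + rest, words) for rhs in GRAMMAR[head]]

goal = search((("S",), ("a", "a", "a")), expand,
              lambda s: not s[0] and not s[1])
print(goal is not None)  # True: "aaa" is recognised
```

Swapping `depth_first=False` changes the exploration order without touching the parsing logic, which is exactly why agenda-based control is a convenient place to combine top-down prediction with bottom-up filtering.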