Language and Speech Technology: Parsing. Jan Odijk, January 2011, LOT Winter School 2011.


Language and Speech Technology: Parsing Jan Odijk January 2011 LOT Winter School

Overview
- Grammars & Grammar Types
- Parsing
  - Naïve Parsing
  - Earley Parser
  - Example (using handouts)
- Earley Parser Extensions
- Parsers & CLARIN

Grammars
A grammar is G = (V_T, V_N, P, S) where
- V_T is the terminal vocabulary
- V_N is the nonterminal vocabulary
- P is a set of rules α → β (lhs → rhs), with α ∈ V_N+ and β ∈ (V_N ∪ V_T)*
- S ∈ V_N is the start symbol

Grammars: Example
Grammar G = (V_T, V_N, P, S) with
- V_T = {the, a, garden, book, in}
- V_N = {NP, Det, N, P, PP}
- P = {PP → P NP, NP → Det N, Det → the, Det → a, N → garden, N → book, P → in}
- S = PP

Example Derivation
- PP (start symbol)
- P NP (PP → P NP)
- in NP (P → in)
- in Det N (NP → Det N)
- in the N (Det → the)
- in the garden (N → garden)
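This derivation can be replayed mechanically. A minimal sketch in Python; the function name leftmost_derive and the rule encoding are illustrative choices, not part of the slides:

```python
# The example grammar's rules P, encoded as lhs -> list of rhs alternatives.
productions = {
    "PP": [["P", "NP"]],
    "NP": [["Det", "N"]],
    "Det": [["the"], ["a"]],
    "N": [["garden"], ["book"]],
    "P": [["in"]],
}

def leftmost_derive(symbols, steps):
    """Apply a sequence of (lhs, rhs) rule applications, each rewriting
    the leftmost occurrence of lhs, and return the resulting string."""
    symbols = list(symbols)
    for lhs, rhs in steps:
        assert rhs in productions[lhs]   # the rule must be in P
        i = symbols.index(lhs)           # leftmost occurrence of lhs
        symbols[i:i + 1] = rhs
    return symbols

# The derivation from the slide, starting from S = PP:
steps = [("PP", ["P", "NP"]), ("P", ["in"]), ("NP", ["Det", "N"]),
         ("Det", ["the"]), ("N", ["garden"])]
```

Running leftmost_derive(["PP"], steps) yields the terminal string "in the garden", mirroring the derivation steps above.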

Grammar Types
Finite State Grammars (Type 3)
- Rules of the form A → aA or A → a, with A ∈ V_N and a ∈ V_T
- Too weak to deal with natural language in toto
- Efficient processing techniques
- Often used for applications where partial analyses of natural language are sufficient
- Often used for morphology / phonology

Grammar Types
Context-Free Grammars (CFG, Type 2)
- Rules of the form A → β, with A ∈ V_N and β ∈ (V_N ∪ V_T)*
- Too weak to deal with natural language
  - certainly for strong generative adequacy
  - but also for weak generative adequacy
- Reasonably efficient processing techniques
- Generally taken as a basis for dealing with natural language, extended with other techniques

Grammar Types
Context-Sensitive Grammars (Type 1)
- Rules α → β with |α| ≤ |β|
- Usually not considered in the context of NLP
Type-0 Grammars
- No restrictions on the rules
- Usually not considered, except in combination with CFGs

Parsing
A parsing algorithm
- is an algorithm (so it must terminate!)
- for assigning syntactic structures (ambiguity!)
- to a sequence of terminal symbols
- in accordance with a given grammar
- and, if possible, efficiently

Parsing for CFGs
The focus here is on
- a parser for CFGs
- for natural language
- more specifically: the Earley parser
Why?
- Most NLP systems with a grammar use a parser for CFGs as a basis
- The basic techniques also recur in parsers for other grammar types

Naïve Parsing (see handout)
Problems for naïve parsing
- A lot of re-parsing of subtrees
- Bottom-up: wastes time and space on trees that cannot lead to S
- Top-down: wastes time and space on trees that cannot match the input string

Naïve Parsing
Top-down
- Recursion problem: it can be solved for right recursion by matching against the input tokens, but the problem with left recursion remains, e.g. NP → NP PP
Ambiguity
- Temporary ambiguity
- Real ambiguity

Naïve Parsing
Complexity
- The time needed to parse is exponential in the worst case: c^n, with c a constant and n the length of the input in tokens
- This takes too much time and is not practically feasible

Earley Parser
A top-down approach, but
- the Predictor avoids wasting time and space on irrelevant trees
- it does not build actual structures, but stores enough information to reconstruct them
- it uses a dynamic programming technique to avoid recomputation of subtrees
- it avoids the problems with left recursion
- it makes the complexity cubic: O(n^3)

Earley Parser
Number the positions in the input string (0 .. N):
0 book 1 that 2 flight 3
The notation [i,j] stands for the string from position i to position j:
- [0,1] = "book"
- [1,3] = "that flight"
- [2,2] = ""

Earley Parser
A dotted rule
- is a grammar rule plus an indication of progress, i.e. which elements of the rhs have already been seen and which have not
- the progress point is indicated by a dot (we use an asterisk)
Example
- S → Aux NP * VP
- Aux and NP have been dealt with, but VP not yet

Earley Parser
Input:
- a sequence of N words (words[1..N]), and
- a grammar
Output:
- a Store = (agenda, chart)
- (sometimes the chart is split into N+1 chart entries: chart[0..N])

Earley Parser
The agenda and the chart are sets of states. A state consists of
- a dotted rule
- a span relative to the input: [i,j]
- a list of identifiers of previous states
and gets a unique identifier.
Example
- S11: VP → V' * NP; [0,1]; [S8]

Earley Parser
A state
- is complete iff the dot is after the last element of the dotted rule
- e.g. a state with VP → Verb NP * is complete
NextCat(state)
- only applies if the state is not complete
- is the category immediately following the dot
- VP → Verb * NP: NextCat(state) = NP
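The completeness test and NextCat can be packaged with the state representation itself. A sketch in Python; the class name DottedState and its field names are illustrative, not from the slides:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DottedState:
    lhs: str               # left-hand side of the rule
    rhs: Tuple[str, ...]   # right-hand side of the rule
    dot: int               # number of rhs elements already seen
    span: Tuple[int, int]  # [i, j] span relative to the input

    def is_complete(self) -> bool:
        # Complete iff the dot has passed the last rhs element.
        return self.dot == len(self.rhs)

    def next_cat(self) -> str:
        # Only defined for incomplete states.
        assert not self.is_complete()
        return self.rhs[self.dot]

# VP -> Verb * NP spanning [0,1]: incomplete, expecting an NP next.
state = DottedState("VP", ("Verb", "NP"), 1, (0, 1))
```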

Earley Parser
Three operations on states:
- Predictor: predicts which categories to expect
- Scanner: if a terminal category C is expected and a word of category C is encountered at this position, it consumes the word and shifts the dot
- Completer: applies to a complete state s and modifies all states that gave rise to this state

Earley Parser
Predictor
- applies to an incomplete state (A → α * B β, [i,j], _), where B is a nonterminal
- for each (B → γ) in the grammar: make a new state s = (B → * γ, [j,j], []) and enqueue(s, store)
- enqueue(s, ce) = add s to ce, unless ce already contains s

Earley Parser
Scanner
- applies to an incomplete state (A → α * b β, [i,j], _), where b is a terminal
- make a new state s = (b → words[j] *, [j,j+1], []) and enqueue(s, store)

Earley Parser
Completer
- applies to a complete state (B → γ *, [j,k], L1)
- for each (A → α * B β, [i,j], L2) in chart[j]: make a new state s = (A → α B * β, [i,k], L2 ++ L1) and enqueue(s, store)

Earley Parser
Store = (agenda, chart)
- Apply operations to the states in the agenda until the agenda is empty
- When applying an operation to a state s in the agenda:
  - move s from the agenda into the chart
  - add the states resulting from the operation to the agenda

Earley Parser
Initial store = ([Γ → * S], emptychart)
- where Γ is a 'fresh' nonterminal start symbol
The input sentence is accepted
- iff there is a state (Γ → S *, [0,N], LS) in the chart and the agenda is empty
Parse tree(s) can be reconstructed via the lists of earlier states (LS)
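Putting the three operations and the acceptance test together gives a compact recognizer. A minimal sketch in Python under simplifying assumptions: each chart entry doubles as its own agenda, and the back-pointer lists needed to reconstruct trees are omitted. The grammar and input are the PP example from the earlier slides:

```python
from collections import namedtuple

# A state: rule lhs, rhs, dot position, and start position of its span.
# (Back-pointer lists for tree reconstruction are omitted in this sketch.)
State = namedtuple("State", "lhs rhs dot start")

def earley_recognize(words, grammar, start):
    """Minimal Earley recognizer: chart[j] holds all states ending at
    position j; each chart entry is processed as its own agenda."""
    nonterms = {lhs for lhs, _ in grammar}
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(State("GAMMA", (start,), 0, 0))   # fresh start symbol
    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            st = agenda.pop()
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in nonterms:                 # Predictor
                    for lhs, rhs in grammar:
                        if lhs == nxt:
                            new = State(lhs, tuple(rhs), 0, j)
                            if new not in chart[j]:
                                chart[j].add(new)
                                agenda.append(new)
                elif j < n and words[j] == nxt:     # Scanner
                    chart[j + 1].add(State(st.lhs, st.rhs, st.dot + 1, st.start))
            else:                                   # Completer
                for old in list(chart[st.start]):
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == st.lhs:
                        new = State(old.lhs, old.rhs, old.dot + 1, old.start)
                        if new not in chart[j]:
                            chart[j].add(new)
                            agenda.append(new)
    return State("GAMMA", (start,), 1, 0) in chart[n]

# The PP grammar from the example slide, lexical rules included:
grammar = [("PP", ["P", "NP"]), ("NP", ["Det", "N"]),
           ("Det", ["the"]), ("Det", ["a"]),
           ("N", ["garden"]), ("N", ["book"]), ("P", ["in"])]
```

The three branches of the inner loop correspond to Predictor, Scanner, and Completer; the fresh start symbol Γ is spelled GAMMA here.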

Earley Parser Extensions
Replace the elements of V by feature sets (attribute-value matrices, AVMs)
- Harmless if finitely valued
- E.g. instead of NP: [cat=N, bar=max, case=Nom]
- Usually a relation other than '=' is used for the comparison, e.g. 'is compatible with', 'unifies with', 'subsumes'

Earley Parser Extensions
Replace the rhs of rules by regular expressions over V (or AVMs)
E.g. VP → V NP? (AP | PP)* abbreviates
- VP → V
- VP → V NP
- VP → V APorPP
- VP → V NP APorPP
- APorPP → AP APorPP
- APorPP → PP APorPP
- APorPP → AP
- APorPP → PP
where APorPP is a 'fresh' virtual nonterminal (virtual: it is discarded when constructing the trees)
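Such regular-expression abbreviations can be compiled back into plain CFG rules before parsing. A sketch in Python that handles only the optional operator '?'; the '|' and '*' operators would additionally need fresh virtual nonterminals like APorPP above:

```python
from itertools import product

def expand_optionals(lhs, rhs):
    """Expand a rule whose rhs may contain optional symbols written 'X?'
    into plain CFG rules, one per omit/include choice."""
    choices = []
    for sym in rhs:
        if sym.endswith("?"):
            choices.append(([], [sym[:-1]]))  # omit it, or include it
        else:
            choices.append(([sym],))          # mandatory symbol
    return [(lhs, [s for part in combo for s in part])
            for combo in product(*choices)]
```

For example, expand_optionals("VP", ["V", "NP?"]) produces the two rules VP → V and VP → V NP.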

Earley Parser Extensions
"My grammatical formalism has no PS rules, only 'lexical projection' of syntactic selection properties (a subcategorization list)!"
- E.g. buy: [cat=V, subcat = [_ NP PP, _ NP]]
- Then create PS rules on the fly: if buy occurs in the input tokens, create the rules VP → buy NP PP and VP → buy NP from the lexical entry
- and use these rules to parse
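Rule creation on the fly can be sketched as a simple lexical projection step. The dictionary encoding of the lexical entry below is a hypothetical rendering of the slide's [cat=V, subcat = [_ NP PP, _ NP]]:

```python
def project_rules(word, entry):
    """Create VP rules on the fly from a verb's subcategorization frames:
    one rule per frame, with the word itself as the first rhs element."""
    assert entry["cat"] == "V"
    return [("VP", [word] + frame) for frame in entry["subcat"]]

# Hypothetical lexical entry for 'buy' with two subcategorization frames.
buy_entry = {"cat": "V", "subcat": [["NP", "PP"], ["NP"]]}
```

Calling project_rules("buy", buy_entry) yields VP → buy NP PP and VP → buy NP, which the parser can then use like ordinary rules.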

Earley Parser Extensions
"My grammar contains ε-rules!"
- E.g. NP → ε, where ε stands for the empty string (i.e. NP matches the empty string in the input token list)
- The Earley parser can deal with these!
- But extensive use creates many ambiguities!

Earley Parser Extensions
"My grammar contains empty categories!" (independent)
- PRO as the subject of non-finite verbs: PRO buying books is fun
- pro as the subject of finite verbs in pro-drop languages: pro no hablo Español (Spanish, 'I don't speak Spanish')
- pro as the subject of imperatives: pro schaam je! (Dutch, 'be ashamed!')
Epsilon rules can be used, or this can be represented at another level

Earley Parser Extensions
"My grammar contains empty categories!" (dependent)
- the trace of wh-movement: What did you buy t
- the trace of verb movement (e.g. V2 in Dutch and German, Aux movement in English): Hij belt hem op t (Dutch, 'He calls him up'), Did you t buy a book?
- Epsilon rules are not sufficient here

Earley Parser Extensions
Other types (levels) of representation
- LFG: (c-structure, f-structure)
- HPSG: DAGs (a special type of AVMs)
- (constituent structure, semantic representation)
Use a CFG as the backbone grammar
- which accepts a superset of the language
- for each rule, specify how to construct the other level of representation
- extend the Earley parser to deal with this

Earley Parser Extensions
Other types (levels) of representation
- f-structures, DAGs, and semantic representations are not finitely valued
- thus they will affect efficiency
- but they allow dealing with e.g. non-context-free aspects of a language and unbounded dependencies (e.g. by 'gap threading')

Earley Parser in Practice
Parsers for natural language yield very many parse trees for an input sentence
- many more than you can imagine (thousands), even for relatively short, simple sentences
- they are all syntactically correct, but most make no sense semantically
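The 'thousands' above is easy to make concrete: for a grammar that allows free binary attachment, the number of binary-branching trees over n+1 words is the n-th Catalan number. A short computation (a sketch, not from the slides):

```python
def catalan(n):
    """n-th Catalan number: the number of distinct binary-branching
    trees over a string of n+1 words."""
    c = 1
    for i in range(n):
        # Recurrence C(i+1) = C(i) * 2(2i+1) / (i+2); division is always exact.
        c = c * 2 * (2 * i + 1) // (i + 2)
    return c
```

For a 9-word sentence (n = 8) this already gives 1430 trees; at n = 10 it is 16796.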

Earley Parser in Practice
Additional constraining is required
- to reduce the temporary ambiguities
- to come up with the 'best' parse
This can be done with semantic constraints
- but that is only feasible for very small domains
It is most often done using probabilities
- rule probabilities derived from frequencies in treebanks
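Deriving rule probabilities from treebank frequencies is usually a maximum-likelihood estimate: P(A → β) = count(A → β) / count(A). A sketch in Python, assuming the treebank has already been flattened into a list of (lhs, rhs) rule occurrences:

```python
from collections import Counter

def rule_probabilities(rule_occurrences):
    """Maximum-likelihood PCFG estimate from treebank rule occurrences:
    P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}

# Toy counts: 'NP -> Det N' seen three times, 'NP -> NP PP' once.
observed = [("NP", ("Det", "N"))] * 3 + [("NP", ("NP", "PP"))]
```

With these toy counts the estimate gives P(NP → Det N) = 0.75 and P(NP → NP PP) = 0.25, and the probabilities for each lhs sum to one by construction.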

Parsers: Some Examples
- Dutch: the Alpino parser
- Stanford parsers: English, Arabic, Chinese
- English: the ACL overview

Parsers & CLARIN
A parser allows one to automatically analyze large text corpora, resulting in treebanks
- These can be used for linguistic research, but with care!!
Example: the Lassy Demo (Dutch)
- a simple search interface to the LASSY-small treebank
- use an SVG-compatible browser (e.g. Firefox)

Parsers & CLARIN
Example of linguistic research using a treebank:
- Van Eynde 2009, "A treebank-driven investigation of predicative complements in Dutch"

Thanks for your attention!