Language and Speech Technology: Parsing
Jan Odijk, January 2011, LOT Winter School 2011

Overview
Grammars & Grammar Types
Parsing
Naïve Parsing
Earley Parser
Earley Parser Example (using handouts)
Earley Parser Extensions
Parsers & CLARIN


Grammars
A grammar G = (VT, VN, P, S), where
VT: terminal vocabulary
VN: nonterminal vocabulary
P: set of rules α → β (lhs → rhs), with α ∈ VN+ and β ∈ (VN ∪ VT)*
S ∈ VN: the start symbol

Grammars: Example
Grammar G = (VT, VN, P, S) with
VT = {the, a, garden, book, in}
VN = {NP, Det, N, P, PP}
P = {PP → P NP, NP → Det N, Det → the, Det → a, N → garden, N → book, P → in}
S = PP
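
To make this concrete, here is one possible Python encoding of this example grammar; the dict layout and the names GRAMMAR, START, NONTERMINALS, TERMINALS are illustrative choices, not part of the slides:

```python
# Hypothetical encoding; the dict maps each nonterminal to its right-hand sides.
GRAMMAR = {
    "PP":  [["P", "NP"]],
    "NP":  [["Det", "N"]],
    "Det": [["the"], ["a"]],
    "N":   [["garden"], ["book"]],
    "P":   [["in"]],
}
START = "PP"                                   # S
NONTERMINALS = set(GRAMMAR)                    # VN
TERMINALS = {sym for rhss in GRAMMAR.values()  # VT: rhs symbols that are
             for rhs in rhss for sym in rhs    # not themselves nonterminals
             if sym not in GRAMMAR}
```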

Example Derivation
PP (start symbol)
P NP (PP → P NP)
in NP (P → in)
in Det N (NP → Det N)
in the N (Det → the)
in the garden (N → garden)

Grammar Types
Finite State Grammars (Type 3)
Rules of the form A → aA or A → a, with A ∈ VN, a ∈ VT
Too weak to deal with natural language in toto
Efficient processing techniques
Often used for applications where partial analyses of natural language are sufficient
Often used for morphology / phonology
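
As a quick illustration of why Type 3 grammars allow efficient processing: a right-linear grammar such as A → aA, A → a generates the regular language a+, so membership can be decided by an ordinary regular-expression engine. A minimal sketch (mine, not from the slides):

```python
import re

# A -> aA, A -> a  generates the language a+; a regex engine decides membership.
def in_language(s: str) -> bool:
    return re.fullmatch(r"a+", s) is not None

assert in_language("aaa")
assert not in_language("")
```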

Grammar Types
Context-Free Grammars (CFG, Type 2)
Rules of the form A → β, with A ∈ VN
Too weak to deal with natural language
Certainly for strong generative adequacy
Also for weak generative adequacy
Reasonably efficient processing techniques
Generally taken as a basis for dealing with natural language, extended with other techniques

Grammar Types
Context-Sensitive Grammars (Type 1)
Rules α → β with |α| ≤ |β|
Usually not considered in the context of NLP
Type-0 grammars
No restrictions on the rules
Usually not considered, except in combination with CFGs


Parsing
Parsing is an algorithm (it must finish!)
for assigning syntactic structures (ambiguity!)
to a sequence of terminal symbols
in accordance with a given grammar
(if possible, efficiently)

Parsing for CFGs
Focus here on parsers for CFGs for natural language, more specifically the Earley parser
Why?
Most NLP systems with a grammar use a parser for CFGs as a basis
Basic techniques will also recur in parsers for different grammar types


Naïve Parsing (see handout)
Problems for naïve parsing: a lot of re-parsing of subtrees
Bottom-up: wastes time and space on trees that cannot lead to S
Top-down: wastes time and space on trees that cannot match the input string

Naïve Parsing
Top-down:
Recursion problem
Can be solved for right-recursion by matching with input tokens, but
the problem with left recursion remains: NP → NP PP
Ambiguity
Temporary ambiguity
Real ambiguity

Naïve Parsing
Complexity: the time needed to parse is exponential, c^n in the worst case (c a constant, n the number of input tokens)
Takes too much time
Not practically feasible
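
One way to see where the blow-up comes from: the number of distinct binary trees over n+1 tokens is the nth Catalan number, which grows exponentially, so an ambiguous grammar (e.g. with NP → NP PP attachments) can license exponentially many parses. A small illustration (not from the handout):

```python
from math import comb

def catalan(n: int) -> int:
    """Number of distinct binary bracketings of n+1 items."""
    return comb(2 * n, n) // (n + 1)

for n in (2, 5, 10, 15):
    print(n, catalan(n))   # 2, 42, 16796, 9694845: blows up quickly
```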


Earley Parser
Top-down approach, but the Predictor avoids wasting time and space on irrelevant trees
Does not build actual structures, but stores enough information to reconstruct them
Uses a dynamic programming technique to avoid recomputation of subtrees
Avoids the problems with left recursion
Makes complexity cubic: O(n³)

Earley Parser
Number the positions in the input string (0 .. N): 0 book 1 that 2 flight 3
Notation: [i,j] stands for the string from position i to position j
[0,1] = “book”
[1,3] = “that flight”
[2,2] = “”
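
The [i,j] notation corresponds directly to slice indexing over the token list; a tiny sketch (my illustration):

```python
words = ["book", "that", "flight"]   # positions 0 .. 3 lie between the tokens

def span(i: int, j: int) -> str:
    """Return the string [i,j], i.e. the tokens between positions i and j."""
    return " ".join(words[i:j])

assert span(0, 1) == "book"
assert span(1, 3) == "that flight"
assert span(2, 2) == ""
```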

Earley Parser
Dotted rules: a grammar rule plus an indication of progress, i.e. which elements of the rhs have already been seen and which have not, indicated by a dot (we use an asterisk)
Example: S → Aux NP * VP
Aux and NP have been dealt with, but VP not yet

Earley Parser
Input: a sequence of N words (words[1..N]) and a grammar
Output: a Store = (agenda, chart)
(sometimes the chart is organized as N+1 chart entries: chart[0 .. N])

Earley Parser
Agenda and chart: sets of states
A state consists of
a dotted rule
a span relative to the input: [i,j]
previous states: a list of state identifiers
and gets a unique identifier
Example: S11: VP → V’ * NP; [0,1]; [S8]

Earley Parser
A state is complete iff the dot is the last element in the dotted rule
e.g. a state with VP → Verb NP * is complete
NextCat(state)
only applies if the state is not complete
is the category immediately following the dot
VP → Verb * NP : NextCat(state) = NP
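
A possible Python rendering of states and the two tests above; the class layout, the integer encoding of the dot, and the field names are my assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class State:
    lhs: str                       # left-hand side of the dotted rule
    rhs: Tuple[str, ...]           # right-hand side
    dot: int                       # how many rhs elements have been seen
    span: Tuple[int, int]          # [i, j] relative to the input
    previous: List[int] = field(default_factory=list)  # ids of earlier states

    def is_complete(self) -> bool:
        # e.g. VP -> Verb NP * : the dot sits after the last rhs element
        return self.dot == len(self.rhs)

    def next_cat(self) -> str:
        # category immediately following the dot; incomplete states only
        assert not self.is_complete()
        return self.rhs[self.dot]

# The slides' example  S11: VP -> V' * NP ; [0,1] ; [S8]  becomes:
s11 = State("VP", ("V'", "NP"), 1, (0, 1), [8])
assert not s11.is_complete() and s11.next_cat() == "NP"
```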

Earley Parser
Three operations on states:
Predictor: predicts which categories to expect
Scanner: if a terminal category C is expected, and a word of category C is encountered at this position, consumes the word and shifts the dot
Completer: applies to a complete state s, and modifies all states that gave rise to this state

Earley Parser
Predictor
Applies to an incomplete state (A → α * B β, [i,j], _), where B is a nonterminal
For each rule (B → γ) in the grammar:
make a new state s = (B → * γ, [j,j], [])
enqueue(s, store)
Enqueue(s, ce) adds s to ce unless ce already contains s

Earley Parser
Scanner
Applies to an incomplete state (A → α * b β, [i,j], _), where b is a terminal
Make a new state s = (b → words[j] *, [j,j+1], [])
enqueue(s, store)

Earley Parser
Completer
Applies to a complete state (B → γ *, [j,k], L1)
For each (A → α * B β, [i,j], L2) in chart[j]:
make a new state s = (A → α B * β, [i,k], L2 ++ L1)
enqueue(s, store)

Earley Parser
Store = (agenda, chart)
Apply operations to states in the agenda until the agenda is empty
When applying an operation to a state s in the agenda:
move s from the agenda into the chart
add the resulting states of the operation to the agenda

Earley Parser
Initial store = ([Γ → * S], emptychart), where Γ is a ‘fresh’ nonterminal start symbol
The input sentence is accepted iff there is a state (Γ → S *, [0,N], LS) in the chart and the agenda is empty
Parse tree(s) can be reconstructed via the lists of earlier states (LS)
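
Putting the last few slides together, here is a compact, runnable recognizer sketch. The toy grammar and lexicon are mine; the chart doubles as the agenda, the Scanner is condensed (it shifts the dot over a matching preterminal directly instead of creating a separate b → words[j] * state), and the lists of previous states are omitted, so this recognizes but does not reconstruct trees:

```python
from typing import Dict, List, Tuple

# Toy grammar and lexicon (my choice). GAMMA plays the role of the fresh Γ.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "GAMMA": [("S",)],
    "S":     [("NP", "VP"), ("VP",)],
    "NP":    [("Det", "N")],
    "VP":    [("V", "NP"), ("V",)],
}
LEXICON = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}

def earley_recognize(words: List[str]) -> bool:
    n = len(words)
    chart: List[list] = [[] for _ in range(n + 1)]  # chart[j]: states ending at j

    def enqueue(state, j):
        if state not in chart[j]:          # skip states we already have
            chart[j].append(state)

    enqueue(("GAMMA", ("S",), 0, 0), 0)    # initial state: Γ -> * S, [0,0]
    for j in range(n + 1):
        k = 0
        while k < len(chart[j]):           # chart[j] grows while we process it
            lhs, rhs, dot, i = chart[j][k]
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                # Predictor: expect every expansion of the category after the dot
                for gamma in GRAMMAR[rhs[dot]]:
                    enqueue((rhs[dot], gamma, 0, j), j)
            elif dot < len(rhs):
                # Scanner (condensed): shift the dot over a matching preterminal
                if j < n and rhs[dot] in LEXICON.get(words[j], set()):
                    enqueue((lhs, rhs, dot + 1, i), j + 1)
            else:
                # Completer: advance every state that was waiting for lhs at i
                for (lhs2, rhs2, dot2, i2) in list(chart[i]):
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        enqueue((lhs2, rhs2, dot2 + 1, i2), j)
            k += 1
    return ("GAMMA", ("S",), 1, 0) in chart[n]     # Γ -> S *, [0,N]

print(earley_recognize(["book", "that", "flight"]))   # True
print(earley_recognize(["book", "that"]))             # False
```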


Earley Parser Extensions
Replace the elements of V by feature sets (attribute-value matrices, AVMs)
Harmless if finitely valued
E.g. instead of NP: [cat=N, bar=max, case=Nom]
Usually a relation other than ‘=’ is used for comparison, e.g. ‘is compatible with’, ‘unifies with’, ‘subsumes’
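
For finitely valued, flat AVMs, the ‘unifies with’ check can be sketched in a few lines; real AVMs are nested and reentrant, so this is only the simplest case (my sketch):

```python
def unify(avm1: dict, avm2: dict):
    """Naive unification of flat attribute-value dicts; None on a value clash."""
    result = dict(avm1)
    for attr, val in avm2.items():
        if attr in result and result[attr] != val:
            return None                  # incompatible values: unification fails
        result[attr] = val
    return result

print(unify({"cat": "N", "bar": "max"}, {"case": "Nom"}))
# {'cat': 'N', 'bar': 'max', 'case': 'Nom'}
print(unify({"case": "Nom"}, {"case": "Acc"}))   # None
```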

Earley Parser Extensions
Replace the rhs of rules by regular expressions over V (or AVMs)
E.g. VP → V NP? (AP | PP)* abbreviates
VP → V, VP → V NP, VP → V APorPP, VP → V NP APorPP,
APorPP → AP APorPP, APorPP → PP APorPP, APorPP → AP, APorPP → PP
where APorPP is a ‘fresh’ virtual nonterminal
Virtual: discarded when constructing the trees

Earley Parser Extensions
My grammatical formalism has no PS rules!
Only ‘lexical projection’ of syntactic selection properties (subcategorization list)
E.g. buy: [cat=V, subcat = [_ NP PP, _ NP]]
⇒ create PS rules on the fly:
if buy occurs in the input tokens, create the rules VP → buy NP PP and VP → buy NP from the lexical entry, and use these rules to parse
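
The on-the-fly rule creation can be sketched as a preprocessing step over the input tokens; the lexicon format and function name here are hypothetical:

```python
# Hypothetical lexicon: each verb lists its subcategorization frames.
LEXICON = {"buy": {"cat": "V", "subcat": [["NP", "PP"], ["NP"]]}}

def project_rules(tokens):
    """Create VP rules on the fly for the verbs that actually occur."""
    rules = []
    for tok in tokens:
        entry = LEXICON.get(tok)
        if entry and entry["cat"] == "V":
            for frame in entry["subcat"]:
                rules.append(("VP", [tok] + frame))   # e.g. VP -> buy NP PP
    return rules

print(project_rules(["I", "buy", "books"]))
# [('VP', ['buy', 'NP', 'PP']), ('VP', ['buy', 'NP'])]
```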

Earley Parser Extensions
My grammar contains ε-rules: NP → ε
where ε stands for the empty string (i.e. NP matches the empty string in the input token list)
The Earley parser can deal with these!
But extensive use creates many ambiguities!

Earley Parser Extensions
My grammar contains empty categories (independent)
PRO as subject of non-finite verbs: PRO buying books is fun
pro as subject of finite verbs in pro-drop languages: pro no hablo español (Spanish: ‘I don't speak Spanish’)
pro as subject of imperatives: pro schaam je! (Dutch: ‘be ashamed!’)
Epsilon rules can be used, or represent this at another level

Earley Parser Extensions
My grammar contains empty categories (dependent)
Trace of wh-movement: What did you buy t
Trace of verb movement (e.g. V2 in Dutch and German, Aux movement in English): Hij belt hem op t (Dutch: ‘He calls him up’); Did you t buy a book?
Epsilon rules are not sufficient

Earley Parser Extensions
Other types (levels) of representation
LFG: (c-structure, f-structure)
HPSG: DAGs, a special type of AVMs (constituent structure, semantic representation)
Use a CFG as backbone grammar, which accepts a superset of the language
For each rule, specify how to construct the other level of representation
Extend the Earley parser to deal with this

Earley Parser Extensions
Other types (levels) of representation
f-structures, DAGs, and semantic representations are not finitely valued
Thus they will affect efficiency
But they allow dealing with e.g.
non-context-free aspects of a language
unbounded dependencies (e.g. by ‘gap-threading’)

Earley Parser in Practice
Parsers for natural language yield many, many parse trees for an input sentence
Many more than you can imagine (thousands)
Even for relatively short, simple sentences
They are all syntactically correct
But most make no sense semantically

Earley Parser in Practice
Additional constraining is required
To reduce the temporary ambiguities
To come up with the ‘best’ parse
Can be done with semantic constraints, but this is only feasible for very small domains
Most often done using probabilities: rule probabilities derived from frequencies in treebanks
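
The standard maximum-likelihood estimate behind rule probabilities from treebank frequencies is P(A → β) = count(A → β) / count(A); a toy sketch with made-up counts:

```python
from collections import Counter

# Toy rule occurrence counts; real counts would come from a treebank.
rule_counts = Counter({("NP", ("Det", "N")): 700,
                       ("NP", ("NP", "PP")): 300})

def rule_prob(lhs, rhs):
    """MLE: count(lhs -> rhs) / count(lhs)."""
    lhs_total = sum(c for (l, _), c in rule_counts.items() if l == lhs)
    return rule_counts[(lhs, rhs)] / lhs_total

print(rule_prob("NP", ("Det", "N")))   # 0.7
```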

Parsers: Some Examples
Dutch: Alpino parser
Stanford parsers: English, Arabic, Chinese
English: ACL Overview


Parsers & CLARIN
A parser allows one to automatically analyze large text corpora, resulting in treebanks
These can be used for linguistic research, but with care!!
Example: Lassy Demo (Dutch)
Simple search interface to the LASSY-small treebank
Use an SVG-compatible browser (e.g. Firefox)

Parsers & CLARIN
Example of linguistic research using a treebank
Van Eynde 2009: A treebank-driven investigation of predicative complements in Dutch

Thanks for your attention!