THEORIES AND TECHNIQUES OF RECOGNITION Computational linguistics in Python: Syntactic analysis (parsing)
FROM CHUNKING TO FULL SYNTACTIC ANALYSIS
PROBLEM: AMBIGUITY While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas I'll never know.
CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS Slides ELN?
CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS
Capture constituency and ordering
– Ordering: What are the rules that govern the ordering of words and bigger units in the language?
– Constituency: How do words group into units, and how do the various kinds of units behave?
Constituency
E.g., Noun phrases (NPs):
– Three parties from Brooklyn
– A high-class spot such as Mindy’s
– The Broadway coppers
– They
– Harry the Horse
– The reason he comes into the Hot Box
How do we know these form a constituent?
Constituency (II)
– They can all appear before a verb: Three parties from Brooklyn arrive… / A high-class spot such as Mindy’s attracts… / The Broadway coppers love… / They sit
– But individual words can’t always appear before verbs: *from arrive… / *as attracts… / *the is… / *spot is…
– Must be able to state generalizations like: Noun phrases occur before verbs
Constituency (III)
Preposing and postposing:
– On September 17th, I’d like to fly from Atlanta to Denver
– I’d like to fly on September 17th from Atlanta to Denver
– I’d like to fly from Atlanta to Denver on September 17th
But not:
– *On September, I’d like to fly 17th from Atlanta to Denver
– *On I’d like to fly September 17th from Atlanta to Denver
Indicating constituents: brackets, trees
[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
[Figure: the same analysis drawn as a parse tree for “I prefer a morning flight”]
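Bracket notation and trees carry the same information, so a bracketed string can be read back into a tree mechanically. The following is a minimal sketch in plain Python; the function names `parse_brackets` and `leaves` and the nested `(label, children)` tuple representation are assumptions made for illustration, not part of NLTK.

```python
def parse_brackets(text):
    """Turn '[S [NP ...] ...]' into nested (label, children) tuples."""
    # Pad brackets with spaces so that split() yields clean tokens.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "[", "a constituent must start with '['"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # skip the closing bracket
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


tree = parse_brackets(
    "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
)
print(tree[0])        # label of the root constituent
print(leaves(tree))   # the words covered by the tree, in order
```

Reading the leaves from left to right recovers the original sentence, which is one way to check that a bracketing is well formed.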
CFG example
S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left
Beyond regular languages: Context-Free Grammars
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> the
Det -> a
Noun -> flight
V -> left
CFGs: set of rules
S -> NP VP
– This says that there are units called S, NP, and VP in this language
– That an S consists of an NP followed immediately by a VP
– Doesn’t say that that’s the only kind of S
– Nor does it say that this is the only place that NPs and VPs occur
Generativity
As with FSAs, you can view these rules as either analysis or synthesis machines
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the language
How can we define grammatical vs. ungrammatical sentences?
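The "synthesis" and "analysis" views can be illustrated with the toy flight grammar from the earlier slides (with the determiners "the" and "a"). This is a sketch in plain Python, assuming a dictionary representation of the rules; the names `GRAMMAR`, `expand`, and `is_grammatical` are hypothetical, and exhaustive enumeration only works because this grammar is not recursive and so generates a finite language.

```python
from itertools import product

# Each non-terminal maps to a list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def expand(symbol):
    """Return every word sequence that `symbol` can generate."""
    if symbol not in GRAMMAR:          # terminal: generates itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine the expansions of each right-hand-side symbol in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


# Synthesis: generate all strings of the language.
LANGUAGE = {" ".join(words) for words in expand("S")}


def is_grammatical(sentence):
    """Analysis: accept a string only if the grammar generates it."""
    return sentence in LANGUAGE


print(sorted(LANGUAGE))
print(is_grammatical("a flight left"), is_grammatical("flight a left"))
```

For grammars with recursive rules the language is infinite, so membership has to be decided by a parser rather than by enumeration.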
Derivations
A derivation is a sequence of rules applied to a string that accounts for that string
– Covers all the elements in the string
– Covers only the elements in the string
Derivations as Trees
[Figure: parse tree for “I prefer a morning flight”, with nodes S, NP, VP, Pro, Verb, Det, Nom, Noun]
CFGs more formally
A context-free grammar has 4 parameters (“is a 4-tuple”):
1) A set of non-terminal symbols (“variables”) N
2) A set of terminal symbols Σ (disjoint from N)
3) A set of productions P, each of the form A -> α, where A is a non-terminal and α is a string of symbols from the infinite set of strings (Σ ∪ N)*
4) A designated start symbol S
Defining a CF language via derivation
A string A derives a string B if A can be rewritten as B via some series of rule applications
More formally:
– If A -> β is a production of P, and α and γ are any strings in the set (Σ ∪ N)*, then we say that αAγ directly derives αβγ, or αAγ => αβγ
– Derivation is a generalization of direct derivation
– Let α_1, α_2, …, α_m be strings in (Σ ∪ N)*, m >= 1, s.t. α_1 => α_2, α_2 => α_3, …, α_{m-1} => α_m. We say that α_1 derives α_m, or α_1 =>* α_m
– We then formally define the language L_G generated by grammar G as the set of strings composed of terminal symbols derived from S: L_G = {w | w is in Σ* and S =>* w}
Derivations
A DERIVATION of a string is a sequence of rule applications
– E.g., the string “a flight” can be derived from the grammar above and the symbol NP by the (leftmost-first) derivation NP => Det Nominal => a Nominal => a Noun => a flight
Derivations can be visualized as PARSE TREES
The LANGUAGE defined by a CFG is the set of strings derivable from the start symbol S (for Sentence)
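The leftmost derivation of "a flight" can be replayed step by step. The sketch below is plain Python under a hypothetical convention: `choices` lists, for each step, the index of the production used to rewrite the leftmost non-terminal. The names `RULES` and `leftmost_derivation` are illustrative only.

```python
# The NP fragment of the toy flight grammar.
RULES = {
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the production to apply to that
    non-terminal (an assumption made for this sketch).
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost symbol that still has productions.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation("NP", [0, 1, 0, 0])
print(" => ".join(steps))
```

Choosing index 0 instead of 1 at the second step would select Det -> the and derive "the flight".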
Derivations and parse trees
A more formal definition A CFG is a 4-tuple consisting of
What ‘context free’ means
Derivations and languages
The language L_G GENERATED by a CFG G is the set of strings of TERMINAL symbols that can be derived from the start symbol S using the production rules in G
– L_G = {w | w is in Σ* and S derives w}
The strings in L_G are called GRAMMATICAL
The strings not in L_G are called UNGRAMMATICAL
Grammar development
One of the most basic skills in NLE is the ability to write a CFG for some fragment of a language (e.g., the dates)
We’ll briefly cover some of the issues to be addressed when writing small CFG grammars
CFG in PYTHON NLTK, 8.3
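As a minimal sketch of defining a CFG in NLTK, the toy flight grammar can be written with `nltk.CFG.fromstring` and parsed with `nltk.ChartParser`. This assumes the third-party `nltk` package is installed; the grammar and sentence are chosen here for illustration and are not taken from the NLTK book.

```python
import nltk  # third-party package: pip install nltk

# Terminals are quoted; alternatives are separated by '|'.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'the' | 'a'
Noun -> 'flight'
V -> 'left'
""")

parser = nltk.ChartParser(grammar)
tokens = "the flight left".split()

# parse() yields one nltk.Tree per analysis of the token list.
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

The start symbol is the left-hand side of the first production, here S.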
SYNTACTIC ANALYSIS
TOP-DOWN search: the parse tree has to be rooted in the start symbol S
– EXPECTATION-DRIVEN parsing
– Example: RECURSIVE DESCENT
BOTTOM-UP search: the parse tree must be an analysis of the input
– DATA-DRIVEN parsing
– Example: SHIFT-REDUCE
TOP-DOWN PARSING WITH NLTK
Recursive descent parsing (NLTK, 8.3)
– nltk.RecursiveDescentParser(grammar)
– nltk.app.rdparser()
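To show what expectation-driven search does, here is a small recursive descent parser in plain Python. It is a sketch of the general technique, not NLTK's implementation; the names `GRAMMAR`, `parse_symbol`, `parse_sequence`, and `recursive_descent` are hypothetical. Like any naive recursive descent parser, it would loop forever on a left-recursive rule such as VP -> VP PP.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def parse_symbol(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` matches tokens[pos:]."""
    if symbol not in GRAMMAR:                    # terminal: must match input
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:                  # predict each production
        for children, end in parse_sequence(rhs, tokens, pos):
            yield (symbol, children), end


def parse_sequence(symbols, tokens, pos):
    """Yield (children, next_pos) for matching `symbols` left to right."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse_symbol(symbols[0], tokens, pos):
        for rest, end in parse_sequence(symbols[1:], tokens, mid):
            yield [first] + rest, end


def recursive_descent(tokens, start="S"):
    """Return all trees rooted in `start` that cover the whole input."""
    return [tree for tree, end in parse_symbol(start, tokens, 0)
            if end == len(tokens)]


trees = recursive_descent("the flight left".split())
print(trees)
```

The search starts from S and only consults the input when it reaches a terminal, which is why a failed expectation triggers backtracking to the next production.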
BOTTOM-UP PARSING WITH NLTK
Shift-reduce (NLTK, 8.3, p. 305)
– nltk.app.srparser()
– nltk.ShiftReduceParser(grammar)
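The data-driven strategy can be sketched as a greedy shift-reduce recognizer in plain Python. This is an illustration of the idea, not NLTK's implementation: it always reduces when some right-hand side matches the top of the stack and never backtracks, so it can miss parses that a full search would find. The names `RULES` and `shift_reduce` are hypothetical.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("V", ("left",)),
]


def shift_reduce(tokens, start="S"):
    """Greedy recognizer: reduce whenever possible, otherwise shift."""
    stack, queue, trace = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            n = len(rhs)
            # Reduce if the top n stack items match a right-hand side.
            if tuple(stack[-n:]) == rhs:
                stack[-n:] = [lhs]
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if not queue:
                break                  # nothing to reduce, nothing to shift
            stack.append(queue.pop(0))
            trace.append(f"shift {stack[-1]}")
    # Success means the whole input was reduced to the start symbol.
    return stack == [start], trace


accepted, trace = shift_reduce("the flight left".split())
print(accepted)
print("\n".join(trace))
```

The trace makes the bottom-up order visible: words are shifted first, and S is built last.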
MORE ADVANCED PARSING MODELS
– Left corner (NLTK)
– Chart (NLTK)
DEPENDENCIES AND DEPENDENCY GRAMMAR (NLTK, 8.5)
THE PROBLEM OF AMBIGUITY
Ambiguity
– Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430
Avoiding reparsing
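The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1), which makes the growth easy to check. A minimal sketch using only the standard library (`math.comb` needs Python 3.8 or later); the function name `catalan` is chosen here for illustration.

```python
from math import comb


def catalan(n):
    """n-th Catalan number via the closed form C(n) = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


values = [catalan(n) for n in range(2, 9)]
for n, value in zip(range(2, 9), values):
    print(f"C({n}) = {value}")
```

Each value is roughly four times the previous one for large n, so enumerating every analysis quickly becomes impractical.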
COMMON STRUCTURAL AMBIGUITIES
COORDINATION ambiguity
– OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN
ATTACHMENT ambiguity:
– Gerundive VP attachment ambiguity: I saw the Eiffel Tower flying to Paris
– PP attachment ambiguity: I shot an elephant in my pajamas
PP ATTACHMENT AMBIGUITY
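The two readings of "I shot an elephant in my pajamas" can be counted without building either tree. The sketch below uses a CKY-style chart in plain Python, which also illustrates avoiding reparsing: each span is analysed once and reused. The grammar is an assumption made for this example, a small binary-branching grammar for the sentence; the names `LEXICON`, `BINARY`, and `count_parses` are hypothetical.

```python
from collections import defaultdict

# Word -> possible categories.
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "pajamas": ["N"], "in": ["P"],
}
# Binary rules as (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"), ("PP", "P", "NP"),
    ("NP", "Det", "N"), ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("VP", "V", "NP"), ("VP", "VP", "PP"),    # PP attaches to the verb phrase
]


def count_parses(tokens, start="S"):
    """Count the analyses of `tokens` rooted in `start`."""
    n = len(tokens)
    # chart[i][j][A] = number of ways category A derives tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # every split point of the span
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


sentence = "I shot an elephant in my pajamas".split()
print(count_parses(sentence))
```

The count comes from the two rules marked in the comments: the PP can combine with "an elephant" or with "shot an elephant".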
AMBIGUITY: SOLUTIONS
– Use a PROBABILISTIC GRAMMAR (not covered in this module)
– Use semantics
WRITING A GRAMMAR NLTK, 8.6