THEORIES AND TECHNIQUES OF RECOGNITION Computational linguistics in Python: syntactic analysis (parsing)


1 THEORIES AND TECHNIQUES OF RECOGNITION Computational linguistics in Python: syntactic analysis (parsing)

2 FROM CHUNKING TO FULL SYNTACTIC ANALYSIS

3 PROBLEM: AMBIGUITY While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas I'll never know.

4 PROBLEM: AMBIGUITY While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas I'll never know.

5 CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS Slides ELN?

6 CHARACTERIZING THE SYNTAX OF A LANGUAGE: CONTEXT-FREE GRAMMARS
Context-free grammars capture constituency and ordering:
– Ordering: what rules govern the order of words and larger units in the language?
– Constituency: how do words group into units, and how do the various kinds of units behave?

7 Constituency
E.g., noun phrases (NPs):
– Three parties from Brooklyn
– A high-class spot such as Mindy’s
– The Broadway coppers
– They
– Harry the Horse
– The reason he comes into the Hot Box
How do we know these form a constituent?

8 Constituency (II)
– They can all appear before a verb:
  Three parties from Brooklyn arrive…
  A high-class spot such as Mindy’s attracts…
  The Broadway coppers love…
  They sit
– But individual words can’t always appear before verbs:
  *from arrive… *as attracts… *the is… *spot is…
– We must be able to state generalizations like: noun phrases occur before verbs

9 Constituency (III)
Preposing and postposing:
– On September 17th, I’d like to fly from Atlanta to Denver
– I’d like to fly on September 17th from Atlanta to Denver
– I’d like to fly from Atlanta to Denver on September 17th
But not:
– *On September, I’d like to fly 17th from Atlanta to Denver
– *On I’d like to fly September 17th from Atlanta to Denver

10 Indicating constituents: brackets, trees
[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
[Figure: tree diagram for “I prefer a morning flight”, with node labels S, NP, VP, Pro, Verb, Det, Nom, Noun]
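
The bracket notation on this slide maps directly onto a nested data structure. The sketch below is a minimal, from-scratch reader for that notation. The function names `parse_brackets` and `leaves` and the `(label, children)` tuple representation are assumptions made for illustration; they do not come from the slides or from NLTK.

```python
def parse_brackets(text):
    """Turn a bracketed string such as "[NP [Det a] [N flight]]" into
    nested (label, children) tuples. Words are kept as plain strings."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected '[' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing "]"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


def leaves(tree):
    """Return the words of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    return [word for child in children for word in leaves(child)]


tree = parse_brackets(
    "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
)
print(tree[0])       # label of the root node
print(leaves(tree))  # the words covered by the tree
```

The root label and the leaf sequence are the two things a bracketing must preserve: the category of the whole string and the words in their original order.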

11 CFG example
S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left

12 Beyond regular languages: Context-Free Grammars
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> the
Det -> a
Noun -> flight
V -> left

13 CFGs: set of rules
S -> NP VP
– This says that there are units called S, NP, and VP in this language
– That an S consists of an NP followed immediately by a VP
– It doesn’t say that that’s the only kind of S
– Nor does it say that this is the only place that NPs and VPs occur

14 Generativity
As with FSAs, you can view these rules as either analysis or synthesis machines
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the language
How can we define grammatical vs. ungrammatical sentences?

15 Derivations
A derivation is a sequence of rules applied to a string that accounts for that string
– Covers all the elements in the string
– Covers only the elements in the string

16 Derivations as Trees
[Figure: parse tree for “I prefer a morning flight”, with node labels S, NP, VP, Pro, Verb, Det, Nom, Noun]

17 CFGs more formally
A context-free grammar has 4 parameters (“is a 4-tuple”):
1) A set of non-terminal symbols (“variables”) N
2) A set of terminal symbols Σ (disjoint from N)
3) A set of productions P, each of the form A -> β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*
4) A designated start symbol S
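
The four components of the definition can be written down as a small data structure and checked mechanically. The sketch below is illustrative only: the class name `CFG`, its field names, and the method `is_well_formed` are assumptions, and the example grammar is the toy “flight” grammar from the earlier slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # Sigma
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # S

    def is_well_formed(self):
        """Check the conditions stated in the 4-tuple definition."""
        symbols = self.nonterminals | self.terminals
        return (
            self.nonterminals.isdisjoint(self.terminals)      # Sigma disjoint from N
            and self.start in self.nonterminals               # S is a non-terminal
            and all(
                lhs in self.nonterminals                      # A is a non-terminal
                and all(sym in symbols for sym in rhs)        # rhs is in (Sigma U N)*
                for lhs, rhs in self.productions
            )
        )


flight_grammar = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "Nominal", "Noun", "V"}),
    terminals=frozenset({"the", "a", "flight", "left"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Nominal")),
        ("Nominal", ("Noun",)),
        ("VP", ("V",)),
        ("Det", ("the",)),
        ("Det", ("a",)),
        ("Noun", ("flight",)),
        ("V", ("left",)),
    ),
    start="S",
)
print(flight_grammar.is_well_formed())
```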

18 Defining a CF language via derivation
A string A derives a string B if
– A can be rewritten as B via some series of rule applications
More formally:
– If A -> β is a production of P
– and α and γ are any strings in the set (Σ ∪ N)*
– then we say that αAγ directly derives αβγ, or αAγ => αβγ
– Derivation is a generalization of direct derivation
– Let α1, α2, …, αm be strings in (Σ ∪ N)*, m >= 1, s.t. α1 => α2, α2 => α3, …, αm-1 => αm. We say that α1 derives αm, or α1 =>* αm
– We then formally define the language L_G generated by grammar G as the set of strings composed of terminal symbols derived from S:
  L_G = {w | w is in Σ* and S =>* w}

19 Derivations
A DERIVATION of a string is a sequence of rule applications
– E.g., the string “a flight” can be derived from the grammar above and the symbol NP by the (leftmost-first) derivation NP => Det Nominal => a Nominal => a Noun => a flight
Derivations can be visualized as PARSE TREES
The LANGUAGE defined by a CFG is the set of strings derivable from the start symbol S (for Sentence)
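
The leftmost derivation of “a flight” can be reproduced step by step. This is a minimal sketch, assuming the grammar is stored as a plain dictionary; the names `GRAMMAR` and `leftmost_derivation`, and the idea of passing a list of production indices, are illustrative choices rather than anything prescribed by the slides.

```python
# Toy grammar from the slides: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the production to use for that non-terminal.
    Returns every sentential form, starting with [start].
    """
    form = [start]
    steps = [form]
    for choice in choices:
        # Position of the leftmost symbol that can still be rewritten.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation("NP", [0, 1, 0, 0])
print(" => ".join(" ".join(form) for form in steps))
# prints: NP => Det Nominal => a Nominal => a Noun => a flight
```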

20 Derivations and parse trees

21 A more formal definition A CFG is a 4-tuple consisting of

22 What “context free” means

23 Derivations and languages
The language L_G GENERATED by a CFG G is the set of strings of TERMINAL symbols that can be derived from the start symbol S using the production rules in G
– L_G = {w | w is in Σ* and S derives w}
The strings in L_G are called GRAMMATICAL
The strings not in L_G are called UNGRAMMATICAL
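
For a grammar without recursion, the language L_G is finite and can simply be enumerated, which makes the grammatical/ungrammatical distinction concrete. The sketch below assumes the toy “flight” grammar from the earlier slides; `expand`, `LANGUAGE` and `is_grammatical` are hypothetical names, and the exhaustive expansion would not terminate on a recursive grammar.

```python
from itertools import product

# Toy grammar from the slides: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["V"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"]],
    "V": [["left"]],
}


def expand(symbol):
    """Return every terminal string (as a word list) derivable from `symbol`.

    Only safe for non-recursive grammars: a recursive rule would loop forever.
    """
    if symbol not in GRAMMAR:          # terminal symbol
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for combo in product(*(expand(sym) for sym in rhs)):
            results.append([word for part in combo for word in part])
    return results


LANGUAGE = {" ".join(words) for words in expand("S")}


def is_grammatical(sentence):
    """A string is grammatical exactly when it belongs to L_G."""
    return sentence in LANGUAGE


print(sorted(LANGUAGE))
print(is_grammatical("a flight left"), is_grammatical("flight a left"))
```

Real grammars are recursive and generate infinitely many strings, so membership is decided by parsing rather than by enumeration.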

24 Grammar development One of the most basic skills in NLE is the ability to write a CFG for some fragment of a language (e.g., dates). We’ll briefly cover some of the issues to be addressed when writing small CFG grammars

25 CFG in PYTHON NLTK, 8.3

26 SYNTACTIC ANALYSIS
TOP-DOWN search: the parse tree has to be rooted in the start symbol S
– EXPECTATION-DRIVEN parsing
– Example: RECURSIVE DESCENT
BOTTOM-UP search: the parse tree must be an analysis of the input
– DATA-DRIVEN parsing
– Example: SHIFT-REDUCE

27 TOP-DOWN PARSING WITH NLTK Recursive descent parsing (NLTK, 8.3) – nltk.RecursiveDescentParser(grammar) – nltk.app.rdparser()
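
A minimal usage sketch of the recursive descent parser named on this slide, assuming NLTK is installed. The grammar is the toy “flight” grammar from the earlier slides, written in the string format accepted by `nltk.CFG.fromstring`; the variable names are illustrative.

```python
import nltk

# Toy grammar from the earlier slides; terminals are quoted in NLTK's format.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'the' | 'a'
Noun -> 'flight'
V -> 'left'
""")

# Top-down, expectation-driven: start from S and expand until the words match.
rd_parser = nltk.RecursiveDescentParser(grammar)
rd_trees = list(rd_parser.parse("the flight left".split()))

for tree in rd_trees:
    print(tree)
```

Recursive descent expands rules depth-first from the start symbol, so it does not terminate on left-recursive rules such as `VP -> VP PP`; this toy grammar has none.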

28 BOTTOM-UP PARSING WITH NLTK Shift-reduce (NLTK, 8.3, p. 305) – nltk.app.srparser() – ShiftReduceParser(grammar)
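
A matching sketch for the bottom-up case, again assuming NLTK is installed and reusing the toy “flight” grammar from the earlier slides. Variable names are illustrative.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal
Nominal -> Noun
VP -> V
Det -> 'the' | 'a'
Noun -> 'flight'
V -> 'left'
""")

# Bottom-up, data-driven: shift words onto a stack and reduce whenever the
# top of the stack matches the right-hand side of a production.
sr_parser = nltk.ShiftReduceParser(grammar)
sr_trees = list(sr_parser.parse("a flight left".split()))

for tree in sr_trees:
    print(tree)
```

NLTK's `ShiftReduceParser` does not backtrack, so it returns at most one tree and can fail to find a parse even when one exists; on this small unambiguous grammar that limitation does not show up.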

29 MORE ADVANCED PARSING MODELS Left corner (NLTK) Chart (NLTK)

30 DEPENDENCIES AND DEPENDENCY GRAMMAR (NLTK, 8.5)

31 THE PROBLEM OF AMBIGUITY
Ambiguity
– Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430
Avoiding reparsing
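
The growth rate mentioned on this slide can be checked directly. The sketch below computes Catalan numbers from the standard closed form C(n) = (2n choose n) / (n + 1); the function name `catalan` is an illustrative choice, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number: the number of binary bracketings of n + 1 items."""
    return comb(2 * n, n) // (n + 1)


for n in range(2, 9):
    print(f"C({n}) = {catalan(n)}")
```

The values roughly quadruple with each step for large n, which is why enumerating every attachment quickly becomes impractical and why parsers try to avoid reparsing shared substructures.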

32 COMMON STRUCTURAL AMBIGUITIES
COORDINATION ambiguity
– OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN
ATTACHMENT ambiguity
– Gerundive VP attachment ambiguity: I saw the Eiffel Tower flying to Paris
– PP attachment ambiguity: I shot an elephant in my pajamas

33 PP ATTACHMENT AMBIGUITY
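
PP attachment ambiguity can be made visible by parsing the elephant sentence with a grammar that allows a PP to attach either to a noun phrase or to a verb phrase. This is a sketch assuming NLTK is installed; the small grammar is hand-written for this one sentence and is an assumption for illustration, not a grammar given in the slides.

```python
import nltk

# A PP can attach inside an NP (Det N PP) or to a VP (VP PP).
ambiguous_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

sentence = "I shot an elephant in my pajamas".split()

# A chart parser copes with the left-recursive rule VP -> VP PP and
# returns every analysis licensed by the grammar.
chart_parser = nltk.ChartParser(ambiguous_grammar)
parses = list(chart_parser.parse(sentence))

print(len(parses))
for tree in parses:
    print(tree)
```

One tree attaches “in my pajamas” to “an elephant” (the elephant is in the pajamas); the other attaches it to the verb phrase (the shooting happened in pajamas).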

34 AMBIGUITY: SOLUTIONS Use a PROBABILISTIC GRAMMAR (not covered in this module) Use semantics

35 WRITING A GRAMMAR NLTK, 8.6

