1 Grammars August 26, 2005

2 Outline for Grammar/Parsing
Context-Free Grammars and Constituency
Some common CFG phenomena for English: sentence-level constructions; NP, PP, VP; coordination; subcategorization
Top-down and Bottom-up Parsing

3 Agreement
How can we modify our grammar to handle agreement phenomena? We may expand our grammar with multiple sets of rules:
3SgNP → …
Non3SgNP → …
But this will double the size of the grammar. A better way to deal with agreement is to parameterize each non-terminal with feature structures, which handles the problem without exploding the size of the grammar.

4 Possible CFG Solution
Original rules:
S -> NP VP
NP -> Det Nominal
VP -> V NP
Number-specific rules:
SgS -> SgNP SgVP
PlS -> PlNP PlVP
SgNP -> SgDet SgNom
PlNP -> PlDet PlNom
SgVP -> SgV NP
PlVP -> PlV NP

5 CFG Solution for Agreement
It works and stays within the power of CFGs.
But it's ugly.
And it doesn't scale all that well.
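The blow-up is easy to see mechanically; a minimal Python sketch (our illustration, not from the slides) that clones every agreeing rule once per number value:

base_rules = [
    # (LHS, RHS, which RHS symbols must agree with the LHS)
    ("S",  ["NP", "VP"],   [True, True]),
    ("NP", ["Det", "Nom"], [True, True]),
    ("VP", ["V", "NP"],    [True, False]),  # the object NP need not agree
]
features = ["Sg", "Pl"]  # add person, gender, ... and the factor multiplies

expanded = []
for lhs, rhs, agrees in base_rules:
    for f in features:   # one clone of every agreeing rule per feature value
        expanded.append((f + lhs,
                         [f + s if a else s for s, a in zip(rhs, agrees)]))

for lhs, rhs in expanded:
    print(lhs, "->", " ".join(rhs))  # 3 rules become 6, before any lexicon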

6 Subcategorization
A verb phrase may consist of a verb and a number of other constituents:
VP → Verb -- disappear
VP → Verb NP -- prefer a morning flight
VP → Verb NP PP -- leave Ankara in the morning
VP → Verb PP -- leaving on Monday
VP → Verb S -- You said there is only one flight
Although a verb phrase can have many possible constituents, not every verb is compatible with every verb phrase. Verbs have preferences for the kinds of constituents they co-occur with (e.g., transitive vs. intransitive verbs). Modern grammars distinguish many more subcategories (around 100).

7 Subcategorization Sneeze: John sneezed
Find: Please find [a flight to NY]NP
Give: Give [me]NP [a cheaper fare]NP
Help: Can you help [me]NP [with a flight]PP
Prefer: I prefer [to leave earlier]TO-VP
Said: You said [United has a flight]S

8 Some Subcategorization Frames
Frame         Verb        Example
∅             eat, sleep  I want to eat
NP            prefer      I prefer a morning flight
NP NP         show        Show me all flights from Ankara
PPfrom PPto   fly         I would like to fly from Ankara to Istanbul
NP PPwith     help        Can you help me with a flight
VPto          prefer      I would prefer to go by THY
VPbare        can         I can go from Ankara
S             mean        This means THY has a hub in Istanbul

9 Subcategorization
*John sneezed the book
*I prefer United has a flight
*Give with a flight
Subcat expresses the constraints that a predicate (a verb, for now) places on the number and syntactic types of the arguments it wants to occur with.
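A toy sketch of how such frames block the starred strings (the lexicon entries and category names here are illustrative assumptions, not from the slides):

SUBCAT = {  # hypothetical toy lexicon: verb -> licensed complement sequences
    "sneezed": [[]],                  # intransitive: no complements
    "prefer":  [["NP"], ["TO-VP"]],
    "give":    [["NP", "NP"]],
}

def vp_ok(verb, complements):
    # A VP is licensed only if the verb lists this complement sequence.
    return complements in SUBCAT.get(verb, [])

print(vp_ok("sneezed", []))      # True:  John sneezed
print(vp_ok("sneezed", ["NP"]))  # False: *John sneezed the book
print(vp_ok("give", ["PP"]))     # False: *Give with a flight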

10 So?
So the various rules for VPs overgenerate: they permit strings containing verbs and arguments that don't go together. For example, VP -> V NP licenses "sneezed the book" as a VP, since "sneezed" is a verb and "the book" is a valid NP.

11 Forward Pointer
It turns out that verb subcategorization facts will provide a key element for semantic analysis (determining who did what to whom in an event).

12 Possible CFG Solution
Instead of:
VP -> V
VP -> V NP
VP -> V NP PP
use verb subclasses:
VP -> IntransV
VP -> TransV NP
VP -> TransV NP PP

13 Movement Core example My travel agent booked the flight

14 Movement
Core example: [[My travel agent]NP [booked [the flight]NP]VP]S
That is, "book" is a straightforward transitive verb: it expects a single NP argument within the VP, and a single NP argument as the subject.

15 Movement
What about: Which flight do you want me to have the travel agent book?
The direct object argument to "book" isn't appearing in the right place. It is in fact a long way from where it's supposed to appear, and note that it's separated from its verb by two other verbs.

16 CFGs: a summary
CFGs appear to be just about what we need to account for a lot of basic syntactic structure in English. But there are problems:
They can be dealt with adequately, although not elegantly, by staying within the CFG framework.
There are also simpler, more elegant solutions that take us out of the CFG framework (beyond its formal power): syntactic theories such as HPSG, LFG, CCG, Minimalism, etc.

17 Other Syntactic Stuff
Grammatical Relations:
Subject: I booked a flight to New York / The flight was booked by my agent.
Object: I booked a flight to New York
Complement: I said that I wanted to leave

18 Dependency Parsing
Word-to-word links instead of constituency.
Based on the European rather than the American tradition, but dates back to the Greeks.
The original notions of Subject and Object, and the progenitor of subcategorization (called 'valence'), came out of dependency theory.
Dependency parsing is quite popular as a computational model, since relationships between words are quite useful.

19 Parsing
Parsing with a CFG is the task of assigning a correct parse tree (or derivation) to a string given some grammar. Correct means consistent with the input and the grammar; it doesn't mean that it's the "right" tree in some global sense of correctness. The leaves of the parse tree must cover all and only the input, and the parse tree must correspond to a valid derivation according to the grammar.
Parsing can be viewed as search: the search space is the space of parse trees generated by the grammar, and the search is guided by the structure of that space and by the input.
First we will look at basic (bad) parsing methods. After seeing what's wrong with them, we will look at better methods.

20 A Simple English Grammar
S → NP VP          Det → that | this | a | the
S → Aux NP VP      Noun → book | flight | meal | money
S → VP             Verb → book | include | prefer
NP → Det NOM       Aux → does
NP → ProperNoun    Prep → from | to | on
NOM → Noun         ProperNoun → Houston | TWA
NOM → Noun NOM
NOM → NOM PP
VP → Verb
VP → Verb NP
PP → Prep NOM

21 Basic Top-Down Parsing
A top-down parser searches for a parse tree by trying to build from the root node S (the start symbol) down to the leaves. First we create the root node, then we create its children; we choose one of the children and then create its children, and so on. We can search the space of parse trees:
breadth-first search -- level-by-level search
depth-first search -- first we search one of the children

22 Top Down Space

23 Basic Bottom-Up Parsing
In bottom-up parsing, the parser starts with the words of the input and tries to build parse trees from the words up. The parser is successful if it builds a parse tree rooted in the start symbol that covers all of the input.

24 Bottom-Up Space

25 Top-Down or Bottom-Up?
Each of the top-down and bottom-up parsing techniques has its own advantages and disadvantages.
The top-down strategy never wastes time exploring trees that cannot result in the start symbol (it starts from there); the bottom-up strategy may waste time on such trees.
On the other hand, the top-down strategy wastes time on trees that are not consistent with the input, while the bottom-up strategy never suggests trees that are not at least locally grounded in the actual input.
Neither of these two basic strategies is good enough on its own for parsing natural languages.

26 Search Control Issues
How will our search take place?
Which node in the tree will be expanded next?
Which applicable grammar rule will be tried first?
Are we going to use depth-first or breadth-first search?
The answers to these questions determine how we control our search through the space of trees.

27 (figure)

28 A Top-Down Depth-First Left-to-Right Search
In this top-down search, we will use:
a depth-first strategy -- we choose a node and explore its sub-trees
left-to-right -- we choose the left-most node to explore
For the chosen node, we choose one of the applicable rules (the first one) and apply it to that node. If there is more than one applicable rule, we keep pointers to the other applicable rules on a stack, so that if our choice fails we can backtrack to the alternatives.
Let us look at how this method works for our grammar and the following input: Does this flight include a meal?
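A runnable sketch of exactly this strategy (a recursive-descent recognizer over the grammar above, except that the left-recursive NOM → NOM PP rule is omitted, since naive top-down search can loop on it, as discussed later):

GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "NOM"], ["ProperNoun"]],
    "NOM": [["Noun"], ["Noun", "NOM"]],  # NOM -> NOM PP omitted (left-recursive)
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a", "the"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "ProperNoun": {"Houston", "TWA"},
}

def parse(symbol, words, i):
    """Yield every j such that `symbol` derives words[i:j].
    Depth-first, left-to-right, rules tried in order; the for-loops
    over the alternatives are the backtracking."""
    if symbol in LEXICON:  # part of speech: match one word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, i)

def expand(rhs, words, i):
    if not rhs:  # all RHS symbols matched
        yield i
        return
    for j in parse(rhs[0], words, i):
        yield from expand(rhs[1:], words, j)

words = "does this flight include a meal".split()
print(any(j == len(words) for j in parse("S", words, 0)))  # True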

29 Top-Down, Depth-First, Left-to-Right Search

30 Top-Down, Depth-First, Left-to-Right Search (II)

31 Top-Down, Depth-First, Left-to-Right Search (III)

32 Top-Down, Depth-First, Left-to-Right Search (IV)

33 Top-Down Parsing with Bottom-Up Filtering
When we choose applicable rules, we can use bottom-up information. For example, in our grammar we have:
S → NP VP
S → Aux NP VP
S → VP
If we want to parse the input Does this flight serve a meal?, although all three of these rules are applicable, the first and the third will definitely fail, because NP and VP cannot derive strings starting with does (an auxiliary verb here).
Can we make this decision before we choose an applicable rule? Yes: we can use left-corner filtering.

34 Adding Bottom-Up Filtering

35 Filtering with Left Corners
The parser should not consider any grammar rule if the current input cannot serve as the first word along the left edge of some derivation from this rule.
The first word along the left edge of a derivation is called the left-corner of the tree.
B is a left-corner of A if the following relation holds: A ⇒* B α
In other words, B can be the left-corner of A if there is a derivation of A that begins with B.
We will ask whether the part of speech of the current input word can be a left-corner of the current node (non-terminal).

36 Left Corner
prefer (or Verb) is a left-corner of VP.
(Figure: the tree VP → Verb NP, NP → Det NOM, NOM → Noun Noun, over the words prefer a morning flight)

37 Filtering with Left-Corners (cont.)
Do not consider any expansion where the current input cannot serve as the left-corner of that expansion.
Category  Left-Corners
S         Det, ProperNoun, Aux, Verb
NP        Det, ProperNoun
NOM       Noun
VP        Verb
PP        Prep
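The left-corner table need not be written by hand; a small fixpoint sketch computes it from the grammar (encoding as in the earlier sketch):

GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "NOM"], ["ProperNoun"]],
    "NOM": [["Noun"], ["Noun", "NOM"], ["NOM", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NOM"]],
}
POS = {"Det", "Noun", "Verb", "Aux", "Prep", "ProperNoun"}

def left_corners(grammar, pos):
    """For each non-terminal, the POS categories that can begin a derivation."""
    lc = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for lhs, expansions in grammar.items():
            for rhs in expansions:
                first = rhs[0]
                new = {first} if first in pos else lc.get(first, set())
                if not new <= lc[lhs]:
                    lc[lhs] |= new
                    changed = True
    return lc

print(left_corners(GRAMMAR, POS))  # reproduces the table above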

38 Problems with the Basic Top-Down Parser
Even the top-down parser with bottom-up filtering has three problems that make it an insufficient solution to the general-purpose parsing problem:
Left-recursion
Ambiguity
Inefficient reparsing of subtrees
First we will talk about these three problems. Then we will present the Earley algorithm, which avoids them.

39 Left-Recursion
When left-recursive grammars are used, top-down depth-first left-to-right parsers can dive into an infinite path. A grammar is left-recursive if it contains at least one non-terminal A such that: A ⇒* A α
These kinds of structures are common in natural language grammars, e.g.: NP → NP PP
We can convert a left-recursive grammar into an equivalent grammar which is not left-recursive:
A → A α | β   ==>   A → β A'    A' → α A' | ε
Unfortunately, the resulting grammar may no longer be the most grammatically natural way to represent syntactic structures.
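The conversion is mechanical; a sketch for the immediate-left-recursion case:

def remove_immediate_left_recursion(nt, expansions):
    """Rewrite A -> A a1 | ... | b1 | ... as
    A -> b1 A' | ... and A' -> a1 A' | ... | epsilon."""
    rec  = [rhs[1:] for rhs in expansions if rhs and rhs[0] == nt]
    base = [rhs for rhs in expansions if not rhs or rhs[0] != nt]
    if not rec:
        return {nt: expansions}  # nothing to do
    new = nt + "'"
    return {
        nt:  [rhs + [new] for rhs in base],
        new: [rhs + [new] for rhs in rec] + [[]],  # [] encodes epsilon
    }

print(remove_immediate_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"]]))
# {'NP': [['Det', 'Nominal', "NP'"]], "NP'": [['PP', "NP'"], []]}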

40 Left-Recursion
What happens in the following situation?
S -> NP VP
S -> Aux NP VP
NP -> NP PP
NP -> Det Nominal
With a sentence starting with Did the flight…

41 Ambiguity
One morning I shot an elephant in my pajamas. How he got into my pajamas I don't know. (Groucho Marx)

42 Ambiguity
A top-down parser is not efficient at handling ambiguity.
Local ambiguity leads to hypotheses that are locally reasonable but eventually lead nowhere; these cause backtracking.
Global ambiguity potentially leads to multiple parses for the same input (if we force the parser to produce them). Parsers without disambiguation tools must simply return all possible parses, and many of those parses will be unreasonable: an exponential number of parses is possible for certain inputs.
But most applications do not want all possible parses; they want the single correct parse, and most disambiguation tools require statistical and semantic knowledge.

43 Ambiguity - Example
If we add the following rules to our grammar:
VP → VP PP
NP → NP PP
then the input Show me the meal on flight 286 from Ankara to Istanbul will have a lot of parses (14 parses?), some of them really strange. If we have PP → Prep NP:
Number of PPs   Number of NP parses
2               2
3               5
4               14

44 Lots of Ambiguity
Church and Patil (1982): the number of parses for such sentences grows at the rate of the number of parenthesizations of arithmetic expressions, which grow with the Catalan numbers.
PPs  Parses
1    2
2    5
3    14
4    132
5    469
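For reference, the Catalan numbers are easy to compute directly; a quick sketch showing the growth pattern the table reflects:

from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)  # C_n = C(2n, n) / (n + 1)

print([catalan(n) for n in range(1, 9)])
# [1, 2, 5, 14, 42, 132, 429, 1430] -- the same combinatorial explosion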

45 Avoiding Repeated Work
Parsing is hard, and slow. It's wasteful to redo stuff over and over and over.
Consider an attempt to top-down parse the following as an NP: A flight from Indianapolis to Houston on TWA
Grammar rules:
NP → Det NOM
NP → NP PP
NP → ProperNoun

46 (figure)

47 (figure)

48 (figure)

49 (figure)

50 Repeated Parsing of Subtrees
The parser often builds valid trees for portions of the input, then discards them during backtracking, only to find that it has to rebuild them. The parser creates small parse trees that fail because they do not cover all the input; it backtracks to cover more input, and recreates the same subtrees again and again. The same work is repeated unnecessarily.

51 Dynamic Programming
We want a parsing algorithm (using dynamic programming) that fills a table with solutions to subproblems and that:
Does not do repeated work
Does top-down search with bottom-up filtering
Solves the left-recursion problem
Solves an exponential problem in O(N^3) time
The answer is the Earley algorithm.

52 Earley Algorithm
Fills a table in a single pass over the input. The table will be of size N+1 (N is the number of words). Table entries represent:
Completed constituents and their locations
In-progress constituents
Predicted constituents
Each possible subtree is represented only once, and it can be shared by all the parses that need it.

53 States
A state in a table entry contains three kinds of information:
a subtree corresponding to a single grammar rule
information about the progress made in completing this subtree
the position of the subtree with respect to the input
We use a dot in the state's grammar rule to indicate the progress made in recognizing it; we call the resulting structure a dotted rule. A state's position is represented by two numbers indicating where the state starts and where its dot lies.

54 Earley States
The table entries are called states and are represented with dotted rules:
S -> · VP            A VP is predicted
NP -> Det · Nominal  An NP is in progress
VP -> V NP ·         A VP has been found

55 Earley States/Locations
We need to know where these things are in the input:
S -> · VP [0,0]            A VP is predicted at the start of the sentence
NP -> Det · Nominal [1,2]  An NP is in progress; the Det goes from 1 to 2
VP -> V NP · [0,3]         A VP has been found starting at 0 and ending at 3
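In code, a state is just a dotted rule plus its span; a minimal sketch:

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str     # left-hand side of the rule, e.g. "NP"
    rhs: tuple   # right-hand side, e.g. ("Det", "Nominal")
    dot: int     # how many RHS symbols have been recognized so far
    start: int   # where the state's span begins in the input
    end: int     # where the dot lies

    def complete(self):
        return self.dot == len(self.rhs)

    def next_cat(self):
        return None if self.complete() else self.rhs[self.dot]

s = State("NP", ("Det", "Nominal"), 1, 1, 2)  # NP -> Det . Nominal [1,2]
print(s.complete(), s.next_cat())  # False Nominal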

56 Graphically (figure)

57 Earley Algorithm
March through the chart left-to-right. At each step, apply one of three operators:
Predictor: create new states representing top-down expectations
Scanner: match word predictions (a rule with a word after the dot) to words
Completer: when a state is complete, see what rules were looking for that completed constituent

58 Predictor
Given a state with a non-terminal to the right of the dot that is not a part-of-speech category:
Create a new state for each expansion of the non-terminal
Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
So the predictor looking at S -> . VP [0,0] results in:
VP -> . Verb [0,0]
VP -> . Verb NP [0,0]

59 Scanner
Given a state with a non-terminal to the right of the dot that is a part-of-speech category:
If the next word in the input matches this part of speech, create a new state with the dot moved over the non-terminal, and add this state to the chart entry following the current one
So the scanner looking at VP -> . Verb NP [0,0]: if the next word, "book", can be a verb, add the new state VP -> Verb . NP [0,1]
Note: the Earley algorithm uses top-down input to disambiguate POS! Only a POS predicted by some state can get added to the chart!

60 Completer
Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of the input.
Find and advance all previous states that were looking for this category: copy the state, move the dot, insert into the current chart entry.
Given: NP -> Det Nominal . [1,3] and VP -> Verb . NP [0,1]
Add: VP -> Verb NP . [0,3]

61 Earley: how do we know we are done?
Find an S state in the final column that spans from 0 to n+1 and is complete:
S -> α · [0,n+1]
If that's the case, you're done.

62 Earley
So sweep through the table from 0 to n+1…
New predicted states are created by starting top-down from S
New incomplete states are created by advancing existing states as new constituents are discovered
New complete states are created in the same way

63 Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to step 2
6. Look at chart[N+1] to see if you have a winner

64 Example
Book that flight
We should find… an S from 0 to 3 that is a completed state…

65 Example (figure)

66 Example (figure)

67 Earley example cont'd (figure)

68 What is it?
What kind of parser did we just describe? (trick question)
An Earley parser… yes. But it is not a parser -- it is a recognizer.
The presence of an S state with the right attributes in the right place indicates a successful recognition, but there is no parse tree… so it is not yet a parser.
That's how we solve (not) an exponential problem in polynomial time.

69 Converting Earley from Recognizer to Parser
With the addition of a few pointers we have a parser:
Augment the Completer to point to where we came from.

70 Augmenting the chart with structural information (figure)

71 Retrieving Parse Trees from the Chart
All the possible parses for an input are in the table. We just need to read off all the backpointers from every complete S in the last column of the table:
Find all the S -> α . [0,N+1]
Follow the structural traces from the Completer
Of course, this won't be polynomial time, since there could be an exponential number of trees. But we can at least represent ambiguity efficiently.

72 Earley and Left Recursion
Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
Never place a state into the chart that's already there
Copy states before advancing them

73 Earley and Left Recursion: 1
S -> NP VP
NP -> NP PP
The predictor, given the first rule: S -> · NP VP [0,0]
Predicts: NP -> · NP PP [0,0]
and stops there, since predicting the same state again would be redundant

74 Earley and Left Recursion: 2
When a state gets advanced, make a copy and leave the original alone…
Say we have NP -> · NP PP [0,0]
We find an NP from 0 to 2, so we create NP -> NP · PP [0,2]
But we leave the original state as is

75 Dynamic Programming Approaches
Earley: top-down, no filtering, no restriction on grammar form
CYK: bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF)
Details are not important… what matters are the dimensions of variation:
Bottom-up vs. top-down
With or without filters
With or without restrictions on grammar form

76 How to Do Parse Disambiguation
Probabilistic methods:
Augment the grammar with probabilities
Then modify the parser to keep only the most probable parses
And at the end, return the most probable parse

77 Probabilistic CFGs
The probabilistic model: assigning probabilities to parse trees
Getting the probabilities for the model
Parsing with probabilities: a slight modification to the dynamic programming approach; the task is to find the max-probability tree for an input

78 Probability Model
Attach probabilities to grammar rules. The expansions for a given non-terminal sum to 1:
VP -> Verb          .55
VP -> Verb NP       .40
VP -> Verb NP NP    .05
Read this as P(specific rule | LHS)

79 Probability Model (1)
A derivation (tree) consists of the set of grammar rules that are in the tree. The probability of a tree is just the product of the probabilities of the rules in the derivation.

80 Probability Model (1.1)
The probability of a word sequence (sentence) is the probability of its tree in the unambiguous case. It's the sum of the probabilities of the trees in the ambiguous case.
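A sketch of both cases, with toy rule probabilities (numbers are illustrative, not from a treebank):

import math

RULE_PROB = {  # P(rule | LHS), as on the earlier slide
    ("VP", ("Verb",)): 0.55,
    ("VP", ("Verb", "NP")): 0.40,
    ("VP", ("Verb", "NP", "NP")): 0.05,
}

def tree_prob(rules_used):
    """rules_used: the (LHS, RHS) pairs appearing in the derivation."""
    return math.prod(RULE_PROB[r] for r in rules_used)

def sentence_prob(trees):
    return sum(tree_prob(t) for t in trees)  # ambiguous case: sum over trees

print(tree_prob([("VP", ("Verb", "NP"))]))  # 0.4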

81 Getting the Probabilities
From an annotated database (a treebank). For example, to get the probability for a particular VP rule, just count all the times the rule is used and divide by the number of VPs overall.
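A maximum-likelihood sketch of that count-and-divide step, P(A → β | A) = Count(A → β) / Count(A), over a hypothetical treebank given as (LHS, RHS) pairs:

from collections import Counter

treebank_rules = [  # every rule occurrence in the (toy) treebank
    ("VP", ("Verb", "NP")), ("VP", ("Verb", "NP")), ("VP", ("Verb",)),
    ("NP", ("Det", "NOM")), ("NP", ("ProperNoun",)),
]
rule_counts = Counter(treebank_rules)
lhs_counts = Counter(lhs for lhs, _ in treebank_rules)
probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
print(probs[("VP", ("Verb", "NP"))])  # 2 uses out of 3 VPs -> 0.666...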

82 Assumptions
We're assuming that there is a grammar to parse with.
We're assuming the existence of a large, robust dictionary with parts of speech.
We're assuming the ability to parse (i.e., a parser).
Given all that… we can parse probabilistically.

83 Typical Approach
Bottom-up (CYK) dynamic programming approach:
Assign probabilities to constituents as they are completed and placed in the table
Use the max probability for each constituent going up

84 Parsing with the Earley Algorithm
New predicted states are based on existing table entries (predicted or in-progress) that predict a certain constituent at that spot.
New in-progress states are created by updating older states to reflect the fact that previously expected constituents have been completed.
New complete states are created when the dot in an in-progress state moves to the end.

85 More Specifically
1. Predict all the states you can
2. Read a word; see what predictions you can match, extend the matched states, add new predictions, and go back to step 2
3. At the end, see if chart[N+1] contains a complete S

86 A Simple English Grammar (Ex.)
S → NP VP          Det → that | this | a | the
S → Aux NP VP      Noun → flight | meal | money
S → VP             Verb → book | include | prefer
NP → Det NOM       Aux → does
NP → ProperNoun    ProperNoun → Houston | TWA
NOM → Noun
NOM → Noun NOM
VP → Verb
VP → Verb NP

87 Example: Chart[0]
γ → · S [0,0]  Dummy start state
S → · NP VP [0,0]  Predictor
NP → · Det NOM [0,0]  Predictor
NP → · ProperNoun [0,0]  Predictor
S → · Aux NP VP [0,0]  Predictor
S → · VP [0,0]  Predictor
VP → · Verb [0,0]  Predictor
VP → · Verb NP [0,0]  Predictor

88 Example: Chart[1]
Verb → book · [0,1]  Scanner
VP → Verb · [0,1]  Completer
S → VP · [0,1]  Completer
VP → Verb · NP [0,1]  Completer
NP → · Det NOM [1,1]  Predictor
NP → · ProperNoun [1,1]  Predictor

89 Example: Chart[2]
Det → that · [1,2]  Scanner
NP → Det · NOM [1,2]  Completer
NOM → · Noun [2,2]  Predictor
NOM → · Noun NOM [2,2]  Predictor

90 Example: Chart[3]
Noun → flight · [2,3]  Scanner
NOM → Noun · [2,3]  Completer
NOM → Noun · NOM [2,3]  Completer
NP → Det NOM · [1,3]  Completer
VP → Verb NP · [0,3]  Completer
S → VP · [0,3]  Completer
NOM → · Noun [3,3]  Predictor
NOM → · Noun NOM [3,3]  Predictor

91 Earley Algorithm
The Earley algorithm has three main functions that do all the work:
Predictor: adds predictions to the chart. It is activated when the dot (in a state) is in front of a non-terminal which is not a part of speech.
Completer: moves the dot to the right when new constituents are found. It is activated when the dot is at the end of a state.
Scanner: reads the input words and enters states representing those words into the chart. It is activated when the dot (in a state) is in front of a non-terminal which is a part of speech.
The Earley algorithm uses these functions to maintain the chart.

92 Predictor
procedure PREDICTOR((A → α · B β, [i,j]))
  for each (B → γ) in GRAMMAR-RULES-FOR(B, grammar) do
    ENQUEUE((B → · γ, [j,j]), chart[j])
  end

93 Completer
procedure COMPLETER((B → γ ·, [j,k]))
  for each (A → α · B β, [i,j]) in chart[j] do
    ENQUEUE((A → α B · β, [i,k]), chart[k])
  end

94 Scanner
procedure SCANNER((A → α · B β, [i,j]))
  if B ∈ PARTS-OF-SPEECH(word[j]) then
    ENQUEUE((B → word[j] ·, [j,j+1]), chart[j+1])
  end

95 Enqueue
procedure ENQUEUE(state, chart-entry)
  if state is not already in chart-entry then
    Add state at the end of chart-entry
  end

96 Earley Code
function EARLEY-PARSE(words, grammar) returns chart
  ENQUEUE((γ → · S, [0,0]), chart[0])
  for i from 0 to LENGTH(words) do
    for each state in chart[i] do
      if INCOMPLETE?(state) and NEXT-CAT(state) is not a POS then PREDICTOR(state)
      elseif INCOMPLETE?(state) and NEXT-CAT(state) is a POS then SCANNER(state)
      else COMPLETER(state)
    end
  end
  return chart

97 Retrieving Parse Trees from a Chart
To retrieve parse trees from a chart, the representation of each state must be augmented with an additional field to store information about the completed states that generated its constituents.
To collect parse trees, we have to update COMPLETER so that it adds a pointer to the older state onto the list of previous-states of the new state.
Then a parse tree can be created by following these lists of previous-states (starting from the completed state for S).

98 Chart[0] - with Parse Tree Info
S0  γ → · S [0,0]  []  Dummy start state
S1  S → · NP VP [0,0]  []  Predictor
S2  NP → · Det NOM [0,0]  []  Predictor
S3  NP → · ProperNoun [0,0]  []  Predictor
S4  S → · Aux NP VP [0,0]  []  Predictor
S5  S → · VP [0,0]  []  Predictor
S6  VP → · Verb [0,0]  []  Predictor
S7  VP → · Verb NP [0,0]  []  Predictor

99 Chart[1] - with Parse Tree Info
S8  Verb → book · [0,1]  []  Scanner
S9  VP → Verb · [0,1]  [S8]  Completer
S10  S → VP · [0,1]  [S9]  Completer
S11  VP → Verb · NP [0,1]  [S8]  Completer
S12  NP → · Det NOM [1,1]  []  Predictor
S13  NP → · ProperNoun [1,1]  []  Predictor

100 Chart[2] - with Parse Tree Info
S14  Det → that · [1,2]  []  Scanner
S15  NP → Det · NOM [1,2]  [S14]  Completer
S16  NOM → · Noun [2,2]  []  Predictor
S17  NOM → · Noun NOM [2,2]  []  Predictor

101 Chart[3] - with Parse Tree Info
S18  Noun → flight · [2,3]  []  Scanner
S19  NOM → Noun · [2,3]  [S18]  Completer
S20  NOM → Noun · NOM [2,3]  [S18]  Completer
S21  NP → Det NOM · [1,3]  [S14,S19]  Completer
S22  VP → Verb NP · [0,3]  [S8,S21]  Completer
S23  S → VP · [0,3]  [S22]  Completer
S24  NOM → · Noun [3,3]  []  Predictor
S25  NOM → · Noun NOM [3,3]  []  Predictor

102 Global Ambiguity
S → Verb
S → Noun
Chart[0]
S0  γ → · S [0,0]  []  Dummy start state
S1  S → · Verb [0,0]  []  Predictor
S2  S → · Noun [0,0]  []  Predictor
Chart[1]
S3  Verb → book · [0,1]  []  Scanner
S4  Noun → book · [0,1]  []  Scanner
S5  S → Verb · [0,1]  [S3]  Completer
S6  S → Noun · [0,1]  [S4]  Completer

103 Problems with CFGs
We know that CFGs cannot handle certain things which occur in natural languages. In particular, CFGs cannot handle very well:
agreement
subcategorization
We will look at a constraint-based representation scheme which will allow us to represent fine-grained information such as:
number/person agreement
semantic categories like mass/count

104 Agreement Problem
What is the problem with the following CFG rules?
S → NP VP
NP → Det NOMINAL
NP → Pronoun
Answer: since these rules do not enforce number and person agreement constraints, they over-generate and allow constructs such as:
* They sleeps
* He sleep
* A dogs
* These dog

105 An Awkward Solution to the Agreement Problem
One way to handle the agreement phenomena in a strictly context-free approach is to encode the constraints into the non-terminal categories and then into the CFG rules. For example, our grammar would become:
S → SgS | PlS
SgS → SgNP SgVP
PlS → PlNP PlVP
SgNP → SgDet SgNOMINAL
SgNP → SgPronoun
PlNP → PlDet PlNOMINAL
PlNP → PlPronoun
This solution explodes the number of non-terminals and rules. The resulting grammar will not be a clean grammar.

106 Subcategorization Problem
What is the problem with the following CFG rules?
VP → Verb
VP → Verb NP
Answer: since these rules do not enforce subcategorization constraints, they over-generate and allow constructs such as:
* They take
* They sleep a glass

107 An Awkward Solution to the Subcategorization Problem
Again, one way to handle the subcategorization phenomena in a strictly context-free approach is to encode the constraints into the non-terminal categories and then into the CFG rules. For example, our grammar would become:
VP → IntransVP | TransVP
IntransVP → IntransVerb
TransVP → TransVerb NP
This solution will again explode the number of non-terminals and rules. Remember that we may need almost 100 subcategorization frames for English verbs. The resulting grammar will not be a clean grammar.

108 A Better Solution
A better solution for the agreement and subcategorization problems is to treat terminals and non-terminals as complex objects with associated properties (called features) that can be manipulated. So we may code rules as follows (not CF rules anymore):
S → NP VP, only if the number of the NP is equal to the number of the VP
Here number is a feature of NP and VP, and it is manipulated (checked for equality) by the rule above.

109 Feature Structures
We can encode the properties associated with grammatical constituents (terminals and non-terminals) by using feature structures. A feature structure is a set of feature-value pairs:
A feature is an atomic symbol.
A value is either an atomic value or another feature structure.
A feature structure can be illustrated by a matrix-like diagram (called an attribute-value matrix).

110 Example - Feature Structures (figure)

111 Reentrant Feature Structures
We will allow multiple features in a feature structure to share the same value. They share the same structure, not just equal values.

112 Feature Path
A feature path is a list of features through a feature structure leading to a particular value. For example:
<HEAD AGREEMENT NUMBER> leads to SG
<HEAD SUBJECT AGREEMENT PERSON> leads to 3
We will use feature paths in the constraints of the rules:
S → NP VP
<NP AGREEMENT> = <VP AGREEMENT>
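A sketch of feature structures as nested dicts, with a feature-path lookup matching these examples:

fs = {
    "CAT": "S",
    "HEAD": {
        "AGREEMENT": {"NUMBER": "SG", "PERSON": 3},
        "SUBJECT": {"AGREEMENT": {"NUMBER": "SG", "PERSON": 3}},
    },
}

def follow_path(fs, path):
    """Follow a feature path such as ("HEAD", "AGREEMENT", "NUMBER")."""
    for feature in path:
        fs = fs[feature]
    return fs

print(follow_path(fs, ("HEAD", "AGREEMENT", "NUMBER")))             # SG
print(follow_path(fs, ("HEAD", "SUBJECT", "AGREEMENT", "PERSON")))  # 3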

113 DAG Representation of Feature Structures
A feature structure can also be represented by using a DAG (directed acyclic graph).
(Figure: a DAG with an edge CAT → NP and an edge AGREEMENT → a node with NUMBER → SG and PERSON → 3)

114 DAG of a Reentrant Feature Structure
(Figure: a DAG with CAT → S and HEAD → a node whose AGREEMENT edge and SUBJECT AGREEMENT path point to the same shared node, with NUMBER → SG and PERSON → 3)

115 Unification of Feature Structures
By the unification of feature structures, we will:
Check the compatibility of two feature structures.
Merge the information in two feature structures.
The result of a unification operation on two feature structures can be:
success -- they merge into a single feature structure
failure -- the two feature structures are not compatible
We will look at how this unification process performs the above tasks.

116 Unification Example
We say that two feature structures can be unified if the feature structures that make them up are compatible.
(Figure: one unification that succeeds and one that fails, using the unification operator ⊔)

117 Unification Example (cont.)
The unification process can bind an undefined value to a value, or can merge the information in two feature structures.

118 Unification Example -- Complex Structures (figure)

119 Subsumption
A more abstract (less specific) feature structure subsumes an equally or more specific one. Subsumption is represented by the operator ⊑.
A feature structure F subsumes a feature structure G (F ⊑ G) if and only if:
For every feature x in F, F(x) ⊑ G(x) (where F(x) means the value of the feature x of the feature structure F).
For all paths p and q in F such that F(p) = F(q), it is also the case that G(p) = G(q).

120 Subsumption Example
Consider three feature structures (1), (2), and (3), where (3) contains all the information in both (1) and (2) (figure).
Then (1) ⊑ (3) and (2) ⊑ (3), but there is no subsumption relation between (1) and (2).

121 Feature Structures in the Grammar
We will incorporate feature structures and the unification process as follows:
All constituents (non-terminals) will be associated with feature structures.
Sets of unification constraints will be associated with grammar rules, and these constraints must be satisfied for the rule to apply.
These attachments accomplish the following goals:
To associate feature structures with both lexical items and instances of grammatical categories.
To guide the composition of feature structures for larger grammatical constituents based on the feature structures of their component parts.
To enforce compatibility constraints between specified parts of grammatical constituents.

122 Unification Constraints
Each grammar rule will be associated with a set of unification constraints:
β0 → β1 … βn  {set of unification constraints}
Each unification constraint will be in one of the following forms:
<βi feature path> = atomic value
<βi feature path> = <βj feature path>

123 Unification Constraints -- Example
For example, the rule
S → NP VP, only if the number of the NP is equal to the number of the VP
will be represented as follows:
S → NP VP
<NP NUMBER> = <VP NUMBER>

124 Agreement Constraints
S → NP VP
  <NP NUMBER> = <VP NUMBER>
S → Aux NP VP
  <Aux AGREEMENT> = <NP AGREEMENT>
NP → Det NOMINAL
  <Det AGREEMENT> = <NOMINAL AGREEMENT>
  <NP AGREEMENT> = <NOMINAL AGREEMENT>
NOMINAL → Noun
  <NOMINAL AGREEMENT> = <Noun AGREEMENT>
VP → Verb NP
  <VP AGREEMENT> = <Verb AGREEMENT>

125 Agreement Constraints -- Lexicon Entries
Aux → does
  <Aux AGREEMENT NUMBER> = SG
  <Aux AGREEMENT PERSON> = 3
Aux → do
  <Aux AGREEMENT NUMBER> = PL
Det → these
  <Det AGREEMENT NUMBER> = PL
Det → this
  <Det AGREEMENT NUMBER> = SG
Verb → serves
  <Verb AGREEMENT NUMBER> = SG
  <Verb AGREEMENT PERSON> = 3
Verb → serve
  <Verb AGREEMENT NUMBER> = PL
Noun → flights
  <Noun AGREEMENT NUMBER> = PL
Noun → flight
  <Noun AGREEMENT NUMBER> = SG

126 Head Features
Certain features are copied from children to parent in feature structures. For example, the AGREEMENT feature in NOMINAL is copied into NP. The features for most grammatical categories are copied from one of the children to the parent. The child that provides the features is called the head of the phrase, and the copied features are referred to as head features. A verb is the head of a verb phrase, and a nominal is the head of a noun phrase. We may reflect this in feature structures as follows:
NP → Det NOMINAL
  <Det HEAD AGREEMENT> = <NOMINAL HEAD AGREEMENT>
  <NP HEAD> = <NOMINAL HEAD>
VP → Verb NP
  <VP HEAD> = <Verb HEAD>

127 Subcategorization Constraints
For verb phrases, we can represent subcategorization constraints using three techniques:
Atomic subcat symbols
Encoding subcat lists as feature structures
The minimal rule approach (using lists directly)
We may use any of these representations.

128 Atomic Subcat Symbols
VP → Verb
  <VP HEAD> = <Verb HEAD>
  <VP HEAD SUBCAT> = INTRANS
VP → Verb NP
  <VP HEAD SUBCAT> = TRANS
VP → Verb NP NP
  <VP HEAD SUBCAT> = DITRANS
Verb → slept
  <Verb HEAD SUBCAT> = INTRANS
Verb → served
  <Verb HEAD SUBCAT> = TRANS
Verb → gave
  <Verb HEAD SUBCAT> = DITRANS

129 Encoding Subcat Lists as Features
Verb → gave
  <Verb HEAD SUBCAT FIRST CAT> = NP
  <Verb HEAD SUBCAT SECOND CAT> = NP
  <Verb HEAD SUBCAT THIRD> = END
VP → Verb NP NP
  <VP HEAD> = <Verb HEAD>
  <VP HEAD SUBCAT FIRST CAT> = <NP CAT>
  <VP HEAD SUBCAT SECOND CAT> = <NP CAT>
  <VP HEAD SUBCAT THIRD> = END
We are only encoding lists using positional features.

130 Minimal Rule Approach
In fact, we do not need symbols like SECOND and THIRD; they are just used to encode lists. We can use lists directly (as in LISP):
<SUBCAT FIRST CAT> = NP
<SUBCAT REST FIRST CAT> = NP
<SUBCAT REST REST> = END

131 Subcategorization Frames for Lexical Entries
We can use two different notations to represent subcategorization frames for lexical entries (verbs). For example:
Verb → want
  <Verb HEAD SUBCAT FIRST CAT> = NP
or
  <Verb HEAD SUBCAT FIRST CAT> = VP
  <Verb HEAD SUBCAT FIRST FORM> = INFINITIVE

132 Implementing Unification
The representation we have used cannot facilitate the destructive merger aspect of the unification algorithm. For this reason, we add additional fields (additional edges in the DAGs) to our feature structures. Each feature structure will consist of two fields:
Content field -- this field can be Null or may contain an ordinary feature structure.
Pointer field -- this field can be Null or may contain a pointer to another feature structure.
If the pointer field of a DAG is Null, the content field of the DAG contains the actual feature structure to be processed. If the pointer field of a DAG is not Null, the destination of that pointer represents the actual feature structure to be processed.

133 Extended Feature Structures (figure)

134 Extended DAG
(Figure: a DAG whose content field (C) holds NUM → SG and PER → 3, and whose pointer field (P) is Null)

135 Unification of Extended DAGs
(Figure: two extended DAGs to be unified -- one with content NUM → SG, one with content PER → 3, both with Null pointer fields)

136 Unification of Extended DAGs (cont.)
(Figure: the result of unification -- a DAG whose content holds both NUM → SG and PER → 3, reached via the updated pointer fields)

137 Unification Algorithm
function UNIFY(f1, f2) returns fstructure or failure
  f1real ← real contents of f1   /* dereference f1 */
  f2real ← real contents of f2   /* dereference f2 */
  if f1real is Null then { f1.pointer ← f2; return f2 }
  else if f2real is Null then { f2.pointer ← f1; return f1 }
  else if f1real and f2real are identical then { f1.pointer ← f2; return f2 }
  else if f1real and f2real are complex feature structures then {
    f2.pointer ← f1
    for each feature in f2real do {
      otherfeature ← find or create a feature corresponding to feature in f1real
      if UNIFY(feature.value, otherfeature.value) returns failure then return failure
    }
    return f1
  }
  else return failure
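A simplified non-destructive Python sketch of the same idea (nested dicts; no pointer fields or dereferencing, and reentrancy is not modeled):

FAIL = object()  # sentinel, since None stands for an undefined value

def unify(f, g):
    if f is None:
        return g  # an undefined value unifies with anything
    if g is None:
        return f
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f if f == g else FAIL  # atoms must be identical
    merged = dict(f)
    for feature, value in g.items():
        sub = unify(f.get(feature), value)
        if sub is FAIL:
            return FAIL
        merged[feature] = sub
    return merged

a = {"AGREEMENT": {"NUMBER": "SG"}}
b = {"AGREEMENT": {"PERSON": 3}}
c = {"AGREEMENT": {"NUMBER": "PL"}}
print(unify(a, b))          # {'AGREEMENT': {'NUMBER': 'SG', 'PERSON': 3}}
print(unify(a, c) is FAIL)  # True: SG and PL clash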

138 Example - Unification of Complex Structures (figure)

139 Example - Unification of Complex Structures (cont.)
(Figure: the unified result -- a DAG whose content holds Agr Num → SG and Sub … Per → 3, with Null pointer fields)

140 Parsing with Unification Constraints
Let us assume that we have augmented our grammar with sets of unification constraints. What changes do we need to make to a parser to make use of them?
Building feature structures and associating them with sub-trees
Unifying feature structures when sub-trees are created
Blocking ill-formed constituents

141 Earley Parsing with Unification Constraints
What do we have to do to integrate unification constraints into the Earley parser?
Build feature structures (represented as DAGs) and associate them with states in the chart
Unify feature structures as states are advanced in the chart
Block ill-formed states from entering the chart
The main change will be in the COMPLETER function of the Earley parser: this routine will invoke the unifier to unify two feature structures.

142 Building Feature Structures
NP → Det NOMINAL
  <Det HEAD AGREEMENT> = <NOMINAL HEAD AGREEMENT>
  <NP HEAD> = <NOMINAL HEAD>
corresponds to a feature-structure DAG (figure).

143 Augmenting States with DAGs
Each state will have an additional field to contain the DAG representing the feature structure corresponding to the state. When a rule is first used by PREDICTOR to create a state, the DAG associated with the state will simply consist of the DAG retrieved from the rule. For example:
S → · NP VP, [0,0], [], Dag1  where Dag1 is the feature structure corresponding to S → NP VP
NP → · Det NOMINAL, [0,0], [], Dag2  where Dag2 is the feature structure corresponding to NP → Det NOMINAL

144 What does COMPLETER do?
When COMPLETER advances the dot in a state, it should unify the feature structure of the newly completed state with the appropriate part of the feature structure being advanced. If this unification is successful, the new state gets the result of the unification as its DAG and is entered into the chart; if it fails, nothing is entered into the chart.

145 A Completion Example
Parsing the phrase that flight, after that has been processed:
NP → Det · NOMINAL, [0,1], [SDet], Dag1  (the state being advanced)
NOMINAL → Noun ·, [1,2], [SNoun], Dag2  (a newly completed state)
To advance the NP state, the parser unifies the feature structure found under the NOMINAL feature of Dag2 with the feature structure found under the NOMINAL feature of Dag1.

146 Earley Parse
function EARLEY-PARSE(words, grammar) returns chart
  ENQUEUE((γ → · S, [0,0], dag), chart[0])
  for i from 0 to LENGTH(words) do
    for each state in chart[i] do
      if INCOMPLETE?(state) and NEXT-CAT(state) is not a POS then PREDICTOR(state)
      elseif INCOMPLETE?(state) and NEXT-CAT(state) is a POS then SCANNER(state)
      else COMPLETER(state)
    end
  end
  return chart

147 Predictor and Scanner
procedure PREDICTOR((A → α · B β, [i,j], dagA))
  for each (B → γ) in GRAMMAR-RULES-FOR(B, grammar) do
    ENQUEUE((B → · γ, [j,j], dagB), chart[j])
  end
procedure SCANNER((A → α · B β, [i,j], dagA))
  if B ∈ PARTS-OF-SPEECH(word[j]) then
    ENQUEUE((B → word[j] ·, [j,j+1], dagB), chart[j+1])

148 Completer and UnifyStates
procedure COMPLETER((B → γ ·, [j,k], dagB))
  for each (A → α · B β, [i,j], dagA) in chart[j] do
    if newdag ← UNIFY-STATES(dagB, dagA, B) ≠ fails then
      ENQUEUE((A → α B · β, [i,k], newdag), chart[k])
  end
procedure UNIFY-STATES(dag1, dag2, cat)
  dag1cp ← CopyDag(dag1)
  dag2cp ← CopyDag(dag2)
  UNIFY(FollowPath(cat, dag1cp), FollowPath(cat, dag2cp))

149 Enqueue
procedure ENQUEUE(state, chart-entry)
  if state is not subsumed by a state in chart-entry then
    Add state at the end of chart-entry
  end

