1 Basic Parsing with Context-Free Grammars Slides adapted from Julia Hirschberg and Dan Jurafsky
2 Homework – Getting Started Data – News articles in TDT4 – Make sure you are looking in the ENG sub-directory – You need to represent each article in .arff form – You need to write a program that will extract “features” from each article – (Note that files with POS tags are now available in Eng_POSTAGGED) The .arff file contains the independent variables as well as the dependent variable
3 An example Start with classifying into topic Suppose you want to start with just the words Two approaches 1. Use your intuition to choose a few words that might disambiguate 2. Start with all words
4 What would your .arff file look like? Words are the attributes. What are the values? Binary: present or not Frequency: how many times it occurs TF*IDF: how many times it occurs in this document (TF = term frequency) divided by how many times it occurs in all documents (DF = document frequency)
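The TF*IDF value described above can be sketched in a few lines. This is a minimal illustration of the slide's simplified definition (raw term frequency divided by document frequency; standard TF*IDF instead weights TF by log(N/DF)), and the toy documents are made up for the example:

```python
from collections import Counter

def tf_idf(docs):
    """Per-(document, word) score, following the slide's simplified
    definition: term frequency in the document divided by the number
    of documents containing the term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each word once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: tf[w] / df[w] for w in tf})
    return scores

docs = [["flights", "from", "denver"],
        ["flights", "to", "miami"]]
# "denver" occurs once and appears in 1 document -> 1.0
# "flights" occurs once here but appears in 2 documents -> 0.5
print(tf_idf(docs)[0])
```

Words that appear in every document (like "flights" here) get downweighted relative to words concentrated in one document, which is the point of the IDF term.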
5 news_2865.input NWIRE ENG NYT BT_9 NT_34 Reversing a policy that has kept medical errors secret for more than two decades, federal officials say they will soon allow Medicare beneficiaries to obtain data about doctors who botched their care. Tens of thousands of Medicare patients file complaints each year about the quality of care they receive from doctors and hospitals. But in many cases, patients get no useful information because doctors can block the release of assessments of their performance. Under a new policy, officials said, doctors will no longer be able to veto disclosure of the findings of investigations. Federal law has for many years allowed for review of care received by Medicare patients, and the law says a peer review organization must inform the patient of the ``final disposition of the complaint'' in each case. But the federal rules used to carry out the law say the peer review organization may disclose information about a doctor only ``with the consent of that practitioner.'' The federal manual for peer review organizations includes similar language about disclosure. Under the new plan, investigators will have to tell patients whether their care met ``professionally recognized standards of health care'' and inform them of any action against the doctor or the hospital. Patients could use such information in lawsuits and other actions against doctors and hospitals that provided substandard care. The new policy came in response to a lawsuit against the government
6 All words for news_2865.input Class: BT_9 (dependent) Reversing: 1 (independent) A: 100 Policy: 20 That: 50 Has: 75 Kept: 3 Commonwealth: 0 (from news_2816.input) Independent: 0 States: 0 Preemptive: 0 Refugees: 0
7 Try it! Open your file Select attributes using Chi-square You can cut and paste resulting attributes to a file Classify How does it work? Try n-grams, POS or date next in same way – How many features would each give you?
8 CFG: Example Many possible CFGs for English, here is an example (fragment): – S → NP VP – VP → V NP – NP → DetP N | AdjP NP – AdjP → Adj | Adv AdjP – N → boy | girl – V → sees | likes – Adj → big | small – Adv → very – DetP → a | the the very small boy likes a girl
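As a quick illustration, the fragment above can be encoded directly and used to generate sentences. This is a sketch of random expansion (generation, not parsing); the grammar is transcribed from the slide, and a symbol counts as a terminal simply because it has no rules:

```python
import random

# The grammar fragment from the slide: each nonterminal maps to a
# list of possible right-hand sides.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["DetP", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N":    [["boy"], ["girl"]],
    "V":    [["sees"], ["likes"]],
    "Adj":  [["big"], ["small"]],
    "Adv":  [["very"]],
    "DetP": [["a"], ["the"]],
}

def generate(symbol="S"):
    """Expand a symbol by recursively choosing rules at random."""
    if symbol not in GRAMMAR:            # terminal: emit it
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])
    return [w for sym in rhs for w in generate(sym)]

print(" ".join(generate()))  # e.g. "the very small boy likes a girl"
```

Because NP → AdjP NP and AdjP → Adv AdjP are recursive, the generated sentences can contain arbitrarily long adjective chains, which previews the recursion discussion later in the deck.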
9 Modify the grammar
12 Derivations of CFGs String rewriting system: we derive a string (=derived structure) But derivation history represented by phrase-structure tree (=derivation structure)!
13 Formal Definition of a CFG G = (V,T,P,S) V: finite set of nonterminal symbols T: finite set of terminal symbols; V and T are disjoint P: finite set of productions of the form A → α, with A ∈ V and α ∈ (T ∪ V)* S ∈ V: start symbol
14 Context? The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context) A -> B C means that I can rewrite an A as a B followed by a C regardless of the context in which A is found
15 Key Constituents (English) Sentences Noun phrases Verb phrases Prepositional phrases
16 Sentence-Types Declaratives: I do not. S -> NP VP Imperatives: Go around again! S -> VP Yes-No Questions: Do you like my hat? S -> Aux NP VP WH Questions: What are they going to do? S -> WH Aux NP VP
20 NPs NP -> Pronoun – I came, you saw it, they conquered NP -> Proper-Noun – New Jersey is west of New York City – Lee Bollinger is the president of Columbia NP -> Det Noun – The president NP -> Det Nominal Nominal -> Noun Noun – A morning flight to Denver
21 PPs PP -> Preposition NP – Over the house – Under the house – To the tree – At play – At a party on a boat at night
27 Recursion We’ll have to deal with rules such as the following, where the non-terminal on the left also appears somewhere on the right (directly): NP -> NP PP [[The flight] [to Boston]] VP -> VP PP [[departed Miami] [at noon]]
28 Recursion Of course, this is what makes syntax interesting Flights from Denver Flights from Denver to Miami Flights from Denver to Miami in February Flights from Denver to Miami in February on a Friday Flights from Denver to Miami in February on a Friday under $300 Flights from Denver to Miami in February on a Friday under $300 with lunch
29 Recursion [[Flights] [from Denver]] [[[Flights] [from Denver]] [to Miami]] [[[[Flights] [from Denver]] [to Miami]] [in February]] [[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a Friday]] Etc. NP -> NP PP
30 Implications of recursion and context-freeness If you have a rule like – VP -> V NP – It only cares that the thing after the verb is an NP It doesn’t have to know about the internal affairs of that NP
31 The point VP -> V NP (I) hate flights from Denver flights from Denver to Miami flights from Denver to Miami in February flights from Denver to Miami in February on a Friday flights from Denver to Miami in February on a Friday under $300 flights from Denver to Miami in February on a Friday under $300 with lunch
32 Grammar Equivalence Can have different grammars that generate same set of strings (weak equivalence) – Grammar 1: NP → DetP N and DetP → a | the – Grammar 2: NP → a N | the N Can have different grammars that have same set of derivation trees (strong equivalence) – With CFGs, possible only with useless rules – Grammar 2: NP → a N | the N – Grammar 3: NP → a N | the N, DetP → many Strong equivalence implies weak equivalence
33 Normal Forms &c There are weakly equivalent normal forms (Chomsky Normal Form, Greibach Normal Form) There are ways to eliminate useless productions and so on
34 Chomsky Normal Form A CFG is in Chomsky Normal Form (CNF) if all productions are of one of two forms: A → B C with A, B, C nonterminals A → a, with A a nonterminal and a a terminal Every CFG has a weakly equivalent CFG in CNF
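The two CNF rule shapes are easy to check mechanically. Below is a sketch with a hypothetical toy grammar; a symbol is treated as a terminal simply because it is not in the given nonterminal set:

```python
def is_cnf(grammar, nonterminals):
    """Check whether every production is A -> B C (two nonterminals)
    or A -> a (a single terminal)."""
    for lhs, rules in grammar.items():
        for rhs in rules:
            binary = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            unary = len(rhs) == 1 and rhs[0] not in nonterminals
            if not (binary or unary):
                return False
    return True

nts = {"S", "NP", "VP", "V", "Det", "N"}
cnf = {"S": [["NP", "VP"]], "VP": [["V", "NP"]],
       "NP": [["Det", "N"]], "V": [["likes"]],
       "Det": [["the"]], "N": [["boy"], ["girl"]]}
print(is_cnf(cnf, nts))                          # True
print(is_cnf({"VP": [["V", "NP", "PP"]]}, nts))  # False: ternary rule
```

A ternary rule like VP → V NP PP fails the check; the standard conversion would binarize it by introducing a fresh nonterminal.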
35 “Generative Grammar” Formal languages: formal device to generate a set of strings (such as a CFG) Linguistics (Chomskyan linguistics in particular): approach in which a linguistic theory enumerates all possible strings/structures in a language (=competence) Chomskyan theories do not really use formal devices – they use CFG + informally defined transformations
36 Nobody Uses Simple CFGs (Except Intro NLP Courses) All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another All successful parsers currently use statistics about phrase structure and about dependency Derive dependency through “head percolation”: for each rule, say which daughter is head
37 Penn Treebank (PTB) Syntactically annotated corpus of newspaper texts (phrase structure) The newspaper texts are naturally occurring data, but the PTB is not! PTB annotation represents a particular linguistic theory (but a fairly “vanilla” one) Particularities – Very indirect representation of grammatical relations (need for head percolation tables) – Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat) – Has flat Ss, flat VPs
Example from PTB ( (S (NP-SBJ It) (VP 's (NP-PRD (NP (NP the latest investment craze) (VP sweeping (NP Wall Street))) : (NP (NP a rash) (PP of (NP (NP new closed-end country funds), (NP (NP those (ADJP publicly traded) portfolios) (SBAR (WHNP-37 that) (S (NP-SBJ *T*-37) (VP invest (PP-CLR in (NP (NP stocks) (PP of (NP a single foreign country)))))))))))
39 Types of syntactic constructions Is this the same construction? – An elf decided to clean the kitchen – An elf seemed to clean the kitchen An elf cleaned the kitchen Is this the same construction? – An elf decided to be in the kitchen – An elf seemed to be in the kitchen An elf was in the kitchen
40 Types of syntactic constructions (ctd) Is this the same construction? There is an elf in the kitchen – *There decided to be an elf in the kitchen – There seemed to be an elf in the kitchen Is this the same construction? It is raining/it rains – ??It decided to rain/be raining – It seemed to rain/be raining
41 Types of syntactic constructions (ctd) Conclusion: to seem: whatever is embedded surface subject can appear in upper clause to decide: only full nouns that are referential can appear in upper clause Two types of verbs
Types of syntactic constructions: Analysis (four slides of tree diagrams, not reproduced: “an elf decided to be in the kitchen” is analyzed with the full NP subject in the upper clause and a PRO subject in the lower clause, while “an elf seemed to be in the kitchen” is analyzed with the subject originating in the lower clause and raising to the upper clause, leaving a trace t_i)
46 Types of syntactic constructions: Analysis to seem: lower surface subject raises to upper clause; raising verb seems (there to be an elf in the kitchen) there seems (t to be an elf in the kitchen) it seems (there is an elf in the kitchen)
47 Types of syntactic constructions: Analysis (ctd) to decide: subject is in upper clause and co-refers with an empty subject in lower clause; control verb an elf decided (an elf to clean the kitchen) an elf decided (PRO to clean the kitchen) an elf decided (he cleans/should clean the kitchen) *it decided (an elf cleans/should clean the kitchen)
48 Lessons Learned from the Raising/Control Issue Use distribution of data to group phenomena into classes Use different underlying structure as basis for explanations Allow things to “move” around from underlying structure -> transformational grammar Check whether explanation you give makes predictions
The Big Picture (diagram, not reproduced): empirical matter – distributional data such as “Maud expects there to be a riot” vs. “*Teri promised there to be a riot”, “Maud expects the shit to hit the fan” vs. “*Teri promised the shit to hit the fan” – is what a descriptive theory is about; an explanatory theory makes predictions about it. Linguistic theory content: relate morphology to semantics via a surface representation (e.g. phrase structure) and a deep representation (e.g. dependency), in correspondence with formalisms (data structures, algorithms) that use the descriptive theory
50 Syntactic Parsing
51 Syntactic Parsing Declarative formalisms like CFGs, FSAs define the legal strings of a language -- but only tell you ‘this is a legal string of the language X’ Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses
52 Parsing as a Form of Search Searching FSAs – Finding the right path through the automaton – Search space defined by structure of FSA Searching CFGs – Finding the right parse tree among all possible parse trees – Search space defined by the grammar Constraints provided by the input sentence and the automaton or grammar
CFG for Fragment of English
S → NP VP
S → Aux NP VP
VP → V
VP → V NP
PP → Prep NP
NP → Det Nom
NP → PropN
Nom → Adj Nom
Nom → N
Nom → N Nom
Nom → Nom PP
N → old | dog | footsteps | young
V → dog | include | prefer
Aux → does
Prep → from | to | on | of
PropN → Bush | McCain | Obama
Det → that | this | a | the
Parse Tree for ‘The old dog the footsteps of the young’ for the prior CFG (tree diagram, not reproduced: S → NP VP, with “the old dog” as the subject NP (Det + Nom) and “dog the footsteps of the young” as the VP (V + NP, containing the PP “of the young”))
55 Top-Down Parser Builds from the root S node to the leaves Expectation-based Common search strategy – Top-down, left-to-right, backtracking – Try first rule with LHS = S – Next expand all constituents in these trees/rules – Continue until leaves are POS – Backtrack when candidate POS does not match input string
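The strategy above (top-down, left-to-right, backtracking) can be sketched as a recursive-descent recognizer. The toy grammar below is hypothetical, and note that this sketch loops forever on left-recursive rules such as NP -> NP PP, a problem the deck returns to later:

```python
def parse(symbols, words, grammar):
    """Top-down, left-to-right, backtracking recognizer: try to
    rewrite `symbols` (a list of grammar symbols) into exactly `words`."""
    if not symbols:
        return not words                  # success iff input fully consumed
    first, rest = symbols[0], symbols[1:]
    if first in grammar:                  # nonterminal: try each expansion
        return any(parse(list(rhs) + rest, words, grammar)
                   for rhs in grammar[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:], grammar)

grammar = {"S": [["NP", "VP"]], "NP": [["Det", "N"]],
           "VP": [["V", "NP"], ["V"]],
           "Det": [["the"], ["a"]], "N": [["dog"], ["footsteps"]],
           "V": [["dog"], ["barks"]]}
print(parse(["S"], "the dog barks".split(), grammar))  # True
```

Backtracking happens implicitly in `any(...)`: when one expansion of a nonterminal fails to match the input, the next expansion is tried, exactly the "backtrack when candidate POS does not match" step on the slide.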
56 Rule Expansion “The old dog the footsteps of the young.” Where does backtracking happen? What are the computational disadvantages? What are the advantages?
57 Bottom-Up Parsing Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches Det N V Det N Prep Det N The old dog the footsteps of the young. Det Adj N Det N Prep Det N The old dog the footsteps of the young. Parse continues until an S root node reached or no further node expansion possible
58 What’s right/wrong with…. Top-Down parsers – they never explore illegal parses (e.g. ones which can’t form an S) -- but waste time on trees that can never match the input Bottom-Up parsers – they never explore trees inconsistent with the input -- but waste time exploring illegal parses (with no S root) For both: find a control strategy -- how to explore the search space efficiently? – Pursue all parses in parallel, or backtrack, or …? – Which rule to apply next? – Which node to expand next?
59 Some Solutions Dynamic Programming Approaches – Use a chart to represent partial results CKY Parsing Algorithm – Bottom-up – Grammar must be in Chomsky Normal Form – The parse tree might not be consistent with linguistic theory Earley Parsing Algorithm – Top-down – Expectations about constituents are confirmed by input – A POS tag for a word that is not predicted is never added Chart Parser
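A minimal CKY recognizer over the chart idea above might look like this. It is a sketch, not a full parser: the CNF toy grammar is hypothetical, and no backpointers are kept, so it recognizes without building a tree:

```python
def cky_recognize(words, grammar, start="S"):
    """CKY recognizer for a grammar in Chomsky Normal Form.
    grammar maps each nonterminal to a list of RHSs, each either
    [terminal] or [B, C]."""
    n = len(words)
    # table[i][j] holds the set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):               # fill length-1 spans
        for lhs, rules in grammar.items():
            if [w] in rules:
                table[i][i + 1].add(lhs)
    for span in range(2, n + 1):                # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # every split point
                for lhs, rules in grammar.items():
                    for rhs in rules:
                        if (len(rhs) == 2 and rhs[0] in table[i][k]
                                and rhs[1] in table[k][j]):
                            table[i][j].add(lhs)
    return start in table[0][n]

cnf = {"S": [["NP", "VP"]], "NP": [["Det", "N"]], "VP": [["V", "NP"]],
       "Det": [["the"]], "N": [["boy"], ["girl"]], "V": [["likes"]]}
print(cky_recognize("the boy likes the girl".split(), cnf))  # True
```

The chart is what makes this polynomial: each (i, j) cell is computed once, instead of re-deriving the same constituent on every backtracking path.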
60 Earley Parsing Allows arbitrary CFGs Fills a table in a single sweep over the input words – Table is length N+1; N is number of words – Table entries represent Completed constituents and their locations In-progress constituents Predicted constituents
61 States The table entries are called states and are represented with dotted rules. S -> · VP (a VP is predicted) NP -> Det · Nominal (an NP is in progress) VP -> V NP · (a VP has been found)
62 States/Locations It would be nice to know where these things are in the input so… S -> · VP [0,0] (a VP is predicted at the start of the sentence) NP -> Det · Nominal [1,2] (an NP is in progress; the Det goes from 1 to 2) VP -> V NP · [0,3] (a VP has been found starting at 0 and ending at 3)
63 Graphically
64 Earley As with most dynamic programming approaches, the answer is found by looking in the table in the right place. In this case, there should be a complete S state in the final column that spans from 0 to n. If that’s the case you’re done. – S –> α · [0,n]
65 Earley Algorithm March through chart left-to-right. At each step, apply 1 of 3 operators – Predictor Create new states representing top-down expectations – Scanner Match word predictions (rule with word after dot) to words – Completer When a state is complete, see what rules were looking for that completed constituent
66 Predictor Given a state – With a non-terminal to the right of the dot – That is not a part-of-speech category – Create a new state for each expansion of the non-terminal – Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends. – So the predictor looking at S -> · VP [0,0] – results in VP -> · Verb [0,0] VP -> · Verb NP [0,0]
67 Scanner Given a state – With a non-terminal to the right of the dot – That is a part-of-speech category – If the next word in the input matches this part-of-speech – Create a new state with the dot moved over the non-terminal – So the scanner looking at VP -> · Verb NP [0,0] – If the next word, “book”, can be a verb, add the new state: VP -> Verb · NP [0,1] – Add this state to the chart entry following the current one – Note: the Earley algorithm uses top-down input to disambiguate POS! Only a POS predicted by some state can get added to the chart!
68 Completer Applied to a state when its dot has reached the right end of the rule. The parser has discovered a category over some span of input. Find and advance all previous states that were looking for this category – copy the state, move the dot, insert in the current chart entry Given: – NP -> Det Nominal · [1,3] – VP -> Verb · NP [0,1] Add – VP -> Verb NP · [0,3]
69 Earley: how do we know we are done? Find a complete S state in the final column that spans from 0 to n. If that’s the case you’re done. – S –> α · [0,n]
70 Earley More specifically… 1. Predict all the states you can upfront 2. Read a word 1. Extend states based on matches 2. Add new predictions 3. Go to 2 3. Look at the final chart entry to see if you have a winner
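The loop above can be sketched as a compact recognizer combining the three operators. This is a simplified illustration: it scans terminal words directly rather than POS categories, assumes no epsilon rules, and uses a hypothetical toy grammar for “Book that flight”:

```python
def earley_recognize(words, grammar, start="S"):
    """Earley recognizer. grammar: nonterminal -> list of RHS lists;
    a symbol is a terminal iff it is not a grammar key.
    A state is (lhs, rhs, dot, origin)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    for rhs in grammar[start]:                  # seed predictions for S
        add((start, tuple(rhs), 0, 0), 0)

    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):                # chart[k] may grow as we go
            lhs, rhs, dot, origin = chart[k][i]
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:              # PREDICTOR
                    for expansion in grammar[sym]:
                        add((sym, tuple(expansion), 0, k), k)
                elif k < n and words[k] == sym:  # SCANNER
                    add((lhs, rhs, dot + 1, origin), k + 1)
            else:                               # COMPLETER
                for lhs2, rhs2, dot2, origin2 in chart[origin]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add((lhs2, rhs2, dot2 + 1, origin2), k)
            i += 1

    # done iff a complete S state spans the whole input
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

g = {"S": [["VP"]], "VP": [["V", "NP"]], "NP": [["Det", "N"]],
     "V": [["book"]], "Det": [["that"]], "N": [["flight"]]}
print(earley_recognize("book that flight".split(), g))  # True
```

The final `any(...)` is exactly the termination check from the slides: look in the last chart entry for a complete state for the start symbol whose origin is 0.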
71 Example Book that flight We should find… an S from 0 to 3 that is a completed state…
72–74 Example (Earley chart contents for “Book that flight”, built up over three slides; figures not reproduced)
75 Details What kind of algorithms did we just describe? – Not parsers – recognizers The presence of an S state with the right attributes in the right place indicates a successful recognition. But no parse tree… no parser That’s how we solve (not) an exponential problem in polynomial time
76 Converting Earley from Recognizer to Parser With the addition of a few pointers we have a parser Augment the “Completer” to point to where we came from.
Augmenting the chart with structural information (diagram, not reproduced: completed states carry backpointers to the states that built them)
78 Retrieving Parse Trees from Chart All the possible parses for an input are in the table We just need to read off all the backpointers from every complete S in the last column of the table Find all the S -> α · [0,N] Follow the structural traces from the Completer Of course, this won’t be polynomial time, since there could be an exponential number of trees But we can at least represent ambiguity efficiently
79 Left Recursion vs. Right Recursion Depth-first search will never terminate if grammar is left recursive (e.g. NP --> NP PP)
Solutions: – Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive e.g. The man {on the hill with the telescope…} NP → NP PP (wanted: Nom plus a sequence of PPs) NP → Nom PP NP → Nom Nom → Det N …becomes… NP → Nom NP’ Nom → Det N NP’ → PP NP’ (wanted: a sequence of PPs) NP’ → ε Not so obvious what these rules mean…
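The rewrite above can be automated for immediate left recursion. In this sketch, epsilon is represented as an empty right-hand side, and the primed symbol name is an assumption of the example:

```python
def eliminate_immediate_left_recursion(grammar):
    """Rewrite each rule set A -> A a1 | ... | b1 | ... into the weakly
    equivalent A -> b1 A' | ..., A' -> a1 A' | ... | epsilon.
    Only handles *immediate* left recursion (A -> A ...)."""
    new = {}
    for lhs, rules in grammar.items():
        rec = [rhs[1:] for rhs in rules if rhs and rhs[0] == lhs]
        nonrec = [rhs for rhs in rules if not rhs or rhs[0] != lhs]
        if not rec:
            new[lhs] = rules                 # nothing to rewrite
            continue
        prime = lhs + "'"
        new[lhs] = [rhs + [prime] for rhs in nonrec]
        new[prime] = [rhs + [prime] for rhs in rec] + [[]]  # [] = epsilon
    return new

g = {"NP": [["NP", "PP"], ["Nom"]]}
print(eliminate_immediate_left_recursion(g))
# {'NP': [['Nom', "NP'"]], "NP'": [['PP', "NP'"], []]}
```

The result accepts the same strings (weak equivalence), but as the slide notes, the derived trees no longer look like the linguistically motivated NP-over-NP structure, and the non-immediate case (NP -> Nom PP, Nom -> NP) needs a more involved transformation.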
81 – Harder to detect and eliminate non-immediate left recursion – NP --> Nom PP – Nom --> NP – Fix depth of search explicitly – Rule ordering: non-recursive rules first NP --> Det Nom NP --> NP PP
82 Another Problem: Structural ambiguity Multiple legal structures – Attachment (e.g. I saw a man on a hill with a telescope) – Coordination (e.g. younger cats and dogs) – NP bracketing (e.g. Spanish language teachers)
83 NP vs. VP Attachment
84 Solution? – Return all possible parses and disambiguate using “other methods”
85 Summing Up Parsing is a search problem which may be implemented with many control strategies – Top-Down or Bottom-Up approaches each have problems Combining the two solves some but not all issues – Left recursion – Syntactic ambiguity Next time: Making use of statistical information about syntactic constituents – Read Ch 14