CS460/626: Natural Language Processing / Speech, NLP and the Web (Lecture 20: Parsing)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
28th Feb, 2011
Need for Parsing
Sentences are linear structures, on the face of it. Is that the right view?
Is there a hierarchy, a tree, hidden behind the linear structure?
Is there a principle in branching?
What are the constituents, and when should a constituent give rise to children?
What is the hierarchy-building principle?
Deeper trees needed for capturing sentence structure
[The big book of poems with the blue cover] is on the table.
Flat analysis: [NP The [AP big] book [PP of poems] [PP with the blue cover]]
This won't do!
PPs are at the same level: flat with respect to the head word "book"
[The big book of poems with the blue cover] is on the table.
[NP The [AP big] book [PP of poems] [PP with the blue cover]]
No distinction between the two PPs in terms of dominance or c-command.
Constituency test of replacement runs into problems
One-replacement: I bought the big [book of poems with the blue cover], not the small [one].
Here one-replacement targets "book of poems with the blue cover".
Another one-replacement: I bought the big [book of poems] with the blue cover, not the small [one] with the red cover.
Here one-replacement targets "book of poems".
Neither string is a constituent in the flat tree, so the flat analysis cannot explain either replacement.
More deeply embedded structure
[NP The [N'1 [AP big] [N'2 [N'3 [N book] [PP of poems]] [PP with the blue cover]]]]
To target N'1
I want [NP this [N' big book of poems with the red cover]] and not [NP that [N' one]].
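The link between the nested structure and one-replacement can be sketched in code. This is a minimal illustration, assuming the deeper analysis has three nested N-bar levels (written N'1, N'2, N'3 here) and simplifying each PP to a flat list of words; the names `TREE`, `words`, and `one_targets` are hypothetical and not from the lecture.

```python
# Tree as nested tuples: (label, child, child, ...); leaves are plain strings.
# Assumption: three N-bar levels, and PPs kept flat for brevity.
TREE = ("NP",
        "the",
        ("N'1",
         ("AP", "big"),
         ("N'2",
          ("N'3", ("N", "book"), ("PP", "of", "poems")),
          ("PP", "with", "the", "blue", "cover"))))


def words(node):
    """Return the words dominated by a node, left to right."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node[1:]:
        out.extend(words(child))
    return out


def one_targets(node):
    """Collect the word strings spanned by every N-bar node.

    Under this analysis these are the strings that "one" can replace.
    """
    if isinstance(node, str):
        return []
    found = []
    if node[0].startswith("N'"):
        found.append(" ".join(words(node)))
    for child in node[1:]:
        found.extend(one_targets(child))
    return found


targets = one_targets(TREE)
for t in targets:
    print(t)
```

Each printed string is dominated by its own N-bar node, which is what the flat tree could not provide.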
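Other languages
English: [The big book of poems with the blue cover]
  NP with constituents: The, AP (big), book, PP (of poems), PP (with the blue cover)
Hindi: [niil jilda vaalii kavita kii kitaab]
  NP with constituents: PP (niil jilda vaalii), PP (kavita kii), AP (badii), kitaab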
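Other languages: contd.
English: [The big book of poems with the blue cover]
  NP with constituents: The, AP (big), book, PP (of poems), PP (with the blue cover)
Bengali: [niil malaat deovaa kavitar bai ti]
  NP with constituents: PP (niil malaat deovaa), PP (kavitar), AP (motaa), bai, ti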
Grammar and Parsing Algorithms
A simplified grammar
S -> NP VP
NP -> DT N | N
VP -> V ADV | V
A segment of English grammar
S -> (C) S
S -> {NP/S} VP
VP -> (AP+) (VAUX) V (AP+) ({NP/S}) (AP+) (PP+) (AP+)
NP -> (D) (AP+) N (PP+)
PP -> P NP
AP -> (AP) A
Example sentence
1 People 2 laugh 3
The numbers are positions.
Lexicon: People - N, V; Laugh - N, V
This indicates that both noun and verb are possible for the word "People", and likewise for "laugh".
Top-Down Parsing
Step  State            Backup State   Action
1.    ((S) 1)          -              -
2.    ((NP VP) 1)      -              -
3a.   ((DT N VP) 1)    ((N VP) 1)     -
3b.   ((N VP) 1)       -              -
4.    ((VP) 2)         -              Consume "People"
5a.   ((V ADV) 2)      ((V) 2)        -
6.    ((ADV) 3)        ((V) 2)        Consume "laugh"
5b.   ((V) 2)          -              -
      ((.) 3)          -              Consume "laugh"
The number in each state is the position of the input pointer.
Termination condition: all input is consumed and no symbols remain.
Note: input symbols can be pushed back.
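The state and backup-state bookkeeping above can be sketched as a small backtracking recognizer. This is a minimal sketch, not the lecture's own code: it assumes the simplified grammar and the "People laugh" lexicon, tries rules in the order they are written, and keeps untried alternatives on a stack. The names `GRAMMAR`, `LEXICON`, and `top_down_parse` are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"], ["N"]],
    "VP": [["V", "ADV"], ["V"]],
}
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}}


def top_down_parse(words, start="S"):
    """Depth-first top-down recognition with a stack of backup states.

    A state is (symbols still to derive, input position); positions are
    1-based as on the slides. Returns (accepted, list of visited states).
    """
    current = ((start,), 1)
    backups = []
    trace = []
    while True:
        symbols, pos = current
        trace.append(current)
        # Termination: all input consumed and no symbols remaining.
        if not symbols and pos == len(words) + 1:
            return True, trace
        successors = []
        if symbols:
            head, rest = symbols[0], symbols[1:]
            if head in GRAMMAR:
                # Expand the leftmost non-terminal, first rule first.
                successors = [(tuple(rhs) + rest, pos) for rhs in GRAMMAR[head]]
            elif pos <= len(words) and head in LEXICON.get(words[pos - 1].lower(), ()):
                # Lexical category matches the next word: consume it.
                successors = [(rest, pos + 1)]
        if successors:
            current = successors[0]
            backups.extend(reversed(successors[1:]))
        elif backups:
            current = backups.pop()  # dead end: resume the latest backup state
        else:
            return False, trace


ok, trace = top_down_parse(["People", "laugh"])
for symbols, pos in trace:
    print(symbols, pos)
```

On "People laugh" the visited states follow the table: the DT N expansion and the V ADV expansion both fail and the parser falls back to the stored backup state.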
Discussion for Top-Down Parsing
This kind of search is goal driven.
It gives importance to textual precedence (rule precedence).
It has no regard for the data a priori, so useless expansions are made (for example, NP -> DT N is tried although the input contains no determiner).
Bottom-Up Parsing
Some conventions:
N_12: the subscripts represent positions (this N spans positions 1 to 2).
S_1? -> NP_12 ° VP_2?: "?" marks an end position that is still unknown.
Work to the left of ° is done, while work to the right of ° remains.
Bottom-Up Parsing (pictorial representation)
Input: 1 People 2 Laugh 3
Lexical categories: N_12, V_12 (People); N_23, V_23 (Laugh)
NP_12 -> N_12 °
NP_23 -> N_23 °
VP_12 -> V_12 °
VP_23 -> V_23 °
S_1? -> NP_12 ° VP_2?
S -> NP_12 VP_23 °
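The data-driven construction above can be sketched as follows. This is a simplified illustration under stated assumptions: it records only completed constituents as (label, start, end) triples and repeats until nothing new is found, rather than tracking partially completed rules with a dot. The names `RULES`, `LEXICON`, `match_ends`, and `bottom_up` are hypothetical.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "N")),
    ("NP", ("N",)),
    ("VP", ("V", "ADV")),
    ("VP", ("V",)),
]
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}}


def match_ends(rhs, start, found):
    """End positions reachable by matching rhs left to right from start."""
    ends = {start}
    for sym in rhs:
        ends = {e for (lab, s, e) in found if lab == sym and s in ends}
    return ends


def bottom_up(words):
    """Return all constituents (label, start, end) over 1-based positions."""
    found = set()
    # Start from the data: every lexical category of every word.
    for i, w in enumerate(words, start=1):
        for cat in LEXICON.get(w.lower(), ()):
            found.add((cat, i, i + 1))
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            for start in range(1, len(words) + 1):
                for end in match_ends(rhs, start, found):
                    if (lhs, start, end) not in found:
                        found.add((lhs, start, end))
                        changed = True
    return found


constituents = bottom_up(["People", "Laugh"])
print(sorted(constituents))
```

The sentence is accepted when an S spanning the whole input, here ("S", 1, 3), is among the constituents. Note that NP_23 and VP_12 are also built even though they do not take part in the final S.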
Problem with Top-Down Parsing: Left Recursion
Suppose the grammar has a rule A -> A B. Then the expansion proceeds as follows:
((A) K) -> ((A B) K) -> ((A B B) K) -> ...
The input position K never advances, so the parser never terminates.
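The runaway expansion can be reproduced with a few lines of code. This is a minimal sketch with a hypothetical helper `expand_leftmost`; a step limit is added only so that the demonstration stops.

```python
def expand_leftmost(symbols, rules, max_steps):
    """Repeatedly rewrite the leftmost symbol; no input is ever consumed."""
    history = [symbols]
    for _ in range(max_steps):
        head = symbols[0]
        if head not in rules:
            break
        symbols = rules[head] + symbols[1:]
        history.append(symbols)
    return history


# Left-recursive rule A -> A B: the symbol list grows at every step.
history = expand_leftmost(("A",), {"A": ("A", "B")}, max_steps=3)
for state in history:
    print(state)
```

Without the step limit the loop would not end, because the leftmost symbol is always A again.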
Combining top-down and bottom-up strategies
Top-Down Bottom-Up Chart Parsing
Combines the advantages of top-down and bottom-up parsing.
Does not work in case of left recursion.
e.g. People laugh
People: noun, verb
Laugh: noun, verb
Grammar:
S -> NP VP
NP -> DT N | N
VP -> V ADV | V
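Transitive Closure
1 People 2 laugh 3
At position 1: S -> ° NP VP; NP -> ° DT N; NP -> ° N
At position 2: NP -> N °; S -> NP ° VP; VP -> ° V ADV; VP -> ° V
At position 3: VP -> V °; S -> NP VP ° (success)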
Arcs in Parsing
Each arc represents a chart which records:
Completed work (left of °)
Expected work (right of °)
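Example
1 People 2 laugh 3 loudly 4
At position 1: S -> ° NP VP; NP -> ° DT N; NP -> ° N
At position 2: NP -> N °; S -> NP ° VP; VP -> ° V ADV; VP -> ° V
At position 3: VP -> V °; VP -> V ° ADV; S -> NP VP °
At position 4: VP -> V ADV °; S -> NP VP °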
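The chart for this example can be built with an Earley-style recognizer, which is one standard way to combine top-down prediction with bottom-up completion. It is offered here as a sketch and may differ in detail from the procedure presented in the lecture. Assumptions: the simplified grammar, a lexicon extended with "loudly" as ADV, and arcs stored as (lhs, rhs, dot, origin) with 1-based positions. The names `chart_parse` and `accepts` are hypothetical.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "N"), ("N",)],
    "VP": [("V", "ADV"), ("V",)],
}
LEXICON = {"people": {"N", "V"}, "laugh": {"N", "V"}, "loudly": {"ADV"}}


def chart_parse(words, start="S"):
    """Build chart[k]: arcs (lhs, rhs, dot, origin) that end at position k."""
    n = len(words)
    chart = {k: [] for k in range(1, n + 2)}

    def add(k, arc):
        if arc not in chart[k]:
            chart[k].append(arc)

    for rhs in GRAMMAR[start]:
        add(1, (start, rhs, 0, 1))
    for k in range(1, n + 2):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is scanned
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:
                    # Top-down step: predict rules for the expected symbol.
                    for new_rhs in GRAMMAR[nxt]:
                        add(k, (nxt, new_rhs, 0, k))
                elif k <= n and nxt in LEXICON.get(words[k - 1].lower(), ()):
                    # The next word has the expected category: move the dot.
                    add(k + 1, (lhs, rhs, dot + 1, origin))
            else:
                # Bottom-up step: a completed arc extends arcs waiting for it.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(k, (l2, r2, d2 + 1, o2))
    return chart


def accepts(words, start="S"):
    """True if a completed start arc spans the whole input."""
    chart = chart_parse(words, start)
    return any(l == start and d == len(r) and o == 1
               for (l, r, d, o) in chart[len(words) + 1])


chart = chart_parse(["People", "laugh", "loudly"])
for k in sorted(chart):
    print(k, chart[k])
print(accepts(["People", "laugh", "loudly"]))
```

In each arc, `rhs[:dot]` is the completed work and `rhs[dot:]` is the expected work.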