Syllabus Text Books Classes Reading Material Assignments Grades Links Forum Text Books 88-680 Natural Language Processing - Lesson Nine: Bottom Up Parsing. Ido Dagan.

1 Natural Language Processing (88-680) - Lesson Nine: Bottom Up Parsing
Ido Dagan, Department of Computer Science, Bar-Ilan University

2 The CYK Parsing Algorithm (Cocke-Younger-Kasami)
Assumes the grammar is in CNF (and depends on this!)
Based on a "dynamic programming" approach:
– Build solutions compositionally from sub-solutions
– Store sub-solutions and re-use them whenever necessary
Recognition version: decide whether $S \Rightarrow^* w$

3 CNF
Chomsky Normal Form (CNF):
– The grammar is ε-free
– Each production of the grammar is either of the form A → B C or A → a (i.e., either 2 non-terminal symbols or 1 terminal symbol on the RHS)
Any CFG can be converted into a weakly equivalent CFG in Chomsky Normal Form
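
As a small illustration of the two allowed rule shapes, here is a hedged Python sketch that checks whether a rule list is in CNF. The rule encoding (pairs of a left-hand side and a tuple of right-hand-side symbols) and the name `is_cnf` are assumptions made for this example, not part of the slides.

```python
def is_cnf(rules, nonterminals):
    """Return True if every rule is A -> B C (two non-terminals) or A -> a (one terminal)."""
    for lhs, rhs in rules:
        if len(rhs) == 2 and all(sym in nonterminals for sym in rhs):
            continue  # shape A -> B C
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue  # shape A -> a
        return False  # anything else (unit rule, epsilon rule, long rule) breaks CNF
    return True


# Hypothetical toy rules, chosen only to exercise the check.
nonterminals = {"S", "NP", "VP", "Det", "Noun", "Verb"}
good = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("Det", ("the",)),
    ("Noun", ("boy",)),
]
bad = good + [("NP", ("Noun",))]  # a unit production A -> B is not allowed

print(is_cnf(good, nonterminals))  # True
print(is_cnf(bad, nonterminals))   # False
```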

4 The CYK Parsing Algorithm
Principles of the algorithm:
– Input is: $w = x_1 \dots x_n$
– We denote by $w_{ij} = x_i x_{i+1} \dots x_{i+j-1}$ the substring of $w$ of length $j$ starting with $x_i$
– For every $w_{ij}$ and for every variable $A$ in the grammar, the algorithm determines whether $A \Rightarrow^* w_{ij}$

5 The CYK Parsing Algorithm
The algorithm works on substrings of increasing length.
We start with substrings of length 1: $w_{i1} = x_i$, for $1 \le i \le n$
– $A \Rightarrow^* w_{i1}$ if $A \to x_i$ is a rule in the grammar.
We then continue with substrings of length 2, 3, ...
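
The length-1 step can be sketched in Python as follows. The dictionary `LEXICON` (word to set of variables) and the function name `fill_length_one` are assumptions for this sketch; table keys are pairs (i, j) with 1-based start position i and length j, following the slide notation.

```python
# Hypothetical lexicon: each word maps to the variables A with a rule A -> word.
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "boy": {"Noun"},
    "cake": {"Noun"},
    "ate": {"Verb"},
}


def fill_length_one(words, lexicon):
    """Fill the table entries for substrings of length 1."""
    table = {}
    for i, word in enumerate(words, start=1):
        # A derives w_i1 exactly when A -> x_i is a rule; unknown words get an empty set.
        table[(i, 1)] = set(lexicon.get(word, ()))
    return table


table = fill_length_one("the boy ate the cake".split(), LEXICON)
print(table[(1, 1)])  # {'Det'}
print(table[(3, 1)])  # {'Verb'}
```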

6 The CYK Parsing Algorithm
For a substring $w_{ij}$ we consider all possible ways of breaking it into two parts $w_{ik}$ and $w_{i+k,\,j-k}$, with $1 \le k < j$.
$A \Rightarrow^* w_{ij}$ if, for some such $k$:
– $A \to B C$ is a rule in the grammar
– $B \Rightarrow^* w_{ik}$
– $C \Rightarrow^* w_{i+k,\,j-k}$
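
A minimal Python sketch of this split step, assuming the shorter table entries are already filled in. The names `combine` and `binary_rules`, and the encoding of a rule A → B C as a triple `(A, B, C)`, are choices made for this example.

```python
def combine(table, i, j, binary_rules):
    """Compute the entry for w_ij from the entries of all shorter splits."""
    found = set()
    for k in range(1, j):  # the left part has length k, the right part length j - k
        left = table.get((i, k), set())
        right = table.get((i + k, j - k), set())
        for a, b, c in binary_rules:
            if b in left and c in right:
                found.add(a)  # A -> B C with B deriving w_ik and C deriving w_(i+k)(j-k)
    return found


# Hypothetical two-word input with its length-1 entries already filled in.
table = {(1, 1): {"Det"}, (2, 1): {"Noun"}}
binary_rules = [("NP", "Det", "Noun"), ("S", "NP", "VP")]
print(combine(table, 1, 2, binary_rules))  # {'NP'}
```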

7 CYK
Finally, since $w = w_{1n}$, we need to verify that $S \Rightarrow^* w_{1n}$

8 CYK
We keep the results for every $w_{ij}$ in a table $V_{ij}$.
Each table entry $V_{ij}$ contains the set of variables $A$ such that $A \Rightarrow^* w_{ij}$.
Note that we only need to fill in entries up to the diagonal: the longest substring starting at $x_i$ is of length $n-i+1$.
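
To make the shape of the table concrete, this short Python sketch lists the cells (i, j) that actually need filling, using the constraint that a substring starting at position i has length at most n-i+1. The helper name `valid_cells` is an assumption for the example.

```python
def valid_cells(n):
    """List the (start, length) pairs that correspond to real substrings of an n-word input."""
    return [(i, j) for j in range(1, n + 1) for i in range(1, n - j + 2)]


cells = valid_cells(5)
print(len(cells))  # 15, i.e. n(n+1)/2 for n = 5
print(cells[-1])   # (1, 5): the whole input
```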

9 Example
Baby grammar
– S → NP VP
– NP → Det Nominal | ProperNoun | Det Noun | NP PP
– Nominal → Noun Noun | Noun Nominal
– VP → Verb NP | VP PP
– PP → Preposition NP
– Noun → boy | cat | dog | cake | candle
– Verb → likes | ate | drinks
– Det → a | the
– Preposition → from | to | on | near | with

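10 Example: CYK table for "the boy ate the cake"
Columns are start positions i, rows are substring lengths j; X marks cells beyond the diagonal, "-" marks an empty entry.

j \ i   1:the   2:boy   3:ate   4:the   5:cake
1       Det     Noun    Verb    Det     Noun
2       NP      -       -       NP      X
3       -       -       VP      X       X
4       -       -       X       X       X
5       S       X       X       X       X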
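
The table for this sentence can be recomputed with a short recognizer. This is a sketch under stated assumptions rather than the course's reference code: the rule encoding, the names `BINARY`, `LEXICAL`, `cyk_table` and `recognize` are invented here, and the unit rule NP → ProperNoun from the baby grammar is left out because it is not in CNF and the grammar lists no ProperNoun words.

```python
# Binary rules A -> B C of the baby grammar, encoded as (A, B, C).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "Nominal"), ("NP", "Det", "Noun"), ("NP", "NP", "PP"),
    ("Nominal", "Noun", "Noun"), ("Nominal", "Noun", "Nominal"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"),
    ("PP", "Preposition", "NP"),
]
# Lexical rules A -> a, grouped by category.
LEXICAL = {
    "Noun": ["boy", "cat", "dog", "cake", "candle"],
    "Verb": ["likes", "ate", "drinks"],
    "Det": ["a", "the"],
    "Preposition": ["from", "to", "on", "near", "with"],
}


def cyk_table(words):
    """Return V, where V[(i, j)] is the set of variables deriving w_ij (1-based i, length j)."""
    n = len(words)
    V = {}
    for i, word in enumerate(words, start=1):
        V[(i, 1)] = {cat for cat, forms in LEXICAL.items() if word in forms}
    for j in range(2, n + 1):          # substring length
        for i in range(1, n - j + 2):  # start position
            cell = set()
            for k in range(1, j):      # split point
                for a, b, c in BINARY:
                    if b in V[(i, k)] and c in V[(i + k, j - k)]:
                        cell.add(a)
            V[(i, j)] = cell
    return V


def recognize(words, start="S"):
    """Decide whether the start symbol derives the whole input."""
    return bool(words) and start in cyk_table(words)[(1, len(words))]


words = "the boy ate the cake".split()
V = cyk_table(words)
print(V[(1, 2)], V[(3, 3)], V[(1, 5)])    # {'NP'} {'VP'} {'S'}
print(recognize(words))                   # True
print(recognize("boy the ate".split()))   # False
```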

11 Time Complexity of CYK
$O(n^3)$
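
One way to see where the cubic bound comes from, assuming the table is filled by looping over length $j$, start position $i$ and split point $k$ as described earlier: for each length $j$ there are $n-j+1$ start positions and $j-1$ split points, so the number of $(i, j, k)$ triples examined is

```latex
\sum_{j=2}^{n} (n-j+1)(j-1)
  \;=\; \sum_{m=1}^{n-1} (n-m)\,m
  \;=\; \frac{n^3 - n}{6}
  \;=\; O(n^3)
```

Each triple is checked against every binary rule, so with $|P|$ rules (notation introduced here, not in the slides) the total work is $O(|P| \cdot n^3)$. For $n = 5$ the sum gives 20 triples.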

12 Adding Parsing to CYK
We need to construct parse trees for strings in $L(G)$
Idea:
– Keep back-pointers to the table entries that we combine
– At the end, reconstruct a parse from the back-pointers
This allows us to find all parse trees (possibly an exponential number)
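
The back-pointer idea can be sketched as follows. This is one possible encoding, not necessarily the one used in the course: each table entry maps a variable to a list of back-pointers, where a back-pointer is either the word itself (length 1) or a triple `(k, B, C)` naming the split point and the two sub-entries. The names `build_chart` and `trees` are assumptions, and the unit rule NP → ProperNoun of the baby grammar is omitted because it is not in CNF.

```python
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "Nominal"), ("NP", "Det", "Noun"), ("NP", "NP", "PP"),
    ("Nominal", "Noun", "Noun"), ("Nominal", "Noun", "Nominal"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"),
    ("PP", "Preposition", "NP"),
]
LEXICAL = {
    "Noun": ["boy", "cat", "dog", "cake", "candle"],
    "Verb": ["likes", "ate", "drinks"],
    "Det": ["a", "the"],
    "Preposition": ["from", "to", "on", "near", "with"],
}


def build_chart(words):
    """chart[(i, j)][A] lists the back-pointers that justify A deriving w_ij."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words, start=1):
        chart[(i, 1)] = {cat: [word] for cat, forms in LEXICAL.items() if word in forms}
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            cell = {}
            for k in range(1, j):
                for a, b, c in BINARY:
                    if b in chart[(i, k)] and c in chart[(i + k, j - k)]:
                        cell.setdefault(a, []).append((k, b, c))  # remember how A was built
            chart[(i, j)] = cell
    return chart


def trees(chart, i, j, symbol):
    """Yield every parse tree for `symbol` over the substring of length j starting at i."""
    for pointer in chart[(i, j)].get(symbol, []):
        if isinstance(pointer, str):  # lexical entry: the back-pointer is the word
            yield (symbol, pointer)
        else:
            k, b, c = pointer
            for left in trees(chart, i, k, b):
                for right in trees(chart, i + k, j - k, c):
                    yield (symbol, left, right)


words = "the boy ate the cake".split()
chart = build_chart(words)
parses = list(trees(chart, 1, len(words), "S"))
print(len(parses))  # 1
print(parses[0])
```

Enumerating all trees this way can take exponential time when the sentence is highly ambiguous, which is why the table itself stores only back-pointers and not the trees.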

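13 Example: CYK table for "the boy ate the cake with a candle"
Columns are start positions i, rows are substring lengths j; X marks cells beyond the diagonal, "-" marks an empty entry, P abbreviates Preposition.

j \ i   1:the   2:boy   3:ate   4:the   5:cake   6:with   7:a    8:candle
1       Det     Noun    Verb    Det     Noun     P        Det    Noun
2       NP      -       -       NP      -        -        NP     X
3       -       -       VP      -       -        PP       X      X
4       -       -       -       -       -        X        X      X
5       S       -       -       NP      X        X        X      X
6       -       -       VP      X       X        X        X      X
7       -       -       X       X       X        X        X      X
8       S       X       X       X       X        X        X      X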
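
This sentence is ambiguous: the PP "with a candle" can attach to the verb phrase or to "the cake". A hedged way to confirm this without enumerating trees is to store, for each table entry, the number of derivations per variable instead of a plain set. The function name `count_parses` and the rule encoding are assumptions of this sketch, and the non-CNF rule NP → ProperNoun is again omitted.

```python
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "Nominal"), ("NP", "Det", "Noun"), ("NP", "NP", "PP"),
    ("Nominal", "Noun", "Noun"), ("Nominal", "Noun", "Nominal"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"),
    ("PP", "Preposition", "NP"),
]
LEXICAL = {
    "Noun": ["boy", "cat", "dog", "cake", "candle"],
    "Verb": ["likes", "ate", "drinks"],
    "Det": ["a", "the"],
    "Preposition": ["from", "to", "on", "near", "with"],
}


def count_parses(words, start="S"):
    """Count the parse trees rooted in `start` that cover the whole input."""
    n = len(words)
    if n == 0:
        return 0
    counts = {}
    for i, word in enumerate(words, start=1):
        counts[(i, 1)] = {cat: 1 for cat, forms in LEXICAL.items() if word in forms}
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            cell = {}
            for k in range(1, j):
                left, right = counts[(i, k)], counts[(i + k, j - k)]
                for a, b, c in BINARY:
                    if b in left and c in right:
                        # every left derivation combines with every right derivation
                        cell[a] = cell.get(a, 0) + left[b] * right[c]
            counts[(i, j)] = cell
    return counts[(1, n)].get(start, 0)


print(count_parses("the boy ate the cake".split()))                 # 1
print(count_parses("the boy ate the cake with a candle".split()))   # 2
```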

14 CNF
Why did we want a grammar in CNF?

15 Chart Parsing of non-CNF Grammars
Version from Allen's NLU book
Bottom up, left to right
Starts with lexical constituents (POS)
Matches found constituents to extend right-hand sides of rules, until the entire right-hand side is matched
Stores constituents in a chart, to allow repeated use
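
The following Python sketch is loosely in the spirit of the bottom-up chart parser outlined above; it is not Allen's exact formulation. It assumes an ε-free grammar, uses 0-based positions between words, and all names (`RULES`, `LEXICON`, `chart_parse`) as well as the toy grammar with a three-symbol rule are invented for illustration. Completed constituents go into a chart; partially matched rules are kept as active arcs with a dot position.

```python
# Hypothetical non-CNF grammar: note the rule with three right-hand-side symbols.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("NP", ("Det", "Adj", "Noun")),
    ("VP", ("Verb", "NP")),
]
LEXICON = {
    "the": ["Det"], "a": ["Det"], "small": ["Adj"],
    "boy": ["Noun"], "cake": ["Noun"], "ate": ["Verb"],
}


def chart_parse(words):
    """Return the set of completed constituents (category, start, end)."""
    chart = set()  # completed constituents
    arcs = set()   # active arcs: (lhs, rhs, dot, start, end)
    for pos, word in enumerate(words):  # left to right
        agenda = [(cat, pos, pos + 1) for cat in LEXICON.get(word, [])]
        while agenda:
            cat, start, end = agenda.pop()
            if (cat, start, end) in chart:
                continue  # already stored: reuse it instead of rebuilding
            chart.add((cat, start, end))
            # Fresh arcs for rules that begin with this category,
            # plus existing arcs that end where this constituent starts.
            candidates = [(lhs, rhs, 0, start, start) for lhs, rhs in RULES if rhs[0] == cat]
            candidates += [arc for arc in arcs if arc[4] == start]
            for lhs, rhs, dot, arc_start, _ in candidates:
                if rhs[dot] != cat:
                    continue
                if dot + 1 == len(rhs):
                    agenda.append((lhs, arc_start, end))  # whole right-hand side matched
                else:
                    arcs.add((lhs, rhs, dot + 1, arc_start, end))  # advance the dot
    return chart


chart = chart_parse("the small boy ate a cake".split())
print(("NP", 0, 3) in chart)  # True
print(("S", 0, 6) in chart)   # True
```

Because constituents are stored once in the chart and looked up by position, a constituent such as the object NP is built a single time even if several arcs could use it.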

27 Complexity
$O(N^3)$
Constant depends on grammar parameters

