Syllabus Text Books Classes Reading Material Assignments Grades Links Forum Text Books
Natural Language Processing - Lecture Eight: Context Free Parsing
Oren Glickman, Department of Computer Science, Bar-Ilan University

Problems with FS grammars
Reasons why finite-state grammars may be inadequate:
–Cannot represent constituency adequately
–Cannot represent structural ambiguity
–Cannot deal with recursion
Recursion occurs when an expansion of a non-terminal includes the non-terminal itself, e.g. Nominal → Nominal PP

Context-free grammars
A context-free grammar consists of
–a set of rules or productions
–a lexicon

A baby CF grammar for NPs
NP → Det Nominal
NP → ProperNoun
NP → Pronoun
Nominal → Noun
Nominal → Noun Nominal
Det → a
Det → the
ProperNoun → oren
Pronoun → I
Noun → flight
Noun → cat
Noun → chair

Context-free rules
Two classes of symbols: terminal and non-terminal symbols
Each context-free rule is of the form A → γ, where
–A is a single non-terminal symbol
–γ is an ordered list of one or more terminals and non-terminals

Context-free grammars
CFGs can be used for analyzing or for generating
"→" means: rewrite the symbol on the left with the string of symbols on the right
Example: NP ⇒ Det Nominal ⇒ Det Noun ⇒ a Noun ⇒ a flight

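The rewriting process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RULES`, `rewrite_leftmost` and `derive` are made up for this sketch, and it always rewrites the leftmost non-terminal, so the intermediate steps come out in a slightly different order than in the slide's example (which rewrites Nominal before Det), although the derived string is the same.

```python
# Toy NP grammar from the slide. Any symbol that is not a key is a terminal.
RULES = {
    "NP": [["Det", "Nominal"], ["ProperNoun"], ["Pronoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "Det": [["a"], ["the"]],
    "ProperNoun": [["oren"]],
    "Pronoun": [["I"]],
    "Noun": [["flight"], ["cat"], ["chair"]],
}


def rewrite_leftmost(form, rhs):
    """Apply one rule: replace the leftmost non-terminal in `form` by `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in RULES:
            if rhs not in RULES[symbol]:
                raise ValueError(f"{symbol} -> {rhs} is not a rule of the grammar")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


def derive(start, choices):
    """Return the list of sentential forms produced by applying `choices` in order."""
    forms = [[start]]
    for rhs in choices:
        forms.append(rewrite_leftmost(forms[-1], rhs))
    return forms


steps = derive("NP", [["Det", "Nominal"], ["a"], ["Noun"], ["flight"]])
for form in steps:
    print(" ".join(form))
```

Each printed line is one sentential form; the last one contains only terminals and is therefore a string of the language.
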
Context-free grammars
A sequence of rule expansions is called the derivation of a string of words.
Formally, a particular CF language is the set of strings which can be derived from a particular CF grammar.
A derivation is represented by a parse tree.

The combined baby CFG
S → NP VP
NP → Pronoun | Det Nominal | ProperNoun
Nominal → Noun | Noun Nominal
VP → Verb NP | Verb NP PP | Verb PP
PP → Preposition NP
Noun → flight | flights | trip | morning | ...
Verb → is | prefer | like | need | want | fly | ...
Adjective → cheapest | non-stop | first | direct | ...
Pronoun → me | I | you | it | ...
ProperNoun → Los Angeles | Chicago | Alaska | ...
Det → a | the | an | this | these | that | ...
Preposition → from | to | on | near | ...
Conjunction → and | or | but | ...

Formal definition of a CFG
A context-free grammar is a 4-tuple:
–A set of non-terminal symbols N
–A set of terminal symbols Σ (disjoint from N)
–A set of productions P, each of the form A → α, where A is a non-terminal and α is a string of symbols from the infinite set of strings (Σ ∪ N)*
–A designated start symbol S

CF language
If A → β is a production of P and α and γ are any strings in the set (Σ ∪ N)*, we say that αAγ directly derives αβγ, or αAγ ⇒ αβγ
Derivation is a generalization of direct derivation
–Let α1, α2, ..., αm be strings in (Σ ∪ N)*, m ≥ 1, such that α1 ⇒ α2, α2 ⇒ α3, ..., αm-1 ⇒ αm
–we say that α1 derives αm, or α1 ⇒* αm

CF language
The language L(G) generated by a grammar G is the set of strings composed of terminal symbols which can be derived from the designated start symbol S:
L(G) = {w | w is in Σ* and S ⇒* w}

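To make the definition of L(G) concrete, the following sketch enumerates every string of a small grammar up to a length bound by searching over sentential forms. The grammar is a cut-down version of the NP grammar, and `GRAMMAR` and `language_up_to` are names invented for this illustration. The pruning step assumes the grammar has no ε-rules, so a sentential form never gets shorter when it is rewritten.

```python
from collections import deque

# Small NP grammar; any symbol that is not a key is a terminal.
GRAMMAR = {
    "NP": [("Det", "Nominal"), ("Pronoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "Det": [("a",), ("the",)],
    "Pronoun": [("I",)],
    "Noun": [("flight",), ("cat",)],
}


def language_up_to(start, max_len):
    """Return all strings w with S =>* w and at most `max_len` words."""
    seen = {(start,)}
    queue = deque([(start,)])
    sentences = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is all terminals.
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:
            sentences.add(" ".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Without epsilon-rules, forms never shrink, so long forms can be dropped.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences


print(sorted(language_up_to("NP", 2)))
```

Because Nominal → Noun Nominal is recursive, the full language is infinite; the length bound is what makes the enumeration terminate.
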
Grammar equivalence
Two grammars are strongly equivalent if they generate the same set of strings and assign the same phrase structure to each sentence
Two grammars are weakly equivalent if they generate the same set of strings but do not assign the same phrase structure to each sentence

Chomsky normal form
A CFG is in Chomsky normal form
–if it is ε-free
–if each production is either of the form A → B C or A → a (a single terminal)
Any CFG can be converted into a weakly equivalent Chomsky normal form grammar
–Example: A → B C D is equivalent to A → B X, X → C D

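The A → B C D example generalizes to right-hand sides of any length. The sketch below covers only that binarization step; a full conversion to Chomsky normal form would also have to remove ε-rules, unit productions and terminals inside longer rules, none of which is attempted here. The function name `binarize` and the fresh symbols `X1`, `X2`, ... are assumptions of this sketch, and it assumes those names do not already occur in the grammar.

```python
def binarize(rules):
    """Split every rule with more than two RHS symbols into binary rules.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    """
    result = []
    counter = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"          # new non-terminal for the tail of the rule
            result.append((lhs, (rhs[0], fresh)))
            lhs, rhs = fresh, rhs[1:]      # continue with the remaining symbols
        result.append((lhs, rhs))
    return result


for lhs, rhs in binarize([("A", ("B", "C", "D"))]):
    print(lhs, "->", " ".join(rhs))
```

The new grammar generates the same strings but assigns them different trees (with the extra X nodes), which is why the equivalence is only weak.
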
Sentence Level Constructions
Declarative Sentences
–I want a flight from Boston to Chicago
Imperative Sentences
–Show me all flights available
Yes-No Questions
–Do any of these flights have stops?
Wh-Questions
–What airlines fly from Boston to Chicago?

Declarative Structure
Subject NP followed by VP
Statements or assertions
Examples:
–The flight should leave at eleven a.m.
–I need a flight to Seattle.
S → NP VP

Imperative Structure
Begin with a VP and have no subject
Commands
Examples:
–Show me the lowest fare.
–List all flights from Dallas to Denver.
S → VP

Yes-No-Question Structure
Auxiliary verb, subject NP, VP
Questions
Examples:
–Does this flight have stops?
S → Aux NP VP

Wh-Question Structure
Contain a wh-phrase constituent that includes a wh-word (who, what, when, where, why, which, how)
Questions
Two classes:
–Wh-subject-question
–Wh-non-subject-question

Wh-Subject-Question
Wh-phrase is the subject of the sentence
Same as declarative structure except the first NP contains some wh-word
Examples:
–What airlines fly from Dallas to Denver?
–Which flights serve breakfast?
S → Wh-NP VP

Wh-Non-Subject-Question
Wh-phrase is not the subject of the sentence
Aux appears before the subject NP
Examples:
–What flights do you have from Dallas to Denver?
–When does this flight leave Dallas?
S → Wh-NP Aux NP VP

Noun Phrase (NP)
Components of an NP:
–Head: central noun
–Pre-nominal (pre-head) modifiers
–Post-nominal (post-head) modifiers

Pre-head Modifiers
Determiners
–a stop, the flights, this fare, those flights
Determiners can be omitted
–Mass nouns: "Water is precious"
Pre-determiners
–Appear before the determiners
–Number or amount: "all the flights"

Pre-head Modifiers
Others that can appear between the determiner and the head noun:
–Cardinal numbers: one stop, two friends
–Ordinal numbers: the first child, the last day
–Quantifiers: many stops, several friends
–Adjective phrase (AP): a nonstop flight, the least expensive fare
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal

Post-head Modifiers
Three kinds:
–Prepositional phrases
–Non-finite clauses
–Relative clauses

Post-head Modifiers
Prepositional phrases (PP)
Examples:
–any stopovers [ for Delta seven fifty one ]
–all flights [ from Dallas ] [ to Denver ]
–arrival [ in Dallas ] [ before seven p.m. ]
Nominal → Nominal PP (PP) (PP)

Post-head Modifiers
Non-finite clauses
–-ed: the aircraft used by this flight
–infinitive: the last flight to arrive in Dallas
–-ing (gerundive): flights arriving after eleven a.m., those leaving on Monday

Post-head Modifiers
To add gerundive non-finite clauses:
Nominal → Nominal GerundVP
GerundVP → GerundV
GerundVP → GerundV NP
GerundVP → GerundV PP
GerundVP → GerundV NP PP
GerundV → arriving | leaving | …

Post-head Modifiers
Relative clauses
–A clause that begins with a relative pronoun (that, who)
–Examples: a flight that serves breakfast, flights that leave in the morning
–Nominal → Nominal RelClause
–RelClause → who VP
–RelClause → that VP

Coordination
NPs, VPs, and sentences can be joined with conjunctions (and, or)
NP → NP and NP
VP → VP and VP
S → S and S
Examples:
–Please repeat [NP [NP the flights] and [NP the costs]]
–What flights do you have [VP [VP leaving Dallas] and [VP arriving in Denver]]
–[S [S I’m interested in a flight to Dallas] and [S I’m also interested in going to Baltimore]]

Problems with context-free grammars
Agreement: subject noun and verb have to agree in person and number
–What flights leave in the morning?
–What flight leaves in the morning?
But simple CFGs overgenerate:
–*What flights leaves in the morning?
–*What flight leave in the morning?

Problems with context-free grammars
Agreement contd.
Possible rules which account for agreement:
–S → 3sgNP 3sgVP
–S → Non3sgNP Non3sgVP
–instead of S → NP VP
Not satisfactory due to rule proliferation
–Massive redundancy
–Lack of linguistically significant generalizations

Problems with context-free grammars
Subcategorization
Possible complements of a verb are called the subcategorization frame of that verb
–The problem disappeared.
–He finds a solution.
–The lady wants to fly to Paris.
But again simple CFGs overgenerate:
–*The teacher disappeared the problem.
–*He finds.
–*She finds to fly to Paris.

CF Parsing
Syntactic parsing means recognizing a sentence and assigning a structure to it
CF grammars are a declarative formalism
Many possible CF parsing algorithms

Parsing as a search problem
Syntactic parsing can be viewed as searching through the space of all possible parse trees to find the correct parse tree(s), whose root is the start symbol S and which cover exactly the input words
The search space is defined by the grammar
The search strategy can be
–top-down or goal-driven
–bottom-up or data-driven

Top-Down Parsing
A top-down parsing algorithm starts with the root S and builds trees down to the input words

Top-down parsing
[Figure: the top-down search space, starting from S, expanding it with the S rules (S → NP VP, S → Aux NP VP, S → VP) and then expanding NP and VP in turn]

Bottom-up parsing
A bottom-up parsing algorithm starts with the input words and builds trees up toward the root S

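As one possible illustration of the bottom-up strategy, the sketch below uses the CKY algorithm, a well-known bottom-up recognizer for grammars in Chomsky normal form. CKY is not named on these slides; it is swapped in here only as a compact example. The tiny grammar, the lexicon (which maps "I" directly to NP to stay in CNF) and the names `BINARY`, `LEXICON` and `cky_recognize` are assumptions of this sketch.

```python
from itertools import product

# Binary rules A -> B C, indexed by the right-hand side (B, C).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "Noun"): {"NP"},
    ("Verb", "NP"): {"VP"},
}
# Lexical rules A -> word, indexed by the word.
LEXICON = {
    "I": {"NP"},
    "prefer": {"Verb"},
    "a": {"Det"},
    "flight": {"Noun"},
}


def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`, building constituents bottom-up."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every non-terminal that spans words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):              # grow spans from short to long
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # every split point of the span
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n]


print(cky_recognize("I prefer a flight".split()))
```

The table is filled starting from the words, so every constituent it contains is consistent with the input, but many of them (for example the NP over "a flight" in a non-sentence) never become part of an S.
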
Top-down vs. Bottom-up Parsing
Top-down parser:
–Only generates trees that result in S
–Spends much time on trees that are not consistent with the input
Bottom-up parser:
–Spends much time on generating trees that do not result in S
–Only generates trees that are consistent with the input

Top-down depth-first left-to-right parser
Depth-first strategy only explores one state at a time
Further choices:
–Which node of a tree to expand (e.g. left-most unexpanded node of the current tree)
–The order in which grammar rules are applied (e.g. according to their textual order in the grammar)

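A minimal sketch of such a parser, reduced to a recognizer, is given below. It always expands the leftmost unexpanded symbol, tries the rules in their textual order and backtracks on failure. The small grammar is an assumption made for this example (it is not the grammar shown on the slides), and `GRAMMAR`, `expand` and `recognize` are hypothetical names. The grammar is deliberately free of left-recursive rules, because this strategy would loop forever on them.

```python
# Any symbol that is not a key of GRAMMAR is treated as a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det": [["this"], ["a"], ["that"]],
    "Noun": [["flight"], ["meal"]],
    "Verb": [["include"], ["book"]],
    "Aux": [["does"]],
    "ProperNoun": [["Houston"]],
}


def expand(symbols, words, pos):
    """Yield each input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Leftmost non-terminal: try its rules in textual order (depth-first).
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words, pos)
    elif pos < len(words) and words[pos] == first:
        # Terminal that matches the next input word: move on.
        yield from expand(rest, words, pos + 1)
    # Otherwise this branch fails and the caller backtracks.


def recognize(words):
    """True if some expansion of S covers exactly the input words."""
    return any(end == len(words) for end in expand(["S"], words, 0))


print(recognize("does this flight include a meal".split()))
```

For the example question, the parser first tries S → NP VP, fails on the word "does", and only then backtracks to S → Aux NP VP, which shows the time a top-down parser can spend on trees that are inconsistent with the input.
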
A miniature English Grammar

TD-DF-LR parsing
“Does this flight include a meal?”

Problems with the basic parser
Left-recursion: rules of the type NP → NP PP
Solution: rewrite each rule of the form A → A β | α using a new symbol A’:
–A → α A’
–A’ → β A’ | ε

Left Recursion

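The standard rewrite for immediate left recursion replaces A → A β | α by A → α A’ and A’ → β A’ | ε. A small sketch of that transformation follows; the function name `remove_immediate_left_recursion` is made up for this example, the empty list stands for ε, and indirect left recursion (through other non-terminals) is not handled.

```python
def remove_immediate_left_recursion(lhs, alternatives):
    """Rewrite the rules of `lhs` so that none of them starts with `lhs` itself.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping each non-terminal to its new right-hand sides.
    """
    new_symbol = lhs + "'"
    # beta parts: what follows the left-recursive occurrence of lhs.
    betas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == lhs]
    # alpha parts: the alternatives that are not left-recursive.
    alphas = [rhs for rhs in alternatives if not rhs or rhs[0] != lhs]
    if not betas:
        return {lhs: alternatives}           # nothing to do
    return {
        lhs: [alpha + [new_symbol] for alpha in alphas],
        new_symbol: [beta + [new_symbol] for beta in betas] + [[]],  # [] is epsilon
    }


print(remove_immediate_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"]]))
```

The rewritten grammar is only weakly equivalent to the original: it accepts the same strings, but the PPs now hang off NP’ in a right-branching chain rather than off a left-branching NP.
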
Problems with the basic parser
Ambiguity: attachment ambiguity, coordination ambiguity, noun-phrase bracketing ambiguity
Attachment ambiguity:
–“I saw the Grand Canyon flying to New York”
Coordination ambiguity:
–“old men and women”

Ambiguity
Ambiguity is a problem for all parsers
Parsing requires disambiguation
Number of possible parses should be reduced (potentially exponential number of parses)
e.g. Show me the meal on Flight 386 from Munich to Denver (14 parses)

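How quickly the number of parses grows can be seen by counting trees for a chain of prepositional phrases. The sketch below uses a deliberately tiny two-rule grammar (NP → NP PP, PP → P NP), which is an assumption of this example and not the grammar behind the 14 parses quoted on the slide; the names `BINARY`, `LEXICON` and `count_parses` are likewise invented. It fills a chart bottom-up and multiplies the counts of the two sub-spans for every rule and split point.

```python
from collections import defaultdict

# Binary rules as (parent, left child, right child).
BINARY = [("NP", "NP", "PP"), ("PP", "P", "NP")]
# One category per word, enough for this toy example.
LEXICON = {
    "meal": "NP", "flight": "NP", "Munich": "NP", "Denver": "NP",
    "on": "P", "from": "P", "to": "P",
}


def count_parses(words, start="NP"):
    """Count the distinct parse trees rooted in `start` that cover all of `words`."""
    n = len(words)
    counts = defaultdict(int)                 # (i, j, symbol) -> number of trees
    for i, word in enumerate(words):
        counts[i, i + 1, LEXICON[word]] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    counts[i, j, parent] += counts[i, k, left] * counts[k, j, right]
    return counts[0, n, start]


for phrase in ["meal on flight",
               "meal on flight from Munich",
               "meal on flight from Munich to Denver"]:
    print(phrase, "->", count_parses(phrase.split()))
```

With one, two and three PPs the counts are 1, 2 and 5: each extra PP can attach to any of the preceding noun phrases, so the number of attachment structures grows much faster than the length of the input.
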
Ambiguity
Example: President Kennedy today pushed aside other White House business to devote all his time and attention to working on the Berlin crisis address he will deliver tomorrow night to the American people over nationwide television and radio.
Solutions: return all parses or include disambiguation in the parser.

Parse Tree
[Figure: parse tree for "I prefer a morning flight", with nodes S, NP, VP, Pronoun, Verb, Det, Nom and Noun]

Context Free Grammars
–“Stronger” than Finite State Machines
–Capture constituency and ordering
–Modern linguistic theories of grammar are only vaguely based on context-free grammars.

Example of a CFG:
–NP → Det Nominal
–Nominal → Noun | Noun Nominal
–Det → a | the
–Noun → flight | trip
A CFG can be used as a generator or as a recognizer

CFG
Derivation:
–NP ⇒ Det Nominal ⇒ Det Noun ⇒ a Noun ⇒ a flight
Parse Tree:

Context Free Grammars
G = (N, Σ, P, S):
–N: a set of non-terminals (the constituents of the language)
–Σ: a set of terminals (either lexical items or parts of speech)
–P: a set of rules of the form A → α, where A ∈ N and α ∈ (Σ ∪ N)*
–S: a designated start symbol, S ∈ N

CFG
If A → β is a production of P, and α and γ are any strings in (Σ ∪ N)*, then αAγ directly derives αβγ: αAγ ⇒ αβγ
If α1 ⇒ α2 ⇒ … ⇒ αm, then we say that α1 derives αm, or α1 ⇒* αm

CFG
The language generated by a grammar G, denoted L(G), is the set of strings composed of terminal symbols which can be derived from the start symbol S of G
L(G) = { w | w ∈ Σ* and S ⇒* w }
Parsing: mapping from a string of words to its parse tree.

CFG
Strong equivalence: two grammars G1 and G2 are strongly equivalent if they generate the same set of strings (i.e., L(G1) = L(G2)) and they assign the same phrase structure to each string (allowing only for renaming of non-terminal symbols)
Weak equivalence: they generate the same set of strings but don’t assign the same phrase structure to each string

CNF
Chomsky Normal Form (CNF):
–The grammar is ε-free
–Each production of the grammar is either of the form A → B C or A → a (i.e., either 2 non-terminal symbols or 1 terminal symbol on the RHS)
Any CFG can be converted into a weakly equivalent CFG in Chomsky Normal Form