Amirkabir University of Technology Computer Engineering Faculty AILAB Grammars for Natural Language Ahmad Abdollahzadeh Barfouroush Mehr 1381
An Example of an NLU System Structure
Processing chain: Words (Input) → Parsing → Syntactic Structure and Logical Form → Contextual Interpretation → Final Meaning → Application Reasoning → Meaning of Response → Utterance Planning → Syntactic Structure and Logical Form of Response → Realization → Words (Response)
Knowledge sources: Lexicon, Grammars, Discourse Context, Application Context
Grammar and Parsing
To examine how the syntactic structure of a sentence can be computed, you must consider two things:
- Grammar: a formal specification of the allowable structures in the language.
- Parsing: the method of analysing a sentence to determine its structure according to the grammar.
Grammar and Language
A grammar G generates a characteristic language L(G) and assigns structures to all s ∈ L(G).
For a grammar G with start symbol S, L(G) = {x ∈ Σ* | S derives x}, where Σ is the set of terminal symbols and N the set of non-terminal symbols.
For X ∈ N and α, β, Y ∈ (Σ ∪ N)* (i.e. sequences of terminal and non-terminal symbols), αXβ immediately derives αYβ iff X → Y ∈ G.
X derives Z if X immediately derives Z, or X immediately derives Y and Y derives Z.
A grammar does not tell us how to generate L(G) or how to discover such structures.
Grammar Definition
A grammar G is defined by a four-tuple, written G = (N, Σ, P, S0), where
- N is the set of non-terminal symbols
- Σ is the set of terminal symbols
- P is the set of rewrite rules of the form α → β, where α and β are strings of symbols
- S0 is the start symbol
In this definition N and Σ are two disjoint sets.
Only non-terminals are rewritable; they can occur on both sides of a rule.
An Example of a Grammar
1- S → NP VP
2- VP → V NP
3- NP → NAME
4- NP → ART N
5- NAME → Amir
6- V → ate
7- ART → the
8- N → biscuit
S0 = S
N = {S, NP, VP, NAME, ART, N, V}
P = {rules 1 to 8}
Σ = {ate, the, Amir, biscuit}
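The example grammar is small enough to enumerate its whole language. The following is a minimal sketch, not part of the original slides: it stores rules 1 to 8 as a Python dictionary (spelling the noun "biscuit") and expands the leftmost non-terminal breadth-first. The names `RULES` and `generate` are illustrative assumptions.

```python
from collections import deque

# Rules 1-8 of the example grammar: left-hand side -> list of right-hand sides.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("NAME",), ("ART", "N")],
    "NAME": [("Amir",)],
    "V": [("ate",)],
    "ART": [("the",)],
    "N": [("biscuit",)],
}

def generate(start="S", limit=100):
    """Enumerate sentences by breadth-first leftmost derivation."""
    sentences = []
    queue = deque([(start,)])
    while queue and len(sentences) < limit:
        form = queue.popleft()
        # Find the leftmost non-terminal in the sentential form.
        index = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if index is None:
            sentences.append(" ".join(form))  # only terminals left: a sentence
            continue
        for rhs in RULES[form[index]]:
            queue.append(form[:index] + rhs + form[index + 1:])
    return sentences

SENTENCES = generate()
print(SENTENCES)
```

Under these assumptions L(G) contains four sentences, including "the biscuit ate Amir": the grammar constrains structure only, not plausibility.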
Sentence Structure
Two methods for representing sentence structures are:
- Parse tree
- Lists
Parse Tree
"Man ate the apple"
S
  NP
    NAME: Man
  VP
    V: ate
    NP
      ART: the
      N: apple
Parse Tree
A parse tree includes information about:
- precedence between constituents
- dominance between constituents
It constitutes a trace of the rule applications used to derive a sentence.
It does not tell you the order in which the rules were used.
Lists
"Man ate the apple"
(S (NP (NAME Man)) (VP (V ate) (NP (ART the) (N apple))))
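The list representation maps directly onto nested lists in a programming language. A minimal sketch, assuming each node is stored as `[label, child, ...]` and each word as a plain string; `to_brackets` and `leaves` are illustrative helper names, not from the slides.

```python
# The parse of "Man ate the apple" as a nested list: [label, child, child, ...].
TREE = ["S",
        ["NP", ["NAME", "Man"]],
        ["VP", ["V", "ate"],
               ["NP", ["ART", "the"], ["N", "apple"]]]]

def to_brackets(node):
    """Render a nested-list tree in the bracketed list notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + " ".join([label] + [to_brackets(c) for c in children]) + ")"

def leaves(node):
    """Return the words of the sentence in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]

print(to_brackets(TREE))
print(leaves(TREE))
```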
Chomsky's Hierarchy
Different classes of grammar result from various restrictions on the form of rules.
Grammars can be compared according to the range of languages each formalism can describe.
Types of Grammar in the Hierarchy
(N is the set of non-terminals, Σ the set of terminals.)
- Regular or Right Linear (Type 3): every rewrite rule is of the form X → aY or X → a, where X and Y are in N and a is a sequence of terminals.
- Context-free Grammar (CFG) (Type 2): every rewrite rule is of the form X → α, where X is in N and α is in (Σ ∪ N)+.
- Context-sensitive Grammar (Type 1): every rewrite rule is of the form γ1 X γ2 → γ1 α γ2, where X is in N, γ1 and γ2 are in (N ∪ Σ)* and α is in (N ∪ Σ)+.
- Unrestricted (Type 0): every rewrite rule is of the form α → β, with no restriction on the rule.
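The four rule forms can be checked mechanically. The sketch below is an illustration under stated assumptions, not a definitive classifier: rules are `(lhs, rhs)` pairs of symbol tuples, every symbol outside `nonterminals` counts as a terminal, the terminal prefix of a right-linear rule may be empty, and the context of a context-sensitive rule may be empty. The function names are hypothetical.

```python
def rule_type(lhs, rhs, nonterminals):
    """Return the most restrictive Chomsky type (3, 2, 1 or 0) one rule satisfies."""
    single = len(lhs) == 1 and lhs[0] in nonterminals
    if single and rhs:
        # Right linear: only the last right-hand symbol may be a non-terminal.
        if all(sym not in nonterminals for sym in rhs[:-1]):
            return 3
        return 2
    # Context-sensitive: g1 X g2 -> g1 a g2 with a non-empty.
    for i, sym in enumerate(lhs):
        if sym not in nonterminals:
            continue
        left, right = lhs[:i], lhs[i + 1:]
        middle_len = len(rhs) - len(left) - len(right)
        if (middle_len >= 1 and rhs[:len(left)] == left
                and rhs[len(rhs) - len(right):] == right):
            return 1
    return 0

def grammar_type(rules, nonterminals):
    """A grammar is only as restricted as its least restricted rule."""
    return min(rule_type(lhs, rhs, nonterminals) for lhs, rhs in rules)

# Hypothetical toy rule sets, one per type.
REGULAR = [(("S",), ("a", "S")), (("S",), ("b",))]
CONTEXT_FREE = [(("S",), ("NP", "VP")), (("NP",), ("the", "N"))]
CONTEXT_SENSITIVE = [(("A", "B"), ("A", "b", "c"))]
UNRESTRICTED = [(("A", "B"), ("B", "A"))]

print(grammar_type(REGULAR, {"S"}))
print(grammar_type(CONTEXT_FREE, {"S", "NP", "VP", "N"}))
print(grammar_type(CONTEXT_SENSITIVE, {"A", "B"}))
print(grammar_type(UNRESTRICTED, {"A", "B"}))
```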
Categorized Grammar
A categorized grammar G is defined by a five-tuple, written G = (N, Σ, T, P, S0), where
- N is the set of non-terminal symbols
- Σ is the set of terminal symbols
- P is the set of rewrite rules of the form α → β, where α and β are strings of symbols
- S0 is the start symbol
- T is the set of category (lexical) symbols, written T1, T2, …, Tn
- Σ is written as Σ = T1 ∪ T2 ∪ … ∪ Tn
- Every category is written as Ti → a_i1 | a_i2 | … | a_in
An Example of a Categorized Grammar
1- S → NP VP
2- NP → Art NP2
3- NP → NP2
4- NP2 → Noun
5- NP2 → Adj NP2
6- NP2 → NP2 PP
7- PP → Prep NP
8- PP → Prep NP PP
9- VP → Verb
10- VP → Verb NP
11- VP → VP PP
S0 = S
N = {S, NP, VP, NP2, PP}
T = {Art, Noun, Adj, Prep, Verb}
Art = {a, the}
Noun = {Man, Woman, boy, cow, chicken}
Verb = {eat, run, put}
Adj = {old, young, heavy}
Prep = {in, by, of, over}
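Separating categories from rules means the lexicon can be a plain lookup table. The following sketch is an assumption-laden illustration rather than the lecture's parser: it lower-cases all words, stores rules 1 to 11 in `RULES`, and tests membership with a memoized span-based recognizer. It relies on the grammar having no empty rules, so left-recursive rules such as NP2 → NP2 PP always recurse on a shorter span.

```python
from functools import lru_cache

# Rules 1-11 of the example; category symbols are looked up in LEXICON.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Art", "NP2"), ("NP2",)],
    "NP2": [("Noun",), ("Adj", "NP2"), ("NP2", "PP")],
    "PP": [("Prep", "NP"), ("Prep", "NP", "PP")],
    "VP": [("Verb",), ("Verb", "NP"), ("VP", "PP")],
}
LEXICON = {  # words lower-cased for lookup (an assumption of this sketch)
    "Art": {"a", "the"},
    "Noun": {"man", "woman", "boy", "cow", "chicken"},
    "Verb": {"eat", "run", "put"},
    "Adj": {"old", "young", "heavy"},
    "Prep": {"in", "by", "of", "over"},
}

def recognize(sentence, start="S"):
    """Return True if the grammar generates the whitespace-separated sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """True if `symbol` can derive exactly words[i:j]."""
        if symbol in LEXICON:
            return j == i + 1 and words[i] in LEXICON[symbol]
        return any(derives_seq(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def derives_seq(rhs, i, j):
        """True if the symbol sequence `rhs` can derive exactly words[i:j]."""
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Every symbol covers at least one word, so sub-spans always shrink.
        return any(derives(rhs[0], i, k) and derives_seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return derives(start, 0, len(words))

print(recognize("the old man run"))
print(recognize("the boy put the chicken in the cow"))
print(recognize("man the eat"))
```

The lexicon lists only base verb forms, so the sketch accepts "the old man run"; the grammar says nothing about agreement.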
Criteria for Evaluating Grammars
- Does it undergenerate?
- Does it overgenerate?
- Does it assign appropriate structures to the sentences it generates?
- Is it simple to understand? How many rules does it have?
- Does it contain generalisations or special cases?
- How ambiguous is it?
Overgeneration and Undergeneration
- Overgeneration: A grammar should generate only sentences in the language. It should reject sentences not in the language.
- Undergeneration: A grammar should generate all sentences in the language. There should not be sentences in the language that are not generated by the grammar.
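Viewed as sets, the two failures are the two set differences between what the grammar generates and what the language contains. A minimal sketch with hypothetical toy data (neither set comes from the slides; `compare` is an illustrative name):

```python
def compare(generated, language):
    """Compare what a grammar generates with the sentences it should generate."""
    return {
        "overgenerated": sorted(generated - language),   # generated, not in the language
        "undergenerated": sorted(language - generated),  # in the language, not generated
    }

# Hypothetical toy data for illustration only.
language = {"the man runs", "the men run"}
generated = {"the man runs", "the man run"}

report = compare(generated, language)
print(report)
```

For real languages neither set can be listed in full, so in practice this comparison is made on samples of acceptable and unacceptable sentences.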
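Appropriate Structures
- A grammar should assign linguistically plausible structures.
Consider the grammar:
S → N VP
VP → V ART ADJ N
N → [John]
V → [ate]
ART → [a]
ADJ → [juicy]
N → [hamburger]
It generates "John ate a juicy hamburger", but with a flat VP that contains no noun-phrase constituent.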
Understandability/Generality
- Understandability: The grammar should be simple.
- Generality: The range of sentences the grammar analyzes correctly.
Ambiguity
NP → NP PP
PP → Prep NP
(the man) (on the hill with a telescope by the sea)
(the man on the hill) (with a telescope by the sea)
(the man on the hill with a telescope) (by the sea)
etc.
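How quickly the attachment ambiguity grows can be counted directly from the two rules. The sketch below is illustrative and makes a simplifying assumption: each base noun phrase ("the man", "the hill", …) is pre-grouped into a single token `"NP"` and each preposition into `"Prep"`. The function names are hypothetical.

```python
from functools import lru_cache

def count_np_parses(tokens):
    """Count parses of `tokens` as an NP under NP -> NP PP and PP -> Prep NP."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def np(i, j):
        # A single pre-grouped base noun phrase counts as one parse.
        total = 1 if (j == i + 1 and tokens[i] == "NP") else 0
        # NP -> NP PP: try every split point between the inner NP and the PP.
        for k in range(i + 1, j):
            total += np(i, k) * pp(k, j)
        return total

    @lru_cache(maxsize=None)
    def pp(i, j):
        # PP -> Prep NP
        if j - i >= 2 and tokens[i] == "Prep":
            return np(i + 1, j)
        return 0

    return np(0, len(tokens))

def np_with_pps(n):
    """Token sequence for a base NP followed by n prepositional phrases."""
    return ["NP"] + ["Prep", "NP"] * n

# "the man / on the hill / with a telescope / by the sea" has three PPs.
print(count_np_parses(np_with_pps(3)))
print([count_np_parses(np_with_pps(n)) for n in range(5)])
```

The counts follow the Catalan numbers, so each additional prepositional phrase multiplies the number of candidate structures.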
Context-free Grammars (CFG)
- The CFG formalism is powerful enough to describe most of the structure in natural languages.
- CFGs are restricted enough that efficient parsers can be built to analyse sentences.
CFGs: Advantages and Disadvantages
Advantages
- Easy to write
- Declarative
- Linguistically natural (sometimes)
- Well-understood formal properties
- Computationally effective
Disadvantages
- Notion of "head" is absent
- Categories are unanalysable
Chomsky Normal Form (CNF)
Suppose G = (N, Σ, P, S0) is a context-free grammar. G is in Chomsky Normal Form if every rule in P has one of the following forms:
1) X → Y Z, for X, Y, Z in N, or
2) X → a, for a in Σ.
There is an algorithm that converts every CFG into an equivalent grammar in CNF.
An Algorithm for Converting CFG to CNF
For every grammar G = (N, Σ, P, S0) there is an equivalent grammar G' in Chomsky Normal Form, with rule set P'.
1- Copy to P' every rule that is already of the form X → Y Z (X, Y, Z in N) or X → a (a in Σ).
2- In every remaining rule of the form X → Y1 a1 Y2 … Yn, replace each terminal symbol ai by a new non-terminal Xi and add the rule Xi → ai to P'.
3- Step 2 produces rules of the form X → Y1 Y2 … Yn. If n > 2, new non-terminals Z1, …, Zn-2 are added as follows: X → Y1 Z1, Z1 → Y2 Z2, …, Zn-2 → Yn-1 Yn.
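The three steps can be sketched in a few lines. This is an illustration under stated assumptions, not a complete normalizer: it assumes there are no empty rules and no unit rules X → Y (the steps above do not cover them), and the fresh symbol names `T_<terminal>` and `Z<k>` are hypothetical and assumed not to clash with existing symbols. The small input grammar is used only as an example.

```python
def to_cnf(rules, nonterminals):
    """Convert a list of (lhs, rhs_tuple) rules to CNF following the three steps."""
    cnf, terminal_name, fresh = [], {}, 0
    for lhs, rhs in rules:
        if len(rhs) == 0 or (len(rhs) == 1 and rhs[0] in nonterminals):
            raise ValueError(f"rule not covered by the three steps: {lhs} -> {rhs}")
        if len(rhs) == 1:
            cnf.append((lhs, rhs))  # Step 1: X -> a is already in CNF
            continue
        # Step 2: replace each terminal by a new non-terminal that rewrites to it.
        symbols = []
        for sym in rhs:
            if sym not in nonterminals:
                if sym not in terminal_name:
                    terminal_name[sym] = f"T_{sym}"
                    cnf.append((terminal_name[sym], (sym,)))
                sym = terminal_name[sym]
            symbols.append(sym)
        # Step 3: split right-hand sides longer than two into a chain of binary
        # rules (a rule X -> Y Z passes through unchanged, as in step 1).
        head = lhs
        while len(symbols) > 2:
            fresh += 1
            cnf.append((head, (symbols[0], f"Z{fresh}")))
            head, symbols = f"Z{fresh}", symbols[1:]
        cnf.append((head, tuple(symbols)))
    return cnf

def is_cnf(rules, nonterminals):
    """Check that every rule is X -> Y Z or X -> a."""
    known = set(nonterminals) | {lhs for lhs, _ in rules}
    return all(
        (len(rhs) == 2 and all(s in known for s in rhs))
        or (len(rhs) == 1 and rhs[0] not in known)
        for _, rhs in rules
    )

NONTERMINALS = {"S", "N", "VP", "V", "ART", "ADJ"}
GRAMMAR = [
    ("S", ("N", "VP")),
    ("VP", ("V", "ART", "ADJ", "N")),
    ("N", ("John",)), ("V", ("ate",)), ("ART", ("a",)),
    ("ADJ", ("juicy",)), ("N", ("hamburger",)),
]
CNF = to_cnf(GRAMMAR, NONTERMINALS)
for lhs, rhs in CNF:
    print(lhs, "->", " ".join(rhs))
```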
Greibach Normal Form (GNF)
Suppose G = (N, Σ, T, P, S0) is a categorized context-free grammar. A rule of the form X → a1 a2 … an is in GNF if a1 is in Σ or T and a2, …, an are non-terminals. If all rules in P are in GNF, then G is in GNF. Consequently, G must not contain rules of the form X → ε. GNF can be reached by direct substitution.
An Example of Converting a CFG to GNF: the CFG
1- S → NP VP
2- S → NP VP PREPS
3- NP → Det NP2
4- NP → NP2
5- NP2 → Noun
6- NP2 → Adj NP2
7- NP2 → NP3 PREPS
8- NP3 → Noun
9- PREPS → PP
10- PREPS → PP PREPS
11- PP → Prep NP
12- VP → Verb
An Example of Converting a CFG to GNF: the GNF
1a- S → Det NP2 VP
1b- S → Noun VP
1c- S → Adj NP2 VP
1d- S → Noun PREPS VP
2a- S → Det NP2 VP PREPS
2b- S → Noun VP PREPS
2c- S → Adj NP2 VP PREPS
2d- S → Noun PREPS VP PREPS
3- NP → Det NP2
4a- NP → Noun
4b- NP → Adj NP2
4c- NP → Noun PREPS
5- NP2 → Noun
6- NP2 → Adj NP2
7- NP2 → Noun PREPS
8- NP3 → Noun
9- PREPS → Prep NP
10- PREPS → Prep NP PREPS
11- PP → Prep NP
12- VP → Verb
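The direct substitution behind this example can be sketched as follows. The code is an illustration under stated assumptions: the grammar has no left recursion and no empty rules (otherwise substitution alone does not terminate), and rule 6 of the CFG is taken to be NP2 → Adj NP2, which is what the GNF rules 1c, 4b and 6 imply. The function names are hypothetical.

```python
CATEGORIES = {"Det", "Noun", "Adj", "Prep", "Verb"}
CFG = {
    "S": [("NP", "VP"), ("NP", "VP", "PREPS")],
    "NP": [("Det", "NP2"), ("NP2",)],
    "NP2": [("Noun",), ("Adj", "NP2"), ("NP3", "PREPS")],
    "NP3": [("Noun",)],
    "PREPS": [("PP",), ("PP", "PREPS")],
    "PP": [("Prep", "NP")],
    "VP": [("Verb",)],
}

def expand(rhs, rules, categories, seen=()):
    """Substitute for the leading symbol until the rule starts with a category."""
    first = rhs[0]
    if first in categories:
        return [rhs]
    if first in seen:
        raise ValueError("left recursion: direct substitution alone does not terminate")
    result = []
    for alternative in rules[first]:
        result.extend(expand(alternative + rhs[1:], rules, categories, seen + (first,)))
    return result

def to_gnf(rules, categories):
    """Apply direct substitution to every rule of the grammar."""
    return {lhs: [g for rhs in alternatives for g in expand(rhs, rules, categories)]
            for lhs, alternatives in rules.items()}

def is_gnf(rules, categories):
    """Every rule must start with a category, followed only by non-terminals."""
    return all(rhs[0] in categories and all(s in rules for s in rhs[1:])
               for alternatives in rules.values() for rhs in alternatives)

GNF = to_gnf(CFG, CATEGORIES)
for lhs, alternatives in GNF.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs))
```

Under these assumptions the output reproduces rules 1a to 12 of the GNF listing, with NP expanding to Det NP2, Noun, Adj NP2 and Noun PREPS.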
Phrase Structure
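Problems with Phrase Structure
- "The shooting of the hunters was terrible." The bracketing (The shooting) (of the hunters) (was terrible) is the same whether the hunters shoot or are shot, so phrase structure alone does not capture the ambiguity.
- "The boy hit the ball." and "The ball was hit by the boy." have different phrase structures although they express the same meaning.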
Surface Structure
Surface vs. Deep Structure
- Surface structure: the phrase structure of the current utterance
- Deep structure: a canonical phrase structure that has the same meaning as the surface structure
- Transformational grammar: rules that transform a deep phrase structure into surface phrase structures with the same meaning
Surface Structure
Deep Structure
"The boy hit the ball" and "The ball was hit by the boy" share one deep structure:
(Sentence (NP The boy) (VP (V hit) (NP the ball)))
Transformational Grammar (1965)
Generates surface structure from deep structure.
Syntactic component: Phrase-structure rules → Deep Structure → Transformational Rules → Surface Structure
The deep structure is the input to the semantic component; the surface structure is the input to the phonological component.
Example of TG
A context-free grammar generates the deep structure, then a set of transformations transforms the deep structure into the surface structure.
Deep structure of "The cat will catch man":
(S (NP (ART The) (N cat)) (VP (AUX will) (V catch) (NP man)))
Example of TG: Yes/No Question Transformation
Before: (S (NP ART N) (VP AUX V NP))
After: (S AUX (NP ART N) (VP V NP) ?)
The transformation moves AUX out of the VP to the front of S and appends "?".
Result: "Will the cat catch man?"
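The yes/no question transformation can be written as a function from one tree to another. A minimal sketch, assuming trees are nested lists `[label, child, ...]` and that the input has exactly the shape S → NP VP with VP → AUX V NP; the function names are illustrative.

```python
def yes_no_question(tree):
    """Move AUX out of the VP to the front of S and append '?'.

    Expects a tree of the shape S -> NP VP with VP -> AUX V NP.
    """
    label, np, vp = tree
    aux = next(child for child in vp[1:] if child[0] == "AUX")
    rest_of_vp = [child for child in vp[1:] if child is not aux]
    return [label, aux, np, ["VP"] + rest_of_vp, "?"]

def leaves(node):
    """Return the words of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]

DEEP = ["S",
        ["NP", ["ART", "the"], ["N", "cat"]],
        ["VP", ["AUX", "will"], ["V", "catch"], ["NP", "man"]]]
SURFACE = yes_no_question(DEEP)
print(" ".join(leaves(DEEP)))
print(" ".join(leaves(SURFACE)))
```

The function builds a new tree and leaves the deep structure untouched, so several transformations could be tried on the same deep structure.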
Transformational Grammar
- Base component: generates the deep structure.
- Transformational component: transforms the deep structure into the surface structure by using transformational rules.
Transformational rules rearrange sentence elements, insert or delete elements, and/or replace one element with another.
Example of a rule: NP + V + ed + NP → Did + NP + V + NP + ?
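The example rule can be read as a pattern over a flat sequence of constituents: it deletes the tense element "ed", inserts "Did" and "?", and keeps the rest in order. The sketch below is only an illustration; the (category, text) encoding and the example sentence are assumptions, not from the slides.

```python
def past_question(constituents):
    """Apply NP + V + ed + NP => Did + NP + V + NP + ? to a tagged sequence.

    `constituents` is a list of (category, text) pairs; returns None if the
    sequence does not match the left-hand side of the rule.
    """
    categories = [category for category, _ in constituents]
    if categories != ["NP", "V", "ed", "NP"]:
        return None
    subject, verb, _tense, obj = constituents
    return [("Did", "Did"), subject, verb, obj, ("?", "?")]

# Hypothetical input: "the boy kick+ed the ball"
result = past_question([("NP", "the boy"), ("V", "kick"), ("ed", "ed"), ("NP", "the ball")])
print(" ".join(text for _, text in result))
```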
Grammar Types (1)
- Constraint-based Lexicalist Grammar (CBLG): Sag, I. A. and Wasow, T., Syntactic Theory: A Formal Introduction, CSLI Publications, 1999.
- Categorial Grammar (CG): König, E., LexGram, A Practical Categorial Grammar Formalism, Journal of Language and Computation.
- Dependency Grammar (DG): Sag, I. A. and Wasow, T., Syntactic Theory: A Formal Introduction, CSLI Publications, 1999.
Grammar Types (2)
- Link Grammar: Sleator, D. and Temperley, D., Parsing English with a Link Grammar, Carnegie Mellon University.
- Lexical Functional Grammar (LFG): Sag, I. A. and Wasow, T., Syntactic Theory: A Formal Introduction, CSLI Publications, 1999.
- Tree-Adjoining Grammar (TAG): Allen, James, Natural Language Understanding, 1995.
Grammar Types (3)
- Generalized Phrase Structure Grammar (GPSG): Sag, I. A. and Wasow, T., Syntactic Theory: A Formal Introduction, CSLI Publications, 1999.
- Head-driven Phrase Structure Grammar (HPSG): Pollard, C. and Sag, I. A., Head-driven Phrase Structure Grammar, Chicago University Press.
See also the HPSG page of the Center for the Study of Language and Information (CSLI) at Stanford University, which provides information about HPSG-related activities at CSLI and pointers to other resources on the web.
Grammar Types (4)
- Probabilistic Feature Grammar (PFG): Goodman, Joshua, Probabilistic Feature Grammar, Harvard University.