What is it about?

Models of Language Generation
Models of Language Recognition

[Figure: G and M, shown with sample strings aaba, acba, aaba, ...]
Language:
(1) The words, their pronunciation, and the methods of combining them used and understood by a community.
(2) A system of signs and symbols and rules for using them that is used to carry information.
- from a Webster dictionary -
Formal Languages and Grammars: definition

(1) A phrase structured (also called type 0) grammar is a 4-tuple $G = \langle V_T, V_N, P, S \rangle$, where
    $V_T$: terminal alphabet (called morphemes by linguists),
    $V_N$: nonterminals (also called variables, or syntactic categories),
    $V = V_T \cup V_N$: total alphabet,
    $S \in V_N$: the start symbol, and
    $P$: a finite set of production (also called rewriting) rules of the form $\alpha \to \beta$, which means $\alpha$ generates (or produces) $\beta$, where $\alpha \in V^{*} V_N V^{*}$ and $\beta \in V^{*}$.
Formal Languages and Grammars: definition (cont'd)

Notice that $V^{*} V_N V^{*}$ is the set of strings over the total alphabet that contain at least one nonterminal symbol.

For two strings $w_1$ and $w_2$, we write $w_1 \Rightarrow w_2$ to denote that $w_2$ can be derived from $w_1$ by applying one production rule of a grammar $G$. We write $w_1 \Rightarrow^{*} w_2$ to denote that $w_2$ can be derived from $w_1$ by applying some finite number of production rules, including zero.

The language of a grammar $G$, denoted by $L(G)$, is the set of strings over $V_T$ that can be generated by $G$ starting with the start symbol $S$, i.e., $L(G) = \{ x \mid x \in V_T^{*} \text{ and } S \Rightarrow^{*} x \}$.

Following the convention, we will use uppercase letters for nonterminal symbols and lowercase letters for terminal symbols.
Formal Languages and Grammars: definition (cont'd)

(2) Context-sensitive (type 1) grammars are type 0 grammars with the following restriction: $|\alpha| \le |\beta|$ (i.e., noncontracting), except for $S \to \varepsilon$, where $\varepsilon$ denotes the null string.

(3) Context-free (type 2) grammars are type 0 grammars with the restriction $|\alpha| = 1$, i.e., the left side of every production rule has only one symbol, which is a nonterminal.

(4) Regular (type 3) grammars are type 2 grammars with the restriction $\beta = xB$ or $\beta = x$, for some $x \in V_T^{*}$ and $B \in V_N$.
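The four restrictions can be checked mechanically, rule by rule. The following is a minimal sketch, assuming single-character symbols, rules given as (left side, right side) string pairs, and the empty string standing for the null string. The name `grammar_types` and its parameters are illustrative and do not come from the slides.

```python
def grammar_types(rules, terminals, nonterminals, start="S"):
    """Return the sorted list of types (0..3) whose restriction every rule meets."""

    def type0(lhs, rhs):
        # Left side must contain at least one nonterminal; all symbols must be known.
        known = all(ch in terminals or ch in nonterminals for ch in lhs + rhs)
        return known and any(ch in nonterminals for ch in lhs)

    def type1(lhs, rhs):
        # Noncontracting, except for start -> null string.
        return type0(lhs, rhs) and (len(lhs) <= len(rhs) or (lhs == start and rhs == ""))

    def type2(lhs, rhs):
        # Left side is a single symbol (a nonterminal, by the type 0 check).
        return type0(lhs, rhs) and len(lhs) == 1

    def type3(lhs, rhs):
        # Right side is xB or x, with x a string of terminals.
        if not type2(lhs, rhs):
            return False
        x = rhs[:-1] if rhs and rhs[-1] in nonterminals else rhs
        return all(ch in terminals for ch in x)

    checks = {0: type0, 1: type1, 2: type2, 3: type3}
    return [t for t, ok in checks.items() if all(ok(l, r) for l, r in rules)]


# A noncontracting grammar whose left sides are longer than one symbol.
csg = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("bB", "bb"),
       ("aB", "ab"), ("bC", "bc"), ("cC", "cc")]
print(grammar_types(csg, set("abc"), set("SBC")))   # [0, 1]

# A right-linear grammar with a null-string rule on a non-start symbol.
reg = [("S", "0S"), ("S", "A"), ("A", "1A"), ("A", "")]
print(grammar_types(reg, set("01"), set("SA")))     # [0, 2, 3]
```

The second result shows that, under these definitions, the types do not form a strict chain: a type 2 or type 3 grammar with a contracting rule other than the one on the start symbol fails the type 1 restriction.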
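EXAMPLES

type 0: $G = \langle V_T, V_N, P, S \rangle$ with $V_T = \{a\}$ and $V_N = \{S, A, B, C, D, E\}$, where ($\varepsilon$ denotes the null string)

$P = \{\, S \to ACaB \mid a,\quad AD \to AC,\quad Ca \to aaC,\quad aE \to Ea,\quad CB \to DB \mid E,\quad AE \to \varepsilon,\quad aD \to Da \,\}$

$L(G) = \{ a^{2^n} \mid n \ge 0 \}$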
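One way to get a feel for this grammar is to enumerate its short derivations by brute force. The sketch below applies the one-step derivation relation breadth-first and keeps only sentential forms up to a length bound, which makes the search finite. The rule list is the type 0 example as read from this slide; the function names and the bound of 11 are illustrative choices, and the uppercase test for nonterminals relies on the lowercase/uppercase convention.

```python
from collections import deque

# Type 0 example: (left side, right side); "" stands for the null string.
RULES = [
    ("S", "ACaB"), ("S", "a"),
    ("AD", "AC"), ("Ca", "aaC"), ("aE", "Ea"),
    ("CB", "DB"), ("CB", "E"),
    ("AE", ""), ("aD", "Da"),
]


def one_step(form, rules):
    """Yield every string obtained from `form` by one rule application."""
    for lhs, rhs in rules:
        pos = form.find(lhs)
        while pos != -1:
            yield form[:pos] + rhs + form[pos + len(lhs):]
            pos = form.find(lhs, pos + 1)


def language_up_to(rules, max_len, start="S"):
    """Terminal strings reachable through sentential forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            words.add(form)          # no nonterminal left: a member of L(G)
            continue
        for nxt in one_step(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


words = language_up_to(RULES, 11)
print(sorted(len(w) for w in words))   # [1, 2, 4, 8]
```

Only strings of a's with lengths 1, 2, 4 and 8 appear within this bound, in agreement with $L(G)$. The enumeration is evidence for small cases, not a proof.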
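EXAMPLES

type 1: $G = \langle V_T, V_N, P, S \rangle$ with $V_T = \{a, b, c\}$ and $V_N = \{S, B, C\}$, where

$P = \{\, S \to aSBC \mid aBC,\quad CB \to BC,\quad bB \to bb,\quad aB \to ab,\quad bC \to bc,\quad cC \to cc \,\}$

$L(G) = \{ a^i b^i c^i \mid i \ge 1 \}$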
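As an illustration of how the rules cooperate, here is one derivation of the string $aabbcc$ (the case $i = 2$), with the rule used at each step shown on the right. This is a worked example added for clarity; other rule orders can get stuck before reaching a terminal string.

```latex
\begin{align*}
S &\Rightarrow aSBC   && (S \to aSBC) \\
  &\Rightarrow aaBCBC && (S \to aBC) \\
  &\Rightarrow aaBBCC && (CB \to BC) \\
  &\Rightarrow aabBCC && (aB \to ab) \\
  &\Rightarrow aabbCC && (bB \to bb) \\
  &\Rightarrow aabbcC && (bC \to bc) \\
  &\Rightarrow aabbcc && (cC \to cc)
\end{align*}
```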
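EXAMPLES

type 2: $G = \langle V_T, V_N, P, S \rangle$ with $V_T = \{0, 1\}$ and $V_N = \{S, A, B\}$, where ($\varepsilon$ denotes the null string)

$P = \{\, S \to ASB \mid \varepsilon,\quad A \to 0,\quad B \to 1 \,\}$

$L(G) = \{ 0^i 1^i \mid i \ge 0 \}$

type 3: $G = \langle V_T, V_N, P, S \rangle$ with $V_T = \{0, 1\}$ and $V_N = \{S, A\}$, where

$P = \{\, S \to 0S \mid A,\quad A \to 1A \mid \varepsilon \,\}$

$L(G) = \{ 0^i 1^j \mid i, j \ge 0 \}$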
Remarks on Grammars

The following remarks summarize subtle conceptual aspects of the formal grammars and their languages that we have defined in the class. Let $G = \langle V_T, V_N, P, S \rangle$ be a grammar.

The set of rules $P$ has no explicitly defined order that must be observed when a string is derived.

Recall that the language $L(G)$ is the set of terminal strings that can be generated by applying a finite sequence of production rules. However, it is not true that every sequence of production rules produces a terminal string. We may end up with a string containing a nonterminal symbol that can never derive a terminal (or null) string. For example, consider the grammar below, which is type 1. (For convenience, we show only the set of production rules, written according to the convention, because from them we can identify $V_T$, $V_N$ and the start symbol, which is $S$.)

    (1) $S \to ABC$
    (2) $AB \to ab$
    (3) $BC \to bc$
    (4) $bC \to bc$

Clearly, only rules (1), (2), (4), applied in this order, derive the terminal string $abc$, which is the only member of the language of the grammar. If you apply (1) followed by (3), you are stuck with $Abc$, which cannot be a member of the language because it contains the nonterminal symbol $A$.
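The claim about rules (1) to (4) can be checked by exhaustively following every possible rule application from $S$. The sketch below does this recursively and separates completed derivations from dead ends. It terminates only because this particular grammar admits finitely many derivations; the function names are illustrative.

```python
# Rules (1)-(4) from the slide, keyed by rule number.
RULES = {1: ("S", "ABC"), 2: ("AB", "ab"), 3: ("BC", "bc"), 4: ("bC", "bc")}


def successors(form):
    """Yield (rule number, new form) for every single rule application."""
    for num, (lhs, rhs) in RULES.items():
        pos = form.find(lhs)
        while pos != -1:
            yield num, form[:pos] + rhs + form[pos + len(lhs):]
            pos = form.find(lhs, pos + 1)


def explore(form="S", applied=()):
    """Return (terminal derivations, dead ends) as lists of (rule sequence, string)."""
    nexts = list(successors(form))
    if not nexts:
        # No rule applies: either a terminal string or a stuck sentential form.
        if any(ch.isupper() for ch in form):
            return [], [(applied, form)]
        return [(applied, form)], []
    done, stuck = [], []
    for num, nxt in nexts:
        d, s = explore(nxt, applied + (num,))
        done += d
        stuck += s
    return done, stuck


done, stuck = explore()
print(done)    # [((1, 2, 4), 'abc')]
print(stuck)   # [((1, 3), 'Abc')]
```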
Remarks on Grammars (cont'd)

Rule (3) of the grammar above is useless in the sense that it does not contribute to the generation of the language. We can delete the rule from the grammar without affecting the language of the grammar. In general, the decision problem of whether an arbitrary grammar has a useless rule is unsolvable. However, if we restrict the problem to the class of context-free grammars (type 2), we can effectively clean up such useless rules, if any. We will learn how to do this.

The grammars that we have defined in the class are sequential in the sense that only one rule is allowed to apply at a time. Notice that in the above grammar, if we apply rules $AB \to ab$ and $BC \to bc$ simultaneously to the string $ABC$, which is derived from $S$, we get the terminal string $abbc$, which is not a member of the language according to our definition.

There is a class of grammars in which more than one rule can be applied simultaneously. We call such rules parallel rewriting rules. In general, it is very difficult to study parallel rewriting grammars. However, the language of a context-free grammar does not depend on how you apply the rules: we get the same language independent of the mode of rule application, sequential or parallel. Why? The answer is left for the reader.
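For the small four-rule grammar, the uselessness of rule (3) can be confirmed by brute force: compute the language with all rules, then again with each rule removed in turn, and compare. This sketch works only because the example has finitely many sentential forms; it is not a general procedure, in line with the unsolvability remark above. The function name `language` is illustrative.

```python
def language(rules, start="S"):
    """All terminal strings derivable from `start`.

    Terminates only when the set of reachable sentential forms is finite.
    """
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        if not any(ch.isupper() for ch in form):
            words.add(form)
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
                pos = form.find(lhs, pos + 1)
    return words


# Rules (1)-(4) of the example grammar, in order.
RULES = [("S", "ABC"), ("AB", "ab"), ("BC", "bc"), ("bC", "bc")]

full = language(RULES)
# Rule numbers whose removal leaves the language unchanged.
removable = [i + 1 for i in range(len(RULES))
             if language(RULES[:i] + RULES[i + 1:]) == full]

print(full)        # {'abc'}
print(removable)   # [3]
```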
Remarks on Grammars (cont'd)

Context-free grammars are defined as type 0 (not type 1) grammars with the restriction $|\alpha| = 1$. It follows that a context-free grammar can have a contracting rule, like $A \to \varepsilon$ (where $\varepsilon$ denotes the null string), while type 1 grammars cannot have contracting rules except for $S \to \varepsilon$. Later we will see that every context-free grammar with $\varepsilon$-production rules can be converted to a grammar whose only $\varepsilon$-production is $S \to \varepsilon$, which is needed only if the grammar produces the null string.

By definition, a regular grammar cannot have rules of any of the following forms, where $A$, $B$, $C$ are arbitrary nonterminals and $a$, $b$ are terminals:

    $A \to bBC$
    $A \to abBa$
    $A \to Ba$

We can define the same class of regular languages using production rules restricted to the forms $A \to Bx$ or $A \to x$. Notice that the nonterminal symbol on the right side of such a production rule, if any, must be at the left end of the string. We call these rules left linear, and the rules defined in the class right linear. However, the definition does not allow a type 3 grammar to have both left linear and right linear forms (e.g., $S \to aB$, $B \to Sb \mid b$).
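A short derivation suggests why mixing the two forms is excluded. In the example grammar with both right linear and left linear rules, each round trip through $S$ and $B$ adds one $a$ on the left and one $b$ on the right:

```latex
\begin{align*}
S &\Rightarrow aB   && (S \to aB) \\
  &\Rightarrow aSb  && (B \to Sb) \\
  &\Rightarrow aaBb && (S \to aB) \\
  &\Rightarrow aabb && (B \to b)
\end{align*}
```

Continuing in the same way, the grammar generates $\{ a^n b^n \mid n \ge 1 \}$, which is a standard example of a language that is not regular. This observation is added here as an illustration and is not stated on the slide.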