About Grammars CS 130 Theory of Computation HMU Textbook: Sec 7.1, 6.3, 5.4
About grammars
  Simplifying grammars
  Normal forms for grammars
  Grammar ambiguity
Grammar productions
  The formal definition of a grammar provides much leeway.
  Productions can be simplified or restricted to make proofs about CFGs simpler.
Simplifications
  Removing useless symbols: those that cannot be derived from S and those that cannot derive a terminal string
  Removing ε-productions: A → ε
  Removing unit productions: A → B
  Normal forms, e.g., Chomsky Normal Form
Useless symbols
  We want every production in the grammar to be free of useless symbols, i.e., every symbol is both generating and reachable.
  Generating symbols: all variables that can eventually derive a string of terminals; i.e., all A in V such that there exists a string w of terminals with A ⇒* w.
  Reachable symbols: all variables that can be reached from the start symbol; i.e., all A in V such that S ⇒* uAw for some strings u and w.
Removing useless productions
  1. Remove productions with non-generating symbols. This requires identifying the generating symbols recursively: every terminal is generating, and a variable is generating if it is the head of a production whose right-hand side contains only terminals and generating symbols.
  2. Remove productions with non-reachable symbols. This requires identifying the reachable symbols recursively: S is reachable, and so is every symbol on the right-hand side of a production whose left-hand side is reachable.
  Apply the steps in this order: removing non-generating symbols can make further symbols unreachable.
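The two steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the representation of a grammar as a list of (head, body) pairs and the function names are assumptions made for the example.

```python
def generating_symbols(productions, terminals):
    """Terminals are generating; a variable is generating if some body
    of it consists only of generating symbols (an empty body counts)."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating


def reachable_symbols(productions, start):
    """The start symbol is reachable; so is every symbol in the body
    of a production whose head is reachable."""
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head in reachable:
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        changed = True
    return reachable


def remove_useless(productions, terminals, start):
    # Step 1: drop productions that mention a non-generating symbol.
    gen = generating_symbols(productions, terminals)
    kept = [(h, b) for h, b in productions
            if h in gen and all(s in gen for s in b)]
    # Step 2: drop productions whose head is not reachable.
    reach = reachable_symbols(kept, start)
    return [(h, b) for h, b in kept if h in reach]


# Example grammar (assumed for illustration): S -> AB | a, A -> b
grammar = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(remove_useless(grammar, {"a", "b"}, "S"))  # [('S', ('a',))]
```

In the example, B is not generating, so S → AB is dropped first; only then does A become unreachable, which is why the order of the two steps matters.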
Epsilon productions
  ε-productions: productions of the form A → ε
  Nullable symbols: symbols A where A → ε, or A → B1 B2 … Bn such that each Bi is nullable
  For each production that has a nullable symbol on the right-hand side, add a production without that symbol; apply the rule iteratively to the resulting productions.
  After this step, all ε-productions can be removed.
  Note: if the language L generated by the original grammar includes ε, then the language generated by the resulting grammar is L − {ε}.
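A possible Python sketch of this procedure follows. Instead of applying the rule iteratively, it enumerates every way of keeping or dropping each nullable symbol in a body, which yields the same set of productions. The (head, body) pair representation and the function names are assumptions of the example.

```python
from itertools import product


def nullable_symbols(productions):
    """A is nullable if A -> ε, or A -> B1...Bn with every Bi nullable."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(productions):
    nullable = nullable_symbols(productions)
    result = set()
    for head, body in productions:
        # A nullable symbol may be kept or dropped; any other symbol is kept.
        choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for pick in product(*choices):
            new_body = tuple(sym for part in pick for sym in part)
            if new_body:  # never emit an ε-production
                result.add((head, new_body))
    return result


# Example grammar (assumed): S -> AB, A -> aAA | ε, B -> bBB | ε
grammar = [
    ("S", ("A", "B")),
    ("A", ("a", "A", "A")), ("A", ()),
    ("B", ("b", "B", "B")), ("B", ()),
]
for head, body in sorted(remove_epsilon(grammar)):
    print(head, "->", " ".join(body))
```

For the example grammar the result is S → AB | A | B, A → aAA | aA | a, B → bBB | bB | b. The original language contains ε (S is nullable); the new one does not.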
Unit productions
  Unit productions: all productions of the form A → B, where B is a variable
  Removing unit productions:
    Identify unit pairs: pairs of variables (A, B) such that A ⇒* B and the derivation involves only unit productions
    For each unit pair (A, B), add the production A → w whenever B → w and w is not a single variable
    The unit productions may now be removed
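The unit-pair construction can be sketched as follows. One assumption beyond the slide: the sketch treats (A, A) as a unit pair for every variable A (a derivation of zero steps), so that the existing non-unit productions are carried over without a separate step. The data representation and names are likewise illustrative.

```python
def unit_pairs(productions, variables):
    """All (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in variables}  # zero-step derivations
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                if head == b and len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def remove_unit(productions, variables):
    result = set()
    for a, b in unit_pairs(productions, variables):
        for head, body in productions:
            is_unit = len(body) == 1 and body[0] in variables
            if head == b and not is_unit:
                result.add((a, body))  # A -> w whenever B -> w is non-unit
    return result


# Example grammar (assumed): E -> E + T | T, T -> T * F | F, F -> n | i
variables = {"E", "T", "F"}
grammar = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("n",)), ("F", ("i",)),
]
for head, body in sorted(remove_unit(grammar, variables)):
    print(head, "->", " ".join(body))
```

Here the unit pairs beyond the trivial ones are (E, T), (T, F), and (E, F), so E inherits the bodies T * F, n, and i.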
Chomsky Normal Form
  CNF: all productions are of the form
    A → BC (B, C are variables)
    A → a (a is a terminal)
  How do we convert a grammar to an equivalent CNF grammar?
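One common way to answer this question, sketched here under stated assumptions: ε-productions and unit productions have already been removed, so every body of length 1 is a single terminal. The sketch then (1) replaces terminals inside longer bodies with fresh variables and (2) splits bodies longer than two symbols into a chain. The fresh names `T_a` and `X1`, `X2`, … are hypothetical and assumed not to clash with existing variables.

```python
def to_cnf(productions, terminals):
    result = []
    term_var = {}  # terminal -> fresh variable standing for it
    counter = 0    # numbering for fresh chain variables

    def var_for(t):
        if t not in term_var:
            term_var[t] = f"T_{t}"
            result.append((term_var[t], (t,)))
        return term_var[t]

    for head, body in productions:
        if len(body) == 1:
            result.append((head, body))  # A -> a is already in CNF
            continue
        # Step 1: bodies of length >= 2 may contain variables only.
        body = [var_for(s) if s in terminals else s for s in body]
        # Step 2: A -> B1 B2 ... Bn becomes A -> B1 X1, X1 -> B2 X2, ...
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, tuple(body)))
    return result


def is_cnf(productions, terminals):
    for _, body in productions:
        if len(body) == 1 and body[0] in terminals:
            continue
        if len(body) == 2 and not any(s in terminals for s in body):
            continue
        return False
    return True


terminals = {"+", "n"}
grammar = [("E", ("E", "+", "T")), ("E", ("n",)), ("T", ("n",))]
cnf = to_cnf(grammar, terminals)
print(cnf)
print(is_cnf(grammar, terminals), is_cnf(cnf, terminals))  # False True
```

In the example, E → E + T becomes E → E X1, X1 → T_+ T, with T_+ → + added for the terminal.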
Greibach Normal Form
  GNF: all productions are of the form A → a B1 B2 … Bn (a is a terminal; B1, …, Bn are variables)
  Note that A → a is allowed (the case n = 0)
  Note that if the grammar is in GNF, each step in a derivation of a string adds exactly one terminal
  How do we convert a grammar to an equivalent GNF grammar?
Recall: CFG to PDA conversion
  The transition function is based on the variables, productions, and terminals of the grammar:
    δ(q0, ε, A) includes (q0, w) whenever A → w
    δ(q0, a, a) = {(q0, ε)} for each a in T
  Easier and more intuitive if the grammar is in GNF:
    δ(q0, a, A) includes (q0, B1 B2 … Bn) for each production A → a B1 B2 … Bn
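To illustrate why GNF makes the construction intuitive, here is a small sketch that simulates the one-state PDA directly from a GNF grammar, accepting by empty stack. It tracks the set of all stacks the nondeterministic PDA could have after each input symbol. The function name and the grammar encoding are assumptions for this example, and every body is assumed to start with a terminal.

```python
def gnf_pda_accepts(productions, start, word):
    # Each configuration is a stack, stored as a tuple with the top first.
    stacks = {(start,)}
    for a in word:
        next_stacks = set()
        for stack in stacks:
            if not stack:
                continue  # empty stack: no move can consume more input
            top, rest = stack[0], stack[1:]
            for head, body in productions:
                # Move (q0, a, A) -> (q0, B1...Bn) for A -> a B1...Bn
                if head == top and body[0] == a:
                    next_stacks.add(body[1:] + rest)
        stacks = next_stacks
    return () in stacks  # accept if some run has emptied the stack


# Example GNF grammar (assumed): S -> aSB | aB, B -> b
grammar = [("S", ("a", "S", "B")), ("S", ("a", "B")), ("B", ("b",))]
print(gnf_pda_accepts(grammar, "S", "aabb"))  # True
print(gnf_pda_accepts(grammar, "S", "aab"))   # False
```

Because every move consumes exactly one input symbol, the simulation takes exactly one round per character of the input, with no ε-moves to handle.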
Ambiguous grammar
  A grammar G is ambiguous if there exists a string for which two different parse trees exist (equivalently, two different leftmost derivations).
  Example:
    S → i = E
    E → n
    E → i
    E → E + E
    E → E * E
  Parse tree for i = n + n * n ?
Two leftmost derivations
  S ⇒ i = E ⇒ i = E + E ⇒ i = n + E ⇒ i = n + E * E ⇒ i = n + n * E ⇒ i = n + n * n
  S ⇒ i = E ⇒ i = E * E ⇒ i = E + E * E ⇒ i = n + E * E ⇒ i = n + n * E ⇒ i = n + n * n
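The ambiguity can also be checked mechanically. The sketch below counts the parse trees of E for a token list under the grammar E → n | i | E + E | E * E by trying every operator as the root of the tree. It is written specifically for this grammar, and the function name is an assumption of the example.

```python
from functools import lru_cache


def count_parses(tokens):
    """Number of parse trees deriving `tokens` from E, where
    E -> n | i | E + E | E * E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Parse trees of E for the slice tokens[lo:hi]
        if hi - lo == 1:
            return 1 if tokens[lo] in ("n", "i") else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parses("n + n * n".split()))  # 2
print(count_parses("n + n".split()))      # 1
```

A count above 1 for any string shows the grammar is ambiguous; the two trees for n + n * n correspond to the two leftmost derivations of i = n + n * n.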
Grammar and precedence
  S → i = E
  E → E + T
  E → T
  T → T * F
  T → F
  F → n
  F → i
  Parse tree for i = n + n * n ?
  S ⇒ i = E ⇒ i = E + T ⇒ i = T + T ⇒ i = F + T ⇒ i = n + T ⇒ i = n + T * F ⇒ i = n + F * F ⇒ i = n + n * F ⇒ i = n + n * n
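To see how the layered grammar fixes precedence, the following sketch parses the expression part (the E nonterminal) and returns a nested tuple. It is a hand-written recursive-descent parser, and it replaces the left-recursive rules E → E + T and T → T * F with loops, which is a standard substitution rather than a literal transcription of the grammar. The result is a simplified operator tree, not the full parse tree.

```python
def parse_expression(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_f():  # F -> n | i
        nonlocal pos
        tok = peek()
        if tok not in ("n", "i"):
            raise SyntaxError(f"expected n or i, got {tok!r}")
        pos += 1
        return tok

    def parse_t():  # T -> T * F | F, written as F (* F)*
        nonlocal pos
        tree = parse_f()
        while peek() == "*":
            pos += 1
            tree = ("*", tree, parse_f())
        return tree

    def parse_e():  # E -> E + T | T, written as T (+ T)*
        nonlocal pos
        tree = parse_t()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, parse_t())
        return tree

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression("n + n * n".split()))
# ('+', 'n', ('*', 'n', 'n'))
```

The only tree produced groups n * n below the +, matching the single leftmost derivation above: multiplication binds tighter because * is introduced one level deeper (at T) than + (at E).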
Chomsky hierarchy
  Relaxing or adding restrictions on the productions of a grammar leads to a hierarchy of languages.
  Note: the definition of a context-free grammar requires every production to take the form A → w, where A ∈ V and w is a string over T ∪ V.
Chomsky hierarchy
  Regular languages (type 3): A → sB, A → s (A, B ∈ V, s ∈ T)
  Context-free languages (type 2): A → w (A ∈ V, w is a string over T ∪ V)
  Context-sensitive languages (type 1): uAw → uvw (A ∈ V; u, v, w are strings over T ∪ V, with v nonempty)
  Recursively enumerable languages (type 0): v → w (productions are unrestricted)
Chomsky hierarchy (containment diagram)
  regular (type 3) ⊂ context-free (type 2) ⊂ context-sensitive (type 1) ⊂ recursive ⊂ recursively enumerable (type 0)