ELIMINATING LEFT RECURSIVENESS
Abbreviation. “cfg” stands for “context-free grammar”.
Definition. A cfg is left recursive if it contains a production of the form A → A α, i.e. a production in which the lhs occurs as the head of the rhs.
Theorem 1. Eliminating left recursiveness. For any cfg G we can construct an equivalent cfg G’ such that G’ is not left recursive.
PROOF. For each nonterminal A that occurs as the lhs of a left-recursive production of G, do the following:
Let the left-recursive productions in which A occurs as lhs be
    A → A α_1, ..., A → A α_r
and the remaining productions in which A occurs as lhs be
    A → β_1, ..., A → β_s
Let K_A denote a symbol which does not already occur in the grammar. Replace the above productions by:
    A → β_1 K_A | ... | β_s K_A
    K_A → ε | α_1 K_A | ... | α_r K_A
Clearly the grammar G' produced is equivalent to G.
EXAMPLE.
    S → R a | A a | a
    R → a b
    A → A R | A T | b
    T → T b | a
A non-left-recursive grammar equivalent to the above is:
    S → R a | A a | a
    R → a b
    A → b K_A
    K_A → ε | R K_A | T K_A
    T → a K_T
    K_T → ε | b K_T
Clearly the grammar G’ obtained is equivalent to G, and has the required two properties.
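The construction in the proof of Theorem 1 can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the representation (a dict from nonterminal to a list of right-hand sides, with the empty list standing for ε), the function name `eliminate_direct` and the `K_` naming scheme are all assumptions made for illustration. A production A → A, which the proof does not single out, is simply dropped here because it contributes nothing to the language.

```python
from typing import Dict, List

# Assumed representation: nonterminal -> list of right-hand sides.
# Each right-hand side is a list of symbols; [] stands for ε.
Grammar = Dict[str, List[List[str]]]


def eliminate_direct(grammar: Grammar, a: str) -> Grammar:
    """Return a copy of grammar with the productions a -> a alpha removed."""
    # alpha_1 .. alpha_r: tails of the left-recursive productions.
    # A -> A (empty alpha) is dropped, an assumption of this sketch.
    alphas = [rhs[1:] for rhs in grammar[a] if len(rhs) > 1 and rhs[0] == a]
    # beta_1 .. beta_s: the remaining right-hand sides of a.
    betas = [rhs for rhs in grammar[a] if not rhs or rhs[0] != a]

    result = {lhs: [list(rhs) for rhs in rhss] for lhs, rhss in grammar.items()}
    if not alphas:
        result[a] = betas
        return result

    # K_a must be a symbol that does not already occur in the grammar.
    used = set(grammar) | {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    k = "K_" + a
    while k in used:
        k += "'"

    result[a] = [beta + [k] for beta in betas]               # a   -> beta_i K_a
    result[k] = [[]] + [alpha + [k] for alpha in alphas]     # K_a -> ε | alpha_i K_a
    return result


# The example grammar from the text.
example = {
    "S": [["R", "a"], ["A", "a"], ["a"]],
    "R": [["a", "b"]],
    "A": [["A", "R"], ["A", "T"], ["b"]],
    "T": [["T", "b"], ["a"]],
}
converted = eliminate_direct(eliminate_direct(example, "A"), "T")
```

Applied to the example grammar, `converted` contains A → b K_A, K_A → ε | R K_A | T K_A, T → a K_T and K_T → ε | b K_T, with the productions for S and R unchanged.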
Definition. A grammar is indirectly left recursive if, for some production with left-hand side (say) A, it is possible to derive a string whose head is A.
Example.
    A → D b
    D → E c c
    E → A t
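The definition can be checked mechanically by following the heads of right-hand sides. The sketch below is illustrative only: the function name `left_recursive_nonterminals` and the dict representation are assumptions, and it looks only at the first symbol of each rhs, which matches the definition exactly only when no nonterminal derives ε.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> list of right-hand sides


def left_recursive_nonterminals(grammar: Grammar) -> Set[str]:
    """Nonterminals A from which a string with head A can be derived.

    Assumption: no nonterminal derives ε, so only the first symbol of
    each right-hand side can become the head of a derived string.
    """
    # heads[A] = nonterminals that occur as the head of some rhs of A.
    heads = {
        a: {rhs[0] for rhs in rhss if rhs and rhs[0] in grammar}
        for a, rhss in grammar.items()
    }
    found = set()
    for a in grammar:
        # Depth-first search over the "is the head of a rhs of" relation.
        seen, stack = set(), list(heads[a])
        while stack:
            b = stack.pop()
            if b not in seen:
                seen.add(b)
                stack.extend(heads[b])
        if a in seen:
            found.add(a)
    return found


# The example from the text: A => D b => E c c b => A t c c b.
indirect = {
    "A": [["D", "b"]],
    "D": [["E", "c", "c"]],
    "E": [["A", "t"]],
}
recursive = left_recursive_nonterminals(indirect)
```

For the example grammar every one of A, D and E can reach itself through heads, so all three are reported, although no single production is left recursive.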
Theorem 2. Eliminating indirect left recursiveness. For any cfg G we can construct an equivalent cfg G’ such that G’ is not indirectly left recursive.
Note. Theorem 1 is a special case of Theorem 2. Theorem 2 follows directly from the following lemma.
LEMMA. For any cfg G, and any ordering of the nonterminals of G, denoted here as A_1, ..., A_m, we can construct an equivalent cfg G’ such that: if A_i → A_j α is any production of G’, then i < j.
PROOF. The lemma is true for i = 1, since we can eliminate left recursion in productions (if any) in which A_1 occurs as lhs. Assume that the lemma is true for all i ≤ t where t < m, and let the grammar so formed from G be denoted as G_t.
Consider any production of G_t with A_{t+1} as lhs and A_j as the head of its rhs, where j < t+1, e.g. A_{t+1} → A_j γ. By the inductive assumption, all productions of the form A_j → δ have as the head of δ either a terminal or A_k for some k > j.
So if we substitute for A_j in A_{t+1} → A_j γ all the rhs’s with A_j as the lhs, then we will get productions of the form:
    A_{t+1} → a rhs with a terminal as head, or
    A_{t+1} → a rhs with A_k as its head, where k > j
By iterating the above process, we will end up with a grammar G_t’ in which all productions with A_{t+1} as lhs have as the head of the rhs either a terminal or a nonterminal A_k for some k ≥ t+1.
Productions of this kind in which k = t+1 are left recursive, and can be eliminated (as in Theorem 1) to produce a grammar which we will call G_{t+1}. The lemma follows by induction.
EXAMPLE GRAMMARS
G_1:
    S → A A | 0
    A → S S | 1
G_2:
    A_1 → A_2 A_3
    A_2 → A_3 A_1 | b
    A_3 → A_1 A_1 | a
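The procedure described in the proof of the lemma can be tried on these two grammars with a short Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the dict representation, the names `eliminate_direct` and `eliminate_left_recursion`, and the `K_` naming scheme are hypothetical, and the input grammar is assumed to have no ε-productions and no productions of the form A → A.

```python
from typing import Dict, List

# Assumed representation: nonterminal -> list of right-hand sides; [] is ε.
Grammar = Dict[str, List[List[str]]]


def eliminate_direct(g: Grammar, a: str) -> None:
    """Theorem 1 construction for nonterminal a, applied in place."""
    alphas = [rhs[1:] for rhs in g[a] if len(rhs) > 1 and rhs[0] == a]
    betas = [rhs for rhs in g[a] if not rhs or rhs[0] != a]
    if not alphas:
        return
    # Pick a symbol K_a that does not already occur in the grammar.
    used = set(g) | {s for rhss in g.values() for rhs in rhss for s in rhs}
    k = "K_" + a
    while k in used:
        k += "'"
    g[a] = [beta + [k] for beta in betas]              # a   -> beta_i K_a
    g[k] = [[]] + [alpha + [k] for alpha in alphas]    # K_a -> ε | alpha_i K_a


def eliminate_left_recursion(grammar: Grammar, order: List[str]) -> Grammar:
    """Process the nonterminals in the given order A_1, ..., A_m."""
    g = {a: [list(rhs) for rhs in rhss] for a, rhss in grammar.items()}
    for i, a_i in enumerate(order):
        # Substitute for every earlier A_j that occurs as the head of a rhs.
        for a_j in order[:i]:
            expanded = []
            for rhs in g[a_i]:
                if rhs and rhs[0] == a_j:
                    expanded.extend(delta + rhs[1:] for delta in g[a_j])
                else:
                    expanded.append(rhs)
            g[a_i] = expanded
        # Heads are now terminals or A_k with k >= i; remove the case k == i.
        eliminate_direct(g, a_i)
    return g


G1 = {"S": [["A", "A"], ["0"]], "A": [["S", "S"], ["1"]]}
G2 = {
    "A1": [["A2", "A3"]],
    "A2": [["A3", "A1"], ["b"]],
    "A3": [["A1", "A1"], ["a"]],
}
R1 = eliminate_left_recursion(G1, ["S", "A"])
R2 = eliminate_left_recursion(G2, ["A1", "A2", "A3"])
```

With the ordering S, A, the production A → S S of G_1 becomes A → A A S | 0 S, and removing the left recursion gives A → 0 S K_A | 1 K_A and K_A → ε | A S K_A. With the ordering A_1, A_2, A_3, the production A_3 → A_1 A_1 of G_2 becomes A_3 → A_3 A_1 A_3 A_1 | b A_3 A_1 after two substitutions, which yields A_3 → b A_3 A_1 K_A3 | a K_A3 and K_A3 → ε | A_1 A_3 A_1 K_A3. A different ordering of the nonterminals generally produces a different, but equivalent, grammar.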