Lecture 17 Naveen Z Quazilbash Simplification of Grammars
Overview Attendance Motivation Simplification of Grammars Eliminating useless variables Eliminating null productions Eliminating unit productions Quiz result
Motivation for grammar simplification Parsing problem: given a CFG G and a string w, determine whether w ∈ L(G). This is a fundamental problem in compiler design and natural language processing. If G is in general form, the procedure may be very inefficient, so the grammar is “transformed” into a simpler form to make the parsing problem easier.
Simplification of Grammars It involves the removal of: 1. Useless variables 2. ε-productions 3. Unit productions
Useless variables: There are two types of useless variables: 1. Variables that cannot be reached 2. Variables that do not derive any strings
3. ε-productions E.g.: A → ε Note that if we remove these productions, the language no longer includes the empty string (if it did originally).
4. Unit productions: They are of the form A → B or A → A
1) Unreachable Variables E.g.:
S → BS | B | E
A → DA | D | S
B → CB | C
C → aC | a
D → bD | b
E → cE | c
To find unreachable variables, draw a dependency graph. Dependency Graph: The vertices of the graph are variables. The graph doesn’t include alphabet symbols, such as “a” or “b”. If there is a production A → …B…, i.e., the left side is A and the right side includes B, then there is an edge A → B.
A variable is reachable if there is a path from S to this variable. S itself is always reachable. After identifying unreachable variables, remove all productions with an unreachable left side.
S → BS | B | E
A → DA | D | S
B → CB | C
C → aC | a
D → bD | b
E → cE | c
Drawing its dependency graph (edges, self-loops omitted): S → B, S → E, A → D, A → S, B → C
Reachable: S, B, C, E
Grammar without unreachable variables:
S → BS | B | E
B → CB | C
C → aC | a
E → cE | c
Ex: Determine its language.
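The reachability check above can be sketched as a breadth-first search over the dependency graph. This is only an illustrative sketch: the dictionary representation, the convention that single uppercase letters are variables, and the function names `reachable_variables` and `remove_unreachable` are assumptions made for this example, not part of the lecture.

```python
from collections import deque

def reachable_variables(grammar, start="S"):
    """Return the set of variables reachable from `start`.

    Assumed representation: `grammar` maps each variable (a single
    uppercase letter) to a list of right-hand sides given as strings.
    """
    seen = {start}                      # the start variable is always reachable
    queue = deque([start])
    while queue:
        var = queue.popleft()
        for rhs in grammar.get(var, []):
            for symbol in rhs:
                # Only variables become vertices; terminals are ignored.
                if symbol.isupper() and symbol not in seen:
                    seen.add(symbol)
                    queue.append(symbol)
    return seen

def remove_unreachable(grammar, start="S"):
    """Drop every production whose left side is unreachable."""
    keep = reachable_variables(grammar, start)
    return {var: rhss for var, rhss in grammar.items() if var in keep}

grammar = {
    "S": ["BS", "B", "E"],
    "A": ["DA", "D", "S"],
    "B": ["CB", "C"],
    "C": ["aC", "a"],
    "D": ["bD", "b"],
    "E": ["cE", "c"],
}
print(sorted(reachable_variables(grammar)))  # ['B', 'C', 'E', 'S']
```

On the example grammar, `remove_unreachable(grammar)` keeps only the productions for S, B, C and E.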
2) Variables that don’t terminate A variable A terminates if either: there is a production A → … with no variables on the right, e.g. A → aabc, OR there is a production A → … where all variables on the right terminate, e.g. A → aBbaC, where B and C terminate. Note: to find all variables that terminate, keep looking for such productions until you cannot find any new ones.
TASK Example:
S → A | BC | DE
A → aA | bA
B → bB | b
C → EF
D → dD | BD | BA
E → aE | a
F → cFc | c
Remove all productions that include a variable that doesn’t terminate. Note: We remove a production if it has such a variable on either side.
Solution (✓ = terminates, ✗ = does not terminate):
✓ S → A | BC | DE
✗ A → aA | bA
✓ B → bB | b
✓ C → EF
✗ D → dD | BD | BA
✓ E → aE | a
✓ F → cFc | c
S → BC
B → bB | b
C → EF
E → aE | a
F → cFc | c
Ex: Determine its language.
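The "keep looking until nothing new is found" rule is a fixed-point iteration. A minimal sketch follows, assuming the same hypothetical representation as a dictionary of strings with single uppercase letters as variables; the function names are illustrative only.

```python
def terminating_variables(grammar):
    """Return the set of variables that can derive a string of terminals."""
    terminating = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for var, rhss in grammar.items():
            if var in terminating:
                continue
            for rhs in rhss:
                # A right side qualifies if every variable in it already terminates
                # (a right side with no variables qualifies trivially).
                if all(not s.isupper() or s in terminating for s in rhs):
                    terminating.add(var)
                    changed = True
                    break
    return terminating

def remove_nonterminating(grammar):
    """Drop productions that mention a non-terminating variable on either side."""
    ok = terminating_variables(grammar)
    return {
        var: [rhs for rhs in rhss
              if all(not s.isupper() or s in ok for s in rhs)]
        for var, rhss in grammar.items()
        if var in ok
    }

grammar = {
    "S": ["A", "BC", "DE"],
    "A": ["aA", "bA"],
    "B": ["bB", "b"],
    "C": ["EF"],
    "D": ["dD", "BD", "BA"],
    "E": ["aE", "a"],
    "F": ["cFc", "c"],
}
print(sorted(terminating_variables(grammar)))  # ['B', 'C', 'E', 'F', 'S']
```

For the task grammar, `remove_nonterminating(grammar)` leaves S with the single alternative BC and removes A and D entirely.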
3) Eliminating ε-Productions Nullable variables: A variable is nullable if either: there is a production A → ε, or there is a production A → B1 B2 … Bn (only variables, no terminal symbols), where all variables on the right side are nullable. Note: to find all nullable variables, keep looking for such productions until you cannot find any new ones.
TASK
S → SAB | SBC | BC
A → aA | a
B → bB | bC | C
C → cC | ε
First we find variables that can lead to the empty string:
C => ε
B => C => ε
S => BC => B => C => ε
Marking the nullable variables with ✓:
✓ S → SAB | SBC | BC
A → aA | a
✓ B → bB | bC | C
✓ C → cC | ε
Thus, S, B, and C can lead to ε; they are called nullable variables.
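Finding the nullable variables follows the same fixed-point pattern. The sketch below assumes a dictionary-of-strings representation in which the empty string `""` stands for ε and single uppercase letters are variables; both conventions and the name `nullable_variables` are assumptions for illustration.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string.

    Assumed representation: right-hand sides are strings, "" stands for ε.
    """
    nullable = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for var, rhss in grammar.items():
            if var in nullable:
                continue
            # A right side qualifies if it is ε (empty) or consists only of
            # nullable variables; any terminal makes the test fail.
            if any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(var)
                changed = True
    return nullable

grammar = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],
}
print(sorted(nullable_variables(grammar)))  # ['B', 'C', 'S']
```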
For each production that has nullable variables, consider all possible ways to skip some of these variables and add the corresponding productions. E.g. W → aWXaYZb; suppose that X, Y and Z are nullable; then there are 2^3 = 8 combinations of keeping or skipping them (including skipping none):
W → aWab | aWXab | aWaYb | aWaZb | aWXaYb | aWXaZb | aWaYZb | aWXaYZb
Back to our grammar where S, B and C are nullable:
S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
A → aA | a
B → b | bB | bC | C
C → c | cC | ε
Now, we can remove the ε-productions without changing the language. The only possible change is losing the empty string, if it is in the original language.
So our grammar without null productions becomes:
S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
A → aA | a
B → b | bB | bC | C
C → c | cC
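The "skip some nullable variables" step can be sketched with `itertools.product`, choosing for each nullable symbol whether to keep it or drop it. The representation (strings for right sides, `""` for ε), the hard-coded nullable set taken from the slides, and the names `expand_nullable` and `eliminate_epsilon` are assumptions for this example.

```python
from itertools import product

def expand_nullable(rhs, nullable):
    """Return every variant of `rhs` obtained by skipping nullable variables."""
    # Each nullable symbol offers two choices (keep it, or drop it);
    # every other symbol offers only one.
    options = [(s, "") if s in nullable else (s,) for s in rhs]
    return {"".join(choice) for choice in product(*options)}

def eliminate_epsilon(grammar, nullable):
    """Add the skipped variants, then drop all ε-productions."""
    result = {}
    for var, rhss in grammar.items():
        variants = set()
        for rhs in rhss:
            variants |= expand_nullable(rhs, nullable)
        variants.discard("")            # "" stands for ε in this sketch
        result[var] = sorted(variants)
    return result

grammar = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],
}
nullable = {"S", "B", "C"}              # as determined on the slides
for var, rhss in eliminate_epsilon(grammar, nullable).items():
    print(var, "->", " | ".join(rhss))
```

For S this produces eleven alternatives, including SC, which comes from SBC with B skipped. The trivial production S → S also appears; it is a unit production and is dealt with in the next step.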
4) Eliminating Unit Productions
S → Aa | B
A → a | bc | B
B → A | bb | C | cC
C → a | C
First, for every variable, we find all single variables that can be reached from it:
For S: S => B => A, S => B => C
For A: A => B => C
For B: B => A, B => C
For C: NONE (C itself doesn’t count)
For finding reachable single variables, what should we do?
Use Dependency Graph! Drawing Dependency Graph: Vertices of the graph are variables. If there is a unit production A → B, then there is an edge A → B. A single variable B is reachable from A if there is a path from A to B.
Dependency Graph (unit productions only): vertices S, A, B, C; edges S → B, A → B, B → A, B → C, C → C
To construct an equivalent grammar without unit productions: Remove all unit productions. For each pair A =>* B, where B is a single variable reachable from A, consider all non-unit productions B → p1 | p2 | … | pn, and add the corresponding productions A → p1 | p2 | … | pn. For example, since A =>* B and B → bb | cC, add the productions A → bb | cC.
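Original grammar:
S → Aa | B
A → a | bc | B
B → A | bb | C | cC
C → a | C
Old non-unit productions:
S → Aa
A → a | bc
B → bb | cC
C → a
New productions (duplicates dropped):
S → bb | cC | a | bc
A → bb | cC
B → a | bc
Note that the variable B has become useless (it is no longer reachable from S), and we need to remove it!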
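The construction can be sketched in two parts: a search along unit-production edges, followed by copying the non-unit productions of every variable found. As before, the dictionary representation, the single-uppercase-letter convention for variables, and the names `unit_reachable` and `eliminate_unit` are assumptions made for illustration.

```python
def is_unit(rhs):
    """A right side is a unit production if it is exactly one variable."""
    return len(rhs) == 1 and rhs.isupper()

def unit_reachable(grammar, var):
    """Variables reachable from `var` through unit productions (var excluded)."""
    seen = set()
    stack = [var]
    while stack:
        current = stack.pop()
        for rhs in grammar.get(current, []):
            if is_unit(rhs) and rhs != var and rhs not in seen:
                seen.add(rhs)
                stack.append(rhs)
    return seen

def eliminate_unit(grammar):
    """Replace unit productions by the non-unit productions they lead to."""
    result = {}
    for var, rhss in grammar.items():
        new_rhss = [r for r in rhss if not is_unit(r)]   # old non-unit productions
        for other in sorted(unit_reachable(grammar, var)):
            for r in grammar[other]:
                if not is_unit(r) and r not in new_rhss:
                    new_rhss.append(r)                   # new productions
        result[var] = new_rhss
    return result

grammar = {
    "S": ["Aa", "B"],
    "A": ["a", "bc", "B"],
    "B": ["A", "bb", "C", "cC"],
    "C": ["a", "C"],
}
for var, rhss in eliminate_unit(grammar).items():
    print(var, "->", " | ".join(rhss))
```

In the result, no right-hand side mentions B any more, which is why a final pass to remove useless variables is needed.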
Summary Main steps of simplifying a grammar: 1. Remove useless variables, which cannot be reached or do not terminate. 2. Remove ε-productions. 3. Remove unit productions. 4. Remove useless variables again!