Simplifications of Context-Free Grammars
Assumption The language discussed here does not contain the empty string. Let L be any context-free language, let G = (V, T, S, P) be a context-free grammar for L – { }. Then the grammar obtain by adding to V the new variable S0, Making the start variable, and adding to P the productions: generate L.
A Substitution Rule Equivalent grammar Substitute
A Substitution Rule Substitute Equivalent grammar
In general: Substitute equivalent grammar
Nullable Variables Nullable Variable:
Removing Nullable Variables Example Grammar: Nullable variable
Why? Language generated by the grammar? (ab*)(ab*)* How many derivation steps for the string aaaa?
Language generated by the grammar? (ab*)(ab*)* How many derivation steps for the string aaaa?
Final Grammar Substitute
Algorithm to construct the set of nullable variables Input: context-free grammar G = (V, T, S, P) 1. 2. Repeat 2.1. PREV := NULL 2.2 for each variable do if there is an A rule and , then until NULL = PREV
Example Iteration NULL PREV {B, C} 1 {A, B, C} 2
Once the set of nullable variables has been found, a new set of production rules P’ is constructed. Construction: For each production in the grammar G - the production is put into P’ - as well as all those generated by replacing nullable variables with in all possible combinations - if all are nullable, the production is not added
Example Nullable variables: A, B, C New Productions:
Unit-Productions Unit Production: (a single variable in both sides)
Removing Unit Productions Observation: Is removed immediately
How about:
Example Grammar:
Substitute
Remove
Substitute
Remove repeated productions Final grammar
Algorithm to remove unit productions Draw a dependency graph with an edge (A, B) whenever the grammar has a unit production The new grammar G’ is generated by first putting into P’ all non-unit productions of P. Then for all A and B satisfying , we add to P’ Where is the set of all rules in P’ with B on the left
Another Example Add original non-unit productions: Use A->a|bc to substitute: Use B->bb to substitute:
Useless Productions Useless Production Some derivations never terminate...
Another grammar: Useless Production Not reachable from S
then variable is useful In general: contains only terminals if then variable is useful otherwise, variable is useless
A production is useless if any of its variables is useless Productions useless Variables useless
Removing Useless Productions Example Grammar:
First: find all variables that can produce strings with only terminals Round 1: Round 2:
Keep only the variables that produce terminal symbols: (the rest variables are useless) Remove useless productions
Second: Find all variables reachable from Use a Dependency Graph not
Keep only the variables reachable from S (the rest variables are useless) Final Grammar Remove useless productions
Removing All Step 1: Remove Nullable Variables Step 2: Remove Unit-Productions Step 3: Remove Useless Variables
Normal Forms for Context-free Grammars
Chomsky Normal Form Each productions has form: or variable variable terminal
Examples: Chomsky Normal Form Not Chomsky Normal Form
Conversion to Chomsky Normal Form Example: Not Chomsky Normal Form
Introduce variables for terminals:
Introduce intermediate variable:
Introduce intermediate variable:
Final grammar in Chomsky Normal Form: Initial grammar
In general: From any context-free grammar (which doesn’t produce ) not in Chomsky Normal Form we can obtain: An equivalent grammar in Chomsky Normal Form
The Procedure First remove: Nullable variables Unit productions Useless productions
Then, for every symbol : Add production In productions: replace with New variable:
Replace any production with New intermediate variables:
Theorem: For any context-free grammar (which doesn’t produce ) there is an equivalent grammar in Chomsky Normal Form
Observations Chomsky normal forms are good for parsing and proving theorems It is very easy to find the Chomsky normal form for any context-free grammar
Greinbach Normal Form All productions have form: symbol variables
Examples: Greinbach Normal Form Not Greinbach Normal Form
Conversion to Greinbach Normal Form:
Theorem: For any context-free grammar (which doesn’t produce ) there is an equivalent grammar in Greinbach Normal Form
Observations Greinbach normal forms are very good for parsing It is hard to find the Greinbach normal form of any context-free grammar
Compilers
Add v,v,0 cmp v,5 jmplt ELSE THEN: add x, 12,v ELSE: WHILE: cmp x,3 ... Machine Code Program v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } ...... Compiler
Compiler Lexical analyzer parser input output machine code program
A parser knows the grammar of the programming language
Parser PROGRAM STMT_LIST STMT_LIST STMT; STMT_LIST | STMT; STMT EXPR | IF_STMT | WHILE_STMT | { STMT_LIST } EXPR EXPR + EXPR | EXPR - EXPR | ID IF_STMT if (EXPR) then STMT | if (EXPR) then STMT else STMT WHILE_STMT while (EXPR) do STMT
The parser finds the derivation of a particular input derivation Parser input E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5 E -> E + E | E * E | INT 10 + 2 * 5
derivation tree derivation E E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5 E + E 10 E E * 2 5
derivation tree E machine code E + E mult a, 2, 5 add b, 10, a 10 E E * 2 5
Parsing
Parser input string derivation grammar
Example: Parser derivation input ?
Exhaustive Search Phase 1: Find derivation of All possible derivations of length 1
Phase 2 Phase 1
Phase 2 Phase 3
Final result of exhaustive search (top-down parsing) Parser input derivation
Time complexity of exhaustive search Suppose there are no productions of the form Number of phases for string :
For grammar with rules Time for phase 1: possible derivations
Time for phase 2: possible derivations
Time for phase : possible derivations
Extremely bad!!! Total time needed for string : phase 1 phase 2|w|
There exist faster algorithms for specialized grammars S-grammar: symbol string of variables Pair appears once
S-grammar example: Each string has a unique derivation
For S-grammars: In the exhaustive search parsing there is only one choice in each phase Time for a phase: Total time for parsing string :
For general context-free grammars: There exists a parsing algorithm that parses a string in time (we will show it in the next class)
The CYK Parser
The CYK Membership Algorithm Input: Grammar in Chomsky Normal Form String Output: find if
The Algorithm Input example: Grammar : String :
Therefore: Time Complexity: Observation: The CYK algorithm can be easily converted to a parser (bottom up parser)