7. Properties of Context-Free Languages CIS 5513 - Automata and Formal Languages – Pei Wang
Chomsky normal form A CFL can be generated by many CFGs. For every CFL L, the language L − {ɛ} can be generated by a CFG in Chomsky normal form (CNF), where each rule has the form A → BC or A → a, i.e., every variable is rewritten as either exactly two variables or a single terminal. Every CFG G can be converted in several steps into a CNF grammar for L(G) − {ɛ}.
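To make the two allowed rule shapes concrete, here is a minimal Python sketch of a CNF check. It assumes a hypothetical grammar representation (not from the course material): a dict mapping each variable to a list of bodies, where a body is a tuple of symbols, `()` is ɛ, and a symbol counts as a variable exactly when it is a key of the dict.

```python
# Hypothetical representation: variable -> list of bodies, each body a tuple of
# symbols. A symbol is a variable exactly when it is a key of the dict.
Grammar = dict[str, list[tuple[str, ...]]]


def is_cnf(g: Grammar) -> bool:
    """Return True if every production is A -> BC (two variables) or A -> a (one terminal)."""
    for bodies in g.values():
        for body in bodies:
            two_vars = len(body) == 2 and all(s in g for s in body)
            one_terminal = len(body) == 1 and body[0] not in g
            if not (two_vars or one_terminal):
                return False
    return True


cnf = {"S": [("A", "B"), ("a",)], "A": [("a",)], "B": [("b",)]}
print(is_cnf(cnf))                                                      # True
print(is_cnf({"S": [("A", "B", "A")], "A": [("a",)], "B": [("b",)]}))  # False: body too long
```

Note that an ɛ-body `()` and a unit body such as `("A",)` both fail the check, which is why the conversion steps below remove them first.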
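Removing ɛ-productions A variable A is nullable if A ⇒* ɛ, i.e., there is a production A → ɛ, or a production A → B1B2…Bk where B1, B2, …, Bk are all nullable. If A is nullable, then a production such as B → CAD gets an additional variant B → CD, so A no longer needs to derive ɛ in that position. Treating every nullable variable in every production this way (keeping or dropping each occurrence, in all combinations) and then deleting all ɛ-productions eliminates them; the new grammar generates the original language minus ɛ.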
Removing ɛ-productions: example Given S → AB, A → aAA | ɛ, B → bBB | ɛ: S, A, and B are all nullable. New grammar: S → AB | A | B, A → aAA | aA | a, B → bBB | bB | b
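The nullable computation and the keep-or-drop expansion can be sketched in Python as follows. This is a minimal sketch under an assumed representation (hypothetical, not the course's): a dict from variable to a list of body tuples, with `()` standing for ɛ; the function names are illustrative.

```python
from itertools import product

# Hypothetical representation: variable -> list of bodies (tuples of symbols);
# () is the empty body, i.e. an epsilon-production. Keys are the variables.
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_symbols(g: Grammar) -> set[str]:
    """Fixpoint: A is nullable if some body of A consists only of nullable variables."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)  # an empty body () makes this true immediately
                changed = True
    return nullable


def remove_epsilon(g: Grammar) -> Grammar:
    """Add every variant that drops some nullable symbols, then drop all epsilon-bodies."""
    nullable = nullable_symbols(g)
    result: Grammar = {}
    for head, bodies in g.items():
        variants: set[tuple[str, ...]] = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is always kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                variant = tuple(sym for part in choice for sym in part)
                if variant:  # never re-create an epsilon-production
                    variants.add(variant)
        result[head] = sorted(variants)
    return result


g = {"S": [("A", "B")], "A": [("a", "A", "A"), ()], "B": [("b", "B", "B"), ()]}
print(remove_epsilon(g)["A"])  # [('a',), ('a', 'A'), ('a', 'A', 'A')]
```

Run on the example grammar, this reproduces the new productions listed on the slide; using a set of variants removes the duplicate aA that arises from dropping either of the two A's.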
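Removing unit productions A unit production has the form A → B, where B is a variable, and (A, B) is a unit pair if A ⇒* B using only unit productions. A unit pair can be removed by expanding the involved variables all the way until the result is not a unit production: for each unit pair (A, B) and each non-unit production B → α, add A → α, then delete all unit productions. If there is a cycle of expansion like A → B → C → … → A, then all the variables involved can be merged.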
Removing unit productions: example I → a | b | Ia | Ib | I0 | I1, F → I | (E), T → F | T * F, E → T | E + T changes to (the I-productions are unchanged): F → a | b | Ia | Ib | I0 | I1 | (E), T → a | b | Ia | Ib | I0 | I1 | (E) | T * F, E → a | b | Ia | Ib | I0 | I1 | (E) | T * F | E + T
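The unit-pair procedure can be sketched in Python, assuming a hypothetical dict-of-bodies representation (variable → list of body tuples). One design note: because every pair (A, B) on a cycle is found and each receives the other's non-unit bodies, this formulation handles cycles without a separate merging step; it is an alternative to the merge described above, not a claim about how the course implements it.

```python
# Hypothetical representation: variable -> list of bodies (tuples of symbols).
Grammar = dict[str, list[tuple[str, ...]]]


def is_unit(body: tuple[str, ...], g: Grammar) -> bool:
    return len(body) == 1 and body[0] in g  # A -> B with B a variable


def unit_pairs(g: Grammar) -> set[tuple[str, str]]:
    """All (A, B) with A =>* B using only unit productions; (A, A) always included."""
    pairs = {(a, a) for a in g}
    frontier = list(pairs)
    while frontier:
        a, b = frontier.pop()
        for body in g[b]:
            if is_unit(body, g) and (a, body[0]) not in pairs:
                pairs.add((a, body[0]))
                frontier.append((a, body[0]))
    return pairs


def remove_unit(g: Grammar) -> Grammar:
    """For each unit pair (A, B), give A every non-unit body of B; drop all unit rules."""
    result: Grammar = {a: [] for a in g}
    for a, b in sorted(unit_pairs(g)):
        for body in g[b]:
            if not is_unit(body, g) and body not in result[a]:
                result[a].append(body)
    return result


I_bodies = [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")]
expr = {"I": I_bodies, "F": [("I",), ("(", "E", ")")],
        "T": [("F",), ("T", "*", "F")], "E": [("T",), ("E", "+", "T")]}
print(len(remove_unit(expr)["E"]))  # 9
```

On the expression grammar this gives E the nine bodies listed on the slide: the six I-bodies, (E), T * F, and E + T.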
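Removing useless symbols A symbol X is useful if it is both reachable and generating, i.e., S ⇒* αXβ ⇒* w for some terminal string w. Removing the useless symbols of a grammar, together with all productions that involve them, does not change the language it generates. First, eliminate nongenerating symbols and all productions involving such symbols. Then eliminate unreachable symbols. The order of these steps matters, because removing nongenerating symbols can make other symbols unreachable.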
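Useless symbols: example Given the CFG S → AB | a, A → b: B is not generating, so the grammar becomes S → a, A → b. Now A is not reachable, so the grammar becomes S → a. Doing the steps in the opposite order would keep the useless A → b, since A is still reachable from S before S → AB is removed.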
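The two-step order can be sketched in Python. This is a minimal sketch assuming a hypothetical dict-of-bodies representation in which a variable with no productions (like B in the example) is still listed, with an empty list; otherwise it would be mistaken for a terminal.

```python
# Hypothetical representation: variable -> list of bodies (tuples of symbols).
# A variable with no productions is still listed, with an empty list.
Grammar = dict[str, list[tuple[str, ...]]]


def generating(g: Grammar) -> set[str]:
    """Variables that derive some terminal string (terminals are trivially generating)."""
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in gen and any(all(s in gen or s not in g for s in b) for b in bodies):
                gen.add(head)
                changed = True
    return gen


def reachable(g: Grammar, start: str) -> set[str]:
    """Variables reachable from start (depth-first search over bodies)."""
    seen, stack = {start}, [start]
    while stack:
        for body in g.get(stack.pop(), []):
            for s in body:
                if s in g and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


def remove_useless(g: Grammar, start: str) -> Grammar:
    gen = generating(g)
    # Step 1: drop nongenerating variables and every production that mentions one.
    g1 = {a: [b for b in bodies if all(s in gen or s not in g for s in b)]
          for a, bodies in g.items() if a in gen}
    # Step 2: drop variables that are unreachable in the reduced grammar.
    reach = reachable(g1, start)
    return {a: bodies for a, bodies in g1.items() if a in reach}


example = {"S": [("A", "B"), ("a",)], "A": [("b",)], "B": []}
print(remove_useless(example, "S"))  # {'S': [('a',)]}
```

Reachability is computed on the reduced grammar `g1`, not the original; in the original grammar A and B are both reachable, which is exactly why running the reachability step first would not remove A.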
CFG to Chomsky normal form Convert a CFG into CNF (the result is not unique) in four steps: (1) eliminate ɛ-productions; (2) eliminate unit productions; (3) eliminate useless symbols; (4) change the remaining non-CNF productions into CNF productions, e.g., A → BCD becomes A → BE, E → CD, and A → Fg becomes A → FG, G → g, where E and G are new variables.
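Step (4) can be sketched in Python as below. This is a minimal sketch assuming the same kind of hypothetical dict-of-bodies representation and a grammar that has already passed steps (1) to (3). The fresh-variable naming scheme (`T1`, `C1`, …) is a made-up convention for illustration; the slide's E and G correspond to `C1` and `T2` here.

```python
# Hypothetical representation: variable -> list of bodies (tuples of symbols).
Grammar = dict[str, list[tuple[str, ...]]]


def finish_cnf(g: Grammar) -> Grammar:
    """Final CNF step; assumes no epsilon-, unit-, or useless productions remain."""
    used = set(g) | {s for bodies in g.values() for b in bodies for s in b}
    result: Grammar = {a: [] for a in g}
    term_var: dict[str, str] = {}  # terminal t -> new variable whose only rule is -> t
    counter = 0

    def fresh(prefix: str) -> str:
        # Hypothetical naming scheme: prefix + number, skipping any name already in use.
        nonlocal counter
        counter += 1
        while f"{prefix}{counter}" in used:
            counter += 1
        return f"{prefix}{counter}"

    for head, bodies in g.items():
        for body in bodies:
            if len(body) == 1:  # A -> a is already in CNF
                result[head].append(body)
                continue
            # Replace each terminal in a long body by a variable (A -> Fg becomes A -> F T2).
            syms = []
            for s in body:
                if s not in g and s not in term_var:
                    term_var[s] = fresh("T")
                    result[term_var[s]] = [(s,)]
                syms.append(s if s in g else term_var[s])
            # Split A -> B1 B2 ... Bk into a chain of two-variable bodies.
            current = head
            while len(syms) > 2:
                nxt = fresh("C")
                result[current].append((syms[0], nxt))
                result[nxt] = []
                current, syms = nxt, syms[1:]
            result[current].append(tuple(syms))
    return result


g = {"A": [("B", "C", "D"), ("F", "g")], "B": [("b",)], "C": [("c",)],
     "D": [("d",)], "F": [("f",)]}
print(finish_cnf(g)["A"])  # [('B', 'C1'), ('F', 'T2')]
```

Each terminal gets a single shared variable, so a terminal that appears in many long bodies adds only one new rule.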
Decision properties of CFLs [Complexity-related topics will not be covered] Whether a CFL is empty can be decided by checking whether the start symbol of its grammar is generating. Whether a string belongs to a CFL can be decided using dynamic programming, which builds up the answer for the whole string from the answers for its substrings.
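The emptiness test reduces to the "generating" fixpoint. A minimal Python sketch, assuming a hypothetical representation where the grammar is a dict from variable to a list of body tuples (`()` for ɛ):

```python
# Hypothetical representation: variable -> list of bodies (tuples of symbols).
Grammar = dict[str, list[tuple[str, ...]]]


def is_empty(g: Grammar, start: str) -> bool:
    """L(G) is empty iff the start symbol is not generating."""
    generating: set[str] = set()
    changed = True
    while changed:  # grow the set until no new variable qualifies
        changed = False
        for head, bodies in g.items():
            if head in generating:
                continue
            # A body generates if each symbol is a terminal or a known generating variable.
            if any(all(s not in g or s in generating for s in b) for b in bodies):
                generating.add(head)
                changed = True
    return start not in generating


print(is_empty({"S": [("a", "S", "b"), ()]}, "S"))  # False: S -> epsilon
print(is_empty({"S": [("a", "S")]}, "S"))            # True: S never terminates
```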
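Testing membership in a CFL The CYK algorithm uses a CFG in CNF to incrementally find, for every substring ai…aj of the input w = a1…an, the set Xij of variables that generate it. The triangular table is filled bottom-up: Xii contains every A with a production A → ai, and Xij contains A if A → BC is a production with B ∈ Xik and C ∈ X(k+1)j for some k with i ≤ k < j. Then w is in the language if and only if the start symbol is in X1n.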
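The table-filling rule translates directly into a short Python sketch. It assumes a hypothetical dict-of-bodies representation of a CNF grammar and single-character terminals; indices are 0-based, so `X[i][j]` plays the role of X(i+1)(j+1) on the slide.

```python
# Hypothetical representation: variable -> list of bodies; the grammar must be in CNF
# and terminals are single characters.
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(g: Grammar, start: str, w: str) -> bool:
    n = len(w)
    if n == 0:
        return False  # a CNF grammar cannot derive the empty string
    # X[i][j]: variables that derive w[i..j] (0-based, inclusive).
    X = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):
        X[i][i] = {A for A, bodies in g.items() if (a,) in bodies}
    for length in range(2, n + 1):          # bottom-up: shorter substrings first
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):           # split into w[i..k] and w[k+1..j]
                for A, bodies in g.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in X[i][k] and body[1] in X[k + 1][j]:
                            X[i][j].add(A)
    return start in X[0][n - 1]


g = {"S": [("A", "B"), ("B", "C")], "A": [("B", "A"), ("a",)],
     "B": [("C", "C"), ("b",)], "C": [("A", "B"), ("a",)]}
print(cyk(g, "S", "baaba"))  # True
```

The three nested loops over length, start position, and split point give the familiar cubic running time in |w| for a fixed grammar.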
Membership decision for CFL
Greibach normal form Every nonempty CFL without ɛ can be generated by a grammar each of whose production rules has the form A → aα, where a is a terminal and α is a string of zero or more variables. This form corresponds to a PDA with a single state that accepts by empty stack and reads one input symbol on every move: the rule A → aα is the move that reads a, pops A, and pushes α.
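The rule-to-move correspondence can be illustrated by simulating that one-state PDA directly. This is a minimal Python sketch, assuming a hypothetical dict-of-bodies representation of a grammar already in GNF and single-character terminals. Because every move consumes one input symbol and every variable eventually yields at least one terminal, stacks longer than the remaining input can be pruned, which keeps the search finite.

```python
# Hypothetical representation: variable -> list of bodies; the grammar must be in
# Greibach normal form, so every body is (terminal, var, var, ...). Terminals are
# single characters.
Grammar = dict[str, list[tuple[str, ...]]]


def gnf_accepts(g: Grammar, start: str, w: str) -> bool:
    """Simulate the one-state PDA (acceptance by empty stack) built from a GNF grammar."""
    stacks = {(start,)}  # every stack content reachable so far; index 0 is the top
    for pos, a in enumerate(w):
        remaining = len(w) - pos - 1
        next_stacks = set()
        for stack in stacks:
            if not stack:
                continue  # empty stack: the PDA has halted and cannot read more input
            top, rest = stack[0], stack[1:]
            for body in g[top]:
                if body[0] == a:  # rule top -> a alpha: read a, pop top, push alpha
                    new = body[1:] + rest
                    # Each variable yields at least one terminal, so prune overlong stacks.
                    if len(new) <= remaining:
                        next_stacks.add(new)
        stacks = next_stacks
    return () in stacks


g = {"S": [("a", "S", "B"), ("a", "B")], "B": [("b",)]}  # {a^n b^n | n >= 1}
print(gnf_accepts(g, "S", "aabb"), gnf_accepts(g, "S", "aab"))  # True False
```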
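Pumping lemma for CFL A sufficiently long string must be derived by using the same variable repeatedly along some path of the parse tree: if the grammar is in CNF with m variables, a string of length at least 2^m has a parse tree with a path of at least m + 1 edges, and the m + 1 or more variables on that path cannot all be distinct.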
Pumping lemma for CFL (cont.) A part of the parse tree can be repeated: S ⇒* uAy, A ⇒* vAx, A ⇒* w
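Chaining the three derivations shows why the repeated part can be pumped. A worked derivation in LaTeX; the side conditions |vwx| ≤ n and vx ≠ ɛ come from the usual statement of the lemma (they are what the arguments for specific languages rely on), not from the derivations themselves.

```latex
\begin{align*}
S &\Rightarrow^{*} uAy \Rightarrow^{*} uvAxy \Rightarrow^{*} uv^{2}Ax^{2}y
   \Rightarrow^{*} \cdots \Rightarrow^{*} uv^{i}Ax^{i}y \Rightarrow^{*} uv^{i}wx^{i}y
   && (i \ge 1),\\
S &\Rightarrow^{*} uAy \Rightarrow^{*} uwy && (i = 0),\\
&\text{hence } uv^{i}wx^{i}y \in L \text{ for all } i \ge 0,
   \text{ where } |vwx| \le n \text{ and } vx \ne \varepsilon.
\end{align*}
```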
Languages that are not CFL The pumping lemma can be used to show that some languages are not CFL. For L = {0^m1^m2^m | m ≥ 1}, take the n of the pumping lemma and pick the word z = 0^n1^n2^n = uvwxy. Since |vwx| ≤ n and there are n 1's in the middle, vwx cannot contain both 0's and 2's, so removing v and x (i = 0) reduces the count of at least one symbol while leaving the count of 0's or of 2's unchanged, which produces a word not in the language. To prove that L = {ww | w ∈ {0,1}*} is not CFL, pump the word 0^n1^n0^n1^n, then discuss the cases.
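The case analysis for 0^n1^n2^n can be checked exhaustively for one concrete n. Below is a hedged Python sketch: `find_pumping_split` (a hypothetical helper) tries every split z = uvwxy with |vwx| ≤ n and vx ≠ ɛ and tests whether pumping with i = 0, 1, 2 stays in the language. Checking a single n and three values of i is only a sanity check, not a proof. The control case {0^m1^m} shows that the search does find splits when they exist.

```python
def find_pumping_split(z, n, in_lang, exponents=(0, 1, 2)):
    """Return (u, v, w, x, y) with |vwx| <= n, vx != "" and u v^i w x^i y in the
    language for every i in `exponents`, or None if no such split exists."""
    length = len(z)
    for a in range(length + 1):
        for d in range(a, min(a + n, length) + 1):  # vwx = z[a:d], so |vwx| <= n
            for b in range(a, d + 1):
                for c in range(b, d + 1):
                    u, v, w, x, y = z[:a], z[a:b], z[b:c], z[c:d], z[d:]
                    if not v and not x:
                        continue  # vx must be nonempty
                    if all(in_lang(u + v * i + w + x * i + y) for i in exponents):
                        return (u, v, w, x, y)
    return None


def in_012(s):  # membership in {0^m 1^m 2^m | m >= 1}
    m = len(s) // 3
    return m >= 1 and s == "0" * m + "1" * m + "2" * m


def in_01(s):   # membership in {0^m 1^m | m >= 1}, a CFL used as a control
    m = len(s) // 2
    return m >= 1 and s == "0" * m + "1" * m


print(find_pumping_split("0" * 4 + "1" * 4 + "2" * 4, 4, in_012))  # None
print(find_pumping_split("0" * 4 + "1" * 4, 4, in_01))  # a tuple (u, v, w, x, y)
```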
Closure properties of CFL CFLs are closed under the operations of substitution (replacing each terminal by a CFL), union, concatenation, closure (* and +), reversal, homomorphism, and inverse homomorphism
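Several of these closure properties have direct grammar constructions. The Python sketch below shows union, concatenation, star, reversal, and homomorphism (a special case of substitution where each terminal maps to a single string). It assumes a hypothetical dict-of-bodies representation and a made-up renaming scheme (`1_`, `2_` prefixes) to keep the two grammars' variables disjoint. Inverse homomorphism is usually proved with a PDA construction and is not shown.

```python
# Hypothetical representation: variable -> list of bodies (tuples of symbols);
# () is the empty body. Assumes no terminal is spelled like a renamed variable
# ("1_X", "2_X") or like the new start symbol.
Grammar = dict[str, list[tuple[str, ...]]]


def rename(g: Grammar, tag: str) -> Grammar:
    """Prefix every variable with tag so that two grammars share no variables."""
    def r(s: str) -> str:
        return tag + s if s in g else s
    return {tag + a: [tuple(r(s) for s in b) for b in bodies] for a, bodies in g.items()}


def union(g1, s1, g2, s2, start="S"):
    """L(G1) union L(G2): S -> S1 | S2."""
    return {start: [("1_" + s1,), ("2_" + s2,)], **rename(g1, "1_"), **rename(g2, "2_")}


def concat(g1, s1, g2, s2, start="S"):
    """L(G1) L(G2): S -> S1 S2."""
    return {start: [("1_" + s1, "2_" + s2)], **rename(g1, "1_"), **rename(g2, "2_")}


def star(g, s, start="S"):
    """L(G)*: S -> S1 S | epsilon."""
    return {start: [("1_" + s, start), ()], **rename(g, "1_")}


def reverse(g):
    """Reversal: reverse every body; the start symbol is unchanged."""
    return {a: [tuple(reversed(b)) for b in bodies] for a, bodies in g.items()}


def homomorphism(g, h):
    """Replace every terminal t by the string h[t] (which may be empty)."""
    return {a: [tuple(x for s in b for x in ((s,) if s in g else tuple(h[s])))
                for b in bodies] for a, bodies in g.items()}


anbn = {"S": [("a", "S", "b"), ()]}
print(reverse(anbn))  # {'S': [('b', 'S', 'a'), ()]}
```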
Closure properties of CFL (cont.) CFLs are not closed under complement, intersection, or difference. Example: {0^n1^n2^i | n ≥ 1, i ≥ 1} and {0^i1^n2^n | n ≥ 1, i ≥ 1} are both CFLs, but their intersection {0^n1^n2^n | n ≥ 1} is not. Example: {0,1}* − {ww} is a CFL, but its complement {ww} is not. For any CFL L and regular language R, both L ∩ R and L − R are CFLs.
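A finite sanity check of the first example, not a proof: this Python sketch enumerates every string over {0,1,2} up to a length bound and confirms that being in both languages coincides with being in {0^n1^n2^n | n ≥ 1}. The membership predicates are hypothetical helpers, written with regular expressions for brevity.

```python
import re
from itertools import product

BLOCKS = re.compile(r"(0+)(1+)(2+)")  # nonempty blocks of 0's, then 1's, then 2's


def in_L1(s):  # {0^n 1^n 2^i | n >= 1, i >= 1}
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m.group(1)) == len(m.group(2))


def in_L2(s):  # {0^i 1^n 2^n | n >= 1, i >= 1}
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m.group(2)) == len(m.group(3))


def in_L(s):   # {0^n 1^n 2^n | n >= 1}
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n


def check_intersection(max_len):
    """Return a counterexample string up to max_len, or None if L1 ∩ L2 agrees with L."""
    for k in range(max_len + 1):
        for chars in product("012", repeat=k):
            s = "".join(chars)
            if (in_L1(s) and in_L2(s)) != in_L(s):
                return s
    return None


print(check_intersection(9))  # None
```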