Limitations of context-free languages
The Chinese University of Hong Kong, Fall 2008
CSC 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Non-context-free languages
Recall: the pumping lemma for regular languages allows us to show that some languages are not regular.
Are these languages context-free?
L1 = {a^n b^n : n ≥ 0}
L2 = {x : x has the same number of a's and b's}
L3 = {1^n : n is prime}
L4 = {a^n b^n c^n : n ≥ 0}
L5 = {x#x^R : x ∈ {0, 1}*}
L6 = {x#x : x ∈ {0, 1}*}
Some intuition
L4 = {a^n b^n c^n : n ≥ 0}
Let's try to show this is context-free.
Context-free grammar attempt: S → aBc, B → ??
Pushdown automaton attempt: read a / push 1, then ???, then read c / pop 1
More intuition
Suppose we could construct some CFG for L4, e.g.
S → BC
B → CS | b
C → SB | a
...
We do some derivations of "long" strings:
S ⇒ BC ⇒ CSC ⇒ aSC ⇒ aBCC ⇒ abCC ⇒ abaC ⇒ abaSB ⇒ abaBCB ⇒ ababCB ⇒ ababaB ⇒ ababab
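The derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming the grammar is encoded as a Python dictionary; the names `GRAMMAR` and `leftmost_derivation` are hypothetical and chosen only for this illustration. It rewrites the leftmost variable at each step, which is the order the slide's derivation happens to follow.

```python
# Hypothetical encoding of the slide's example grammar:
# each variable maps to its list of right-hand sides.
GRAMMAR = {
    "S": ["BC"],
    "B": ["CS", "b"],
    "C": ["SB", "a"],
}

def leftmost_derivation(start, choices):
    """Rewrite the leftmost variable with each chosen right-hand side in turn."""
    form = start
    steps = [form]
    for rhs in choices:
        # Position of the leftmost variable in the current sentential form.
        pos = next(i for i, ch in enumerate(form) if ch in GRAMMAR)
        if rhs not in GRAMMAR[form[pos]]:
            raise ValueError(f"{form[pos]} -> {rhs} is not a production")
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(form)
    return steps

# The eleven production choices used in the slide's derivation of ababab.
steps = leftmost_derivation(
    "S", ["BC", "CS", "a", "BC", "b", "a", "SB", "BC", "b", "a", "b"]
)
print(" => ".join(steps))
```

Changing the list of choices gives other derivations in the same grammar; an invalid choice raises an error instead of silently producing a wrong string.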
More intuition
If the derivation is long enough, some variable must appear twice on the same path in the parse tree.
S ⇒ BC ⇒ CSC ⇒ aSC ⇒ aBCC ⇒ abCC ⇒ abaC ⇒ abaSB ⇒ abaBCB ⇒ ababCB ⇒ ababaB ⇒ ababab
[Figure: parse tree of this derivation, with leaves a b a b a b]
More intuition
Then we can "cut and paste" part of the parse tree: ababab becomes ababbabb ✗
[Figure: the parse tree of ababab, with one subtree replaced by a copy of a larger subtree rooted at the same variable]
More intuition
We can repeat this many times: ababab, ababbabb ✗, ababbbabbb ✗, ..., aba b^n a b^n
Every sufficiently large derivation will have a part that can be repeated indefinitely.
This is caused by cycles in the grammar.
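One way to check that the repeated strings ababab, ababbabb, ababbbabbb really are derivable in the example grammar S → BC, B → CS | b, C → SB | a is a CYK membership test, which applies because that grammar is already in Chomsky normal form. The sketch below is an illustration under that assumption, not part of the original slides; the names `UNIT`, `PAIRS` and `derivable` are hypothetical. It reads the repeated part as u = aba, v = b, w = a, x = b, using the cycle C ⇒ SB ⇒ BCB ⇒ bCB ⇒ bCb.

```python
# Example grammar in Chomsky normal form, split by production shape.
UNIT = {"a": {"C"}, "b": {"B"}}  # C -> a, B -> b
PAIRS = {("B", "C"): {"S"}, ("C", "S"): {"B"}, ("S", "B"): {"C"}}

def derivable(s, start="S"):
    """CYK test: can `start` derive the nonempty string s?"""
    n = len(s)
    if n == 0:
        return False
    # table[i][l] holds the variables that derive the substring s[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][1] = set(UNIT.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= PAIRS.get((left, right), set())
    return start in table[0][n]

# Pump v = b and x = b around w = a, after the prefix u = aba.
for k in range(1, 6):
    pumped = "aba" + "b" * k + "a" + "b" * k
    print(pumped, derivable(pumped))
```

The table-filling loop takes time cubic in the length of the string, which is fine for the short strings used here.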
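General picture
[Figure: a parse tree in which the variable A appears twice on one path, splitting the derived string as uvwxy. Pasting the upper A-subtree in place of the lower one gives uvvwxxy, then uv^3wx^3y, and so on.]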
Example
L4 = {a^n b^n c^n : n ≥ 0}
If L4 has a context-free grammar G, then: if uvwxy can be derived in G, so can u v^i w x^i y for every i.
What happens for a^n b^n c^n?
[Figure: the string a a ... a b b ... b c c ... c divided into consecutive blocks u, v, w, x, y]
No matter how it is split, u v^2 w x^2 y ∉ L4!
Pumping lemma for context-free languages
Theorem: For every context-free language L there exists a number n such that every string z in L with |z| ≥ n can be written as z = uvwxy where
(1) |vwx| ≤ n
(2) |vx| ≥ 1
(3) for every i ≥ 0, the string u v^i w x^i y is in L.
[Figure: z divided into consecutive blocks u, v, w, x, y]
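To see the three conditions at work on a language that is context-free, take L1 = {a^k b^k : k ≥ 0}. The sketch below assumes the pumping length n = 2 and the split u = a^(k-1), v = a, w = empty, x = b, y = b^(k-1); these choices, and the helper names `in_L1` and `pump`, are assumptions made for illustration. It checks only finitely many values of i, so it is a sanity check rather than a proof.

```python
def in_L1(s):
    """Membership in L1 = {a^k b^k : k >= 0}."""
    k = len(s) // 2
    return s == "a" * k + "b" * k

def pump(u, v, w, x, y, i):
    """Return the string u v^i w x^i y."""
    return u + v * i + w + x * i + y

n = 2  # assumed pumping length for L1 in this illustration

for k in range(1, 8):  # strings z = a^k b^k of length 2k >= n
    z = "a" * k + "b" * k
    # Pump one 'a' and one 'b' on either side of the middle of z.
    u, v, w, x, y = z[:k - 1], "a", "", "b", z[k + 1:]
    assert u + v + w + x + y == z
    assert len(v + w + x) <= n and len(v + x) >= 1
    assert all(in_L1(pump(u, v, w, x, y, i)) for i in range(10))

print("all checked splits pump inside L1")
```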
Pumping lemma for context-free languages
So to prove that L is not context-free, it is enough to show that:
for every n there exists z in L with |z| ≥ n, such that for every way of writing z = uvwxy where |vwx| ≤ n and |vx| ≥ 1, the string u v^i w x^i y is not in L for some i ≥ 0.
[Figure: z divided into consecutive blocks u, v, w, x, y]
Proving a language is not context-free
Just like for regular languages, we need a strategy that, regardless of what the adversary does, always wins this game:
Round 1: the adversary chooses n; you choose z ∈ L with |z| ≥ n.
Round 2: the adversary writes z = uvwxy (|vwx| ≤ n, |vx| ≥ 1); you choose i.
You win if u v^i w x^i y ∉ L.
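For small n the game can be played exhaustively by a program. The sketch below is a brute-force illustration, not a proof: it enumerates every move the adversary could make in round 2 and looks for a winning i among 0..max_i. The function names `splits`, `you_win` and `in_L4`, and the bound `max_i`, are assumptions made for this example.

```python
def in_L4(s):
    """Membership in L4 = {a^k b^k c^k : k >= 0}."""
    k = len(s) // 3
    return s == "a" * k + "b" * k + "c" * k

def splits(z, n):
    """Yield every (u, v, w, x, y) with z = uvwxy, |vwx| <= n and |vx| >= 1."""
    for start in range(len(z) + 1):
        for end in range(start, min(start + n, len(z)) + 1):
            for p in range(start, end + 1):
                for q in range(p, end + 1):
                    v, x = z[start:p], z[q:end]
                    if v or x:
                        yield z[:start], v, z[p:q], x, z[end:]

def you_win(z, n, in_lang, max_i=3):
    """True if every legal split is beaten by some i in 0..max_i."""
    return all(
        any(not in_lang(u + v * i + w + x * i + y) for i in range(max_i + 1))
        for u, v, w, x, y in splits(z, n)
    )

# Your round-1 move for L4: answer the adversary's n with z = a^n b^n c^n.
for n in range(1, 6):
    z = "a" * n + "b" * n + "c" * n
    print(n, you_win(z, n, in_L4))
```

A result of True for a given n only says that this particular z beats every split for that n; the actual proof needs an argument that works for all n.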
Example
L4 = {a^n b^n c^n : n ≥ 0}
Round 1: the adversary chooses n; you choose z = a^n b^n c^n.
Round 2: the adversary writes z = uvwxy (|vwx| ≤ n, |vx| ≥ 1); you choose i = ?
[Figure: the string a a ... a b b ... b c c ... c divided into consecutive blocks u, v, w, x, y]
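Example
Case 1: v or x contains two kinds of symbols.
Then u v^2 w x^2 y is not in L4 because the pattern is wrong: the symbols are no longer in the order a...ab...bc...c.
Case 2: v and x each contain at most one kind of symbol.
Then u v^2 w x^2 y does not have the same number of a's, b's, and c's: pumping adds at least one symbol but leaves at least one of the three counts unchanged.
[Figure: two copies of the string a a ... a b b ... b c c ... c, one for each case, with the positions of v and x marked]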
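The two cases can also be checked by enumeration for a fixed small n. The sketch below is an illustration with hypothetical helper names (`splits`, `check_cases`): for every legal split of z = a^n b^n c^n it pumps with i = 2 and asserts the reason given for the matching case, namely a broken a*b*c* pattern in case 1 and unequal symbol counts in case 2.

```python
import re

def splits(z, n):
    """Yield every (u, v, w, x, y) with z = uvwxy, |vwx| <= n and |vx| >= 1."""
    for start in range(len(z) + 1):
        for end in range(start, min(start + n, len(z)) + 1):
            for p in range(start, end + 1):
                for q in range(p, end + 1):
                    v, x = z[start:p], z[q:end]
                    if v or x:
                        yield z[:start], v, z[p:q], x, z[end:]

def check_cases(n):
    """Verify the case analysis for z = a^n b^n c^n; return how often each case occurs."""
    z = "a" * n + "b" * n + "c" * n
    tally = {1: 0, 2: 0}
    for u, v, w, x, y in splits(z, n):
        pumped = u + v * 2 + w + x * 2 + y
        if len(set(v)) > 1 or len(set(x)) > 1:
            # Case 1: v or x mixes two kinds of symbols, so the order breaks.
            assert re.fullmatch(r"a*b*c*", pumped) is None
            tally[1] += 1
        else:
            # Case 2: the numbers of a's, b's and c's are no longer all equal.
            assert len({pumped.count(ch) for ch in "abc"}) > 1
            tally[2] += 1
    return tally

print(check_cases(4))
```

If either claim failed for some split, the corresponding assertion would raise an error instead of returning the tally.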
More examples
Which of these is context-free?
L1 = {a^n b^n : n ≥ 0}
L2 = {x : x has the same number of a's and b's}
L3 = {1^n : n is prime}
L4 = {a^n b^n c^n : n ≥ 0}
L5 = {x#x^R : x ∈ {0, 1}*}
L6 = {x#x : x ∈ {0, 1}*}