C SC 473 Automata, Grammars & Languages. Discourse 04: Context-Free Grammars and Pushdown Automata
Backus-Naur Form Grammars (CFGs). Algol 60 and Algol 68 were the first “block-structured” languages. Ex: CF Grammar ::= ::= s | ::= begin end ::= ; | Sample string: begin s ; begin s;s;s end ;s end. Terminology: nonterminals = variables; rules = productions; terminals; start variable “S”.
Grammars are “Generators”. The relation $\Rightarrow$ is read “yields” or “derives in one step”: apply one production to one variable in the string. The process is nondeterministic.
A Particular Derivation. One possible derivation; the variable being rewritten at each stage is underscored. There are two choices at each derivation step: (1) which variable (nonterminal) is to be rewritten, and (2) which rule with that variable as its LHS is to be applied. All terminal strings obtainable in this way make up L(G).
Why CFGs? Most natural or artificial (e.g., programming) languages are not regular. We know that the latter language is not regular, so … Ex: C programs.
Derivation (Parse) Tree. yield = frontier = terminal string: the leaves of the tree read from left to right.
Derivation (Parse) Tree
Derivation (Parse) Tree (cont’d)
Context-Free Grammar. Defn 2.2: A context-free grammar G is a 4-tuple $G = (V, \Sigma, R, S)$ where
$V$ is a finite set, the variables (nonterminals);
$\Sigma$ is a finite set disjoint from $V$, the terminals;
$R$ is a finite set of rules, each of the form $A \to w$ with $A \in V$ and $w \in (V \cup \Sigma)^*$ (technically, an ordered pair $(A, w)$);
$S \in V$ is the start variable.
Ex: strings with balanced parentheses, given formally (as a 4-tuple) and informally (as a list of rules). Informal convention: variables = upper case, terminals = lower case.
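Yields & Derives Relations.
Defn: The relation $\Rightarrow$, yields (derives in 1 step), is defined as follows: if $A \to w$ is a rule in $R$, then $uAv \Rightarrow uwv$ for all $u, v \in (V \cup \Sigma)^*$.
Defn: derives in k steps: $u \Rightarrow^0 u$, and $u \Rightarrow^{k+1} v$ iff $u \Rightarrow^k u' \Rightarrow v$ for some string $u'$.
Defn: derives: $u \Rightarrow^* v$ iff $u \Rightarrow^k v$ for some $k \ge 0$. In other words, $\Rightarrow^*$ is the reflexive-transitive closure of $\Rightarrow$.
Defn: A derivation (of n steps) from $u$ is any sequence of strings $u = u_0, u_1, \dots, u_n$ satisfying $u_i \Rightarrow u_{i+1}$ for $0 \le i < n$.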
Language Generated. Defn: The language generated by G is the set of all terminal strings derived from S: $L(G) = \{ w \in \Sigma^* : S \Rightarrow^* w \}$. A partial derivation is one that starts with S and ends in a non-terminal string, i.e. one still containing variables in V. A terminal (terminated) derivation starts with S and ends in a string of terminals only.
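The definition of L(G) can be explored mechanically by searching over leftmost derivations. The sketch below is illustrative only: the grammar `RULES` (S → (S) | SS | ε for balanced parentheses) is an assumed stand-in for the deck's example, and the cutoff `cap` on sentential-form length is an assumption that happens to be generous enough for this grammar, not a general bound.

```python
from collections import deque

def generate(rules, start, max_len):
    """Collect terminal strings of length <= max_len derivable from start.

    rules maps each variable (one character) to a list of right-hand sides.
    Only leftmost derivations are explored (breadth-first).
    """
    cap = 2 * max_len + 2  # assumed cutoff on sentential-form length
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, c in enumerate(form) if c in rules), None)
        if idx is None:          # no variables left: a terminal string
            words.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            n_terminals = sum(1 for c in new if c not in rules)
            if n_terminals <= max_len and len(new) <= cap and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))

# Hypothetical example grammar: balanced parentheses.
RULES = {"S": ["(S)", "SS", ""]}
print(generate(RULES, "S", 4))
```

Restricting the search to leftmost derivations loses no strings, since every parse tree has a leftmost derivation.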
Derivations and Parse Trees. Ex: Notice: the completed (terminated) parse tree is the same for both derivations, though the sequence “grows” differently.
Derivation $\Leftrightarrow$ Parse Tree.
Proposition 1: For every (terminated or partial) derivation $D: S \Rightarrow^* \alpha$ there is a unique parse tree T with frontier $\alpha$, constructible from D.
Proposition 2: For every parse tree T in G and any traversal order that is top-down (visits parents before children), there is a unique derivation for the frontier of T from S, and it is constructible from T.
Corollary 3: For every parse tree T in G there is a unique leftmost derivation constructible from T. Pf: Pre-order traverse T, expanding variables as their nodes are visited.
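The pre-order argument in Corollary 3 can be sketched in a few lines. The tree encoding here is an assumption made for illustration: a node is a pair `(label, kids)`, where `kids` is `None` for a terminal leaf and a list for a variable (an empty list encodes an ε-rule). The sample tree `T` uses the hypothetical rules S → (S) | SS | ε.

```python
def leftmost_derivation(tree):
    """Return the sentential forms of the leftmost derivation encoded by tree."""
    form = [tree]                       # current sentential form, as nodes
    steps = ["".join(label for label, _ in form)]
    while True:
        # Leftmost node that is still an unexpanded variable.
        i = next((j for j, (_, kids) in enumerate(form) if kids is not None), None)
        if i is None:
            return steps
        form[i:i + 1] = form[i][1]      # replace the variable by its children
        steps.append("".join(label for label, _ in form))

def leaf(c):
    return (c, None)

# Parse tree for "()()" under S -> SS, S -> (S), S -> epsilon.
T = ("S", [("S", [leaf("("), ("S", []), leaf(")")]),
           ("S", [leaf("("), ("S", []), leaf(")")])])
print(" => ".join(leftmost_derivation(T)))
```

Always expanding the leftmost unexpanded variable visits the internal nodes in pre-order, which is why the derivation read off the tree is unique.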
Ex: Leftmost Derivation
Ex: Leftmost Derivation. Preorder traversal.
Ex: Leftmost Derivation (cont’d)
Ex: Leftmost Derivation (cont’d)
Syntactic Ambiguity. 2 distinct parse trees for the same terminal string $\iff$ 2 distinct leftmost derivations for the same terminal string (leftmost derivations and parse trees correspond 1-to-1). A CFG is unambiguous $\iff$ every $w \in L(G)$ has a unique parse tree (unique leftmost derivation).
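Ambiguity can be checked on small inputs by counting parse trees. The following is a sketch under stated assumptions: the expression grammar E → E+E | E*E | a is a hypothetical example (it does not appear in the slides), and the counter assumes the grammar has no ε-rules and no cycles of unit rules, so every symbol covers at least one input character.

```python
from functools import lru_cache

def count_parse_trees(rules, start, s):
    """Count parse trees of s from start.

    rules maps a one-character variable to a tuple of right-hand-side strings.
    Assumes no epsilon-rules and no unit-rule cycles.
    """
    @lru_cache(maxsize=None)
    def count_sym(sym, i, j):
        if sym not in rules:                      # terminal symbol
            return 1 if (j == i + 1 and s[i] == sym) else 0
        return sum(count_seq(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) == 1:
            return count_sym(rhs[0], i, j)
        total = 0
        # The first symbol covers s[i:k]; leave >= 1 character per later symbol.
        for k in range(i + 1, j - (len(rhs) - 1) + 1):
            total += count_sym(rhs[0], i, k) * count_seq(rhs[1:], k, j)
        return total

    return count_sym(start, 0, len(s))

# Hypothetical ambiguous grammar.
EXPR = {"E": ("E+E", "E*E", "a")}
print(count_parse_trees(EXPR, "E", "a+a*a"))
```

A count greater than 1 for some string witnesses that the grammar is ambiguous; a count of 1 on every tested string proves nothing in general.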
Ex: Ambiguous Grammar (English). | fruit | flies | … …
“Fruit flies like a banana”: two parse trees with the same frontier, fruit flies like a banana. | fruit | flies | … …
Right Linear Grammars & Regular Languages. Defn: A CFG is right-linear iff each rule is of one of the forms $A \to wB$ or $A \to w$, where A, B are variables and $w \in \Sigma^*$. Chomsky (1958) called these “Type 3”. Thm: L is a regular language iff L = L(G) for some right-linear grammar G. There are algorithms for converting from finite automata to right-linear grammars, and conversely. [Figure: DFA M, NFA N, Reg. Expr E and Right-linear Grammar G, joined by arrows; each arrow = conversion algorithm.]
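Right-Linear & Regular (cont’d). Pf: ($\Rightarrow$) Assume L = L(M) where $M = (Q, \Sigma, \delta, q_0, F)$ is a DFA. Construct $G = (Q, \Sigma, R, q_0)$ with R having rule $q \to a\,p$ if $\delta(q, a) = p$ in M, and rule $q \to \varepsilon$ if q is a final state. Claim: $q_0 \Rightarrow^n w\,p$ with $|w| = n$ iff M goes from $q_0$ to $p$ on reading $w$. Pf: easy induction on n. The ($\Rightarrow$) proof direction follows since $w \in L(M)$ iff M goes from $q_0$ to some $p \in F$ on $w$, iff $q_0 \Rightarrow^* w\,p \Rightarrow w$. Pf: ($\Leftarrow$) Assume L = L(G) where $G = (V, \Sigma, R, S)$ is right-linear. Construct NFA $N = (V \cup \{f\}, \Sigma, \delta, S, \{f\})$ where $f$ is a new symbol. N has the transition from $A$ to $B$ on $w$ if $A \to wB$ is in R, and the transition from $A$ to $f$ on $w$ if $A \to w$ is in R.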
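Right-Linear & Regular (cont’d). Claim: $S \Rightarrow^n wA$ in G iff N can go from state $S$ to state $A$ in n transitions while reading $w$. Pf: easy induction on n. The ($\Leftarrow$) proof direction follows since $w \in L(G)$ iff $S \Rightarrow^* uA \Rightarrow uv = w$ for some rule $A \to v$, iff N can go from $S$ to $f$ while reading $w$.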
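Both directions of the proof are short enough to try out. The sketch below assumes a DFA given as a dictionary `delta[(state, symbol)] = state`; the function names and the sample DFA (even number of a's) are hypothetical. `dfa_to_right_linear` builds the rules q → a p and q → ε, and `generates` follows the rules the way the NFA built from a right-linear grammar would.

```python
def dfa_to_right_linear(delta, finals):
    """Build right-linear rules from a DFA.

    Each rule is a pair (terminal, variable); ("", None) encodes q -> epsilon.
    """
    rules = {}
    for (q, a), p in delta.items():
        rules.setdefault(q, []).append((a, p))      # rule q -> a p
        rules.setdefault(p, [])
    for q in finals:
        rules.setdefault(q, []).append(("", None))  # rule q -> epsilon
    return rules

def generates(rules, start, w):
    """Decide whether w is derivable, tracking the set of possible variables."""
    current = {start}
    for ch in w:
        current = {B for A in current
                   for (a, B) in rules.get(A, [])
                   if a == ch and B is not None}
    return any(("", None) in rules.get(A, []) for A in current)

# Hypothetical DFA over {a, b}: state "E" = even number of a's (accepting).
DELTA = {("E", "a"): "O", ("E", "b"): "E", ("O", "a"): "E", ("O", "b"): "O"}
GRAMMAR = dfa_to_right_linear(DELTA, {"E"})
print(generates(GRAMMAR, "E", "abab"), generates(GRAMMAR, "E", "ab"))
```

This version only handles rules whose terminal part is a single symbol, which is all the DFA construction produces; general right-linear rules $A \to wB$ with longer $w$ would need intermediate states.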
Ex: Right-Linear Grammar and FA. Ex: “useless” rules can be eliminated.
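Pushdown Automaton. Defn 2.12: A pushdown automaton M is a 6-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, F)$ where
$Q$ is a finite set, the states;
$\Sigma$ is a finite set, the input alphabet;
$\Gamma$ is a finite set, the stack alphabet;
$\delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)$ is the transition function;
$q_0 \in Q$ is the start state;
$F \subseteq Q$ is the set of accept (final) states.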
Pushdown Automaton. [Figure: a finite control reading an input tape (input in $\Sigma^*$; part already seen, current input symbol, part to come) and attached to a stack (contents in $\Gamma^*$, with top and bottom marked; no end-marker supplied).] Configuration: (state, rest of input, stack).
PDA (cont’d). Initially: the finite control is in the start state, the whole input $w$ is still to come, and the stack is empty. Configuration: $(q_0, w, \varepsilon)$.
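PDA (cont’d). Transition: if $(q, \beta) \in \delta(p, a, \alpha)$, the PDA in state $p$ with current input symbol $a$ and top stack symbol $\alpha$ may move to state $q$, replacing $\alpha$ by $\beta$. Configurations: $(p, ax, \alpha\gamma) \vdash (q, x, \beta\gamma)$.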
PDA (cont’d). Can have: an $\varepsilon$-move ($a = \varepsilon$): consume no input; a pop-move ($\beta = \varepsilon$): erase the top stack symbol; a push-only move ($\alpha = \varepsilon$): ignore the stack. Any combination is possible.
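PDA (cont’d). Finally: configuration $(f, \varepsilon, \gamma)$. Defn: M recognizes $w$ iff $(q_0, w, \varepsilon) \vdash^* (f, \varepsilon, \gamma)$ for some $f \in F$ and some $\gamma \in \Gamma^*$. Defn: $L(M) = \{ w \in \Sigma^* : M \text{ recognizes } w \}$.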
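The configuration-based definition of acceptance translates directly into a breadth-first search over configurations. This is a minimal sketch, not the deck's own machine: the encoding of `delta` as a dictionary keyed by (state, input symbol or "", top symbol or ""), the state names, the bottom marker `$`, and the cap `max_configs` are all assumptions. The sample PDA recognizes palindromes with a center-mark over the assumed alphabet {a, b, #}.

```python
from collections import deque

def pda_accepts(delta, start, finals, w, max_configs=100_000):
    """Search configurations (state, input position, stack); stack top is index 0."""
    first = (start, 0, "")
    seen, queue = {first}, deque([first])
    while queue and len(seen) <= max_configs:
        q, i, stack = queue.popleft()
        if i == len(w) and q in finals:       # input consumed in an accept state
            return True
        for a in {"", w[i:i + 1]}:            # epsilon-move or read one symbol
            for top in {"", stack[:1]}:       # ignore the stack or pop its top
                for r, push in delta.get((q, a, top), ()):
                    cfg = (r, i + len(a), push + stack[len(top):])
                    if cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

# Hypothetical PDA for { x # reverse(x) : x in {a,b}* }.
CENTER = {
    ("s", "", ""): {("p", "$")},      # mark the stack bottom
    ("p", "a", ""): {("p", "a")},     # push the first half
    ("p", "b", ""): {("p", "b")},
    ("p", "#", ""): {("q", "")},      # center-mark: switch to matching
    ("q", "a", "a"): {("q", "")},     # pop on a match
    ("q", "b", "b"): {("q", "")},
    ("q", "", "$"): {("f", "")},      # stack back to the marker: accept
}
print(pda_accepts(CENTER, "s", {"f"}, "ab#ba"))
```

A blocked computation simply contributes no successor configurations, so rejection corresponds to the search running dry.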
Example: PDA. Recognizer for palindromes with a center-mark. One sample input is accepted; another is not accepted (the computation is blocked).
Example: PDA w/ nondeterminism. The last example (palindromes with center-mark) was a deterministic PDA (DPDA). This example is an NPDA: it makes a nondeterministic “guess”, and a computation that guesses wrongly does not accept (blocked).
Example: PDA. Recall the well-nested parentheses, e.g. (()) and (()()). This PDA is a DPDA!
Example: PDA. The PDA “guesses” which pattern applies, then “checks” whether the guess is correct. It accepts iff there is a correct guess that checks.
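CFG $\Leftrightarrow$ PDA. Thm 2.20: A language is CF $\iff$ a PDA recognizes it. There are algorithms for converting a grammar to an equivalent automaton, and conversely. Lemma 2.21: There is an algorithm for constructing, from any CFG G, a PDA M such that L(G) = L(M). Pf: In constructing a PDA, we can permit, without losing generality, “multi-push” moves such as $(q, \beta_1 \beta_2 \cdots \beta_k) \in \delta(p, a, \alpha)$ where $\beta_i \in \Gamma$. For $k \ge 2$ we may break a multi-push into a sequence of single-push moves by introducing new states $r_1, \dots, r_{k-1}$: $(r_1, \beta_k) \in \delta(p, a, \alpha)$, $\delta(r_1, \varepsilon, \varepsilon) = \{(r_2, \beta_{k-1})\}$, …, $\delta(r_{k-1}, \varepsilon, \varepsilon) = \{(q, \beta_1)\}$. Henceforth we will allow multi-push moves in our PDAs.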
CFG $\Rightarrow$ PDA. Idea: use nondeterminism. Given G, construct PDA P to load S on the stack and simulate a leftmost derivation on the stack.
When a variable symbol A comes to the stack top, “guess” a grammar rule $A \to \alpha$, pop A and push $\alpha$.
When a terminal character comes to the stack top, compare it to the next input symbol. If they match, pop the top and advance the input (“check off”). If they fail to match, jam (not an accepting computation).
If the input holds a word in L(G) and P guesses the correct leftmost derivation (rules to apply), then all the input characters will be checked off against those at the top of the stack, and the stack will empty as the last input is checked off. Otherwise, at some point the PDA will jam.
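CFG $\Rightarrow$ PDA (cont’d). Given $G = (V, \Sigma, R, S)$ construct $P = (Q, \Sigma, \Gamma, \delta, q_{\text{start}}, F)$.
States: $Q = \{q_{\text{start}}, q_{\text{loop}}, q_{\text{accept}}\}$. Input alphabet: $\Sigma$. Stack alphabet: $\Gamma = V \cup \Sigma \cup \{\$\}$. Start state: $q_{\text{start}}$. Accept states: $F = \{q_{\text{accept}}\}$.
Transition function:
Initialize stack: $\delta(q_{\text{start}}, \varepsilon, \varepsilon) = \{(q_{\text{loop}}, S\$)\}$.
Simulate rules: $\delta(q_{\text{loop}}, \varepsilon, A) = \{(q_{\text{loop}}, \alpha) : A \to \alpha \text{ in } R\}$ for each $A \in V$.
Check off terminals: $\delta(q_{\text{loop}}, a, a) = \{(q_{\text{loop}}, \varepsilon)\}$ for each $a \in \Sigma$.
Detect null stack & accept: $\delta(q_{\text{loop}}, \varepsilon, \$) = \{(q_{\text{accept}}, \varepsilon)\}$.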
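The four groups of transitions can be generated mechanically from a grammar. The sketch below is illustrative: the state names `q_start`, `q_loop`, `q_accept`, the bottom marker `$`, the dictionary encoding of `delta`, and the sample grammar S → aSb | ε are all assumptions. The small simulator uses multi-push moves and a configuration cap, since grammars with left recursion can make the stack grow without consuming input.

```python
from collections import deque

def cfg_to_pda(rules, start):
    """Build transitions (state, input, top) -> {(state, pushed string)}."""
    delta = {}
    def add(key, move):
        delta.setdefault(key, set()).add(move)
    add(("q_start", "", ""), ("q_loop", start + "$"))       # initialize stack
    terminals = set()
    for A, alternatives in rules.items():
        for rhs in alternatives:
            add(("q_loop", "", A), ("q_loop", rhs))         # simulate rule A -> rhs
            terminals |= {c for c in rhs if c not in rules}
    for a in terminals:
        add(("q_loop", a, a), ("q_loop", ""))               # check off a terminal
    add(("q_loop", "", "$"), ("q_accept", ""))              # null stack: accept
    return delta

def accepts(delta, w, max_configs=100_000):
    """Breadth-first search over configurations; stack top is index 0."""
    first = ("q_start", 0, "")
    seen, queue = {first}, deque([first])
    while queue and len(seen) <= max_configs:
        q, i, stack = queue.popleft()
        if q == "q_accept" and i == len(w):
            return True
        for a in {"", w[i:i + 1]}:
            for top in {"", stack[:1]}:
                for r, push in delta.get((q, a, top), ()):
                    cfg = (r, i + len(a), push + stack[len(top):])
                    if cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

# Hypothetical grammar for { a^n b^n : n >= 0 }.
P = cfg_to_pda({"S": ["aSb", ""]}, "S")
print(accepts(P, "aabb"), accepts(P, "aab"))
```

Each accepting path of the search spells out one leftmost derivation: the ε-moves on a variable are the rule applications, and the matching moves check off the terminals.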
CFG $\Rightarrow$ PDA (cont’d). Ex:
CFG $\Rightarrow$ PDA (cont’d). G, P.
CFG $\Rightarrow$ PDA (cont’d). G, P. A CFG leftmost derivation corresponds to a PDA computation.
PDA $\Rightarrow$ CFG. Lemma 2.27: There is an algorithm for constructing, from any PDA P, a CFG G such that L(G) = L(P). Pf: Given a PDA, we can convert it into a PDA with the following simplified structure:
it has only one accept state: add $\varepsilon$-transitions from the multiple accept states to a single new one;
it empties its stack just before entering the accept state: loop on a state that just pops;
each PDA transition is either a “pure push” or a “pure pop”: introduce new intermediate states.
PDA $\Rightarrow$ CFG (cont’d). [Figure: a transition that both pops and pushes becomes a pure pop followed by a pure push through a new intermediate state.] Idea of proof: construct G with variables $A_{pq}$ for each p and q in the set of states Q. Arrange that if $A_{pq}$ generates terminal string x, then PDA P started in state p with an empty stack on input string x has a computation that reaches state q with an empty stack. And conversely, if P started in state p with an empty stack has a computation on input string x that reaches state q with an empty stack, then $A_{pq} \Rightarrow^* x$. How does P, when started on an empty stack in state p, operate on an input string x, ending with an empty stack in state q? The first move must be a push. The last move must be a pop.
PDA $\Rightarrow$ CFG (cont’d). Trace the computation of P on x starting in state p with empty stack, and ending in state q with empty stack: (1) the stack never empties in between. [Fig. 1: stack height plotted against input.]
PDA $\Rightarrow$ CFG (cont’d). Trace the computation of P on x starting in state p with empty stack, and ending in state q with empty stack: (2) the stack empties somewhere in between. [Fig. 2: stack height plotted against input.]
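PDA $\Rightarrow$ CFG (cont’d). Construction. Given PDA $P = (Q, \Sigma, \Gamma, \delta, q_0, \{q_{\text{accept}}\})$ construct $G = (V, \Sigma, R, A_{q_0 q_{\text{accept}}})$ with $V = \{A_{pq} : p, q \in Q\}$ and the following rules in R:
(1) If $(r, X) \in \delta(p, a, \varepsilon)$ and $(q, \varepsilon) \in \delta(s, b, X)$, then $A_{pq} \to a A_{rs} b$ (here $a, b \in \Sigma_\varepsilon$ and $X \in \Gamma$).
(2) $A_{pq} \to A_{pr} A_{rq}$ for all $p, q, r \in Q$.
(3) $A_{pp} \to \varepsilon$ for all $p \in Q$.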
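The three kinds of rules can be generated by a direct loop over the transitions. This is a sketch under assumptions: the PDA is already in the simplified form (pure push or pure pop moves only), `delta` is a dictionary keyed by (state, input or "", top or ""), variables $A_{pq}$ are encoded as tuples `("A", p, q)`, and the sample PDA with states s, q, f and endmarkers # is a hypothetical reconstruction rather than the deck's exact machine.

```python
def pda_to_cfg(states, delta):
    """Return the set of rules (lhs, rhs) with rhs a tuple of symbols."""
    pushes = [(p, a, r, X) for (p, a, top), moves in delta.items() if top == ""
              for (r, X) in moves if X != ""]
    pops = [(s, b, X, q) for (s, b, X), moves in delta.items() if X != ""
            for (q, pushed) in moves if pushed == ""]
    rules = set()
    # Kind 1: a push of X paired with a pop of the same X.
    for p, a, r, X in pushes:
        for s, b, Y, q in pops:
            if X == Y:
                rhs = tuple(t for t in (a, ("A", r, s), b) if t != "")
                rules.add((("A", p, q), rhs))
    for p in states:
        rules.add((("A", p, p), ()))                 # kind 3: A_pp -> epsilon
        for r in states:
            for q in states:                         # kind 2: A_pq -> A_pr A_rq
                rules.add((("A", p, q), (("A", p, r), ("A", r, q))))
    return rules

# Hypothetical simplified PDA: '#' pushes/pops the marker $, a pushes X, b pops X.
STATES = ["s", "q", "f"]
DELTA = {
    ("s", "#", ""): {("q", "$")},
    ("q", "a", ""): {("q", "X")},
    ("q", "b", "X"): {("q", "")},
    ("q", "#", "$"): {("f", "")},
}
RULES = pda_to_cfg(STATES, DELTA)
print(len(RULES))
```

With three states there are 27 rules of the second kind and 3 of the third, most of which are useless and can be pruned afterwards.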
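PDA $\Rightarrow$ CFG (cont’d). Claim 2.30: If $A_{pq} \Rightarrow^* x$ then $(p, x, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$. Pf: by induction on the length k of a derivation in G. Base: k = 1. The only derivations of length 1 are $A_{pp} \Rightarrow \varepsilon$, and we have $(p, \varepsilon, \varepsilon) \vdash^* (p, \varepsilon, \varepsilon)$. Step: Assume (IH) the Claim is true for derivations of at most k steps. Want the Claim true for derivations of k+1 steps. Suppose that $A_{pq} \Rightarrow^{k+1} x$. The first derivation step is either of the form $A_{pq} \Rightarrow a A_{rs} b$ or $A_{pq} \Rightarrow A_{pr} A_{rq}$. Case $A_{pq} \Rightarrow a A_{rs} b$. Then $x = ayb$ with $A_{rs} \Rightarrow^{k} y$. So by IH, $(r, y, \varepsilon) \vdash^* (s, \varepsilon, \varepsilon)$. By construction, since $A_{pq} \to a A_{rs} b$ is a rule of G, there is an $X$ with $(r, X) \in \delta(p, a, \varepsilon)$ and $(q, \varepsilon) \in \delta(s, b, X)$, so $(p, ayb, \varepsilon) \vdash (r, yb, X) \vdash^* (s, b, X) \vdash (q, \varepsilon, \varepsilon)$.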
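PDA $\Rightarrow$ CFG (cont’d). Case $A_{pq} \Rightarrow A_{pr} A_{rq}$. Then $x = yz$ with $A_{pr} \Rightarrow^* y$ and $A_{rq} \Rightarrow^* z$, each in at most k steps. So by IH, $(p, y, \varepsilon) \vdash^* (r, \varepsilon, \varepsilon)$ and $(r, z, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$. Putting these together: $(p, yz, \varepsilon) \vdash^* (r, z, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$.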
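PDA $\Rightarrow$ CFG (cont’d). Claim 2.31: If $(p, x, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$ then $A_{pq} \Rightarrow^* x$. Pf: by induction on the length k of a computation in P. Base: k = 0. The only computations of length 0 are $(p, x, \varepsilon) \vdash^0 (p, x, \varepsilon)$ where $x = \varepsilon$. By construction, $A_{pp} \to \varepsilon$ is a rule of G. Step: Assume (IH) the Claim is true for computations of at most k steps. Want the Claim true for computations of k+1 steps. Suppose that $(p, x, \varepsilon) \vdash^{k+1} (q, \varepsilon, \varepsilon)$. Two cases: either the stack does not empty in the midst of this computation (Fig. 1), or it becomes empty during the computation (Fig. 2). Call these Case 1 and Case 2.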
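PDA $\Rightarrow$ CFG (cont’d). Case 1: See Fig. 1. The symbol X pushed in the 1st move is the same as that popped in the last move. Let the 1st and last moves be governed by the push/pop transitions $(r, X) \in \delta(p, a, \varepsilon)$ and $(q, \varepsilon) \in \delta(s, b, X)$. By construction, there is a rule $A_{pq} \to a A_{rs} b$ in G. Let $x = ayb$. Since $(r, yb, X) \vdash^* (s, b, X)$ without ever popping this X, we must have $(r, y, \varepsilon) \vdash^* (s, \varepsilon, \varepsilon)$ in $k - 1$ steps. By IH, $A_{rs} \Rightarrow^* y$. Then, using $A_{pq} \to a A_{rs} b$, we conclude $A_{pq} \Rightarrow a A_{rs} b \Rightarrow^* ayb = x$.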
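PDA $\Rightarrow$ CFG (cont’d). Case 2: See Fig. 2. Let r be the intermediate state where the stack becomes empty. Then $x = yz$ with $(p, y, \varepsilon) \vdash^* (r, \varepsilon, \varepsilon)$ and $(r, z, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$, each in at most k steps. By the IH, $A_{pr} \Rightarrow^* y$ and $A_{rq} \Rightarrow^* z$. Since by construction there is a rule in G of the form $A_{pq} \to A_{pr} A_{rq}$, then $A_{pq} \Rightarrow A_{pr} A_{rq} \Rightarrow^* yz = x$.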
PDA $\Rightarrow$ CFG (cont’d). Ex: Rules of G: (1) push-pop pairs (1st kind):
PDA $\Rightarrow$ CFG (cont’d). Note: If ( p´ unreachable) then (abbreviated ). Such variables are useless; all rules involving them on left or right sides can be eliminated as useless productions. For this grammar (2) Rules of the 2nd kind (with useless rules removed; only 10/27 survive) in the order s, q, f:
PDA $\Rightarrow$ CFG (cont’d). (3) Rules of the 3rd kind: Combining all rules with the same LHS:
PDA $\Rightarrow$ CFG (cont’d). Simplify: easy to see that Substituting this into rules: Eliminate useless rules like
PDA $\Rightarrow$ CFG (cont’d). Another kind of useless rule: variables that generate no terminal strings. Eliminate these variables and any rules mentioning them. Final simplified grammar is: Note: we chose to use endmarkers # for clarity, but these could have been $\varepsilon$ (input symbols can be anything in $\Sigma_\varepsilon$), leading to the familiar grammar
Closure Properties. Regular Ops. The CFLs are closed under $\cup$, $\circ$, $^*$. Pf: Homework. Intersection. The CFLs are not closed under intersection. Example: Consider the two CFLs $L_1 = \{a^n b^n c^m : n, m \ge 0\}$ and $L_2 = \{a^m b^n c^n : n, m \ge 0\}$. Then $L_1 \cap L_2 = \{a^n b^n c^n : n \ge 0\}$. We will later see (CF Pumping Lemma) that this last is not a CFL. However, if $R$ is regular and $L$ is CF, then $L \cap R$ is CF.
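The intersection example can be checked exhaustively on short strings. The sketch assumes the two CFLs are the standard pair $\{a^n b^n c^m\}$ and $\{a^m b^n c^n\}$ (the slide's own formulas are not legible here); the membership tests are written directly rather than via grammars, and all function names are hypothetical.

```python
import re
from itertools import product

def blocks(w):
    """Split w as a^i b^j c^k and return (i, j, k), or None if w has another shape."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return tuple(len(g) for g in m.groups()) if m else None

def in_L1(w):            # a^n b^n c^m
    t = blocks(w)
    return t is not None and t[0] == t[1]

def in_L2(w):            # a^m b^n c^n
    t = blocks(w)
    return t is not None and t[1] == t[2]

def in_L3(w):            # a^n b^n c^n
    t = blocks(w)
    return t is not None and t[0] == t[1] == t[2]

# All strings over {a, b, c} of length at most 6, shortest first.
WORDS = ["".join(t) for n in range(7) for t in product("abc", repeat=n)]
BOTH = [w for w in WORDS if in_L1(w) and in_L2(w)]
print(BOTH)
```

On this finite sample the intersection coincides with $\{a^n b^n c^n\}$, which is the language the pumping lemma later rules out.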
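Closure Properties (cont’d). Thm: The class of CFLs is closed under intersection with regular languages. Pf: Assume $L = L(M_1)$ for a PDA $M_1 = (Q_1, \Sigma, \Gamma, \delta_1, q_1, F_1)$ and $R = L(M_2)$ for a DFA $M_2 = (Q_2, \Sigma, \delta_2, q_2, F_2)$. Construction. Construct a “cross-product PDA” M as follows: $M = (Q_1 \times Q_2, \Sigma, \Gamma, \delta, [q_1, q_2], F_1 \times F_2)$, where the transition function is defined by: $([p', q'], \beta) \in \delta([p, q], a, \alpha)$ provided $(p', \beta) \in \delta_1(p, a, \alpha)$ and $q' = \delta_2(q, a)$ (taking $q' = q$ when $a = \varepsilon$). Machine M simulates the two given machines “in parallel”, keeping each machine state in one component of the compound state $[p, q]$.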
What is Not Context-Free? PDAs have a limited computing ability. They cannot, for example, recognize repeated strings like w#w, or strings that “count” in more than 2 places, such as $a^n b^n c^n$. We will show that some languages are not CF using a CF Pumping Lemma, which gives a property that all CFLs must have. Then, to show that a language L is not CF, we argue that it lacks this pumping property. Closure properties of CFLs can sometimes be used to simplify non-CFLs and make a pumping argument easier.
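CF Pumping Lemma. Thm [Pumping Lemma for CFLs]. Suppose that L is an infinite CF language. Then $\exists p \ge 1\ \forall w \in L$ with $|w| \ge p$: $\exists u, v, x, y, z$ such that $w = uvxyz$, $|vxy| \le p$, $vy \ne \varepsilon$, and $\forall i \ge 0$: $u v^i x y^i z \in L$. For comparison, here is the Regular P.L.: $\exists p \ge 1\ \forall w \in L$ with $|w| \ge p$: $\exists x, y, z$ such that $w = xyz$, $|xy| \le p$, $y \ne \varepsilon$, and $\forall i \ge 0$: $x y^i z \in L$.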
CF Pumping Lemma (cont’d). Pf: Let $L = L(G)$ where G is a CFG in Chomsky Normal Form (Text, Theorem 2.9), i.e. a CFG in which all rules are of the (schematic) forms $A \to BC$ or $A \to a$ ($a \in \Sigma$). If $w \in L$ is “sufficiently long”, then any derivation tree T for w must contain a “long” path. More precisely: Claim 1: If the derivation tree T for $w$ has no path longer than h, then $|w| \le 2^{h-1}$. Pf: Induction on h. Base: h = 1. The only possible tree is a single rule $A \to a$, and $|w| = 1 = 2^0$. Step: Assume the Claim for h. Let T have all paths of length $\le h + 1$ and be of the form (in CNF) a root expanded by $A \to BC$ with subtrees $T_1$ and $T_2$.
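CF Pumping Lemma (cont’d). Then $T_1, T_2$ have all paths of length $\le h$. By IH, their frontiers $w_1, w_2$ satisfy $|w_1|, |w_2| \le 2^{h-1}$, which implies $|w| = |w_1| + |w_2| \le 2^h$. Conversely, if a generated string is at least $2^{h-1} + 1$ long, then its parse tree must be at least $h + 1$ high. G has $|V|$ variables. Choose $p = 2^{|V|}$. If $w \in L$ and $|w| \ge p$, then by Claim 1 any parse tree T for w has a path of length at least $|V| + 1$. Such a path has at least $|V| + 2$ nodes, of which at least $|V| + 1$ are variables. $\Rightarrow$ some variable appears twice on the path (note the leaf node is a terminal).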
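CF Pumping Lemma (cont’d). Picture: [Figure: parse tree with a variable repeated on a long path, splitting the frontier as $uvxyz$.] Choose the repeated variable among the bottom $|V| + 1$ variables on the path.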
CF Pumping Lemma (cont’d). (1) Center portion is not too long: $|vxy| \le p$. (2) Pumped portion is not empty: $v$ and $y$ cannot both $= \varepsilon$, i.e. $v \ne \varepsilon$ or $y \ne \varepsilon$. In CNF, no variable generates $\varepsilon$.
CF Pumping Lemma (cont’d). (3) Pumped strings are in L: the following are all parse trees, and so $u v^i x y^i z \in L$ for all $i \ge 0$.
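The quantifier structure of the lemma can be exercised by brute force on a single word. The sketch below is only a finite sanity check, not a proof: it tries every split $w = uvxyz$ with $|vxy| \le p$ and $vy \ne \varepsilon$ and tests pumping for $i = 0, \dots, 3$ only. The two sample languages ($a^n b^n c^n$ and, for contrast, $a^n b^n$) and all function names are assumptions chosen for illustration.

```python
def find_pumpable_split(w, p, member, max_i=3):
    """Return a split (u, v, x, y, z) that stays in the language for i = 0..max_i,
    or None if every admissible split fails for some such i."""
    n = len(w)
    for s in range(n + 1):                        # v starts at s
        for e in range(s, min(s + p, n) + 1):     # y ends at e, so |vxy| <= p
            for a in range(s, e + 1):             # v ends at a
                for b in range(a, e + 1):         # y starts at b
                    u, v, x, y, z = w[:s], w[s:a], w[a:b], w[b:e], w[e:]
                    if not v and not y:           # need vy nonempty
                        continue
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return (u, v, x, y, z)
    return None

def in_abc(w):                                    # a^n b^n c^n
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

def in_ab(w):                                     # a^n b^n
    n = len(w) // 2
    return w == "a" * n + "b" * n

print(find_pumpable_split("aabbcc", 2, in_abc))   # no split survives
print(find_pumpable_split("aabb", 2, in_ab) is not None)
```

For $a^n b^n c^n$ the search comes up empty for each tested constant, matching the case analysis in the proof that follows; for $a^n b^n$ a split pumping one a together with one b is found.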
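CF Pumping: Applications. Ex: $L = \{a^n b^n c^n : n \ge 0\}$ is not a CFL. Pf: Suppose it is CF. Then the Pumping Lemma $\Rightarrow \exists p\ \forall w \in L$, $|w| \ge p \Rightarrow \exists\, uvxyz = w$ & $|vxy| \le p$ & $vy \ne \varepsilon$ & $\forall i \ge 0$: $u v^i x y^i z \in L$. Pick p as the constant guaranteed, and choose $n \ge p/3$ and $w = a^n b^n c^n$. Where is $vy$? Cases: Assume first that $v \ne \varepsilon$.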
CF Pumping: Applications. In cases 1-3 the pumped word has an imbalance. In case 4 it has a b before an a. In case 5 it has a c before a b. In case 6 it has an a after a c. In any case, there is a contradiction to the pumped word being in L. The case where $y \ne \varepsilon$ is symmetric. Contradiction. Cor. The CFLs are not closed under complementation. Pf: is a CFL. But is not a CFL. Therefore cannot be CF. Ex: is not CF. Proof similar to regular case. Ex: is not CF.
Pumping: Applications (cont’d). Ex: is not CF. Pf: Intersection with is not a CFL. Therefore cannot be CF. Ex: is not CF. Pf: By pumping on the word Similar to Text, Example Ex: is not CF. Pf: Pump on the latter language in a way similar to the previous example to show it is not CF.