CFGs and PDAs Sipser 2 (pages )
Last time…
CS 311 Mount Holyoke College

Context-free grammars
A context-free grammar G is a quadruple (V, Σ, R, S), where
– V is a finite set called the variables
– Σ is a finite set, disjoint from V, called the terminals
– R is a finite subset of V × (V ∪ Σ)* called the rules
– S ∈ V is called the start symbol
For any A ∈ V and u ∈ (V ∪ Σ)*, we write A →_G u whenever (A, u) ∈ R.
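The quadruple can be written down directly as a data structure. The following is a minimal sketch, not part of the original slides: the names `CFG`, `is_well_formed`, and `rewrites` are illustrative, and the example grammar S → 0S1 | ε is a hypothetical one chosen only because it is small.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: frozenset       # R: pairs (A, u), with u a tuple over V ∪ Σ
    start: str             # S

def is_well_formed(g: CFG) -> bool:
    """Check the side conditions stated in the definition."""
    symbols = g.variables | g.terminals
    return (
        g.variables.isdisjoint(g.terminals)      # Σ is disjoint from V
        and g.start in g.variables               # S ∈ V
        and all(a in g.variables and all(x in symbols for x in u)
                for a, u in g.rules)             # R ⊆ V × (V ∪ Σ)*
    )

def rewrites(g: CFG, a: str, u: tuple) -> bool:
    """A →_G u holds exactly when (A, u) is one of the rules."""
    return (a, tuple(u)) in g.rules

# Hypothetical example grammar: S → 0 S 1 | ε
G = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"0", "1"}),
    rules=frozenset({("S", ("0", "S", "1")), ("S", ())}),
    start="S",
)
print(is_well_formed(G), rewrites(G, "S", ("0", "S", "1")))
```

Representing each right-hand side as a tuple keeps the empty string (the empty tuple) distinct from a missing rule.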
Arithmetic expressions and parse trees
Consider G = (V, Σ, R, S), where
– V = {⟨EXPR⟩, ⟨TERM⟩, ⟨FACTOR⟩}
– Σ = {a, +, ×, (, )}
– R = { ⟨EXPR⟩ →_G ⟨EXPR⟩ + ⟨TERM⟩ | ⟨TERM⟩,
        ⟨TERM⟩ →_G ⟨TERM⟩ × ⟨FACTOR⟩ | ⟨FACTOR⟩,
        ⟨FACTOR⟩ →_G ( ⟨EXPR⟩ ) | a }
– S = ⟨EXPR⟩
What about a × a + a?
Leftmost derivation
A derivation of a string in a grammar is a leftmost derivation if, at every step, the leftmost remaining variable is the one replaced.
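A leftmost derivation can be replayed mechanically. The sketch below assumes the arithmetic-expression grammar from Sipser's chapter 2, with variables written as `EXPR`, `TERM`, and `FACTOR`; the helper names and the encoding of each step as a rule index are illustrative choices. It derives a × a + a.

```python
# Assumed grammar (rule indices start at 0 within each variable):
#   EXPR   → EXPR + TERM | TERM
#   TERM   → TERM × FACTOR | FACTOR
#   FACTOR → ( EXPR ) | a
RULES = {
    "EXPR": [["EXPR", "+", "TERM"], ["TERM"]],
    "TERM": [["TERM", "×", "FACTOR"], ["FACTOR"]],
    "FACTOR": [["(", "EXPR", ")"], ["a"]],
}

def leftmost_step(form, choice):
    """Replace the leftmost variable in `form` using its rule number `choice`."""
    for i, sym in enumerate(form):
        if sym in RULES:
            return form[:i] + RULES[sym][choice] + form[i + 1:]
    raise ValueError("no variable left to replace")

def leftmost_derivation(start, choices):
    """Return every sentential form visited, starting from [start]."""
    forms = [[start]]
    for c in choices:
        forms.append(leftmost_step(forms[-1], c))
    return forms

# One rule choice per derivation step for the string a × a + a.
forms = leftmost_derivation("EXPR", [0, 1, 0, 1, 1, 1, 1, 1])
for f in forms:
    print(" ".join(f))
```

Because the variable to rewrite is always forced, a leftmost derivation is fully determined by the sequence of rule choices, which is why a list of indices is enough here.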
Needlessly complicated?
How about just ⟨EXPR⟩ →_G ⟨EXPR⟩ + ⟨EXPR⟩ | ⟨EXPR⟩ × ⟨EXPR⟩ | ( ⟨EXPR⟩ ) | a
A grammar G is ambiguous if some string w has two or more different leftmost derivations.
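Ambiguity can be checked for a particular string by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The sketch below assumes the single-variable grammar is Sipser's ⟨EXPR⟩ → ⟨EXPR⟩ + ⟨EXPR⟩ | ⟨EXPR⟩ × ⟨EXPR⟩ | ( ⟨EXPR⟩ ) | a; the function name and the span-based recursion are illustrative and specific to this one grammar.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` in EXPR → EXPR+EXPR | EXPR×EXPR | (EXPR) | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways EXPR derives tokens[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "a":            # EXPR → a
            total += 1
        if tokens[i] == "(" and tokens[j - 1] == ")":  # EXPR → ( EXPR )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                  # EXPR → EXPR op EXPR
            if tokens[k] in ("+", "×"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("a+a×a"))    # two trees: (a+a)×a and a+(a×a)
print(count_parse_trees("(a+a)×a"))  # parentheses force a single tree
```

A count above one for any string shows that this grammar is ambiguous; a count of one for a single string says nothing about the grammar as a whole.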
Regular languages are context-free
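One standard way to see this is to turn a DFA into a grammar: make a variable R_q for each state q, add the rule R_q → a R_r whenever δ(q, a) = r, and add R_q → ε for each accept state q. The sketch below illustrates that construction; the function names, the dictionary encoding of δ, and the example DFA (an even number of 1s) are assumptions made for the example, not taken from the slides.

```python
def dfa_to_cfg(states, delta, q0, accepting):
    """Return (variables, rules, start) of a grammar equivalent to the DFA."""
    var = {q: f"R_{q}" for q in states}
    rules = set()
    for (q, a), r in delta.items():
        rules.add((var[q], (a, var[r])))   # R_q → a R_r whenever δ(q, a) = r
    for q in accepting:
        rules.add((var[q], ()))            # R_q → ε for each accept state
    return set(var.values()), rules, var[q0]

def generates(rules, start, w):
    """Membership test that works only for grammars of the shape built above."""
    current = {start}
    for ch in w:
        current = {body[1] for head, body in rules
                   if head in current and body and body[0] == ch}
    return any((v, ()) in rules for v in current)

# Hypothetical DFA over {0, 1} accepting strings with an even number of 1s.
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
variables, rules, start = dfa_to_cfg({"even", "odd"}, delta, "even", {"even"})
print(generates(rules, start, "0110"), generates(rules, start, "010"))
```

Every rule produced has at most one variable, and it sits at the right end, so a derivation tracks the DFA's current state symbol by symbol.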
Chomsky normal form
A context-free grammar G is in Chomsky normal form if every rule is of the form
– A → BC
– A → a
where A, B, C ∈ V, B ≠ S ≠ C (neither B nor C is the start symbol), and a ∈ Σ.
In addition, we permit the rule S → ε.
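The definition translates into a rule-by-rule check. This is a minimal sketch; the function name `is_cnf` and the encoding of rules as (head, body) pairs are assumptions made for illustration, and the second example grammar is hypothetical.

```python
def is_cnf(variables, terminals, rules, start):
    """rules: iterable of (head, body) pairs, where body is a tuple of symbols."""
    for head, body in rules:
        if head not in variables:
            return False
        if len(body) == 0:
            if head != start:                # only S → ε is permitted
                return False
        elif len(body) == 1:
            if body[0] not in terminals:     # A → a
                return False
        elif len(body) == 2:
            b, c = body
            if b not in variables or c not in variables:
                return False                 # A → BC needs two variables
            if start in (b, c):              # B ≠ S ≠ C
                return False
        else:
            return False                     # bodies longer than two symbols
    return True

# S → ASA | aA, A → b | ε is not in Chomsky normal form.
not_cnf = [("S", ("A", "S", "A")), ("S", ("a", "A")), ("A", ("b",)), ("A", ())]
print(is_cnf({"S", "A"}, {"a", "b"}, not_cnf, "S"))

# Hypothetical grammar that is: S → AB | ε, A → a, B → b.
ok = [("S", ("A", "B")), ("S", ()), ("A", ("a",)), ("B", ("b",))]
print(is_cnf({"S", "A", "B"}, {"a", "b"}, ok, "S"))
```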
Chomsky normal form
Theorem 2.9: Any context-free language is generated by a context-free grammar in Chomsky normal form.
Proof:
1. Make sure S appears only on the left-hand side of rules (add a new start variable S0 with the rule S0 → S)
2. Remove empty rules: A → ε
3. Handle unit rules: A → B
4. Fix all the rest…
For instance:
– S →_G ASA | aA
– A →_G b | ε
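The four steps can be sketched in code and run on the example grammar. This is one possible implementation under stated assumptions, not the proof's exact procedure: it removes ε-rules by first computing the set of nullable variables rather than deleting rules one at a time, it assumes every variable appears as a key of `rules`, and the generated names (`S0`, `U_a`, `X1`, ...) are assumed not to clash with existing symbols.

```python
from itertools import product

def to_cnf(rules, start, terminals):
    """rules: dict mapping each variable to a set of bodies (tuples of symbols)."""
    # Step 1: new start variable, so the start never appears on a right-hand side.
    new_start = start + "0"
    rules = {a: set(bodies) for a, bodies in rules.items()}
    rules[new_start] = {(start,)}

    # Step 2: remove ε-rules. First find the variables that can derive ε.
    nullable, changed = set(), True
    while changed:
        changed = False
        for a, bodies in rules.items():
            if a not in nullable and any(all(x in nullable for x in b) for b in bodies):
                nullable.add(a)
                changed = True
    no_eps = {}
    for a, bodies in rules.items():
        no_eps[a] = set()
        for b in bodies:
            # Each nullable symbol may be kept or dropped.
            options = [((x,), ()) if x in nullable else ((x,),) for x in b]
            for pick in product(*options):
                body = tuple(s for part in pick for s in part)
                if body:
                    no_eps[a].add(body)
    if new_start in nullable:
        no_eps[new_start].add(())           # the one permitted ε-rule

    # Step 3: remove unit rules A → B by copying B's non-unit bodies to A.
    def is_unit(body):
        return len(body) == 1 and body[0] in no_eps
    no_unit = {}
    for a in no_eps:
        reach, stack = {a}, [a]
        while stack:
            for body in no_eps[stack.pop()]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        no_unit[a] = {b for v in reach for b in no_eps[v] if not is_unit(b)}

    # Step 4: in long bodies, replace terminals by variables and split into pairs.
    cnf, fresh = {a: set() for a in no_unit}, 0
    for a, bodies in no_unit.items():
        for body in bodies:
            if len(body) < 2:
                cnf[a].add(body)
                continue
            syms = []
            for x in body:
                if x in terminals:
                    cnf.setdefault("U_" + x, set()).add((x,))
                    syms.append("U_" + x)
                else:
                    syms.append(x)
            head = a
            while len(syms) > 2:
                fresh += 1
                cnf[head].add((syms[0], f"X{fresh}"))
                head, syms = f"X{fresh}", syms[1:]
                cnf.setdefault(head, set())
            cnf[head].add(tuple(syms))
    return cnf, new_start

example = {"S": {("A", "S", "A"), ("a", "A")}, "A": {("b",), ()}}
cnf, s0 = to_cnf(example, "S", {"a", "b"})
for head in sorted(cnf):
    print(head, "→", " | ".join(" ".join(b) for b in sorted(cnf[head])))
```

Worked by hand, the example becomes S0 → A A1 | SA | AS | UA | a, with the same bodies for S, plus A → b, A1 → SA, and U → a. The sketch yields the same rules up to the names of the helper variables, except that it introduces a separate helper for each long body, so the S A pair is created once for S and once for S0.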
Balanced Brackets
The grammar G = (V, Σ, R, S), where
– V = {S}
– Σ = {[, ]}
– R = { S →_G ε, S →_G SS, S →_G [S] }
generates all strings of balanced brackets.
Is the language L(G) regular?
– Why/why not?
Recognizing Context-Free Languages
Grammars are language generators. It is not immediately clear how they might be used as language recognizers.
The language L(G) of balanced brackets is not regular. It cannot be recognized by a finite state automaton. However, it is very similar to the BEGIN…END blocks of many procedural languages and, therefore, must be recognizable by some compiler or interpreter!
Auxiliary storage
We could recognize the language L(G) of balanced brackets by reading left to right, if we could remember left brackets along the way.
[[][[]]]
Each right bracket must match some left bracket seen along the way.
Pushdown Automata
The last left bracket seen matches the first right bracket. This suggests a stack storage mechanism.
[Figure: a finite control with a reading head over the input [[[[]][[[[]]]]]], connected to a stack (pushdown store) that holds left brackets above the bottom marker $.]
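The stack idea can be tried out directly before any formal machinery. This is a minimal sketch that uses a Python list as the pushdown store; the function name `balanced` is illustrative.

```python
def balanced(s):
    """Read left to right, pushing left brackets and popping on right brackets."""
    stack = []
    for ch in s:
        if ch == "[":
            stack.append(ch)       # remember the left bracket
        elif ch == "]":
            if not stack:          # right bracket with nothing to match
                return False
            stack.pop()            # matches the most recent left bracket
        else:
            return False           # symbol outside the alphabet {[, ]}
    return not stack               # every left bracket must have been matched

print(balanced("[[][[]]]"), balanced("[]]["))
```

Since only one kind of symbol is ever pushed, a counter would do for this particular language; the stack becomes necessary once several bracket types have to be matched.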
Describing a pushdown machine
Formally…
A pushdown automaton is a sextuple M = (Q, Σ, Γ, δ, q0, F), where
– Q is a finite set of states
– Σ is a finite alphabet (the input symbols)
– Γ is a finite alphabet (the stack symbols)
– δ: Q × Σ_ε × Γ_ε → P(Q × Γ_ε) is the transition function, where Σ_ε = Σ ∪ {ε} and Γ_ε = Γ ∪ {ε}
– q0 ∈ Q is the initial state, and
– F ⊆ Q is the set of accept states
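The formal definition can be exercised with a small simulator that searches over configurations (state, input position, stack). This is a sketch under stated assumptions: ε is encoded as the empty string, δ is a dictionary from (state, input symbol or ε, stack symbol or ε) to a set of (state, pushed symbol or ε) pairs, and the stack height is capped so that the search always terminates. The cap suits the bracket machine below but could cause wrong rejections for other PDAs. The state names and the bracket PDA itself are illustrative.

```python
from collections import deque

EPS = ""   # stands for ε

def accepts(delta, q0, accept, w, max_stack=None):
    """Breadth-first search over configurations of a nondeterministic PDA."""
    if max_stack is None:
        max_stack = len(w) + 2          # assumed cap, to keep the search finite
    start = (q0, 0, ())
    seen, queue = {start}, deque([start])
    while queue:
        q, i, stack = queue.popleft()
        if i == len(w) and q in accept:
            return True
        reads = {EPS, w[i]} if i < len(w) else {EPS}
        pops = {EPS, stack[-1]} if stack else {EPS}
        for a in reads:
            for x in pops:
                for r, y in delta.get((q, a, x), ()):
                    new_stack = stack[:-1] if x != EPS else stack
                    if y != EPS:
                        new_stack = new_stack + (y,)
                    cfg = (r, i + (1 if a != EPS else 0), new_stack)
                    if len(new_stack) <= max_stack and cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

# Illustrative PDA for balanced brackets, with $ marking the stack bottom.
delta = {
    ("q0", EPS, EPS): {("q1", "$")},    # push the bottom marker
    ("q1", "[", EPS): {("q1", "[")},    # push each left bracket
    ("q1", "]", "["): {("q1", EPS)},    # pop one on each right bracket
    ("q1", EPS, "$"): {("q2", EPS)},    # stack back at the marker: move to accept
}
print(accepts(delta, "q0", {"q2"}, "[[][[]]]"), accepts(delta, "q0", {"q2"}, "[]]"))
```

The machine accepts only when it sits in an accept state with the whole input read, which is why reaching q2 early on an input such as "[]]" does not count.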