CFGs: Formal Definition
  G = (V, Σ, P, S)
    V = variables, a finite set
    Σ = alphabet or terminals, a finite set
    P = productions, a finite set
    S = start variable, S ∈ V
  Productions' form, where A ∈ V, α ∈ (V ∪ Σ)*:
    A → α
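A minimal sketch of the 4-tuple as a data structure, to make the side conditions concrete. The names `Grammar`, `check`, and `EXAMPLE` are hypothetical, and the requirement that V and Σ be disjoint is the usual textbook assumption rather than something stated on the slide. The sample grammar is the one used on the later slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset    # V
    terminals: frozenset    # Σ
    productions: tuple      # P: pairs (A, α), with α a tuple of symbols
    start: str              # S

    def check(self):
        """Return True, or raise ValueError if the definition is violated."""
        if self.variables & self.terminals:
            # Assumption: V and Σ are disjoint (standard, not on the slide).
            raise ValueError("V and Σ must be disjoint")
        if self.start not in self.variables:
            raise ValueError("start variable must be in V")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            if head not in self.variables:
                raise ValueError(f"head {head!r} is not a variable")
            if any(sym not in symbols for sym in body):
                raise ValueError(f"body of {head!r} uses an unknown symbol")
        return True


# S → A | AB,  A → ε | a | Ab | AA,  B → b | bc | Bc | bB
# The empty tuple stands for ε.
EXAMPLE = Grammar(
    variables=frozenset("SAB"),
    terminals=frozenset("abc"),
    productions=tuple(
        (head, tuple(body))
        for head, body in [
            ("S", "A"), ("S", "AB"),
            ("A", ""), ("A", "a"), ("A", "Ab"), ("A", "AA"),
            ("B", "b"), ("B", "bc"), ("B", "Bc"), ("B", "bB"),
        ]
    ),
    start="S",
)

print(EXAMPLE.check())
```

Storing each body as a tuple keeps α a string over V ∪ Σ even when symbols are longer than one character.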
CFGs: Derivations
  Derivations in one step:
    βAγ ⇒_G βαγ, where A → α ∈ P
    Notation: x ∈ Σ*; α, β, γ ∈ (V ∪ Σ)*
    Can choose any variable for use in a derivation step.
  Derivations in zero-or-more steps:
    ⇒_G* is the reflexive and transitive closure of ⇒_G.
  Language of a grammar:
    L(G) = {x ∈ Σ* | S ⇒_G* x}
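The one-step relation and its closure can be sketched directly, assuming every symbol is a single character so that a sentential form is an ordinary string. The helper names `step`, `reachable`, and `sentences_within` are hypothetical, and the grammar is the sample grammar from the later slides. The closure is cut off after n steps, so this only approximates L(G) from below.

```python
# S → A | AB,  A → ε | a | Ab | AA,  B → b | bc | Bc | bB
# The empty string stands for ε.
PRODUCTIONS = {
    "S": ["A", "AB"],
    "A": ["", "a", "Ab", "AA"],
    "B": ["b", "bc", "Bc", "bB"],
}


def step(form):
    """All forms βαγ with form = βAγ and A → α in P (any variable may be chosen)."""
    out = set()
    for i, sym in enumerate(form):
        for body in PRODUCTIONS.get(sym, []):
            out.add(form[:i] + body + form[i + 1:])
    return out


def reachable(n, start="S"):
    """Sentential forms derivable from `start` in at most n steps."""
    seen = {start}              # zero steps: the reflexive part of the closure
    frontier = {start}
    for _ in range(n):
        frontier = {nxt for form in frontier for nxt in step(form)} - seen
        seen |= frontier
    return seen


def sentences_within(n):
    """Terminal strings derivable in at most n steps: a finite slice of L(G)."""
    return {f for f in reachable(n) if all(ch not in PRODUCTIONS for ch in f)}


print("aabb" in sentences_within(6))
```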
Parse Trees
  Grammar:
    S → A | A B
    A → ε | a | A b | A A
    B → b | b c | B c | b B
  Sample derivations:
    S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
    S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
  These two derivations use the same productions, but in different orders.
  [Figure: parse tree with root S, interior nodes A and B, and leaves a and b]
  Root label = start variable.
  Each interior label = variable.
  Each parent/child relation = derivation step.
  Each leaf label = terminal or ε.
  All leaf labels together, read left to right = derived string = yield.
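The tree properties listed above can be checked mechanically. The following is a small sketch; `Node`, `tree_yield`, and `is_parse_tree` are hypothetical names, and ε is represented as a leaf labelled "ε".

```python
from dataclasses import dataclass, field

# Production bodies as tuples of labels; ("ε",) is the ε-production.
PRODUCTIONS = {
    "S": {("A",), ("A", "B")},
    "A": {("ε",), ("a",), ("A", "b"), ("A", "A")},
    "B": {("b",), ("b", "c"), ("B", "c"), ("b", "B")},
}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels from left to right, dropping ε."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


def is_parse_tree(node, root=True):
    """Root is S, interior labels are variables, each parent/child step is a production."""
    if root and node.label != "S":
        return False
    if not node.children:
        return node.label not in PRODUCTIONS    # a leaf is a terminal or ε
    body = tuple(child.label for child in node.children)
    return body in PRODUCTIONS.get(node.label, set()) and all(
        is_parse_tree(child, root=False) for child in node.children
    )


# The tree shared by both sample derivations of aabb.
TREE = Node("S", [
    Node("A", [Node("A", [Node("a")]), Node("A", [Node("a")])]),
    Node("B", [Node("b"), Node("B", [Node("b")])]),
])

print(is_parse_tree(TREE), tree_yield(TREE))  # True aabb
```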
Left- & Rightmost Derivations
  Grammar:
    S → A | A B
    A → ε | a | A b | A A
    B → b | b c | B c | b B
  Sample derivations:
    S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
    S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
  [Figure: parse tree with root S, interior nodes A and B, and leaves a and b]
  These two derivations are special.
    1st derivation is leftmost. Always picks leftmost variable.
    2nd derivation is rightmost. Always picks rightmost variable.
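Both sample derivations can be read off the same parse tree by fixing which variable is expanded next. This is a sketch with hypothetical names (`TREE`, `render`, `derivation`), representing a tree as a pair (label, children) in which leaves have an empty child list.

```python
# Parse tree of aabb: S → AB, A → AA, A → a (twice), B → bB, B → b.
TREE = ("S", [
    ("A", [("A", [("a", [])]), ("A", [("a", [])])]),
    ("B", [("b", []), ("B", [("b", [])])]),
])


def render(form):
    """Write a list of tree nodes as a sentential form, dropping ε leaves."""
    return "".join(label for label, _ in form if label != "ε")


def derivation(tree, leftmost=True):
    """Expand the leftmost (or rightmost) unexpanded variable until none remain."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, (_, kids) in enumerate(form) if kids]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]      # replace the variable by its children
        steps.append(render(form))


print(" ⇒ ".join(derivation(TREE, leftmost=True)))
# S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
print(" ⇒ ".join(derivation(TREE, leftmost=False)))
# S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
```

Each tree yields exactly one leftmost and one rightmost derivation, which is why proofs and parsers can restrict attention to either kind.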
Left / Rightmost Derivations
  In proofs…
    Restrict attention to left- or rightmost derivations.
  In parsing algorithms…
    E.g., recursive descent uses leftmost; yacc uses rightmost (in reverse).
Derivation Trees
  Grammar:
    S → A | A B
    A → ε | a | A b | A A
    B → b | b c | B c | b B
  w = aabb
  Other derivation trees for this string?
  [Figure: three derivation trees for w = aabb, one of them with an ε leaf]
  Infinitely many others possible.
Defining ambiguity of grammar, not language.
  A CFG is ambiguous ⟺ any of the following equivalent statements holds:
    ∃ string w with multiple derivation trees.
    ∃ string w with multiple leftmost derivations.
    ∃ string w with multiple rightmost derivations.
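A single string with more than one derivation tree is enough to witness ambiguity. The sketch below counts derivation trees for the expression grammar Exp → n | Exp + Exp | Exp × Exp from the disambiguation example; `count_trees` is a hypothetical helper that works only for this one grammar, and each token is assumed to be a single character.

```python
from functools import lru_cache


def count_trees(tokens):
    """Number of derivation trees of `tokens` under Exp → n | Exp + Exp | Exp × Exp."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees that derive tokens[i:j] from Exp.
        if j - i == 1:
            return 1 if tokens[i] == "n" else 0
        total = 0
        for k in range(i + 1, j - 1):           # position of the top-level operator
            if tokens[k] in ("+", "×"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_trees("n+n"))      # 1
print(count_trees("n+n×n"))    # 2, so the grammar is ambiguous
print(count_trees("n+n+n+n"))  # 5
```

Because derivation trees correspond one-to-one with leftmost derivations (and with rightmost derivations), the same count applies to all three statements.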
Ambiguity & Disambiguation
  Given an ambiguous grammar, would like an equivalent unambiguous grammar.
    Allows more knowledge about structure of derivation.
    Simplifies inductive proofs on derivations.
    Can lead to more efficient parsing algorithms.
  In programming languages, want to impose a canonical structure on derivations.
    E.g., for 1+2×3.
  Strategy: Force an ordering on all derivations.
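To see why a canonical structure matters for 1+2×3, the sketch below evaluates every derivation tree the ambiguous grammar Exp → n | Exp + Exp | Exp × Exp allows. `all_values` is a hypothetical helper, and it assumes a well-formed token list that alternates integers and operators.

```python
def all_values(tokens):
    """Value of each derivation tree of `tokens`, one entry per tree."""
    if len(tokens) == 1:
        return [tokens[0]]
    values = []
    for k in range(1, len(tokens), 2):          # operators sit at odd positions
        for left in all_values(tokens[:k]):
            for right in all_values(tokens[k + 1:]):
                if tokens[k] == "+":
                    values.append(left + right)
                else:                           # "×"
                    values.append(left * right)
    return values


print(all_values([1, "+", 2, "×", 3]))  # [7, 9]
```

The two trees give 1+(2×3) = 7 and (1+2)×3 = 9, so a compiler needs a grammar that admits only the intended one.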
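Disambiguation Example
  Exp → n | Exp + Exp | Exp × Exp
  What is an equivalent unambiguous grammar?
    Exp → Term | Term + Exp
    Term → n | n × Term
  Uses
    operator precedence
    associativity (these right-recursive rules group to the right; left-associativity needs left-recursive rules such as Exp → Exp + Term)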
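The unambiguous grammar maps directly onto a recursive descent parser, one function per variable. This is a sketch under assumptions: tokens are Python integers and the strings "+" and "×", trees are nested tuples, and `parse` and `evaluate` are hypothetical names. Each function parses the common prefix first and then looks at one token to pick the alternative.

```python
def parse(tokens):
    """Parse with Exp → Term | Term + Exp and Term → n | n × Term."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        nonlocal pos
        n = peek()
        if not isinstance(n, int):
            raise SyntaxError(f"expected a number at position {pos}")
        pos += 1
        if peek() == "×":                   # Term → n × Term
            pos += 1
            return ("×", n, term())
        return n                            # Term → n

    def exp():
        nonlocal pos
        left = term()
        if peek() == "+":                   # Exp → Term + Exp
            pos += 1
            return ("+", left, exp())
        return left                         # Exp → Term

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) * evaluate(right)


print(parse([1, "+", 2, "×", 3]))            # ('+', 1, ('×', 2, 3))
print(evaluate(parse([1, "+", 2, "×", 3])))  # 7
print(parse([1, "+", 2, "+", 3]))            # ('+', 1, ('+', 2, 3))
```

The × always ends up below the + in the tree, which is the precedence. Repeated operators nest to the right because both rules recurse on their right-hand operand.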
Parsing Designations
  Major parsing algorithm classes are LL and LR.
    The first letter indicates the order in which the input is read: L means left to right.
    The second letter indicates the derivation that is built: L = leftmost derivation (top down), R = rightmost derivation in reverse (bottom up).
    The k of LL(k) or LR(k) is the number of input symbols of lookahead used during parsing.
  Power of parsing techniques (as classes of grammars):
    LL(k) < LR(k)
    LL(n) < LL(n+1), LR(n) < LR(n+1)
  Choice of LL or LR largely religious.