CS466(Prasad)L7Parse1 Parsing: recognition of strings in a language
Graph of a Grammar
Represents the leftmost derivations of a CFG.
– Nodes: left sentential forms
– Arc labels: production rules
– Root: start symbol
– Leaves: sentences
– Path: a path from node S to a node w is a leftmost derivation of w.
Figure: graph of a grammar rooted at S. Node labels: S aSbB aaSabB abaB bbS bbC …
Properties of Graph of a Grammar
– Every node has a finite number of children, so a simple breadth-first enumeration is feasible.
– The number of leaves is infinite if the language is infinite (the typical case).
– There can be infinitely long paths (derivations), which cause loops in depth-first traversals.
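The breadth-first enumeration mentioned above can be sketched in a few lines. This is a minimal illustration, not the course's code: the grammar (S → A, A → T | A+T, T → b | (A)) is an assumption inferred from the sentential forms in the parsing examples, and the names `GRAMMAR`, `children` and `levels` are hypothetical. Nonterminals are taken to be the dictionary keys; every other character is a terminal.

```python
# Assumed example grammar: S -> A, A -> T | A+T, T -> b | (A)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def children(form, grammar=GRAMMAR):
    """Return the forms reachable from `form` by one leftmost derivation step."""
    for i, symbol in enumerate(form):
        if symbol in grammar:  # leftmost nonterminal found
            return [form[:i] + rhs + form[i + 1:] for rhs in grammar[symbol]]
    return []  # no nonterminal left: `form` is a sentence, i.e. a leaf

def levels(start="S", depth=3, grammar=GRAMMAR):
    """Enumerate the graph breadth-first, one list of nodes per level."""
    result = [[start]]
    for _ in range(depth):
        result.append([c for form in result[-1] for c in children(form, grammar)])
    return result

print(levels()[3])  # ['b', '(A)', 'T+T', 'A+T+T']
```

Each node has at most as many children as its leftmost nonterminal has rules, which is why every level is finite even though the left-recursive rule A → A+T makes some paths infinitely long.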
Figure: graph of a grammar that forms a directed acyclic graph (illustrates ambiguity in the grammar). Node labels: S aSSb aaSaab aSb abb Sbb … ab
Figure: graph of a grammar with a cyclic structure (illustrates an ambiguous grammar with cycles). Node labels: S SS SSS
Parser
A program that determines whether a string w is in L(G) by constructing a derivation. Equivalently, it searches the graph of G.
– Top-down parsers: construct the derivation tree from root to leaves (leftmost derivation).
– Bottom-up parsers: construct the derivation tree from leaves to root (rightmost derivation in reverse).
Derivation Trees: leftmost derivation
Figure: sequence of derivation trees grown from the root S down to the leaves a and b.
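Derivation Trees: rightmost derivation and rightmost derivation in reverse
Figure: two sequences of derivation trees for the same string, one following the rightmost derivation from the root S, the other building the tree from the leaves a and b up to S.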
Top-down parsers: Breadth-first vs Depth-first
– Search the graph of a grammar breadth-first. Uses: queue.
  (+) Always terminates with a shortest derivation when the string is in the language.
  (-) Inefficient in general.
– Search the graph of a grammar depth-first. Uses: stack.
  (-) Can get into infinite loops (e.g., on left recursion).
  (+) Efficient in general.
Determining when a sentential form cannot derive w
– The number of terminals in the sentential form is greater than the length of w.
– The prefix of the sentential form preceding the leftmost non-terminal is not a prefix of w.
– No rules are applicable to the sentential form.
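The three tests above can be written as a single predicate. This is a sketch under stated assumptions: sentential forms are plain strings, nonterminals are the keys of a grammar dictionary, and the function name `is_dead_end` and the example grammar are hypothetical. The third test is read here as "the form has no nonterminal left to expand, or its leftmost nonterminal has no rules".

```python
# Assumed example grammar: S -> A, A -> T | A+T, T -> b | (A)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def is_dead_end(form, w, grammar=GRAMMAR):
    """Return True if the left sentential form `form` cannot lead to `w`."""
    # Test 1: terminals are never erased, so too many terminals is fatal.
    n_terminals = sum(1 for c in form if c not in grammar)
    if n_terminals > len(w):
        return True
    # Test 2: the terminal prefix before the leftmost nonterminal is final.
    i = next((k for k, c in enumerate(form) if c in grammar), len(form))
    if not w.startswith(form[:i]):
        return True
    # Test 3: nothing left to expand, yet the form is not w itself.
    if i == len(form):
        return form != w
    return not grammar[form[i]]  # leftmost nonterminal has no rules

print(is_dead_end("(A)", "b+b"))        # True: "(" is not a prefix of "b+b"
print(is_dead_end("A+T+T+T+T", "b+b"))  # True: 4 terminals, but len(w) == 3
print(is_dead_end("b+T", "b+b"))        # False: still viable
```

Test 1 is what keeps a search from following a left-recursive rule such as A → A+T forever, since each application adds a terminal.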
Parsing Examples
Breadth-first top-down parser
Queue up left sentential forms level by level.
Figure: search tree of left sentential forms. Levels shown: S; A; T, A+T; b, (A), T+T, A+T+T; (T), (A+T), (b), ((A)), …; T+T+T, A+T+T+T, …. The forms (T)+T, (A)+T, (b)+T and (b)+b lie on the path marked "Parse successful".
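A breadth-first top-down parser along these lines might look as follows. It is only a sketch: the grammar is inferred from the sentential forms in the figure (S → A, A → T | A+T, T → b | (A)), and `bf_parse` and `viable` are hypothetical names. The queue holds whole derivations so that the successful one can be returned, and forms with too many terminals or a wrong terminal prefix are dropped before being queued.

```python
from collections import deque

# Assumed grammar, inferred from the figure: S -> A, A -> T | A+T, T -> b | (A)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def viable(form, w, grammar):
    """Keep a form only if its terminal count and terminal prefix fit w."""
    n_terminals = sum(1 for c in form if c not in grammar)
    i = next((k for k, c in enumerate(form) if c in grammar), len(form))
    return n_terminals <= len(w) and w.startswith(form[:i])

def bf_parse(w, grammar=GRAMMAR, start="S"):
    """Return a leftmost derivation of w as a list of forms, or None."""
    queue = deque([[start]])  # FIFO queue of partial derivations
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == w:
            return path
        i = next((k for k, c in enumerate(form) if c in grammar), None)
        if i is None:
            continue  # a sentence other than w: nothing to expand
        for rhs in grammar[form[i]]:
            child = form[:i] + rhs + form[i + 1:]
            if viable(child, w, grammar):
                queue.append(path + [child])
    return None

print(bf_parse("(b)+b"))
# ['S', 'A', 'A+T', 'T+T', '(A)+T', '(T)+T', '(b)+T', '(b)+b']
```

For this grammar the pruning keeps the queue finite, so `bf_parse("b+")` returns `None` instead of running forever. That is a property of this grammar, where every rule that adds a nonterminal also adds a terminal, and is not guaranteed for arbitrary grammars.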
Depth-first top-down parser
Use a stack to pursue an entire path from the left. Backtrack on failure.
Figure: the same search tree of left sentential forms (S; A; T, A+T; b, (A), T+T, A+T+T; (T), (A+T), (b), ((A)), …; T+T+T, A+T+T+T, …), with a failing path marked "Parse fails".
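The depth-first strategy can be sketched with recursion, letting Python's call stack play the role of the parser's stack. As before, the grammar is an assumption inferred from the figure, and `df_parse` and `viable` are hypothetical names. Rules are tried in the order listed; when a branch fails, the function returns `None` and the caller moves on to the next rule, which is the backtracking step.

```python
# Assumed grammar, inferred from the figure: S -> A, A -> T | A+T, T -> b | (A)
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}

def viable(form, w, grammar):
    """Keep a form only if its terminal count and terminal prefix fit w."""
    n_terminals = sum(1 for c in form if c not in grammar)
    i = next((k for k, c in enumerate(form) if c in grammar), len(form))
    return n_terminals <= len(w) and w.startswith(form[:i])

def df_parse(w, form="S", grammar=GRAMMAR):
    """Return a leftmost derivation of w starting at `form`, or None."""
    if form == w:
        return [form]
    i = next((k for k, c in enumerate(form) if c in grammar), None)
    if i is None:
        return None  # a sentence other than w: this path fails
    for rhs in grammar[form[i]]:  # try the rules in order
        child = form[:i] + rhs + form[i + 1:]
        if not viable(child, w, grammar):
            continue
        rest = df_parse(w, child, grammar)
        if rest is not None:
            return [form] + rest  # first success wins
    return None  # every rule failed: backtrack to the caller

print(df_parse("(b)+b"))
# ['S', 'A', 'A+T', 'T+T', '(A)+T', '(T)+T', '(b)+T', '(b)+b']
```

Without the `viable` check, the left-recursive rule A → A+T would let the search descend through A+T, A+T+T, A+T+T+T and so on without ever returning; the terminal-count test is what bounds the recursion in this sketch.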
Summary
– In the breadth-first top-down (BFTD) version, all leftmost derivations are investigated in parallel.
– In the depth-first top-down (DFTD) version, one specific derivation is pursued to completion. Done if it succeeds; otherwise, backtrack and investigate another path. (An incomplete strategy; used by the Prolog interpreter.)
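Bottom-up parsing
Figure: search tree of reductions starting from the input (b)+b. Forms shown: (b)+b; (T)+b, (b)+T; (T)+T; (b)+A, (T)+T, …; (A)+b; (A)+T, (S)+b, T+b; A+b; A+T; S; A. One branch is marked "Not allowed"; the path that reaches S is marked "Parse successful".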
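A bottom-up search over reductions can be sketched as below. This is an illustrative search, not a practical shift-reduce parser: the rule list is inferred from the forms in the figure, and `RULES`, `reductions` and `bottom_up_parse` are hypothetical names. One assumption is made explicit in the code: a reduction is only taken when everything to its right is terminals, so that the steps, read backwards, form a rightmost derivation.

```python
from collections import deque

# Assumed rules (left-hand side, right-hand side), inferred from the figure.
RULES = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]
NONTERMINALS = {lhs for lhs, _ in RULES}

def reductions(form):
    """Yield every form obtained by replacing one right-hand side with its
    left-hand side, provided only terminals follow the replaced substring."""
    for lhs, rhs in RULES:
        start = form.find(rhs)
        while start != -1:
            suffix = form[start + len(rhs):]
            if not any(c in NONTERMINALS for c in suffix):
                yield form[:start] + lhs + suffix
            start = form.find(rhs, start + 1)

def bottom_up_parse(w, start="S"):
    """Return the sequence of forms from w up to the start symbol, or None."""
    queue = deque([[w]])
    seen = {w}  # reductions never lengthen a form, so the search is finite
    while queue:
        path = queue.popleft()
        if path[-1] == start:
            return path
        for nxt in reductions(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(bottom_up_parse("(b)+b"))
# ['(b)+b', '(T)+b', '(A)+b', 'T+b', 'A+b', 'A+T', 'A', 'S']
```

Reading the returned list from right to left gives the rightmost derivation S ⇒ A ⇒ A+T ⇒ A+b ⇒ T+b ⇒ (A)+b ⇒ (T)+b ⇒ (b)+b. Forms such as (b)+T and (S)+b are still explored, but they lead to dead ends.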
Practical Parsers
The language/grammar is designed to enable deterministic (directed and backtrack-free) searches. Such a parser uses lookahead tokens and/or exploits the context in the sentential form constructed so far.
“Look before you leap.” vs “Procrastination principle.”
– Top-down parsers: LL(k) languages. E.g., Pascal, Ada, etc. Better error diagnosis and recovery.
– Bottom-up parsers: LALR(1), LR(k) languages. E.g., C/C++, Java, etc. Handles left recursion in the grammar.
– Backtracking parsers. E.g., Prolog interpreter.
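To make the "look before you leap" idea concrete, here is a minimal recursive-descent recognizer that decides every step from a single lookahead character and never backtracks. It is a sketch under assumptions: the expression grammar is the one inferred from the parsing examples, and its left-recursive rule A → A+T is rewritten here as the loop A → T (+ T)*, a standard transformation that the slides do not show. The names `accepts`, `parse_A` and `parse_T` are hypothetical.

```python
def accepts(text):
    """Recognize the language of A -> T (+ T)*, T -> b | ( A )."""
    pos = 0  # index of the lookahead character

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_A():
        parse_T()
        while peek() == "+":  # the lookahead decides whether to continue
            expect("+")
            parse_T()

    def parse_T():
        if peek() == "b":  # the lookahead selects the rule T -> b
            expect("b")
        elif peek() == "(":  # the lookahead selects the rule T -> ( A )
            expect("(")
            parse_A()
            expect(")")
        else:
            raise SyntaxError(f"unexpected {peek()!r} at position {pos}")

    try:
        parse_A()
        return pos == len(text)  # the whole input must be consumed
    except SyntaxError:
        return False

print(accepts("(b)+b"))  # True
print(accepts("b+"))     # False
```

Because each function knows exactly which position it was examining when it gave up, the error message can point at the offending character, which hints at why top-down parsers are credited with better error diagnosis.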