1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016
2 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm
3 Derivation Trees Illustrate the derivation of a certain sentence from a grammar Different derivation trees with different derivation orders
4 Derivation Order Consider the following example grammar with 5 productions:
5 Leftmost derivation order of string : At each step, we substitute the leftmost variable
6 Rightmost derivation order of string : At each step, we substitute the rightmost variable
7 Rightmost derivation of : Leftmost derivation of :
8 Derivation Trees Consider the same example grammar: And a derivation of :
9 yield
10 yield
11 yield
12 yield
13 yield Derivation Tree (parse tree)
14 Give same derivation tree Sometimes, derivation order doesn’t matter Leftmost derivation: Rightmost derivation:
15 Ambiguity A grammar can have multiple parse trees that derive the same sentence. Inherent ambiguity and non-inherent ambiguity
16 Grammar for mathematical expressions Example strings: Denotes any number
17 A leftmost derivation for
18 Another leftmost derivation for
19 Two derivation trees for
20 take
21 Good Tree / Bad Tree Compute expression result using the tree
22 Two different derivation trees may cause problems in applications which use the derivation trees: Evaluating expressions In general, in compilers for programming languages
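The effect on expression evaluation can be sketched in a few lines. The expression `2 + 2 * 2` and the tuple encoding of trees below are assumptions made for illustration only (the example string on the slides did not survive extraction); the point is just that two parse trees for the same string can evaluate to different results.

```python
def evaluate(tree):
    """Evaluate a parse tree given as a number or a tuple (op, left, right)."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    left_value, right_value = evaluate(left), evaluate(right)
    return left_value + right_value if op == "+" else left_value * right_value


# Two hypothetical parse trees for the same string "2 + 2 * 2".
tree_mul_first = ("+", 2, ("*", 2, 2))   # multiplication grouped first
tree_add_first = ("*", ("+", 2, 2), 2)   # addition grouped first

print(evaluate(tree_mul_first))  # 6
print(evaluate(tree_add_first))  # 8
```

A compiler that builds the second tree would compute a result that disagrees with the usual precedence rules, which is why the ambiguity matters in practice.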
23 Ambiguous Grammar: A context-free grammar is ambiguous if there is a string which has: two different derivation trees or two leftmost derivations (Two different derivation trees give two different leftmost derivations and vice-versa)
24 Example: this grammar is ambiguous since string has two derivation trees
25 This grammar is also ambiguous because string has two leftmost derivations
26 Another ambiguous grammar: IF_STMT → if EXPR then STMT | if EXPR then STMT else STMT Variables: IF_STMT, EXPR, STMT Terminals: if, then, else Very common piece of grammar in programming languages
27 if expr1 then if expr2 then stmt1 else stmt2 Two derivation trees: in one tree, else stmt2 belongs to the inner if (if expr2 then stmt1 else stmt2); in the other tree, else stmt2 belongs to the outer if (if expr1 then ... else stmt2)
28 In general, ambiguity is bad and we want to remove it Sometimes it is possible to find a non-ambiguous grammar for a language But, in general we cannot do so
29 A successful example: Ambiguous Grammar and Non-Ambiguous Grammar Equivalent: both generate the same language
30 Unique derivation tree for
31 An unsuccessful example: the language is inherently ambiguous: every grammar that generates this language is ambiguous
32 Example (ambiguous) grammar for :
33 The string always has two different derivation trees (for any grammar). For example
34 Ambiguity: Summary A grammar can have multiple parse trees that derive the same sentence Inherently ambiguous language –All grammars for it are ambiguous Non-inherently ambiguous language –There exists at least one grammar for it that is not ambiguous Checking ambiguity of a grammar or a language: undecidable problem
36 Normal Forms Chomsky Normal Forms BNF Normal Forms
37 A context free grammar is said to be in Chomsky Normal Form if all productions are in one of the following forms: A → BC or A → α, where A, B and C are nonterminal symbols and α is a terminal symbol
38 There are three preliminary simplifications: 1 Eliminate useless symbols 2 Eliminate ε productions 3 Eliminate unit productions
39 Eliminate Useless Symbols We determine whether a symbol is useful by checking that it is both generating and reachable. X is generating if X ⇒* ω for some terminal string ω. X is reachable if there is a derivation S ⇒* αXβ for some α and β.
40 Example: Removing non-generating symbols
Initial CFG: S → AB | a, A → b
Identify generating symbols: S, A, a and b are generating; B is not
Remove non-generating: S → a, A → b
41 Example: Removing non-reachable symbols
Identify reachable symbols in S → a, A → b: only S and a are reachable
Eliminate non-reachable: S → a
42 The order is important. Looking first for non-reachable symbols and then for non-generating symbols can still leave some useless symbols. In S → AB | a, A → b every symbol is reachable, so removing the non-generating B afterwards gives S → a, A → b, where A is still useless.
43 Finding generating symbols Every terminal is generating. If there is a production A → α and every symbol of α is already known to be generating, then A is generating. Example: S → AB | a, A → b. We cannot use S → AB because B has not been established to be generating; S is generating through S → a and A through A → b.
44 Finding reachable symbols S is surely reachable. If A is reachable, then all symbols in the bodies of productions with A in the head are reachable. Example: S → AB | a, A → b. In this example the symbols {S, A, B, a, b} are reachable.
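The two fixed-point computations above can be sketched as follows. This is a minimal illustration, not a reference implementation: the representation of productions as `(head, body)` pairs and the function names are assumptions chosen for this example.

```python
def generating_symbols(productions, terminals):
    """Terminals are generating; a head is generating once some body is all generating."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating


def reachable_symbols(productions, start):
    """The start symbol is reachable; so is every symbol in a body of a reachable head."""
    reachable = {start}
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for head, body in productions:
            if head == current:
                for symbol in body:
                    if symbol not in reachable:
                        reachable.add(symbol)
                        worklist.append(symbol)
    return reachable


def remove_useless(productions, terminals, start):
    """Remove non-generating symbols first, then non-reachable ones (the order matters)."""
    gen = generating_symbols(productions, terminals)
    kept = [(h, b) for h, b in productions if h in gen and all(s in gen for s in b)]
    reach = reachable_symbols(kept, start)
    return [(h, b) for h, b in kept if h in reach]


# The example grammar from the slides: S → AB | a, A → b
productions = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
result = remove_useless(productions, {"a", "b"}, "S")
print(result)  # [('S', ('a',))]
```

Running the reachability pass before the generating pass on the same grammar would leave A → b behind, which matches the warning about ordering.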
46 Eliminate ε Productions In a grammar, ε productions are convenient but not essential. If L has a CFG, then L – {ε} has a CFG without ε productions. A variable A is nullable if A ⇒* ε
47 If A is a nullable variable, then whenever A appears in the body of a production, A might or might not derive ε. Example: S → ASA | aB, A → B | S, B → b | ε. Nullable: {A, B}
48 Eliminate ε Productions For each occurrence of a nullable variable in a body, create two versions of the production, one with the nullable variable and one without it. Then eliminate productions with ε bodies.
Before: S → ASA | aB, A → B | S, B → b | ε
After: S → ASA | aB | AS | SA | S | a, A → B | S, B → b
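A minimal sketch of this construction, assuming productions are stored as `(head, body)` pairs with the empty tuple standing for ε (a representation chosen here for illustration, not taken from the slides):

```python
from itertools import product


def nullable_variables(productions):
    """A head is nullable if some body consists only of nullable symbols (ε is the empty body)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    """Expand each body over keep/drop choices for its nullable symbols; drop empty bodies."""
    nullable = nullable_variables(productions)
    result = set()
    for head, body in productions:
        options = [(True, False) if s in nullable else (True,) for s in body]
        for choice in product(*options):
            new_body = tuple(s for s, keep in zip(body, choice) if keep)
            if new_body:  # never emit an ε production
                result.add((head, new_body))
    return result


# S → ASA | aB, A → B | S, B → b | ε
productions = [
    ("S", ("A", "S", "A")), ("S", ("a", "B")),
    ("A", ("B",)), ("A", ("S",)),
    ("B", ("b",)), ("B", ()),
]
print(sorted(nullable_variables(productions)))  # ['A', 'B']
print(sorted(eliminate_epsilon(productions)))
```

The result contains the unit production S → S, just as on the slide; it is removed in the next simplification step.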
52 Eliminate unit productions A unit production is one of the form A → B where both A and B are variables. Identify unit pairs (A, B), where A ⇒* B using only unit productions. If (A, B) is a unit pair and B → ω is a non-unit production, then add A → ω.
53 Example: I → a | b | Ia | Ib | I0 | I1, F → I | (E), T → F | T * F, E → T | E + T, with terminals {*, +, (, ), a, b, 0, 1}
Basis: (A, A) is a unit pair for any variable A, since A ⇒* A in zero steps.
Pair      Productions
(E, E)    E → E + T
(E, T)    E → T * F
(E, F)    E → (E)
(E, I)    E → a | b | Ia | Ib | I0 | I1
(T, T)    T → T * F
(T, F)    T → (E)
(T, I)    T → a | b | Ia | Ib | I0 | I1
(F, F)    F → (E)
(F, I)    F → a | b | Ia | Ib | I0 | I1
(I, I)    I → a | b | Ia | Ib | I0 | I1
54 Example: collecting the productions for each head gives the grammar without unit productions. For instance, the pairs (T, T), (T, F) and (T, I) contribute T → T * F, T → (E) and T → a | b | Ia | Ib | I0 | I1.
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
I → a | b | Ia | Ib | I0 | I1
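The unit-pair construction can be sketched as below. The `(head, body)` representation and the function names are assumptions for this example; the grammar is the one from the slides.

```python
def unit_pairs(productions, variables):
    """Basis: (A, A) for every variable. Induction: (A, B) and B → C give (A, C)."""
    pairs = {(v, v) for v in variables}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                if head == b and len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def eliminate_unit(productions, variables):
    """For each unit pair (A, B), add A → ω for every non-unit production B → ω."""
    result = set()
    for a, b in unit_pairs(productions, variables):
        for head, body in productions:
            is_unit = len(body) == 1 and body[0] in variables
            if head == b and not is_unit:
                result.add((a, body))
    return result


variables = {"E", "T", "F", "I"}
productions = [
    ("I", ("a",)), ("I", ("b",)), ("I", ("I", "a")), ("I", ("I", "b")),
    ("I", ("I", "0")), ("I", ("I", "1")),
    ("F", ("I",)), ("F", ("(", "E", ")")),
    ("T", ("F",)), ("T", ("T", "*", "F")),
    ("E", ("T",)), ("E", ("E", "+", "T")),
]
result = eliminate_unit(productions, variables)
print(len(unit_pairs(productions, variables)))  # 10
print(("E", ("T", "*", "F")) in result)         # True
```

The ten unit pairs are exactly the rows of the table on the slide, and no production of the form A → B remains in the result.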
55 Chomsky Normal Form (CNF) Starting with a CFG on which the preliminary simplifications have been performed: 1. Arrange that all bodies of length 2 or more consist only of variables. 2. Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables.
56 Step 1: For every terminal α that appears in a body of length 2 or more, create a new variable that has α as its only production.
Before:
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
I → a | b | Ia | Ib | I0 | I1
After:
E → EPT | TMF | LER | a | b | IA | IB | IZ | IO
T → TMF | LER | a | b | IA | IB | IZ | IO
F → LER | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO
New variables: A → a, B → b, Z → 0, O → 1, P → +, M → *, L → (, R → )
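57 Step 2: Break bodies of length 3 or more by adding more variables
Before:
E → EPT | TMF | LER | a | b | IA | IB | IZ | IO
T → TMF | LER | a | b | IA | IB | IZ | IO
F → LER | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO
A → a, B → b, Z → 0, O → 1, P → +, M → *, L → (, R → )
New variables: C1 → PT, C2 → MF, C3 → ER
After:
E → EC1 | TC2 | LC3 | a | b | IA | IB | IZ | IO
T → TC2 | LC3 | a | b | IA | IB | IZ | IO
F → LC3 | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO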
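The two steps can be sketched as follows, assuming the preliminary simplifications have already been applied. The fresh names `T_<terminal>` and `C<n>` are hypothetical and are assumed not to clash with existing variables. Unlike the slide, this sketch introduces a new `C` variable for every long body instead of sharing C1, C2 and C3 across heads.

```python
def to_cnf(productions, variables):
    """Convert (head, body) productions to CNF; assumes no ε, unit or useless productions."""
    # Step 1: in bodies of length >= 2, replace each terminal by a dedicated variable.
    terminal_var = {}
    step1 = []
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for symbol in body:
                if symbol in variables:
                    new_body.append(symbol)
                else:
                    terminal_var.setdefault(symbol, f"T_{symbol}")
                    new_body.append(terminal_var[symbol])
            body = tuple(new_body)
        step1.append((head, body))
    for terminal, var in terminal_var.items():
        step1.append((var, (terminal,)))

    # Step 2: break bodies of length >= 3 into a cascade of two-variable bodies.
    result = []
    counter = 0
    for head, body in step1:
        while len(body) > 2:
            counter += 1
            fresh = f"C{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result


# A small fragment of the slide grammar: E → E + T | a
cnf = to_cnf([("E", ("E", "+", "T")), ("E", ("a",))], {"E", "T"})
print(cnf)
# [('E', ('E', 'C1')), ('C1', ('T_+', 'T')), ('E', ('a',)), ('T_+', ('+',))]
```

Here `T_+` plays the role of P on the slide, and `C1` corresponds to C1 → PT.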
59 BNF BNF stands for either Backus-Naur Form or Backus Normal Form BNF is used to describe the grammar of a programming language BNF is formal and precise –BNF is a notation for context-free grammars BNF is essential in compiler construction
60 BNF Angle brackets < > indicate a nonterminal that needs to be further expanded. Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, ( The symbol ::= means is defined as The symbol | means or; it separates alternatives, e.g. ::= + | - This is all there is to “plain” BNF; but we will discuss extended BNF (EBNF) later in this lecture
61 BNF uses recursion ::= | or ::= | Recursion is all that is needed (at least, in a formal sense) "Extended BNF" allows repetition as well as recursion Repetition is usually better when using BNF to construct a compiler
62 BNF Examples I ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ::= if ( ) | if ( ) else
63 BNF Examples II ::= | ::= | + | -
64 BNF Examples III ::= | | ::= { } ::= |
65 BNF Examples IV ::= | |...
66 Extended BNF The following are pretty standard: –[ ] enclose an optional part of the rule Example: ::= if ( ) [ else ] –{ } mean the enclosed can be repeated any number of times (including zero) Example: ::= ( ) | ( {, } )
67 Variations The preceding notation is the original and most common notation –BNF was designed before we had boldface, color, more than one font, etc. –A typical modern variation might: – Use boldface to indicate multi-character terminals –Quote single-character terminals (because boldface isn’t so obvious in this case) Example: –if_statement ::= if "(" condition ")" statement [ else statement ]
68 Limitations of BNF No easy way to impose length limitations, such as maximum length of variable names No easy way to describe ranges, such as 1 to 31 No way at all to impose distributed requirements, such as, a variable must be declared before it is used Describes only syntax, not semantics
70 The CYK Algorithm The membership problem: –Problem: Given a context-free grammar G and a string w –G = (V, ∑,P, S) where »V finite set of variables »∑ (the alphabet) finite set of terminal symbols »P finite set of rules »S start symbol (distinguished element of V) »V and ∑ are assumed to be disjoint –G is used to generate the string of a language –Question: Is w in L(G)?
71 The CYK Algorithm J. Cocke, D. Younger and T. Kasami –Independently developed an algorithm to answer this question.
72 The CYK Algorithm Basics –Relies on the structure of the rules in a Chomsky Normal Form grammar –Uses a “dynamic programming” or “table-filling” algorithm
73 Chomsky Normal Form A normal form is described by a set of conditions that each rule in the grammar must satisfy. A context-free grammar is in CNF if each rule has one of the following forms: –A → BC (at most 2 symbols on right side) –A → a (terminal symbol), or –S → ε (null string) where B, C ∈ V – {S}
74 Construct a Triangular Table Each row corresponds to one length of substrings –Bottom Row – Strings of length 1 –Second from Bottom Row – Strings of length 2. –Top Row – string ‘w’
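75 Construct a Triangular Table X_{i,i} is the set of variables A such that A → w_i is a production of G. For i < j, X_{i,j} is the set of variables A such that A → BC is a production with B in X_{i,k} and C in X_{k+1,j} for some k with i ≤ k < j. To fill X_{i,j}, compare at most n pairs of previously computed sets: (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), …, (X_{i,j-1}, X_{j,j})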
76 Construct a Triangular Table
X_{1,5}
X_{1,4}  X_{2,5}
X_{1,3}  X_{2,4}  X_{3,5}
X_{1,2}  X_{2,3}  X_{3,4}  X_{4,5}
X_{1,1}  X_{2,2}  X_{3,3}  X_{4,4}  X_{5,5}
w_1      w_2      w_3      w_4      w_5
Table for string ‘w’ that has length 5
77 Construct a Triangular Table Looking for pairs to compare [Figure: the triangular table for a string of length 5, with cells X_{1,1} through X_{1,5}]
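The pairs of cells compared when filling one entry can be listed mechanically. The helper below is a hypothetical illustration using 1-based, inclusive indices as on the slides.

```python
def split_pairs(i, j):
    """Index pairs ((i, k), (k + 1, j)) for k = i .. j - 1, i.e. the cells compared for X_{i,j}."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]


print(split_pairs(1, 3))  # [((1, 1), (2, 3)), ((1, 2), (3, 3))]
print(split_pairs(2, 5))  # [((2, 2), (3, 5)), ((2, 3), (4, 5)), ((2, 4), (5, 5))]
```

An entry covering a substring of length m needs m - 1 comparisons, which is where the "at most n pairs" bound comes from.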
78 Example CYK Algorithm Show the CYK Algorithm with the following example: –CNF grammar G: S → AB | BC, A → BA | a, B → CC | b, C → AB | a –w is baaba –Question: Is baaba in L(G)?
79 Constructing The Triangular Table Calculating the bottom row for S → AB | BC, A → BA | a, B → CC | b, C → AB | a
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
80 Constructing The Triangular Table X_{1,2}: compare (X_{i,i}, X_{i+1,j}) = (X_{1,1}, X_{2,2}), so {B}{A, C} = {BA, BC} Steps: –Look for production rules that generate BA or BC –There are two: A → BA and S → BC –X_{1,2} = {S, A}
81 Constructing The Triangular Table
{S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
82 Constructing The Triangular Table X_{2,3}: compare (X_{i,i}, X_{i+1,j}) = (X_{2,2}, X_{3,3}), so {A, C}{A, C} = {AA, AC, CA, CC} = Y Steps: –Look for production rules that generate a string in Y –There is one: B → CC –X_{2,3} = {B}
83 Constructing The Triangular Table
{S, A}  {B}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
84 Constructing The Triangular Table X_{3,4}: compare (X_{i,i}, X_{i+1,j}) = (X_{3,3}, X_{4,4}), so {A, C}{B} = {AB, CB} = Y Steps: –Look for production rules that generate a string in Y –There are two: S → AB and C → AB –X_{3,4} = {S, C}
85 Constructing The Triangular Table
{S, A}  {B}  {S, C}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
86 Constructing The Triangular Table X_{4,5}: compare (X_{i,i}, X_{i+1,j}) = (X_{4,4}, X_{5,5}), so {B}{A, C} = {BA, BC} = Y Steps: –Look for production rules that generate a string in Y –There are two: A → BA and S → BC –X_{4,5} = {S, A}
87 Constructing The Triangular Table
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
88 Constructing The Triangular Table X_{1,3}: compare (X_{i,i}, X_{i+1,j}) and (X_{i,i+1}, X_{i+2,j}) = (X_{1,1}, X_{2,3}) and (X_{1,2}, X_{3,3}), so {B}{B} ∪ {S, A}{A, C} = {BB, SA, SC, AA, AC} = Y Steps: –Look for production rules that generate a string in Y –There are none –X_{1,3} = Ø (the empty set)
89 Constructing The Triangular Table
Ø
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
90 Constructing The Triangular Table X_{2,4}: compare (X_{i,i}, X_{i+1,j}) and (X_{i,i+1}, X_{i+2,j}) = (X_{2,2}, X_{3,4}) and (X_{2,3}, X_{4,4}), so {A, C}{S, C} ∪ {B}{B} = {AS, AC, CS, CC, BB} = Y Steps: –Look for production rules that generate a string in Y –There is one: B → CC –X_{2,4} = {B}
91 Constructing The Triangular Table
Ø  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
92 Constructing The Triangular Table X_{3,5}: compare (X_{i,i}, X_{i+1,j}) and (X_{i,i+1}, X_{i+2,j}) = (X_{3,3}, X_{4,5}) and (X_{3,4}, X_{5,5}), so {A, C}{S, A} ∪ {S, C}{A, C} = {AS, AA, CS, CA, SA, SC, CC} = Y Steps: –Look for production rules that generate a string in Y –There is one: B → CC –X_{3,5} = {B}
93 Constructing The Triangular Table
Ø  {B}  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
94 Final Triangular Table
{S, A, C}                                   (this is X_{1,5})
Ø  {S, A, C}
Ø  {B}  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
- Table for string ‘w’ that has length 5 - The algorithm populates the triangular table
95 Example (Result) Is baaba in L(G)? Yes. We look for S in the set X_{1,n}, where n = 5. In the table, the cell X_{1,5} = {S, A, C}; since S ∈ X_{1,5}, baaba ∈ L(G)
96 Theorem The CYK Algorithm correctly computes X_{i,j} for all i and j; thus w is in L(G) if and only if S is in X_{1,n}. The running time of the algorithm is O(n^3).
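The table-filling procedure can be sketched as below. This is a minimal illustration under stated assumptions: the grammar is already in CNF, productions are `(head, body)` pairs, the input string is non-empty, and the table is a dictionary keyed by 1-based `(i, j)` index pairs. None of these choices come from the slides.

```python
def cyk(word, productions):
    """Fill the CYK table; table[(i, j)] is the set X_{i,j} for the substring w_i .. w_j."""
    n = len(word)
    table = {}
    # Bottom row: variables that produce each single terminal.
    for i, ch in enumerate(word, start=1):
        table[(i, i)] = {head for head, body in productions if body == (ch,)}
    # Longer substrings, shortest first, so every needed cell is already filled.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split point: w_i..w_k and w_{k+1}..w_j
                for head, body in productions:
                    if (len(body) == 2
                            and body[0] in table[(i, k)]
                            and body[1] in table[(k + 1, j)]):
                        cell.add(head)
            table[(i, j)] = cell
    return table


# The example grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
productions = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]
table = cyk("baaba", productions)
print("S" in table[(1, 5)])  # True
```

The three nested loops over the substring length, the start index and the split point give the O(n^3) bound for a fixed grammar.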