Compiler Chapter 5. Context-free Grammar Dept. of Computer Engineering, Hansung University, Sung-Dong Kim
Outline
Context-free grammar
—Specifies the syntactic structure of programming languages
—Admits efficient, well-defined parsing algorithms
Features of context-free grammars
Grammar conversion
Push-down automata
(2011-1) Compiler
1. Introduction (1)
Token structure → regular expression (regular grammar)
Structure of programming languages → context-free grammar
—Simple and easy to understand
—A recognizer can be constructed automatically from the grammar
—Makes translation of the source program easier
1. Introduction (2)
CFG form: N. Chomsky's type 2 grammar
A → α, where A ∈ V_N and α ∈ V*
Notational convention
—Terminal symbols
–Lowercase letters (a, b, c, …) and digits (0, 1, …, 9)
–Operator symbols (+, -), comma, semicolon, parentheses, …
–Symbols enclosed in ‘ ’ (‘if’, ‘then’)
1. Introduction (3)
—Nonterminal symbols
–Uppercase letters (A, B, C, …)
–Start symbol: S
–Symbols enclosed in < >
—Unless stated otherwise, the left-hand nonterminal of the first production is the start symbol
—Alternation: A → α1 and A → α2 are written together as A → α1 | α2
1. Introduction (4)
Other symbols
—X, Y, Z: a terminal or a nonterminal (X, Y, Z ∈ V)
—u, v, z, ω: strings of terminals (ω ∈ V_T*)
—α, β, γ: strings of grammar symbols (α, β, γ ∈ V*)
1. Introduction (5)
Example
E → E OP E | (E) | -E | id
OP → + | - | * | / | ↑
V_N = {E, OP}
V_T = {(, ), -, id, +, *, /, ↑}
In a production that uses keywords such as ‘if’ and ‘then’:
—V_N: the symbols enclosed in < >
—V_T: the symbols enclosed in ‘ ’
2.1 Derivation (1)
Derivation: α1 ⇒ α2
—The process of rewriting from the start symbol until the string is obtained
Definition 5.1
—Leftmost derivation: each step replaces the leftmost nonterminal; every intermediate string is a left-sentential form
—Rightmost derivation: each step replaces the rightmost nonterminal; every intermediate string is a right-sentential form
2.1 Derivation (2)
Example 4: E → E + E | E * E | (E) | a
—Leftmost derivation: E ⇒ (E) ⇒ (E+E) ⇒ (a+E) ⇒ (a+a)
—Rightmost derivation: E ⇒ (E) ⇒ (E+E) ⇒ (E+a) ⇒ (a+a)
2.1 Derivation (3)
Definition 5.2
—Left parse: the sequence of production numbers applied in the leftmost derivation (top-down parsing)
—Right parse: the sequence of production numbers applied in the rightmost derivation, listed in reverse order (bottom-up parsing)
2.1 Derivation (4)
Example 5: (a+a)*a
1. E → E + E
2. E → E * E
3. E → (E)
4. E → a
—Left parse: 2 3 1 4 4 4
—Right parse: 4 4 1 3 4 2 (the rightmost derivation applies 2 4 3 1 4 4)
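The two parses of Example 5 can be checked mechanically. The sketch below is only an illustration: it assumes the derivation tree of (a+a)*a is written by hand as nested (production number, subtrees) pairs, and the names TREE, left_parse, rightmost_order and right_parse are hypothetical. It also assumes the usual convention that the right parse is the rightmost derivation read backwards.

```python
# Productions numbered as in Example 5:
#   1. E -> E + E   2. E -> E * E   3. E -> ( E )   4. E -> a
# Hand-written derivation tree of (a+a)*a.
# Each node is (production number, subtrees of its nonterminal children, left to right).
TREE = (2, [(3, [(1, [(4, []), (4, [])])]), (4, [])])


def left_parse(node):
    """Productions in the order a leftmost derivation applies them (preorder)."""
    number, children = node
    result = [number]
    for child in children:
        result += left_parse(child)
    return result


def rightmost_order(node):
    """Productions in the order a rightmost derivation applies them."""
    number, children = node
    result = [number]
    for child in reversed(children):
        result += rightmost_order(child)
    return result


def right_parse(node):
    """Right parse: the rightmost derivation read in reverse."""
    return rightmost_order(node)[::-1]


print(left_parse(TREE))   # [2, 3, 1, 4, 4, 4]
print(right_parse(TREE))  # [4, 4, 1, 3, 4, 2]
```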
2.2 Derivation Tree (1)
Derivation tree
—Represents the steps by which a sentence is derived
—Consists of a root, interior nodes, and leaves (terminal nodes)
—Shows the hierarchical structure of the sentence
2.2 Derivation Tree (2)
Definition 5.3
Derivation tree for a context-free grammar G = (V_T, V_N, P, S)
1. Root node: S
2. Interior node: a nonterminal symbol
3. Terminal node: a terminal symbol or ε
4. If the production A → A1 A2 … Ak is applied at node A, the nodes A1, A2, …, Ak become the children of A, in that order
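2.2 Derivation Tree (3)
Derivation tree (ordered tree)
—For A → X Y Z, node A has the children X, Y, Z from left to right
Example 6: leftmost derivation of (a + a)
[Figure: the tree grows with each step of E ⇒ (E) ⇒ (E+E) ⇒ (a+E) ⇒ (a+a); the root E gets the children (, E, ); that E gets E, +, E; each remaining E gets a]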
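2.2 Derivation Tree (4)
Ambiguous tree: a+a*a
[Figure: two derivation trees for a+a*a. In one the root applies E → E + E and the right operand expands to E * E; in the other the root applies E → E * E and the left operand expands to E + E]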
2.3 Ambiguity (1)
Definition 5.4
If some sentence generated by G has two or more derivation trees, grammar G is ambiguous.
Example 7: if b then if b then a else a
[Figure: two derivation trees built from S → if C then S else S, S → if C then S, S → a, and C → b. In one the else belongs to the outer if; in the other it belongs to the inner if]
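For short sentences, Definition 5.4 can be tested by counting derivation trees. The sketch below is an illustration only: it hard-codes the grammar E → E + E | E * E | (E) | a from Example 4, and the name count_trees is hypothetical. A count of two or more for any sentence shows that the grammar is ambiguous.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count derivation trees of `tokens` under E -> E + E | E * E | ( E ) | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] == "a":
            total += 1                                   # E -> a
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)   # E -> E op E, split at k
        return total

    return count(0, len(tokens))


print(count_trees("a+a*a"))    # 2
print(count_trees("(a+a)*a"))  # 1
```

The sentence a+a*a has two trees, so the grammar is ambiguous, while (a+a)*a has only one.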
2.3 Ambiguity (2)
Deterministic parsing requires an unambiguous grammar
Ambiguous grammar → unambiguous grammar
—Introduce a new nonterminal
—Apply precedence and associativity rules
ambiguous ⇒ nondeterministic (O)
nondeterministic ⇒ ambiguous (X)
2.3 Ambiguity (3)
Example: E → E + E | E * E | (E) | a
—Operator precedence: + < *
—Left associativity
—Steps
–The most basic operand is F (factor): F → (E) | a
–Introduce T (term) for factors joined by *: T → T * F | F
–An expression E is terms joined by +: E → E + T | T
E → E + T | T
T → T * F | F
F → (E) | a
[Figure: the derivation tree for a+a*a under this grammar; the root applies E → E + T and the right operand T expands to T * F]
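The effect of the rewritten grammar can be seen by parsing with it. The sketch below is a hand-written recursive-descent parser and makes one assumption beyond the slides: the left-recursive rules E → E + T and T → T * F are coded as loops (T followed by zero or more + T), which builds the same left-associative trees. The name parse and the tuple shape (operator, left, right) are hypothetical.

```python
def parse(text):
    """Parse with E -> E + T | T, T -> T * F | F, F -> ( E ) | a."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1

    def expr():                      # E -> E + T | T
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())   # left associative
        return node

    def term():                      # T -> T * F | F
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():                    # F -> ( E ) | a
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        eat("a")
        return "a"

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


print(parse("a+a*a"))  # ('+', 'a', ('*', 'a', 'a'))
print(parse("a+a+a"))  # ('+', ('+', 'a', 'a'), 'a')
```

Each input now has exactly one tree: * binds tighter than +, and repeated operators group to the left.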
2.3 Ambiguity (4)
Ambiguous productions
—Production: A → AA
—Sentential form: AAA
—2 trees
[Figure: the two derivation trees for AAA, one expanding the left A of A → AA and one expanding the right A]
3. Grammar Conversion (1)
Grammar conversion
—Done to make syntactic analysis efficient
—Methods: substitution, expansion
Definition 5.6
If L(G1) = L(G2), grammars G1 and G2 are equivalent.
3. Grammar Conversion (2)
Substitution
Given A → αBγ (B ∈ V_N; α, γ ∈ V*) and B → β1 | … | βn
—Remove the production A → αBγ
—Add the corresponding productions A → αβ1γ | αβ2γ | … | αβnγ
3. Grammar Conversion (3)
Example 10
P = {S → aT | bT, T → S | Sb | c}
—Remove S → aT
—Add S → aS | aSb | ac
P’ = {S → aS | aSb | ac | bT, T → S | Sb | c}
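Substitution is easy to express as a function on a production table. The sketch below assumes a grammar stored as a dictionary from each nonterminal to a list of right-hand sides (tuples of symbols); the name substitute and its parameters are hypothetical. It reproduces Example 10.

```python
def substitute(grammar, lhs, rhs, index):
    """Replace the production lhs -> rhs by substituting the nonterminal at rhs[index].

    For A -> alpha B gamma and B -> beta_1 | ... | beta_n, the production is
    removed and A -> alpha beta_i gamma is added for every i.
    """
    target = rhs[index]
    new_rules = [rhs[:index] + beta + rhs[index + 1:] for beta in grammar[target]]
    result = {nt: list(alts) for nt, alts in grammar.items()}   # copy, keep input intact
    position = result[lhs].index(rhs)
    result[lhs][position:position + 1] = new_rules
    return result


P = {
    "S": [("a", "T"), ("b", "T")],
    "T": [("S",), ("S", "b"), ("c",)],
}
P_new = substitute(P, "S", ("a", "T"), 1)
print(P_new["S"])  # [('a', 'S'), ('a', 'S', 'b'), ('a', 'c'), ('b', 'T')]
```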
3. Grammar Conversion (4)
Expansion
—Split a production by introducing a new nonterminal symbol X
A → αβ becomes A → αX, X → β or A → Xβ, X → α
3.1 Remove Useless Production (1)
Useless production
—A production that can never be used in deriving a sentence; remove it
—Productions that contain a non-terminating nonterminal symbol
—Productions that contain an inaccessible symbol
3.1 Remove Useless Production (2)
Definition 5.7
If there is no derivation S ⇒* uXv ⇒* ω with ω ∈ V_T*, X is a useless symbol.
Definition 5.8
—Terminating nonterminal: A such that A ⇒* ω for some ω ∈ V_T*
—Accessible symbol: X such that S ⇒* ω1 X ω2 for some ω1, ω2 ∈ V*
3.1 Remove Useless Production (3)
Removal methods
—Remove productions that contain a non-terminating nonterminal
—Remove productions that contain an inaccessible symbol
Algorithm for terminating nonterminals
Algorithm terminating;
begin
  V_N’ := {A | A → ω ∈ P, ω ∈ V_T*};
  repeat
    V_N’ := V_N’ ∪ {A | A → α ∈ P, α ∈ (V_N’ ∪ V_T)*}
  until no change
end.
3.1 Remove Useless Production (4)
Example 11: P = {S → A, S → B, B → a}
—V_N’ = {B}, then V_N’ = {B, S}
—V_N − V_N’ = {A}
—P’ = {S → B, B → a}
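The terminating-nonterminal algorithm is a fixed-point iteration, sketched below on Example 11. The representation is an assumption for illustration: productions are (left-hand side, right-hand side tuple) pairs, and the name terminating is hypothetical. The initial step of the slide's algorithm is covered by the first pass of the loop, since a right-hand side made only of terminals already satisfies the test.

```python
def terminating(productions, terminals):
    """Return the set V_N' of nonterminals that derive some terminal string."""
    result = set()
    changed = True
    while changed:                       # repeat ... until no change
        changed = False
        for lhs, rhs in productions:
            if lhs not in result and all(s in terminals or s in result for s in rhs):
                result.add(lhs)
                changed = True
    return result


P = [("S", ("A",)), ("S", ("B",)), ("B", ("a",))]
T = {"a"}
alive = terminating(P, T)
# Keep only productions whose symbols are all terminals or terminating nonterminals.
P_new = [(l, r) for l, r in P if l in alive and all(s in T or s in alive for s in r)]
print(sorted(alive))  # ['B', 'S']
print(P_new)          # [('S', ('B',)), ('B', ('a',))]
```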
3.1 Remove Useless Production (5)
Algorithm for accessible symbols
Algorithm accessible;
begin
  V’ := {S};
  repeat
    V’ := V’ ∪ {X | some A → αXβ ∈ P, A ∈ V’}
  until no change
end.
Example 12
G: S → aB, A → aB, A → aC, B → C, C → b
—V’ = {S}, then V’ = {S, a, B}, then V’ = {S, a, B, C}, then V’ = {S, a, B, C, b}
—V − V’ = {A}
—P’ = {S → aB, B → C, C → b}
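The accessible-symbol algorithm has the same fixed-point shape. The sketch below runs it on Example 12, again assuming productions stored as (left-hand side, right-hand side tuple) pairs; the name accessible is hypothetical.

```python
def accessible(productions, start):
    """Return the set V' of symbols reachable from the start symbol."""
    reached = {start}
    changed = True
    while changed:                       # repeat ... until no change
        changed = False
        for lhs, rhs in productions:
            if lhs in reached:
                for symbol in rhs:
                    if symbol not in reached:
                        reached.add(symbol)
                        changed = True
    return reached


G = [
    ("S", ("a", "B")),
    ("A", ("a", "B")),
    ("A", ("a", "C")),
    ("B", ("C",)),
    ("C", ("b",)),
]
reached = accessible(G, "S")
P_new = [(l, r) for l, r in G if l in reached]
print(sorted(reached))  # ['B', 'C', 'S', 'a', 'b']
print(P_new)            # [('S', ('a', 'B')), ('B', ('C',)), ('C', ('b',))]
```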
3.1 Remove Useless Production (6)
Steps of removing useless productions
[Figure: context-free productions → terminating-nonterminal step → accessible-symbol step → useful productions]
3.1 Remove Useless Production (7)
Example 13: P = {S → aS, S → A, S → B, A → aA, B → a, C → aa}
—Get the terminating nonterminals
–V_N’ = {B, C}, then V_N’ = {B, C, S}
–Non-terminating nonterminal = {A}
–P’ = {S → aS, S → B, B → a, C → aa}
—Get the accessible symbols
–V’ = {S}, then V’ = {S, a, B}
–Inaccessible symbol = {C}
–P’’ = {S → aS, S → B, B → a}
3.2 Remove ε-Production (1)
Definition 5.9
A grammar is ε-free if
(1) P has no ε-production, or
(2) only S has an ε-production (S → ε) and S does not appear on the right-hand side of any production
3.2 Remove ε-Production (2)
Algorithm for converting to an ε-free grammar
Algorithm ε-free;
begin
  P’ := P − {A → ε | A ∈ V_N};
  V_Nε := {A | A ⇒+ ε, A ∈ V_N};
  for A → α0 B1 α1 B2 … Bk αk ∈ P’, where B1, …, Bk are the nullable nonterminals on the right-hand side (Bi ∈ V_Nε) do
    if (a B-production exists in P’) then
      add to P’ every production obtained from A → α0 X1 α1 X2 … Xk αk by choosing Xi = ε or Xi = Bi
    else
      add to P’ the production A → α0 X1 α1 X2 … Xk αk with Xi = ε
  end for
  if S ∈ V_Nε then P’ := P’ ∪ {S’ → ε | S}
end.
3.2 Remove ε-Production (3)
—P’: the set of productions without ε-productions
—V_Nε: the set of nonterminals that can derive ε
Nullable nonterminal A: A ⇒* ε
3.2 Remove ε-Production (4)
Get V_Nε
—Directly from the ε-productions
—Through derivations
Algorithm Compute_V_Nε;
begin
  V_Nε := {A | A → ε ∈ P};
  repeat
    V_Nε := V_Nε ∪ {B | B → α ∈ P, α ∈ V_Nε*}
  until no change
end.
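Computing the nullable nonterminals follows the same repeat-until-no-change pattern. In the sketch below, which is an illustration under assumed conventions, an ε-production is written as an empty right-hand side tuple, and the name nullable_nonterminals is hypothetical.

```python
def nullable_nonterminals(productions):
    """Return V_N-epsilon: the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:                       # repeat ... until no change
        changed = False
        for lhs, rhs in productions:
            # all() over an empty tuple is True, so A -> epsilon qualifies at once.
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


G = [("A", ("B", "C")), ("B", ()), ("C", ("B",)), ("D", ("a",))]
print(sorted(nullable_nonterminals(G)))  # ['A', 'B', 'C']
```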
3.2 Remove ε-Production (5)
Example 14: S → aSbS | bSaS | ε
—P’ = {S → aSbS | bSaS}, V_Nε = {S}
—S → aSbS: S → aSbS | abS | aSb | ab
—S → bSaS: S → bSaS | baS | bSa | ba
—P’ = {S → aSbS | abS | aSb | ab | bSaS | baS | bSa | ba}
—Result: S’ → S | ε, S → aSbS | abS | aSb | ab | bSaS | baS | bSa | ba
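The expansion step of Example 14 can be sketched as follows. This is an illustration under stated assumptions: the nullable set is passed in as an argument, an ε-production is an empty tuple, the new start symbol is formed by appending a prime and is assumed not to clash with an existing name, and the function names expand and epsilon_free are hypothetical.

```python
from itertools import product


def expand(rhs, nullable):
    """All non-empty right-hand sides obtained by keeping or dropping nullable symbols."""
    options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
    results = []
    for choice in product(*options):
        new_rhs = tuple(sym for part in choice for sym in part)
        if new_rhs and new_rhs not in results:   # never emit an epsilon-production
            results.append(new_rhs)
    return results


def epsilon_free(productions, nullable, start):
    """Return an epsilon-free production list for the same language."""
    new = []
    for lhs, rhs in productions:
        for alt in expand(rhs, nullable):
            if (lhs, alt) not in new:
                new.append((lhs, alt))
    if start in nullable:
        new_start = start + "'"                  # assumed to be an unused name
        new = [(new_start, (start,)), (new_start, ())] + new
    return new


P = [("S", ("a", "S", "b", "S")), ("S", ("b", "S", "a", "S")), ("S", ())]
for lhs, rhs in epsilon_free(P, {"S"}, "S"):
    print(lhs, "->", "".join(rhs) or "ε")
```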
3.3 Remove Single Production (1)
Single production
—The right-hand side is exactly one nonterminal: A → B
—It adds derivation steps that do no work and slow down parsing, so remove it
3.3 Remove Single Production (2)
Algorithm for removing single productions
Algorithm Remove_Single_Production;
begin
  P’ := P − {A → B | A, B ∈ V_N};
  for each A ∈ V_N do
    V_NA := {B | A ⇒+ B};
    for each B ∈ V_NA do
      for each B → α ∈ P’ do (* not a single production *)
        P’ := P’ ∪ {A → α}
  end for
end.
3.3 Remove Single Production (3)
Algorithm for computing V_NA
Algorithm Compute_V_NA;
begin
  V_NA := {B | A → B ∈ P};
  repeat
    V_NA := V_NA ∪ {C | B → C ∈ P, B ∈ V_NA}
  until no change
end.
3.3 Remove Single Production (4)
Example 15: E → E+T | T, T → T*F | F, F → (E) | a
—P’ = {E → E+T, T → T*F, F → (E), F → a}
—V_NE = {T, F}: P’ = {E → E+T | T*F | (E) | a, T → T*F, F → (E), F → a}
—V_NT = {F}: P’ = {E → E+T | T*F | (E) | a, T → T*F | (E) | a, F → (E), F → a}
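Both algorithms for single productions fit in one short function. The sketch below is an illustration: productions are (left-hand side, right-hand side tuple) pairs, the set V_NA is computed with a worklist instead of the repeat loop, and the name remove_single_productions is hypothetical. It reproduces Example 15.

```python
def remove_single_productions(productions, nonterminals):
    """Return an equivalent production list without productions of the form A -> B."""

    def is_single(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    base = [(l, r) for l, r in productions if not is_single(r)]   # P' without A -> B
    result = list(base)
    for a in nonterminals:
        # V_NA: nonterminals reachable from a through single productions only.
        reach, frontier = set(), [a]
        while frontier:
            x = frontier.pop()
            for l, r in productions:
                if l == x and is_single(r) and r[0] not in reach:
                    reach.add(r[0])
                    frontier.append(r[0])
        # Copy every non-single production B -> alpha up to a.
        for b in reach:
            for l, r in base:
                if l == b and (a, r) not in result:
                    result.append((a, r))
    return result


P = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("a",)),
]
for lhs, rhs in remove_single_productions(P, ["E", "T", "F"]):
    print(lhs, "->", "".join(rhs))
```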
3.3 Remove Single Production (5)
Definition 5.10
—Cycle-free: for every A ∈ V_N there is no derivation A ⇒+ A
—Proper grammar: (1) cycle-free, (2) ε-free, (3) no useless symbols
3.4 Normal Form (1)
Definition 5.11
Normal form grammar (CNF: Chomsky Normal Form): every production has one of the forms
(1) A → BC (A, B, C ∈ V_N)
(2) A → a (a ∈ V_T)
(3) If ε ∈ L(G), then S → ε, and S must not appear on the right-hand side of any production
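3.4 Normal Form (2)
Context-free grammar → CNF
—Start from an ε-free grammar
—For a production A → α with |α| > 2, leave exactly 2 symbols on each right-hand side:
A → X1 X2 … Xk becomes
A → X1’ <X2…Xk>, <X2…Xk> → X2’ <X3…Xk>, …, <Xk-1 Xk> → Xk-1’ Xk’
where Xi’ is Xi itself if Xi is a nonterminal, or a new nonterminal with Xi’ → Xi if Xi is a terminal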
3.4 Normal Form (3)
Example 16: S → aAB | BA, A → BBB | a, B → AS | b
—S → a’<AB>, a’ → a, <AB> → AB, S → BA
—A → B<BB>, <BB> → BB, A → a
—B → AS | b
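The splitting of long right-hand sides can be sketched as below. The code assumes its input is already ε-free and has no single productions, names each new nonterminal after the symbols it stands for (for example <AB>), and replaces a terminal a inside a longer right-hand side by a new nonterminal a'. The function name to_cnf and these naming choices are hypothetical, chosen to match Example 16.

```python
def to_cnf(productions, nonterminals):
    """Convert an epsilon-free grammar without single productions to CNF."""
    result = []

    def add(lhs, rhs):
        if (lhs, rhs) not in result:
            result.append((lhs, rhs))

    def lift(symbol):
        # X' is X itself for a nonterminal, else a new nonterminal with X' -> X.
        if symbol in nonterminals:
            return symbol
        add(symbol + "'", (symbol,))
        return symbol + "'"

    for lhs, rhs in productions:
        if len(rhs) <= 1:                  # A -> a is already in normal form
            add(lhs, rhs)
            continue
        head, rest = lhs, rhs
        while len(rest) > 2:               # peel off one symbol at a time
            tail = "<" + "".join(rest[1:]) + ">"
            add(head, (lift(rest[0]), tail))
            head, rest = tail, rest[1:]
        add(head, (lift(rest[0]), lift(rest[1])))
    return result


P = [
    ("S", ("a", "A", "B")), ("S", ("B", "A")),
    ("A", ("B", "B", "B")), ("A", ("a",)),
    ("B", ("A", "S")), ("B", ("b",)),
]
for lhs, rhs in to_cnf(P, {"S", "A", "B"}):
    print(lhs, "->", " ".join(rhs))
```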
4. CFG Notation (1)
BNF (Backus-Naur Form)
—Nonterminal symbol: enclosed in < >
—Terminal symbol: enclosed in ‘ ’
—→ is written ::=
Example 17: E → E+T | T, T → T*F | F, F → (E) | a
<E> ::= <E> ‘+’ <T> | <T>
<T> ::= <T> ‘*’ <F> | <F>
<F> ::= ‘(’ <E> ‘)’ | ‘a’
4. CFG Notation (2)
EBNF (extended BNF)
—Simple and easy to read
—Meta symbols give a compact notation for repeated and optional parts
–{ }: the enclosed part is repeated zero or more times
–[ ]: the enclosed part is optional
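4. CFG Notation (3)
–Repetition with { }: BNF <id_list> ::= <id_list>, <id> | <id> becomes EBNF <id_list> ::= <id> {, <id>}
–A maximum and minimum number of repetitions can be attached to { }
–Optional part with [ ]: <if_st> ::= if <condition> then <statement> [else <statement>]
–BNF: <variable> ::= <id> | <id> ‘[’ <exp> ‘]’
–EBNF: <variable> ::= <id> [ ‘[’ <exp> ‘]’ ]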
4. CFG Notation (4)
Parentheses and alternation: a more compact representation
BNF: <exp> ::= <exp> + <exp> | <exp> - <exp> | <exp> * <exp> | <exp> / <exp>
EBNF: <exp> ::= <exp> (+|-|*|/) <exp>
4. CFG Notation (5)
Syntax diagram
—Shows the grammar as a figure, which makes the syntactic structure easy to understand
—Notation
–Nonterminal: rectangle
–Terminal: circle or ellipse
–Arc: links the symbols
[Figure: a rectangle labeled A and a circle labeled a]
4. CFG Notation (6)
—Example
–A ::= X1 X2 … Xn
[Figure: syntax diagrams in which X1, X2, …, Xn are linked in sequence by arcs]
4. CFG Notation (7)
—A ::= α1 | α2 | ... | αn
—A ::= {α}
—A ::= [α]
[Figure: syntax diagrams for alternation, repetition, and the optional part]
4. CFG Notation (8)
—A ::= (α1 | α2) β
[Figure: syntax diagram with parallel branches α1 and α2 followed by β]
Example 22
A ::= a | (B)
B ::= AC
C ::= {+A}
[Figure: syntax diagrams for A, B, and C]
4. CFG Notation (9)
—Synthesis: substituting B and C into A gives A ::= a | ( A {+A} )
[Figure: the combined syntax diagram for A, using a, (, ), and +]
4. CFG Notation (10)
Example 24: integer variable declaration in C
—Format: keyword int, a comma-separated variable list, then a semicolon
—BNF
–<int_dcl> ::= int <id_list> ;
–<id_list> ::= <id_list>, <id> | <id>
4. CFG Notation (11)
—EBNF
–<int_dcl> ::= int <id> {, <id>} ;
—Syntax diagram
[Figure: syntax diagram for int_dcl: int, then id, with a loop back through “,” for further ids, then ;]
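An EBNF rule maps almost directly onto code: the repetition braces become a loop. The sketch below recognizes the declaration form int id {, id} ; and returns the declared names. The tokenizer, the identifier pattern, and the name parse_int_dcl are assumptions made for illustration and are not taken from the slides.

```python
import re

IDENTIFIER = r"[A-Za-z_]\w*"


def parse_int_dcl(text):
    """Recognize  int id {, id} ;  and return the list of declared identifiers."""
    tokens = re.findall(IDENTIFIER + r"|[,;]|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def identifier():
        nonlocal pos
        token = peek()
        if token is None or token == "int" or not re.fullmatch(IDENTIFIER, token):
            raise SyntaxError(f"identifier expected, found {token!r}")
        pos += 1
        return token

    if peek() != "int":
        raise SyntaxError("keyword 'int' expected")
    pos += 1
    names = [identifier()]
    while peek() == ",":          # the { , id } part: zero or more repetitions
        pos += 1
        names.append(identifier())
    if peek() != ";":
        raise SyntaxError("';' expected")
    pos += 1
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return names


print(parse_int_dcl("int a, b, c;"))  # ['a', 'b', 'c']
```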