1
Chapter 3 Context-Free Grammar and Parsing
Sung-Dong Kim Dept. of Computer Engineering, Hansung University
2
Abstract (1)
Parsing
  Task of determining the syntax, or structure, of a program
  Also called syntax analysis
Grammar rules of a context-free grammar
  Specify the syntax of a programming language
(2010-1) Compiler
3
Abstract (2)
Context-free grammar
  Its rules are recursive
Recognition algorithm
  Uses recursive calls or explicitly managed parsing stacks
Parse tree (syntax tree)
  Basic structure used to represent the syntax
4
Abstract (3)
Recognition algorithms
  Top-down parsing
  Bottom-up parsing
This chapter
  General description of the parsing process
  Basic theory of context-free grammars
  Syntax of the TINY language
5
1. Parsing Process
Determine the syntactic structure of a program from its tokens
  Parser: sequence of tokens → parse tree / syntax tree
Single-pass compiler
  No explicit syntax tree
Multipass compiler
  Further passes use the syntax tree as input
6
2. Context-Free Grammars
Specification for the syntactic structure of a programming language
Example: integer arithmetic expressions
  exp → exp op exp | ( exp ) | number
  op → + | - | *
Notation: Backus-Naur form (BNF)
7
2.2 Specification of CFG Rules (1)
Rules are specified using symbols that are usually tokens
A rule defines a structure
  Name of the structure: left of the arrow
  Layout of the structure: right of the arrow
Example
  exp → exp op exp | ( exp ) | number    (expression structure)
  op → + | - | *                         (operator)
8
2.2 Specification of CFG Rules (2)
Another representation
  <exp> ::= <exp> <op> <exp> | ( <exp> ) | NUMBER
  <op> ::= + | - | *
Using ( ) as metasymbols
  exp → exp ( “+” | “-” | “*” ) exp | “(” exp “)” | number
9
2.2 Specification of CFG Rules (3)
Shortened notation
  E → E O E | ( E ) | n
  O → + | - | *
Or, with the single character a as the token
  E → E O E | ( E ) | a
  O → + | - | *
10
2.3 Derivations and the Languages (1)
CFG rules determine the set of syntactically legal strings of token symbols
Examples
  (34-3)*42 is legal: it is the token string ( number - number ) * number
  (34-3*42 is not a legal expression: the left parenthesis is never closed
11
2.3 Derivations and the Languages (2)
Derivation
  Sequence of replacements of structure names by choices on the right-hand sides of grammar rules
  Begins with a single structure name and ends with a string of token symbols
12
2.3 Derivations and the Languages (3)
Derivation of (34-3)*42
  (1) exp ⇒ exp op exp
  (2)     ⇒ exp op number
  (3)     ⇒ exp * number
  (4)     ⇒ ( exp ) * number
  (5)     ⇒ ( exp op exp ) * number
  (6)     ⇒ ( exp op number ) * number
  (7)     ⇒ ( exp - number ) * number
  (8)     ⇒ ( number - number ) * number
13
2.3 Derivations and the Languages (4)
Language defined by the grammar
  Set of all strings of token symbols obtained by derivations from the exp symbol
  All syntactically legal expressions
  L(G) = { s | exp ⇒* s }
Productions
  Another name for grammar rules
14
2.3 Derivations and the Languages (5)
Nonterminals
  Structure names
  Must be replaced further on in a derivation
Terminals
  Symbols in the alphabet
  Terminate a derivation
15
2.3 Derivations and the Languages (6)
Example 3.1
  E → ( E ) | a
  L(G) = { (^n a )^n | n is an integer ≥ 0 }
  That is, a enclosed in n balanced pairs of parentheses: a, (a), ((a)), …
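The two choices of the rule in Example 3.1 can be sketched directly as a recursive C function, one branch per choice. This is only an illustration under stated assumptions: the names parse_E and in_language are hypothetical, and each token is assumed to be a single character with no whitespace between tokens.

```c
#include <stddef.h>

/* Recognizer for the grammar E -> ( E ) | a.
   parse_E (hypothetical name) parses one E starting at s[*pos] and
   advances *pos past it. Returns 1 on success, 0 on a syntax error. */
static int parse_E(const char *s, size_t *pos)
{
    if (s[*pos] == 'a') {            /* choice E -> a */
        (*pos)++;
        return 1;
    }
    if (s[*pos] == '(') {            /* choice E -> ( E ) */
        (*pos)++;
        if (!parse_E(s, pos))
            return 0;
        if (s[*pos] != ')')
            return 0;                /* missing closing parenthesis */
        (*pos)++;
        return 1;
    }
    return 0;                        /* neither choice applies */
}

/* Returns 1 if the whole string s is in L(G), 0 otherwise. */
int in_language(const char *s)
{
    size_t pos = 0;
    return parse_E(s, &pos) && s[pos] == '\0';
}
```

The recursion in parse_E mirrors the recursion in the grammar rule, which is why the opening and closing parentheses are forced to balance.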
16
2.3 Derivations and the Languages (7)
Example 3.2
  E → ( E )
  The grammar generates no string at all: there is no non-recursive choice to end a derivation
Example 3.3
  E → E + a | a
  L(G) = { a, a+a, a+a+a, … }
  E ⇒ E + a ⇒ E + a + a ⇒ E + a + a + a ⇒ …
17
2.3 Derivations and the Languages (8)
Example 3.4
  statement → if-stmt | other
  if-stmt → if ( exp ) statement | if ( exp ) statement else statement
  exp → 0 | 1
Strings in the language
  other
  if (0) other
  if (1) other
  if (0) other else other
  if (1) other else other
  …
18
2.3 Derivations and the Languages (9)
Left recursive
  A → A a | a
  General form: A → A α | β
Right recursive
  A → a A | a
  General form: A → α A | β
ε-production
  ε denotes the empty string
  A → A a | ε
  A → a A | ε
19
2.3 Derivations and the Languages (10)
Example 3.6
  statement → if-stmt | other
  if-stmt → if ( exp ) statement else-part
  else-part → else statement | ε
  exp → 0 | 1
Example 3.7
  stmt-sequence → stmt ; stmt-sequence | stmt
  stmt → s
  L(G) = { s, s;s, s;s;s, … }
  The ‘;’ is a separator
20
2.3 Derivations and the Languages (11)
Allowing the empty statement sequence
  stmt-sequence → stmt ; stmt-sequence | ε
  stmt → s
  L(G’) = { ε, s;, s;s;, s;s;s;, … }
  The ‘;’ is now a terminator
Allowing the empty sequence while keeping ‘;’ as a separator
  stmt-sequence → nonempty-stmt-sequence | ε
  nonempty-stmt-sequence → stmt ; nonempty-stmt-sequence | stmt
  stmt → s
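The difference between a separator and a terminator can be made concrete with two small recognizers, one per grammar. This is a sketch only: the function names are hypothetical, and the input is assumed to contain just the characters s and ; with no whitespace.

```c
/* Recognizes stmt-sequence -> stmt ; stmt-sequence | stmt, with stmt -> s.
   The semicolon is a separator: s, s;s, s;s;s, ... */
int is_separated_sequence(const char *p)
{
    for (;;) {
        if (*p != 's') return 0;     /* a stmt is required here */
        p++;
        if (*p == '\0') return 1;    /* the sequence may end after a stmt */
        if (*p != ';') return 0;
        p++;                         /* after a separator, loop for the next stmt */
    }
}

/* Recognizes stmt-sequence -> stmt ; stmt-sequence | (empty), with stmt -> s.
   The semicolon is a terminator: (empty), s;, s;s;, ... */
int is_terminated_sequence(const char *p)
{
    while (*p != '\0') {
        if (p[0] != 's' || p[1] != ';') return 0;  /* every stmt needs its ';' */
        p += 2;
    }
    return 1;                        /* the empty sequence is accepted */
}
```

The string s;s is accepted only by the first function and s;s; only by the second, which is exactly the difference between the two languages.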
21
3.1 Parse Tree (1)
Many derivations for the same string
  A derivation alone does not uniquely represent the structure of a string of terminals
Another derivation of (34-3)*42
  (1) exp ⇒ exp op exp
  (2)     ⇒ ( exp ) op exp
  (3)     ⇒ ( exp op exp ) op exp
  (4)     ⇒ ( number op exp ) op exp
  (5)     ⇒ ( number - exp ) op exp
  (6)     ⇒ ( number - number ) op exp
  (7)     ⇒ ( number - number ) * exp
  (8)     ⇒ ( number - number ) * number
22
3.1 Parse Tree (2)
Parse tree
  Labeled tree
  Interior nodes
    Labeled by nonterminals
    Represent the steps in a derivation
    Children: the replacement of the associated nonterminal
  Leaf nodes
    Labeled by terminals (the tokens of the string)
23
3.1 Parse Tree (3)
Another example: number + number
Leftmost derivation
  (1) exp ⇒ exp op exp
  (2)     ⇒ number op exp
  (3)     ⇒ number + exp
  (4)     ⇒ number + number
  Corresponds to a preorder numbering of the interior nodes: root exp (1), left exp (2), op (3), right exp (4)
Rightmost derivation
  (1) exp ⇒ exp op exp
  (2)     ⇒ exp op number
  (3)     ⇒ exp + number
  (4)     ⇒ number + number
  Corresponds to the reverse of a postorder numbering of the interior nodes: root exp (1), right exp (2), op (3), left exp (4)
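The link between tree traversals and derivation orders can be sketched in C. The PNode type and both function names are assumptions made for this illustration, not part of the course material; only interior nodes (nonterminals) receive a step number.

```c
#include <stddef.h>

#define MAX_CHILDREN 3

/* Hypothetical parse-tree node: interior nodes are labeled by a
   nonterminal, leaves by a terminal (nchildren == 0). */
typedef struct PNode {
    const char *label;
    int nchildren;
    struct PNode *child[MAX_CHILDREN];
    int number;                  /* derivation step of an interior node */
} PNode;

/* Preorder numbering of the interior nodes: the order in which a
   leftmost derivation replaces the nonterminals. */
void number_leftmost(PNode *n, int *next)
{
    if (n->nchildren == 0) return;          /* terminals are never replaced */
    n->number = (*next)++;
    for (int i = 0; i < n->nchildren; i++)
        number_leftmost(n->child[i], next);
}

/* Reverse postorder numbering: the node first, then its children from
   right to left. This is the order of a rightmost derivation. */
void number_rightmost(PNode *n, int *next)
{
    if (n->nchildren == 0) return;
    n->number = (*next)++;
    for (int i = n->nchildren - 1; i >= 0; i--)
        number_rightmost(n->child[i], next);
}
```

Called with a counter starting at 1 on the parse tree of number + number, the first function numbers the nodes 1, 2, 3, 4 from left to right below the root, and the second numbers them 1, 4, 3, 2.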
24
3.1 Parse Tree (4)
Parse tree for (34-3)*42
  [Figure: parse tree for ( number - number ) * number]
25
3.2 Abstract Syntax Tree (1)
Principle of syntax-directed translation
  The meaning (semantics) of a string should be directly related to its syntactic structure as represented by the parse tree
  Example: the parse tree for 3+4 should imply that the values 3 and 4 are to be added
Abstract syntax tree (AST)
  Root: operation
  Leaves: values
  [Figure: + at the root with children 3 and 4]
26
3.2 Abstract Syntax Tree (2)
Another example: (34-3)*42
  [Figure: * at the root, with children - (over 34 and 3) and 42]
  The parenthesis tokens have disappeared
  The tree still represents the meaning
Abstract syntax tree (syntax tree)
  Represents an abstraction of the actual token sequence
  The token sequence cannot be recovered from it
  Contains all the information needed for translation, in a more efficient form
27
3.2 Abstract Syntax Tree (3)
3+4 = OpExp(Plus, ConstExp(3), ConstExp(4))
(34-3)*42 = OpExp(Times, OpExp(Minus, ConstExp(34), ConstExp(3)), ConstExp(42))
BNF-like rules
  exp → OpExp(op, exp, exp) | ConstExp(integer)
  op → Plus | Minus | Times
28
3.2 Abstract Syntax Tree (4)
Actual syntax tree structure
  typedef enum {Plus, Minus, Times} OpKind;
  typedef enum {OpK, ConstK} ExpKind;
  typedef struct streenode
  {
      ExpKind kind;                        /* operator node or constant leaf */
      OpKind op;                           /* used when kind == OpK */
      struct streenode *lchild, *rchild;   /* operands; NULL for a leaf */
      int val;                             /* used when kind == ConstK */
  } STreeNode;
  typedef STreeNode *SyntaxTree;
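To show how such a tree might be built and used, here is a minimal sketch that repeats the declarations from the slide and adds three helpers. The helpers const_exp, op_exp, eval and free_tree are hypothetical names introduced for illustration; they are not part of the slide.

```c
#include <stdlib.h>

typedef enum {Plus, Minus, Times} OpKind;
typedef enum {OpK, ConstK} ExpKind;
typedef struct streenode {
    ExpKind kind;
    OpKind op;
    struct streenode *lchild, *rchild;
    int val;
} STreeNode;
typedef STreeNode *SyntaxTree;

/* Builds the leaf ConstExp(val). Returns NULL if allocation fails. */
SyntaxTree const_exp(int val)
{
    SyntaxTree t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->kind = ConstK;
    t->op = Plus;                    /* unused for a constant */
    t->lchild = t->rchild = NULL;
    t->val = val;
    return t;
}

/* Builds the interior node OpExp(op, l, r). Returns NULL if allocation fails. */
SyntaxTree op_exp(OpKind op, SyntaxTree l, SyntaxTree r)
{
    SyntaxTree t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->kind = OpK;
    t->op = op;
    t->lchild = l;
    t->rchild = r;
    t->val = 0;                      /* unused for an operator */
    return t;
}

/* Evaluates the tree bottom-up. Assumes t is non-NULL and well formed. */
int eval(SyntaxTree t)
{
    if (t->kind == ConstK) return t->val;
    int l = eval(t->lchild);
    int r = eval(t->rchild);
    switch (t->op) {
    case Plus:  return l + r;
    case Minus: return l - r;
    case Times: return l * r;
    }
    return 0;                        /* not reached for a valid OpKind */
}

/* Releases a tree built with the constructors above. */
void free_tree(SyntaxTree t)
{
    if (t == NULL) return;
    free_tree(t->lchild);
    free_tree(t->rchild);
    free(t);
}
```

For example, op_exp(Times, op_exp(Minus, const_exp(34), const_exp(3)), const_exp(42)) builds the tree for (34-3)*42. No parentheses are stored, yet eval still computes the intended value because the grouping is carried by the tree shape.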
29
4. Ambiguity (1)
Ambiguous grammar
  Grammar that generates a string with two distinct parse trees
  Ex:
    exp → exp op exp | ( exp ) | number
    op → + | - | *
  [Figure: two distinct parse trees for 34-3*42, one with * at the top and one with - at the top]
30
4. Ambiguity (2)
Dealing with ambiguities
  Disambiguating rule
  Changing the grammar to remove the ambiguity
Disambiguation for expressions
  Precedence
  Associativity: “left” or “right”
Fully parenthesized expressions
  exp → factor op factor | factor
  factor → ( exp ) | number
  op → + | - | *
31
4. Ambiguity (3)
Precedence cascade
  Grouping the operators into groups of equal precedence
    exp → exp addop exp | term
    addop → + | -
    term → term mulop term | factor
    mulop → *
    factor → ( exp ) | number
32
4. Ambiguity (4)
Associativity
  Left: a left recursive rule makes its operators left associative
    exp → exp addop term | term
    addop → + | -
    term → term mulop factor | factor
    mulop → *
    factor → ( exp ) | number
33
Parse trees for 34-3*42 and 34-3-42
  [Figure: parse tree for 34-3*42, grouped as 34-(3*42)]
  [Figure: parse tree for 34-3-42, grouped as (34-3)-42]
34
4. Ambiguity (5)
Dangling else problem
  statement → if-stmt | other
  if-stmt → if ( exp ) statement | if ( exp ) statement else statement
  exp → 0 | 1
Ambiguous string
  if (0) if (1) other else other
35
[Figure: two parse trees for if (0) if (1) other else other, one attaching the else-part to the outer if and one to the inner if]
36
4. Ambiguity (6)
Ambiguity removal
  Most closely nested rule: easy
    An else-part belongs to the closest preceding if that does not yet have one
  Ex:
    if (x != 0)
      if (y == 1/x) ok = TRUE;
      else x = 1/x;
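C itself applies the most closely nested rule, so the effect can be observed directly. The function below is a small sketch with a hypothetical name, which_branch. It reports which assignment ran, and a compiler may warn about the unbraced nested if, which is the point of the example.

```c
/* Returns 1 if the inner then-part ran, 2 if the else-part ran,
   and 0 if neither ran. The else is not braced on purpose: by the most
   closely nested rule it belongs to the inner if, not the outer one. */
int which_branch(int x, int y)
{
    int result = 0;
    if (x != 0)
        if (y == 1)
            result = 1;
        else
            result = 2;
    return result;
}
```

With x equal to 0 the function returns 0 whatever y is. If the else-part belonged to the outer if, it would return 2 instead.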
37
4. Ambiguity (7)
Grammar conversion: difficult
  statement → matched-stmt | unmatched-stmt
  matched-stmt → if ( exp ) matched-stmt else matched-stmt | other
  unmatched-stmt → if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmt
  exp → 0 | 1
38
[Figure: parse tree built from the matched-stmt and unmatched-stmt rules]
39
4. Ambiguity (8)
Requiring the presence of the else-part
  statement → if-stmt | other
  if-stmt → if ( exp ) statement else statement
  exp → 0 | 1
Using a bracketing keyword endif
  statement → if-stmt | other
  if-stmt → if ( exp ) statement endif | if ( exp ) statement else statement endif
  exp → 0 | 1
40
5. Extended notations (1)
EBNF notations
  For repetitive and optional constructs
Repetition: curly brackets
  Left recursive: A → A α | β becomes A → β { α }
  Right recursive: A → α A | β becomes A → { α } β
Optional constructs: square brackets
  statement → if-stmt | other
  if-stmt → if ( exp ) statement [ else statement ]
  exp → 0 | 1
41
5. Extended notations (2)
Syntax diagrams
  Graphical representations of EBNF rules
  Rectangle: nonterminal
  Circle (oval): terminal
  Arrows: choice and sequencing
  Ex: factor → ( exp ) | number
  [Figure: syntax diagram for factor]
42
5. Extended notations (3)
Repetition
Optional constructs
  [Figures: syntax diagrams for repetition and for optional constructs]
43
5. Extended notations (4)
Ex 3.10
BNF
  exp → exp addop term | term
  addop → + | -
  term → term mulop factor | factor
  mulop → *
  factor → ( exp ) | number
EBNF
  exp → term { addop term }
  addop → + | -
  term → factor { mulop factor }
  mulop → *
  factor → ( exp ) | number
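The EBNF form maps naturally onto code: each { … } becomes a loop and each nonterminal a function. The following evaluator is a sketch under stated assumptions. The names cur, parse_exp, parse_term, parse_factor and evaluate are hypothetical, numbers are unsigned decimal integers, and syntax errors are not reported.

```c
#include <ctype.h>

/* Hypothetical scanner state: pointer to the next unread character. */
static const char *cur;

static void skip_spaces(void) { while (*cur == ' ') cur++; }

static int parse_exp(void);

/* factor -> ( exp ) | number */
static int parse_factor(void)
{
    skip_spaces();
    if (*cur == '(') {
        cur++;
        int v = parse_exp();
        skip_spaces();
        if (*cur == ')') cur++;   /* a real parser would report a missing ')' */
        return v;
    }
    int v = 0;
    while (isdigit((unsigned char)*cur))
        v = v * 10 + (*cur++ - '0');
    return v;
}

/* term -> factor { mulop factor } */
static int parse_term(void)
{
    int v = parse_factor();
    skip_spaces();
    while (*cur == '*') {         /* the { ... } repetition becomes a loop */
        cur++;
        v *= parse_factor();
        skip_spaces();
    }
    return v;
}

/* exp -> term { addop term } */
static int parse_exp(void)
{
    int v = parse_term();
    skip_spaces();
    while (*cur == '+' || *cur == '-') {
        char op = *cur++;
        int t = parse_term();
        v = (op == '+') ? v + t : v - t;   /* accumulate left to right */
        skip_spaces();
    }
    return v;
}

/* Evaluates the expression in s. */
int evaluate(const char *s)
{
    cur = s;
    return parse_exp();
}
```

Because parse_exp calls parse_term for each operand, multiplication binds tighter than addition and subtraction, and because each loop accumulates into v from left to right, the operators are left associative. Under these rules 34-3*42 evaluates as 34-(3*42) and 34-3-42 as (34-3)-42.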
45
5. Extended notations (5)
Ex 3.11
BNF
  statement → if-stmt | other
  if-stmt → if ( exp ) statement | if ( exp ) statement else statement
  exp → 0 | 1
EBNF
  statement → if-stmt | other
  if-stmt → if ( exp ) statement [ else statement ]
  exp → 0 | 1
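The optional part [ else statement ] can be sketched as a simple test in a recursive recognizer. The names below (p, match_tok, parse_cond, parse_statement, is_statement) are hypothetical. The sketch assumes tokens are separated by at most blanks and does no keyword boundary checking.

```c
#include <string.h>

/* Hypothetical scanner state: pointer to the next unread character. */
static const char *p;

static void skip_blanks(void) { while (*p == ' ') p++; }

/* Consumes tok if it comes next in the input; returns 1 on success. */
static int match_tok(const char *tok)
{
    size_t n = strlen(tok);
    skip_blanks();
    if (strncmp(p, tok, n) != 0) return 0;
    p += n;
    return 1;
}

/* exp -> 0 | 1 */
static int parse_cond(void)
{
    return match_tok("0") || match_tok("1");
}

/* statement -> if-stmt | other
   if-stmt   -> if ( exp ) statement [ else statement ] */
static int parse_statement(void)
{
    if (match_tok("other")) return 1;
    if (!match_tok("if")) return 0;
    if (!match_tok("(") || !parse_cond() || !match_tok(")")) return 0;
    if (!parse_statement()) return 0;
    if (match_tok("else"))          /* optional part, taken whenever present */
        return parse_statement();
    return 1;
}

/* Returns 1 if s is a complete statement, 0 otherwise. */
int is_statement(const char *s)
{
    p = s;
    if (!parse_statement()) return 0;
    skip_blanks();
    return *p == '\0';
}
```

Taking the else-part as soon as it is seen means the innermost pending if claims it, so this recognizer follows the most closely nested rule without any change to the grammar.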
47
6. Formal Properties of CF Languages (1)
Formal definition: a context-free grammar G = (T, N, P, S) consists of
  A set T of terminals
  A set N of nonterminals (disjoint from T)
  A set P of productions (grammar rules) of the form A → α, where A ∈ N and α ∈ (T ∪ N)*
  A start symbol S, where S ∈ N
48
6. Formal Properties of CF Languages (2)
Derivation step
  α A γ ⇒ α β γ, where α, γ ∈ (T ∪ N)* and A → β ∈ P
  T ∪ N: the set of symbols
  A string α in (T ∪ N)*: sentential form
Derivation
  S ⇒* w, where w ∈ T*
  w: sentence
49
6. Formal Properties of CF Languages (3)
Language generated by G
  L(G) = { w ∈ T* | there exists a derivation S ⇒* w of G }
Leftmost derivation
  In each derivation step α A γ ⇒ α β γ, α ∈ T*
Rightmost derivation
  In each derivation step α A γ ⇒ α β γ, γ ∈ T*
50
6. Formal Properties of CF Languages (4)
Parse tree: rooted labeled tree
  Each node is labeled with a terminal, a nonterminal, or ε
  Root node: start symbol S
  Each leaf node: terminal or ε
  Each nonleaf node: nonterminal
  A node with label A ∈ N has children X1, …, Xn only if A → X1 X2 … Xn ∈ P