Chapter 3 Context-Free Grammar and Parsing


1 Chapter 3 Context-Free Grammar and Parsing
Sung-Dong Kim Dept. of Computer Engineering, Hansung University

2 Abstract (1)
Parsing (syntax analysis): the task of determining the syntax, or structure, of a program
Grammar rules of a context-free grammar: describe the syntax of a programming language
(2010-1) Compiler

3 Abstract (2)
Context-free grammar: recursive rules
Recognition algorithm: recursive calls or explicitly managed parsing stacks
Parse tree (syntax tree): the basic structure

4 Abstract (3)
Recognition algorithms: top-down parsing, bottom-up parsing
This chapter: general description of the parsing process, basic theory of context-free grammars, syntax of the TINY language

5 1. Parsing Process
Determine the syntactic structure of a program from the tokens → parse tree
Sequence of tokens → Parser → Parse tree / Syntax tree
Single-pass compiler: no explicit syntax tree
Multipass compiler: further passes use the syntax tree as input

6 2. Context-Free Grammars
Specification for the syntactic structure of a programming language
Example: integer arithmetic expressions in Backus-Naur form (BNF)
exp → exp op exp | ( exp ) | number
op → + | - | *

7 2.2 Specification of CFG Rules (1)
Rules are specified using symbols that are usually tokens
Each rule defines a structure
Name: left of the arrow
Layout of the structure: right of the arrow
exp → exp op exp | ( exp ) | number    (expression structure)
op → + | - | *    (operator)

8 2.2 Specification of CFG Rules (2)
Another representation
<exp> ::= <exp> <op> <exp> | ( <exp> ) | NUMBER
<op> ::= + | - | *
Using ( ) as metasymbols, with quoted terminals
exp → exp ( "+" | "-" | "*" ) exp | "(" exp ")" | number

9 2.2 Specification of CFG Rules (3)
Shortened notation
E → E O E | ( E ) | n
O → + | - | *
or, with a as the terminal:
E → E O E | ( E ) | a
O → + | - | *

10 2.3 Derivations and the Languages (1)
CFG rules determine the set of syntactically legal strings of token symbols
Examples
(34-3)*42 corresponds to ( number - number ) * number: legal expression
(34-3*42: illegal expression (the left parenthesis is never closed)

11 2.3 Derivations and the Languages (2)
Derivation: a sequence of replacements of structure names by choices on the right-hand sides of grammar rules
Single structure name ⇒ … ⇒ a string of token symbols

12 2.3 Derivations and the Languages (3)
Derivation of (34-3)*42
exp ⇒ exp op exp
⇒ exp op number
⇒ exp * number
⇒ ( exp ) * number
⇒ ( exp op exp ) * number
⇒ ( exp op number ) * number
⇒ ( exp - number ) * number
⇒ ( number - number ) * number

13 2.3 Derivations and the Languages (4)
Language defined by the grammar: the set of all strings of token symbols obtained by derivations from the exp symbol, that is, all syntactically legal expressions
L(G) = { s | exp ⇒* s }
Productions: the grammar rules

14 2.3 Derivations and the Languages (5)
Nonterminals: structure names; they must be replaced further on in a derivation
Terminals: symbols in the alphabet; they terminate a derivation

15 2.3 Derivations and the Languages (6)
Example 3.1
E → ( E ) | a
L(G) = { (^n a )^n | n is an integer ≥ 0 }, that is, a surrounded by n balanced pairs of parentheses
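The language of Example 3.1 can be checked mechanically. The following is a minimal sketch of a recognizer that has one function for the nonterminal E and tries the two rule choices in turn; the names `parse_E` and `in_language` are illustrative assumptions, not part of the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Recognizer for the grammar E -> ( E ) | a.
 * parse_E tries to consume one E starting at s[*pos]. */
static bool parse_E(const char *s, size_t *pos)
{
    if (s[*pos] == 'a') {          /* choice E -> a */
        (*pos)++;
        return true;
    }
    if (s[*pos] == '(') {          /* choice E -> ( E ) */
        (*pos)++;
        if (!parse_E(s, pos))
            return false;
        if (s[*pos] != ')')
            return false;
        (*pos)++;
        return true;
    }
    return false;                  /* neither choice applies */
}

/* Returns true when the whole string s belongs to L(G). */
bool in_language(const char *s)
{
    size_t pos = 0;
    assert(s != NULL);
    return parse_E(s, &pos) && s[pos] == '\0';
}
```

Each recursive call corresponds to one use of E → ( E ), so the recursion depth equals n in the description of L(G).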

16 2.3 Derivations and the Languages (7)
Example 3.2
E → ( E )
The grammar generates no string at all: no rule ever removes E
Example 3.3
E → E + a | a
L(G) = { a, a+a, a+a+a, … }
E ⇒ E + a ⇒ E + a + a ⇒ E + a + a + a ⇒ …
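To make the shape of the language in Example 3.3 concrete, here is a small sketch that builds the sentence obtained by applying E → E + a a given number of times and then finishing with E → a. The function name `derive` and its buffer interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Builds the sentence derived from E -> E + a | a by using
 * E -> E + a exactly `steps` times, followed by E -> a.
 * Writes into out (capacity cap). Returns 0 on success and
 * -1 when the buffer is too small. */
int derive(int steps, char *out, size_t cap)
{
    size_t need;
    assert(out != NULL && steps >= 0);
    need = 1 + 2 * (size_t)steps + 1;   /* "a", steps times "+a", NUL */
    if (cap < need)
        return -1;
    strcpy(out, "a");                   /* E -> a fills the leftmost position */
    for (int i = 0; i < steps; i++)
        strcat(out, "+a");              /* one "+a" per use of E -> E + a */
    return 0;
}
```

With `steps` equal to 0, 1, and 2 the function yields a, a+a, and a+a+a, matching the listing of L(G).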

17 2.3 Derivations and the Languages (8)
Example 3.4
statement → if-stmt | other
if-stmt → if ( exp ) statement | if ( exp ) statement else statement
exp → 0 | 1
Strings in the language:
other
if (0) other
if (1) other
if (0) other else other
if (1) other else other

18 2.3 Derivations and the Languages (9)
Left recursive
A → A a | a
A → A α | β
Right recursive
A → a A | a
A → α A | β
ε-production
empty → ε
A → A a | ε
A → a A | ε

19 2.3 Derivations and the Languages (10)
Example 3.6
statement → if-stmt | other
if-stmt → if ( exp ) statement else-part
else-part → else statement | ε
exp → 0 | 1
Example 3.7
stmt-sequence → stmt ; stmt-sequence | stmt
stmt → s
L(G) = { s, s;s, s;s;s, … }: ';' acts as a separator

20 2.3 Derivations and the Languages (11)
Allowing the empty statement sequence
stmt-sequence → stmt ; stmt-sequence | ε
stmt → s
L(G') = { ε, s;, s;s;, s;s;s;, … }: ';' acts as a terminator
Allowing the empty sequence while keeping ';' as a separator
stmt-sequence → nonempty-stmt-sequence | ε
nonempty-stmt-sequence → stmt ; nonempty-stmt-sequence | stmt
stmt → s

21 3.1 Parse Tree (1)
Many derivations exist for the same string
Parse tree: represents the structure of a string of terminals
Another derivation of (34-3)*42
exp ⇒ exp op exp
⇒ ( exp ) op exp
⇒ ( exp op exp ) op exp
⇒ ( number op exp ) op exp
⇒ ( number - exp ) op exp
⇒ ( number - number ) op exp
⇒ ( number - number ) * exp
⇒ ( number - number ) * number

22 3.1 Parse Tree (2)
Parse tree: a labeled tree
Interior nodes: labeled by nonterminals; represent the steps in a derivation
Children of an interior node: the replacement of the associated nonterminal
Leaf nodes: labeled by terminals; this is where the tokens appear

23 3.1 Parse Tree (3)
Leftmost derivation = preorder numbering of the interior nodes
(1) exp ⇒ exp op exp
(2) ⇒ number op exp
(3) ⇒ number + exp
(4) ⇒ number + number
Rightmost derivation = reverse of a postorder numbering of the interior nodes
(1) exp ⇒ exp op exp
(2) ⇒ exp op number
(3) ⇒ exp + number
(4) ⇒ number + number
[Figure: parse tree of number + number, drawn twice with its interior nodes numbered in each derivation order]

24 3.1 Parse Tree (4)
Parse tree for (34 - 3) * 42
[Figure: parse tree for ( number - number ) * number]

25 3.2 Abstract Syntax Tree (1)
Principle of syntax-directed translation: the meaning (semantics) of a string should be directly related to its syntactic structure as represented by the parse tree
Example: the parse tree for 3 + 4 should imply that the values 3 and 4 are to be added
Abstract syntax tree: the root is the operation, the leaves are the values
[Figure: tree with root + and leaves 3 and 4]

26 3.2 Abstract Syntax Tree (2)
Another example: (34 - 3) * 42
[Figure: tree with root *, whose children are a - node over 34 and 3, and the leaf 42]
The ( ) tokens have disappeared, yet the tree still represents the meaning
Abstract syntax tree (syntax tree)
Represents an abstraction of the token sequence
The token sequence cannot be recovered from it
Contains all the information needed for translation, in a more efficient form

27 3.2 Abstract Syntax Tree (3)
3+4 = OpExp(Plus, ConstExp(3), ConstExp(4))
(34-3)*42 = OpExp(Times, OpExp(Minus, ConstExp(34), ConstExp(3)), ConstExp(42))
BNF-like rules
exp → OpExp(op, exp, exp) | ConstExp(integer)
op → Plus | Minus | Times

28 3.2 Abstract Syntax Tree (4)
Actual syntax tree structure
typedef enum {Plus, Minus, Times} OpKind;
typedef enum {OpK, ConstK} ExpKind;
typedef struct streenode {
    ExpKind kind;                       /* operator node or constant leaf */
    OpKind op;                          /* meaningful when kind == OpK */
    struct streenode *lchild, *rchild;  /* operands of an operator node */
    int val;                            /* meaningful when kind == ConstK */
} STreeNode;
typedef STreeNode *SyntaxTree;
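As a sketch of how this structure might be used, the code below builds the tree OpExp(Times, OpExp(Minus, ConstExp(34), ConstExp(3)), ConstExp(42)) and evaluates it recursively. The helpers `const_exp`, `op_exp`, `eval`, and `free_tree` are hypothetical names introduced here; the slides define only the types.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum {Plus, Minus, Times} OpKind;
typedef enum {OpK, ConstK} ExpKind;
typedef struct streenode {
    ExpKind kind;
    OpKind op;
    struct streenode *lchild, *rchild;
    int val;
} STreeNode;
typedef STreeNode *SyntaxTree;

/* Hypothetical helper: builds the leaf ConstExp(val). */
SyntaxTree const_exp(int val)
{
    SyntaxTree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = ConstK;
    t->op = Plus;                  /* unused for leaves */
    t->lchild = t->rchild = NULL;
    t->val = val;
    return t;
}

/* Hypothetical helper: builds the interior node OpExp(op, l, r). */
SyntaxTree op_exp(OpKind op, SyntaxTree l, SyntaxTree r)
{
    SyntaxTree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = OpK;
    t->op = op;
    t->lchild = l;
    t->rchild = r;
    t->val = 0;                    /* unused for operator nodes */
    return t;
}

/* Evaluates the tree bottom-up: leaves yield values, roots apply operations. */
int eval(SyntaxTree t)
{
    int l, r;
    assert(t != NULL);
    if (t->kind == ConstK)
        return t->val;
    l = eval(t->lchild);
    r = eval(t->rchild);
    switch (t->op) {
    case Plus:  return l + r;
    case Minus: return l - r;
    case Times: return l * r;
    }
    return 0;                      /* not reached for well-formed trees */
}

/* Releases every node of the tree. */
void free_tree(SyntaxTree t)
{
    if (t == NULL)
        return;
    free_tree(t->lchild);
    free_tree(t->rchild);
    free(t);
}
```

A call such as `eval(op_exp(Times, op_exp(Minus, const_exp(34), const_exp(3)), const_exp(42)))` computes the value without ever needing the parentheses, which is the point of the abstraction.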

29 4. Ambiguity (1)
Ambiguous grammar: a grammar that generates a string with two distinct parse trees
Ex:
exp → exp op exp | ( exp ) | number
op → + | - | *
[Figure: two distinct parse trees for a string of the form number - number * number]

30 4. Ambiguity (2)
Dealing with ambiguities
Disambiguating rule
Changing the grammar to remove the ambiguity
Precedence
Associativity: "left" or "right"
Fully parenthesized expressions
exp → factor op factor | factor
factor → ( exp ) | number
op → + | - | *

31 4. Ambiguity (3)
Precedence cascade: grouping the operators into groups of equal precedence
exp → exp addop exp | term
addop → + | -
term → term multop term | factor
multop → *
factor → ( exp ) | number

32 4. Ambiguity (4)
Associativity
Left:
exp → exp addop term | term
addop → + | -
term → term multop factor | factor
multop → *
factor → ( exp ) | number

33 [Figure: parse trees for 34-3*42 and 34-3-42]

34 4. Ambiguity (5)
Dangling else problem
statement → if-stmt | other
if-stmt → if ( exp ) statement | if ( exp ) statement else statement
exp → 0 | 1
Ambiguous string: if (0) if (1) other else other

35 [Figure: parse trees for if (0) if (1) other else other]

36 4. Ambiguity (6)
Ambiguity removal with the most closely nested rule: easy
Ex:
if (x != 0)
    if (y == 1/x) ok = TRUE;
    else x = 1/x;

37 4. Ambiguity (7)
Grammar conversion: difficult
statement → matched-stmt | unmatched-stmt
matched-stmt → if ( exp ) matched-stmt else matched-stmt | other
unmatched-stmt → if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmt
exp → 0 | 1

38 [Figure: parse tree built with matched-stmt and unmatched-stmt]

39 4. Ambiguity (8)
Requiring the presence of the else-part
statement → if-stmt | other
if-stmt → if ( exp ) statement else statement
exp → 0 | 1
Using a bracketing keyword endif
statement → if-stmt | other
if-stmt → if ( exp ) statement endif | if ( exp ) statement else statement endif
exp → 0 | 1

40 5. Extended notations (1)
EBNF notations for repetitive and optional constructs
Left recursive: A → A α | β
Right recursive: A → α A | β
Curly brackets for repetition
A → β { α }
A → { α } β
Square brackets for optional constructs
statement → if-stmt | other
if-stmt → if ( exp ) statement [ else statement ]
exp → 0 | 1

41 5. Extended notations (2)
Syntax diagrams: graphical representations of EBNF rules
Rectangle: nonterminal
Circle (oval): terminal
Arrow: choice and sequencing
Ex: factor → ( exp ) | number
[Figure: syntax diagram for factor]

42 5. Extended notations (3)
Repetition
Optional constructs
[Figure: diagrams for repetition and optional constructs]

43 5. Extended notations (4)
Ex 3.10
BNF
exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number
EBNF
exp → term { addop term }
addop → + | -
term → factor { mulop factor }
mulop → *
factor → ( exp ) | number
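The EBNF form of Ex 3.10 maps naturally onto code: each `{ … }` becomes a loop. The following is a minimal sketch, assuming a hypothetical character-level scanner with a file-scope cursor `p`, that evaluates an expression while parsing it. The function names are illustrative and the slides do not prescribe this implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;              /* scanner cursor (not reentrant) */

static void skip_blanks(void) { while (*p == ' ') p++; }

static int parse_exp(int *ok);

/* factor -> ( exp ) | number */
static int parse_factor(int *ok)
{
    skip_blanks();
    if (*p == '(') {
        int v;
        p++;
        v = parse_exp(ok);
        skip_blanks();
        if (*p == ')') p++; else *ok = 0;
        return v;
    }
    if (isdigit((unsigned char)*p)) {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        return v;
    }
    *ok = 0;                       /* neither '(' nor a digit */
    return 0;
}

/* term -> factor { mulop factor } */
static int parse_term(int *ok)
{
    int v = parse_factor(ok);
    skip_blanks();
    while (*ok && *p == '*') {
        p++;
        v *= parse_factor(ok);
        skip_blanks();
    }
    return v;
}

/* exp -> term { addop term }; the loop accumulates left to right,
 * which gives the left associativity of + and -. */
static int parse_exp(int *ok)
{
    int v = parse_term(ok);
    skip_blanks();
    while (*ok && (*p == '+' || *p == '-')) {
        char op = *p++;
        int r = parse_term(ok);
        v = (op == '+') ? v + r : v - r;
        skip_blanks();
    }
    return v;
}

/* Returns 1 and stores the value when s is a legal expression, else 0. */
int evaluate(const char *s, int *result)
{
    int ok = 1;
    assert(s != NULL && result != NULL);
    p = s;
    *result = parse_exp(&ok);
    skip_blanks();
    return ok && *p == '\0';
}
```

Because `parse_term` is called from inside `parse_exp`, multiplication binds tighter than subtraction, so 34-3*42 groups as 34-(3*42), and the loop in `parse_exp` makes 34-3-42 group as (34-3)-42.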


45 5. Extended notations (5)
Ex 3.11
BNF
statement → if-stmt | other
if-stmt → if ( exp ) statement | if ( exp ) statement else statement
exp → 0 | 1
EBNF
statement → if-stmt | other
if-stmt → if ( exp ) statement [ else statement ]
exp → 0 | 1
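The optional `[ else statement ]` in the EBNF rule of Ex 3.11 can be coded as a single test for the keyword else after the nested statement. The sketch below, with hypothetical names `accept_tok`, `statement`, and `run_program`, parses a statement and counts how many `other` statements would execute. Since the innermost call checks for else first, it follows the most closely nested rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *cur;            /* scanner cursor (not reentrant) */

/* Skips blanks; if the input continues with word, consumes it and
 * returns 1, otherwise returns 0. */
static int accept_tok(const char *word)
{
    size_t n = strlen(word);
    while (*cur == ' ')
        cur++;
    if (strncmp(cur, word, n) != 0)
        return 0;
    cur += n;
    return 1;
}

/* statement -> if-stmt | other
 * if-stmt   -> if ( exp ) statement [ else statement ]
 * exp       -> 0 | 1
 * Returns 1 on a successful parse. While run is nonzero, every
 * `other` that is reached increments *count. */
static int statement(int run, int *count)
{
    int cond;
    if (accept_tok("other")) {
        if (run)
            (*count)++;
        return 1;
    }
    if (!accept_tok("if") || !accept_tok("("))
        return 0;
    if (accept_tok("0"))
        cond = 0;
    else if (accept_tok("1"))
        cond = 1;
    else
        return 0;
    if (!accept_tok(")"))
        return 0;
    if (!statement(run && cond, count))
        return 0;
    if (accept_tok("else"))        /* optional part: binds to this, the nearest, if */
        return statement(run && !cond, count);
    return 1;
}

/* Returns the number of executed `other` statements, or -1 on a syntax error. */
int run_program(const char *src)
{
    int count = 0;
    assert(src != NULL);
    cur = src;
    if (!statement(1, &count))
        return -1;
    while (*cur == ' ')
        cur++;
    return *cur == '\0' ? count : -1;
}
```

For if (0) if (1) other else other the else attaches to the inner if, so nothing runs and the count is 0; attaching it to the outer if would instead execute one `other`.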


47 6. Formal Properties of CF Languages (1)
Formal definition: G = (T, N, P, S)
T: set of terminals
N: set of nonterminals
P: set of productions (grammar rules) A → α, with A ∈ N and α ∈ (T ∪ N)*
S: start symbol (S ∈ N)

48 6. Formal Properties of CF Languages (2)
Derivation step: α A γ ⇒ α β γ, where α, β, γ ∈ (T ∪ N)* and A → β ∈ P
T ∪ N: set of symbols
α in (T ∪ N)*: sentential form
Derivation: S ⇒* w
w ∈ T*: sentence

49 6. Formal Properties of CF Languages (3)
Language generated by G
L(G) = { w ∈ T* | there exists a derivation S ⇒* w of G }
Leftmost derivation: in each derivation step α A γ ⇒ α β γ, α ∈ T*
Rightmost derivation: in each derivation step α A γ ⇒ α β γ, γ ∈ T*

50 6. Formal Properties of CF Languages (4)
Parse tree: rooted labeled tree
Each node: labeled with a terminal, a nonterminal, or ε
Root node: start symbol S
Each leaf node: terminal or ε
Each nonleaf node: nonterminal
A node with label A ∈ N has children X1, …, Xn only if A → X1 X2 … Xn ∈ P

