Download presentation
Presentation is loading. Please wait.
Published byJoy Wood Modified over 9 years ago
1
4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang Compilers References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000. 3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2 nd Ed. 2006) 1
2
2 Introduction Context-free Grammar The syntax of programming language constructs can be described by context-free grammar Important aspects A grammar serves to impose a structure on the linear sequence of tokens which is the program. Using techniques from the field of formal languages, a grammar can be employed to construct a parser for it automatically. Grammars aid programmers to write syntactically correct programs and provide answer to detailed questions about the syntax.
3
3 The role of the parser
4
Context-Free Grammars A context-free grammar(CFG) is a compact, finite representation of a language, defined by the following four components: A finite terminal alphabet Σ A finite non-terminal alphabet N A start symbol S N A finite set of productions P 4
5
A Simple Expression Grammar 5
6
Leftmost Derivations A sentential form produced via a leftmost derivation is called a left sentential form. The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation. Hence, these parsers are said to produce a leftmost parse. Example: f(V+V) 6 E lm Prefix(E) lm f(E) lm f(V Tail) lm f(V+E) lm f(V+V Tail) lm f(V+V)
7
Rightmost Derivations As a bottom-up parser discovers the productions that derive a given token sequence, it traces a rightmost derivation, but the productions are applied in reverse order. Called rightmost or canonical parse Example: f(V+V) 7 E rm Prefix(E) rm Prefix(V Tail) rm Prefix(V+E) rm Prefix(V+V Tail) rm Prefix(V+V) rm f(V+V)
8
Parse Tree It is rooted by the start symbol S Each node is either a grammar symbol or 8
9
Properties of CFGs The grammar may include useless symbols The grammar may allow multiple, distinct derivations (parse trees) for some input string. The grammar may include strings that do not belong in the language, or the grammar may exclude strings that are in the language. 9
10
Ambiguity (1) Some grammars allow a derived string to have two or more different parse trees (and thus a nonunique structure). Example: 1. Expr → Expr – Expr 2. | id This grammar allows two different parse tree for id - id - id. 10
11
Ambiguity (2) 11
12
Parsers and Recognizers Two approaches A parser is considered top-down if it generates a parse tree by starting at the root of the tree, expanding the tree by applying productions in a depth-first manner. The bottom-up parsers generate a parse tree by starting the tree’s leaves and working toward its root. 12
13
13 Two approaches of Parser Deterministic left-to-right top-down LL method Deterministic left-to-right bottom-up LR method Left-to-right The sequence of tokens is processed from left to right Deterministic No searching is involved: each token brings the parser one step closer to the goal of constructing the syntax tree
14
Parsers (Top- down) 14
15
15 Parsers (bottom-Up)
16
16 Pre-order and post-order (1) The top-down method constructs the syntax tree in pre- order The bottom-up method constructs the syntax tree in post- order
17
17 Pre-order and post-order (2)
18
18 Principles of top-down parsing The main task of a top-down parser is to choose the correct alternatives for known non-terminals
19
19 Principles of bottom-up parsing The main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed.
20
20 Creating a top-down parser manually Recursive descent parsing Simplest way but has its limitations
21
21 Recursive descent parsing program (1)
22
22 Recursive descent parsing program (2)
23
23 Drawbacks Three drawbacks There is still some searching through the alternatives The method often fails to produce a correct parser Error handling leaves much to be desired
24
24 Second problems (1) Example 1 Index_element will never be tried IDENTIFIER ‘ [ ‘
25
25 Second problems (2) Example 2 The recognizer will not recognize ab
26
26 Second problems (3) Example 3 Recursive descent parsers cannot handle left-recursive grammars
27
27 Creating a top-down parser automatically The principles of constructing a top-down parser automatically derive from those of writing one by hand, by applying precomputation. Grammars which allow the construction of a top-down parser to be performed are called LL(1) grammars.
28
28 LL(1) parsing FIRST set The sets of first tokens produced by all alternatives in the grammar. We have to precompute the FIRST sets of all non-terminals The first sets of the terminals are obvious. Finding FIRST( ) is trivial when starts with a terminal. FIRST(N) is the union of the FIRST sets of its alternatives. First( )={a Σ| * a }
29
29 Predictive recursive descent parser The FIRST sets can be used in the construction of a predictive parser because it predicts the presence of a given alternative without trying to find out if it is there.
30
30 Closure algorithm for computing the FIRST set (1) Data definitions
31
31 Closure algorithm for computing the FIRST set (2) Initializations
32
32 Closure algorithm for computing the FIRST set (3) Inference rules
33
33 FIRST sets example(1) Grammar
34
34 FIRST sets example(2) The initial FIRST sets
35
35 FIRST sets example(3) The final FIRST sets
36
Another Example of First Set 36
37
37 Another Example of First Set (II)
38
Algorithms of Computing First( α ) 38
39
39 The predictive parser (1)
40
40 The predictive parser (2)
41
41 Practice Find the FIRST sets of all alternative of the following grammar. E -> TE ’ E ’ ->+TE ’ | T->FT ’ T ’ ->*FT ’ | F->(E)| id
42
42 Nullable alternatives A complication arises with the case label for the empty alternative (ex. rest_expression). Since it does not itself start with any token, how can we decide whether it is the correct alternative?
43
43 FOLLOW sets Follow sets Determining the set of tokens that can immediately follow a given non-terminal N. LL(1) parser ‘ LL ’ because the parser works from Left to right identifying the nodes in what is called Leftmost derivation order. ‘ (1) ’ because all choices are based on a one token look-ahead. Follow(A)={b Σ |S + Ab β}
44
44 Closure algorithm for computing the FOLLOW sets
45
45 The first and follow sets
46
Another Example of Follow Set 46
47
Another Example of Follow Set (II) 47
48
Algorithm of Follow(A) 48
49
49 Recall the predictive parser rest_expression ‘+’ expression | FIRST(rest_expr) = {‘+’, } void rest_expression(void) { switch (Token.class) { case '+': token('+'); expression(); break; case EOF: case ')': break; default: error(); } FOLLOW(rest_expr) = {EOF, ‘)’}
50
50 LL(1) conflicts Example The codes
51
51 LL(1) conflicts FIRST/FIRST conflict term IDENTIFIER | IDENTIFIER ‘ [ ‘ expression ‘ ] ’ | ‘ ( ’ expression ‘ ) ’
52
52 LL(1) conflicts FIRST/FOLLOW conflict FIRST set FOLLOW set S A ‘ a ’ ‘ b ’ { ‘ a ’ } {} A ‘ a ’ | { ‘ a ’, } { ‘ a ’ }
53
53 LL(1) conflicts left recursion expression expression ‘ - ’ term | term Look-ahead token LL(1) method predicts the alternative A k for a non-terminal N FIRST(A k ) (if is nullable then FOLLOW(N)) LL(1) grammar No FIRST/FIRST conflicts No FIRST/FOLLOW conflicts No multiple nullable alternatives No non-terminal can have more than one nullable alternative.
54
54 Solve the LL(1) conflicts Two options Use a stronger parser Make the grammar LL(1)
55
55 Making a grammar LL(1) manual labour rewrite grammar adjust semantic actions three rewrite methods left factoring substitution left-recursion removal
56
56 Left-factoring term IDENTIFIER | IDENTIFIER ‘ [ ‘ expression ‘ ] ’ factor out common prefix term IDENTIFIER after_identifier after_identifier | ‘ [ ‘ expression ‘ ] ’ ‘[’ FOLLOW(after_identifier)
57
57 Substitution A a | B c | S p A q replace non-terminal by its alternative S p a q | p B c q | p q Example S A ‘ a ’ ‘ b ’ A ‘ a ’ | replace non-terminal by its alternative S ‘ a ’ ‘ a ’ ‘ b ’ | ‘ a ’ ‘ b ’
58
58 Left-recursion removal Three types of left-recursion Direct left-recursion N N | … Indirect left-recursion Chain structure N A … A B … … Z N … Hidden left-recursion N N| … ( can produce )
59
59 Left-recursion removal N N | replace by N M M M | example expression expression ‘ - ’ term | term ... expression term expression_tail_option expression_tail_option ‘-’ term expression_tail_option | N
60
60 Practice make the following grammar LL(1) expression expression ‘ + ’ term | expression ‘ - ’ term | term term term ‘ * ’ factor | term ‘ / ’ factor | factor factor ‘ ( ‘ expression ‘ ) ’ | func-call | identifier | constant func-call identifier ‘ ( ‘ expr-list? ‘ ) ’ expr-list expression ( ‘, ’ expression)*
61
61 Answers substitution F ‘ ( ‘ E ‘ ) ’ | ID ‘ ( ‘ expr-list? ‘ ) ’ | ID | constant left factoring E E ( ‘ + ’ | ‘ - ’ ) T | T T T ( ‘ * ’ | ‘ / ’ ) F | F F ‘ ( ‘ E ‘ ) ’ | ID ( ‘ ( ‘ expr-list? ‘ ) ’ )? | constant left recursion removal E T (( ‘ + ’ | ‘ - ’ ) T )* T F (( ‘ * ’ | ‘ / ’ ) F )*
62
62 Undoing the semantic effects of grammar transformations While it is often possible to transform our grammar into a new grammar that is acceptable by a parser generator and that generates the same language, the new grammar usually assigns a different structure to strings in the language than our original grammar did Fortunately, in many cases we are not really interested in the structure but rather in the semantics implied by it.
63
63 Semantics Non-left-recursive equivalent
64
64 Automatic conflict resolution (1) There are two ways in which LL parsers can be strengthened By increasing the look-ahead Distinguishing alternatives not by their first token but by their first two tokens is called LL(2). Disadvantages: the parser code can get much bigger. By allowing dynamic conflict resolvers When the conflict arises during parsing, some of conditions are evaluated to solve it. The parser generator LLgen requires a conflict resolver to be placed on the first of two conflicting alternatives.
65
65 If-else statement in C else_tail_option: both FIRST set and FOLLOW set contain the token ‘ else ’ Conflict resolver Automatic conflict resolution (2)
66
66 The LL(1) push-down automation Transition table for an LL(1) parser
67
67 Push-down automation (PDA) Type of moves Prediction move Top of the prediction stack is a non-terminal N. N is removed from the stack Look up the prediction table Push the alternative of N into the prediction stack Match move Top of the prediction stack is a terminal Termination Parsing terminates when the prediction stack is exhausted.
68
68 Prediction move in an LL(1) PDA
69
69 Match move in an LL(1) PDA
70
70 Predictive parsing with an LL(1) PDA
71
71 PDA example (1) aap + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest- expr term IDENT( expression ) rest-expr + expression
72
72 PDA example (2) aap + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest- expr term IDENT( expression ) rest-expr + expression replace non-terminal by transition entry
73
73 PDA example (3) aap + ( noot + mies ) EOF expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest- expr term IDENT( expression ) rest-expr + expression
74
74 PDA example (4) aap + ( noot + mies ) EOF expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest- expr term IDENT( expression ) rest-expr + expression replace non-terminal by transition entry
75
75 PDA example (5) aap + ( noot + mies ) EOF term rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest- expr term IDENT( expression ) rest-expr + expression
76
76 PDA example (6) aap + ( noot + mies ) EOF term rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest- expr term IDENT( expression ) rest-expr + expression replace non-terminal by transition entry
77
77 PDA example (7) Please continue!! Example of parsing (i+i)+i
78
Another Example (1) 78
79
Another Example (2) 79
80
LL Parser Table 80
81
Trace of an LL(1) Parse 81
82
Obtaining LL(1) Grammars Most LL(1) prediction conflicts can be grouped into two categories: common prefix and left recursion 82
83
Common Prefixes 83 Factoring method
84
Algorithm of Factoring 84
85
Left Recursion 85
86
Algorithm of Eliminating Left Recursion 86
87
87 LLgen LLgen is part of the Amsterdam Compiler Kit takes LL(1) grammar + semantic actions in C and generates a recursive descent parser The non-terminals in the grammar can have parameters, and rules can have local variables, both again expressed in C. LLgen features: repetition operators advanced error handling parameter passing control over semantic actions dynamic conflict resolvers
88
88 LLgen start from LR(1) grammar make grammar LL(1) use repetition operators %token DIGIT; main : [line]+ ; line : expr '\n' ; expr : term [ '+' term ]* ; term : factor [ '*' factor ]* ; factor : '(' expr ')‘ | DIGIT ; LLgen add semantic actions attach parameters to grammar rules insert C-code between the symbols
89
89 Minimal non-left-recursive grammar for expressions
90
90 LLgen code for a parser GrammarSemantics
91
91 LLgen code for a parser The code from previous page resides in a file called parser.g. LLgen converts the file to one called parser.c, which contains a recursive descent parser.
92
92 LLgen interface to lexical analyzer
93
93 LLgen interface to back-end LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens LLmessage() is invoked to notify the lexical analyzer
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.