1
Yu-Chen Kuo
Chapter 2: A Simple One-Pass Compiler
2
2.1 Overview
A programming language is described by:
– What its programs look like (syntax: specified with context-free grammars)
– What its programs mean (semantics: much more difficult to specify)
3
2.2 Syntax Definition
A context-free grammar (grammar, for short) specifies the hierarchical structure of a language's constructs, e.g.
– stmt → if ( expr ) stmt else stmt
– such a rule is called a production
– tokens: if, (, ), else
– nonterminals: expr, stmt
4
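Context-free Grammar
A context-free grammar has four components:
1. A set of tokens (terminals), e.g. digits, signs (+, -, <, =), keywords if, while
2. A set of nonterminals
3. A set of productions, each consisting of a nonterminal (the left side), an arrow, and a sequence of tokens and/or nonterminals (the right side)
4. A designation of one nonterminal as the start symbol (by convention, the nonterminal of the first production)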
5
Example 2.1: Grammar for expressions such as ‘9-5+2’
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The three list productions can be grouped as list → list + digit | list - digit | digit
Nonterminals: list (the start symbol), digit
Tokens (terminals): + - 0 1 2 3 4 5 6 7 8 9
6
Example 2.1 (Cont.)
Token strings are derived by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of one of its productions.
The empty string, written ε, is a string of zero tokens.
The set of all token strings that can be derived from the start symbol forms the language defined by the grammar.
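As an illustration, here is one way to derive 9-5+2 from list, always replacing the leftmost nonterminal (the order of replacements is our choice for this example):

```latex
\begin{aligned}
\mathit{list} &\Rightarrow \mathit{list} + \mathit{digit} \\
&\Rightarrow \mathit{list} - \mathit{digit} + \mathit{digit} \\
&\Rightarrow \mathit{digit} - \mathit{digit} + \mathit{digit} \\
&\Rightarrow 9 - \mathit{digit} + \mathit{digit} \\
&\Rightarrow 9 - 5 + \mathit{digit} \\
&\Rightarrow 9 - 5 + 2
\end{aligned}
```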
7
Example 2.2: Parse Tree
A parse tree shows how the start symbol derives a string (here 9-5+2) using the productions
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
8
Parse Trees
For example, a node A with children X, Y, Z corresponds to the production A → X Y Z. In general, a parse tree has these properties:
1. The root is labeled by the start symbol
2. Each leaf is labeled by a token or by ε
3. Each interior node is labeled by a nonterminal
4. If A labels an interior node and X₁, X₂, …, Xₙ label its children from left to right, then A → X₁ X₂ … Xₙ is a production
9
Example 2.3: Pascal begin-end blocks
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
stmt → if ( expr ) stmt else stmt | assignment_stmt
10
Ambiguity of a Grammar
A grammar is ambiguous if some string it generates has more than one parse tree.
11
Ambiguity of a Grammar (Cont.)
string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
With this grammar, 9-5+2 has two parse trees, corresponding to the groupings (9-5)+2 and 9-(5+2).
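The ambiguity matters because the two parse trees give different values when the string is read as arithmetic:

```latex
(9-5)+2 = 4+2 = 6 \qquad\text{but}\qquad 9-(5+2) = 9-7 = 2
```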
12
Associativity of Operators
Left associative: 9+5-2 means (9+5)-2
– +, -, *, / are left associative
– the parse tree grows down towards the left
Right associative: a=b=c means a=(b=c)
– the parse tree grows down towards the right
13
Associativity of Operators (Cont.)
A grammar for right-associative assignment:
right → letter = right | letter
letter → a | b | c | … | z
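A minimal recursive-descent sketch of this grammar in C, assuming the input is a string of lowercase letters and '=' with no white space. The names parse_assign and right and the parenthesized output are hypothetical, chosen only to make the grouping visible: because right calls itself after '=', the last assignment is grouped first.

```c
#include <ctype.h>

static const char *p;   /* next unread input character */
static char *o;         /* next free position in the output buffer */

/* right -> letter = right | letter */
static int right(void) {
    if (!islower((unsigned char)*p))
        return 0;                   /* expected a letter */
    char letter = *p++;
    if (*p == '=') {
        p++;
        *o++ = '(';
        *o++ = letter;
        *o++ = '=';
        if (!right())               /* recursion on the right side: right associativity */
            return 0;
        *o++ = ')';
    } else {
        *o++ = letter;              /* right -> letter */
    }
    return 1;
}

/* Returns 1 if in matches the grammar; out receives a fully parenthesized copy. */
int parse_assign(const char *in, char *out) {
    p = in;
    o = out;
    int ok = right() && *p == '\0';
    *o = '\0';
    return ok;
}
```

For example, parse_assign("a=b=c", buf) writes (a=(b=c)) to buf, mirroring a parse tree that grows down towards the right.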
14
Precedence of Operators
9+5*2 means 9+(5*2): *, / have higher precedence than +, -
*, /, +, - are all left associative
term for *, /:
– term → term * factor | term / factor | factor
expr for +, -:
– expr → expr + term | expr - term | term
factor → digit | ( expr )
15
Precedence of Operators (Cont.)
Syntax of expressions:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
Syntax of statements for Pascal (is it ambiguous?):
stmt → id := expr
 | if expr then stmt
 | if expr then stmt else stmt
 | while expr do stmt
 | begin opt_stmts end
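The expression grammar can be turned directly into an evaluator. This is a hypothetical sketch that anticipates a technique from later in the chapter: the left-recursive productions are implemented as loops, since a procedure that called itself first would never terminate. Each loop combines operands from left to right, so *, / bind tighter than +, - and all four associate to the left. It assumes single-digit operands, no white space, and well-formed input (no division by zero); evaluate is an illustrative name.

```c
#include <ctype.h>

static const char *p;   /* next unread input character */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void) {
    if (isdigit((unsigned char)*p))
        return *p++ - '0';
    if (*p == '(') {
        p++;                    /* consume '(' */
        int v = expr();
        if (*p == ')')
            p++;                /* consume ')' */
        return v;
    }
    return 0;                   /* malformed input: no real error handling here */
}

/* term -> term * factor | term / factor | factor, as a loop */
static int term(void) {
    int v = factor();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        int rhs = factor();
        v = (op == '*') ? v * rhs : v / rhs;
    }
    return v;
}

/* expr -> expr + term | expr - term | term, as a loop */
static int expr(void) {
    int v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

int evaluate(const char *s) {
    p = s;
    return expr();
}
```

For example, evaluate("9+5*2") returns 19, and evaluate("8/2/2") returns 2, the left-associative reading (8/2)/2.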
16
2.3 Syntax-Directed Translation
Syntax-directed definitions and translation schemes are two formalisms for specifying translations for programming languages.
A syntax-directed definition uses a context-free grammar to specify the syntactic structure of the input.
With each grammar symbol X it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes X.a of the symbols in that production.
The grammar together with the set of semantic rules constitutes the syntax-directed definition.
17
2.3 Syntax-Directed Translation (Cont.)
A syntax-directed definition for translating expressions consisting of digits separated by plus or minus signs into postfix notation
18
Postfix Notation
1. If E is a variable or constant, then postfix(E) = E
2. If E is an expression of the form E₁ op E₂, where op is a binary operator, then postfix(E) = E₁′ E₂′ op, where E₁′ = postfix(E₁) and E₂′ = postfix(E₂)
3. If E is an expression of the form ( E₁ ), then postfix(E) = postfix(E₁)
Example: postfix(9-5+2) = 95-2+
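Applying the three rules step by step, first to 9-5+2 (grouped as (9-5)+2 because - and + are left associative) and then, for contrast, to 9-(5+2):

```latex
\begin{aligned}
\operatorname{postfix}(9-5+2) &= \operatorname{postfix}\big((9-5)+2\big) \\
&= \operatorname{postfix}(9-5)\ \operatorname{postfix}(2)\ + && \text{(rule 2)}\\
&= \texttt{95-}\ \texttt{2}\ + = \texttt{95-2+} && \text{(rules 2 and 1)}\\[4pt]
\operatorname{postfix}\big(9-(5+2)\big) &= \operatorname{postfix}(9)\ \operatorname{postfix}\big((5+2)\big)\ - && \text{(rule 2)}\\
&= \texttt{9}\ \texttt{52+}\ - = \texttt{952+-} && \text{(rules 1, 3 and 2)}
\end{aligned}
```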
19
Postfix Notation
20
Robot’s position
21
Robot’s position
22
Robot’s position
23
Depth-First Traversals
24
Translation Schemes
A translation scheme is a context-free grammar in which program fragments called semantic actions are embedded within the right sides of productions.
A translation scheme is like a syntax-directed definition, except that the order in which the semantic rules are evaluated is shown explicitly.
25
Translation Schemes
26
2.4 Parsing
Parsing is the process of determining whether a string of tokens can be generated by a grammar.
For any context-free grammar there is a parser that takes at most O(n³) time to parse a string of n tokens, but this is too expensive for a compiler.
For a programming language, we can generally construct a grammar that can be parsed in linear time, making a single left-to-right scan of the input and looking ahead one token at a time.
27
2.4 Parsing (Cont.)
Top-down parser: construction of the parse tree starts at the root and proceeds towards the leaves.
Bottom-up parser: construction of the parse tree starts at the leaves and proceeds towards the root; bottom-up methods can handle a larger class of grammars.
28
Top-Down Parsing
Construction of the parse tree starts at the root, labeled with the start nonterminal, and repeatedly performs the following two steps:
1. At node n, labeled with nonterminal A, select one of the productions for A and construct children at n for the symbols on the right side of the production.
2. Find the next node at which a subtree is to be constructed.
29
Example
type → simple | ↑id | array [ simple ] of type
simple → integer | char | num dotdot num
e.g. array [ num dotdot num ] of integer
30
Example (Cont.)
31
Example (Cont.)
32
Example (Cont.)
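The step-by-step top-down construction for the example input corresponds to the following leftmost derivation, in which each step expands the leftmost nonterminal, the same order in which the top-down procedure reaches the nodes of the tree (a reconstruction for illustration):

```latex
\begin{aligned}
\mathit{type} &\Rightarrow \textbf{array}\ [\ \mathit{simple}\ ]\ \textbf{of}\ \mathit{type} \\
&\Rightarrow \textbf{array}\ [\ \textbf{num}\ \textbf{dotdot}\ \textbf{num}\ ]\ \textbf{of}\ \mathit{type} \\
&\Rightarrow \textbf{array}\ [\ \textbf{num}\ \textbf{dotdot}\ \textbf{num}\ ]\ \textbf{of}\ \mathit{simple} \\
&\Rightarrow \textbf{array}\ [\ \textbf{num}\ \textbf{dotdot}\ \textbf{num}\ ]\ \textbf{of}\ \textbf{integer}
\end{aligned}
```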
33
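Predictive Parsing
Recursive-descent parsing is a top-down parsing method; predictive parsing is the special case in which the lookahead symbol alone determines the procedure selected for each nonterminal.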
34
Predictive Parsing (Cont.)
35
Predictive Parsing (Cont.)
Use the lookahead symbol and the FIRST sets of the right sides of productions to determine unambiguously which procedure (production) to select for each nonterminal.
FIRST(α): the set of tokens that appear as the first symbols of one or more strings generated from α
– FIRST(simple) = { integer, char, num }
– FIRST(↑id) = { ↑ }
– FIRST(array [ simple ] of type) = { array }
If A → α | β, predictive parsing requires FIRST(α) ∩ FIRST(β) = ∅
36
When to Use ε-Productions
stmt → begin opt_stmts end
opt_stmts → stmt_list | ε
While parsing opt_stmts, if the lookahead symbol is not in FIRST(stmt_list), the ε-production is used; the lookahead symbol should then be end, otherwise an error is reported.
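A sketch of how the ε-production shows up in a recursive-descent procedure. To keep it self-contained it makes two simplifying assumptions that are not part of the slides: tokens arrive as a pre-scanned array of hypothetical codes (T_BEGIN, T_END, T_ID, T_SEMI, T_EOI), and stmt is reduced to a single id, so FIRST(stmt_list) = { id }. The left-recursive stmt_list is written as a loop.

```c
/* Hypothetical token codes for this sketch. */
enum token { T_BEGIN, T_END, T_ID, T_SEMI, T_EOI };

static const enum token *la;   /* current lookahead token */
static int ok;                 /* cleared on the first syntax error */

static void match(enum token t) {
    if (*la == t) la++;
    else ok = 0;
}

/* Simplification for this sketch: stmt -> id. */
static void stmt(void) { match(T_ID); }

/* stmt_list -> stmt_list ; stmt | stmt, written as a loop */
static void stmt_list(void) {
    stmt();
    while (*la == T_SEMI) {
        match(T_SEMI);
        stmt();
    }
}

/* opt_stmts -> stmt_list | ε */
static void opt_stmts(void) {
    if (*la == T_ID)            /* lookahead in FIRST(stmt_list) */
        stmt_list();
    else if (*la == T_END)      /* use opt_stmts -> ε: consume nothing */
        ;
    else
        ok = 0;                 /* neither FIRST(stmt_list) nor end: error */
}

/* stmt -> begin opt_stmts end; returns 1 if the whole token array is accepted */
int parse_block(const enum token *tokens) {
    la = tokens;
    ok = 1;
    match(T_BEGIN);
    opt_stmts();
    match(T_END);
    return ok && *la == T_EOI;
}
```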
37
Designing a Predictive Parser
A predictive parser consists of a procedure for every nonterminal. Each procedure does two things:
1. It decides which production to use by looking at the lookahead symbol. The production with right side α is used if the lookahead symbol is in FIRST(α). If the lookahead symbol is not in the FIRST set of any other right side, a production with ε on the right side is used.
2. It uses a production by mimicking the right side. A nonterminal results in a call to the procedure for that nonterminal, and a token matching the lookahead symbol results in reading the next input token.
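Following these two rules, a predictive parser for the type grammar of the earlier example might look like the sketch below. As assumptions for illustration, tokens are supplied as an array of hypothetical enum codes ending in T_EOI, T_CARET stands for the ↑ token, and an error simply clears a flag instead of being reported.

```c
/* Hypothetical token codes; T_CARET stands for the ↑ of ↑id. */
enum token {
    T_INTEGER, T_CHAR, T_NUM, T_DOTDOT, T_ARRAY, T_OF,
    T_LBRACK, T_RBRACK, T_CARET, T_ID, T_EOI
};

static const enum token *lookahead;  /* current lookahead token */
static int ok;                       /* cleared on the first syntax error */

/* Advance past the lookahead if it is t; otherwise record an error. */
static void match(enum token t) {
    if (*lookahead == t) lookahead++;
    else ok = 0;
}

/* simple -> integer | char | num dotdot num */
static void simple(void) {
    switch (*lookahead) {
    case T_INTEGER: match(T_INTEGER); break;
    case T_CHAR:    match(T_CHAR);    break;
    case T_NUM:     match(T_NUM); match(T_DOTDOT); match(T_NUM); break;
    default:        ok = 0;           break;
    }
}

/* type -> simple | ↑id | array [ simple ] of type */
static void type(void) {
    switch (*lookahead) {
    case T_INTEGER: case T_CHAR: case T_NUM:   /* FIRST(simple) */
        simple();
        break;
    case T_CARET:                              /* FIRST(↑id) */
        match(T_CARET); match(T_ID);
        break;
    case T_ARRAY:                              /* FIRST(array [ simple ] of type) */
        match(T_ARRAY); match(T_LBRACK); simple();
        match(T_RBRACK); match(T_OF); type();
        break;
    default:                                   /* lookahead in no FIRST set */
        ok = 0;
        break;
    }
}

/* Returns 1 if tokens (terminated by T_EOI) form a type. */
int parse_type(const enum token *tokens) {
    lookahead = tokens;
    ok = 1;
    type();
    return ok && *lookahead == T_EOI;
}
```

Each case label is a FIRST set, so the grammar is suitable for predictive parsing exactly when no token appears under two labels of the same switch.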
38
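Eliminating Left Recursion
expr → expr + term | term
– a procedure expr( ) for this production would call itself immediately, without consuming input, and loop forever
A left-recursive pair of productions A → Aα | β can be replaced by
A → βR
R → αR | ε
For expr → expr + term | term (A = expr, α = + term, β = term) this gives
expr → term rest
rest → + term rest | ε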
39
Eliminating Left Recursion (Cont.)
40
A Translator for Simple Expressions
41
Adapting the Translation Scheme
Eliminate left recursion, keeping the semantic actions in place:
A → Aα | Aβ | γ becomes A → γR, R → αR | βR | ε
expr → expr + term {print('+')} | expr - term {print('-')} | term becomes
expr → term rest
rest → + term {print('+')} rest | - term {print('-')} rest | ε
term → 0 {print('0')}
…
term → 9 {print('9')}
42
Adapting the Translation Scheme (Cont.)
43
Procedures for the Nonterminals expr, term, and rest
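A sketch of the procedures for expr, rest and term implementing the translation scheme for expr → term rest. The slides' version prints with putchar; this one instead reads from a string and appends to a caller-supplied buffer (assumed large enough) so the result can be checked. The names infix_to_postfix and put are hypothetical.

```c
#include <ctype.h>

static const char *lookahead;   /* next unread input character */
static char *out;               /* next free position in the output buffer */
static int ok;                  /* cleared on the first syntax error */

static void put(char c) { *out++ = c; }   /* stands in for putchar */

static void match(char t) {
    if (*lookahead == t) lookahead++;
    else ok = 0;
}

/* term -> 0 {print('0')} | ... | 9 {print('9')} */
static void term(void) {
    if (isdigit((unsigned char)*lookahead)) {
        put(*lookahead);
        match(*lookahead);
    } else {
        ok = 0;
    }
}

/* rest -> + term {print('+')} rest | - term {print('-')} rest | ε */
static void rest(void) {
    if (*lookahead == '+') {
        match('+'); term(); put('+'); rest();
    } else if (*lookahead == '-') {
        match('-'); term(); put('-'); rest();
    }
    /* otherwise rest -> ε: consume nothing */
}

/* expr -> term rest */
static void expr(void) { term(); rest(); }

/* Returns 1 and leaves the postfix form in buf if in is a valid expression. */
int infix_to_postfix(const char *in, char *buf) {
    lookahead = in;
    out = buf;
    ok = 1;
    expr();
    *out = '\0';
    return ok && *lookahead == '\0';
}
```

For example, infix_to_postfix("9-5+2", buf) leaves 95-2+ in buf.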
44
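Optimizing the Translator
Replacing tail recursion by iteration (lookahead, match() and term() are those of the translator; putchar is declared in <stdio.h>):
void rest(void)
{
L:  if (lookahead == '+') {
        match('+'); term(); putchar('+');   /* rest -> + term {print('+')} rest */
        goto L;                             /* the tail call rest() becomes a jump */
    } else if (lookahead == '-') {
        match('-'); term(); putchar('-');   /* rest -> - term {print('-')} rest */
        goto L;
    }
    /* otherwise rest -> ε: return without consuming input */
}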
45
Optimizing the Translator (Cont.)
46
The Complete Program
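One step beyond replacing tail recursion: since expr simply calls term and then rest, the loop from rest can be folded into expr, leaving only expr, term and match. This sketch assumes string input, a caller-supplied output buffer large enough for the result, and error reporting through the return value rather than a separate error routine; translate is a hypothetical name.

```c
#include <ctype.h>

static const char *lookahead;   /* next unread input character */
static char *out;               /* next free position in the output buffer */
static int ok;                  /* cleared on the first syntax error */

static void match(char t) {
    if (*lookahead == t) lookahead++;
    else ok = 0;
}

/* term -> 0 {print('0')} | ... | 9 {print('9')} */
static void term(void) {
    if (isdigit((unsigned char)*lookahead)) {
        *out++ = *lookahead;
        match(*lookahead);
    } else {
        ok = 0;
    }
}

/* expr: a term followed by zero or more "+ term" or "- term" pairs */
static void expr(void) {
    term();
    while (ok && (*lookahead == '+' || *lookahead == '-')) {
        char op = *lookahead;
        match(op);
        term();
        *out++ = op;            /* the operator follows its second operand */
    }
}

/* Returns 1 and leaves the postfix form in buf if in is a valid expression. */
int translate(const char *in, char *buf) {
    lookahead = in;
    out = buf;
    ok = 1;
    expr();
    *out = '\0';
    return ok && *lookahead == '\0';
}
```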
47
The Complete Program (Cont.)
48
The Complete Program (Cont.)
49
2.6 Lexical Analysis
Removal of white space and comments
– blanks, tabs, newlines
Constants
– either add productions for integer constants to the grammar for expressions, or
– create a token num for constants: the input 31 + 28 + 59 reaches the parser as num + num + num, each num carrying its value as an attribute
Recognizing identifiers and keywords
– keywords are reserved
– begin /* keyword */ count = count + increment; /* id = id + id */ end
50
Interface to the Lexical Analyzer
A lexical analyzer reads characters from the input, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the later stages of the compiler.
51
Interface to the Lexical Analyzer (Cont.)
In some situations the lexical analyzer has to read some characters ahead before it can decide which token to return to the parser.
– e.g. to decide between '>' and '>='
– characters read ahead but not needed are pushed back onto the input
– an input buffer is used, with a pointer keeping track of the next character to read
The lexical analyzer produces tokens and the parser consumes them; usually the parser calls the lexical analyzer to return tokens on demand.
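A sketch of reading ahead and pushing back with an input buffer and a pointer. The token codes and the names set_input, next_char, retract and scan_relop are hypothetical; pushing a character back simply moves the pointer back one position.

```c
/* Hypothetical token codes for relational operators. */
enum relop_token { GT = 1, GE, LT, LE, EQ, NOT_RELOP };

static const char *forward;   /* points at the next character in the input buffer */

void set_input(const char *s) { forward = s; }

static char next_char(void) { return *forward++; }
static void retract(void) { forward--; }      /* push back one character */

/* Recognize >, >=, <, <= or = at the current position. */
int scan_relop(void) {
    switch (next_char()) {
    case '>':
        if (next_char() == '=') return GE;
        retract();              /* the extra character starts the next token */
        return GT;
    case '<':
        if (next_char() == '=') return LE;
        retract();
        return LT;
    case '=':
        return EQ;
    default:
        retract();              /* not a relational operator: leave input unchanged */
        return NOT_RELOP;
    }
}
```

After set_input(">1"), scan_relop() returns GT and leaves forward pointing at '1', so the next token starts exactly where it should.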
52
A Lexical Analyzer
A lexical analyzer that allows white space and numbers to appear within expressions.
53
A Lexical Analyzer (Cont.)
If a data structure cannot be returned, the token and its attribute value have to be passed separately.
Usually lexan returns an integer encoding of the token:
– the integer 256 encodes num
– tokenval holds the token's attribute value
– when lexan scans the integer 13, it returns token num (256) and sets tokenval to 13
– when lexan scans the identifier initial, it returns token id (259) and sets tokenval to the symbol-table index p of initial
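A sketch of lexan along these lines, reading from a string rather than from standard input. It follows the slide's encoding (num = 256, attribute in tokenval) but handles only white space, numbers and single-character tokens; identifiers are left out. NONE (-1), the end-of-input code DONE (260) and lex_init are assumptions of this sketch.

```c
#include <ctype.h>

#define NUM  256    /* token code for numbers, as on the slide */
#define DONE 260    /* end of input (value assumed for this sketch) */
#define NONE (-1)   /* tokenval for tokens without an attribute */

int tokenval = NONE;        /* attribute of the most recently returned token */
int lineno = 1;             /* current line number, for error messages */
static const char *input;   /* assumption: input comes from a string */

void lex_init(const char *s) { input = s; lineno = 1; }

int lexan(void) {
    for (;;) {
        char t = *input;
        if (t == ' ' || t == '\t') {
            input++;                          /* strip blanks and tabs */
        } else if (t == '\n') {
            input++;                          /* strip newlines, counting lines */
            lineno++;
        } else if (isdigit((unsigned char)t)) {
            tokenval = 0;                     /* accumulate the number's value */
            while (isdigit((unsigned char)*input))
                tokenval = tokenval * 10 + (*input++ - '0');
            return NUM;
        } else if (t == '\0') {
            tokenval = NONE;
            return DONE;
        } else {
            input++;                          /* any other character is a token */
            tokenval = NONE;
            return (unsigned char)t;
        }
    }
}
```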
54
A Lexical Analyzer (Cont.)
Allowing numbers within expressions requires a change in the grammar for expr: factor becomes
factor → ( expr ) | num {print(num.value)}
55
A Lexical Analyzer (Cont.)
56
A Lexical Analyzer (Cont.)
57
2.7 Incorporating a Symbol Table
The symbol table is built by the analysis phases of the compiler (the lexical analyzer enters identifiers, the syntax analyzer adds information such as types) and used by the synthesis phases (e.g. the code generator).
The primary routines save and retrieve lexemes:
– insert(s, t): returns the index of a new entry for string s and token t
– lookup(s): returns the index of the entry for string s, or 0 if s is not found
The lexical analyzer uses lookup to determine whether there is an entry for a lexeme in the symbol table; if no entry exists, it uses insert to create one.
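A minimal sketch of the two routines. The layout (a table of entries pointing into one shared character array that holds the lexemes, with entry 0 left unused so that 0 can mean "not found") and the sizes STRMAX and SYMMAX are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STRMAX 999   /* size of the lexeme array (assumed) */
#define SYMMAX 100   /* size of the symbol table (assumed) */

struct entry {
    char *lexptr;    /* points into lexemes[] */
    int token;
};

static char lexemes[STRMAX];
static int lastchar = -1;              /* last used position in lexemes */
static struct entry symtable[SYMMAX];
static int lastentry = 0;              /* entry 0 stays unused */

/* Returns the index of the entry for s, or 0 if s is not found. */
int lookup(const char *s) {
    for (int p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexptr, s) == 0)
            return p;
    return 0;
}

/* Returns the index of a new entry for lexeme s with token tok. */
int insert(const char *s, int tok) {
    int len = (int)strlen(s);
    if (lastentry + 1 >= SYMMAX) {
        fprintf(stderr, "symbol table full\n");
        exit(1);
    }
    if (lastchar + len + 1 >= STRMAX) {
        fprintf(stderr, "lexemes array full\n");
        exit(1);
    }
    lastentry++;
    symtable[lastentry].token = tok;
    symtable[lastentry].lexptr = &lexemes[lastchar + 1];
    lastchar += len + 1;               /* reserve room for s and its '\0' */
    strcpy(symtable[lastentry].lexptr, s);
    return lastentry;
}
```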
58
Handling Reserved Keywords
Reserved keywords are inserted into the symbol table initially. For example, consider tokens div and mod with lexemes div and mod, respectively. We can initialize the symbol table using the calls
– insert("div", div);
– insert("mod", mod);
Any subsequent call lookup("div") returns the index of an entry whose token is div, so div cannot be used as an identifier.
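A compact sketch of keyword handling. DIV = 257 and MOD = 258 are assumed values, consistent with num = 256 and id = 259 on the earlier slides; token_for_word is a hypothetical name for what the lexical analyzer does after collecting a word of letters and digits. Lexemes are stored in fixed 32-character slots to keep the example short.

```c
#include <string.h>

#define DIV 257      /* assumed token codes */
#define MOD 258
#define ID  259
#define SYMMAX 100

struct entry { char lexeme[32]; int token; };
static struct entry symtable[SYMMAX];
static int lastentry = 0;   /* entry 0 unused: index 0 means "not found" */

int lookup(const char *s) {
    for (int p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexeme, s) == 0)
            return p;
    return 0;
}

int insert(const char *s, int tok) {
    if (lastentry + 1 >= SYMMAX || strlen(s) >= sizeof symtable[0].lexeme)
        return 0;           /* table full or lexeme too long: simplified handling */
    lastentry++;
    strcpy(symtable[lastentry].lexeme, s);
    symtable[lastentry].token = tok;
    return lastentry;
}

/* Reserved keywords go into the table before any input is read. */
void init(void) {
    insert("div", DIV);
    insert("mod", MOD);
}

/* A keyword keeps its reserved token; any other word is entered as an id. */
int token_for_word(const char *lexeme) {
    int p = lookup(lexeme);
    if (p == 0)
        p = insert(lexeme, ID);
    return symtable[p].token;
}
```

After init(), token_for_word("div") returns DIV, while the first call token_for_word("count") creates a new entry and returns ID.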
59
A Symbol-Table Implementation
[Figure: symbol-table entries with token codes 257, 258, 259 and a type field (integer, real)]
60
Pseudo-code for a lexical analyzer
61
Pseudo-code for a lexical analyzer (Cont.)
62
2.9 Putting the Techniques Together (infix-to-postfix translator)
An infix-to-postfix translator for expressions:
– expressions consist of numbers, identifiers, and the operators +, -, *, /, div, and mod
– id: a sequence of letters and digits beginning with a letter
– num: a sequence of digits
– tokens may be separated by blanks, tabs, and newlines (white space)
63
Infix-to-postfix translator (Cont.)
64
Modules of the infix-to-postfix translator
65
The Lexical Analysis Module: lexer.c
66
The Parser Module: parser.c
67
The Emitter Module: emitter.c
emit(t, tval)
– generates the output for token t with attribute value tval
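A sketch of emit that writes one line per token into a caller-supplied buffer instead of printing, so its output can be inspected. The token codes and lexeme_table, a stand-in for the symbol table's lexemes, are hypothetical; for an id, tval is assumed to be a valid table index.

```c
#include <stdio.h>

#define NUM 256   /* token codes assumed for this sketch */
#define DIV 257
#define MOD 258
#define ID  259

/* Hypothetical stand-in for the symbol table: index -> lexeme. */
static const char *lexeme_table[] = { "", "count", "increment" };

/* Write the output for token t with attribute value tval into buf
   (capacity n); returns the length reported by snprintf. */
int emit(char *buf, size_t n, int t, int tval) {
    switch (t) {
    case '+': case '-': case '*': case '/':
        return snprintf(buf, n, "%c\n", t);
    case DIV:
        return snprintf(buf, n, "DIV\n");
    case MOD:
        return snprintf(buf, n, "MOD\n");
    case NUM:
        return snprintf(buf, n, "%d\n", tval);               /* the number's value */
    case ID:
        return snprintf(buf, n, "%s\n", lexeme_table[tval]); /* tval: table index */
    default:
        return snprintf(buf, n, "token %d, tokenval %d\n", t, tval);
    }
}
```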
68
The Symbol-Table Module: symbol.c and init.c
Implements the symtable data structure and the functions
– lookup(s)
– insert(s, tok)
69
The Error Module: error.c
Error reporting (printf)