Download presentation
Presentation is loading. Please wait.
Published byNora Wright Modified over 8 years ago
1
1 A Simple Syntax-Directed Translator CS308 Compiler Theory
2
Lecture Outline We shall look at a simple programming language and describe the initial phases of compilation. We start off by creating a ‘simple’ syntax directed translator that maps infix arithmetic to postfix arithmetic. This translator is then extended to cater for more elaborate programs such as (check page 39 Aho) –While (true) { x=a[i]; a[i]=a[j]; a[j]=x; } Which generates simplified intermediate code (as on pg40 Aho) 2CS308 Compiler Theory
3
Two Main Phases (Analysis and Synthesis) Analysis Phase :- Breaks up a source program into constituent pieces and produces an internal representation of it called intermediate code. Synthesis Phase :- translates the intermediate code into the target program. During this lecture we shall focus on the analysis phase (compiler front end … see figure next slide) 3CS308 Compiler Theory
4
A Model of A Compiler Font End 4CS308 Compiler Theory
5
Syntax vs. Semantics The syntax of a programming language describes the proper form of its programs The semantics of the language defines what its programs mean. 5CS308 Compiler Theory
6
A note on Grammars (Context-free) A formal grammar is used to specify the syntax of a formal language (for example a programming language like C, Java) Grammar describes the structure (usually hierarchical) of programming languages. –For e.g. in Java an IF statement should fit in if ( expression ) statement else statement –statement -> if ( expression ) statement else statement –Note the recursive nature of statement. 6CS308 Compiler Theory
7
A CFG has four components A set of terminal symbols, sometimes referred to as ‘tokens’. A set of non-terminals, sometimes called ‘syntactic variables’. A set of productions. A designation of one of the non-terminals as the start symbol. 7CS308 Compiler Theory
8
A Grammar for ‘list of digits separated by + or -’ list -> list + digit list -> list – digit list -> digit digit -> 0|1|… |9 Accepts strings such as 9-5+2, 3-1, or 7. list and digit are non-terminals 0 | 1 | … | 9, +, - are the terminal symbols 8CS308 Compiler Theory
9
Parsing and derivations Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production If it cannot be derived from the start symbol then reporting syntax errors within the string. 9CS308 Compiler Theory
10
Parse Trees A parse tree pictorially shows how the start symbol of a grammar derives a string in the language A grammar can have more than one parse tree generating a given string of terminals (thus making it ambiguous) 10CS308 Compiler Theory
11
Parse Trees A parse tree is a tree with the following properties: –The root is labeled by the start symbol. –Each leaf is labeled by a terminal or by E. –Each interior node is labeled by a non-terminal. –If A is the non-terminal labeling some interior node and X l, X 2,, X n are the labels of the children of that node from left to right, then there must be a production A -> X 1 X 2 · · · X n. Here, X 1, X 2,..., X n each stand for a symbol that is either a terminal or a non- terminal. 11CS308 Compiler Theory
12
Ambiguity string -> string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 12CS308 Compiler Theory
13
Operator Associativity and Precedence To resolve some of the ambiguity with grammars that have operators we use: –Operator associativity :- in most programming languages arithmetic operators have left associativity. –E.g. 9+5-2 = (9+5)-2 –However = has right associativity, i.e. a=b=c is equivalent to a=(b=c) –Operator Precedence :- if an operator has higher precedence, then it will bind to it’s operands first. –eg. * has higher precedence then +, therefore 9+5*2 is equivalent to 9+(5*2) 13CS308 Compiler Theory
14
Syntax-Directed Translation Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar. 14CS308 Compiler Theory
15
Postfix Notation If E is a variable or constant, then the postfix notation for E is E itself. If E is an expression of the form E 1 op E 2, where op is any binary operator, then the postfix notation for E is E’ 1 E’ 2 op, where E’ 1 and E’ 2 are the postfix notations for E 1 and E 2, respectively. If E is a parenthesized expression of the form (E 1 ), then the postfix notation for E is the same as the postfix notation for E 1 · X=Y==Z+U*V?Y:0 15CS308 Compiler Theory
16
Synthesized Attributes Attributes associated with non-terminals and terminals in a grammar. An attribute is said to be synthesized if its value at a parse-tree node N is determined from attribute values of the children of N and at N itself. 16CS308 Compiler Theory
17
Semantic Rules for infix to postfix 17CS308 Compiler Theory
18
A syntax-directed translation scheme A notation for specifying a translation by attaching program fragments to productions in a grammar. The program fragments are called semantic actions. 18CS308 Compiler Theory
19
Parsing Parsing is the process of determining how a string of terminals can be generated by a grammar. Two classes : –Bottom-up, where construction starts at the leaves and proceeds towards the root; –Top-down, where construction starts at the root and proceeds towards the leaves. 19CS308 Compiler Theory
20
Top-Down Parsing The top-down construction of a parse is done by starting with the root, labeled with the starting non-terminal stmt, and repeatedly performing the following two steps. –At node N, labeled with non-terminal A, select one of the productions for A and construct children at N for the symbols in the production body. –Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded non-terminal of the tree. 20CS308 Compiler Theory
21
Top-Down Parsing 21 => CS308 Compiler Theory
22
Predictive Parsing Recursive descent parsing : a top-down method of syntax analysis in which a set of recursive procedures is used to process the input. Predictive parsing is a simple form of recursive-descent parsing, in which the lookahead symbol (the first symbol that can be generated by a production body) unambiguously determines the flow of control through the procedure body for each non-terminal. 22CS308 Compiler Theory
23
Predictive Parsing 23 Every non-terminal has such a procedure in predictive parser. CS308 Compiler Theory
24
Left Recursion Since the lookahead symbol changes only when a terminal is matched, no change to the input takes place between recursive calls of expr. A left-recursive production can be eliminated by rewriting the offending production. 24CS308 Compiler Theory
25
Eliminating Left Recursion 25 => CS308 Compiler Theory
26
Translators (using program fragments) 26 Left-recursion-eliminated CS308 Compiler Theory
27
A Translator for Simple Expressions formed by extending predictive parser 27CS308 Compiler Theory
28
Lexical Analysis A lexical analyzer reads characters from the input and groups them into "token objects.“ Token Terminal symbol Lexeme 28CS308 Compiler Theory
29
Extended translation scheme 29CS308 Compiler Theory
30
Reading Ahead Is it '>' or '>=' ?... The lexer needs to read one character in order to decide what token to return to the parser. One-character read ahead usually suffices, so a simple solution is to use a variable, call it peek, to hold the next input character. 30CS308 Compiler Theory
31
Constant Write tokens as tuples enclosed between <> –31 + 28 + 59 is transformed into the sequence Simulate parsing some number.... If ( peek holds a digit) { v = 0; Do { v = v * 10 + integer value of digit peek; Peek = next input character; } while ( peek holds a digit ); Return token 31CS308 Compiler Theory
32
Keywords and Identifiers 32 => Identifiers: Keywords: A character string forms an identifier only if it is not a keyword. CS308 Compiler Theory
33
Symbol Table Data structures that are used by compilers to hold information about the source-program constructs. Information is collected incrementally throughout the analysis phase and used for the synthesis phase. One symbol table per scope (of declaration)... { int x; char y; { bool y; x; y; } x; y; } { { x:int; y:bool; } x:int; y:char; } 33 => CS308 Compiler Theory
34
Intermediate Code Generation The front end of a compiler constructs an intermediate representation of the source program from which the back end generates the target program. Two kinds of intermediate representations –Tree, including parse trees and (abstract) syntax trees. –Linear representations, especially “three-address code.” 34CS308 Compiler Theory
35
Static Checking Done by a compiler front end To check that the program follows the syntactic and semantic rules Including: –Syntactic checking –Type checking 35CS308 Compiler Theory
36
Syntax Trees For statement stmt -> while ( expr ) stmt { stmt.n = new While(expr.n, stmt.n } n is a node in the syntax tree stmts -> stmts 1 stmt { stmts.n = new Seq(stmts1.n, stmt.n); } 36CS308 Compiler Theory
37
Syntax Trees For expressions 37 => CS308 Compiler Theory
38
Three-Address Code Three-address code is a sequence of instructions of the form x = y op Z Arrays will be handled by using the following two variants of instructions: 1. x [ y ] = z 2. x = y [ z ] Instructions for control flow: 1. if False x goto L 2. if True x goto L 3. goto L Instruction for copying value x = y 38CS308 Compiler Theory
39
Translation of Statements Use jump instructions to implement the flow of control through the statement. The translation of if expr then stmt l 39CS308 Compiler Theory
40
Translation of Statements 40CS308 Compiler Theory
41
Functions lvalue(x:Expr) and rvalue(x:Expr) a = a + 1, a is computed differently for the l-value and r-value Two functions used to distinguish then: –Rvalue which when applied to a nonleaf node x, generates the instructions to compute x into a temporary var, and returns a new node representing the temporary var. –Lvalue which when applied to a nonleaf, generates instructions to compute the subtrees below x, and returns a node representing the “address” for x R-values is what we usually think of as “values” while L-values are “locations” 41CS308 Compiler Theory
42
Translation of Expressions Expressions contain binary operators, array accesses, assignments, constants and identifiers. We can take the simple approach of generating one three-address instruction for each operator node in the syntax tree of an expression. Expression: i-j+k translates into t1 = i-j t2 = t1+k Expression: 2 * a[i] translates into t1 = a [ i ] t2 = 2 * t1 42CS308 Compiler Theory
43
Test Yourself Generate three-address codes for If(x[2*a]==y[b]) x[2*a+1]=y[b+1]; 43CS308 Compiler Theory
44
Summary Grammars Parse Trees and Syntax Tree –Ambiguity Postfix notation Lexical analyzer –Token –Synthesized Attributes Parsing –Predicative parsing Syntax-directed translation –Attaching rules to productions –Attaching program fragments to productions Intermediate code –Abstract syntax tree –Three-address code 44CS308 Compiler Theory
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.