COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 2, 09/04/2003 Prof. Roy Levow
Syntax Analysis
  Recursive Descent Parsing
    Uses one procedure per construct
    Checks production one item at a time
      Verify tokens
      Recursively check constructs
    Easy to generate by hand
Parsing Routines
  Parse_operator()
  Parse_expression() (see fig. 1.13)
  Parser environment & support (see fig. 1.14)
  No context handling needed
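The one-procedure-per-construct idea can be sketched as follows. This is a minimal illustration, not the code of figs. 1.13 and 1.14, which are not reproduced here. The grammar (fully parenthesized single-digit expressions), the `Parser` class, `ParseError`, and the tuple-shaped tree nodes are all assumptions made for this sketch.

```python
# Assumed grammar (hypothetical, chosen for illustration):
#   expression -> DIGIT | '(' expression operator expression ')'
#   operator   -> '+' | '*'

DIGITS = "0123456789"


class ParseError(Exception):
    """Raised when the input does not match the assumed grammar."""


class Parser:
    def __init__(self, text):
        # Every non-blank character is treated as one token.
        self.tokens = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def peek(self):
        """Return the current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        """Verify that the current token is `token` and consume it."""
        if self.peek() != token:
            raise ParseError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_operator(self):
        # operator -> '+' | '*'
        token = self.peek()
        if token in ("+", "*"):
            self.pos += 1
            return token
        raise ParseError(f"expected an operator, found {token!r}")

    def parse_expression(self):
        token = self.peek()
        if token is not None and token in DIGITS:
            # expression -> DIGIT
            self.pos += 1
            return ("digit", int(token))
        # expression -> '(' expression operator expression ')'
        self.expect("(")
        left = self.parse_expression()
        operator = self.parse_operator()
        right = self.parse_expression()
        self.expect(")")
        return ("binary", operator, left, right)


def parse(text):
    """Parse a whole input string and return its tree."""
    parser = Parser(text)
    tree = parser.parse_expression()
    if parser.peek() is not None:
        raise ParseError(f"unexpected trailing input {parser.peek()!r}")
    return tree
```

Each routine checks its production one item at a time: tokens are verified with `expect`, and nested constructs are checked by a recursive call. For example, `parse("(2*(3+4))")` returns a nested tuple with `"*"` at the root.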
Code Generation
  Simple stack machine
    PUSH n, ADD, MULT, PRINT
  Generate code by recursive traversal of parse tree (see fig. 1.18)
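A recursive traversal that emits stack-machine code might look like the sketch below. It is not the code of fig. 1.18. The tuple-shaped tree nodes, the function names, and the assumed meaning of each instruction (ADD and MULT pop two values and push the result, PRINT pops and outputs the top value) are illustrative assumptions.

```python
# Assumed tree shape (hypothetical): ("digit", n) or ("binary", op, left, right)

def generate(node, code=None):
    """Emit code for `node` in post-order: operands first, then the operator."""
    if code is None:
        code = []
    if node[0] == "digit":
        code.append(f"PUSH {node[1]}")
    else:
        _, operator, left, right = node
        generate(left, code)
        generate(right, code)
        code.append("ADD" if operator == "+" else "MULT")
    return code


def compile_expression(tree):
    """Code for the whole expression, followed by PRINT for the result."""
    return generate(tree) + ["PRINT"]


def run(code):
    """A tiny stack machine used only to check the generated code."""
    stack, output = [], []
    for instruction in code:
        parts = instruction.split()
        if parts[0] == "PUSH":
            stack.append(int(parts[1]))
        elif parts[0] == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif parts[0] == "MULT":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif parts[0] == "PRINT":
            output.append(stack.pop())
    return output


# Tree for (2*(3+4)), built by hand to keep the sketch self-contained.
tree = ("binary", "*", ("digit", 2), ("binary", "+", ("digit", 3), ("digit", 4)))
code = compile_expression(tree)
# code is: PUSH 2, PUSH 3, PUSH 4, ADD, MULT, PRINT
```

Because operands are emitted before their operator, the operands are already on the stack when ADD or MULT executes.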
Interpretation
  Directly execute code from parse tree (see fig. 1.19)
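Interpretation uses the same kind of recursive traversal, but computes values instead of emitting instructions. The sketch below is not the code of fig. 1.19; the tuple-shaped tree nodes and the names `evaluate` and `interpret` are assumptions for illustration.

```python
# Assumed tree shape (hypothetical): ("digit", n) or ("binary", op, left, right)

def evaluate(node):
    """Return the value of the expression rooted at `node`."""
    if node[0] == "digit":
        return node[1]
    _, operator, left, right = node
    left_value = evaluate(left)
    right_value = evaluate(right)
    return left_value + right_value if operator == "+" else left_value * right_value


def interpret(tree):
    """Execute the program: evaluate the expression and print the result."""
    value = evaluate(tree)
    print(value)
    return value


# Tree for (2*(3+4)), built by hand to keep the sketch self-contained.
tree = ("binary", "*", ("digit", 2), ("binary", "+", ("digit", 3), ("digit", 4)))
result = interpret(tree)
```

No intermediate instruction list is produced: the tree itself is the program being executed.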
Complete Compiler Structure (see fig. 1.21)
  Lexical Analysis
  Syntax Analysis
  Context Handling
  Intermediate Code Generation
  IC Optimization
  Symbolic Code Generation
  Symbolic Code Optimization
  Machine Code Generation
  Executable Code Output
Other Issues
  Run-time system
  Short-cuts
    Use assembler
    Output in a compilable language (like C)
  Compiler Architecture
    Narrow compiler
      Processes small parts of the program through all stages, one part at a time
    Wide compiler
      Completes each stage for the whole program before starting the next stage
Properties of a Good Compiler
  Generates correct code
  Conforms to the language specification
  Handles large programs
  Good compile speed
  Good error reporting
  Fast, small generated code
Other Issues
  Portability
  Retargetability
  Optimizations
Grammars
  Based on productions
    Translation or replacement rules
    α → β, meaning α is replaced by β
    Derivation: xαy => xβy
  Classification
    Based on limitations on the form of α and β
    Context free: α is a single non-terminal symbol
    Regular: context free, and β contains at most one non-terminal symbol (in the usual right-linear form, as the last symbol of β)
Context-free Grammars
  Non-terminals: Language components
    A, B, C
  Terminals: Symbols in language
    x, y, z
  Strings of grammar symbols
    Greek letters
  Empty sequence (string): ε
  Start symbol: LHS of first production, S
Derivations.1
  A derivation is a sequence of replacements using productions
  A sentential form is a string derived from the start symbol
  The language defined by a grammar is the set of all sentential forms containing only terminal symbols that can be derived from the start symbol
  A leftmost derivation always replaces the leftmost non-terminal
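A leftmost derivation can be traced mechanically. In the sketch below, the grammar `GRAMMAR`, the function name, and the idea of passing the derivation as a list of alternative numbers are all assumptions made for illustration; this is not the derivation of fig. 1.26.

```python
# Hypothetical grammar: each non-terminal maps to its list of alternatives.
#   E -> ( E O E ) | 1
#   O -> + | *
GRAMMAR = {
    "E": [["(", "E", "O", "E", ")"], ["1"]],
    "O": [["+"], ["*"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form of a leftmost derivation.

    choices[i] is the index of the alternative used in step i.
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        positions = [i for i, symbol in enumerate(form) if symbol in GRAMMAR]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        index = positions[0]
        # Replace it by the chosen right-hand side: xAy => xβy.
        replacement = GRAMMAR[form[index]][choice]
        form = form[:index] + replacement + form[index + 1:]
        steps.append(list(form))
    return steps


steps = leftmost_derivation("E", [0, 1, 0, 1])
for form in steps:
    print(" ".join(form))
```

The printed forms are `E`, `( E O E )`, `( 1 O E )`, `( 1 + E )`, and `( 1 + 1 )`. Every line is a sentential form; only the last one consists of terminals alone and is therefore a sentence of the language.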
Derivations.2
  Sample derivation (see fig. 1.26)
  Parse tree
    Root is start symbol
    NT node has RHS of production for children
    See fig. 1.27
Extended Grammar Forms
  Backus-Naur Form (BNF)
    N → α | β | γ for alternatives
  Extended BNF
    R+ for 1 or more occurrences
    R* for 0 or more occurrences
    R? for 0 or 1 occurrence (optional item)
    Parentheses for grouping
Properties of Grammars
  Left-recursive production (or right-)
    N → Nα (N → αN)
  Nullable symbol
    N *=> ε
  Useless symbol
    Never derives a string consisting only of terminal symbols
  Ambiguous Grammar
    Has two or more parse trees for the same string
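The set of nullable symbols can be computed by applying one rule repeatedly until nothing changes: a non-terminal is nullable if some alternative consists entirely of nullable symbols (an empty alternative qualifies trivially). The sketch below assumes a hypothetical dictionary representation of the grammar; the function name and the example grammar are illustrative only.

```python
def nullable_symbols(grammar):
    """Return the set of non-terminals N with N *=> ε.

    `grammar` maps each non-terminal to a list of alternatives;
    each alternative is a list of symbols, and [] stands for ε.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            if nonterminal in nullable:
                continue
            # all() over an empty alternative is True, so N -> ε is covered.
            if any(all(symbol in nullable for symbol in alternative)
                   for alternative in alternatives):
                nullable.add(nonterminal)
                changed = True
    return nullable


# Hypothetical example grammar:
#   S -> A B      A -> ε | a      B -> A | b      C -> c
example = {
    "S": [["A", "B"]],
    "A": [[], ["a"]],
    "B": [["A"], ["b"]],
    "C": [["c"]],
}
result = nullable_symbols(example)
```

Here `A` is nullable directly, `B` becomes nullable through `A`, and `S` through `A B`; `C` is not nullable because its only alternative contains a terminal.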
Closure Algorithms.1
  Compute transitive closure of a relation
    If aRb & bRc, then aRc
  Calling graph
    Direct and indirect (closure) calls
    See figs. 1.28, 1.29
Closure Algorithms.2
  Basic components of algorithm
    Data definitions: what are information items
    Initializations: where do we start
    Inference rules: apply repeatedly to get closure
  See fig. 1.31 for directed graph closure for subprogram calls
  General iterative algorithm
    See fig. 1.33
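The three components can be seen in a small iterative closure over a calling graph. This is a sketch under assumptions, not the algorithm of fig. 1.31 or fig. 1.33: the dictionary representation, the function name `call_closure`, and the example procedures are hypothetical.

```python
def call_closure(direct_calls):
    """Return, for each procedure, everything it calls directly or indirectly."""
    # Data definitions: calls[p] is the set of procedures reachable from p.
    # Initializations: start from the direct calls only.
    calls = {caller: set(callees) for caller, callees in direct_calls.items()}

    # Inference rule: if a calls b and b calls c, then a calls c.
    # Apply it repeatedly until no set grows any further.
    changed = True
    while changed:
        changed = False
        for callees in calls.values():
            for callee in list(callees):
                new = calls.get(callee, set()) - callees
                if new:
                    callees |= new
                    changed = True
    return calls


# Hypothetical calling graph: a calls b, b calls c and d, c calls a.
direct = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": []}
closure = call_closure(direct)
```

In this example `a`, `b`, and `c` lie on a cycle, so each of them ends up calling all four procedures, itself included, which marks them as recursive. `d` calls nothing. The loop terminates because the sets can only grow and are bounded by the number of procedures.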