Chapter 3 Program translation

Chapter 3: Language Translation
– Syntax and semantics
– Translation phases
– Formal translation models

Syntax
– What is a valid string of the language?
  – First pass of a compiler
– Error messages (are they helpful?)
  – A compiler compiler (parser generator) such as YACC can automatically generate a parser from BNF

Good syntax criteria
– Assist in readability
  – COBOL as self-documenting
  – Comments
  – Length of identifiers
  – Overloading of names
– Examples of poor features for readability
  – Blank as the concatenation operation in SNOBOL
  – Identifier names in Basic: X1, Y
  – Implicit typing
  – Late binding

Good syntax criteria (cont.)
– Assist in writeability
  – Few and concise statements
  – Rich library, created by language and user
  – Support of abstraction
  – Orthogonality
– Examples of poor features for writeability
  – Large number of constructs
  – Lack of necessary constructs
  – Redundancy
  – Ambiguity (ex: if statement)
  – Case sensitivity??

Syntactic elements
– Character set
  – 5-, 6-, 7-, 8-, and 16-bit encoding schemes
– Identifiers
  – Symbols such as letters, digits, $, _, blank
  – Length limitation
– Operation symbols (various examples)
  – LISP: prefix identifiers (ex: PLUS)
  – APL: special Greek characters
  – FORTRAN: .EQ., .GT.
  – C: &&, ==
  – Java: & and &&, | and ||

Syntactic elements (cont.)
– Keyword
  – Identifier used as a fixed part of a primitive program unit (ex: if, then, else, case)
– Reserved word
  – Keyword that the programmer cannot redefine or use as an ordinary identifier
  – READ is not a reserved word in Pascal
  – Adding new reserved words in an update of a language can make old programs incorrect (upward compatibility)

Syntactic elements (cont.)
– Noise words
  – Optional words used to improve readability (ex: perform 5 [times])
– Comments
  – Used for documentation and readability
– Blanks
  – Completely ignored in FORTRAN: DO 10 I = 1.5 is read as an assignment to the variable DO10I, not as the start of the loop DO 10 I = 1,5
– Delimiters and brackets
  – Spaces, ;, paired ( ) [ ] { }, begin ... end
– Fixed format vs. free format

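The FORTRAN blank example can be made concrete with a small sketch. The function names `strip_blanks` and `classify` are hypothetical, and the rule used here (a DO loop needs a comma after the "=") is a deliberate simplification of what a real FORTRAN scanner does, not a full implementation.

```python
def strip_blanks(statement: str) -> str:
    """Remove every blank, imitating how fixed-form FORTRAN ignores them."""
    return statement.replace(" ", "")


def classify(statement: str) -> str:
    """Rough classification (assumption): a DO loop has a comma after '='."""
    text = strip_blanks(statement).upper()
    if "=" not in text:
        return "other"
    target, tail = text.split("=", 1)
    if text.startswith("DO") and "," in tail:
        return "DO loop"
    # Without the comma, the whole left-hand side is one variable name.
    return "assignment to " + target


print(classify("DO 10 I = 1,5"))  # DO loop
print(classify("DO 10 I = 1.5"))  # assignment to DO10I
```

A single mistyped character (period for comma) changes the statement kind, and the translator cannot report it as a syntax error because both forms are valid.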
Program structure
– Expressions
  – Precedence rules
– Statements
  – Structured programming
– Modules / functions / subprograms / classes
  – Nested units: static checks, efficient code for nonlocal references
  – Separate unit compilation
  – Data and operations are compiled as a unit in classes
  – Interface issues: function specifications (prototypes) allow static checks
  – Specifications (.h files) separate from implementations

Translation I: lexical analysis
– Byte stream is organized into lexemes, each of which is identified (tagged)
– Numbers may be converted to binary
– Identifiers are stored in the symbol table
– Tokens are output for syntactic analysis

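A minimal sketch of these four steps, assuming a toy language with only integers, identifiers, and a few one-character operators. The token names (`NUMBER`, `IDENT`, `OP`) and the function `tokenize` are illustrative choices, not taken from the text.

```python
import re

# Token classes for a hypothetical toy language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source into tagged tokens and build a symbol table."""
    symbol_table = {}
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # blanks separate lexemes but produce no token
        if kind == "NUMBER":
            tokens.append((kind, int(lexeme)))  # convert digits to a binary integer
        elif kind == "IDENT":
            # Store the identifier once; the token carries its table index.
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append((kind, index))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("x = x + 42")
print(tokens)  # [('IDENT', 0), ('OP', '='), ('IDENT', 0), ('OP', '+'), ('NUMBER', 42)]
print(table)   # {'x': 0}
```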
Translation II: parsing (syntactic analysis)
– Tokens are organized into expressions, statements, etc.
– Is the input a valid string in the language?
– Generates parse tree and tables
– Produces error messages for invalid strings

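One way to picture this phase is a small recursive-descent parser. The sketch below assumes a hypothetical expression grammar (expr → term {+ term}, term → factor {* factor}, factor → name | ( expr )); the grammar and the function `parse_expression` are illustrative and not from the text. It returns a parse tree as nested tuples, or raises an error for an invalid token string.

```python
def parse_expression(tokens):
    """Parse a list of token strings; '*' binds tighter than '+'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        token = peek()
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at token {pos}")
            advance()
            return node
        if token is not None and token.isalnum():
            return advance()
        raise SyntaxError(f"unexpected token {token!r} at token {pos}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at token {pos}")
    return tree


print(parse_expression(["a", "+", "b", "*", "c"]))  # ('+', 'a', ('*', 'b', 'c'))
```

The shape of the tree records the precedence rules: the multiplication is nested below the addition, so it is evaluated first.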
Translation III: semantic analysis
– Produces error messages for invalid constructs
  – Ex: identifier not declared; type mismatch
– Compiled languages use and then discard the symbol table
  – A reference to a variable becomes an offset into a data section
– Information must be stored together with each identifier (ex: type, range limitations)
– Macro substitutions
– Compiler directives
  – #define
  – #ifndef
  – pragma suppress range_checks

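The two example errors can be sketched with a symbol table that maps each identifier to its declared type. The statement encoding (`("declare", name, type)` and `("assign", target, source)`) and the function `check` are assumptions made for this illustration only.

```python
def check(statements):
    """Return semantic error messages for a list of toy statements."""
    symbols = {}  # identifier -> declared type
    errors = []
    for stmt in statements:
        if stmt[0] == "declare":
            _, name, type_name = stmt
            symbols[name] = type_name
        elif stmt[0] == "assign":
            _, target, source = stmt
            for name in (target, source):
                if name not in symbols:
                    errors.append(f"identifier not declared: {name}")
            if target in symbols and source in symbols:
                if symbols[target] != symbols[source]:
                    errors.append(
                        f"type mismatch: {target} is {symbols[target]}, "
                        f"{source} is {symbols[source]}"
                    )
    return errors


program = [
    ("declare", "x", "int"),
    ("declare", "s", "string"),
    ("assign", "x", "s"),  # both declared, but the types differ
    ("assign", "y", "x"),  # y was never declared
]
for message in check(program):
    print(message)
# type mismatch: x is int, s is string
# identifier not declared: y
```

Both statements are syntactically valid, which is why these errors are found here rather than during parsing.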
Translation IV: optimization
– Semantic analysis output is typically produced one statement at a time
– The compiler can optimize code to obtain results as efficient as hand-written assembly code
  – Ex: save intermediate results in registers
  – Remove constant operations from loops
  – Change 2-dimensional array storage
– Code generation
– Linking and loading

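Removing a constant operation from a loop can be shown by applying the transformation by hand. In the sketch below the two function names are hypothetical; a compiler would perform the same rewrite on its intermediate code rather than on source text.

```python
def scale_naive(values, a, b):
    """Recomputes a * b + 1 on every iteration although it never changes."""
    result = []
    for v in values:
        result.append(v * (a * b + 1))
    return result


def scale_hoisted(values, a, b):
    """Same result, with the loop-invariant expression computed once."""
    factor = a * b + 1  # moved out of the loop
    result = []
    for v in values:
        result.append(v * factor)
    return result


print(scale_naive([1, 2, 3], 2, 3))    # [7, 14, 21]
print(scale_hoisted([1, 2, 3], 2, 3))  # [7, 14, 21]
```

The rewrite is safe only because neither a nor b is modified inside the loop, which is the property the optimizer has to establish first.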
BNF (Backus Normal/Naur Form)
– Metalanguage
  – ::= defined as
  – | alternative
  – <> nonterminal
  – {} later introduced for iteration
  – [] for optional
  – Sequence is implicit
– Ex:
  <integer> ::= <digit> | <integer><digit>
  <digit> ::= 0|1|2|3|4|5|6|7|8|9

Context-free grammars
– For balanced parentheses: S → SS | (S) | ()
– Problem: generate a parse tree for a string such as (()(()))((())()) from the grammar above
– Some language definition issues are context sensitive, such as: each identifier must be declared before use
– Implementation issues, such as pass by value or by reference

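The parse-tree problem can be explored with a short sketch. The grammar is ambiguous (a run of three or more groups can be split by S → SS in more than one way), so the hypothetical function `parse_balanced` below makes one arbitrary choice: it groups SS to the left. Trees are nested tuples labeled with the production used.

```python
def parse_balanced(text):
    """Build a parse tree for S -> SS | (S) | (), grouping SS to the left."""
    pos = 0

    def group():
        # Parses one "(S)" or "()" unit.
        nonlocal pos
        if pos >= len(text) or text[pos] != "(":
            raise SyntaxError(f"expected '(' at position {pos}")
        pos += 1
        if pos < len(text) and text[pos] == ")":
            pos += 1
            return "()"
        inner = sequence()
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        pos += 1
        return ("(S)", inner)

    def sequence():
        # Parses one or more adjacent groups, combined with S -> SS.
        tree = group()
        while pos < len(text) and text[pos] == "(":
            tree = ("SS", tree, group())
        return tree

    tree = sequence()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


print(parse_balanced("()()"))  # ('SS', '()', '()')
print(parse_balanced("(())"))  # ('(S)', '()')
print(parse_balanced("(()(()))((())())"))
```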
Syntax charts
– Term at top left is defined by the graph that follows
– Graph branches for alternatives
– Empty branch for optional
– Box around string for nonterminal
– Circle for terminal
– Arrow back for iteration
– Ex: p. 96 in text
– Sequence is explicit

Finite-state automata
– Table used for lexical analysis
– Ex: valid floating-point number (note that limitations on range and precision are not specified)
  (whole part) → (decimal) → (fractional) → (exp) → (exp value)
  – Whole part, fractional, and exp value each have a looping arrow
  – A digit is the input to whole part
  – "." is the input leading to decimal
  – A digit leads from decimal to fractional
  – E leads from fractional to exp
  – A digit leads from exp to exp value

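The automaton can be written directly as a transition table. The sketch below follows the arrows described on the slide; the start state, the choice of accepting states (fractional and exp value), and the names `TRANSITIONS`, `input_class`, and `is_float` are assumptions, since the slide does not specify them. Signs are not handled because the slide does not mention them.

```python
# (current state, input class) -> next state
TRANSITIONS = {
    ("start", "digit"): "whole",
    ("whole", "digit"): "whole",            # looping arrow
    ("whole", "."): "decimal",
    ("decimal", "digit"): "fractional",
    ("fractional", "digit"): "fractional",  # looping arrow
    ("fractional", "E"): "exp",
    ("exp", "digit"): "exp_value",
    ("exp_value", "digit"): "exp_value",    # looping arrow
}
ACCEPTING = {"fractional", "exp_value"}  # assumption: not stated on the slide


def input_class(ch):
    """Map a character to the input class used in the table."""
    if ch in "0123456789":
        return "digit"
    if ch in (".", "E"):
        return ch
    return "other"


def is_float(text):
    """Run the automaton; a missing table entry means the string is rejected."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, input_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


for sample in ("3.14", "6.02E23", "3.", ".5", "12"):
    print(sample, is_float(sample))
# 3.14 True
# 6.02E23 True
# 3. False
# .5 False
# 12 False
```

As the slide notes, the table says nothing about range or precision: a string with hundreds of digits is accepted just as readily as 3.14.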