D Goforth COSC
Translating High-Level Languages
Note error in assignment 1: #4 - refer to Example grammar 3.4, p. 126

Stages of translation
- Lexical analysis - the lexer or scanner
- Syntactic analysis - the parser
- Code generation
- Linking
(all before execution)

Lexical analysis
- Translates a stream of characters into lexemes
- Lexemes belong to categories called tokens
- The token category of each lexeme is used at the next stage, syntactic analysis

From characters to lexemes
  yVal = x - min ( 100, 4xVal ));

Examples: tokens and lexemes
- Some token categories contain only one lexeme: semicolon ;
- Some tokens categorize many lexemes: identifier count, maxCost, … based on a rule for legal identifier strings

Tokens and lexemes
  yVal = x - min ( 100, 4xVal ));
Lexical analysis
- identifies lexemes and their token types, e.g. yVal: identifier, =: equal_sign, (: left_paren
- recognizes illegal lexemes: 4xVal
- does NOT identify the syntax error: ) )

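The behaviour described on this slide can be sketched in Python. This is only an illustrative sketch, not the lexer used in the course: the token names beyond those on the slide (int_literal, minus, right_paren, comma, semicolon, illegal_char), the regular expressions, and the function name tokenize are all assumptions.

```python
import re

# (token name, pattern) pairs; order matters because earlier patterns win.
# Names and patterns are assumptions made for this sketch.
TOKEN_SPEC = [
    ("illegal_lexeme", r"\d+[A-Za-z_]\w*"),  # digits running into letters, e.g. 4xVal
    ("int_literal",    r"\d+"),
    ("identifier",     r"[A-Za-z_]\w*"),
    ("equal_sign",     r"="),
    ("minus",          r"-"),
    ("left_paren",     r"\("),
    ("right_paren",    r"\)"),
    ("comma",          r","),
    ("semicolon",      r";"),
    ("whitespace",     r"\s+"),
    ("illegal_char",   r"."),                # catch-all for anything else
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "whitespace":
            tokens.append((kind, match.group()))
    return tokens


for token, lexeme in tokenize("yVal = x - min ( 100, 4xVal ));"):
    print(f"{token:15} {lexeme}")
```

The output flags 4xVal as an illegal lexeme, but the two closing parentheses are reported as two ordinary right_paren tokens: deciding that the second one is an error is left to the parser.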
Syntax or grammar of a language
Rules for
- generating (used by the programmer) or
- recognizing (used by the parser)
a valid sequence of lexemes

Grammars
- 4 categories of grammars (Chomsky hierarchy)
- Two categories are important in computing:
  - Regular grammars, equivalent to regular expressions (pattern matching)
  - Context-free grammars (programming languages)

Context-free grammar
- Meta-language for describing languages
- States rules, or productions, for which lexeme sequences are correct in the language
- Written in Backus-Naur Form (BNF), in EBNF, or as syntax graphs

Example of BNF rule
PROBLEM: how to recognize all of these as correct?
  y = x
  f = rVec.length + 1
  button[4].label = "Exit"
RULE for defining an assignment statement:
  <assign> -> <var> = <expression>
Assumes other rules for <var>, <expression>

BNF rules
Non-terminal and terminal symbols:
- Non-terminals are defined by at least one rule, e.g. <assign> -> <var> = <expression>
- Terminals are tokens (or lexemes)

Simple sample grammar (p. 123)
  <assign> -> <id> = <expr>
  <id>     -> A | B | C          // lexical
  <expr>   -> <id> + <expr>
            | <id> * <expr>
            | ( <expr> )
            | <id>
Terminals: = A B C + * ( )

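Simple sample derivation
Apply one rule at each step, to the leftmost non-terminal:
  <assign> => <id> = <expr>
           => B = <expr>
           => B = <id> * <expr>
           => B = A * <expr>
           => B = A * ( <expr> )
           => B = A * ( <id> + <expr> )
           => B = A * ( C + <expr> )
           => B = A * ( C + <id> )
           => B = A * ( C + C )
Grammar used:
  <assign> -> <id> = <expr>
  <id>     -> A | B | C
  <expr>   -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
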
Sample parse tree
[Figure: parse tree for B = A * ( C + C ), with <assign> at the root]
- Leaves represent the sentence of lexemes
- Each interior node represents a rule application

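One way to see how a parser recognizes a sentence of the sample grammar and builds its parse tree is a small recursive-descent parser. The sketch below assumes the grammar is <assign> -> <id> = <expr>, <id> -> A | B | C, <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>, and that the lexemes are already separated by spaces. The nested-tuple tree representation and the function names parse_assign and leaves are hypothetical choices for illustration.

```python
def parse_assign(tokens):
    """Parse <assign> -> <id> = <expr> and return the parse tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(lexeme):
        nonlocal pos
        if peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {peek()!r}")
        pos += 1
        return lexeme

    def parse_id():
        # <id> -> A | B | C
        if peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected an identifier, found {peek()!r}")
        return ("id", expect(peek()))

    def parse_expr():
        # <expr> -> ( <expr> ) | <id> + <expr> | <id> * <expr> | <id>
        if peek() == "(":
            expect("(")
            inner = parse_expr()
            expect(")")
            return ("expr", "(", inner, ")")
        left = parse_id()
        if peek() in ("+", "*"):
            operator = expect(peek())
            return ("expr", left, operator, parse_expr())
        return ("expr", left)

    target = parse_id()
    expect("=")
    tree = ("assign", target, "=", parse_expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} after the end of the statement")
    return tree


def leaves(tree):
    """Collect the leaves of a parse tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:  # tree[0] is the non-terminal's name
        result.extend(leaves(child))
    return result


sentence = "B = A * ( C + C )".split()
tree = parse_assign(sentence)
print(tree)
print(leaves(tree) == sentence)  # the leaves spell out the original sentence
```

Each tuple corresponds to one rule application, and reading the leaves left to right gives back the sentence of lexemes.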
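Extended sample grammar
  <stmt> -> <assign>
          | if ( <cond> ) then <stmt>
          | if ( <cond> ) then <stmt> else <stmt>
  <cond> -> <expr>
          | <expr> = <expr>
          | <expr> == <expr>
          | <expr> ~= <expr>
How to add a compound condition?
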
Ambiguous grammar
- Different parse trees for the same sentence
- Different translations for the same sentence
- Different machine code for the same source code!

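A small sketch can make the consequence concrete. It assumes a hypothetical ambiguous grammar <expr> -> <expr> + <expr> | <expr> * <expr> | <id>, which is not one of the grammars on these slides, and writes out by hand the two parse trees it allows for A + B * C. The variable values are made up for the demonstration.

```python
def leaves(tree):
    """Read the sentence back from a tree of the form (left, operator, right)."""
    if isinstance(tree, str):
        return [tree]
    left, operator, right = tree
    return leaves(left) + [operator] + leaves(right)


def evaluate(tree, values):
    """Translate a tree into a result, following the shape of the tree."""
    if isinstance(tree, str):
        return values[tree]
    left, operator, right = tree
    a, b = evaluate(left, values), evaluate(right, values)
    return a + b if operator == "+" else a * b


# Two parse trees the assumed ambiguous grammar allows for the same sentence.
plus_at_root = ("A", "+", ("B", "*", "C"))
times_at_root = (("A", "+", "B"), "*", "C")

values = {"A": 1, "B": 2, "C": 3}  # made-up values
print(leaves(plus_at_root) == leaves(times_at_root))  # True: same sentence
print(evaluate(plus_at_root, values))   # 7
print(evaluate(times_at_root, values))  # 9
```

Both trees have the same leaves, yet a translator that follows the tree structure would produce different results for the same source text.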
Grammars for 'human' conventions without ambiguity
Putting features of languages into grammars:
- expressions of any length - lists: p. 121
- precedence - an extra non-terminal: p. 125
- associativity - order in recursive rules: p. 128
- nested if statements - "dangling else" problem: p. 130

Forms for grammars
- Backus-Naur form (BNF)
- Extended Backus-Naur form (EBNF) - shortens the set of rules
- Syntax graphs - easier to read when learning a language

EBNF
- [..] optional: zero or one occurrence
    <expr> -> <term> [ + <term> ]
- {..} repetition: zero or more occurrences
    <expr> -> <term> { + <term> }
- | 'or': choice of alternative symbols
    <term> -> <factor> [ (*|/) <factor> ]

Syntax graph - basic structures
[Figure: syntax graphs for expr (terms joined by + or -) and term (factors joined by * or /)]

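BNF (p. 121), EBNF and syntax graph compared
BNF:
  <expr> -> <expr> + <term> | <expr> - <term> | <term>
  <term> -> <term> * <factor> | <term> / <factor> | <factor>
EBNF, with optional parts:
  <expr> -> [ <expr> (+|-) ] <term>
  <term> -> [ <term> (*|/) ] <factor>
EBNF, with repetition:
  <expr> -> <term> { (+|-) <term> }
  <term> -> <factor> { (*|/) <factor> }
Syntax graph:
[Figure: expr is a term followed by a loop through + or - and another term; term is a factor followed by a loop through * or / and another factor]
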
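The repetition form of the EBNF rules maps directly onto loops in a recursive-descent parser, which is one reason it is convenient. The sketch below evaluates expressions using <expr> -> <term> { (+|-) <term> } and <term> -> <factor> { (*|/) <factor> }. The slide does not define <factor>, so the rule <factor> -> integer | ( <expr> ) is an assumption, as are the function names and the simplified lexer.

```python
import re


def evaluate(source):
    """Evaluate an arithmetic expression by following the EBNF rules."""
    # Simplified lexer (an assumption): integers and single-character symbols;
    # any other character is silently skipped.
    tokens = re.findall(r"\d+|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # <factor> -> integer | ( <expr> )   (assumed rule)
        token = advance()
        if token == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected {token!r}")

    def term():
        # <term> -> <factor> { (*|/) <factor> }
        value = factor()
        while peek() in ("*", "/"):  # the {..} becomes a loop
            if advance() == "*":
                value = value * factor()
            else:
                value = value / factor()
        return value

    def expr():
        # <expr> -> <term> { (+|-) <term> }
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value = value + term()
            else:
                value = value - term()
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(evaluate("2 + 3 * 4"))   # 14: * binds tighter because <term> sits below <expr>
print(evaluate("10 - 4 - 3"))  # 3: the loop applies operators left to right
```

The extra non-terminal <term> gives * and / higher precedence, and accumulating the value inside the loop gives left associativity.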
Attribute grammars
Problem: context-free grammars cannot describe some features needed in programming, the "static semantics"
e.g. rules for using data types:
- Can't assign real to integer (clumsy in BNF)
- Can't access a variable before assigning it (impossible in BNF)

Attributes
- Symbols in the grammar can have attributes (properties)
- Productions can have functions that compute the attributes of some of their symbols from the attributes of others
- Predicates (boolean functions) inspect the attributes of non-terminals to see if they are legitimate

Using attributes
1) Apply productions to create the parse tree (symbols have some intrinsic attributes)
2) Apply functions to determine the remaining attributes
3) Apply predicates to test the correctness of the parse tree

Sebesta's example
  <assign> -> <var> = <expr>
  <expr>   -> <var> + <var> | <var>
  <var>    -> A | B | C
Add attributes for type checking: expected_type, actual_type
- <var>.actual_type: determined from the string (A, B, C), which has been declared
- <expr>.actual_type: determined from the actual types of its <var> operands
- <expr>.expected_type: determined from the actual type of the <var> being assigned to

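The three steps (parse, compute attributes, test predicates) can be sketched for this small grammar as follows. The function name check_assignment, the declared-types dictionary, the rule that int + int gives int and anything else gives real, and the strict equality predicate are assumptions made for illustration; a language that coerces int to real would relax the predicate.

```python
def check_assignment(sentence, declared):
    """Return True if the predicate actual_type == expected_type holds."""
    tokens = sentence.split()

    # Step 1: syntax. <assign> -> <var> = <expr>, <expr> -> <var> + <var> | <var>
    well_formed = (
        len(tokens) in (3, 5)
        and tokens[1] == "="
        and (len(tokens) == 3 or tokens[3] == "+")
        and all(name in declared for name in tokens[0::2])
    )
    if not well_formed:
        raise SyntaxError(f"not a sentence of the grammar: {sentence!r}")
    target, operands = tokens[0], tokens[2::2]

    # Step 2: attribute functions.
    # <var>.actual_type comes from the declaration of the variable name.
    operand_types = [declared[name] for name in operands]
    # <expr>.expected_type comes from the variable being assigned to.
    expected_type = declared[target]
    # <expr>.actual_type comes from the operand types (assumed rule).
    if len(operand_types) == 2:
        actual_type = "int" if operand_types == ["int", "int"] else "real"
    else:
        actual_type = operand_types[0]

    # Step 3: predicate.
    return actual_type == expected_type


declared = {"A": "int", "B": "real", "C": "int"}  # made-up declarations
print(check_assignment("A = A + C", declared))  # True: int assigned to int
print(check_assignment("A = A + B", declared))  # False: real assigned to int
```

Both sentences are syntactically valid; only the attribute predicate separates the legal assignment from the illegal one.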
Sebesta's type rules: p. 138

Axiomatic semantics
Assertions about statements
- Preconditions
- Postconditions
(like JUnit testing)
Purpose
- Define the meaning of a statement
- Test the validity of a computation (does it do what it is supposed to do?)

Example for assignment
- What the statement should do is expressed as a postcondition
- Based on the syntax of the assignment, a precondition is inferred
- When the statement is executed, the conditions can be verified before and after

Example assignment statement
  y = 25 + x * 2
  postcondition: y > 40
Substitute the right-hand side for y and solve for x:
  y > 40
  25 + x * 2 > 40
  x * 2 > 15
  x > 7.5          (precondition)

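The inferred precondition can be checked mechanically, in the spirit of the JUnit comparison. The sketch below runs the assignment over a range of sample values and confirms that the postcondition holds exactly when the precondition does. The sample range and the function names are assumptions; exact fractions are used so that the boundary value 7.5 is tested without rounding.

```python
from fractions import Fraction


def assign_y(x):
    """The statement under study: y = 25 + x * 2."""
    return 25 + x * 2


def precondition(x):
    return x > Fraction(15, 2)  # x > 7.5


def postcondition(y):
    return y > 40


# Sample x from -10 to 20 in steps of 1/4 (an arbitrary test range).
samples = [Fraction(n, 4) for n in range(-40, 81)]

for x in samples:
    # The precondition holds before the statement exactly when
    # the postcondition holds after it.
    assert precondition(x) == postcondition(assign_y(x))

print(postcondition(assign_y(Fraction(15, 2))))  # False: x = 7.5 gives y = 40
print(postcondition(assign_y(8)))                # True: x = 8 gives y = 41
```

Testing over samples only illustrates the claim; the algebra on the slide is what establishes it for every x.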