CS 3304 Comparative Languages

Slides:



Advertisements
Similar presentations
CPSC 388 – Compiler Design and Construction
Advertisements

Semantics Static semantics Dynamic semantics attribute grammars
Attribute Grammars Prabhaker Mateti ACK: Assembled from many sources.
Semantic Analysis Chapter 4. Role of Semantic Analysis Following parsing, the next two phases of the "typical" compiler are – semantic analysis – (intermediate)
1 Semantic Processing. 2 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice.
Topic 5 -Semantic Analysis Dr. William A. Maniatty Assistant Prof. Dept. of Computer Science University At Albany CSI 511 Programming Languages and Systems.
1 Lecture 11: Semantic Analysis (Section ) CSCI 431 Programming Languages Fall 2002 A modification of slides developed by Felix Hernandez-Campos.
Copyright © 2005 Elsevier Chapter 4 :: Semantic Analysis Programming Language Pragmatics Michael L. Scott.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Syntax & Semantic Introduction Organization of Language Description Abstract Syntax Formal Syntax The Way of Writing Grammars Formal Semantic.
Syntax Directed Translation. Syntax directed translation Yacc can do a simple kind of syntax directed translation from an input sentence to C code We.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
COP4020 Programming Languages Semantics Prof. Xin Yuan.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
410/510 1 of 18 Week 5 – Lecture 1 Semantic Analysis Compiler Construction.
CS 363 Comparative Programming Languages Semantics.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Copyright © 2009 Elsevier Chapter 4 :: Semantic Analysis Programming Language Pragmatics Michael L. Scott.
Chapter 8: Semantic Analyzer1 Compiler Designs and Constructions Chapter 8: Semantic Analyzer Objectives: Syntax-Directed Translation Type Checking Dr.
CSE 420 Lecture Program is lexically well-formed: ▫Identifiers have valid names. ▫Strings are properly terminated. ▫No stray characters. Program.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
LECTURE 10 Semantic Analysis. REVIEW So far, we’ve covered the following: Compilation methods: compilation vs. interpretation. The overall compilation.
Chapter4 Syntax-Directed Translation Introduction : 1.In the lexical analysis step, each token has its attribute , e.g., the attribute of an id is a pointer.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 9 Ahmed Ezzat Semantic Analysis.
CS 3304 Comparative Languages
Lecture 9 Symbol Table and Attributed Grammars
Semantic analysis Jakub Yaghob
Chapter 1 Introduction.
Semantic Analysis Chapter 4.
Syntax-Directed Translation
Context-Sensitive Analysis
A Simple Syntax-Directed Translator
Constructing Precedence Table
CS 3304 Comparative Languages
CS510 Compiler Lecture 4.
Chapter 2 :: Programming Language Syntax
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Abstract Syntax Trees Lecture 14 Mon, Feb 28, 2005.
Chapter 1 Introduction.
PROGRAMMING LANGUAGES
Ch. 4 – Semantic Analysis Errors can arise in syntax, static semantics, dynamic semantics Some PL features are impossible or infeasible to specify in grammar.
Compiler Lecture 1 CS510.
Syntax-Directed Translation Part I
CS416 Compiler Design lec00-outline September 19, 2018
Compiler Design 22. ANTLR AST Traversal (AST as Input, AST Grammars)
CS 3304 Comparative Languages
Chapter 5. Syntax-Directed Translation
Syntax-Directed Translation Part I
Syntax-Directed Translation Part I
Introduction CI612 Compiler Design CI612 Compiler Design.
Chapter 6 Intermediate-Code Generation
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
CS 3304 Comparative Languages
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CS 3304 Comparative Languages
Chapter 4 Action Routines.
CS416 Compiler Design lec00-outline February 23, 2019
Syntax-Directed Translation Part I
SYNTAX DIRECTED DEFINITION
Chapter 2 :: Programming Language Syntax
Compiler Construction
Chapter 2 :: Programming Language Syntax
Chapter 10: Compilers and Language Translation
Syntax-Directed Translation Part I
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
COP4020 Programming Languages
COP4020 Programming Languages
Compiler Construction
Chapter 2 :: Programming Language Syntax
Presentation transcript:

CS 3304 Comparative Languages Lecture 6: Semantic Analysis 2 February 2012

Introduction Context-free grammar can become very complex if trying to do too much. A rule that requires the compiler to compare things that are separated by long distances, or to count things that are not properly nested, ends up being a matter of semantics, Semantics rules: Static: enforced by the compiler at compile time. Dynamic: enforced by the compiler generate code at runtime. Following parsing, the next two phases of the “typical” compiler are: Semantic analysis. (Intermediate) code generation. Described in terms of annotations (attributes) of a parse tree. Attribute grammars provide a formal framework.

Semantic Analyzer The principal job of the semantic analyzer is to enforce static semantic rules: Constructs a syntax tree (usually first). Information gathered is needed by the code generator. This interface is a boundary between the front end and the back end. There is a considerable variety in the extent to which parsing, semantic analysis, and intermediate code generation are interleaved. Fully separated phases: a full parse tree, a syntax tree, and semantic check. Fully interleaved phases: no need to build both pars and syntax trees. A common approach interleaves construction of a syntax tree with parsing (no explicit parse tree), follows with separate, sequential phases for semantic analysis and code generation.

Dynamic Checks Many compilers have an option to enable/disable code generation for dynamic checks: questionable. A consequence of an undetected error in production use is significantly worse than when testing. Sometimes dynamic checks are cheap: execute in instruction slots that would otherwise go unused. Sometimes dynamic checks are expensive: pointer arithmetic in C.

Assertions An assertion is a statement that a specified condition is expected to be true when execution reaches a certain point in the code (Java): assert denominator != 0; Many languages support assertions via standard library (C): assert(denominator != 0); Other constructs include invariants, preconditions and post- conditions (Euclid, Eiffel): can cover a potentially large number of places where an assertion would be required. Semantic checking requires a significant run-time overhead but with recent hardware advances, it becomes very feasible.

Static Analysis Compile-time algorithms that predict run-time behavior. It is precise if it allows the compiler to determine whether a given program will always follow the rules: type checking. Also useful when not precise: a combination of compile time check and code for run time checking. Static analysis is also used for code improvement: Alias analysis: when values can be safely cached in registers. Escape analysis: all references to a value confined to a given context. Subtype analysis: an OO variable is of a certain subtype. Unsafe and speculative optimization. Conservative and optimistic compilers. Some languages have tighter semantic rules to avoid dynamic checking.

Attribute Grammars Both semantic analysis and (intermediate) code generation can be described in terms of annotation, or “decoration” of a parse or syntax tree. Attribute grammars provide a formal framework for decorating such a tree. We'll start with decoration of parse trees, then consider syntax trees.

Example Grammar Consider the following LR (bottom-up) grammar for arithmetic expressions made of constants, with precedence and associativity: says nothing about what the program means. E → E + T E → E – T E → T T → T * F T → T / F T → F F → - F F → ( F ) F → const

Example Attribute Grammar SLR(1) We can turn this into an attribute grammar as follows (similar to Figure 4.1): E → E + T E1.val = E2.val + T.val E → E – T E1.val = E2.val - T.val E → T E.val = T.val T → T * F T1.val = T2.val * F.val T → T / F T1.val = T2.val / F.val T → F T.val = F.val F → - F F1.val = - F2.val F → (E) F.val = E.val F → const F.val = C.val

Attribute Rules The attribute grammar serves to define the semantics of the input program. Attribute rules are best thought of as definitions, not assignments. They are not necessarily meant to be evaluated at any particular time, or in any particular order, though they do define their left-hand side in terms of the right-hand side.

Annotation The process of evaluating attributes is called annotation, or decoration, of the parse tree (see Figure 4.2 for (1+3)*2): When a parse tree under this grammar is fully decorated, the value of the expression will be in the val attribute of the root. The code fragments for the rules are called semantic functions: Strictly speaking, they should be cast as functions, e.g.: E1.val = sum (E2.val, T.val). This is a very simple attribute grammar: Each symbol has at most one attribute: the punctuation marks have no attributes. These attributes are all so-called synthesized attributes: They are calculated only from the attributes of things below them in the parse tree.

Decoration of a Parse Tree

Evaluating Attributes In general, we are allowed both synthesized and inherited attributes. Inherited attributes may depend on things above or to the side of them in the parse tree. Tokens have only synthesized attributes, initialized by the scanner (name of an identifier, value of a constant, etc.). Inherited attributes of the start symbol constitute run-time parameters of the compiler.

Synthesized Attributes The S-attributed grammar uses only synthesized attributes. Its attribute flow (attribute dependence graph) is purely bottom-up. The arguments to semantic functions in an S-attributed grammar are always attributes of symbols on the right-hand side of the current production. The return value is always placed into an attribute of the left hand side of the production. The intrinsic properties of tokens are synthesized attributes initialized by the scanner.

Inherited Attributes Inherited attributes: values are calculated when their symbol is on the right-hand side of the current production. Contextual information flow into a symbols for above or from the side: provide different context. Symbol table information is commonly passed be means of inherited attributes. Inherited attributes of the root of the parse tree can be used to represent external environment. Example: left-to-right associativity may create a situation where an S-attributed grammar would be cumbersome to use. By passing attribute values left-to-right in the tree, things are much simpler.

Example Attribute Grammar LL(1) E → T TT E.v = TT.v TT.st = T.v TT1 → + T TT2 TT1.v = TT2.v TT2.st = TT1.st + T.v TT1 → - T TT2 TT1.v = TT2.v TT2.st = TT1.st - T.v TT → ε TT.v = TT.st T → F FT T.v = FT.v FT.st = F.v FT1 → * F FT2 FT1.v = FT2.v FT2.st = FT1.st * F.v FT1 → / F FT2 FT1.v = FT2.v FT2.st = FT1.st / F.v FT → ε FT.v = FT.st F1 → - F2 F1.v = - F2.v F → ( E ) F.v = E.v F → const F.v = C.v

Parse Tree for (1+3)*2

Attribute Flow Well defined attribute grammar: its rules determine a unique set of values for attributes of every possible parse tree. Noncircular attribute grammar: it never leads to a parse tree in which there are cycles in the attribute flow graph. Translation scheme: an algorithm that decorates parse trees by invoking the rules of an attribute grammar in an order consistent with the tree’s attribute flow. The LL(1) attribute grammar is a good bit messier than the SLR(1) one, but it is still L-attributed (the attributes can be evaluated in a single left-to-right pass over the input): L-attributed grammars are the most general class of attribute grammars that can be evaluated during an LL parse. S-attributed grammars are the most general class of attribute grammars that can be evaluated during an LR parse.

Parsers and Attribute Grammars Each synthetic attribute of a LHS symbol (by definition of synthetic) depends only on attributes of its RHS symbols. A bottom-up parser: in general paired with an S-attributed grammar. Each inherited attribute of a RHS symbol (by definition of L- attributed) depends only on: Inherited attributes of the LHS symbol, or Synthetic or inherited attributes of symbols to its left in the RHS. A top-down parser: in general paired with an L-attributed grammar. There are certain tasks, such as generation of code for short- circuit Boolean expression evaluation, that are easiest to express with non-L-attributed attribute grammars. Because of the potential cost of complex traversal schemes, however, most real-world compilers insist that the grammar be L-attributed.

Building a Syntax Tree If intermediate code generation is interleaved with the parsing, no need to build a syntax tree. One-pass compiler: a compiler that interleaves semantic analysis and code generation with parsing. Semantic analysis is easier to perform during a separate traversal of a syntax tree: Add attribute rules to the context-free grammar. The attributes point to nodes of a syntax tree. Two examples: Bottom-up attribute grammar. Top-down attribute grammar.

Bottom-Up Attribute Grammar

Two Leaves: Constants 1 and 3

First Internal Node: 1+3

Third Leaf: Constant 2

Second Internal Node: (1+3)*2

Top-Down Attribute Grammar

First Leaf: Constant 1

Second Leaf: Constant 3

Third Leaf: Constant 2

ANTLR Grammar: Program.g grammar Program; program: statement+ ; statement: expression NEWLINE | ID '=' expression NEWLINE | NEWLINE ; expression: multiplicationExpression (('+'|'-') multiplicationExpression)* ; multiplicationExpression: atom ('*' atom)* ; atom: INT | ID | '(' expression ')' ID: ('a'..'z'|'A'..'Z')+ ; INT: '0'..'9'+ ; NEWLINE: '\r'? '\n' ; WS: (' '|'\t')+ {skip();} ;

Using ANTLR From Generate Menu select Generate Code menu item. In gencode subdirectory three files are generated: Program.tokens: The list of token-name, token-type assignments ProgramLexer.java: The lexer (scanner) generated from Program.g. ProgramParser.java: The parser generated from Program.g. Create a tester class (with main), e.g. RunProgram.java. Compile and run: javac RunProgram.java ProgramParser.java ProgramLexer.java java RunProgram Make sure that the ANTLR jar file is in your class path or included in your Java installation. ProgramEvaluation.g adds evaluation statement (in Java) to Program.g (attribute grammar).

Main Class: RunProgram.java import org.antlr.runtime.*; public class RunProgram { public static void main(String[] args) throws Exception { ProgramParser parser = new ProgramParser( new CommonTokenStream( new ProgramLexer( new ANTLRInputStream(System.in) ) ); parser.program(); }

Evaluate: ProgramEvaluation.g I grammar ProgramEvaluation; @header { import java.util.HashMap; } @members { HashMap symbolTable = new HashMap(); program: statement+ ; statement: expression NEWLINE {System.out.println($expression.value);} | ID '=' expression NEWLINE {symbolTable.put($ID.text, new Integer($expression.value));} | NEWLINE ;

Evaluate: ProgramEvaluation.g II expression returns [int value] : e=multiplicationExpression {$value = $e.value;} ('+' e=multiplicationExpression {$value += $e.value;} | '-' e=multiplicationExpression {$value -= $e.value;} )* ; multiplicationExpression returns [int value] : e=atom ('*' e=atom {$value *= $e.value;}

Evaluate: ProgramEvaluation.g II atom returns [int value] : INT {$value = Integer.parseInt($INT.text);} | ID {Integer v = (Integer)symbolTable.get($ID.text); if ( v!=null ) $value = v.intValue(); else System.err.println("undefined variable "+$ID.text); } | '(' expression ')' {$value = $expression.value;} ; ID: ('a'..'z'|'A'..'Z')+ ; INT: '0'..'9'+ ; NEWLINE: '\r'? '\n' ; WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;

Summary The enforcement of static semantic rules and the generation of intermediate code can be cast in terms of annotation of a parse tree. An attribute grammar associates attributes with each symbol in a context-free grammar and attribute rules which each production. Attributes are either synthesized(bottom-up) or inherited (top- down). The syntax tree is a tree representation of the abstract syntactic structure of source code.