Compilation Encapsulation Or: Why Every Component Should Just Do Its Damn Job.

“When a negative int literal (e.g. -5) appears in the code, should it be a single integer token whose value is -5, or two tokens: a minus and an integer whose value is 5?”

Well, in theory… We can write a lexer (maybe not with flex) with lookbehind, that makes sure the last token was neither a number nor a variable. (Or a function call, or a field reference. Pretty complicated lexer.)

But just because we can do it, does that make it a good idea?

But what if we change the syntax?
Professor Moriarty wants IC to be more like Matlab. He asks you to support scalar operations on arrays of scalars:
array - scalar = [a_1 - scalar, …, a_n - scalar]
And suddenly new int[n] - 6 is valid…
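A minimal sketch of the "two tokens" answer, assuming a hypothetical hand-written tokenizer (the class name MinusLexer and the token spellings are made up for illustration and are not part of the IC skeleton). The lexer always emits the minus sign as its own token and leaves the unary/binary decision to later phases, so both -5 and new int[n] - 6 go through the same rule:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy tokenizer: '-' is always its own token, never part of a number.
final class MinusLexer {
    private static final Set<String> KEYWORDS = Set.of("new", "int");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens and is otherwise ignored
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("INTEGER(" + src.substring(start, i) + ")"); // unsigned digits only
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                tokens.add(KEYWORDS.contains(word) ? "KW(" + word + ")" : "ID(" + word + ")");
            } else {
                // No lookbehind: the lexer does not ask what the previous token was.
                tokens.add(c == '-' ? "MINUS" : String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }
}
```

Calling MinusLexer.tokenize on "x = -5" and on "new int[n] - 6" produces a MINUS token followed by an INTEGER token in both cases; nothing in the lexer has to change when the syntax does.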

Generic compiler structure
Source text (txt) → Compiler: Frontend (analysis) → Semantic Representation → Backend (synthesis) → Executable code (exe)

IC compiler
IC Program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → x86 executable (exe)

Encapsulation, what does it mean?
It means each component needs to do its job, without regard for what the other components are doing.
The tokenizer only cares about dividing the stream into tokens
– Invalid characters
– Keywords
– Strings and comments

The parser only cares about building a structure out of tokens
– Assumes a valid stream of tokens
– Structural rules with no meaning
The semantic checker is free to only worry about semantics
– Assumes a valid AST
– Actually worries about meaning

Fake Exam Question #1
Professor Xavier wants IC to be more like Python. He asks you to support array and string multiplication.
"abc" * 3 → "abcabcabc"
(new MyClass[5] * 2).length → 10

But suppose…
Suppose you decided to define your strings like so:

    \" { // move to the state that handles string content
        yybegin(STRING);
        in_string_literal = true;
    }
    <STRING> {
        \"/{VALID_STR_POSTFIX} { // found the end of a string, finish
            yybegin(YYINITIAL);
            in_string_literal = false;
            return new Token(sym.QUOTE, yyline + 1, string.toString());
        }
        \" { throw new LexicalError(yyline + 1); } // longest match only if what follows is invalid
    }

But suppose…
Now that a string can be followed by *, we have to go back and fix the lexer, too. In reality, there was never a good reason to perform that test:
– Whatever follows the string, the syntax phase can cope with it on its own.

Back to the tokenizer not caring
What needs changing?
Lexer:
– Nothing
Syntax:
– Nothing
Semantic checks:
– Type check
Code generation:
– Functionality of the operation
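The "type check" entry above can be sketched roughly as follows. This is an illustration under assumptions, not the course's type checker: types are represented as plain strings ("int", "string", or "T[]" for arrays), the class and method names are hypothetical, and the repeated operand is assumed to be on the left, as in both examples from the question.

```java
// Hypothetical sketch of the only semantic change: the typing rule for '*'.
final class MulTypeCheck {
    // Returns the result type of left * right, or throws on a type error.
    static String typeOfMul(String left, String right) {
        if (left.equals("int") && right.equals("int")) {
            return "int"; // the rule that was already there
        }
        boolean repeatable = left.equals("string") || left.endsWith("[]");
        if (repeatable && right.equals("int")) {
            return left; // new rule: "abc" * 3 is a string, MyClass[] * 2 is a MyClass[]
        }
        throw new IllegalArgumentException("type error: " + left + " * " + right);
    }
}
```

The lexer and the grammar already accept expr * expr, so the new behaviour is confined to this rule and to the code generated for it.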

Fake Exam Question #2
We want IC to support binary numeric literals
– With the following syntax: 0b followed by one or more binary digits (leading zeros after the binary signifier allowed)
– With the same range restrictions as decimal literals

Solution #1
We’ll add a new lexer token type, BINNUMBER
– 0b[01]+
And a new syntax rule for a BinNumber literal
– Which, really, is only BINNUMBER
And then check its range
– Which is actually a lot easier than with decimals…

A short interlude: where does X go?
Is property X lexical, syntactic or semantic? Two main deciding factors:
1. Correctness: Is there enough data to make the call right now?
2. Laziness: What will be gained by doing this right now? Is this the place where it’s easiest to do?

Example A: Range of decimal literals
Correctness:
– In any two's complement implementation of integers, the bound is not going to be symmetric (for 32 bits: -2147483648 to 2147483647).
– So we can’t make the call until we know if we have a positive or a negative number on our hands…
Laziness:
– Writing a lot of code that looks at the child expressions during syntax is usually a bad sign.
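One way to picture Example A is a range check that runs on the AST, where the minus sign is already visible as a parent node. The sketch below assumes a hypothetical mini-AST (Expr, IntLiteral, UnaryMinus are illustrative names, not the IC AST classes) and a 32-bit int; a literal is allowed the larger bound only when it is the direct operand of a unary minus.

```java
import java.math.BigInteger;

// Hypothetical semantic-phase check for decimal literal ranges.
final class LiteralRangeCheck {
    interface Expr {}

    static final class IntLiteral implements Expr {
        final String digits; // raw unsigned text from the lexer
        IntLiteral(String digits) { this.digits = digits; }
    }

    static final class UnaryMinus implements Expr {
        final Expr operand;
        UnaryMinus(Expr operand) { this.operand = operand; }
    }

    private static final BigInteger MAX_POSITIVE = BigInteger.valueOf(Integer.MAX_VALUE);
    private static final BigInteger MAX_NEGATED = MAX_POSITIVE.add(BigInteger.ONE); // 2^31

    // True if every literal in the expression fits a 32-bit two's complement int.
    static boolean inRange(Expr e) {
        if (e instanceof UnaryMinus) {
            Expr inner = ((UnaryMinus) e).operand;
            if (inner instanceof IntLiteral) {
                return fits((IntLiteral) inner, MAX_NEGATED); // -2147483648 is legal
            }
            return inRange(inner);
        }
        if (e instanceof IntLiteral) {
            return fits((IntLiteral) e, MAX_POSITIVE); // 2147483648 alone is not
        }
        return true; // other node kinds are out of scope for this sketch
    }

    private static boolean fits(IntLiteral lit, BigInteger bound) {
        return new BigInteger(lit.digits).compareTo(bound) <= 0;
    }
}
```

The same text, 2147483648, is accepted or rejected depending on its parent node, which is exactly the information the lexer does not have.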

Example B: Range of binary literals
Correctness:
– All the data is there the second we get the token.
Laziness:
– Postponing the check means a continued separation between binary and decimal literals
– If we check right now, we can convert the value to a number and forget all about it

Back to Fake Question #2
So we can actually do it this way: we’ll add a new lexer rule
– 0b[01]+
– We’ll also check the range here
– And then:
return new Token(sym.INTEGER, yyline + 1, bin2decimal(yytext()));
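The slide does not show bin2decimal itself, so the following is only a guess at what such a helper could look like. Assumptions, all mine: the helper returns an int, a literal is in range when it has at most 31 significant bits (a non-negative 32-bit int), and an IllegalArgumentException stands in for the project's LexicalError.

```java
// Hypothetical helper in the spirit of the slide's bin2decimal(yytext()).
final class BinaryLiterals {
    static int bin2decimal(String text) {
        if (!text.matches("0b[01]+")) {
            throw new IllegalArgumentException("not a binary literal: " + text);
        }
        // Leading zeros are allowed, so drop them (keeping at least one digit)
        // before counting significant bits.
        String bits = text.substring(2).replaceFirst("^0+(?=.)", "");
        if (bits.length() > 31) {
            // Assumed range rule: value must fit a non-negative 32-bit int.
            throw new IllegalArgumentException("binary literal out of range: " + text);
        }
        return Integer.parseInt(bits, 2);
    }
}
```

The range check is a length comparison, which is the sense in which it is easier than the decimal case; after the conversion, the rest of the compiler only ever sees an ordinary INTEGER token.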

Where does Y go?
Place the following property: “the call to method foo() is a static call”.
Our guiding principle here is correctness.
Lexer? Syntax?

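Where does Y go?
Syntax breaks method calls up into three types:
1. ClassName.foo() – definitely static
2. varname.foo() – definitely not
3. foo() – ???
So… correct? Not for the third form: whether a bare foo() is static depends on how foo is declared, which the parser cannot see.
So… not syntax.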
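A rough sketch of how the third form could be settled in the semantic phase, once declarations are known. Everything here is an assumption for illustration: the class name, the Kind enum, and the simplified symbol-table view (a map from method name to "declared static?") that ignores inheritance and overloading.

```java
import java.util.Map;

// Hypothetical resolver for an unqualified call such as foo().
final class CallKindResolver {
    enum Kind { STATIC, VIRTUAL }

    // enclosingClassMethods: method name -> true if the method is declared static.
    static Kind resolveUnqualified(String method, Map<String, Boolean> enclosingClassMethods) {
        Boolean isStatic = enclosingClassMethods.get(method);
        if (isStatic == null) {
            throw new IllegalArgumentException("undefined method: " + method);
        }
        return isStatic ? Kind.STATIC : Kind.VIRTUAL;
    }
}
```

The answer comes from the declaration, not from the shape of the call, which is why the property cannot be decided by the grammar alone.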

Fake Question #3
We want to allow type inference in IC

    var a = new A();
    A b = a;
    C c = a; // type error

Q3: Lexer New token type VAR

Q3: Syntax
We want an init expression whose type is VAR
– Do we add VAR to types?
– No, we treat it like void.
How about AST representation?
– We modify our LocalVariable class to keep “TBD” in its type

Q3: Semantics
To determine the new variable’s type:
– instead of computing its type field (which is TBD)
– compute the type of the expression
Put that value into the symbol table, and all else is business as usual!
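The steps above can be sketched as follows, assuming a symbol table that is just a map from variable name to type name and a type check that demands exact equality (no subtyping). VarInference and declare are hypothetical names; only the “TBD” marker comes from the slides.

```java
import java.util.Map;

// Hypothetical semantic-phase handling of a local variable declaration.
final class VarInference {
    static final String TBD = "TBD"; // the placeholder kept in LocalVariable's type

    // symbols: variable name -> type name. initExprType is the already-computed
    // type of the initialiser expression.
    static void declare(Map<String, String> symbols, String declaredType,
                        String name, String initExprType) {
        // For 'var', take the type of the expression instead of the type field.
        String type = declaredType.equals(TBD) ? initExprType : declaredType;
        if (!type.equals(initExprType)) {
            throw new IllegalArgumentException(
                "type error: cannot assign " + initExprType + " to " + type + " " + name);
        }
        symbols.put(name, type); // from here on, business as usual
    }
}
```

Declaring a with TBD and initialiser type A records a as an A; a later declaration of C c = a then fails the ordinary check, matching the type error in the question.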

Good luck on the exam!