2.1 2. Introduction To Compilers And Phase 1 Inside a compiler. Inside a C-- compiler. The compilation process. Example C-- code. Extended Backus-Naur.

Presentation transcript:

Introduction To Compilers And Phase 1 Inside a compiler. Inside a C-- compiler. The compilation process. Example C-- code. Extended Backus-Naur form. Lexical analysis. The syntax of C--. The unit directory. Phase 1.

2.2 Inside A Compiler [Diagram: the C++ compiler. Stages: lexical analyser, syntax analyser, optimiser, code generator. Intermediate data: tokens, symbol table, abstract syntax tree. Files shown: p.cxx (source), p.lst, a.out (executable).]

2.3 Inside A C-- Compiler [Diagram: the C-- compiler. Stages: lexical analyser, syntax analyser, code generator. Intermediate data: tokens, symbol table, abstract syntax tree. Files shown: p.c-- (source), a.s.]

2.4 The Compilation Process There are three main stages to compilation : –Lexical analysis. –Syntax analysis. –Code generation. Lexical analysis. –Recognising the individual components of the language (literals, identifiers, operators etc.). –Throwing away irrelevant things like comments and whitespace. –Often called tokenising.

2.5 The Compilation Process II Syntax analysis. –Recognising declarations, statements etc. –Detecting syntactic and static semantic errors. –Building the symbol table and abstract syntax tree (AST). Code generation. –Generating machine code from the symbol table and the AST. Most modern compilers also perform optimisation on the code after both syntax analysis (macro optimisation) and code generation (micro optimisation). There are three phases to writing a compiler. –Phase 1 : write a lexical analyser. –Phase 2 : write a syntax analyser. –Phase 3 : write a code generator.

2.6 Example C-- Code The factorial program :
    // Computes the factorial of a value read
    // from input.
    int a = 1 ; // Result
    int b = 0 ; // Data
    {
      cin >> b ; // Read data
      // Loop to compute factorial.
      while (b > 0) // Check for termination
      {
        a = a * b ; // Compute new a value
        b = b - 1 ; // Decrement b
      }
      cout << a ; // Output result
    } // End of program
C-- is Turing Machine Equivalent.

2.7 Extended Backus-Naur Form (EBNF) A way of formally defining syntax. Production rules : non-terminal ::= syntactic_term non-terminal : syntactic category. terminal : A piece of program text (a.k.a. lexical token). Five types of syntactic expression : X Y -- Sequence. X | Y -- Alternation. [ X ] -- Optional ( 0 or 1 occurrences). { X } -- Repetition ( 0 or more occurrences). ( X ) -- Bracketing. Terminals must be distinguished from non-terminals (e.g. ‘if’ ).

2.8 EBNF Example A simple language :
    sentence ::= noun_clause verb ( noun_clause | adverb ) ‘.’
    noun_clause ::= article [ adj_list ] noun
    adj_list ::= adj { ‘,’ adj }
    article ::= ‘a’ | ‘the’
    noun ::= ‘cat’ | ‘mouse’
    adj ::= ‘black’ | ‘white’ | ‘thin’ | ‘fat’
    adverb ::= ‘quickly’ | ‘slowly’
    verb ::= ‘eats’ | ‘runs’

2.9 EBNF Example II Any piece of text that can be produced by following the syntactic rules is valid. Any piece of text that cannot be produced by following the syntactic rules is invalid - it contains a syntax error. Possible sentences : a black, fat cat eats a white mouse. the mouse runs quickly. Possible syntax errors : a black thin cat eats the black cat. the quickly, slowly fred. Deciding whether a sentence is valid is called parsing. Two phases to parsing : –Lexical analysis (phase 1). –Syntax analysis (phase 2).
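The validity check described on this slide can be sketched as a small recursive-descent recogniser for the cat-and-mouse grammar of slide 2.8. This is only an illustration, not part of the unit code: the names splitWords, SentenceParser and isValidSentence are hypothetical, and the word splitter is a simplified stand-in for a real lexical analyser.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper: split text into words, treating ',' and '.' as
// separate one-character tokens (a stand-in for lexical analysis).
std::vector<std::string> splitWords(const std::string &text) {
    std::vector<std::string> words;
    std::string current;
    for (char ch : text) {
        if (ch == ' ' || ch == ',' || ch == '.') {
            if (!current.empty()) { words.push_back(current); current.clear(); }
            if (ch != ' ') words.push_back(std::string(1, ch));
        } else {
            current += ch;
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// One member function per grammar rule (syntax analysis).
class SentenceParser {
public:
    explicit SentenceParser(const std::vector<std::string> &w) : words(w), pos(0) {}

    // sentence ::= noun_clause verb ( noun_clause | adverb ) '.'
    bool sentence() {
        if (!nounClause() || !oneOf({"eats", "runs"})) return false;
        std::size_t saved = pos;            // remember where the alternative starts
        if (!nounClause()) {
            pos = saved;                    // undo any partial match, try the adverb
            if (!oneOf({"quickly", "slowly"})) return false;
        }
        return accept(".") && pos == words.size();
    }

private:
    std::vector<std::string> words;
    std::size_t pos;

    // Consume the next word if it equals the expected terminal.
    bool accept(const std::string &word) {
        if (pos < words.size() && words[pos] == word) { ++pos; return true; }
        return false;
    }

    // X | Y | ... over terminals.
    bool oneOf(std::initializer_list<const char *> options) {
        for (const char *option : options)
            if (accept(option)) return true;
        return false;
    }

    bool adj() { return oneOf({"black", "white", "thin", "fat"}); }

    // noun_clause ::= article [ adj_list ] noun
    // adj_list    ::= adj { ',' adj }
    bool nounClause() {
        if (!oneOf({"a", "the"})) return false;
        if (adj()) {
            while (accept(","))
                if (!adj()) return false;
        }
        return oneOf({"cat", "mouse"});
    }
};

bool isValidSentence(const std::string &text) {
    SentenceParser parser(splitWords(text));
    return parser.sentence();
}
```

Under these assumptions, isValidSentence accepts the two possible sentences above and rejects the two syntax errors; the missing comma in "a black thin cat" is caught when nounClause expects a noun and finds "thin".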

2.10 Lexical Analysis Tokenisation. Splits the input characters into a series of structs each holding a terminal symbol and a designation of its kind. Called as a subprogram by the syntax analyser whenever it needs another token. Also discards comments and whitespace. Example : A = B + C ; is split into the tokens IDENT ‘A’, ASSIGN, IDENT ‘B’, ADDOP ‘+’, IDENT ‘C’, TERMINATOR. First step : identify the lexical tokens in C--.
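As a rough illustration of tokenisation, the sketch below handles only the four kinds of token that occur in the example statement. The function name tokenise and the choice to return printable strings rather than LexToken structs are assumptions made to keep the example short.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mini-tokeniser: identifiers, '=', '+' and ';' only.
std::vector<std::string> tokenise(const std::string &text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isspace(ch)) {
            ++i;                                   // whitespace is discarded
        } else if (std::isalpha(ch)) {
            std::string name;                      // letters, digits and '_'
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                name += text[i++];
            tokens.push_back("IDENT '" + name + "'");
        } else if (ch == '=') {
            tokens.push_back("ASSIGN"); ++i;
        } else if (ch == '+') {
            tokens.push_back("ADDOP '+'"); ++i;
        } else if (ch == ';') {
            tokens.push_back("TERMINATOR"); ++i;
        } else {
            throw std::invalid_argument("unrecognised character");
        }
    }
    return tokens;
}
```

Calling tokenise("A = B + C ;") yields the six tokens listed on the slide, in source order.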

2.11 C-- Lexical Tokens The lexical tokens are the terminal symbols of the grammar. The lexical tokens in C-- can be split into the following groups : –Identifiers. –Literals. –Punctuation : =, ‘, ,, ;, &, (, ), [, ], {, } and !. –Operators : relational ( ==, !=, <, >, >= and <= ), additive ( +, - and || ), multiplicative ( *, /, % and && ). –Reserved words : const, bool, string, int, if, else, while, cin and cout.

2.12 The Unit Directory On Jaguar. Contains lots of useful stuff for this unit : /usr/users/staff/aosc/cm049icp/phase1 makefile : A makefile to build the phase 1 program. lexprog.cxx : The test bed program for phase 1. lexer.h : The header file for phase 1. skipWhiteComments.cxx, writeToken.cxx : to be included in your program (more later). lexer : A compiled (and linked) executable for phase 1. examples/*.c-- : Random C-- source programs. –If you write a good C-- program, send it to me and I’ll include it in the unit directory for everyone to admire.

2.13 The Unit Directory II tests/test*.c-- : These are the C-- source programs that I’ll be using to test your lexer program during the demo. –To get maximum marks for phase 1 your program should produce exactly the same output as lexer when run on these source programs. The C++ string library files are in /usr/users/staff/aosc/cm049icp/lib They are : cstring.h : String library header file. string.cxx : String library implementation file. string.o : String library object code file. Useful commands : testphase1, demophase1. Shell scripts for running the demo.

2.14 What You Must Do Get copies of makefile, writeToken.cxx and skipWhiteComments.cxx from the unit directory. It is probably a good idea to get copies of lexprog.cxx and lexer.h as well, though it’s not necessary. – lexprog.cxx includes lexer.h, the header file which contains the definition of the type LexToken and the prototype for the subprogram which you must write : lexAnal. You must put your implementation of lexAnal plus the contents of writeToken.cxx and skipWhiteComments.cxx into a file called lexer.cxx in your directory. – lexer.cxx will be linked into the executable because of the makefile.

2.15 Example Run Assume that the file prog.c-- contains the following simple program :
    // Simple test program
    int a ;
    const string s = "Input : " ;
    const string endl = "\n" ;
    {
      cin >> a ;
      cout << s ;
      cout << b ;
      cout << endl ;
    }

2.16 Example Run II Use the makefile to compile and link your lexer into the file lexer. Then run it :
    jaguar> make lexer
    jaguar> lexer < prog.c--
    INT
    IDENTIFIER : ‘a’
    TERMINATOR
    CONST
    STRING
    IDENTIFIER : ‘s’
    STRINGLIT : “Input :”
    TERMINATOR
    CONST
    STRING
    IDENTIFIER : ‘endl’
    STRINGLIT : “\n”
    TERMINATOR
    LBRACE
Lexer reads from cin and writes to cout each token’s kind and (if required) its value.

2.17 Example Run III
    CIN
    INOP
    IDENTIFIER : ‘a’
    TERMINATOR
    COUT
    OUTOP
    IDENTIFIER : ‘s’
    TERMINATOR
    COUT
    OUTOP
    IDENTIFIER : ‘b’
    TERMINATOR
    COUT
    OUTOP
    IDENTIFIER : ‘endl’
    TERMINATOR
    RBRACE
    jaguar>

2.18 lexprog.cxx
    #include
    #include ".../phase1/lexer.h"
    void main()
    {
      LexToken lexToken ; // Next lexical token
      skipWhiteComments() ;
      while (cin)
      {
        lexAnal(lexToken) ; // You must write this subprogram.
        writeToken(lexToken) ;
        cout << endl ;
      }
    } // main

2.19 lexer.h
    #ifndef LEXER_H
    #define LEXER_H
    #include ".../lib/cstring.h"
    enum LexTokenTag {... } ;
    struct LexToken {...} ;
    void lexAnal(LexToken &lexToken) ;
    void skipWhiteComments() ;
    void writeToken(LexToken lexToken) ;
    #endif

2.20 The LexToken Type
    enum LexTokenTag { IDENT,..., COUT } ;
    struct LexToken
    {
      LexTokenTag tag ; // Tag field
      string ident ; // Identifier
      string boolLit ; // Boolean literal
      string stringLit ; // String literal
      int intLit ; // Integer literal
      string addOp ; // Add operator
      string relOp ; // Rel operator
      string mulOp ; // Mul operator
    } ; // LexToken

2.21 lexAnal Prototype : void lexAnal(LexToken &lexToken) ; lexAnal reads the next lexical token from input and returns it via the reference parameter lexToken. Assumes that the next character on cin is the first character of the next lexical token. –Done by calling skipWhiteComments. Pre-defined C++ stream member function to put a character back onto the start of the input stream after we’ve inspected it : istream &putback(char ch) ; (called as cin.putback(ch)). lexAnal inspects the next input character and uses it to decide what the next lexical token is.

2.22 lexAnal II After a token has been read lexAnal calls skipWhiteComments to read to the start of the next lexical token. Top-level code for lexAnal :
    cin.get(next) ;
    if (!cin)
      // EOF encountered. Print error message and call exit.
    else
    {
      if ...
        // Nested if statement to recognise next character,
        // lex the token and return it in lexToken.
      else
        // Character not recognised. Print error
        // message and call exit.
    }
    skipWhiteComments() ;

2.23 Lookahead 28 possible kinds of lexical token : one value in the LexTokenTag enum for each kind. lexAnal can identify the kind of some tokens just by looking at the first character on input (i.e. by one character lookahead). This is true for tokens that start with (or consist of only) the following characters : “, ,, ;, (, ), [, ], {, }, +, -, *, /, %. The following pairs of tokens require lexAnal to inspect the next two input characters (i.e. two character lookahead) to distinguish between them : = and ==; ! and !=; | and ||; > and >=; < and <=; & and &&; / and //.

2.24 One Character Lookahead The lexAnal code for this is fairly simple. If the token kind has no value associated with it : if (next == ';') { lexToken.tag = TERMINATOR ; } If the token kind has a value associated with it : if (isdigit(next)) { cin.putback(next) ; lexIntLit(lexToken) ; } The subprogram lexIntLit reads the integer literal and sets the fields of lexToken appropriately.

2.25 Reading String Literals When the next input character is “ we have a string literal. if (next == ‘\”’) { cin.putback(next) ; lexStringLit(lexToken) ; } lexStringLit uses a while loop to read the string literal character by character and appends them onto the end of lexToken.stringLit using the C++ string append operator, +. –The implementation of lexStringLit is in the file lexStringLit.cxx in the phase1 subdirectory of the unit directory. The C++ string append operator ( + ) will also be useful when reading identifiers.

2.26 Two Character Lookahead None of the token kinds that require two character lookahead have a value associated with them so the lexAnal code is actually fairly simple. Example :
    if (next == '<')
    {
      lexToken.tag = RELOP ;
      cin.get(next) ;
      if (next == '=')
        lexToken.relOp = "<=" ;
      else if (next == '<')
        lexToken.tag = OUTOP ;
      else
      {
        cin.putback(next) ;
        lexToken.relOp = "<" ;
      }
    }

2.27 Identifiers, Reserved Words And Boolean Literals If the next character on the input is a letter then the token may be an identifier, a reserved word or a boolean literal ( false or true ). The only way to decide is to read the string in character by character until a character which is not a letter, digit or ‘_’ is encountered. –Use the same method as in lexStringLit. Once the string has been read : if ((string == “true”) || (string == “false”)) // It’s a boolean. else if (string == “const”) // const reserved word. else if... else // It’s an identifier.

2.28 lexer.cxx Put your implementation of lexAnal (and all the lexing subprograms) in a file called lexer.cxx in your own directory. lexer.cxx must also #include the following files : “.../phase1/lexer.h” “.../lib/cstring.h” You may not need ctype.h and stdlib.h but you will certainly need all the others.

2.29 Lexer Errors Most syntactic errors will be detected during syntax analysis. Some syntactic errors can be detected during lexical analysis. –Lexer errors. Already seen a few : –Unexpected EOF (see slide 2.22). –Unrecognised character (see slide 2.22). –A single | (see slide 2.23). There is one other error which your lexer must detect : –An opening “ with no matching closing ” (an unterminated string literal).

2.30 Lexer Errors II Most compilers attempt to recover from errors in the program by guessing what the programmer actually meant. This is very difficult to do, which is why compilers produce so many spurious errors and rubbish error messages. When your lexer detects an error it should simply output an error message to cout and call exit to terminate the program. –See lexStringLit.cxx for an example of how to do this. –Make sure you give exit a different integer parameter for different errors. –Make sure you give exit the same integer parameter for the same error (e.g. always use exit(1) for unexpected end of file wherever in the lexer it is detected). –This is good programming practice.
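One way to keep the exit codes consistent is to name them once in an enum and route every error through a single subprogram. The sketch below is only a suggestion: apart from exit(1) for unexpected end of file, which the slide gives as an example, the code values, the message wording and the names LexError, lexErrorMessage and lexFail are all assumptions.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical error codes; only UNEXPECTED_EOF = 1 comes from the slide.
enum LexError {
    UNEXPECTED_EOF = 1,
    UNRECOGNISED_CHAR = 2,
    SINGLE_BAR = 3,
    UNTERMINATED_STRING = 4
};

// Maps each error to a fixed message.
const char *lexErrorMessage(LexError error) {
    switch (error) {
        case UNEXPECTED_EOF:      return "unexpected end of file";
        case UNRECOGNISED_CHAR:   return "unrecognised character";
        case SINGLE_BAR:          return "single |";
        case UNTERMINATED_STRING: return "unterminated string literal";
    }
    return "unknown error";
}

// Prints the message to cout and terminates with the matching exit code.
[[noreturn]] void lexFail(LexError error) {
    std::cout << "Lexer error : " << lexErrorMessage(error) << std::endl;
    std::exit(error);
}
```

Because the code is the enum value itself, the same error always produces the same exit status wherever lexFail is called.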

2.31 Summary Write a compiler from C-- to M68K assembly code. –Phase 1 : lexical analysis (10%). –Phase 2 : syntax analysis (20%). –Phase 3 : code generation (10%). Write a parser for SCL. –Phase 4 : 60% Lexical analysis converts program text into a series of lexical tokens. –Discards comments and whitespace. –Requires lookahead to determine token kind.