COMP 433 – Theory of Compilers (Level – 10)


COMP 433 – Theory of Compilers (Level – 10)
- Unit 1 – Introduction to Compilers
- Unit 2 – Syntax Analysis
- Unit 3 – Intermediate Code Generation
- Unit 4 – Code Generation
- Unit 5 – Code Optimization

Unit – 1 : Introduction To Compilers
- Analysis of Source Program
- Phases of a compiler
- Cousins of Compilers
- Grouping of Phases
- Compiler construction tools
- Lexical Analysis
  - Role of Lexical Analyzer
  - Input Buffering
  - Specification of Tokens

Definitions
What is a compiler?
- A program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text (Grune et al., 2000).
- A program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language) (Aho et al.).
What is an interpreter?
- A program that reads a source program and produces the results of executing this source.
We deal with compilers, but many of the same issues arise with interpreters.

What is a Compiler?
- A program that translates a program in one language into another language.
- The essential interface between applications and architectures.
- Typically lowers the level of abstraction: it analyzes and reasons about the program and the architecture.
- We expect the program to be optimized, i.e., better than the original, ideally exploiting architectural strengths and hiding weaknesses.

Overview of Compilers
[Figure: Compilation process: Source program → Compiler → Object program (compile time); Data → Object program on the executing computer → Results (run time). Interpretive process: Source program and Data are processed together to produce the Result.]

What Do Compilers Do (1)
- A compiler acts as a translator, transforming human-oriented programming languages into computer-oriented machine languages.
- It lets the programmer ignore machine-dependent details.
[Figure: Programming Language (Source) → Compiler → Machine Language (Target)]

What Do Compilers Do (2)
Compilers may generate three types of code:
- Pure machine code: the machine instruction set, without assuming the existence of any operating system or library. Mostly used for operating systems or embedded applications.
- Augmented machine code: code with OS routines and runtime support routines. This is the more common case.
- Virtual machine code: virtual instructions that can be run on any architecture with a virtual machine interpreter or a just-in-time compiler. Example: Java.

What Do Compilers Do (3)
Another way that compilers differ from one another is in the format of the target machine code they generate:
- Assembly or other source format
- Relocatable binary: uses relative addresses; a linkage step is required
- Absolute binary: uses absolute addresses; can be executed directly

Compiler vs. Interpreter (1/5)
- Compilers: translate a source (human-writable) program to an executable (machine-readable) program.
- Interpreters: convert a source program and execute it at the same time.

Compiler vs. Interpreter (2/5)
Ideal concept:
[Figure: Compiler: Source code → Compiler → Executable; Input data → Executable → Output data. Interpreter: Source code and Input data → Interpreter → Output data.]

Compiler vs. Interpreter (3/5)
Most languages are usually thought of as using either one or the other:
- Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
- Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
BUT: they are not always implemented this way:
- Virtual machines (e.g., Java)
- Linking of executables at runtime
- JIT (just-in-time) compiling

Compiler vs. Interpreter (4/5)
Actually, there is no sharp boundary between them. The general situation is a combination of both:
[Figure: Source code → Translator → Intermediate code; Intermediate code and Input data → Virtual machine → Output]

Compiler vs. Interpreter (5/5)
Compiler
- Pros: less space; fast execution
- Cons: slow processing (partly solved by separate compilation); debugging (improved through IDEs)
Interpreter
- Pros: easy debugging; fast development
- Cons: not for large projects (exceptions: Perl, Python); requires more space; slower execution; the interpreter is in memory all the time

Programs Related to Compilers

Interpreters
- Execute the source program immediately rather than generating object code.
- Examples: BASIC, LISP; often used in educational or development situations.
- Speed of execution is slower than compiled code by a factor of 10 or more.
- Share many of their operations with compilers.

Assemblers
- A translator for the assembly language of a particular computer.
- Assembly language is a symbolic form of one machine language.
- A compiler may generate assembly language as its target language, and an assembler finishes the translation into object code.

Linkers
- Collect separate object files into a directly executable file.
- Connect an object program to the code for standard library functions and to resources supplied by the OS.
- Becoming one of the principal activities of a compiler; depends on the OS and processor.

Loaders
- Resolve all relocatable addresses relative to a given base.
- Make executable code more flexible.
- Often part of the operating environment, rarely an actual separate program.
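The relocation step described on the Loaders slide can be sketched as follows. This is a minimal illustration, not a real object-file format: the `RelocImage` layout, the `relocate` function and the idea of a separate list of relocatable slots are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocatable image: `code` holds machine words, and `reloc`
   lists the indices of the words that contain addresses relative to 0. */
typedef struct {
    uint32_t *code;
    size_t code_len;
    const size_t *reloc;
    size_t reloc_len;
} RelocImage;

/* Patch every relocatable word so it is valid when the image is loaded at
   `base`. Words not listed in `reloc` (opcodes, constants) are untouched. */
void relocate(RelocImage *img, uint32_t base)
{
    assert(img != NULL);
    for (size_t i = 0; i < img->reloc_len; i++) {
        size_t slot = img->reloc[i];
        assert(slot < img->code_len);   /* a bad table entry is a linker bug */
        img->code[slot] += base;
    }
}
```

Loading the same image at a different base only changes the argument to `relocate`, which is the flexibility the slide refers to.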

Preprocessors
- Delete comments, include other files, and perform macro substitutions.
- Required by a language (as in C), or can be later add-ons that provide additional facilities.

Editors
- Compilers have been bundled together with editors and other programs into an interactive development environment (IDE).
- Editors oriented toward the format or structure of the programming language are called structure-based.
- May include some operations of a compiler, such as reporting some errors.

Debuggers
- Used to determine execution errors in a compiled program.
- Keep track of most or all of the source code information.
- Halt execution at pre-specified locations called breakpoints.
- Must be supplied with appropriate symbolic information by the compiler.

Profilers
- Collect statistics on the behavior of an object program during execution:
  - Number of times each procedure is called
  - Percentage of execution time spent in each procedure
- Used to improve the execution speed of the program.

Project Managers
- Coordinate the files being worked on by different people and maintain a coherent version of a program.
- Language-independent, or bundled together with a compiler.
- Two popular project manager programs on Unix systems:
  - SCCS (Source Code Control System)
  - RCS (Revision Control System)

The Many Phases of a Compiler
Source Program
Analysis:
- 1 Lexical Analyzer
- 2 Syntax Analyzer
- 3 Semantic Analyzer
- 4 Intermediate Code Generator
- 5 Code Optimizer
Synthesis:
- 6 Code Generator
- 7 Peephole Optimization
Target Program
Supporting all phases: Symbol-table Manager, Error Handler
1, 2, 3, 4, 5: Front-End; 6, 7: Back-End

Phase 1. Lexical Analysis
- The easiest analysis: identify the tokens, which are the basic building blocks.
- For example: position := initial + rate * 60 ;
- Each of position, :=, initial, +, rate, *, 60 and ; is a token.
- Blanks, line breaks, etc. are scanned out.
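As a rough illustration of this phase, the sketch below splits the example statement into tokens. The token kinds, the `Tok` struct and the function names are invented for this example, and the scanner only knows the handful of symbols that appear in the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_OP, TOK_SEMI, TOK_EOF, TOK_ERROR } TokKind;

typedef struct {
    TokKind kind;
    const char *start;   /* first character of the lexeme */
    size_t len;          /* lexeme length in characters */
} Tok;

/* Scan one token starting at *p and advance *p past its lexeme. */
Tok next_tok(const char **p)
{
    assert(p != NULL && *p != NULL);
    const char *s = *p;
    while (isspace((unsigned char)*s))   /* blanks and line breaks are scanned out */
        s++;

    Tok t = { TOK_ERROR, s, 1 };         /* default: one unknown character */
    if (*s == '\0') {
        t.kind = TOK_EOF;
        t.len = 0;
    } else if (isalpha((unsigned char)*s)) {
        const char *e = s;
        while (isalnum((unsigned char)*e)) e++;
        t.kind = TOK_ID;
        t.len = (size_t)(e - s);
    } else if (isdigit((unsigned char)*s)) {
        const char *e = s;
        while (isdigit((unsigned char)*e)) e++;
        t.kind = TOK_NUM;
        t.len = (size_t)(e - s);
    } else if (s[0] == ':' && s[1] == '=') {
        t.kind = TOK_ASSIGN;
        t.len = 2;
    } else if (*s == '+' || *s == '*') {
        t.kind = TOK_OP;
    } else if (*s == ';') {
        t.kind = TOK_SEMI;
    }
    *p = s + t.len;
    return t;
}

/* Count the tokens in src, not including the end-of-input marker. */
size_t count_toks(const char *src)
{
    size_t n = 0;
    while (next_tok(&src).kind != TOK_EOF)
        n++;
    return n;
}
```

Calling `count_toks` on the slide's statement yields eight tokens, with or without the blanks between them.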

Phase 2. Syntax Analysis or Parsing
For example, we would have a parse tree:
assignment statement
  identifier: position
  :=
  expression
    expression
      identifier: initial
    +
    expression
      expression
        identifier: rate
      *
      expression
        number: 60
The nodes of the tree are constructed using a grammar for the language.

Phase 3. Semantic Analysis
- Finds semantic errors.
- One of the most important activities in this phase is type checking: the legality of operands.
- Syntax tree: position := initial + (rate * 60)
- After the conversion action: position := initial + (rate * inttoreal(60))

Supporting Phases / Activities for Analysis
Symbol table creation / maintenance
- Contains info (storage, type, scope, args) on each "meaningful" token, typically identifiers.
- Data structure created / initialized during lexical analysis.
- Utilized / updated during later analysis and synthesis.
Error handling
- Detection of different errors, which correspond to all phases.
- What happens when an error is found?

The Synthesis Task For Compilation
Intermediate code generation
- Abstract machine version of the code, independent of architecture.
- Easy to produce, and easy to use for the final, machine-dependent code generation.
Code optimization
- Find more efficient ways to execute the code.
- Replace code with more optimal statements.
Final code generation
- Generate relocatable, machine-dependent code.
Peephole optimization
- With a very limited view, improves the produced final code.

Reviewing the Entire Process
position := initial + rate * 60
→ lexical analyzer →
id1 := id2 + id3 * 60
→ syntax analyzer →
syntax tree for id1 := id2 + (id3 * 60)
→ semantic analyzer →
syntax tree for id1 := id2 + (id3 * inttoreal(60))
→ intermediate code generator
Alongside all phases: Symbol Table (position ...., initial ...., rate ....) and Errors

Reviewing the Entire Process
Alongside all phases: Symbol Table (position ...., initial ...., rate ....) and Errors
→ intermediate code generator → three-address code:
  t1 := inttoreal(60)
  t2 := id3 * t1
  t3 := id2 + t2
  id1 := t3
→ code optimizer →
  t1 := id3 * 60.0
  id1 := id2 + t1
→ final code generator →
  MOVF id3, R2
  MULF #60.0, R2
  MOVF id2, R1
  ADDF R2, R1
  MOVF R1, id1
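The intermediate code generation step can be sketched as a post-order walk over the expression tree that creates one temporary per operator. The `Node` and `Emitter` types and the function names are assumptions for this illustration; the sketch performs no optimization, so it emits one more temporary than the optimized code shown in the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression-tree node: op is '+' or '*', or 0 for a leaf. */
typedef struct Node {
    char op;
    const char *name;              /* leaf text, e.g. "id2" or "60.0" */
    const struct Node *lhs, *rhs;
} Node;

typedef struct {
    char text[256];                /* generated three-address code */
    size_t used;
    int next_temp;                 /* number of temporaries created so far */
} Emitter;

static void emit_line(Emitter *em, const char *line)
{
    size_t n = strlen(line);
    assert(em->used + n + 2 <= sizeof em->text);   /* line, '\n', '\0' */
    memcpy(em->text + em->used, line, n);
    em->used += n;
    em->text[em->used++] = '\n';
    em->text[em->used] = '\0';
}

/* Emit code for n and store the name that holds its value in out. */
static void gen(const Node *n, Emitter *em, char *out, size_t out_size)
{
    if (n->op == 0) {              /* a leaf is already a usable operand */
        snprintf(out, out_size, "%s", n->name);
        return;
    }
    char a[16], b[16], line[64];
    gen(n->lhs, em, a, sizeof a);  /* operands first, then the operator */
    gen(n->rhs, em, b, sizeof b);
    snprintf(out, out_size, "t%d", ++em->next_temp);
    snprintf(line, sizeof line, "%s := %s %c %s", out, a, n->op, b);
    emit_line(em, line);
}

/* Emit code for the assignment "target := expr". */
void gen_assign(const char *target, const Node *expr, Emitter *em)
{
    char value[16], line[64];
    gen(expr, em, value, sizeof value);
    snprintf(line, sizeof line, "%s := %s", target, value);
    emit_line(em, line);
}
```

For the tree of id1 := id2 + id3 * 60.0 this produces three instructions; merging the final copy into the addition is the kind of improvement the code optimizer is responsible for.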

The Phases of a Compiler (phase: output, with a sample)
Programmer (source code producer): source string
  A=B+C;
Scanner (performs lexical analysis): token string, and symbol table with names
  'A', '=', 'B', '+', 'C', ';'
Parser (performs syntax analysis based on the grammar of the programming language): parse tree or abstract syntax tree
    ;
    |
    =
   / \
  A   +
     / \
    B   C
Semantic analyzer (type checking, etc.): annotated parse tree or abstract syntax tree
Intermediate code generator: three-address code, quads, or RTL
  int2fp B t1
  + t1 C t2
  := t2 A
Optimizer: three-address code, quads, or RTL
  int2fp B t1
  + t1 #2.3 A
Code generator: assembly code
  MOVF #2.3,r1
  ADDF2 r1,r2
  MOVF r2,A
Peephole optimizer: assembly code
  ADDF2 #2.3,r2
  MOVF r2,A

The Structure of a Compiler (1)
Any compiler must perform two major tasks:
- Analysis of the source program
- Synthesis of a machine-language program

The Structure of a Compiler (2)
[Figure: Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code. Symbol and Attribute Tables are used by all phases of the compiler.]

The Structure of a Compiler (3)
Scanner
- The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
- Related topics: RE (Regular Expression), NFA (Non-deterministic Finite Automata), DFA (Deterministic Finite Automata), LEX

The Structure of a Compiler (4)
Parser
- Given a formal syntax specification (typically a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
- As syntactic structure is recognized, the parser either calls the corresponding semantic routines directly or builds a syntax tree.
- Related topics: CFG (Context-Free Grammar), BNF (Backus-Naur Form), GAA (Grammar Analysis Algorithms), LL, LR, SLR, LALR parsers, YACC

The Structure of a Compiler (5)
Semantic Routines
- Perform two functions: check the static semantics of each construct, and do the actual translation.
- The heart of a compiler.
- Related topics: Syntax-Directed Translation, Semantic Processing Techniques, IR (Intermediate Representation)

The Structure of a Compiler (6)
Optimizer
- The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.
- This phase can be very complex and slow.
- Typical optimizations: peephole optimization, loop optimization, register allocation, code scheduling.
- Related topics: Register and Temporary Management, Peephole Optimization

The Structure of a Compiler (7)
Code Generator
- Interpretive code generation
- Generating code from a tree/DAG
- Grammar-based code generator

The Structure of a Compiler (8)
[Figure: Scanner [Lexical Analyzer] → Tokens → Parser [Syntax Analyzer] → Parse tree → Semantic Process [Semantic Analyzer] → Abstract Syntax Tree w/ Attributes → Code Generator [Intermediate Code Generator] → Non-optimized Intermediate Code → Code Optimizer → Optimized Intermediate Code → Code Generator → Target machine code]

LEXICAL ANALYSIS The role of the lexical analyzer
The first phase of a compiler
1. Main task
- To read the input characters.
- To produce a sequence of tokens used by the parser for syntax analysis.
- It acts as an assistant of the parser.

LEXICAL ANALYSIS The role of the lexical analyzer
2. Interaction of the lexical analyzer with the parser
[Figure: Source program → Lexical analyzer → token → Parser; the parser sends "get next token" requests to the lexical analyzer; both use the Symbol table.]

LEXICAL ANALYSIS The role of the lexical analyzer
3. Processes in lexical analyzers
Scanning
- Pre-processing: strip out comments and white space; macro functions.
- Correlating error messages from the compiler with the source program: a line number can be associated with an error message.
Lexical analysis

LEXICAL ANALYSIS The role of the lexical analyzer
4. Terms of the lexical analyzer
- Token: a type of word in the source program: keywords, operators, identifiers, constants, literal strings, punctuation symbols (such as commas and semicolons).
- Lexeme: an actual word in the source program.
- Pattern: a rule describing the set of lexemes that can represent a particular token in the source program. Example: the token relation has the lexemes {<, <=, >, >=, ==, <>}.

LEXICAL ANALYSIS The role of the lexical analyzer
5. Attributes for tokens
The attribute is a pointer to the symbol-table entry in which the information about the token is kept.
E.g. E=M*C**2
  <id, pointer to symbol-table entry for E>
  <assign_op,>
  <id, pointer to symbol-table entry for M>
  <multi_op,>
  <id, pointer to symbol-table entry for C>
  <exp_op,>
  <num, integer value 2>
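One possible C representation of such token and attribute pairs is a tagged struct with a union. The type and function names below (`TokenName`, `SymEntry`, `Token`, `format_token`) are hypothetical, and printing the lexeme stands in for "pointer to symbol-table entry".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { T_ID, T_ASSIGN_OP, T_MULTI_OP, T_EXP_OP, T_NUM } TokenName;

/* Hypothetical symbol-table entry; a real one would also hold type, scope, etc. */
typedef struct { const char *lexeme; } SymEntry;

typedef struct {
    TokenName name;
    union {
        const SymEntry *entry;   /* T_ID: pointer to the symbol-table entry */
        long value;              /* T_NUM: integer value of the constant */
    } attr;                      /* unused for the operator tokens */
} Token;

/* Write a readable form of the token into buf; returns the text length. */
int format_token(const Token *t, char *buf, size_t size)
{
    assert(t != NULL && buf != NULL && size > 0);
    switch (t->name) {
    case T_ID:        return snprintf(buf, size, "<id, %s>", t->attr.entry->lexeme);
    case T_NUM:       return snprintf(buf, size, "<num, %ld>", t->attr.value);
    case T_ASSIGN_OP: return snprintf(buf, size, "<assign_op,>");
    case T_MULTI_OP:  return snprintf(buf, size, "<multi_op,>");
    case T_EXP_OP:    return snprintf(buf, size, "<exp_op,>");
    }
    return -1;   /* unknown token name */
}
```

Operator tokens need no attribute because the token name alone identifies them, which is why the slide leaves their second component empty.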

LEXICAL ANALYSIS The role of the lexical analyzer
6. Lexical errors
Possible error-recovery actions:
- Deleting an extraneous character
- Inserting a missing character
- Replacing an incorrect character by a correct character
- Transposing two adjacent characters (such as fi => if)
- Pre-scanning
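The transposition repair can be illustrated with a small sketch that tries every swap of two adjacent characters and checks the result against a keyword list. The keyword list and the function name `repair_by_transposition` are made up for this example; a real lexical analyzer would decide for itself when such a repair is worth attempting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative keyword list; not taken from any particular language. */
static const char *const KEYWORDS[] = { "if", "then", "else", "while" };

static int is_keyword(const char *s)
{
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strcmp(s, KEYWORDS[i]) == 0)
            return 1;
    return 0;
}

/* If swapping one pair of adjacent characters turns lexeme into a keyword,
   write the repaired text into out and return 1. Otherwise return 0, in
   which case the contents of out are unspecified. */
int repair_by_transposition(const char *lexeme, char *out, size_t out_size)
{
    assert(lexeme != NULL && out != NULL);
    size_t n = strlen(lexeme);
    if (n + 1 > out_size)
        return 0;                          /* no room for the candidate */
    for (size_t i = 0; i + 1 < n; i++) {
        memcpy(out, lexeme, n + 1);        /* fresh copy for each attempt */
        char tmp = out[i];
        out[i] = out[i + 1];
        out[i + 1] = tmp;
        if (is_keyword(out))
            return 1;
    }
    return 0;
}
```

The other three single-character repairs (delete, insert, replace) could be tried in the same style, at the cost of more candidates per lexeme.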

LEXICAL ANALYSIS The role of the lexical analyzer
7. Input buffering
A two-buffer input scheme is used to look ahead on the input and identify tokens:
- Buffer pairs
- Sentinels (guards)

LEXICAL ANALYSIS The role of the lexical analyzer
1. Regular definition of tokens
Tokens are defined by regular expressions, e.g.
  id → letter(letter|digit)*
  letter → A|B|…|Z|a|b|…|z
  digit → 0|1|2|…|9
Notes: Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.

LEXICAL ANALYSIS The role of the lexical analyzer
2. Regular expression and regular language
- Regular expression: a notation that allows us to define a pattern in a high-level language.
- Regular language: each regular expression r denotes a language L(r) (the set of sentences relating to the regular expression r).
Notes: Each word in a program can be expressed by a regular expression.

LEXICAL ANALYSIS The role of the lexical analyzer
3. The rules of regular expressions over an alphabet Σ
1) ε is a regular expression that denotes {ε}: ε is the regular expression and {ε} is the related regular language.
2) If a is a symbol in Σ, then a is a regular expression that denotes {a}: a is the regular expression and {a} is the related regular language.

LEXICAL ANALYSIS The role of the lexical analyzer
3. The rules of regular expressions over an alphabet Σ
3) Suppose r and s are regular expressions; then r|s, rs, r* and s* are also regular expressions.
Notes: Rules 1) and 2) form the basis of the definition; rule 3) provides the inductive step.

LEXICAL ANALYSIS The role of the lexical analyzer
4. Algebraic laws of regular expressions (r, s and t are regular expressions)
1) r|s = s|r
2) r|(s|t) = (r|s)|t; r(st) = (rs)t
3) r(s|t) = rs|rt; (s|t)r = sr|tr
4) εr = rε = r
5) (r*)* = r*
6) r* = r+|ε; r+ = r r* = r* r
7) (r|s)* = (r*|s*)* = (r* s*)*

LEXICAL ANALYSIS The role of the lexical analyzer
4. Algebraic laws of regular expressions
8) If ε ∉ L(s), then
   r = t | s r ⇒ r = s* t
   r = t | r s ⇒ r = t s*
Notes: We assume that the precedence of * is the highest, the precedence of | is the lowest, and they are left associative.

LEXICAL ANALYSIS The role of the lexical analyzer
5. Notational shorthands
a) One or more instances: (r)+, e.g. digit+
b) Zero or one instance: r? is a shorthand for r|ε, e.g. (E(+|-)?digits)?
c) Character classes: [a-z] denotes a|b|c|…|z; further examples are [A-Za-z] and [A-Za-z0-9]
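These shorthands map closely onto POSIX extended regular expressions, so they can be tried out with the `regcomp` and `regexec` functions from `<regex.h>`. The sketch below assumes a POSIX system; the helper `full_match` and the two pattern macros are names chosen for this example, and the number pattern is only one plausible reading of the slide's exponent example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* id -> letter(letter|digit)*, written with character classes */
#define ID_PATTERN  "^[A-Za-z][A-Za-z0-9]*$"
/* digits, optional fraction, optional exponent (E(+|-)?digits)? */
#define NUM_PATTERN "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$"

/* Return 1 if text matches the POSIX extended regex pattern, else 0.
   The patterns above are anchored with ^ and $, so a match covers the
   whole string. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    int ok;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                       /* treat a bad pattern as no match */
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

A generated scanner would compile such patterns into one automaton ahead of time rather than calling a regex library per lexeme; this sketch is only meant for experimenting with the notation.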

LEXICAL ANALYSIS The Specification of Tokens
1. Task of recognizing tokens in a lexical analyzer
- Isolate the lexeme for the next token in the input buffer.
- Produce as output a pair consisting of the appropriate token and attribute-value, such as <id, pointer to table entry>, using the translation table given on the next slide.

LEXICAL ANALYSIS The Specification of Tokens
1. Task of recognizing tokens in a lexical analyzer
Regular expression   Token   Attribute-value
if                   if      -
id                   id      Pointer to table entry
<                    relop   LT

LEXICAL ANALYSIS The Specification of Tokens
2. Method for recognizing tokens
- Use a transition diagram.

LEXICAL ANALYSIS The Specification of Tokens
3. Transition diagram (stylized flowchart)
A transition diagram depicts the actions that take place when a lexical analyzer is called by the parser to get the next token.
[Figure: from the start state, on '>' go to state 6; from state 6, on '=' go to accepting state 7, which performs return(relop,GE); from state 6, on any other character go to accepting state 8, marked *, which performs return(relop,GT).]
Notes: Here we use '*' to indicate states on which input retraction must take place.
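The fragment of the diagram for '>' and '>=' can be written directly as code. In this sketch the names `Relop` and `scan_greater` are invented, and retraction is modelled by peeking at the next character without consuming it, which has the same effect as reading it and stepping back.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RELOP_NONE, RELOP_GT, RELOP_GE } Relop;

/* Recognize '>' or '>=' starting at input[*pos]. On success *pos is moved
   past the lexeme; on failure *pos is left unchanged. */
Relop scan_greater(const char *input, size_t *pos)
{
    assert(input != NULL && pos != NULL);
    size_t i = *pos;
    if (input[i] != '>')
        return RELOP_NONE;      /* start state has no edge for this character */
    i++;                        /* state 6: '>' has been read */
    if (input[i] == '=') {
        *pos = i + 1;           /* state 7: accept '>=' */
        return RELOP_GE;
    }
    *pos = i;                   /* state 8 (*): the other character is not consumed */
    return RELOP_GT;
}
```

After a RELOP_GT result the position points at the character that ended the lexeme, so the next call to the scanner sees it again.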

LEXICAL ANALYSIS The Specification of Tokens
4. Implementing a transition diagram
- Each state gets a segment of code.
- If there are edges leaving a state, then its code reads a character and selects an edge to follow, if possible.
- Use nextchar() to read the next character from the input buffer.

LEXICAL ANALYSIS The Specification of Tokens
4. Implementing a transition diagram
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            /* skip white space and move the lexeme start along with it */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        …
        }
    }

LEXICAL ANALYSIS The Specification of Tokens
5. A generalized transition diagram
- Finite automaton (FA)
- An FA is either deterministic or non-deterministic.
- Non-deterministic means that more than one transition out of a state may be possible on the same input symbol.
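Non-determinism can be simulated by tracking the set of states the automaton might be in. The sketch below does this with a bitmask for a small NFA recognizing (a|b)*abb; this particular automaton, the table layout and the name `nfa_accepts` are chosen for illustration and are not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { NFA_STATES = 4 };

/* delta[s][c] is a bitmask of the states reachable from state s on symbol c,
   where c is 0 for 'a' and 1 for 'b'. State 0 has two moves on 'a', which is
   what makes the automaton non-deterministic. */
static const unsigned delta[NFA_STATES][2] = {
    { 0x3, 0x1 },   /* state 0: a -> {0,1}, b -> {0} */
    { 0x0, 0x4 },   /* state 1: b -> {2} */
    { 0x0, 0x8 },   /* state 2: b -> {3} */
    { 0x0, 0x0 },   /* state 3: accepting, no moves */
};

/* Return 1 if input (over the alphabet {a, b}) is accepted, else 0. */
int nfa_accepts(const char *input)
{
    assert(input != NULL);
    unsigned current = 0x1;                 /* start in state 0 only */
    for (; *input != '\0'; input++) {
        if (*input != 'a' && *input != 'b')
            return 0;                       /* symbol outside the alphabet */
        int c = (*input == 'b');
        unsigned next = 0;
        for (int s = 0; s < NFA_STATES; s++)
            if (current & (1u << s))
                next |= delta[s][c];        /* follow every possible move */
        current = next;
    }
    return (current & 0x8) != 0;            /* is state 3 in the final set? */
}
```

A deterministic automaton is the special case in which every set has at most one member, so the inner loop collapses to a single table lookup.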

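LEXICAL ANALYSIS The Specification of Tokens
6. The model of recognition of tokens
[Figure: the input buffer holds the characters i f d 2 = …; the lexeme_beginning pointer marks the start of the current lexeme, and the FA simulator reads characters from the buffer.]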

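LEXICAL ANALYSIS The Specification of Tokens
E.g. the FA simulator for identifiers, which represents the rule identifier = letter(letter|digit)*:
[Figure: from state 1, on a letter go to state 2; state 2 is accepting and loops back to itself on a letter or a digit.]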
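The two-state automaton for identifiers can be simulated with a short loop. This is a minimal sketch: the function name `match_identifier` is hypothetical, and "letter" and "digit" are taken to mean what `isalpha` and `isdigit` accept in the C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Run the automaton for letter(letter|digit)* on the start of s and return
   the length of the identifier found there, or 0 if s does not begin with
   an identifier. */
size_t match_identifier(const char *s)
{
    assert(s != NULL);
    int state = 1;                          /* state 1: start state */
    size_t i = 0;
    for (;;) {
        unsigned char c = (unsigned char)s[i];
        if (state == 1 && isalpha(c))
            state = 2;                      /* first letter moves to state 2 */
        else if (state == 2 && isalnum(c))
            state = 2;                      /* loop on letters and digits */
        else
            break;                          /* no edge for c: stop before it */
        i++;
    }
    return state == 2 ? i : 0;              /* state 2 is the accepting state */
}
```

The character that stops the loop is not counted as part of the lexeme, which is the lookahead behaviour that motivates input buffering.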

INPUT BUFFERING
- Speeds up reading the source program.
- Looks one or more characters beyond the next lexeme.
- There are many situations where we need to look at least one additional character ahead.

INPUT BUFFERING
- For instance, we cannot be sure we have seen the end of an identifier until we see a character that is not a letter or digit, and therefore is not part of the lexeme for id.
- In C, single-character operators like -, =, or < could also be the beginning of a two-character operator like ->, ==, or <=.
- We introduce a two-buffer scheme that handles large lookaheads safely.
- We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.
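A buffer pair with sentinels might look like the sketch below. Several things here are assumptions made for the illustration: the source is an in-memory string standing in for a file, NUL is used as the sentinel (so the source text must not contain NUL), the half size is kept tiny so that reloads actually happen, and the names `TwoBuf`, `twobuf_init` and `next_char` are invented. The lexeme_beginning pointer is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HALF = 4 };       /* tiny on purpose so the reload path is exercised */
#define SENTINEL '\0'    /* assumes the source text never contains NUL */

typedef struct {
    const char *src;            /* unread source text (stands in for a file) */
    size_t src_left;
    char buf[2 * (HALF + 1)];   /* [half 0][sentinel][half 1][sentinel] */
    char *forward;              /* next character to hand out */
} TwoBuf;

/* Fill one half with up to HALF characters and end it with the sentinel. */
static void load_half(TwoBuf *b, int half)
{
    char *dst = b->buf + half * (HALF + 1);
    size_t n = b->src_left < HALF ? b->src_left : HALF;
    memcpy(dst, b->src, n);
    dst[n] = SENTINEL;          /* end of half, or end of input if n < HALF */
    b->src += n;
    b->src_left -= n;
}

void twobuf_init(TwoBuf *b, const char *text)
{
    assert(b != NULL && text != NULL);
    b->src = text;
    b->src_left = strlen(text);
    load_half(b, 0);
    b->forward = b->buf;
}

/* Return the next character, or -1 at end of input. */
int next_char(TwoBuf *b)
{
    for (;;) {
        char c = *b->forward;
        if (c != SENTINEL) {                 /* common case: a single test */
            b->forward++;
            return (unsigned char)c;
        }
        if (b->forward == b->buf + HALF) {   /* end of first half */
            load_half(b, 1);
            b->forward = b->buf + HALF + 1;
        } else if (b->forward == b->buf + 2 * HALF + 1) {   /* end of second half */
            load_half(b, 0);
            b->forward = b->buf;
        } else {
            return -1;           /* sentinel inside a half: real end of input */
        }
    }
}
```

Without the sentinel, every call would need one test for the end of the buffer and another for the character itself; with it, the end-of-buffer checks run only when the sentinel is actually seen.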

END OF UNIT - 1