Compilers


Compilers [Diagram: Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → Code Generator → Machine language. The phases share a Symbol Table; the generated machine-language program then runs on the computer, reading Input Data and producing Output.]

Interpreters [Diagram: the Interpreter takes the Source language program and the Input Data and produces the Output directly.]

Hybrid [Diagram: Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → Intermediate Code → Interpreter. The phases share a Symbol Table; the Interpreter executes the intermediate code with the Input Data to produce the Output.]

The Compilation Process [Diagram: the Source Program (e.g., if (a >= b+1) { a *= 2; } ...) passes through the Front End (analysis), consisting of Lexical Analysis, Syntax Analysis, Semantic Analysis, and Intermediate Code Gen, producing an Intermediate Representation (e.g., _t1 = b + 1; _t2 = a < _t1; if _t2 goto L0; ...). The Back End (synthesis), consisting of IR Optimization, Object Code Gen, and Object Code Optimization, turns this into the Target Program (e.g., lw $t1, -16($fp); add $t0, $t1, 1; ...).]

The Analysis Stage Broken up into four phases:
- Lexical Analysis (also called scanning or tokenization)
- Parsing
- Semantic Analysis
- Intermediate Code Generation

Lexical Analysis and Scanners/Lexers Lexical analysis is the first phase of compilation, in which the compiler attempts to recognize the symbols of the actual source code. Lexical analyzers, also called scanners or lexers, are usually subroutines or coroutines of the parser: the parser asks for the next token from the source file, and the lexer returns that token.
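
To make the pull-style relationship concrete, here is a minimal C sketch of a lexer driven by its caller. The token kinds, the Token struct, and the next_token interface are illustrative assumptions, not part of the slides; a real scanner would handle comments, line numbers, and many more token classes.

    #include <stdio.h>

    /* Hypothetical token kinds and token record (not from the slides). */
    typedef enum { TOK_ID, TOK_NUMBER, TOK_OPER, TOK_PUNCT, TOK_EOF } TokenKind;
    typedef struct { TokenKind kind; char text[64]; } Token;

    /* The lexer: called on demand, returns one token per call. */
    Token next_token(FILE *src) {
        Token t = { TOK_EOF, "" };
        int c = fgetc(src);
        while (c == ' ' || c == '\t' || c == '\n') c = fgetc(src);   /* skip whitespace */
        if (c == EOF) return t;
        if (c == '+' || c == '*' || c == '=') { t.kind = TOK_OPER;  t.text[0] = (char)c; t.text[1] = '\0'; return t; }
        if (c == ';')                         { t.kind = TOK_PUNCT; t.text[0] = (char)c; t.text[1] = '\0'; return t; }
        /* Collect an identifier or a number (very simplified). */
        int i = 0;
        t.kind = (c >= '0' && c <= '9') ? TOK_NUMBER : TOK_ID;
        while (c != EOF && c != ' ' && c != '\t' && c != '\n' &&
               c != ';' && c != '+' && c != '*' && c != '=') {
            if (i < 63) t.text[i++] = (char)c;
            c = fgetc(src);
        }
        if (c != EOF) ungetc(c, src);   /* push back the character that ended the lexeme */
        t.text[i] = '\0';
        return t;
    }

    /* Stand-in for the parser: it simply pulls tokens until end of input. */
    int main(void) {
        Token t;
        while ((t = next_token(stdin)).kind != TOK_EOF)
            printf("kind=%d lexeme=%s\n", t.kind, t.text);
        return 0;
    }

The point is the control flow: the caller (normally the parser) asks for one token at a time, and the lexer reads only as many characters as it needs to produce it.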

Lexing Example Source fragment: double d1; double d2; d2 = d1 * 2.0;
Lexemes and the tokens they produce:
- double : TOK_DOUBLE (reserved word)
- d1 : TOK_ID (variable name)
- ; : TOK_PUNCT (has value of ";")
- double : TOK_DOUBLE (reserved word)
- d2 : TOK_ID (variable name)
- = : TOK_OPER (has value of "=")
- d1 : TOK_ID (variable name)
- * : TOK_OPER (has value of "*")
- 2.0 : TOK_FLOAT_CONST (has value of 2.0)
- ; : TOK_PUNCT (has value of ";")
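
A scanner producing the stream above might declare the token kinds and a token record roughly as follows. This is a sketch: the TOK_* names come from the example, but the struct layout and the value field are assumptions.

    #include <stdio.h>

    /* Token kinds used in the example above. */
    typedef enum {
        TOK_DOUBLE,        /* reserved word "double"              */
        TOK_ID,            /* variable names such as d1, d2       */
        TOK_OPER,          /* operators such as "=" and "*"       */
        TOK_PUNCT,         /* punctuation such as ";"             */
        TOK_FLOAT_CONST    /* floating-point literals such as 2.0 */
    } TokenKind;

    /* A token pairs a kind with its lexeme; constants also carry a value. */
    typedef struct {
        TokenKind kind;
        const char *lexeme;
        double value;      /* meaningful only for TOK_FLOAT_CONST */
    } Token;

    /* The stream a lexer would hand to the parser for
       "double d1; double d2; d2 = d1 * 2.0;" */
    static const Token example_stream[] = {
        { TOK_DOUBLE, "double", 0 }, { TOK_ID, "d1", 0 },             { TOK_PUNCT, ";", 0 },
        { TOK_DOUBLE, "double", 0 }, { TOK_ID, "d2", 0 },             { TOK_PUNCT, ";", 0 },
        { TOK_ID, "d2", 0 },         { TOK_OPER, "=", 0 },            { TOK_ID, "d1", 0 },
        { TOK_OPER, "*", 0 },        { TOK_FLOAT_CONST, "2.0", 2.0 }, { TOK_PUNCT, ";", 0 },
    };

    int main(void) {
        int n = (int)(sizeof example_stream / sizeof example_stream[0]);
        for (int i = 0; i < n; i++)
            printf("%d\t%s\n", example_stream[i].kind, example_stream[i].lexeme);
        return 0;
    }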

Lexical Analysis – sequences Expression: Base * base - 0x4 * height * width. Token sequence: name:Base, operator:times, name:base, operator:minus, hexConstant:4, operator:times, name:height, operator:times, name:width. The lexical phase returns both the token and its value.

Tokens and lexemes Lexers work with patterns, tokens, and lexemes. Patterns formally describe tokens in some way. Tokens are the terminal symbols in the grammar for the language. Lexemes are the actual strings that match the patterns.

Example
Token      | Lexeme | Pattern
int        | int    | the keyword int itself
identifier | MyVar  | letter followed by digits or letters
literal    | "foo"  | characters enclosed in quotes

Expressing Patterns for Tokens As you may have already guessed (or know), the easiest way to specify a token is with a regular expression.

Regexs Regular expressions (regexs) are used to describe (regular) languages. Here are the rules of regular expressions:
- The empty string, ε, is a regular expression.
- A symbol is a regular expression (e.g., a).
- If R and S are regexs, then so are R|S (denoting R or S), RS (concatenation), R* (zero or more of R), and (R) (grouping).

Regex Conventions There are various conventions used in the world of regular expressions to make things a bit easier:
- R+ (one or more of R)
- R? (zero or one of R)
- [a-z], [A-Z], [0-9] (character classes)
- . (any single character/symbol)
Precedence rules for operators avoid excessive parentheses. All operators group left-to-right; *, + and ? have the highest precedence, concatenation is second highest, and | is the lowest.

Examples
- a...b : five-letter words starting with a and ending with b
- a*(bb)*a* : words with an even number of b's
- .*(ing|er)s? : words ending with ing or er, with zero or one s
- [0-9]+\.[0-9]+(e|E)-?[0-9](l|L|f|F)? : a simplified version of floating-point constants in C (the backslash (\) means "take the next character literally")
- (R|ε)* : equivalent to R*
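
These patterns can be checked mechanically. The sketch below feeds the floating-point pattern from the last example to the POSIX <regex.h> engine; the test strings are made up for illustration.

    #include <regex.h>
    #include <stdio.h>

    int main(void) {
        /* The simplified floating-point pattern from the slide, anchored so the
           whole string must match. */
        const char *pattern = "^[0-9]+\\.[0-9]+(e|E)-?[0-9](l|L|f|F)?$";
        const char *tests[] = { "2.5e-3f", "10.0E7", "3.14", "width" };  /* illustrative inputs */
        regex_t re;

        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
            fprintf(stderr, "could not compile pattern\n");
            return 1;
        }
        for (int i = 0; i < 4; i++)
            printf("%-8s %s\n", tests[i],
                   regexec(&re, tests[i], 0, NULL, 0) == 0 ? "matches" : "does not match");
        regfree(&re);
        return 0;
    }

Because the pattern requires an exponent, 3.14 is rejected while 2.5e-3f and 10.0E7 are accepted.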

Another Example A fragment of an expression grammar:
Expression -> Expression + Expression | ... | Variable | Constant
Variable -> T_IDENTIFIER
Constant -> T_INTCONSTANT | T_DOUBLECONSTANT

The Parse Derivation of a + 2:
Expression -> Expression + Expression
           -> Variable + Expression
           -> T_IDENTIFIER + Expression
           -> T_IDENTIFIER + Constant
           -> T_IDENTIFIER + T_INTCONSTANT
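
A recursive-descent parser mirrors this derivation with one procedure per nonterminal. The sketch below is an assumption-laden illustration: it rewrites the left-recursive, ambiguous rule as Expression -> Term ('+' Term)*, with Term -> Variable | Constant, uses a hard-coded token stream for a + 2 instead of a real lexer, and handles only T_INTCONSTANT constants.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical token codes for this fragment. */
    typedef enum { T_IDENTIFIER, T_INTCONSTANT, T_PLUS, T_EOF } Tok;

    /* A fixed token stream standing in for the lexer: "a + 2". */
    static Tok stream[] = { T_IDENTIFIER, T_PLUS, T_INTCONSTANT, T_EOF };
    static int pos = 0;

    static Tok peek(void) { return stream[pos]; }

    static void expect(Tok t, const char *what) {
        if (stream[pos] != t) { fprintf(stderr, "parse error: expected %s\n", what); exit(1); }
        pos++;
    }

    /* Term -> Variable | Constant, i.e. T_IDENTIFIER | T_INTCONSTANT. */
    static void term(void) {
        if (peek() == T_IDENTIFIER)       { expect(T_IDENTIFIER, "identifier"); puts("Variable -> T_IDENTIFIER"); }
        else if (peek() == T_INTCONSTANT) { expect(T_INTCONSTANT, "constant");  puts("Constant -> T_INTCONSTANT"); }
        else                              { expect(T_IDENTIFIER, "identifier or constant"); }
    }

    /* Expression -> Term ('+' Term)*, a right-recursive form of the rule above. */
    static void expression(void) {
        term();
        while (peek() == T_PLUS) {
            expect(T_PLUS, "'+'");
            puts("Expression -> Expression + Expression");
            term();
        }
    }

    int main(void) {
        expression();
        expect(T_EOF, "end of input");
        puts("parsed: a + 2");
        return 0;
    }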

Semantic Analysis The syntactically correct parse tree (or derivation) is checked for semantic errors: constructs that, while syntactically valid, do not obey the semantic rules of the source language. Examples:
- use of an undeclared or uninitialized variable
- a function called with improper arguments
- incompatible operands and type mismatches

Most semantic analysis pertains to the checking of types. Examples:

    void fun1(int i);
    double d;
    d = fun1(2.1);      /* fun1 returns no value to assign */

    int i;
    int j;
    i = i + 2;

    int arr[2], c;
    c = arr * 10;       /* an array is not a valid operand of * */
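
At its core, this kind of checking asks, for each operator, whether the operand types are compatible. The toy C sketch below makes that explicit; the Type enum and the check_binary rule are invented for illustration and mirror the three examples above.

    #include <stdio.h>

    /* Invented, very small type universe for illustration. */
    typedef enum { TY_INT, TY_DOUBLE, TY_ARRAY_INT, TY_VOID, TY_ERROR } Type;

    static const char *type_name(Type t) {
        switch (t) {
        case TY_INT:       return "int";
        case TY_DOUBLE:    return "double";
        case TY_ARRAY_INT: return "int[]";
        case TY_VOID:      return "void";
        default:           return "<error>";
        }
    }

    /* Rule for a binary operator: both operands must be numeric; mixing
       int and double promotes to double; anything else is a type error. */
    static Type check_binary(char op, Type lhs, Type rhs) {
        int lhs_numeric = (lhs == TY_INT || lhs == TY_DOUBLE);
        int rhs_numeric = (rhs == TY_INT || rhs == TY_DOUBLE);
        if (!lhs_numeric || !rhs_numeric) {
            printf("type error: invalid operands %s %c %s\n",
                   type_name(lhs), op, type_name(rhs));
            return TY_ERROR;
        }
        return (lhs == TY_DOUBLE || rhs == TY_DOUBLE) ? TY_DOUBLE : TY_INT;
    }

    int main(void) {
        check_binary('+', TY_INT, TY_INT);        /* i + 2         : fine, yields int     */
        check_binary('*', TY_ARRAY_INT, TY_INT);  /* arr * 10      : rejected             */
        check_binary('=', TY_DOUBLE, TY_VOID);    /* d = fun1(2.1) : void result rejected */
        return 0;
    }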

Intermediate Code Generation This is where the intermediate representation of the source program is created. The representation can have a variety of forms, but a common one is called three-address code (TAC). Like assembly, TAC is a sequence of simple instructions, each of which can have at most three operands.
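
One common in-memory form for TAC is the quadruple: an operator plus up to two operands and a result. The C sketch below shows one possible layout (the field names and the opcode set are assumptions), initialized with the instructions for a = b * c + b * d from the next slide.

    #include <stdio.h>

    /* A three-address instruction as a quadruple: result = arg1 op arg2. */
    typedef enum { TAC_ADD, TAC_MUL, TAC_COPY } TacOp;

    typedef struct {
        TacOp op;
        const char *arg1;    /* first operand                     */
        const char *arg2;    /* second operand, NULL if unused    */
        const char *result;  /* destination temporary or variable */
    } TacInstr;

    /* a = b * c + b * d, as the sequence on the next slide. */
    static const TacInstr example[] = {
        { TAC_MUL,  "b",   "c",   "_t1" },
        { TAC_MUL,  "b",   "d",   "_t2" },
        { TAC_ADD,  "_t1", "_t2", "_t3" },
        { TAC_COPY, "_t3", NULL,  "a"   },
    };

    int main(void) {
        static const char *opname[] = { "+", "*", "copy" };
        for (int i = 0; i < 4; i++) {
            const TacInstr *q = &example[i];
            if (q->op == TAC_COPY)
                printf("%s = %s\n", q->result, q->arg1);
            else
                printf("%s = %s %s %s\n", q->result, q->arg1, opname[q->op], q->arg2);
        }
        return 0;
    }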

Example For the statement a = b * c + b * d, the generator emits:

    _t1 = b * c
    _t2 = b * d
    _t3 = _t1 + _t2
    a = _t3

Note the temporaries (_t1, _t2, _t3).

Another Example For

    if (a <= b)
        a = a - c;
    c = b * c;

the generator emits:

    _t1 = a > b
    if _t1 goto L0
    _t2 = a - c
    a = _t2
    L0: _t3 = b * c
    c = _t3

Note the temporaries and the symbolic addresses (labels such as L0).
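
A generator produces that shape by negating the condition and branching around the body to a fresh label. The sketch below hard-codes the statement from this slide and prints the corresponding TAC; new_temp and new_label are illustrative helpers, not a fixed API.

    #include <stdio.h>

    static int temp_count = 1, label_count = 0;

    /* Hypothetical helpers that hand out fresh temporaries and labels. */
    static int new_temp(void)  { return temp_count++;  }
    static int new_label(void) { return label_count++; }

    /* Emit TAC for:  if (a <= b) a = a - c;  c = b * c; */
    int main(void) {
        int t1 = new_temp();
        int l0 = new_label();
        printf("_t%d = a > b\n", t1);          /* negated condition           */
        printf("if _t%d goto L%d\n", t1, l0);  /* skip the body when it holds */

        int t2 = new_temp();
        printf("_t%d = a - c\n", t2);          /* body of the if              */
        printf("a = _t%d\n", t2);

        printf("L%d:\n", l0);                  /* join point (symbolic address) */
        int t3 = new_temp();
        printf("_t%d = b * c\n", t3);          /* statement after the if      */
        printf("c = _t%d\n", t3);
        return 0;
    }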

Backend (Synthesis) Basic steps:
- Intermediate code optimization
- Object code generation
- Object code optimization
Synthesis is not as deterministic/predictable as analysis. Thus, synthesis must be conservative, and this is why optimization can be lengthy and not "perfect".

Intermediate Code Optimization Input is IR, output is optimized IR. What are some of the optimizations that can be performed?
- Algebraic simplifications (*1, /1, *0, factoring, etc.)
- Moving invariant code out of loops
- Removal of isolated (unreachable) code and of variables that are never used
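
As a small example of the first item, an algebraic-simplification pass can rewrite x = y + 0 or x = y * 1 into a plain copy. The C sketch below does this over a tiny, self-contained quadruple form; it illustrates the idea and is not a complete optimizer.

    #include <stdio.h>
    #include <string.h>

    /* Minimal quadruple form, repeated here so the sketch is self-contained. */
    typedef enum { OP_ADD, OP_MUL, OP_COPY } Op;
    typedef struct { Op op; char arg1[8]; char arg2[8]; char result[8]; } Instr;

    /* x = y + 0  and  x = y * 1  become  x = y  (a plain copy). */
    static void simplify(Instr *ins) {
        if (ins->op == OP_ADD && strcmp(ins->arg2, "0") == 0) { ins->op = OP_COPY; ins->arg2[0] = '\0'; }
        if (ins->op == OP_MUL && strcmp(ins->arg2, "1") == 0) { ins->op = OP_COPY; ins->arg2[0] = '\0'; }
    }

    int main(void) {
        Instr code[] = {
            { OP_MUL, "b",   "c", "_t1" },
            { OP_ADD, "_t1", "0", "_t2" },   /* candidate: adding zero        */
            { OP_MUL, "_t2", "1", "a"   },   /* candidate: multiplying by one */
        };
        static const char *opname[] = { "+", "*", "copy" };
        for (int i = 0; i < 3; i++) {
            simplify(&code[i]);
            if (code[i].op == OP_COPY)
                printf("%s = %s\n", code[i].result, code[i].arg1);
            else
                printf("%s = %s %s %s\n", code[i].result, code[i].arg1, opname[code[i].op], code[i].arg2);
        }
        return 0;
    }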

IR Optimization Optimizations take place on the IR and, later, on the actual machine code. However, the optimizations done at the IR stage can be applied to any program, regardless of architecture, whereas the optimizations done on machine/object code usually exploit some feature of the target architecture. What does this say about a JITC (just-in-time compilation) approach?

Example Before optimization (a redundant recomputation of b * c plus an addition of zero):

    _t1 = b * c
    _t2 = _t1 + 0
    _t3 = b * c
    _t4 = _t2 + _t3
    a = _t4

After optimization:

    _t1 = b * c
    _t2 = _t1 + _t1
    a = _t2

Object Code Generation The output of this stage is machine or assembly code. Variables are mapped to memory locations (variables are just a shorthand for those locations anyway), and actual machine instructions are swapped in for the symbolic ones.
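
For a register machine, that means picking a stack slot for each variable, loading it into a register, and emitting a symbolic instruction. The sketch below prints MIPS-like assembly for the TAC instruction _t1 = b + 1 from the compilation-process figure; the frame offsets, register choices, and helper names are invented for illustration.

    #include <stdio.h>
    #include <string.h>

    /* Invented frame layout: each named variable gets a slot off the frame pointer. */
    static int frame_offset(const char *var) {
        static const char *names[] = { "a", "b", "c" };
        for (int i = 0; i < 3; i++)
            if (strcmp(var, names[i]) == 0)
                return -8 - 8 * i;   /* a at -8($fp), b at -16($fp), c at -24($fp) */
        return 0;
    }

    /* Emit symbolic code for the TAC instruction  dst = src + k. */
    static void gen_add_const(const char *dst_reg, const char *src_var, int k) {
        printf("lw   $t1, %d($fp)\n", frame_offset(src_var));  /* load the variable */
        printf("add  %s, $t1, %d\n", dst_reg, k);              /* do the addition   */
    }

    int main(void) {
        gen_add_const("$t0", "b", 1);   /* _t1 = b + 1 */
        return 0;
    }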

Object Code Optimization
- May follow code generation
- Optional – only on demand, with variable levels (like IR optimization, it may be expensive)
- Exploits machine detail
- Examples: register pools, instruction pipelining