Review: Compiler Phases
Source program → Lexical analyzer → Syntax analyzer → Semantic analyzer → Intermediate code generator → Code optimizer → Code generator. A symbol table manager and an error handler interact with all of these phases, which are grouped into a front end and a back end.

Chapter 3: Lexical Analysis
The lexical analyzer reads input characters and produces a sequence of tokens as output (via nexttoken()).
– It tries to understand each element in a program.
– Token: a group of characters having a collective meaning.
Example: const pi = 3.1416;
Token 1: (const, -)
Token 2: (identifier, 'pi')
Token 3: (=, -)
Token 4: (realnumber, 3.1416)
Token 5: (;, -)
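To make these (token, attribute) pairs concrete, here is a minimal sketch in C of one possible token representation; the enum names, the struct layout, and the use of a string lexeme as the attribute are illustrative assumptions, not something prescribed by the slides.

    #include <stdio.h>

    /* One possible token representation (illustrative only). */
    typedef enum { TOK_CONST, TOK_IDENTIFIER, TOK_ASSIGN, TOK_REALNUMBER, TOK_SEMICOLON } TokenType;

    typedef struct {
        TokenType   type;     /* the token class */
        const char *lexeme;   /* attribute: the matched characters, or NULL if none is needed */
    } Token;

    int main(void)
    {
        /* The token stream for:  const pi = 3.1416; */
        Token stream[] = {
            { TOK_CONST,      NULL     },
            { TOK_IDENTIFIER, "pi"     },
            { TOK_ASSIGN,     NULL     },
            { TOK_REALNUMBER, "3.1416" },
            { TOK_SEMICOLON,  NULL     },
        };
        for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++)
            printf("Token %zu: (%d, %s)\n", i + 1, stream[i].type,
                   stream[i].lexeme ? stream[i].lexeme : "-");
        return 0;
    }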

Interaction of the lexical analyzer with the parser: the lexical analyzer reads the source program and, each time the parser calls nexttoken(), hands the next token to the parser; both the lexical analyzer and the parser consult the symbol table.

Some terminology:
– Token: a group of characters having a collective meaning. A lexeme is a particular instance of a token. E.g., token: identifier; lexeme: pi.
– Pattern: the rule describing how a token can be formed. E.g., identifier: ([a-z]|[A-Z])([a-z]|[A-Z]|[0-9])*
Lexical analysis does not have to be an individual phase, but having a separate phase simplifies the design and improves efficiency and portability.

Two issues in lexical analysis:
– How to specify tokens (patterns)?
– How to recognize tokens given a token specification (i.e., how to implement the nexttoken() routine)?
How to specify tokens:
– All the basic elements in a language must be tokens so that they can be recognized. Token types: constant, identifier, reserved word, operator, and miscellaneous symbol. For example, every element of the program below falls into one of these classes.
– Tokens are specified by regular expressions.

    #include <stdio.h>

    int main(void)
    {
        int i, j;
        for (i = 0; i < 50; i++) {
            printf("i = %d", i);
        }
        return 0;
    }
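To illustrate the second issue (implementing nexttoken() by hand), the following is a minimal sketch in C of a scanner over an in-memory string; the token names, the global input pointer, the fixed-size lexeme buffer, and the treatment of every other character as a one-character symbol token are simplifying assumptions, and it does not separate reserved words from identifiers. The driver loop in main() also shows the pull-style interaction with the parser described earlier.

    #include <ctype.h>
    #include <stdio.h>

    typedef enum { TOK_IDENTIFIER, TOK_NUMBER, TOK_SYMBOL, TOK_EOF } TokenType;

    typedef struct {
        TokenType type;
        char      lexeme[64];
    } Token;

    static const char *input;   /* current position in the source text */

    /* Returns the next token each time it is called. */
    static Token nexttoken(void)
    {
        Token t = { TOK_EOF, "" };
        size_t n = 0;
        while (isspace((unsigned char)*input))        /* skip white space */
            input++;
        if (*input == '\0')
            return t;
        if (isalpha((unsigned char)*input)) {         /* identifier: letter (letter|digit)* */
            t.type = TOK_IDENTIFIER;
            while (isalnum((unsigned char)*input) && n < sizeof t.lexeme - 1)
                t.lexeme[n++] = *input++;
        } else if (isdigit((unsigned char)*input)) {  /* integer constant: digit digit* */
            t.type = TOK_NUMBER;
            while (isdigit((unsigned char)*input) && n < sizeof t.lexeme - 1)
                t.lexeme[n++] = *input++;
        } else {                                      /* anything else: one-character symbol */
            t.type = TOK_SYMBOL;
            t.lexeme[n++] = *input++;
        }
        t.lexeme[n] = '\0';
        return t;
    }

    int main(void)
    {
        input = "for (i = 0; i < 50; i++)";
        for (Token t = nexttoken(); t.type != TOK_EOF; t = nexttoken())
            printf("(%d, \"%s\")\n", t.type, t.lexeme);   /* a parser would consume these */
        return 0;
    }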

Some definitions:
– Alphabet: a finite set of symbols, e.g. {a, b, c}.
– A string over an alphabet is a finite sequence of symbols drawn from that alphabet (a string is sometimes also called a sentence or a word).
– A language is a set of strings over an alphabet.
– Operations on languages (which are sets):
  Union of L and M: L ∪ M = {s | s is in L or s is in M}
  Concatenation of L and M: LM = {st | s is in L and t is in M}
  Kleene closure of L: L* = zero or more concatenations of L, i.e. {ε} ∪ L ∪ LL ∪ LLL ∪ …
  Positive closure of L: L+ = one or more concatenations of L, i.e. L ∪ LL ∪ LLL ∪ …
– Example: L = {aa, bb, cc}, M = {abc}; see the worked example below.
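As a worked illustration of these operations (not on the original slide), applying them to L = {aa, bb, cc} and M = {abc} gives, in LaTeX form:

\[
\begin{aligned}
L \cup M &= \{aa,\ bb,\ cc,\ abc\} \\
LM &= \{aaabc,\ bbabc,\ ccabc\} \\
ML &= \{abcaa,\ abcbb,\ abccc\} \\
L^{*} &= \{\varepsilon\} \cup L \cup LL \cup \cdots = \{\varepsilon,\ aa,\ bb,\ cc,\ aaaa,\ aabb,\ \ldots\} \\
L^{+} &= L\,L^{*} = \{aa,\ bb,\ cc,\ aaaa,\ aabb,\ \ldots\}
\end{aligned}
\]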

Formal definition of regular expressions: Given an alphabet Σ,
(1) ε is a regular expression that denotes {ε}, the set that contains only the empty string.
(2) For each a ∈ Σ, a is a regular expression denoting {a}, the set containing the string a.
(3) If r and s are regular expressions denoting the languages (sets) L(r) and L(s), then:
– (r) | (s) is a regular expression denoting L(r) ∪ L(s)
– (r)(s) is a regular expression denoting L(r)L(s)
– (r)* is a regular expression denoting (L(r))*
A regular expression is thus defined together with the language it denotes.
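As a worked example of this inductive definition (not on the original slide), the language denoted by the regular expression a | a*b over the alphabet {a, b} can be built up rule by rule:

\[
\begin{aligned}
L(a) &= \{a\}, \qquad L(b) = \{b\} \\
L(a^{*}) &= (L(a))^{*} = \{\varepsilon,\ a,\ aa,\ aaa,\ \ldots\} \\
L(a^{*}b) &= L(a^{*})\,L(b) = \{b,\ ab,\ aab,\ aaab,\ \ldots\} \\
L(a \mid a^{*}b) &= L(a) \cup L(a^{*}b) = \{a,\ b,\ ab,\ aab,\ aaab,\ \ldots\}
\end{aligned}
\]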

Examples:
– Let the alphabet be {a, b}. Then a | b, (a | b)a*, (a | b)*, and a | a*b are regular expressions over it.
– We assume that '*' has the highest precedence and is left associative, concatenation has the second highest precedence and is left associative, and '|' has the lowest precedence and is left associative. With these conventions, (a) | ((b)*(c)) can be written as a | b*c.

Regular definitions:
– A regular definition gives names to regular expressions so that more complicated regular expressions can be built from them:
  d1 -> r1
  d2 -> r2
  …
  dn -> rn
– Example:
  letter -> A | B | C | … | Z | a | b | … | z
  digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  identifier -> letter (letter | digit)*
– More examples: integer constants, string constants, reserved words, operators, real constants.
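To show how such a definition can be turned into code, here is a minimal hand-written sketch in C that checks the identifier pattern letter (letter | digit)*; the function name is_identifier and the use of the ctype.h classification functions are assumptions for illustration, not part of the original slides.

    #include <ctype.h>   /* isalpha, isdigit */
    #include <stdio.h>

    /* Returns 1 if s matches letter (letter | digit)*, 0 otherwise. */
    static int is_identifier(const char *s)
    {
        if (!isalpha((unsigned char)s[0]))       /* first symbol must be a letter */
            return 0;
        for (int i = 1; s[i] != '\0'; i++)       /* remaining symbols: letter or digit */
            if (!isalpha((unsigned char)s[i]) && !isdigit((unsigned char)s[i]))
                return 0;
        return 1;
    }

    int main(void)
    {
        printf("%d %d %d\n", is_identifier("pi"), is_identifier("x1"), is_identifier("1x"));
        /* prints: 1 1 0 */
        return 0;
    }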