Review: Compiler Phases:

Review: Compiler Phases
Source program
    Front end:
    - Lexical analyzer
    - Syntax analyzer
    - Semantic analyzer
    - Intermediate code generator
    Back end:
    - Code optimizer
    - Code generator
The symbol table manager and the error handler interact with all phases.

Chapter 3: Lexical Analysis
Lexical analyzer: reads input characters and produces a sequence of tokens as output (via a nexttoken() routine).
- Token: a group of characters having a collective meaning, e.g. the token identifier.
- Lexeme: a particular instance of a token, e.g. the lexemes position and init for the token identifier.
- Pattern: the rule describing how a token can be formed, e.g. identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*
The lexical analyzer does not have to be a separate phase, but having a separate phase simplifies the design and improves the efficiency and portability of the compiler.
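To make the nexttoken() interface concrete, here is a minimal hand-written sketch in C. The TOK_* token codes and the struct layout are illustrative assumptions, not something defined on the slides; it recognizes only identifiers, integer constants, and single-character symbols.

    /* Minimal sketch of a hand-written nexttoken() routine.
     * Token codes and struct layout are illustrative assumptions. */
    #include <stdio.h>
    #include <ctype.h>

    enum token_code { TOK_EOF, TOK_ID, TOK_NUM, TOK_OTHER };

    struct token {
        enum token_code code;  /* the token (its class) */
        char lexeme[64];       /* the lexeme (the matched characters) */
    };

    struct token nexttoken(FILE *in)
    {
        struct token t = { TOK_EOF, "" };
        int c = fgetc(in);
        while (c != EOF && isspace(c))          /* skip whitespace */
            c = fgetc(in);
        if (c == EOF)
            return t;

        size_t n = 0;
        if (isalpha(c)) {                       /* ([a-z]|[A-Z])([a-z]|[A-Z]|[0-9])* */
            t.code = TOK_ID;
            while (c != EOF && (isalpha(c) || isdigit(c)) && n < 63) {
                t.lexeme[n++] = (char)c;
                c = fgetc(in);
            }
        } else if (isdigit(c)) {                /* integer constant */
            t.code = TOK_NUM;
            while (c != EOF && isdigit(c) && n < 63) {
                t.lexeme[n++] = (char)c;
                c = fgetc(in);
            }
        } else {                                /* any other single character */
            t.code = TOK_OTHER;
            t.lexeme[n++] = (char)c;
            c = fgetc(in);
        }
        if (c != EOF)
            ungetc(c, in);                      /* push back the lookahead character */
        t.lexeme[n] = '\0';
        return t;
    }

    int main(void)                              /* small driver: tokenize stdin */
    {
        struct token t;
        while ((t = nexttoken(stdin)).code != TOK_EOF)
            printf("token %d, lexeme \"%s\"\n", (int)t.code, t.lexeme);
        return 0;
    }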

Two issues in lexical analysis:
- How to specify tokens (patterns)?
- How to recognize the tokens given a token specification (i.e., how to implement the nexttoken() routine)?
How to specify tokens: all the basic elements in a language must be tokens so that they can be recognized. Token types: constant, identifier, reserved word, operator, and miscellaneous symbol. Tokens are specified by regular expressions.

    #include <stdio.h>

    int main() {
        int i, j;
        for (i = 0; i < 50; i++) {
            printf("i = %d", i);
        }
    }
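For instance, a scanner would break the small program above into tokens along these lines (grouped by the token types listed above):
- reserved words: int, for
- identifiers: main, i, j, printf
- constants: 0, 50, and the string constant "i = %d"
- operators: =, <, ++
- miscellaneous symbols: ( ) { } , ;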

Some definitions:
- Alphabet: a finite set of symbols, e.g. {a, b, c}.
- A string over an alphabet is a finite sequence of symbols drawn from that alphabet (a string is sometimes also called a sentence or a word).
- A language is a set of strings over an alphabet.
Operations on languages (which are sets):
- Union of L and M: L U M = {s | s is in L or s is in M}
- Concatenation of L and M: LM = {st | s is in L and t is in M}
- Kleene closure of L: L* = L^0 U L^1 U L^2 U ..., where L^0 = {ε}, L^1 = L, L^2 = LL, L^3 = L^2 L, ...
- Positive closure of L: L+ = L^1 U L^2 U ...
Example: L = {aa, bb, cc}, M = {abc}.
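Working these operations out for the example sets (the results below follow directly from the definitions; they are not on the slide):
- L U M = {aa, bb, cc, abc}
- LM = {aaabc, bbabc, ccabc}
- ML = {abcaa, abcbb, abccc}
- L^2 = {aaaa, aabb, aacc, bbaa, bbbb, bbcc, ccaa, ccbb, cccc}
- L* = {ε, aa, bb, cc, aaaa, aabb, ...}: every concatenation of zero or more strings taken from L
- L+ = L* without ε: every concatenation of one or more strings taken from L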

Regular expressions: e.g. in the shell command ls a*.c, what does a*.c mean?
Formal definition: given an alphabet Σ,
(1) ε is a regular expression that denotes {ε}, the set containing only the empty string.
(2) For each a in Σ, a is a regular expression denoting {a}, the set containing only the string a.
(3) If r and s are regular expressions denoting the languages (sets) L(r) and L(s), then:
- (r) | (s) is a regular expression denoting L(r) U L(s)
- (r)(s) is a regular expression denoting L(r)L(s)
- (r)* is a regular expression denoting (L(r))*

Examples: let the alphabet be Σ = {a, b}. The following are regular expressions over Σ:
- a | b
- (a | b)(a | b)
- a*
- (a | b)*
- a | a*b
We assume that '*' has the highest precedence and is left associative, concatenation has the second highest precedence and is left associative, and '|' has the lowest precedence and is left associative. For example, (a) | ((b)*(c)) can therefore be written as a | b*c.
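Spelling out the languages these example expressions denote (derived from the definitions above; not stated on the slide):
- a | b denotes {a, b}
- (a | b)(a | b) denotes {aa, ab, ba, bb}
- a* denotes {ε, a, aa, aaa, ...}: zero or more a's
- (a | b)* denotes the set of all strings over {a, b}, including ε
- a | a*b denotes {a, b, ab, aab, aaab, ...}: the string a, or zero or more a's followed by a single b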

Regular definitions: give names to regular expressions so that more complicated regular expressions can be built from them:
    d1 -> r1
    d2 -> r2
    ...
    dn -> rn
Example:
    letter -> A | B | C | ... | Z | a | b | ... | z
    digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    identifier -> letter (letter | digit)*
More examples: integer constants, string constants, reserved words, operators, real constants (a sketch of some of these follows below).
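One possible set of definitions for those further examples (a sketch of a common formulation; the exact rules depend on the language being scanned):
    digits -> digit digit*
    int_const -> digits
    real_const -> digits . digits ( E ( + | - | ε ) digits | ε )
    string_const -> " (any character other than ")* "
    relop -> < | <= | <> | = | > | >=
Reserved words such as if, while, and for are usually listed literally, one definition (or one Lex rule) per word.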

Lex -- a Lexical Analyzer Generator
A Lex source program has three sections:
    {definitions}
    %%
    {rules}
    %%
    {user subroutines}
Rules have the form:
    <regular expression>    <action>
Each regular expression specifies a token. The action is a C source fragment specifying what to do when that token is recognized.
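A minimal sketch of such a Lex specification (the token codes ID and NUM are illustrative assumptions; in practice they usually come from a parser-generated header):

    %{
    /* Sketch only: ID and NUM are assumed token codes. */
    #define ID  256
    #define NUM 257
    %}
    letter  [a-zA-Z]
    digit   [0-9]
    %%
    [ \t\n]+                      { /* skip whitespace */ }
    {letter}({letter}|{digit})*   { return ID; }
    {digit}+                      { return NUM; }
    .                             { return yytext[0]; }
    %%
    int yywrap(void) { return 1; }  /* tells the scanner to stop at end of input */

Running lex (or flex) on this file produces a C file, lex.yy.c, containing the yylex() routine described on the next slide.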

Lex -- a Lexical Analyzer Generator
Lex automatically generates a nexttoken routine (yylex()) that recognizes all tokens specified in the rules and performs the corresponding action (typically returning a token code) when a token is recognized. For some tokens, the token code alone is not sufficient; additional information is passed to the parser through global variables. See the example.
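A sketch of how that additional information is commonly passed, extending the specification above (yylval and yytext follow the usual Lex/Yacc convention; declaring yylval by hand as below is only needed when the scanner is used without Yacc):

    %{
    #include <stdlib.h>
    #define NUM 257
    int yylval;                     /* global carrying the token's value */
    %}
    digit   [0-9]
    %%
    [ \t\n]+    { /* skip whitespace */ }
    {digit}+    { yylval = atoi(yytext); return NUM; }  /* token code + value */
    %%
    int yywrap(void) { return 1; }

The parser (or a test driver such as the hypothetical one below) simply calls yylex() repeatedly and reads yylval, yytext, etc. after each call:

    /* Hypothetical test driver; with a real parser, yyparse() would be the caller. */
    #include <stdio.h>
    extern int yylex(void);
    extern int yylval;
    extern char *yytext;            /* lexeme of the most recent token (flex declares it as char *) */

    int main(void)
    {
        int tok;
        while ((tok = yylex()) != 0)        /* yylex() returns 0 at end of input */
            printf("token code %d, lexeme \"%s\", value %d\n", tok, yytext, yylval);
        return 0;
    }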