Lexical Analysis Mooly Sagiv Schrierber 317 03-640-7606 Wed 10:00-12:00 html://www.math.tau.ac.il/~msagiv/courses/wcc.html Textbook:Modern.

Presentation transcript:

Lexical Analysis Mooly Sagiv Schrierber Wed 10:00-12:00 html:// Textbook:Modern Compiler Implementation in C Chapter 2

A motivating example
Create a program that counts the number of lines in a given input file.

A motivating example: solution

    %{
    #include <stdio.h>
    int num_lines = 0;
    %}
    %%
    \n      ++num_lines;
    .       ;
    %%
    int main(void) { yylex(); printf("# of lines = %d\n", num_lines); return 0; }
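For comparison, the same task can be written by hand in plain C. This is only a sketch: the function name `count_lines` is made up for illustration and does not appear on the slides. It mirrors the two rules of the Flex solution: a newline increments the counter, and any other character is ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Count the newline characters read from `in` until end of file.
 * (Hypothetical helper, not part of the original slides.) */
long count_lines(FILE *in)
{
    long num_lines = 0;
    int c;                      /* int rather than char, so EOF is representable */

    assert(in != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == '\n')
            ++num_lines;        /* same effect as the "\n" rule */
        /* every other character is skipped, like the "." rule */
    }
    return num_lines;
}
```

Calling `count_lines(stdin)` from `main` and printing the result would give a program comparable to the generated scanner.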

Subjects
– Roles of lexical analysis
– The straightforward solution: a manual scanner for C
– Regular Expressions
– Finite automata
– From regular languages into finite automata
– Flex

Basic Compiler Phases
Source program (string)
  → lexical analysis → Tokens
  → syntax analysis → Abstract syntax tree
  → semantic analysis, Translate → Intermediate representation
  → Instruction selection → Assembly
  → Register Allocation → Fin. Assembly
Techniques named on the slide: finite automata, pushdown automata, memory organization, graph algorithms, dynamic programming

Example
Input string:
    a := 5 + 3 ;\nb := (print(a, a-1), 10 * a) ;\nprint(b)
Tokens:
    id("a") assign num(5) + num(3) ;
    id("b") assign (print(id("a"), id("a") - num(1)), num(10) * id("a")) ;
    print(id("b"))

Lexical Analysis (Scanning): Functionality
– input: program text (file)
– output: sequence of tokens
– Read input file
– Identify language keywords and standard identifiers
– Handle include files and macros
– Count line numbers
– Remove whitespace
– Report illegal symbols
– Produce symbol table

A simplified scanner for C

    Token nextToken()
    {
        int c;                          /* getchar() returns int */
    loop:
        c = getchar();
        switch (c) {
        case ' ': goto loop;            /* skip blanks */
        case ';': return SemiColumn;
        case '+':
            c = getchar();              /* one character of lookahead */
            switch (c) {
            case '+': return PlusPlus;
            case '=': return PlusEqual;
            default:  ungetc(c, stdin); /* push the lookahead back */
                      return Plus;
            }
        case '<':
        case 'w':
        }
    }

Automatic Generation of Lexical Analysis
The matching of input strings can be performed by a finite automaton.
Examples:
– An automaton for while
– An automaton for C identifier
– An automaton for C comment
The program for the automaton is automatically generated from regular expressions.
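Two of the automata named above can be sketched directly in C. The function names and the state numbering are assumptions made for this illustration, and the identifier pattern `[a-zA-Z_][a-zA-Z_0-9]*` is the usual C rule rather than something taken from the slide. Each function walks the string once, keeping a single integer state, which is all a deterministic automaton needs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* DFA for a C identifier. State 0 = start, state 1 = accepting. */
bool is_c_identifier(const char *s)
{
    int state = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (state == 0 && (isalpha(c) || c == '_'))
            state = 1;
        else if (state == 1 && (isalnum(c) || c == '_'))
            state = 1;
        else
            return false;           /* no transition: dead state */
    }
    return state == 1;
}

/* DFA for one complete C comment.
 * 0: start, 1: saw '/', 2: inside comment,
 * 3: inside comment and just saw '*', 4: accepting. */
bool is_c_comment(const char *s)
{
    int state = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        char c = *s;
        switch (state) {
        case 0: if (c == '/') state = 1; else return false; break;
        case 1: if (c == '*') state = 2; else return false; break;
        case 2: if (c == '*') state = 3; break;
        case 3: if (c == '/') state = 4; else if (c != '*') state = 2; break;
        default: return false;      /* input continues after the comment closed */
        }
    }
    return state == 4;
}
```

The comment automaton needs state 3 to remember a pending `*`; this is the part that is awkward to express as a single regular expression.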

Flex
Input – regular expressions and actions (C code)
Output – a scanner program that reads the input and applies actions when an input regular expression is matched
    regular expressions → flex → scanner
    input program → scanner → tokens

Regular Expression Notations

    a          An ordinary character stands for itself
    M|N        M or N
    MN         M followed by N
    M*         Zero or more occurrences of M
    M+         One or more occurrences of M
    M?         Zero or one occurrence of M
    [a-zA-Z]   Character set alternation (single character)
    .          Any (single) character but newline
    "a.+"      Quotation
    \          Convert an operator into text
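Most of these operators can be tried out with the POSIX `regcomp`/`regexec` functions. This is a sketch under assumptions: POSIX extended syntax is a different dialect from Flex (it has no `"..."` quotation, for example), and the helper name `full_match` is invented here. The helper anchors the pattern so that it must match the whole string.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true when `pattern` (POSIX extended syntax) matches all of `text`.
 * (Hypothetical helper for experimenting with the notation.) */
bool full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    bool ok;

    /* Parenthesise the pattern so that '|' cannot escape the anchors. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return false;               /* pattern too long for the buffer */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;               /* pattern did not compile */
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For instance, `full_match("ab+", "abbb")` holds while `full_match("ab+", "a")` does not, and `full_match("a\\.c", "abc")` fails because the backslash turns the `.` operator into ordinary text.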

Ambiguity Resolving
– Find the longest matching token
– Between two tokens with the same length, use the one declared first
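The two rules amount to a small selection procedure. The sketch below assumes that some earlier step has already computed, for each rule in declaration order, the length of the longest prefix it matches; the name `pick_rule` and the convention that length 0 means "no match" are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* match_len[i] is the length of the longest input prefix matched by rule i,
 * with 0 meaning that rule i does not match. Rules are in declaration order.
 * Returns the index of the winning rule, or -1 when no rule matches. */
int pick_rule(const size_t match_len[], size_t num_rules)
{
    int best = -1;
    size_t best_len = 0;

    assert(match_len != NULL || num_rules == 0);
    for (size_t i = 0; i < num_rules; ++i) {
        /* Strict '>' keeps the earlier rule when two lengths are equal. */
        if (match_len[i] > best_len) {
            best_len = match_len[i];
            best = (int)i;
        }
    }
    return best;
}
```

With a keyword rule declared before an identifier rule, the input `if8` gives lengths {2, 3}, so the identifier rule wins, while `if` gives {2, 2} and the keyword rule wins.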

A Flex specification of C Scanner

    Letter  [a-zA-Z_]
    Digit   [0-9]
    %%
    [ \t]                         { ; }
    [\n]                          { line_count++; }
    ";"                           { return SemiColumn; }
    "++"                          { return PlusPlus; }
    "+="                          { return PlusEqual; }
    "+"                           { return Plus; }
    "while"                       { return While; }
    {Letter}({Letter}|{Digit})*   { return Id; }
    "<="                          { return LessOrEqual; }
    "<"                           { return LessThen; }

Running Example

    if                                    { return IF; }
    [a-z][a-z0-9]*                        { return ID; }
    [0-9]+                                { return NUM; }
    ([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)   { return REAL; }
    (\-\-[a-z]*\n)|(" "|\n|\t)            { ; }
    .                                     { error(); }

    int edges[][256] = {
        /*               ..., 0, 1, 2, 3, ..., -, e, f, g, h, i, j, ... */
        /* state 0 */  {0, ..., 0, 0, ..., 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0},
        /* state 1 */  {13, ..., 7, 7, 7, 7, ..., 9, 4, 4, 4, 4, 2, 4, ..., 13, 13},
        /* state 2 */  {0, ..., 4, 4, 4, 4, ..., 0, 4, 3, 4, 4, 4, 4, ..., 0, 0},
        /* state 3 */  {0, ..., 4, 4, 4, 4, ..., 0, 4, 4, 4, 4, 4, 4, ..., 0, 0},
        /* state 4 */  {0, ..., 4, 4, 4, 4, ..., 0, 4, 4, 4, 4, 4, 4, ..., 0, 0},
        /* state 5 */  {0, ..., 6, 6, 6, 6, ..., 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
        /* state 6 */  {0, ..., 6, 6, 6, 6, ..., 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
        /* state 7 */  ...
        /* state 13 */ {0, ..., 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0, 0, ..., 0, 0}
    };

Pseudo Code for Scanner

    Token nextToken()
    {
        lastFinal = 0;
        currentState = 1;
        inputPositionAtLastFinal = input;
        currentPosition = input;
        while (not(isDead(currentState))) {
            nextState = edges[currentState][*currentPosition];
            if (isFinal(nextState)) {
                lastFinal = nextState;
                inputPositionAtLastFinal = currentPosition;
            }
            currentState = nextState;
            advance currentPosition;
        }
        input = inputPositionAtLastFinal;
        return action[lastFinal];
    }
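A runnable variant of this loop is sketched below for a reduced rule set (`if`, `[a-z][a-z0-9]*`, `[0-9]+`). Everything here is an assumption made for illustration: the six-state table, the token names, `init_edges`, and `next_token` are not from the slides, and the code relies on ASCII so that letters and digits are contiguous. One deliberate difference from the pseudo code: the remembered position is the one just after the last accepted character, so scanning resumes directly behind the lexeme.

```c
#include <assert.h>
#include <stddef.h>

enum { DEAD = 0, START = 1, NUM_STATES = 6 };
enum token { T_ERROR, T_IF, T_ID, T_NUM };

static int edges[NUM_STATES][256];      /* zero-initialised: every edge is DEAD */

/* Token returned for each final state; T_ERROR marks a non-final state. */
static const enum token action[NUM_STATES] =
    { T_ERROR, T_ERROR, T_ID, T_IF, T_ID, T_NUM };

/* Fill the table: state 2 = "i", 3 = "if", 4 = identifier, 5 = number. */
void init_edges(void)
{
    for (int c = 'a'; c <= 'z'; ++c) {
        edges[START][c] = 4;
        edges[2][c] = edges[3][c] = edges[4][c] = 4;
    }
    for (int c = '0'; c <= '9'; ++c) {
        edges[START][c] = 5;
        edges[2][c] = edges[3][c] = edges[4][c] = 4;
        edges[5][c] = 5;
    }
    edges[START]['i'] = 2;              /* 'i' may begin the keyword */
    edges[2]['f'] = 3;
}

/* Scan one token starting at *input and advance *input past the longest
 * match. On T_ERROR, *input is left unchanged. */
enum token next_token(const char **input)
{
    int last_final = DEAD;
    int state = START;
    const char *pos, *pos_after_last_final;

    assert(input != NULL && *input != NULL);
    pos = pos_after_last_final = *input;
    while (state != DEAD && *pos != '\0') {
        state = edges[state][(unsigned char)*pos];
        ++pos;
        if (action[state] != T_ERROR) { /* final state: remember it */
            last_final = state;
            pos_after_last_final = pos;
        }
    }
    *input = pos_after_last_final;      /* back up to the last accepted prefix */
    return action[last_final];
}
```

After `init_edges()`, scanning `if8x` yields a single identifier token, whereas scanning `if+` yields the keyword token and leaves the input pointing at `+`.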

Example Input: “if --not-a-com”

Efficient Scanners
– Efficient state representation
– Input buffering
– Using switch and goto instead of tables
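The last point can be illustrated with a direct-coded recognizer, where each label is one automaton state and each `goto` is one transition, so no table lookup is needed. This is a sketch: the function `classify_number`, its enum, and the patterns it assumes (`[0-9]+` for NUM and `([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)` for REAL) are illustrative choices, not code from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum num_kind { K_NONE, K_NUM, K_REAL };

/* Classify the whole string s as NUM, REAL, or neither. */
enum num_kind classify_number(const char *s)
{
    assert(s != NULL);

    /* start state */
    if (isdigit((unsigned char)*s)) { ++s; goto in_int; }
    if (*s == '.')                  { ++s; goto after_lone_dot; }
    return K_NONE;

in_int:             /* digits seen so far: accepting state for NUM */
    if (isdigit((unsigned char)*s)) { ++s; goto in_int; }
    if (*s == '.')                  { ++s; goto in_frac; }
    return *s == '\0' ? K_NUM : K_NONE;

after_lone_dot:     /* "." with no digits before it: a digit must follow */
    if (isdigit((unsigned char)*s)) { ++s; goto in_frac; }
    return K_NONE;

in_frac:            /* a "." and enough digits: accepting state for REAL */
    if (isdigit((unsigned char)*s)) { ++s; goto in_frac; }
    return *s == '\0' ? K_REAL : K_NONE;
}
```

Scanner generators can emit code of roughly this shape instead of a table; the trade-off is larger code in exchange for fewer memory accesses per character.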

Constructing Automaton from Specification
– Create a non-deterministic automaton (NDFA) from every regular expression
– Merge all the automata using epsilon moves (like the | construction)
– Construct a deterministic finite automaton (DFA)
– Minimize the automaton, starting with separate accepting states
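The third step is the subset construction, sketched below for a tiny two-symbol alphabet. The data layout is an assumption chosen for brevity: sets of NDFA states are bit masks in a `uint32_t`, which limits the NDFA to 32 states, and all struct and function names are hypothetical. Minimization is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NFA 32          /* NDFA states are the bits of a uint32_t */
#define MAX_DFA 64
#define NSYM    2           /* alphabet {a, b} encoded as symbols 0 and 1 */

struct nfa {
    int n;                              /* number of states */
    uint32_t eps[MAX_NFA];              /* states reachable by one epsilon move */
    uint32_t delta[MAX_NFA][NSYM];      /* states reachable on each symbol */
};

struct dfa {
    int n;                              /* number of DFA states found */
    uint32_t set[MAX_DFA];              /* NDFA state set behind each DFA state */
    int next[MAX_DFA][NSYM];            /* -1 means no transition */
};

/* Grow `set` until no epsilon move adds a new state. */
uint32_t eps_closure(const struct nfa *m, uint32_t set)
{
    uint32_t prev;
    do {
        prev = set;
        for (int s = 0; s < m->n; ++s)
            if (set & (UINT32_C(1) << s))
                set |= m->eps[s];
    } while (set != prev);
    return set;
}

/* Union of delta[s][x] over all states s in `set`. */
static uint32_t step(const struct nfa *m, uint32_t set, int x)
{
    uint32_t out = 0;
    for (int s = 0; s < m->n; ++s)
        if (set & (UINT32_C(1) << s))
            out |= m->delta[s][x];
    return out;
}

/* Return the DFA state number for `set`, creating it if it is new. */
static int intern(struct dfa *d, uint32_t set)
{
    for (int i = 0; i < d->n; ++i)
        if (d->set[i] == set)
            return i;
    assert(d->n < MAX_DFA);
    d->set[d->n] = set;
    return d->n++;
}

void subset_construction(const struct nfa *m, int start, struct dfa *d)
{
    assert(m->n <= MAX_NFA && start >= 0 && start < m->n);
    d->n = 0;
    intern(d, eps_closure(m, UINT32_C(1) << start));
    for (int i = 0; i < d->n; ++i) {    /* d->n grows as new sets are found */
        for (int x = 0; x < NSYM; ++x) {
            uint32_t t = eps_closure(m, step(m, d->set[i], x));
            d->next[i][x] = t ? intern(d, t) : -1;
        }
    }
}
```

A DFA state is accepting when its set contains an accepting NDFA state; if it contains accepting states of several rules, the rule declared first is chosen, which is how the tie-breaking rule carries over to the DFA.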

NDFA Construction

    if                                    { return IF; }
    [a-z][a-z0-9]*                        { return ID; }
    [0-9]+                                { return NUM; }
    ([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)   { return REAL; }
    (\-\-[a-z]*\n)|(" "|\n|\t)            { ; }
    .                                     { error(); }

DFA Construction

Minimization

    %{
    /* C declarations */
    #include "tokens.h"     /* Mapping of tokens into integers */
    #include "errormsg.h"   /* Shared by all the phases */
    union {int ival; string sval; double fval;} yylval;
    int charPos = 1;
    #define ADJ (EM_tokPos=charPos, charPos+=yyleng)
    %}
    /* Lex Definitions */
    digits [0-9]+
    %%
    if                                            { ADJ; return IF; }
    [a-z][a-z0-9]*                                { ADJ; yylval.sval=String(yytext); return ID; }
    {digits}                                      { ADJ; yylval.ival=atoi(yytext); return NUM; }
    ({digits}\.{digits}?)|({digits}?\.{digits})   { ADJ; yylval.fval=atof(yytext); return REAL; }
    (\-\-[a-z]*\n)|([\n\t]|" ")*                  { ADJ; }
    .                                             { ADJ; EM_error("illegal character"); }

Start States
Regular expressions may be more complicated than automata
– C comments
Solutions
– Conversion of automata into regular expressions
– Start States

    %start s1 s2
    %%
    r1  { action0; BEGIN s1; }
    r1  { action1; BEGIN s2; }
    r2  { action2; BEGIN INITIAL; }

Realistic Example

    %start Comment
    %%
    <INITIAL>"/*"   { BEGIN Comment; }
    <INITIAL>r1     { Usual actions; }
    <INITIAL>r2     { Usual actions; }
    ...
    <INITIAL>rk     { Usual actions; }
    <Comment>"*/"   { BEGIN INITIAL; }
    <Comment>.|\n   ;
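The effect of a start state can be imitated by hand with an explicit state variable, as in the sketch below. The names `strip_comments`, `INITIAL_STATE`, and `COMMENT_STATE` are invented for this illustration, and copying characters to an output buffer stands in for the usual actions. The sketch ignores string and character literals, which a real C scanner would have to treat separately.

```c
#include <assert.h>
#include <stddef.h>

enum start_state { INITIAL_STATE, COMMENT_STATE };

// Copy src to dst without its C comments. dst must have room for
// strlen(src) + 1 bytes. Returns the number of characters written,
// not counting the terminating NUL.
size_t strip_comments(const char *src, char *dst)
{
    enum start_state state = INITIAL_STATE;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; src[i] != '\0'; ++i) {
        if (state == INITIAL_STATE) {
            if (src[i] == '/' && src[i + 1] == '*') {
                state = COMMENT_STATE;  // comment opener: switch state
                ++i;                    // consume the '*' as well
            } else {
                dst[out++] = src[i];    // stands in for the usual actions
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                state = INITIAL_STATE;  // comment closer: switch back
                ++i;                    // consume the '/' as well
            }
            // any other character inside a comment is discarded
        }
    }
    dst[out] = '\0';
    return out;
}
```

The state variable plays the role of the current start state: which characters are significant depends on it, just as the active rules in the Flex specification depend on `BEGIN`.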

Summary
– For most programming languages, lexical analyzers can be easily constructed
– Exceptions: Fortran, PL/1
– Flex is a useful tool beyond compilers