Lexical Analysis - An Introduction

The Front End
The purpose of the front end is to deal with the input language
  Perform a membership test: code ∈ source language?
  Is the program well-formed (semantically)?
  Build an IR version of the code for the rest of the compiler
Figure: Source code → Front End → IR → Back End → Machine code, with Errors reported

The Front End: Scanner
  Maps stream of characters into words
    Basic unit of syntax
    x = x + y ; becomes <id,x> = <id,x> + <id,y> ;
  Characters that form a word are its lexeme
  Its part of speech (or syntactic category) is called its token type
  Scanner discards white space & (often) comments
  Speed is an issue in scanning ⇒ use a specialized recognizer
Figure: Source code → Scanner → tokens → Parser → IR, with Errors reported
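To make the lexeme / token type distinction concrete, here is a minimal sketch of a hand-written scanner for inputs like `x = x + y ;`. The function name `scan` and the three token types (`id`, `number`, `symbol`) are assumptions chosen for illustration; the slides do not prescribe them.

```python
def scan(text):
    """Map a stream of characters into (token_type, lexeme) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # The scanner discards white space.
            i += 1
        elif ch.isalpha():
            # Identifier: a letter followed by letters or digits.
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            tokens.append(("id", text[i:j]))
            i = j
        elif ch.isdigit():
            # Number: one or more digits.
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("number", text[i:j]))
            i = j
        else:
            # Anything else is treated as a one-character symbol (assumption).
            tokens.append(("symbol", ch))
            i += 1
    return tokens


print(scan("x = x + y ;"))
```

Each pair keeps the lexeme next to its token type, so a parser can check grammar using only the first component.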

The Front End: Parser
  Checks stream of classified words (parts of speech) for grammatical correctness
  Determines if code is syntactically well-formed
  Guides checking at deeper levels than syntax
  Builds an IR representation of the code
Figure: Source code → Scanner → tokens → Parser → IR, with Errors reported

The Big Picture
Language syntax is specified with parts of speech, not words
Syntax checking matches parts of speech against a grammar
  1. goal → expr
  2. expr → expr op term
  3.      | term
  4. term → number
  5.      | id
  6. op   → +
  7.      | -
  S = goal
  T = { number, id, +, - }
  N = { goal, expr, term, op }
  P = { 1, 2, 3, 4, 5, 6, 7 }
No words here! Parts of speech, not words!
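The grammar above is left-recursive, but the language it generates is simply a term followed by zero or more (op term) pairs. The sketch below checks that shape directly on a list of parts of speech. It is not a general parser, and the name `matches_goal` is an assumption made for this example.

```python
TERMS = {"number", "id"}   # term -> number | id
OPS = {"+", "-"}           # op   -> + | -


def matches_goal(parts_of_speech):
    """Return True if the sequence can be derived from goal."""
    # Every sentence starts with a term.
    if not parts_of_speech or parts_of_speech[0] not in TERMS:
        return False
    rest = parts_of_speech[1:]
    # After the first term, the input must consist of (op, term) pairs.
    if len(rest) % 2 != 0:
        return False
    for k in range(0, len(rest), 2):
        if rest[k] not in OPS or rest[k + 1] not in TERMS:
            return False
    return True


print(matches_goal(["id", "+", "number"]))
print(matches_goal(["id", "+"]))
```

Note that the check never looks at a lexeme: `x + 2` and `y + 17` are the same sentence as far as the grammar is concerned.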

The Big Picture
Figure: specifications → Scanner Generator → tables or code → Scanner; source code → Scanner → parts of speech & words
Specifications written as "regular expressions"
Represent words as indices into a global table

The Big Picture
Why study lexical analysis?
Goals:
  To simplify specification & implementation of scanners
  To understand the underlying techniques and technologies

How to implement a scanner
Regular Expressions → NFA → DFA

Regular Expressions
Lexical patterns form a regular language
  *** any finite language is regular ***
Regular expressions (REs) describe regular languages
Ever type "rm *.o a.out"?

Regular Expressions
Regular Expression (over alphabet Σ)
  ε is an RE denoting the set {ε}
  If a is in Σ, then a is an RE denoting {a}
  If x and y are REs denoting L(x) and L(y) then
    x | y is an RE denoting L(x) ∪ L(y)
    xy is an RE denoting L(x)L(y)
    x* is an RE denoting L(x)*
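The three operations in the definition above can be tried out on small finite languages represented as Python sets of strings. This is only a sketch: the function names are assumptions, and because L(x)* is infinite whenever L(x) contains a non-empty string, `closure` stops after a caller-chosen number of repetitions.

```python
def union(lx, ly):
    """L(x) ∪ L(y): strings that are in either language."""
    return lx | ly


def concat(lx, ly):
    """L(x)L(y): every string of L(x) followed by every string of L(y)."""
    return {a + b for a in lx for b in ly}


def closure(lx, max_reps):
    """Finite approximation of L(x)*: 0 up to max_reps concatenations."""
    result = {""}        # zero repetitions gives the empty string
    current = {""}
    for _ in range(max_reps):
        current = concat(current, lx)
        result |= current
    return result


print(sorted(union({"a"}, {"b"})))
print(sorted(concat({"a", "b"}, {"c"})))
print(sorted(closure({"a"}, 3)))
```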

Set Operations ( review )

Examples of Regular Expressions
Identifiers:
  Letter → (a|b|c| … |z|A|B|C| … |Z)
  Digit → (0|1|2| … |9)
  Identifier → Letter ( Letter | Digit )*
Numbers:
  Integer → (+|-|ε) (0 | (1|2|3| … |9) Digit*)
  Decimal → Integer . Digit*
  Real → ( Integer | Decimal ) E (+|-|ε) Digit*
  Complex → ( Real , Real )
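The Identifier, Integer and Decimal definitions above translate almost symbol for symbol into the notation of Python's standard `re` module. The constant names below are assumptions for this sketch; `re.fullmatch` is used so that the whole word must match, as in a membership test.

```python
import re

# Identifier -> Letter (Letter | Digit)*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# Integer -> (+|-|ε) (0 | (1|2|...|9) Digit*)
INTEGER = re.compile(r"[+-]?(0|[1-9][0-9]*)")

# Decimal -> Integer . Digit*
DECIMAL = re.compile(r"[+-]?(0|[1-9][0-9]*)\.[0-9]*")

for word in ["x1", "1x", "-42", "007", "3.14"]:
    print(word,
          bool(IDENTIFIER.fullmatch(word)),
          bool(INTEGER.fullmatch(word)),
          bool(DECIMAL.fullmatch(word)))
```

The Integer pattern rejects `007` because a leading 0 may only stand alone, exactly as the RE on the slide specifies.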

Regular Expressions (the point)
Regular expressions can be used to specify the words to be translated to parts of speech by a lexical analyzer
Using results from automata theory and theory of algorithms, we can automatically build recognizers from regular expressions
⇒ We study REs and associated theory to automate scanner construction!

Example
Consider the problem of recognizing ILOC register names
  Register → r (0|1|2| … |9) (0|1|2| … |9)*
  Allows registers of arbitrary number
  Requires at least one digit
RE corresponds to a recognizer (or DFA)
Figure: Recognizer for Register: s0 --r--> s1 --(0|1|2| … |9)--> s2 (accepting state), with a loop on s2 for (0|1|2| … |9)

Example (continued)
DFA operation
  Start in state s0 & take transitions on each input character
  DFA accepts a word x iff x leaves it in a final state (s2)
So,
  r17 takes it through s0, s1, s2 and accepts
  r takes it through s0, s1 and fails
Figure: Recognizer for Register: s0 --r--> s1 --(0|1|2| … |9)--> s2 (accepting state), with a loop on s2 for (0|1|2| … |9)

Example (continued)
To be useful, recognizer must turn into code

Skeleton recognizer:
  Char ← next character
  State ← s0
  while (Char ≠ EOF)
    State ← δ(State, Char)
    Char ← next character
  if (State is a final state)
    then report success
    else report failure

Table encoding RE:
  δ     r     0,1,2,3,4,5,6,7,8,9   All others
  s0    s1    se                    se
  s1    se    s2                    se
  s2    se    s2                    se
  se    se    se                    se
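The skeleton recognizer and its table can be written out as a short runnable sketch. The state names follow the slide. The helper `char_class`, which maps a character to one of the three table columns, and the dictionary layout of `DELTA` are assumptions made for this example.

```python
def char_class(ch):
    """Map an input character to a column of the transition table."""
    if ch == "r":
        return "r"
    if ch in "0123456789":
        return "digit"
    return "other"


# Transition table: DELTA[state][column] gives the next state.
DELTA = {
    "s0": {"r": "s1", "digit": "se", "other": "se"},
    "s1": {"r": "se", "digit": "s2", "other": "se"},
    "s2": {"r": "se", "digit": "s2", "other": "se"},
    "se": {"r": "se", "digit": "se", "other": "se"},
}
FINAL_STATES = {"s2"}


def recognize(word):
    """Run the DFA over word; succeed iff it ends in a final state."""
    state = "s0"
    for ch in word:                      # the loop ends at end of input (EOF)
        state = DELTA[state][char_class(ch)]
    return state in FINAL_STATES


print(recognize("r17"))   # goes through s0, s1, s2
print(recognize("r"))     # stops in s1, which is not final
```

Changing the language recognized means changing only the table, not the loop, which is what makes table-driven scanners easy to generate.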

Example (continued)

Table encoding RE (with actions):
  δ     r            0,1,2,3,4,5,6,7,8,9   All others
  s0    s1, start    se, error             se, error
  s1    se, error    s2, add               se, error
  s2    se, error    s2, add               se, error
  se    se, error    se, error             se, error

Skeleton recognizer:
  Char ← next character
  State ← s0
  while (Char ≠ EOF)
    State ← δ(State, Char)
    perform specified action
    Char ← next character
  if (State is a final state)
    then report success
    else report failure
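The slide names the actions start, add and error but does not define them. The sketch below assumes one plausible reading: start resets an accumulator, add appends the current digit to the register number, and error abandons the word. The function name `register_number` and these action semantics are assumptions, not something the slides confirm.

```python
def char_class(ch):
    """Map an input character to a column of the transition table."""
    if ch == "r":
        return "r"
    if ch in "0123456789":
        return "digit"
    return "other"


# Each entry is (next state, action), mirroring the table with actions.
DELTA = {
    "s0": {"r": ("s1", "start"), "digit": ("se", "error"), "other": ("se", "error")},
    "s1": {"r": ("se", "error"), "digit": ("s2", "add"), "other": ("se", "error")},
    "s2": {"r": ("se", "error"), "digit": ("s2", "add"), "other": ("se", "error")},
    "se": {"r": ("se", "error"), "digit": ("se", "error"), "other": ("se", "error")},
}


def register_number(word):
    """Return the register number encoded by word, or None on failure."""
    state, value = "s0", None
    for ch in word:
        state, action = DELTA[state][char_class(ch)]
        if action == "start":
            value = 0                        # assumed meaning of "start"
        elif action == "add":
            value = value * 10 + int(ch)     # assumed meaning of "add"
        else:
            return None                      # "error": give up on this word
    return value if state == "s2" else None


print(register_number("r17"))
print(register_number("r"))
```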

Algorithm: Project 1
  Open a file to read from
  Open a file to write to
  Create a scanner object
  Call a method from the Scanner class to scan, classify each token, and write to the output file

Algorithm: scanner
  Read a line from the input file; c_i = first character of this line
  While not at the end of the line:
    If c_i == '#' || (c_i == '/' && c_(i+1) == '/'):
      Recognize the meta character
      Current Token = meta character
      Print out the token with a new line
    Else if c_i == '"':
      Recognize the string (read until you reach another ")
      Current Token = string
      Print out the token
    Else if c_i is a digit:
      Recognize the number
      Current Token = number
      Print out the token
    Else if c_i is a letter:
      Recognize the id
      If the token is not a keyword: the token is an ID, print it with a tag
      Otherwise: the token is a keyword, print the token
    Else if c_i is a symbol:
      Recognize the symbol
      Print it
    Else:
      Print c_i; i++
    After each recognized token: i += length of the token
  At the end of the line: read another line
  At the end of the file: stop
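The flowchart above can be sketched as a function that classifies the tokens of one line. Several details are assumptions made for this example: the `KEYWORDS` and `SYMBOLS` sets, the choice to treat the rest of the line as the meta token, and the choice to skip unclassified characters (such as spaces) rather than echo them as the flowchart does.

```python
KEYWORDS = {"int", "void", "printf"}   # assumed keyword list for this sketch
SYMBOLS = set("(){}=;,+-*/<>")         # assumed symbol set for this sketch


def scan_line(line):
    """Return the (lexeme, category) pairs found in one line of input."""
    tokens = []
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if c == "#" or line.startswith("//", i):
            # Meta statement: take the rest of the line as one token.
            tokens.append((line[i:].rstrip(), "meta"))
            break
        elif c == '"':
            # String: read until the next double quote (or end of line).
            end = line.find('"', i + 1)
            end = n - 1 if end == -1 else end
            tokens.append((line[i:end + 1], "string"))
            i = end + 1
        elif c.isdigit():
            j = i
            while j < n and line[j].isdigit():
                j += 1
            tokens.append((line[i:j], "number"))
            i = j
        elif c.isalpha() or c == "_":
            j = i
            while j < n and (line[j].isalnum() or line[j] == "_"):
                j += 1
            lexeme = line[i:j]
            tokens.append((lexeme, "keyword" if lexeme in KEYWORDS else "id"))
            i = j
        elif c in SYMBOLS:
            tokens.append((c, "symbol"))
            i += 1
        else:
            i += 1   # anything else (e.g. white space) is skipped here
    return tokens


for lexeme, category in scan_line('printf("Helloworld %d",b);'):
    print(lexeme, "----", category)
```

A driver in the spirit of Project 1 would call `scan_line` on each line of the input file and write the pairs to the output file.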

Sample: test1.c
  #include
  void sample()
  {
      int b=4;
      printf("Helloworld %d",b);
  }
  int main()
  {
      sample();
  }

Token list
  #include ---- meta
  void ---- keyword
  sample ---- id
  ( ---- symbol
  ) ---- symbol
  { ---- symbol
  int ---- keyword
  b ---- id
  = ---- symbol
  4 ---- number
  ; ---- symbol
  printf ---- keyword
  ( ---- symbol
  "Helloworld %d" ---- string
  , ---- symbol
  b ---- id
  ) ---- symbol
  ; ---- symbol
  } ---- symbol
  int ---- keyword
  main ---- id
  ( ---- symbol
  ) ---- symbol
  { ---- symbol
  sample ---- id
  ( ---- symbol
  ) ---- symbol
  ; ---- symbol
  } ---- symbol

Result
  #include
  void CS322sample()
  {
      int CS322b=4;
      printf("Helloworld %d",CS322b);
  }
  int CS322main()
  {
      CS322sample();
  }
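The Result above tags every identifier with the prefix CS322 while leaving keywords, strings, and meta lines untouched. A compact way to reproduce that effect on one line is sketched below with Python's `re` module. The function name `tag_identifiers`, the `KEYWORDS` set, and the regex-based approach are assumptions for this example; the project itself may have been implemented differently, and this sketch does not handle a `//` comment that starts in the middle of a line.

```python
import re

KEYWORDS = {"int", "void", "printf"}   # assumed keyword list for this sketch

# Match either a whole string literal or a word; strings come first so
# that words inside a string are never seen as separate matches.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*')


def tag_identifiers(line, prefix="CS322"):
    """Prefix every identifier in line; keep keywords, strings, meta lines."""
    if line.lstrip().startswith(("#", "//")):
        return line                      # meta line: copy unchanged

    def replace(match):
        lexeme = match.group(0)
        if lexeme.startswith('"') or lexeme in KEYWORDS:
            return lexeme                # string or keyword: keep as is
        return prefix + lexeme           # identifier: add the tag

    return TOKEN.sub(replace, line)


for source_line in ["void sample()", "int b=4;", 'printf("Helloworld %d",b);']:
    print(tag_identifiers(source_line))
```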