Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.

Slides:



Advertisements
Similar presentations
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Advertisements

From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
0 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the program well-formed.
Lexical Analysis — Introduction Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan et al. Lecture-Module 5 More Lexical Analysis.
Lexical Analysis: DFA Minimization & Wrap Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.
1 Pass Compiler 1. 1.Introduction 1.1 Types of compilers 2.Stages of 1 Pass Compiler 2.1 Lexical analysis 2.2. syntactical analyzer 2.3. Code generation.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #4 Lexical.
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Compiler Construction
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #5 Introduction.
Lexical Analysis — Part II From Regular Expression to Scanner Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
1 Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Compiler1 Chapter V: Compiler Overview: r To study the design and operation of compiler for high-level programming languages. r Contents m Basic compiler.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 3, 09/11/2003 Prof. Roy Levow.
May 31, May 31, 2016May 31, 2016May 31, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University,
1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
CSc 453 Lexical Analysis (Scanning)
Introduction to Compiling
Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up.
Compiler Introduction 1 Kavita Patel. Outlines 2  1.1 What Do Compilers Do?  1.2 The Structure of a Compiler  1.3 Compilation Process  1.4 Phases.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Lexical Analysis: DFA Minimization & Wrap Up. Automating Scanner Construction PREVIOUSLY RE  NFA ( Thompson’s construction ) Build an NFA for each term.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CMPSC 160 Translation of Programming Languages Fall 2002 Instructor: Hugh McGuire Lecture 2 Phases of a Compiler Lexical Analysis.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
1 Compiler Construction Vana Doufexi office CS dept.
Introduction to Parsing
Department of Software & Media Technology
1. Course Goals To provide students with an understanding of the major phases of a compiler. To introduce students to the theory behind the various phases,
CS 3304 Comparative Languages
Lexical and Syntax Analysis
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
CSc 453 Lexical Analysis (Scanning)
Finite-State Machines (FSMs)
CSc 453 Lexical Analysis (Scanning)
Compiler Lecture 1 CS510.
CS416 Compiler Design lec00-outline September 19, 2018
Introduction CI612 Compiler Design CI612 Compiler Design.
Lecture 3: Introduction to Lexical Analysis
Lexical Analysis - An Introduction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Review: Compiler Phases:
Lecture 5: Lexical Analysis III: The final bits
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
CS 3304 Comparative Languages
CS 3304 Comparative Languages
Introduction to Parsing
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Introduction to Parsing
CS416 Compiler Design lec00-outline February 23, 2019
Lexical Analysis - An Introduction
Compiler Construction
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
CSc 453 Lexical Analysis (Scanning)
Presentation transcript:

Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the program well-formed (semantically) ? Build an IR version of the code for the rest of the compiler The front end is not monolithic Source code Front End Errors Machine code Back End IR

The Front End Scanner Maps stream of characters into words  Basic unit of syntax  x = x + y ; becomes Characters that form a word are its lexeme Its part of speech (or syntactic category ) is called its token type Scanner discards white space & (often) comments Source code Scanner IR Parser Errors tokens Speed is an issue in scanning  use a specialized recognizer

The Front End Parser Checks stream of classified words (parts of speech) for grammatical correctness Determines if code is syntactically well-formed Guides checking at deeper levels than syntax Builds an IR representation of the code We’ll return to parsing later Source code Scanner IR Parser Errors tokens

The Big Picture Language syntax is specified with parts of speech, not words Syntax checking matches parts of speech against a grammar 1. goal  expr 2. expr  expr op term | term 3. term  number | id 4. op  + | – S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = ``Rules’’ 1-4

The Big Picture Language syntax is specified with parts of speech, not words Syntax checking matches parts of speech against a grammar 1. goal  expr 2. expr  expr op term | term 3. term  number | id 4. op  + | – S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = Rules 1-4 No words here! Parts of speech, not words!

The Big Picture Why study lexical analysis? We want to avoid writing scanners by hand We want to harness material from the theory of computation class Goals:  To simplify specification & implementation of scanners  To understand the underlying techniques and technologies Scanner Generator specifications source codeparts of speech & words tables or code Specifications written as “regular expressions” Represent words as indices into a global table

Regular Expressions Lexical patterns form a regular language *** any finite language is regular *** Regular expressions ( RE s) describe regular languages Regular Expression (over alphabet  )  is a RE denoting the set {  } If a is in , then a is a RE denoting {a} If x and y are RE s denoting L(x) and L(y) then  x |y is an RE denoting L(x)  L(y)  xy is an RE denoting L(x)L(y)  x * is an RE denoting L(x)* Precedence is closure, then concatenation, then alternation Ever type “rm *.o a.out” ?

These definitions should be well known Set Operations ( review )

Examples of Regular Expressions Identifiers: Letter  (a|b|c| … |z|A|B|C| … |Z) Digit  (0|1|2| … |9) Identifier  Letter ( Letter | Digit ) * Numbers: Integer  (+|-|  ) (0| (1|2|3| … |9)(Digit * ) ) Decimal  Integer. Digit * Real  ( Integer | Decimal ) E (+|-|  ) Digit * Complex  ( Real, Real ) Numbers can get much more complicated!

Regular Expressions ( the point ) Regular expressions can be used to specify the words to be translated to parts of speech by a lexical analyzer Using results from automata theory and theory of algorithms, we can automatically build recognizers from regular expressions  We study RE s and associated theory to automate scanner construction !

Consider the problem of recognizing register names Register  r (0|1|2| … | 9) (0|1|2| … | 9) * Allows registers of arbitrary number Requires at least one digit RE corresponds to a recognizer (or DFA ) Transitions on other inputs go to an error state, s e Example S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register

DFA operation Start in state S 0 & take transitions on each input character DFA accepts a word x iff x leaves it in a final state (S 2 ) So, r17 takes it through s 0, s 1, s 2 and accepts r takes it through s 0, s 1 and fails a takes it straight to s e Example ( continued ) S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register

Example ( continued ) To be useful, recognizer must turn into code  r 0,1,2,3,4, 5,6,7,8,9 All others s0s0 s1s1 sese sese s1s1 sese s2s2 sese s2s2 sese s2s2 sese sese sese sese sese Char  next character State  s 0 while (Char  EOF ) State   (State,Char) Char  next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding RE

Example ( continued ) To be useful, recognizer must turn into code  r 0,1,2,3,4, 5,6,7,8,9 All others s0s0 s 1 start s e error s e error s1s1 s e error s 2 add s e error s2s2 s e error s 2 add s e error sese s e error s e error s e error Char  next character State  s 0 while (Char  EOF ) State   (State,Char) perform specified action Char  next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding RE

r Digit Digit * allows arbitrary numbers Accepts r00000 Accepts r99999 What if we want to limit it to r0 through r31 ? Write a tighter regular expression  Register  r ( (0|1|2) (Digit |  ) | (4|5|6|7|8|9) | (3|30|31) )  Register  r0|r1|r2| … |r31|r00|r01|r02| … |r09 Produces a more complex DFA Has more states Same cost per transition Same basic implementation What if we need a tighter specification?

Tighter register specification (continued) The DFA for Register  r ( (0|1|2) (Digit |  ) | (4|5|6|7|8|9) | (3|30|31) ) Accepts a more constrained set of registers Same set of actions, more states S0S0 S5S5 S1S1 r S4S4 S3S3 S6S6 S2S2 0,1,20,1,2 3 0,10,1 4,5,6,7,8,94,5,6,7,8,9 (0|1|2| … 9)

Tighter register specification (continued)  r0, All others s0s0 s1s1 sese sese sese sese sese s1s1 sese s2s2 s2s2 s5s5 s4s4 sese s2s2 sese s3s3 s3s3 s3s3 s3s3 sese s3s3 sese sese sese sese sese sese s4s4 sese sese sese sese sese sese s5s5 sese s6s6 sese sese sese sese s6s6 sese sese sese sese sese sese sese sese sese sese sese sese sese Table encoding RE for the tighter register specification Runs in the same skeleton recognizer