Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.

Slides:



Advertisements
Similar presentations
CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
Advertisements

COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
LEXICAL ANALYSIS Phung Hua Nguyen University of Technology 2006.
CS-338 Compiler Design Dr. Syed Noman Hasany Assistant Professor College of Computer, Qassim University.
Lexical Analysis - Scanner Computer Science Rensselaer Polytechnic Compiler Design Lecture 2.
Winter 2007SEG2101 Chapter 81 Chapter 8 Lexical Analysis.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 Chapter 2: Scanning 朱治平. Scanner (or Lexical Analyzer) the interface between source & compiler could be a separate pass and places its output on an.
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular.
2. Lexical Analysis Prof. O. Nierstrasz
Topics Automata Theory Grammars and Languages Complexities
Lexical Analysis The Scanner Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source.
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
CPSC 388 – Compiler Design and Construction
Chapter 3 Lexical Analysis
Topic #3: Lexical Analysis
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
Finite-State Machines with No Output
Lexical Analysis Natawut Nupairoj, Ph.D.
Chapter 3 Chang Chi-Chung The Role of the Lexical Analyzer Lexical Analyzer Parser Source Program Token Symbol Table getNextToken error.
CS308 Compiler Principles Lexical Analyzer Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University Fall 2012.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Topic #3: Lexical Analysis EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
COMP 3438 – Part II - Lecture 2: Lexical Analysis (I) Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
Lexical Analyzer (Checker)
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
CH3.1 CS 345 Dr. Mohamed Ramadan Saady Algebraic Properties of Regular Expressions AXIOMDESCRIPTION r | s = s | r r | (s | t) = (r | s) | t (r s) t = r.
1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Compiler Construction 2 주 강의 Lexical Analysis. “get next token” is a command sent from the parser to the lexical analyzer. On receipt of the command,
Lexical Analyzer in Perspective
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Chapter 3 Chang Chi-Chung The Role of the Lexical Analyzer Lexical Analyzer Parser Source Program Token Symbol Table getNextToken error.
1 Lexical Analysis and Lexical Analyzer Generators Chapter 3 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
1.  It is the first phase of compiler.  In computer science, lexical analysis is the process of converting a sequence of characters into a sequence.
Joey Paquet, 2000, Lecture 2 Lexical Analysis.
Fall 2003CS416 Compiler Design1 Lexical Analyzer Lexical Analyzer reads the source program character by character to produce tokens. Normally a lexical.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Brian Mitchell - Drexel University MCS680-FCS 1 Patterns, Automata & Regular Expressions int MSTWeight(int graph[][], int size)
Lexical Analysis.
1st Phase Lexical Analysis
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
using Deterministic Finite Automata & Nondeterministic Finite Automata
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Compilers Lexical Analysis 1. while (y < z) { int x = a + b; y += x; } 2.
COMP 3438 – Part II - Lecture 3 Lexical Analysis II Par III: Finite Automata Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Lexical Analyzer in Perspective
Finite automate.
Chapter 3 Lexical Analysis.
Two issues in lexical analysis
Recognizer for a Language
פרק 3 ניתוח לקסיקאלי תורת הקומפילציה איתן אביאור.
Lexical Analysis and Lexical Analyzer Generators
Review: Compiler Phases:
Recognition of Tokens.
Finite Automata.
Chapter 3. Lexical Analysis (2)
Compiler Construction
Presentation transcript:

Lecture # 3 Chapter #3: Lexical Analysis

Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis Reasons to make it a separate phase are: – Simplifies the design of the compiler – Provides efficient implementation – Improves portability

3 Interaction of the Lexical Analyzer with the Parser (Fig 3.1) Lexical Analyzer Parser Source Program Token, tokenval Symbol Table Get next token error

4 Tokens, Patterns, and Lexemes A token is a classification of lexical units – For example: id and num Lexemes are the specific character strings that make up a token – For example: abc and 123 Patterns are rules describing the set of lexemes belonging to a token – For example: “letter followed by letters and digits” and “non-empty sequence of digits”

Diff b/w Token, Lexeme and Pattern TokenLexemePattern if relation,>,>= or > or >= idy, xLetter followed by letters and digits num31, 28Any numeric constant operator+, *, -,/Any arithmetic operator + or * or – or /

6 Attributes of Tokens Lexical analyzer y := *x Parser token tokenval (token attribute)

Section # 3.3: Specification of Tokens Alphabet: Finite, nonempty set of symbols Example: Strings: Finite sequence of symbols from an alphabet e.g Empty String: The string with zero occurrences of symbols from alphabet. The empty string is denoted by

Continue… Length of String: Number of positions for symbols in the string. |w| denotes the length of string w Example |0110| = 4; | | = 0 Powers of an Alphabet: = = the set of strings of length k with symbols from Example:

Continue.. The set of all strings over is denoted

Continue.. Language: is a specific set of strings over some fixed alphabet  Example: The set of legal English words The set of strings consisting of n 0's followed by n 1’s L P = the set of binary numbers whose value is prime

Concatenation and Exponentiation The concatenation of two strings x and y is denoted by xy The exponentiation of a string s is defined by s 0 =  s i = s i-1 s for i > 0 note that s  =  s = s

Language Operations Union L  M = {s  s  L or s  M} Concatenation LM = {xy  x  L and y  M} Exponentiation L 0 = {  }; L i = L i-1 L Kleene closure L * =  i=0,…,  L i Positive closure L + =  i=1,…,  L i

13 Regular Expressions Basis symbols: –  is a regular expression denoting language {  } – a   is a regular expression denoting {a} If r and s are regular expressions denoting languages L(r) and M(s) respectively, then – r  s is a regular expression denoting L(r)  M(s) – rs is a regular expression denoting L(r)M(s) – r * is a regular expression denoting L(r) * – (r) is a regular expression denoting L(r) A language defined by a regular expression is called a Regular set or a Regular Language

14 Regular Definitions Regular definitions introduce a naming convention: d 1  r 1 d 2  r 2 … d n  r n where each r i is a regular expression over   {d 1, d 2, …, d i-1 }

Example: letter  A  B  …  Z  a  b  …  z digit  0  1  …  9 id  letter ( letter  digit ) * The following shorthands are often used: r + = rr * r? = r  [ a - z ] = a  b  c  …  z Examples: digit  [ ] num  digit + (. digit + )? ( E (+  -)? digit + )?

16 Regular Definitions and Grammars stmt  if expr then stmt  if expr then stmt else stmt   expr  term relop term  term term  id  num if  if then  then else  else relop   >  >=  = id  letter ( letter | digit ) * num  digit + (. digit + )? ( E (+  -)? digit + )? Grammar Regular definitions

17 Coding Regular Definitions in Transition Diagrams return(relop, LE ) return(relop, NE ) return(relop, LT ) return(relop, EQ ) return(relop, GE ) return(relop, GT ) start < = > = > = other * * 9 startletter *other letter or digit return(gettoken(), install_id()) relop   >  >=  = id  letter ( letter  digit ) *

Section 3.6 : Finite Automata Finite Automata are used as a model for: – Software for designing digital circuits – Lexical analyzer of a compiler – Searching for keywords in a file or on the web. – Software for verifying finite state systems, such as communication protocols.

19 Design of a Lexical Analyzer Generator Translate regular expressions to NFA Translate NFA to an efficient DFA regular expressions NFADFA Simulate NFA to recognize tokens Simulate DFA to recognize tokens Optional

20 Nondeterministic Finite Automata An NFA is a 5-tuple (S, , , s 0, F) where S is a finite set of states  is a finite set of symbols, the alphabet  is a mapping from S   to a set of states s 0  S is the start state F  S is the set of accepting (or final) states

21 Transition Graph An NFA can be diagrammatically represented by a labeled directed graph called a transition graph 0 start a bb a b S = {0,1,2,3}  = { a, b } s 0 = 0 F = {3}

22 Transition Table The mapping  of an NFA can be represented in a transition table State Input a Input b 0{0, 1}{0} 1{2} 2{3}  (0, a ) = {0,1}  (0, b ) = {0}  (1, b ) = {2}  (2, b ) = {3}

23 The Language Defined by an NFA An NFA accepts an input string x if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph A state transition from one state to another on the path is called a move The language defined by an NFA is the set of input strings it accepts, such as ( a  b )* abb for the example NFA

24 N(r2)N(r2)N(r1)N(r1) From Regular Expression to  - NFA (Thompson’s Construction) f i  f a i f i N(r1)N(r1) N(r2)N(r2) start     f i N(r)N(r) f i    a r1r2r1r2 r1r2r1r2 r*r* 

25 Combining the NFAs of a Set of Regular Expressions 2 a 1 start 6 a 3 45 bb 8 b 7 a b a { action 1 } abb { action 2 } a * b +{ action 3 } 2 a 1 6 a 3 45 bb 8 b 7 a b 0 start   

26 Deterministic Finite Automata A deterministic finite automaton is a special case of an NFA – No state has an  -transition – For each state s and input symbol a there is at most one edge labeled a leaving s Each entry in the transition table is a single state – At most one path exists to accept a string – Simulation algorithm is simple

27 Example DFA 0 start a bb b b a a a A DFA that accepts ( a  b )* abb

28 Conversion of an NFA into a DFA The subset construction algorithm converts an NFA into a DFA using:  -closure(s) = {s}  {t  s   …   t}  -closure(T) =  s  T  -closure(s) move(T,a) = {t  s  a t and s  T} The algorithm produces: Dstates is the set of states of the new DFA consisting of sets of states of the NFA Dtran is the transition table of the new DFA

29  -closure and move Examples 2 a 1 6 a 3 45 bb 8 b 7 a b 0 start     -closure({0}) = {0,1,3,7} move({0,1,3,7}, a ) = {2,4,7}  -closure({2,4,7}) = {2,4,7} move({2,4,7}, a ) = {7}  -closure({7}) = {7} move({7}, b ) = {8}  -closure({8}) = {8} move({8}, a ) =  abaa none Also used to simulate NFAs

30 Subset Construction Example 1 0 start a b b a b         A start B C D E b b b b b a a a a Dstates A = {0,1,2,4,7} B = {1,2,3,4,6,7,8} C = {1,2,4,5,6,7} D = {1,2,4,5,6,7,9} E = {1,2,4,5,6,7,10} a

31 Subset Construction Example 2 2 a 1 6 a 3 45 bb 8 b 7 a b 0 start    a1a1 a2a2 a3a3 Dstates A = {0,1,3,7} B = {2,4,7} C = {8} D = {7} E = {5,8} F = {6,8} A start a D b b b a b b B C EF a b a1a1 a3a3 a3a3 a 2 a 3