Recognition of Tokens.

Recognition of Tokens

Recognition of Tokens
A grammar for branching statements:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id
      | number

Example
Patterns for tokens in the grammar:
digit  -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
id     -> letter (letter | digit)*
if     -> if
then   -> then
else   -> else
relop  -> < | > | <= | >= | = | <>
ws     -> (blank | tab | newline)+

Tokens, their patterns, and attribute values
Lexemes       Token name   Attribute value
Any ws        -            -
if            if           -
then          then         -
else          else         -
Any id        id           Pointer to table entry
Any number    number       Pointer to table entry
<             relop        LT
<=            relop        LE
=             relop        EQ
<>            relop        NE
>             relop        GT
>=            relop        GE

Example
Input: C=a+b*5
<id, pointer to symbol table entry>
<relop, EQ>
<assign_op, ->
<multi_op, ->
<num, pointer to symbol table entry>

Transition Diagrams
Nodes: states, that is, conditions that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns.
Edges: directed from state to state, each labeled by a symbol or set of symbols.
Deterministic: there is never more than one edge out of a given state with a given symbol among its labels.
Certain states are accepting, or final: a lexeme has been found. They are drawn as a double circle.
If it is necessary to retract the forward pointer, a * is additionally placed near that accepting state.
The start state, or initial state, is indicated by an edge, labeled "start", entering from nowhere.
[Diagram: from the start state, < leads to state 1. From state 1, = leads to state 2, return(relop, LE); > leads to state 3, return(relop, NE); other leads to state 4 (*), return(relop, LT).]

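Example
Transition Diagram for relop
start state: < leads to state 1; = leads to state 5; > leads to state 6
state 1: = leads to state 2, return(relop, LE)
         > leads to state 3, return(relop, NE)
         other leads to state 4 (*), return(relop, LT)
state 5: return(relop, EQ)
state 6: = leads to state 7, return(relop, GE)
         other leads to state 8 (*), return(relop, GT)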
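The relop diagram can be simulated with one branch per state. The following is a minimal Python sketch under that reading; the function name get_relop, the (attribute, next position) return shape, and the use of None for failure are illustrative assumptions, not part of the slides.

```python
def get_relop(text, pos=0):
    """Simulate the relop transition diagram starting at text[pos].

    Returns (attribute, next_pos), or None if no relop starts at pos.
    """
    def peek(i):
        # "" stands for end of input, which behaves like 'other'
        return text[i] if i < len(text) else ""

    c = peek(pos)
    if c == "<":                        # state 1
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("LE", pos + 2)      # state 2
        if nxt == ">":
            return ("NE", pos + 2)      # state 3
        return ("LT", pos + 1)          # state 4 (*): 'other' is not consumed
    if c == "=":
        return ("EQ", pos + 1)          # state 5
    if c == ">":                        # state 6
        if peek(pos + 1) == "=":
            return ("GE", pos + 2)      # state 7
        return ("GT", pos + 1)          # state 8 (*): 'other' is not consumed
    return None                         # no edge out of the start state
```

The starred states show up as returning pos + 1 instead of pos + 2: the lookahead character was inspected but is left for the next token.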

Recognition of Reserved Words and Identifiers
Problem: keywords look like identifiers.
Two possible solutions:
1. Install the reserved words in the symbol table initially, so that a lookup tells keywords and identifiers apart.
2. Create a separate transition diagram for each keyword.

Examples for Identifiers and Keywords
start state 9: letter leads to state 10
state 10: letter or digit loops back to state 10; other leads to state 11 (*)
state 11 (*): return(getToken(), installID())
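The first approach to reserved words (pre-installing them in the symbol table) can be sketched as follows. This is an illustration only: the names new_symbol_table and scan_word, the dictionary-based table, and the restriction to ASCII letters are assumptions, and the single setdefault call stands in for the slide's installID() and getToken() pair.

```python
import string

LETTERS = set(string.ascii_letters)
LETTERS_OR_DIGITS = LETTERS | set(string.digits)

def new_symbol_table():
    # Reserved words are installed before scanning starts,
    # each mapped to its own token name.
    return {"if": "if", "then": "then", "else": "else"}

def scan_word(text, pos, table):
    """Run the identifier diagram (states 9 to 11) at text[pos].

    Returns (token_name, lexeme, next_pos), or None if text[pos] is not a letter.
    """
    if pos >= len(text) or text[pos] not in LETTERS:
        return None                       # state 9 has no edge for this input
    end = pos + 1
    while end < len(text) and text[end] in LETTERS_OR_DIGITS:
        end += 1                          # state 10 loops on letter or digit
    lexeme = text[pos:end]                # state 11 (*): 'other' is not consumed
    # A new lexeme is installed as an id; an existing entry
    # (keyword or earlier id) is returned unchanged.
    token = table.setdefault(lexeme, "id")
    return (token, lexeme, end)
```

For example, scanning "then" yields the token then, while "then1" is installed as an id, because the lookup happens only after the longest run of letters and digits has been read.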

Completion of the Running Example – Unsigned Numbers 3.14 314
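A hand-written recognizer for the pattern number -> digits (. digits)? (E [+-]? digits)? might look like the sketch below. It is not the slide's diagram translated state by state; the names scan_number and _digits are hypothetical, and the choice to fall back to the shorter lexeme when an optional part is incomplete (for example "12." yields "12") is an assumption about the desired behavior.

```python
DIGITS = "0123456789"

def _digits(text, pos):
    """Return the position just past the run of digits starting at pos."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return end

def scan_number(text, pos=0):
    """Match an unsigned number at text[pos]; return (lexeme, next_pos) or None."""
    end = _digits(text, pos)
    if end == pos:
        return None                        # at least one digit is required
    # Optional fraction: '.' followed by one or more digits.
    if end < len(text) and text[end] == ".":
        frac_end = _digits(text, end + 1)
        if frac_end > end + 1:
            end = frac_end
    # Optional exponent: 'E', optional sign, one or more digits.
    if end < len(text) and text[end] == "E":
        exp = end + 1
        if exp < len(text) and text[exp] in "+-":
            exp += 1
        exp_end = _digits(text, exp)
        if exp_end > exp:
            end = exp_end
    return (text[pos:end], end)
```

Both 3.14 and 314 are accepted in full, since the fraction and the exponent are each optional.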

Transition Diagram for Whitespace
start state 22: delim leads to state 23
state 23: delim loops back to state 23; other leads to state 24 (*)
delim -> blank | tab | newline

Transition Diagram
[Combined figure: the transition diagrams for relop (states up to 8), identifiers (states 9 to 11), unsigned numbers (states 12 to 21), and whitespace (states 22 to 24).]

C code to find next start state

C Code for Lexical analyzers

Finite Automata
Finite automata are recognizers: they simply say "yes" or "no" about each input string.
Two kinds:
Nondeterministic finite automata (NFA): no restrictions on the labels of the edges.
Deterministic finite automata (DFA): for each state, and for each symbol, there is exactly one edge with that symbol leaving that state.

Nondeterministic Finite Automata
An NFA consists of:
A finite set of states S.
A set of input symbols Σ, the input alphabet.
A transition function that gives, for each state and for each symbol in Σ ∪ {ε}, a set of next states.
A state s0 from S (the start state, or initial state).
A set of states F, a subset of S (the accepting states, or final states).

An NFA can be represented by a transition graph.
There is an edge labeled a from state s to state t iff t is one of the next states for state s and input a.
It is similar to a transition diagram, except:
  The same symbol can label edges from one state to several different states.
  An edge may be labeled by ε, in addition to symbols from the input alphabet.

An Example NFA: (a|b)*abb
Transition Table
State   a        b      ε
0       {0, 1}   {0}    ∅
1       ∅        {2}    ∅
2       ∅        {3}    ∅
3       ∅        ∅      ∅
Transition Graph
start -> 0; state 0 loops on a and on b; 0 -a-> 1 -b-> 2 -b-> 3 (accepting)
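An NFA accepts a string if some path labeled by that string leads from the start state to an accepting state. One way to check this is to track the set of all states the NFA could be in. The sketch below does so for (a|b)*abb, assuming states 0 to 3 with 0 as the start state and 3 as the only accepting state; the dictionary encoding and the names eps_closure, move, and nfa_accepts are illustrative choices.

```python
EPS = ""   # label used for ε-edges (this particular NFA has none)

# Transition table for (a|b)*abb: state -> {symbol -> set of next states}
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPTING = 0, {3}

def eps_closure(states, nfa):
    """All states reachable from `states` using ε-edges only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, symbol, nfa):
    """All states reachable from `states` by one edge labeled `symbol`."""
    return {t for s in states for t in nfa[s].get(symbol, ())}

def nfa_accepts(text, nfa=NFA, start=START, accepting=ACCEPTING):
    current = eps_closure({start}, nfa)
    for ch in text:
        current = eps_closure(move(current, ch, nfa), nfa)
    return bool(current & accepting)
```

On input "aababb" the state set ends as {0, 3}, so the string is accepted; on "ab" it ends as {0, 2} and the string is rejected.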

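Example NFA: aa*|bb*
[Transition graph: from the start state, one ε-edge leads to a branch that reads a and then loops on a; a second ε-edge leads to a branch that reads b and then loops on b. The end state of each branch is accepting.]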

Deterministic Finite Automata
A DFA is a special case of an NFA where:
There are no moves on input ε.
For each state s and input symbol a, there is exactly one edge out of s labeled a.
Every regular expression and every NFA can be converted to a DFA accepting the same language.
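The usual way to convert an NFA to a DFA is the subset construction, in which each DFA state stands for a set of NFA states. The slides do not show the algorithm at this point, so the following is a sketch under assumptions: the dictionary encoding of the NFA, the function names, and the numbering of DFA states in order of discovery are all illustrative.

```python
EPS = ""   # label used for ε-edges

def eps_closure(states, nfa):
    """All NFA states reachable from `states` using ε-edges only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, symbol, nfa):
    """All NFA states reachable from `states` by one edge labeled `symbol`."""
    return {t for s in states for t in nfa[s].get(symbol, ())}

def subset_construction(nfa, start, accepting, alphabet):
    """Return (dtran, dstart, daccepting, dstates) for an equivalent DFA.

    dtran maps (dfa_state, symbol) -> dfa_state;
    dstates maps each frozenset of NFA states to its DFA state number.
    """
    start_set = frozenset(eps_closure({start}, nfa))
    dstates = {start_set: 0}
    worklist = [start_set]
    dtran = {}
    while worklist:
        T = worklist.pop()
        for a in alphabet:
            U = frozenset(eps_closure(move(T, a, nfa), nfa))
            if U not in dstates:              # newly discovered DFA state
                dstates[U] = len(dstates)
                worklist.append(U)
            dtran[(dstates[T], a)] = dstates[U]
    # A DFA state accepts if it contains any accepting NFA state.
    daccepting = {i for S, i in dstates.items() if S & accepting}
    return dtran, 0, daccepting, dstates

# Usage: the four-state NFA for (a|b)*abb, start state 0, accepting state 3.
NFA = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}, 3: {}}
dtran, dstart, daccepting, dstates = subset_construction(NFA, 0, {3}, "ab")
```

An empty set of NFA states, if it arises, simply becomes a dead DFA state with edges back to itself, which keeps the "exactly one edge per symbol" property.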

Example DFA accepting (a|b)*abb
State       a    b
0 (start)   1    0
1           1    2
2           1    3
3 (accept)  1    0
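Because a DFA has exactly one next state per symbol, simulating it is a single table lookup per input character. The sketch below encodes the standard four-state DFA for (a|b)*abb; the numbering 0 to 3 with 0 as the start state, the name dfa_accepts, and the decision to reject symbols outside {a, b} are assumptions made for the example.

```python
# Transition table: state -> {symbol -> next state}
DTRAN = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}

def dfa_accepts(text, dtran=DTRAN, start=0, accepting=frozenset({3})):
    """Return True if the DFA ends in an accepting state after reading text."""
    state = start
    for ch in text:
        if ch not in dtran[state]:
            return False          # symbol outside the alphabet: reject
        state = dtran[state][ch]
    return state in accepting
```

State 1 records "the last symbol was a", state 2 "the last two were ab", and state 3 "the last three were abb", which is why every a-edge returns to state 1.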

Construction of an NFA from a Regular Expression (Thompson's algorithm)
Basis:
For the expression ε, construct the NFA: start -> i -ε-> f
For a subexpression a in Σ, construct the NFA: start -> i -a-> f
In both cases i is a new start state and f is a new accepting state.

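NFA for the concatenation of two regular expressions
For r = st, N(r) places N(s) and N(t) in sequence: the start state of N(s) is the start state i of N(r), the accepting state of N(t) is the accepting state f of N(r), and the accepting state of N(s) is merged with the start state of N(t).
Example, abb: start -> 0 -a-> 1 -b-> 2 -b-> 3 (accepting)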

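NFA for the union of two regular expressions
For r = s|t, N(r) has a new start state i with ε-edges to the start states of N(s) and N(t), and ε-edges from the accepting states of N(s) and N(t) to a new accepting state f.
Example, a|b:
start -> 0
0 -ε-> 1 -a-> 2 -ε-> 5
0 -ε-> 3 -b-> 4 -ε-> 5 (accepting)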

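NFA for the closure of a regular expression
For r = s*, N(r) has a new start state i and a new accepting state f, with ε-edges from i to the start state of N(s), from i to f, from the accepting state of N(s) back to its start state, and from the accepting state of N(s) to f.
Example, (a|b)*:
start -> 0
0 -ε-> 1 and 0 -ε-> 7 (accepting)
1 -ε-> 2 -a-> 3 -ε-> 6
1 -ε-> 4 -b-> 5 -ε-> 6
6 -ε-> 1 and 6 -ε-> 7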
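The basis and the three inductive rules (concatenation, union, closure) can be put together in a short Python sketch. The Fragment class, the function names, and the global state counter are illustrative assumptions, state numbers will not match the figures, and each fragment is meant to be used once (reusing one in two places would share its states).

```python
from itertools import count

EPS = ""            # label used for ε-edges
_ids = count()      # source of fresh state numbers

class Fragment:
    """An NFA with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def _edge(edges, src, label, dst):
    edges.setdefault(src, {}).setdefault(label, set()).add(dst)

def _copy_into(edges, frag, rename=None):
    """Copy frag's edges into `edges`, optionally renaming source states."""
    rename = rename or {}
    for src, out in frag.edges.items():
        for label, dsts in out.items():
            for d in dsts:
                _edge(edges, rename.get(src, src), label, d)

def symbol(a):                      # basis: i -a-> f (use a=EPS for ε)
    i, f = next(_ids), next(_ids)
    edges = {}
    _edge(edges, i, a, f)
    return Fragment(i, f, edges)

def concat(s, t):                   # merge accept of N(s) with start of N(t)
    edges = {}
    _copy_into(edges, s)
    _copy_into(edges, t, rename={t.start: s.accept})
    return Fragment(s.start, t.accept, edges)

def union(s, t):                    # new i and f joined to both by ε-edges
    i, f = next(_ids), next(_ids)
    edges = {}
    for fr in (s, t):
        _copy_into(edges, fr)
        _edge(edges, i, EPS, fr.start)
        _edge(edges, fr.accept, EPS, f)
    return Fragment(i, f, edges)

def star(s):                        # new i and f, loop back, and bypass
    i, f = next(_ids), next(_ids)
    edges = {}
    _copy_into(edges, s)
    _edge(edges, i, EPS, s.start)
    _edge(edges, i, EPS, f)
    _edge(edges, s.accept, EPS, s.start)
    _edge(edges, s.accept, EPS, f)
    return Fragment(i, f, edges)

def accepts(frag, text):
    """Simulate the fragment on text by tracking the set of possible states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in frag.edges.get(stack.pop(), {}).get(EPS, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    current = closure({frag.start})
    for ch in text:
        step = {t for s in current for t in frag.edges.get(s, {}).get(ch, ())}
        current = closure(step)
    return frag.accept in current

# Usage: build (a|b)*abb bottom-up from its subexpressions.
r = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                         symbol("a")), symbol("b")), symbol("b"))
```

Merging in concat only renames the start state of N(t) where it appears as an edge source; this relies on the property that a start state built by these rules never has incoming edges.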

NFA for (a|b)*abb#
start -> 0
0 -ε-> 1 and 0 -ε-> 7
1 -ε-> 2 -a-> 3 -ε-> 6
1 -ε-> 4 -b-> 5 -ε-> 6
6 -ε-> 1 and 6 -ε-> 7
7 -a-> 8 -b-> 9 -b-> 10 -#-> 11 (accepting)