Converting NFAs to DFAs: How Lex Is Constructed

ICS312 set 30

Converting an NFA to a DFA
Definition: The ε-closure of a state S is the set of all states, including S itself, that can be reached from S via ε-transitions. The ε-closure of state S is denoted:

Converting an NFA to a DFA (Cont.)
Example: The ε-closure of state 1 = {1, 2, 4}. The ε-closure of state 3 = {3, 2, 4}.
Definition: The ε-closure of a set of states S1, ..., Sn is the union of the ε-closures of S1, S2, ..., Sn.
Example: The ε-closure of states 1 and 3 above is {1, 2, 4} ∪ {3, 2, 4} = {1, 2, 3, 4}.
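The ε-closure can be computed by a graph search that follows only ε-edges. The Python sketch below is a minimal illustration, assuming the ε-transitions are stored as a dictionary from each state to the states it reaches by one ε-move. The particular transitions (1→2, 2→4, 3→2) are hypothetical, chosen only so that the closures match the example, since the slide's NFA diagram is not reproduced.

```python
from typing import Dict, Iterable, Set

def epsilon_closure(states: Iterable[int], eps: Dict[int, Set[int]]) -> Set[int]:
    """Return every state reachable from `states` using only ε-transitions.
    The starting states themselves are always included."""
    closure = set(states)
    stack = list(closure)
    while stack:
        s = stack.pop()
        for t in eps.get(s, set()):
            if t not in closure:      # visit each state only once
                closure.add(t)
                stack.append(t)
    return closure

# Hypothetical ε-transitions: 1 -ε-> 2, 2 -ε-> 4, 3 -ε-> 2
eps = {1: {2}, 2: {4}, 3: {2}}

print(sorted(epsilon_closure({1}, eps)))     # [1, 2, 4]
print(sorted(epsilon_closure({3}, eps)))     # [2, 3, 4]
# Closure of a set = union of the closures of its members
print(sorted(epsilon_closure({1, 3}, eps)))  # [1, 2, 3, 4]
```

Starting the search from the whole set at once gives the same result as taking the union of the individual closures, which is why one function serves both definitions.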

To construct a DFA from an NFA
Step 1: Let the start state of the DFA be the ε-closure of the start state of the NFA.
Subsequent steps: Let S be any DFA state constructed so far, formed from the NFA states t1, ..., tr. For any symbol x for which at least one of t1, ..., tr has an x-successor, the x-successor of S is the ε-closure of the x-successors of t1, ..., tr. Repeat until no new DFA states appear.
Any DFA state that contains an accepting state of the NFA (among others) becomes an accepting state.
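A sketch of this subset construction in Python follows. The NFA encoding (a `moves` dictionary keyed by (state, symbol) and an `eps` dictionary of ε-successors) is an assumption for illustration, and the character classes `letter` and `digit` stand in for whole sets of characters. The usage applies it to a hypothetical Thompson-style NFA for an identifier (a letter followed by letters or digits), which is not necessarily the NFA drawn in the slides.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumed NFA encoding (hypothetical, not from the slides):
#   moves[(state, symbol)] -> set of successor states on that symbol
#   eps[state]             -> set of successor states on ε
Moves = Dict[Tuple[int, str], Set[int]]
Eps = Dict[int, Set[int]]

def epsilon_closure(states, eps: Eps) -> FrozenSet[int]:
    closure = set(states)
    stack = list(closure)
    while stack:
        for t in eps.get(stack.pop(), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start: int, accepting: Set[int], moves: Moves, eps: Eps):
    """Build a DFA whose states are (frozen) sets of NFA states."""
    symbols = {sym for (_, sym) in moves}
    dfa_start = epsilon_closure({start}, eps)          # Step 1
    dfa_trans: Dict[Tuple[FrozenSet[int], str], FrozenSet[int]] = {}
    seen = {dfa_start}
    worklist = [dfa_start]
    while worklist:                                     # subsequent steps
        S = worklist.pop()
        for x in symbols:
            # Collect the x-successors of every NFA state t1..tr in S
            succ: Set[int] = set()
            for t in S:
                succ |= moves.get((t, x), set())
            if not succ:              # no member of S has an x-successor
                continue
            T = epsilon_closure(succ, eps)
            dfa_trans[(S, x)] = T
            if T not in seen:         # a new DFA state still to be expanded
                seen.add(T)
                worklist.append(T)
    # Accepting if the set contains at least one accepting NFA state
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_start, seen, dfa_trans, dfa_accepting

# Hypothetical NFA for letter (letter | digit)*, accepting state 8
moves = {(0, "letter"): {1}, (3, "letter"): {4}, (5, "digit"): {6}}
eps = {1: {2, 8}, 2: {3, 5}, 4: {7}, 6: {7}, 7: {2, 8}}

start, states, trans, acc = subset_construction(0, {8}, moves, eps)
print(len(states), len(acc))  # 4 3
```

With this NFA the construction produces four DFA states: one non-accepting start state and three accepting states.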

To construct a DFA from an NFA (Cont.1)
Example 1: Converting the NFA on this slide yields a DFA that has no ε-transitions and a single accepting state. (NFA and DFA diagrams not reproduced.)

To construct a DFA from an NFA (Cont.2)
Example 2: Convert the NFA for an identifier to a DFA.

To construct a DFA from an NFA (Cont.3)
Result: (DFA diagram not reproduced.)

Minimizing the Number of States in a DFA
Step 1: Start with two sets of states: (a) all the accepting states, and (b) all the non-accepting states.
Subsequent steps: Given the current sets of states S1, ..., Sr, consider each set S and each symbol x in turn. Suppose some member of S has an x-successor that lies in a set S'. Unless every member of S has an x-successor in S', split S into the members whose x-successors are in S' and the others (those without an x-successor in S'). Repeat until no set can be split; each remaining set then becomes a single state of the minimized DFA.
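The splitting procedure can be sketched in Python as follows. This version is an assumption rather than the slides' exact formulation: it splits each set in one pass by grouping its members according to which set each x-successor falls in, treating "no x-successor" as its own group. Repeating until nothing changes should reach the same final sets as splitting one (S, x) pair at a time. The usage applies it to a reconstructed, hypothetical transition table for the renumbered identifier DFA, which may differ in detail from the slide's diagram.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

def minimize(states: Set[int], symbols: Set[str],
             trans: Dict[Tuple[int, str], int],
             accepting: Set[int]) -> List[FrozenSet[int]]:
    """Return the sets of equivalent states; each set becomes one state
    of the minimized DFA. Missing transitions are allowed."""
    # Step 1: accepting vs. non-accepting states (an empty set is dropped)
    partition = [b for b in (frozenset(accepting), frozenset(states - accepting)) if b]
    changed = True
    while changed:
        changed = False
        # Index of the set that currently contains each state
        where = {s: i for i, block in enumerate(partition) for s in block}
        new_partition: List[FrozenSet[int]] = []
        for block in partition:
            groups: Dict[Tuple[Optional[int], ...], Set[int]] = {}
            for s in block:
                # For each symbol: which set holds the successor (None = no successor)
                key = tuple(where.get(trans.get((s, x))) for x in sorted(symbols))
                groups.setdefault(key, set()).add(s)
            if len(groups) > 1:       # members disagree, so the set is split
                changed = True
            new_partition.extend(frozenset(g) for g in groups.values())
        partition = new_partition
    return partition

# Hypothetical identifier DFA: 1 is the start state; 2, 3, 4 are accepting
trans = {(1, "letter"): 2,
         (2, "letter"): 3, (2, "digit"): 4,
         (3, "letter"): 3, (3, "digit"): 4,
         (4, "letter"): 3, (4, "digit"): 4}
blocks = minimize({1, 2, 3, 4}, {"letter", "digit"}, trans, {2, 3, 4})
print(sorted(sorted(b) for b in blocks))  # [[1], [2, 3, 4]]
```

Because every letter- and digit-successor of states 2, 3 and 4 stays inside the accepting set, that set is never split, and the minimized DFA has two states.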

Minimizing the Number of States in a DFA (Cont.1)
Example 1. Consider the DFA constructed above for an identifier, with its states renumbered (diagram not reproduced):

Minimizing the Number of States in a DFA (Cont.2)
The sets of states for this DFA are:
S1 (non-accepting states): 1
S2 (accepting states): 2, 3, 4
Every state in S2 has both a letter-successor and a digit-successor, and all of these successors are in S2, so S2 cannot be split. Combining all the states of S2 into one state gives: (diagram not reproduced)

Minimizing the Number of States in a DFA (Cont.3)
Example 2. Consider the following DFA (diagram not reproduced). All of its states (1, 2, and 3) are accepting, and all their successors are accepting too, but state 1 has an a-successor whereas states 2 and 3 do not.

Minimizing the Number of States in a DFA (Cont.4)
So we split the set of accepting states into two sets, S1 = {1} and S2 = {2, 3}, which gives: (diagram not reproduced)

HOW LEX WORKS
Using the methods described above, Lex constructs a minimized finite automaton for each regular expression in the definition file. Lex generates a C program, which we will refer to as lex.yy.c. The finite automata are represented in lex.yy.c by a set of arrays.

For instance, a portion of a finite automaton in which state 4 has a transition on "+" to state 7 can be represented by entering a 7 in the associated array, in the column for "+" at row 4.
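The row/column idea can be sketched with a two-dimensional list in Python. The column assignment, the table size, and the use of -1 for "no transition" are assumptions for illustration; the arrays that Lex or flex actually emit are typically compressed and laid out differently.

```python
# Hypothetical layout: one row per state, one column per input symbol (class).
NUM_STATES = 8                               # rows 0..7, enough for the example
COLUMNS = {"+": 0, "-": 1, "digit": 2}       # assumed column assignment
NO_TRANSITION = -1

table = [[NO_TRANSITION] * len(COLUMNS) for _ in range(NUM_STATES)]

# State 4 goes to state 7 on "+": put a 7 in the "+" column of row 4.
table[4][COLUMNS["+"]] = 7

def next_state(state: int, symbol: str) -> int:
    """Look up the successor of `state` on `symbol` (-1 if there is none)."""
    col = COLUMNS.get(symbol)
    if col is None:                          # symbol has no column at all
        return NO_TRANSITION
    return table[state][col]

print(next_state(4, "+"))  # 7
print(next_state(4, "-"))  # -1
```

A table lookup replaces any per-state branching, so the scanner's inner loop is the same for every automaton.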

lex.yy.c keeps track of the latest accepting state it has reached in any of the finite automata, plus the number of source characters it had read at that point. When it reaches a point where, for the next source symbol, no transition exists from any of the states it has reached in any of the finite automata, it picks the regular expression whose finite automaton contains this last accepting state, and it removes from the remaining input only those characters read up to that state.
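The longest-match bookkeeping described here can be sketched in Python. All automata run side by side; the scanner remembers the rule and character count of the last accepting state and stops as soon as every automaton is stuck. The `DFA` class, the `longest_match` function, and the two toy rules (one matching `a`, one matching `abc`) are hypothetical. When two rules accept after the same number of characters, the sketch prefers the rule listed first, which is Lex's tie-breaking convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class DFA:
    name: str                           # rule / token name
    start: int
    trans: Dict[Tuple[int, str], int]   # (state, char) -> next state
    accepting: Set[int]

def longest_match(dfas: List[DFA], text: str) -> Optional[Tuple[str, int]]:
    """Run every DFA over `text` in parallel and return (rule name, length)
    for the last accepting state reached, or None if no DFA ever accepted."""
    current: List[Optional[int]] = [d.start for d in dfas]  # None = stuck
    last: Optional[Tuple[str, int]] = None
    for count, ch in enumerate(text, start=1):
        current = [None if s is None else d.trans.get((s, ch))
                   for d, s in zip(dfas, current)]
        if all(s is None for s in current):
            break                       # no DFA has a transition: stop reading
        for d, s in zip(dfas, current):
            if s is not None and s in d.accepting:
                last = (d.name, count)  # remember rule and characters read
                break                   # on a tie, the earlier rule wins
    return last

# Hypothetical rules: A matches "a", ABC matches "abc".
rules = [
    DFA("A", 0, {(0, "a"): 1}, {1}),
    DFA("ABC", 0, {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}, {3}),
]

print(longest_match(rules, "abd"))   # ('A', 1)
print(longest_match(rules, "abcx"))  # ('ABC', 3)
print(longest_match(rules, "xyz"))   # None
```

On input `abd`, the `abc` automaton is still alive after `ab` but has no transition on `d`, so the scanner falls back to the last accepting state it saw: rule A after one character, leaving `bd` as the remaining input.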

Consider, for example, a Lex definition file containing:

    {digit}+("."{digit}+)?            {… return Number;}
    {digit}+("."{digit}+)?e{digit}+   {… return Float;}

The finite automata corresponding to the above regular expressions are: (diagrams not reproduced: DFA for Number, states 1 to 4; DFA for Float, states 1 to 6)
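The two automata can be written down as transition tables and checked against a few strings. The state numbering below follows the one used in the examples that follow (Number accepts in state 2; Float accepts in state 6), but the transitions are reconstructed from the regular expressions rather than copied from the missing diagrams, and the `digit` character class and the `accepts` helper are assumptions.

```python
DIGITS = set("0123456789")

def char_class(ch: str) -> str:
    """Map every decimal digit to the class 'digit'; other characters stand for themselves."""
    return "digit" if ch in DIGITS else ch

# Reconstructed from {digit}+("."{digit}+)?  : accepting states 2 and 4
NUMBER = {"start": 1, "accepting": {2, 4}, "trans": {
    (1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
    (3, "digit"): 4, (4, "digit"): 4}}

# Reconstructed from {digit}+("."{digit}+)?e{digit}+  : accepting state 6
FLOAT = {"start": 1, "accepting": {6}, "trans": {
    (1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
    (3, "digit"): 4, (4, "digit"): 4,
    (2, "e"): 5, (4, "e"): 5,
    (5, "digit"): 6, (6, "digit"): 6}}

def accepts(dfa: dict, text: str) -> bool:
    """True if the whole of `text` drives the DFA into an accepting state."""
    state = dfa["start"]
    for ch in text:
        state = dfa["trans"].get((state, char_class(ch)))
        if state is None:             # no transition: the string is rejected
            return False
    return state in dfa["accepting"]

print(accepts(NUMBER, "3.14"))  # True
print(accepts(FLOAT, "36e8"))   # True
print(accepts(FLOAT, "36e"))    # False
```

Note that both automata share the prefix states 1 to 4, which is why, in the examples, reading digits moves both of them forward at once.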

Example 1: Let the remaining input be 36e8=X1…
On reading the "3", lex.yy.c records that the latest accepting state encountered is state 2 in the DFA for Number, and that 1 source character has been read in reaching it. (It has also reached the non-accepting state 2 in the DFA for Float.)
On reading the "6", lex.yy.c records the same information, except that the number of characters read in reaching accepting state 2 is now 2.
On reading the "e", lex.yy.c records that it has reached state 5 in the DFA for Float, after 3 characters. State 5 is not accepting, and the DFA for Number has no "e" transition, so the latest accepting state is unchanged.
On reading the "8", lex.yy.c records that the latest accepting state is state 6 in the DFA for Float, reached after 4 characters.
On reading the "=", the 5th character read, lex.yy.c finds that state 6 has no "=" successor. The last accepting state (state 6) is in the DFA for Float and was reached after 4 characters. Hence Float is taken as matching the remaining input: the 4 characters read, 36e8, are removed from the remaining input and placed in yytext, and yyleng is set to 4.

Example 2. Now consider the case where the remaining input is 36e-45-23. Here, after reading 3 symbols (36e), the finite automaton for Float reaches state 5, which has no transition for the character "-". State 5 is not an accepting state. The last accepting state was state 2 in the finite automaton for Number, reached after two characters had been read. So lex.yy.c reports that the next token is a Number, and two symbols are removed from the remaining input, which then becomes e-45-23. The two symbols involved (36) are placed in yytext, and yyleng is set to 2.
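Both traces can be reproduced with a small Python sketch that runs the reconstructed Number and Float automata side by side and applies the longest-match rule. The rule table, the `next_token` function, and its return shape (token, yytext, yyleng, remaining input) are hypothetical stand-ins for what lex.yy.c does internally, not its real code. Passing `trace=True` prints the state of each automaton after every character (None once an automaton is stuck).

```python
from typing import Optional, Tuple

DIGITS = set("0123456789")

def char_class(ch: str) -> str:
    return "digit" if ch in DIGITS else ch

# Rules in definition-file order: (token, start state, transitions, accepting states).
# Transitions are reconstructed from the regular expressions (an assumption).
RULES = [
    ("Number", 1, {(1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
                   (3, "digit"): 4, (4, "digit"): 4}, {2, 4}),
    ("Float", 1, {(1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
                  (3, "digit"): 4, (4, "digit"): 4, (2, "e"): 5,
                  (4, "e"): 5, (5, "digit"): 6, (6, "digit"): 6}, {6}),
]

def next_token(text: str, trace: bool = False):
    """Return (token, yytext, yyleng, remaining input), or None if no rule matches."""
    states = [start for _, start, _, _ in RULES]    # None once a DFA is stuck
    last: Optional[Tuple[str, int, int]] = None     # (token, accepting state, chars read)
    for count, ch in enumerate(text, start=1):
        cls = char_class(ch)
        states = [None if s is None else trans.get((s, cls))
                  for s, (_, _, trans, _) in zip(states, RULES)]
        if all(s is None for s in states):
            break                                   # no transition in any DFA
        for s, (name, _, _, accepting) in zip(states, RULES):
            if s is not None and s in accepting:
                last = (name, s, count)
                break                               # earlier rule wins ties
        if trace:
            print(f"read {ch!r}: states={states}, last accepting={last}")
    if last is None:
        return None
    name, _, n = last
    return name, text[:n], n, text[n:]

print(next_token("36e8=X1"))    # ('Float', '36e8', 4, '=X1')
print(next_token("36e-45-23"))  # ('Number', '36', 2, 'e-45-23')
```

The character that stops the scan ("=" in the first case, "-" in the second) is examined but not consumed, which is why it is still at the front of the remaining input.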