Lecture 4 (new improved) Subset construction NFA  DFA

Slides:



Advertisements
Similar presentations
Lexical Analysis IV : NFA to DFA DFA Minimization
Advertisements

CSE 311 Foundations of Computing I
4b Lexical analysis Finite Automata
Lecture 6 Nondeterministic Finite Automata (NFA)
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan et al. Lecture-Module 5 More Lexical Analysis.
Lexical Analysis: DFA Minimization & Wrap Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.
Compiler Construction
Lexical Analysis - Scanner Computer Science Rensselaer Polytechnic Compiler Design Lecture 2.
From Cooper & Torczon1 Automating Scanner Construction RE  NFA ( Thompson’s construction )  Build an NFA for each term Combine them with  -moves NFA.
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #4 Lexical.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary  Quoted string in.
Lecture 4 RegExpr  NFA  DFA Topics Thompson Construction Subset construction Readings: 3.7, 3.6 January 23, 2006 CSCE 531 Compiler Construction.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary –Quoted string in.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Topic #3: Lexical Analysis EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Lexical Analyzer (Checker)
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
Decidable Questions About Regular languages 1)Membership problem: “Given a specification of known type and a string w, is w in the language specified?”
Lexical Analysis III : NFA to DFA DFA Minimization Lecture 5 CS 4318/5331 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper.
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
Lexical Analysis: DFA Minimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.
String Matching of Regular Expression
Pembangunan Kompilator.  A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and.
Lecture 3 RegExpr  NFA  DFA Topics Lex, Flex Thompson Construction Subset construction (maybe) Readings: , 3.7, 3.6 January 18, 2006 CSCE 531.
Lexical Analysis: DFA Minimization & Wrap Up. Automating Scanner Construction PREVIOUSLY RE  NFA ( Thompson’s construction ) Build an NFA for each term.
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
Finite Automata Chapter 1. Automatic Door Example Top View.
Lecture Notes 
CSE 311 Foundations of Computing I Lecture 24 FSM Limits, Pattern Matching Autumn 2011 CSE 3111.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
June 13, 2016 Prof. Abdelaziz Khamis 1 Chapter 2 Scanning – Part 2.
Converting Regular Expressions to NFAs Empty string   is a regular expression denoting  {  } a is a regular expression denoting {a} for any a in 
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
1/29/02CSE460 - MSU1 Nondeterminism-NFA Section 4.1 of Martin Textbook CSE460 – Computability & Formal Language Theory Comp. Science & Engineering Michigan.
Department of Software & Media Technology
COMP 3438 – Part II - Lecture 3 Lexical Analysis II Par III: Finite Automata Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Lecture 9 Symbol Table and Attributed Grammars
CIS Automata and Formal Languages – Pei Wang
Lexical analysis Finite Automata
RegExps & DFAs CS 536.
Chapter 2 Finite Automata
The time complexity for e-closure(T).
Two issues in lexical analysis
Recognizer for a Language
Review: NFA Definition NFA is non-deterministic in what sense?
Lecture 3 RegExpr  NFA  DFA
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lecture 5: Lexical Analysis III: The final bits
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Intro to Data Structures
Lecture 4: Lexical Analysis II: From REs to DFAs
This method is used for converting an
Finite Automata.
4b Lexical analysis Finite Automata
Chapter Two: Finite Automata
Finite Automata & Language Theory
Discrete Math II Howon Kim
CSCI 2670 Introduction to Theory of Computing
Automating Scanner Construction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lexical Analysis: DFA Minimization & Wrap Up
4b Lexical analysis Finite Automata
Lecture 6 NFA Subset Construction & Epsilon Transitions
CSE 311 Foundations of Computing I
Lecture 5 Scanning.
Presentation transcript:

Lecture 4 (new improved) Subset construction NFA  DFA CSCE 531 Compiler Construction Lecture 4 (new improved) Subset construction NFA  DFA Topics FLEX Subset Construction Equivalence relations, partitions … Readings: January 30, 2018

Overview Last Time Today’s Lecture References Thompson Construction: Regular Expression  NFA Today’s Lecture Rubular – regular expressions online Subset Construction: NFA  DFA Minimization of DFA well: distinguished states Lex/Flex References Chapter 2 http://www.gnu.org/software/flex/manual/html_mono/flex.html C introduction Examples

Rubular.com

RegEx Quick Reference http://rubular.com/r/11Z1trB1JI

Python Regular Expressions https://docs.python.org/3.4/library/re.html Pythex: a Python regular expression editor https://pythex.org/ POSIX: https://www.regular-expressions.info/posix.html

Python Regular Expressions import re import time filename="../../513/513F16.txt" f = open(filename, "r") regexp=re.compile(r'Degree:\s*([A-Z].*)|[1-9][0-9]?.*([A-Z][a-zA-Z-]*)') #,\s*([A-Z][a-z]*)') degreeRE=re.compile(r'Degree:\s*([A-Za-z].*)') nameRE=re.compile(r'[1-9][0-9]?\s*([A-Z][a-zA-Z-]*\,\s*[A-Za-z-]+)') https://docs.python.org/3.4/library/re.html

Python Example Continued name="unassigned" for line in f: m=regexp.match(line) if m != None: m2 = nameRE.match(line) if m2 != None: name=m2.group(1) m3 = degreeRE.match(line) if m3 != None: print (name, "\t", m3.group(1)) f.close() Python Example Continued

Example 3.14 (a | b)*abb

Transition Table

Deterministic Finite Automata (DFA) A deterministic finite automata (DFA) is an NFA such that: There are no moves on input ε, and For each state s and input symbol a , there is exactly one edge out of s labeled a .

NFA transitions versus DFA Not every state an input has a transition δ(s, a) might be empty δ(1, a) = ϕ For a given state and input there can be more than one transition δ(0, a) = { 0, 1} Epsilon transitions allowed in NFAs None in this diagram

Every DFA is an NFA A DFA is just an NFA that satisfies: No epsilon moves; formally δ(s, ε) = ϕ For each state s and input symbol a, δ(s, a) is a single state; formally | δ(s, a) | = 1 So, every language accepted by an DFA is also accepted by an NFA. Earlier we showed every language denoted by a regular expression has an NFA that accepts it. Today we show every language recognized by an NFA has a DFA that recognizes it also.

So why use an NFA and why use a DFA? NFA more expressive power DFA more efficient recognizer

Algorithm 3.18 : Simulating a DFA. INPUT: An input string x terminated by an end-of-file character eof. A DFA D with start state s0 , accepting states F , and transition function move. OUTPUT: Answer “yes” if D accepts x ; “no” otherwise. METHOD: Apply the algorithm in Fig. 3.27 to the input string x . The function move( s; c) gives the state to which there is an edge from state s on input c . The function nextChar returns the next character of the input string x .

Simulating DFA = recognizing L(DFA)

Algorithm 3.20 : The subset construction Algorithm 3.20 : The subset construction of a DFA from an NFA. INPUT: An NFA N . OUTPUT: A DFA D accepting the same language as N . METHOD: Our algorithm constructs a transition table Dtran for D . Each state of D is a set of NFA states, and we construct Dtran so D will simulate “in parallel” all possible moves N can make on a given input string.

Fig 3.31 Operations on NFA states

Fig. 3.32 the Subset Contruction initially, - closure ( s 0 ) is the only state in Dstates, and it is unmarked; while ( there is an unmarked state T in Dstates ) { mark T ; for ( each input symbol a ) { U = - closure (move( T;a) ); if ( U is not in Dstates ) add U as an unmarked state to Dstates; Dtran[T;a] = U ; } Aho, Alfred V.; Monica S. Lam; Jeffrey D. Ullman; Ravi Sethi. Compilers: Principles, Techniques, and Tools (Page 154). Pearson Education. Kindle Edition.

Fig 3.33 Computing ε-closure of a set

NFA  DFA via Subset Construction d0 = ε-closure(0) = {0, 1, 2, 4, 7}

State of DFA a b d0 = ε-closure(0) = {0, 1, 2, 4, 7} d1 = ε-closure(move(d0, a)) =ε-closure( 3, 8} ={3, 8, 6, 7, 1, 2, 4} d2 = ε-closure(move(d0, b)) =ε-closure( 5 } ={5, 6, 1, 7, 2, 4} d1={3, 8, 6, 7, 1, 2, 4} ε-closure(move(d1, a)) =ε-closure( 3, 8} d2={5, 6, 1, 2, 4}

Algorithm 3.22 : Simulating an NFA INPUT: An input string x terminated by an end-of- le character eof. An NFA N with start state s0 , accepting states F , and transition function move. OUTPUT: Answer “yes” if N accepts x ; “no” otherwise. METHOD: The algorithm keeps a set of current states S, those that are reached from s0 following a path labeled by the inputs read so far. If c is the next input character, read by the function nextChar(), then we first compute move( S; c) and then close that set using ε - closure ().

Simulating an NFA

DFA Minimization Algorithm (Author’s Slide) Discover sets of equivalent states Represent each such set with just one state Two states are equivalent (indistinguishable)if and only if: The set of paths leading to them are equivalent    , transitions on  lead to equivalent states (DFA) -transitions to distinct sets  states must be in distinct sets A partition P of S Each s  S is in exactly one set pi  P The algorithm iteratively partitions the DFA’s states

Details of the algorithm (Author’s Slide) Group states into maximal size sets, optimistically Iteratively subdivide those sets, as needed States that remain grouped together are equivalent Initial partition, P0 , has two sets: {F} & {Q-F} (D =(Q,,,q0,F)) Splitting a set (“partitioning a set by a”) Assume qa, & qb  s, and (qa,a) = qx, & (qb,a) = qy If qx & qy are not in the same set, then s must be split qa has transition on a, qb does not  a splits s One state in the final DFA cannot have two transitions on a

Minimization Algorithm P  { F, {Q-F}} while ( P is still changing) T  { } for each set S  P for each a   partition S by a into S1, and S2 Add S1 and S2 to T if T  P then P  T The partitioning step is: If δ(S) contains states in 2 equivalent sets of states. S

Building Faster Scanners from the DFA (Author’s slide) Table-driven recognizers waste a lot of effort Read (& classify) the next character Find the next state Assign to the state variable Trip through case logic in action() Branch back to the top We can do better Encode state & actions in the code Do transition tests locally Generate ugly, spaghetti-like code Takes (many) fewer operations per input character

Symbol Tables #define ENDSTR 0 #define MAXSTR 100 #include <stdio.h> struct nlist { /* basic table entry */ char *name; struct nlist *next; /*next entry in chain */ int val; }; #define HASHSIZE 100 static struct nlist *hashtab[HASHSIZE]; /* pointer table */

The Hash Function /* PURPOSE: Hash determines hash value based on the sum of the character values in the string. USAGE: n = hash(s); DESCRIPTION OF PARAMETERS: s(array of char) string to be hashed AUTHOR: Kernighan and Ritchie LAST REVISION: 12/11/83 */ hash(char *s) /* form hash value for string s */ { int hashval; for (hashval = 0; *s != '\0'; ) hashval += *s++; return (hashval % HASHSIZE); }

The lookup Function /*PURPOSE: Lookup searches for entry in symbol table and returns a pointer USAGE: np= lookup(s); DESCRIPTION OF PARAMETERS: s(array of char) string searched for AUTHOR: Kernighan and Ritchie LAST REVISION: 12/11/83*/ struct nlist *lookup(char *s) /* look for s in hashtab */ { struct nlist *np; for (np = hashtab[hash(s)]; np != NULL; np = np->next) if (strcmp(s, np->name) == 0) return(np); /* found it */ return(NULL); /* not found */ }

The install Function PURPOSE: Install checks hash table using lookup and if entry not found, it "installs" the entry. USAGE: np = install(name); DESCRIPTION OF PARAMETERS: name(array of char) name to install in symbol table AUTHOR: Kernighan and Ritchie, modified by Ron Sobczak LAST REVISION: 12/11/83 */

struct nlist *install(char *name) /* put (name) in hashtab */ { struct nlist *np, *lookup(); char *strdup(), *malloc(); int hashval; if ((np = lookup(name)) == NULL) { /* not found */ np = (struct nlist *) malloc(sizeof(*np)); if (np == NULL) return(NULL); if ((np->name = strdup(name)) == NULL) hashval = hash(np->name); np->next = hashtab[hashval]; hashtab[hashval] = np; } return(np);