Chapter 3: Scanning - Theory and Practice. Prof. Chung, 10/8/2015.



Outline
- 3.1 Overview
- 3.2 Regular Expressions
- 3.4 Finite Automata and Scanners
- 3.5 Using a Scanner Generator
  - LEX: introduced in the TA session
- 3.7 Practical Considerations
- 3.8 Translating Regular Expressions into Finite Automata
- 3.9 Summary

Overview (1)
- Formal notations for specifying the precise structure of tokens are necessary.
  - Example: a quoted string in Pascal
    - Can a string be split across lines?
    - Is a null string allowed?
    - Is .1 or 10. ok?
- The problem
  - Scanner generators: tables, programs
  - What formal notation should be used?

Overview (2)
- Role of the lexical analyzer (scanner)
  - Produce a sequence of tokens for the parser
  - Strip out comments and whitespace
  - Associate a line number with each error message
  - Expand macros
[Figure: the source program enters the Lexical Analyzer; the Parser calls getNextToken and receives a token; both use the Symbol Table; the parser's output goes on to semantic analysis.]

Regular Expressions (1)
- Tokens are built from symbols of a finite vocabulary.
- The structure of tokens is defined with regular expressions.
- Set definition
  - The sets of strings defined by regular expressions are termed regular sets.
  - ∅ is a regular expression denoting the empty set.
  - λ is a regular expression denoting the set that contains only the empty string.
  - A string s is a regular expression denoting a set containing only s.

Regular Expressions (2)
- If A and B are regular expressions, so are:
  - A | B (alternation): a regular expression formed by A or B
    - (a)|(b) = {a, b}
  - A·B or AB (concatenation): a regular expression formed by A followed by B
    - (a)(b) = {ab}
  - A* (Kleene closure): a regular expression formed by zero or more repetitions of A
    - a* = {λ, a, aa, aaa, ...}
- A more complex example
  - (a|b|c)* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, ...}

Regular Expressions (3)
- Some notational conveniences
  - P+ = PP* (at least one P)
  - Not(A) = V - A, where A is a set of characters
  - Not(S) = V* - S, where S is a set of strings
  - A^k = AA...A (k copies)
  - A? : optional, zero or one occurrence of A
- More complex examples
  - Let D = (0 | 1 | 2 | 3 | 4 | ... | 9)
  - Let L = (A | B | ... | Z | a | b | ... | z)
  - comment = -- Not(EOL)* EOL
    - ex: --hello12_34 \n
  - decimal = D+ . D+
  - ident = L (L | D | _)*
    - ex: A1a5_6
  - comments = ((# | λ) Not(#))*
    - ex: #A435#3
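The ident and decimal definitions above can be checked by hand-written recognizers. The following is only a minimal sketch: the function names is_ident and is_decimal are hypothetical, and L and D are assumed to be the ASCII letters and digits as classified by the C library in the default locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* L = letter, D = digit, following the slide's definitions. */
static int is_L(char c) { return isalpha((unsigned char)c) != 0; }
static int is_D(char c) { return isdigit((unsigned char)c) != 0; }

/* Returns 1 if s matches  ident = L (L | D | _)*  exactly, else 0. */
int is_ident(const char *s)
{
    assert(s != NULL);
    if (!is_L(s[0]))                      /* must start with one L */
        return 0;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!is_L(s[i]) && !is_D(s[i]) && s[i] != '_')
            return 0;
    }
    return 1;
}

/* Returns 1 if s matches  decimal = D+ . D+  exactly, else 0. */
int is_decimal(const char *s)
{
    size_t i = 0, first;
    assert(s != NULL);
    while (is_D(s[i])) i++;               /* first D+ */
    if (i == 0 || s[i] != '.')
        return 0;
    first = ++i;                          /* skip the period */
    while (is_D(s[i])) i++;               /* second D+ */
    return i > first && s[i] == '\0';
}
```

With these definitions, is_ident("A1a5_6") succeeds, while is_decimal rejects both ".1" and "10." because D+ requires at least one digit on each side of the period.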

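Regular Expressions (4)
- Are regular expressions as powerful as CFGs?
  - No: the language { [^i ]^i | i ≥ 1 } (i opening brackets followed by i closing brackets) cannot be described by a regular expression.
- Regular grammar: every production has the form
  - A -> a or A -> aB, or
  - A -> a or A -> Ba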

Finite Automata and Scanners (1)
- A finite automaton (FA) can be used to recognize the tokens specified by a regular expression.
- An FA consists of:
  - A finite set of states S
  - A set of input symbols Σ (the input alphabet, also written V)
  - A set of transitions (or moves) from one state to another, labeled with characters of the alphabet
  - A special start state s0 (only one)
  - A set of final, or accepting, states F
- FA = {S, Σ, s0, F, move}

Finite Automata and Scanners (2)
[Figure: legend of the transition-diagram symbols for a transition, a state, a final state and the start state.]

Finite Automata and Scanners (3)
- Example: a transition diagram
- This machine accepts (abc+)+
[Figure: transition diagram with transitions on a, b and c for (abc+)+.]

Finite Automata and Scanners (4)
- Another example: (0|1)*0(0|1)(0|1)
[Figure: transition diagram for (0|1)*0(0|1)(0|1).]

Finite Automata and Scanners (5)
- Another example: ID = L(L|D)*(_(L|D)+)*
- A data structure can be translated for many REs or FAs.
[Figure: transition diagram for ID with transitions on L, L|D and _; one final state belongs to the L(L|D)* part and one to the (_(L|D)+)* part.]
- What is the difference between the two final states? Answer: the number of times "_" occurs; item 2 = item 3.

Finite Automata and Scanners (6)
- Another example: RealLit = (D+ (λ | .)) | (D* . D+)

Finite Automata and Scanners (7)
- Two kinds of FA:
  - Deterministic: the next transition is unique
  - Nondeterministic: otherwise
[Figure: a state with two outgoing transitions labeled a. Which path should we select?]

Finite Automata and Scanners (8)
- A transition diagram and the equivalent transition table
[Figure: transition diagram with states 1 to 4 and transitions labeled /, /, Not(Eol) and Eol, next to a transition table indexed by state and character (-, Eol, a, b, ...).]

Finite Automata and Scanners (9)
- Any regular expression can be translated into a DFA that accepts the set of strings denoted by the regular expression.
- The translation can be done:
  - Automatically, by a scanner generator: LEX (TA session)
  - Manually, by a programmer
- The DFA can be coded in two forms:
  1. Table-driven, commonly produced by a scanner generator
  2. Explicit control, produced automatically or by hand

Finite Automata and Scanners (10)
- Scanner driver interpreting a transition table (table-driven)

    /* Note: CurrentChar is already set to the current input character. */
    State = StartState;
    while (TRUE) {
        NextState = T[State][CurrentChar];
        if (NextState == ERROR)
            break;
        State = NextState;
        CurrentChar = getchar();
    }
    if (is_final_state(State))
        /* Return or process valid token. */ ;
    else
        lexical_error(CurrentChar);
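To make the table-driven form concrete, here is a small self-contained sketch for one token only, a comment of the form // Not(Eol)* Eol. The state names, the character classes and the function scan_comment are assumptions made for this illustration; a table emitted by a scanner generator would cover every token and every input character.

```c
#include <assert.h>
#include <stddef.h>

enum { ERROR = 0, START = 1, SLASH1 = 2, BODY = 3, DONE = 4, NSTATES = 5 };
enum { C_SLASH, C_EOL, C_OTHER, NCLASSES };

/* Transition table T[state][character class]; DONE is the only final state. */
static const int T[NSTATES][NCLASSES] = {
    /*              '/'     Eol    other */
    /* ERROR  */ { ERROR,  ERROR, ERROR },
    /* START  */ { SLASH1, ERROR, ERROR },
    /* SLASH1 */ { BODY,   ERROR, ERROR },
    /* BODY   */ { BODY,   DONE,  BODY  },
    /* DONE   */ { ERROR,  ERROR, ERROR },
};

/* Maps an input character to its column in T. */
static int char_class(char c)
{
    if (c == '/')  return C_SLASH;
    if (c == '\n') return C_EOL;
    return C_OTHER;
}

/* Returns the length of the comment at the start of s, or 0 if there is none. */
size_t scan_comment(const char *s)
{
    int state = START;
    size_t i = 0;
    assert(s != NULL);
    while (s[i] != '\0') {
        int next = T[state][char_class(s[i])];
        if (next == ERROR)
            break;                 /* no move possible: stop scanning */
        state = next;
        i++;
    }
    return state == DONE ? i : 0;  /* accept only in the final state */
}
```

The driver loop never mentions slashes or line ends itself; changing the token definition only means changing the table, which is why generators favor this form.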

Finite Automata and Scanners (11)
- Scanner with fixed token definition (explicit control)

    if (CurrentChar == '/') {
        CurrentChar = getchar();
        if (CurrentChar == '/') {
            /* Skip the rest of the comment, up to end of line or end of input. */
            do {
                CurrentChar = getchar();
            } while (CurrentChar != '\n' && CurrentChar != EOF);
        } else {
            ungetc(CurrentChar, stdin);
            lexical_error(CurrentChar);
        }
    } else
        lexical_error(CurrentChar);
    /* Return or process valid token. */

Finite Automata and Scanners (12)
- Transducer
  - We may perform some actions during state transitions.
  - A scanner can be turned into a transducer by the appropriate insertion of actions based on state transitions.

Using a Scanner Generator
- Covered by the TA.

Practical Considerations (1)
- Reserved words
  - Usually, all keywords are reserved in order to simplify parsing.
  - If keywords were not reserved, in Pascal we could even write:
        begin
          begin; end; end; begin;
        end
        if else then if = else;
  - The problem with reserved words is that they can be too numerous.
    - COBOL has several hundred reserved words, including ZERO, ZEROS and ZEROES.
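One common way to handle reserved words is to scan every word as an identifier and then look it up in a table. The sketch below assumes a hypothetical five-word table and exact lowercase matching; it is an illustration only, since a Pascal scanner would also have to ignore case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_ID, TOK_BEGIN, TOK_END, TOK_IF, TOK_THEN, TOK_ELSE } Token;

/* Hypothetical reserved-word table; a real language has many more entries. */
static const struct { const char *text; Token tok; } reserved[] = {
    { "begin", TOK_BEGIN }, { "end",  TOK_END  }, { "if", TOK_IF },
    { "then",  TOK_THEN  }, { "else", TOK_ELSE },
};

/* Classifies an already scanned word: reserved word or ordinary identifier. */
Token classify_word(const char *lexeme)
{
    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (strcmp(lexeme, reserved[i].text) == 0)
            return reserved[i].tok;
    }
    return TOK_ID;   /* not in the table: an identifier */
}
```

A linear search is fine for a handful of words; with several hundred reserved words, as in COBOL, a hash table or a sorted table with binary search would be the usual choice.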

Practical Considerations (2)
- Compiler directives and listing source lines
  - Compiler options, e.g. optimization, profiling, etc.
    - Handled by the scanner or by semantic routines
    - Complex pragmas are treated like other statements.
  - Source inclusion
    - e.g. #include in C
    - Handled by the preprocessor or the scanner
  - Conditional compilation
    - e.g. #if, #endif in C
    - Useful for creating program versions

Practical Considerations (3)
- Entry of identifiers into the symbol table
  - Who is responsible for entering symbols into the symbol table? The scanner?
  - Consider this example:
        { int abc;
          ...
          { int abc; }
        }

Practical Considerations (4)
- How to handle end-of-file?
  - Create a special EOF token.
  - An EOF token is useful in a CFG.
- Multicharacter lookahead
  - Blanks are not significant in Fortran.
  - DO 10 I = 1,100 is the beginning of a loop.
  - DO 10 I = 1.100 is an assignment statement (to the variable DO10I).
  - A Fortran scanner can determine whether the O is the last character of a DO token only after reading as far as the comma.

Practical Considerations (5)
- Multicharacter lookahead (cont'd)
  - In Ada and Pascal, scanning 10..100 must produce three tokens: 10, .. and 100.
    - This needs two-character lookahead after the 10.
  - It is easy to build a scanner that can perform general backup.
    - If we reach a situation in which we are not in a final state and cannot scan any more characters, we extract characters from the right end of the buffer and queue them for rescanning,
    - until we reach a prefix of the scanned characters that is flagged as a valid token.

Practical Considerations (6)
- An FA that scans integer and real literals and the subrange operator
[Figure: transition diagram with transitions on D and on the period.]

    Buffered token   Token flag
    1                Integer literal
    12               Integer literal
    12.              Invalid
    12.3             Real literal
    12.3e            Invalid
    12.3e+           Invalid

Practical Considerations (7)-(13)
- An FA that scans integer and real literals and the subrange operator, traced on the input string 12.3e+q
[Figure sequence: the FA is advanced one character at a time; each frame shows the buffered token and its token flag.]
  - After 1 and after 12 the buffered token is flagged Integer literal.
  - After 12. it is flagged Invalid.
  - After 12.3 it is flagged Real literal.
  - After 12.3e and after 12.3e+ it is flagged Invalid.
  - On q the scanner cannot scan any more characters and is not in an accepting state, so backup is invoked.
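The backup idea can be sketched without an explicit buffer by remembering the last prefix that was flagged as a valid token. This is an illustration under assumptions, not the slide's FA: the exponent form e(+|-)?D+ is assumed from the table entries 12.3e and 12.3e+, and the names Kind and scan_token are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TK_NONE, TK_INT, TK_REAL, TK_DOTDOT } Kind;

/* Scans one token at the start of s. The token length is stored in *len.
   Returns TK_NONE with length 0 if no prefix of s is a valid token. */
Kind scan_token(const char *s, size_t *len)
{
    size_t i = 0;
    Kind last = TK_NONE;        /* kind of the last valid prefix seen */
    size_t last_len = 0;        /* length of that prefix */
    assert(s != NULL && len != NULL);

    if (s[0] == '.' && s[1] == '.') {       /* subrange operator */
        *len = 2;
        return TK_DOTDOT;
    }
    while (isdigit((unsigned char)s[i])) i++;            /* D+ */
    if (i > 0) { last = TK_INT; last_len = i; }
    if (i > 0 && s[i] == '.') {
        size_t j = i + 1;
        while (isdigit((unsigned char)s[j])) j++;        /* . D+ */
        if (j > i + 1) {
            last = TK_REAL; last_len = j;
            if (s[j] == 'e') {                           /* assumed exponent */
                size_t k = j + 1;
                if (s[k] == '+' || s[k] == '-') k++;
                size_t d = k;
                while (isdigit((unsigned char)s[k])) k++;
                if (k > d) last_len = k;    /* exponent has digits: accept */
            }
        }
    }
    *len = last_len;            /* back up to the last flagged prefix */
    return last;
}
```

On 12.3e+q the scan reads as far as q, fails to complete the exponent, and falls back to the real literal 12.3. On 10..100 it falls back to the integer 10, leaving .. to be scanned as the next token.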

Outline
- 3.1 Overview
- 3.2 Regular Expressions
- 3.3 Finite Automata and Scanners
- 3.4 Using a Scanner Generator
- 3.5 Practical Considerations
- 3.6 Translating Regular Expressions into Finite Automata
  - Creating Deterministic Automata
  - Optimizing Finite Automata
- 3.7 Tracing Example

Translating Regular Expressions into Finite Automata (1)
- Regular expressions are equivalent to FAs.
- The main job of a scanner generator is to transform a regular expression definition into an equivalent FA:
  regular expression -> nondeterministic FA -> deterministic FA -> optimized deterministic FA (minimized number of states)
- The important step is NFA -> DFA.

Translating Regular Expressions into Finite Automata (2)
- We can transform any regular expression into an NFA with the following properties:
  - There is a unique final state.
  - The final state has no successors.
  - Every other state has either one or two successors.
- Example: a nondeterministic finite automaton (NFA)
  - Regular expression: (a|b)*abb
  - Input: babb
[Figure: NFA for (a|b)*abb with states 0 to 3 and transitions on a and b.]

Translating Regular Expressions into Finite Automata (3)
- We need to review the definition of a regular expression:
  - Item 1: λ, the null string
  - Item 2: a, a character of the vocabulary
  - Item 3: |, the "or" operation. Example: A|B
  - Item 4: ·, the operation of catenation. Example: AB
  - Item 5: *, the operation of repetition. Example: A*

Translating Regular Expressions into Finite Automata (4)
- NFA for λ (the null string)
- NFA for a (a single character of the vocabulary)
[Figure: the two basic NFAs, one with a λ-transition and one with a transition on a.]

Translating Regular Expressions into Finite Automata (5)
- NFA for A | B
[Figure: the NFA for A and the NFA for B combined in parallel with λ-transitions.]

Translating Regular Expressions into Finite Automata (6)
- NFA for A · B
[Figure: the NFA for A followed by the NFA for B.]

Translating Regular Expressions into Finite Automata (7)
- NFA for A*
[Figure: the NFA for A with λ-transitions that allow zero repetitions or one or more repetitions.]

Translating Regular Expressions into Finite Automata (8)-(10)
- Construct an NFA for the regular expression 01*|1 = (0(1*))|1
[Figure sequence: the NFA is built piece by piece: first 1*, then 0 connected in front of it, then the alternative |1.]

Translating Regular Expressions into Finite Automata (11)
- What is the problem with an NFA?
  - Answer: it may be ambiguous, which makes it difficult to program.
- A nondeterministic finite automaton (NFA): (a|b)*abb
  - Input: babb. After processing ba, more than one transition is possible. Which one should we select?
[Figure: NFA for (a|b)*abb with states 0 to 3.]

Translating Regular Expressions into Finite Automata (12)
- A deterministic finite automaton (DFA): b*abb
  - Input: babb. After processing ba there is no ambiguity: the path is unique.
[Figure: DFA for b*abb with states 0 to 3.]

Creating Deterministic Automata (1)
- The transformation from an NFA N to an equivalent DFA M works by what is sometimes called the subset construction.
- The following slides work through each step on an example.
- Initial NFA: 01*|1 = (0(1*))|1
[Figure: NFA for 01*|1 with numbered states and λ-transitions.]

Creating Deterministic Automata (2)
- Step 1: compute the λ-closure of every state of the NFA.
   1. λ-closure(1) = {1, 2, 8}
   2. λ-closure(2) = {2}
   3. λ-closure(3) = {3, 4, 5, 7, 10}
   4. λ-closure(4) = {4, 5, 7, 10}
   5. λ-closure(5) = {5}
   6. λ-closure(6) = {5, 6, 7, 10}
   7. λ-closure(7) = {7, 10}
   8. λ-closure(8) = {8}
   9. λ-closure(9) = {9, 10}
  10. λ-closure(10) = {10}
[Figure sequence: each closure is highlighted in turn on the NFA diagram.]
- Example: λ-closure(3) starts with state 3, then adds state 4, then states 5 and 7, then state 10, after which nothing more can be added.
- Closures that are subsets of another closure are then deleted, which leaves the closures of states 1, 3, 6 and 9.
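The closure computation can be sketched with state sets stored as bit masks. The λ-edges below are an assumption: the NFA diagram did not survive in this transcript, so the edges were reconstructed to reproduce the closures listed for states 1, 3, 6 and 9. The names lambda_edges and lambda_closure are hypothetical.

```c
#include <assert.h>

#define NSTATES 10                 /* NFA states are numbered 1..10 */
#define BIT(s) (1u << (s))         /* bit s represents state s */

/* lambda_edges[s] = states reachable from s by ONE lambda-transition.
   Assumed edges, reconstructed from the closure list on the slide. */
static const unsigned lambda_edges[NSTATES + 1] = {
    [1] = BIT(2) | BIT(8),
    [3] = BIT(4),
    [4] = BIT(5) | BIT(7),
    [6] = BIT(5) | BIT(7),
    [7] = BIT(10),
    [9] = BIT(10),
};

/* Returns the lambda-closure of the state set `set`. */
unsigned lambda_closure(unsigned set)
{
    unsigned closure = set, previous;
    assert((set & ~0x7FEu) == 0);  /* only bits 1..10 may be set */
    do {                           /* repeat until no new state is added */
        previous = closure;
        for (int s = 1; s <= NSTATES; s++) {
            if (closure & BIT(s))
                closure |= lambda_edges[s];
        }
    } while (closure != previous);
    return closure;
}
```

For instance, lambda_closure(BIT(3)) grows from {3} to {3, 4}, then picks up 5 and 7, then 10, and stops, which mirrors the chain shown for state 3.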

Creating Deterministic Automata (3)
- Step 1:
  - The initial state of M is the set of states reachable from the initial state of N by λ-transitions.
  - This set is usually called the λ-closure or ε-closure.
  - The example above applies this computation to every state.

Creating Deterministic Automata (4)
- Step 2: create the successor states.
  - Take any state S of M and any character c, and compute S's successor under c.
  - S is identified with some set of N's states, {n1, n2, ...}.
  - Find all possible successor states of {n1, n2, ...} under c.
  - This gives a set {m1, m2, ...}.
  - T = close({m1, m2, ...})
[Figure: S = {n1, n2, ...} with a transition to T = close({m1, m2, ...}).]

Creating Deterministic Automata (7)
- Step 2, as pseudocode:

    void make_deterministic(nondeterministic_fa N, deterministic_fa *M)
    {
        set_of_fa_states T;

        M->initial_state = SET_OF(N.initial_state);
        close(&M->initial_state);
        Add M->initial_state to M->states;
        while (states or transitions can be added) {
            choose S in M->states and c in Alphabet;
            T = SET_OF(y in N.states SUCH_THAT x->y under c for some x in S);
            close(&T);
            if (T not in M->states)
                add T to M->states;
            Add the transition to M->transitions: S->T under c;
        }
        M->final_states = SET_OF(S in M->states SUCH_THAT N.final_state in S);
    }

Creating Deterministic Automata (5)
- Step 2: first rename the remaining closures to simplify the work flow.
  - λ-closure(1) = {1, 2, 8}, so A = {1, 2, 8}
  - λ-closure(3) = {3, 4, 5, 7, 10}, so B = {3, 4, 5, 7, 10}
  - λ-closure(6) = {5, 6, 7, 10}, so C = {5, 6, 7, 10}
  - λ-closure(9) = {9, 10}, so D = {9, 10}

Creating Deterministic Automata (6)
- DFA states: A = {1, 2, 8} (start), B = {3, 4, 5, 7, 10}, C = {5, 6, 7, 10}, D = {9, 10}
[Figure: the resulting DFA over the states A, B, C and D. D has no outgoing transitions (out-degree 0) and is final.]


Optimizing Finite Automata (1)
- Minimize the number of states.
  - Every DFA has a unique smallest equivalent DFA.
  - Given a DFA M, we use its transition table to construct the equivalent minimal DFA.
  - Initially, we draw the transition table from the DFA diagram.
[Figure: DFA with states A, B, C and D and transitions on 0 and 1.]

    State   0   1
    A       B   D
    B       -   C
    C       -   C
    D       -   -

- A is the start state; B, C and D are final states.

Optimizing Finite Automata (2)
- Minimize the number of states.
  - B is equivalent to C: both are final states and both have the same transitions, so B can be merged with C.

    State    0        1
    A        {B,C}    D
    {B,C}    -        {B,C}
    D        -        -

[Figure: the new DFA with states A, {B,C} and D.]
- A is the start state; {B,C} and D are final states.
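The merge of B and C can be reproduced by a simple partition refinement: start with final versus non-final states, then keep splitting classes whose members disagree on some transition. This is a sketch of one standard approach, not necessarily the procedure the slides have in mind; the table encoding and the function minimize are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N 4                        /* DFA states: A=0, B=1, C=2, D=3 */
#define NSYMS 2                    /* alphabet {0, 1} */
#define NONE (-1)                  /* no transition */

/* Transition table of the example DFA. */
static const int next_state[N][NSYMS] = {
    /* A */ { 1,    3    },
    /* B */ { NONE, 2    },
    /* C */ { NONE, 2    },
    /* D */ { NONE, NONE },
};
static const int is_final[N] = { 0, 1, 1, 1 };

/* Fills group[s] with the class of state s in the minimal DFA and
   returns the number of classes. */
int minimize(int group[N])
{
    int ngroups = 2, changed;
    assert(group != NULL);
    for (int s = 0; s < N; s++)
        group[s] = is_final[s];            /* initial split: final or not */
    do {
        int newgroup[N], count = 0;
        for (int s = 0; s < N; s++) {
            int t;
            /* Look for an earlier state in the same class whose
               transitions lead to the same classes. */
            for (t = 0; t < s; t++) {
                int same = group[t] == group[s];
                for (int c = 0; same && c < NSYMS; c++) {
                    int a = next_state[s][c], b = next_state[t][c];
                    int ga = (a == NONE) ? NONE : group[a];
                    int gb = (b == NONE) ? NONE : group[b];
                    same = (ga == gb);
                }
                if (same) break;
            }
            newgroup[s] = (t < s) ? newgroup[t] : count++;
        }
        changed = (count != ngroups);      /* a class was split */
        ngroups = count;
        for (int s = 0; s < N; s++)
            group[s] = newgroup[s];
    } while (changed);
    return ngroups;
}
```

On the example table the first pass separates D from B and C, because D has no transition on 1, and the second pass changes nothing, so three classes remain: {A}, {B, C} and {D}.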

Additional
- Simplifying rules (removing parentheses)
  - "*" has the highest precedence and is left associative.
  - Concatenation has the second highest precedence and is left associative.
  - "|" has the lowest precedence and is left associative.
  - E.g., (a)|((b)*(c)) == a|b*c


Tracing Example (1)
- Review of the steps of a scanner generator:
  regular expression -> nondeterministic FA -> deterministic FA -> optimized deterministic FA (minimized number of states)
- The important step is NFA -> DFA.

Tracing Example (2)
- Regular expressions (IF and IFA):

    if                  {return IF;}
    [a-z][a-z0-9]*      {return ID;}
    [0-9]+              {return NUM;}
    .                   {error();}

Tracing Example (3)
- Translate from RE to NFA (the first step: regular expression -> nondeterministic FA).

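Tracing Example (4)
- Rule: if {return IF;}
[Figure: the NFA for the symbol i (start state 1 to state 2 on i), the NFA for the symbol f, and the NFA for the regular expression if (state 1 to 2 on i, then 2 to 3 on f), whose final state is labeled IF.]

Tracing Example (5)
- Rule: [a-z][a-z0-9]* {return ID;}
[Figure: NFA with a transition on a-z followed by a loop on a-z and 0-9; the final state is labeled ID.]

Tracing Example (6)
- Rule: [0-9]+ {return NUM;}
[Figure: NFA with transitions on 0-9; the final state is labeled NUM.]

Tracing Example (9)
[Figure: the NFAs for IF, ID, NUM and error (any character but \n) shown together.]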

Tracing Example (10)
- Translate from NFA to DFA (the second step: nondeterministic FA -> deterministic FA).

Tracing Example (11)
[Figure: the full NFA diagram, combining the IF, ID, NUM and error automata under one start state. Special case: handled in the final state.]

Tracing Example (12)
- λ-closures of the full NFA:
   1. λ-closure(1) = {1, 4, 9, 14}
   2. λ-closure(2) = {2}
   3. λ-closure(3) = {3}
   4. λ-closure(5) = {5, 6, 8}
   5. λ-closure(7) = {7, 8}
   6. λ-closure(8) = {6, 8}
   7. λ-closure(10) = {10, 11, 13}
   8. λ-closure(12) = {12, 13}
   9. λ-closure(13) = {11, 13}
  10. λ-closure(15) = {15}

Tracing Example (13)-(24)
- The DFA start state is λ-closure(1) = {1, 4, 9, 14}. For each character class we compute move and then the λ-closure:
  - move({1, 4, 9, 14}, a-h) = {5, 15}; λ-closure({5, 15}) = {5, 6, 8, 15}
  - move({1, 4, 9, 14}, i) = {2, 5, 15}; λ-closure({2, 5, 15}) = {2, 5, 6, 8, 15}
  - move({1, 4, 9, 14}, j-z) = {5, 15}; λ-closure({5, 15}) = {5, 6, 8, 15}
  - move({1, 4, 9, 14}, 0-9) = {10, 15}; λ-closure({10, 15}) = {10, 11, 13, 15}
  - move({1, 4, 9, 14}, other) = {15}; λ-closure({15}) = {15}
[Figure sequence: the partial DFA grows by one transition (a-h, i, j-z, 0-9, other) per frame, next to the full NFA.]
- The analysis for {1, 4, 9, 14} is complete. We mark it and pick another state of the DFA to analyze. (Practice)

Tracing Example (25)
[Figure: the resulting DFA, with transitions on a-h, i, j-z, 0-9, f, a-e, g-z and other, and final states labeled ID, IF, NUM and error.]
- See p. 118 of Aho-Sethi-Ullman and p. 29 of Appel.

Tracing Example (26)
- Minimize the DFA (the last step: deterministic FA -> optimized deterministic FA with a minimized number of states).

Tracing Example (27)
- Transition table of the DFA with states A to H:

    State   0-9   a-e   f   g-h   i   j-z   other
    A       D     C     C   C     B   C     E
    B       G     G     F   G     G   G     -
    C       G     G     G   G     G   G     -
    D       H
    E
    F       G     G     G   G     G   G     -
    G       G     G     G   G     G   G     -
    H       H

Tracing Example (28)
- New transition table 1:

    State   0-9   a-e   f   g-h   i   j-z   other
    A       D     C     C   C     B   C     E
    B       C     C     C   C     C   C     -
    C       C     C     C   C     C   C     -
    D       D
    E

Tracing Example (29)
- New transition table 2:

    State   0-9   a-e   f   g-h   i   j-z   other
    A       D     B     B   B     B   B     E
    B       B     B     B   B     B   B     -
    D       D
    E

Tracing Example (30)
- Merged states: B = C = F = G and D = H.
[Figure: the new DFA with states A, B, D and E; A goes to B on a-z, to D on 0-9 and to E on other; B loops on a-z and 0-9; D loops on 0-9.]
- IF can be handled by look-ahead programming.

End of Chapter 3. Any questions?

In-class quiz (1+)
- What is the optimized DFA for 1+?

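Answer:
1. λ-closure(1) = {1, 2}
2. λ-closure(2) = {2}
3. λ-closure(3) = {3, 4, 2}
4. λ-closure(4) = {4, 2}
[Figure: NFA with states 1 to 4.]
- 1+ = 1 1* (this method can be used)
- A = λ-closure(1) = {1, 2} and B = λ-closure(3) = {3, 4, 2}

    State   0   1
    A       -   B
    B       -   B

[Figure: DFA with start state A = {1, 2} going to B = {3, 4, 2} on 1, and B looping on 1.]
- The DFA cannot be optimized (merged) any further, because A is the non-final start state and B is a final state.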