
1 Lexical Analysis Cheng-Chia Chen

2 Outline
• Introduction to lexical analyzer
• Tokens
• Regular expressions (RE)
• Finite automata (FA)
  » deterministic and nondeterministic finite automata (DFA and NFA)
  » from RE to NFA
  » from NFA to DFA
  » from DFA to optimized DFA

3 Outline
• Informal sketch of lexical analysis
  » Identifies tokens in the input string
• Issues in lexical analysis
  » Lookahead
  » Ambiguities
• Specifying lexers
  » Regular expressions
  » Examples of regular expressions

4 The Structure of a Compiler
Source → Lexical analysis → Tokens → Parsing → Interm. Language → Optimization → Code Gen. → Machine Code

5 Lexical Analysis
• What do we want to do? Example:
  if (i == j)
    z = 0;
  else
    z = 1;
• The input is just a sequence of characters:
  \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Goal: Partition the input string into substrings
  » And determine the categories (tokens) to which the substrings belong

6 What's a Token?
• The output of lexical analysis is a stream of tokens
• A token is a syntactic category
  » In English: noun, verb, adjective, …
  » In a programming language: Identifier, Integer, Keyword, Whitespace, …
• The parser relies on the token distinctions: identifiers are treated differently than keywords

7 Aspects of a Token
• Language view: a set of strings (belonging to that token)
  » [while], [identifier], [arithOp]
• Pattern (grammar): a rule defining a token
  » [while]: while
  » [identifier]: a letter followed by letters and digits
  » [arithOp]: + or - or * or /
• Lexeme (member): a string matched by the pattern of a token
  » while, var23, count, +, *

8 Attributes of Tokens
• Attributes are used to distinguish different lexemes in a token
  » [while, ]   (keywords need no attribute)
  » [identifier, var35]
  » [arithOp, +]
  » [integer, 26]
  » [string, "26"]
• Positional information: start/end line/position
• Tokens affect syntax analysis; attributes affect semantic analysis and error handling

9 Lexical Analyzer: Implementation
• An implementation must do two things:
  1. Recognize the substrings corresponding to tokens
  2. Return the attributes (lexeme and positional information) of the token
     – The lexeme is either the substring or some data object constructed from the substring.

10 Example
• Input line: \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Token-lexeme pairs returned by the lexer:
  » [Whitespace, "\t"]
  » [if, ]
  » [OpenPar, "("]
  » [Identifier, "i"]
  » [Relation, "=="]
  » [Identifier, "j"]
  » …
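The recognize-and-return loop that produces this token stream can be sketched in Python. This is an illustrative sketch, not the course's implementation: the token names and patterns below are assumptions chosen to match the slide's example.

```python
import re

# One pattern per token category, tried in order (illustrative set).
# The \b after keywords keeps "iffy" from matching the If rule.
TOKEN_SPEC = [
    ("Whitespace", r"[ \t\n]+"),
    ("If",         r"if\b"),
    ("Else",       r"else\b"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer",    r"[0-9]+"),
    ("Relation",   r"=="),   # listed before Assign so "==" wins over "="
    ("Assign",     r"="),
    ("OpenPar",    r"\("),
    ("ClosePar",   r"\)"),
    ("Semi",       r";"),
]

def tokenize(src):
    """Return [token, lexeme] pairs, mimicking the slide's output."""
    pos, out = 0, []
    while pos < len(src):
        for name, pat in TOKEN_SPEC:
            m = re.compile(pat).match(src, pos)
            if m:
                out.append([name, m.group()])
                pos = m.end()
                break
        else:
            raise SyntaxError(f"no token matches at position {pos}")
    return out

assert tokenize("\tif (i == j)")[:5] == [
    ["Whitespace", "\t"], ["If", "if"], ["Whitespace", " "],
    ["OpenPar", "("], ["Identifier", "i"]]
```

Note this sketch uses first-rule-wins at each position; a production lexer would also apply the maximal-munch rule discussed later in the deck.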

11 Lexical Analyzer: Implementation
• The lexer usually discards "uninteresting" tokens that don't contribute to parsing.
• Examples: Whitespace, Comments
• Question: What happens if we remove all whitespace and all comments prior to lexing?

12 Lexical Analysis in FORTRAN
• FORTRAN rule: Whitespace is insignificant
• E.g., VAR1 is the same as VA R1
• Footnote: FORTRAN's whitespace rule was motivated by the inaccuracy of punch card operators

13 A terrible design! Example
• Consider
  » DO 5 I = 1,25
  » DO 5 I = 1.25
• The first is DO 5 I = 1, 25 (a DO-loop statement)
• The second is DO5I = 1.25 (an assignment to the variable DO5I)
• Reading left-to-right, we cannot tell whether DO5I is a variable or a DO statement until the "," is reached

14 Lexical Analysis in FORTRAN. Lookahead.
• Two important points:
  1. The goal is to partition the string. This is implemented by reading left-to-right, recognizing one token at a time.
  2. "Lookahead" may be required to decide where one token ends and the next token begins.
• Even our simple example has lookahead issues:
  » i vs. if
  » = vs. ==

15 Lexical Analysis in PL/I
• PL/I keywords are not reserved:
  IF ELSE THEN THEN = ELSE; ELSE ELSE = THEN
• PL/I declarations:
  DECLARE (ARG1,..., ARGN)
• Can't tell whether DECLARE is a keyword or an array reference until after the ")".
  » Requires arbitrary lookahead!

16 Review
• The goal of lexical analysis is to
  » Partition the input string into lexemes
  » Identify the token of each lexeme
• Left-to-right scan => lookahead sometimes required

17 Next
• We still need
  » A way to describe the lexemes of each token
  » A way to resolve ambiguities
    – Is if two variables i and f?
    – Is == two equal signs = =?

18 Regular Expressions and Regular Languages
• There are several formalisms for specifying tokens
• Regular languages are the most popular
  » Simple and useful theory
  » Easy to understand
  » Efficient implementations

19 Languages
• Def. Let Σ be a set of characters. A language over Σ is a set of strings of characters drawn from Σ (Σ is called the alphabet).

20 Examples of Languages
• Alphabet = English characters; Language = English words
• Not every string of English characters is an English word
  » likes, school, … (words)
  » beee, yykk, … (not words)
• Alphabet = ASCII; Language = C programs
• Note: the ASCII character set is different from the English character set

21 Notation
• Languages are sets of strings
• We need some notation for specifying which sets we want
• For lexical analysis we care about regular languages, which can be described using regular expressions

22 Regular Expressions and Regular Languages
• Each regular expression is a notation for a regular language (a set of words)
• If A is a regular expression, then we write L(A) for the language denoted by A

23 Atomic Regular Expressions
• Single character c:
  L(c) = { c }   (for any c ∈ Σ)
• Epsilon (empty string) ε:
  L(ε) = { ε }

24 Compound Regular Expressions
• Union (or choice): A | B
  L(A | B) = { s | s ∈ L(A) or s ∈ L(B) }
• Concatenation: A•B (where A and B are reg. exp.)
  L(A•B) = { αβ | α ∈ L(A) and β ∈ L(B) }
• Note:
  » A•B (set concatenation) and α•β (string concatenation) will be abbreviated to AB and αβ, respectively.
• Examples:
  » if | then | else = { if, then, else }
  » 0 | 1 | … | 9 = { 0, 1, …, 9 }
  » Another example: (0 | 1)(0 | 1) = { 00, 01, 10, 11 }

25 More Compound Regular Expressions
• So far we have no notation for infinite languages
• Iteration: A*
  L(A*) = { ε } ∪ L(A) ∪ L(AA) ∪ L(AAA) ∪ …
• Examples:
  » 0* = { ε, 0, 00, 000, … }
  » 1 0* = { strings starting with 1 and followed by 0's }
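The three language operations above (union, concatenation, iteration) can be checked directly on small finite sets in Python. This is a sketch, not part of the slides; since L(A*) is infinite, the star is truncated to at most n repetitions.

```python
from itertools import product

def union(A, B):
    """L(A | B) = set union."""
    return A | B

def concat(A, B):
    """L(AB) = { ab : a in A, b in B } (string concatenation)."""
    return {a + b for a, b in product(A, B)}

def star(A, n):
    """L(A*) truncated: the empty string plus up to n repetitions of A."""
    out, layer = {""}, {""}
    for _ in range(n):
        layer = concat(layer, A)
        out |= layer
    return out

bit = {"0", "1"}
assert concat(bit, bit) == {"00", "01", "10", "11"}   # the (0|1)(0|1) example
assert star({"0"}, 3) == {"", "0", "00", "000"}       # prefix of L(0*)
```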

26 Example: Keyword
• Keyword: else or if or begin …
  else | if | begin | …

27 Example: Integers
• Integer: a non-empty string of digits
  (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)(0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)*
• Problem: reusing such a complicated expression is tedious
• Improvement: define intermediate regular expressions
  digit  = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  number = digit digit*
• Abbreviation: A+ = A A*

28 Regular Definitions
• Names for regular expressions:
  » d1 → r1
  » d2 → r2
  » ...
  » dn → rn
  where each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}
• Note: recursion is not allowed.

29 Example
• Identifier: strings of letters or digits, starting with a letter
  digit      = 0 | 1 | ... | 9
  letter     = A | … | Z | a | … | z
  identifier = letter (letter | digit)*
• Is letter (letter* | digit*) the same?
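The regular definitions above, and the slide's closing question, can be tested with Python's re module by spelling each definition out as a regex string (an illustrative sketch; the variable names are mine, not the slides'):

```python
import re

digit      = r"[0-9]"
letter     = r"[A-Za-z]"
identifier = rf"{letter}({letter}|{digit})*"

def matches(rexp, s):
    """True iff the whole string s is in L(rexp)."""
    return re.fullmatch(rexp, s) is not None

assert matches(identifier, "var23")
assert not matches(identifier, "23var")   # must start with a letter

# letter (letter* | digit*) is NOT the same language:
# after the first letter it allows only letters, or only digits,
# so it cannot match a mixed identifier such as "a1b".
alt = rf"{letter}({letter}*|{digit}*)"
assert matches(identifier, "a1b")
assert not matches(alt, "a1b")
```

So the answer to the slide's question is no: the two expressions denote different languages.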

30 Example: Whitespace
• Whitespace: a non-empty sequence of blanks, newlines, CRLFs, and tabs
  WS = (' ' | \t | \n | \r\n)+

31 Example: Email Addresses
• Consider Σ = letters ∪ { '.', '@' }
  name    = letter+
  address = name '@' name ('.' name)*

32 Notational Shorthands
• One or more instances
  » r+ = r r*
  » r* = r+ | ε
• Zero or one instance
  » r? = r | ε
• Character classes
  » [abc]   = a | b | c
  » [a-z]   = a | b | ... | z
  » [ac-f]  = a | c | d | e | f
  » [^ac-f] = Σ – [ac-f]

33 Summary
• Regular expressions describe many useful languages
• Regular languages are a language specification
  » We still need an implementation
• Problem: Given a string s and a rexp R, is s ∈ L(R)?

34 Implementation of Lexical Analysis

35 Outline
• Specifying lexical structure using regular expressions
• Finite automata
  » Deterministic Finite Automata (DFAs)
  » Non-deterministic Finite Automata (NFAs)
• Implementation of regular expressions
  RegExp => NFA => DFA => optimized DFA => Tables

36 Regular Expressions in Lexical Specification
• Last lecture: a specification for the predicate s ∈ L(R)
• But a yes/no answer is not enough!
• Instead: partition the input into lexemes
• We will adapt regular expressions to this goal

37 Regular Expressions => Lexical Spec. (1)
1. Select a set of tokens: Number, Keyword, Identifier, ...
2. Write a rexp for the lexemes of each token:
   Number     = digit+
   Keyword    = if | else | …
   Identifier = letter (letter | digit)*
   OpenPar    = '('
   …

38 Regular Expressions => Lexical Spec. (2)
3. Construct R, matching all lexemes for all tokens:
   R = Keyword | Identifier | Number | …
     = R1 | R2 | R3 | …
• Facts: If s ∈ L(R) then s is a lexeme
  » Furthermore, s ∈ L(Ri) for some "i"
  » This "i" determines the token that is reported

39 Regular Expressions => Lexical Spec. (3)
4. Let the input be x1 … xn (x1 ... xn are characters in the language alphabet)
   For 1 ≤ i ≤ n check: x1 … xi ∈ L(R)?
5. It must be that x1 … xi ∈ L(Rj) for some j
6. Remove x1 … xi from the input and go to (4)

40 How to Handle Spaces and Comments?
1. We could create a token Whitespace:
   Whitespace = (' ' | '\n' | '\r' | '\t')+
   » We could also add comments in there
   » An input "\t\n 5555 " is transformed into Whitespace Integer Whitespace
2. Lexer skips spaces (preferred)
   » Modify step 5 from before as follows: it must be that xk ... xi ∈ L(Rj) for some j, such that x1 ... xk-1 ∈ L(Whitespace)
   » The parser is not bothered with spaces

41 Ambiguities (1)
• There are ambiguities in the algorithm
• How much input is used? What if
  » x1 … xi ∈ L(R) and also
  » x1 … xK ∈ L(R)
• Rule: pick the longest possible substring
  » The maximal match

42 Ambiguities (2)
• Which token is used? What if
  » x1 … xi ∈ L(Rj) and also
  » x1 … xi ∈ L(Rk)
• Rule: use the rule listed first (j if j < k)
• Example:
  » R1 = Keyword and R2 = Identifier
  » "if" matches both
  » Treat "if" as a keyword, not an identifier
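The two disambiguation rules (maximal match first, then earliest-listed rule) can be sketched together in a few lines of Python. The rule set here is illustrative, not the course's:

```python
import re

# Keyword is listed before Identifier, so it wins ties.
RULES = [("Keyword", r"if|else"),
         ("Identifier", r"[a-z]+"),
         ("Number", r"[0-9]+")]

def next_token(src, pos):
    """Return (token name, end position) of the best match at pos.
    Longest match wins; among equally long matches, the first-listed rule."""
    best = None
    for name, pat in RULES:
        m = re.compile(pat).match(src, pos)
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best

# "if" matches Keyword and Identifier at the same length -> Keyword (listed first)
assert next_token("if", 0) == ("Keyword", 2)
# "iffy" matches Identifier with length 4 > Keyword's 2 -> maximal match wins
assert next_token("iffy", 0) == ("Identifier", 4)
```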

43 Error Handling
• What if no rule matches a prefix of the input?
• Problem: we can't just get stuck …
• Solution:
  » Write a rule matching all "bad" strings
  » Put it last
• Lexer tools allow the writing of:
  R = R1 | ... | Rn | Error
  » Token Error matches if nothing else matches

44 Summary
• Regular expressions provide a concise notation for string patterns
• Their use in lexical analysis requires small extensions
  » To resolve ambiguities
  » To handle errors
• Good algorithms are known (next)
  » Require only a single pass over the input
  » Few operations per character (table lookup)

45 Finite Automata
• Regular expressions = specification
• Finite automata = implementation
• A finite automaton consists of
  » An input alphabet Σ
  » A set of states S
  » A start state n
  » A set of accepting states F ⊆ S
  » A set of transitions: state →input state

46 Finite Automata
• Transition: s1 →a s2
• Read as: in state s1, on input "a", go to state s2
• At end of input (or if no transition is possible):
  » If in an accepting state => accept
  » Otherwise => reject

47 Finite Automata State Graphs
• A state: a circle
• The start state: marked by an incoming arrow
• An accepting state: a double circle
• A transition: an edge labeled with an input symbol a

48 A Simple Example
• A finite automaton that accepts only "1"
  (a single edge labeled 1 from the start state to an accepting state)

49 Another Simple Example
• A finite automaton accepting any number of 1's followed by a single 0
• Alphabet: {0, 1}
  (a self-loop labeled 1 on the start state, and an edge labeled 0 to an accepting state)

50 And Another Example
• Alphabet: {0, 1}
• What language does this recognize? (diagram omitted)

51 And Another Example
• Alphabet still {0, 1}
• The operation of the automaton is not completely defined by the input
  » On input "11" the automaton could be in either state
  (two edges labeled 1 leave the start state)

52 Epsilon Moves
• Another kind of transition: ε-moves
  A →ε B
• The machine can move from state A to state B without reading input

53 Deterministic and Nondeterministic Automata
• Deterministic Finite Automata (DFA)
  » One transition per input per state
  » No ε-moves
• Nondeterministic Finite Automata (NFA)
  » Can have multiple transitions for one input in a given state
  » Can have ε-moves
• Finite automata have finite memory
  » Need only encode the current state

54 Execution of Finite Automata
• A DFA can take only one path through the state graph
  » Completely determined by the input
• NFAs can choose
  » Whether to make ε-moves
  » Which of multiple transitions to take for a single input

55 Acceptance of NFAs
• An NFA can get into multiple states at once
• Rule: an NFA accepts an input iff it can get into a final state

56 Acceptance of a Finite Automaton
• An FA (DFA or NFA) accepts an input string s iff there is some path in the transition diagram from the start state to some final state such that the edge labels along this path spell out s

57 NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of languages (the regular languages)
• DFAs are easier to implement
  » There are no choices to consider

58 NFA vs. DFA (2)
• For a given language, the NFA can be simpler than the DFA
• The DFA can be exponentially larger than the NFA

59 Operations on NFA States
• ε-closure(s): the set of NFA states reachable from NFA state s on ε-transitions alone
• ε-closure(S): the set of NFA states reachable from some NFA state s in S on ε-transitions alone
• move(S, c): the set of NFA states to which there is a transition on input symbol c from some NFA state s in S
• Note:
  » ε-closure(S) = ∪s∈S ε-closure(s)

60 Computing ε-closure
• Input: an NFA and a set S of NFA states.
• Output: T = ε-closure(S).
  begin
    push all states in S onto stack;
    T := S;
    while stack is not empty do begin
      pop t, the top element, off of stack;
      for each state u with an edge from t to u labeled ε do
        if u is not in T then begin
          add u to T;
          push u onto stack
        end
    end;
    return T
  end.
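The stack-based ε-closure above translates almost line-for-line into Python. A sketch; the representation of ε-edges as a dict from state to successor set is an assumption, not the slides' notation:

```python
def eps_closure(S, eps_edges):
    """eps_edges maps a state to the set of states reachable on one ε-edge.
    Returns the set of states reachable from S on ε-transitions alone."""
    T, stack = set(S), list(S)
    while stack:
        t = stack.pop()                      # pop the top element
        for u in eps_edges.get(t, ()):       # each ε-edge t -> u
            if u not in T:
                T.add(u)
                stack.append(u)
    return T

eps = {"A": {"B"}, "B": {"C"}}
assert eps_closure({"A"}, eps) == {"A", "B", "C"}
assert eps_closure({"C"}, eps) == {"C"}      # no outgoing ε-edges
```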

61 Simulating an NFA
• Input: an input string x ended with eof, and an NFA with start state s0 and final states F.
• Output: the answer "yes" if the NFA accepts x, "no" otherwise.
  begin
    S := ε-closure({s0});
    c := next_symbol();
    while c != eof do begin
      S := ε-closure(move(S, c));
      c := next_symbol()
    end;
    if S ∩ F ≠ ∅ then return "yes" else return "no"
  end.
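The same simulation in Python (a sketch; the encoding of transitions as a (state, symbol) → set-of-states dict is an assumption):

```python
def eps_closure(S, eps):
    """Worklist ε-closure, as on the previous slide."""
    T, stack = set(S), list(S)
    while stack:
        t = stack.pop()
        for u in eps.get(t, ()):
            if u not in T:
                T.add(u)
                stack.append(u)
    return T

def nfa_accepts(string, start, finals, delta, eps):
    """delta maps (state, symbol) -> set of states; eps maps state -> set."""
    S = eps_closure({start}, eps)
    for c in string:
        moved = set().union(*[delta.get((s, c), set()) for s in S])
        S = eps_closure(moved, eps)
    return bool(S & finals)            # accept iff S ∩ F ≠ ∅

# NFA for 1*0 (slide 49): loop on 1 in state A, a 0-edge to accepting state B
delta = {("A", "1"): {"A"}, ("A", "0"): {"B"}}
assert nfa_accepts("1110", "A", {"B"}, delta, {})
assert not nfa_accepts("01", "A", {"B"}, delta, {})
```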

62 Regular Expressions to Finite Automata
• High-level sketch:
  Lexical Specification → Regular expressions → NFA → DFA → Optimized DFA → Table-driven Implementation of DFA

63 Regular Expressions to NFA (1)
• For each kind of rexp, define an NFA
  » Notation: NFA for rexp A
• For ε: a single ε-edge from the start state to an accepting state
• For input a: a single edge labeled a from the start state to an accepting state

64 Regular Expressions to NFA (2)
• For AB: connect the accepting state of A's NFA to the start state of B's NFA with an ε-edge
• For A + B: a new start state with ε-edges into the NFAs for A and B, and ε-edges from their accepting states into a new accepting state

65 Regular Expressions to NFA (3)
• For A*: wrap A's NFA with ε-edges that allow skipping A entirely (start to accept) and repeating A (A's accepting state back to its start)

66 Example of RegExp -> NFA Conversion
• Consider the regular expression (1+0)*1
• The resulting NFA has states A through J, with ε-edges implementing the union and the star, and edges labeled 1 and 0 for the symbols (diagram omitted)

67 NFA to DFA

68 Regular Expressions to Finite Automata
• High-level sketch:
  Lexical Specification → Regular expressions → NFA → DFA → Optimized DFA → Table-driven Implementation of DFA

69 RegExp -> NFA: an Example
• Consider the regular expression (1+0)*1
• The NFA has states A through J (the same NFA as on slide 66; diagram omitted)

70 NFA to DFA: The Trick
• Simulate the NFA
• Each state of the DFA = a non-empty subset of the states of the NFA
• Start state = the set of NFA states reachable through ε-moves from the NFA start state
• Add a transition S →a S' to the DFA iff
  » S' is the set of NFA states reachable from any state in S after seeing the input a
    – considering ε-moves as well

71 NFA -> DFA Example
• For the NFA of the previous slide, the subset construction yields the DFA states
  {A,B,C,D,H,I}, {F,G,A,B,C,D,H,I}, and {E,J,G,A,B,C,D,H,I} (diagram omitted)

72 NFA to DFA: Remark
• An NFA may be in many states at any time
• How many different states?
• If there are N states, the NFA must be in some subset of those N states
• How many non-empty subsets are there?
  » 2^N - 1, i.e., finitely many

73 From an NFA to a DFA
• Subset construction algorithm
• Input: an NFA N.
• Output: a DFA D with states S and transition table mv.
  begin
    add ε-closure(s0) as an unmarked state to S;
    while there is an unmarked state T in S do begin
      mark T;
      for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in S then add U as an unmarked state to S;
        mv[T, a] := U
      end
    end
  end.
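The subset construction above can be sketched in Python, with DFA states represented as frozensets of NFA states (the dict-based NFA encoding is an assumption, as in the earlier sketches):

```python
from itertools import chain

def subset_construction(start_set, delta, eps, alphabet):
    """Return (DFA start state, DFA transition table mv).
    delta: (NFA state, symbol) -> set of states; eps: state -> set."""
    def closure(S):
        T = set(S)                       # materialize S before iterating twice
        stack = list(T)
        while stack:
            t = stack.pop()
            for u in eps.get(t, ()):
                if u not in T:
                    T.add(u)
                    stack.append(u)
        return frozenset(T)

    start = closure(start_set)           # ε-closure(s0)
    mv, seen, unmarked = {}, {start}, [start]
    while unmarked:                      # while there is an unmarked state T
        T = unmarked.pop()
        for a in alphabet:
            U = closure(chain.from_iterable(delta.get((s, a), ()) for s in T))
            mv[T, a] = U
            if U not in seen:
                seen.add(U)
                unmarked.append(U)
    return start, mv

# NFA for 1*0 again (no ε-edges): DFA states are subsets of {A, B}
start, mv = subset_construction({"A"}, {("A", "1"): {"A"}, ("A", "0"): {"B"}}, {}, "01")
assert start == frozenset({"A"})
assert mv[frozenset({"A"}), "0"] == frozenset({"B"})
```

Note this sketch also creates the empty subset as an explicit dead state when no NFA transition exists, which keeps the resulting table total.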

74 Implementation
• A DFA can be implemented by a 2D table T
  » One dimension is "states"
  » The other dimension is "input symbols"
  » For every transition Si →a Sk, define mv[i, a] = k
• DFA "execution":
  » If in state Si on input a, read mv[i, a] = k and move to state Sk
  » Very efficient

75 Table Implementation of a DFA
• For a DFA with states S, T, U over the alphabet {0, 1}:
       0   1
  S    T   U
  T    T   U
  U    T   U

76 Simulation of a DFA
• Input: an input string x ended with eof, and a DFA with start state s0 and final states F.
• Output: the answer "yes" if the DFA accepts x, "no" otherwise.
  begin
    s := s0;
    c := next_symbol();
    while c != eof do begin
      s := mv(s, c);
      c := next_symbol()
    end;
    if s is in F then return "yes" else return "no"
  end.
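The table-driven simulation is only a few lines in Python (a sketch; treating a missing table entry as rejection matches the "no transition possible" rule of slide 46):

```python
def dfa_accepts(string, start, finals, mv):
    """mv is a dict (state, symbol) -> state; missing entries reject."""
    s = start
    for c in string:
        if (s, c) not in mv:
            return False            # stuck: no transition defined
        s = mv[s, c]
    return s in finals

# DFA for 1*0: S loops on 1, moves to accepting state U on 0
mv = {("S", "1"): "S", ("S", "0"): "U"}
assert dfa_accepts("1110", "S", {"U"}, mv)
assert not dfa_accepts("11", "S", {"U"}, mv)      # ends in non-final S
assert not dfa_accepts("100", "S", {"U"}, mv)     # gets stuck in U
```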

77 Implementation (Cont.)
• NFA -> DFA conversion is at the heart of tools such as flex
• But DFAs can be huge
  » DFA => optimized DFA: try to decrease the number of states
  » Not always helpful!
• In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations

78 Time-Space Tradeoffs
• RE to NFA, simulate NFA
  » time: O(|r| * |x|), space: O(|r|)
• RE to NFA, NFA to DFA, simulate DFA
  » time: O(|x|), space: O(2^|r|)
• Lazy transition evaluation
  » transitions are computed as needed at run time
  » computed transitions are stored in a cache for later use

79 DFA to optimized DFA

80 Motivations
• Problems:
  1. Given a DFA M with k states, is it possible to find an equivalent DFA M' (i.e., L(M) = L(M')) with fewer than k states?
  2. Given a regular language A, how do we find a machine with the minimum number of states?
• Ex: A = L((a+b)*aba(a+b)*) can be accepted by a four-state NFA with states s, t, u, v (diagram omitted).
• By applying the subset construction, we can construct a DFA M2 with 2^4 = 16 states, of which only 6 are accessible from the initial state {s}.

81 Inaccessible States
• A state p ∈ Q is said to be inaccessible (or unreachable) [from the initial state] if there exists no path from the initial state to it. If a state is not inaccessible, it is accessible.
• Inaccessible states can be removed from the DFA without affecting the behavior of the machine.
• Problem: given a DFA (or NFA), how do we find all inaccessible states?

82 Finding All Accessible States (like ε-closure)
• Input: an FA (DFA or NFA)
• Output: the set A of all accessible states
  begin
    push all start states onto stack;
    add all start states to A;
    while stack is not empty do begin
      pop t, the top element, off of stack;
      for each state u with an edge from t to u do
        if u is not in A then begin
          add u to A;
          push u onto stack
        end
    end;
    return A
  end.
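This is the same worklist pattern as ε-closure, just following every edge regardless of label. A Python sketch (the edge representation, ignoring labels, is an assumption):

```python
def accessible(starts, edges):
    """edges maps a state to its successor states (edge labels ignored).
    Returns all states reachable from the start states."""
    A, stack = set(starts), list(starts)
    while stack:
        t = stack.pop()
        for u in edges.get(t, ()):
            if u not in A:
                A.add(u)
                stack.append(u)
    return A

# x and y have no path from the start state s, so they are inaccessible
edges = {"s": {"t"}, "t": {"u"}, "x": {"y"}}
assert accessible({"s"}, edges) == {"s", "t", "u"}
```

The inaccessible states are then simply Q minus this set, and can be deleted.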

83 Minimization Process
• Minimization process for a DFA:
  » 1. Remove all inaccessible states
  » 2. Collapse all equivalent states
• What does it mean that two states are equivalent?
  » Both states have the same observable behaviors, i.e.,
  » there is no way to distinguish them; or,
  » more formally: p and q are not equivalent (i.e., are distinguishable) iff there is a string x ∈ Σ* s.t. exactly one of δ(p, x) and δ(q, x) is a final state,
  » where δ(p, x) is the ending state of the path from p with x as the input.
• Equivalent states can be merged to form a simpler machine.

84 Example
• A DFA over {a, b} whose equivalent states are merged into the classes {1}, {2,3}, and {4} (diagram omitted).

85 Quotient Construction
• M = (Q, Σ, δ, s, F): a DFA. ≈ is a relation on Q defined by:
  p ≈ q iff for all x ∈ Σ*: δ(p, x) ∈ F iff δ(q, x) ∈ F
• Property: ≈ is an equivalence relation.
• Hence it partitions Q into equivalence classes [p] = {q ∈ Q | p ≈ q} for p ∈ Q, and the quotient set Q/≈ = {[p] | p ∈ Q}.
• Every p ∈ Q belongs to exactly one class [p], and p ≈ q iff [p] = [q].
• Define the quotient machine M/≈ = (Q', Σ, δ', s', F') where
  » Q' = Q/≈; s' = [s]; F' = {[p] | p ∈ F}; and δ'([p], a) = [δ(p, a)] for all p ∈ Q and a ∈ Σ.

86 Minimization Algorithm
• Input: a DFA
• Output: an optimized (minimized) DFA
  1. Write down a table of all pairs {p, q}, initially unmarked.
  2. Mark {p, q} if p ∈ F and q ∉ F, or vice versa.
  3. Repeat until no more change:
     3.1 if there is an unmarked pair {p, q} s.t. {move(p, a), move(q, a)} is marked for some a ∈ Σ, then mark {p, q}.
  4. When done, p ≈ q iff {p, q} is not marked.
  5. Merge all equivalent states into one class and return the resulting machine.
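Steps 1 to 4 of this table-filling algorithm can be sketched in Python. The tiny three-state DFA used to exercise it is hypothetical, chosen only so that one pair of states survives unmarked:

```python
from itertools import combinations

def distinguishable(states, finals, mv, alphabet):
    """Table-filling: return the set of marked (distinguishable) pairs.
    mv must define mv[p, a] for every state p and symbol a."""
    # Step 2: mark pairs where exactly one state is final
    marked = {frozenset(pq) for pq in combinations(states, 2)
              if (pq[0] in finals) != (pq[1] in finals)}
    changed = True
    while changed:                       # Step 3: repeat until no change
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((mv[p, a], mv[q, a]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked                        # Step 4: p ≈ q iff {p,q} unmarked

# Hypothetical DFA: accepting states B and C cycle into each other on x,
# so {B, C} stays unmarked (equivalent); A is distinguishable from both.
mv = {("A", "x"): "B", ("B", "x"): "C", ("C", "x"): "B"}
m = distinguishable(["A", "B", "C"], {"B", "C"}, mv, "x")
assert frozenset(("B", "C")) not in m
assert frozenset(("A", "B")) in m
```

Step 5 then merges each unmarked pair's states into one class, yielding the quotient machine of slide 85.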

87 An Example
• The DFA: states 0-5; start state 0 (marked >); final states 1, 2, and 5 (marked F); inputs a and b (the transition table is not recoverable from the transcript).

88 Initial Table

89 After Step 2
  1 | M
  2 | M  -
  3 | -  M  M
  4 | -  M  M  -
  5 | M  -  -  M  M
      0  1  2  3  4
  (M = marked, i.e., distinguishable)

90 After First Pass of Step 3
  1 | M
  2 | M  -
  3 | -  M  M
  4 | -  M  M  -
  5 | M  M  M  M  M
      0  1  2  3  4

91 2nd Pass of Step 3
• The result: 1 ≈ 2 and 3 ≈ 4.
  1 | M
  2 | M   -
  3 | M2  M   M
  4 | M2  M   M   -
  5 | M   M1  M1  M   M
      0   1   2   3   4
  (Mk = marked in pass k of step 3)