Download presentation
Presentation is loading. Please wait.
1
LECTURE NOTES On FINITE AUTOMATA
2
Mathematical Preliminaries
Symbol A character, glyph, mark. An abstract entity that has no meaning by itself, often called uninterpreted. Letters from various alphabets, digits and special characters are the most commonly used symbols. Alphabet A finite set of symbols. An alphabet is often denoted by sigma , yet can be given any name. B = {0, 1} Says B is an alphabet of two symbols, 0 and 1. C = {a, b, c} Says C is an alphabet of three symbols, a, b and c. Sometimes space and comma are in an alphabet while other times they are meta symbols used for descriptions.
3
String / Word A finite sequence of symbols from an alphabet.
01110 and 111 are strings from the alphabet B above. aaabccc and b are strings from the alphabet C above. A null string is a string with no symbols, usually denoted by epsilon . The null string has length zero. The null string is usually denoted epsilon . Vertical bars around a string indicate the length of a string expressed as a natural number. For example |00100| = 5, |aab| = 3, | epsilon | = 0 Power of an alphabet n Represented as 0 = { }, 2 = { 00, 01, 10, 11 } for = { 0, 1 } Concatenation of Strings denoted as x . y
4
Formal Language, also called a Language
A set of strings from an alphabet. The set may be empty, finite or infinite. There are many ways to define a language and there are many classifications for languages. Because a language is a set of strings, the words language and set are often used interchangeably in talking about formal languages. L(M) is the notation for a language defined by a machine M. The machine M accepts a certain set of strings, thus a language. L(G) is the notation for a language defined by a grammar G. The grammar G recognizes a certain set of strings, thus a language. M(L) is the notation for a machine that accepts a language.
5
Formal Language, also called a Language(Contd.,)
The language L is a certain set of strings. G(L) is the notation for a grammar that recognizes a language. The union of two languages is a language. L = L1 L2 The intersection of two languages is a language. L = L1 L2 The complement of a language is a language. L = * - L1 The difference of two languages is a language. L = L1 - L2 Thus a set of strings all of which are chosen from some *, where is a particular alphabet is called a language If is an alphabet and L *, then L is a language over . Example: for = { a, b } { a, aba, aa } is a finite language is an infinite language
6
Formal Language, also called a Language(Contd.,)
Typical examples: L = { x | x is a bit string with two zeros } L = { anbn | n N } L = {1n | n is prime}
7
For example, let A = {0,00} then A•A = { 00, 000, 0000 } with |A•A|=3,
Formal Language, also called a Language(Contd.,) Do not confuse the concatenation of languages with the Cartesian product of sets. For example, let A = {0,00} then A•A = { 00, 000, 0000 } with |A•A|=3, AA = { (0,0), (0,00), (00,0), (00,00) } with |AA|=4
8
M Let L be a language S a machine M recognizes L if
Recognizing Languages Let L be a language S a machine M recognizes L if “accept” if and only if xL xS M “reject” if and only if xL
9
Language Definition Problem
How to precisely define language Layered structure of language definition Start with a set of letters in language Lexical structure - identifies “words” in language (each word is a sequence of letters) Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) Semantics - meaning of program (specifies what result should be for each input) lexical and syntactic structures
10
Specifying Formal Languages
Huge Triumph of Computer Science Beautiful Theoretical Results Practical Techniques and Applications Two Dual Notions Generative approach (grammar or regular expression) Recognition approach (automaton) Lots of theorems about converting one approach automatically to another
11
Specifying Lexical Structure Using Regular Expressions
Have some alphabet = set of letters Regular expressions are built from: - empty string Any letter from alphabet r1r2 – regular expression r1 followed by r2 (sequence) r1| r2 – either regular expression r1 or r2 (choice) r* - iterated sequence and choice | r | rr | … Parentheses to indicate grouping/precedence
12
Concept of Regular Expression Generating a String
Rewrite regular expression until have only a sequence of letters (string) left Example (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 General Rules 1) r1| r2 r1 2) r1| r2 r2 3) r* rr* 4) r*
13
Nondeterminism in Generation
Rewriting is similar to equational reasoning But different rule applications may yield different final results Example 1 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 Example 2 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 0(0|1)*.(0|1)* 0.(0|1)* 0.(0|1)(0|1)* 0.(0|1) 0.1
14
Concept of Language Generated by Regular Expressions
Set of all strings generated by a regular expression is language of regular expression In general, language may be (countably) infinite String in language is often called a token
15
Examples of Languages and Regular Expressions
= { 0, 1, . } (0 | 1)*.(0|1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros = { a,b,c, 0, 1, 2 } (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers (0|1|2)* - trinary numbers
16
Language Definition Problem
How to precisely define language Layered structure of language definition Start with a set of letters in language Lexical structure - identifies “words” in language (each word is a sequence of letters) Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) Semantics - meaning of program (specifies what result should be for each input) lexical and syntactic structures
17
Specifying Formal Languages
Huge Triumph of Computer Science Beautiful Theoretical Results Practical Techniques and Applications Two Dual Notions Generative approach (grammar or regular expression) Recognition approach (automaton) Lots of theorems about converting one approach automatically to another
18
Specifying Lexical Structure Using Regular Expressions
Have some alphabet = set of letters Regular expressions are built from: - empty string Any letter from alphabet r1r2 – regular expression r1 followed by r2 (sequence) r1| r2 – either regular expression r1 or r2 (choice) r* - iterated sequence and choice | r | rr | … Parentheses to indicate grouping/precedence
19
Concept of Regular Expression Generating a String
Rewrite regular expression until have only a sequence of letters (string) left Example (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 General Rules 1) r1| r2 r1 2) r1| r2 r2 3) r* rr* 4) r*
20
Nondeterminism in Generation
Rewriting is similar to equational reasoning But different rule applications may yield different final results Example 1 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 Example 2 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 0(0|1)*.(0|1)* 0.(0|1)* 0.(0|1)(0|1)* 0.(0|1) 0.1
21
Concept of Language Generated by Regular Expressions
Set of all strings generated by a regular expression is language of regular expression In general, language may be (countably) infinite String in language is often called a token
22
Examples of Languages and Regular Expressions
= { 0, 1, . } (0 | 1)*.(0|1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros = { a,b,c, 0, 1, 2 } (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers (0|1|2)* - trinary numbers
23
Alternate Abstraction Finite-State Automata
Alphabet Set of states with initial and accept states Transitions between states, labeled with letters (0 | 1)*.(0|1)* . Start state 1 1 Accept state
24
Automaton Accepting String
Conceptually, run string through automaton Have current state and current letter in string Start with start state and first letter in string At each step, match current letter against a transition whose label is same as letter Continue until reach end of string or match fails If end in accept state, automaton accepts string Language of automaton is set of strings it accepts
25
. Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter
26
. Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter
27
. Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter
28
. Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter
29
. Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter
30
. Example 11.0 String is accepted! Current state Start state 1 1
Accept state 11.0 String is accepted! Current letter
31
Finite Automaton The most simple machine that is not just a finite list of words. “Read once”, “no write” procedure. Typical is its limited memory. Think cell-phone, elevator door, etc.
32
A Simple Automaton (0) transition rules states starting state
q1 q2 q3 1 0,1 starting state accepting state
33
A Simple Automaton (1) start accept on input “0110”, the machine goes:
q1 q2 q3 1 0,1 start accept on input “0110”, the machine goes: q1 q1 q2 q2 q3 = “reject”
34
A Simple Automaton (2) on input “101”, the machine goes:
q1 q2 q3 1 0,1 on input “101”, the machine goes: q1 q2 q3 q2 = “accept”
35
A Simple Automaton (3) 010: reject 11: accept 010100100100100: accept
q1 q2 q3 1 0,1 010: reject 11: accept : accept : reject : reject
36
Finite Automaton (def.)
A deterministic finite automaton (DFA) M is defined by a 5-tuple M=(Q,,,q0,F) Q: finite set of states : finite alphabet : transition function :QQ q0Q: start state FQ: set of accepting states
37
M = (Q,,,q,F) q1 q2 q3 1 0,1 states Q = {q1,q2,q3}
0,1 states Q = {q1,q2,q3} alphabet = {0,1} start state q1 accept states F={q2} transition function :
38
Generative Versus Recognition
Regular expressions give you a way to generate all strings in language Automata give you a way to recognize if a specific string is in language Philosophically very different Theoretically equivalent (for regular expressions and automata) Standard approach Use regular expressions when define language Translated automatically into automata for implementation
39
From Regular Expressions to Automata
Construction by structural induction Given an arbitrary regular expression r, Assume we can convert r to an automaton with One start state One accept state Show how to convert all constructors to deliver an automaton with
40
Basic Constructs -Thompson’s construction
Start state Accept state a a
41
Sequence r1r2 r1 r2 Old start state Start state Old accept state
42
Choice r1 r1|r2 r2 Old start state Start state
Old accept state Accept state r1 r1|r2 r2
43
Kleene Star r r* Old start state Start state Old accept state
44
NFA vs. DFA DFA No transitions At most one transition from each state for each letter NFA – neither restriction a a NOT OK OK b a
45
Conversions Our regular expression to automata conversion produces an NFA Would like to have a DFA to make recognition algorithm simpler Can convert from NFA to DFA (but DFA may be exponentially larger than NFA)
46
NFA to DFA Construction
DFA has a state for each subset of states in NFA DFA start state corresponds to set of states reachable by following transitions from NFA start state DFA state is an accept state if an NFA accept state is in its set of NFA states To compute the transition for a given DFA state D and letter a Set S to empty set Find the set N of D’s NFA states For all NFA states n in N Compute set of states N’ that the NFA may be in after matching a Set S to S union N’ If S is nonempty, there is a transition for a from D to the DFA state that has the set S of NFA states Otherwise, there is no transition for a from D
47
NFA to DFA Example for (a|b)*.(a|b)*
a 3 5 . 11 13 1 2 7 8 9 10 15 16 4 6 12 14 b b . a a 5,7,2,3,4,8 . 13,15,10,11,12,16 a a a a 1,2,3,4,8 9,10,11,12,16 b b . b b 6,7,2,3,4,8 14,15,10,11,12,16 b b
48
Regular Languages The language recognized by a finite automaton M is denoted by L(M). A regular language is a language for which there exists a recognizing finite automaton.
49
Two DFA Questions Given the description of a finite automaton M = (Q,,,q,F), what is the language L(M) that it recognizes? In general, what kind of languages can be recognized by finite automata? (What are the regular languages?)
50
Union of Two Languages Theorem 1.12: If A1 and A2 are regular languages, then so is A1 A2. (The regular languages are ‘closed’ under the union operation.) Proof idea: A1 and A2 are regular, hence there are two DFA M1 and M2, with A1=L(M1) and A2=L(M2). Out of these two DFA, we will make a third automaton M3 such that L(M3) = A1 A2.
51
Proof Union-Theorem (1)
M1=(Q1,,1,q1,F1) and M2=(Q2,,2,q2,F2) Define M3 = (Q3,,3,q3,F3) by: Q3 = Q1Q2 = {(r1,r2) | r1Q1 and r2Q2} 3((r1,r2),a) = (1(r1,a), 2(r2,a)) q3 = (q1,q2) F3 = {(r1,r2) | r1F1 or r2F2}
52
Proof Union-Theorem (2)
The automaton M3 = (Q3,,3,q3,F3) runs M1 and M2 in ‘parallel’ on a string w. In the end, the final state (r1,r2) ‘knows’ if wL1 (via r1F1?) and if wL2 (via r2F2?) The accepting states F3 of M3 are such that wL(M3) if and only if wL1 or wL2, for: F3 = {(r1,r2) | r1F1 or r2F2}.
53
Concatenation of L1 and L2
Definition: L1• L2 = { xy | xL1 and yL2 } Example: {a,b} • {0,11} = {a0,a11,b0,b11} Theorem 1.13: If L1 and L2 are regular languages, then so is L1•L2. (The regular languages are ‘closed’ under concatenation.)
54
Proving Concatenation Thm.
Consider the concatenation: {1,01,11,001,011,…} • {0,000,00000,…} (That is: the bit strings that end with a “1”, followed by an odd number of 0’s.) Problem is: given a string w, how does the automaton know where the L1 part stops and the L2 substring starts? We need an M with ‘lucky guesses’.
55
Nondeterminism Nondeterministic machines are capable
of being lucky, no matter how small the probability. A nondeterministic finite automaton has transition rules/possibilities like q2 1 q1 q2 q1 q3 1
56
A Nondeterministic Automaton
0,1 0,1 1 0, 1 q1 q2 q3 q4 This automaton accepts “0110”, because there is a possible path that leads to an accepting state, namely: q1 q1 q2 q3 q4 q4
57
A Nondeterministic Automaton
0,1 0,1 1 0, 1 q1 q2 q3 q4 The string 1 gets rejected: on “1” the automaton can only reach: {q1,q2,q3}.
58
Nondeterminism ~ Parallelism
For any (sub)string w, the nondeterministic automaton can be in a set of possible states. If the final set contains an accepting state, then the automaton accepts the string. “The automaton processes the input in a parallel fashion. Its computational path is no longer a line, but a tree.”
59
Nondeterministic FA (def.)
A nondeterministic finite automaton (NFA) M is defined by a 5-tuple M=(Q,,,q0,F), with Q: finite set of states : finite alphabet : transition function :QP (Q) q0Q: start state FQ: set of accepting states
60
Nondeterministic :QP (Q)
The function :QP (Q) is the crucial difference. It means: “When reading symbol “a” while in state q, one can go to one of the states in (q,a)Q.” The in = {} takes care of the empty string transitions.
61
Recognizing Languages (def)
A nondeterministic FA M = (Q,,,q,F) accepts a string w = w1…wn if and only if we can rewrite w as y1…ym with yi and there is a sequence r0…rm of states in Q such that: 1) r0=q0 2) ri+1 (ri,yi+1) for all i=0,…,m–1 3) rm F
62
Building a Lexical Analyzer
• Token = Pattern • Pattern = Regular Expression • Regular Expression = NFA • NFA = DFA • DFA = Lexical Analyzer
63
Lexical Structure in Languages
Each language typically has several categories of words. In a typical programming language: Keywords (if, while) Arithmetic Operations (+, -, *, /) Integer numbers (1, 2, 45, 67) Floating point numbers (1.0, .2, 3.337) Identifiers (abc, i, j, ab345) Typically have a lexical category for each keyword and/or each category Each lexical category defined by regexp
64
Lexical Categories Example
If Keyword = if While Keyword = while Operator = +|-|*|/ Integer = [0-9] [0-9]* Float = [0-9]*. [0-9]* Identifier = [a-z]([a-z]|[0-9])* Note that [0-9] = (0|1|2|3|4|5|6|7|8|9) [a-z] = (a|b|c|…|y|z) Will use lexical categories in next level
65
Grammar and Automata Correspondence
Regular Grammar Context-Free Grammar Context-Sensitive Grammar Automaton Finite-State Automaton Push-Down Automaton Turing Machine
66
Context-Free Grammars
Grammar Characterization (T,NT,S,P) Finite set of Terminals T Finite set of Nonterminals NT Start Nonterminal S (goal symbol, start symbol) Finite set of Productions P: NT (T | NT)* RHS of production can have any sequence of terminals or nonterminals
67
Push-Down Automata DFA Plus a Stack (S,A,V, F,s0,sF)
Finite set of states S Finite Input Alphabet A, Stack Alphabet V Transition relation F : S (A U{})V S V* Start state s0 Final states sF Each configuration consists of a state, a stack, and remaining input string
68
CFG Versus PDA CFGs and PDAs are of equivalent power
Grammar Implementation Mechanism: Translate CFG to PDA, then use PDA to parse input string Foundation for bottom-up parser generators
69
Context-Sensitive Grammars and Turing Machines
Context-Sensitive Grammars Allow Productions to Use Context P: (T.NT)+ (T.NT)* Turing Machines Have Finite State Control Two-Way Tape Instead of A Stack
70
Exercises (1) Let A = {x,y,z} and B = {x,y}, answer:
Is A a subset of B? Is B a subset of A? What is AB? What is AB? What is AB? What is P (Q)?
71
Exercises (2) Give NFAs with the specified number of states that recognize the following languages over the alphabet ={0,1}: { w | w ends with 00}, three states {0}; two states { w | w contains even number of 0s, or exactly two 1s}, six states {0n | nN }, one state
72
Exercises (3) Proof the following result:
“If L1 and L2 are regular languages, then is a regular language too.” Describe the language that is recognized by this nondeterministic automaton: 1 0,1 1 0, 1 q1 q2 q3 q4
73
Additional Practice Problems
Write the formal descriptions of the sets containing … the numbers 1,10 and 100 … all integers greater than 5 … all natural numbers less than 5 … the string “aba” … the empty string … nothing Give a formal description of this NFA: Give DFA state diagrams for the following languages over ={0,1}: { w | w begins with 1 and ends with 0} { w | w does not contain substring 110} {} all strings except the empty string 1 0, 1 q1 q2 q3
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.