Presentation is loading. Please wait.

Presentation is loading. Please wait.

LECTURE NOTES On FINITE AUTOMATA.

Similar presentations


Presentation on theme: "LECTURE NOTES On FINITE AUTOMATA."— Presentation transcript:

1 LECTURE NOTES On FINITE AUTOMATA

2 Mathematical Preliminaries
Symbol A character, glyph, mark. An abstract entity that has no meaning by itself, often called uninterpreted. Letters from various alphabets, digits and special characters are the most commonly used symbols. Alphabet A finite set of symbols. An alphabet is often denoted by sigma , yet can be given any name. B = {0, 1} Says B is an alphabet of two symbols, 0 and 1. C = {a, b, c} Says C is an alphabet of three symbols, a, b and c. Sometimes space and comma are in an alphabet while other times they are meta symbols used for descriptions.

3 String / Word A finite sequence of symbols from an alphabet.
01110 and 111 are strings from the alphabet B above. aaabccc and b are strings from the alphabet C above. A null string is a string with no symbols, usually denoted by epsilon . The null string has length zero. The null string is usually denoted epsilon . Vertical bars around a string indicate the length of a string expressed as a natural number. For example |00100| = 5, |aab| = 3, | epsilon | = 0 Power of an alphabet n Represented as 0 = { }, 2 = { 00, 01, 10, 11 } for  = { 0, 1 } Concatenation of Strings denoted as x . y

4 Formal Language, also called a Language
A set of strings from an alphabet. The set may be empty, finite or infinite. There are many ways to define a language and there are many classifications for languages. Because a language is a set of strings, the words language and set are often used interchangeably in talking about formal languages. L(M) is the notation for a language defined by a machine M. The machine M accepts a certain set of strings, thus a language. L(G) is the notation for a language defined by a grammar G. The grammar G recognizes a certain set of strings, thus a language. M(L) is the notation for a machine that accepts a language.

5 Formal Language, also called a Language(Contd.,)
The language L is a certain set of strings. G(L) is the notation for a grammar that recognizes a language. The union of two languages is a language. L = L1  L2 The intersection of two languages is a language. L = L1  L2 The complement of a language is a language. L = * - L1 The difference of two languages is a language. L = L1 - L2 Thus a set of strings all of which are chosen from some *, where  is a particular alphabet is called a language If  is an alphabet and L  *, then L is a language over . Example: for  = { a, b } { a, aba, aa } is a finite language is an infinite language

6 Formal Language, also called a Language(Contd.,)
Typical examples: L = { x | x is a bit string with two zeros } L = { anbn | n  N } L = {1n | n is prime}

7 For example, let A = {0,00} then A•A = { 00, 000, 0000 } with |A•A|=3,
Formal Language, also called a Language(Contd.,) Do not confuse the concatenation of languages with the Cartesian product of sets. For example, let A = {0,00} then A•A = { 00, 000, 0000 } with |A•A|=3, AA = { (0,0), (0,00), (00,0), (00,00) } with |AA|=4

8 M Let L be a language  S a machine M recognizes L if
Recognizing Languages Let L be a language  S a machine M recognizes L if “accept” if and only if xL xS M “reject” if and only if xL

9 Language Definition Problem
How to precisely define language Layered structure of language definition Start with a set of letters in language Lexical structure - identifies “words” in language (each word is a sequence of letters) Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) Semantics - meaning of program (specifies what result should be for each input) lexical and syntactic structures

10 Specifying Formal Languages
Huge Triumph of Computer Science Beautiful Theoretical Results Practical Techniques and Applications Two Dual Notions Generative approach (grammar or regular expression) Recognition approach (automaton) Lots of theorems about converting one approach automatically to another

11 Specifying Lexical Structure Using Regular Expressions
Have some alphabet  = set of letters Regular expressions are built from:  - empty string Any letter from alphabet  r1r2 – regular expression r1 followed by r2 (sequence) r1| r2 – either regular expression r1 or r2 (choice) r* - iterated sequence and choice  | r | rr | … Parentheses to indicate grouping/precedence

12 Concept of Regular Expression Generating a String
Rewrite regular expression until have only a sequence of letters (string) left Example (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 General Rules 1) r1| r2  r1 2) r1| r2  r2 3) r* rr* 4) r*  

13 Nondeterminism in Generation
Rewriting is similar to equational reasoning But different rule applications may yield different final results Example 1 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 Example 2 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 0(0|1)*.(0|1)* 0.(0|1)* 0.(0|1)(0|1)* 0.(0|1) 0.1

14 Concept of Language Generated by Regular Expressions
Set of all strings generated by a regular expression is language of regular expression In general, language may be (countably) infinite String in language is often called a token

15 Examples of Languages and Regular Expressions
 = { 0, 1, . } (0 | 1)*.(0|1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros  = { a,b,c, 0, 1, 2 } (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers (0|1|2)* - trinary numbers

16 Language Definition Problem
How to precisely define language Layered structure of language definition Start with a set of letters in language Lexical structure - identifies “words” in language (each word is a sequence of letters) Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) Semantics - meaning of program (specifies what result should be for each input) lexical and syntactic structures

17 Specifying Formal Languages
Huge Triumph of Computer Science Beautiful Theoretical Results Practical Techniques and Applications Two Dual Notions Generative approach (grammar or regular expression) Recognition approach (automaton) Lots of theorems about converting one approach automatically to another

18 Specifying Lexical Structure Using Regular Expressions
Have some alphabet  = set of letters Regular expressions are built from:  - empty string Any letter from alphabet  r1r2 – regular expression r1 followed by r2 (sequence) r1| r2 – either regular expression r1 or r2 (choice) r* - iterated sequence and choice  | r | rr | … Parentheses to indicate grouping/precedence

19 Concept of Regular Expression Generating a String
Rewrite regular expression until have only a sequence of letters (string) left Example (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 General Rules 1) r1| r2  r1 2) r1| r2  r2 3) r* rr* 4) r*  

20 Nondeterminism in Generation
Rewriting is similar to equational reasoning But different rule applications may yield different final results Example 1 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 Example 2 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 0(0|1)*.(0|1)* 0.(0|1)* 0.(0|1)(0|1)* 0.(0|1) 0.1

21 Concept of Language Generated by Regular Expressions
Set of all strings generated by a regular expression is language of regular expression In general, language may be (countably) infinite String in language is often called a token

22 Examples of Languages and Regular Expressions
 = { 0, 1, . } (0 | 1)*.(0|1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros  = { a,b,c, 0, 1, 2 } (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers (0|1|2)* - trinary numbers

23 Alternate Abstraction Finite-State Automata
Alphabet  Set of states with initial and accept states Transitions between states, labeled with letters (0 | 1)*.(0|1)* . Start state 1 1 Accept state

24 Automaton Accepting String
Conceptually, run string through automaton Have current state and current letter in string Start with start state and first letter in string At each step, match current letter against a transition whose label is same as letter Continue until reach end of string or match fails If end in accept state, automaton accepts string Language of automaton is set of strings it accepts

25 . Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter

26 . Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter

27 . Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter

28 . Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter

29 . Example 11.0 Current state Start state 1 1 Accept state
Accept state 11.0 Current letter

30 . Example 11.0 String is accepted! Current state Start state 1 1
Accept state 11.0 String is accepted! Current letter

31 Finite Automaton The most simple machine that is not just a finite list of words. “Read once”, “no write” procedure. Typical is its limited memory. Think cell-phone, elevator door, etc.

32 A Simple Automaton (0) transition rules states starting state
q1 q2 q3 1 0,1 starting state accepting state

33 A Simple Automaton (1) start accept on input “0110”, the machine goes:
q1 q2 q3 1 0,1 start accept on input “0110”, the machine goes: q1  q1  q2  q2  q3 = “reject”

34 A Simple Automaton (2) on input “101”, the machine goes:
q1 q2 q3 1 0,1 on input “101”, the machine goes: q1  q2  q3  q2 = “accept”

35 A Simple Automaton (3) 010: reject 11: accept 010100100100100: accept
q1 q2 q3 1 0,1 010: reject 11: accept : accept : reject : reject

36 Finite Automaton (def.)
A deterministic finite automaton (DFA) M is defined by a 5-tuple M=(Q,,,q0,F) Q: finite set of states : finite alphabet : transition function :QQ q0Q: start state FQ: set of accepting states

37 M = (Q,,,q,F) q1 q2 q3 1 0,1 states Q = {q1,q2,q3}
0,1 states Q = {q1,q2,q3} alphabet  = {0,1} start state q1 accept states F={q2} transition function :

38 Generative Versus Recognition
Regular expressions give you a way to generate all strings in language Automata give you a way to recognize if a specific string is in language Philosophically very different Theoretically equivalent (for regular expressions and automata) Standard approach Use regular expressions when define language Translated automatically into automata for implementation

39 From Regular Expressions to Automata
Construction by structural induction Given an arbitrary regular expression r, Assume we can convert r to an automaton with One start state One accept state Show how to convert all constructors to deliver an automaton with

40 Basic Constructs -Thompson’s construction
Start state Accept state a a

41 Sequence    r1r2 r1 r2 Old start state Start state Old accept state

42 Choice  r1  r1|r2  r2  Old start state Start state
Old accept state Accept state r1 r1|r2 r2

43 Kleene Star    r r*  Old start state Start state Old accept state

44 NFA vs. DFA DFA No  transitions At most one transition from each state for each letter NFA – neither restriction a a NOT OK OK b a

45 Conversions Our regular expression to automata conversion produces an NFA Would like to have a DFA to make recognition algorithm simpler Can convert from NFA to DFA (but DFA may be exponentially larger than NFA)

46 NFA to DFA Construction
DFA has a state for each subset of states in NFA DFA start state corresponds to set of states reachable by following  transitions from NFA start state DFA state is an accept state if an NFA accept state is in its set of NFA states To compute the transition for a given DFA state D and letter a Set S to empty set Find the set N of D’s NFA states For all NFA states n in N Compute set of states N’ that the NFA may be in after matching a Set S to S union N’ If S is nonempty, there is a transition for a from D to the DFA state that has the set S of NFA states Otherwise, there is no transition for a from D

47 NFA to DFA Example for (a|b)*.(a|b)*
a 3 5 . 11 13 1 2 7 8 9 10 15 16 4 6 12 14 b b . a a 5,7,2,3,4,8 . 13,15,10,11,12,16 a a a a 1,2,3,4,8 9,10,11,12,16 b b . b b 6,7,2,3,4,8 14,15,10,11,12,16 b b

48 Regular Languages The language recognized by a finite automaton M is denoted by L(M). A regular language is a language for which there exists a recognizing finite automaton.

49 Two DFA Questions Given the description of a finite automaton M = (Q,,,q,F), what is the language L(M) that it recognizes? In general, what kind of languages can be recognized by finite automata? (What are the regular languages?)

50 Union of Two Languages Theorem 1.12: If A1 and A2 are regular languages, then so is A1  A2. (The regular languages are ‘closed’ under the union operation.) Proof idea: A1 and A2 are regular, hence there are two DFA M1 and M2, with A1=L(M1) and A2=L(M2). Out of these two DFA, we will make a third automaton M3 such that L(M3) = A1  A2.

51 Proof Union-Theorem (1)
M1=(Q1,,1,q1,F1) and M2=(Q2,,2,q2,F2) Define M3 = (Q3,,3,q3,F3) by: Q3 = Q1Q2 = {(r1,r2) | r1Q1 and r2Q2} 3((r1,r2),a) = (1(r1,a), 2(r2,a)) q3 = (q1,q2) F3 = {(r1,r2) | r1F1 or r2F2}

52 Proof Union-Theorem (2)
The automaton M3 = (Q3,,3,q3,F3) runs M1 and M2 in ‘parallel’ on a string w. In the end, the final state (r1,r2) ‘knows’ if wL1 (via r1F1?) and if wL2 (via r2F2?) The accepting states F3 of M3 are such that wL(M3) if and only if wL1 or wL2, for: F3 = {(r1,r2) | r1F1 or r2F2}.

53 Concatenation of L1 and L2
Definition: L1• L2 = { xy | xL1 and yL2 } Example: {a,b} • {0,11} = {a0,a11,b0,b11} Theorem 1.13: If L1 and L2 are regular languages, then so is L1•L2. (The regular languages are ‘closed’ under concatenation.)

54 Proving Concatenation Thm.
Consider the concatenation: {1,01,11,001,011,…} • {0,000,00000,…} (That is: the bit strings that end with a “1”, followed by an odd number of 0’s.) Problem is: given a string w, how does the automaton know where the L1 part stops and the L2 substring starts? We need an M with ‘lucky guesses’.

55 Nondeterminism Nondeterministic machines are capable
of being lucky, no matter how small the probability. A nondeterministic finite automaton has transition rules/possibilities like q2 1 q1 q2 q1 q3 1

56 A Nondeterministic Automaton
0,1 0,1 1 0,  1 q1 q2 q3 q4 This automaton accepts “0110”, because there is a possible path that leads to an accepting state, namely: q1  q1  q2  q3  q4  q4

57 A Nondeterministic Automaton
0,1 0,1 1 0,  1 q1 q2 q3 q4 The string 1 gets rejected: on “1” the automaton can only reach: {q1,q2,q3}.

58 Nondeterminism ~ Parallelism
For any (sub)string w, the nondeterministic automaton can be in a set of possible states. If the final set contains an accepting state, then the automaton accepts the string. “The automaton processes the input in a parallel fashion. Its computational path is no longer a line, but a tree.”

59 Nondeterministic FA (def.)
A nondeterministic finite automaton (NFA) M is defined by a 5-tuple M=(Q,,,q0,F), with Q: finite set of states : finite alphabet : transition function :QP (Q) q0Q: start state FQ: set of accepting states

60 Nondeterministic :QP (Q)
The function :QP (Q) is the crucial difference. It means: “When reading symbol “a” while in state q, one can go to one of the states in (q,a)Q.” The  in  = {} takes care of the empty string transitions.

61 Recognizing Languages (def)
A nondeterministic FA M = (Q,,,q,F) accepts a string w = w1…wn if and only if we can rewrite w as y1…ym with yi and there is a sequence r0…rm of states in Q such that: 1) r0=q0 2) ri+1  (ri,yi+1) for all i=0,…,m–1 3) rm  F

62 Building a Lexical Analyzer
• Token = Pattern • Pattern = Regular Expression • Regular Expression = NFA • NFA = DFA • DFA = Lexical Analyzer

63 Lexical Structure in Languages
Each language typically has several categories of words. In a typical programming language: Keywords (if, while) Arithmetic Operations (+, -, *, /) Integer numbers (1, 2, 45, 67) Floating point numbers (1.0, .2, 3.337) Identifiers (abc, i, j, ab345) Typically have a lexical category for each keyword and/or each category Each lexical category defined by regexp

64 Lexical Categories Example
If Keyword = if While Keyword = while Operator = +|-|*|/ Integer = [0-9] [0-9]* Float = [0-9]*. [0-9]* Identifier = [a-z]([a-z]|[0-9])* Note that [0-9] = (0|1|2|3|4|5|6|7|8|9) [a-z] = (a|b|c|…|y|z) Will use lexical categories in next level

65 Grammar and Automata Correspondence
Regular Grammar Context-Free Grammar Context-Sensitive Grammar Automaton Finite-State Automaton Push-Down Automaton Turing Machine

66 Context-Free Grammars
Grammar Characterization (T,NT,S,P) Finite set of Terminals T Finite set of Nonterminals NT Start Nonterminal S (goal symbol, start symbol) Finite set of Productions P: NT  (T | NT)* RHS of production can have any sequence of terminals or nonterminals

67 Push-Down Automata DFA Plus a Stack (S,A,V, F,s0,sF)
Finite set of states S Finite Input Alphabet A, Stack Alphabet V Transition relation F : S (A U{})V  S  V* Start state s0 Final states sF Each configuration consists of a state, a stack, and remaining input string

68 CFG Versus PDA CFGs and PDAs are of equivalent power
Grammar Implementation Mechanism: Translate CFG to PDA, then use PDA to parse input string Foundation for bottom-up parser generators

69 Context-Sensitive Grammars and Turing Machines
Context-Sensitive Grammars Allow Productions to Use Context P: (T.NT)+  (T.NT)* Turing Machines Have Finite State Control Two-Way Tape Instead of A Stack

70 Exercises (1) Let A = {x,y,z} and B = {x,y}, answer:
Is A a subset of B? Is B a subset of A? What is AB? What is AB? What is AB? What is P (Q)?

71 Exercises (2) Give NFAs with the specified number of states that recognize the following languages over the alphabet ={0,1}: { w | w ends with 00}, three states {0}; two states { w | w contains even number of 0s, or exactly two 1s}, six states {0n | nN }, one state

72 Exercises (3) Proof the following result:
“If L1 and L2 are regular languages, then is a regular language too.” Describe the language that is recognized by this nondeterministic automaton: 1 0,1 1 0,  1 q1 q2 q3 q4

73 Additional Practice Problems
Write the formal descriptions of the sets containing … the numbers 1,10 and 100 … all integers greater than 5 … all natural numbers less than 5 … the string “aba” … the empty string … nothing Give a formal description of this NFA: Give DFA state diagrams for the following languages over ={0,1}: { w | w begins with 1 and ends with 0} { w | w does not contain substring 110} {} all strings except the empty string 1 0,  1 q1 q2 q3


Download ppt "LECTURE NOTES On FINITE AUTOMATA."

Similar presentations


Ads by Google