1
Regular Languages and Expressions
Surinder Kumar Jain, University of Sydney
2
Regular Languages & Expressions
Automaton: DFA, NFA, ε-NFA, CFG as a DFA, Equivalence, Minimal DFA
Expressions: Definition, Conversion from/to Automaton
Regular Languages: Pumping Lemma (proving non-regularity), Closures
3
Deterministic Finite Automaton
A system with a finite set of states. It can transition from one state to another, usually in response to an external input. The system is in exactly one state at any given time.
4
Mathematical Definition of a DFA
A DFA is a five-tuple A = (Q, Σ, δ, q0, F):
Q: a finite set of states; the DFA is in exactly one of them at any time.
Σ: the input symbols; the DFA moves from one state to another on consuming an input symbol.
δ: the transition function δ: Q × Σ → Q; given a state and an input symbol, it gives the next state.
q0: the initial state.
F: the accepting states, F ⊆ Q. A string is accepted if the DFA is in a state of F after consuming the whole string; passing through an accepting state part-way does not stop the DFA from reading further input.
5
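DFA Example
Q = { waiting, pending, rejected, accepted, paid }
Σ = { receive, reject, accept, pay }
δ: (waiting, receive) → pending, (pending, reject) → rejected, (pending, accept) → accepted, (accepted, pay) → paid
q0 = waiting
F = { rejected, paid }
All other (state, symbol) pairs have no transition; strictly, a DFA needs a move for every pair, so these are taken to go to an implicit dead (non-accepting) state.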
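The example DFA can be run directly. A minimal Python sketch, assuming the transition function is stored as a dictionary keyed by (state, symbol); the names `DELTA`, `START`, `ACCEPTING` and `accepts` are hypothetical, and missing pairs are treated as the implicit dead state.

```python
# Transition function as a dict keyed by (state, symbol); a missing key
# means "no move", i.e. the implicit dead (rejecting) state.
DELTA = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING = {"rejected", "paid"}


def accepts(symbols):
    """Run the DFA over a sequence of input symbols and report acceptance."""
    state = START
    for symbol in symbols:
        state = DELTA.get((state, symbol))
        if state is None:  # fell into the implicit dead state
            return False
    return state in ACCEPTING  # decided only after the whole input is read


print(accepts(["receive", "accept", "pay"]))  # True
print(accepts(["receive", "accept"]))         # False: "accepted" is not in F
```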
6
Transition Diagrams
[Figure: transition diagram of the DFA example, with a start arrow into Waiting and arcs Waiting -receive-> Pending, Pending -accept-> Accepted, Pending -reject-> Rejected, Accepted -pay-> Paid; Rejected and Paid are accepting.]
7
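Language
Alphabet: a finite set of symbols; for a DFA, the alphabet is its set of input symbols.
Strings: sequences of symbols built by concatenation (joining), where one symbol follows another.
A language is a set of strings over an alphabet (a subset of all such strings).
A DFA defines a language: the set of strings it accepts.
Acceptance: a string is accepted if its sequence of symbols takes the DFA from the start state to one of the accepting states.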
8
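Non-deterministic Finite Automaton (NFA)
A five-tuple like a DFA, (Q, Σ, δ, q0, F), but the transition function returns a set of states (a subset of Q), not one state.
A state may have several outgoing arcs with the same symbol, so the NFA can be in several states at the same time.
Language of an NFA: the set of strings for which at least one sequence of choices leads from q0 to an accepting state.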
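Running an NFA amounts to tracking the set of states it could currently be in. A minimal Python sketch, assuming a hypothetical example NFA (not from the slides) for binary strings ending in 01; `DELTA` and `nfa_accepts` are illustrative names.

```python
# Hypothetical NFA for binary strings ending in "01". DELTA maps
# (state, symbol) to a *set* of next states; a missing key is the empty set.
DELTA = {
    ("q0", "0"): {"q0", "q1"},  # two arcs labelled 0 leave q0
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START = "q0"
ACCEPTING = {"q2"}


def nfa_accepts(word):
    """Track every state the NFA could be in after each symbol."""
    current = {START}
    for symbol in word:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    # accept if at least one of the reachable states is accepting
    return bool(current & ACCEPTING)


print(nfa_accepts("1101"), nfa_accepts("110"))  # True False
```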
9
Equivalence of DFA & NFA
Any language accepted by an NFA can also be described by some DFA; adding non-determinism does not give anything more.
Why use NFAs then? They are easier to construct for some languages, and may have fewer states and be less complex.
There is an algorithm to convert an NFA to a DFA. For an n-state NFA, the DFA may have up to $2^n$ states, but inaccessible states can be thrown away.
Observation: in practice the DFA often has about the same number of states as the NFA, though it often has more transitions.
10
NFA to DFA conversion
For an NFA $N = (Q, \Sigma, \delta, q_0, F)$, construct the DFA $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$:
$Q_D = 2^Q$, the power set of Q
$\delta_D(S, a) = \bigcup_{p \in S} \delta(p, a)$ for every S in $Q_D$
$F_D = \{ S \subseteq Q \mid S \cap F \neq \emptyset \}$, the sets containing an accepting state of the NFA
Intuition: the DFA is in one state at a time, while the NFA is in a set of states, so each DFA state stands for a set of NFA states. Given a state, the NFA gives a set of new states; make every possible set of NFA states a DFA state, and let the DFA move from one set to the set of all states the NFA can reach from it. Any set containing an accepting NFA state is an accepting state of the DFA.
11
NFA to DFA conversion complexity
$O(2^n)$ in the worst case (the number of subsets of an n-element set).
Efficient algorithm: do not construct the entire power set. Start with the start state {q0} and construct only the subsets that are reachable from it.
The number of DFA states is then usually much less than $2^n$.
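The efficient construction can be sketched in Python as a breadth-first search over subsets. This is an illustrative sketch: `subset_construction` and the example NFA (binary strings ending in 01) are assumptions, not taken from the slides.

```python
from collections import deque


def subset_construction(nfa_delta, start, accepting, alphabet):
    """Lazy subset construction: DFA states are frozensets of NFA states,
    and only the subsets reachable from {start} are ever built."""
    d_start = frozenset({start})
    d_delta = {}
    d_states = {d_start}
    queue = deque([d_start])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            # delta_D(S, a) = union of delta_N(p, a) over all p in S
            t = frozenset().union(*(nfa_delta.get((p, a), set()) for p in s))
            d_delta[s, a] = t
            if t not in d_states:
                d_states.add(t)
                queue.append(t)
    # a subset is accepting if it contains at least one accepting NFA state
    d_accepting = {s for s in d_states if s & accepting}
    return d_states, d_delta, d_start, d_accepting


# Hypothetical example: the NFA for binary strings ending in "01".
NFA_DELTA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
states, delta, start, accepting = subset_construction(NFA_DELTA, "q0", {"q2"}, "01")
print(len(states))  # 3
```

Only 3 of the $2^3 = 8$ possible subsets are reachable here, which illustrates why the lazy version usually stays far below the worst case.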
12
ε-NFA (epsilon-NFA)
Includes ε (the empty string, not in the alphabet) as a transition label.
ε is the identity for concatenation: a.ε = ε.a = a for every string a.
An ε-transition is a spontaneous transition, taken without consuming any input.
13
Equivalence to NFA
Any language accepted by an ε-NFA can be described by some NFA, and every NFA language can be described by some DFA. Adding ε-transitions does not give anything more.
Why use ε-NFAs then? They are easier to construct for some languages, and useful in proving the equivalence of languages.
14
Conversion to NFA
The conversion aims to remove ε-transitions.
Define new states, each a set of original states that already contains its ε-moves: no ε-arc leaves the set.
Epsilon closure (eclose): for a state, the set of all states reachable from it spontaneously, including the state itself. Follow the ε-arcs recursively and include every reachable state in the epsilon closure.
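A minimal Python sketch of eclose, assuming ε-arcs are given as a dictionary from a state to the set of states one ε-arc away. It uses an explicit stack instead of recursion, which follows exactly the same arcs; the function name and example arcs are hypothetical.

```python
def eclose(state, eps):
    """Epsilon closure: every state reachable from `state` by following
    ε-arcs only, including `state` itself."""
    closure = {state}
    stack = [state]
    while stack:
        p = stack.pop()
        for r in eps.get(p, ()):
            if r not in closure:  # visit each state at most once
                closure.add(r)
                stack.append(r)
    return closure


# Hypothetical ε-arcs: q0 -ε-> q1 -ε-> q2, and q3 -ε-> q0
EPS = {"q0": {"q1"}, "q1": {"q2"}, "q3": {"q0"}}
print(sorted(eclose("q0", EPS)))  # ['q0', 'q1', 'q2']
```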
15
epsilon-NFA to DFA conversion
For an ε-NFA $E = (Q, \Sigma, \delta, q_0, F)$, construct the DFA $D = (Q_D, \Sigma, \delta_D, \mathrm{eclose}(q_0), F_D)$:
$Q_D = \{ S \subseteq Q \mid S = \mathrm{eclose}(S) \}$, the sets closed under ε-transitions
$\delta_D(S, a) = \mathrm{eclose}\big(\bigcup_{p \in S} \delta(p, a)\big)$ for every S in $Q_D$
$F_D = \{ S \in Q_D \mid S \cap F \neq \emptyset \}$
Each DFA state is a set of ε-NFA states with no ε-transition leaving the set. From a set S on symbol a, move every state of S on a, then take the epsilon closure of the result. Any set containing an accepting state of the ε-NFA is an accepting state of the DFA.
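Combining the epsilon closure with the subset construction gives a sketch of the ε-NFA to DFA conversion in Python. Only the subsets reachable from eclose(q0) are built. The helper names and the example ε-NFA for a*b* are hypothetical.

```python
from collections import deque


def eclose_set(states, eps):
    """Epsilon closure of a set of states, returned as a frozenset."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in eps.get(p, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def enfa_to_dfa(delta, eps, start, accepting, alphabet):
    d_start = eclose_set({start}, eps)
    d_delta, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            moved = set().union(*(delta.get((p, a), set()) for p in s))
            t = eclose_set(moved, eps)  # take the closure *after* the move
            d_delta[s, a] = t
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen, d_delta, d_start, {s for s in seen if s & accepting}


# Hypothetical ε-NFA for a*b*: loop on a in q0, ε-arc to q1, loop on b in q1.
DELTA = {("q0", "a"): {"q0"}, ("q1", "b"): {"q1"}}
EPS = {"q0": {"q1"}}
states, d_delta, d_start, d_accepting = enfa_to_dfa(DELTA, EPS, "q0", {"q1"}, "ab")


def dfa_accepts(word):
    s = d_start
    for a in word:
        s = d_delta[s, a]
    return s in d_accepting


print(len(states), dfa_accepts("aabb"), dfa_accepts("aba"))  # 3 True False
```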
16
Programs as Automata
An imperative program can be represented as a Control Flow Graph (CFG) with statements at nodes and predicates on edges. It can be converted into a CFG with both statements and predicates on edges by pushing each node's statements onto its incoming edges. Such a CFG is a DFA: program points are the states, and statements are the input symbols that move the program from one program point to another.
17
Regular Expression
An algebraic expression that denotes a language.
Composed of the symbols ε, Ø, +, *, ., ( and ), and the alphabet symbols. The language is defined by the rules:
$L(\varepsilon) = \{\varepsilon\}$
$L(\emptyset) = \emptyset$
$L(a) = \{a\}$ for every alphabet symbol a
$L(p + q) = L(p) \cup L(q)$
$L(p.q) = \{ p'.q' \mid p' \in L(p),\ q' \in L(q) \}$
$L(p^*) = \bigcup_{n \ge 0} L(p)^n$, where $L(p)^0 = \{\varepsilon\}$ and $L(p)^k = L(p)^{k-1}.L(p)$
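The inductive rules translate directly into a recursive function. Because $L(p^*)$ is usually infinite, this sketch (an assumption for illustration) returns only the strings of length at most n. Expressions are encoded as hypothetical tuples: ("eps",), ("empty",), ("sym", a), ("+", p, q), (".", p, q) and ("*", p).

```python
def lang(r, n):
    """Strings of length <= n in L(r), following the inductive rules."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "empty":
        return set()
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "+":
        return lang(r[1], n) | lang(r[2], n)
    if kind == ".":
        left, right = lang(r[1], n), lang(r[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "*":
        base = lang(r[1], n) - {""}  # ε adds nothing new inside a star
        result, frontier = {""}, {""}
        # {ε} ∪ L(p) ∪ L(p)L(p) ∪ ..., stopping when no new string fits in n
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


A_PLUS_BC = ("+", ("sym", "a"), (".", ("sym", "b"), ("sym", "c")))
ABCSTARD = (".", (".", (".", ("sym", "a"), ("sym", "b")), ("*", ("sym", "c"))), ("sym", "d"))
print(sorted(lang(A_PLUS_BC, 3)))  # ['a', 'bc']
print(sorted(lang(ABCSTARD, 5)))   # ['abccd', 'abcd', 'abd']
```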
18
Regular Expression Example
a+b.c generates the language { a, b.c } (. binds more tightly than +, so this is a+(b.c)).
a.b.c*.d generates the language { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … }.
A regular expression is a finite way to express an infinite language.
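The same two examples can be checked with Python's standard `re` module. Note that Python's syntax differs from the slides: union is written `|`, concatenation is juxtaposition, and `+` means "one or more". The pattern names are hypothetical.

```python
import re

UNION_EXAMPLE = re.compile(r"a|bc")  # a + b.c
STAR_EXAMPLE = re.compile(r"abc*d")  # a.b.c*.d

for w in ["a", "bc", "abc", "abd", "abcccd", "abcdd"]:
    # fullmatch: the whole string must be in the language, not just a prefix
    print(w, bool(UNION_EXAMPLE.fullmatch(w)), bool(STAR_EXAMPLE.fullmatch(w)))
```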
19
Equality of Languages
DEFINITION: Two regular expressions (or automata) are EQUAL if they generate the same language.
For example, (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)* = (ε + b).(a.b)*.(ε + a); both denote the strings of alternating a's and b's.
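Equality cannot be proved by testing, but a bounded check can catch mistakes. A sketch using Python's `re.fullmatch` that compares two expressions on every string up to a chosen length; `agree_up_to` is a hypothetical helper, and `b?` stands for (ε + b).

```python
import re
from itertools import product

# (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)*, in Python re syntax
LHS = re.compile(r"(ab)*|(ba)*|a(ba)*|b(ab)*")
# (ε + b).(a.b)*.(ε + a)
RHS = re.compile(r"b?(ab)*a?")


def agree_up_to(r1, r2, alphabet, max_len):
    """Return the first string on which r1 and r2 disagree, or None if they
    agree on every string of length <= max_len (evidence, not a proof)."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return w
    return None


print(agree_up_to(LHS, RHS, "ab", 8))  # None
```

Replacing the last term by b.(b.a)* makes the check fail quickly, since that term cannot produce "bab".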
20
Algebraic laws of regular expressions
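Commutativity and associativity: p + q = q + p, (p + q) + r = p + (q + r), (p.q).r = p.(q.r)
Identities and annihilator: Ø + p = p + Ø = p, ε.p = p.ε = p, Ø.p = p.Ø = Ø
Distributivity: p.(q + r) = p.q + p.r, (p + q).r = p.r + q.r
Idempotence: p + p = p
Closure laws: (p*)* = p*, Ø* = ε, ε* = ε, p.p* = p*.p, (p + q)* = (p*.q*)*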
21
Finite Automaton and Regular Expressions
Every language defined by a finite automaton is also defined by some regular expression, and every language defined by a regular expression is also defined by some DFA.
22
DFA to Regular expression
Hopcroft’s formula:
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}.\big(R_{kk}^{(k-1)}\big)^*.R_{kj}^{(k-1)}$
Number the states 1 to n in some order. $R_{ij}^{(k)}$ is the regular expression for all paths from i to j whose intermediate states are all numbered at most k; $R_{ij}^{(0)}$ covers the direct arcs from i to j (plus ε when i = j).
Compute it for all i, j with k = 0, then k = 1, …, k = n. $R_{ij}^{(n)}$ is the regular expression for all paths from i to j.
The regular expression of the DFA is $R_{s f_1}^{(n)} + \dots + R_{s f_m}^{(n)}$, where s is the start state and $f_1, \dots, f_m$ are the accepting states.
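A sketch of the $R_{ij}^{(k)}$ recurrence in Python, emitting Python `re` syntax so the result can be tested against the DFA. Assumptions: states are numbered 1 to n, `None` stands for Ø and the empty string for ε, and only trivial simplifications are applied, so the output is correct but far from minimal. The function names and the example DFA are hypothetical.

```python
import re


def union(r, s):
    """r + s, where None stands for Ø and "" for ε."""
    if r is None:
        return s
    if s is None or r == s:
        return r
    return f"(?:{r}|{s})"


def concat(r, s):
    """r.s; Ø annihilates, and joining with "" (ε) changes nothing."""
    if r is None or s is None:
        return None
    return r + s


def star(r):
    """r*, using Ø* = ε* = ε."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"


def dfa_to_regex(n, delta, start, accepting, alphabet):
    """States are numbered 1..n; delta maps (state, symbol) -> state."""
    R = {}  # k = 0: direct arcs from i to j, plus ε when i == j
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = "" if i == j else None
            for a in alphabet:
                if delta.get((i, a)) == j:
                    r = union(r, re.escape(a))
            R[i, j] = r
    # R_ij^(k) = R_ij^(k-1) + R_ik^(k-1).(R_kk^(k-1))*.R_kj^(k-1)
    for k in range(1, n + 1):
        R = {(i, j): union(R[i, j], concat(concat(R[i, k], star(R[k, k])), R[k, j]))
             for i in range(1, n + 1) for j in range(1, n + 1)}
    result = None
    for f in sorted(accepting):  # R_{s f1}^(n) + ... + R_{s fm}^(n)
        result = union(result, R[start, f])
    return result


# Hypothetical DFA over {0, 1}: state 1 = even number of 1s (start, accepting),
# state 2 = odd number of 1s.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
REGEX = dfa_to_regex(2, DELTA, 1, {1}, "01")
print(bool(re.fullmatch(REGEX, "0110")), bool(re.fullmatch(REGEX, "010")))  # True False
```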
23
DFA to RE: complexity
Hopcroft’s formula is $O(n^3 4^n)$: $n^3$ to compute the table, and $4^n$ because each $R_{ij}^{(k)}$ is built from four expressions of the previous round, so the expression size can grow by a factor of 4 at every step.
In practice it is close to $O(n^3)$, by simplifying the regular expression at every step and using a judicious algorithm that avoids recomputing $R_{kk}^{(k)}$.
Most DFAs have close to n, not $2^n$, accessible states.
A faster state elimination method, close to $O(n^2)$, is also available.
24
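RE to Automaton conversion
A regular expression is converted to an ε-NFA, which can then be converted to an NFA or a DFA.
RE to ε-NFA conversion rules (each piece has one start state and one accepting state):
ε: two states joined by one ε-arc
Ø: two states with no arc
a: two states joined by an arc labelled a
p + q: a new start state with ε-arcs to the starts of p and q, and ε-arcs from their accepting states to a new accepting state (p and q in parallel)
p.q: the accepting state of p becomes (or is joined by ε to) the start state of q
p*: a new start and a new accepting state; ε-arcs from the new start to the start of p and to the new accept, and from the accept of p back to the start of p and to the new accept
Convert the resulting ε-NFA to a DFA.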
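A Thompson-style sketch of these rules in Python. Expressions are encoded as hypothetical tuples ("eps",), ("empty",), ("sym", a), ("+", p, q), (".", p, q) and ("*", p). One deviation: concatenation joins the accepting state of the first part to the start of the second with an ε-arc instead of merging the two states, which accepts the same language. The ε-NFA is simulated directly rather than converted to a DFA.

```python
from itertools import count


def build(r, fresh, arcs):
    """Build an ε-NFA fragment for r and return (start, accept).
    Arcs (source, label, target) are appended to `arcs`; label None is ε."""
    kind = r[0]
    if kind == ".":
        s1, f1 = build(r[1], fresh, arcs)
        s2, f2 = build(r[2], fresh, arcs)
        arcs.append((f1, None, s2))  # accept of first -> start of second
        return s1, f2
    s, f = next(fresh), next(fresh)  # a new start and a new accept state
    if kind == "eps":
        arcs.append((s, None, f))
    elif kind == "sym":
        arcs.append((s, r[1], f))
    elif kind == "+":  # both arguments in parallel
        for sub in (r[1], r[2]):
            s1, f1 = build(sub, fresh, arcs)
            arcs.extend([(s, None, s1), (f1, None, f)])
    elif kind == "*":
        s1, f1 = build(r[1], fresh, arcs)
        arcs.extend([(s, None, s1), (f1, None, f),   # pass through p once
                     (s, None, f), (f1, None, s1)])  # skip p, or repeat it
    elif kind != "empty":  # Ø: two states and no arc
        raise ValueError(f"unknown node {kind!r}")
    return s, f


def eps_closure(states, arcs):
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in arcs:
            if src == p and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def matches(r, word):
    arcs = []
    start, accept = build(r, count(), arcs)
    current = eps_closure({start}, arcs)
    for ch in word:
        moved = {dst for src, label, dst in arcs if src in current and label == ch}
        current = eps_closure(moved, arcs)
    return accept in current


ABCSTARD = (".", (".", (".", ("sym", "a"), ("sym", "b")), ("*", ("sym", "c"))), ("sym", "d"))
AB_STAR = ("*", ("+", ("sym", "a"), ("sym", "b")))
print(matches(ABCSTARD, "abcccd"), matches(ABCSTARD, "abcdd"))  # True False
```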
25
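Direct conversion
Augment the regular expression r to (r).#, where # is an end marker not in the alphabet.
Give a position number to each occurrence of an alphabet symbol.
Compute for each node of the syntax tree: nullable (ε is in the node's language), firstpos (positions that can match the first symbol) and lastpos (positions that can match the last symbol).
Compute for each position its followpos (positions that can come next).
Construct the DFA: its states are sets of positions, the start state is firstpos of the root, and a state is accepting if it contains the position of #.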
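The direct method can be sketched in Python as follows. The tuple encoding (("eps",), ("sym", a), ("+", p, q), (".", p, q), ("*", p)), the function names and the use of `#` as the end marker are assumptions; Ø is not handled. For the example (a+b)*.a.b.b the sketch builds a 4-state DFA.

```python
from collections import deque


def direct_dfa(r):
    """RE -> DFA via nullable/firstpos/lastpos/followpos. Returns
    (start, delta, accepting); DFA states are frozensets of positions."""
    symbol_at = {}  # position number -> alphabet symbol
    followpos = {}  # position number -> positions that may follow it

    def visit(node):
        """Return (nullable, firstpos, lastpos) of node, filling followpos."""
        kind = node[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            p = len(symbol_at) + 1  # positions numbered left to right
            symbol_at[p] = node[1]
            followpos[p] = set()
            return False, {p}, {p}
        if kind in ("+", "."):
            n1, f1, l1 = visit(node[1])
            n2, f2, l2 = visit(node[2])
            if kind == "+":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:  # a last position of c1 is followed by
                followpos[p] |= f2  # any first position of c2
            return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
        if kind == "*":
            _, f1, l1 = visit(node[1])
            for p in l1:  # the star lets the expression repeat
                followpos[p] |= f1
            return True, f1, l1
        raise ValueError(f"unsupported node {kind!r}")

    _, first, _ = visit((".", r, ("sym", "#")))  # augment r to (r).#
    end = len(symbol_at)  # "#" is visited last, so it has the highest number
    alphabet = sorted(set(symbol_at.values()) - {"#"})
    start = frozenset(first)
    delta, states, queue = {}, {start}, deque([start])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            t = frozenset(q for p in s if symbol_at[p] == a for q in followpos[p])
            delta[s, a] = t
            if t not in states:
                states.add(t)
                queue.append(t)
    return start, delta, {s for s in states if end in s}


def run(dfa, word):
    start, delta, accepting = dfa
    state = start
    for ch in word:
        if (state, ch) not in delta:  # symbol not in the expression's alphabet
            return False
        state = delta[state, ch]
    return state in accepting


# (a+b)*.a.b.b
EXAMPLE = (".", (".", (".", ("*", ("+", ("sym", "a"), ("sym", "b"))),
                          ("sym", "a")),
                     ("sym", "b")),
           ("sym", "b"))
DFA = direct_dfa(EXAMPLE)
print(run(DFA, "babb"), run(DFA, "abba"))  # True False
```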
26
Applications
Unix text search: finding lines that match a pattern (grep)
Lexical analysis and parsing: matching text against a regular expression. For each node of the expression's syntax tree, find the set of first tokens, the set of last tokens, and whether the node can match the empty string (nullable); for each symbol position, find the set of tokens that can follow it.
Efficient search for patterns in very large repositories (web text search)
27
Regular Language
DEFINITION: A language (a set of strings) is a regular language if it can be defined by a finite automaton (a DFA, an NFA or an ε-NFA) or by a regular expression.
These are four equivalent ways to describe a regular language.
28
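Pumping Lemma
If L is a regular language then there exists an integer n such that every string w in L with |w| ≥ n can be broken into three strings, w = x.y.z, such that:
y ≠ ε
|x.y| ≤ n
$x.y^k.z$ is in L for all k ≥ 0
Proof idea: take a DFA for L with n states. Reading the first n symbols of w visits n + 1 states, so some state is visited twice; y is the part of w read between the two visits, and that loop can be skipped or repeated any number of times.
Used to prove that a language is not regular.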
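The proof idea can be made concrete: run a DFA on w and cut at the first repeated state. A Python sketch with a hypothetical two-state DFA (strings with an even number of 1s); `pump_split` is an illustrative helper.

```python
# Hypothetical complete DFA: binary strings with an even number of 1s.
DELTA = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
START, ACCEPTING = "E", {"E"}


def accepts(word):
    state = START
    for ch in word:
        state = DELTA[state, ch]
    return state in ACCEPTING


def pump_split(word):
    """Split word = x + y + z at the first repeated state of the DFA run,
    so that reading y leads from a state back to the same state."""
    first_seen = {START: 0}  # state -> number of symbols read when first reached
    state = START
    for i, ch in enumerate(word, start=1):
        state = DELTA[state, ch]
        if state in first_seen:  # pigeonhole: some state repeats
            j = first_seen[state]
            return word[:j], word[j:i], word[i:]
        first_seen[state] = i
    raise ValueError("no repeated state: word is shorter than the number of states")


x, y, z = pump_split("1100")
print(repr(x), repr(y), repr(z))                       # '' '11' '00'
print(all(accepts(x + y * k + z) for k in range(5)))  # True
```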
29
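Closure Properties
A language is a set of strings over a finite alphabet. Regular languages are closed under the following operations (the construction used is noted in parentheses):
Union: L(A ∪ B) = L(A) ∪ L(B) (regular expressions)
Intersection: L(A) ∩ L(B) (product of DFAs)
Concatenation: L(A.B) = { a.b | a in L(A), b in L(B) }
Kleene closure: $L(A^*) = \bigcup_{n \ge 0} L(A)^n$, with $L(A)^0 = \{\varepsilon\}$ and $L(A)^n = L(A)^{n-1}.L(A)$
Complement: L(A') = { w | w not in L(A) }, with respect to some overall alphabet (DFA: swap accepting and non-accepting states)
Difference: L(A − B) = L(A) − L(B), the intersection of L(A) with the complement of L(B) (DFA)
Reversal: $L(A)^R = \{ a_k a_{k-1} \dots a_1 \mid a_1 \dots a_{k-1} a_k \in L(A) \}$ (automaton: reverse every arc and switch the roles of q0 and F)
Homomorphism: replace each alphabet symbol a by a string h(a), itself a regular expression
Inverse homomorphism: $h^{-1}(L) = \{ w \mid h(w) \in L \}$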
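A minimal Python sketch of two DFA constructions behind these closure properties: the product construction (intersection, difference) and complement by swapping accepting states. Both assume complete DFAs over the same alphabet, given as (states, delta, start, accepting); the function names and example DFAs are hypothetical.

```python
from itertools import product


def product_dfa(d1, d2, alphabet, keep):
    """Run both DFAs in lockstep on pairs of states; keep(in1, in2) decides
    which pairs accept (and -> intersection, and-not -> difference)."""
    states1, delta1, start1, acc1 = d1
    states2, delta2, start2, acc2 = d2
    states = set(product(states1, states2))
    delta = {((p, q), a): (delta1[p, a], delta2[q, a])
             for (p, q) in states for a in alphabet}
    accepting = {(p, q) for (p, q) in states if keep(p in acc1, q in acc2)}
    return states, delta, (start1, start2), accepting


def complement(d):
    """Swap accepting and non-accepting states (delta must be complete)."""
    states, delta, start, accepting = d
    return states, delta, start, states - accepting


def accepts(d, word):
    _, delta, state, accepting = d
    for ch in word:
        state = delta[state, ch]
    return state in accepting


# Hypothetical examples over {0, 1}
EVEN_ONES = ({"E", "O"},
             {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"},
             "E", {"E"})
ENDS_IN_0 = ({"N", "Z"},
             {("N", "0"): "Z", ("N", "1"): "N", ("Z", "0"): "Z", ("Z", "1"): "N"},
             "N", {"Z"})
BOTH = product_dfa(EVEN_ONES, ENDS_IN_0, "01", lambda x, y: x and y)      # intersection
DIFF = product_dfa(EVEN_ONES, ENDS_IN_0, "01", lambda x, y: x and not y)  # difference
print(accepts(BOTH, "110"), accepts(DIFF, "110"))  # True False
```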
30
Decision properties
Is the described language empty?
Is a particular string in the described language?
Do two different descriptions actually describe the same language?
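For a DFA the first two questions are straightforward: emptiness is a reachability check and membership is a simulation. A Python sketch with hypothetical names and a small example DFA.

```python
from collections import deque


def is_empty(delta, start, accepting):
    """L(A) is empty iff no accepting state is reachable from the start
    state (breadth-first search over the transition graph)."""
    successors = {}
    for (p, _symbol), q in delta.items():
        successors.setdefault(p, set()).add(q)
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        if p in accepting:
            return False
        for q in successors.get(p, ()):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return True


def is_member(delta, start, accepting, word):
    """Membership: simulate the DFA on the word (a missing arc rejects)."""
    state = start
    for ch in word:
        if (state, ch) not in delta:
            return False
        state = delta[state, ch]
    return state in accepting


# Hypothetical DFA: state 2 cannot be reached from start state 0.
DELTA = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 2}
print(is_empty(DELTA, 0, {2}), is_empty(DELTA, 0, {1}))  # True False
```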
31
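Conversions
Decision properties may require conversion between the various forms. Can the conversion be done in reasonable time? (n is the number of states, or the size of the regular expression; s is the number of DFA states actually constructed.)

| Conversion | Complexity |
|---|---|
| Computing ε-closures (Warshall's algorithm) | $O(n^3)$ |
| Subset construction (number of subsets) | $O(2^n)$ |
| NFA to DFA | $O(n^3 2^n)$, in practice $O(n^3 s)$ |
| DFA to NFA | $O(n)$ |
| NFA/DFA to regular expression | $O(n^3 4^n)$ worst case; actual is much less |
| Regular expression to ε-NFA | $O(n)$ |
| Regular expression to NFA | $O(n^3)$ |
| Regular expression to DFA | $O(n^3 2^n)$ |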
32
Equivalence of automata
Equivalence of two states
States p and q of an automaton are equivalent if, for every input string applied at p or at q, the automaton ends in an accepting state from p if and only if it also ends in an accepting state from q.
The accepting state reached from p does not have to be the same accepting state as the one reached from q.
33
Minimization of DFA
If two states p and q are equivalent, we can combine them into a single state without affecting the language accepted by the DFA. This process of combining states is called minimization.
The table-filling algorithm finds whether two states are equivalent or not. Complexity: $O(n^2)$.
Pairs of states that are not equivalent are called distinguishable.
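A straightforward Python sketch of the table-filling algorithm and the resulting blocks of equivalent states. This simple version rescans all pairs until nothing changes, so in the worst case it is slower than the bound quoted above; the function names and the example DFA are hypothetical.

```python
from itertools import combinations


def table_filling(states, delta, accepting, alphabet):
    """Return the set of distinguishable pairs {p, q} as frozensets.
    Assumes a complete delta; unmarked pairs are equivalent states."""
    # Basis: an accepting and a non-accepting state are distinguishable.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:  # Induction: repeat until no new pair gets marked
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair not in marked and any(
                    frozenset((delta[p, a], delta[q, a])) in marked for a in alphabet):
                marked.add(pair)
                changed = True
    return marked


def equivalence_blocks(states, marked):
    """Partition states into blocks of equivalent states. Equivalence is
    transitive, so comparing with one member of a block is enough."""
    blocks = []
    for p in states:
        for block in blocks:
            if frozenset((p, block[0])) not in marked:
                block.append(p)
                break
        else:
            blocks.append([p])
    return blocks


# Hypothetical 4-state DFA for binary strings ending in 1 (A is the start).
STATES = ["A", "B", "C", "D"]
DELTA = {("A", "0"): "B", ("A", "1"): "C", ("B", "0"): "B", ("B", "1"): "D",
         ("C", "0"): "B", ("C", "1"): "C", ("D", "0"): "B", ("D", "1"): "D"}
marked = table_filling(STATES, DELTA, {"C", "D"}, "01")
print(equivalence_blocks(STATES, marked))  # [['A', 'B'], ['C', 'D']]
```

Each block then becomes one state of the minimized DFA, here giving two states.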
34
Minimum DFA
The minimum DFA for a language is unique (up to renaming of states). To build it:
Eliminate all states not reachable from the start state.
Determine which states are equivalent.
Partition the states into blocks of equivalent states; since equivalence is transitive, no state is in two blocks. Each block becomes one state of the minimum DFA.
Equivalence of two Regular Languages
Minimum DFA method: convert both into their minimum DFAs and check whether they are isomorphic.
Union method: combine the two DFAs into one automaton whose states are the union of their states, and run the table-filling algorithm. The languages are equal if and only if the two original start states are equivalent.
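The union method can be sketched by running table filling on the combined automaton. States are tagged with 1 or 2 so the two DFAs cannot share names; both DFAs are assumed complete over the same alphabet, given as (states, delta, start, accepting). The function name and example DFAs are hypothetical.

```python
from itertools import combinations


def equivalent_dfas(dfa1, dfa2, alphabet):
    """Union method: merge both DFAs into one automaton, run table filling,
    and report whether the two start states are equivalent."""
    states, delta, accepting = [], {}, set()
    for tag, (st, dl, _start, acc) in ((1, dfa1), (2, dfa2)):
        states += [(tag, p) for p in st]
        delta.update({((tag, p), a): (tag, q) for (p, a), q in dl.items()})
        accepting |= {(tag, p) for p in acc}
    # Table filling: basis, then induction until nothing changes
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair not in marked and any(
                    frozenset((delta[p, a], delta[q, a])) in marked for a in alphabet):
                marked.add(pair)
                changed = True
    return frozenset(((1, dfa1[2]), (2, dfa2[2]))) not in marked


# Hypothetical DFAs over {0, 1}
ENDS_IN_1_BIG = (["A", "B", "C", "D"],
                 {("A", "0"): "B", ("A", "1"): "C", ("B", "0"): "B", ("B", "1"): "D",
                  ("C", "0"): "B", ("C", "1"): "C", ("D", "0"): "B", ("D", "1"): "D"},
                 "A", {"C", "D"})
ENDS_IN_1_MIN = (["X", "Y"],
                 {("X", "0"): "X", ("X", "1"): "Y", ("Y", "0"): "X", ("Y", "1"): "Y"},
                 "X", {"Y"})
EVEN_ONES = (["E", "O"],
             {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"},
             "E", {"E"})
print(equivalent_dfas(ENDS_IN_1_BIG, ENDS_IN_1_MIN, "01"))  # True
```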