CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular.

CSC 3130: Automata theory and formal languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130 The Chinese University of Hong Kong Regular expressions Fall 2008

Operations on strings Given two strings s = a 1 …a n and t = b 1 …b m, we define their concatenation st = a 1 …a n b 1 …b m We define s n as the concatenation ss…s n times s = abb, t = cbast = abbcba s = 011s 3 = 011011011

Operations on languages The concatenation of languages L 1 and L 2 is Similarly, we write L n for LL…L ( n times) The union of languages L 1  L 2 is the set of all strings that are in L 1 or in L 2 Example: L 1 = {01, 0}, L 2 = { , 1, 11, 111, …}. What is L 1 L 2 and L 1  L 2 ? L 1 L 2 = {st: s  L 1, t  L 2 }

Operations on languages The star (Kleene closure) of L are all strings made up of zero or more chunks from L : –This is always infinite, and always contains  Example: L 1 = {01, 0}, L 2 = { , 1, 11, 111, …}. What is L 1 * and L 2 * ? L * = L 0  L 1  L 2  …

Constructing languages with operations Let’s fix an alphabet, say  = {0, 1} We can construct languages by starting with simple ones, like {0}, {1} and combining them {0}({0}  {1})* all strings that start with 0 ({0}{1}*)  ({1}{0}*) 0(0+1)* 01*+10*

Regular expressions A regular expression over  is an expression formed using the following rules: –The symbol  is a regular expression –The symbol  is a regular expression –For every a  , the symbol a is a regular expression –If R and S are regular expressions, so are RS, R+S and R*. Definition of regular language A language is regular if it is represented by a regular expression

Examples 1.01* = {0, 01, 011, 0111, …..} 2.(01*)(01) = {001, 0101, 01101, 011101, …..} 3.(0+1)* 4.(0+1)*01(0+1)* 5.((0+1)(0+1)+(0+1)(0+1)(0+1))* 6.((0+1)(0+1))*+((0+1)(0+1)(0+1))* 7.(1+01+001)*(  +0+00)

Examples Construct a RE over  = {0,1} that represents –All strings that have two consecutive 0 s. –All strings except those with two consecutive 0 s. –All strings with an even number of 0 s. (0+1)*00(0+1)* (1*01)*1* + (1*01)*1*0 (1*01*01*)*

Main theorem for regular languages Theorem A language is regular if and only if it is the language of some DFA DFA NFA regular expression regular languages

Proof plan For every regular expression, we have to give a DFA for the same language For every DFA, we give a regular expression for the same language  NFA regular expression NFADFA

What is an  NFA? An  NFA is an extension of NFA where some transitions can be labeled by  –Formally, the transition function of an  NFA is a function  : Q × (  {  }) → subsets of Q The automaton is allowed to follow  -transitions without consuming an input symbol

Example of  NFA q0q0 q1q1 q2q2 ,b a a   = {a, b} Which of the following is accepted by this  NFA: –aab, bab, ab, bb, a, 

M2M2 Examples: regular expression →  NFA R 1 = 0 R 2 = 0 + 1 R 3 = (0 + 1)* q0q0 q1q1 0 q0q0 q1q1    q2q2 q3q3 0 q4q4 q5q5 1 q’ 0 q’ 1  M2M2   

General method regular expr  NFA  q0q0  q0q0 symbol a q0q0 q1q1 a RS q0q0 q1q1  MRMR MSMS 

Convention When we draw a box around an  NFA: –The arrow going in points to the start state –The arrow going out represents all transitions going out of accepting states –None of the states inside the box is accepting –The labels of the states inside the box are distinct from all other states in the diagram

General method continued regular expr  NFA R + S q0q0 q1q1  MRMR MSMS   R*R* q0q0 q1q1  MRMR   

Road map  NFA regular expression NFA DFA

Example of  NFA to NFA conversion q0q0 q1q1 q2q2 ,b a a   NFA: Transition table of corresponding NFA: states inputs a b q0q0 q1q1 q2q2 {q 1, q 2 } {q 0, q 1, q 2 }    Accepting states of NFA: {q 0, q 1, q 2 }

Example of  NFA to NFA conversion q0q0 q1q1 q2q2 ,b a a   NFA: NFA: q0q0 q1q1 q2q2 a, b a a a a

General method To convert an  NFA to an NFA: –States stay the same –Start state stays the same –The NFA has a transition from q i to q j labeled a iff the  NFA has a path from q i to q j that contains one transition labeled a and all other transitions labeled  –The accepting states of the NFA are all states that can reach some accepting state of  NFA using only  -transitions

Why the conversion works In the original  -NFA, when given input a 1 a 2 …a n the automaton goes through a sequence of states: q 0  q 1  q 2  …  q m Some  -transitions may be in the sequence: q 0 ...  q i 1 ...  q i 2  …  q i n In the new NFA, each sequence of states of the form: q i k ...  q i k+1 will be represented by a single transition q i k  q i k+1 because of the way we construct the NFA.  a1a1 a2a2  a k+1

Proof that the conversion works More formally, we have the following invariant for any k ≥ 1 : We prove this by induction on k When k = 0, the  NFA can be in more states, while the NFA must be in q 0 After reading k input symbols, the set of states that the  NFA and NFA can be in are exactly the same

Proof that the conversion works When k ≥ 1 (input is not the empty string) –If  NFA is in an accepting state, so is NFA –Conversely, if NFA is an accepting state q i, then some accepting state of  NFA is reachable from q i, so  NFA accepts also When k = 0 (input is the empty string) –The  NFA accepts iff one of its accepting states is reachable from q 0 –This is true iff q 0 is an accepting state of the NFA

From DFA to regular expressions  NFA regular expression NFA DFA

Example Construct a regular expression for this DFA: 1 1 0 0 q1q1 q2q2 (0 + 1)*0 + 

General method We have a DFA M with states q 1, q 2,… q n We will inductively define regular expressions R ij k R ij k will be the set of all strings that take M from q i to q j with intermediate states going through q 1, q 2,… or q k only.

Example 1 1 0 0 q1q1 q2q2 R 11 0 = { , 0} =  + 0 R 12 0 = {1} = 1 R 22 0 = { , 1} =  + 1 R 11 1 = { , 0, 00, 000,...}= 0* R 12 1 = {1, 01, 001, 0001,...}= 0*1

General construction We inductively define R ij k as: R ii 0 = a i 1 + a i 2 + … + a i t +  (all loops around q i and  ) (all q i → q j ) R ij k = R ij k-1 + R ik k-1 (R kk k-1 )*R kj k-1 a path in M qiqi qkqk qjqj R ij 0 = a i 1 + a i 2 + … + a i t if i ≠ j a i 1,a i 2,…,a i t qiqi qiqi qjqj (for k > 0 )

Informal proof of correctness Each execution of the DFA using states q 1, q 2,… q k will look like this: q i → … → q k → … → q k → … → q k → … → q j intermediate parts use only states q 1, q 2,… q k-1 R ik k-1 (R kk k-1 )* R kj k-1 R ij k-1 + state q k is never visited or

Final step Suppose the DFA start state is q 1, and the accepting states are F = {q j 1  q j 2 …  q j t } Then the regular expression for this DFA is R 1j 1 n + R 1j 2 n + ….. + R 1j t n

All models are equivalent  NFA regular expression NFA DFA A language is regular iff it is accepted by a DFA, NFA,  NFA, or regular expression

Example Give a RE for the following DFA using this method: 1 1 0 0 q0q0 q1q1

CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular.

Similar presentations

Presentation on theme: "CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular.

Similar presentations

Presentation on theme: "CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular."— Presentation transcript:

Similar presentations

About project

Feedback