Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
Previous Class
  Discrete math
    Induction
    Recursive definitions
Today: The quiz
  3 questions
  10 points for each
  15 minutes to complete the quiz
Discrete Math Quiz
  1. Show, using mathematical induction, that for every n ∈ N it holds that
       1 + ∑_{i=1}^{n} i·(i!) = (n+1)!
     (Recall that 0! = 1.)
  2. How many functions f: {0,1}ⁿ → {0,1} are there? (Hint: There are 2ⁿ strings of length n over {0,1}. Consider all possible sequences of inputs. How many ways of selecting the values of the function are there?)
  3. Consider a language L defined as follows:
       1. ε ∈ L
       2. If x ∈ L then x01 ∈ L
       3. If x ∈ L then 10x ∈ L
       4. No string is in L unless it follows from rules 1, 2, 3.
     (a) Describe L in English.
     (b) How many strings of length 4, 5, and 6 are there in L?
Regular Languages
  A language L over Σ is regular if it is built from:
    {}, {ε}
    {a} for a ∈ Σ
  using the following operations:
    Union
    Concatenation
    Kleene’s star
  Examples
    ({a} ∪ {b})*
    ({a}{b})*
    {a}({a} ∪ {b})*
    {a}{a}({a} ∪ {b})* ∪ ({a} ∪ {b})*{b}{b}({a} ∪ {b})*
  This notation is awful!
Regular Languages
  We can perform some of the simple concatenations and unions in the examples:
    {a,b}*
    {ab}*
    {a}{a,b}*
    {aa}{a,b}* ∪ {a,b}*{bb}{a,b}*
  Better… but could be even simpler
Notation for Regular Expressions
  A more convenient notation:
    Drop the set braces
    Use + for union
    Use · for concatenation (or just drop it)
    Use (…)* for Kleene’s star
  Examples
    (a+b)*
    (ab)*
    a(a+b)*
    aa(a+b)* + (a+b)*bb(a+b)*
  These notation conventions essentially define so-called regular expressions. See the book for a formal definition.
  We often use two additional notational helpers: if r is a regular expression and i is a nonnegative integer, then we also write r⁺ for rr* (one or more repetitions of r) and rⁱ for r concatenated with itself i times.
Regular Expression Fun!
  Give regular expressions for the following languages!
    The language of strings over {a,b} of even length
    The language of strings over {a,b,c} in which all a’s precede all b’s and all b’s precede all c’s
    The language of strings over {0,1} of length greater than 3
    The language of strings of odd length over {a,b} that contain the substring bb
    The language of strings over {0,1} that do not contain the substring 000
  Now… how about L = {aⁱbⁱ | i ∈ N}? Is L regular? If so, by what regular expression?
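A candidate answer to exercises like these can be sanity-checked by brute force: compare the expression with a plain-English membership test on every short string. The sketch below is only an illustration, not part of the lecture. The helper names `all_strings` and `agrees` are made up here, and the candidate ((a+b)(a+b))* for the even-length language is an assumed answer rather than one given on the slides. Note that Python's `re` module writes union as `|` where the lecture notation uses +.

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def agrees(pattern, predicate, alphabet, max_len=8):
    """Check that `pattern` matches exactly the strings satisfying `predicate`,
    among all strings of length at most `max_len` (a test, not a proof)."""
    compiled = re.compile(pattern)
    return all(
        (compiled.fullmatch(s) is not None) == predicate(s)
        for s in all_strings(alphabet, max_len)
    )


# Lecture notation ((a+b)(a+b))* becomes ((a|b)(a|b))* in Python syntax.
even_length = r"((a|b)(a|b))*"
print(agrees(even_length, lambda s: len(s) % 2 == 0, "ab"))  # prints True
```

Agreement up to a length bound does not prove that an expression is correct, but a disagreement immediately yields a counterexample string.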
Properties of Regular Languages
  For each language class it is natural to ask about its closure properties.
  Are regular languages closed under:
    Union?
    Intersection?
    Subtraction?
    Complementation?
    Kleene’s star?
  It is easy to show that regular languages are closed under union and Kleene’s star (both are operations used to build regular languages), but what about the other cases?
  [Figure: Venn diagram of L₁ and L₂ with the overlap L₁ ∩ L₂ marked]
  If L₁ and L₂ are both regular, then is L₁ ∩ L₂ regular?
Machines for Regular Languages?
  Regular expressions
    Give a description of the language
    Do not necessarily give a direct algorithm to recognize a language
  … so what kinds of algorithms do we need to recognize regular languages?
  Consider the following languages:
    Strings ending with 0
    Strings whose second-to-last character is 0
    Strings with an even number of 0s and 1s
    Strings ending in 1 and not containing 00
  What algorithms work for them?
    How do we access the input?
    How much memory do we need?
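One possible answer for the first two languages, as a rough sketch: read the input once, left to right, and remember only a constant amount of information about what has been seen. The function names below are invented for this illustration, and the input is assumed to be a string over {0,1}.

```python
def ends_with_zero(symbols):
    """One pass over the input, remembering only the last symbol seen."""
    last = None
    for s in symbols:
        last = s
    return last == "0"


def second_to_last_is_zero(symbols):
    """One pass over the input, remembering only the last two symbols seen."""
    prev, last = None, None
    for s in symbols:
        prev, last = last, s
    return prev == "0"
```

In both cases the memory used does not grow with the length of the input, which is the kind of restriction the questions above point towards.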
Finite Automata
  Finite automata
    The most restricted model of computation that we look at
    Input is read once, from left to right
    There is no memory, except for one register that contains the current state
    There is a fixed, finite number of states for a given FA
  [Figure: the input split into the part already read and the remaining part; the FA is in state q]
  Can an FA accept the language of strings with an even number of 0s and 1s?
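To make the closing question concrete, here is a small sketch of such a machine written as a loop with a single state register. It assumes the intended reading is "an even number of 0s and an even number of 1s"; the function name and the encoding of the four states as pairs of parities are choices made for this example only.

```python
def accepts_even_even(x):
    """Simulate a four-state FA over {0,1}.

    The only memory is `state`, a pair
    (number of 0s read so far mod 2, number of 1s read so far mod 2).
    """
    state = (0, 0)                      # initial state: nothing read yet
    for symbol in x:                    # input is read once, left to right
        zeros, ones = state
        if symbol == "0":
            state = (1 - zeros, ones)
        elif symbol == "1":
            state = (zeros, 1 - ones)
        else:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
    return state == (0, 0)              # (0, 0) is the only accepting state
```

The number of states is fixed at four no matter how long the input is, so under the assumed reading the answer to the question is yes.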
Transition Diagrams
  How to describe an FA computation? What’s inside an FA?
    States
      One state is designated as the initial state
      Some states are designated as accepting states
    For every state we know to which state to move based on the symbol that we scan
  An FA accepts a given string x₁x₂…xₙ iff, starting in the initial state and following the transitions from state to state, it ends up in an accepting state after reading the whole string
Transition Diagrams
  Example: [Figure: transition diagram; only the edge-label fragment ", 1" survives]
    Accepts: 001, 111, 01
    Rejects: 00, 010
Finite Automata Examples
  Give transition diagrams for the following languages L:
    L is the language of strings that contain at least 3 a’s
    L is the language of strings that contain aaba as a substring
    L is the language of strings that do not contain aaba as a substring
    L is the language of strings with an even number of a’s and an even number of b’s
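As an illustration of the second and third exercises, the sketch below encodes one possible transition diagram as a table and tests it against Python's built-in substring check. The alphabet {a,b} is an assumption (the slide does not state it), and the state numbering is a choice made here: state k means "the longest suffix read so far that is a prefix of aaba has length k".

```python
from itertools import product

# Transition table: (state, symbol) -> next state. State 4 means "aaba was seen".
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 0,
    (4, "a"): 4, (4, "b"): 4,
}


def run(x, start=0):
    """Follow the transitions from `start` while reading x; return the final state."""
    state = start
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state


def contains_aaba(x):
    return run(x) == 4        # accepting states: {4}


def avoids_aaba(x):
    return run(x) != 4        # same diagram, accepting states: {0, 1, 2, 3}


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Brute-force comparison with the substring test on all short strings.
assert all(contains_aaba(s) == ("aaba" in s) for s in all_strings("ab", 10))
```

The third language reuses the same states and transitions; only the set of accepting states is swapped for its complement.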
Formal Definition of Finite Automata
  An FA is a quintuple M = (Q, Σ, q₀, A, δ) where
    Q is a finite set of states
    Σ is an alphabet of input symbols
    q₀ is the initial state (q₀ ∈ Q)
    A ⊆ Q is the set of accepting states
    δ is a total function from Q × Σ to Q: the transition function (often specified as a table)
  Exercise: Express the FAs from the previous slide in terms of this formalism
  Note: FAs are sometimes called DFAs (D = deterministic).
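One way to work on the exercise is to write the quintuple down as data and let a program check the side conditions. The `FA` class below is a hypothetical container invented for this sketch, not a standard library type. The example automaton for "at least 3 a's" over the assumed alphabet {a,b} is one possible answer, with state k meaning "k a's seen so far, capped at 3".

```python
from dataclasses import dataclass


@dataclass
class FA:
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    start: object          # q0
    accepting: frozenset   # A
    delta: dict            # δ as a table: (state, symbol) -> state

    def __post_init__(self):
        if self.start not in self.states:
            raise ValueError("q0 must be an element of Q")
        if not self.accepting <= self.states:
            raise ValueError("A must be a subset of Q")
        for q in self.states:
            for a in self.alphabet:
                # δ must be total: every (state, symbol) pair needs a target in Q.
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"δ is not defined properly on ({q!r}, {a!r})")


at_least_three_as = FA(
    states=frozenset({0, 1, 2, 3}),
    alphabet=frozenset("ab"),
    start=0,
    accepting=frozenset({3}),
    delta={
        (q, c): (min(q + 1, 3) if c == "a" else q)
        for q in range(4)
        for c in "ab"
    },
)
```

Because the constructor rejects tables with missing entries, a forgotten transition in a hand-written answer shows up as an error instead of going unnoticed.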
Formal Definition of Finite Automata
  Let M = (Q, Σ, q₀, A, δ) be an FA
  Is the definition complete?
    It’s not enough to say what an FA is…
    … we have to define how it works!
  The transition function
    δ(q, a) = q’
    The transition function says to which state we move if, in state q, we see symbol a
  We extend δ as follows
    Let δ*(q, x) = q’ such that … q’ is the state to which the FA goes if it starts in state q and reads string x
    How can we define δ* formally?
  Definition. We say that a finite automaton M = (Q, Σ, q₀, A, δ) accepts a string x ∈ Σ* iff δ*(q₀, x) ∈ A.
  L(M) = the set of strings accepted by M
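One standard way to answer the question about δ* is a recursive definition on the length of x: δ*(q, ε) = q, and δ*(q, ya) = δ(δ*(q, y), a) for a string y and a symbol a. The slide leaves the definition open, so take this as one common formulation rather than necessarily the one used in class. The sketch below transcribes it directly; the two-state automaton for "strings ending with 0" and its state names are made up for the example.

```python
def delta_star(delta, q, x):
    """Extended transition function:
    δ*(q, ε) = q  and  δ*(q, ya) = δ(δ*(q, y), a)."""
    if x == "":
        return q
    y, a = x[:-1], x[-1]
    return delta[(delta_star(delta, q, y), a)]


def accepts(delta, q0, accepting, x):
    """M accepts x iff δ*(q0, x) is an accepting state."""
    return delta_star(delta, q0, x) in accepting


# Hypothetical FA over {0,1} for "strings ending with 0":
# state "z" means the last symbol read was 0, state "n" means it was not.
DELTA = {
    ("n", "0"): "z", ("n", "1"): "n",
    ("z", "0"): "z", ("z", "1"): "n",
}

print(accepts(DELTA, "n", {"z"}, "0110"))  # prints True
```

The recursion mirrors the definition and is fine for short strings; for long inputs a simple loop over the symbols computes the same state without deep recursion.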
Kleene’s Theorem
  Why study finite automata?
    We have defined finite automata and their languages…
    But how does that help us in the study of regular languages?
  Kleene’s theorem!
    A language L is regular if and only if there is a finite automaton that accepts L
  Consequences of Kleene’s theorem:
    We can prove facts about regular languages by studying either regular expressions or finite automata
    In particular
      Some closure properties of regular languages are easier to prove based on finite automata!
      We can hope to write a grep program that matches text against regular expressions!
Closure Properties Revisited
  Using Kleene’s theorem we can now show that regular languages are closed under
    Union
    Intersection
    Subtraction
    Complement
  Proof technique:
    Given L₁ and L₂
    Take their corresponding FAs M₁ and M₂
    Run M₁ and M₂ in parallel and decide string membership based on their states
  Focus on intersection
    M₁ = (Q₁, Σ, q₁, A₁, δ₁)
    M₂ = (Q₂, Σ, q₂, A₂, δ₂)
  We construct M such that L(M) = L(M₁) ∩ L(M₂):
    M = (Q, Σ, q₀, A, δ), where
      Q = Q₁ × Q₂
      q₀ = (q₁, q₂)
      A = {(p,q) | p ∈ A₁ and q ∈ A₂}
      For all p ∈ Q₁, q ∈ Q₂, a ∈ Σ, set δ((p,q),a) = (δ₁(p,a), δ₂(q,a))
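The construction above can be written out almost line by line. In the sketch below an FA is represented as a tuple (states, start, accepting, delta), which is an encoding chosen for this example, and the two input automata (even number of a's, ends with b) are made-up illustrations. The `accept_if` parameter is an addition of this sketch: the default gives intersection, while `lambda in1, in2: in1 or in2` gives union and `lambda in1, in2: in1 and not in2` gives subtraction.

```python
from itertools import product


def product_fa(m1, m2, alphabet, accept_if=lambda in1, in2: in1 and in2):
    """Build the product automaton that runs m1 and m2 in parallel."""
    states1, start1, acc1, delta1 = m1
    states2, start2, acc2, delta2 = m2
    states = set(product(states1, states2))            # Q = Q1 × Q2
    start = (start1, start2)                           # q0 = (q1, q2)
    accepting = {
        (p, q) for (p, q) in states if accept_if(p in acc1, q in acc2)
    }
    delta = {                                          # componentwise moves
        ((p, q), a): (delta1[(p, a)], delta2[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    return states, start, accepting, delta


def accepts(m, x):
    """Run the FA m on the string x and report whether it ends in an accepting state."""
    _, state, accepting, delta = m
    for symbol in x:
        state = delta[(state, symbol)]
    return state in accepting


# M1: even number of a's.  M2: strings ending with b.  Alphabet {a, b}.
M1 = ({"even", "odd"}, "even", {"even"},
      {("even", "a"): "odd", ("even", "b"): "even",
       ("odd", "a"): "even", ("odd", "b"): "odd"})
M2 = ({"no", "yes"}, "no", {"yes"},
      {("no", "a"): "no", ("no", "b"): "yes",
       ("yes", "a"): "no", ("yes", "b"): "yes"})

BOTH = product_fa(M1, M2, "ab")
print(accepts(BOTH, "aab"))  # prints True: two a's, and the string ends with b
```

Only the accepting set distinguishes the different closure properties; the states and transitions of the product are the same in every case.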
Proving Languages Not Regular
  We can use Kleene’s theorem to show that certain languages are not regular
    This is usually done with the so-called pumping lemma
    … but here we will directly use properties of FAs to show that a language is not regular
  Consider L = {aⁱbⁱ | i ∈ N}
    We can show that L is not regular by showing that no FA can possibly accept L.
    A proof by contradiction.
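The slide does not spell out the contradiction, so the following is only one standard way to fill it in, based on the pigeonhole principle. An FA with k states must reach the same state on two of the k+1 strings ε, a, aa, …, aᵏ, say on aⁱ and aʲ with i < j. From then on it cannot tell them apart, so it treats aⁱbⁱ (in L) and aʲbⁱ (not in L) the same way and is wrong on one of them. The sketch below carries out this search on a concrete automaton. The function names and the sample FA (which accepts a*b*) are assumptions made for the illustration.

```python
def run(delta, q, x):
    """Return the state reached from q after reading x."""
    for symbol in x:
        q = delta[(q, symbol)]
    return q


def in_l(x):
    """Membership test for L = { a^i b^i : i >= 0 }."""
    n = len(x) // 2
    return x == "a" * n + "b" * n


def find_mistake(delta, q0, accepting, num_states):
    """Return a string on which the given FA disagrees with L."""
    seen = {}                                  # state reached on a^i -> i
    for j in range(num_states + 1):            # k+1 strings, only k states
        q = run(delta, q0, "a" * j)
        if q in seen:
            i = seen[q]                        # a^i and a^j reach the same state
            member = "a" * i + "b" * i         # in L
            non_member = "a" * j + "b" * i     # not in L, because i < j
            # Both strings end in the same state, so the FA errs on one of them.
            if run(delta, q0, member) not in accepting:
                return member
            return non_member
        seen[q] = j
    raise AssertionError("unreachable: some state must repeat")


# Hypothetical three-state FA accepting a*b* (state 2 is a dead state).
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 2, (2, "b"): 2,
}
witness = find_mistake(DELTA, 0, {0, 1}, num_states=3)
print(repr(witness))  # prints 'a': accepted by this FA, but not in L
```

Since the search succeeds for every FA, whatever its number of states, no FA accepts exactly L, which is the contradiction the slide refers to.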