Chapter Seven: Regular Expressions

Slides:



Advertisements
Similar presentations
Chapter Three: Closure Properties for Regular Languages
Advertisements

1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Regular Expressions. 2 Regular expressions describe regular languages Example: describes the language.
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
Formal Language Finite set of alphabets Σ: e.g., {0, 1}, {a, b, c}, { ‘{‘, ‘}’ } Language L is a subset of strings on Σ, e.g., {00, 110, 01} a finite language,
Regular Expressions. Notation to specify a language –Declarative –Sort of like a programming language. Fundamental in some languages like perl and applications.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
Automata, Computability, & Complexity by Elaine Rich ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Slides provided by author Slides edited for.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2006.
CSCI 2670 Introduction to Theory of Computing September 1, 2005.
MA/CSSE 474 Theory of Computation Regular Expressions Intro.
Lecture 5 Sept 06, 2012 Regular expressions – examples Converting DFA to regular expression. (same works for NFA to r.e. conversion.) Converting R.E. to.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Lesson No.6 Naveen Z Quazilbash. Overview Attendance and lesson plan sharing Assignments Quiz (10 mins.). Some basic ideas about this course Regular Expressions.
January 9, 2015CS21 Lecture 31 CS21 Decidability and Tractability Lecture 3 January 9, 2015.
1 Regular Expressions Reading: Chapter 3. 2 Regular Expressions vs. Finite Automata Offers a declarative way to express the pattern of any string we want.
Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Transparency No. 2-1 Formal Language and Automata Theory Homework 2.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
1 Chapter 3 Regular Languages.  2 3.1: Regular Expressions (1)   Regular Expression (RE):   E is a regular expression over  if E is one of:
 2004 SDU Lecture4 Regular Expressions.  2004 SDU 2 Regular expressions A third way to view regular languages. Say that R is a regular expression if.
CSCI 2670 Introduction to Theory of Computing September 11, 2007.
Week 13 - Wednesday.  What did we talk about last time?  Exam 3  Before review:  Graphing functions  Rules for manipulating asymptotic bounds  Computing.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Theory of Computation Automata Theory Dr. Ayman Srour.
Cpt S 317: Spring 2009 Reading: Chapter 3
3. Regular Expressions and Languages
Closure Properties of Regular Languages
PROPERTIES OF REGULAR LANGUAGES
Regular Expressions.
CSE 105 theory of computation
Formal Language & Automata Theory
Regular Expressions Sections:1.3 page 63 September 17, 2008
Week 14 - Friday CS221.
Closure Properties of Regular Languages
Single Final State for NFA
REGULAR LANGUAGES AND REGULAR GRAMMARS
Hierarchy of languages
Chapter Thirteen: Stack Machines
CS 154, Lecture 3: DFANFA, Regular Expressions.
Definitions Equivalence to Finite Automata
Chapter Nine: Advanced Topics in Regular Languages
Introduction to Finite Automata
Finite Automata Reading: Chapter 2.
Definitions Equivalence to Finite Automata
Regular Expressions.
Chapter Five: Nondeterministic Finite Automata
FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
Closure Properties of Regular Languages
Closure Properties of Regular Languages
Regular Languages ภาษาปกติ.
CS21 Decidability and Tractability
Definitions Equivalence to Finite Automata
Convert to a DFA: Start state: Final States: P Symbol Q E(Q) a b.
Chapter 1 Introduction to the Theory of Computation
CSE 105 theory of computation
Chapter 1 Regular Language
CSCI 2670 Introduction to Theory of Computing
CHAPTER 1 Regular Languages
Introduction and Chapter One: Fundamentals
COSC 3340: Introduction to Theory of Computation
COSC 3340: Introduction to Theory of Computation
CSE 105 theory of computation
Presentation transcript:

Chapter Seven: Regular Expressions Formal Language, chapter 7, slide 1

The first time a young student sees the mathematical constant π, it looks like just one more school artifact: one more arbitrary symbol whose definition to memorize for the next test. Later, if he or she persists, this perception changes. In many branches of mathematics and with many practical applications, π keeps on turning up. "There it is again!" says the student, thus joining the ranks of mathematicians for whom mathematics seems less like an artifact invented and more like a natural phenomenon discovered. So it is with regular languages. We have seen that DFAs and NFAs have equal definitional power. It turns out that regular expressions also have exactly that same definitional power: they can be used to define all the regular languages, and only the regular languages. There it is again! Formal Language, chapter 7, slide 2

Outline 7.1 Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 3

Concatenation of Languages The concatenation of two languages L1 and L2 is L1L2 = {xy | x ∈ L1 and y ∈ L2} The set of all strings that can be constructed by concatenating a string from the first language with a string from the second For example, if L1 = {a, b} and L2 = {c, d} then L1L2 = {ac, ad, bc, bd} Formal Language, chapter 7, slide 4

Kleene Closure of a Language The Kleene closure of a language L is L* = {x1x2 ... xn | n ≥ 0, with all xi ∈ L} The set of strings that can be formed by concatenating any number of strings, each of which is an element of L Not the same as {xn | n ≥ 0 and x ∈ L} In L*, each xi may be a different element of L For example, {ab, cd}* = {ε, ab, cd, abab, abcd, cdab, cdcd, ababab, ...} For all L, ε ∈ L* For all L containing at least one string other than ε, L* is infinite Formal Language, chapter 7, slide 5

Regular Expressions A regular expression is a string r that denotes a language L(r) over some alphabet Σ Regular expressions make special use of the symbols ε, ∅, +, *, and parentheses We will assume that these special symbols are not included in Σ There are six kinds of regular expressions… Formal Language, chapter 7, slide 6

The Six Regular Expressions The six kinds of regular expressions, and the languages they denote, are: Three kinds of atomic regular expressions: Any symbol a ∈ Σ, with L(a) = {a} The special symbol ε, with L(ε) = {ε} The special symbol ∅, with L(∅) = {} Three kinds of compound regular expressions built from smaller regular expressions, here called r, r1, and r2: (r1 + r2), with L(r1 + r2) = L(r1) ∪ L(r2) (r1r2), with L(r1r2) = L(r1)L(r2) (r)*, with L((r)*) = (L(r))* The parentheses may be omitted, in which case * has highest precedence and + has lowest Formal Language, chapter 7, slide 7

Other Uses of the Name These are classical regular expressions Many modern programs use text patterns also called regular expressions: Tools like awk, sed and grep Languages like Perl, Python, Ruby, and PHP Language libraries like those for Java and the .NET languages All slightly different from ours and each other More about them in a later chapter Formal Language, chapter 7, slide 8

Outline 7.1 Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 9

ab Denotes the language {ab} Our formal definition permits this because a is an atomic regular expression denoting {a} b is an atomic regular expression denoting {b} Their concatenation (ab) is a compound Unnecessary parentheses can be omitted Thus any string x in Σ* can be used by itself as a regular expression, denoting {x} Formal Language, chapter 7, slide 10

ab+c Denotes the language {ab,c} We omitted parentheses from the fully parenthesized form ((ab)+c) The inner pair is unnecessary because + has lower precedence than concatenation Thus any finite language can be defined using a regular expression Just list the strings, separated by + Formal Language, chapter 7, slide 11

ba* Denotes the language {ban}: the set of strings consisting of b followed by zero or more as Not the same as (ba)*, which denotes {(ba)n} * has higher precedence than concatenation The Kleene star is the only way to define an infinite language using regular expressions Formal Language, chapter 7, slide 12

(a+b)* Denotes {a,b}*: the whole language of strings over the alphabet {a,b} The parentheses are necessary here, because * has higher precedence than + a+b* denotes {a} ∪ {b}* Reminder: not "zero or more copies…" That would be a*+b*, which denotes {a}* ∪ {b}* Formal Language, chapter 7, slide 13

ab+ε Denotes the language {ab,ε} Occasionally, we need to use the atomic regular expression ε to include ε in the language But it's not needed in (a+b)*+ε, because ε is already part of every Kleene star Formal Language, chapter 7, slide 14

∅ Denotes {} There is no other way to denote the empty set with regular expressions That's all you should ever use ∅ for It is not useful in compounds: L(r∅) = L(∅r) = {} L(r+∅) = L(∅+r) = L(r) L(∅*) = {ε} Formal Language, chapter 7, slide 15

More Examples (a+b)(c+d) (abc)* a*b* Denotes {ac, ad, bc, bd} Denotes {(abc)n} = {ε, abc, abcabc, …} a*b* Denotes {anbm} = {xy | x ∈ {a}* and y ∈ {b}*} Formal Language, chapter 7, slide 16

More Examples (a+b)*aa(a+b)* (a+b)*a(a+b)*a(a+b)* (a*b*)* Denotes {x ∈ {a,b}* | x contains at least 2 consecutive as} (a+b)*a(a+b)*a(a+b)* Denotes {x ∈ {a,b}* | x contains at least 2 as} (a*b*)* Denotes {a,b}*, same as the simpler (a+b)* Because L(a*b*) contains both a and b, and that's enough: we already have L((a+b)*) = {a,b}* In general, whenever Σ ⊆ L(r), then L((r)*) = Σ* Formal Language, chapter 7, slide 17

Outline 7.1 Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 18

Regular Expression to NFA Goal: to show that every regular expression defines a regular language Approach: give a way to convert any regular expression to an NFA for the same language Advantage: large NFAs can be composed from smaller ones using ε-transitions Formal Language, chapter 7, slide 19

Standard Form To make them easier to compose, our NFAs will all have the same standard form: Exactly one accepting state, not the start state That is, for any regular expression r, we will show how to construct an NFA N with L(N) = L(r), pictured like this: Formal Language, chapter 7, slide 20

Composing Example That form makes composition easy For example, given NFAs for L(r1) and L(r2), we can easily construct one for L(r1+r2): This new NFA still has our special form Formal Language, chapter 7, slide 21

Lemma 7.3 If r is any regular expression, there is some NFA N that has a single accepting state, not the same as the start state, with L(N) = L(r). Proof sketch: There are six kinds of regular expressions We will show how to build a suitable NFA for each kind Formal Language, chapter 7, slide 22

Proof Sketch: Atomic Expressions There are three kinds of atomic regular expressions Any symbol a ∈ Σ, with L(a) = {a} The special symbol ε, with L(ε) = {ε} The special symbol ∅, with L(∅) = {} Formal Language, chapter 7, slide 23

Proof: Compound Expressions There are three kinds of compound regular expressions: (r1 + r2), with L(r1 + r2) = L(r1) ∪ L(r2) Formal Language, chapter 7, slide 24

(r1r2), with L(r1r2) = L(r1) L(r2) (r1)*, with L((r1)*) = (L(r1))* Formal Language, chapter 7, slide 25

Outline 7.1 Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 26

Sketchy Proof That proof left out a number of details To make it more rigorous, we would have to Give the 5-tuple form for each NFA Show that it each NFA accepts the right language More fundamentally, we would have to organize the proof as an induction: a structural induction Formal Language, chapter 7, slide 27

Structural Induction Induction on a recursively-defined structure Here: the structure of regular expressions Base cases: the bases of the recursive definition Here: the atomic regular expressions Inductive cases: the recursive cases of the definition Here: the compound regular expressions Inductive hypothesis: the assumption that the proof has been done for structurally simpler cases Here: for a compound regular expression r, the assumption that the proof has been done for r's subexpressions Formal Language, chapter 7, slide 28

Lemma 7.3, Proof Outline Proof is by induction on the structure of r Base cases: when r is an atomic expression, it has one of these three forms: For each, give NFA N and show L(N) correct Recursive cases: when r is a compound expression, it has one of these three forms: For each, give NFA N, using the NFAs for r's subexpressions as guaranteed by the inductive hypothesis, and show L(N) correct QED Formal Language, chapter 7, slide 29

Outline 7.1 Regular Expressions, Formally Defined 7.2 Regular Expression Examples 7.3 For Every Regular Expression, a Regular Language 7.4 Regular Expressions and Structural Induction 7.5 For Every Regular Language, a Regular Expression Formal Language, chapter 7, slide 30

NFA to Regular Expression There is a way to take any NFA and construct a regular expression for the same language Lemma 7.5: if N is any NFA, there is some regular expression r with L(r) = L(N) A tricky construction, covered in Appendix A For now, just an example of the construction Formal Language, chapter 7, slide 31

Recall this NFA (which is also a DFA) from chapter 3 L(M) = the set of strings that are binary representation of numbers divisible by 3 We'll construct an equivalent regular expression Not as hard as it looks Ultimately, we want the set of strings that take it from 0 to 0, passing through any of the other states But we'll start with some easy pieces Formal Language, chapter 7, slide 32

What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through 0 or 1? Formal Language, chapter 7, slide 33

What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through 0 or 1? Easy: 1* Formal Language, chapter 7, slide 34

What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through 0 or 1? Easy: 1* Then what is a regular expression for the language of strings that take it from 1 back to 1, any number of times, without passing through 0? Formal Language, chapter 7, slide 35

What is a regular expression for the language of strings that take it from 2 back to 2, any number of times, without passing through 0 or 1? Easy: 1* Then what is a regular expression for the language of strings that take it from 1 back to 1, any number of times, without passing through 0? That would be (01*0)*: Go to 2 (the first 0) Go from 2 to 2 any number of times (we already got 1* for that) Go back to 1 (the last 0) Repeat any number of times (the outer (..)*) Formal Language, chapter 7, slide 36

Then what is a regular expression for the language of strings that take it from 1 back to 1, any number of times, without passing through 0? That would be (01*0)* Then what is a regular expression for the language of strings that take it from 0 back to 0, any number of times? Formal Language, chapter 7, slide 37

Then what is a regular expression for the language of strings that take it from 1 back to 1, any number of times, without passing through 0? That would be (01*0)* Then what is a regular expression for the language of string that take it from 0 back to 0, any number of times? That would be (0 + 1(01*0)*1)*: One way to go from 0 to 0 once is with a 0 Another is with a 1, then (01*0)*, then a final 1 That makes 0 + 1(01*0)*1 Repeat any number of times (the outer (..)*) Formal Language, chapter 7, slide 38

So the regular expression is (0 + 1(01*0)*1)* The full construction in Appendix A uses a similar approach, and works on any NFA It defines the regular expression in terms of smaller regular expressions that correspond to restricted paths through the NFA Putting Lemmas 7.3 and 7.5 together, we have... Formal Language, chapter 7, slide 39

Theorem 7.5 (Kleene's Theorem) A language is regular if and only if it is L(r) for some regular expression r. Proof: follows from Lemmas 7.3 and 7.5 This makes our third way of defining the regular languages: By DFA By NFA By regular expression These three have equal power for defining languages Formal Language, chapter 7, slide 40