79 Regular Expression Regular expressions over an alphabet Σ are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.


79 Regular Expression While studying formal languages, we often express a language in set notation, such as {a^i b^i | i > 0}. Set notation is practical only when the property defining the language is simple enough to describe. Every regular language, however, can be expressed succinctly by a regular expression, which is defined as follows.
Regular expressions over an alphabet Σ are defined recursively:
(1) Ø, which denotes the empty set, is a regular expression.
(2) ε is a regular expression and denotes the set {ε}.
(3) For every a ∈ Σ, a is a regular expression and denotes the set {a}.
(4) If r and s are regular expressions that denote the sets R and S, respectively, then (r + s), (rs), and (r*) are regular expressions that denote, respectively, the sets R ∪ S, RS, and R*.
We may omit parentheses from a regular expression if it expresses the same set under the assumption that the star * has higher precedence than concatenation or +, and that concatenation has higher precedence than +. For a regular expression r, L(r) denotes the set of strings expressed by r.
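The recursive definition above can be read almost directly as a program. The following is a minimal sketch, not part of the original slides: the class names (Empty, Eps, Sym, Union, Concat, Star) and the function lang are hypothetical, and lang enumerates only the strings of L(r) up to a chosen length n so that the result stays finite.

```python
from dataclasses import dataclass

# Hypothetical node types, one per clause of the definition.
@dataclass(frozen=True)
class Empty:            # clause (1): the empty set
    pass

@dataclass(frozen=True)
class Eps:              # clause (2): the set containing only the empty string
    pass

@dataclass(frozen=True)
class Sym:              # clause (3): a single symbol a
    a: str

@dataclass(frozen=True)
class Union:            # clause (4): (r + s)
    r: object
    s: object

@dataclass(frozen=True)
class Concat:           # clause (4): (rs)
    r: object
    s: object

@dataclass(frozen=True)
class Star:             # clause (4): (r*)
    r: object

def lang(e, n):
    """Return the strings of L(e) whose length is at most n."""
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if n >= 1 else set()
    if isinstance(e, Union):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Concat):
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n)
                if len(x + y) <= n}
    if isinstance(e, Star):
        base = lang(e.r, n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending strings of the base until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(e)

a_star_b_star = Concat(Star(Sym("a")), Star(Sym("b")))
print(sorted(lang(a_star_b_star, 2)))
```

Bounding the length is a design choice made only for this illustration; L(r) itself is infinite whenever a star is applied to a set containing a nonempty string.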

80 Regular expression (cont’d) For example, the regular language {a^i b^j | i, j ≥ 0} can be expressed by the regular expression a*b*, and the regular language {xaaybbz | x, y, z ∈ {a, b}*} can be expressed as (a+b)*aa(a+b)*bb(a+b)*. We can easily prove that a*b* is a regular expression according to the definition. Since a and b are regular expressions denoting the sets {a} and {b}, respectively, by part (4) of the definition a* and b* are regular expressions, which denote the sets {a}* and {b}*. Since a* and b* are regular expressions, the concatenation a*b* is also a regular expression by part (4), and it denotes the set {a}*{b}*, which is equal to {a^i b^j | i, j ≥ 0}. By a similar argument we can show that (a+b)*aa(a+b)*bb(a+b)* is a regular expression which denotes the second language above. Later we will see that every regular language can be expressed by a regular expression, and that if a language is expressible by a regular expression, then that language is regular.
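As an informal sanity check of the two examples, the sketch below compares each regular expression with its set description on every string over {a, b} up to a small length. It uses Python's re module, where union is written | rather than +; the helper names in_set_ab, in_set_aabb and agree are hypothetical, and a bounded check of this kind is evidence, not a proof.

```python
import re
from itertools import product

A_STAR_B_STAR = re.compile(r"a*b*")
AA_THEN_BB = re.compile(r"(a|b)*aa(a|b)*bb(a|b)*")

def in_set_ab(w):
    """Membership in {a^i b^j | i, j >= 0}, tested directly."""
    i = len(w) - len(w.lstrip("a"))      # number of leading a's
    return set(w[i:]) <= {"b"}           # the rest must consist of b's only

def in_set_aabb(w):
    """Membership in {x aa y bb z}: an 'aa' followed later by a 'bb'."""
    k = w.find("aa")                     # the earliest 'aa' leaves the most room
    return k != -1 and w.find("bb", k + 2) != -1

def agree(max_len=8):
    """Compare both expressions with their set descriptions on short strings."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if bool(A_STAR_B_STAR.fullmatch(w)) != in_set_ab(w):
                return False
            if bool(AA_THEN_BB.fullmatch(w)) != in_set_aabb(w):
                return False
    return True

print(agree())
```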

81 Chomsky Hierarchy of Languages and Related Models We have studied four types of formal grammars and their languages, and four different computational models that recognize the languages, together with other related models, such as L-systems, syntax flow graphs, and regular expressions. Now we will study their relationships more closely. The table on the next page summarizes the relationships among those models. This relationship, called the Chomsky hierarchy (after Noam Chomsky, who defined the classes of languages), is one of the most significant achievements in computer science. In the table, the vertical relationship denotes proper containment and the horizontal relationship denotes characterization. For example, the class of context-free languages properly contains the regular languages, finite state machines can recognize only regular languages, and the languages recognized by finite state machines can be expressed by regular expressions. Many powerful models have been introduced (for example, the ones shown at the upper right corner) which turned out to be computationally equivalent to Turing machines; their languages are also called recursively enumerable sets.

82 The Chomsky Hierarchy
Languages (grammars) | Machines | Other Models
Recursively Enumerable Sets (type 0) | Turing Machines | Post System, Markov Algorithms, μ-recursive Functions
Context-sensitive Languages (type 1) | Linear-bounded Automata |
Context-free Languages (type 2) | Pushdown Automata |
Regular Languages (type 3) | Finite State Automata | Regular Expression

83 Characterization Theorem among Regular Grammars, FA’s and Regular Expressions We prove the characterization (i.e., the horizontal relationship) only at the level of regular languages, and later prove the vertical relations for the lower two levels only.
Theorem. (1) A language L is regular if and only if it is accepted by an FA M. (2) A language L can be expressed by a regular expression if and only if L is accepted by an FA M.
Proof of (1-a): If L is regular, then there is an FA M which accepts L. We construct an FA M from any regular grammar G whose language is L. Without loss of generality, assume G has production rules of the form A → xB or A → x, where x is ε or a single terminal symbol, i.e., |x| ≤ 1. Otherwise, we can easily convert the rules into these restricted forms without affecting the language of the grammar. For example, a rule A → abbB can be converted to the following set of rules without changing the language, where the B_i are new nonterminal symbols: A → aB_1, B_1 → bB_2, B_2 → bB.
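The rule-splitting step can be sketched as follows. This is an illustration under assumptions: a rule A → xB is stored as the tuple (A, x, B), with None in place of B for a rule A → x, and the names split_rule and fresh_names are hypothetical. The generated names B1, B2, … are assumed not to clash with nonterminals already in the grammar.

```python
from itertools import count

def fresh_names(prefix="B"):
    """Yield B1, B2, ... (assumed unused elsewhere in the grammar)."""
    return (f"{prefix}{i}" for i in count(1))

def split_rule(head, terminals, tail, fresh):
    """Split head -> terminals tail into rules with at most one terminal each.

    `tail` is a nonterminal name, or None for a rule without one.
    """
    if len(terminals) <= 1:
        return [(head, terminals, tail)]      # already in restricted form
    rules = []
    current = head
    for symbol in terminals[:-1]:
        nxt = next(fresh)                     # new nonterminal for the next link
        rules.append((current, symbol, nxt))
        current = nxt
    rules.append((current, terminals[-1], tail))
    return rules

# The rule A -> abbB from the slide.
for head, x, tail in split_rule("A", "abb", "B", fresh_names()):
    print(f"{head} -> {x}{tail or ''}")
```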

84 Proof of Characterization Theorem (cont’d) Suppose the grammar is given as G = (V_T, V_N, P, S). We construct an FA M from G using the rules shown below. Let A, B ∈ V_N and a ∈ V_T ∪ {ε}.
For each production rule of the following type, construct a state transition in M as follows:
A → aB: a transition labeled a from state A to state B.
A → aA: a self-loop labeled a on state A.
A → a: a transition labeled a from state A to F, where F is a new accepting state.
A → ε: A is an accepting state.
If A is the start symbol: let A be the start state.
We can prove that L(G) = L(M), i.e., the language accepted by M is exactly the language generated by the grammar G.
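A minimal sketch of this construction, followed by a simulation of the resulting FA, is given below. The tuple encoding of rules, the state name "F", and the function names are assumptions made for the illustration; the small grammar S → aS | bA, A → bA | ε is a made-up example and does not come from the slides.

```python
def grammar_to_fa(rules, start):
    """Build (delta, start, accepting) from rules in restricted form.

    Each rule is (head, x, tail): x is "" (for ε) or one terminal,
    and tail is a nonterminal name or None.
    """
    final = "F"                      # new accepting state; assumed not a nonterminal
    delta = {}                       # (state, symbol) -> set of next states
    accepting = {final}
    for head, x, tail in rules:
        if tail is None and x == "":
            accepting.add(head)      # A -> ε makes A accepting
        else:
            target = tail if tail is not None else final
            delta.setdefault((head, x), set()).add(target)
    return delta, start, accepting

def eps_closure(delta, states):
    """All states reachable from `states` through ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, ""), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(fa, w):
    delta, start, accepting = fa
    current = eps_closure(delta, {start})
    for ch in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(delta, moved)
    return bool(current & accepting)

# Made-up grammar: S -> aS | bA, A -> bA | ε (one or more b's after any a's).
RULES = [("S", "a", "S"), ("S", "b", "A"), ("A", "b", "A"), ("A", "", None)]
FA = grammar_to_fa(RULES, "S")
print(accepts(FA, "aabb"), accepts(FA, "ba"))
```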

85 Proof of Characterization Theorem (cont’d) Proof of (1-b): If L is the language accepted by an FA M, then there is a regular grammar G which generates L. Let M = (Q, Σ, δ, q_0, F). Construct a regular grammar G from M according to the rules shown below, where A, B ∈ Q and a, b ∈ Σ ∪ {ε}.
For each state transition of M of the following type, add a production rule to G as follows:
State A has a self-loop labeled a and a transition labeled b to state B: A → bB | aA.
A is the start state: define A as the start symbol.
A is an accepting state: A → ε.
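The reverse direction is even shorter in code. The sketch below assumes the FA is stored as a dictionary from (state, symbol) to a set of next states; the function name fa_to_grammar and the tuple encoding of rules are hypothetical. The example FA is the one described by the rules above: state A with a self-loop on a and a transition on b to B, where A is both the start state and an accepting state.

```python
def fa_to_grammar(delta, start, accepting):
    """Return (rules, start_symbol); each rule is (head, x, tail).

    A transition from A to B on symbol a becomes the rule A -> aB, and each
    accepting state A contributes A -> ε, encoded here as (A, "", None).
    """
    rules = []
    for (state, symbol), targets in sorted(delta.items()):
        for target in sorted(targets):
            rules.append((state, symbol, target))
    for state in sorted(accepting):
        rules.append((state, "", None))
    return rules, start

DELTA = {("A", "a"): {"A"}, ("A", "b"): {"B"}}
RULES, START = fa_to_grammar(DELTA, "A", {"A"})
for head, x, tail in RULES:
    print(f"{head} -> {(x + (tail or '')) or 'ε'}")
```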

86 Characterization Theorem (examples)
Example 1 (regular grammar → FA):
S → aS | bbcB
B → bA | a
A → aS | bB | ε
[Figure: state transition graph of the FA constructed from this grammar.]
Example 2 (FA → regular grammar): name the states S, A, B, C, D, and E, then transform the FA to the grammar
S → aS | aA
A → bB
B → bB | bS | aD | ε
D → aC
C → aB | cE
E → ε
[Figure: state transition graph of the given FA, before and after naming the states.]

87 Proof of Characterization Theorem (cont’d) Proof of (2)-(a): If a language L can be expressed by a regular expression, then L is accepted by an FA M. Following the definition of regular expressions, we show how to construct an FA for a given regular expression. (This is a proof by induction.) Assume that the alphabet is Σ.
1. If the regular expression is Ø, ε, or a ∈ Σ, which respectively denote the empty set, {ε}, and {a}, then for each case we construct the following FA. [Figure: one FA for each of the three base cases.]
2. Suppose that for regular expressions r_1 and r_2 we have constructed FA M_1 and M_2, which recognize the languages expressed by r_1 and r_2, respectively. Then we can construct FA M_{1+2}, M_{12}, and M_1*, which respectively recognize the languages expressed by the regular expressions r_1 + r_2, r_1 r_2, and (r_1)*, as follows.

88 Proof of Characterization Theorem (cont’d) If L(M_1) = L(r_1) and L(M_2) = L(r_2), then L(M_{1+2}) = L(r_1 + r_2), L(M_{12}) = L(r_1 r_2), and L(M_1*) = L((r_1)*).
[Figures: M_{1+2} combines M_1 and M_2 using a new start state and ε-transitions; M_{12} connects M_1 to M_2 with an ε-transition; M_1* adds a new start state and ε-transitions around M_1.]
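The inductive construction can be sketched in code. The version below is the standard Thompson-style construction, in which every fragment has exactly one start and one accepting state; this is an assumption, since the slide figures are not available, and the slides' machines may differ in detail. Regular expressions are encoded as nested tuples, and the names build and accepts are hypothetical.

```python
from itertools import count

def build(expr, delta, fresh):
    """Add an ε-NFA fragment for expr to delta; return its (start, accept)."""
    def edge(p, sym, q):
        delta.setdefault((p, sym), set()).add(q)

    kind = expr[0]
    s, f = next(fresh), next(fresh)
    if kind == "empty":
        pass                                  # no path from s to f
    elif kind == "eps":
        edge(s, "", f)
    elif kind == "sym":
        edge(s, expr[1], f)
    elif kind == "union":                     # r1 + r2
        s1, f1 = build(expr[1], delta, fresh)
        s2, f2 = build(expr[2], delta, fresh)
        edge(s, "", s1); edge(s, "", s2)
        edge(f1, "", f); edge(f2, "", f)
    elif kind == "concat":                    # r1 r2
        s1, f1 = build(expr[1], delta, fresh)
        s2, f2 = build(expr[2], delta, fresh)
        edge(s, "", s1); edge(f1, "", s2); edge(f2, "", f)
    elif kind == "star":                      # (r1)*
        s1, f1 = build(expr[1], delta, fresh)
        edge(s, "", s1); edge(s, "", f)       # enter, or skip entirely
        edge(f1, "", s1); edge(f1, "", f)     # repeat, or leave
    else:
        raise ValueError(kind)
    return s, f

def accepts(expr, w):
    delta = {}
    start, accept = build(expr, delta, count())

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p in delta.get((q, ""), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({p for q in current for p in delta.get((q, ch), ())})
    return accept in current

# (ab + c)* encoded as nested tuples
EXPR = ("star", ("union", ("concat", ("sym", "a"), ("sym", "b")), ("sym", "c")))
print(accepts(EXPR, "abcab"), accepts(EXPR, "ba"))
```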

89 Proof of Characterization Theorem (cont’d) Proof of (2)-(b): If L is a language accepted by an FA M, then L can be expressed by a regular expression.
Definition: Generalized state transition graph. If an FA M takes a transition from a state p to a state q on every string expressed by a regular expression r, we write δ(p, r) = q and draw the state transition as in figure (a). Figure (b) is an example.
[Figure (a): an edge from p to q labeled r. Figure (b): an edge from p to q labeled (ab+c)*.]
The state transition graph of M can be considered a special case of a generalized state transition graph, in which each edge label is a regular expression expressing one string of length 1 or 0 (the latter for an ε-transition). Generalizing δ further, for a path label w = r_1 r_2 … r_i (i.e., a concatenated sequence of regular expressions), let δ(p, w) = q denote the sequence of transitions along a path whose edges are labeled r_1, r_2, …, r_i.

90 For a generalized state transition graph G, let L(G) be the set of strings defined as follows, where q_0 is the start state and F is the set of accepting states. Clearly L(G) = L(M).
L(G) = {x | x ∈ L(w), w is a path label such that δ(q_0, w) = q_f ∈ F}
Given a generalized state transition graph G of an FA, we can eliminate a state from G and transform it to another generalized state transition graph G' such that L(G) = L(G'). Suppose that q is a non-accepting state in a state transition graph G. Suppose q has a self-loop and is on a path between its two neighboring states r and s, as shown in figure (a) below. (Dotted arrows indicate other possible transitions.) State q can be eliminated and generalized transitions can be added without changing the language of the automaton, as figure (b) shows.
[Figure (a), G: states r, q, and s, with edges labeled a, b, c, and d between q and its neighbors and a self-loop labeled f on q. Figure (b), G': state q removed, with new edges labeled af*b, af*c, df*b, and df*c between r and s.]
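A single elimination step can be sketched as a function on a dictionary of labeled edges. The encoding, the helper names wrap, star and eliminate, and the edge directions in the example are assumptions: the transcript lists only the labels a, b, c, d, f and the resulting labels, so the sketch assumes a and d enter q (from r and s) while b and c leave q (to r and s).

```python
def wrap(label):
    """Parenthesize a label containing + before concatenating it."""
    return f"({label})" if "+" in label else label

def star(label):
    """Return label* in the slides' notation."""
    return f"{label}*" if len(label) == 1 else f"({label})*"

def eliminate(edges, q):
    """Remove state q from a generalized transition graph.

    `edges` maps (source, target) to a regular-expression label. Every path
    p -> q -> s is replaced by an edge p -> s labeled in · loop* · out.
    """
    loop = edges.get((q, q))
    middle = star(loop) if loop is not None else ""
    incoming = {p: lab for (p, t), lab in edges.items() if t == q and p != q}
    outgoing = {s: lab for (t, s), lab in edges.items() if t == q and s != q}
    result = {k: v for k, v in edges.items() if q not in k}
    for p, a in incoming.items():
        for s, b in outgoing.items():
            new = wrap(a) + middle + wrap(b)
            # merge with an existing parallel edge using +
            result[(p, s)] = f"{result[(p, s)]}+{new}" if (p, s) in result else new
    return result

G = {("r", "q"): "a", ("s", "q"): "d",      # assumed directions, see above
     ("q", "r"): "b", ("q", "s"): "c",
     ("q", "q"): "f"}
print(eliminate(G, "q"))
```

The function performs no algebraic simplification of the labels; it only concatenates and unions them.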

91 Now we give an example of transforming a state transition graph G into a regular expression using the above technique. Consider an FA whose state transition graph is shown in figure (a) below. Clearly, if an automaton has k ≥ 1 accepting states, then the language of the automaton is the union of the languages accepted by the k accepting states. So we compute a regular expression r_i for the language L_i accepted by each of the k accepting states, and obtain the regular expression for the language of the automaton as r = r_1 + r_2 + … + r_k. For example, the language accepted by the automaton shown below is the union of the languages accepted by states 0 and 4.
[Figure (a): state transition graph of the example FA, with start state 0.]

92 For this example, we first compute the regular expression for the language accepted by state 4 by changing state 0 to a non-accepting state. Leaving the start state and the accepting state in place, we eliminate all other states, one at a time. Eliminating state 2 gives the generalized state transition graph shown in figure (b). We could have eliminated state 1 or 3 first. In general it is better to choose a state whose elimination does not induce too many new links. Before eliminating state 3, we merge links which have the same origin and destination using the + operator, and get figure (c) below.
[Figure (b): the graph after eliminating state 2. Figure (c): the same graph after merging parallel links; its merged labels include ba+a and b+ε.]
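Merging links with the same origin and destination is a simple grouping step. The sketch below assumes edges are given as (origin, label, destination) triples; the function name merge_parallel is hypothetical, and the state numbers in the example are illustrative only, since the transcript preserves the labels ba+a and b+ε but not the endpoints of the edges.

```python
def merge_parallel(edge_list):
    """Combine edges with the same origin and destination using +."""
    merged = {}
    for origin, label, destination in edge_list:
        key = (origin, destination)
        merged[key] = f"{merged[key]}+{label}" if key in merged else label
    return merged

# Illustrative endpoints: two loops on one state and two links to another state.
EDGES = [(1, "ba", 1), (1, "a", 1), (1, "b", 4), (1, "ε", 4)]
print(merge_parallel(EDGES))
```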

93 Eliminating state 3 gives the graph shown in figure (d).
[Figure (c), repeated from the previous slide. Figure (d): the graph after eliminating state 3, with edge labels ba+a, b, a, b, bb, b+ε, bba, and ba.]

94 Finally, eliminating state 1, we get the graph in figure (e). Notice that the regular expression b+bb on the self-loop of state 4 has been simplified to b, because looping on b or bb is equivalent to looping on b.
[Figure (d), with the self-loop on state 4 labeled b+bb after merging. Figure (e): states 0 (start) and 4, with edge labels b, ba, a(ba+a)*b, b, a(ba+a)*(b+ε), bba(ba+a)*(b+ε), and bba(ba+a)*b.]
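The simplification of the self-loop relies on (b+bb)* and b* denoting the same language. The bounded check below is a sketch rather than a proof: it compares the two expressions on all strings over {a, b} up to a fixed length, using Python's re module (where union is written |). The function name same_language is hypothetical.

```python
import re
from itertools import product

def same_language(pattern1, pattern2, alphabet="ab", max_len=8):
    """Compare two patterns on every string of length <= max_len."""
    r1, r2 = re.compile(pattern1), re.compile(pattern2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False
    return True

print(same_language(r"(b|bb)*", r"b*"))
```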

95 By merging edges which have the same origin and destination, we get the final transition graph (f), from which we can construct a regular expression r_4 whose language is exactly the language accepted by state 4.
[Figure (e), repeated from the previous slide. Figure (f): states 0 (start) and 4, with edge labels a(ba+a)*b+b, a(ba+a)*(b+ε), bba(ba+a)*(b+ε)+b, and bba(ba+a)*b+ba.]

96 In general, suppose a generalized transition graph with a start state and an accepting state is given, with each edge labeled with a regular expression, as shown in figure (g) below. Then the regular expression r_2 shown in the figure expresses the language accepted by the automaton.
[Figure (f), repeated from the previous slide. Figure (g): state 1 (start) and state 2 (accepting), with a self-loop r_11 on state 1, an edge r_12 from 1 to 2, a self-loop r_22 on state 2, and an edge r_21 from 2 to 1.]
r_2 = (r_11)* r_12 (r_22 + r_21 (r_11)* r_12)*
By substituting each r_ij in the expression of figure (g) with the corresponding regular expression from figure (f), we get the regular expression r_4 for the language accepted by state 4.
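The two-state formula, in the form (r_11)* r_12 (r_22 + r_21 (r_11)* r_12)*, can be checked on a small instance. In the sketch below the single letters a, b, c, d stand in for r_11, r_12, r_22, r_21 (an assumption made so that the graph becomes an ordinary deterministic automaton), and the formula is compared with a direct simulation on all strings up to a fixed length. The names are hypothetical, and the check is bounded, not a proof.

```python
import re
from itertools import product

# a = r_11 (loop on 1), b = r_12 (1 to 2), c = r_22 (loop on 2), d = r_21 (2 to 1)
DELTA = {(1, "a"): 1, (1, "b"): 2, (2, "c"): 2, (2, "d"): 1}

def reaches_state_2(w):
    """Simulate the two-state graph from start state 1."""
    state = 1
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:            # no such transition: reject
            return False
    return state == 2

# (r_11)* r_12 (r_22 + r_21 (r_11)* r_12)* with the letters substituted
R2 = re.compile(r"a*b(c|da*b)*")

def formula_matches_graph(max_len=6):
    for n in range(max_len + 1):
        for letters in product("abcd", repeat=n):
            w = "".join(letters)
            if reaches_state_2(w) != bool(R2.fullmatch(w)):
                return False
    return True

print(formula_matches_graph())
```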

97 Now, to construct a regular expression for the language accepted by the other accepting state, which is the start state, we can start with figure (f), changing the start state back to an accepting state and state 4 to a non-accepting state, as shown in figure (h). This is the general case shown in figure (i), whose regular expression is given as r_1 in the figure. Substituting the corresponding regular expressions from figure (h), we get a regular expression r_0 which denotes the language accepted by state 0. Finally we get a regular expression r = r_0 + r_4 which denotes the language accepted by the automaton M.
[Figure (h): states 0 (start, accepting) and 4, with edge labels a(ba+a)*b+b, a(ba+a)*(b+ε), bba(ba+a)*(b+ε)+b, and bba(ba+a)*b+ba. Figure (i): state 1 (start, accepting) and state 2, with edges r_11, r_12, r_22, and r_21.]
r_1 = (r_11 + r_12 (r_22)* r_21)*
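The substitution step can be carried out mechanically. The sketch below builds r_0, r_4, and r as strings in the slides' notation. The assignment of the four labels to edges is an assumption inferred from the surrounding text (labels starting with bba leave state 4, the label ending in +b with (b+ε) is the self-loop on state 4, and so on), because the drawing itself is not available; the function names are hypothetical, and no simplification is attempted.

```python
def accepted_by_second(r11, r12, r22, r21):
    """Figure (g): state 1 is the start state, state 2 is accepting."""
    return f"({r11})*({r12})(({r22})+({r21})({r11})*({r12}))*"

def accepted_by_first(r11, r12, r22, r21):
    """Figure (i): state 1 is both the start state and accepting."""
    return f"(({r11})+({r12})({r22})*({r21}))*"

# Assumed edge assignment for figures (f) and (h); state 0 plays the role of
# state 1 and state 4 the role of state 2.
R11 = "a(ba+a)*b+b"            # self-loop on state 0
R12 = "a(ba+a)*(b+ε)"          # from state 0 to state 4
R22 = "bba(ba+a)*(b+ε)+b"      # self-loop on state 4
R21 = "bba(ba+a)*b+ba"         # from state 4 to state 0

r4 = accepted_by_second(R11, R12, R22, R21)
r0 = accepted_by_first(R11, R12, R22, R21)
r = f"{r0}+{r4}"
print(r)
```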