Regular Expressions and Finite State Automata


Regular Expressions and Finite State Automata
Themes
– Finite State Automata (FSA): describing patterns with graphs; programs that keep track of state
– Regular Expressions (RE): describing patterns with regular expressions; converting regular expressions to programs
Theorems
– The languages (regular languages) recognized by FSA and generated by RE are the same
– There are languages generated by grammars that are not regular

Problem 1
Using the grammar
– <B> → ε
– <B> → ( <B> )
– <B> → <B> <B>
show two different parse trees for the input
– ()()()

Problem 2 Show the sequence of calls made by the program in Fig on the inputs –(()()) –())(

Problem 3
Consider the following grammar
– <Number> → <Digit> <Number> | ε
– <Digit> → 0|1|2|3|4|5|6|7|8|9
Design a recursive-descent parser for this grammar; that is, write a pair of functions, one for each of the two syntactic categories.

Problem 4
Find all words that contain all of the vowels in alphabetical order.
Example: abstemious, adj: sparing in use of food or drink; temperate. Related forms: abstemiously (adv), abstemiousness (n).
(c) 2000 Zane Publishing, Inc. and Merriam-Webster, Incorporated. All rights reserved.

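Problem 4 Solution 1
[State diagram: start state S0, accepting state S5]
– S0: on a go to S1; on Σ−a stay in S0
– S1: on e go to S2; on Σ−e stay in S1
– S2: on i go to S3; on Σ−i stay in S2
– S3: on o go to S4; on Σ−o stay in S3
– S4: on u go to S5; on Σ−u stay in S4
– S5: on Σ stay in S5
Σ = set of all letters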

Problem 4 Solution 2
$ grep '.*a.*e.*i.*o.*u.*' < /usr/dict/words
adventitious
facetious
sacrilegious
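The same filter can be sketched in Python for readers without a Unix word list at hand. This is a minimal illustration, not part of the original slides; the function names `has_vowels_in_order` and `filter_words` are made up for the example, and the word list is passed in rather than read from /usr/dict/words.

```python
import re

# Same idea as the grep pattern: a, e, i, o, u must appear in this order,
# with any characters allowed in between. re.search scans the whole word,
# so the leading and trailing ".*" of the grep pattern are not needed.
VOWELS_IN_ORDER = re.compile(r"a.*e.*i.*o.*u")


def has_vowels_in_order(word):
    """Return True if the word contains a, e, i, o, u as a subsequence."""
    return VOWELS_IN_ORDER.search(word.lower()) is not None


def filter_words(words):
    """Keep only the words that pass the vowel test (hypothetical helper)."""
    return [w for w in words if has_vowels_in_order(w)]
```

Calling `filter_words` on a list that contains "facetious" and "washington" keeps the first and drops the second.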

Problem 5
Partial Anagram: Find all words that can be made from the letters in "Washington".
a, ago, ah, an, angst, …

Problem 5 Grammar   w|   a|   s|  …  o| 

Generating Subsets
Let S = {a,b,c}
– Review basic notions of set theory (Sec. 7.2 & 7.3)
The power set of S, P(S), is the set of all subsets of S, including S and the empty set
P(S) = { {}, {b}, {c}, {b,c}, {a}, {a,b}, {a,c}, {a,b,c} }

Recursive Program to Generate P(S)
PowerSet(S)
  if S = {} return {{}};
  else
    x = First(S);
    S’ = PowerSet(S − {x}); // subsets that do not contain x
    R = S’;
    for s in S’ do
      R = R ∪ {s ∪ {x}}; // the same subsets with x added
    return R;
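A runnable version of this recursion might look like the following Python sketch. It assumes the set is passed as a list of distinct elements so that "the first element" is well defined; the name `power_set` and the use of `frozenset` for the subsets are choices made for this example only.

```python
def power_set(items):
    """Return every subset of `items` (a list of distinct elements).

    Subsets are returned as frozensets so they can themselves be
    collected into a set if needed.
    """
    if not items:
        # The only subset of the empty set is the empty set.
        return [frozenset()]
    first, rest = items[0], items[1:]
    without_first = power_set(rest)                      # subsets lacking `first`
    with_first = [s | {first} for s in without_first]    # same subsets plus `first`
    return without_first + with_first
```

Each recursive call doubles the number of subsets, which is why a set of n elements has 2^n subsets.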

Generating Permutations
Let S = [a,b,c]
The permutations of S are
– [a,b,c]
– [a,c,b]
– [b,a,c]
– [b,c,a]
– [c,a,b]
– [c,b,a]

Recursive Program to Generate Perm(L), L = [a_1, …, a_n]
Perm(L)
  if Length(L) = 1 return {L};
  else
    S = {};
    for a in L do
      S’ = Perm(L − a); // delete a from L
      for s in S’ do
        S = S ∪ {[a, s]}; // a followed by the elements of s
    return S;
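The recursion can be sketched in Python as follows. This is an illustrative version, not the original author's code; the function name `perm` is chosen for the example, and the base case is extended to lists of length 0 or 1 so the function is total.

```python
def perm(items):
    """Return all permutations of the list `items` as a list of lists."""
    if len(items) <= 1:
        return [list(items)]
    result = []
    for i, a in enumerate(items):
        rest = items[:i] + items[i + 1:]   # delete a from the list
        for p in perm(rest):
            result.append([a] + p)         # put a in front of each permutation
    return result
```

For a list of three elements this yields the six permutations in the same order as the listing on the previous slide.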

Alternate Approach
Instead of generating all possibilities and checking each result to see if it is a word, check each word to see if it is a partial anagram.
To check a word
– see if it has the right letters
– make sure each letter occurs an allowable number of times
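The two checks can be combined into a single letter count comparison. The Python sketch below is one way to do it, assuming the source word is "washington"; the function name `is_partial_anagram` is hypothetical.

```python
from collections import Counter


def is_partial_anagram(word, source="washington"):
    """Return True if `word` can be spelled using letters of `source`,
    using each letter at most as often as it occurs in `source`."""
    available = Counter(source.lower())
    needed = Counter(word.lower())
    # A letter missing from `source` has an available count of 0,
    # so this one test covers both "right letters" and "allowable count".
    return all(available[ch] >= n for ch, n in needed.items())
```

Since "washington" contains two n's and one of every other letter, "nation" passes while "noon" (two o's) does not.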

Problem 5 - Solution 1
S = {a,g,h,i,n,o,s,t,w}
[Figure: two state diagrams, "Check Letters" and "Filter Double a’s"]

Problem 5 - Solution 1, cont.
[Figure: state diagram chaining the per-letter filters for a, g, …, w through states S0 to S19]

Problem 5 Solution 2
$ tr A-Z a-z < /usr/dict/words | \
    egrep '^[aghinostw]*$' | \
    egrep -v \
    'a.*a|g.*g|h.*h|i.*i|n.*n.*n|o.*o|s.*s|t.*t|w.*w'
a
ago
ah
an
angst

State Machines and Automata
– Finite set of states, a start state, and a set of accepting states
– Transition from state to state depending on the next input
– The language accepted by a finite automaton is the set of input strings that end up in accepting states

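Problem 6
Create a finite state automaton that accepts strings of a’s and b’s with an even number of a’s.
[State diagram: start state S0 is also the accepting state; a moves between S0 and S1; b loops on each state]
Example input: abbbabaabbb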

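Problem 6 Program to implement FSA
[State diagram: the two-state automaton from Problem 6]
#include <stdbool.h>
#include <stdio.h>
/* ENDM is the end-of-input marker; its definition is not shown on the slide. */
bool EA() {
  int x;
S0: /* even number of a's read so far */
  x = getchar();
  if (x == 'b') goto S0;
  if (x == 'a') goto S1;
  if (x == ENDM) return true;
  return false; /* any other character: reject */
S1: /* odd number of a's read so far */
  x = getchar();
  if (x == 'b') goto S1;
  if (x == 'a') goto S0;
  return false; /* ENDM or any other character: reject */
}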
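The goto-based program hard-codes one state per label. An alternative is to store the transitions in a table and keep the current state in a variable. The Python sketch below illustrates that design for the same automaton; the names `TRANSITIONS` and `accepts_even_as` are invented for the example, and the input is taken as a string instead of being read with getchar.

```python
# Transition table for the two-state automaton:
# S0 = even number of a's seen so far (start and accepting state),
# S1 = odd number of a's seen so far.
TRANSITIONS = {
    ("S0", "a"): "S1",
    ("S0", "b"): "S0",
    ("S1", "a"): "S0",
    ("S1", "b"): "S1",
}


def accepts_even_as(text):
    """Run the automaton on `text`; reject on any character other than a or b."""
    state = "S0"
    for ch in text:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state == "S0"
```

With the table form, changing the automaton means editing data, not control flow, which is how generated scanners are usually organized.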

Regular Expressions
In the algebra of regular expressions, an atomic operand is one of the following:
– A character x: L(x) = {x}
– The symbol ε: L(ε) = {ε}
– The symbol ∅: L(∅) = {}
– A variable whose value can be any pattern defined by a regular expression

Regular Expressions
There are three operators used to build regular expressions:
– Union R|S: L(R|S) = L(R) ∪ L(S)
– Concatenation RS: L(RS) = {rs : r ∈ L(R) and s ∈ L(S)}
– Closure R*: L(R*) = {ε} ∪ L(R) ∪ L(RR) ∪ L(RRR) ∪ …

Regular Expressions
Examples
– a|(ab)
– (a|(ab))|(c|(bc))
– a*
– a*b*
– (ab)*
– a|bc*d
– letter = a|b|c|…|z|A|B|C|…|Z|_
– digit = 0|1|2|3|4|5|6|7|8|9
– letter(letter|digit)*
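The last example, letter(letter|digit)*, is the usual pattern for identifiers. As an illustration, it can be written with Python's `re` module, using character classes as shorthand for the long unions; the names `IDENTIFIER` and `is_identifier` are chosen for this sketch.

```python
import re

# [A-Za-z_] abbreviates letter = a|b|...|z|A|B|...|Z|_
# [A-Za-z_0-9] abbreviates (letter|digit)
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier(text):
    """Return True if the whole string matches letter(letter|digit)*."""
    return IDENTIFIER.fullmatch(text) is not None
```

`fullmatch` is used because the regular expression describes the entire string, not a substring of it.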

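Problem 7
Create a regular expression for the language that consists of strings of a’s and b’s with an even number of a’s.
[State diagram: the two-state automaton from Problem 6]
(b*ab*a)*b*
Each pass through (b*ab*a) follows a path from S0 back to S0 that reads exactly two a’s; the final b* covers any b’s after the last a.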
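A candidate regular expression for this language can be checked by brute force against the definition "even number of a's" on all short strings. The Python sketch below does that; the helper names `even_as` and `mismatches` and the length bound of 8 are assumptions made for the example. A test like this cannot prove an expression correct, but it finds counterexamples quickly.

```python
import re
from itertools import product


def even_as(s):
    """Reference definition: the string has an even number of a's."""
    return s.count("a") % 2 == 0


def mismatches(pattern, max_len=8):
    """Return the strings over {a, b} of length <= max_len on which
    `pattern` (matched against the whole string) disagrees with even_as."""
    rx = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if (rx.fullmatch(s) is not None) != even_as(s):
                bad.append(s)
    return bad
```

For instance, `mismatches("(b*ab*a)*b*")` comes back empty, while the expression `b*|(b*ab*a)*`, which has no way to match b's after the last a, is reported as failing on "aab".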

Problem 8
Create a grammar that generates the language that consists of strings of a’s and b’s with an even number of a’s.
[State diagram: the two-state automaton from Problem 6]
– <S0> → b<S0> | a<S1> | ε
– <S1> → b<S1> | a<S0>

Equivalence of Regular Expressions and Finite Automata
The languages accepted by finite automata are the same as those generated by regular expressions
– Given any regular expression R, there exists a finite state automaton M such that L(M) = L(R); see Problems 9 and 10 for an indication of why this is true.
– Given any finite state automaton M, there exists a regular expression R such that L(R) = L(M); see Problem 7 for an indication of why this is true.

Proof of Equivalence of Regular Expressions and Finite Automata
Sec. 10.8 of the text proves that there is a finite state automaton that recognizes the language generated by any given regular expression. The proof is by induction on the number of operators in the regular expression and uses a finite state automaton with ε transitions. Epsilon transitions are introduced to simplify the construction used in the proof. It is then shown that any finite state automaton with ε transitions can be converted to an ordinary finite state automaton, one without ε transitions.

Proof of Equivalence of Regular Expressions and Finite Automata
Sec. 10.9 of the text shows how to derive a regular expression that generates the same language that is accepted by a given finite state automaton. The basic idea is to combine the transitions at each node along all paths that lead to an accepting state. The combinations of characters along those paths are described using regular expressions. See Problem 7 for an example.

Proof of Equivalence of Regular Expressions and Finite Automata
The proofs given in Sections 10.8 and 10.9 are constructive: one algorithm constructs a finite state automaton from a given regular expression, and another derives a regular expression from a given finite state automaton. This means the conversion process can be implemented. In fact, regular expressions are commonly used to describe patterns, and the program that matches the pattern is created by converting the regular expression into a finite state automaton.

Finite State Automata from Regular Expressions
[Figures: automaton constructions for the base case, union, concatenation, and closure]

Problem 9
Construct a finite state automaton with ε transitions that accepts the language generated by the regular expression (a|bc)
[State diagram: states S0 (start) through S6, with transitions labeled a, b, c and ε]

Problem 10
Find a finite state automaton equivalent to the one in Problem 9 that does not use ε transitions. The construction provided uses ε transitions.
[State diagram: states S1, S2, S4 and S5, with transitions labeled a, b and c]
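One way to see why ε transitions can always be removed is to simulate the automaton by tracking the set of states it could be in, following ε transitions for free. The Python sketch below does this for an automaton accepting a|bc. The state numbering 0 to 7 is hypothetical (it follows the usual union and concatenation constructions, not the exact diagram on the slide), and the names `NFA`, `eps_closure`, `step` and `accepts` are invented for the example.

```python
EPS = ""  # label used here for an epsilon transition

# Hypothetical epsilon-automaton for a|bc:
# 0 -eps-> 1 -a-> 2 -eps-> 7 (accept)
# 0 -eps-> 3 -b-> 4 -eps-> 5 -c-> 6 -eps-> 7 (accept)
NFA = {
    (0, EPS): {1, 3},
    (1, "a"): {2},
    (2, EPS): {7},
    (3, "b"): {4},
    (4, EPS): {5},
    (5, "c"): {6},
    (6, EPS): {7},
}
START, ACCEPTING = 0, {7}


def eps_closure(states):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def step(states, ch):
    """States reachable from `states` by reading ch, then any epsilon moves."""
    moved = set()
    for q in states:
        moved |= NFA.get((q, ch), set())
    return eps_closure(moved)


def accepts(text):
    current = eps_closure({START})
    for ch in text:
        current = step(current, ch)
    return bool(current & ACCEPTING)
```

Each distinct set of states that `current` can take becomes a single state of an automaton without ε transitions, so the converted automaton never needs more than a finite number of states.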

Grammars and Regular Expressions
Given a regular expression R, there exists a grammar with syntactic category <S> such that L(R) = L(<S>).
There are grammars for which there does NOT exist a regular expression R with L(<S>) = L(R)
– <S> → a<S>b | ε
– L(<S>) = {a^n b^n : n = 0,1,2,…}

Proof that a^n b^n is not Recognized by a Finite State Automaton
The proof is by contradiction. In this type of proof, we assume that something is true and then show that this leads to a contradiction (something that is false). The only way out is that the assumption was wrong, so what we assumed true is in fact false. To show that there is no finite state automaton that recognizes the language L = {a^n b^n : n = 0,1,2,…}, we assume that there is a finite state automaton M that recognizes L and show that this leads to a contradiction.

Proof that a^n b^n is not Recognized by a Finite State Automaton
Since M is a finite state automaton, it has a finite number of states. Let the number of states be m. Since M recognizes the language L, all strings of the form a^k b^k must end up in accepting states. Choose such a string with k = n, where n > m. Since n > m, there must be a state s that is visited twice while the string a^n is read: counting the start state, M passes through n+1 states while reading a^n, but only m of them can be distinct, so some state must repeat.

Proof that a^n b^n is not Recognized by a Finite State Automaton
Suppose that state s is reached after reading the strings a^j and a^k (j ≠ k). Since the same state is reached for both strings, the finite state machine cannot distinguish strings that begin with a^j from strings that begin with a^k. Therefore, the finite state automaton must either accept both or reject both of the strings a^j b^j and a^k b^j. However, a^j b^j should be accepted, while a^k b^j should not be accepted. The only way out of this contradiction is that the assumption that there was a finite state automaton that recognizes the language L was wrong.
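The pigeonhole step of the proof can be carried out mechanically on any concrete automaton. The Python sketch below is an illustration only: given a deterministic automaton as a transition table, it finds j < k such that a^j and a^k lead to the same state. The function names `run` and `find_collision` and the example table `EVEN_AS` (the even-number-of-a's automaton from Problem 6) are assumptions made for the example.

```python
def run(dfa, start, text):
    """Return the state reached from `start` after reading `text`."""
    state = start
    for ch in text:
        state = dfa[(state, ch)]
    return state


def find_collision(dfa, start, num_states):
    """Find j < k with the same state after a^j and a^k.

    Reading a^0, a^1, ..., a^m visits m+1 states, but only m exist,
    so a repeat must occur within the first m+1 steps.
    """
    seen = {}
    state = start
    for count in range(num_states + 1):
        if state in seen:
            return seen[state], count
        seen[state] = count
        state = dfa[(state, "a")]
    raise AssertionError("unreachable: a repeat is guaranteed by pigeonhole")


# Example automaton with m = 2 states (even number of a's).
EVEN_AS = {
    ("S0", "a"): "S1", ("S0", "b"): "S0",
    ("S1", "a"): "S0", ("S1", "b"): "S1",
}
```

For `EVEN_AS`, the collision is between a^0 and a^2, so the automaton ends in the same state on a^0 b^0 (which is in L) and on a^2 b^0 (which is not). Whatever set of accepting states is chosen, it must misclassify one of the two, and the same argument applies to every finite automaton.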