Regular Expressions Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Translators.



Regular Expressions A compact, easy-to-read language description. Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.

Regular Expressions Definition: A regular expression over an alphabet Σ is recursively defined as follows:
1. ø denotes the language ø.
2. ε denotes the language {ε}.
3. a denotes the language {a}, for all a ∈ Σ.
4. (P + Q) denotes L(P) ∪ L(Q), where P and Q are r.e.'s.
5. (PQ) denotes L(P)·L(Q), where P and Q are r.e.'s.
6. P* denotes L(P)*, where P is an r.e.
To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +

Regular Expressions Examples:
(0 + 1)*: any string of 0's and 1's.
(0 + 1)*1: any string of 0's and 1's ending with a 1.
1*01*: any string of 1's with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †
† Assuming that Char does not contain quotes, eoln's, or }.
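As a rough illustration, the first few examples can be tried with Python's standard `re` module. This is only a sketch: the slide notation writes union as "+", which `re` writes as "|", and the character classes used here for Letter and Digit (ASCII letters and the digits 0 to 9) are assumptions, as are the names `ANY_BITS`, `ENDS_WITH_1`, `SINGLE_ZERO`, `IDENTIFIER`, `INTEGER` and `matches`.

```python
import re

# Slide notation -> Python `re` syntax: "+" (union) becomes "|",
# catenation is juxtaposition, and "*" is unchanged.
ANY_BITS = re.compile(r"(0|1)*")                   # (0 + 1)*
ENDS_WITH_1 = re.compile(r"(0|1)*1")               # (0 + 1)*1
SINGLE_ZERO = re.compile(r"1*01*")                 # 1*01*
# Assumption: Letter = ASCII letters, Digit = 0..9.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # Letter (Letter + Digit)*
INTEGER = re.compile(r"[0-9][0-9]*")               # Digit Digit*


def matches(pattern, text):
    """Return True if the whole string is in the language of the pattern."""
    return pattern.fullmatch(text) is not None
```

`fullmatch` is used rather than `search` because a regular expression in the formal sense describes whole strings, not substrings.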

Regular Expressions Conversion from right-linear grammars to regular expressions
Example:
S → aS
S → bR
S → ε
R → aS
What does S → aS mean? L(S) ⊇ {a}·L(S)
S → bR means L(S) ⊇ {b}·L(R)
S → ε means L(S) ⊇ {ε}

Regular Expressions Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε} or S = aS + bR + ε Similarly, R → aS means R = aS. Thus, S = aS + bR + ε R = aS System of simultaneous equations, in which the variables are nonterminals.

Regular Expressions Solving systems of simultaneous equations.
S = aS + bR + ε
R = aS
Back substitute R = aS:
S = aS + baS + ε = (a + ba)S + ε
Question: What to do with equations of the form X = αX + β?

Regular Expressions Answer: β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), … Thus L(X) = α*β.
In our case, S = (a + ba)S + ε = (a + ba)*ε = (a + ba)*
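One way to sanity-check this result is to compute the least solution of the two equations directly, by iterating them on sets of strings up to a length bound, and to compare it with the strings matched by (a + ba)*. The sketch below does that; the bound `MAX_LEN` and the helper names `cat`, `solve_by_iteration` and `closed_form` are hypothetical, and a bounded comparison is evidence, not a proof.

```python
import re
from itertools import product

MAX_LEN = 6  # hypothetical bound; only strings up to this length are compared


def cat(prefix, lang):
    """{prefix}·lang, keeping only strings of length <= MAX_LEN."""
    return {prefix + w for w in lang if len(prefix + w) <= MAX_LEN}


def solve_by_iteration():
    """Least solution for S of S = aS + bR + ε, R = aS, up to MAX_LEN."""
    S, R = set(), set()
    while True:
        new_S = cat("a", S) | cat("b", R) | {""}
        new_R = cat("a", S)
        if new_S == S and new_R == R:   # nothing new: fixed point reached
            return S
        S, R = new_S, new_R


def closed_form():
    """All strings over {a, b} up to MAX_LEN that match (a + ba)*."""
    pattern = re.compile(r"(a|ba)*")
    words = ("".join(p)
             for n in range(MAX_LEN + 1)
             for p in product("ab", repeat=n))
    return {w for w in words if pattern.fullmatch(w)}
```

Under these assumptions, `solve_by_iteration()` and `closed_form()` return the same set, which is what the rule X = αX + β ⇒ X = α*β predicts.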

Regular Expressions Right-linear regular grammar ↓ regular expression
1. Write A = α₁ + α₂ + … + αₙ if A → α₁, A → α₂, …, A → αₙ.

Regular Expressions
2. If an equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete the equation X = α.
3. If an equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β.
Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.
Important: Catenation is not commutative!

Regular Expressions Example:
S → a, S → bU, S → bR
R → abaU, R → U
U → aS, U → b
Equations:
S = a + bU + bR
R = abaU + U = (aba + ε)U
U = aS + b
Back substitute R:
S = a + bU + b(aba + ε)U
U = aS + b

Regular Expressions Back substitute U:
S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb (baS and bb each appear twice; the repeats are dropped)
  = (ba + babaa)S + (a + bb + babab)
Therefore S = (ba + babaa)*(a + bb + babab)
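The result can be cross-checked by enumerating the strings the grammar derives, up to some length, and comparing them with the strings matched by the final expression. The following sketch assumes the grammar reads S → a | bU | bR, R → abaU | U, U → aS | b; the encoding in `GRAMMAR` and the names `generated` and `matched` are illustrative only.

```python
import re
from itertools import product

# Right-linear grammar from the example. Each production is a pair
# (terminal string, next nonterminal), with None when no nonterminal follows.
GRAMMAR = {
    "S": [("a", None), ("b", "U"), ("b", "R")],
    "R": [("aba", "U"), ("", "U")],
    "U": [("a", "S"), ("b", None)],
}


def generated(max_len, start="S"):
    """All terminal strings of length <= max_len derivable from `start`."""
    words, seen = set(), set()
    frontier = [("", start)]          # sentential forms: prefix + nonterminal
    while frontier:
        prefix, nonterminal = frontier.pop()
        for terminals, nxt in GRAMMAR[nonterminal]:
            new_prefix = prefix + terminals
            if len(new_prefix) > max_len:
                continue
            if nxt is None:
                words.add(new_prefix)
            elif (new_prefix, nxt) not in seen:
                seen.add((new_prefix, nxt))
                frontier.append((new_prefix, nxt))
    return words


def matched(max_len):
    """All strings over {a, b} up to max_len matching the derived expression."""
    pattern = re.compile(r"(ba|babaa)*(a|bb|babab)")
    candidates = ("".join(p)
                  for n in range(max_len + 1)
                  for p in product("ab", repeat=n))
    return {w for w in candidates if pattern.fullmatch(w)}
```

For any bound tried, `generated(n) == matched(n)` should hold if the back substitution was carried out correctly.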

Regular Expressions Summarizing: [Figure: diagram of the conversions among RG_R, RG_L, RE, NFA, DFA, and minimum DFA, with each conversion marked "Done" or "Soon".]

Regular Expressions Regular Expression ↓ NFA
Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state and one final state.
Conversions:
if ø: [Figure: states 1 and 2 with no transition between them.]

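Regular Expressions
if ε: [Figure: states 1 and 2 joined by an ε-transition.]
if a: [Figure: states 1 and 2 joined by a transition labeled a.]
if P + Q: [Figure: the FSAs for P and Q placed in parallel, connected to a new start state and a new final state by ε-transitions.]
if P·Q: [Figure: the FSA for P followed by the FSA for Q, in two alternative forms, one of them linking the parts with ε-transitions.]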

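Regular Expressions
if P*: [Figure: the FSA for P between states 1 and 2, with ε-transitions that allow P to be skipped or repeated.]
Example: (b (aba + ε) a)* [Figure: first construction step, showing the fragments for the individual symbols; diagram not recoverable.]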

Regular Expressions (b (aba + ε) a)* [Figures: remaining steps of the construction, combining the fragments for aba, aba + ε, b (aba + ε) a, and finally the starred expression into a single NFA with numbered states; diagrams not recoverable.]
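The bottom-up construction can be sketched in a few lines of Python. This is a minimal version of the standard Thompson-style construction, not necessarily identical to the diagrams on the slides: every fragment is a triple (start, final, edges), ε is represented by `None`, and all function names (`sym`, `empty`, `union`, `concat`, `star`, `word`, `closure`, `accepts`) are hypothetical.

```python
from itertools import count

EPS = None               # label used for ε-transitions
_new_state = count(1)    # source of fresh state numbers


def sym(a):
    """Fragment for a single symbol a (or for ε when a is EPS)."""
    s, f = next(_new_state), next(_new_state)
    return s, f, [(s, a, f)]


def empty():
    """Fragment for ø: start and final states with no path between them."""
    return next(_new_state), next(_new_state), []


def union(p, q):
    """P + Q: new start and final states joined to both fragments by ε."""
    s, f = next(_new_state), next(_new_state)
    (ps, pf, pe), (qs, qf, qe) = p, q
    return s, f, pe + qe + [(s, EPS, ps), (s, EPS, qs),
                            (pf, EPS, f), (qf, EPS, f)]


def concat(p, q):
    """P·Q: ε-transition from P's final state to Q's start state."""
    (ps, pf, pe), (qs, qf, qe) = p, q
    return ps, qf, pe + qe + [(pf, EPS, qs)]


def star(p):
    """P*: ε-transitions allow skipping P or repeating it."""
    s, f = next(_new_state), next(_new_state)
    ps, pf, pe = p
    return s, f, pe + [(s, EPS, ps), (pf, EPS, f),
                       (s, EPS, f), (pf, EPS, ps)]


def word(w):
    """Fragment for a nonempty string of symbols, by repeated catenation."""
    frag = sym(w[0])
    for ch in w[1:]:
        frag = concat(frag, sym(ch))
    return frag


def closure(states, edges):
    """All states reachable from `states` using ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    """Simulate the NFA on `text`, tracking the set of possible states."""
    start, final, edges = nfa
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return final in current


# The slide example: (b (aba + ε) a)*
EXAMPLE = star(concat(concat(sym("b"), union(word("aba"), sym(EPS))), sym("a")))
```

With this sketch, `accepts(EXAMPLE, "babaa")` and `accepts(EXAMPLE, "baba")` hold, while `accepts(EXAMPLE, "bab")` does not. The state numbers it produces will generally differ from those in the slides.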

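Regular Expressions Regular Expression ↓ NFA, Algorithm 2
Start with: [Figure: a start state and a final state joined by a single transition labeled with the whole expression E.]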

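Regular Expressions Apply rules: [Figure: rewrite rules for transitions labeled a*, a + b, and ab. The a* rule introduces ε-transitions around a transition labeled a; the a + b rule yields two transitions labeled a and b; the ab rule yields a transition labeled a followed by one labeled b.]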

Regular Expressions
Algorithm 1: builds the FSA bottom up. Good for machines, bad for humans.
Algorithm 2: builds the FSA top down. Bad for machines, good for humans.
(Arguable.)

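Regular Expressions Example (Algorithm 2): (a + b)* (aa + bb) [Figure: top-down expansion. The transition labeled (a + b)* (aa + bb) is split into (a + b)* and aa + bb, which are then expanded into ε-transitions and transitions labeled a, b, aa, and bb; diagram not recoverable.]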

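Regular Expressions Example (Algorithm 2): ba(a + b)* ab [Figure: resulting NFA, with transitions labeled b and a, ε-transitions around a part that repeats a or b, and final transitions labeled a and b; diagram not recoverable.]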

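Regular Expressions Deterministic Finite-State Automata (DFA's)
Definition: A deterministic FSA is defined just like an NFA, except that δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q.
Thus, both of the following are impossible: an ε-transition, and two transitions labeled with the same symbol a leaving the same state. [Figures: one diagram for each impossible case.]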

Regular Expressions Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s. Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.

Regular Expressions Conversion from NFA's to DFA's: "Simulate" all moves of the NFA with the DFA.
The start state of the DFA is the start state of the NFA (say, S), together with the states that are ε-reachable from S.
Each state in the DFA is a subset of the set of states of the NFA; it captures the notion of being in "any one of" a number of states.
New states in the DFA are constructed by calculating the sets of states that are reachable through symbols, after the start state.
The final states in the DFA are those that contain any final state of the NFA.
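The steps above can be sketched as a subset construction in Python. The NFA used here for a*b + ba* is hand-written for illustration, and its state numbering is an assumption that may not match the slide's diagram; the names `NFA_DELTA`, `eps_closure`, `subset_construction` and `dfa_accepts` are likewise hypothetical.

```python
from collections import deque

EPS = ""  # label used for ε-transitions

# Hypothetical NFA for a*b + ba* (state numbering is an assumption).
NFA_DELTA = {
    (0, EPS): {1, 3},   # choose one of the two alternatives
    (1, "a"): {1},      # a*b: loop on a ...
    (1, "b"): {2},      # ... then a single b
    (3, "b"): {4},      # ba*: a single b ...
    (4, "a"): {4},      # ... then loop on a
}
NFA_START, NFA_FINALS, ALPHABET = 0, {2, 4}, "ab"


def eps_closure(states, delta):
    """States reachable from `states` through ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in delta.get((stack.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def subset_construction(delta, start, finals, alphabet):
    """Return (start, transitions, finals) of a DFA over subsets of NFA states."""
    d_start = eps_closure({start}, delta)
    d_delta, d_finals = {}, set()
    queue, seen = deque([d_start]), {d_start}
    while queue:
        subset = queue.popleft()
        if subset & finals:               # contains some NFA final state
            d_finals.add(subset)
        for symbol in alphabet:
            moved = set()
            for state in subset:
                moved |= delta.get((state, symbol), set())
            target = eps_closure(moved, delta)
            if not target:
                continue                  # leave this transition undefined
            d_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_start, d_delta, d_finals


def dfa_accepts(dfa, text):
    """Run the DFA; an undefined transition means rejection."""
    state, delta, finals = dfa
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals
```

In this sketch empty subsets are simply not created, so the resulting DFA may have undefined transitions; a trap state can be added afterwards if a total transition function is needed.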

Regular Expressions Example: a*b + ba* [Figures: NFA for a*b + ba*, the table of DFA states (columns: State, a, b), and the resulting DFA diagram; contents not recoverable.]

Regular Expressions In general, if the NFA has N states, the DFA can have as many as 2^N states.
Example: ba (a + b)* ab [Figure: NFA for ba (a + b)* ab; diagram not recoverable.]

Regular Expressions [Figures: table of DFA states (columns: State, a, b) and the resulting DFA diagram for ba (a + b)* ab; contents not recoverable.]

Regular Expressions State Minimization Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’. Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.

Regular Expressions Example: S = {1, 2, 3, 4, 5}
Π₁ = { {1, 2, 3, 4}, {5} }
Π₂ = { {1, 2, 3}, {4}, {5} }
Π₃ = { {1, 3}, {2}, {4}, {5} }
Note: Π₂ is a refinement of Π₁, and Π₃ is a refinement of Π₂.
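Both notions are easy to check mechanically. The short sketch below encodes the three partitions from the example and tests the definitions; the function names `is_partition` and `is_refinement` are made up for this illustration.

```python
def is_partition(blocks, universe):
    """Every element of `universe` appears in exactly one nonempty block."""
    elements = [x for block in blocks for x in block]
    return all(blocks) and sorted(elements) == sorted(universe)


def is_refinement(fine, coarse):
    """Every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)


S = {1, 2, 3, 4, 5}
PI_1 = [{1, 2, 3, 4}, {5}]
PI_2 = [{1, 2, 3}, {4}, {5}]
PI_3 = [{1, 3}, {2}, {4}, {5}]
```

Here `is_refinement(PI_2, PI_1)` and `is_refinement(PI_3, PI_2)` both hold, while `is_refinement(PI_1, PI_2)` does not.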

Regular Expressions Minimization Algorithm:
1. Remove all undefined transitions by introducing a TRAP state, i.e. a state from which no final state is reachable.
2. Partition all states into two groups (final states and non-final states).
3. Complete the "Next State" table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat step 3 until no further splitting is possible.
4. Determine start and final states.
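Steps 2 and 3 can be sketched as a partition-refinement loop. The DFA below is a hypothetical one (it accepts strings over {a, b} ending in abb), chosen only because its refinement happens to pass through partitions of the same shape as in the example above; it is not taken from the slides. Its transition function is already total, so step 1 is not needed, and the names `DELTA` and `minimize` are assumptions.

```python
# Hypothetical total DFA: states 1..5, start state 1, final state 5.
STATES, ALPHABET, FINALS = {1, 2, 3, 4, 5}, "ab", {5}
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 4,
    (3, "a"): 2, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 5,
    (5, "a"): 2, (5, "b"): 3,
}


def minimize(states, alphabet, delta, finals):
    """Return the final partition; `delta` must be total (add a trap state first)."""
    # Step 2: final states versus non-final states.
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for group in partition:
            # Step 3: the "Next State" row of each state, group to group.
            buckets = {}
            for state in sorted(group):
                row = tuple(group_of[delta[(state, a)]] for a in alphabet)
                buckets.setdefault(row, set()).add(state)
            refined.extend(buckets.values())   # states with equal rows stay together
        if len(refined) == len(partition):     # no group was split
            return refined
        partition = refined
```

For this DFA, `minimize(STATES, ALPHABET, DELTA, FINALS)` yields the groups {1, 3}, {2}, {4} and {5}. Step 4 then takes the group containing the original start state as the new start state, and every group containing a final state as final.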

Regular Expressions Example: Π₀ = { {1, 2, 3, 4}, {5} } [Figures: DFA diagram and Next State table (columns: State, a, b); contents not recoverable.] Split {4} from group {1, 2, 3, 4}.

Regular Expressions Π₁ = { {1, 2, 3}, {4}, {5} } [Figures: DFA diagram and Next State table (columns: State, a, b); contents not recoverable.] Split {2} from group {1, 2, 3}.

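Regular Expressions Π₂ = { {1, 3}, {2}, {4}, {5} } [Figures: Next State table (columns: State, a, b) and diagram of the minimal DFA; contents not recoverable.] No more splitting is possible, so this partition gives the minimal DFA.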

Regular Expressions Summary of Regular Languages
Smallest class in the Chomsky hierarchy.
Appropriate for lexical analysis.
Four representations: RG_R, RG_L, RE and FSA.
All four are equivalent; there are algorithms to perform transformations among them.
Each of the four has advantages and disadvantages for the language designer, implementor, and user.
FSA's can be made deterministic, and minimal.