Regular Languages and Expressions


Regular Languages and Expressions Surinder Kumar Jain, University of Sydney

Regular Languages & Expressions. Automata: DFA, NFA, ε-NFA, CFG as a DFA, equivalence, minimal DFA. Expressions: definition, conversion from/to automata. Regular languages: pumping lemma (proving non-regularity), closures.

Deterministic Finite Automaton. A system with a finite set of states. It can transition from one state to another, usually caused by external input. The system is in exactly one state at any given time.

DFA: Mathematical Definition. A DFA is a five-tuple A = (Q, Σ, δ, q0, F). Q: finite set of states; the DFA is in exactly one of these states at any time. Σ: finite set of input symbols; the DFA changes from one state to another on consuming an input symbol. δ: transition function Q × Σ → Q; given a state and an input symbol, it gives the next DFA state. q0: initial DFA state, a member of Q. F: set of accepting states, a subset of Q; the input is accepted if the DFA is in one of these states after consuming the whole input.

DFA Example. Q = { waiting, pending, rejected, accepted, paid }. Σ = { receive, reject, accept, pay }. δ: (waiting, receive) → pending; (pending, reject) → rejected; (pending, accept) → accepted; (accepted, pay) → paid. q0 = waiting. F = { rejected, paid }. Only the listed transitions are defined; any other (state, symbol) pair is treated as leading to rejection.

Transition Diagrams. [Figure: transition diagram of the DFA example. States: Waiting (start), Pending, Accepted, Rejected, Paid. Arc labels: receive, accept, reject, pay.]
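The workflow DFA can be run directly from its transition table. The following is a minimal sketch, not part of the original slides: the names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative, the state name `accepted` is taken from the transition list, and treating an undefined transition as rejection is an assumption.

```python
# Transition table for the payment-workflow DFA. Pairs that are missing
# have no move defined, which this sketch treats as rejection.
DELTA = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING = {"rejected", "paid"}


def accepts(symbols):
    """Return True if the input sequence ends in an accepting state."""
    state = START
    for symbol in symbols:
        if (state, symbol) not in DELTA:
            return False  # undefined transition: implicit dead state
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


print(accepts(["receive", "accept", "pay"]))  # True
print(accepts(["receive", "accept"]))         # False: stops in "accepted"
```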

Language. An alphabet is a finite set of symbols. Strings are formed by concatenation (joining): one symbol follows another. A language is a subset of the set of all strings over the alphabet. A DFA defines a language: the alphabet is the set of input symbols, and a string is accepted if its sequence of symbols takes the DFA from the start state to one of the accepting states.

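Non-deterministic Finite Automaton (NFA). A five-tuple like a DFA, (Q, Σ, δ, q0, F). The transition function returns a set of states, not one state. A state may have several outgoing arcs with the same symbol, so the NFA can be in several states at the same time. Language of an NFA: the strings for which at least one sequence of moves ends in an accepting state.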

Equivalence of DFA & NFA. Any NFA language can be described by some DFA; adding non-determinism does not add expressive power. Why use NFAs then? They are easier to construct for some languages, and may have fewer states and be less complex. There is an algorithm to convert an NFA to a DFA: for an n-state NFA, the DFA may have up to $2^n$ states, but inaccessible states can be thrown away. Observation: in practice the DFA often has about the same number of states as the NFA, though it often has more transitions.

NFA to DFA conversion. For an NFA N = (Q, Σ, δ, q0, F), construct the DFA D = (Qd, Σ, δd, {q0}, Fd), where Qd is the power set of Q; $\delta_d(S, a) = \bigcup_{p \in S} \delta(p, a)$ for every S in Qd and a in Σ; and $F_d = \{ S \subseteq Q : S \cap F \neq \emptyset \}$. A DFA is in one state at a time, while an NFA is in a set of states: given a state and a symbol, the NFA gives a set of new states. So make every possible set of NFA states a DFA state, and move from one set to the set of all states reachable from it on the input symbol. Any set containing an accepting state of the NFA is an accepting state of the DFA.

NFA to DFA conversion complexity. Worst case $O(2^n)$ states (the number of subsets of an n-element set). Efficient algorithm: do not construct the entire power set. Start with the set containing only the start state and construct only the subsets that are reachable from it. The number of states in the DFA is then often much less than $2^n$; in practice the DFA often has about the same number of states as the NFA, though it often has more transitions.
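The lazy version of the subset construction can be sketched as a breadth-first search over subsets. This is an illustrative sketch: the function name `nfa_to_dfa`, the dictionary encoding of δ, and the example NFA (strings over {0, 1} ending in 01) are assumptions, not taken from the slides.

```python
from collections import deque


def nfa_to_dfa(nfa_delta, start, accepting, alphabet):
    """Subset construction that only builds subsets reachable from {start}.

    nfa_delta maps (state, symbol) to a set of states; a missing key
    means the empty set.
    """
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            # Union of delta(p, symbol) over every NFA state p in the subset.
            target = frozenset(
                q for p in subset for q in nfa_delta.get((p, symbol), ())
            )
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return seen, dfa_delta, start_set, dfa_accepting


# Hypothetical NFA for strings over {0, 1} that end in "01".
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
states, delta, start, final = nfa_to_dfa(NFA_DELTA, "q0", {"q2"}, "01")
print(len(states))  # 3 reachable subsets, out of 2**3 = 8 possible ones
```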

ε-NFA. Allows ε (the empty string, which is not in the alphabet) as a transition label. ε is the identity for concatenation: a.ε = ε.a = a for every a. An ε transition is a spontaneous transition, made without consuming an input symbol.

Equivalence to NFA. Any ε-NFA language can be described by some NFA, and every NFA language can be described by some DFA. Adding ε transitions does not add expressive power. Why use ε-NFAs then? They are easier to construct for some languages and useful in proving equivalence of languages.

Conversion to NFA. The conversion aims to remove ε transitions. Define new states as sets of old states such that the ε arcs are contained inside the set: no ε arc leaves or enters the new state. Epsilon closure (eclose): for a state, the set of all states reachable spontaneously, including the state itself. Follow the ε arcs recursively and include every reachable state in the epsilon closure.

ε-NFA to DFA conversion. For an ε-NFA N = (Q, Σ, δ, q0, F), construct the DFA D = (Qd, Σ, δd, eclose(q0), Fd), where $Q_d = \{ S \subseteq Q : S = \mathrm{eclose}(S) \}$; $\delta_d(S, a) = \mathrm{eclose}\left( \bigcup_{p \in S} \delta(p, a) \right)$ for every S in Qd; and $F_d = \{ S \in Q_d : S \cap F \neq \emptyset \}$. A DFA is in one state at a time; an ε-NFA is in a set of states, and an ε-closed set has no ε transition leaving it. Make the ε-closed sets the DFA states, and move from one set to the ε-closure of the states reachable on the input symbol. Any set containing an accepting state of the ε-NFA is an accepting state of the DFA.
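The ε-closure and the closure-based subset construction can be sketched together. The sketch below is illustrative only: `eclose`, `enfa_to_dfa`, the separate `EPS` dictionary for ε arcs, and the example automaton for a*b* are assumptions made for this example.

```python
def eclose(states, eps):
    """All states reachable from `states` using only epsilon arcs.

    eps maps a state to the set of states one epsilon arc away.
    """
    closure = set(states)
    stack = list(states)
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def enfa_to_dfa(delta, eps, start, accepting, alphabet):
    """Build only the epsilon-closed subsets reachable from eclose(start)."""
    start_set = eclose([start], eps)
    dfa_delta, seen, todo = {}, {start_set}, [start_set]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            moved = {q for p in subset for q in delta.get((p, a), ())}
            target = eclose(moved, eps)  # close the result under epsilon arcs
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    final = {s for s in seen if s & set(accepting)}
    return seen, dfa_delta, start_set, final


# Hypothetical epsilon-NFA for a*b*: state 0 loops on "a", an epsilon
# arc leads from 0 to 1, and state 1 loops on "b".
DELTA = {(0, "a"): {0}, (1, "b"): {1}}
EPS = {0: {1}}
states, dfa_delta, start, final = enfa_to_dfa(DELTA, EPS, 0, {1}, "ab")


def accepts(word):
    state = start
    for ch in word:
        state = dfa_delta[(state, ch)]
    return state in final


print(accepts("aabbb"), accepts("aba"))  # True False
```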

Programs as Automata. An imperative program can be represented as a Control Flow Graph (CFG) with statements at nodes and predicates at edges. It can be converted into a CFG with both statements and predicates at edges by pushing node statements up the incoming edges. Such a CFG is a DFA: program points are the states, and statements are the input symbols that move the program from one program point to another.

Regular Expression. An algebraic expression that denotes a language. Composed of the symbols “ε”, “Ø”, “+”, “*”, “.”, “(“, “)” and the symbols of the alphabet. The language is generated using the rules: L(ε) = {ε}; L(Ø) = the empty set; L(a) = {a} for every alphabet symbol a; L(p+q) = L(p) ∪ L(q); L(p.q) = { p’.q’ | p’ in L(p) & q’ in L(q) }; $L(p^*) = \bigcup_{n \ge 0} L(p)^n$, where $L(p)^0 = \{\varepsilon\}$ and $L(p)^k = L(p).L(p)^{k-1}$.

Regular Expression Example. For a+b.c the language generated is { a, b.c }. For a.b.c*.d the language generated is { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … }. A regular expression is a finite way to express an infinite language.
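Both examples can be tried with Python's `re` module. The translation of notation is an assumption of this sketch: union "+" is written "|" and concatenation "." is written by juxtaposition; `fullmatch` is used so that the whole string must belong to the language.

```python
import re

# Slide notation -> Python `re` syntax (an assumption of this sketch):
# "+" (union) becomes "|", and "." (concatenation) is plain juxtaposition.
first = re.compile(r"a|bc")    # a+b.c
second = re.compile(r"abc*d")  # a.b.c*.d

print([w for w in ["a", "bc", "ac", "abc"] if first.fullmatch(w)])
# ['a', 'bc']
print([w for w in ["abd", "abcd", "abcccd", "ab", "acd"] if second.fullmatch(w)])
# ['abd', 'abcd', 'abcccd']
```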

Equality of Languages. DEFINITION: Two regular expressions (or automata) are EQUAL if they generate the same language. Thus (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)* = (ε + b).(a.b)*.(ε + a); both sides denote the strings in which a and b alternate.
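A bounded brute-force comparison gives a quick sanity check of such an identity. The sketch below compares the alternating-strings expressions (ab)* + (ba)* + a(ba)* + b(ab)* and (ε + b)(ab)*(ε + a), written in Python `re` syntax; the function name `agree_up_to` is illustrative. Agreement on short strings is evidence, not a proof of equality.

```python
import re
from itertools import product

# "(ε + b)" is written "b?" and "(ε + a)" is written "a?" in `re` syntax.
LEFT = re.compile(r"(ab)*|(ba)*|a(ba)*|b(ab)*")
RIGHT = re.compile(r"b?(ab)*a?")


def agree_up_to(max_len, alphabet="ab"):
    """Return a string on which the two expressions disagree, or None."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(LEFT.fullmatch(word)) != bool(RIGHT.fullmatch(word)):
                return word  # counterexample found
    return None


print(agree_up_to(10))  # None: no disagreement on strings up to length 10
```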

Algebraic laws of regular expressions. p + q = q + p; (p + q) + r = p + (q + r); (p.q).r = p.(q.r); Ø + p = p + Ø = p; ε.p = p.ε = p; Ø.p = p.Ø = Ø; p.(q + r) = p.q + p.r; (p + q).r = p.r + q.r; p + p = p; (p*)* = p*; Ø* = ε; ε* = ε; p.p* = p*.p; (p + q)* = (p*.q*)*.

Finite Automaton and Regular Expressions. Every language defined by a finite automaton is also defined by some regular expression, and every language defined by a regular expression is also defined by some DFA.

DFA to Regular Expression. Hopcroft’s formula: $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} . (R_{kk}^{(k-1)})^* . R_{kj}^{(k-1)}$. States are sorted in some order and numbered 1 to n, where n is the number of states. $R_{ij}^{(k)}$ is the regular expression for all paths from i to j that pass only through intermediate nodes numbered k or less. It is computed for all i, j for k = 0, then k = 1, …, k = n, so $R_{ij}^{(n)}$ is the regular expression for all paths from i to j. $R_{s f_1}^{(n)} + \dots + R_{s f_m}^{(n)}$ is the regular expression of the DFA, where s is the start state and $f_1, \dots, f_m$ are the accepting states.
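The recurrence can be turned into a small table-filling program. The following is a sketch under stated assumptions: expressions are built as Python `re` strings (with `None` for Ø and the empty string for ε), the helper names `union`, `concat`, `star` and `dfa_to_regex` are illustrative, only a few simplifications are applied, and the two-state example DFA (even number of 1s) is hypothetical.

```python
import re
from itertools import product

# Regular expressions are Python `re` strings; None stands for Ø, "" for ε.


def union(r, s):
    if r is None:
        return s          # Ø + p = p
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"  # p + p = p


def concat(r, s):
    if r is None or s is None:
        return None       # Ø.p = p.Ø = Ø
    return r + s          # ε.p = p.ε = p falls out of string concatenation


def star(r):
    if r is None or r == "":
        return ""         # Ø* = ε* = ε
    return f"(?:{r})*"


def dfa_to_regex(n, delta, start, accepting):
    """States are numbered 1..n; delta maps (state, symbol) to a state."""
    # k = 0: paths with no intermediate state (single arcs, plus ε from i to i).
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = ""
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(a))
    # k = 1..n: additionally allow state k as an intermediate state.
    for k in range(1, n + 1):
        loop = star(R[k][k])
        new = [row[:] for row in R]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = concat(concat(R[i][k], loop), R[k][j])
                new[i][j] = union(R[i][j], via_k)
        R = new
    result = None
    for f in accepting:
        result = union(result, R[start][f])
    return result


# Hypothetical DFA: state 1 = even number of 1s seen so far, state 2 = odd.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
pattern = re.compile(dfa_to_regex(2, DELTA, 1, {1}))

# Compare the expression with the intended language on all short strings.
ok = all(
    bool(pattern.fullmatch("".join(w))) == ("".join(w).count("1") % 2 == 0)
    for n in range(7)
    for w in product("01", repeat=n)
)
print(ok)
```

The generated expression is correct but far from minimal, which illustrates why simplifying at every step matters for the size of the result.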

DFA to RE - complexity. Hopcroft’s formula is $O(n^3 4^n)$: $n^3$ to compute the table, and $4^n$ because the size of the regular expression can grow by a factor of 4 at every step. In practice it is close to $O(n^3)$ when the regular expression is simplified at every step and a judicious algorithm avoids recomputing the shared subexpression $(R_{kk}^{(k-1)})^*$. Most DFAs have close to n accessible states, not $2^n$. A faster state-elimination method, close to $O(n^2)$, is also available.

RE to Automaton conversion. A regular expression is converted to an ε-NFA; the ε-NFA can then be converted to an NFA and to a DFA. RE to ε-NFA conversion rules: ε → a two-state automaton with one ε edge from start to accept. Ø → a two-state automaton with no edges. a → a two-state automaton with one “a” transition. + → a new start state and a new accept state joining the two arguments of + in parallel. . → the accept state of the first argument is the start state of the second. * → ε edges joining the start and accept states of the argument to a new start state and a new accept state, so that the argument can be skipped or repeated. Finally, convert the resulting ε-NFA to a DFA.
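The conversion rules can be sketched as a recursive function over a syntax tree. Everything named here is an assumption of the sketch: the tuple encoding of the tree (`"sym"`, `"eps"`, `"empty"`, `"alt"`, `"cat"`, `"star"`), the functions `build` and `accepts`, and the choice to link the two parts of a concatenation with an ε arc instead of merging the accept and start states.

```python
from itertools import count


def build(node, arcs, fresh):
    """Return (start, accept) of an epsilon-NFA fragment for the tree `node`.

    arcs collects (source, label, target) triples; label None means epsilon.
    """
    kind = node[0]
    if kind in ("eps", "empty", "sym"):
        s, f = next(fresh), next(fresh)
        if kind == "eps":
            arcs.append((s, None, f))
        elif kind == "sym":
            arcs.append((s, node[1], f))
        return s, f  # "empty": two states and no arc
    if kind == "cat":
        s1, f1 = build(node[1], arcs, fresh)
        s2, f2 = build(node[2], arcs, fresh)
        arcs.append((f1, None, s2))  # accept of first feeds start of second
        return s1, f2
    if kind == "alt":
        s, f = next(fresh), next(fresh)
        for child in node[1:]:
            cs, cf = build(child, arcs, fresh)
            arcs.append((s, None, cs))
            arcs.append((cf, None, f))
        return s, f
    if kind == "star":
        s, f = next(fresh), next(fresh)
        cs, cf = build(node[1], arcs, fresh)
        # Enter, leave, skip entirely, or repeat the argument.
        arcs += [(s, None, cs), (cf, None, f), (s, None, f), (cf, None, cs)]
        return s, f
    raise ValueError(f"unknown node kind: {kind}")


def accepts(node, word):
    """Simulate the epsilon-NFA built for `node` on `word`."""
    arcs = []
    start, accept = build(node, arcs, count())

    def close(states):
        states, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for src, label, dst in arcs:
                if src == p and label is None and dst not in states:
                    states.add(dst)
                    stack.append(dst)
        return states

    current = close({start})
    for ch in word:
        current = close({d for (s, a, d) in arcs if s in current and a == ch})
    return accept in current


# a.b.c*.d as a syntax tree
AST = ("cat", ("cat", ("cat", ("sym", "a"), ("sym", "b")),
               ("star", ("sym", "c"))), ("sym", "d"))
print(accepts(AST, "abccd"), accepts(AST, "abd"), accepts(AST, "ab"))
# True True False
```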

Direct conversion. Augment the regular expression r to (r).#. Give a position number to each occurrence of an alphabet symbol. Compute for each node of the syntax tree: nullable (is ε in the language?), firstpos (set of positions that can match the first symbol), lastpos (set of positions that can match the last symbol). Compute for each position: followpos (set of positions that can come next after this position). Construct the DFA from these sets.

Applications. Unix text search for matching patterns (grep). Lexical analysis and parsing: parse text against a regular expression; find the set of first tokens at an expression root; find the set of last tokens at an expression root; decide whether the expression at a root can match the empty string; find the set of next tokens after an alphabet position in a regular expression. Efficient search for patterns in very large repositories (web text search).

Regular Language. DEFINITION: A language (a set of strings) is defined to be a regular language if it can be defined by a finite automaton (by a DFA, by an NFA, or by an ε-NFA) or by a regular expression. These are four different ways to describe a regular language.

Pumping Lemma. If L is a regular language, then there exists an integer n such that every string w in L with |w| ≥ n can be broken into x, y, z such that w = x.y.z, y ≠ ε, |x.y| ≤ n, and $x.y^k.z$ is in L for all k ≥ 0. The proof is based on the fact that, in a DFA with n states, any string of length n or more must revisit a state. The lemma is used to prove that a language is not regular.
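The usual way to apply the lemma can be illustrated on L = { a^m b^m | m ≥ 0 }, a standard example that is not taken from the slides. For a candidate n, the string w = a^n b^n is chosen, and every split allowed by the lemma is shown to break when pumped. The sketch below checks this mechanically for a given n; the names `in_language` and `every_split_fails` are illustrative, and the actual proof is the argument that this holds for every n.

```python
def in_language(word):
    """Membership in L = { a^m b^m | m >= 0 }."""
    half = len(word) // 2
    return word == "a" * half + "b" * half


def every_split_fails(n):
    """For w = a^n b^n, check that each legal split x.y.z breaks under pumping."""
    w = "a" * n + "b" * n
    for i in range(n):                 # |x| = i
        for j in range(i + 1, n + 1):  # |x.y| = j <= n, so y is non-empty
            x, y, z = w[:i], w[i:j], w[j:]
            # The lemma would require x.y^k.z in L for all k; test k = 0 and 2.
            if all(in_language(x + y * k + z) for k in (0, 2)):
                return False
    return True


print(all(every_split_fails(n) for n in range(1, 8)))  # True
```

Because |x.y| ≤ n forces y to consist only of a's, pumping changes the number of a's but not the number of b's, so no split survives and L cannot be regular.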

Closure property. A language is a set of strings over a finite alphabet. Regular languages are closed under the following language operators. Union: L(A ∪ B) = L(A) ∪ L(B) (shown with regular expressions). Intersection: L(A ∩ B) = L(A) ∩ L(B). Concatenation: L(A.B) = { a.b | a in A, b in B }. Kleene closure: $L(A^*) = \bigcup_{n \ge 0} A^n$, where $A^0 = \{\varepsilon\}$ and $A^n = A.A^{n-1}$. Complement: L(A’) = { a | a not in A }, taken with respect to all strings over the alphabet (shown with a DFA). Difference: L(A−B) = L(A) − L(B) (shown with a DFA). Reversal: $L(A^R) = \{ a_k a_{k-1} \dots a_1 \mid a_1 \dots a_{k-1} a_k \in A \}$ (shown by reversing the arcs and switching q0 and F). Homomorphism: replace each alphabet symbol with a string or, more generally, a regular expression. Inverse homomorphism.
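The DFA-based closure arguments are short enough to sketch in code. This is an illustrative sketch: a DFA is encoded as a tuple (states, delta, start, accepting) with a complete transition function, and the two example DFAs over {a, b} are hypothetical. Complement swaps accepting and non-accepting states, intersection uses the product construction, and difference follows as A ∩ B'.

```python
def complement(dfa):
    """Swap accepting and non-accepting states of a complete DFA."""
    states, delta, start, accepting = dfa
    return states, delta, start, states - accepting


def intersection(dfa1, dfa2, alphabet):
    """Product construction: run both DFAs in lockstep on pairs of states."""
    states1, delta1, start1, acc1 = dfa1
    states2, delta2, start2, acc2 = dfa2
    states = {(p, q) for p in states1 for q in states2}
    delta = {
        ((p, q), a): (delta1[(p, a)], delta2[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    accepting = {(p, q) for (p, q) in states if p in acc1 and q in acc2}
    return states, delta, (start1, start2), accepting


def difference(dfa1, dfa2, alphabet):
    """A - B is the intersection of A with the complement of B."""
    return intersection(dfa1, complement(dfa2), alphabet)


def accepts(dfa, word):
    _, delta, state, accepting = dfa
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting


# Hypothetical DFAs over {a, b}: even number of a's, and strings ending in b.
EVEN_A = ({0, 1}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = (
    {"n", "y"},
    {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
    "n",
    {"y"},
)

both = intersection(EVEN_A, ENDS_B, "ab")
odd_a = complement(EVEN_A)
print(accepts(both, "aab"), accepts(both, "ab"), accepts(odd_a, "ab"))
# True False True
```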

Decision properties. Is the language described empty? Is a particular string in the described language? Do two different descriptions actually describe the same language?

Conversions. Decision properties may require conversion between various forms. Can the conversion be done in reasonable time?
Conversion: Complexity
Computing ε-closures: $O(n^3)$ (Warshall’s algorithm)
Subset construction: $O(2^n)$
NFA to DFA: $O(n^3 2^n)$ (in practice $O(n^3 s)$, where s is the number of DFA states)
DFA to NFA: $O(n)$
NFA/DFA to regular expression: $O(n^3 4^n)$ worst case (actual is much less)
Regular expression to ε-NFA: $O(n)$
Regular expression to NFA: $O(n^3)$
Regular expression to DFA: $O(n^3 4^{n^3 2^n})$

Equivalence of automata. Equivalence of two states: states p and q in an automaton are defined to be equivalent if, for every input string, the automaton started in p ends in an accepting state if and only if the automaton started in q also ends in an accepting state. The accepting state reached from p does not have to be the same accepting state as that reached from q.

Minimization of DFA. If two states p and q are equivalent, we can combine them into a single state; this does not affect the language accepted by the DFA. This process of combining states is called minimization. The table-filling algorithm can find whether two states are equivalent or not, with complexity $O(n^2)$. Pairs that are not equivalent are called distinguishable.
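The table-filling idea can be sketched as a fixed-point computation over pairs of states. The names `distinguishable_pairs` and `equivalence_blocks` and the three-state example are assumptions of this sketch. For brevity it rescans all pairs until nothing changes, which is simpler than, and not as fast as, the $O(n^2)$ version that keeps dependency lists.

```python
from itertools import combinations


def distinguishable_pairs(states, delta, accepting, alphabet):
    """Table-filling: mark pairs until no new pair can be marked."""
    # Basis: an accepting and a non-accepting state are distinguishable.
    marked = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in accepting) != (q in accepting)
    }
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                # Induction: successors distinguishable => pair distinguishable.
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def equivalence_blocks(states, delta, accepting, alphabet):
    """Group states into blocks of mutually equivalent states."""
    marked = distinguishable_pairs(states, delta, accepting, alphabet)
    blocks = []
    for s in states:
        for block in blocks:
            if frozenset((s, block[0])) not in marked:
                block.append(s)
                break
        else:
            blocks.append([s])
    return blocks


# Hypothetical DFA for strings ending in 1; state C duplicates state A.
STATES = ["A", "B", "C"]
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "C", ("B", "1"): "B",
    ("C", "0"): "A", ("C", "1"): "B",
}
print(equivalence_blocks(STATES, DELTA, {"B"}, "01"))  # [['A', 'C'], ['B']]
```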

Minimum DFA. The minimum DFA is unique (up to renaming of states). To build it: eliminate all states not reachable from the start state; determine which states are equivalent; partition the states into blocks of equivalent states. Equivalence is transitive, so no state is in two blocks. Equivalence of two regular languages: convert them into their minimum DFAs and check for isomorphism. Union method: treat the two DFAs as one DFA over the union of their states and run the table-filling algorithm; the start states of the two original DFAs are equivalent if and only if the DFAs are equivalent.
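The union method can be sketched by tagging the states of the two DFAs, merging their transition tables, and testing whether the two start states are distinguishable. The function name `equivalent`, the (delta, start, accepting) tuple encoding, and the example DFAs over the one-letter alphabet {a} are assumptions of this sketch; both DFAs are assumed to be complete.

```python
from itertools import combinations


def equivalent(dfa1, dfa2, alphabet):
    """Union method: merge both DFAs, then test the two start states."""
    delta, accepting, starts = {}, set(), []
    for tag, (d, start, acc) in enumerate((dfa1, dfa2)):
        # Tag each state with 0 or 1 so the two state sets stay disjoint.
        for (p, a), q in d.items():
            delta[((tag, p), a)] = (tag, q)
        accepting |= {(tag, f) for f in acc}
        starts.append((tag, start))
    states = {p for (p, _) in delta} | set(delta.values())
    # Table-filling on the merged automaton.
    marked = {
        frozenset(pair)
        for pair in combinations(states, 2)
        if (pair[0] in accepting) != (pair[1] in accepting)
    }
    changed = True
    while changed:
        changed = False
        for pair in combinations(states, 2):
            key = frozenset(pair)
            if key in marked:
                continue
            if any(
                frozenset((delta[(pair[0], a)], delta[(pair[1], a)])) in marked
                for a in alphabet
            ):
                marked.add(key)
                changed = True
    return frozenset(starts) not in marked


# Hypothetical DFAs over {a}: even length with 2 states, even length with
# 4 states (counting modulo 4), and odd length.
EVEN_2 = ({(0, "a"): 1, (1, "a"): 0}, 0, {0})
EVEN_4 = ({(i, "a"): (i + 1) % 4 for i in range(4)}, 0, {0, 2})
ODD_2 = ({(0, "a"): 1, (1, "a"): 0}, 0, {1})

print(equivalent(EVEN_2, EVEN_4, "a"), equivalent(EVEN_2, ODD_2, "a"))
# True False
```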