Chapter Nine: Advanced Topics in Regular Languages

Slides:



Advertisements
Similar presentations
Theory of Computing Lecture 23 MAS 714 Hartmut Klauck.
Advertisements

Lecture 24 MAS 714 Hartmut Klauck
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Magic Numbers and Subset Construction Samik Datta Sayantan Mahinder.
Formal Language, chapter 9, slide 1Copyright © 2007 by Adam Webber Chapter Nine: Advanced Topics in Regular Languages.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
CS5371 Theory of Computation
Courtesy Costas Busch - RPI1 Non Deterministic Automata.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
Fall 2006Costas Busch - RPI1 Non-Deterministic Finite Automata.
FSA Lecture 1 Finite State Machines. Creating a Automaton  Given a language L over an alphabet , design a deterministic finite automaton (DFA) M such.
Fall 2004COMP 3351 Another NFA Example. Fall 2004COMP 3352 Language accepted (redundant state)
Costas Busch - LSU1 Non-Deterministic Finite Automata.
Foundations of (Theoretical) Computer Science Chapter 1 Lecture Notes (Section 1.2: NFA’s) David Martin With some modifications.
Final Exam Review Cummulative Chapters 0, 1, 2, 3, 4, 5 and 7.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
Lexical Analysis Constructing a Scanner from Regular Expressions.
Theory of Computing Lecture 21 MAS 714 Hartmut Klauck.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
1 Linear Bounded Automata LBAs. 2 Linear Bounded Automata (LBAs) are the same as Turing Machines with one difference: The input string tape space is the.
 2005 SDU Lecture13 Reducibility — A methodology for proving un- decidability.
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
Lecture Notes 
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
Regular Languages Chapter 1 Giorgi Japaridze Theory of Computability.
Complexity and Computability Theory I Lecture #5 Rina Zviel-Girshin Leah Epstein Winter
1/29/02CSE460 - MSU1 Nondeterminism-NFA Section 4.1 of Martin Textbook CSE460 – Computability & Formal Language Theory Comp. Science & Engineering Michigan.
Theory of Computation Automata Theory Dr. Ayman Srour.
Costas Busch - LSU1 Deterministic Finite Automata And Regular Languages.
Theory of Computation Automata Theory Dr. Ayman Srour.
Chapters 11 and 12 Decision Problems and Undecidability.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
CIS Automata and Formal Languages – Pei Wang
Non Deterministic Automata
Copyright © Cengage Learning. All rights reserved.
Regular Expressions.
Linear Bounded Automata LBAs
CSE 105 theory of computation
Turing Machines Acceptors; Enumerators
Chapter 2 FINITE AUTOMATA.
Jaya Krishna, M.Tech, Assistant Professor
REGULAR LANGUAGES AND REGULAR GRAMMARS
Hierarchy of languages
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Jaya Krishna, M.Tech, Assistant Professor
Deterministic Finite Automata And Regular Languages Prof. Busch - LSU.
Non-Deterministic Finite Automata
CS 154, Lecture 3: DFANFA, Regular Expressions.
Chapter Two: Finite Automata
Principles of Computing – UFCFA3-30-1
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Non-Deterministic Finite Automata
Decidable Languages Costas Busch - LSU.
Non Deterministic Automata
Finite Automata Reading: Chapter 2.
Chapter Five: Nondeterministic Finite Automata
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Instructor: Aaron Roth
Instructor: Aaron Roth
CSE 105 theory of computation
Chapter 1 Regular Language
CHAPTER 1 Regular Languages
CSE 105 theory of computation
Presentation transcript:

Chapter Nine: Advanced Topics in Regular Languages Formal Language, chapter 9, slide 1

There are many more things to learn about finite automata than are covered in this book. There are many variations with interesting applications, and there is a large body of theory. Especially interesting, but beyond the scope of this book, are the various algebras that arise around finite automata. This chapter gives just a taste of some of these advanced topics. Formal Language, chapter 9, slide 2

Outline 9.1 DFA Minimization 9.2 Two-Way Finite Automata 9.3 Finite-State Transducers 9.4 Advanced Regular Expressions Formal Language, chapter 9, slide 3

DFA Minimization Questions of DFA size: Given a DFA, can we find one with fewer states that accepts the same language? What is the smallest DFA for a given language? Is the smallest DFA unique, or can there be more than one "smallest" DFA for the same language? All these questions have neat answers… Formal Language, chapter 9, slide 4

Eliminating Unnecessary States Unreachable states, like some of those introduced by the subset construction, can obviously be eliminated Even some of the reachable states may be redundant… Formal Language, chapter 9, slide 5

Example: Equivalent States In both q3 and q4, the machine rejects, no matter what the rest of the input string contains They're equivalent and can be combined… Formal Language, chapter 9, slide 6

Still More Equivalent States In both q1 and q2, the machine accepts if and only if the rest of the string consists of 0 or more as They're equivalent and can be combined… Formal Language, chapter 9, slide 7

Minimized No more equivalencies This is a minimum-state DFA for the language, {xay | x ∈ {b}* and y ∈ {a}*} Formal Language, chapter 9, slide 8

State Equivalence Informally: two states are equivalent when the machine's decision after any remaining input will be the same from either state Formally: Define a little language L(M,q) for each state q: L(M,q) = {x ∈ Σ* | δ*(q,x) ∈ F} That's the language of strings that would be accepted if q were used as the start state Now we can define state equivalence: q and r are equivalent if and only if L(M,q) = L(M,r) Formal Language, chapter 9, slide 9

We have: L(M,q0) = {xay | x ∈ {b}* and y ∈ {a}*} L(M,q1) = {x | x ∈ {a}*} L(M,q2) = {x | x ∈ {a}*} L(M,q3) = {} L(M,q4) = {} So q1 ≡ q2 and q3 ≡ q4 Formal Language, chapter 9, slide 10

DFA Minimization Procedure Two steps: 1. Eliminated states that are not reachable from the start state 2. Combine all equivalent states, so that no two remaining states are equivalent to each other Step 2 is the construction of a new DFA whose states are the equivalence classes of the original: the quotient construction Formal Language, chapter 9, slide 11

Theorem 9.1 Every regular language has a unique minimum-state DFA, and no matter what DFA for the language you start with, the minimization procedure finds it. (Stated here without proof) Resulting DFA is unique up to isomorphism That is, unique except perhaps for state names, which of course have no effect on L(M) So our minimization procedure is safe and effective Safe, in that it does not change L(M) Effective, in that it finds the structurally unique smallest DFA for L(M) Formal Language, chapter 9, slide 12

Automating Minimization Is there an algorithm that can efficiently detect equivalent states and so perform the minimization? Yes: a DFA with state set Q and alphabet Σ can be minimized in O(|Σ| |Q| log |Q|) time (reference in the book) Formal Language, chapter 9, slide 13

Minimizing NFAs Results are not as clean for NFAs We can still eliminate unreachable states We can still combine equivalent states, using a similar definition of equivalence But the result is not necessarily a unique minimum-state NFA for the language (reference in the book) Formal Language, chapter 9, slide 14

Outline 9.1 DFA Minimization 9.2 Two-Way Finite Automata 9.3 Finite-State Transducers 9.4 Advanced Regular Expressions Formal Language, chapter 9, slide 15

Two-Way Finite Automata DFAs and NFAs read their input once, left to right We can try to make these models more powerful by allowing re-reading Treat the input like a tape, and allow the automaton to move its read head left or right on each move Two-way deterministic finite automata (2DFA) Two-way nondeterministic finite automata (2NFA) Formal Language, chapter 9, slide 16

2DFA Example Input string x1x2…xn-1xn Special endmarker symbols frame the input The head can't move past them Formal Language, chapter 9, slide 17

2DFA Differences Transition function returns a pair of values: The next state The direction (L or R) to move the head Computation ends, not at the end of the string, but when a special state is entered One accept state: halt and accept when entered One reject state: halt and reject when entered Can these define more than the regular languages? Formal Language, chapter 9, slide 18

Theorems 9.2.1 And 9.2.2 There is a 2DFA for a language L if and only if L is regular. There is a 2NFA for a language L if and only if L is regular. (Stated here without proof) So adding two-way reading to finite automata does not increase their definitional power A little more tweaking will give us more power later: adding the ability to write as well as read yields LBAs (linear bounded automata), which are much more powerful Adding the ability to write unboundedly far past the end markers yields the Turing machines, still more powerful Formal Language, chapter 9, slide 19

Outline 9.1 DFA Minimization 9.2 Two-Way Finite Automata 9.3 Finite-State Transducers 9.4 Advanced Regular Expressions Formal Language, chapter 9, slide 20

Finite-State Transducers Our DFAs and NFAs have a single bit of output: accept/reject For some applications we need more output than that Finite-state machines with string output are called finite-state transducers Formal Language, chapter 9, slide 21

Output On Each Transition A transition labeled a,x works like a DFA transition labeled a, but also outputs string x Transforms input strings into output strings Formal Language, chapter 9, slide 22

Action For Input 10#11# Formal Language, chapter 9, slide 23

Transducers Given a sequence of binary numbers, it outputs the same numbers rounded down to the nearest even We can think of it as modifying an input signal Finite-state transducers have signal-processing applications: Natural language processing Speech recognition Speech synthesis (reference in the book) They come in many varieties deterministic / nondeterministic output on transition / output in state software / hardware Formal Language, chapter 9, slide 24

Outline 9.1 DFA Minimization 9.2 Two-Way Finite Automata 9.3 Finite-State Transducers 9.4 Advanced Regular Expressions Formal Language, chapter 9, slide 25

Regular Expression Equivalence Chapter 7 grading: decide whether a regular expression is a correct answer to a problem That is, decide whether two regular expressions (your solution and mine) define the same language We've seen a way to automate this: convert to NFAs (as in Appendix A) convert to DFAs (subset construction) minimize the DFAs (quotient construction) compare: the original regular expressions are equivalent if the resulting DFAs are identical But this is extremely expensive Formal Language, chapter 9, slide 26

Cost of Equivalence Testing The problem of deciding regular expression equivalence is PSPACE-complete Informally: PSPACE-complete problems are generally believed (but not proved) to require exponential time More about this in Chapter 20 The problem gets even worse when we extend regular expressions... Formal Language, chapter 9, slide 27

Regular Expressions With Squaring Add one more kind of compound expression: (r)2, with L((r)2) = L((r)(r)) Obviously, this doesn't add power But it does allow you to express some regular languages much more compactly Consider {0n | n mod 64 = 0}, without squaring and with squaring: (0000000000000000000000000000000000000000000000000000000000000000)* (((((0000)2)2)2)2)* Formal Language, chapter 9, slide 28

New Complexity The problem of deciding regular expression equivalence, when squaring is allowed, is EXPSPACE- complete Informally: EXPSPACE-complete problems require exponential space and at least exponential time More about this in Chapter 20 Cost is measured as a function of the input size -- and we've compressed the input size using squaring The problem gets even worse when we extend regular expressions in other ways... Formal Language, chapter 9, slide 29

Regular Expressions With Complement Add one more kind of compound expression: (r)C, with L((r)C) defined to be the complement of L(r) Obviously, this doesn't add power; regular languages are closed for complement But it does allow you to express some regular languages much more compactly See Chapter 9, Exercise 3 Formal Language, chapter 9, slide 30

Still More Complexity The problem of deciding regular expression equivalence, when complement is allowed, requires NONELEMENTARY TIME Informally: the time required to compare two regular expressions of length n grows faster than for any fixed-height stack of exponentiations More about this in Chapter 20 Formal Language, chapter 9, slide 31

Star Height The star height of a regular expression is the nesting depth of Kleene stars a+b has star height 0 (a+b)* has star height 1 (a*+b*)* has star height 2 etc. It is often possible, and desirable, to simplify expressions in a way that reduces star height ∅* defines the same language as ε (a*+b*)* defines the same language as (a+b)* Formal Language, chapter 9, slide 32

Star Height Questions Is there some fixed star height (perhaps 1 or 2) that suffices for any regular language? Is there an algorithm that can take a regular expression and find the minimum possible star height for any equivalent expression? Formal Language, chapter 9, slide 33

Star Height Questions For basic regular expressions, the answers are known: Is there some fixed star height (perhaps 1 or 2) that suffices for any regular language? -- No, we need arbitrary star height to cover the regular languages Is there an algorithm that can take a regular expression and find the minimum possible star height for any equivalent expression? -- Yes, there is an algorithm for minimizing star height When complement is allowed, these questions are still open Formal Language, chapter 9, slide 34

Generalized Star-Height Problem In particular, when complement is allowed, it is not known whether there is a regular language that requires a star height greater than 1! This is one of the most prominent open problems surrounding regular expressions Formal Language, chapter 9, slide 35