CIS 262: Automata, Computability, and Complexity (Spring)
Instructor: Aaron Roth (aaroth@cis.upenn.edu)
Lecture: February 13, 2019

Course Logistics
Midterm Date: Wednesday, February 27 (in class)
(oops -- never mind) Midterm Date: Monday, March 18 (in class)

Recap
Regular languages are closed under:
  Union, Intersection, Complement
  Concatenation, Kleene-*
To show that regular languages are closed under an operation OP:
  Consider two arbitrary DFAs M1 and M2
  Show how to construct a DFA (or NFA or ε-NFA) M’ that accepts L(M1) OP L(M2)
  States/transitions of M’ are defined in terms of those of M1 and M2
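As one concrete instance of this recipe, here is a minimal sketch of the standard product construction, in which each state of M’ is a pair of states of M1 and M2. The recap does not spell this construction out; the tuple layout `(states, delta, start, finals)`, the function names, and the two example machines are assumptions made for illustration.

```python
def product_dfa(m1, m2, alphabet, combine):
    """Product of two DFAs, each given as (states, delta, start, finals).

    combine(in_f1, in_f2) decides which pairs are final: logical "and"
    gives intersection, logical "or" gives union.
    """
    states1, delta1, start1, finals1 = m1
    states2, delta2, start2, finals2 = m2
    states = {(p, q) for p in states1 for q in states2}
    # Both component machines read the same symbol in lockstep.
    delta = {
        ((p, q), s): (delta1[(p, s)], delta2[(q, s)])
        for (p, q) in states
        for s in alphabet
    }
    finals = {(p, q) for (p, q) in states if combine(p in finals1, q in finals2)}
    return states, delta, (start1, start2), finals


def accepts(dfa, w):
    """Run a DFA on the string w."""
    _, delta, state, finals = dfa
    for s in w:
        state = delta[(state, s)]
    return state in finals


# Hypothetical example machines over the alphabet {a, b}.
EVEN_A = ({0, 1}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

BOTH = product_dfa(EVEN_A, ENDS_B, "ab", lambda x, y: x and y)
EITHER = product_dfa(EVEN_A, ENDS_B, "ab", lambda x, y: x or y)

print(accepts(BOTH, "aab"), accepts(BOTH, "ab"))    # True False
print(accepts(EITHER, "ab"), accepts(EITHER, "a"))  # True False
```

Only the choice of final states differs between union and intersection; the states and transitions of the product machine are the same.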

Prefix Operation
A string u is a prefix of w if w = u.v for some string v
  Prefixes of 011 = ε, 0, 01, 011
Prefix(L) = set of prefixes of all strings in L
          = { u | there exists a string v such that u.v is in L }
For example:
  L = { w | w ends in a }: Prefix(L) = Σ*
  L’ = { w | w does not contain any a symbols }: Prefix(L’) = L’
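For a finite language the definition can be evaluated directly. The following is a small sketch; the function names `prefixes` and `prefix_closure` are chosen here for illustration, and the approach only works for finite sets of strings.

```python
def prefixes(w):
    """Return all prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def prefix_closure(language):
    """Prefix(L) for a finite set of strings L."""
    return {u for w in language for u in prefixes(w)}


print(prefixes("011"))                      # ['', '0', '01', '011']
print(sorted(prefix_closure({"ab", "b"})))  # ['', 'a', 'ab', 'b']
```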

Closure Under Prefix Operation
If L is regular, is Prefix(L) guaranteed to be regular?
Consider a DFA M for L; goal: construct a machine M’ for Prefix(L)
M’ should act like M on its input w
When should it accept? Whenever there exists an extension of w that leads to an accepting state of M
M’ has the same states, initial state, and transition function as M
State q is final in M’ if some state in F is reachable from q:
  F’ = { q | there exists v such that δ*(q,v) is in F }
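The new set of final states can be computed by a backward reachability search from F. This is a sketch under the assumption that the DFA's transition function is stored as a dictionary keyed by (state, symbol); the example machine, which accepts only the string ab, is hypothetical.

```python
def prefix_finals(states, alphabet, delta, finals):
    """Return the states from which some state in `finals` is reachable."""
    # Reverse the transition graph: preds[t] holds every q with an edge q -> t.
    preds = {q: set() for q in states}
    for q in states:
        for s in alphabet:
            preds[delta[(q, s)]].add(q)
    # Search backward from the original final states.
    reachable = set(finals)
    stack = list(finals)
    while stack:
        q = stack.pop()
        for p in preds[q]:
            if p not in reachable:
                reachable.add(p)
                stack.append(p)
    return reachable


# Hypothetical DFA over {a, b} accepting only "ab"; state 3 is a dead state.
STATES = {0, 1, 2, 3}
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 3, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 3,
    (3, "a"): 3, (3, "b"): 3,
}

print(sorted(prefix_finals(STATES, "ab", DELTA, {2})))  # [0, 1, 2]
```

With final states {0, 1, 2} the same machine accepts the empty string, a, and ab, which are exactly the prefixes of ab.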

Edit1 Operation
For two strings u and v (of the same length), distance(u,v) = 1 if they differ in exactly one symbol
  For Σ = { a, b }, strings at distance 1 from ab: aa, bb
Edit1(L) = { w | w is in L or there is a string u in L with distance(u,w) = 1 }
For example, for Σ = { A, C, G, T }:
  L = { w | w contains the substring “ACC” }
  Edit1(L) = { w | w contains ACC or CCC or GCC or TCC or AAC or AGC or ATC or ACA or ACG or ACT }
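For a finite language, Edit1 can be computed by brute force straight from the definition. A minimal sketch, with function names chosen here for illustration:

```python
def distance_one(w, alphabet):
    """All strings that differ from w in exactly one position."""
    return {
        w[:i] + s + w[i + 1:]
        for i in range(len(w))
        for s in alphabet
        if s != w[i]
    }


def edit1(language, alphabet):
    """Edit1(L) for a finite set of strings L."""
    return set(language) | {u for w in language for u in distance_one(w, alphabet)}


print(sorted(distance_one("ab", "ab")))  # ['aa', 'bb']
print(len(edit1({"ACC"}, "ACGT")))       # 10
```

The ten strings in the second result are the ten three-letter patterns listed in the ACC example.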

Closure Under Edit1 Operation
If L is regular, is Edit1(L) guaranteed to be regular?
Consider a DFA M for L; goal: construct a machine M’ for Edit1(L)
M’ should act like M on its input w, but at some step, while reading a symbol σ from w, it can update the state of M using a transition on another symbol σ’
Challenges:
  1. When to replace, and which symbol to use as a replacement?
  2. How to ensure that at most one symbol is replaced?
Solutions:
  1. Use nondeterminism
  2. Maintain, besides the state of M, a bit to remember whether a symbol has already been changed

Consider DFA M = (Q, Σ, q0, F, δ)
Goal: construct NFA M’ for Edit1(L(M))
A state of M’ is of the form (q, b), where q is a state of M and b is 0 or 1
  b = 1 means that one symbol has already been replaced
  b is initially 0 and q is initially q0
In state (q, b), on input symbol σ:
  if b = 0 then
    update q using the σ-transition of M, keeping b = 0, or
    update q using the σ’-transition of M for some σ’ ≠ σ, setting b = 1
  else
    update q using the σ-transition of M, keeping b = 1
Accept if q is a final state of M (the value of b does not matter)

Precise definition of M’
States of M’: Q’ = Q × { 0, 1 }
Initial state of M’: (q0, 0)
Final states of M’: F × { 0, 1 }
Transition function of M’:
  Δ’( (q, 0), σ ) = { (δ(q, σ), 0) } ∪ { (δ(q, σ’), 1) | σ’ ≠ σ }
  Δ’( (q, 1), σ ) = { (δ(q, σ), 1) }
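The definition above translates almost line for line into code. The sketch below assumes the DFA's transition function is a dictionary keyed by (state, symbol); the function names and the example DFA (strings over {a, b} that contain aa) are illustrative choices, not part of the lecture.

```python
def edit1_nfa(states, alphabet, delta, start, finals):
    """Build the NFA for Edit1(L(M)) from a DFA M.

    NFA states are pairs (q, b); b == 1 records that one symbol has
    already been replaced.
    """
    nfa_delta = {}
    for q in states:
        for s in alphabet:
            # b == 0: read s faithfully, or pretend a different symbol t was read.
            moves = {(delta[(q, s)], 0)}
            moves |= {(delta[(q, t)], 1) for t in alphabet if t != s}
            nfa_delta[((q, 0), s)] = moves
            # b == 1: the single replacement is used up, so follow M exactly.
            nfa_delta[((q, 1), s)] = {(delta[(q, s)], 1)}
    nfa_finals = {(q, b) for q in finals for b in (0, 1)}
    return nfa_delta, (start, 0), nfa_finals


def nfa_accepts(nfa, w):
    """Simulate the NFA by tracking the set of currently possible states."""
    nfa_delta, start, nfa_finals = nfa
    current = {start}
    for s in w:
        current = {nxt for state in current for nxt in nfa_delta[(state, s)]}
    return bool(current & nfa_finals)


# Hypothetical DFA over {a, b}: accepts strings containing the substring "aa".
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
NFA = edit1_nfa({0, 1, 2}, "ab", DELTA, 0, {2})

print(nfa_accepts(NFA, "ab"), nfa_accepts(NFA, "bb"))  # True False
```

Here ab is accepted because replacing b with a gives aa, while bb is rejected because no single replacement produces a string containing aa.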

Regular Expressions
High-level specification language for expressing regular patterns
Examples:
  Σ* ACC Σ*: strings that contain the substring ACC
  Σ* 0: strings that end with symbol 0
Practical use: text search, spam filters, lexical analysis …
Supported in many programming languages (awk, sed, perl, JavaScript) and text editors (emacs, Word …)
We will focus on “core” regular expressions with a small set of basic operators; practical implementations support a richer set of operators (that can be defined in terms of the basic ones)

Regular Expressions: Definition
Let Σ be a finite alphabet
Defining syntax: rules for constructing regular expressions
Defining semantics: associating a language L(r) with each regular expression r
  L(r) is the set of strings that match the pattern r

1. ε is a regular expression
   Only the empty string matches this reg-ex: L(ε) = { ε }
2. ∅ is a regular expression
   No string matches this reg-ex: L(∅) = { }
3. For each symbol σ in Σ, σ is a regular expression
   The only string matching reg-ex σ is the string σ itself: L(σ) = { σ }
4. If r is a regular expression, so is ( r )
   Parentheses are used only for parsing: L( ( r ) ) = L(r)
5. If r and r’ are regular expressions, then so is r.r’
   A string w matches r.r’ if it can be split in two parts w = u.v such that u matches r and v matches r’
   That is, L(r.r’) = L(r) . L(r’)
6. If r and r’ are regular expressions, then so is r ∪ r’
   A string matches r ∪ r’ if it matches either r or r’
   L(r ∪ r’) = L(r) ∪ L(r’)
7. If r is a regular expression, then so is r*
   A string w matches r* if w can be split into multiple (0 or more) parts such that each part matches r: L(r*) = L(r)*
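The semantic rules can be read as a recursive membership test. The sketch below is a direct, deliberately inefficient transcription; representing a regular expression as a nested tuple such as `("cat", r1, r2)` is an assumption made here, and the parentheses rule needs no case because the nesting of tuples already records the grouping.

```python
def matches(r, w):
    """Decide whether the string w is in L(r), following the definition."""
    kind = r[0]
    if kind == "eps":      # only the empty string matches
        return w == ""
    if kind == "empty":    # no string matches
        return False
    if kind == "sym":      # only the one-symbol string matches
        return w == r[1]
    if kind == "cat":      # try every split w = u.v
        return any(
            matches(r[1], w[:i]) and matches(r[2], w[i:])
            for i in range(len(w) + 1)
        )
    if kind == "union":
        return matches(r[1], w) or matches(r[2], w)
    if kind == "star":
        # Zero parts, or a nonempty first part followed by more parts.
        # Requiring a nonempty first part guarantees termination.
        return w == "" or any(
            matches(r[1], w[:i]) and matches(r, w[i:])
            for i in range(1, len(w) + 1)
        )
    raise ValueError(f"unknown regular expression node: {kind!r}")


# (ab)* followed by (a or the empty string)
AB = ("cat", ("sym", "a"), ("sym", "b"))
R = ("cat", ("star", AB), ("union", ("sym", "a"), ("eps",)))

print(matches(R, "aba"), matches(R, "abb"))  # True False
print(matches(("star", ("empty",)), ""))     # True
```

The last line shows that the star of the empty-language expression still matches the empty string, since zero repetitions are always allowed.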

Notational Conventions
Many times “.” is omitted: 01 stands for the reg-ex 0.1
If Σ = { a, b }, then the regular expression (a ∪ b) is abbreviated as Σ
r* means zero or more repetitions of r; r+ means one or more repetitions of r, and is an abbreviation for r.r*
Operator precedence: * is highest, then ., then ∪
  ab* means a . (b)*
  ab ∪ c means (a.b) ∪ c
  a ∪ b* means a ∪ (b)*
Parentheses are used as needed: (ab)*, a (b ∪ c), (a ∪ b)*

Regular Expressions: Examples
Σ = { a, b }
a* b Σ* Σ* a Σ* b Σ* Σ* abaa Σ* aabb Σ* (Σ Σ)* (a ∪ ε) b* a* ∅ ∅*

Regular Expressions: Examples
Σ = { a, b }
Write regular expressions for:
  { w | last symbol in w = first symbol in w }
    a ∪ b ∪ a Σ* a ∪ b Σ* b
  { w | count(w,a) modulo 3 = 0 }
    b* (a b* a b* a b*)*
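Both answers can be sanity-checked by brute force with Python's `re` module, comparing each expression against its defining property on all short strings. This is a sketch: the translation into Python syntax (`|` for union, `[ab]` for the alphabet), the helper `all_strings`, and the length bound of 8 are choices made here. The empty string is skipped for the first language because it has no first or last symbol.

```python
import re
from itertools import product

# Python-syntax translations of the two answers.
SAME_ENDS = re.compile(r"a|b|a[ab]*a|b[ab]*b")
A_MOD_3 = re.compile(r"b*(ab*ab*ab*)*")


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


for w in all_strings("ab", 8):
    # fullmatch requires the whole string to match, as in the definition of L(r).
    assert bool(A_MOD_3.fullmatch(w)) == (w.count("a") % 3 == 0)
    if w:
        assert bool(SAME_ENDS.fullmatch(w)) == (w[0] == w[-1])

print("both expressions agree with their specifications up to length 8")
```

Agreement on short strings is evidence, not a proof, that the expressions are correct.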

Regular Expression: Phone Numbers
What are valid (US) phone numbers?
  Should have 10 digits, with an optional 1 at the beginning
  Can optionally be split into three blocks, with . or - as separators
( 1 ∪ 1. ∪ 1- ∪ ε ) D D D ( . ∪ - ∪ ε ) D D D ( . ∪ - ∪ ε ) D D D D
  where D stands for a digit: (0 ∪ 1 ∪ 2 ∪ … ∪ 9)
This reg-ex allows numbers that mix the two separators
Exercise: write a reg-ex that disallows this (i.e., . and - are not both used in the same number)
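As an illustration, the expression can be translated into Python's `re` syntax. This is one possible translation, not part of the lecture; the name `is_phone` and the sample numbers are made up.

```python
import re

# Optional leading 1 (bare, or followed by . or -), then 3 + 3 + 4 digits,
# each block boundary optionally marked by . or -
PHONE = re.compile(r"(?:1|1\.|1-)?[0-9]{3}[.-]?[0-9]{3}[.-]?[0-9]{4}")


def is_phone(s):
    """True if the whole string s matches the phone-number expression."""
    return PHONE.fullmatch(s) is not None


print(is_phone("2158981234"))      # True
print(is_phone("1-215-898-1234"))  # True
print(is_phone("215.898-1234"))    # True: mixed separators are accepted
print(is_phone("215898123"))       # False: only 9 digits
```

The third call shows the mixed-separator behaviour that the exercise asks you to rule out.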

Regular Expressions in Practice
Additional operators and abbreviations useful in practice:
  Intersection: r & r’
    Example: constraints on a legal password (should have at least 8 characters, at least one numeral, at least one capital letter …)
  Negation/complementation: ~r
  Optional use: r? means (r ∪ ε)
  Counting: D^4 means D.D.D.D
  Character ranges: [0-9], [a-p]
Note: the class of regular languages should be closed under such operators
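The password example can be sketched in Python by checking each constraint with its own expression and requiring all of them to match, which amounts to intersecting the corresponding languages. Python's `re` module has no `&` operator, so the intersection is taken in ordinary code; the three patterns and the name `legal_password` are illustrative assumptions.

```python
import re

# One expression per constraint; a legal password must match all of them.
CONSTRAINTS = [
    re.compile(r".{8,}"),      # at least 8 characters (counting, like D^4)
    re.compile(r".*[0-9].*"),  # at least one numeral (character range)
    re.compile(r".*[A-Z].*"),  # at least one capital letter
]


def legal_password(w):
    """True if w lies in the intersection of all constraint languages."""
    return all(c.fullmatch(w) is not None for c in CONSTRAINTS)


print(legal_password("Passw0rdX"))  # True
print(legal_password("password1"))  # False: no capital letter
```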

From Regular Expressions to NFAs
Goal: Given a regular expression r, construct an ε-NFA M(r) that accepts the language L(r)
Construction by induction on the structure of r:
1. r equals ε
2. r equals ∅
3. r equals a
4. r equals ( r’ ): M(r) is the same as M(r’)

5. r equals r1.r2
   Build M(r1) and M(r2)
   Build M(r) from M(r1) and M(r2) using the concatenation construction
   [Figure: M(r1) followed by M(r2), joined by ε-transitions]
6. r equals r1 ∪ r2
   Build M(r1) and M(r2)
   Build M(r) from M(r1) and M(r2) by adding a new initial state
   [Figure: new initial state with ε-transitions to M(r1) and M(r2)]
7. r equals r’*
   Build M(r’)
   Apply the Kleene-* construction
   [Figure: M(r’) with added ε-transitions]

Example Translation: (a b)* (a ∪ ε)
[Figure: ε-NFAs built in stages for (a b), (a ∪ ε), (a b)*, and finally (a b)* (a ∪ ε)]
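The inductive construction can be sketched in code. The version below is a Thompson-style variant in which every fragment has exactly one start state and one accept state, so it may differ in detail from the automata drawn on the slides. The tuple encoding of expressions, the edge-list representation, and all function names are assumptions made for illustration.

```python
import itertools

_fresh = itertools.count()  # supplies unused state numbers


def build(r, edges):
    """Add an NFA fragment for r to `edges`; return its (start, accept) states.

    `edges` is a list of (source, label, target); label None marks an
    empty-string transition.
    """
    kind = r[0]
    start, accept = next(_fresh), next(_fresh)
    if kind == "eps":
        edges.append((start, None, accept))
    elif kind == "empty":
        pass  # no transitions at all: accept is unreachable
    elif kind == "sym":
        edges.append((start, r[1], accept))
    elif kind == "cat":
        s1, a1 = build(r[1], edges)
        s2, a2 = build(r[2], edges)
        edges.extend([(start, None, s1), (a1, None, s2), (a2, None, accept)])
    elif kind == "union":
        s1, a1 = build(r[1], edges)
        s2, a2 = build(r[2], edges)
        edges.extend([(start, None, s1), (start, None, s2),
                      (a1, None, accept), (a2, None, accept)])
    elif kind == "star":
        s1, a1 = build(r[1], edges)
        # Skip the body, enter it, loop back after a pass, or leave.
        edges.extend([(start, None, accept), (start, None, s1),
                      (a1, None, s1), (a1, None, accept)])
    else:
        raise ValueError(f"unknown regular expression node: {kind!r}")
    return start, accept


def closure(states, edges):
    """All states reachable from `states` using only empty-string transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for source, label, target in edges:
            if source == q and label is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(r, w):
    """Build the NFA for r and simulate it on the string w."""
    edges = []
    start, accept = build(r, edges)
    current = closure({start}, edges)
    for c in w:
        moved = {t for (s, label, t) in edges if s in current and label == c}
        current = closure(moved, edges)
    return accept in current


# (a b)* (a or the empty string)
AB = ("cat", ("sym", "a"), ("sym", "b"))
R = ("cat", ("star", AB), ("union", ("sym", "a"), ("eps",)))

print(accepts(R, "aba"), accepts(R, "aa"))  # True False
```

Using one accept state per fragment keeps every case to a handful of added edges, at the cost of more states than a hand-drawn automaton would need.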
