Instructor: Aaron Roth

Presentation transcript:

CIS 262 Automata, Computability, and Complexity, Spring 2019
http://www.seas.upenn.edu/~cse262/
Instructor: Aaron Roth, aaroth@cis.upenn.edu
Lecture: February 13, 2019

Course Logistics
Midterm date: Monday, March 18 (in class)
This replaces the earlier date of Wednesday, February 27

Recap
Regular languages are closed under: union, intersection, complement, concatenation, Kleene-*
To show that regular languages are closed under an operation OP:
  Consider two arbitrary DFAs M1 and M2
  Show how to construct a DFA (or NFA or ε-NFA) M’ that accepts L(M1) OP L(M2)
  States/transitions of M’ are defined in terms of those of M1 and M2

Prefix Operation
A string u is a prefix of w if w = u.v for some string v
Prefixes of 011: ε, 0, 01, 011
Prefix(L) = set of prefixes of all strings in L = { u | there exists a string v such that u.v is in L }
Examples:
  L = { w | w ends in a }: Prefix(L) = Σ*
  L’ = { w | w does not contain any a symbols }: Prefix(L’) = L’

Closure Under Prefix Operation
If L is regular, is Prefix(L) guaranteed to be regular?
Consider a DFA M for L; goal: construct a machine M’ for Prefix(L)
M’ should act like M on its input w
When should it accept? Whenever some extension of w leads to an accepting state of M
M’ has the same states, initial state, and transition function as M
State q is final in M’ if some state in F is reachable from q:
  F’ = { q | there exists v such that δ*(q,v) is in F }
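The prefix construction only changes the set of final states, so it can be sketched as a backward reachability search from F. This is a minimal illustration, assuming the DFA is given as a Python dict from (state, symbol) to state; the names `prefix_finals` and `dfa_accepts` and the state names are made up for the example.

```python
from collections import deque

def prefix_finals(states, delta, finals):
    """Return F' = all states from which some state in `finals` is reachable."""
    # Reverse the transition graph: preds[p] = states with an edge into p.
    preds = {q: set() for q in states}
    for (q, _symbol), p in delta.items():
        preds[p].add(q)
    # Backward breadth-first search starting from the original final states.
    new_finals = set(finals)
    queue = deque(finals)
    while queue:
        p = queue.popleft()
        for q in preds[p]:
            if q not in new_finals:
                new_finals.add(q)
                queue.append(q)
    return new_finals

def dfa_accepts(delta, start, finals, w):
    """Run a total DFA on w and report whether it ends in a final state."""
    q = start
    for symbol in w:
        q = delta[(q, symbol)]
    return q in finals

# Hypothetical DFA for L = { w | w ends in a } over {a, b}.
ends_in_a_states = {"q0", "q1"}
ends_in_a_delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}
ends_in_a_prefix_finals = prefix_finals(ends_in_a_states, ends_in_a_delta, {"q1"})

# Hypothetical DFA for L' = { w | w contains no a } over {a, b}.
no_a_states = {"ok", "dead"}
no_a_delta = {
    ("ok", "a"): "dead", ("ok", "b"): "ok",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
no_a_prefix_finals = prefix_finals(no_a_states, no_a_delta, {"ok"})
```

In the first DFA every state can reach q1, so every state becomes final and the prefix language is all strings. In the second, the dead state can never reach a final state, so the final set is unchanged.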

Edit1 Operation
For two strings u and v (of the same length), distance(u,v) = 1 if they differ in exactly one symbol
For Σ = { a, b }, the strings at distance 1 from ab are: aa, bb
Edit1(L) = { w | w in L or there is a string u in L with distance(u,w) = 1 }
For example, L = { w | w contains the substring “ACC” } for Σ = { A, C, G, T }
Edit1(L) = { w | w contains ACC or CCC or GCC or TCC or AAC or AGC or ATC or ACA or ACG or ACT }

Closure Under Edit1 Operation
If L is regular, is Edit1(L) guaranteed to be regular?
Consider a DFA M for L; goal: construct a machine M’ for Edit1(L)
M’ should act like M on its input w, but at some step, while reading a symbol s from w, it can update the state of M using a transition on another symbol s’
Challenges:
  1. When to replace, and which symbol to use as a replacement?
  2. How to ensure that at most one symbol is replaced?
Solutions:
  1. Use nondeterminism
  2. Maintain, besides the state of M, a bit to remember whether a symbol has already been changed

Closure Under Edit1 Operation
Consider DFA M = (Q, Σ, q0, F, δ)
Goal: construct NFA M’ for Edit1(L(M))
A state of M’ is of the form (q, b) where q is a state of M and b is 0/1
b=1 means that one symbol has already been replaced
b is initially 0 and q is initially q0
In state (q,b), on input symbol s:
  if b=0 then either update q using the s-transition of M, keeping b=0,
    or update q using the s’-transition of M for some s’ ≠ s, setting b=1
  else update q using the s-transition of M, keeping b=1
Accept if state q is a final state of M (the value of b does not matter)

Closure Under Edit1 Operation
Consider DFA M = (Q, Σ, q0, F, δ)
Goal: construct NFA M’ for Edit1(L(M))
Precise definition of M’:
  States of M’: Q’ = Q × { 0, 1 }
  Initial state of M’: (q0, 0)
  Final states of M’: F × { 0, 1 }
  Transition function of M’:
    Δ’( (q, 0), s ) = { (δ(q, s), 0) } ∪ { (δ(q, s’), 1) | s’ ≠ s }
    Δ’( (q, 1), s ) = { (δ(q, s), 1) }
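The definition of M’ translates almost line for line into code. The sketch below assumes a total DFA stored as a dict from (state, symbol) to state; `edit1_nfa`, `nfa_accepts`, and the sample DFA for the single string ab are illustrative names, not part of the lecture.

```python
def edit1_nfa(alphabet, delta, start, finals):
    """Build the NFA for Edit1(L(M)); its states are pairs (q, b)."""
    states = {q for (q, _symbol) in delta} | set(delta.values())
    nfa_delta = {}
    for q in states:
        for s in alphabet:
            # b = 0: follow s faithfully, or pretend s was some other s2 and set b = 1.
            targets = {(delta[(q, s)], 0)}
            targets |= {(delta[(q, s2)], 1) for s2 in alphabet if s2 != s}
            nfa_delta[((q, 0), s)] = targets
            # b = 1: the one allowed replacement is used up, so follow s faithfully.
            nfa_delta[((q, 1), s)] = {(delta[(q, s)], 1)}
    nfa_finals = {(q, b) for q in finals for b in (0, 1)}
    return nfa_delta, (start, 0), nfa_finals

def nfa_accepts(nfa_delta, nfa_start, nfa_finals, w):
    """Simulate the NFA by tracking the set of currently reachable states."""
    current = {nfa_start}
    for s in w:
        current = set().union(*(nfa_delta[(state, s)] for state in current))
    return bool(current & nfa_finals)

# Hypothetical DFA accepting exactly the string "ab" over {a, b}.
ab_delta = {
    (0, "a"): 1, (0, "b"): "dead",
    (1, "a"): "dead", (1, "b"): 2,
    (2, "a"): "dead", (2, "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
ab_edit1 = edit1_nfa("ab", ab_delta, 0, {2})

def in_edit1_of_ab(w):
    return nfa_accepts(*ab_edit1, w)
```

With this DFA, `in_edit1_of_ab` holds for ab, aa, and bb, matching the strings at distance at most 1 from ab, and fails for strings such as ba that differ in two positions.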

Regular Expressions
High-level specification language for expressing regular patterns
Examples:
  Σ* ACC Σ*: strings that contain the substring ACC
  Σ* 0: strings that end with the symbol 0
Practical use: text search, spam filters, lexical analysis …
Supported in many programming languages (awk, sed, perl, JavaScript) and text editors (emacs, Word …)
We will focus on “core” regular expressions with a small set of basic operators; practical implementations support a richer set of operators (that can be defined in terms of the basic ones)

Regular Expressions: Definition
Let Σ be a finite alphabet
Defining syntax: rules for constructing regular expressions
Defining semantics: associating a language L(r) with each regular expression r
L(r) is the set of strings that match the pattern r

Regular Expressions: Definition
1. ε is a regular expression
   Only the empty string matches this reg-ex: L(ε) = { ε }
2. ∅ is a regular expression
   No string matches this reg-ex: L(∅) = { }
3. For each symbol s in Σ, s is a regular expression
   The only string matching reg-ex s is the string s itself: L(s) = { s }
4. If r is a regular expression, so is ( r )
   Parentheses are used only for parsing: L( ( r ) ) = L(r)

Regular Expressions: Definition
5. If r and r’ are regular expressions, then so is r.r’
   A string w matches r.r’ if it can be split into two parts w = u.v such that u matches r and v matches r’
   That is, L(r.r’) = L(r) . L(r’)
6. If r and r’ are regular expressions, then so is r ∪ r’
   A string matches r ∪ r’ if it matches either r or r’
   L(r ∪ r’) = L(r) ∪ L(r’)
7. If r is a regular expression, then so is r*
   A string w matches r* if w can be split into multiple (0 or more) parts such that each part matches r: L(r*) = L(r)*
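The seven rules can be read directly as a recursive membership test. The sketch below assumes a made-up encoding of regular expressions as nested tuples, with the tags "eps", "empty", "sym", "cat", "union", and "star" chosen for this example; parentheses need no case because the tuple nesting already fixes the parse. It follows the definition literally and makes no attempt at efficiency.

```python
def matches(r, w):
    """Decide whether string w is in L(r), following the inductive definition."""
    kind = r[0]
    if kind == "eps":      # rule 1: only the empty string matches
        return w == ""
    if kind == "empty":    # rule 2: no string matches
        return False
    if kind == "sym":      # rule 3: only the one-symbol string matches
        return w == r[1]
    if kind == "cat":      # rule 5: try every split w = u.v
        return any(matches(r[1], w[:i]) and matches(r[2], w[i:])
                   for i in range(len(w) + 1))
    if kind == "union":    # rule 6: match either side
        return matches(r[1], w) or matches(r[2], w)
    if kind == "star":     # rule 7: zero parts, or a nonempty first part then the rest
        return w == "" or any(matches(r[1], w[:i]) and matches(r, w[i:])
                              for i in range(1, len(w) + 1))
    raise ValueError(f"unknown regular expression tag: {kind!r}")

# Strings over {a, b} that end with a, written as (a U b)* . a
any_symbol = ("union", ("sym", "a"), ("sym", "b"))
ends_with_a = ("cat", ("star", any_symbol), ("sym", "a"))
```

In the star case the first part is required to be nonempty, which keeps the recursion finite without changing the language, since empty parts contribute nothing to a split. One consequence of rule 7 worth noting: the star of the empty-language expression still matches the empty string.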

Notational Conventions
Many times “.” is omitted: 01 stands for the reg-ex 0.1
If Σ = { a, b }, then the regular expression (a ∪ b) is abbreviated as Σ
r* means 0 or more repetitions of r; r+ means one or more repetitions of r, and is an abbreviation for r.r*
Operator precedence: * highest, then ., then ∪
  ab* means a . (b)*
  ab ∪ c means (a.b) ∪ c
  a ∪ b* means a ∪ (b)*
Parentheses are used as needed: (ab)*, a (b ∪ c), (a ∪ b)*

Regular Expressions: Examples
Σ = { a, b }
a* b Σ* Σ* a Σ* b Σ* Σ* abaa Σ* aabb Σ* (Σ Σ)* (a ∪ ε) b* a* ∅ ∅*

Regular Expressions: Examples
Σ = { a, b }
Write regular expressions for:
  { w | last symbol in w = first symbol in w }
    a ∪ b ∪ a Σ* a ∪ b Σ* b
  { w | count(w,a) modulo 3 = 0 }
    b* (a b* a b* a b*)*
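Answers like these can be sanity-checked by brute force: enumerate all short strings and compare the regular expression against a direct test of the defining property. The sketch below uses Python's `re` syntax (`|` for union, `[ab]` for the alphabet); the helper names are made up, and the first predicate assumes the empty string is not in the language, since the expression above does not match it.

```python
import re
from itertools import product

SAME_ENDS = re.compile(r"a|b|a[ab]*a|b[ab]*b")
A_COUNT_MOD_3 = re.compile(r"b*(ab*ab*ab*)*")

def all_strings(max_len, alphabet="ab"):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def disagreements(pattern, predicate, max_len=8):
    """List the short strings on which the pattern and the predicate differ."""
    return [w for w in all_strings(max_len)
            if (pattern.fullmatch(w) is not None) != predicate(w)]

bad_same_ends = disagreements(SAME_ENDS, lambda w: len(w) > 0 and w[0] == w[-1])
bad_mod_3 = disagreements(A_COUNT_MOD_3, lambda w: w.count("a") % 3 == 0)
print(bad_same_ends, bad_mod_3)
```

An empty list is evidence, not proof, that the expression is right, but a non-empty list gives a concrete counterexample to study.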

Regular Expression: Phone Numbers
What are valid (US) phone numbers?
  1-215-200-1091
  2152001091
  1.215.2001091
Should have 10 digits, with an optional 1 at the beginning
Can optionally be split into three blocks, with . or - as separators
  ( 1 ∪ 1. ∪ 1- ∪ ε ) D D D ( . ∪ - ∪ ε ) D D D ( . ∪ - ∪ ε ) D D D D
  where D stands for a digit: (0 ∪ 1 ∪ 2 … ∪ 9)
This allows 215.200-1091
Write a reg-ex that disallows this (i.e. . and - are not both used in the same number)
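For experimentation, the phone-number expression can be transcribed into Python's `re` syntax, where an empty alternative plays the role of the empty-string option and `[0-9]` stands for D. This is only a transcription of the expression as given, so it deliberately still accepts mixed separators; `PHONE` and `is_valid_phone` are names invented for the sketch.

```python
import re

# (1 | 1. | 1- | empty) DDD (. | - | empty) DDD (. | - | empty) DDDD
PHONE = re.compile(r"(1|1\.|1-|)[0-9]{3}(\.|-|)[0-9]{3}(\.|-|)[0-9]{4}")

def is_valid_phone(text):
    """Return True if the whole string matches the phone-number pattern."""
    return PHONE.fullmatch(text) is not None

samples = ["1-215-200-1091", "2152001091", "1.215.2001091", "215.200-1091"]
results = {s: is_valid_phone(s) for s in samples}
```

Running candidate answers to the exercise against the same samples, with 215.200-1091 expected to be rejected, is a quick way to test them.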

Regular Expressions in Practice
Additional operators and abbreviations useful in practice:
  Intersection: r & r’
    Example: constraints on a legal password (should have at least 8 characters, at least one numeral, at least one capital letter …)
  Negation/complementation: ~r
  Optional use: r? means (r ∪ ε)
  Counting: D^4 means D.D.D.D
  Character ranges: [0-9], [a-p]
Note: the class of regular languages should be closed under such operators

From Regular Expressions to NFAs
Goal: given a regular expression r, construct an ε-NFA M(r) that accepts the language L(r)
Construction by induction on the structure of r
1. r equals ε
2. r equals ∅
3. r equals a
4. r equals ( r’ ): M(r) is the same as M(r’)

From Regular Expressions to NFA
5. r equals r1.r2
   Build M(r1)
   Build M(r2)
   Build M(r) from M(r1) and M(r2) using the concatenation construction
   (Figure: M(r1) followed by M(r2), connected by ε-transitions)

From Regular Expressions to NFA
6. r equals r1 ∪ r2
   Build M(r1)
   Build M(r2)
   Build M(r) from M(r1) and M(r2) by adding a new initial state
   (Figure: new initial state with ε-transitions to M(r1) and M(r2))

From Regular Expressions to NFA
7. r equals r’*
   Build M(r’)
   Apply the Kleene-* construction
   (Figure: M(r’) with added ε-transitions)

Example Translation
Translate (a b)* (a ∪ ε) by building automata for its subexpressions in turn:
  (a b)
  (a ∪ ε)
  (a b)*
  (a b)* (a ∪ ε)
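The inductive translation can be sketched in code as follows. This is one common variant in which every sub-automaton has a single start state and a single accept state, so the wiring may differ in detail from the diagrams in the slides. Regular expressions are encoded as nested tuples with made-up tags ("eps", "empty", "sym", "cat", "union", "star"), edges are (source, label, target) triples, and a label of None marks an ε-transition.

```python
from itertools import count

_fresh = count()  # supplies unused state numbers

def build(r):
    """Return (start, accept, edges) of an epsilon-NFA accepting L(r)."""
    kind = r[0]
    if kind == "cat":
        s1, f1, e1 = build(r[1])
        s2, f2, e2 = build(r[2])
        return s1, f2, e1 + e2 + [(f1, None, s2)]
    s, f = next(_fresh), next(_fresh)
    if kind == "eps":
        return s, f, [(s, None, f)]
    if kind == "empty":
        return s, f, []
    if kind == "sym":
        return s, f, [(s, r[1], f)]
    if kind == "union":
        s1, f1, e1 = build(r[1])
        s2, f2, e2 = build(r[2])
        extra = [(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)]
        return s, f, e1 + e2 + extra
    if kind == "star":
        s1, f1, e1 = build(r[1])
        # Skip the body entirely, enter it, repeat it, or leave it.
        extra = [(s, None, f), (s, None, s1), (f1, None, s1), (f1, None, f)]
        return s, f, e1 + extra
    raise ValueError(f"unknown regular expression tag: {kind!r}")

def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w):
    """Translate r to an epsilon-NFA and simulate it on w."""
    start, accept, edges = build(r)
    current = eps_closure({start}, edges)
    for symbol in w:
        step = {dst for src, label, dst in edges
                if src in current and label == symbol}
        current = eps_closure(step, edges)
    return accept in current

# (a b)* (a U eps), the expression from the example translation.
ab_star_opt_a = ("cat",
                 ("star", ("cat", ("sym", "a"), ("sym", "b"))),
                 ("union", ("sym", "a"), ("eps",)))
```

For the example expression, `accepts` holds for the empty string, a, ab, aba, and abab, and fails for strings such as b, aa, and ba.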