LING/C SC/PSYC 438/538 Lecture 18 Sandiway Fong.



Today's Topics
Reminder: no class on Tuesday (out of town at a meeting)
Homework 7: due date next Wednesday night
Efficient string matching (Knuth-Morris-Pratt algorithm)
Pumping Lemma for regular languages

Homework 7
Question 1: convert this 4-state NDFSA into a deterministic FSA. How many states does the DFSA have?
Σ = {0, 1}; double circle = final state; initial state = A; comma = disjunction

Homework 7
Question 2: Find a regex for this machine (show your intermediate steps)

Homework 7
Question 3: implement the DFSA in Perl or Python
Question 4: implement the regex in Perl or Python
Test your code on possible Σ* inputs: 0 1 00 01 10 11 000 001 010 011 100 101 110 111 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 etc.
Question 5: based on your testing, can you succinctly describe in English the language accepted?
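The test inputs listed above can be generated rather than typed by hand. The following is a minimal sketch in Python; the function name sigma_star and the max_len cutoff are illustrative choices, not part of the homework.

```python
from itertools import product

def sigma_star(alphabet="01", max_len=4):
    """Yield every non-empty string over `alphabet` up to `max_len`, shortest first."""
    for n in range(1, max_len + 1):
        # product() enumerates all length-n tuples in lexicographic order
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

inputs = list(sigma_star())
print(" ".join(inputs))
```

Like the list on the slide, this starts at length 1; the empty string is also a member of Σ* and may be worth testing separately.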

Applications of FSA
Let’s take a look at one application of regular language technology: Efficient String Matching

String Matching
Example: Pattern (P): aaba, Corpus (C): aaabbaabab
Naïve algorithm: compare pattern P against corpus C from left to right, shifting one character at a time
C: aaabbaabab
P: aaba
P: _aaba
P: __aaba
P: ___aaba
P: ____aaba
P: _____aaba Matched!
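The naïve algorithm can be sketched in a few lines of Python. This is an illustrative version, not code from the lecture; the function name naive_match and the comparison counter are assumptions added to make the cost visible.

```python
def naive_match(pattern, corpus):
    """Return (shift, comparisons) for the first match, or (-1, comparisons)."""
    comparisons = 0
    for shift in range(len(corpus) - len(pattern) + 1):
        for k, ch in enumerate(pattern):
            comparisons += 1
            if corpus[shift + k] != ch:
                break  # mismatch: slide the pattern one position to the right
        else:
            return shift, comparisons  # every pattern character matched
    return -1, comparisons

shift, comparisons = naive_match("aaba", "aaabbaabab")
print(shift, comparisons)
```

On the example above the match is found at shift 5, the same alignment as the last P line, and characters of the corpus are re-examined after each failed alignment.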

Knuth-Morris-Pratt (KMP)
Example Pattern: aaba
We can do better (i.e. use fewer comparisons) if we use an FSA to represent the Pattern, plus extra links for restarts in the event of a failure.
All backpointers are restarts for the failed character.
[Figure: FSA for the pattern, states 1 2 3 4, forward arcs labeled a and b, backpointers labeled [^a] and [^b]]
Suppose the alphabet Σ = {a,b}; then restarts can be for the character following the failed character.
[Figure: the same FSA, states 1 2 3 4, with every arc labeled a or b]

Knuth-Morris-Pratt (KMP)
Example: Pattern: aaba, Corpus (Σ = {a,b}): aaabbaabab
[Figure: FSA for the pattern, states 1 2 3 4, arcs labeled a and b]
KMP:
Corpus: a a a b b a a b a b
State: 0 1 2 2 3 0 1 2 3 4
Other possible algorithms: e.g. Boyer-Moore (start match from the back of the pattern)
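The state sequence above can be reproduced with a small Python sketch. It uses the textbook DFA formulation of KMP, which may differ in detail from the machine drawn on the slide; the names build_dfa and kmp_trace are hypothetical. It also assumes the State row is read as the start state 0 followed by the state after each character, with the run stopping at the first match.

```python
def build_dfa(pattern, alphabet):
    """Build the KMP transition table: dfa[state][char] -> next state."""
    m = len(pattern)
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached by the pattern with its first character dropped
    for j in range(1, m):
        for c in alphabet:
            dfa[j][c] = dfa[restart][c]      # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1           # match: advance
        restart = dfa[restart][pattern[j]]
    return dfa

def kmp_trace(pattern, corpus, alphabet="ab"):
    """Return the list of states visited, stopping at the first full match."""
    dfa = build_dfa(pattern, alphabet)
    state, trace = 0, [0]
    for ch in corpus:
        state = dfa[state][ch]
        trace.append(state)
        if state == len(pattern):
            break
    return trace

print(kmp_trace("aaba", "aaabbaabab"))
```

Each corpus character is read exactly once, so the number of steps is bounded by the corpus length rather than by corpus length times pattern length.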

Beyond Regular Languages
aⁿbⁿ = {ab, aabb, aaabbb, aaaabbbb, ...}, n ≥ 1, is not a regular language
That means no FSA, RE (or Regular Grammar) can be built for this set
Informally, let’s think about an FSA implementation …
We only have a finite number of states to play with …
We’re only allowed simple free iteration (looping)

Beyond Regular Languages
L = a⁺b⁺
[Figure: FSA with states s, x, y and arcs labeled a and b]
Having a frequency table is not permitted.
Not allowed: %freq = (); … $freq{$state}++;
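To make the point concrete, here is a minimal Python sketch of a three-state machine for a⁺b⁺. The state names s, x, y follow the slide, but the exact arcs are an assumption, since the diagram is not reproduced here. The machine's only memory is its current state, so it cannot compare the number of a's with the number of b's.

```python
# Assumed transition table for a+b+ (s = start, y = final)
TRANSITIONS = {
    ("s", "a"): "x",
    ("x", "a"): "x",  # loop: any number of further a's
    ("x", "b"): "y",
    ("y", "b"): "y",  # loop: any number of further b's
}

def accepts(string):
    """Run the FSA; no counters or frequency tables, only the current state."""
    state = "s"
    for ch in string:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no arc for this character: reject
            return False
    return state == "y"

print(accepts("aabb"), accepts("aaab"), accepts("ba"))
```

Both aabb and aaab are accepted: the loops iterate freely, which is exactly why this machine recognizes a⁺b⁺ and not the equal-count language.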

A Formal Tool: The Pumping Lemma
[See also discussion in JM 16.2.1, pages 533–534]
Let L be a regular language. Then there exists a number p > 0, where p is a pumping length (sometimes called a magic number), such that every string w in L with |w| ≥ p can be written in the form w = xyz, with strings x, y and z such that |xy| ≤ p, |y| > 0, and xyⁱz is in L for every integer i ≥ 0.
BTW: there is also a pumping lemma for Context-Free Languages

A Formal Tool: The Pumping Lemma
Restated: For every (sufficiently long) string w in a regular language there is always a way to split the string into three adjacent sections, call them x, y and z (y nonempty), i.e. w is x followed by y followed by z
And y can be repeated as many times as we like (or omitted)
And the modified string is still a member of the language
Essential Point! To prove a language is non-regular: show that no matter how we split the string, there will be modified strings that can't be in the language

A Formal Tool: The Pumping Lemma
Example: show that aⁿbⁿ is not regular
Proof (by contradiction):
pick a sufficiently long string in the language, e.g. a..aab..bb (#a’s = #b’s)
Partition it according to w = xyz
then show xyⁱz is not in L for some i, i.e. the string does not pump

A Formal Tool: The Pumping Lemma
aaaa..aabbbb..bb
Case 1: w = xyz, y straddles the ab boundary: what happens when we pump y?
Case 2: w = xyz, y is wholly within the a’s: what happens when we pump y?
Case 3: w = xyz, y is wholly within the b’s: what happens when we pump y?
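The three cases can be checked by brute force on a small instance. The sketch below is illustrative only, not a proof: it tries every split of one string and a few values of i. All function names are assumptions, and a⁺b⁺ is included purely as a contrasting regular language.

```python
import re

def in_anbn(s):
    """Membership test for a^n b^n with n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def in_a_plus_b_plus(s):
    """Membership test for the regular language a+b+."""
    return re.fullmatch(r"a+b+", s) is not None

def splits(w, p):
    """Yield every (x, y, z) with w = xyz, |xy| <= p and y nonempty."""
    for xy_len in range(1, min(p, len(w)) + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]

def unpumpable(w, p, member, max_i=3):
    """True if every split has some i in 0..max_i with x y^i z outside the language."""
    return all(
        any(not member(x + y * i + z) for i in range(max_i + 1))
        for x, y, z in splits(w, p)
    )

w = "aaaaabbbbb"
print(unpumpable(w, 5, in_anbn))        # |xy| <= 5: y lies wholly within the a's
print(unpumpable(w, len(w), in_anbn))   # no length limit: all three cases are covered
print(unpumpable("aabb", 4, in_a_plus_b_plus))
```

For aⁿbⁿ every split fails to pump, whichever case it falls under, whereas the a⁺b⁺ string has at least one split that pumps for the values of i tried.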

Perl regex aⁿbⁿ
With Perl regex we could use (?{…Perl code...})

A Formal Tool: The Pumping Lemma
Prime number testing using Perl’s extended “regular expressions”
Using unary notation, e.g. 5 = “11111”
/^(11+?)\1+$/ will match anything that’s greater than 1 that’s not prime
L = {1ⁿ | n is prime} is not a regular language
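The same pattern can be tried in Python, whose re module also supports backreferences and lazy quantifiers. This is a sketch assuming re.fullmatch stands in for the ^...$ anchors; the helper name is_composite_unary is hypothetical.

```python
import re

# (11+?) captures a block of two or more 1s; \1+ requires one or more further copies.
# A full match therefore means n = block_length * copies with both factors >= 2.
COMPOSITE = re.compile(r"(11+?)\1+")

def is_composite_unary(n):
    """True if the unary string 1^n matches, i.e. n > 1 and n is not prime."""
    return COMPOSITE.fullmatch("1" * n) is not None

primes = [n for n in range(2, 30) if not is_composite_unary(n)]
print(primes)
```

The backreference \1 is what takes this pattern beyond regular languages: it forces later parts of the string to repeat an earlier, arbitrarily long part.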

A Formal Tool: The Pumping Lemma
1ⁿ = 111..1111..11111 such that n is a prime number, split as x y z
For any split of the string:
Pump y with i = length(x+z) = |xz|, giving yⁱ
What is the length of the pumped string xyⁱz now?
In xyⁱz with i = |xz|, how many copies' worth of xz do we have? Answer: |y|+1, i.e. the pumped length |xz| + |y|·|xz| can be factorized into (1+|y|)|xz|
The resulting length is non-prime since it can be factorized (both factors exceed 1 provided |xz| ≥ 2, which holds once n ≥ p + 2, because |xy| ≤ p forces |z| ≥ 2)

A Formal Tool: The Pumping Lemma
1ⁿ = 111..1111..11111 such that n is a prime number, split as x y z
Illustration of the calculation:
1111 1111 111 (eleven)
1111 1111 1111 1111 1111 1111 1111 1111 111
4 + 4*7 + 3 = 5*7 which isn't prime
Another look:
1111 111 1111 (re-arrange eleven)
1111 111 1111 1111 1111 1111 111 111 111 111 (make 4 bundles of 4; 4 bundles of 3)
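The arithmetic in the illustration can be verified for every split of eleven, not just 4 + 4 + 3. A minimal Python sketch follows; the helper names pumped_length and is_prime are illustrative.

```python
def pumped_length(x_len, y_len, z_len):
    """Length of x y^i z when i is chosen as |xz|."""
    i = x_len + z_len
    return x_len + i * y_len + z_len

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

n = 11
for x_len in range(n):
    for y_len in range(1, n - x_len + 1):
        z_len = n - x_len - y_len
        xz = x_len + z_len
        length = pumped_length(x_len, y_len, z_len)
        # the pumped length always factorizes as (1 + |y|) * |xz|
        assert length == (1 + y_len) * xz
        # with |xz| >= 2 both factors exceed 1, so the length is composite
        if xz >= 2:
            assert not is_prime(length)

print(pumped_length(4, 4, 3))
```

The split used in the illustration gives 35 = 5*7; the loop confirms that the same factorization holds for every other split of eleven with |xz| ≥ 2.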

A Formal Tool: The Pumping Lemma
Another angle: to reduce the mystery, let's think in terms of FSA.
We know:
we can't control the loops
we are restricted to a finite number of states
assume (without loss of generality) there are no ε-transitions
Suppose there are a total of p states in the machine
Suppose we have a string in the language longer than p
What can we conclude?
Answer: we must have visited some state(s) more than once!
Also: there must be a loop (or loops) in the machine!
Also: we can repeat or skip that loop and stay inside the language!
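The state-repetition argument can be sketched directly in Python: run a DFA, stop at the first repeated state, and read off x, y and z from the positions of the two visits. The machine below, a complete four-state DFA for a⁺b⁺ with an added dead state, and the function names are assumptions chosen for illustration.

```python
# Assumed complete DFA for a+b+ (s = start, y = final, dead = trap state)
DFA = {
    ("s", "a"): "x", ("s", "b"): "dead",
    ("x", "a"): "x", ("x", "b"): "y",
    ("y", "a"): "dead", ("y", "b"): "y",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START, FINAL = "s", {"y"}

def run(string):
    """Return the list of states visited, including the start state."""
    states = [START]
    for ch in string:
        states.append(DFA[(states[-1], ch)])
    return states

def accepts(string):
    return run(string)[-1] in FINAL

def pump_split(w):
    """Split w at the first repeated state: y is the substring read around the loop."""
    first_visit = {}
    for pos, state in enumerate(run(w)):
        if state in first_visit:
            start = first_visit[state]
            return w[:start], w[start:pos], w[pos:]
        first_visit[state] = pos
    return None  # no state repeats: w is shorter than the number of states

x, y, z = pump_split("aabb")
print(x, y, z, all(accepts(x + y * i + z) for i in range(5)))
```

Because the repeat must occur within the first p characters, where p is the number of states, the split automatically satisfies |xy| ≤ p and |y| > 0, matching the conditions in the lemma.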