LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong. Adminstrivia.

Slides:



Advertisements
Similar presentations
LING/C SC/PSYC 438/538 Lecture 11 Sandiway Fong. Administrivia Homework 3 graded.
Advertisements

LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong. Administrivia We'll postpone Homework 4 review until next week …
LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29.
LING/C SC/PSYC 438/538 Lecture 4 9/1 Sandiway Fong.
CS5371 Theory of Computation Lecture 5: Automata Theory III (Non-regular Language, Pumping Lemma, Regular Expression)
Chapter 2: Algorithm Discovery and Design
Aho-Corasick String Matching An Efficient String Matching.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 12: 10/5.
Lecture 3 Aug 31, 2011 Goals: Chapter 2 (algorithm analysis) Examples: Selection sorting rules for algorithm analysis discussion of lab – permutation generation.
1 More Applications of the Pumping Lemma. 2 The Pumping Lemma: Given a infinite regular language there exists an integer for any string with length we.
LING 388: Language and Computers Sandiway Fong Lecture 10: 9/26.
Modern Information Retrieval Chapter 4 Query Languages.
Foundations of (Theoretical) Computer Science Chapter 1 Lecture Notes (Section 1.4: Nonregular Languages) David Martin With some.
Chapter 2: Algorithm Discovery and Design
Chapter 2: Algorithm Discovery and Design
LING/C SC/PSYC 438/538 Lecture 5 9/8 Sandiway Fong.
Regex Wildcards on steroids. Regular Expressions You’ve likely used the wildcard in windows search or coding (*), regular expressions take this to the.
Do Now 12/7/10 Take out HW from last night. Copy HW in your planner.
CS324e - Elements of Graphics and Visualization Java Intro / Review.
Methods in Computational Linguistics II with reference to Matt Huenerfauth’s Language Technology material Lecture 4: Matching Things. Regular Expressions.
The Golden Chain εNFA  NFA  DFA  REGEX. Regular Expressions.
9/23/2015Slide 1 Published reports of research usually contain a section which describes key characteristics of the sample included in the study. The “key”
Chapter 2: Algorithm Discovery and Design Invitation to Computer Science, C++ Version, Third Edition.
Invitation to Computer Science, Java Version, Second Edition.
LING/C SC/PSYC 438/538 Lecture 4 Sandiway Fong. Continuing with Perl Homework 3: first Perl homework – due Sunday by midnight – one PDF file, by .
Perl and Regular Expressions Regular Expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript,
LING/C SC/PSYC 438/538 Lecture 7 9/15 Sandiway Fong.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
LING/C SC/PSYC 438/538 Lecture 12 10/4 Sandiway Fong.
Regular Expressions for PHP Adding magic to your programming. Geoffrey Dunn
Regular Expressions in Perl CS/BIO 271 – Introduction to Bioinformatics.
©Brooks/Cole, 2001 Chapter 9 Regular Expressions.
LING/C SC/PSYC 438/538 Lecture 13 Sandiway Fong. Administrivia Reading Homework – Chapter 3 of JM: Words and Transducers.
LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong. Adminstrivia Homework 4 not yet graded …
LING/C SC/PSYC 438/538 Lecture 14 Sandiway Fong. Administrivia Homework 6 graded.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
CS355 - Theory of Computation Regular Expressions.
CS 203: Introduction to Formal Languages and Automata
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 5 – Regular Expressions, grep, Other Utilities.
CSCI 3130: Formal languages and automata theory Tutorial 3 Chin.
1 Applications of pumping lemma(Dr. Torng) Applications of Pumping Lemma –General proof template What is the same in every proof What changes in every.
Models of Computing Regular Expressions 1. Formal models of computation What can be computed? What is a valid program? What is a valid name of a variable.
LING/C SC/PSYC 438/538 Lecture 6 Sandiway Fong. Homework 4 Submit one PDF file Your submission should include code and sample runs Due date Monday 21.
INVITATION TO Computer Science 1 11 Chapter 2 The Algorithmic Foundations of Computer Science.
LING/C SC/PSYC 438/538 Lecture 9 Sandiway Fong. Adminstrivia Homework 4 graded Homework 5 out today – Due Saturday night by midnight – (Gives me Sunday.
LING/C SC/PSYC 438/538 Lecture 10 Sandiway Fong. Today's Topics A note on the UIUC POS Tagger Fun with POS Tagging Perl regex wrap-up.
LING/C SC/PSYC 438/538 Online Lecture 7 Sandiway Fong.
Chapter 2: Algorithm Discovery and Design Invitation to Computer Science.
MA/CSSE 474 Theory of Computation How many regular/non-regular languages are there? Closure properties of Regular Languages (if there is time) Pumping.
Regular Expressions In Javascript cosc What Do They Do? Does pattern matching on text We use the term “string” to indicate the text that the regular.
Parallel embedded system design lab 이청용 Chapter 2 (2.6~2.7)
LING/C SC/PSYC 438/538 Lecture 11 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 10 Sandiway Fong.
Grep Allows you to filter text based upon several different regular expression variants Basic Extended Perl.
LING/C SC/PSYC 438/538 Lecture 7 Sandiway Fong.
Theory of Computation Languages.
LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong.
Regular Expressions
Algorithm Discovery and Design
LING/C SC 581: Advanced Computational Linguistics
Building Java Programs
LING/C SC/PSYC 438/538 Lecture 15 Sandiway Fong.
Non-Regular Languages
CS 461 – Sept. 16 Review Pumping lemma Applications of FA:
LING/C SC/PSYC 438/538 Lecture 18 Sandiway Fong.
CSE 311: Foundations of Computing
LING/C SC/PSYC 438/538 Lecture 13 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 11 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong.
Presentation transcript:

LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong

Adminstrivia

More practice … Programming example: Let's write a simple program to count word frequencies in a text corpus – # different words – # words – top 100 ranked words in terms of frequency – Zipf's Law: freq is inversely proportional to rank We'll use your homework corpus WSJ9_001e.txt

More practice … See: – WordAndLetterFrequencies/ WordAndLetterFrequencies/ – Brown corpus:

Character Frequency Counting Sample code is rather interesting: More verbose but easier to read perhaps:

Character Frequency Counting More verbose but easier to read perhaps:

Character Frequency Counting Output for: – "This is a slightly simplified version of a rather complicated piece Perl code."

Prime Number Testing using Perl Regular Expressions Another example: – the set of prime numbers is not a regular language – L prime = {2, 3, 5, 7, 11, 13, 17, 19, 23,.. } Turns out, we can use a Perl regex to determine membership in this set.. and to factorize numbers Turns out, we can use a Perl regex to determine membership in this set.. and to factorize numbers /^(11+?)\1+$/

Prime Number Testing using Perl Regular Expressions L = {1 n | n is prime} is not a regular language Keys to making this work: \1 backreference unary notation for representing numbers, e.g. – “five ones” = 5 – “six ones” = 6 unary notation allows us to factorize numbers by repetitive pattern matching – (11)(11)(11) “six ones” = 6 – (111)(111) “six ones” = 6 numbers that can be factorized in this way aren’t prime – no way to get nontrivial subcopies of “five ones” = 5 Then /^(11+?)\1+$/ will match anything that’s greater than 1 that’s not prime can be proved using the Pumping Lemma for regular languages (later) can be proved using the Pumping Lemma for regular languages (later)

Prime Number Testing using Perl Regular Expressions Let’s analyze this Perl regex /^(11+?)\1+$/ ^ and $ anchor both ends of the strings, forces (11+?)\1+ to cover the string exactly (11+?) is non-greedy match version of (11+) \1+ provides one or more copies of what we matched in (11+?) Question: is the non-greedy operator necessary? Question: is the non-greedy operator necessary?

Prime Number Testing using Perl Regular Expressions Compare /^(11+?)\1+$/ with /^(11+)\1+$/ i.e. non-greedy vs. greedy matching finds smallest factor vs. largest –90021 factored using 3, not a prime (0 secs) vs. –90021 factored using 30007, not a prime (0 secs) affects computational efficiency for non-primes Puzzling behavior: same output non-greedy vs. greedy factored using , not a prime (48 secs vs. 13 secs) Puzzling behavior: same output non-greedy vs. greedy factored using , not a prime (48 secs vs. 13 secs)

Prime Number Testing using Perl Regular Expressions is of formal (i.e. geeky) interest only…

Prime Number Testing using Perl Regular Expressions testing with prime numbers only can take a lot of time to compute … Prime Numbers

Prime Number Testing using Perl Regular Expressions /^(11+?)\1+$/ vs. /^(11+)\1+$/ i.e. non-greedy vs. greedy matching finds smallest factor vs. largest –90021 factored using 3, not a prime (0 secs) vs. –90021 factored using 30007, not a prime (0 secs) Puzzling behavior: same output non-greedy vs. greedy factored using , not a prime (48 secs vs. 13 secs) Puzzling behavior: same output non-greedy vs. greedy factored using , not a prime (48 secs vs. 13 secs)

Prime Number Testing using Perl Regular Expressions nearest primes to preset limit * *32771 = = 98313

Prime Number Testing using Perl Regular Expressions When preset limit is exceeded: Perl’s regex matching fails quietly

Prime Number Testing using Perl Regular Expressions Can also get non-greedy to skip several factors Example: pick non-prime = 3 x 5 x (prime factorization) Non-greedy: missed factors 3 and 5 … Because 3 * = * = limit 15 * = Because 3 * = * = limit 15 * = greedy version

Prime Number Testing using Perl Regular Expressions Results are still right so far though: – wrt. prime vs. non-prime But we predict it will report an incorrect result for – 1,070,009,521 – it should claim (incorrectly) that this is prime – since =