Published by Denis Carter. Modified over 9 years ago.
1
LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong
2
Administrivia
3
More practice …
Programming example: let's write a simple program to count word frequencies in a text corpus
– # different words
– # words
– top 100 ranked words in terms of frequency
– Zipf's Law: freq is inversely proportional to rank
We'll use your homework corpus WSJ9_001e.txt
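The counting task listed on this slide can be sketched as follows. This is a minimal Python sketch, not the program shown in class: the function names `word_stats` and `zipf_table` are hypothetical, and the tokenization rule (lowercased runs of letters and apostrophes) is an assumption that the slides do not specify.

```python
import re
from collections import Counter

def word_stats(text, top_n=100):
    """Return (number of different words, number of words, top_n ranked words)."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return len(counts), len(words), counts.most_common(top_n)

def zipf_table(ranked):
    """Attach rank and rank*freq to each (word, freq) pair.

    If freq is inversely proportional to rank, rank*freq should stay
    roughly constant down the table.
    """
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    distinct, total, ranked = word_stats(sample)
    print(distinct, total)
    for row in zipf_table(ranked):
        print(row)
```

To try it on the homework corpus, pass the contents of WSJ9_001e.txt to `word_stats` (for example via `open("WSJ9_001e.txt").read()`, assuming the file is in the working directory and readable in the default encoding).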
4
More practice …
See:
– http://demonstrations.wolfram.com/ZipfsLawAppliedToWordAndLetterFrequencies/
– Brown corpus: http://finnaarupnielsen.wordpress.com/
5
Character Frequency Counting
Sample code is rather interesting:
More verbose but easier to read perhaps:
6
Character Frequency Counting
More verbose but easier to read perhaps:
7
Character Frequency Counting
Output for:
– "This is a slightly simplified version of a rather complicated piece Perl code."
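The Perl code from these slides is not preserved in the text, so here is a minimal Python sketch of character frequency counting applied to the same sentence. The function name `char_frequencies` is hypothetical, and folding case and skipping non-letters are assumptions; the original code may have counted differently.

```python
from collections import Counter

def char_frequencies(text):
    """Count each letter in text, ignoring case and non-letter characters."""
    # Assumption: only alphabetic characters are counted, case-insensitively.
    return Counter(ch for ch in text.lower() if ch.isalpha())

if __name__ == "__main__":
    sentence = ("This is a slightly simplified version of a rather "
                "complicated piece Perl code.")
    freqs = char_frequencies(sentence)
    # Most frequent letters first, ties broken alphabetically.
    for ch, n in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        print(ch, n)
```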
8
Prime Number Testing using Perl Regular Expressions
Another example:
– the set of prime numbers is not a regular language
– L_prime = {2, 3, 5, 7, 11, 13, 17, 19, 23, …}
Turns out, we can use a Perl regex to determine membership in this set … and to factorize numbers
/^(11+?)\1+$/
9
Prime Number Testing using Perl Regular Expressions
L = {1^n | n is prime} is not a regular language
– can be proved using the Pumping Lemma for regular languages (later)
Keys to making this work:
– \1 backreference
– unary notation for representing numbers, e.g.
  – 11111 “five ones” = 5
  – 111111 “six ones” = 6
– unary notation allows us to factorize numbers by repetitive pattern matching
  – (11)(11)(11) “six ones” = 6
  – (111)(111) “six ones” = 6
– numbers that can be factorized in this way aren’t prime
  – no way to get nontrivial subcopies of 11111 “five ones” = 5
Then /^(11+?)\1+$/ will match any number greater than 1 that’s not prime
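The unary trick above can be tried directly. The sketch below uses Python rather than Perl (Python's `re` module supports the same `\1` backreference and `+?` lazy quantifier); the helper name `is_prime_unary` is hypothetical.

```python
import re

# Same pattern as on the slide: a group of two or more ones,
# followed by one or more further copies of that group.
COMPOSITE = re.compile(r"^(11+?)\1+$")

def is_prime_unary(n):
    """Return True if n is prime, judged by the regex failing on n ones."""
    if n < 2:
        return False  # 0 and 1 are not prime (the regex does not match them either)
    # A match means n splits into equal blocks of size >= 2, so n is composite.
    return COMPOSITE.match("1" * n) is None

if __name__ == "__main__":
    print([n for n in range(2, 30) if is_prime_unary(n)])
```

Note that the regex recognizes the non-primes; primality is reported when the match fails.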
10
Prime Number Testing using Perl Regular Expressions
Let’s analyze this Perl regex: /^(11+?)\1+$/
– ^ and $ anchor both ends of the string, forcing (11+?)\1+ to cover the string exactly
– (11+?) is the non-greedy version of (11+)
– \1+ provides one or more copies of what we matched in (11+?)
Question: is the non-greedy operator necessary?
11
Prime Number Testing using Perl Regular Expressions
Compare /^(11+?)\1+$/ with /^(11+)\1+$/, i.e. non-greedy vs. greedy matching
– finds smallest factor vs. largest
  – 90021 factored using 3, not a prime (0 secs) vs.
  – 90021 factored using 30007, not a prime (0 secs)
– affects computational efficiency for non-primes
Puzzling behavior: same output non-greedy vs. greedy
– 900021 factored using 300007, not a prime (48 secs vs. 13 secs)
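The smallest-vs-largest factor behavior can be observed by reading the length of the captured group. This is a Python sketch, not the Perl program behind the slide; the helper name `factor_found` is hypothetical, and timings in Python will differ from the ones quoted above.

```python
import re

NON_GREEDY = re.compile(r"^(11+?)\1+$")  # grows the group from 2 upward
GREEDY = re.compile(r"^(11+)\1+$")       # shrinks the group from n downward

def factor_found(pattern, n):
    """Return the length of group 1 (a factor of n) if pattern matches n ones, else None."""
    m = pattern.match("1" * n)
    return len(m.group(1)) if m else None

if __name__ == "__main__":
    for n in (21, 35, 13):
        print(n, factor_found(NON_GREEDY, n), factor_found(GREEDY, n))
```

Both patterns agree on whether a match exists, which bears on the question of whether the non-greedy operator is necessary for the prime vs. non-prime verdict; they differ only in which factor is captured.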
12
Prime Number Testing using Perl Regular Expressions is of formal (i.e. geeky) interest only…
13
Prime Number Testing using Perl Regular Expressions
testing with prime numbers only
– can take a lot of time to compute …
Prime Numbers: 100003 200003 300007 400009 500009 600011 700001 800011 900001 1000003 1100009 1200007 1300021 1400017 1500007
14
Prime Number Testing using Perl Regular Expressions
/^(11+?)\1+$/ vs. /^(11+)\1+$/, i.e. non-greedy vs. greedy matching
– finds smallest factor vs. largest
  – 90021 factored using 3, not a prime (0 secs) vs.
  – 90021 factored using 30007, not a prime (0 secs)
Puzzling behavior: same output non-greedy vs. greedy
– 900021 factored using 300007, not a prime (48 secs vs. 13 secs)
15
Prime Number Testing using Perl Regular Expressions
http://www.xav.com/perl/lib/Pod/perlre.html
preset limit: 32766
nearest primes to preset limit: 32749 and 32771
– 3*32749 = 98247
– 3*32771 = 98313
16
Prime Number Testing using Perl Regular Expressions
When preset limit is exceeded:
– Perl’s regex matching fails quietly
17
Prime Number Testing using Perl Regular Expressions
Can also get non-greedy to skip several factors
Example: pick non-prime 164055 = 3 x 5 x 10937 (prime factorization)
Non-greedy: missed factors 3 and 5 …
Because:
– 3 * 54685 = 164055
– 5 * 32811 = 164055
(32766 limit)
– 15 * 10937 = 164055
greedy version
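The skipped factors can be reproduced with plain arithmetic. The sketch below is a model, not Perl itself: it assumes the 32766 limit caps the number of copies that \1+ may match (that is, n / k - 1 copies for a group of length k), and the function name `factor_with_repeat_limit` is hypothetical.

```python
def factor_with_repeat_limit(n, limit=32766, greedy=False):
    """Model which factor the regex would report for n ones under a repeat cap.

    Assumption: a group of length k only succeeds if k divides n and the
    backreference needs at most `limit` copies (n // k - 1 of them).
    Non-greedy tries k = 2, 3, ...; greedy tries the largest k first.
    """
    candidates = range(2, n // 2 + 1)
    if greedy:
        candidates = reversed(candidates)
    for k in candidates:
        if n % k == 0 and n // k - 1 <= limit:
            return k
    return None  # the regex would report "prime", rightly or wrongly

if __name__ == "__main__":
    for n in (90021, 900021, 164055):
        print(n, factor_with_repeat_limit(n),
              factor_with_repeat_limit(n, greedy=True))
```

Under this assumption the model agrees with the outputs quoted on the slides: 3 for 90021, 300007 for 900021 in both modes, and 15 for 164055 in non-greedy mode.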
18
Prime Number Testing using Perl Regular Expressions
Results are still right so far though:
– wrt. prime vs. non-prime
But we predict it will report an incorrect result for
– 1,070,009,521
– it should claim (incorrectly) that this is prime
– since 1070009521 = 32711²