LING/C SC/PSYC 438/538 Lecture 13 Sandiway Fong.

Today's Topics
- Homework 5: regex
- Review of ungraded homework exercises
- More Perl regex

Homework 5
File: hw5.pos
Source: Penn Treebank POS (Part-of-Speech) tagged sentences

Homework 5
You may use either Perl or Python, e.g. for one-line answers:
perl -ne 'code END { code }' hw5.pos
Write a regular expression to extract all the proper nouns (POS tag: NNP singular or NNPS plural).
Hint: you may wish to print the nouns out to debug your regex…
- Question 1: How many proper nouns (tokens) are there?
- Question 2: How many different proper nouns (types) are there?
- Question 3: How many different plural proper nouns (types) are there?
- Question 4: What is the most frequent proper noun and its frequency?
- Question 5: What is the most frequent plural proper noun and its frequency?
- Question 6: Print the top 5 proper nouns and their frequencies.
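Since Python is allowed, the token/type distinction in the questions above can be sketched as follows. This is only an illustration, not the intended solution: it assumes each token in the file is written as word/TAG separated by whitespace, which the slides do not confirm for hw5.pos, and the sample sentences and the name proper_noun_counts are made up.

```python
import re
from collections import Counter

# Assumption: tokens look like word/TAG, separated by whitespace.
# The lookahead stops NNP from matching a longer or compound tag.
PROPER_NOUN = re.compile(r"(\S+)/(NNPS?)(?=\s|$)")

def proper_noun_counts(lines):
    """Return two Counters: all proper nouns, and plural (NNPS) ones only."""
    all_nouns, plural_nouns = Counter(), Counter()
    for line in lines:
        for word, tag in PROPER_NOUN.findall(line):
            all_nouns[word] += 1
            if tag == "NNPS":
                plural_nouns[word] += 1
    return all_nouns, plural_nouns

# Made-up sample in the assumed format (not taken from hw5.pos).
sample = [
    "Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, joined/VBD the/DT board/NN ./.",
    "Mr./NNP Vinken/NNP met/VBD the/DT Democrats/NNPS and/CC the/DT Yankees/NNPS ./.",
]
all_nouns, plural_nouns = proper_noun_counts(sample)
print("tokens:", sum(all_nouns.values()))   # every occurrence counts
print("types:", len(all_nouns))             # distinct nouns only
print("top 5:", all_nouns.most_common(5))
```

To run it on a real file, pass an open file object instead of the sample list, e.g. proper_noun_counts(open("hw5.pos")), after checking that the file really uses the assumed format.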

Homework 5 Extra Credit
Question 7: Print out the frequency table for determiners (POS tag: DT) in hw5.pos.
Note: be case-insensitive.
In your opinion, does this table follow Zipf's Law?
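A case-insensitive frequency table can be sketched in Python by lowercasing each word before counting. As before, the word/TAG token format is an assumption, and the sample lines and the name determiner_table are hypothetical. The last printed column, rank times frequency, would stay roughly constant if the table followed Zipf's Law.

```python
import re
from collections import Counter

# Assumption: tokens look like word/TAG; /DT must be the whole tag,
# so PDT and WDT tokens are not matched.
DETERMINER = re.compile(r"(\S+)/DT(?=\s|$)")

def determiner_table(lines):
    """Return (determiner, frequency) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        for word in DETERMINER.findall(line):
            counts[word.lower()] += 1   # "The" and "the" share one entry
    return counts.most_common()

# Made-up sample in the assumed format (not taken from hw5.pos).
sample = [
    "The/DT cat/NN saw/VBD the/DT dog/NN ./.",
    "A/DT bird/NN ate/VBD the/DT seed/NN ./.",
    "That/DT is/VBZ a/DT fact/NN ./.",
]
for rank, (word, freq) in enumerate(determiner_table(sample), start=1):
    print(rank, word, freq, rank * freq)
```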

Homework 5
Due date: next Monday midnight
Usual instructions:
- 438/538 Homework 5 Your Name
- one PDF file!

Review: Ungraded Homework Exercises

Review: Ungraded Homework Exercises
Perl one-liners: perl -ne 'm/regex/ and print "$.:$&\n"'
- -e: execute Perl code
- -n: put code inside an implicit while loop (uses the default variable $_)
- $. is the line number
- $& is the whole match
- grouping (..) and backreferences, e.g. \1
- anchors ^ and $
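For those answering in Python, a rough counterpart of the one-liner above might look like this. The function name grep_matches, the repeated-word pattern, and the sample lines are illustrative assumptions, not the course's reference answers.

```python
import re

def grep_matches(pattern, lines):
    """Yield (line_number, matched_text) for the first match on each line."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):  # number plays the role of $.
        match = regex.search(line)
        if match:
            yield number, match.group(0)            # group(0) plays the role of $&

# Made-up input; the pattern uses a group and the backreference \1
# to find a word immediately repeated.
lines = ["this is fine", "this is is repeated", "the the end"]
for number, text in grep_matches(r"\b(\w+) \1\b", lines):
    print(f"{number}:{text}")
```

The \b anchors keep the pattern from matching inside a longer word, so "this is" on its own does not count as a repeated "is".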

Review: Ungraded Homework Exercises
- #3: repeated.txt
- #4: ab.txt
- #5: integerline.txt

Zipf's Law
Zipf's Law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.
See: http://demonstrations.wolfram.com/ZipfsLawAppliedToWordAndLetterFrequencies/
Brown Corpus (1,015,945 words): only 135 words are needed to account for half the corpus.
On a log-log scale: almost a straight line
http://www.learnholistically.it/esp-clil/wfk2.htm
https://finnaarupnielsen.wordpress.com/2013/10/22/zipf-plot-for-word-counts-in-brown-corpus/
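If frequency is inversely proportional to rank, then log(frequency) = log(C) - log(rank), which is why the log-log plot is close to a straight line with slope -1. A minimal Python sketch of that check follows; the helper names rank_frequency and loglog_slope are hypothetical, and the word list is artificial data built to follow the law exactly, not a real corpus.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(w.lower() for w in words)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

def loglog_slope(table):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Artificial data with frequency = 60 / rank (an idealized Zipf distribution).
words = ["the"] * 60 + ["of"] * 30 + ["and"] * 20 + ["to"] * 15 + ["a"] * 12
table = rank_frequency(words)
for rank, word, freq in table:
    print(rank, word, freq, rank * freq)   # rank * freq is constant here
print("slope:", loglog_slope(table))       # close to -1 for Zipf-like data
```

Real corpora only approximate this, so expect a slope near -1 rather than exactly -1.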

Character Frequency Counting
Sample code is rather interesting:
- -e flag: evaluate the right-hand side as an expression
- Generally (see next slide): (?{ Perl code })
Slightly modified but easier to read:
- note: lc(..) for lowercase
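The Perl sample code itself is not preserved in this transcript. As a rough Python counterpart (a different technique, using collections.Counter rather than code embedded in a regex), character counting with lowercasing might look like this; char_frequencies and the sample sentence are illustrative assumptions.

```python
from collections import Counter

def char_frequencies(text):
    """Count letters case-insensitively, ignoring spaces, digits and punctuation."""
    # str.lower() plays the role of Perl's lc(..)
    return Counter(ch for ch in text.lower() if ch.isalpha())

# Made-up sample text.
sample = "The quick brown fox jumps over the lazy dog"
for rank, (ch, freq) in enumerate(char_frequencies(sample).most_common(5), start=1):
    print(rank, ch, freq)
```

Printing the rank next to each frequency makes it easy to eyeball whether the counts fall off roughly as 1/rank on a larger text.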

(?{ Perl code })

Character Frequency Counting Does Zipf's Law apply to character frequencies?
