LING 388: Computers and Language


LING 388: Computers and Language Lecture 15

Administrivia
Reminder: Homework 7 due Friday (or Saturday) night by midnight
Printout of last lecture's terminal available as lecture14.txt

Python regex
Methods (key: RE = regex raw string, String = where to search):
  import re
  re.match(RE, String): matching must start from the start of String
  re.search(RE, String): searches anywhere in String
  re.findall(RE, String)
  re.finditer(RE, String): use with a loop, for m in re.finditer()
  re.sub(RE, SUB, String): SUB = raw string substituted for each match of RE
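
The difference between these methods is easiest to see side by side. The following is a minimal sketch; the sample sentence and the variable names are made up for illustration and are not from the lecture.

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "the cat sat on the mat"

# re.match succeeds only if the pattern matches at the very start of the string.
at_start = re.match(r"cat", text)    # None, because text does not begin with "cat"

# re.search scans the whole string and returns the first match it finds.
anywhere = re.search(r"cat", text)   # a match object for "cat"

# re.findall returns every non-overlapping match as a list of strings.
at_words = re.findall(r"\w+at", text)

# re.finditer yields match objects, so each match's position is available in a loop.
positions = [(m.group(), m.start()) for m in re.finditer(r"\w+at", text)]

# re.sub returns a new string; text itself is left unchanged.
replaced = re.sub(r"\w+at", "X", text)

print(at_start, anywhere.group(), at_words, positions, replaced)
```

Both re.match and re.search return None when nothing matches, so it is usually safest to test the result before calling .group() on it.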

Substitution examples
Using re.sub(RE, SUB, String)
Example:
  >>> import re
  >>> text = "Google is a tech giant. Google is the most valuable company in the world."
  >>> re.sub(r"Google","Microsoft",text)
  'Microsoft is a tech giant. Microsoft is the most valuable company in the world.'
  >>> text
  'Google is a tech giant. Google is the most valuable company in the world.'
  >>> re.sub(r"Google","Microsoft",text,1)
  'Microsoft is a tech giant. Google is the most valuable company in the world.'
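
The fourth argument in the last call is the maximum number of replacements. A small sketch of the same example with that argument passed by keyword, which reads more clearly (Python 3.13 is documented as deprecating the positional form), plus re.subn, which is not on the slide but reports how many substitutions were made:

```python
import re

text = "Google is a tech giant. Google is the most valuable company in the world."

# count=1 replaces only the first match; without it, every match is replaced.
first_only = re.sub(r"Google", "Microsoft", text, count=1)

# re.subn behaves like re.sub but returns a (new_string, number_of_substitutions) pair.
new_text, n = re.subn(r"Google", "Microsoft", text)

print(first_only)
print(n)
```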

Substitution examples
Using re.sub(RE, SUB, String)
Substitution using .sub() with backreferences and grouping:
Suppose we want to change section{one} into subsection{one}
  [^}] means any character but }
  (..) capturing group
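
The code for this slide did not survive in the transcript, so the following is one possible reconstruction rather than the lecture's own solution. The sample string, the \b word boundary, and the variable names are assumptions added for illustration.

```python
import re

# Hypothetical sample text; the slide's original string was not preserved.
text = "section{one} and section{two}"

# \bsection : "section" at a word boundary, so an existing "subsection" is skipped
# \{        : a literal opening brace
# ([^}]*)   : capturing group 1, any run of characters other than }
# \}        : a literal closing brace
pattern = r"\bsection\{([^}]*)\}"

# \1 in the replacement string inserts whatever group 1 captured.
result = re.sub(pattern, r"subsection{\1}", text)

print(result)
```

The replacement is written as a raw string so that \1 reaches the re module as a backreference instead of being read by Python as an escape sequence.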

Running Python on the command line in Windows

More Python regex practice
Download wordlist.py (Brown Corpus words) to your computer
Put it in the same directory as your Python
Then run the following:

Python regex practice
Exercise 1: produce a list of all the words in wordlist that have two a's in a row
  aa = [word for word in wordlist if re.search('aa',word)]
  len(aa)
Exercise 2: are there more words with two b's in a row?
Exercise 3: words with two p's or b's or d's in a row: which is the most frequent?
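
One way to approach Exercises 2 and 3 is to repeat the Exercise 1 pattern for each doubled letter and compare the counts. This is a sketch only: the short wordlist below is a made-up stand-in for the Brown Corpus list in wordlist.py, so the counts it produces say nothing about the real answers.

```python
import re

# Stand-in for the Brown Corpus list in wordlist.py (hypothetical sample data).
wordlist = ["bazaar", "rabbit", "apple", "ladder", "happy", "supper", "cat", "Aaron"]

# Exercises 1 and 2: words containing a doubled letter (matching is case-sensitive).
aa = [word for word in wordlist if re.search(r"aa", word)]
bb = [word for word in wordlist if re.search(r"bb", word)]

# Exercise 3: count the words containing each doubled letter, then pick the largest.
counts = {pair: len([word for word in wordlist if re.search(pair, word)])
          for pair in ("pp", "bb", "dd")}
most_frequent = max(counts, key=counts.get)

print(len(aa), len(bb), counts, most_frequent)
```

Because the search is case-sensitive, a word like Aaron is not counted under aa; passing re.IGNORECASE as a third argument to re.search would include it.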

Python regex practice
Exercise 4: find a word with both bb and dd in it
Exercise 5: are there any words with pp and dd?
Exercise 6: find words ending in zac. How many are there?
  Recall: the meta-character for the end-of-line anchor is $
Exercise 7: find words beginning with anti. How many are there?
  Hint: some cases may begin with a capital letter
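
These exercises can be sketched with alternation and anchors. As before, the wordlist here is a small invented stand-in for the Brown Corpus list (including the hyphenated item), so the results are illustrative only.

```python
import re

# Stand-in for the Brown Corpus list in wordlist.py (hypothetical sample data).
wordlist = ["Prozac", "zachary", "antiwar", "Anti-trust", "panting",
            "rubber-padded", "rabbit", "ladder", "pepper"]

# Exercises 4 and 5: both doubled letters must occur, in either order.
bb_dd = [word for word in wordlist if re.search(r"bb.*dd|dd.*bb", word)]
pp_dd = [word for word in wordlist if re.search(r"pp.*dd|dd.*pp", word)]

# Exercise 6: $ anchors the match at the end of the word.
zac = [word for word in wordlist if re.search(r"zac$", word)]

# Exercise 7: ^ anchors at the start; [Aa] accepts either capitalization.
anti = [word for word in wordlist if re.search(r"^[Aa]nti", word)]

print(bb_dd, pp_dd, len(zac), len(anti))
```

Without the ^ anchor, the Exercise 7 pattern would also pick up words such as panting, where anti occurs in the middle.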

Python regex practice
Look for words with the prefix "pre"
Are all of them correct? (cf. pretend)
Devise a search that looks for words beginning with 'pre' whose remainder is also a word in the Brown corpus
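
One possible approach, sketched under the assumption that wordlist is a plain Python list of strings: capture everything after 'pre' in a group and check whether that remainder is itself in the list. The sample words are invented for illustration.

```python
import re

# Stand-in for the Brown Corpus list in wordlist.py (hypothetical sample data).
wordlist = ["preview", "view", "prepay", "pay", "pretty", "press",
            "pretend", "tend", "prefix"]

# A set makes the "is the remainder a word?" lookup fast.
words = set(wordlist)

pre_words = []
for word in wordlist:
    # re.match anchors at the start; group 1 captures the rest of the word.
    m = re.match(r"pre(.+)$", word)
    if m and m.group(1) in words:
        pre_words.append(word)

print(pre_words)
```

The filter removes cases like pretty and press, but it is still only a heuristic: pretend passes because tend happens to be a word, even though pre is not functioning as a prefix there.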