LING 388: Computers and Language


1 LING 388: Computers and Language
Lecture 15

2 Administrivia Reminder Homework 7 due Friday (or Saturday) night by midnight Printout of last lecture's terminal available as lecture14.txt

3

4 Python regex Methods
Key: RE = regex raw string, String = where to search
import re
re.match(RE, String): matching must start from the start of String
re.search(RE, String): searches anywhere in String
re.findall(RE, String): returns a list of all non-overlapping matches
re.finditer(RE, String): returns an iterator over match objects; use with a loop, e.g. for m in re.finditer(RE, String)
re.sub(RE, SUB, String): SUB = replacement raw string (may contain backreferences such as \1) substituted for each match of RE
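The differences between these methods are easiest to see side by side. The following is a minimal sketch, not taken from the lecture; the sample sentence and variable names are made up for illustration.

```python
import re

# Made-up sample string for illustration
text = "the cat sat on the mat"

# re.match only succeeds if the pattern matches at the very start of the string
match_cat = re.match(r"cat", text)      # None: "cat" is not at position 0
match_the = re.match(r"the", text)      # match object for "the"

# re.search scans the whole string and returns the first match found
search_cat = re.search(r"cat", text)
print(search_cat.group(), search_cat.start())

# re.findall returns the matched substrings as a list
at_words = re.findall(r"\w+at", text)
print(at_words)

# re.finditer yields match objects, so positions are available too
spans = [(m.group(), m.span()) for m in re.finditer(r"\w+at", text)]
for word, span in spans:
    print(word, span)
```

Note that re.match and re.search return None when nothing matches, so test the result before calling .group() on it.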

5 Substitution examples
Using re.sub(RE, SUB, String)
Example:
>>> import re
>>> text = "Google is a tech giant. Google is the most valuable company in the world."
>>> re.sub(r"Google","Microsoft",text)
'Microsoft is a tech giant. Microsoft is the most valuable company in the world.'
>>> text
'Google is a tech giant. Google is the most valuable company in the world.'
>>> re.sub(r"Google","Microsoft",text,1)
'Microsoft is a tech giant. Google is the most valuable company in the world.'

6 Substitution examples
Using re.sub(RE, SUB, String)
Substitution using .sub() with backreferences and grouping:
Suppose we want to change section{one} into subsection{one}
[^}] means any character but }
(..) capturing group
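The slide's own code did not survive in this transcript, so here is one way the substitution could be written, as a sketch under that assumption. The sample text and the names pattern and result are illustrative only.

```python
import re

# Made-up input containing two section{...} commands
text = "section{one} Some text. section{two} More text."

# \{ and \} match literal braces
# ([^}]*) captures everything up to (but not including) the closing brace
pattern = r"section\{([^}]*)\}"

# \1 in the replacement refers back to what the capturing group matched
result = re.sub(pattern, r"subsection{\1}", text)
print(result)
```

Because the group captures whatever is inside the braces, the same call rewrites every section title without listing them individually.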

7 Running Python on the command line in Windows

8 More Python regex practice
Download wordlist.py (Brown Corpus words) to your computer. Put it in the same directory you run Python from. Then run the following:

9 Python regex practice
Exercise 1: produce a list of all the words in wordlist that have two a's in a row
aa = [word for word in wordlist if re.search('aa',word)]
len(aa)
Exercise 2: are there more words with two b's in a row?
Exercise 3: words with two p's or b's or d's in a row: which is the most frequent?
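One possible approach to Exercises 2 and 3 is sketched below. It uses a small made-up list called sample instead of the real wordlist, so the counts it prints say nothing about the Brown Corpus; swap in wordlist to get the actual answer.

```python
import re
from collections import Counter

# Hypothetical stand-in for wordlist (NOT the Brown Corpus data)
sample = ["rabbit", "apple", "ladder", "bazaar",
          "hobby", "puppet", "ribbon", "cat"]

# Exercise 2 pattern: same list comprehension as Exercise 1, different regex
aa = [word for word in sample if re.search("aa", word)]
bb = [word for word in sample if re.search("bb", word)]
print(len(aa), len(bb))

# Exercise 3: one alternation covers all three doubled letters.
# Each word is counted once, under the first doubled pair found in it.
counts = Counter()
for word in sample:
    m = re.search(r"pp|bb|dd", word)
    if m:
        counts[m.group()] += 1

print(counts.most_common())
```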

10 Python regex practice
Exercise 4: find a word with both bb and dd in it
Exercise 5: are there any words with pp and dd?
Exercise 6: find words ending in zac. How many are there?
Recall: the meta-character for the end-of-line anchor is $
Exercise 7: find words beginning with anti. How many are there?
Hint: some cases may begin with a capital letter
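The patterns these exercises call for can be tried out on a toy list first. This is a sketch under the assumption that wordlist is a plain list of strings; sample below is made up (including the invented compound "pebble-paddle") and is not Brown Corpus data.

```python
import re

# Hypothetical stand-in for wordlist
sample = ["antibody", "Anti-Semitism", "Santiago", "Prozac",
          "Balzac", "Zachary", "pebble-paddle", "cabbage"]

# Exercise 4 style: two separate searches, both must succeed
both = [w for w in sample if re.search("bb", w) and re.search("dd", w)]

# Exercise 6 style: $ anchors the pattern to the end of the word
zac = [w for w in sample if re.search(r"zac$", w)]

# Exercise 7 style: re.match anchors at the start; [Aa] accepts either case
anti = [w for w in sample if re.match(r"[Aa]nti", w)]

print(both)
print(zac, len(zac))
print(anti, len(anti))
```

"Santiago" contains anti but does not begin with it, so re.match rejects it. re.search("anti", w) without an anchor would have let it through.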

11 Python regex practice
Look for words with prefix "pre"
Are all of them correct? (cf. pretend)
Devise a search that looks for words beginning with 'pre' whose remainder is also a word in the Brown corpus
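A search of this kind might be sketched as follows: capture whatever follows pre and check whether that remainder is itself in the list. The list sample is a made-up stand-in for wordlist, and the set words is an illustrative helper for fast lookup.

```python
import re

# Hypothetical stand-in for wordlist
sample = ["pretend", "preview", "view", "prepaid",
          "paid", "press", "predict"]

# A set makes the "is the remainder a word?" lookup fast
words = set(sample)

pre_words = []
for w in sample:
    # The group captures everything after the initial "pre"
    m = re.match(r"pre(.+)", w)
    if m and m.group(1) in words:
        pre_words.append(w)

print(pre_words)
```

This is only a heuristic. With a full word list, "pretend" would still pass whenever "tend" is itself in the list, so the results still need checking by eye.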

