1
LING 388: Computers and Language
Lecture 15
2
Administrivia
Reminder: Homework 7 due Friday (or Saturday) night by midnight
Printout of last lecture's terminal available as lecture14.txt
4
Python regex
Methods (key: RE = regex raw string, String = where to search):
  import re
  re.match(RE, String)      matching must start from the start of String
  re.search(RE, String)     searches anywhere in String
  re.findall(RE, String)    returns a list of all matches
  re.finditer(RE, String)   use with a loop: for m in re.finditer(RE, String)
  re.sub(RE, SUB, String)   SUB = replacement raw string substituted for each match of RE
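A minimal sketch of how the five methods differ, run on one made-up sample string (the string and variable names are illustrative assumptions, not from the lecture):

```python
import re

text = "the cat sat on the mat"  # hypothetical sample string

# re.match succeeds only if the pattern matches at the very start of the string
m_start = re.match(r"the", text)   # a match object
m_fail = re.match(r"cat", text)    # None: "cat" is not at position 0

# re.search scans the whole string and returns the first match it finds
m_search = re.search(r"cat", text)

# re.findall returns a list of all non-overlapping matches as strings
all_at = re.findall(r"[a-z]at", text)

# re.finditer yields match objects, so positions are available in a loop
spans = [(m.group(), m.start()) for m in re.finditer(r"[a-z]at", text)]

# re.sub returns a new string with every match replaced
replaced = re.sub(r"[a-z]at", "dog", text)

print(all_at)    # ['cat', 'sat', 'mat']
print(spans)     # [('cat', 4), ('sat', 8), ('mat', 19)]
print(replaced)  # the dog dog on the dog
```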
5
Substitution examples
Using re.sub(RE, SUB, String)
Example:
  >>> import re
  >>> text = "Google is a tech giant. Google is the most valuable company in the world."
  >>> re.sub(r"Google","Microsoft",text)
  'Microsoft is a tech giant. Microsoft is the most valuable company in the world.'
  >>> text
  'Google is a tech giant. Google is the most valuable company in the world.'
  >>> re.sub(r"Google","Microsoft",text,1)
  'Microsoft is a tech giant. Google is the most valuable company in the world.'
6
Substitution examples
Using re.sub(RE, SUB, String)
Substitution using .sub() with backreferences and grouping:
Suppose we want to change section{one} into subsection{one}
  [^}]   means any character but }
  (..)   capturing group
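The slide's own code is not preserved here, so the following is only a sketch of one way to write this substitution. The patterns and sample strings are assumptions. The group ([^}]+) captures the text between the braces, and \1 in the replacement puts it back:

```python
import re

# Capture everything between the braces, then reuse it as \1 in the replacement
pattern = r"section\{([^}]+)\}"
result = re.sub(pattern, r"subsection{\1}", "section{one}")
print(result)  # subsection{one}

# Caveat: the same pattern also matches inside an existing "subsection{...}"
mixed = "section{one} and subsection{two}"
naive = re.sub(pattern, r"subsection{\1}", mixed)
print(naive)   # subsection{one} and subsubsection{two}

# A word boundary \b before "section" avoids touching "subsection"
careful = re.sub(r"\bsection\{([^}]+)\}", r"subsection{\1}", mixed)
print(careful)  # subsection{one} and subsection{two}
```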
7
Running Python on the command line in Windows
8
More Python regex practice
Download wordlist.py (Brown Corpus words) to your computer
Put it in the same directory where you run Python
Then run the following:
9
Python regex practice
Exercise 1: produce a list of all the words in wordlist that have two a's in a row
  aa = [word for word in wordlist if re.search('aa',word)]
  len(aa)
Exercise 2: are there more words with two b's in a row?
Exercise 3: words with two p's or b's or d's in a row: which is the most frequent?
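A sketch of Exercises 1 to 3 on a tiny hand-made list. The list below is a hypothetical stand-in for the Brown Corpus wordlist, so the counts say nothing about the real data:

```python
import re

# Hypothetical stand-in for the wordlist loaded from wordlist.py
wordlist = ["bazaar", "rabbit", "apple", "ladder", "happy", "cat", "Aaron", "puppy"]

# Exercise 1: words with two a's in a row (case-sensitive, so "Aaron" is skipped)
aa = [word for word in wordlist if re.search("aa", word)]

# Exercise 2: the same search for two b's in a row
bb = [word for word in wordlist if re.search("bb", word)]

# Exercise 3: count the words containing each doubled letter, then compare
counts = {c: sum(1 for word in wordlist if re.search(c + c, word)) for c in "pbd"}
most_frequent = max(counts, key=counts.get)

print(len(aa), len(bb))  # 1 1
print(counts)            # {'p': 3, 'b': 1, 'd': 1}
print(most_frequent)     # p
```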
10
Python regex practice
Exercise 4: find a word with both bb and dd in it
Exercise 5: are there any words with pp and dd?
Exercise 6: find words ending in zac. How many are there?
  Recall: meta-character for the end of line anchor is $
Exercise 7: find words beginning in anti. How many are there?
  Hint: some cases may begin with a capital letter
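One possible approach to Exercises 4, 6 and 7, again on a small made-up list (a hypothetical stand-in for the real wordlist). Exercise 5 follows the same shape as Exercise 4 with "pp" in place of "bb":

```python
import re

# Hypothetical stand-in for the wordlist loaded from wordlist.py
wordlist = ["rubber-padded", "rabbit", "ladder", "Prozac", "zachary",
            "anticipate", "Antichrist", "antique", "pedantic"]

# Exercise 4: two separate searches, since bb and dd may occur in either order
both = [w for w in wordlist if re.search("bb", w) and re.search("dd", w)]

# Exercise 6: $ anchors the match to the end of the string
zac = [w for w in wordlist if re.search(r"zac$", w)]

# Exercise 7: ^ anchors to the start; [Aa] allows a capitalized first letter
anti = [w for w in wordlist if re.search(r"^[Aa]nti", w)]

print(both)  # ['rubber-padded']
print(zac)   # ['Prozac']
print(anti)  # ['anticipate', 'Antichrist', 'antique']
```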
11
Python regex practice
Look for words with prefix "pre"
Are all of them correct? (cf. pretend)
Devise a search that looks for words beginning with 'pre' whose remainder is itself a word in the Brown corpus
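A rough sketch of such a search, assuming a lowercase prefix and using a small made-up list in place of the Brown Corpus words. The remainder after "pre" is captured as a group and looked up in a set of known words:

```python
import re

# Hypothetical stand-in for the wordlist loaded from wordlist.py
wordlist = ["pretend", "tend", "preview", "view", "prepay", "pay",
            "press", "president", "pretty"]

vocab = set(wordlist)  # set membership is fast for repeated lookups

pre_words = []
for word in wordlist:
    # re.match anchors at the start; (.+) captures the rest of the word
    m = re.match(r"pre(.+)", word)
    if m and m.group(1) in vocab:
        pre_words.append(word)

print(pre_words)  # ['pretend', 'preview', 'prepay']
```

Note that "pretend" still passes in this toy list because "tend" happens to be a word, so the filter narrows the results but cannot decide on its own whether "pre" is really a prefix.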