1
LING 388: Computers and Language
Lecture 14
2
Administrivia
Homework 7 out today – due Friday night by midnight
3
Python regex recap
Unicode characters ok in Python 3.x
Summary:
  \w  a word character [A-Za-z0-9_]
  \d  a digit [0-9]
  \b  word boundary
  \s  whitespace character [ \t\n\r\f\v]
Operators:
  *    zero or more repeats
  +    one or more repeats
  ( )  grouping
Raw string (avoid escaping \): r"\w+"
Negation:
  \W  anything not in \w
  \D  anything not in \d
Methods:
  m = re.search(pattern, string)   return match object or None
  m = re.match(pattern, string)    same, but the match must start at the beginning of the string
  l = re.findall(pattern, string)  return list of strings/tuples
Full Documentation:
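The three methods in the recap differ mainly in where they look and what they return. A minimal sketch, assuming a made-up sample string (the variable names are illustrative only):

```python
import re

s = "Lecture 14 of LING 388"  # hypothetical sample string

# re.search: first match anywhere in the string (match object or None)
first_number = re.search(r'\d+', s).group(0)

# re.match: only succeeds if the match starts at the beginning of the string
no_match = re.match(r'\d+', s)             # s starts with 'L', so this is None
first_word = re.match(r'\w+', s).group(0)

# re.findall: list of strings (no groups) or list of tuples (several groups)
all_numbers = re.findall(r'\d+', s)
pairs = re.findall(r'(\w+) (\d+)', s)

print(first_number, no_match, first_word, all_numbers, pairs)
```

Since re.search and re.match can return None, it is safer to test the result before calling .group() on it.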
4
Python regex More examples from
5
The trouble with re.findall()
Only capturing groups (…) are reported
Example:
>>> text = "ababcababababacabd"
>>> import re
>>> re.findall(r'(ab)+', text)
['ab', 'ab', 'ab']
>>> re.findall(r'((ab)+)', text)
[('abab', 'ab'), ('abababab', 'ab'), ('ab', 'ab')]
6
The trouble with re.findall()
Example (using list comprehension):
>>> text = "ababcababababacabd"
>>> [t[0] for t in re.findall(r'((ab)+)', text)]
['abab', 'abababab', 'ab']
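Two other ways to get the whole match, neither of which is on the slides: a non-capturing group (?:...), which groups without reporting, and re.finditer, which yields match objects. A short sketch using the same text:

```python
import re

text = "ababcababababacabd"

# (?:...) groups for repetition but is not captured,
# so findall reports the whole match instead of the group
whole_noncapturing = re.findall(r'(?:ab)+', text)

# finditer yields match objects; group(0) is always the whole match
whole_finditer = [m.group(0) for m in re.finditer(r'(ab)+', text)]

print(whole_noncapturing)
print(whole_finditer)
```

Both give the same list as the list comprehension, without the extra outer pair of parentheses.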
7
Review examples
Regex for money:
  $ followed by digits
  comma (for thousands, optional)
  decimal point (optional)
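One possible regex following the three points above, offered as a sketch rather than the lecture's own answer. The sample sentence is invented, and the pattern uses {3} (exactly three repeats) and (?:...) (non-capturing group), which go beyond the recap:

```python
import re

# \$          a literal dollar sign ($ alone means end of line)
# \d+         one or more digits
# (?:,\d{3})* optional thousands groups: a comma plus exactly three digits
# (?:\.\d+)?  optional decimal point followed by digits
money = r'\$\d+(?:,\d{3})*(?:\.\d+)?'

sample = "It cost $5, or $1,234.56 with tax, not $1234 or $3.50."  # made-up text
amounts = re.findall(money, sample)
print(amounts)
```

The groups are non-capturing so that re.findall returns each whole amount instead of the group contents.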
8
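Python regex
Other useful meta-characters:
  ^   matches beginning of the string (of each line with the re.MULTILINE flag)
  $   matches end of the string (of each line with the re.MULTILINE flag)
  \n  backreference, n = group number (e.g. \1): must match identically to what group n matched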
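A brief sketch of these meta-characters in use, with invented sample strings. The anchors are shown with the re.MULTILINE flag so that they apply to every line:

```python
import re

# \1 must repeat exactly the text that group 1 matched: finds doubled words
doubled = re.findall(r'\b(\w+) \1\b', "this is is a test of the the regex")

two_lines = "first line\nsecond line"

# Without a flag, ^ only matches at the very start of the string
start_only = re.findall(r'^\w+', two_lines)

# With re.MULTILINE, ^ and $ match at the start and end of every line
line_starts = re.findall(r'^\w+', two_lines, re.MULTILINE)
line_ends = re.findall(r'\w+$', two_lines, re.MULTILINE)

print(doubled, start_only, line_starts, line_ends)
```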
9
Python's re module
12
Homework 7
What went wrong on the High Street in 2018?
?intlink_from_url= ext/long-reads&link_location=live-reporting-story
hw7.txt
Using regexes in Python:
  Find the numbers in the article. List them. How many of them are there?
  Find all the named entities (approximately, everything beginning with an uppercase letter denoting people, places, organizations, etc.), e.g. Toys R Us or New Look. List them. How many of them are there?
  How could you filter out the words at the beginning of each sentence that aren't really named entities? Show your code. How many named entities are there now?
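As a starting point only, here is a rough sketch of the three tasks on a made-up two-sentence sample, not on hw7.txt. The patterns and the sentence-start heuristic are assumptions and are unlikely to be adequate for the real article as written:

```python
import re

# Made-up sample; the real input would be the contents of hw7.txt
sample = "Toys R Us closed 100 stores. The chain New Look cut 1,000 jobs in 2018."

# Numbers: digits, optional thousands groups, optional decimal part
numbers = re.findall(r'\d+(?:,\d{3})*(?:\.\d+)?', sample)

# Candidate entities: runs of capitalized words separated by single spaces
candidate = r'[A-Z]\w*(?: [A-Z]\w*)*'
entities = re.findall(candidate, sample)

# Crude filter (one heuristic among many): drop single-word candidates that
# sit at the start of the text or right after sentence-final punctuation
sentence_start = re.compile(r'(?:^|[.!?]\s+)$')
filtered = [
    m.group(0)
    for m in re.finditer(candidate, sample)
    if ' ' in m.group(0) or not sentence_start.search(sample[:m.start()])
]

print(len(numbers), numbers)
print(len(entities), entities)
print(len(filtered), filtered)
```

This heuristic would also discard a genuine one-word entity that happens to start a sentence, so the counts it produces are approximate.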
13
Homework 7
One PDF file
Show your Python work
Submit to me by Friday night