Published by Todd Young. Modified over 9 years ago.
1
REGULAR EXPRESSIONS 1 DAY 6 - 9/08/14 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University
2
Course organization
http://www.tulane.edu/~howard/LING3820/ The syllabus is under construction.
http://www.tulane.edu/~howard/CompCultEN/
08-Sept-2014 NLP, Prof. Howard, Tulane University
3
Review
The quiz was the review.
4
Open Spyder
5
§4. Regular expressions
6
Regular expressions, or regex
>>> import re
re.findall(pattern, target string)
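As a minimal sketch of the call pattern above: re.findall takes the pattern first and the target string second, and returns a list of every non-overlapping match, scanning left to right. The string `text` and the pattern "at" are made up for illustration and are not from the slides.

```python
import re

# Hypothetical target string, not from the slides.
text = "the cat sat on the mat"

# re.findall(pattern, string) returns a list of all non-overlapping
# matches of the pattern, in the order they occur in the string.
matches = re.findall("at", text)
print(matches)  # ['at', 'at', 'at']
```

If nothing matches, the result is an empty list rather than an error.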
7
4.2. Fixed-length matching
8
The test string
>>> S = '''This above all: to thine own self be true,
... And it must follow, as the night the day,
... Thou canst not then be false to any man.'''
9
Strings as regular expressions
>>> re.findall(' be ', S)
[' be ', ' be ']
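A short sketch of why the spaces in the pattern matter. It assumes the test string is three lines joined by newlines; the pattern 'the' is chosen here for illustration and does not appear in the slides.

```python
import re

# Test string from the slides, assumed to be three lines joined by newlines.
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Without spaces, the pattern also matches inside a longer word ("then").
anywhere = re.findall('the', S)

# With a space on each side, only the free-standing word matches.
whole_word = re.findall(' the ', S)

print(len(anywhere), len(whole_word))  # 3 2
```

A plain string is already a regular expression: every character, including a space, must match literally.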
10
Match one alternative of a disjunction with |
>>> re.findall(' to | be | it | as ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
>>> set(re.findall(' to | be | it | as ', S))
set([' it ', ' as ', ' to ', ' be '])
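The same disjunction can be sketched in Python 3, where a set prints with braces rather than as set([...]) and has no fixed order. Sorting the set is an addition made here to get a stable listing; it is not part of the slides.

```python
import re

# Test string from the slides, assumed to be three lines joined by newlines.
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Each alternative between the | bars is tried at every position.
matches = re.findall(' to | be | it | as ', S)

# set() drops the duplicates; sorted() gives a reproducible order.
unique = sorted(set(matches))
print(unique)  # [' as ', ' be ', ' it ', ' to ']
```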
11
Match a group of characters with capturing or non-capturing parentheses, ()
>>> re.findall(' (to|be|it|as) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (?:to|be|it|as) ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
By default, parentheses capture the string they enclose, and re.findall returns only the captured part. The ?: prefix turns capturing off, so the whole match, spaces included, is returned. For the rest of this discussion, we prefer to exclude the spaces from the output, so the patterns use capturing parentheses.
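To see the whole match and the captured group side by side, one option is re.finditer, which yields match objects instead of strings. This is a sketch that goes slightly beyond the slides, which use only re.findall.

```python
import re

# Test string from the slides, assumed to be three lines joined by newlines.
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# group(0) is the entire match, spaces included;
# group(1) is what the first pair of capturing parentheses matched.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(' (to|be|it|as) ', S)]

print(pairs[0])  # (' to ', 'to')
```

The spaces are still required for a match; capturing only controls which part of the match is reported.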
12
Match one character of a range with [] and its negation with [^]
>>> re.findall(' ([a-z][a-z]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([^0-9][^0-9]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([a-e][a-e]) ', S)
['be', 'be']
>>> re.findall(' ([^a-e][^a-e]) ', S)
['to', 'it', 'to']
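On the test string, [a-z] and [^0-9] happen to return the same list, which can hide how different they are. The sketch below uses a made-up string (not from the slides) containing punctuation and digits to separate the two.

```python
import re

# Hypothetical string with letters, punctuation and digits.
text = "a ?! b 42 ok c"

# [a-z] accepts only lowercase ASCII letters.
letters = re.findall(' ([a-z][a-z]) ', text)

# [^0-9] accepts anything that is not a digit, punctuation included.
non_digits = re.findall(' ([^0-9][^0-9]) ', text)

print(letters)     # ['ok']
print(non_digits)  # ['?!', 'ok']
```

A negated class is therefore much broader than it may look: it excludes one range and admits everything else.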
13
Match a number of repetitions of a character with {}
>>> re.findall(' ([a-z]{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
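The count in braces makes it easy to change the word length without retyping the class. A sketch, assuming the three-line test string; the lengths 3 and 4 are chosen here for illustration.

```python
import re

# Test string from the slides, assumed to be three lines joined by newlines.
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Build ' ([a-z]{n}) ' for several values of n.
by_length = {n: re.findall(' ([a-z]{' + str(n) + '}) ', S)
             for n in (2, 3, 4)}

print(by_length[3])  # ['own', 'the', 'the', 'not', 'any']
print(by_length[4])  # ['self', 'must', 'then']
```

Words next to punctuation or a line break (such as "true," or "And") are skipped, because the pattern demands a space on both sides.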
14
Match any character with .
>>> re.findall(' (..) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (.{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
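One caveat worth a quick sketch: by default the dot matches any character except a newline. The re.DOTALL flag lifts that restriction. The string below is made up for illustration, and the flag is not covered in the slides.

```python
import re

# Hypothetical string: 'a' and 'b' separated by a hyphen, a digit, a newline.
text = "a-b a1b a\nb"

# Default: the dot matches punctuation and digits, but not the newline.
default = re.findall('a.b', text)

# With re.DOTALL the dot matches the newline as well.
dotall = re.findall('a.b', text, re.DOTALL)

print(default)  # ['a-b', 'a1b']
print(dotall)   # ['a-b', 'a1b', 'a\nb']
```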
15
Next time
4.2.7. and following