Published by Rose Hudson. Modified over 9 years ago.
1 An Introduction to Python, Part 3: Regular Expressions for Data Formatting
Jacob Morgan, Brent Frakes
National Park Service, Fort Collins, CO
April 2008
2 Overview
– Regular Expressions
– Regular Expressions Module
– Formatting a dataset
– Utilize all skills learned so far!
3 Regular Expressions
Regular expressions enable string manipulation, searching, and substitution.
Before turning to them, note these useful built-in string methods:
– count(sub, start=0, end=max) # returns the number of non-overlapping occurrences of the substring
– find(sub, start=0, end=max) # returns the position of the first occurrence of the substring, or -1 if it is not found
– isalnum() # returns True if all characters are letters or digits; otherwise returns False
– isdigit() # returns True if all characters are digits; otherwise returns False
– lower() # returns a lower-case copy of the string
– strip() # removes leading and trailing whitespace, including the end-of-line character (\n)
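The methods listed above can be tried on a short string. This is a minimal sketch; the sample values are chosen only for illustration and are not taken from the course data.

```python
# Built-in string methods applied to illustrative sample values.
text = "the brown fox\n"

clean = text.strip()           # leading/trailing whitespace removed, including "\n"
o_count = clean.count("o")     # non-overlapping occurrences of "o"
first_o = clean.find("o")      # index of the first "o"
missing = clean.find("z")      # find() returns -1 when the substring is absent
is_alnum = clean.isalnum()     # False here, because spaces are not alphanumeric
is_digit = "2008".isdigit()    # True, every character is a digit
lowered = "Fort Collins".lower()

print(clean, o_count, first_o, missing, is_alnum, is_digit, lowered)
```

Each method returns a new value and leaves the original string unchanged, because Python strings are immutable.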
4 Exercises
>>> string = 'the brown fox'
>>> string.count('o')
>>> string.find('o')
>>> string.isalnum()
>>> string.isdigit()
>>> string.split('b')
5 Regular Expressions Module
– Built-in module
– Enhances the basic string functionality
>>> import re
6 Regular Expressions - Syntax
.   matches any character except \n
*   matches zero or more repetitions of the preceding expression
+   matches one or more repetitions of the preceding expression
\d  matches one digit
\D  matches one non-digit
\s  matches one whitespace character
\S  matches one non-whitespace character
\w  matches one alphanumeric character or underscore
\W  matches one character that is neither alphanumeric nor an underscore
|   alternative match (or)
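To see how these symbols behave, the sketch below applies several of them with re.findall, which returns every non-overlapping match as a list. The sample strings are made up for illustration.

```python
import re

sample = "Plot 12\tElev 2450m"   # illustrative string, not from the course data

digits = re.findall(r"\d+", sample)         # runs of one or more digits
words = re.findall(r"\w+", sample)          # runs of alphanumeric characters
spaces = re.findall(r"\s", sample)          # single whitespace characters
either = re.findall(r"Plot|Elev", sample)   # alternation: either word
dot = re.findall(r"P.o", sample)            # "." stands for any one character
star = re.findall(r"ab*", "a ab abbb")      # "a" followed by zero or more "b"
plus = re.findall(r"ab+", "a ab abbb")      # "a" followed by one or more "b"
lines = re.findall(r".+", "a\nb")           # "." does not cross a newline

print(digits, words, spaces, either, dot, star, plus, lines)
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.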
7 Functions
split(pattern, string) # returns a list of the pieces of string separated by pattern
search(pattern, string) # returns a match object for the first match, or None if there is no match
Examples
>>> import re
>>> string = 'the brown fox'
>>> re.split(r'\s+', string)
['the', 'brown', 'fox']
>>> re.split('b|w', string)
['the ', 'ro', 'n fox']
>>> re.search('z', string)   # no match, so the result is None
>>> f = re.search('o', string)
>>> f.start()
6
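The same calls can be run as a script. One caveat for current interpreters: since Python 3.7, re.split also splits on empty matches, so a pattern that can match nothing (such as \s*) breaks the string between every character. The sketch below therefore uses \s+ to split on whitespace.

```python
import re

text = "the brown fox"

words = re.split(r"\s+", text)    # split on runs of whitespace
pieces = re.split(r"b|w", text)   # split on either "b" or "w"

match = re.search(r"o", text)     # match object for the first "o"
position = match.start() if match else None

no_match = re.search(r"z", text)  # None, because "z" does not occur

print(words, pieces, position, no_match)
```

Checking the result of re.search before calling start() avoids an AttributeError when the pattern is not found.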
8 Exercises
>>> import re
>>> string = "10 20 30 40"
>>> re.split(r'\s+', string)
>>> re.search('a', string)
9 Problem
You have 500 of the following data tables in separate text files.
10 Desired Format
11 Rules
– The table Taxa.txt is an example of such a file
– The number of lines in the header is not always the same
– All headers contain the fields 'Study:', 'Author:', and 'Date:'
– The table always begins with Taxon_ID
– The number of columns and rows varies
– The table is space-delimited
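These rules translate fairly directly into patterns. The sketch below is one possible way to recognise header fields and the start of the table. The names HEADER_FIELD, TABLE_START, and classify are hypothetical, and the sample lines in the usage note are invented because the contents of Taxa.txt are not shown here.

```python
import re

# Hypothetical patterns derived from the rules; adjust them to the real files.
HEADER_FIELD = re.compile(r"^\s*(Study|Author|Date):\s*(.*)$")
TABLE_START = re.compile(r"^\s*Taxon_ID\b")

def classify(line):
    """Label a line as a header field, the start of the table, or something else."""
    m = HEADER_FIELD.match(line)
    if m:
        return ("header", m.group(1), m.group(2).strip())
    if TABLE_START.match(line):
        return ("table_start",)
    return ("other",)
```

For example, classify("Author: A. Person\n") would give ("header", "Author", "A. Person"), and a line starting with Taxon_ID would give ("table_start",). Because the header length varies, matching on content like this is more robust than counting lines.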
12 Hints
Break the exercise into simple tasks:
– open the file
– read a line from the file
– evaluate the line with a regular expression
– loop through the lines
– print to a file
– close the files
More hints in taxa.py
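The hints above can be combined into a short script. This is only a sketch under stated assumptions: the desired output format is not reproduced here, so the code assumes the goal is to drop the header and write the table as comma-separated rows. The function name reformat_table and the sample file contents are hypothetical and are not taken from taxa.py.

```python
import os
import re
import tempfile

def reformat_table(in_path, out_path):
    """Write the space-delimited table in in_path to out_path as comma-separated rows.

    Assumptions: everything before the 'Taxon_ID' line is header and is skipped,
    and commas are the wanted delimiter.
    """
    rows_written = 0
    in_table = False
    # The with statement opens both files and closes them automatically.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:                       # loop through the lines
            line = line.strip()
            if not in_table and re.match(r"Taxon_ID\b", line):
                in_table = True                # the table starts on this line
            if in_table and line:
                fields = re.split(r"\s+", line)    # split on runs of spaces
                dst.write(",".join(fields) + "\n")
                rows_written += 1
    return rows_written

# Invented sample input that follows the stated rules.
sample = (
    "Study: Example study\n"
    "Author: A. Person\n"
    "Date: 2008\n"
    "\n"
    "Taxon_ID  Site1 Site2\n"
    "T001 3 5\n"
    "T002   0 7\n"
)
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "Taxa_sample.txt")
out_path = os.path.join(workdir, "Taxa_sample.csv")
with open(in_path, "w") as f:
    f.write(sample)

n = reformat_table(in_path, out_path)
with open(out_path) as f:
    result = f.read()
print(n)
print(result)
```

To process all 500 files, the same function could be called in a loop over the file names, for instance those returned by glob.glob for a folder of .txt files.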