1 An Introduction to Python Part 3 Regular Expressions for Data Formatting Jacob Morgan Brent Frakes National Park Service Fort Collins, CO April, 2008.



2 Overview
– Regular Expressions
– Regular Expressions Module
– Formatting a dataset
– Utilize all skills learned so far!

3 Regular Expressions
Regular expressions enable string manipulation, searching, and substitution. Python strings also provide useful built-in methods:
– count(sub, start=0, end=max) # returns the number of non-overlapping occurrences of the substring
– find(sub, start=0, end=max) # returns the position of the first occurrence of the substring, or -1 if it is not found
– isalnum() # returns True if all characters are letters or digits (and the string is not empty); otherwise returns False
– isdigit() # returns True if all characters are digits; otherwise returns False
– lower() # returns a lowercase copy of the string
– strip() # removes leading and trailing whitespace, including the end-of-line character (\n)
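A short sketch of these string methods in use. The variable `line` and its contents are made up for illustration and do not come from the slides.

```python
# Hypothetical sample line, as it might be read from a text file.
line = "Taxon_ID 101 102\n"

ones = line.count("1")         # non-overlapping occurrences of "1"
first_space = line.find(" ")   # index of the first space
missing = line.find("z")       # find() returns -1 when nothing is found
cleaned = line.strip()         # leading/trailing whitespace removed, including \n
lowered = cleaned.lower()      # lowercase copy; the original is unchanged
fields = cleaned.split(" ")    # split on single spaces

print(ones, first_space, missing)
print(repr(cleaned), lowered)
print(fields)
print(fields[1].isdigit())     # "101" is all digits
print(fields[0].isalnum())     # "Taxon_ID" contains "_", so not alphanumeric
```

Note that none of these methods modify the string in place; each returns a new value.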

4 Exercises
>>> string = 'the brown fox'
>>> string.count('o')
>>> string.find('o')
>>> string.isalnum()
>>> string.isdigit()
>>> string.split('b')

5 Regular Expressions Module
– Built-in module
– Enhances basic functionality
>>> import re

6 Regular Expressions - Syntax
.     matches any character except \n
*     matches zero or more repetitions of the preceding expression
+     matches one or more repetitions of the preceding expression
\d    matches one digit
\D    matches one non-digit
\s    matches one whitespace character
\S    matches one non-whitespace character
\w    matches one alphanumeric character or underscore
\W    matches one character that is neither alphanumeric nor underscore
|     alternation: matches the expression on either side
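The syntax elements above can be tried out with `re.findall`, which returns every non-overlapping match as a list. This is only an illustrative sketch; the sample text is invented, and `re.findall` is used here for convenience although the slides themselves only cover `split` and `search`.

```python
import re

# Hypothetical sample text, made up for illustration.
text = "plot 12  elev 340m"

digits = re.findall(r"\d+", text)        # runs of one or more digits
words = re.findall(r"\w+", text)         # runs of word characters
gaps = re.findall(r"\s+", text)          # runs of whitespace
non_space = re.findall(r"\S+", text)     # runs of non-whitespace
either = re.findall(r"plot|elev", text)  # alternation: either word
any_two = re.findall(r"p..t", text)      # "." stands for any one character
with_unit = re.findall(r"\d*m", text)    # zero or more digits followed by "m"

print(digits)
print(words)
print(gaps)
print(either, any_two, with_unit)
```

Writing patterns as raw strings (the `r` prefix) keeps Python from interpreting the backslashes before the `re` module sees them.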

7 Functions
split(pattern, string) # returns a list of the pieces of string separated by matches of pattern
search(pattern, string) # returns a match object for the first match, or None if there is no match
Examples
>>> import re
>>> string = 'the brown fox'
>>> re.split(r'\s+', string) # \s+ rather than \s*: since Python 3.7, a pattern that can match an empty string also splits between characters
['the', 'brown', 'fox']
>>> re.split('b|w', string)
['the ', 'ro', 'n fox']
>>> re.search('z', string) # no match, so the result is None
>>> f = re.search('o', string)
>>> f.start()
6
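Because `re.search` returns `None` when nothing matches, calling `.start()` on the result directly would raise an error in that case. A minimal sketch of guarding against this follows; the helper name `first_position` is hypothetical and not part of the `re` module.

```python
import re

def first_position(pattern, text):
    """Return the start index of the first match, or None if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.start()

string = "the brown fox"

pos_o = first_position("o", string)   # first "o" is in "brown"
pos_z = first_position("z", string)   # no "z" anywhere in the string
pieces = re.split(r"\s+", string)     # split on runs of whitespace

print(pos_o, pos_z, pieces)
```

The match object also offers `group()` for the matched text and `end()` for the index just past the match.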

8 Exercises
>>> import re
>>> string = "10 20 30 40"
>>> re.split(r'\s+', string)
>>> re.search('a', string)

9 Problem
You have 500 of the following data tables in separate text files.

10 Desired Format

11 Rules
– The table Taxa.txt is an example of such a file
– The number of header lines is not always consistent
– All headers contain 'Study:', 'Author:', and 'Date:'
– The table always begins with Taxon_ID
– The number of columns and rows varies
– The table is space-delimited
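One way these rules might translate into code is sketched below. The transcript does not show the contents of Taxa.txt, so the "Key: value" header layout, the sample lines, and the function name `parse_table` are all assumptions made for illustration.

```python
import re

# ASSUMPTION: header lines look like "Study: some text". The real layout
# of Taxa.txt is not shown in the transcript.
HEADER_RE = re.compile(r"^(Study|Author|Date):\s*(.*)$")

def parse_table(lines):
    """Split lines into a header dict and a list of table rows."""
    header = {}
    rows = []
    in_table = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if in_table or line.startswith("Taxon_ID"):
            in_table = True               # the table always begins with Taxon_ID
            rows.append(re.split(r"\s+", line))
            continue
        match = HEADER_RE.match(line)
        if match:                         # other header lines are ignored
            header[match.group(1)] = match.group(2)
    return header, rows

# Invented sample input with an extra header line and uneven spacing.
sample = [
    "Study: Example survey\n",
    "some other header line\n",
    "Author: A. Person\n",
    "Date: 2008-04-01\n",
    "\n",
    "Taxon_ID  Count Cover\n",
    "T001 4 12.5\n",
    "T002 10  3.0\n",
]
header, rows = parse_table(sample)
print(header)
print(rows)
```

Matching on the three known keywords and on `Taxon_ID`, rather than on line positions, is what lets the sketch cope with a header of varying length.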

12 Hints
Break the exercise into simple tasks:
– open the file
– read a line from the file
– evaluate the line with a regular expression
– loop through the lines
– print to a file
– close the files
More hints in taxa.py
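The tasks listed above can be strung together roughly as follows. This is a sketch under stated assumptions, not the contents of taxa.py: the desired output format is not visible in the transcript, so comma-separated output is assumed, and the function name `reformat_file`, the file names, and the sample data are hypothetical.

```python
import os
import re
import tempfile

def reformat_file(in_path, out_path):
    """Copy the table part of in_path to out_path as comma-separated lines.

    ASSUMPTION: comma-separated output; the real target format is unknown.
    Returns the number of lines written.
    """
    in_table = False
    written = 0
    # "with" closes both files automatically, even if an error occurs.
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:                      # loop through the lines
            line = line.strip()
            if re.match(r"Taxon_ID\b", line):    # evaluate with a regular expression
                in_table = True
            if in_table and line:
                outfile.write(",".join(re.split(r"\s+", line)) + "\n")
                written += 1
    return written

# Demonstration on an invented file in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "Taxa.txt")
out_path = os.path.join(workdir, "Taxa_out.csv")
with open(in_path, "w") as f:
    f.write("Study: Example\nAuthor: A. Person\nDate: 2008\n\n"
            "Taxon_ID Count\nT001 4\nT002  10\n")

count = reformat_file(in_path, out_path)
with open(out_path) as f:
    result = f.read()
print(count)
print(result)
```

To process many files, the same function could be called once per input path inside a loop.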

