1 An Introduction to Python Part 3 Regular Expressions for Data Formatting Jacob Morgan Brent Frakes National Park Service Fort Collins, CO April, 2008.

Presentation transcript:

1 An Introduction to Python, Part 3: Regular Expressions for Data Formatting
Jacob Morgan, Brent Frakes
National Park Service, Fort Collins, CO
April 2008

2 Overview
– Regular Expressions
– Regular Expressions Module
– Formatting a dataset
– Utilize all skills learned so far!

3 Regular Expressions
Regular expressions enable string manipulation, searching, and substitution.
Useful built-in string methods:
– count(sub, start=0, end=max)  # returns the number of non-overlapping occurrences of the substring sub
– find(sub, start=0, end=max)  # returns the position of the first occurrence of sub, or -1 if it is not found
– isalnum()  # returns True if all characters are letters or digits; otherwise returns False
– isdigit()  # returns True if all characters are digits; otherwise returns False
– lower()  # returns a lowercase copy of the string
– strip()  # removes leading and trailing whitespace, including the end-of-line character (\n)
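The string methods listed on this slide can be tried out as follows. This is only a sketch: the sample value and the variable names are illustrative choices, not part of the original slides.

```python
# Illustrative sample value; the trailing '\n' mimics a line read from a file.
raw = 'Fort Collins, CO\n'

place = raw.strip()              # leading/trailing whitespace (including '\n') removed
o_count = place.count('o')       # lowercase 'o' appears in 'Fort' and 'Collins'
first_o = place.find('o')        # index of the first 'o'
not_found = place.find('z')      # find() returns -1 when the substring is absent
only_alnum = place.isalnum()     # False: the string contains a space and a comma
year_is_digits = '2008'.isdigit()
lowered = place.lower()

print(place, o_count, first_o, not_found, only_alnum, year_is_digits, lowered)
```

Note that none of these methods changes the original string; each returns a new value.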

4 Exercises
>>> string = 'the brown fox'
>>> string.count('o')
>>> string.find('o')
>>> string.isalnum()
>>> string.isdigit()
>>> string.split('b')

5 Regular Expressions Module
– Built-in module
– Enhances basic string functionality
>>> import re

6 Regular Expressions - Syntax
.    matches any character except \n
*    matches zero or more repetitions of the preceding item
+    matches one or more repetitions of the preceding item
\d   matches one digit
\D   matches one non-digit
\s   matches one whitespace character
\S   matches one non-whitespace character
\w   matches one alphanumeric character or underscore
\W   matches one character that is not alphanumeric or underscore
|    alternative match (or)
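A short sketch of how these symbols behave in practice. The sample line is hypothetical (it is not from the slides), and re.findall, re.match, and re.fullmatch are used here only as convenient ways to show what each pattern matches.

```python
import re

# Hypothetical sample line, made up for illustration.
sample = 'Plot 12: 3 elk, 27 deer\n'

digit_runs = re.findall(r'\d+', sample)           # runs of one or more digits
word_runs = re.findall(r'\w+', sample)            # runs of alphanumerics/underscore
tokens = re.findall(r'\S+', sample)               # runs of non-whitespace
leading_text = re.match(r'\D+', sample).group()   # non-digits at the start
whole_line = re.match(r'.*', sample).group()      # '.' stops before '\n'
animals = re.findall(r'elk|deer', sample)         # '|' means either alternative

# '*' allows zero repetitions, '+' requires at least one.
star_accepts_none = re.fullmatch(r'ab*', 'a') is not None
plus_accepts_none = re.fullmatch(r'ab+', 'a') is not None

print(digit_runs, word_runs, tokens, animals)
```

Writing patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the re module sees them.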

7 Functions
split(pattern, string)   # returns a list of the pieces of string separated by pattern
search(pattern, string)  # returns a match object for the first match, or None if there is no match

Examples
import re
string = 'the brown fox'
re.split(r'\s+', string)      ['the', 'brown', 'fox']
re.split(r'b|w', string)      ['the ', 'ro', 'n fox']
re.search(r'z', string)       None
f = re.search(r'o', string)
f.start()                     6

Note: the split pattern uses \s+ rather than \s*, because \s* can also match the empty string, and current Python versions then split between every character.
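The examples on this slide can be run as a small script. This is a sketch; the variable names are illustrative, and the check for None is added because calling start() on a failed search would raise an error.

```python
import re

text = 'the brown fox'

words = re.split(r'\s+', text)     # split on runs of whitespace
pieces = re.split(r'b|w', text)    # split on either 'b' or 'w'

no_match = re.search(r'z', text)   # None: 'z' does not occur in the text

match = re.search(r'o', text)      # match object for the first 'o'
if match is not None:
    position = match.start()       # index where the match begins
    found = match.group()          # the matched text itself
    span = match.span()            # (start, end) pair

print(words, pieces, no_match, position, found, span)
```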

8 Exercises
>>> import re
>>> string = " "
>>> re.split(r'\s+', string)
>>> re.search(r'a', string)

9 Problem You have 500 of the following data tables in separate text files

10 Desired Format

11 Rules
– The table Taxa.txt is an example of such a file
– The number of lines in the header is not always consistent
– All headers contain 'Study:', 'Author:', and 'Date:'
– The table always begins with Taxon_ID
– The number of columns and rows varies
– The table is space-delimited
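One way these rules might translate into regular expressions is sketched below. The exact layout of the header lines is not shown in the transcript, so the 'Label: value' form assumed here, the function name classify_line, and the sample lines are all hypothetical.

```python
import re

# ASSUMPTION: each header field sits on its own line as 'Label: value'.
HEADER_RE = re.compile(r'^\s*(Study|Author|Date):\s*(.*\S)?\s*$')
# The rules state that the table always begins with Taxon_ID.
TABLE_START_RE = re.compile(r'^\s*Taxon_ID\b')

def classify_line(line):
    """Return ('header', label, value), ('table_start',), or ('other',)."""
    header = HEADER_RE.match(line)
    if header:
        # group(2) is None when the label has no value after it
        return ('header', header.group(1), header.group(2) or '')
    if TABLE_START_RE.match(line):
        return ('table_start',)
    return ('other',)

# Hypothetical sample lines, made up for illustration.
print(classify_line('Study: Elk survey\n'))
print(classify_line('Taxon_ID Site1 Site2\n'))
print(classify_line('some other note\n'))
```

Because the number of header lines varies, classifying each line by its content is more robust than skipping a fixed number of lines.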

12 Hints
Break the exercise into simple tasks:
– open the file
– read a line from the file
– evaluate the line with a regular expression
– loop through the lines
– print to a file
– close the files
More hints in taxa.py
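The tasks above could be combined along the following lines. This is a rough sketch, not the contents of taxa.py: the desired output format is not shown in the transcript, so comma-delimited rows are ASSUMED here, and the function names table_rows and convert_file are hypothetical.

```python
import re

def table_rows(lines):
    """Yield each table row as a list of fields, skipping the header block."""
    in_table = False
    for line in lines:
        # The table always begins with the line that starts with Taxon_ID.
        if not in_table and re.match(r'\s*Taxon_ID\b', line):
            in_table = True
        if in_table and line.strip():
            # The table is space-delimited: split on runs of whitespace.
            yield re.split(r'\s+', line.strip())

def convert_file(in_path, out_path):
    """Read one table file and write its rows out comma-delimited (assumed format)."""
    # 'with' closes both files automatically, even if an error occurs.
    with open(in_path) as src, open(out_path, 'w') as dst:
        for fields in table_rows(src):
            dst.write(','.join(fields) + '\n')
```

For the 500 files mentioned in the problem, convert_file would be called once per file, for example inside a loop over the file names in a directory.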