Python Pattern Matching and Regular Expressions Peter Wad Sackett.



DTU Systems Biology, Technical University of Denmark

Simple matching with string methods
Just checking for the presence of a substring in a string, use in
    mystr = 'I am here'
    if 'am' in mystr: print('present')
    if 'are' not in mystr: print('absent')
The in operator also works with lists, tuples, sets and dicts.
Finding the position of the substring, returns -1 if not present
    mystr.find('am')
    mystr.find('am', startpos, endpos)
Method rfind does the same from the other direction
index is similar to find, but raises ValueError if not present
    mystr.index('are')
Methods startswith and endswith can be considered special cases of find. They give a True/False value.
    mystr.startswith('I')
    mystr.endswith('ere', -3)
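The string methods on this slide can be tried out as below. This is a small sketch using the slide's own example string; the variable names holding the results are only illustrative.

```python
mystr = 'I am here'

# Membership test with the in operator
has_am = 'am' in mystr             # True
has_are = 'are' in mystr           # False

# find returns the index of the first hit, or -1 when the substring is absent
pos_am = mystr.find('am')          # 2
pos_are = mystr.find('are')        # -1

# rfind searches from the right end of the string
last_e = mystr.rfind('e')          # 8

# index behaves like find, but raises ValueError instead of returning -1
try:
    mystr.index('are')
    index_failed = False
except ValueError:
    index_failed = True

starts_with_i = mystr.startswith('I')    # True
ends_with_ere = mystr.endswith('ere')    # True
```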

Simple checks of strings
The following methods return True if the string only contains characters of the appropriate type, False otherwise. Needs at least one char to return True.
    isalpha()      alphabetic
    isdigit()      digits, including special digits like ²
    isdecimal()    decimal digits only (not float numbers: '4.2' gives False)
    isnumeric()    similar, covers special chars like ½
    isalnum()      all of above
    islower()      all cased chars are lowercase
    isupper()      all cased chars are uppercase
    isspace()      contains only whitespace
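The differences between the numeric checks are easy to mix up, so a quick comparison may help. The sample strings below are chosen only for illustration.

```python
samples = ['42', '4.2', '²', '½', 'abc', 'abc 123', '']

# One row per sample: (isdecimal, isdigit, isnumeric)
numeric_checks = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for s, row in numeric_checks.items():
    print(repr(s), row)

# islower only looks at cased characters, so digits and spaces are allowed
mixed_is_lower = 'abc 123'.islower()

# The empty string fails every check because at least one char is needed
empty_is_alpha = ''.isalpha()
```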

Replacement and removal
Returns a string with all occurrences of substring replaced
    mystr = 'Fie Fye Foe'
    mystr.replace('F', 'L')
    Result: Lie Lye Loe
You can replace something with nothing. Where is that useful?
Stripping strings, default whitespace
    rightStrippedString = mystr.rstrip()
    leftStrippedString = mystr.lstrip()
    bothSidesStrippedString = mystr.strip()
You can specify which chars should be stripped. Chars are stripped from each end until one is encountered which should not be removed
    mystr.strip('ieF')
    Result: ' Fye Fo'    (notice the leading space)
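A short sketch of replacement and stripping follows. The string 'Fie Fye Foe' is from the slide; the `line` variable is a made-up example of a line read from a file.

```python
mystr = 'Fie Fye Foe'

replaced = mystr.replace('F', 'L')       # 'Lie Lye Loe'

# Replacing with the empty string removes every occurrence, here the spaces
no_spaces = mystr.replace(' ', '')       # 'FieFyeFoe'

# Default stripping removes whitespace, including the trailing newline
line = '  some text \n'
right_stripped = line.rstrip()           # '  some text'
left_stripped = line.lstrip()            # 'some text \n'
both_stripped = line.strip()             # 'some text'

# The argument is a set of chars, not a substring; stripping stops at the
# first char on each side that is not in the set
custom_stripped = mystr.strip('ieF')     # ' Fye Fo'
```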

Translation
Translation is an efficient method to replace chars with other chars
First make a char-to-char translation table
    translationTable = str.maketrans('ATCG', 'TAGC')
Then use the table
    dna = 'ATGATGATCGATCGATCGATGCAT'
    complementdna = dna.translate(translationTable)
The dna has now been complemented.
Chars not mentioned in the translation table will be untouched.
This method has a use-case close to our hearts.
    Old    New
    A      T
    T      A
    C      G
    G      C
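The translation steps can be run as shown here. The sequence is the one from the slide; the second sequence containing N is an added illustration of chars that are not in the table.

```python
# Build the char-to-char table once, then reuse it
translation_table = str.maketrans('ATCG', 'TAGC')

dna = 'ATGATGATCGATCGATCGATGCAT'
complement_dna = dna.translate(translation_table)

# Chars missing from the table pass through unchanged (N stays N)
with_unknown = 'ATNG'.translate(translation_table)

# Complementing twice gives the original sequence back
round_trip = complement_dna.translate(translation_table)
```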

Regular Expressions - regex
Regular expressions are a very powerful tool for pattern matching
Python unfortunately made them cumbersome
Uses the re library
Full and complex documentation is in the official Python documentation for re
The library supports both precompiled regex (more efficient when reused) and simple regex. The general forms are
    regex = re.compile(pattern)
    result = regex.method(string)
versus
    result = re.method(pattern, string)
You will have a hard time understanding the following without an explanation.
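The two general forms give the same result, as this sketch shows with search standing in for "method". The pattern and text are made up for the illustration.

```python
import re

pattern = r'\d+'                   # one or more digits
text = 'Sample 42 has 7 reads'

# Form 1: compile once, then call methods on the regex object
regex = re.compile(pattern)
result_compiled = regex.search(text)

# Form 2: pass the pattern to the module-level function each time
result_direct = re.search(pattern, text)

print(result_compiled.group(0), result_direct.group(0))
```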

Regex Patterns - Classes
Any simple char just matches itself
Built-in character classes
    \s    matches a whitespace
    \S    matches a non-whitespace
    \d    matches a digit
    \D    matches a non-digit
    \w    matches a wordchar, which is a-zA-Z0-9_ (plus other Unicode letters and digits unless re.ASCII is set)
    \W    matches a non-wordchar
    \n    matches newline
    .     matches anything but newline
Make your own classes with []
    [aB4-6]    matches only one of the chars aB456
    [^xY]      matches anything but x and Y
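To see what each class picks out, findall can be used on a small string. The test strings here are invented for the example.

```python
import re

text = 'Gene aB5 on chr7\n'

digits = re.findall(r'\d', text)            # every single digit
whitespace = re.findall(r'\s', text)        # spaces and the newline
own_class = re.findall(r'[aB4-6]', text)    # any one of a, B, 4, 5, 6

# A negated class matches any char that is NOT listed
negated = re.findall(r'[^xY]', 'xaYb')

# The dot skips the newline
dots = re.findall(r'.', 'a\nb')
```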

Regex Patterns - Quantifiers
A single simple char just matches itself, but a quantifier can be added to determine how many times.
    ?    Zero or one time
    +    One or more times
    *    Zero or more times
The {} can be used to make a specific quantification
    {4}      Four times
    {,3}     At most three times
    {5,}     At least five times
    {3,5}    Between three and five times
Quantifiers are greedy, can be made non-greedy with extra ?
A few examples:
    A{3,4}C?      Match AAAA AAA AAAAC AAAC
    \s\w{4}\s     Match any four-letter word surrounded by whitespace in a sentence
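The quantifier examples can be checked as follows. The A{3,4}C? pattern and the four-letter-word pattern are from the slide; the HTML-like string and the sentence are assumptions added to show greedy versus non-greedy matching and a limitation of the whitespace-based pattern.

```python
import re

# fullmatch requires the whole string to fit the pattern
candidates = ['AAA', 'AAAA', 'AAAC', 'AAAAC', 'AA', 'AAAAAC']
accepted = [s for s in candidates if re.fullmatch(r'A{3,4}C?', s)]

# Greedy + takes as much as possible, +? takes as little as possible
tagged = '<b>bold</b>'
greedy = re.search(r'<.+>', tagged).group(0)
lazy = re.search(r'<.+?>', tagged).group(0)

# \s\w{4}\s consumes the surrounding whitespace, so a word directly after
# a match is missed; \b\w{4}\b is a common alternative
sentence = 'I like long walks here today'
with_spaces = re.findall(r'\s\w{4}\s', sentence)
with_boundaries = re.findall(r'\b\w{4}\b', sentence)
```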

Regex Patterns - Groups
Parentheses denote a group. A group belongs together.
Example: ABC(xyz)?DEF matches both ABCDEF and ABCxyzDEF
Either the entire group xyz is matched once or not (?)
The content of the group can be captured, see later.
Non-capturing group (?: )
The pipe sign | means or
    A(BC|DEF)G    matches either ABCG or ADEFG
Other special chars
    ^     Must be first, binds a match to the start of the string (start of each line with re.MULTILINE)
    $     Must be last, binds a match to the end of the string (end of each line with re.MULTILINE)
    \b    Word-boundary, the position between a wordchar and a non-wordchar such as whitespace, comma, BoL, EoL
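Groups, alternation and anchors can be explored with the slide's patterns. The sample strings for the word-boundary and anchor checks are made up.

```python
import re

# Optional group: the whole of xyz is present once, or not at all
optional = re.compile(r'ABC(xyz)?DEF')
matches_without = optional.fullmatch('ABCDEF') is not None
matches_with = optional.fullmatch('ABCxyzDEF') is not None
matches_partial = optional.fullmatch('ABCxyDEF') is not None

# Alternation inside a capturing group
alternation = re.compile(r'A(BC|DEF)G')
first_choice = alternation.fullmatch('ABCG').group(1)
second_choice = alternation.fullmatch('ADEFG').group(1)

# (?: ) groups without capturing, so the regex has zero capture groups
non_capturing = re.compile(r'A(?:BC|DEF)G')
capture_count = non_capturing.groups

# \b keeps 'cat' from matching inside 'concat' or 'cats'
whole_words = re.findall(r'\bcat\b', 'cat concat cat, cats')

# ^ and $ bind the match to the start and end
starts_with_atg = re.search(r'^ATG', 'ATGCCCTAA') is not None
ends_with_taa = re.search(r'TAA$', 'ATGCCCTAA') is not None
```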

General flow of a regex
Regular expressions are often used in loops
Static regexes in loops benefit from compiling
Compile the regex to generate a regex object
    myregexobj = re.compile(pattern)
Use a method on regex object to generate a match object
    mymatchobj = myregexobj.search(string)
These two steps can be combined
    mymatchobj = re.search(pattern, string)
The match object can be investigated for matches
    mymatchobj.group(0)    # Entire match
    mymatchobj.group(1)    # First group
    mymatchobj.start(1)    # Start of first group in string
    mymatchobj.end(1)      # End of first group in string
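A sketch of this flow in a loop is given below. The pattern and the list of lines are hypothetical; only the compile, search, group, start and end calls come from the slide.

```python
import re

# Compile once, outside the loop
name_regex = re.compile(r'(\w+)@(\w+)')

lines = [
    'contact: user@example today',
    'nothing to see here',
]

found = []
for line in lines:
    match = name_regex.search(line)
    if match is None:
        continue                       # search returns None when nothing matches
    found.append({
        'entire': match.group(0),      # entire match
        'first': match.group(1),       # first group
        'start': match.start(1),       # start of first group in the string
        'end': match.end(1),           # end of first group (exclusive)
    })
```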

Using regex - example
Testing if there is a match
    import re
    mystr = 'In this string is an accession AB somewhere'
    accregex = re.compile(r"\b[A-Z]{1,2}\d{6,8}\b")
    if accregex.search(mystr) is not None:
        print('Yeah, there is an accession number somewhere')
Capturing a match, notice parenthesis
    mystr = 'In this string is an accession AB somewhere'
    accregex = re.compile(r"\b([A-Z]{1,2}\d{6,8})\b")
    result = accregex.search(mystr)
    if result is None:
        print('No match')
    else:
        print('I got one', result.group(1))
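The capturing example can be wrapped in a small function for testing. Note the assumptions: `find_accession` is a hypothetical helper, and AB123456 is an invented accession number used only because the pattern needs one or two capital letters followed by six to eight digits.

```python
import re

# Pattern from the slide: 1-2 capital letters followed by 6-8 digits
ACC_REGEX = re.compile(r'\b([A-Z]{1,2}\d{6,8})\b')

def find_accession(text):
    """Return the first accession-like token in text, or None."""
    result = ACC_REGEX.search(text)
    if result is None:
        return None
    return result.group(1)

# AB123456 is a made-up accession number for illustration
hit = find_accession('In this string is an accession AB123456 somewhere')

# Letters without the required digits do not match
miss = find_accession('In this string is an accession AB somewhere')
```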

Methods of re library
Compile a regex, base for the rest of the methods
    compile(pattern)
Find a match anywhere in the string
    search(string)
Find a match only in the beginning of the string
    match(string)
Split string on a pattern
    split(string)
Return all matches as list of strings
    findall(string)
Return string where matches are replaced with replacement string. count=0 means all occurrences
    sub(replacement, string, count=0)
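Each of the listed methods is shown here on a compiled regex object. The text and patterns are invented for the demonstration.

```python
import re

number_regex = re.compile(r'\d+')
text = '10 apples, 20 pears, 30 plums'

# search finds the first match anywhere; match only at the beginning
first_anywhere = number_regex.search(text).group(0)
at_start = number_regex.match(text).group(0)
not_at_start = number_regex.match('apples 10')       # None

# split cuts the string wherever the pattern matches
parts = re.compile(r',\s*').split(text)

# findall returns all matches as a list of strings
all_numbers = number_regex.findall(text)

# sub replaces matches; count=0 (the default) means all occurrences
all_replaced = number_regex.sub('N', text)
one_replaced = number_regex.sub('N', text, count=1)
```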