1.0.1.8 – Introduction to Perl 12/12/2015 1.0.1.8.7 - Introduction to Perl - Searching and Replacing Text 1 1.0.1.8.7 Introduction to Perl Session 7 ·

Slides:



Advertisements
Similar presentations
Perl & Regular Expressions (RegEx)
Advertisements

Arrays A list is an ordered collection of scalars. An array is a variable that holds a list. Arrays have a minimum size of 0 and a very large maximum size.
Liang, Introduction to Java Programming, Ninth Edition, (c) 2013 Pearson Education, Inc. All rights reserved. 1 Chapter 9 Strings.
1 Perl Syntax: substitution s// and character replacement tr//
Lists Introduction to Computing Science and Programming I.
CS 330 Programming Languages 10 / 14 / 2008 Instructor: Michael Eckmann.
CSC1016 Coursework Clarification Derek Mortimer March 2010.
CS 898N – Advanced World Wide Web Technologies Lecture 8: PERL Chin-Chih Chang
COS 381 Day 19. Agenda  Assignment 5 Posted Due April 7  Exam 3 which was originally scheduled for Apr 4 is going to on April 13 XML & Perl (Chap 8-10)
CS 330 Programming Languages 10 / 11 / 2007 Instructor: Michael Eckmann.
7.1 Last time on: Pattern Matching. 7.2 Finding a sub string (match) somewhere: if ($line =~ m/he/)... remember to use slash( / ) and not back-slash Will.
CS 330 Programming Languages 09 / 30 / 2008 Instructor: Michael Eckmann.
8.1 Last time on: Pattern Matching. 8.2 Finding a sub string (match) somewhere: if ($line =~ m/he/)... remember to use slash( / ) and not back-slash Will.
More Regular Expressions. List/Scalar Context for m// Last week, we said that m// returns ‘true’ or ‘false’ in scalar context. (really, 1 or 0). In list.
CMPT 120 Lists and Strings Summer 2012 Instructor: Hassan Khosravi.
More on Regular Expressions Regular Expressions More character classes \s matches any whitespace character (space, tab, newline etc) \w matches.
Binary Search Trees continued Trees Draw the BST Insert the elements in this order 50, 70, 30, 37, 43, 81, 12, 72, 99 2.
Subroutines Just like C, PERL offers the ability to use subroutines for all the same reasons – Code that you will use over and over again – Breaking large.
Tutorial 14 Working with Forms and Regular Expressions.
Lecture 7: Perl pattern handling features. Pattern Matching Recall =~ is the pattern matching operator A first simple match example print “An methionine.
1 An Introduction to Perl Part 2 CSC8304 – Computing Environments for Bioinformatics - Lecture 8.
Programming Perl in UNIX Course Number : CIT 370 Week 4 Prof. Daniel Chen.
Regular Expressions in.NET Ashraya R. Mathur CS NET Security.
Input Validation with Regular Expressions COEN 351.
PMS /134/182 HEX 0886B6 PMS /39/80 HEX 5E2750 PMS /168/180 HEX 00A8B4 PMS /190/40 HEX 66CC33 By Adrian Gardener Date 9 July 2012.
Invitation to Computer Science, Java Version, Second Edition.
Subroutines and Files Bioinformatics Ellen Walker Hiram College.
CS 403: Programming Languages Fall 2004 Department of Computer Science University of Alabama Joel Jones.
(Stream Editor) By: Ross Mills.  Sed is an acronym for stream editor  Instead of altering the original file, sed is used to scan the input file line.
– Introduction to Perl 10/20/ Introduction to Perl - Lists and Arrays Introduction to Perl Session 3 · lists and arrays.
Meet Perl, Part 2 Flow of Control and I/O. Perl Statements Lots of different ways to write similar statements –Can make your code look more like natural.
– Introduction to Perl 10/25/ Introduction to Perl - Recipes and Idioms Introduction to Perl Session 8 · recipes and.
Copyright © 2010 Certification Partners, LLC -- All Rights Reserved Perl Specialist.
Prof. Alfred J Bird, Ph.D., NBCT -bird.wikispaces.umb.edu/ Office – McCormick 3rd floor.
CS 330 Programming Languages 10 / 07 / 2008 Instructor: Michael Eckmann.
– Intermediate Perl 6/3/ Intermediate Perl - sort/grep/map Intermediate Perl – Session 4 · scope · strict pragma · my.
Post-Module JavaScript BTM 395: Internet Programming.
By: Andrew Cory. Grouping Things & Hierarchical Matching Grouping characters – ( and ) Allows parts of a regular expression to be treated as a single.
Python uses boolean variables to evaluate conditions. The boolean values True and False are returned when an expression is compared or evaluated.
Working with Forms and Regular Expressions Validating a Web Form with JavaScript.
Computer Programming for Biologists Class 3 Nov 13 th, 2014 Karsten Hokamp
For more notes and topics VISIT: IMPLEMENTATION OF STACKS eITnotes.com.
Pattern Matching II. Greedy Matching When dealing with quantifiers, Perl’s pattern matcher is by default greedy. For example, –$_ = “Bob sat next to the.
R EGULAR E XPRESSION IN P ERL (P ART 1) Thach Nguyen.
1 Perl, Beyond the Basics: Regular Expressions, Subroutines, and Objects in Perl CSCI 431 Programming Languages Fall 2003.
Perl Day 4. Fuzzy Matches We know about eq and ne, but they only match things exactly We know about eq and ne, but they only match things exactly –Sometimes.
Copyright © 2003 ProsoftTraining. All rights reserved. Perl Fundamentals.
LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong. Adminstrivia Homework 4 not yet graded …
– Introduction to Perl 12/13/ Introduction to Perl - Strings, Truth and Regex Introduction to Perl Session 2 · manipulating.
Prof. Alfred J Bird, Ph.D., NBCT Door Code for IT441 Students.
A Few More Functions. One more quoting operator qw// Takes a space separated sequence of words, and returns a list of single-quoted words. –no interpolation.
C++ String Class nalhareqi©2012. string u The string is any sequence of characters u To use strings, you need to include the header u The string is one.
Unit 11 –Reglar Expressions Instructor: Brent Presley.
More Sequences. Review: String Sequences  Strings are sequences of characters so we can: Use an index to refer to an individual character: Use slices.
2.1 Scalar data - revision numeric e-14 ( = 6.35 × )‏ operators: + (addition) - (subtraction) * (multiplication) / (division)
Trinity College Dublin, The University of Dublin GE3M25: Computer Programming for Biologists Python, Class 2 Karsten Hokamp, PhD Genetics TCD, 17/11/2015.
Introduction to Programming the WWW I CMSC Winter 2004 Lecture 13.
8 1 String Manipulation CGI/Perl Programming By Diane Zak.
Perl for Bioinformatics Part 2 Stuart Brown NYU School of Medicine.
Introduction to Programming the WWW I CMSC Winter 2004 Lecture 8.
CSC 108H: Introduction to Computer Programming Summer 2011 Marek Janicki.
Lecture 19 Strings and Regular Expressions
CST8177 sed The Stream Editor.
Containers and Lists CIS 40 – Introduction to Programming in Python
CSC 108H: Introduction to Computer Programming
Context.
CSCI 431 Programming Languages Fall 2003
Algorithms Take a look at the worksheet. What do we already know, and what will we have to learn in this term?
Functions, Regular expressions and Events
Introduction to Computer Science
Presentation transcript:

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text Introduction to Perl Session 7 · global searches · context of =~ · replacement operator

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 2 Recap of substr() · we’ve seen how substr() can be used to manipulate a string · extract, insert, replace, remove · regions affected by substr() are defined by position, not content # get the first 5 characters substr($string,0,5); # insert “abc” at position 3 (after 3rd character) substr($string,3,0,”abc”); # replace first 5 characters with “abc” substr($string,0,5) = “abc”; # replace first 5 characters and retrieve what was replaced $old = substr($string,0,5,”abc”); # remove 5 characters at position 3 substr($string,3,5,””);

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 3 Recap of =~ · we used the operator =~, which binds a string to a pattern match, to test a string using a regular expression · so far, we only tested whether the regex matched · we will now look at how to extract · what was matched · a*b can match b, ab, aab, aaab,... · how many times a match was found · where in the string a match was found · we will also see how to use the replacement operator s/// to replace parts of a string which match a regex if( $string =~ /REGEX/ ) {... }

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 4 Capturing Matches · capture brackets are used to extract parts of a string that matched a regex · text captured is available via special variables $string = “sheep”; if ( $string =~ /e*p/ ) { # we know it matched, but we don’t know what part of $string matched } if ( $string =~ /(e*p)/ ) { # text within capture brackets available in pattern buffer $1 $matched = $1; print “$matched in $string matched”; } eep in sheep matched

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 5 Pattern Buffers · the pattern buffers $1, $2, $3 store the text matched within first, second, third,... set of capture brackets · $n is an empty string if no text matched $string = “53 big sheep”; if ( $string =~ /(\d+) \w+ (\w+)/ ) { ($number,$animal) = ($1,$2); print “saw $number $animal”; } saw 53 sheep

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 6 Pattern Buffers · buffers are reassigned values on each successful search · be careful when using $n, since values may become reset or go out of scope · $n defined until end of current code block or next successful search, which ever first · use special to determine number/location of submatches match start match end $string = “53 big sheep”; if ( $string =~ /(\d+) \w+ (\w+)/ ) { $string =~ /(pig)/; # pattern buffers not reset $string = ~ /(.ig)/; # pattern buffers reset ($number,$animal) = ($1,$2); print “saw $number $animal”; }

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 7 Bypassing Pattern Buffers · the match operator can return the matched text directly, depending on the context · in scalar context, =~ returns the number of captured matches · in list context, =~ returns the text of captured matches · we have already seen the use of =~ in scalar context · now we turn to =~ in list context $string = “53 big sheep”; # scalar context, no capture brackets – returns 0/1 match success my $result = $string =~ /\w/; $result  1

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 8 Match List Context · =~ will return the patterns that matched within the capture brackets · remember that the pattern buffers $1, $2, $3 will store the contents captured by the brackets · several special variables store pattern buffer result stores offsets of the end of each pattern match stores offsets of the start of each pattern match · $+ stores the last pattern match · $#- or $#+ stores the number of patterns matched · $n can be expressed as substr($string, $+[n], $+[n] - $-[n] ); $string = “53 big sheep”; = $string =~ /(\w)(\w)  qw(5 3 b)

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 9 $+ · three special variables help interrogate the search results $string = “ ”; = $string =~ /.([1-3]+)..([6-8]+)/; # $+ stores the last successfully matched subpattern print $+; 678 stores the positions of match starts of subpatterns # $-[0] holds the offset of start of the whole match stores the positions of match ends of subpatterns # $-[0] holds the offset of end of the whole match

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 10 Global Matching · so far, we’ve written a regular expression that may match multiple parts of interest in a string · we can find all match instances of a regular expression by using global matching · global matching is toggled using /g flag · in a list context, a global match will return all matches on a string to a pattern $clone = “M0123B03”; if ($clone =~ /(\w)(\d{4})(\w)(\d{2})/) { ($lib,$plate,$wellchr,$wellint) = ($1,$2,$3,$4); } $string = “53 big = $string =~  qw( i e e )

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 11 Example with /g · extracting all subsequences matching a regex # random 1000-mer $seq = make_sequence(bp=>"agtc",len=>1000); # all subsequences matching = $seq =~ /at.gc/g; sub make_sequence { = split("",$args{bp}); $seq = ""; for (1..$args{len}) { $seq.= } return $seq; } atcgc atagc atagc

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 12 /g with capture brackets · capture brackets can be used with /g to narrow down what is returned · if no capture brackets are used, /g behaves as if they flanked the whole pattern · /at.gc/g equivalent to /(at.gc)/g # random 1000-mer $seq = make_sequence(bp=>"agtc",len=>1000); # all subsequences matching = $seq =~ /at(.)gc/g; c a a

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 13 /g with multiple capture brackets · if you have multiple capture brackets in a /g match, each matched subpattern will be added to the list $string = “a1b2c3”; # on each iteration of the match two elements will be pushed onto the = $string =~ /(.)(.)/g; a 1 b 2 c 3

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 14 /g in scalar context · in scalar context, the global match returns 0 or 1 based on the success of the next match in the string · it keeps track of the previous match · used in conjunction with while $seq = make_sequence(bp=>"agtc",len=>1000); while ($seq =~ /(at.gc)/g) { $match = $1; print “matched $match”; } matched atcgc matched attgc matched atcgc

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 15 /g in scalar context · to determine where the match took place, use pos · pos $string returns the position after the last match $seq = make_sequence(bp=>"agtc",len=>1000); while ($seq =~ /(at.gc)/g) { $match = $1; $matchpos = pos $seq; print "matched $match at ",$matchpos-5,” around “,substr($seq,$matchpos-7,9); } matched atcgc at 106 around ccatcgccc matched atggc at 241 around atatggcga matched atggc at 271 around agatggctc matched attgc at 507 around tcattgcgc

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 16 Manipulating Search Cursor · pos($string) returns the current position of the search cursor · within a while loop, this is the position at the end of the last successful match · you can adjust the position of the cursor by changing the value of pos($string) · pos can act like an l-value (just like substr) $seq = make_sequence(bp=>"agtc",len=>10); while ($seq =~ /(..)/g) { print "matched $1 at “, pos $seq; # back up the cursor one character pos($seq)--; } attgatgatt matched at at 2 matched tt at 3 matched tg at 4 matched ga at 5... · adjusting cursor position is the only way to return overlapping search results · in this example, we return all pairs of adjacent bases in the string, not just abutting ones · a search finds pair bp[i]bp[i+1] and the cursor is at i+2 at the end of the search · to find bp[i+1]bp[i+2] we need to back the cursor up to i+1

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 17 Replacement Operator · we have seen how substr() can be used to replace subtext at specific position · what if we want to replace all occurrences of one substring with another? · we use s/REGEX/REPLACMENT/ · REPLACEMENT is not a regular expression – it is a string $seq = make_sequence(bp=>"agtc",len=>60); print $seq # replaces first substring matching “a” with “x” $seq =~ s/a/x/; print $seq; gtattgtgggaccttcctttcatcccgaagcattccgcgatgtggtccccggacctcagt gtxttgtgggaccttcctttcatcccgaagcattccgcgatgtggtccccggacctcagt # /g forces replacement everywhere $seq =~ s/a/x/g; print $seq; gtxttgtgggxccttcctttcxtcccgxxgcxttccgcgxtgtggtccccggxcctcxgt

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 18 Replacement Operator · s/// works nicely with capture brackets · here we refer to the successfully captured pattern buffer as $1 in the replacement string · s/// returns the number of replacements made $seq = make_sequence(bp=>"agtc",len=>40); print $seq $seq =~ s/(a)/($1)/g; cccgttaggctgtaccgaacaagtactaacaaagttacta cccgtt(a)ggctgt(a)ccg(a)(a)c(a)(a)gt(a)ct(a)(a)c(a)(a)(a)gtt(a)ct(a)

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 19 Replacement Operator · remember that the replacement string is not a regular expression, but a regular string which may incorporate $1, $2, etc $seq = make_sequence(bp=>"agtc",len=>40); print $seq; $seq =~ s/..(a)../..$1../g; print $seq; cccgtcaattgtttagtttactttaaaagtaacgaatttc cccg..a..tgt..a....a....a..a..a....a..tc

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 20 /e with Replacement Operator · the replacement operator has a allows you to execute the replacement string as if it were Perl code · in this example, the replacement is global, so it continues to replace all instances of \d · for each instance (a digit) it replaces it with 1+$1 (e.g. 1+2, 1+3, ) · before the replacement is made, it evaluates the expression (e.g. to yield 3, 4, 5...) $string = “12345”; $seq =~ s/(\d)/1+$1/eg; print $seq; 23456

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 21 Example of /e · replace all occurrences of a given basepair with a random base pair · /e is very powerful, but be diligent in its use · you are creating and evaluating Perl code at run time · some obvious security issues come to mind, if the code depends on user input $seq = make_sequence(bp=>"agtc",len=>40); print $seq; $seq =~ s/a/make_sequence(bp=>”agtc”,len=>1)/eg; print $seq; gtcccttgacaccatactggccggatacgtgagcccacga gtcccttggcgccattctggccgggttcgtgagcccgcgc

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 22 Example of /e · a common use of /e is to use sprintf to reformat the matched string · if you’re working for a dictatorship, you could use this censoring one-liner # replace all numbers with decimals with 3-decimal counterparts $seq =~ s/(\d+\.\d+)/sprintf(“%.3f”,$1)/eg; # replace 40 characters on left/right of a keyword # with [censored NNN characters] message $seq =~ s/(.{40}government.{40})/sprintf(“[censored %d characters]”,length($1))/eg;

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 23 Transliteration with tr/// · a quick and dirty replacement can be made with the transliteration operator, which replaces one set of characters with another · tr/SEARCHLIST/REPLACEMENTLIST/ · in this example, a  1 t  2 g  3 c  4 $seq = make_sequence(bp=>"agtc",len=>40); print $seq; $seq =~ tr/atgc/1234/; print $seq; ttgagtgatcagcgtgctcccgtaatggtcagaaaaacag

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 24 Transliteration with /d - deletion · you can use tr to delete characters · /d deletes found but unreplaced characters $seq = make_sequence(bp=>"agtc",len=>40); print $seq; $seq =~ tr/at//d; print $seq; ccgcgttgcgatgcttgattgaatttcagacccggcctgt ccgcggcggcggcgcccggccg print $seq; $seq =~ tr/gcat/12/d; print $seq; ggtcctccaacaggagtttacgttaatgattgtgcaaagg

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 25 Transliteration with /s - squashing · /s squashes repeated transliterated characters into a single instance · helpful to collapse spaces · if you do not provide a replacement list, then tr will squash repeats without altering rest of string $x = " "; $x =~ tr/1234/abcd/ # abbcccdddd $x =~ tr/1234/abcd/s # abcd $y = " "; $y =~ tr/ /_/s # 1_22_333_4444 $y =~ tr/ / /s # $y =~ tr/ //s # same as above $x = " "; $x =~ tr/0-9//s #

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text 26 Transliteration returns number of replacements · number of transliterations made is returned · use this to count replacements, or characters $x = " "; $cnt = $x =~ tr/1234/abcd/ # $x  abbcccdddd $cnt  10 $cnt = $x =~ tr/0-9// # $x unchanged $cnt  10 $y = "encyclopaedia"; $cnt = $y =~ tr/aeiou// # $y unchanged $cnt  6 # /c complements the search list – i.e., replace all non-vowel characters $cnt = $y =~ tr/aeiou//c # $y unchanged $cnt  7

– Introduction to Perl 12/12/ Introduction to Perl - Searching and Replacing Text Introduction to Perl Session 7 · you now know · context of match operator · replacing text with s/// · use of transliteration tr///