
Finding substrings

    my $sequence = "gatgcaggctcgctagcggct";
    # Does this string contain a start codon?
    if ($sequence =~ m/atg/) {
        print "Yes";
    } else {
        print "No";
    }

=~ is a binding operator and means: perform the following action on this variable. The action here, m/atg/, is a substring search, with "m" for "match" and the substring "atg".

If the substring occurs, the match returns true and the if block is executed. The match does not change the value of $sequence.

Finding substrings, repeated

    my $sequence = "gatgcaggctcgctagcggct";
    my $count = 0;
    while ($sequence =~ m/ggc/g) {
        $count++;
    }
    print "$count matches for ggc\n";

m//g: the 'g' option allows repeated matching, because the position of the last match is remembered and the next search continues from there.

Finding substrings, repeated

    my $sequence = "gatgcaggctcgctagcggct";
    my $codon = "ggc";
    my $count = 0;
    while ($sequence =~ m/$codon/g) {
        $count++;
    }
    print "$count matches for $codon\n";
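The counting loop above can also be written without an explicit loop. The following is only a sketch: count_matches is a hypothetical helper name, not something from the slides, and \Q...\E is added on the assumption that the search string should be treated as literal text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): count the non-overlapping
# occurrences of a literal substring in a sequence.
sub count_matches {
    my ($sequence, $codon) = @_;
    # \Q...\E quotes any regex metacharacters in $codon.
    # In list context m//g returns all matches; assigning that list to ()
    # and reading the assignment in scalar context gives the match count.
    my $count = () = $sequence =~ m/\Q$codon\E/g;
    return $count;
}

my $sequence = "gatgcaggctcgctagcggct";
print count_matches($sequence, "ggc"), " matches for ggc\n";   # 2 matches for ggc
```

Matches are counted without overlap, just as in the while loop: "aa" is found twice in "aaaa", not three times.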

Position after last match

    my $sequence = "gatgcaggctcgctagcggct";
    my $codon = "ggc";
    print "looking for $codon from 0\n";
    while ($sequence =~ m/$codon/g) {
        print "found, will continue from: ";
        print pos($sequence), "\n";
    }

Position after last match

    my $sequence = "gatgcaggctcgctagcggct";
    my $codon = "ggc";
    pos($sequence) = 10;
    print "looking for $codon from 10\n";
    while ($sequence =~ m/$codon/g) {
        print "found, will continue from: ";
        print pos($sequence), "\n";
    }
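Because pos() reports the offset just after a match, the start of each match can be derived from it. A minimal sketch, assuming a literal search string; match_starts and its optional third argument are hypothetical names introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start offset of every match,
# optionally beginning the search at offset $from.
sub match_starts {
    my ($sequence, $codon, $from) = @_;
    my @starts;
    pos($sequence) = $from // 0;    # where the first search begins
    while ($sequence =~ m/\Q$codon\E/g) {
        # pos() is the offset just after the match, so subtract its length.
        push @starts, pos($sequence) - length($codon);
    }
    return @starts;
}

print join(",", match_starts("gatgcaggctcgctagcggct", "ggc")), "\n";       # 6,17
print join(",", match_starts("gatgcaggctcgctagcggct", "ggc", 10)), "\n";   # 17
```

Starting at offset 10 skips the first occurrence, which matches what the second slide prints: only the match ending at position 20 is found.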

Replacing substrings

    my $sequence = "gatgcagaattcgctagcggct";
    print $sequence, "\n";
    # Replace the EcoRI site with '******'
    $sequence =~ s/gaattc/******/;    # gatgca******gctagcggct
    # Replace all the other characters with a space
    $sequence =~ s/[^*]/ /g;
    print $sequence, "\n";

Output:

    gatgcagaattcgctagcggct
          ******

Examples of regular expressions

    s/World/Wur/                  replaces World with Wur, making "Hello World" "Hello Wur"
    s/t/u/                        replaces the first 't' with 'u', "atgtag" becomes "augtag"
    s/t/u/g                       replaces all 't's with 'u's, "atgtag" becomes "auguag"
    s/[gatc]/N/g                  replaces all g, a, t and c's with N, "atgtag" becomes "NNNNNN"
    s/[^gatc]//g                  replaces all characters that are not g, a, t or c with nothing
    s/a{3}/NNN/g                  replaces all 'aaa' with 'NNN', "taaataa" becomes "tNNNtaa"
    m/sq/i                        matches 'sq', 'Sq', 'sQ' and 'SQ': case insensitive
    m/^SQ/                        matches 'SQ' at the beginning of the string
    m/^[^S]/                      matches strings that do not begin with 'S'
    m/att?g/                      matches 'attg' and 'atg'
    m/a.g/                        matches 'atg', 'acg', 'aag', 'agg', 'a g', 'aHg' etc.
    s/(\w+) (\w+)/$2 $1/          swaps two words, "one two" => "two one"
    m/atg(...)*?(ta[ag]|tga)/     matches an ORF
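A few of these patterns can be tried out directly. The sketch below wraps three of them in small subroutines; the names transcribe, swap_words and first_orf are hypothetical, and the test sequence is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, each applying one pattern from the list above
# to a copy of its argument.
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ s/t/u/g;                      # every t becomes u
    return $rna;
}

sub swap_words {
    my ($text) = @_;
    (my $swapped = $text) =~ s/(\w+) (\w+)/$2 $1/;    # exchange two words
    return $swapped;
}

sub first_orf {
    my ($sequence) = @_;
    # atg, then as few whole codons as possible, then a stop codon.
    return $sequence =~ m/(atg(?:...)*?(?:ta[ag]|tga))/ ? $1 : undef;
}

print transcribe("atgtag"), "\n";                # auguag
print swap_words("one two"), "\n";               # two one
print first_orf("ccatgaaatttgggtaacc"), "\n";    # atgaaatttgggtaa
```

In the ORF pattern the lazy quantifier *? makes the match end at the first in-frame stop codon rather than the last one.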

The matched strings are stored

    my $text = "This is a piece of text\n";
    print $text;
    my $word = 0;
    while ($text =~ /(\w+)\W/g) {
        $word++;
        print "word $word: $1\n";
    }

The matched strings are stored

    my $text = "one two";
    $text =~ /(\w+) (\w+)/g;
    print "word one:$1 ";
    print "word two:$2 ";
    print "complete string: $&";

The matched strings are stored

    my $sequence = "gatgcaggctcgctagcggct";
    while ($sequence =~ m/([acgt]{3})/g) {
        print "$1\n";
    }

Special characters

    \t      tab
    \n      newline
    \r      return (CR)
    \b      "word" boundary
    \B      not a "word" boundary
    \w      matches any single character classified as a "word" character (alphanumeric or _)
    \W      matches any non-"word" character
    \s      matches any whitespace character (space, tab, newline)
    \S      matches any non-whitespace character
    \d      matches any digit character, equiv. to [0-9]
    \D      matches any non-digit character
    \xhh    character with hex. code hh
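Several of these classes are often combined in one pattern. As an illustration only, the sketch below parses a made-up record line of the form "identifier, whitespace, length, bp"; the line format and the name parse_length_line are assumptions, not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line such as "seq1<TAB>614 bp" into an
# identifier and a length. Returns an empty list if the line does not fit.
sub parse_length_line {
    my ($line) = @_;
    # \w+  identifier made of word characters
    # \s+  any run of whitespace (space or tab)
    # \d+  one or more digits
    # \b   word boundary, so "bp" must not run on into more letters
    return ($1, $2) if $line =~ m/^(\w+)\s+(\d+)\s+bp\b/;
    return;
}

my ($id, $length) = parse_length_line("seq1\t614 bp");
print "ID=$id length=$length\n";    # ID=seq1 length=614
```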

Metacharacters

    ^      beginning of string
    $      end of string
    .      any character except newline
    *      match 0 or more times
    +      match 1 or more times
    ?      match 0 or 1 times; or shortest match
    |      alternative
    ( )    grouping, or storing
    [ ]    set of characters
    { }    repetition modifier
    \      quote or special

Repetition

    a*        zero or more a's
    a+        one or more a's
    a?        zero or one a's (i.e., optional a)
    a{m}      exactly m a's
    a{m,}     at least m a's
    a{m,n}    at least m but at most n a's
    a{0,n}    at most n a's

    my $mRNAsequence = "aaaauaaaaa";
    $mRNAsequence =~ m/a{2,}ua{3,}/;
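To see what the pattern m/a{2,}ua{3,}/ actually picks up, the matched text can be captured and returned. A small sketch; find_a_run_motif is a hypothetical name, and the extra test strings are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by a{2,}ua{3,}, that is,
# at least two a's, one u, then at least three a's; undef if absent.
sub find_a_run_motif {
    my ($mrna) = @_;
    return $mrna =~ m/(a{2,}ua{3,})/ ? $1 : undef;
}

print find_a_run_motif("aaaauaaaaa"), "\n";    # aaaauaaaaa
print find_a_run_motif("ggaauaaagg"), "\n";    # aauaaa
print "no match\n" unless defined find_a_run_motif("gauaaag");   # only one a before u
```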

Greediness

Pattern matching in Perl is greedy by default, which means that it will try to match as many characters as possible. This can be prevented by appending the ? operator to the quantifier.

    $sequence = "atgtagtagtagtagtag";
    # This will replace the entire string:
    s/atg(tag)*//
    # This will replace only "atg": the lazy (tag)*? matches as few tags as possible, here none:
    s/atg(tag)*?//
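The difference is easy to check by applying both substitutions to copies of the same string. A minimal sketch; strip_greedy and strip_lazy are hypothetical names used only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helpers: apply the greedy and the lazy substitution to a copy.
sub strip_greedy {
    my ($sequence) = @_;
    (my $copy = $sequence) =~ s/atg(tag)*//;     # takes as many tags as it can
    return $copy;
}

sub strip_lazy {
    my ($sequence) = @_;
    (my $copy = $sequence) =~ s/atg(tag)*?//;    # takes as few tags as it can
    return $copy;
}

my $sequence = "atgtagtagtagtagtag";
print "greedy: '", strip_greedy($sequence), "'\n";   # greedy: ''
print "lazy:   '", strip_lazy($sequence), "'\n";     # lazy:   'tagtagtagtagtag'
```

A lazy quantifier only extends when something after it in the pattern forces it to, which is why it consumes nothing here.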

    open SEQFILE, "example1.fasta" or die "Cannot open example1.fasta: $!";
    my $sequence = "";
    my $ID = <SEQFILE>;          # first line: the FASTA header
    while (<SEQFILE>) {          # remaining lines: the sequence
        chomp;
        $sequence .= $_;
    }
    close SEQFILE;
    print $ID;
    print $sequence, "\n";
    # SmaI restriction site (ccc^ggg)
    $sequence =~ s/cccggg/ccc^ggg/g;
    # PvuII restriction site (cag^ctg)
    $sequence =~ s/cagctg/cag^ctg/g;
    my @fragments = split '\^', $sequence;
    print "\n", "-" x 90, "\n";
    print "Digested sequence:\n", $sequence, "\n\n";
    print "-" x 90, "\n";
    print "Fragments:\n";
    foreach my $fragment (@fragments) {
        print $fragment, "\n";
        print "-" x 90, "\n";
    }

>BTBSCRYR Bovine mRNA for lens beta-s-crystallin...
tgcaccaaacatgtctaaagctggaaccaaaattactttctttgaagacaaaaactttcaaggccgccactatgacagcgattgcgactgtgcagatttcc
acatgtacctgagccgctgcaactccatcagagtggaaggaggcacctgggctgtgtatgaaaggcccaattttgctgggtacatgtacatcctaccccgg
ggcgagtatcctgagtaccagcactggatgggcctcaacgaccgcctcagctcctgcagggctgttcacctgtctagtggaggccagtataagcttcagat
ctttgagaaaggggattttaatggtcagatgcatgagaccacggaagactgcccttccatcatggagcagttccacatgcgggaggtccactcctgtaagg
tgctggagggcgcctggatcttctatgagctgcccaactaccgaggcaggcagtacctgctggacaagaaggagtaccggaagcccgtcgactggggtgca
gcttccccagctgtccagtctttccgccgcattgtggagtgatgatacagatgcggccaaacgctggctggccttgtcatccaaataagcattataaataa
aacaattggcatgc

Digested sequence:
tgcaccaaacatgtctaaagctggaaccaaaattactttctttgaagacaaaaactttcaaggccgccactatgacagcgattgcgactgtgcagatttcc
acatgtacctgagccgctgcaactccatcagagtggaaggaggcacctgggctgtgtatgaaaggcccaattttgctgggtacatgtacatcctacccc^g
gggcgagtatcctgagtaccagcactggatgggcctcaacgaccgcctcagctcctgcagggctgttcacctgtctagtggaggccagtataagcttcaga
tctttgagaaaggggattttaatggtcagatgcatgagaccacggaagactgcccttccatcatggagcagttccacatgcgggaggtccactcctgtaag
gtgctggagggcgcctggatcttctatgagctgcccaactaccgaggcaggcagtacctgctggacaagaaggagtaccggaagcccgtcgactggggtgc
agcttccccag^ctgtccagtctttccgccgcattgtggagtgatgatacagatgcggccaaacgctggctggccttgtcatccaaataagcattataaat
aaaacaattggcatgc

Fragments:
tgcaccaaacatgtctaaagctggaaccaaaattactttctttgaagacaaaaactttcaaggccgccactatgacagcgattgcgactgtgcagatttcc
acatgtacctgagccgctgcaactccatcagagtggaaggaggcacctgggctgtgtatgaaaggcccaattttgctgggtacatgtacatcctacccc

ggggcgagtatcctgagtaccagcactggatgggcctcaacgaccgcctcagctcctgcagggctgttcacctgtctagtggaggccagtataagcttcag
atctttgagaaaggggattttaatggtcagatgcatgagaccacggaagactgcccttccatcatggagcagttccacatgcgggaggtccactcctgtaa
ggtgctggagggcgcctggatcttctatgagctgcccaactaccgaggcaggcagtacctgctggacaagaaggagtaccggaagcccgtcgactggggtg
cagcttccccag

ctgtccagtctttccgccgcattgtggagtgatgatacagatgcggccaaacgctggctggccttgtcatccaaataagcattataaataaaacaattggc
atgc
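The same mark-then-split idea can be put into a reusable subroutine. This is a sketch under stated assumptions: digest is a hypothetical name, each site is given together with the offset of the cut inside it, and the short test sequence is made up. Like the slide's script, it would miss a site that overlaps a cut already marked for another enzyme.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $sequence at every occurrence of each site.
# %sites maps a recognition sequence to the offset of the cut within it,
# e.g. cccggg => 3 stands for ccc^ggg.
sub digest {
    my ($sequence, %sites) = @_;
    for my $site (keys %sites) {
        my $left  = substr($site, 0, $sites{$site});
        my $right = substr($site, $sites{$site});
        # Mark each cut position with ^, as in the script above.
        $sequence =~ s/\Q$site\E/${left}^${right}/g;
    }
    return split /\^/, $sequence;
}

my @fragments = digest("aacccgggttcagctgaa", cccggg => 3, cagctg => 3);
print "$_\n" for @fragments;
# prints:
# aaccc
# gggttcag
# ctgaa
```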

Exercises

6. Create a script to find the DNA fragments you get after cutting the sequence in the example1.fasta file with AluI and with AvaI.
7. Find the open reading frames in the example1.fasta sequence.
8. Translate the open reading frames to protein, using the standard genetic code from the Geneticcode database (