Programming and Perl for Bioinformatics Part III.


Basic Data Types
Perl has three basic data types:
- scalar
- array (list)
- associative array (hash)

Associative Arrays/Hashes
- List of scalar values (like an array)
- Elements referred to by key, not index number
- Elements stored as a list of key-value pairs
    %threeletter = ('A','ALA','V','VAL','L','LEU');
    #               key value key value key value
    print $threeletter{'A'};   # "ALA"
    print $threeletter{'L'};   # ?
- exists checks if a specific hash key exists
    if ($threeletter{'E'}) { print $threeletter{'E'}; }   # ?
    print "Exists\n"  if exists  $array{$key};
    print "Defined\n" if defined $array{$key};
    print "True\n"    if $array{$key};
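
The three tests on the slide above (exists, defined, plain truth) answer different questions. The sketch below is one way to see the difference; the helper name key_status and the extra keys X and Z are made up for illustration and are not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');
$threeletter{'X'} = undef;   # hypothetical key: present, but value is undefined
$threeletter{'Z'} = 0;       # hypothetical key: present and defined, but false

# Hypothetical helper: report how far a key gets through the three tests.
sub key_status {
    my ($key) = @_;
    return 'missing'   unless exists  $threeletter{$key};
    return 'undefined' unless defined $threeletter{$key};
    return 'false'     unless $threeletter{$key};
    return 'true';
}

foreach my $key (qw(A E X Z)) {
    print "$key: ", key_status($key), "\n";
}
```

Testing the value directly, as in if ($threeletter{'E'}), cannot tell a missing key apart from a key whose value is false, which is why exists is the safer check for membership.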

Getting all keys and values in a hash
    %threeletter = ('A','ALA','V','VAL','L','LEU');
- keys returns a list of all keys
- values returns a list of all values
- each returns one key-value pair each time it's called
    ($key, $val) = each %threeletter;
- Unlike an array, a hash is not an ordered list (the order of key-value pairs is determined by the Perl interpreter)
    foreach $k ( keys %threeletter ) { print $k; }
    # Might return, for instance, "A L V",
    # not "A V L" (the keys need not be sorted)
    foreach $v ( values %threeletter ) { print $v; }   # ?
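
Because the order returned by keys is not predictable, a common approach is to sort the keys before printing. The following is a minimal sketch using the same %threeletter hash; it also shows each inside a while loop, which is how each is usually called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# sort imposes a predictable (alphabetical) order on the keys
foreach my $k (sort keys %threeletter) {
    print "$k => $threeletter{$k}\n";
}

# each returns the next key-value pair, and an empty list when done;
# the order here is still whatever the interpreter chooses
while (my ($key, $val) = each %threeletter) {
    print "$key: $val\n";
}
```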

Associative Arrays
Some common functions:
    keys(%hash)           # returns a list of all the keys
    values(%hash)         # returns a list of all the values
    each(%hash)           # each time this is called, it returns
                          # a 2-element list consisting of the
                          # next key/value pair in the hash
    delete($hash{$key})   # removes the pair associated with $key
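
As a small illustration of delete from the list above, the sketch below removes one pair from the %threeletter hash used on the earlier slides. The variable name $removed is only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# delete removes the key-value pair and returns the value that was stored
my $removed = delete $threeletter{'V'};

print "Removed: $removed\n";
print "Keys left: ", join(' ', sort keys %threeletter), "\n";
print "V is gone\n" unless exists $threeletter{'V'};
```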

More on Perl
- Subroutines and Functions
    - A way to organize a program
    - Wrap up a block of code
    - Have a name
    - Provide a way to pass values to the block and report back the results
- Regular expression

Basics about Subroutines
    # define a subroutine
    sub myblock {
        my ($arg1, $arg2, $arg3, …, $argN) = @_;   # @_ is the special variable containing the args
        print "Please enter something: ";
    }
    # function call
    myblock($arg1, $arg2, …, $argN);
Example
    sub add8A {
        my ($rna) = @_;
        $rna .= "AAAAAAAA";
        return $rna;
    }
    # the original rna
    $rna = "CGAAUCUAGGAU";
    $longer_rna = add8A($rna);
    print "I added 8 As to $rna to get $longer_rna.\n";
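
The add8A example can be generalized a little to show how several arguments arrive through @_. This is only a sketch: the name add_polyA and the optional second argument are assumptions made for illustration, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of add8A: the number of As is an optional argument.
sub add_polyA {
    my ($rna, $count) = @_;           # arguments are copied out of @_
    $count = 8 unless defined $count; # default to 8 As, as in add8A
    return $rna . ('A' x $count);     # x repeats a string
}

my $rna        = "CGAAUCUAGGAU";
my $longer_rna = add_polyA($rna);
my $short_tail = add_polyA($rna, 3);

print "I added 8 As to $rna to get $longer_rna.\n";
print "I added 3 As to $rna to get $short_tail.\n";
```

Because the subroutine works on its own copy of the argument, the caller's $rna is unchanged after the call.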

More example
    sub denaturizing {
        my … = ();
        foreach $pairs … {
            ($A,$B) = split /\s/, $pairs;
            … = … $A, $B);
        }
    # templates are in the form "A B". Ex. "ACGT …

Variables Scope
A variable $a is used both in the subroutine and in the main part of the program.
    $a = 0;
    print "$a\n";
    sub changeA {
        $a = 1;
    }
    print "$a\n";
    changeA();
    print "$a\n";
The value of $a is printed three times. Can you guess what values are printed?
$a is a global variable
    use strict;
    my $a = 0;
    print "$a\n";
    sub changeA {
        my $a = 1;
    }
    print "$a\n";
    changeA();
    print "$a\n";
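
The contrast between the two versions can be seen side by side in one short script. This is a sketch under assumptions: the variable names $count and $value and the two subroutine names are invented here to keep the global and the lexical case apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Global (package) variable: every subroutine sees the same $count.
our $count = 0;

sub change_global {
    $count = 1;        # changes the one shared $count
    return;
}

# Lexical variable declared with my in the main program.
my $value = 0;

sub change_local {
    my $value = 1;     # a new, separate $value that exists only inside the sub
    return $value;
}

change_global();
change_local();

print "global after call:  $count\n";   # prints 1
print "lexical after call: $value\n";   # prints 0
```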

Ex: What would be the output?
    #!/usr/bin/perl -w
    $dna = 'AAAAA';
    $result = A_to_T($dna);
    print "I changed all the A's in $dna to T's and got $result\n\n";
    #############################################
    # Subroutines
    sub A_to_T {
        my($input) = @_;
        $dna = $input;
        $dna =~ s/A/T/g;
        return $dna;
    }
Output?
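
In the exercise, $dna inside A_to_T is the same global variable as $dna in the main program, so the call overwrites the original sequence before the print statement runs. One possible fix, sketched below, is to declare the working copy with my so that it is private to the subroutine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub A_to_T {
    my ($input) = @_;
    my $dna = $input;    # lexical copy, separate from the caller's $dna
    $dna =~ s/A/T/g;     # replace every A with T in the copy only
    return $dna;
}

my $dna    = 'AAAAA';
my $result = A_to_T($dna);

# prints: I changed all the A's in AAAAA to T's and got TTTTT
print "I changed all the A's in $dna to T's and got $result\n";
```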

Regular Expressions
- Regular expressions: a language for specifying text strings
- Regular expressions are a mechanism for specifying character patterns
- Useful for
    - Finding files by name
    - Finding text in a file
    - Finding (or not finding) interesting text in a string
    - Text-based search and replace
    - Finding and extracting text

Pattern Finding
Problem: find an ORF in a nucleotide sequence
- Look for start (ATG) and stop codons (TAA, TAG, TGA)
- Pattern search operator: m// or //
- $string =~ /pattern/ returns true if the pattern matches somewhere in $string, false otherwise
- Example:
    $dna = "GATGCCATGACACTGTTCA";
    if ($dna =~ /ATG/) {
        print "starting codon is there\n";
    }
    else {
        print "no starting codon!\n";
    }
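
The match above only says whether ATG occurs. A possible extension, sketched here, collects the position of every occurrence by matching repeatedly with the /g modifier. The subroutine name start_codon_positions is an assumption for this example; $-[0] is Perl's built-in offset of the start of the last successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 0-based positions of every ATG in a sequence.
sub start_codon_positions {
    my ($seq) = @_;
    my @positions;
    while ($seq =~ /ATG/g) {        # /g continues from the previous match
        push @positions, $-[0];     # offset where this match started
    }
    return @positions;
}

my $dna = "GATGCCATGACACTGTTCA";
my @found = start_codon_positions($dna);

if (@found) {
    print "start codon at position(s): @found\n";   # prints 1 6
}
else {
    print "no starting codon!\n";
}
```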

Regular Expressions
- Optional characters ?, * and +
    - /colou?r/ matches color or colour: ? (0 or 1)
    - /oo*h!/ matches oh! or ooh! or ooooh!: * (0 or more)
    - /o+h!/ matches oh! or ooh! or ooooh!: + (1 or more)
- Wild card .
    - /beg.n/ matches begin or began or begun
- The * and + operators are named after Stephen Cole Kleene (Kleene star, Kleene plus)
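
The quantifiers can be tried out directly. In this sketch the helper matches_whole is hypothetical; it anchors the pattern with \A and \z so that the whole word has to match, which makes the difference between * and + visible on the word "h!".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if the entire word matches the pattern, else 0.
sub matches_whole {
    my ($word, $pattern) = @_;
    return $word =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

my @cases = (
    [ qr/colou?r/, qw(color colour colouur) ],
    [ qr/oo*h!/,   qw(h! oh! ooooh!) ],
    [ qr/o*h!/,    qw(h! oh! ooooh!) ],      # * alone also accepts zero o's
    [ qr/o+h!/,    qw(h! oh! ooooh!) ],
    [ qr/beg.n/,   qw(begin began begun begn) ],
);

foreach my $case (@cases) {
    my ($pattern, @words) = @$case;
    foreach my $word (@words) {
        my $verdict = matches_whole($word, $pattern) ? 'match' : 'no match';
        print "$pattern  $word  $verdict\n";
    }
}
```

Note that /oo*h!/ and /o+h!/ accept exactly the same strings: one o followed by zero or more o's is the same as one or more o's.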

Common Regular Expressions
    \t, \n, \r   white-space characters: tab, newline, return
    \s           match a whitespace character
    x            the character 'x'
    .            any character except newline
    ^r           match r at the beginning of a line
    r$           match r at the end of a line
    r|s          match either r or s
    (r)          group characters (to be saved in $1, $2, etc.)
    [xyz]        character class; in this case, matches either an 'x', a 'y', or a 'z'
    [abj-oZ]     character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
    r*           zero or more r's, where r is any regular expression
    r+           one or more r's
    r?           zero or one r's (i.e., an optional r)
    {name}       expansion of the "name" definition (lex/flex syntax; not part of Perl regular expressions)
    rs           RE r followed by RE s (i.e., concatenation)
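
Several entries from the table can be combined in one pattern. The sketch below uses anchors, a character class, alternation and capture groups on a short made-up sequence; the subroutine name split_orf and the sample strings are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into start codon, body and stop codon.
# ^ and $ anchor the match, [ACGT] is a character class, | is alternation,
# and each (...) group is saved in $1, $2, $3.
sub split_orf {
    my ($seq) = @_;
    if ($seq =~ /^(ATG)([ACGT]*)(TAA|TAG|TGA)$/) {
        return ($1, $2, $3);
    }
    return;
}

my ($start, $body, $stop) = split_orf("ATGGCCTAA");
print "start=$start body=$body stop=$stop\n";   # start=ATG body=GCC stop=TAA

# \s matches any whitespace character, including tab and newline
my @fields = split /\s+/, "A\tALA\nV VAL";
print join('|', @fields), "\n";                 # A|ALA|V|VAL

# a character class with a range: 'k' is between 'j' and 'o', 'c' is not listed
print "k is in the class\n"     if 'k' =~ /^[abj-oZ]$/;
print "c is not in the class\n" if 'c' !~ /^[abj-oZ]$/;
```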

Exercise
Ex1:
    $dna = "AGGCTCGTACGACG";
    if ( $dna =~ /CT[CGT]ACG/ ) {
        print "I found the motif!!\n";   # ?
    }
Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
    $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
    ?

Exercise
Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
    $dna = "tatggagcctcctgaggctacagccacacctgagccactctcaga";
    if ($dna =~ m/(atg(...)*((tag)|(taa)|(tga)))/) {
        print $1, "\n";
    }
    else {
        print "does not exist!\n";
    }
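
The pattern in Ex2 uses a greedy (...)*, so when several in-frame stop codons are present it runs to the last one. A possible variant, sketched below, uses the non-greedy *? to stop at the first in-frame stop codon, (?:...) to group without capturing, and /i to accept upper or lower case. These three features are additions not covered on the slides, and the subroutine name find_orf is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first ORF (start codon through the first
# in-frame stop codon), or undef if there is none.
sub find_orf {
    my ($dna) = @_;
    if ($dna =~ /(atg(?:...)*?(?:taa|tag|tga))/i) {
        return $1;
    }
    return undef;
}

# The two sequences from the exercise slides differ in one base near the end.
my $with_stop    = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
my $without_stop = "tatggagcctcctgaggctacagccacacctgagccactctcaga";

foreach my $dna ($with_stop, $without_stop) {
    my $orf = find_orf($dna);
    if (defined $orf) {
        print "ORF: $orf\n";
    }
    else {
        print "does not exist!\n";
    }
}
```

In the second sequence every tga lies out of frame with respect to the atg, so no ORF is reported even though stop-codon triplets do occur in the string.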