Perl: subroutines (for sorting)


1 Perl: subroutines (for sorting)

2 Good Programming Strategies for Subroutines

    #!/usr/bin/perl
    # example why globals are bad
    $one = <STDIN>;
    $two = <STDIN>;
    $max = &larger;
    print "$max\n";

    sub larger {
        if ($one > $two) { $one; } else { $two; }
    }

Because larger reads the globals $one and $two directly, it is tedious to reuse the same code for 2 different variables, such as $fred and $barney.

3

    #!/usr/bin/perl
    # example why globals are bad
    $one = <STDIN>;
    $two = <STDIN>;
    $max = &larger;
    print "$max\n";

    $fred = <STDIN>;      # hack to make "larger" work with 2 different variables
    $barney = <STDIN>;
    $keep_one = $one;     # redundant stuff
    $keep_two = $two;
    $one = $fred;
    $two = $barney;
    $max = &larger;
    print "$max\n";
    $one = $keep_one;
    $two = $keep_two;

    sub larger {
        if ($one > $two) { $one; } else { $two; }
    }
    # there HAS to be a better way

4 Arguments Passed to Subroutines

To pass parameters to a subroutine, list them in parentheses after the subroutine name in the call:

    $n = &larger(10, 15);
    $X = &larger($one, $two);

5 Arguments Passed to Subroutines

New automatic variable: @_
- parameters passed to a subroutine are automatically stored in this array, as many as are required
- recall array syntax: $_[0] is the first element of the array @_
  - do NOT confuse this with $_, the default variable
  - example: $i is a scalar, distinct from @i and $i[0], which both refer to the array
- scope: @_ is local to the subroutine
  - each call gets its own @_, holding the arguments of that call
  - the elements of @_ are aliases for the caller's arguments, so assigning to $_[0] changes the caller's variable; copy the arguments into "my" variables to keep changes local to the subroutine
- if a subroutine calls a subroutine, the local scope is maintained: there are two separate and distinct versions of @_
  - this is why recursion works
  - perl takes care of this for us; we just have to understand that it occurs
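The behaviour of @_ can be checked with a small experiment. This is a minimal sketch, assuming two hypothetical subroutines (double_in_place and double_copy) that are not part of the original slides. It suggests that writing to $_[0] changes the caller's variable, while copying @_ into a "my" variable first leaves the caller untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: modifies its argument through the @_ alias.
sub double_in_place {
    $_[0] = $_[0] * 2;    # $_[0] is an alias for the caller's variable
    return $_[0];
}

# Hypothetical helper: copies the argument first, so the caller is unaffected.
sub double_copy {
    my ($n) = @_;         # list assignment makes a private copy
    $n = $n * 2;
    return $n;
}

my $x = 5;
my $y = 5;
double_in_place($x);      # changes $x itself
double_copy($y);          # changes only the private copy
print "x = $x, y = $y\n"; # x = 10, y = 5
```

Copying the arguments with my (...) = @_; as the first line of a subroutine is the usual way to avoid accidental changes to the caller's data.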

6 Arguments Passed to Subroutines

We can rewrite our subroutine with parameter passing as:

    #!/usr/bin/perl
    $x = 5;
    $y = 18;
    $max = &larger($x, $y);    # call the subroutine
    print "$max";
    ###########################
    sub larger {
        if ($_[0] > $_[1]) {
            $_[0];    # first argument ($x)
        } else {
            $_[1];    # second argument ($y)
        }
    }

However, this is hard to read, write, check, and debug.

Another problem is what happens when an extra parameter is included: nothing in the name "larger" tells the caller that the subroutine only accepts 2 parameters.

    $x = larger(10, 15, 30);    # 30 will be ignored

7 Arguments Passed to Subroutines

- all variables in perl are global by default
- you can declare local (private) variables with "my"

    sub larger {
        my ($a, $b);       # same as: my $a; my $b;
        ($a, $b) = @_;     # list assignment
        # $a = $_[0];
        # $b = $_[1];
        if ($a > $b) { $a; } else { $b; }
        # note: the ; may be omitted after the last statement in a block
    }

8 Argument Passing to Subs

- The variables $a and $b are private ("scoped") to this code block only, that is, to the subroutine.
- They will not affect any other $a/$b in the program.
- Changes to $a/$b in other parts of the program will NOT affect $a/$b in this subroutine.
- The subroutine is modular and reusable: it can be placed into virtually any perl program and operate predictably.
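The scoping rule above can be illustrated with a short sketch. The variable $count and the names $first and $second are illustrative assumptions, chosen here instead of $a and $b so that the example does not touch the variables that sort uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 100;    # file-level variable (illustrative name)

sub larger {
    my ($first, $second) = @_;    # private to this call of larger
    my $count = 0;                # a different variable from the outer $count
    $count++;                     # only the inner $count changes
    return $first > $second ? $first : $second;
}

my $max = larger(7, 3);
print "max = $max, count = $count\n";    # max = 7, count = 100
```

Because every variable inside larger is declared with my, the subroutine can be pasted into another program without clashing with variables of the same name there.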

9 More condensed without comments

    sub larger {
        my ($a, $b);
        ($a, $b) = @_;
        if ($a > $b) { $a; } else { $b; }
    }

    # even simpler
    sub larger {
        my ($a, $b) = @_;    # subroutines often have this line
        if ($a > $b) { $a; } else { $b; }
    }

10 Variable-length Parameter Lists

- many traditional programming languages require subroutine parameter lists to be strictly typed: a predefined number and type of parameters
- the number of parameters may be checked in perl as well

    sub larger {
        if (@_ != 2) {    # @_ in scalar context: the number of arguments
            print "Warning: 2 arguments expected\n";
        }
        # ...
    }
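A runnable version of the argument-count check might look like the following sketch. It uses warn rather than print, and installs a warning handler that collects messages in a hypothetical @warnings array so the result can be inspected; both choices are assumptions made for this illustration, not part of the original slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @warnings;
# Collect warnings in an array instead of sending them to STDERR.
$SIG{__WARN__} = sub { push @warnings, @_ };

sub larger {
    if (@_ != 2) {    # @_ in numeric (scalar) context gives the argument count
        warn "Warning: 2 arguments expected, got " . scalar(@_) . "\n";
    }
    my ($first, $second) = @_;    # any extra arguments are ignored
    return $first > $second ? $first : $second;
}

my $ok    = larger(10, 15);        # correct call, no warning
my $extra = larger(10, 15, 30);    # one warning; 30 is ignored
print "$ok $extra\n";                                  # 15 15
print "warnings issued: " . scalar(@warnings) . "\n";  # warnings issued: 1
```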

11 Another way: Variable-length Parameter Lists

    #!/usr/bin/perl
    $max = &larger(3, 5, 10, 4, 5);

    sub larger {
        my ($largest) = shift @_;    # shift the first element off the left-hand side of the array
        foreach (@_) {               # default variable used ($_)
            if ($_ > $largest) { $largest = $_; }
        }
        $largest;
    }

12 Pragma: use strict;

A pragma is used to convey information to the perl compiler.

use strict;
- tells perl's compiler that it should enforce some good programming rules for this code block
- essentially forces you to declare all variables with "my"

-w (warnings switch, already discussed)
- similar to: use warnings;
- -w applies to all code (modules, subroutines, etc.), while use warnings; applies only to the file or block that contains it

    #!/usr/bin/perl -w
    $i =~ m/tag/;
    # warnings:
    # Name "main::i" used only once: possible typo at test.pl line 2.
    # Use of uninitialized value in pattern match (m//) at line 2.

13 use strict;

    #!/usr/bin/perl
    use strict;
    $i++;
    $i =~ m/tag/;
    # Global symbol "$i" requires explicit package name at ./strict.pl line 3.
    # Global symbol "$i" requires explicit package name at ./strict.pl line 4.
    # Execution of ./strict.pl aborted due to compilation errors.

For more information:

    perldoc warnings
    perldoc perllexwarn
    perldoc strict

use strict is more useful than it first appears; see the next example.

14

    #!/usr/bin/perl
    # maxBad.pl
    # Example program to show use of subroutines and strict
    use strict;

    my @numbers = (1, 10, 11, 3, 9, 8, 5, 3);    # array name assumed; it differs from "number" below
    my $max;
    $max = &max(@numbers);
    print "max = $max\n";
    if($number[1] == $max) {    # typo: @number was never declared
        print "The largest number is at position 1 in array\n";
    }

    sub max {
        my $large = shift @_;
        foreach my $i (@_) {
            if($i > $large) { $large = $i; }
        $large;
    }
    # one closing } is missing above

Errors reported by perl:

    Global symbol "@number" requires explicit package name at ./maxBad.pl line 12.
    Execution of ./maxBad.pl aborted due to compilation errors.
    Missing right curly or square bracket at ./maxBad.pl line 24, at end of line
    syntax error at ./maxBad.pl line 24, at EOF

15 Comment on Pragmas

    use warnings;
    use strict;

- Ideally these would be used from the beginning of every program that is longer than a few lines.
- It can be quite a challenge to take a large and complicated program and clean up all the warnings/errors.
- Adding these checks after the program is written and debugged forfeits their main benefit: reducing development time by finding "mistakes" early.

16 return Operator

return EXPRESSION
- immediately returns a value from a subroutine

    my @list = qw/TTT GTC CTG ATG GTA CGA/;
    my $index = &which_codon_is("ATG", @list);    # "ATG" is an assumed example argument

    sub which_codon_is {
        my ($this_codon, @list) = @_;
        foreach (0..$#list) {                     # indices of list
            if ($this_codon eq $list[$_]) {
                return $_;
            }
        }
        return(-1);    # -1 if not found
    }
    # return is optional on the last line; we could just put -1

17

    #!/usr/bin/perl
    # translate
    # Take input from STDIN,
    # convert DNA to AA's
    #
    # Modified -- to have function call another function

    $line = <>;                  # read and discard the first line
    while ($line = <>) {
        chomp($line);            # take care of new lines
        $sequence = $sequence.$line;
    }

    # amino acid hash
    # key/value pairs
    %aminos = (
        "TTT", "F", "TTC", "F", "TTA", "L", "TTG", "L",
        "CTT", "L", "CTC", "L", "CTA", "L", "CTG", "L",
        "ATT", "I", "ATC", "I", "ATA", "I", "ATG", "M",
        "GTT", "V", "GTC", "V", "GTA", "V", "GTG", "V",
        "TCT", "S", "TCC", "S", "TCA", "S", "TCG", "S",
        "CCT", "P", "CCC", "P", "CCA", "P", "CCG", "P",
        "ACT", "T", "ACC", "T", "ACA", "T", "ACG", "T",
        "GCT", "A", "GCC", "A", "GCA", "A", "GCG", "A",
        "TAT", "Y", "TAC", "Y", "TAA", ".", "TAG", ".",
        "CAT", "H", "CAC", "H", "CAA", "Q", "CAG", "Q",
        "AAT", "N", "AAC", "N", "AAA", "K", "AAG", "K",
        "GAT", "D", "GAC", "D", "GAA", "E", "GAG", "E",
        "TGT", "C", "TGC", "C", "TGA", ".", "TGG", "W",
        "CGT", "R", "CGC", "R", "CGA", "R", "CGG", "R",
        "AGT", "S", "AGC", "S", "AGA", "R", "AGG", "R",
        "GGT", "G", "GGC", "G", "GGA", "G", "GGG", "G",
    );

    $peptide = &tlate($sequence);
    print "$peptide\n";

    ############# Subroutines
    sub tlate {
        $seq = $_[0];
        $peptide = "";
        $length = length($seq);
        while ($length >= 3) {
            ($codon, $seq) = getCodon($seq);
            if ($aminos{$codon}) {
                $peptide = $peptide.$aminos{$codon};
            } else {
                $peptide = $peptide.".";    # just put in a stop for XXX, ATX, etc
            }
            $length = length($seq);
            #print "$length\r";
        }
        print "\n";
        return($peptide);
    }

    sub getCodon {
        $sequence = $_[0];
        $codon = substr($sequence, 0, 3);                 # look up codon
        $sequence = substr($sequence, 3, $length - 3);    # remove that codon
        #print "codon=$codon\n";
        return($codon, $sequence);
    }

18 Output

    ./subsubRoutine.pl
    Main program
    s1 a = 4 b = 6
    s2 a = 6 b = 10
    return1 = 60
    val =
    product =

19 Sort Subroutine

A sort subroutine might (incorrectly) be expected to take 2 values, compare those values, and return them in sorted order:

    # POOR example
    sub sort_sub {
        my $a = shift;
        my $b = shift;
        # my ($a, $b) = @_;    # same thing
        if ($a > $b) { return ($b, $a) } else { return ($a, $b) }
    }

20 Sort Subroutine Problems

- this subroutine may be called hundreds or thousands of times
- it is inefficient because it must
  - allocate variables
  - assign values
  - return 2 values
- alternative way
  - both $a and $b have already been allocated in the "calling" code block
  - return one coded value:
      -1 if $a < $b    # don't swap
       1 if $a > $b    # swap
       0 if order doesn't matter

Note: sorting is a special case that is performed so often that it has been highly optimized (that's why it gets all of this "special" attention).

21 numeric sort subroutine

    sub by_num {
        # expect $a and $b
        if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
    }
    # the ; is optional after the last statement in a code block

22 "sort" with custom subroutine

The custom sorting subroutine (without &) may be specified for the "sort" operator in perl. It assumes the -1, 1, 0 cases.

    #!/usr/bin/perl
    use strict;
    my @array = qw/ ... /;              # array names assumed
    my @sorted = sort by_num @array;

    sub by_num {
        if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
    }

Note: in "by_num" we don't need to declare $a and $b (even with strict). If we do declare them with "my", the sort will not work correctly.

This is an example of "pass by reference": the values of the array are passed to the subroutine by reference, so the actual values are not copied back and forth. That is why this works, and why changing $a changes the array.
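A complete, runnable instance of sorting with a named comparison subroutine could look like this sketch. The data in @numbers is illustrative and does not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1, 25);    # illustrative data

sub by_num {
    # $a and $b are set by sort; do not declare them with my
    if    ($a < $b) { -1 }
    elsif ($a > $b) {  1 }
    else            {  0 }
}

my @sorted = sort by_num @numbers;    # note: by_num is named without the &
print "@sorted\n";                    # 1 9 10 25 100
```

The original array is left in its original order; sort returns a new, sorted list.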

23 sort Syntax (pg 217)

    sort SUBNAME LIST
    sort BLOCK LIST
    sort LIST

Note: this is a special case where the subroutine is called WITHOUT the &.

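24 Example

    #!/usr/bin/perl
    # sort.pl
    @array = (9, 8, 7, 1, 2, 3, 9, 69,
              # ... (remaining values elided)
             );
    @sorted = sort by_num @array;      # array names assumed
    print "sorted @sorted\n";

    sub by_num {
        print "$a $b\n";
        if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
    }
    # example -- sort-brok.pl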

25 Shortcut to Sorting

Sorting like this is so common that a special operator, <=>, exists to replace:

    if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }

Therefore, we can also replace the whole subroutine with a block:

    @sorted = sort { $a <=> $b } @array;

Descending:

    @sorted = sort { $b <=> $a } @array;

cmp can be used for string comparisons, but that is how the sort operator already sorts by default.
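The difference between numeric and string ordering is easy to see side by side. The following is a small sketch with illustrative data; the array names are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (10, 9, 100, 1, 25);    # illustrative data

my @ascending  = sort { $a <=> $b } @values;    # numeric, smallest first
my @descending = sort { $b <=> $a } @values;    # numeric, largest first
my @default    = sort @values;                  # default sort compares as strings
my @with_cmp   = sort { $a cmp $b } @values;    # explicit string comparison

print "ascending:  @ascending\n";     # 1 9 10 25 100
print "descending: @descending\n";    # 100 25 10 9 1
print "default:    @default\n";       # 1 10 100 25 9
print "with cmp:   @with_cmp\n";      # 1 10 100 25 9
```

The default order places 100 before 25 because the values are compared character by character, which is why numeric data needs <=>.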

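26 Example Profile

    #!/usr/bin/perl
    # sort-prof.pl (for profile)
    #
    # Note how if I change the value of $a, it changes in the original array
    use strict;
    my @array = qw/ ... /;              # array names assumed
    my @sorted = sort by_num @array;
    print "@array\n";
    print "@sorted\n";

    sub by_num {
        print "$a $b\n";
        if ($a == 5) { $a = 55 }
        if ($a < $b) { -1 } elsif ($a > $b) { 1 } else { 0 }
    }

This is for demonstration only: perldoc -f sort states that the values being compared should not be modified.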

27 Example Profile

28 Sorting a Hash

    #!/usr/bin/perl
    use strict;
    my %score = ("tim" => 195, "tracy" => 205, "indy" => 30);

    # sort keys of hash, based on values of hash
    my @sorted = sort by_score keys %score;    # array name assumed
    foreach (@sorted) {
        print "$_ $score{$_}\n";
    }

    sub by_score { $score{$b} <=> $score{$a} }    # note descending order

    # tracy 205
    # tim 195
    # indy 30
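When two keys have the same value, their relative order after the sort above is not guaranteed. One common way to handle this, sketched below, is to fall back to a string comparison of the keys. The extra entry "ann" and the array name @ranked are hypothetical additions for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "ann" is a hypothetical extra entry that ties with "tim".
my %score = (tim => 195, tracy => 205, indy => 30, ann => 195);

# Sort by score, highest first; break ties alphabetically by name.
my @ranked = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;

foreach my $name (@ranked) {
    print "$name $score{$name}\n";
}
# tracy 205
# ann 195
# tim 195
# indy 30
```

The or works because <=> returns 0 for equal scores, so the second comparison is only evaluated for ties.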

29 Regular Expression Example

    #!/usr/bin/perl
    $i = " ";
    # match 1 digit, dash, 2 digits, dash, and 2 digits
    if ($i =~ m/\d-\d\d-\d\d/) { print "$i\n"; print "$&\n"; }

    # But what happens if day is 1 digit
    # match both cases
    $i = "4-1-69";
    if ($i =~ m/\d-(\d\d|\d)-\d\d/) { print "$i\n"; print "$&\n"; }

    # Match 1 or more digits
    # But, this also matches 100 for day
    $i = "4-1-69";
    if ($i =~ m/\d-\d+-\d\d/) { print "$i\n"; print "$&\n"; }

    # What about 2 digit years VS 4 digit years
    # Careful, this will match 19 before 69
    $i = " ";
    if ($i =~ m/\d-\d+-\d\d|\d\d\d\d/) { print "$i\n"; print "$&\n"; }

    #
    $i = " ";
    if ($i =~ m/\d-\d+-(\d\d\d\d|\d\d)/) { print "$i\n"; print "$&\n"; }

    # What about 2 digit months
    # But this will match 13
    $i = " ";
    if ($i =~ m/\d+-\d+-\d\d\d\d|\d\d/) { print "$i\n"; print "$&\n"; }

    # How about this
    # Doesn't work, because it matches 1
    $i = " ";
    if ($i =~ m/1|2|3|4|5|6|7|8|9|10|11|12-\d+-\d\d\d\d|\d\d/) { print "$i\n"; print "$&\n"; }

    # Added ^ and $
    # Added parentheses for grouping
    # Will NOT match because of 14
    $i = " ";
    if ($i =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-\d+-(\d\d\d\d|\d\d)$/) { print "$i\n"; print "$&\n"; }

    # Now it will match
    $i = " ";
    if ($i =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-\d+-(\d\d\d\d|\d\d)$/) { print "$i\n"; print "$&\n"; }

    # Note limit on days to 1 or 2 digits
    # also, storing results into vars
    $i = " ";
    if ($i =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-(\d{1,2})-(\d\d\d\d|\d\d)$/) {
        print "$i\n";
        print "$&\n";
        $month = $1;
        $day = $2;
        $year = $3;
        print "$month $day $year\n";
    }
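The final pattern can be tried against a few inputs. This is a minimal sketch: the subroutine name parse_date and the test strings are hypothetical and chosen only to exercise the matching and non-matching cases described in the comments above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the final pattern from the slide.
sub parse_date {
    my ($text) = @_;
    if ($text =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-(\d{1,2})-(\d\d\d\d|\d\d)$/) {
        return ($1, $2, $3);    # month, day, year
    }
    return;                     # empty list when the string does not match
}

# Illustrative inputs: two valid dates, a month of 14, and a 3-digit day.
foreach my $date ("4-1-69", "12-25-1969", "14-1-1969", "4-100-69") {
    my @parts = parse_date($date);
    if (@parts) {
        print "$date -> month $parts[0], day $parts[1], year $parts[2]\n";
    } else {
        print "$date -> no match\n";
    }
}
```

The pattern only checks the shape of the string, so a day such as 45 is still accepted; range checks on the captured values would have to be done separately.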

30

31 End