Computer Programming for Biologists
Class 8, Nov 28th, 2014
Karsten Hokamp
Overview
- Revision
- Subroutines
Revision - Hashes

    my %seq = ();                  # initialisation
    $freq{$char} = 0;              # storing a value
    $freq{$char}++;                # changing a value
    my $aa = $code{$codon};        # extracting
    foreach my $header (sort keys %seq) {
        my $seq = $seq{$header};
        …
    }
Hash Variables: Scalars vs Hash

Initialisation of values:

Scalars (one variable per base, each set to 0):
    my $A = 0;
    my $C = 0;
    my $G = 0;
    my $T = 0;

Hash (one variable for all bases):
    my %frequency = ();
Hash Variables: Scalars vs Hash

Increment:

Scalars:
    if ($char eq 'A') {
        $A++;
    } elsif ($char eq 'C') {
        $C++;
    } elsif ($char eq 'G') {
        $G++;
    } elsif ($char eq 'T') {
        $T++;
    }

Hash:
    $frequency{$char}++;

After reading a 'C', both $C and $frequency{'C'} hold 1.
Hash Variables: Scalars vs Hash

Counts after reading the whole sequence (%frequency): A 5, C 9, G 7, T 5
Hash Variables: Scalars vs Hash

Output:

Scalars:
    print "Frequency of A: $A\n";
    print "Frequency of C: $C\n";
    print "Frequency of G: $G\n";
    print "Frequency of T: $T\n";

Hash:
    foreach my $char (keys %frequency) {
        print "Frequency of $char: $frequency{$char}\n";
    }

Output (the order of the hash keys is not guaranteed unless they are sorted):
    Frequency of A: 5
    Frequency of C: 9
    Frequency of G: 7
    Frequency of T: 5
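The hash approach can be tried out with a short, self-contained script. This is a minimal sketch: the sequence in $dna is a made-up example, not data from the class, and the keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical example sequence (an assumption, not from the slides)
my $dna = 'GATTACAGGC';

# One hash replaces the four scalars $A, $C, $G and $T
my %frequency = ();

# Split the sequence into single characters and count each one
foreach my $char (split //, $dna) {
    $frequency{$char}++;
}

# Sorting the keys gives a stable output order
foreach my $char (sort keys %frequency) {
    print "Frequency of $char: $frequency{$char}\n";
}
```

The same loop works unchanged for protein sequences or any other alphabet, which is the main advantage over one scalar per character.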
Subroutines
- write your own functions
- run "programs" within a program
Subroutines: Definition

    sub name_of_routine {
        # optional arguments, e.g.
        my ($arg1, $arg2) = @_;    # @_: special array with arguments to subroutine

        # specify statements
        statement1;
        statement2;
        …

        # optionally return scalar or list, e.g.
        return $result1, $result2;
    }
Subroutines: Usage

    name_of_routine;
or
    $rv = &name_of_routine();
    $rv = &name_of_routine($arg1, $arg2);

- &: (optional) symbol indicating a subroutine
- $rv = : (optionally) capture the return value(s)
- ($arg1, $arg2): (optionally) submit a list of arguments
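Passing arguments and capturing a returned list can be sketched as follows. The subroutine name count_gc and the test sequence are hypothetical and only serve as an illustration.

```perl
use strict;
use warnings;

# Hypothetical subroutine: takes one sequence as argument and
# returns a list of two values (number of G/C bases, total length)
sub count_gc {
    my ($sequence) = @_;                 # copy the first argument out of @_
    my $gc = ($sequence =~ tr/GCgc//);   # tr with an empty replacement only counts
    return ($gc, length $sequence);
}

# Capture both return values in a list of scalars
my ($gc, $len) = count_gc('ATGCGC');
printf "GC content: %d of %d bases\n", $gc, $len;
```

Capturing the result in a list, as in `my ($gc, $len) = ...`, keeps both values; assigning the same call to a single scalar would keep only the last one.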
Subroutines: Example

    my $dna = shift;
    my $rev_comp = &reverse_complement($dna);
    print "reverse complement:\n".&format($rev_comp, 60);

    # subroutines:
    sub reverse_complement {
        my $seq = shift;                 # first argument
        my $out = reverse $seq;          # reverse the sequence
        $out =~ tr/acgtACGT/tgcaTGCA/;   # complement each base
        return $out;
    }

    sub format {
        my ($sequence, $width) …

- A copy of $dna is passed on
- Main area stays tidy; details are hidden towards the end of the script
- Code is re-usable and can be applied multiple times
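A complete, runnable variant of this example might look like the sketch below. The body of the formatting routine is not shown on the slide, so wrap_sequence is a hypothetical stand-in that simply breaks the sequence into lines of a given width; the input sequence is also made up.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA sequence
sub reverse_complement {
    my $seq = shift;                 # copy of the argument
    my $out = reverse $seq;          # scalar context: reverses the string
    $out =~ tr/acgtACGT/tgcaTGCA/;   # complement each base
    return $out;
}

# Hypothetical helper (assumption): wrap a sequence at $width characters
sub wrap_sequence {
    my ($sequence, $width) = @_;
    my $out = '';
    while (length $sequence) {
        # four-argument substr removes the chunk from $sequence
        $out .= substr($sequence, 0, $width, '') . "\n";
    }
    return $out;
}

my $dna      = 'ATGGCCTTA';          # hypothetical input
my $rev_comp = reverse_complement($dna);

print "original: $dna\n";            # unchanged: the subroutine worked on a copy
print "reverse complement:\n", wrap_sequence($rev_comp, 60);
```

Because each subroutine copies its arguments into its own variables, $dna in the main block is left untouched even though the copies are modified.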
Subroutines
- Can be placed anywhere in the program
- Normally all subroutines are located after the main block of text
- Definition starts with 'sub' followed by the name
- Statements are enclosed in curly brackets
- Text is normally written indented
- Optionally provide arguments
- Optionally return values
- Can be nested
Subroutines: Scenario
- Read in DNA sequence
- Translate in all six reading frames
- 6 x translation of a sequence
Subroutines: Inefficient coding

    # frame 1:
    $sequence = $orig_seq;
    # Block of translation code, e.g.
    $prot = '';
    while ($sequence) {
        $codon = substr $sequence, 0, 3, '';
        $aa = $genetic_code{$codon};
        $prot .= $aa;
    }
    print "translation: $prot\n";

    # frame 2:
    $sequence = $orig_seq;          # restore: the loop above has emptied $sequence
    # remove first base
    substr $sequence, 0, 1, '';
Subroutines: Inefficient coding

    # frame 1:
    $sequence = $orig_seq;
    Block of translation code

    # frame 2:
    $sequence = $orig_seq;          # restore: the translation block empties $sequence
    # remove first base
    substr $sequence, 0, 1, '';
    Block of translation code

    # frame 3: …
    # frame -1: …

The same block of code is specified 6 times.
Subroutines: More efficient coding

    # frame 1:
    $sequence = $orig_seq;
    &translate($sequence);

    # frame 2:
    # remove first base
    substr $sequence, 0, 1, '';
    &translate($sequence);

    # frame 3: …
    # frame -1: …

    sub translate {
        my $input = shift;
        …
        print "translation: $prot\n";
    }

The subroutine is used 6 times; the translation code is specified only once.
Subroutines: Alternative

    # frame 1:
    $sequence = $orig_seq;
    print &translate($sequence), "\n";    # print return value

    # frame 2:
    # remove first base
    substr $sequence, 0, 1, '';
    print &translate($sequence), "\n";

    # frame 3: …
    # frame -1: …

    sub translate {
        my $input = shift;
        …
        return $protein;                  # return translated sequence
    }
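Putting the pieces together, a six-frame translation could be sketched as below. Several things here are assumptions rather than the course solution: the genetic code hash is built from the standard code (NCBI translation table 1), incomplete trailing codons are dropped, codons with unknown characters are translated to 'X', and the helper six_frames and the input sequence are hypothetical.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons enumerated in the order T, C, A, G
my @bases       = qw(T C A G);
my $amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %genetic_code;
my $i = 0;
foreach my $b1 (@bases) {
    foreach my $b2 (@bases) {
        foreach my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($amino_acids, $i++, 1);
        }
    }
}

sub reverse_complement {
    my $seq = shift;
    my $out = reverse $seq;
    $out =~ tr/acgtACGT/tgcaTGCA/;
    return $out;
}

# Translate one reading frame; works on a copy of the argument
sub translate {
    my ($input) = @_;
    $input = uc $input;
    my $protein = '';
    while (length($input) >= 3) {                   # ignore an incomplete last codon
        my $codon = substr($input, 0, 3, '');
        $protein .= $genetic_code{$codon} // 'X';   # 'X' for unknown codons (assumption)
    }
    return $protein;
}

# Hypothetical helper: returns a hash frame number => protein
sub six_frames {
    my ($sequence) = @_;
    my $reverse = reverse_complement($sequence);
    my %frames;
    foreach my $offset (0 .. 2) {
        $frames{ $offset + 1 }    = translate(substr($sequence, $offset));
        $frames{ -($offset + 1) } = translate(substr($reverse, $offset));
    }
    return %frames;
}

my %frames = six_frames('ATGGCCTAA');               # hypothetical input
foreach my $frame (1, 2, 3, -1, -2, -3) {
    print "frame $frame: $frames{$frame}\n";
}
```

The translation code appears once and is called six times from a loop, so a later change (for example a different genetic code) has to be made in only one place.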
Subroutines: Other uses – recursion

    # calculate factorial value for a given number:
    my $fv = &fact(10);
    print "factorial 10 is $fv\n";

    sub fact {
        my $val = shift;
        my $fact = 1;                         # local to each call
        if ($val > 1) {
            $fact = $val * &fact($val-1);     # call subroutine within itself
        }
        return $fact;
    }
Subroutines: Other uses – recursion

    $val = 10;
    $fact = $val * &fact($val-1);

    $fact = 10 * fact(9);
    $fact = 10 * 9 * fact(8);
    …
    $fact = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * fact(1);
    $fact = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1;
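To watch the recursion unfold, the factorial routine can be extended to print a trace. This is only an illustrative sketch: the second parameter $depth is a hypothetical addition used for indentation and is not part of the routine shown in class.

```perl
use strict;
use warnings;

sub fact {
    my ($val, $depth) = @_;
    $depth //= 0;                        # first call: depth 0 (assumed extra parameter)
    my $indent = '  ' x $depth;
    print "${indent}fact($val) called\n";

    my $fact = 1;                        # base case: values of 1 or less return 1
    if ($val > 1) {
        # the subroutine calls itself with a smaller value
        $fact = $val * fact($val - 1, $depth + 1);
    }

    print "${indent}fact($val) returns $fact\n";
    return $fact;
}

my $fv = fact(4);
print "factorial 4 is $fv\n";
```

Each call gets its own $val and $fact because they are declared with my, so the nested calls do not overwrite each other's values.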
Subroutines
- reduce programming effort
- improve flow
- increase clarity
- enable recursion
Exercises

Extend your sequence analysis tool:
- add translation into protein as a subroutine to your script

Contact me with questions or problems.
Next week: Mock exam!