Computer Programming for Biologists, Class 9, Dec 4th, 2014, Karsten Hokamp

1 Computer Programming for Biologists, Class 9, Dec 4th, 2014, Karsten Hokamp, http://bioinf.gen.tcd.ie/GE3M25

2 Overview: mock exam, revision, variable scope, extensions, file handles

3 Mock Exam: http://bioinf.gen.tcd.ie/GE3M25/programming/exam

4 Revision - Subroutines
    my $prot = &translate($seq);    # call

    sub translate {                 # definition
        my $seq = shift @_;         # parameters
        …
        return $prot;               # return value(s)
    }
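The call / parameter / return pattern on this slide can be sketched with a smaller, complete subroutine. The name gc_content and the test sequence are assumptions made for illustration and are not part of the course material; the & prefix used on the slide is optional when the call has parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: percentage of G and C in a sequence
sub gc_content {
    my $seq = shift @_;                  # parameter
    return 0 if length($seq) == 0;       # avoid division by zero
    my $gc = ($seq =~ tr/GCgc//);        # tr without replacement only counts
    return 100 * $gc / length($seq);     # return value
}

my $percent = gc_content('ATGCGC');      # call
printf "GC content: %.1f%%\n", $percent;
```

The variable $seq inside the subroutine is a private copy, so the caller's data is not changed.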

5 scope: The area of the script in which a variable is visible. Different blocks are defined by: main program, subroutines, loops, branches → different namespaces

6-7 scope: main part and blocks
    my $global_1;                      # main part
    …
    while (my $input = <>) {           # block (loop)
        statement1;
    }
    if (condition) {                   # block (branch)
        my $local_1 = 'xxx';
    }
    sub subroutine {                   # block (subroutine)
        my $local_1;
        foreach my $nuc (@bases) {     # block (loop inside the subroutine)
            statement2;
        }
    }

8 scope
Tip: Keep local variables within subroutines → explicitly pass content between main part and subs, e.g.:
    $protein = &translate($seq);
$seq is the value passed into the subroutine; $protein receives the value returned from the subroutine.
→ avoid accidentally changing global variables

9 scope
Wrong:
    # extract header
    while (my $input = <>) {
        if ($input =~ /^>(.+)/) {
            my $header = $1;    # visible only inside this if-block
        }
        print "sequence ID: $header\n";
    }
Error message: Global symbol "$header" requires explicit package name

10 scope
Correct:
    # initialize global variable
    my $header = '';
    # extract header
    while (my $input = <>) {
        if ($input =~ /^>(.+)/) {
            $header = $1;
        }
        print "sequence ID: $header\n";
    }
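The scope rule behind the wrong and correct versions can be tried without any input file. The following is a minimal sketch; the @lines array with made-up FASTA lines and the variable name $inner are assumptions for illustration.

```perl
use strict;
use warnings;

# made-up input lines standing in for <>
my @lines = (">seq1 test sequence\n", "ACGT\n", "GGCC\n");

my $header = '';                 # declared in the main part, visible below
foreach my $line (@lines) {
    if ($line =~ /^>(.+)/) {
        my $inner = $1;          # exists only inside this if-block
        $header = $inner;        # assigns to the outer $header
    }
    # $inner is no longer visible here
}
print "sequence ID: $header\n";
```

Using $inner after the closing brace of the if-block would stop compilation under "use strict" with the error shown on the slide.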

11-12 course project, common errors: (scope)
    my $dna = '';
    # read input
    while (my $input = <>) {
        # remove line ending
        chomp $input;
        # append to sequence string
        my $dna .= $input;    # different variables: "my" declares a new $dna inside the loop
    }

13 course project, common errors: (scope)
    my $dna = '';
    # read input
    while (my $input = <>) {
        # remove line ending
        chomp $input;
        # append to sequence string
        $dna .= $input;       # same variable: appends to the $dna declared above
    }
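The effect of the extra "my" can be made visible by running both variants side by side. This is a sketch under assumptions: the input lines are made up, and the names $wrong and $right are chosen only to label the two cases.

```perl
use strict;
use warnings;

my @input = ("ACGT\n", "TTGA\n");   # made-up input lines

my $wrong = '';
my $right = '';
foreach my $line (@input) {
    chomp(my $copy = $line);        # remove line ending from a copy

    my $wrong = '';                 # new variable that hides the outer $wrong
    $wrong .= $copy;                # lost at the end of each loop iteration

    $right .= $copy;                # appends to the outer $right
}
print "wrong: '$wrong'\n";
print "right: '$right'\n";
```

After the loop, the outer $wrong is still empty, while $right holds the concatenated sequence.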

14 course project, common errors: (arrangement)
    # print output in chunks of 60 bp width
    while ($dna) {
        $out = substr $dna, 0, 60, '';    # empties $dna
        print "$i $out\n";
        $i += length($out);
    }
    # change string to array:
    my @chars = split //, $dna;           # $dna is already empty here
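One possible way around the arrangement problem is to split first and to let the chunk loop consume a copy. The sketch below assumes a made-up 160 bp sequence; the names $rest and @chunks are introduced here for illustration only.

```perl
use strict;
use warnings;

my $dna = 'ACGT' x 40;              # made-up 160 bp sequence

# change string to array first, while $dna still holds the sequence
my @chars = split //, $dna;

# print output in chunks of 60 bp width, consuming a copy instead of $dna
my $rest = $dna;
my $i = 1;
my @chunks;
while (length $rest) {
    my $out = substr $rest, 0, 60, '';   # 4-argument substr removes the chunk from $rest
    print "$i $out\n";
    push @chunks, $out;
    $i += length($out);
}
```

Testing length $rest instead of $rest itself also keeps the loop correct if the remaining string happens to be '0'.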

15 course project, considerations: order is important
Reverse complement first, then translate:
    # form the reverse complement
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
    # translate
    my $protein = &translate($dna);
Translate first, then reverse complement:
    # translate
    my $protein = &translate($dna);
    # form the reverse complement
    $dna = reverse($dna);
    $dna =~ tr/ACTG/TGAC/;
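Why the order matters can be seen with a tiny stand-in for translate. In this sketch, first_codon is a hypothetical helper that only returns the first three bases, and the sequence is made up; a real translate subroutine would show the same effect on the protein.

```perl
use strict;
use warnings;

# Hypothetical stand-in for translate(): returns the first codon only
sub first_codon {
    my $seq = shift @_;
    return substr($seq, 0, 3);
}

my $dna = 'ATGAAACCC';               # made-up sequence

# analysis before forming the reverse complement
my $before = first_codon($dna);

# form the reverse complement
my $revcomp = reverse $dna;          # scalar assignment: reverses the string
$revcomp =~ tr/ACTG/TGAC/;

# analysis after forming the reverse complement
my $after = first_codon($revcomp);

print "forward strand: $before\n";
print "reverse strand: $after\n";
```

The two calls see different strands, so the result depends on whether the sequence was modified before or after the call.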

16-17 course project, considerations: make actions optional
    # define variables:
    my $do_revcomp = '';
    my $do_composition = '';
    my $do_translate = '';
    # read sequence
    my $dna = '';
    …
    # calculate GC content
    if ($do_composition) {
        &composition($dna);
    }
    # form the reverse complement
    if ($do_revcomp) {
        $dna = reverse($dna);
        $dna =~ tr/ACTG/TGAC/;
    }
    # translate the protein
    if ($do_translate) {
        &translate($dna);
    }
To switch an action on, set its variable to a true value, e.g.:
    my $do_revcomp = '1';

18 course project
Work on your course project (sequanto.pl):
1. fix bugs
2. add "choice" variables at the top
3. move code blocks into subroutines (GC-content, composition)
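Moving a code block into a subroutine (step 3) might look like the following. This is only a sketch: what composition does in the actual course project is not shown on the slides, so counting each character of the sequence is an assumption.

```perl
use strict;
use warnings;

# Hypothetical composition(): counts how often each character occurs
sub composition {
    my $seq = shift @_;
    my %count;
    $count{$_}++ foreach split //, uc $seq;
    return %count;
}

my %comp = composition('ACGTTGCA');   # made-up sequence
foreach my $base (sort keys %comp) {
    print "$base: $comp{$base}\n";
}
```

All working variables stay inside the subroutine, and the main part only receives the returned counts.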

19 Control through options: The Perl module Getopt::Long allows processing of command line options.

20-21 Control through options
$ man Getopt::Long
NAME
    Getopt::Long - Extended processing of command line options
SYNOPSIS
    use Getopt::Long;
    my $data = "file.dat";
    my $length = 24;
    my $verbose = '';
    $result = GetOptions ("length=i" => \$length,    # numeric
                          "file=s"   => \$data,      # string
                          "verbose"  => \$verbose);  # flag
In each entry, "length" is the name of the parameter, "=i" is the type of argument, and \$length is the reference to the variable.

22-25 Control through options
Command line parameters (with arguments):
    perl test.pl -verbose -length 20 -file input.txt
→ $verbose set to '1', $length set to '20', $data set to 'input.txt'
Reorder:
    perl test.pl -file input.txt -length 20 -verbose
Long version:
    perl test.pl --verbose --length=20 --file=input.txt
Short version:
    perl test.pl -v -l 20 -f input.txt
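The different spellings can be compared without leaving the script by parsing argument lists with GetOptionsFromArray, which Getopt::Long exports on request. The wrapper parse_args is a hypothetical helper for this sketch; the short forms rely on Getopt::Long's default handling of unique abbreviations.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse one argument list and return the settings
sub parse_args {
    my @args = @_;
    my $data    = 'file.dat';
    my $length  = 24;
    my $verbose = '';
    GetOptionsFromArray(\@args,
        'length=i' => \$length,     # numeric
        'file=s'   => \$data,       # string
        'verbose'  => \$verbose,    # flag
    ) or die "Error in command line arguments\n";
    return ($verbose, $length, $data);
}

foreach my $cmd ('-verbose -length 20 -file input.txt',
                 '--verbose --length=20 --file=input.txt',
                 '-v -l 20 -f input.txt') {
    my ($verbose, $length, $data) = parse_args(split ' ', $cmd);
    print "$cmd => verbose=$verbose length=$length file=$data\n";
}
```

All three argument lists should lead to the same settings.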

26-27 Control through options
To allow the following execution:
    perl sequanto.pl -gc -revcomp test.fa
Try this in your script:
    use Getopt::Long;                              # 1. Import module
    my $do_translation = '';                       # 2. Initialise variables
    my $do_revcomp = '';
    &GetOptions ("translate" => \$do_translation,  # 3. Call function: define flags,
                 "revcomp"   => \$do_revcomp,      #    associate with referenced variables
                );
Note: the -gc flag in the call above still needs its own entry in GetOptions.
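A sketch of how the example call could be handled once a gc entry is added. The command line is simulated with an array and GetOptionsFromArray so that the snippet runs on its own; the variable name $do_gc is an assumption.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# simulated command line for: perl sequanto.pl -gc -revcomp test.fa
my @args = ('-gc', '-revcomp', 'test.fa');

my $do_gc          = '';
my $do_revcomp     = '';
my $do_translation = '';
GetOptionsFromArray(\@args,
    'gc'        => \$do_gc,            # assumed entry for the -gc flag
    'revcomp'   => \$do_revcomp,
    'translate' => \$do_translation,
) or die "Error in command line arguments\n";

# recognised options are removed from @args; the file name is left over
print "gc=$do_gc revcomp=$do_revcomp translate=$do_translation files=@args\n";
```

In a real script, GetOptions works on @ARGV in the same way, so the remaining file name can then be read with <> or opened explicitly.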

28 Data Input/Output: Redirect output
Program prints output to screen:
    $ translate.pl seq.fa
    MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII…
Redirect into file:
    $ translate.pl seq.fa > seq.aa
Append to file:
    $ translate.pl seq.fa >> seq.aa

29 Data Input/Output: Filehandles
Reading from STDIN (or from files named on the command line), default input stream:
    my $in = <>;
Use filehandle to read input from a specific file:
    open (IN, 'input.txt');        # open file for reading; IN is the filehandle
    while (my $in = <IN>) { … }    # read content line by line
    close IN;                      # close filehandle when finished

30 Data Input/Output: Filehandles
Syntax:
    open (FH, filename);         # open file for reading
    open (FH, "< filename");     # open file for reading
    open (FH, "> filename");     # open file for writing
    open (FH, ">> filename");    # append to file
    close FH;                    # empties buffer
Write and append mode will create files if necessary.
Write mode will empty file first.

31 Data Input/Output: Writing to files
    $file_name = 'results.txt';
    if ($write_modus eq 'append') {
        # append to file (creates file if necessary)
        open (OUT, ">>$file_name");
    } else {
        # normal write (erases content if file exists)
        open (OUT, ">$file_name");
    }
    print OUT 'some text';
    close OUT;    # output might not appear until FH is closed!

32 Data Input/Output: Error check!
Always test if an important operation worked out:
    open (IN, $file_name) or die "Can't read from $file_name: $!";
    open (OUT, ">>$file_out") or die "Can't append to $file_out: $!";
    # Note: special variable $! contains error message
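Write mode, append mode and the error check can be combined in one runnable sketch. Note the swapped-in style: it uses the three-argument form of open with lexical filehandles ($out, $in) instead of the bareword handles on the slides, and it writes into a temporary directory created with File::Temp; the file name and contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $file_out = "$dir/results.txt";

# write mode: creates the file or empties an existing one
open(my $out, '>', $file_out) or die "Can't write to $file_out: $!";
print $out "first line\n";
close $out or die "Can't close $file_out: $!";

# append mode: adds to the end of the file
open($out, '>>', $file_out) or die "Can't append to $file_out: $!";
print $out "second line\n";
close $out or die "Can't close $file_out: $!";

# read the file back line by line
open(my $in, '<', $file_out) or die "Can't read from $file_out: $!";
my @lines = <$in>;
close $in;
print @lines;
```

Because the mode is a separate argument, a file name that starts with > or < cannot change how the file is opened.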

33 Data Input/Output: Reading from Filehandle
One or more file names are specified after the program, loop over each argument:
    foreach my $file (@ARGV) {      # special variable @ARGV
        open (IN, $file) or die;    # open filehandle
        while (my $in = <IN>) {     # read file line by line
            # do something
        }
        close IN;                   # close filehandle
    }
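The loop over several files can be tried without preparing input by hand. In this sketch the array @files stands in for @ARGV, the two small FASTA files are made up and created in a temporary directory, and counting lines is just a placeholder for "do something".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# create two made-up input files
my $dir = tempdir(CLEANUP => 1);
my @files;
foreach my $n (1, 2) {
    my $name = "$dir/seq$n.fa";
    open(my $fh, '>', $name) or die "Can't write to $name: $!";
    print $fh ">seq$n\nACGT\nGG\n";
    close $fh or die "Can't close $name: $!";
    push @files, $name;
}

# loop over each file (stand-in for @ARGV) and count its lines
my %lines_in;
foreach my $file (@files) {
    open(my $in, '<', $file) or die "Can't read from $file: $!";
    while (my $line = <$in>) {
        $lines_in{$file}++;
    }
    close $in;
    print "$file: $lines_in{$file} lines\n";
}
```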

34-35 Reading sequence from a file
    $ perl split.pl gcctg test.fa
Note: two command line arguments!
With an explicit filehandle:
    # read pattern and sequence
    my ($pattern, $file) = @ARGV;
    open (IN, $file) or die "$!";
    my $sequence = '';
    while (<IN>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }
    close IN;
The two arguments can also be taken one at a time:
    my $pattern = shift @ARGV;
    my $file = shift @ARGV;
With the default input stream <> (after the shift, @ARGV holds only the file name):
    # get pattern
    my $pattern = shift @ARGV;
    my $sequence = '';
    while (<>) {
        next if (/^>/);
        chomp;
        $sequence .= $_;
    }
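The reading loop can be wrapped in a subroutine and tested on a small file. This is a sketch under assumptions: read_sequence is a hypothetical name, the FASTA content is made up, and splitting the sequence at the pattern is only a guess at what split.pl does with its first argument.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read a FASTA file, return the sequence without headers
sub read_sequence {
    my $file = shift @_;
    open(my $in, '<', $file) or die "Can't read from $file: $!";
    my $sequence = '';
    while (my $line = <$in>) {
        next if $line =~ /^>/;    # skip header lines
        chomp $line;              # remove line ending
        $sequence .= $line;       # append to sequence string
    }
    close $in;
    return $sequence;
}

# made-up test file
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/test.fa";
open(my $out, '>', $file) or die "Can't write to $file: $!";
print $out ">test\naaagcctgttt\ncccgcctgggg\n";
close $out or die "Can't close $file: $!";

my $pattern  = 'gcctg';
my $sequence = read_sequence($file);
my @parts    = split /\Q$pattern\E/, $sequence;   # assumed purpose of split.pl
print "$_\n" foreach @parts;
```

The \Q…\E around the pattern makes split treat it as literal text rather than as a regular expression.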

36 course project
Work on your course project (sequanto.pl):
1. Add explicit opening of file-handle
2. Store translated sequence into a new file

37 Exam Reminder: Exam: Thu, Dec 11th, 11 am - 1 pm

