Introduction to Bioinformatic Computation. Lecture #3, 02-02-2010
Objectives for today: practicing Perl programming on bioinformatic examples
READING THE BOOK (Learning Perl, O’Reilly)
You have already learned: chapters 1, 2
This week we will touch on: chapters 3, 6, 7, 8, 9
Next week: chapters 5, 10, 11, 15
After this reading you should be experienced programmers!
Operations with strings

    $seq1 = 'agtcctgatggatt';
    $seq2 = 'ttagggctctca';

    $new = $seq1 . $seq2;            # Concatenation
    $new = substr($seq1, 5, 3);      # Substring: 3 characters starting at offset 5 (offsets start at 0)
    $new = reverse($seq1);           # Reverse
    $N_characters = length($seq1);   # Length

The return value of the length operator is the number of characters in the string.
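
To make the results of these operators concrete, here is a minimal sketch that applies each of them to the two sequences from the slide. The variable names other than $seq1 and $seq2 are chosen for illustration only, and the script adds use strict and use warnings, which the lecture scripts do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq1 = 'agtcctgatggatt';
my $seq2 = 'ttagggctctca';

my $joined   = $seq1 . $seq2;         # concatenation
my $piece    = substr($seq1, 5, 3);   # 3 characters starting at offset 5
my $reversed = reverse($seq1);        # assigned to a scalar, so the string is reversed
my $n_chars  = length($seq1);         # number of characters

print "joined:   $joined\n";
print "piece:    $piece\n";
print "reversed: $reversed\n";
print "length:   $n_chars\n";
```

One point to watch: reverse reverses the characters of a string only in scalar context, as in the assignment above. Written directly as print reverse($seq1), it is in list context and the string comes out unchanged.
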
The chop operator removes the last character (chop.pl)

    #!/usr/local/perl
    $seq1 = 'agtcctgatggatt';
    print "original sequence $seq1 \n";
    chop($seq1);
    print "sequence after first step $seq1 \n";
    $last = chop($seq1);    # chop returns the character it removed
    print "sequence after second step $seq1 \n";
    print "return value of chop is $last \n";
Random numbers in Perl (rand.pl)

    #!/usr/local/perl
    $our_variable = int(rand(7));   # rand(7) is >= 0 and < 7; int drops the fractional part
    print "$our_variable \n";
    # try several times and get 0,1,2,3,4,5,6
Interaction with a Perl program from the keyboard (keyboard.pl)

    #!/usr/local/perl
    print "please type in your variable \n";
    $input_variable = <STDIN>;
    chomp ($input_variable);    # remove the newline typed at the end of the input
    print "You just typed $input_variable \n";
Random number game (game.pl)

    #!/usr/local/perl
    $r = int(rand(7));
    print "please guess the rand number 0-6 \n";
    $guess = <STDIN>;
    chomp ($guess);
    if ($r != $guess) {print "You are wrong my value is $r \n";}
    else {print "Congratulations Homo sapiens! You are very smart! \n";}
Random number game with loop (game2.pl)

    #!/usr/local/perl
    ALEXEI: {
        $r = int(rand(7));    # a new number is drawn each time the block is redone
        print "please guess the rand number 0-6 \n";
        $guess = <STDIN>;
        chomp ($guess);
        if ($r != $guess) {
            print "You are wrong my value is $r \n";
            redo ALEXEI;      # start the labeled block again
        }
        else {print "Congratulations Homo sapiens! You are very smart! \n";}
    }
Syntax is extremely important!
Parentheses, brackets, and braces: () [] {}
Block labels are written in capital letters by convention
Colon, semicolon, comma, period: : ; , .
Print to the screen and to a file
Now we can:
    print to the screen and to a file
    interact with the computer from our keyboard
Our next step is to make Perl read data from a file.
However, for this purpose we must first learn about ARRAYS.
An array keeps many variables in a strict order (array.pl)

    #!/usr/local/perl
    $seq1 = 'agtcctgatggatt';
    $seq2 = 'ttagggctctca';
    $a = 5; $b = 100; $c = 333;
    @alexei = ($seq1, $seq2, $a, $b, $c);
    # five elements in the array: element 0, 1, 2, 3, 4.
    # if you need the second element, get it as $alexei[1], etc.
    for $n (0..$#alexei) {    # $#alexei is the index of the last element (4 here)
        print "element $n is $alexei[$n] \n";
    }
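
The loop bound can be taken from the array itself instead of being written by hand. The following is a minimal sketch, assuming the same five values as in array.pl; the names $count and $last are illustrative, and use strict and use warnings are additions not used in the lecture scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @alexei = ('agtcctgatggatt', 'ttagggctctca', 5, 100, 333);

my $count = scalar(@alexei);   # number of elements in the array
my $last  = $#alexei;          # index of the last element, always $count - 1

print "the array holds $count elements, the last index is $last\n";

# Visit every element exactly once, whatever the size of the array.
for my $n (0 .. $last) {
    print "element $n is $alexei[$n]\n";
}
```

Looping past the last index does not stop the program; the missing elements are simply undefined and print as empty strings, which is why a hand-written bound that is too large is easy to overlook.
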
How to get an array of characters from a string variable (array2.pl)

    #!/usr/local/perl
    $seq1 = 'agtcctgatggatt';
    $length = length($seq1);
    @array = split("", $seq1);    # an empty separator splits the string into single characters
    for $n (0..($length-1)) {
        print "letter $n is $array[$n] \n";
    }
Let’s count nucleotides in our sequence (nucleotides.pl)

    #!/usr/local/perl
    $seq1 = 'agtcctgatggatttccccgatatagcctact';
    $length = length($seq1);
    $a = $c = $g = $t = 0;
    @array = split("", $seq1);
    for $n (0..($length-1)) {
        if ($array[$n] eq 'a') {$a++;}    # The same as $a = $a + 1;
        elsif ($array[$n] eq 'c') {$c++;}
        elsif ($array[$n] eq 'g') {$g++;}
        elsif ($array[$n] eq 't') {$t++;}
    }
    print "number of a is $a \t c is $c \t g is $g \t t is $t \n";
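
The same count can also be kept in a hash, with one counter per letter, which avoids the chain of elsif tests. This is a sketch of an alternative, not the lecture's method; the hash name %count is an assumption made for illustration, and use strict and use warnings are additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq1 = 'agtcctgatggatttccccgatatagcctact';

# One counter per nucleotide, stored in a hash keyed by the letter.
my %count = (a => 0, c => 0, g => 0, t => 0);

for my $base (split(//, $seq1)) {
    # Count only the four expected letters; any other character is skipped.
    $count{$base}++ if exists $count{$base};
}

print "number of a is $count{a} \t c is $count{c} \t g is $count{g} \t t is $count{t} \n";
```

Adding another symbol to count (for example n for an unknown base) then needs only one more key in the initial hash.
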
Now it is time to let Perl read files and make some useful calculations (read.pl)

    #!/usr/local/perl
    $a = $c = $g = $t = 0;
    $line = 0;
    open (ALEXEI, "gene23534") || die "Could not open gene23534\n";
    while (<ALEXEI>) {
        $line++;
        if ($line > 1) {                 # skip the first line of the file
            @array = split("", $_);      # current variable $_ is the text line
            for $n (0..$#array) {        # $#array is the index of the last character
                if ($array[$n] eq 'a') {$a++;}
                elsif ($array[$n] eq 'c') {$c++;}
                elsif ($array[$n] eq 'g') {$g++;}
                elsif ($array[$n] eq 't') {$t++;}
            }
        }
    }
    close (ALEXEI);
    print "number of a is $a \t c is $c \t g is $g \t t is $t \n";
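
A variant of read.pl can be sketched as a reusable subroutine. The assumptions here are loud ones: the name count_bases_in_file is hypothetical, the file is assumed to have one header line followed by sequence lines, and the counting uses the tr operator (its return value is the number of characters it matched) instead of the character-by-character loop. The special variable $. holds the current line number of the file being read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count a, c, g, t in a sequence file,
# skipping the first line (assumed to be a header).
sub count_bases_in_file {
    my ($path) = @_;
    my %count = (a => 0, c => 0, g => 0, t => 0);

    open(my $fh, '<', $path) or die "Could not open $path: $!\n";
    while (my $line = <$fh>) {
        next if $. == 1;                     # $. is the current line number
        # tr with an empty replacement changes nothing and returns the match count.
        $count{a} += ($line =~ tr/aA//);
        $count{c} += ($line =~ tr/cC//);
        $count{g} += ($line =~ tr/gG//);
        $count{t} += ($line =~ tr/tT//);
    }
    close($fh);
    return %count;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my %count = count_bases_in_file($ARGV[0]);
    print "number of a is $count{a} \t c is $count{c} \t g is $count{g} \t t is $count{t} \n";
}
```

Saved under a name of your choice, it would be called with the file name as its argument, for example with gene23534 from read.pl. Checking the result of open matters in practice: without the check, a misspelled file name silently produces counts of zero.
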
The EST database is the largest in GenBank
Files: gbest1.gz - gbest279.gz
The gbest database format is the same as the format of gene entries.
Regular expressions (chapters 7-9)

    $seq = 'agttctgaaatcggtcaatgccctcggcat';

    $seq =~ s/t/u/g;           # Substitution: replace every t with u
    $seq =~ tr/gatc/ctag/;     # Transliteration: g->c, a->t, t->a, c->g

    # Matching pattern:
    if ($seq =~ m/ttca/) {print "it is found! \n";}
    else {print "pattern is absent \n";}
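
Both s/// and tr/// change the variable they are bound to, so applying them one after another to the same $seq gives a different result than applying each to the original sequence. The sketch below works on copies and combines tr with reverse to build a reverse complement. The variable names $rna, $complement, and $revcomp are illustrative assumptions, and use strict and use warnings are additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = 'agttctgaaatcggtcaatgccctcggcat';

# Substitution on a copy: DNA letters to RNA letters.
my $rna = $seq;
$rna =~ s/t/u/g;

# Transliteration on a copy: each base becomes its complement.
my $complement = $seq;
$complement =~ tr/gatc/ctag/;

# Reverse complement: complement first, then reverse the string.
my $revcomp = reverse($complement);

print "original:           $seq\n";
print "rna:                $rna\n";
print "complement:         $complement\n";
print "reverse complement: $revcomp\n";

# The original sequence is untouched, so it can still be searched.
if ($seq =~ m/tcaa/) { print "tcaa is found in the original\n"; }
else                 { print "tcaa is absent from the original\n"; }
```
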
Looking for a pattern (pattern.pl)

    #!/usr/local/perl
    $seq = 'agttctgaaatcggtcaatgccctcggcat';
    print "enter pattern to search inside seq \n";
    $p = <STDIN>;
    chomp ($p);
    if ($seq =~ m/$p/) {print "$p is found! \n";}
    else {print "pattern $p is absent \n";}
prog_species_EST1 (in /home/afedorov/EST)

    #!/usr/bin/perl -w
    $prefix = $ARGV[0];
    open(OUT, ">${prefix}.EST") || die "Could not create ${prefix}.EST\n";
    open(IN, "gbest2.seq") || die "Could not open gbest2.seq\n";
    $/ = "\/\/\n";                       # read one entry (terminated by //) at a time
    while (<IN>) {
        $check = 0; undef($seq); undef($locus); $start = $end = 0;
        @lines = split("\n", $_);
        for $n (0..$#lines) {
            if ($lines[$n] =~ /^LOCUS/) {$locus = $lines[$n];}
            if ($lines[$n] =~ /ORGANISM\s+Drosophila\s+melanogaster/) {$check = 1;}
            if ($lines[$n] =~ /^ORIGIN/) {$start = $n + 1; last;}
        }
        if ($check) {
            $end = $#lines - 1;          # the last line of the entry is the // terminator
            for $n ($start..$end) {
                $lines[$n] =~ s/\s+//g;  # remove spaces
                $lines[$n] =~ s/\d+//g;  # remove position numbers
                $seq .= $lines[$n];
            }
            print OUT '> ', $locus, "\n", $seq, "\n";
        }
    }
    close(IN);
    close(OUT);
Loop for prog_species_EST1

    for $n (1..5) {
        $file = 'gbest' . $n . '.seq';
        open (FH, "$file") || die "Could not open $file\n";
        # the body of prog_species_EST1 goes here, reading entries from FH
        close (FH);
    }
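
One way to combine the loop with the extraction step is to move the per-file work into a subroutine and call it once per file. The sketch below is an assumption-laden variant, not the program from /home/afedorov/EST: the subroutine name extract_species and its parameters are hypothetical, and it pulls the LOCUS line and the sequence out of each entry with regular expressions instead of the line-by-line loop. It relies only on the entry layout used above (a LOCUS line, an ORGANISM line, an ORIGIN line followed by the sequence, and a // terminator).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write every entry of $file that belongs to $organism
# to the already opened output handle $out. Returns the number of entries written.
sub extract_species {
    my ($file, $organism, $out) = @_;
    open(my $in, '<', $file) or die "Could not open $file: $!\n";
    local $/ = "//\n";                            # one entry per read
    my $found = 0;
    while (my $entry = <$in>) {
        # \Q...\E makes the organism name match literally.
        next unless $entry =~ /ORGANISM\s+\Q$organism\E/;
        my ($locus) = $entry =~ /^(LOCUS.*)$/m;         # the whole LOCUS line
        my ($seq)   = $entry =~ /^ORIGIN.*?\n(.*)/ms;   # everything after the ORIGIN line
        next unless defined $locus && defined $seq;
        $seq =~ s/\/\/\s*$//;                     # drop the // terminator
        $seq =~ s/[\s\d]+//g;                     # drop spaces and position numbers
        print {$out} '> ', $locus, "\n", $seq, "\n";
        $found++;
    }
    close($in);
    return $found;
}

# Run only when an output prefix is given on the command line.
if (@ARGV) {
    my $prefix = $ARGV[0];
    open(my $out, '>', "$prefix.EST") or die "Could not create $prefix.EST: $!\n";
    for my $n (1 .. 5) {
        my $file = 'gbest' . $n . '.seq';
        my $found = extract_species($file, 'Drosophila melanogaster', $out);
        print "$file: $found entries written\n";
    }
    close($out);
}
```

Keeping the output file open across the whole loop means all five input files contribute to a single .EST file, and passing the organism name as an argument lets the same subroutine serve other species.
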