Perl IV Part V: Hashing Out the Genetic Code, Bioperl
Hashes
There are three main data types in Perl: scalar variables, arrays, and hashes. Hashes provide VERY fast look-up of a value by its key. The format is similar to that of an array:
    %hash = ('key' => 'value');
    $value = $hash{'key'};
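Hashes
A hash can be initialized from a flat list of alternating keys and values:
    %array = ( 'key1', 'value1', 'key2', 'value2', 'key3', 'value3', );
or, more readably, with the => operator:
    %array = ( 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3', );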
    @keys = keys %hash;
    @values = values %hash;
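The basic hash operations above can be sketched as follows. This is a minimal illustration, assuming a small made-up hash of one-letter amino acid codes; the variable names are not from the original slides.

```perl
use strict;
use warnings;

# Hypothetical example data: one-letter codes mapped to amino acid names.
my %amino_acid = (
    A => 'Alanine',
    C => 'Cysteine',
    D => 'Aspartic acid',
);

# Look up one value by its key.
my $name = $amino_acid{'C'};
print "C is $name\n";

# keys and values return lists in no guaranteed order, so sort for stable output.
my @codes = sort keys %amino_acid;
my @names = sort values %amino_acid;
print "Codes: @codes\n";
print "Names: @names\n";

# exists tests for a key without creating an entry.
print "No entry for Z\n" unless exists $amino_acid{'Z'};

# Add one entry, delete another, then count what is left.
$amino_acid{'E'} = 'Glutamic acid';
delete $amino_acid{'A'};
my $count = keys %amino_acid;    # keys in scalar context gives the number of entries
print "Entries: $count\n";
```

Sorting the output of keys is a deliberate choice here: Perl does not promise any particular order for hash keys, so code that depends on order should sort explicitly.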
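The Binary Search of Arrays
The 'halving' method is considerably faster than comparing the target against every element in turn. It only works on an array that is already sorted: each pass through the loop looks at the middle element and throws away the half that cannot contain the target.
e.g. finding a match takes at most 15 times through the loop, since 15 halvings are enough for an array of about 32,000 elements.
Good method for one sort and multiple comparisons.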
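A minimal sketch of the halving method, assuming an array of strings that has already been sorted with the default sort. The subroutine name binary_search and the gene names are hypothetical.

```perl
use strict;
use warnings;

# Return the index of $target in the sorted array referenced by $list,
# or -1 if it is not present. Strings are compared with cmp.
sub binary_search {
    my ($list, $target) = @_;
    my ($low, $high) = (0, $#{$list});

    while ($low <= $high) {
        my $mid   = int(($low + $high) / 2);
        my $order = $target cmp $list->[$mid];

        if    ($order == 0) { return $mid }         # found it
        elsif ($order < 0)  { $high = $mid - 1 }    # target is in the lower half
        else                { $low  = $mid + 1 }    # target is in the upper half
    }
    return -1;
}

# Hypothetical gene names: sort once, then search as often as needed.
my @genes = sort qw(xyz1 abc3 klm2 def7 pqr5);

for my $query (qw(klm2 nope)) {
    my $i = binary_search(\@genes, $query);
    print $i >= 0 ? "$query found at index $i\n" : "$query not found\n";
}
```

The array must be sorted with the same comparison the search uses (cmp here); for numeric data both the sort and the search would use <=> instead.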
Comparing Strings
To compare two strings alphabetically in Perl, use the cmp operator. It returns 0 if the two strings are the same, -1 if they are in alphabetical order, and 1 if they are in reverse order.
    'zzz' cmp 'zzz' returns 0
    'AAA' cmp 'ZZZ' returns -1
    'ZZZ' cmp 'AAA' returns 1
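Sorting Arrays
Sorting an array of strings:
    @array = sort @array;
If given numbers, this will sort them lexically (as strings).
Sorting an array of numbers in ascending order:
    @array = sort { $a <=> $b } @array;
The values $a and $b must be used: they are the special variables that sort sets to the two elements being compared.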
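The difference between lexical and numeric sorting is easiest to see on a small list. A minimal sketch with made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @lexical    = sort @numbers;                  # default: compares as strings
my @ascending  = sort { $a <=> $b } @numbers;    # numeric, smallest first
my @descending = sort { $b <=> $a } @numbers;    # numeric, largest first

print "lexical:    @lexical\n";       # 1 10 100 9
print "ascending:  @ascending\n";     # 1 9 10 100
print "descending: @descending\n";    # 100 10 9 1
```

Swapping $a and $b in the comparison block is all it takes to reverse the order.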
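Sorting Hashes
Sorting by key, printing each key with a bar of stars as long as its value:
    foreach ( sort keys (%hash)) { print "$_\t", "*" x $hash{$_}, "\n"; }
Sorting keys by their values, largest first (swap $a and $b for ascending order):
    foreach (sort { $hash{$b} <=> $hash{$a} } keys (%hash)) { …… }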
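A short sketch of both kinds of hash sorting, assuming a made-up hash of base counts (the name %count and its contents are for illustration only):

```perl
use strict;
use warnings;

# Hypothetical data: how often each base was seen.
my %count = (A => 3, C => 1, G => 5, T => 2);

# Keys in alphabetical order.
my @by_key = sort keys %count;

# Keys ordered by their values: largest first, then smallest first.
my @by_value_desc = sort { $count{$b} <=> $count{$a} } keys %count;
my @by_value_asc  = sort { $count{$a} <=> $count{$b} } keys %count;

print "by key:        @by_key\n";           # A C G T
print "by value desc: @by_value_desc\n";    # G A T C
print "by value asc:  @by_value_asc\n";     # C T A G

# A simple text histogram, most frequent base first.
foreach my $base (@by_value_desc) {
    print "$base\t", '*' x $count{$base}, "\n";
}
```

The sort block always receives keys in $a and $b; looking the values up inside the block is what turns a key sort into a sort by value.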
Nested Arrays
A nested array is an array whose elements are references to other arrays.
    $array[$i]->[$j];   is the same as   $array[$i][$j];
Or use hashes:
    %hash = (duck  => ['Huey', 'Louie', 'Dewey'],
             horse => ['Mr. Ed'],
             dog   => ['Benji', 'Lassie'] );
    $value = $hash{$key}[$i];
print $array[$i] prints the reference itself, e.g. ARRAY(0x85d3ad0); print @{$array[$i]} prints the elements of row $i.
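The following sketch exercises both structures: an array of arrays and the hash of arrays from the slide. The matrix values and the added horse name are assumptions made up for the example.

```perl
use strict;
use warnings;

# An array of arrays: each element is a reference to an anonymous array.
my @matrix = ([1, 2, 3], [4, 5, 6]);

my $with_arrow    = $matrix[1]->[2];    # explicit arrow
my $without_arrow = $matrix[1][2];      # arrow is optional between subscripts
print "Both forms give $with_arrow and $without_arrow\n";
print "A row is a ", ref($matrix[0]), " reference\n";
print "Row 0 holds: @{ $matrix[0] }\n";

# A hash of arrays, as on the slide.
my %pets = (
    duck  => ['Huey', 'Louie', 'Dewey'],
    horse => ['Mr. Ed'],
    dog   => ['Benji', 'Lassie'],
);

my $second_duck = $pets{duck}[1];       # Louie
push @{ $pets{horse} }, 'Trigger';      # hypothetical addition to one list

for my $kind (sort keys %pets) {
    print "$kind: ", join(', ', @{ $pets{$kind} }), "\n";
}
```

Wrapping a reference in @{ ... } is what turns it back into a list that print, join, or push can work on.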
The Genetic Code is Redundant

First                    Second position                    Third
position   U           C           A           G            position
U          UUU Phe     UCU Ser     UAU Tyr     UGU Cys      U
           UUC Phe     UCC Ser     UAC Tyr     UGC Cys      C
           UUA Leu     UCA Ser     UAA Stop    UGA Stop     A
           UUG Leu     UCG Ser     UAG Stop    UGG Trp      G
C          CUU Leu     CCU Pro     CAU His     CGU Arg      U
           CUC Leu     CCC Pro     CAC His     CGC Arg      C
           CUA Leu     CCA Pro     CAA Gln     CGA Arg      A
           CUG Leu     CCG Pro     CAG Gln     CGG Arg      G
A          AUU Ile     ACU Thr     AAU Asn     AGU Ser      U
           AUC Ile     ACC Thr     AAC Asn     AGC Ser      C
           AUA Ile     ACA Thr     AAA Lys     AGA Arg      A
           AUG Met     ACG Thr     AAG Lys     AGG Arg      G
           (start)
G          GUU Val     GCU Ala     GAU Asp     GGU Gly      U
           GUC Val     GCC Ala     GAC Asp     GGC Gly      C
           GUA Val     GCA Ala     GAA Glu     GGA Gly      A
           GUG Val     GCG Ala     GAG Glu     GGG Gly      G
Searching for codons
DIFFICULT: one test per codon
    my ($codon) = @_;
    if    ($codon =~ /TCA/i) { return 'S' }
    elsif ($codon =~ /TCC/i) { return 'S' }
    elsif ($codon =~ /TCG/i) { return 'S' }
    # blah blah, and so on for every remaining codon
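Searching for codons
BETTER: one regular expression per amino acid, using the redundancy of the code
    my ($codon) = @_;
    if    ($codon =~ /GC./i)    { return 'A' }
    elsif ($codon =~ /TG[TC]/i) { return 'C' }
    elsif ($codon =~ /GA[TC]/i) { return 'D' }
    # blah blah, and so on for every remaining amino acid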
Searching for codons
BEST: a single hash look-up
    my ($codon) = @_;
    $codon = uc $codon;
    my (%genetic_code) = (
        'TCA' => 'S',
        'TCC' => 'S',
        'TCG' => 'S',
        # …. yadda yadda yadda
    );
    return $genetic_code{$codon} if (exists $genetic_code{$codon});
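A complete version of the hash approach might look like the sketch below. The table is the standard genetic code written for DNA (T instead of U); the subroutine names codon2aa and translate, the '*' for stop codons, and the 'X' returned for an unrecognized codon are choices made for this example, not something the slides specify.

```perl
use strict;
use warnings;

# Standard genetic code, DNA alphabet. '*' marks a stop codon.
my %genetic_code = (
    TTT => 'F', TTC => 'F', TTA => 'L', TTG => 'L',
    TCT => 'S', TCC => 'S', TCA => 'S', TCG => 'S',
    TAT => 'Y', TAC => 'Y', TAA => '*', TAG => '*',
    TGT => 'C', TGC => 'C', TGA => '*', TGG => 'W',
    CTT => 'L', CTC => 'L', CTA => 'L', CTG => 'L',
    CCT => 'P', CCC => 'P', CCA => 'P', CCG => 'P',
    CAT => 'H', CAC => 'H', CAA => 'Q', CAG => 'Q',
    CGT => 'R', CGC => 'R', CGA => 'R', CGG => 'R',
    ATT => 'I', ATC => 'I', ATA => 'I', ATG => 'M',
    ACT => 'T', ACC => 'T', ACA => 'T', ACG => 'T',
    AAT => 'N', AAC => 'N', AAA => 'K', AAG => 'K',
    AGT => 'S', AGC => 'S', AGA => 'R', AGG => 'R',
    GTT => 'V', GTC => 'V', GTA => 'V', GTG => 'V',
    GCT => 'A', GCC => 'A', GCA => 'A', GCG => 'A',
    GAT => 'D', GAC => 'D', GAA => 'E', GAG => 'E',
    GGT => 'G', GGC => 'G', GGA => 'G', GGG => 'G',
);

# Translate one codon; 'X' for anything not in the table (an assumption).
sub codon2aa {
    my ($codon) = @_;
    $codon = uc $codon;
    return exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
}

# Translate a DNA string three bases at a time, ignoring a trailing partial codon.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $protein .= codon2aa(substr($dna, $i, 3));
    }
    return $protein;
}

print translate('atgtcatggtaa'), "\n";    # MSW*
```

The hash is built once outside the subroutine, so each codon costs a single look-up rather than a chain of regular expression tests.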
Modules
Perl can deal with methods in an object-oriented manner. Classes are contained in packages, which are often referred to as modules.
OO structure is: objectName->method(arguments)
Note to self: how many objects?
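To make the object->method(arguments) pattern concrete, here is a minimal sketch of a class defined in a package. The class DnaSeq and its methods are hypothetical, invented for this illustration; they are not part of Perl or Bioperl.

```perl
use strict;
use warnings;

package DnaSeq;    # hypothetical class, for illustration only

# Constructor: called as DnaSeq->new(id => ..., seq => ...).
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => uc($args{seq}) };
    return bless $self, $class;    # bless ties the hash to the class
}

# Accessor methods: the object itself arrives as the first argument.
sub id  { my ($self) = @_; return $self->{id};  }
sub seq { my ($self) = @_; return $self->{seq}; }

# Reverse complement of the stored sequence.
sub revcom {
    my ($self) = @_;
    (my $rc = reverse $self->{seq}) =~ tr/ACGT/TGCA/;
    return $rc;
}

package main;

my $obj = DnaSeq->new(id => 'demo1', seq => 'atgc');
printf "%s: %s -> %s\n", $obj->id, $obj->seq, $obj->revcom;    # demo1: ATGC -> GCAT
```

In a real project the package would normally live in its own file (DnaSeq.pm) and be loaded with use; it is kept in one file here so the example runs as is.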
BioPerl
The main focus of Bioperl modules is to perform sequence manipulation, provide access to various biology databases (both local and web-based), and parse the output of various programs. Its modules rely heavily on additional Perl modules available from CPAN.
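How to go about comparing an unknown sequence...
    use Bio::SeqIO;
    # open a GenBank file and read the first sequence record from it
    $in = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
    $seqobj = $in->next_seq();
    # take the first annotated feature and ask where it lies
    @allfeatures = $seqobj->all_SeqFeatures();
    $feat = $allfeatures[0];
    $feature_start = $feat->start;
    $feature_strand = $feat->strand;
    # keep the sequence and its id only if the species matches
    if ($seqobj->species->common_name =~ /elegans/) {
        $seq = $seqobj->seq;
        $id = $seqobj->id;
    }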