Subroutines and Files Bioinformatics Ellen Walker Hiram College
Why Subroutines? Saves typing. Avoids potential copy/paste errors. Collects a common algorithm in one place for reuse.
Built-In Subroutines: Perl provides common, useful functions, e.g. –index –length –substr. Call them with arguments: –index($string, $pat) # $string and $pat are the arguments. Different arguments produce different results.
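A short sketch of the three built-ins named above, on a made-up DNA string. It shows that the same function gives different results for different arguments.

```perl
use strict;
use warnings;

my $string = "GATTACA";
my $pat    = "TTA";

# index(STRING, SUBSTRING): 0-based position of the first match, or -1 if absent
my $pos = index($string, $pat);
# length(STRING): number of characters in the string
my $len = length($string);
# substr(STRING, OFFSET, LENGTH): LENGTH characters starting at OFFSET
my $piece = substr($string, 3, 3);

print "$pos $len $piece\n";    # prints "2 7 TAC"

# Different arguments produce different results
print index($string, "A"), " ", index($string, "GGG"), "\n";    # prints "1 -1"
```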
Finding Predefined Subroutines: Textbooks (Safari Online has several). Google (include “Perl” in your search). Online documentation, which is nicely searchable: http://
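How a Subroutine Works
The calling program:
    my $code = "ACA";
    print length($code);    # "ACA" is passed in; 3 comes back and is printed
    print "goodbye\n";
Conceptually (the real length is built into Perl), the subroutine receives "ACA" and returns 3:
    sub length {
        my $string = shift;
        my $length = 0;
        … code to count …
        return $length;
    }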
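To make the "… code to count …" step concrete, here is a minimal sketch of a hand-written length. It is named my_length (a hypothetical name) because the real length is built in. It is one way to count, not the textbook's code.

```perl
use strict;
use warnings;

# Hypothetical hand-written version of length(); the real length is built in.
sub my_length {
    my $string = shift;    # pull the first argument ("ACA") off the list
    my $length = 0;
    # count the characters one at a time; split //, $string yields them individually
    foreach my $char (split //, $string) {
        $length++;
    }
    return $length;        # ends the sub; this value replaces the call
}

my $code = "ACA";
my $n    = my_length($code);
print "$n\n";              # prints 3
print "goodbye\n";
```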
Key Components
sub name –Declares this as a subroutine and names it.
shift –Pulls the arguments out of the list (the values in parentheses, one at a time, left to right) –Example: somesub("ACT", 1) – my $a = shift; ($a is "ACT") – my $b = shift; ($b is 1)
return value –Ends the subroutine and gives it a value.
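A runnable version of the somesub call, as a sketch. The body (repeating a sequence) is a hypothetical choice just to give the sub something to return. It uses named variables instead of $a and $b, because Perl reserves those as special package variables for sort and declaring `my $a` is discouraged.

```perl
use strict;
use warnings;

# Hypothetical subroutine with two arguments: repeat $seq $count times.
sub somesub {
    my $seq   = shift;     # first argument, e.g. "ACT"
    my $count = shift;     # second argument, e.g. 1
    return $seq x $count;  # return ends the sub and gives the call its value
}

my $once   = somesub("ACT", 1);    # "ACT"
my $thrice = somesub("ACT", 3);    # "ACTACTACT"
print "$once\n$thrice\n";
```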
Example (p. 122)
    # find all GC-rich 4-7mers and determine their complements
    # ($someDNA holds the sequence to search)
    my $GCmatch;
    while ($someDNA =~ m/([GC]{4,7})/g) {
        $GCmatch = $1;
        print "5' $GCmatch 3'\n\n";
        my $compl = complement($GCmatch);
        print "3' $compl 5'\n";
    }
Subroutine (p. 123) (the book version has good documentation)
    sub complement {
        my $dna = shift;    # get first arg
        my $anti = $dna;
        $anti =~ tr/ACGTacgt/TGCAtgca/;
        return $anti;
    }
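Putting the two slides together as one self-contained program. The input string is made up for illustration, and the matches are also collected into arrays so they can be checked.

```perl
use strict;
use warnings;

# complement: swap each base for its Watson-Crick partner (case preserved)
sub complement {
    my $dna  = shift;              # get first argument
    my $anti = $dna;               # copy so the caller's string is untouched
    $anti =~ tr/ACGTacgt/TGCAtgca/;
    return $anti;
}

# Sample input, made up for illustration
my $someDNA = "ATGGCGCATTCCGGA";
my @matches;
my @complements;

# find all GC-rich 4-7mers and print each with its complement
while ($someDNA =~ m/([GC]{4,7})/g) {
    my $GCmatch = $1;
    my $compl   = complement($GCmatch);
    push @matches, $GCmatch;
    push @complements, $compl;
    print "5' $GCmatch 3'\n";
    print "3' $compl 5'\n\n";
}
```

Here the loop finds GGCGC and CCGG, whose complements are CCGCG and GGCC.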
Download These (Ch. 7): Counting nucleotides –countNucleotides($str, "C"); –countNucleotides($str, "[CG]"); Printing sequences with a fixed line width –printSequence($str, 80);
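The downloaded versions are the ones to use. Below is only one possible way to write subroutines with these two call signatures, as a sketch. It assumes the second argument of countNucleotides is a regular-expression pattern (which the "[CG]" example suggests). Unlike the book's code might, this version is case-sensitive.

```perl
use strict;
use warnings;

# Count how many characters of $str match the pattern $pat (e.g. "C" or "[CG]").
sub countNucleotides {
    my $str = shift;
    my $pat = shift;
    my $count = () = $str =~ /$pat/g;   # list assignment in scalar context counts matches
    return $count;
}

# Print $str in lines of at most $width characters.
sub printSequence {
    my $str   = shift;
    my $width = shift;
    for (my $i = 0; $i < length($str); $i += $width) {
        print substr($str, $i, $width), "\n";
    }
}

my $str = "ACGGTCCA";
print countNucleotides($str, "C"), "\n";      # prints 3
print countNucleotides($str, "[CG]"), "\n";   # prints 5
printSequence($str, 3);                        # prints ACG, GTC and CA on three lines
```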
Variable Scope: A variable exists from where it is declared (with "my") until the end of the enclosing block (its closing brace). Variables declared in a subroutine exist only while the subroutine runs. Each call to a subroutine re-initializes its variables.
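A small sketch of both scope rules. countCalls is a hypothetical name. Its counter is recreated on every call, so it never gets past 1, and a variable declared inside a bare block disappears at the closing brace.

```perl
use strict;
use warnings;

sub countCalls {
    my $calls = 0;    # created fresh, and set to 0, on every call
    $calls++;
    return $calls;
}

my $first  = countCalls();    # 1
my $second = countCalls();    # also 1: $calls did not survive the first call

{
    my $inner = "visible only inside these braces";
    print "$inner\n";
}
# Here $inner no longer exists; under "use strict" using it would not compile.

print "$first $second\n";     # prints "1 1"
```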
Files and Programs Files are stored on the computer’s hard drive and maintained by the operating system. Programs are connected to files via special subroutines –“open” creates a file handle –“close” releases the file (important!)
Basic File Manipulation
Open a file and read:
    my $HANDLE;
    open($HANDLE, '<', $filename);
    my $line = <$HANDLE>;
Open a file and write:
    my $HANDLE;
    open($HANDLE, '>', $filename);
    print $HANDLE "Hello world!";
Close a file:
    close($HANDLE);
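Allowing for Errors: If you try to read a file that doesn’t exist, or write to a file you can’t (for example, in a directory that doesn’t exist or that you lack permission for), open() returns false and the rest of your program won’t work. To catch this, add or die("some message $file: $!") to the end of the command ($! contains the system error message). Note that opening an existing file with '>' does not fail: it silently erases the old contents.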
Complete Open Examples
    open($HANDLE, '<', $filename) or die("Cannot open file: $filename: $!");
    open($HANDLE, '>', $filename) or die("Cannot write file: $filename: $!");
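The open/print/close steps from the last few slides in one self-contained sketch. To avoid touching your own files, it works in a throwaway directory from the core File::Temp module. The file name hello.txt is hypothetical. The last lines show open returning false for a missing file. They check the return value instead of calling die so the program can continue.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is deleted when the program ends
my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/hello.txt";

# Write: '>' creates the file (or empties an existing one)
my $OUT;
open($OUT, '>', $filename) or die("Cannot write file: $filename: $!");
print $OUT "Hello world!\n";
close($OUT);

# Read: '<' opens for input; <$IN> returns the next line
my $IN;
open($IN, '<', $filename) or die("Cannot open file: $filename: $!");
my $line = <$IN>;
close($IN);
print $line;    # prints "Hello world!"

# Opening a file that does not exist returns false; $! says why
my $missing = "$dir/no_such_file.txt";
my $ok = open(my $FH, '<', $missing);
print "open failed: $!\n" unless $ok;
```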
Reading lines
The subroutine chomp removes the "\n" character at the end of a line. $line = <$HANDLE>; puts the next line in $line; when there are no more lines, the result is false (undef). Example: put the whole file in one sequence:
    while ($line = <$HANDLE>) {
        chomp $line;
        $seq = $seq . $line;
    }
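The same loop as a runnable sketch. To keep it self-contained, the "file" is an in-memory string opened with Perl's scalar-reference form of open. With a real file you would pass a filename instead of \$text. The sequence lines are made up.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch needs no file on disk
my $text = "ACGT\nGGCC\nTTA\n";
my $HANDLE;
open($HANDLE, '<', \$text) or die("Cannot open in-memory file: $!");

my $seq = "";
my $line;
# <$HANDLE> returns the next line, or undef (false) at end of file
while ($line = <$HANDLE>) {
    chomp $line;            # remove the trailing "\n"
    $seq = $seq . $line;    # append to the growing sequence
}
close($HANDLE);

print "$seq\n";    # prints ACGTGGCCTTA
```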
Printing to a file
The print commands (print and printf) can optionally be followed by a file handle before the string to print (with no comma after the handle). Examples:
    print $HANDLE "Hello\n";
    printf $HANDLE "GC percent is %.1f\n", $GCcount * 100 / $total;
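A sketch of the printf example with real numbers. It assumes GC content means the count of G and C bases divided by sequence length, times 100. The sequence is made up. It writes through an in-memory file handle so the result can be checked, but a handle opened on a real file works the same way.

```perl
use strict;
use warnings;

# Made-up input; in practice $seq would come from a file
my $seq     = "ATGCGCAT";
my $GCcount = ($seq =~ tr/GCgc//);   # tr/// returns the number of matching characters
my $total   = length($seq);

my $report = "";
open(my $HANDLE, '>', \$report) or die("Cannot open in-memory file: $!");
printf $HANDLE "GC percent is %.1f\n", $GCcount * 100 / $total;
close($HANDLE);

print $report;    # prints "GC percent is 50.0"
```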
Subroutine to Read a FASTA-Formatted File (p. 141): ReadInDNA returns the sequence as one long string. It removes whitespace, lines that begin with # (comments), and all digits.
FASTA File Format: One header line, which begins with >, followed by many lines of sequence text, sometimes capitalized, sometimes with spaces after every n characters (ReadInDNA handles these variations).
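For comparison with the book's ReadInDNA, here is a hypothetical sketch (readFastaSketch is not the book's code). It follows the two slides above: skip the > header and # comment lines, and strip whitespace and digits. Converting to uppercase is an added assumption for handling mixed capitalization. The test file is made-up data written to a temporary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sketch in the spirit of ReadInDNA (not the book's code):
# return the sequence in a FASTA file as one long uppercase string.
sub readFastaSketch {
    my $filename = shift;
    my $IN;
    open($IN, '<', $filename) or die("Cannot open file: $filename: $!");
    my $seq = "";
    while (my $line = <$IN>) {
        next if $line =~ /^>/;      # skip the header line
        next if $line =~ /^#/;      # skip comment lines
        $line =~ s/[\s\d]//g;       # drop whitespace (including "\n") and digits
        $seq .= uc($line);          # assumption: normalize to uppercase
    }
    close($IN);
    return $seq;
}

# Build a small example file (made-up data) to read back
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh ">example sequence\n";
print $fh "# a comment line\n";
print $fh "1 acgta cgtac\n";
print $fh "11 GGCCA\n";
close($fh);

print readFastaSketch($filename), "\n";    # prints ACGTACGTACGGCCA
```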
Getting a FASTA File: Go to NCBI. Search for what you want and download the file to your current machine. Send the file to your directory on cs.hiram.edu. (Demo to be provided)
Assignment Using subroutines from your text, determine the GC content of the given genomes. (Examples to be provided)