Programming and Perl for Bioinformatics Part I
A Taste of Perl: print a message
perltaste.pl: Greet the entire world.
    #!/usr/bin/perl
    #greet the entire world
    $x = 6e9;
    print "Hello world!\n";
    print "All $x of you!\n";
- #!/usr/bin/perl: command interpretation header
- #greet the entire world: a comment
- $x = 6e9;: variable assignment statement
- print ...: function calls (output statements)
Basic Syntax and Data Types
Whitespace outside of strings doesn't matter to Perl; one could write all statements on one line.
All Perl statements end in a semicolon ;, just like C.
Comments begin with '#'; Perl ignores everything after the # until the end of the line.
Example: #this is a comment
Perl has three basic data types:
- scalar
- array (list)
- associative array (hash)
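A minimal sketch with one variable of each basic data type. The names and values are made up for illustration. Arrays use the '@' sigil and hashes the '%' sigil; `use strict`, `use warnings` and `my` declarations are additions beyond these slides that catch misspelled variable names.

```perl
#!/usr/bin/perl
use strict;    # require variables to be declared with 'my'
use warnings;  # report suspicious constructs

# scalar: one value, name begins with $
my $gene = "BRCA1";

# array (list): ordered values, name begins with @
my @bases = ("A", "C", "G", "T");

# associative array (hash): key => value pairs, name begins with %
my %codon = ("ATG" => "Met", "TGG" => "Trp");

# whitespace is flexible: two statements on one line are fine
print "$gene\n"; print "$bases[2]\n";   # elements are indexed from 0, so this prints G
print "$codon{ATG}\n";                  # look up a hash value by its key: Met
```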
Scalars
Scalar variables begin with '$' followed by an identifier.
Example: $this_is_a_scalar;
An identifier is composed of upper- or lowercase letters, digits, and underscores '_', and must not begin with a digit. Identifiers are case-sensitive (like all of Perl).
    $progname = "first_perl";
    $numOfStudents = 4;
The assignment operator = sets the content of $progname to the string "first_perl" and $numOfStudents to the integer 4.
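A short sketch of the identifier rules: two names that differ only in capitalization are two separate variables. The variable $NumOfStudents and its value are hypothetical, added only to show case sensitivity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $progname          = "first_perl";   # a string value
my $numOfStudents     = 4;              # a numeric value
my $NumOfStudents     = 10;             # a DIFFERENT variable: identifiers are case-sensitive
my $num_of_students_2 = 5;              # letters, digits and underscores are all allowed

print "$progname: $numOfStudents vs $NumOfStudents\n";   # first_perl: 4 vs 10
```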
Scalar Values
Numerical values
- integer: 5, "3", 0, -307 ("3" is a string, but Perl uses it as the number 3 in numeric context)
- floating point: 6.2e9
- hexadecimal/octal: 0xd4f, 0477
- binary: 0b prefix, e.g. 0b1101
NOTE: Perl stores a number internally as either a native integer or a double-precision floating-point value and converts between the two automatically.
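A small sketch showing that every literal form above ends up as an ordinary number; print always shows the decimal value. The binary value 0b1101 is an illustrative choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $int   = -307;
my $float = 6.2e9;     # scientific notation
my $hex   = 0xd4f;     # hexadecimal literal
my $oct   = 0477;      # a leading 0 means octal
my $bin   = 0b1101;    # binary literal
my $sum   = "3" + 2;   # a numeric-looking string is used as a number

# print always shows the decimal value
print "$int $float $hex $oct $bin $sum\n";   # -307 6200000000 3407 319 13 5
```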
Do the Math
Mathematical operators work pretty much as you would expect, e.g. 4+7 or 2/(3-5).
Example:
    #!/usr/bin/perl
    print "4+5\n";
    print 4+5, "\n";
    print "4+5=", 4+5, "\n";
    $myNumber = 88;
Note: use commas to separate multiple items in a print statement.
What will be the output?
    4+5
    9
    4+5=9
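A sketch of the arithmetic operators, including the 2/(3-5) case from the slide. Exponentiation (**) and modulus (%) are standard Perl operators not listed on the slide; note that '/' never truncates to an integer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sum      = 4 + 7;        # 11
my $product  = 6 * 4;        # 24
my $ratio    = 2 / (3 - 5);  # parentheses first: 2 / -2 = -1
my $half     = 7 / 2;        # '/' does not truncate: 3.5, unlike integer division in C
my $power    = 2 ** 10;      # exponentiation: 1024
my $leftover = 17 % 5;       # modulus (remainder): 2

print "$sum $product $ratio $half $power $leftover\n";   # 11 24 -1 3.5 1024 2
```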
Scalar Values
String values
Example:
    $day = "Monday";
    print "Happy Monday!\n";
    print "Happy $day!\n";
    print 'Happy Monday!\n';
    print 'Happy $day!\n';
Double-quoted: interpolates (replaces a variable name or control character with its value)
Single-quoted: no interpolation done (printed as-is)
What will be the output?
    Happy Monday!
    Happy Monday!
    Happy Monday!\nHappy $day!\n
The single-quoted \n is not a newline, so the last two prints share one line.
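A sketch that stores both kinds of string in variables so the difference can be inspected. Inside single quotes, \n is two ordinary characters (a backslash and an n), which the length comparison makes visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $day    = "Monday";
my $double = "Happy $day!\n";   # interpolated: $day replaced, \n becomes a newline
my $single = 'Happy $day!\n';   # taken literally: dollar sign, "day", backslash, n

print $double;                  # Happy Monday!   (followed by a newline)
print $single, "\n";            # Happy $day!\n
print length($double), " ", length($single), "\n";   # 14 13
```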
String Manipulation
Concatenation
    $dna1 = "ACTGCGTAGC";
    $dna2 = "CTTGCTAT";
Juxtapose in a string assignment or print statement:
    $new_dna = "$dna1$dna2";
Use the concatenation operator '.':
    $new_dna = $dna1 $dna2;     # wrong: syntax error, no operator between the variables
    $new_dna = $dna1 . $dna2;   # right
Substring
    $dna = "ACTGCGTAGC";
    $exon1 = substr($dna, 2, 5);   # TGCGT
2 is the starting offset (counting from 0) and 5 is the length of the substring.
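A runnable sketch of both concatenation styles and of substr, using the sequences from the slide. The $spaced line is an extra illustration that anything else inside the quotes is kept too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

my $joined = "$dna1$dna2";    # juxtaposition inside a double-quoted string
my $dotted = $dna1 . $dna2;   # the concatenation operator
my $spaced = "$dna1 $dna2";   # anything else inside the quotes is kept, here a space

my $exon1 = substr($dna1, 2, 5);  # start at offset 2 (third base), take 5 bases
print "$joined\n$spaced\n$exon1\n";
```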
Substitution
DNA transcription: T → U
Substitution operator s///:
    $dna = "GATTACATACACTGTTCA";
    $rna = $dna;
    $rna =~ s/T/U/g;   # "GAUUACAUACACUGUUCA"
=~ is the binding operator: it applies the substitution to the contents of $rna. The trailing g (global) replaces every match, not just the first.
Ex: Start with $dna = "gaTtACataCACTgttca"; and do the same as above. What will be the output?
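A sketch contrasting s/// with and without the g modifier on the slide's sequence. Copying $dna into a new variable first leaves the original DNA string unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "GATTACATACACTGTTCA";

my $first_only = $dna;
$first_only =~ s/T/U/;     # without /g: only the first T is replaced

my $rna = $dna;
$rna =~ s/T/U/g;           # with /g (global): every T is replaced

print "$first_only\n$rna\n";
```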
Example
transcribe.pl:
    $dna = "gaTtACataCACTgttca";
    $rna = $dna;
    $rna =~ s/T/U/g;
    print "DNA: $dna\n";
    print "RNA: $rna\n";
Does it do what you expect? If not, why not?
Patterns in substitution are case-sensitive! Only the uppercase T's change: RNA: gaUtACataCACUgttca
What can we do?
- Convert all letters to upper/lower case (preferred when possible)
- If we want to retain mixed case, use the transliteration (translation) operator tr///:
    $rna =~ tr/tT/uU/;   # replace all t by u, all T by U
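A sketch of the two fixes side by side: normalizing case with uc before s///, or keeping the mixed case with tr///.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# Option 1: normalize case first, then substitute
my $rna_upper = uc($dna);
$rna_upper =~ s/T/U/g;          # GAUUACAUACACUGUUCA

# Option 2: keep the mixed case with transliteration
my $rna_mixed = $dna;
$rna_mixed =~ tr/tT/uU/;        # each t -> u, each T -> U: gaUuACauaCACUguuca

print "DNA: $dna\nRNA: $rna_upper\nRNA: $rna_mixed\n";
```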
Case conversion
    $string = "acCGtGcaTGc";
Upper case:
    $dna = uc($string);   # "ACCGTGCATGC"
    or $dna = uc $string;
    or $dna = "\U$string";
Lower case:
    $dna = lc($string);   # "accgtgcatgc"
    or $dna = "\L$string";
Sentence case:
    $dna = ucfirst(lc($string));   # "Accgtgcatgc"
    or $dna = "\u\L$string";
Note: ucfirst($string) alone only capitalizes the first character, giving "AcCGtGcaTGc".
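A sketch that runs each conversion on the slide's string, including plain ucfirst to show why lc is needed for true sentence case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "acCGtGcaTGc";

my $upper    = uc($string);           # ACCGTGCATGC
my $lower    = lc($string);           # accgtgcatgc
my $first    = ucfirst($string);      # AcCGtGcaTGc: only the first character changes
my $sentence = ucfirst(lc($string));  # Accgtgcatgc
my $escapes  = "\u\L$string";         # same result using string escapes

print "$upper\n$lower\n$first\n$sentence\n$escapes\n";
```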
Reverse Complement
    5'- A C G T C T A G C.... G C A T -3'
    3'- T G C A G A T C G.... C G T A -5'
Reverse: reverses a string
    $string = "ACGTCTAGC";
    $string = reverse($string);   # "CGATCTGCA"
Complementation: use the transliteration operator
    $string =~ tr/ACGT/TGCA/;     # "GCTAGACGT", the reverse complement
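A sketch combining reverse and tr/// into a reverse complement, printed 5' to 3'. It assumes an uppercase sequence containing only A, C, G and T; other characters would pass through tr/// unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACGTCTAGC";

my $revcomp = reverse($dna);    # scalar assignment, so the characters are reversed: CGATCTGCA
$revcomp =~ tr/ACGT/TGCA/;      # complement each base: GCTAGACGT

print "5'-$dna-3'\n";
print "5'-$revcomp-3'\n";
# Caution: print reverse($dna) would print ACGTCTAGC unchanged,
# because print gives list context; use scalar(reverse($dna)) there.
```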
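More on String Manipulation
String length: length($dna)
Index: index STR, SUBSTR, POSITION
    index($strand, $primer, 2)
returns the offset of the first occurrence of $primer in $strand at or after position 2, or -1 if there is none. POSITION is optional (default 0).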
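A sketch of length and index on the sequence from the substring slide; the search string "GC" is an illustrative choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACTGCGTAGC";

my $len    = length($dna);           # 10
my $first  = index($dna, "GC");      # 3: offsets start at 0
my $second = index($dna, "GC", 4);   # 8: search starts at offset 4
my $none   = index($dna, "TTT");     # -1: not found

print "$len $first $second $none\n";
```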
Flow Control
Conditional statements: parts of code are executed depending on the truth value of a logical statement.
"Truth" (logical) values in Perl:
- false = {0, 0.0, 0e0, "", "0", undef}, default ""
- true = anything else, default 1
(The strings "0.0" and "0e0" are true; only the numbers are false.)
    ($a, $b) = (75, 83);
    if ( $a < $b ) {
        $a = $b;
        print "Now a = b!\n";
    }
    if ( $a > $b ) { print "Yes, a > b!\n" }   # compact form
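A sketch that tests a list of values for truth. The helper truth() is hypothetical, written only for this demonstration; note that the strings "0.0" and "00" count as true.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the word "true" or "false" for any scalar value
sub truth { return $_[0] ? "true" : "false"; }

for my $value (0, 0.0, "", "0", "0.0", "00", " ", 1, -1, "abc") {
    printf "%-7s is %s\n", "'$value'", truth($value);
}
print "undef is ", truth(undef), "\n";
```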
Comparison Operators
    Comparison                 String   Number
    Equality                   eq       ==
    Inequality                 ne       !=
    Greater than               gt       >
    Greater than or equal to   ge       >=
    Less than                  lt       <
    Less than or equal to      le       <=
    Three-way comparison       cmp      <=>
The first six return 1 (true) or "" (false); cmp and <=> return -1, 0, or 1.
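A sketch of why the string and numeric columns are not interchangeable: numbers compare by value, strings compare character by character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $num_eq  = (10 == 10.0)      ? 1 : 0;   # numeric: equal
my $str_eq  = ("10" eq "10.0")  ? 1 : 0;   # string: different characters, not equal
my $num_cmp = 2 <=> 10;                    # -1: 2 is numerically smaller
my $str_cmp = "2" cmp "10";                # 1: "2" sorts after "1" character by character
my $alpha   = ("ACG" lt "ACT")  ? 1 : 0;   # 1: G comes before T

print "$num_eq $str_eq $num_cmp $str_cmp $alpha\n";   # 1 0 -1 1 1
```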
Logical Operators
    Operation   Computerese   English version
    AND         &&            and
    OR          ||            or
    NOT         !             not
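A sketch using both spellings of the logical operators. The primer sequence and the "18-25 bases, no N" acceptance rule are hypothetical, chosen only to give the conditions something to test. The English forms bind more loosely than their symbolic counterparts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $primer = "ACGTGCTAGCATCGATCGAT";   # hypothetical primer sequence
my $len    = length($primer);          # 20

# hypothetical rule: length 18-25 AND no ambiguous N bases
my $right_length = ($len >= 18 && $len <= 25);
my $has_n        = (index($primer, "N") >= 0);

my $verdict;
if ($right_length && !$has_n) { $verdict = "accepted"; }
# 'not' binds tighter than 'or': this means (not $right_length) or $has_n
if (not $right_length or $has_n) { $verdict = "rejected"; }
print "primer $verdict\n";
```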
if/else/elsif
Allows for multiple branching/outcomes.
    $a = rand();
    if ( $a < 0.25 ) {
        print "A";
    } elsif ( $a < 0.50 ) {
        print "C";
    } elsif ( $a < 0.75 ) {
        print "G";
    } else {
        print "T";
    }
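A sketch that wraps the branching above in a subroutine and uses it to build a random sequence. The name random_base and the length 20 are illustrative; sub, return and the append operator .= go slightly beyond these slides. The keyword is spelled elsif, not elseif.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick one base at random, each with probability 1/4
sub random_base {
    my $r = rand();          # uniform number with 0 <= $r < 1
    if    ($r < 0.25) { return "A"; }
    elsif ($r < 0.50) { return "C"; }
    elsif ($r < 0.75) { return "G"; }
    else              { return "T"; }
}

my $seq = "";
for (1 .. 20) {              # build a 20-base random sequence
    $seq .= random_base();
}
print "$seq\n";
```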
Conditional Loops
    while ( statement ) { commands … }
repeats the commands until the statement is no longer true.
    do { commands } while ( statement );
same as while, except the commands are executed at least once. NOTE the ';' after the while statement!!
Short-circuiting commands: next and last
    next;   # jumps to the end, does the next iteration
    last;   # jumps out of the loop completely
(next and last work in while, for and foreach loops, but not inside a do { } while block.)
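A sketch that reads a sequence codon by codon with while, skipping ambiguous codons with next and stopping at a stop codon with last, followed by a do-while that runs once even though its condition is false. The sequence is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $orf  = "ATGGCCNNNTTGTAAGGC";   # hypothetical sequence read codon by codon
my $kept = "";
my $pos  = 0;

while ($pos + 3 <= length($orf)) {
    my $codon = substr($orf, $pos, 3);
    $pos += 3;                      # advance first, so 'next' cannot loop forever
    next if $codon eq "NNN";        # skip an ambiguous codon, go to the next iteration
    $kept .= $codon;
    last if $codon eq "TAA";        # stop codon: leave the loop completely
}
print "$kept\n";                    # ATGGCCTTGTAA

# do-while runs its body once even though the condition is false from the start
my $tries = 0;
do {
    $tries++;
} while ($tries > 5);
print "$tries\n";                   # 1
```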
while Example:
    while ($alive) {
        if ($needs_nutrients) {
            print "Cell needs nutrients\n";
        }
    }
Any problem?
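Nothing inside the loop above ever changes $alive, so once it is true the loop never ends. A sketch of one possible fix, assuming a hypothetical nutrient counter that decreases each step and ends the cell's life when it reaches zero:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $alive     = 1;
my $nutrients = 3;           # hypothetical starting supply
my $steps     = 0;

while ($alive) {
    $steps++;
    $nutrients--;            # the cell consumes nutrients each step
    my $needs_nutrients = ($nutrients < 2);
    if ($needs_nutrients) {
        print "Cell needs nutrients\n";
    }
    $alive = 0 if $nutrients <= 0;   # the condition changes, so the loop can end
}
print "stopped after $steps steps\n";
```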
for and foreach loops
Execute a code loop a specified number of times, or for a specified list of values.
for and foreach are identical: use whichever you want.
Incremental loop ("C style"):
    for ( $i=0 ; $i < 50 ; $i++ ) {
        $x = $i*$i;
        print "$i squared is $x.\n";
    }
Loop over list ("foreach" loop):
    foreach $name ( "Billy", "Bob", "Edwina" ) {
        print "$name is my friend.\n";
    }
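A sketch applying both loop styles to sequences: a C-style loop stepping three bases at a time, and a foreach loop over a list. The sequences are made up; the .= and += shorthand operators go beyond the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGGCGTAA";

# C-style loop: step through the sequence three bases at a time
my $codons = "";
for (my $i = 0; $i < length($dna); $i += 3) {
    $codons .= substr($dna, $i, 3) . " ";
}
print "$codons\n";    # ATG GCG TAA (with a trailing space)

# foreach loop over a list of values: add up the sequence lengths
my $total = 0;
foreach my $seq ("ACGT", "GATTACA", "TTAGGG") {
    $total += length($seq);
}
print "$total\n";     # 17
```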
Basic Data Types
Perl has three basic data types:
- scalar
- array (list)
- associative array (hash)