Computer Programming for Biologists
Class 2, Oct 31st, 2014
Karsten Hokamp
Overview
  recap
  project
  scalar variables
  built-in functions
  exercises
Recap
Topics covered in the first class:
  1. Unix
  2. Perl
Recap
1. Unix details
  commands: mkdir, cd, ls, pwd, rm, chmod
  command parameters (ls -l)
  command line completion with the TAB key
  command history (arrow up or down)
  special treatment of spaces (quotes or backslash)
  information and help in manual pages (man ls)
Recap
2. Perl details
  variables ($repeat, $message)
  statements ($repeat = 4;)
  print function (print 'Hello world!';)
  newline (print "Hello world!\n";)
  x operator (print $message x 4;)
  reading user input from the command line ($repeat = shift; or $repeat = <>;)
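The recap items can be combined into one short script. This is a minimal sketch: `use strict`, `use warnings`, `my` and the fallback value of 4 are additions for safety and are not part of the recap itself.

```perl
use strict;
use warnings;

# Read the repeat count from the command line; fall back to 4 if none is given.
my $repeat = shift;
$repeat = 4 unless defined $repeat;

# Double quotes are needed so that \n is turned into a newline.
my $message = "Hello world!\n";

# The x operator repeats the string on its left $repeat times.
my $output = $message x $repeat;
print $output;
```

Saved under a hypothetical name such as repeat.pl, it could be run as `perl repeat.pl 2` to print the message twice.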
Course project
Overall goal: development of a Perl script for sequence analysis
Input: file with sequences in FASTA format and command line options
Output:
  number and length of sequences
  GC content
  base / amino acid composition
  reverse complement
  translations
  virtual enzyme digest
FASTA Format
  headers (starting with '>' followed by sequence ID)
  sequence (split across multiple lines)

>
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACAGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
>
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
>
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
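One possible way to read this format is to treat every line starting with '>' as the start of a new record and to join all following lines into one sequence. The sketch below assumes a small made-up input with the hypothetical IDs seq1 and seq2, and it uses arrays and hashes, which are only covered later in the course.

```perl
use strict;
use warnings;

# Hypothetical input: two short records with made-up IDs.
my $fasta = <<'END';
>seq1
ATGCCACCGAAG
TTCGACC
>seq2
ATGCC
END

my @ids;          # sequence IDs in the order they appear
my %sequence;     # ID => full sequence
my $current_id;

for my $line (split /\n/, $fasta) {
    if ($line =~ /^>(\S*)/) {
        # Header line: the text after '>' up to the first space is the ID.
        $current_id = $1;
        push @ids, $current_id;
        $sequence{$current_id} = '';
    }
    elsif (defined $current_id) {
        # Sequence line: append it to the record that is currently open.
        $sequence{$current_id} .= $line;
    }
}

for my $id (@ids) {
    print "$id: ", length($sequence{$id}), " bases\n";
}
```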
Course project
Example usage:
  sequence length and GC content
  base composition and reverse complement
  translation and digest
[Figures: command and output for each example]
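As a rough illustration of the first example, length and GC content of a single sequence might be computed as follows. This is a sketch and not the course script; the input is the first 20 bases of the FASTA example, and the output format is an assumption.

```perl
use strict;
use warnings;

# Hypothetical input: the first 20 bases of the example sequence.
my $seq = 'ATGCCACCGAAGTTCGACCC';

my $length = length $seq;

# tr returns the number of characters it matched. With an empty
# replacement list the string itself is left unchanged.
my $gc = ($seq =~ tr/GCgc//);

my $gc_percent = 100 * $gc / $length;

printf "length: %d, GC content: %.1f%%\n", $length, $gc_percent;
```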
Variables
Basic elements for programming:
  hold information
  allow changing of information
  organize data
  complex constructs possible
  special operations for data handling
Variables
Three different types in Perl:
  1. scalar
  2. array
  3. hash
Variables
1) Scalars:
  Content: number or string of characters
  Variable name starts with a dollar sign ($) followed by a letter or underscore; the rest of the name can contain letters, numbers and underscores
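A few scalars in use, as a minimal sketch. The variable names and values are made up for illustration; `use strict`, `use warnings` and `my` are additions that help catch mistyped names.

```perl
use strict;
use warnings;

my $count    = 4;            # a number
my $gc_ratio = 0.47;         # numbers may be fractional
my $dna_seq1 = 'ATGCCACC';   # a string; the name contains underscores and a digit

# Scalars can be changed after they are set.
$count    = $count + 1;
$dna_seq1 = $dna_seq1 . 'GAAG';   # the . operator joins two strings

print "count: $count\n";
print "GC ratio: $gc_ratio\n";
print "sequence: $dna_seq1\n";
```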
Variables
Special scalars:
  $_       default scalar
  $a, $b   placeholders for comparisons
  $0       name of program
  $!       system error messages
See man perlvar for many more special variables
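The four special scalars can be seen at work in a short sketch. The file name no_such_file.txt is hypothetical and assumed not to exist; loops, arrays and sort go beyond what has been covered so far and only serve to show where these variables get their values.

```perl
use strict;
use warnings;

# $0 holds the name of the running program.
print "This program is called $0\n";

# $_ is the default variable: the for loop sets it, and uc uses it
# when called without an argument.
my @collected;
for ('acgt', 'ttga') {
    push @collected, uc;
}
print "@collected\n";

# $a and $b are set by sort to the two items being compared.
my @sorted = sort { $a <=> $b } (10, 9, 100);
print "@sorted\n";

# $! holds the system error message after a failed system call.
my $missing = 'no_such_file.txt';   # hypothetical, assumed not to exist
open(my $fh, '<', $missing) or print "Cannot open $missing: $!\n";
```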
Scalars
Practical session: Go to and try the ‘Recap’ exercises
Built-in functions for scalars
  lc (change letters in string to lower case)
  uc (change letters in string to upper case)
  chop (remove last character)
  chomp (remove trailing newline, if present)
  reverse (reverse list or string)
  length (calculate length of a string)
  split (split a string into a list)
  substr (extract parts of a string)
  tr (replace characters in a string one by one)
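Each of the functions listed above can be tried on a short string. This is a minimal sketch with a made-up input; the comments show the value each variable holds afterwards.

```perl
use strict;
use warnings;

my $seq = "ATGccaCCG\n";   # hypothetical input with a trailing newline

chomp $seq;                       # trailing newline removed: 'ATGccaCCG'

my $lower = lc $seq;              # 'atgccaccg'
my $upper = uc $seq;              # 'ATGCCACCG'
my $len   = length $seq;          # 9
my $rev   = reverse $seq;         # assigned to a scalar, so the string is reversed: 'GCCaccGTA'
my $codon = substr $seq, 0, 3;    # 3 characters starting at position 0: 'ATG'
my @bases = split //, $seq;       # empty pattern: one list element per character

my $copy = $seq;
chop $copy;                       # last character removed: 'ATGccaCC'

my $rna = $upper;
$rna =~ tr/T/U/;                  # every T replaced by U: 'AUGCCACCG'

print "$lower $upper $len $rev $codon $copy $rna\n";
print scalar(@bases), " bases\n";
```

Note that chop and chomp change the variable they are given, while lc, uc, reverse and substr (as used here) return a new value and leave the original untouched.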
Built-in functions for scalars
Different ways of using functions:
  $out = uc($in);
  $out = uc $in;
  $out = uc;   # works on default variable $_
Combination of functions:
  $out = uc(reverse($in));
  $out = uc reverse $in;
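The calling styles above give the same result, which a short sketch can confirm. The input string is made up, and the for loop is only used here as one way to put a value into $_.

```perl
use strict;
use warnings;

my $in = 'acgtt';

my $with_parens    = uc($in);
my $without_parens = uc $in;

# With no argument, uc works on the default variable $_.
my $from_default;
for ($in) {
    $from_default = uc;
}

# Functions can be nested; the innermost one runs first.
my $nested       = uc(reverse($in));
my $nested_plain = uc reverse $in;

print "$with_parens $without_parens $from_default\n";
print "$nested $nested_plain\n";
```

Parentheses are optional for built-in functions, but they make nested calls easier to read.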
Built-in functions
online help:
more help available on the command line:
  man perlfunc         overview of all built-in functions
  perldoc -f command   information on a specific command only
Built-in functions
Practical session: Go to and try the ‘Functions’ exercises
Exercises
1) Write a program that takes some text from the command line, prints it out in capital letters and also reports the length of the text, e.g.:
  caps.pl 'Hello World!'
returns:
  HELLO WORLD! (length: 12)
2) Write a program that takes a DNA sequence from the command line and prints out the reverse complement. Make sure that it works with both lower-case and upper-case letters, e.g.:
  revcomp.pl aatTTgggcca
returns:
  TGGCCCAAATT