Bioinformatics is …
- the use of computers and information technology to assist biological studies
- a multi-dimensional and multi-lingual discipline
Chapters 1-4, Tisdall
Multiple platforms, multiple languages
Windows, Mac, UNIX, Linux
–UNIX remains the standard for bioinformatics software development, while PCs and Macs are typically end-user machines.
Java, Python, CORBA, C++, Ruby, Perl
–There’s more than one way of doing things.
–Lack of uniformity continues to be one of the biggest problems faced in bioinformatics.
Why Perl?
Ease of use by novice programmers
Fast software prototyping
–Flexible language
–Compact code (sometimes)
Powerful pattern matching via “regular expressions”
Availability of programs and modules (BioPerl)
Portability
Open source – easy to extend and customize
No licensing fees
Perl is easy to get…
Many computers come with Perl already installed
–Check by typing perl -v in a Unix, Linux, or Mac OS X shell, or in a Windows MS-DOS shell
If not, download a recent version of Perl (download a binary whenever possible; source code requires compiling)
ActiveState provides several tools for Perl developers
Although some think Perl is an “old” language, it is constantly undergoing revision and improvement
What is Perl?
Practical Extraction and Report Language
An interpreted programming language optimized for scanning text files, extracting information, and printing reports
DNA and protein sequence data are string-based, which makes Perl an obvious choice
What is a Perl program?
A program consists of a text file containing a series of Perl statements
–Perl programs can be written in a variety of text editors, including WordPad, NotePad, MS Word (saved as plain text), or, as you will use, Komodo from ActiveState
Perl statements are separated by semicolons (;)
Extra spaces, tabs, and blank lines between statements are ignored
Anything following a # on a line is ignored (comment)
Perl is case sensitive
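The rules above can be seen in a very small program. This is only a sketch: the variable names are made up for illustration, and `use strict`, `use warnings`, and `my` are common good practice assumed here rather than something the slides have introduced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Anything after a '#' on a line is a comment and is ignored.
my $dna = 'ACGT';    # each statement ends with a semicolon
my $DNA = 'TTTT';    # a different variable: Perl is case sensitive

# Extra spaces, tabs, and blank lines between tokens do not matter.
my    $length   =   length( $dna )  ;

print "$dna has $length bases\n";
print "$DNA is stored separately from $dna\n";
```

Save the text as a plain-text file (for example with a .pl extension) and run it with the perl command.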
Perl has three data types
$ - Scalar: holds a single value, which can be a number or a string (e.g. $EcoRI)
@ - Array: stores multiple scalar values, indexed [0, 1, 2, etc.]
% - Hash: an associative array with keys and values
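A short sketch of the three data types side by side. The values are assumptions chosen for illustration: GAATTC is the well-known EcoRI recognition site, and the array and hash names are hypothetical.

```perl
use strict;
use warnings;

# Scalar ($): a single value, here a string (assumed example value).
my $EcoRI = 'GAATTC';

# Array (@): an ordered list of scalars, indexed from 0.
my @bases = ('A', 'C', 'G', 'T');

# Hash (%): an associative array mapping keys to values.
my %complement = (A => 'T', C => 'G', G => 'C', T => 'A');

print "Scalar: $EcoRI\n";
print "First array element: $bases[0]\n";     # a single element is a scalar, so it uses $
print "Complement of G: $complement{G}\n";    # hash lookups use curly braces
```

Note that a single element of an array or hash is itself a scalar, which is why it is written with $ rather than @ or %.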
Using Scalar Variables
Example 4-1 in Tisdall provides a simple example; a thorough description of this exercise is supplied in the text
Some additional comments regarding strings:
Quotes:
–'XYZ' Text between a pair of single quotes is interpreted literally
–To get a single quote into a single-quoted string, precede it with a backslash
–To get a backslash into a single-quoted string, precede the backslash with another backslash
'hello' #hello
'can\'t' #can't
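The single-quote rules can be checked with a few strings. This is a minimal sketch; the variable names and the C:\data value are hypothetical.

```perl
use strict;
use warnings;

my $plain      = 'hello';                     # hello
my $apostrophe = 'can\'t';                    # can't
my $backslash  = 'C:\\data';                  # C:\data (one backslash is stored)
my $literal    = '$x and \n stay as typed';   # no interpolation, no newline

# Print each string on its own line.
print "$plain\n";
print "$apostrophe\n";
print "$backslash\n";
print "$literal\n";
```

Only \' and \\ are special inside single quotes; everything else, including $x and \n, is kept exactly as typed.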
Double quotes interpolate variables
–"" Variable names within the string are replaced by their current values
–$x = 1;
print '$x'; # will print out $x
print "$x"; # will print out 1
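The difference between the two kinds of quotes can be sketched as follows. The names $single, $double, and $sentence are hypothetical, added so the results can be inspected.

```perl
use strict;
use warnings;

my $x = 1;

my $single   = '$x';                 # single quotes: the characters $ and x
my $double   = "$x";                 # double quotes: the current value, 1
my $sentence = "The value is $x";    # interpolation inside a longer string

print $single,   "\n";
print $double,   "\n";
print $sentence, "\n";
```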
Arithmetic operators
+ Addition
- Subtraction
* Multiplication
** Exponentiation
/ Division
% Modulus
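Each arithmetic operator applied to the same pair of numbers, as a quick sketch (the numbers 7 and 2 and the variable names are arbitrary choices for illustration):

```perl
use strict;
use warnings;

my $sum        = 7 + 2;     # 9
my $difference = 7 - 2;     # 5
my $product    = 7 * 2;     # 14
my $power      = 7 ** 2;    # 49
my $quotient   = 7 / 2;     # 3.5 (division is not truncated to an integer)
my $remainder  = 7 % 2;     # 1

print "$sum $difference $product $power $quotient $remainder\n";
```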
Other important operators
= is the assignment operator
== tests numeric equality; eq tests string equality
+= and -= are assignment operators that add or subtract: $a += 2; # means $a = $a + 2;
++ and -- are the autoincrement and autodecrement operators, which add or subtract one from a variable: $a++; # means $a = $a + 1;
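A sketch of these operators in use. The variable is called $count here rather than $a, an assumption made only because $a is also used internally by Perl's sort; the comparison values are arbitrary.

```perl
use strict;
use warnings;

my $count = 5;    # = assigns a value
$count += 2;      # same as $count = $count + 2;  now 7
$count -= 3;      # same as $count = $count - 3;  now 4
$count++;         # adds one;                      now 5
$count--;         # subtracts one;                 now 4

# == compares as numbers, eq compares as strings.
my $numeric_equal = (10 == 10.0)     ? 'yes' : 'no';   # yes: same number
my $string_equal  = ('10' eq '10.0') ? 'yes' : 'no';   # no: different characters

print "count = $count\n";
print "numerically equal: $numeric_equal\n";
print "equal as strings: $string_equal\n";
```

Using == on sequence strings is a common mistake; strings such as DNA should be compared with eq.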
\n = newline
Often you would like to introduce some spacing into your output
\n ends the current line, so whatever is printed next starts on a new line
print "apple"; print "grape";
Output looks like:
applegrape
print "apple\n"; print "grape\n";
Output looks like:
apple
grape
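The effect of \n can be sketched by building the two outputs as strings before printing them. The names $without and $with are hypothetical and exist only so the two results can be compared.

```perl
use strict;
use warnings;

# What two print statements produce without newlines: the words run together.
my $without = "apple" . "grape";

# What they produce when each string ends in \n: one word per line.
my $with = "apple\n" . "grape\n";

print $without, "\n";    # a newline is added here only to end the line
print $with;
```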
Chomp and Chop
chop removes the last character from a string
–$a = "Dr. Barber is hip";
–chop($a); # $a is now "Dr. Barber is hi"
chomp removes a trailing newline from the end of a string
–$a = "Dr. Barber is hip\n";
–chomp($a); # $a is now "Dr. Barber is hip"
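A runnable sketch of the two functions, with a third case added as an illustration: chomp leaves a string alone when it has no trailing newline, whereas chop always removes a character. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $chopped = "Dr. Barber is hip";
chop($chopped);        # last character removed: "Dr. Barber is hi"

my $chomped = "Dr. Barber is hip\n";
chomp($chomped);       # trailing newline removed: "Dr. Barber is hip"

my $unchanged = "Dr. Barber is hip";
chomp($unchanged);     # no trailing newline, so nothing is removed

print "[$chopped]\n";
print "[$chomped]\n";
print "[$unchanged]\n";
```

This is why chomp is the safer choice for cleaning up lines read from a file or from the keyboard.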
Do examples 4-2, 4-3, 4-4
Working with Files
Biological data can come in a variety of file formats, and our job is to use these files and extract what we want
One such file format is FASTA
Scalar vs. Array
Example 4-5 illustrates the distinction between a scalar variable and an array; read it, but don’t necessarily do it
It also shows how to use filehandles in association with your file
Angle brackets (< >) are input operators; you will become better acquainted with them when we use them later
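A minimal sketch of reading FASTA-style data through a filehandle, first into a scalar and then into an array. This is not Tisdall's Example 4-5: the record below is made up, and it is read from an in-memory string so the sketch runs without any file on disk. A lexical filehandle ($fh) is used here as an assumption; a bareword filehandle works the same way.

```perl
use strict;
use warnings;

# Hypothetical FASTA-style text: a header line starting with '>' and sequence lines.
my $fasta_text = ">example_peptide hypothetical record\nMKVLAAGI\nVALLSTQH\n";

# Open a filehandle on the string; for a real file, pass its name instead of \$fasta_text.
open(my $fh, '<', \$fasta_text) or die "Cannot open in-memory file: $!";

my $first_line = <$fh>;    # assigned to a scalar: reads one line
my @remaining  = <$fh>;    # assigned to an array: reads all remaining lines

close($fh);

chomp($first_line);        # remove the trailing newline from the header
chomp(@remaining);         # chomp also works on every element of an array

my $sequence = join('', @remaining);    # glue the sequence lines together

print "Header: $first_line\n";
print "Sequence lines read: ", scalar(@remaining), "\n";
print "Sequence: $sequence\n";
```

The same input operator returns a single line or every line depending on whether the result is stored in a scalar or an array.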
adhI.pep
Replace NM_021964fragment.pep with adhI.pep, which can be downloaded from the website to a folder you need to create on your computer called “BIOS482”
Do Example 4-7; if time permits, write code analogous to the code that follows this example to test out arrays