Download presentation
Presentation is loading. Please wait.
1
Perl for Bioinformatics Lecture 4
2
Variables - review A variable name starts with a $ It contains a number or a text string Use my to define a variable Use = to assign a value Use \ to stop the variable being interpolated Take care with variable names and with changing the contents of variables
3
Conditional Blocks, review An if test can be used to control a command in a conditional block, according to the outcome of a decision made by comparing variables. It’s important to keep track of whether variables are strings or numbers. Numbers are compared with ==, strings with eq. It’s usual to indent the block to make it easier to read the code
4
Arrays An array can store multiple pieces of data. They are essential for the most useful functions of Perl. They can store data such as: –the lines of a text file (e.g. primer sequences) –a list of numbers (e.g. BLAST e values) Arrays are designated with the symbol @ my @bases = ( “ A ”, “ C ”, “ G ”, “ T ” );
5
Converting a variable to an array split splits a variable into parts and puts them in an array. my $dnastring = "ACGTGCTA"; my @dnaarray = split //, $dnastring; @dnaarray is now (A, C, G, T, G, C, T, A) @dnaarray = split /T/, $dnastring; @dnaarray is now (ACG, GC, A)
6
join combines the elements of an array into a single scalar variable (a string) $dnastring = join('', @dnaarray); Converting an array to a variable which array spacer (empty here)
7
Loops A loop repeats a bunch of functions until it is done. The functions are placed in a BLOCK – some code delimited with curly brackets {} Loops are really useful with arrays. The “foreach” loop is probably the most useful of all: foreach my $base (@dnaarray) { print "$base “ ; }
8
String comparison (is the text the same?) eq (equal ) ne (not equal ) There are others but beware of them! Comparing strings
9
Getting part of a string substr takes characters out of a string $letter = substr($dnastring, $position, 1) which string where in the string how many letters to take
10
Combining strings Strings can be concatenated (joined). Use the dot. operator $seq1= “ ACTG ” ; $seq2= “ GGCTA ” ; $seq3= $seq1. $seq2; print $seq3; ACTGGGCTA
11
Making Decisions - review The if operator is generally used together with numerical or string comparison operators, inside an (expression). numerical: ==, !=, >, <, ≥, ≤ strings:eq, ne You can make decisions on each member of an array using a loop which puts each part of the array through the test, one at a time
12
More healthy exercise Write a program that asks the user for a DNA restriction site, and then tells them whether that particular sequence matches the site for the restriction enzyme EcoRI, or Bam HI, or Hind III. Site for EcoR1: GAATTC Bam H1: GGATCC Hind III: AAGCTT
13
Read in restriction site to variable Remove newline character Check if variable contains “GAATTC” Check if variable contains “GGATCC”..etc. pseudocode
14
Read in sequence to variable Remove newline character Split sequence in variable to array using “GAATTC”. Count and report number of fragments. Measure length of fragments and report site positions, adding six for missing sites What about longer sequences?
15
Arrays and loops - review An array starts with @. It contains multiple bits of data in a list-like format. @bases = ( “ A ”, “ C ”, “ G ”, “ T ” ); You can make decisions on each member of an array using a foreach loop which puts each part of the array through the test, one at a time
16
Test time, again Remember – keep track of what’s in a variable don’t over-write a variable with another value, unless you intend to syntax and case are critical lines end with a semicolon brackets and quotes must match.
17
Opening and closing files So we can input large amounts of data, Perl has to read data out of files, and write results into output files This is done in two steps First, you must give the file a name within the script - this is known as a filehandle Use the open command: open MYFILE, ‘ exampleprotein.txt ’ ;
18
Reading a file Once the file is open, you can read from it, line by line, using the readline <> operator again –(put the filehandle between the angle brackets) Perl reads files one line at a time, each time you input data from the file, the next line is read: open FILE1, ’ exampleprotein.txt ’ ; $line1 = ; chomp $line1; $line2 = ;
19
Using loops to read in a file The while loop just keeps doing an expression while it’s true. So it will keep reading lines from the file until it runs out. my $longsequence; open FILE, ‘ exampleprotein.txt ’ ; while (my $line = ){ chomp $line; $longsequence = $longsequence. $line; } close FILE; This reads the whole file, and puts each line into the variable $longsequence one at a time. Think about what happens to the ID line….
20
Now More Fun Excercises Read a DNA sequence from a fasta format file Calculate the GC content. What about the non-DNA characters in the file? >header lines with the name of the sequence carriage returns !! You know this one. blank spaces N’s or X’s or unexpected letters
21
Open file Load ID line and sequence into different variables Split the sequence into an array of individual letters For each letter in the array: –if A, increment “A” counting number variable –If T, increment “T” counting variable –…etc Pseudocode
22
Writing to a File Writing to a file is similar to reading from it Use the > operator to open a file for writing: open OUTPUT, ‘ >/home/class30/output.txt ’ ; This creates a new file with that name, or overwrites an existing file Use >> to append text to an existing file print to the file using the filehandle: print OUTPUT $myoutputdata;
23
else Instead of just letting the script go on if it fails an if test, you can get it to execute a second block of code if the statement in brackets isn’t true. Some more stuff you need to know
24
elsif You can string a lot of “if”s together using elsif if ($site eq “ GAATTC ” { print “ EcoR1 site\n ” ; } elsif ($site eq “ CCATGG ” { print “ BamHI site\n ” ; } elsif ($site eq “ AAGCTT ” ) { print “ HindIII site\n ” ; } else { # only happens if none of the preceeding are true die “ I can ’ t find any of the sites I know\n ” ; }
25
Subscripts Bioinformatics data often can be made into array format: –multi-line sequence files –Microarray or statistics data in “tab delimited” format You can address part of the array as if it was a variable using a subscript @numbers = (8, 8, 8, 23984092, 8); print “ $numbers[3]\n ” ; Please note – the first element is number zero! Second is 1!
26
Regular Expressions Sounds odd, doesn’t it? It means a pattern that the computer can match, in a standard format. Very useful in bioinformatics work DNA patterns restriction sites promoters/transcription factor binding sites intron splice site Protein patterns conserved domains (motifs) active sites structural motifs (membrane spanning, signal peptide, etc.)
27
The Binding and Match Operators: =~ / / The =~ operator binds functions together The // operator matches things to patterns It can be translated as “contains” The forward slashes contain the pattern to be matched, like this: if ($dnaseq =~ /GAATTC/) {print “ EcoRI site found\n ” }
28
A regular expression.. is a joy forever. And a pattern to match: can be just a text string, such as: / GATC/ it can have alternative characters: / G[AT]TC/ or contain a wildcard that matches any character: / G.TC/ Or be something bizzare: /\/[^\/]*\/\.\./
29
Perl Regular Expressions It never ceases to amaze me what people can do with regular expressions, but you can match pretty much anything you can think of and a lot you can’t: #man perlrequick
30
Alternative Characters Square brackets within the match expression allow for alternative characters: if ($dna =~ /CAG[AT]CAG/) This will match an DNA string that starts with CAG; has A or T in the 4th position, followed by another CAG. A vertical line within the /expression/ means “or”; it allows you to look for either of two completely different patterns: if ($dna =~ /GAAT|ATTC/)
31
Special characters Perl has a large set of special characters to use in regular expressions: – the dot (.) matches any character – \d matches any digit (a number from 0-9) – \w matches any “word” character (a letter or a number, not punctuation or space) – \s matches white space (any amount) – \t matches a tab (useful for tab delimited files) – ^ matches the beginning of a line – $ matches the end of a line –Knowing this makes you lots of fun at parties.
32
“Special” characters What if you need to match text that contains a special character? Aren’t there dots at the end of sentences? Now you have to use a backslash (\) to “escape” the special meaning of that character: if $onewordsentence =~ /\w+ \./ -This would match any text that has one or more text characters, followed by a dot. Finished.
33
Bringing it together So now, when you think about it, you can: Open a file Check whether each line of the file contains a particular pattern Recover part of that line Write it out to another file So.. isn’t that what you wanted to know? But really, it’s very useful combined with the UNIX command line.
34
A last exercise?... Now we’re getting up to speed with Perl, lets try something more fun: Open up a BLAST output file Spit out the name of the query sequence, the top hit, and how many hits there were.
35
Only the beginning Sadly, there is much, much more than this to the Power of Perl. You can make, create and download other people’s websites Make Linux and Windows graphical programs Do almost anything on the internet Interact with databases And much much more
36
Why won’t I teach you more stuff? Whoa! Programming takes time to learn properly You’ve got the tools now to get started on a programming project We will go through some more Perl functions in the later classes, especially modules such as Bioperl.
37
Practice makes perfect You can now practice your Perl skills and understand a lot of the books and help files, which are probably more useful. #man perlintro #man perlrequick #perldoc bioperl Also, check out Radhika’s resource page http://compbio.sph.harvard.edu/chb/training/training-resources
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.