1
Advanced Perl for Bioinformatics Lecture 5
2
Regular expressions - review
You can put the pattern you want to match between //, bind the pattern to the variable with =~, then use it within a conditional:
    if ($dna =~ /GAATTC/) { print "EcoRI\n"; }
Square brackets within the match expression allow for alternative characters:
    if ($dna =~ /CAG[AT]CAG/)
A vertical line means “or”; it allows you to look for either of two completely different patterns:
    if ($dna =~ /GAAT|ATTC/)
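The three kinds of pattern reviewed above can be tried together in one short script. This is only a sketch: the test sequence is made up, and it uses GAATTC, the EcoRI recognition sequence, plus the BamHI and HindIII sites for the alternation.

```perl
use strict;
use warnings;

# Hypothetical test sequence, made up for illustration.
my $dna = 'TTGAATTCAGACAGGG';

my @found;

# Literal pattern: the EcoRI recognition site.
push @found, 'EcoRI' if $dna =~ /GAATTC/;

# Character class: matches CAGACAG or CAGTCAG.
push @found, 'CAG[AT]CAG' if $dna =~ /CAG[AT]CAG/;

# Alternation: matches either GGATCC or AAGCTT (neither occurs here).
push @found, 'BamHI or HindIII' if $dna =~ /GGATCC|AAGCTT/;

print "Matched: @found\n";
```

Collecting the hits in an array rather than printing inside each conditional makes it easy to report everything at the end.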
3
Reading and writing files, review
Open a file for reading:
    open INPUT, "/home/class30/input.txt";
Or for writing:
    open OUTPUT, ">/home/class30/output.txt";
Make sure you can open it! ($! holds the reason if the open fails.)
    open INPUT, "input.txt" or die "Can't open file: $!\n";
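As a hedged aside, the same steps are often written with the three-argument form of open and a lexical filehandle, which keeps the mode separate from the file name. The sketch below writes and then reads a throwaway file; the file name and its contents are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so the sketch touches no real files.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/output.txt";   # hypothetical file name

# Three-argument open with a lexical filehandle; '>' means write.
open(my $out, '>', $path) or die "Can't open $path for writing: $!\n";
print {$out} "GAATTC\n";
print {$out} "GGATCC\n";
close($out) or die "Can't close $path: $!\n";

# '<' means read. $! holds the system error message if open fails.
open(my $in, '<', $path) or die "Can't open $path for reading: $!\n";
my @lines;
while (my $line = <$in>) {
    chomp $line;            # remove the trailing newline
    push @lines, $line;
}
close($in);

print "Read ", scalar(@lines), " lines: @lines\n";
```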
4
Test time Last one…
5
Hashes
Perl has another super useful data structure called a hash, for want of a better name. A hash is an associative array: instead of being indexed by position like an array, each value is stored under a unique name (its key), and you use the key to look the value up.
6
Making a hash of it
You can think of a hash just as if it were a set of questions (keys) and answers (values):
    my %matthash = (
        "first_name" => "Matt",
        "surname"    => "Hudson",
        "age"        => "secret",
        "height"     => 187,    # cm
        "hairstyle"  => "D minus",
    );
7
Getting the hash back
    my %matthash = (
        "first_name" => "Matt",
        "surname"    => "Hudson",
        "age"        => "secret",
        "height"     => 187,    # cm
        "hairstyle"  => "D minus",
    );
    print "my name is ", $matthash{first_name};
    print " ", $matthash{surname}, "\n";
Unlike with an array, you can store a lot of information and recover it easily and quickly by name, without knowing in what order you added it.
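Beyond looking a value up, a few other hash operations come up constantly: adding an entry, testing whether a key is present, deleting an entry, and counting entries. A minimal sketch, using a made-up fragment of a codon table as the data:

```perl
use strict;
use warnings;

# Hypothetical codon table fragment, used only for illustration.
my %codon = (
    ATG => 'Met',
    TGG => 'Trp',
    TAA => 'Stop',
);

# Add or overwrite an entry by assigning to a key.
$codon{TAG} = 'Stop';

# exists tests for a key without creating it.
my $has_atg = exists $codon{ATG} ? 1 : 0;
my $has_ccc = exists $codon{CCC} ? 1 : 0;

# delete removes a key and its value.
delete $codon{TAA};

# keys in scalar context gives the number of entries.
my $count = keys %codon;

print "ATG codes for $codon{ATG}\n";
print "Entries: $count, has ATG: $has_atg, has CCC: $has_ccc\n";
```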
8
Hashes as an array
You can get the “keys” of the hash and use them like an array (keys returns them in no particular order):
    foreach my $info (keys %matthash) {
        print "$info = $matthash{$info}\n";
    }
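Because keys comes back in no guaranteed order, the output of a loop like this can differ from run to run. When the order matters, one common approach is to sort the keys first. A small sketch using a subset of the hash from the earlier slides:

```perl
use strict;
use warnings;

my %matthash = (
    first_name => 'Matt',
    surname    => 'Hudson',
    height     => 187,    # cm
);

# sort puts the keys in alphabetical order before the loop sees them.
my @report;
foreach my $info (sort keys %matthash) {
    push @report, "$info = $matthash{$info}";
}

print "$_\n" for @report;
```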
9
Why are hashes useful? Exercise.
Many of you might have noticed in the exercise on restriction sites that there was no way to keep track of which sites were which using arrays. Modify your script using a hash like this one:
    my %enzymehash = (
        "EcoRI"   => "GAATTC",
        "BamHI"   => "GGATCC",
        "HindIII" => "AAGCTT",
    );
10
(an) answer
    foreach my $name (keys %enzymehash) {
        if ($sequence =~ /$enzymehash{$name}/) {
            print "I found a site for $name, $enzymehash{$name}\n";
        }
    }
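The answer above reports only whether each site occurs. One possible extension, sketched here under the assumption that you also want to know where the sites are, stores every match position in a hash of arrays. The sequence is made up, and GAATTC is used as the EcoRI site.

```perl
use strict;
use warnings;

my %enzymehash = (
    EcoRI   => 'GAATTC',
    BamHI   => 'GGATCC',
    HindIII => 'AAGCTT',
);

# Hypothetical sequence, made up for illustration.
my $sequence = 'AAGAATTCTTGGATCCAAGAATTCAA';

my %positions;    # enzyme name => array ref of 1-based site starts
foreach my $name (sort keys %enzymehash) {
    my $site = $enzymehash{$name};
    # In scalar context, /g finds one match per loop iteration.
    while ($sequence =~ /$site/g) {
        # $-[0] is the 0-based offset where the last match started.
        push @{ $positions{$name} }, $-[0] + 1;
    }
}

foreach my $name (sort keys %positions) {
    print "$name ($enzymehash{$name}) at: @{ $positions{$name} }\n";
}
```

Enzymes with no site in the sequence never get an entry in %positions, so the final loop prints only the enzymes that cut.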
11
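Putting data in a hash
    my %hash;
    # INPUT is a filehandle opened as in the file review slide
    while (<INPUT>) {
        # only store the captures if the pattern actually matched
        if (/stuff(important stuff) more stuff (best stuff)/) {
            $hash{$1} = $2;
        }
    }
Or…
    while (my $line = <INPUT>) {
        chomp $line;    # strip the newline so it doesn't end up in the value
        my @tmp = split /\t/, $line;
        $hash{$tmp[0]} = $tmp[1];
    }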
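To try the split approach without a real input file, the sketch below reads from a string opened as an in-memory filehandle. The tab-delimited data (enzyme name, then recognition site) and the variable names are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical tab-delimited data: enzyme name, then recognition site.
my $data = "EcoRI\tGAATTC\nBamHI\tGGATCC\n\nHindIII\tAAGCTT\n";

# Open the string as an in-memory file so the sketch needs no real file.
open(my $fh, '<', \$data) or die "Can't open in-memory data: $!\n";

my %site_for;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;               # skip blank lines
    my ($name, $site) = split /\t/, $line;
    next unless defined $site;               # skip lines with no second column
    $site_for{$name} = $site;
}
close($fh);

print "$_\t$site_for{$_}\n" for sort keys %site_for;
```

Skipping blank or incomplete lines avoids creating entries with undefined values, which would otherwise trigger warnings later.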
12
Advanced regex
The fun isn’t over yet. You can match:
  - precise numbers of characters
  - any number of characters
  - positions in a line
  - precise formatting (spaces, tabs etc.)
You can get bits of the string you matched out and store them in variables.
You can use regexes to substitute or to translate.
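Each item in that list corresponds to a small piece of regex syntax. The sketch below shows one example of each on a made-up input line; the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $line = "ATGAAAAAACCC\tgene1";   # hypothetical input line

# Exact repeat count: {6} means exactly six of the preceding item.
my $six_a = $line =~ /A{6}/ ? 1 : 0;

# Anchors: ^ is the start of the string, $ is the end.
my $starts_atg = $line =~ /^ATG/ ? 1 : 0;
my $ends_digit = $line =~ /\d$/  ? 1 : 0;

# Formatting characters: \t matches a tab, \S any non-whitespace character.
# + means "one or more", so \S+ matches any number of them.
my ($seq, $name) = $line =~ /^(\S+)\t(\S+)$/;

# Substitution: s/// replaces matched text; /g replaces every match.
(my $rna = $seq) =~ s/T/U/g;

# Transliteration: tr/// maps characters one for one.
# Reverse first, then complement, to get the reverse complement.
my $revcomp = scalar reverse $seq;
$revcomp =~ tr/ACGT/TGCA/;

print "$name: $seq -> $rna, reverse complement $revcomp\n";
```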
13
Grabbing bits of the regex
    my $blastline = "Query= AT1g34399 gene CDS";
    $blastline =~ /Query= (.+) gene/;
    my $atgnumber = $1;
    print "The accession number is $atgnumber\n";
You can store the contents of the bit within round brackets (parentheses), within the regex, as the special variable $1. Then use it for other stuff. If you put another pair of brackets in, its contents will be stored in $2.
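One caution: $1 and $2 are only set by a successful match, so it is safer to test the match before using them. A sketch with two capture groups on the same line as above; the second group and the variable names are additions for illustration.

```perl
use strict;
use warnings;

my $blastline = "Query= AT1g34399 gene CDS";

# Test the match before using $1 and $2: after a failed match they
# still hold values from an earlier successful match.
my ($accession, $feature);
if ($blastline =~ /^Query=\s+(\S+)\s+gene\s+(\S+)/) {
    ($accession, $feature) = ($1, $2);
    print "Accession $accession, feature $feature\n";
}
else {
    print "No query line found\n";
}

# In list context a match returns its captures directly.
my ($same_accession) = $blastline =~ /Query=\s+(\S+)/;
```

Using \S+ instead of .+ stops the capture at the first whitespace, so it cannot run past the accession when the word "gene" appears more than once on the line.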
14
Using modules
You can use other people’s modules, including those that come with Perl. These provide extra commands, or change the way your Perl script behaves. E.g.
    use strict;
    use warnings;
    use Bio::Perl;
You will see these stacked up at the beginning of more complicated Perl scripts. Some modules come with Perl (strict, warnings; see man perlmod); others you need to download and add in yourself.
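As a small illustration of a module that "provides extra commands", the sketch below imports two functions from List::Util, which ships with Perl. The fragment sizes are made up.

```perl
use strict;      # undeclared variables become compile-time errors
use warnings;    # reports likely mistakes such as use of undefined values
use List::Util qw(sum max);   # core module: list helper functions

# Hypothetical fragment sizes in base pairs.
my @fragments = (1200, 350, 4800, 75);

my $total   = sum(@fragments);
my $longest = max(@fragments);

print "Total $total bp, longest fragment $longest bp\n";
```

The qw(sum max) list names the functions to import, so only those two become available in the script.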
15
A last exercise?...
So: how might hashes help you solve this?
  - Open up a BLAST output file.
  - Spit out the name of the query sequence, the top hit, and how many hits there were.
16
Programming projects
Now it’s time to think of your programming projects. Hopefully you have an idea; we’ll discuss how feasible it is in the time available. If not, here are some suggestions.
17
Suggested program functions
  - Translate a cDNA into protein, and then check it against the Pfam database for HMM hits.
  - Make a real restriction map of a DNA sequence, with predicted fragment sizes.
  - Align proteins of a favorite family, open the alignment and find residues that are totally conserved.
  - Perform BLAST against the latest version of the database files for a particular organism, first checking whether the user has the latest files and downloading them if not.
  - Design PCR primers, to make a fragment size chosen by the user, for a sequence input from a FASTA file.
  - Check whether primer sites are unique in a sequenced, or partially sequenced, genome, and give an “electronic PCR” result.
  - Output an XML formatted version of a BLAST or HMMER text file.
  - Analyze codon usage in a protein coding DNA sequence and calculate the Ka/Ks ratio.