Advanced Perl For Bioinformatics

Part 1 (2/23/06, 1-4pm)
    Module structure
    Module path
    Module export
    Object oriented programming

Part 2 (2/24/06, 1-4pm)
    Bioperl modules
    Sequence access
    Sequence manipulation
    Parsing BLAST records
Module and main program

Hello1.pm:
    package Hello1;
    sub greet {
        return "Hello, World!";
    }
    1;

test1.pl:
    #!/usr/bin/perl
    use Hello1;
    print Hello1::greet();
Why use a module? Modules can be reused by different programs, and they keep your code well organized.
Module structure

    package Hello1;                  # Declare a package; the file must be saved as Hello1.pm
    sub greet {                      # Contents of the package: functions and variables
        return "Hello, World!\n";
    }
    1;                               # Return a true value at the end
Path to module

Default path to look for modules: the @INC array. To print it: perl -e 'print "@INC\n"'

If your module is placed under one of the paths in @INC, you can refer to it by its path relative to that directory. E.g. if @INC contains /usr/my/lib, and
(1) your Mod.pm is /usr/my/lib/Mod.pm, you can refer to your module with "use Mod;" (without the .pm extension).
(2) your Mod.pm is /usr/my/lib/Mymod/Seq/Mod.pm, then you say: use Mymod::Seq::Mod;

If your module is not placed under any of the paths in @INC, e.g. /some/dir/Mod.pm, then:
    use lib "/some/dir";    # this adds the path to the beginning of @INC
    use Mod;
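A minimal sketch of how "use lib" changes the search path. The directory /some/dir is only a placeholder and does not need to exist for the demonstration to run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder directory (an assumption for illustration only).
use lib '/some/dir';

# "use lib" runs at compile time and puts its directories at the front of @INC.
print "First entry: $INC[0]\n";

# Print the whole search path, one directory per line.
print "$_\n" for @INC;
```

Because the new directory is searched first, a Mod.pm found there would take precedence over a module of the same name installed elsewhere.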
Variable scope in module

my $var      --- accessible only inside the module file
our $var     --- package variable, accessible from outside as $Package::var
$var         --- same as "our $var" (allowed only without "use strict")
use strict;  --- forces all variables to be declared with 'my' or 'our' (or fully qualified)

Hello2.pm:
    package Hello2;
    use strict;
    our $var1 = 1;
    my $var2 = 3;
    my $str = "Hello World!\n";
    sub greet {
        return $str;
    }
    1;

test2.pl:
    #!/usr/bin/perl
    use Hello2;
    print "var1= $Hello2::var1\n";    # prints "var1= 1"
    print "var2= $Hello2::var2\n";    # prints "var2= " because $var2 is a 'my' variable
    print Hello2::greet();
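The difference between 'my' and 'our' can also be seen in a single file. This is a sketch only: the package name ScopeDemo and the accessor get_private are made up for illustration, and the package is written inline rather than in its own .pm file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Demo package defined inline; normally this would live in its own .pm file.
{
    package ScopeDemo;
    our $shared  = 1;    # package variable: stored in the symbol table
    my  $private = 3;    # lexical variable: visible only inside this block
    sub get_private { return $private; }    # accessor exposes the lexical value
}

print "shared  = $ScopeDemo::shared\n";
print "private = ",
      (defined $ScopeDemo::private ? $ScopeDemo::private : 'undef'), "\n";
print "via accessor = ", ScopeDemo::get_private(), "\n";
```

The 'my' variable never enters the package symbol table, so $ScopeDemo::private is undefined from outside; a function inside the package is the usual way to hand the value out.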
Export

Export functions and variables, so that they can be accessed without a package qualifier.

Hello3.pm:
    package Hello3;
    use strict;
    require Exporter;
    our @ISA = qw(Exporter);
    our @EXPORT_OK = qw(greet);
    our $var1 = 1;
    my $var2 = 3;
    my $str = "Hello World!\n";
    sub greet {
        return $str;
    }
    1;

test3.pl:
    #!/usr/bin/perl
    use Hello3 qw(greet);
    print "var1= $Hello3::var1\n";
    print "var2= $Hello3::var2\n";
    print greet();
Hello3.pm:
    package Hello3;
    use strict;
    use Exporter;                  # Need functionality in Exporter.pm to do exporting.
    our @ISA = qw(Exporter);       # This module inherits functions from the Exporter module rather than creating its own.
    our @EXPORT_OK = qw(greet);    # Export this subroutine upon request by another program.
    our $var1 = 1;
    my $var2 = 3;
    my $str = "Hello World!\n";
    sub greet {
        return $str;
    }
    1;
test3.pl:
    #!/usr/bin/perl
    use Hello3 qw(greet);             # Request "greet"
    print "var1= $Hello3::var1\n";
    print "var2= $Hello3::var2\n";
    print greet();
Hello4.pm:
    package Hello4;
    use strict;
    use Exporter;
    our @ISA = qw(Exporter);
    our @EXPORT_OK = qw(greet);
    our @EXPORT = qw(greet2);      # Export this automatically
    our $var1 = 1;
    my $var2 = 3;
    my $str = "Hello World!";
    sub greet {
        return $str;
    }
    sub greet2 {
        return "Hi.\n";
    }
    1;
test4.pl:
    #!/usr/bin/perl
    use Hello4 qw(greet);             # Request "greet"
    use Hello4;                       # This automatically imports whatever is listed in @EXPORT
    print "var1= $Hello4::var1\n";
    print "var2= $Hello4::var2\n";
    print greet();
    print greet2();
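The same export mechanism can be tried in one file. This is a sketch under a few assumptions: the package name ExportDemo is hypothetical, the module is written inline inside a BEGIN block instead of a separate .pm file, and import() is called by hand to stand in for what "use" does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical module defined inline. BEGIN makes the export lists exist
# at compile time, as they would after loading a real .pm file with "use".
BEGIN {
    package ExportDemo;
    use Exporter;
    our @ISA       = qw(Exporter);    # inherit Exporter's import() method
    our @EXPORT    = qw(greet2);      # exported by default
    our @EXPORT_OK = qw(greet);       # exported only on request
    sub greet  { return "Hello World!\n"; }
    sub greet2 { return "Hi.\n"; }
}

# "use ExportDemo qw(greet);" on a real file would end up calling import() like this:
BEGIN { ExportDemo->import('greet'); }    # request greet explicitly
BEGIN { ExportDemo->import(); }           # empty list: take everything in @EXPORT

print greet();
print greet2();
```

Listing names in @EXPORT_OK rather than @EXPORT is often the gentler choice, since the calling program then decides which names enter its namespace.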
Exercise 1 Create a module which has functions to calculate the area and boundary of a rectangle. The width and length are to be supplied in your main program and passed into your module. Practice
Object Oriented Programming

A package (or module) is a class.
A reference to a hash becomes an object of this class once it is blessed into the package.
The object contains member variables, which are stored in the hash.
The object also has member functions (methods), which are the subroutines of the package.
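A small sketch of these points, using a made-up class named Point (not part of the course files) written inline for brevity:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class, defined inline for illustration.
{
    package Point;
    sub new {
        my ($class, $x, $y) = @_;
        my $self = { X => $x, Y => $y };    # member variables live in a hash
        return bless $self, $class;         # bless ties the hash ref to the class
    }
    sub to_string {                         # member function (method)
        my $self = shift;                   # the object arrives as the first argument
        return "($self->{X}, $self->{Y})";
    }
}

my $p = Point->new(3, 4);
print ref($p), "\n";          # class name the reference was blessed into
print $p->to_string, "\n";
print $p->{X}, "\n";          # the object is still a hash reference underneath
```

The arrow call $p->to_string looks up to_string in the package that $p was blessed into and passes $p as the first argument.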
Hello5.pm:
    package Hello5;
    use strict;
    sub new {
        my $class = shift;
        my $ref = {};
        bless ( $ref, $class );
        return $ref;
    }
    sub greet {
        my ($ref, $str) = @_;
        return $str;
    }
    sub greet2 {
        return "Hi\n";
    }
    1;

test5.pl:
    #!/usr/local/bin/perl
    use Hello5;
    my $h = new Hello5;
    print $h->greet("Good morning\n");
    print $h->greet2;
Rectangle.pm:
    package Rectangle;
    sub new {
        my ($class, $width, $length) = @_;
        my $hashref = { W => $width, L => $length };
        bless ( $hashref, $class );
        return $hashref;
    }
    sub getArea {
        my $self = shift;
        return $self->{W} * $self->{L};
    }
    sub getBoundary {
        my $self = shift;
        return 2 * ($self->{W} + $self->{L});
    }
    1;

recttest.pl:
    #!/usr/bin/perl
    use Rectangle;
    my $w = 3;
    my $l = 4;
    my $rect = new Rectangle($w, $l);
    my $area = $rect->getArea();
    print "Area = $area\n";
    my $boundary = $rect->getBoundary();
    print "Boundary=$boundary\n";
Exercise 2 Create a class called “Cube”. It should have methods to calculate volume based on the cube’s width, length and height.
More Practice on Classes

Sequence.pm: clean, wrap, reverse complement, shuffle, GC content, translate
Main program: seq.pl
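As a starting point, here is a sketch of three of the listed operations (clean, reverse complement, GC content). The class name SeqSketch and its method names are assumptions made for illustration, not the contents of the course's Sequence.pm; wrap, shuffle and translate are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package SeqSketch;    # hypothetical name, written inline for brevity
    sub new {
        my ($class, $seq) = @_;
        $seq =~ s/[^A-Za-z]//g;               # clean: drop whitespace, digits, etc.
        my $self = { SEQ => uc $seq };
        return bless $self, $class;
    }
    sub seq { return $_[0]->{SEQ}; }
    sub revcom {
        my $self = shift;
        my $rc = reverse $self->{SEQ};        # scalar context: reverses the string
        $rc =~ tr/ACGT/TGCA/;                 # complement each base
        return $rc;
    }
    sub gc_content {
        my $self = shift;
        my $len = length $self->{SEQ};
        return 0 unless $len;                 # avoid division by zero
        my $gc = ($self->{SEQ} =~ tr/GC//);   # tr returns the number of matches
        return $gc / $len;
    }
}

my $s = SeqSketch->new("atg cgt\nacc 12");
print $s->seq, "\n";
print $s->revcom, "\n";
printf "%.2f\n", $s->gc_content;
```

This sketch only handles the four unambiguous DNA letters; IUPAC ambiguity codes would pass through the complement step unchanged.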
Bioperl

A collection of Perl modules for bioinformatics.
Facilitates sequence retrieval, sequence manipulation, and parsing the results of programs like blast and clustalw.
See the Bioperl website for download and documentation.
Each individual .pm file has info on how to use that module.
Usually installed under: /usr/local/lib/perl5/site_perl/5.8.0/Bio
Some Bioperl modules

Bio::Perl, Bio::DB -- access sequence databases. Example: seqret.pl
Bio::Seq -- a sequence and its annotation. E.g. seqio.pl
Bio::SeqIO -- read sequences from a file, and write them to a file. E.g. seqio.pl
Bio::Tools::SeqStats -- molecular weight, etc. E.g. seqmw.pl
Bio::SearchIO -- parse blast results.
Accessing Remote Databases

    use Bio::Perl;
    my $seqobj = get_sequence('swiss', "ROA1_HUMAN");    # fetch the record over the network
    write_sequence("roa1.fasta", 'fasta', $seqobj);      # save it in FASTA format

Databases can be: swiss, genbank, genpept, refseq, etc.
Bio::Seq

Contains a sequence and its annotation.
Methods: display_id, desc, seq, revcom, translate, etc.
The revcom and translate methods each create a new Bio::Seq object.

One way to create a Bio::Seq object:
    use Bio::Seq;
    my $seq = Bio::Seq->new(-seq              => 'actgtggcgtcaact',
                            -desc             => 'Sample Bio::Seq object',
                            -display_id       => 'something',
                            -accession_number => 'accnum',
                            -alphabet         => 'dna' );

Another way: read the sequence from a file via a Bio::SeqIO object.
Parsing blast results

Module: Bio::SearchIO

    use Bio::SearchIO;
    my $in = new Bio::SearchIO(-format => 'blast', -file => 'report.bls');
    while ( my $result = $in->next_result ) {            # one result per query
        while ( my $hit = $result->next_hit ) {          # each database hit
            while ( my $hsp = $hit->next_hsp ) {         # each high-scoring pair
                if ( $hsp->length('total') > 100 ) {
                    if ( $hsp->percent_identity >= 75 ) {
                        print "Hit= ", $hit->name,
                              ", Length=", $hsp->length('total'),
                              ", Percent_id=", $hsp->percent_identity, "\n";
                    }
                }
            }
        }
    }

Example: blastparse.pl