BioPerl
cpan
Open a terminal and type: /bin/su -
Start "cpan" and accept all defaults.
At the cpan prompt, type: install Bio::Graphics
use Bio::Seq;
use Bio::SeqIO;

# create a sequence object of some DNA
my $seq = Bio::Seq->new( -id => 'testseq', -seq => 'CATGTAGATAG');

# print out some details about it
print "seq is ", $seq->length, " bases long\n";
print "revcom seq is ", $seq->revcom->seq, "\n";

# write it to a file in Fasta format
my $out = Bio::SeqIO->new( -file => '>testseq.fsa', -format => 'Fasta');
$out->write_seq($seq);
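The same Bio::SeqIO class can read the file back. The following is a minimal sketch, assuming BioPerl is installed; it writes the record first so that it does not depend on an existing file, and the variable names ($in, $record, @ids) are illustrative only.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Write one record first so the example is self-contained.
my $seq = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG');
my $out = Bio::SeqIO->new(-file => '>testseq.fsa', -format => 'Fasta');
$out->write_seq($seq);
$out->close;

# Read the records back; next_seq returns undef once the file is exhausted.
my $in = Bio::SeqIO->new(-file => 'testseq.fsa', -format => 'Fasta');
my @ids;
while (my $record = $in->next_seq) {
    push @ids, $record->display_id;
    print $record->display_id, ": ", $record->length, " bases\n";
}
```

Note the leading '>' in the -file argument: it opens the file for writing, while a bare filename opens it for reading.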
“Bioperl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications.”
Core package provides the main parsers; this is the basic package and it's required by all the other packages
Run package provides wrappers for executing some 60 common bioinformatics applications
BioPerl db package is a subproject to store sequence and annotation data in a BioSQL relational database
Network package parses and analyzes protein-protein interaction data
Open Bioinformatics Foundation
“.. a non profit, volunteer run organization focused on supporting open source programming in bioinformatics.”
BioDAS - XML infrastructure for exchanging genome annotations
BioJava - Java toolkit
BioMOBY - Data and application execution through web services
BioPerl - Perl toolkit
BioPipe - Pipelines and workflow project for creating bioinformatics protocols
BioPython - Python toolkit
BioRuby - Ruby toolkit
BioSQL - RDBMS database schema for storing sequences, annotations, and taxa data
OBDA - A standard for sequence data access locally, remotely, and via RDBMS
EMBOSS - Sequence analysis toolkit
BioPerl Sequence objects
Bio::Seq - Sequence object, with features
  – Default sequence object
Bio::PrimarySeq - Bioperl lightweight Sequence Object
  – CPU and memory efficient
Bio::Seq::RichSeq - Module implementing a sequence created from a rich sequence database entry
  – Sequences obtained from, among others, the EMBL database
Bio::Seq::LargeSeq - SeqI compliant object that stores sequence as files in /tmp
  – Sequences > 100 Mbases
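As an illustration of the lightweight object, here is a minimal sketch that builds a Bio::PrimarySeq and calls a few of its basic methods. It assumes BioPerl is installed; the sequence is the one from the earlier example and the variable names are illustrative.

```perl
use strict;
use warnings;
use Bio::PrimarySeq;

# A PrimarySeq holds only the sequence and its identifiers, no features.
my $pseq = Bio::PrimarySeq->new(
    -id       => 'testseq',
    -seq      => 'CATGTAGATAG',
    -alphabet => 'dna',
);

my $length  = $pseq->length;
my $first3  = $pseq->subseq(1, 3);    # coordinates are 1-based and inclusive
my $revcomp = $pseq->revcom->seq;     # revcom returns a new sequence object

print "length:  $length\n";
print "first 3: $first3\n";
print "revcom:  $revcomp\n";
```

When annotation is needed, the same constructor arguments work with Bio::Seq, which adds feature handling on top of this interface.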
Sequence and annotation schematic
Incomplete list of topics covered by BioPerl:
Accessing sequence data from local and remote databases
Manipulating sequences
Translating
Obtaining basic sequence statistics (SeqStats, SeqWord)
Identifying restriction enzyme sites (Bio::Restriction)
Identifying amino acid cleavage sites (Sigcleave)
Running BLAST
Parsing BLAST and FASTA
Searching for genes and other structures on genomic DNA (Genscan, Sim4, Grail, Genemark, ESTScan, MZEF, EPCR)
Aligning 2 sequences
Aligning multiple sequences (Clustalw.pm, TCoffee.pm)
Manipulating clusters of sequences (Cluster, ClusterIO)
Representing sequence annotations
Using 3D structure objects and reading PDB files (StructureI, Structure::IO)
Tree objects and phylogenetic trees (Tree::Tree, TreeIO, PAML)
Bibliographic objects for querying bibliographic databases (Biblio)
Graphics objects for representing sequence objects as images (Graphics)
Sequence manipulation using the Bioperl EMBOSS and PISE interfaces
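Two of the topics above, translating and basic sequence statistics, can be sketched in a few lines. This is a minimal example, assuming BioPerl (including Bio::Tools::SeqStats) is installed; the 12-base sequence and the id 'demo' are made up for illustration and are not from the slides.

```perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Tools::SeqStats;

# Hypothetical sequence: four complete codons, ATG GCA TTG TAA.
my $dna = Bio::PrimarySeq->new(
    -id       => 'demo',
    -seq      => 'ATGGCATTGTAA',
    -alphabet => 'dna',
);

# translate returns a new protein sequence object; '*' marks a stop codon.
my $protein = $dna->translate->seq;
print "protein: $protein\n";

# count_monomers returns a hash reference mapping each base to its count.
my $stats  = Bio::Tools::SeqStats->new(-seq => $dna);
my $counts = $stats->count_monomers();
foreach my $base (sort keys %$counts) {
    print "$base: $counts->{$base}\n";
}
```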
Exercises
At:
Try to run the “A Better Version of the Feature Renderer” script.
Modify the script to accept an accession number instead of a filename and retrieve the corresponding sequence from the EMBL database. Test with accession number: J02933
  Hint: “Bio::DB::EMBL”, where is the database located?
Create a BioPerl sequence object from the example1.fasta and add the ORF starting at position 11 as a feature. Display the resulting sequence object using the feature renderer script.