Biol 59500-033 - Practical Biocomputing: BioPerl

Presentation transcript:

Biol Practical Biocomputing 1 - BioPerl: General capabilities (packages)
Sequences
    ○ fetching, reading, writing, reformatting, annotating, groups
    ○ Access to remote databases
Applications
    ○ BLAST, Blat, FASTA, HMMer, Clustal, Alignment, many others
Gene modeling
    ○ Genscan, Sim4, Grail, Genemark, ESTScan, MZEF, EPCR
XML formats
    ○ GAME, BSML and AGAVE
GFF
Trees
Genetic maps
3D structure
Literature
Graphics

Biol Practical Biocomputing 2 - BioPerl: Auxiliary packages
Possibly of less general interest; they require additional modules.
BioPerl-run: running applications
    ○ EMBOSS
    ○ PISE
Bioperl-ext: extensions
Bioperl-db and BioSQL

Biol Practical Biocomputing 3 - BioPerl: Simple use
use Bio::Perl; gives easy access to a small part of BioPerl's functionality.

    use Bio::Perl;
    # This script only works if the computer has an internet connection.
    # Sequences can be fetched from these databases:
    # 'swiss', 'genbank', 'genpept', 'embl', and 'refseq'.
    my $seq_object = get_sequence('swiss', "ROA1_HUMAN");
    write_sequence(">roa1.fasta", 'fasta', $seq_object);

    use Bio::Perl;
    my $seq = get_sequence('swiss', "ROA1_HUMAN");
    # uses the default database - nr in this case
    my $blast_result = blast_sequence($seq);
    write_blast(">roa1.blast", $blast_result);

Biol Practical Biocomputing 4 - BioPerl: Bio::Perl
Bio::Perl has a number of other easy-to-use functions, including:
    get_sequence - gets a sequence from standard, internet-accessible databases
    read_sequence - reads a sequence from a file
    read_all_sequences - reads all sequences from a file
    new_sequence - makes a BioPerl sequence just from a string
    write_sequence - writes a single sequence or an array of sequences to a file
    translate - provides a translation of a sequence
    translate_as_string - provides a translation of a sequence, returning just the sequence as a string
    blast_sequence - BLASTs a sequence against standard databases at NCBI
    write_blast - writes a BLAST report out to a file
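Several of the functions listed above work without a network connection. The following is a minimal sketch, assuming BioPerl is installed; the sequence string, the ID demo_seq and the output file name are made-up illustration values, not part of the original slides.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Perl;

# Build a sequence object from a plain string (no database access needed).
# The sequence and the ID 'demo_seq' are hypothetical example values.
my $seq = new_sequence("ATGGCCAAGTAA", "demo_seq");

# Translate to protein and get the result back as a plain string.
my $protein = translate_as_string($seq);
print "protein: $protein\n";

# Write the sequence to a FASTA file; the leading '>' means "open for writing".
write_sequence(">demo_seq.fasta", 'fasta', $seq);
```

Because new_sequence takes only a string and an optional ID, this is a convenient way to try the other Bio::Perl functions before fetching real records.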

Biol Practical Biocomputing 5 - BioPerl: Sequence Objects
Seq, PrimarySeq, LocatableSeq, RelSegment, LiveSeq, LargeSeq, RichSeq, SeqWithQuality, SeqI
Common formats are interpreted automatically
Simple formats - without features
    ○ FASTA (Pearson), Raw, GCG
Rich formats - with features and annotations
    ○ GenBank, EMBL
    ○ Swissprot, GenPept
    ○ XML - BSML, GAME, AGAVE, TIGRXML, CHADO

Biol Practical Biocomputing 6 - BioPerl: Sequences, Features and Annotations
Sequence - DNA, RNA, Amino Acid
Sequences are feature containers
    ○ Feature - information with a sequence location
    ○ Annotation - information without an explicit sequence location
Parsing sequences: Bio::SeqIO
    ○ for automatically reading most types
    ○ multiple drivers: genbank, embl, fasta, ...
Sequence objects
    ○ Bio::PrimarySeq
    ○ Bio::Seq
    ○ Bio::Seq::RichSeq
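As a rough illustration of Bio::SeqIO parsing, the sketch below reads two FASTA records from an in-memory string instead of a file, so it can be run as-is. It assumes BioPerl is installed; the records and their IDs (seq1, seq2) are invented for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up FASTA records held in a string rather than a file.
my $fasta = <<'END';
>seq1 first demo sequence
ATGGCCAAG
>seq2 second demo sequence
TTGACA
END

# Open the string as a read-only filehandle and hand it to the fasta driver.
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

# next_seq returns one sequence object per record until the input is exhausted.
my @ids;
while (my $seq = $in->next_seq) {
    push @ids, $seq->display_id;
    printf "%s\t%d\t%s\n", $seq->display_id, $seq->length, $seq->desc;
}
```

Swapping -fh for -file => 'name.fasta' reads the same records from disk, and changing -format selects a different driver.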

Biol Practical Biocomputing 7 - BioPerl: Simple examples

    #!/bin/perl -w
    use Bio::Seq;
    $seq_obj = Bio::Seq->new(
        -seq      => "aaaatgggggggggggccccgtt",
        -alphabet => 'dna'
    );

    #!/bin/perl -w
    use Bio::Seq;
    $seq_obj = Bio::Seq->new(
        -seq        => "aaaatgggggggggggccccgtt",
        -display_id => "#12345",
        -desc       => "example 1",
        -alphabet   => "dna"
    );
    print $seq_obj->seq();

Biol Practical Biocomputing 8 - BioPerl: Reading sequences from files & databases

    #!/bin/perl -w
    use Bio::SeqIO;
    # no leading '>' on the file name: the file is opened for reading
    $seqio_obj = Bio::SeqIO->new(
        -file   => 'sequence.fasta',
        -format => 'fasta'
    );
    # loop in case there is more than one sequence in the file
    while ($seq_obj = $seqio_obj->next_seq) {
        # print the sequence
        print $seq_obj->seq, "\n";
    }

    #!/bin/perl -w
    use Bio::DB::GenBank;
    $db_obj = Bio::DB::GenBank->new;
    $seq_obj = $db_obj->get_Seq_by_id( AE );

Biol Practical Biocomputing 9 - BioPerl: Getting sequences directly from database

    #!/bin/perl -w
    use Bio::DB::GenBank;
    use Bio::DB::Query::GenBank;
    # other database classes: Bio::DB::GenPept, Bio::DB::SwissProt,
    # Bio::DB::RefSeq and Bio::DB::EMBL

    # keyword query
    $query_obj = Bio::DB::Query::GenBank->new(
        -query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
        -db    => 'nucleotide'
    );

    $gb = Bio::DB::GenBank->new;
    # this returns a Seq object:
    $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
    # this also returns a Seq object:
    $seq2 = $gb->get_Seq_by_acc('AF303112');
    # this returns a SeqIO object, which can be used to get a Seq object:
    $seqio = $gb->get_Stream_by_id(["J00522","AF303112"," "]);
    $seq3 = $seqio->next_seq;

Biol Practical Biocomputing 10 - BioPerl: Getting more sequence information
Some methods
    accession_number()        get the accession number
    display_id()              get the identifier string
    description() or desc()   get the description string
    seq()                     get the sequence as a string
    length()                  get the sequence length
    subseq($start, $end)      get a subsequence (char string)
    translate()               translate to protein (seq obj)
    revcom()                  reverse complement (seq obj)
    species()                 returns a Bio::Species object

    #!/usr/bin/env perl
    use strict;
    use Bio::SeqIO;
    use Bio::DB::GenBank;

    my $genBank = Bio::DB::GenBank->new;
    my $seq  = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession
    my $dna  = $seq->seq;                # get the sequence as a string
    my $id   = $seq->display_id;         # identifier
    my $acc  = $seq->accession_number;   # accession number
    my $desc = $seq->desc;               # get the description
    print "ID: $id\naccession: $acc\nDescription: $desc\n$dna\n";
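The methods above can also be tried offline on a hand-built object. This is a small sketch, assuming BioPerl is installed; the sequence, ID and description are made-up values chosen so the results are easy to check by eye.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Seq;

# A hypothetical 12-base DNA sequence: ATG GCC AAG TAA.
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCAAGTAA',
    -display_id => 'demo_seq',
    -desc       => 'made-up example',
    -alphabet   => 'dna',
);

my $len         = $seq->length;          # number of residues
my $first_codon = $seq->subseq(1, 3);    # plain string, 1-based inclusive coordinates
my $rc          = $seq->revcom->seq;     # revcom returns an object; ->seq gives the string
my $protein     = $seq->translate->seq;  # translate also returns an object

print "length:      $len\n";
print "first codon: $first_codon\n";
print "revcom:      $rc\n";
print "protein:     $protein\n";
```

Note the difference in return types: subseq gives back a string, while revcom and translate give back new sequence objects that need a further ->seq call.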

Biol Practical Biocomputing 11 - BioPerl: Sequence Objects

LOCUS       ECORHO       1880 bp    DNA     linear   BCT 26-APR-1993
DEFINITION  E.coli rho gene coding for transcription termination factor.
ACCESSION   J01673 J01674
VERSION     J GI:
KEYWORDS    attenuator; leader peptide; rho gene; transcription terminator.
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1 (bases 1 to 1880)
  AUTHORS   Brown,S., Albrechtsen,B., Pedersen,S. and Klemm,P.
  TITLE     Localization and regulation of the structural gene for
            transcription-termination factor rho of Escherichia coli
  JOURNAL   J. Mol. Biol. 162 (2), (1982)
  MEDLINE
  PUBMED
REFERENCE   2 (bases 1 to 1880)
  AUTHORS   Pinkham,J.L. and Platt,T.
  TITLE     The nucleotide sequence of the rho gene of E. coli K-12
COMMENT     Original source text: Escherichia coli (strain K-12) DNA.
            A clean copy of the sequence for [2] was kindly provided by
            J.L.Pinkham and T.Platt.
FEATURES             Location/Qualifiers
     source
                     /organism="Escherichia coli"
                     /mol_type="genomic DNA"
                     /strain="K-12"
                     /db_xref="taxon:562"
     mRNA            212..>1880
                     /product="rho mRNA"
     gene
                     /gene="rho"
     CDS
                     /gene="rho"
                     /note="transcription termination factor"
                     /codon_start=1
                     /translation="MNLTELKNTPVSELITLGENMGLENLARMRKQDIIFAILKQHAK...
                     IDAMEFLINKLAMTKTNDDFFEMMKRS"
ORIGIN      15 bp upstream from HhaI site.
        1 aaccctagca ctgcgccgaa atatggcatc cgtggtatcc cgactctgct gctgttcaaa
       61 aacggtgaag tggcggcaac caaagtgggt gcactgtcta aaggtcagtt gaaagagttc
...deleted
          tgggcatgtt aggaaaattc ctggaatttg ctggcatgtt atgcaatttg catatcaaat
     1861 ggttaatttt tgcacaggac
//

Biol Practical Biocomputing 12 - BioPerl: Bio::Seq object methods
    add_SeqFeature($feature) - attach feature(s)
    get_SeqFeatures() - get all the attached features
    species() - a Bio::Species object
    annotation() - Bio::Annotation::Collection
Features
    Bio::SeqFeatureI - interface
    Bio::SeqFeature::Generic - basic implementation
    SeqFeature::Similarity - some score info
    SeqFeature::FeaturePair - pair of features

Biol Practical Biocomputing 13 - BioPerl: Sequence Features
Bio::SeqFeatureI - interface - GFF derived
    ○ start(), end(), strand() for location information
    ○ location() - Bio::LocationI object (to represent complex locations)
    ○ score, frame, primary_tag, source_tag - feature information
    ○ spliced_seq() - for an attached sequence, get the spliced sequence
Bio::SeqFeature::Generic
    ○ add_tag_value($tag, $value) - add a tag/value pair
    ○ get_tag_values($tag) - get all the values for this tag
    ○ has_tag($tag) - test if a tag exists
    ○ get_all_tags() - get all the tags
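The feature methods listed above fit together roughly as follows. This is a sketch under stated assumptions: BioPerl is installed, and the sequence, coordinates, the 'manual' source tag and the gene name demoA are all invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical 15-base sequence to hang a feature on.
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCAAGTAAGGG',
    -display_id => 'demo_seq',
    -alphabet   => 'dna',
);

# A feature is information with a location: here bases 1-12, forward strand.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 12,
    -strand      => 1,
    -primary_tag => 'CDS',
    -source_tag  => 'manual',
);
$feat->add_tag_value('gene', 'demoA');   # attach a tag/value pair
$seq->add_SeqFeature($feat);             # the sequence is the feature container

# Walk the attached features and print location, tags and underlying sequence.
for my $f ($seq->get_SeqFeatures) {
    print join("\t", $f->primary_tag, $f->start, $f->end, $f->strand), "\n";
    for my $tag ($f->get_all_tags) {
        print "\t$tag = ", join(', ', $f->get_tag_values($tag)), "\n";
    }
    print "\tsequence: ", $f->seq->seq, "\n";
}
```

Features parsed from a GenBank or EMBL record can be walked with the same loop, since Bio::SeqIO attaches them to the sequence object in the same way.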

Biol Practical Biocomputing 14 - BioPerl: Sequence Annotations
Each Bio::Seq has a Bio::Annotation::Collection, available via $seq->annotation()
Annotations are stored with keys like 'comment' and 'reference'
    ○ Annotation::Comment: comment field
    ○ Annotation::Reference: author, journal, title, etc.
    ○ Annotation::DBLink: database, primary_id, optional_id, comment
Access
    $annotation->get_Annotations('comment')
    $annotation->add_Annotation('comment', $an)
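A short sketch of storing and retrieving an annotation under the 'comment' key, assuming BioPerl is installed. The sequence and the comment text are made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Annotation::Comment;

# Hypothetical sequence; only its annotation collection matters here.
my $seq = Bio::Seq->new(
    -seq        => 'ATGGCCAAGTAA',
    -display_id => 'demo_seq',
    -alphabet   => 'dna',
);

# Every Bio::Seq carries a Bio::Annotation::Collection.
my $collection = $seq->annotation;

# Annotations are information without a sequence location, stored by key.
my $comment = Bio::Annotation::Comment->new(-text => 'made-up example comment');
$collection->add_Annotation('comment', $comment);

# Retrieve everything stored under the 'comment' key.
my @comments = $collection->get_Annotations('comment');
print "comment: ", $_->text, "\n" for @comments;
```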

Biol Practical Biocomputing 15 - BioPerl: Sequences, Features, and Annotations
[Diagram: Bio::Seq has-a Features (Bio::SeqFeature::Generic, which has-a Bio::LocationI) and has-a Annotations (Bio::Annotation::Collection, which has-a Bio::Annotation::Comment)]

Biol Practical Biocomputing 16 - BioPerl: Writing sequences
Writing in a different format than the one read = reformatting

    use Bio::SeqIO;
    # convert swissprot to fasta format
    my $in  = Bio::SeqIO->new(-format => 'swiss', -file => 'file.sp');
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>file.fa');
    while ( my $seq = $in->next_seq ) {
        $out->write_seq($seq);
    }

Biol Practical Biocomputing 17 - BioPerl: Remote Blast
Retrieve sequence, setup and submit

    use Bio::DB::GenBank;
    use Bio::Tools::Run::RemoteBlast;

    # retrieve sequence from genbank
    my $db_obj  = Bio::DB::GenBank->new;
    my $seq_obj = $db_obj->get_Seq_by_acc( ' ' );
    my $seq     = $seq_obj->seq;
    print "seq:$seq\n";

    # remote BLAST setup and query submission
    my $v = 1;    # turn on verbose output
    my $remote_blast = Bio::Tools::Run::RemoteBlast->new(
        '-prog'   => 'blastp',
        '-data'   => 'swissprot',
        '-expect' => '1e-10'
    );
    my $r = $remote_blast->submit_blast( $seq_obj );
    print STDERR "waiting..." if ( $v > 0 );

Biol Practical Biocomputing 18 - BioPerl: Remote Blast
Retrieve sequence, setup and submit

    WARNING MSG: Unrecognized DBSOURCE data: pdb: molecule 2NLL, chain 65, release Aug 27, 2007;
    deposition: Nov 20, 1996; class: TranscriptionDNA; source: Mol_id: 1;
    Organism_scientific: Homo Sapiens; Organism_common: Human; Genus: Homo; Species: Sapiens;
    Expression_system: Escherichia Coli; Expression_system_common: Bacteria;
    Expression_system_genus: Escherichia; Expression_system_species: Coli; Mol_id: 2;
    Organism_scientific: Homo Sapiens; Organism_common: Human; Genus: Homo; Species: Sapiens;
    Expression_system: Escherichia Coli; Expression_system_common: Bacteria;
    Expression_system_genus: Escherichia; Expression_system_species: Coli; Mol_id: 3;
    Synthetic: Yes; Mol_id: 4; Synthetic: Yes; Exp. method: X-Ray Diffraction
    seq:CAICGDRSSGKHYGVYSCEGCKGFFKRTVRKDLTYTCRDNKDCLIDKRQRNRCQYCRYQKCLAMGM

Biol Practical Biocomputing 19 - BioPerl: Remote Blast Results
The list of search RIDs is stored in the RemoteBlast object.

    while ( my @rids = $remote_blast->each_rid ) {
        foreach my $rid ( @rids ) {
            # Try to retrieve a search; $rc is not a reference until the search is done.
            # When the search is complete, $rc is a Bio::SearchIO object.
            my $rc = $remote_blast->retrieve_blast($rid);
            if ( !ref($rc) ) {
                # If the search is not done, wait 5 sec and try again.
                # It would be a good idea to put a maximum limit here so the script
                # doesn't run forever in the event of an error.
                if ( $rc < 0 ) {
                    $remote_blast->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
                sleep 5;
            }
            else {
                # search result successfully retrieved
                my $result = $rc->next_result();    # see Bio::Search::Result
                # save the output
                my $filename = $result->query_name() . ".out";
                $remote_blast->save_output($filename);
                $remote_blast->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    # see Bio::Search::Hit::HitI
                    next unless ( $v > 0 );
                    print "\thit name is ", $hit->name, "\n";
                    while ( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
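The comment in the retrieval code suggests capping the number of retries. One way this could look is sketched below in plain Perl, with no BioPerl dependency. The helper poll_with_limit and the stand-in $fake_retrieve are hypothetical names invented for this example; the stand-in only mimics the convention described on the slide, where the result is not a reference until the search is done and a negative value signals an error.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Call $check->() until it returns a reference (finished), a negative
# number (error), or until $max_tries attempts have been made.
# Returns the reference on success and undef otherwise.
sub poll_with_limit {
    my ($check, $max_tries, $wait_seconds) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $check->();
        return $rc if ref $rc;      # finished: hand back the result object
        return if $rc < 0;          # error: stop polling immediately
        sleep $wait_seconds if $wait_seconds;
    }
    return;                         # limit reached: give up
}

# Hypothetical stand-in for a retrieval call: "not ready" twice, then done.
my $calls = 0;
my $fake_retrieve = sub { return ++$calls < 3 ? 0 : { status => 'done' } };

my $result = poll_with_limit($fake_retrieve, 10, 0);
print defined $result ? "finished after $calls calls\n" : "gave up\n";
```

In a real script the closure would wrap the retrieval call for one RID, for example sub { $remote_blast->retrieve_blast($rid) }, with a wait of a few seconds between attempts.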

Biol Practical Biocomputing 20 - BioPerl: Remote Blast

    waiting....
    Query Name: 2NLL_A
    hit name is sp|P |RXRA_MOUSE score is 275
    hit name is sp|P |RXRA_HUMAN score is 275
    hit name is sp|Q |RXRA_RAT score is 275
    hit name is sp|Q |RXRAB_DANRE score is 273
    hit name is sp|A2T929.2|RXRAA_DANRE score is 272
    hit name is sp|Q7SYN5.1|RXRBA_DANRE score is 270
    hit name is sp|Q |RXRGA_DANRE score is 268
    hit name is sp|P |RXRA_XENLA score is 268
    hit name is sp|P |RXRG_CHICK score is 268
    hit name is sp|Q |RXRBB_DANRE score is 266
    hit name is sp|Q0GFF6.2|RXRG_PIG score is 264
    hit name is sp|Q0VC20.1|RXRG_BOVIN score is 264
    hit name is sp|Q5BJR8.1|RXRG_RAT score is 264
    hit name is sp|Q5REL6.1|RXRG_PONAB score is 264
    hit name is sp|P |RXRG_XENLA score is 264
    hit name is sp|P |RXRG_HUMAN score is 264
    hit name is sp|P |RXRG_MOUSE score is 264
    hit name is sp|Q6DHP9.1|RXRGB_DANRE score is 261
    hit name is sp|Q5TJF7.1|RXRB_CANFA score is 258
    …
    hit name is sp|Q505F1.2|NR2C1_MOUSE score is 200
    hit name is sp|Q9TTR7.1|COT2_BOVIN score is 200
    hit name is sp|Q |COT2_CHICK score is 200
    hit name is sp|P |7UP1_DROME score is 200
    hit name is sp|P |7UP2_DROME
    hit name is sp|P |COT2_HUMAN
    hit name is sp|O |COT2_RAT
    hit name is sp|A0JNE3.1|NR2C1_BOVIN
    hit name is sp|Q6PH18.1|N2F1B_DANRE
    hit name is sp|Q9N4B8.4|NHR41_CAEEL
    hit name is sp|O |NHR49_CAEEL
    hit name is sp|P |HNF4_DROME
    hit name is sp|P |HNF4B_XENLA

Biol Practical Biocomputing 21 - BioPerl: Database Search
BLAST - 3 components
    ○ Result: Bio::Search::Result::ResultI
    ○ Hit: Bio::Search::Hit::HitI
    ○ HSP: Bio::Search::HSP::HSPI

Biol Practical Biocomputing 22 - BioPerl: Blast

    use Bio::Perl;
    my $seq = get_sequence('swiss', "ROA1_HUMAN");
    # uses the default database - nr in this case
    my $blast_result = blast_sequence($seq);
    write_blast(">roa1.blast", $blast_result);

    use Bio::SearchIO;
    # parse a saved report and print every HSP above 75% identity
    $report_obj = Bio::SearchIO->new(-format => 'blast', -file => 'report.bls');
    while ( $result = $report_obj->next_result ) {
        while ( $hit = $result->next_hit ) {
            while ( $hsp = $hit->next_hsp ) {
                if ( $hsp->percent_identity > 75 ) {
                    print "Hit\t", $hit->name, "\n",
                          "Length\t", $hsp->length('total'), "\n",
                          "Percent_id\t", $hsp->percent_identity, "\n";
                }
            }
        }
    }

Biol Practical Biocomputing 23 - BioPerl: BLAST, processed result

    Query is: BOSS_DROME Bride of sevenless protein precursor. 896 aa
    Matrix was BLOSUM62
    Hit is F35H10.10
        HSP Len is 315 E-value is 4.9e-11 Bit score 182
        Query loc: Sbject loc:
        HSP Len is 28 E-value is 1.4e-09 Bit score 39
        Query loc: Sbject loc:

Biol Practical Biocomputing 24 - BioPerl: BLAST, using the Search::Hit object

    use Bio::SearchIO;
    use strict;
    my $parser = Bio::SearchIO->new(-format => 'blast', -file => 'file.bls');
    while ( my $result = $parser->next_result ) {
        while ( my $hit = $result->next_hit ) {
            print "hit name=", $hit->name, " desc=", $hit->description,
                  "\n len=", $hit->length, " acc=", $hit->accession, "\n";
            print "raw score ", $hit->raw_score, " bits ", $hit->bits,
                  " significance/evalue=", $hit->significance, "\n";
        }
    }

Biol Practical Biocomputing 25 - BioPerl: Search::Hit methods
start(), end()
    ○ get overall alignment start and end for all HSPs
strand()
    ○ get best overall alignment strand
matches()
    ○ get total number of matches across the entire set of HSPs
    ○ can specify only exact 'id' or conservative 'cons'

Biol Practical Biocomputing 26 - BioPerl: Using Search::HSP

    use Bio::SearchIO;
    use strict;
    my $parser = Bio::SearchIO->new(-format => 'blast', -file => 'file.bls');
    while ( my $result = $parser->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                print "hsp evalue=", $hsp->evalue, " score=", $hsp->score, "\n";
                print "total length=", $hsp->hsp_length, " qlen=", $hsp->query->length,
                      " hlen=", $hsp->hit->length, "\n";
                print "qstart=", $hsp->query->start, " qend=", $hsp->query->end,
                      " qstrand=", $hsp->query->strand, "\n";
                print "hstart=", $hsp->hit->start, " hend=", $hsp->hit->end,
                      " hstrand=", $hsp->hit->strand, "\n";
                print "percent identical ", $hsp->percent_identity,
                      " frac conserved ", $hsp->frac_conserved(), "\n";
                print "num query gaps ", $hsp->gaps('query'), "\n";
                print "hit str    =", $hsp->hit_string, "\n";
                print "query str  =", $hsp->query_string, "\n";
                print "homolog str=", $hsp->homology_string, "\n";
            }
        }
    }

Biol Practical Biocomputing 27 - BioPerl: Search::HSP methods
rank()
    ○ order in the alignment
    ○ by score, size
matches
seq_inds
    ○ residue positions that are conserved, identical, mismatches, gaps

Biol Practical Biocomputing 28 - BioPerl: SearchIO objects correspond to many result formats
BLAST (WU-BLAST, NCBI, XML, PSIBLAST, BL2SEQ, MEGABLAST, TABULAR (-m8/m9))
FASTA (m9 and m0)
HMMER (hmmpfam, hmmsearch)
UCSC formats (WABA, AXT, PSL)
Gene-based alignments
    ○ Exonerate, SIM4, {Gene,Genome}wise
Can write searches in alternative formats

Biol Practical Biocomputing 29 - BioPerl: Sequence Alignment
Bio::AlignIO to read alignment files
Produces Bio::SimpleAlign objects
    ○ Phylip
    ○ Clustal
Interface and objects designed for round-tripping and some functional work

Biol Practical Biocomputing 30 - BioPerl: Graphics

    use Bio::Graphics;
    use Bio::SeqIO;
    use Bio::SeqFeature::Generic;

    my $file = shift or die "provide a sequence file as the argument";
    my $io   = Bio::SeqIO->new(-file => $file) or die "couldn't create Bio::SeqIO";
    my $seq  = $io->next_seq or die "couldn't find a sequence in the file";
    my @features = $seq->get_SeqFeatures;

    # sort features by their primary tags
    my %sorted_features;
    for my $f (@features) {
        my $tag = $f->primary_tag;
        push @{ $sorted_features{$tag} }, $f;
    }

    my $panel = Bio::Graphics::Panel->new(
        -length    => $seq->length,
        -key_style => 'between',
        -width     => 800,
        -pad_left  => 10,
        -pad_right => 10,
    );
    # scale arrow across the top of the image
    $panel->add_track(
        arrow => Bio::SeqFeature::Generic->new(-start => 1, -end => $seq->length),
        -bump => 0, -double => 1, -tick => 2,
    );
    # one bar representing the whole sequence
    $panel->add_track(
        generic => Bio::SeqFeature::Generic->new(-start => 1, -end => $seq->length),
        -bgcolor => 'blue', -label => 1,
    );

    # general case: one track per feature type, cycling through the colors
    my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
    my $idx = 0;
    for my $tag (sort keys %sorted_features) {
        my $features = $sorted_features{$tag};
        $panel->add_track($features,
            -glyph       => 'generic',
            -bgcolor     => $colors[$idx++ % @colors],
            -fgcolor     => 'black',
            -font2color  => 'red',
            -key         => "${tag}s",
            -bump        => +1,
            -height      => 8,
            -label       => 1,
            -description => 1,
        );
    }
    print $panel->png;

Biol Practical Biocomputing 31 - BioPerl: Graphics