1 BioPerl. 2 Object Oriented Programming Continued – BioPerl Install.


1 BioPerl

2 Object Oriented Programming Continued – BioPerl Install

3 System Config
file: .ncbirc
    [NCBI]
    Data=~/blast/blast-2.2.6/data
    # this tells blast applications where certain data files are located, such as BLOSUM matrices
file: .cshrc (for tcsh/csh)
    # this line tells blast applications where to look for pre-formatted DBs
    setenv BLASTDB ~/blast/blast-2.2.6/blastdb
    # this line adds the blast applications to the "path" variable
    set path = ( $path ~/blast/blast )
Example
    blastall -p blastn -i seq -d yeast.nt

4 Example – Bio::SearchIO
    #!/usr/bin/perl
    # blastPars.pl
    # taken almost verbatim from perldoc Bio::SearchIO
    use strict;
    use Bio::SearchIO;

    # Open the BLAST report; the format is stated explicitly rather than guessed.
    my $in = Bio::SearchIO->new(-format => 'blast', -file => 'bmp4.out');
    while ( my $result = $in->next_result ) {        # one result per query
        while ( my $hit = $result->next_hit ) {      # one hit per database sequence
            while ( my $hsp = $hit->next_hsp ) {     # one HSP per alignment
                print "Hit= ",        $hit->name,
                      ",Length=",     $hsp->length('total'),
                      ",Percent_id=", $hsp->percent_identity,
                      ",hit_string=", $hsp->hit_string, "\n";
            }
        }
    }

5
    Hit= mm3_dna,Length=14,Percent_id=100,hit_string=ttaattgtaatttt
    Hit= mm3_dna,Length=13,Percent_id=100,hit_string=cttccctcctccc
    Hit= mm3_dna,Length=13,Percent_id=100,hit_string=ggcaataacacca
    Hit= mm3_dna,Length=12,Percent_id=100,hit_string=ccttttaggcca
    Hit= mm3_dna,Length=12,Percent_id=100,hit_string=tgttttaatcat
    Hit= mm3_dna,Length=12,Percent_id=100,hit_string=gttattttgttt
    Hit= mm3_dna,Length=12,Percent_id=100,hit_string=tccttctctttt
    Hit= mm3_dna,Length=12,Percent_id=100,hit_string=taaactgttaaa
    Hit= mm3_dna,Length=11,Percent_id=100,hit_string=caaaaggagga
    Hit= mm3_dna,Length=11,Percent_id=100,hit_string=tcaaagtaaat

6 HW – Part 2

7 Installing bioperl
Condensed instructions for installing bioperl on CSS (aka ICAEN):
(see also INSTALL that comes with bioperl)
First – look at
0) perl --version
   Note, this will only work with perl version or higher. (so on CSS, use NX-client
   download bioperl-1.4.tar.gz (~3 Meg, from OR or ftp.cpan.org in /pub/CPAN/modules/by-module/Bio)
1) ftp ftp.cpan.org
   1.1) bin
   1.2) cd pub/CPAN/modules/by-module/Bio
   1.3) get bioperl-1.*.*.tar.gz    #bioperl-1.4.tar.gz
   1.4) quit
2) gunzip bioperl-1.4.tar.gz
3) tar -xvf bioperl-1.4.tar (this is approximately 13 meg)

9 Installing bioperl
3.5) mkdir ~/perl
3.6) mkdir ~/perl/bioperl
3.8) cd bioperl-1.4
4) perl Makefile.PL PREFIX=~/perl/bioperl
   (if you do it this way -- the "lib" won't work)
   make
   make test
   make install
   (see installing in private space on next slides)
To uninstall, just delete ~/perl/bioperl and ~/perl/bioperl-1.4

10
5) To use:
    #!/usr/local/bin/perl
    use lib "~/perl/bioperl/";    # this is supposed to work, but did NOT on CSS
    use Bio::SearchIO;
   Instead, set environment variable:
   Bash       5.1) PERL5LIB=~/perl/bioperl/lib/perl5/site_perl/5.10.0; export PERL5LIB
   Csh        5.1) setenv PERL5LIB ~/perl/bioperl/lib/perl5/site_perl/
   Mac (bash) 5.1) PERL5LIB=~/perl/bioperl/lib/perl5/site_perl; export PERL5LIB
6) To make docs work (I would just put this in your .cshrc file):
   Bash: PATH=$PATH:~/perl/bioperl/lib/site_perl/ ; export PATH
   Csh:  set path = ($path ~/perl/bioperl/lib/site_perl/5.10.0)
   Test with: perldoc Bio::SearchIO
7) Test with sample program
FINALLY, please note that the version numbers change over time, and the actual paths may vary a little between CPAN and/or bioperl.org. It may take some trial and error (it usually does for me).

11 Try it – Bio::SearchIO
    #!/usr/bin/perl
    # blastPars.pl
    # taken almost verbatim from perldoc Bio::SearchIO
    use strict;
    use Bio::SearchIO;

    # Open the BLAST report; the format is stated explicitly rather than guessed.
    my $in = Bio::SearchIO->new(-format => 'blast', -file => 'bmp4.out');
    while ( my $result = $in->next_result ) {        # one result per query
        while ( my $hit = $result->next_hit ) {      # one hit per database sequence
            while ( my $hsp = $hit->next_hsp ) {     # one HSP per alignment
                print "Hit= ",        $hit->name,
                      ",Length=",     $hsp->length('total'),
                      ",Percent_id=", $hsp->percent_identity,
                      ",hit_string=", $hsp->hit_string, "\n";
            }
        }
    }

12 INSTALLING BIOPERL IN A PERSONAL OR PRIVATE MODULE AREA
If you lack permission to install perl modules into the standard site_perl/ system area you can configure bioperl to install itself anywhere you choose. Ideally this would be a personal perl directory or standard place where you plan to put all your 'local' or personal perl modules.
Note: you _must_ have write permission to this area.
Simply pass a parameter to perl as it builds your system specific makefile. Example:
    perl Makefile.PL LIB=/home/users/dag/My_Local_Perl_Modules
    make
    make test
    make install
This tells perl to install bioperl in the desired place, e.g.:
    /home/users/dag/My_Local_Perl_Modules/Bio/Seq.pm
Then in your Bioperl script you would write (NOTE ~/dag/My_Local_Perl_Modules will NOT work):
    use lib "/home/users/dag/My_Local_Perl_Modules";
    use Bio::Seq;
To see "perldoc Bio::SearchIO" you would need to be in directory ~/dag/My_Local_Perl_Modules
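
The "~" form fails because Perl, unlike the shell, does not expand "~" in the string given to use lib. One way around hard-coding the full path is to build it from $HOME. This is only a sketch: the directory name My_Local_Perl_Modules under $HOME is a hypothetical location, so substitute whatever was passed to LIB= at install time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $lib_dir;
BEGIN {
    # Hypothetical location; replace with the directory given to LIB=.
    my $home = defined $ENV{HOME} ? $ENV{HOME} : '.';
    $lib_dir = "$home/My_Local_Perl_Modules";
}
use lib $lib_dir;    # a path built from $HOME instead of "~"

print "First entry in \@INC: $INC[0]\n";

# Try to load a Bioperl module and report where it came from.
if ( eval { require Bio::Seq; 1 } ) {
    print "Bio::Seq loaded from $INC{'Bio/Seq.pm'}\n";
}
else {
    print "Bio::Seq not found; check the path above\n";
}
```

The BEGIN block is needed because use lib runs at compile time, before ordinary assignments have been executed.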

13 SearchIO.pm References:

14 More notes on bioperl: Windows
1) Quick instructions for the impatient, lucky, or experienced user.
==========================================
Download the ActivePerl MSI from
Run the ActivePerl Installer (accepting all defaults is fine).
Open a command prompt (Menus Start->Run and type cmd) and run the PPM shell (C:\>ppm).
Add new PPM repositories with the following commands:
    ppm> rep add Bioperl
    ppm> rep add Kobes
    ppm> rep add Bribes
Install Bioperl with the following commands:
    ppm> search Bioperl
This returns a numbered list of packages with corresponding version numbers etc. with "Bioperl" in their name.
    ppm> install <number>
Where <number> corresponds to the relevant package and version from the numbered list obtained above.
Go to and start reading documentation.

15 Another way "cpan" ppm

16 Windows blast binaries?
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/blast ia32-win32.exe

17 Bioperl
-- large collection of Perl modules (extensions to the Perl language) that aid in the task of writing Perl code
-- assists with sequence data and associated annotation
-- access to various types of databases, remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.), for storage and retrieval of sequences
-- associated documentation and mailing list (community of bioinformaticists)

18 Bioperl
-- "most" bioinformatics and computational biology applications are developed in Unix/Linux environments
-- more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment
-- Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of the functionality of Bioperl is available with this type of install.

19 Bioperl
Some programs (BLAST for example) have been ported to Windows. These can be installed and work quite happily with Bioperl in the native Windows environment.
If you have a fairly simple project, only have access to a computer running Windows, and/or don't mind bumping up against some limitations, then Bioperl on Windows may be a good place for you to start.
For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great. Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also.
Be aware that most Bioperl developers are working in some type of a UNIX environment (Linux, OSX, Cygwin). If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses, simply because no one knows the answer to your Windows specific problem.
One solution to this problem that will keep you working on a Windows machine is to install Cygwin, a UNIX emulation environment for Windows.

20 Bioperl Perl is a programming language that has been extended a lot by the addition of external modules. These modules work with the core language to extend the functionality of Perl. Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency. Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl - if you've installed Perl then these modules are already installed

21 Bioperl Bioperl is actually a large collection of Perl modules (over 1000 currently) and these modules are split into six packages.

22 Bioperl
Bioperl Group         Functions
bioperl (the core)    Most of the main functionality of Bioperl.
bioperl-run           Wrappers to a lot of external programs.
bioperl-ext           Interaction with some alignment functions and the Staden package.
bioperl-db            Using bioperl with BioSQL and local relational databases.
bioperl-microarray    Microarray specific functions.
bioperl-gui           Some preliminary work on a graphical user interface to some Bioperl functions.

23 Miscellaneous
Various commands and techniques that did not make it into other sections.
-- Useful as a review
-- Valuable (I've used them)

24 split
    split /PATTERN/, EXPR, LIMIT
    split /PATTERN/, EXPR
    split /PATTERN/
    split
-- returns a list of strings
-- scans the string EXPR
-- splits the EXPR string into a list of substrings by delimiters
-- delimiters are defined by repeated pattern matching of the regular expression PATTERN
-- if it doesn't match, the whole string (EXPR) is returned
-- if it matches once, you get 2 strings, etc.
-- if PATTERN is omitted, it splits on whitespace after skipping leading whitespace (/\s+/)
-- if EXPR is omitted, it splits $_
-- if LIMIT is specified, it splits the string into NO MORE than that many fields
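
The rules above can be checked with a short script. This is a sketch with made-up input strings (the account line and the variable names are illustrative only); it covers the whitespace special case, the no-match case, and LIMIT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A single space as PATTERN is the special case: leading whitespace is
# skipped and runs of whitespace count as one delimiter.
my @words = split ' ', "   this is   a test";
print scalar(@words), " words: @words\n";         # 4 words: this is a test

# No match: the whole string comes back as a single element.
my @whole = split /:/, "no colon here";
print scalar(@whole), " element: $whole[0]\n";    # 1 element: no colon here

# LIMIT caps the number of fields; the last field keeps the remaining delimiters.
my ($login, $passwd, $remainder) = split /:/, "jdoe:x:1001:1001:/home/jdoe", 3;
print "$login | $passwd | $remainder\n";          # jdoe | x | 1001:1001:/home/jdoe
```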

25
    @words = split ' ', $text;
    @words = split /[ |,]+/, $text;
    ($login, $passwd, $remainder) = split /:/, $_, 3;
    # this splits on ":" and assigns the first 2 fields to variables;
    # the rest is stored in $remainder because of the limit (3)

26 Split Examples
    #!/usr/bin/perl
    $text = "this is a test";
    @words = split ' ', $text;
    foreach (@words) { print "$_\n"; }
    # this is a test (on separate lines)

    $text = "this is another, simple test";
    @words = split /[ |,]+/, $text;
    foreach (@words) { print "$_\n"; }
    # this is another simple test (on separate lines)

27 splice
    splice ARRAY, OFFSET, LENGTH, LIST
    splice ARRAY, OFFSET, LENGTH
    splice ARRAY, OFFSET
-- removes the elements designated by OFFSET and LENGTH from ARRAY, and replaces them with LIST
-- if LENGTH is omitted, then everything from OFFSET onward is removed
-- returns the elements removed

28 splice examples
    @array  = qw/one two six seven/;
    @insert = ("three", "four", "five");
    print "@array\n";     # one two six seven
    splice @array, 2, 0, @insert;
    print "@array\n";     # one two three four five six seven
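
The other two behaviors of splice, returning the removed elements and removing through the end of the array when LENGTH is omitted, can be sketched the same way. The array contents and variable names below are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = qw/one two three four five six seven/;

# OFFSET and LENGTH: remove two elements starting at index 1.
my @removed = splice @nums, 1, 2;
print "removed: @removed\n";    # removed: two three
print "left:    @nums\n";       # left:    one four five six seven

# OFFSET only: remove everything from index 2 to the end.
my @tail = splice @nums, 2;
print "tail:    @tail\n";       # tail:    five six seven
print "left:    @nums\n";       # left:    one four

# With a LIST: replace one element by two new ones.
splice @nums, 1, 1, 'TWO', 'THREE';
print "final:   @nums\n";       # final:   one TWO THREE
```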

29 glob
    glob EXPR
-- returns the file name expansions of EXPR
    @files = glob ".* *";    # multiple patterns separated by whitespace
    @files = glob "*.pl";

30 system
    system LIST
-- executes any program on the system
    #!/usr/bin/perl
    system("/mnt/r0-blastdb/blast-bin/blastall -p blastn -d /mnt/r0-blastdb/FormattedDBs/nt -i test.txt -o test.out");
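
system returns the child's wait status, so a script can tell whether the program succeeded. The sketch below uses the running Perl interpreter ($^X) as a stand-in for blastall so that it runs anywhere; the exit code 3 is arbitrary and only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in command: the current perl, asked to exit with code 3.
# Passing a list bypasses the shell, so arguments need no quoting.
my @cmd = ( $^X, '-e', 'exit 3' );

my $status = system(@cmd);
my $exit_code;

if ( $status == -1 ) {
    print "could not start the program: $!\n";
}
elsif ( $status & 127 ) {
    printf "program died with signal %d\n", $status & 127;
}
else {
    $exit_code = $status >> 8;    # the exit code is in the high byte
    print "program exited with code $exit_code\n";
}
```

For a real run, the list would hold the blastall path and its options as separate elements.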

31 back ticks
    #!/usr/bin/perl
    $output = `/mnt/r0-blastdb/blast-bin/blastall -p blastn -d /mnt/r0-blastdb/FormattedDBs/nt -i test.txt -o test.out`;
    print "$output\n";
    # command is passed on, and interpreted by the shell
    # output of command returned
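
A small sketch of what back ticks return, using echo as a stand-in command and assuming a Unix-like shell. In scalar context the whole output arrives as one string; in list context it arrives one line per element; the exit status is left in $? as with system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar context: all output in one string, trailing newline included.
my $output = `echo hello`;
my $exit_code = $? >> 8;
chomp $output;
print "output: $output (exit code $exit_code)\n";

# List context: one element per line of output.
my @lines = `echo one; echo two`;
chomp @lines;
print scalar(@lines), " lines: @lines\n";
```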

32 Slides Deprecated

33 New Version of BPlite
BPlite has actually been "deprecated"
-- this means that its functionality has been replaced by something else
-- the code is still available and included, but will not be supported by future versions
Replaced by SearchIO
    perldoc Bio::Tools::BPlite
    perldoc Bio::SearchIO

34 OOP used extensively in BioPerl A subject is a BLAST hit, which should not be confused with an HSP (below). A BLAST hit may have several alignments associated with it. A useful way of thinking about it is that a subject is "analogous" to a gene and HSPs are "analogous" to exons. Subjects have one attribute (name) and one method (nextHSP). An HSP is a high scoring pair, or simply an alignment. Look at: perldoc Bio::Tools::BPlite
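
To make the subject/HSP relationship concrete, here is a toy class written from scratch. It is not BPlite's or SearchIO's implementation: ToyHit, its constructor arguments, and the use of plain strings as HSPs are all hypothetical and only mimic the "one attribute (name), one iterator method" shape described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ToyHit;    # hypothetical stand-in for a BLAST hit (subject)

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        name => $args{name},
        hsps => [ @{ $args{hsps} || [] } ],    # private copy of the alignments
    };
    return bless $self, $class;
}

# Attribute accessor.
sub name { my ($self) = @_; return $self->{name}; }

# Iterator: hands out one HSP per call, undef when none are left.
sub next_hsp { my ($self) = @_; return shift @{ $self->{hsps} }; }

package main;

my $hit = ToyHit->new(
    name => 'mm3_dna',
    hsps => [ 'ttaattgtaatttt', 'cttccctcctccc' ],
);

my @seen;
while ( my $hsp = $hit->next_hsp ) {
    push @seen, $hsp;
    print "Hit= ", $hit->name, ",hit_string=$hsp\n";
}
```

The while loop stops on its own because next_hsp returns undef once the list is exhausted, which is the same pattern the nextSbjct/nextHSP loops rely on.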

35 BPlite Example – what it looks like to use OOP
    $report->query;
    $report->database;
    while (my $sbjct = $report->nextSbjct) {
        $sbjct->name;
        while (my $hsp = $sbjct->nextHSP) {
            print "querySeq ".$hsp->querySeq."\n";
            print "sbjctSeq ".$hsp->sbjctSeq."\n";
            print "homologySeq ".$hsp->homologySeq."\n";
        }
    }

36
    use Bio::Tools::BPlite;

    my $blast_file = "Chr16.0.out";
    my $report = Bio::Tools::BPlite->new('-file' => $blast_file);
    my $rp = $report->query;       # query description
    my $db = $report->database;    # database searched
    while (my $sbjct = $report->nextSbjct) {
        $sbjct->name;
        while (my $hsp = $sbjct->nextHSP) {
            print "substart ".$hsp->subject->start."\n";
            print "subjectend ".$hsp->subject->end."\n";
        }
    }

37
    #!/usr/bin/perl -w
    #
    # Input:  file_name (blast results file from RPS-BLAST)
    # Output: list of domain locations relative to database sequence,
    #         and perhaps the genomic sequence with domains emphasized
    #
    # Note I broke in:
    #   /usr/lib/perl5/site_perl
    # to mimic the fact that students probably do not have bio_perl installed
    use strict;
    use Bio::Tools::BPlite;
    use Getopt::Long;

    # Expect exactly four command-line words: -bf <file> -c <cutoff>
    if ($#ARGV != 3) {
        die "usage: domainID.pl -bf file -c cutoff_value(0.001)\n";
    }
    my ($blast_file, $cutoff);
    GetOptions("bf=s" => \$blast_file, "c=s" => \$cutoff);

    my $report = Bio::Tools::BPlite->new('-file' => $blast_file);
    #open(FH,$blast_file);
    #my $report = new Bio::Tools::BPlite('-fh' => \*FH);
    my $rp = $report->query;
    print "rp = $rp\n";
    my $db = $report->database;
    print "db = $db\n";
    while (my $sbjct = $report->nextSbjct) {
        $sbjct->name;
        while (my $hsp = $sbjct->nextHSP) {
            #print "score ".$hsp->score."\n";
            #print "bits ".$hsp->bits."\n";
            #print "percent ".$hsp->percent."\n";
            #print "P ".$hsp->P."\n";
            #print "match ".$hsp->match."\n";
            #print "positive ".$hsp->positive."\n";
            #print "length ".$hsp->length."\n";
            #print "querySeq ".$hsp->querySeq."\n";
            #print "sbjctSeq ".$hsp->sbjctSeq."\n";
            #print "homologySeq ".$hsp->homologySeq."\n";
            # Report only the HSPs at or below the cutoff
            if ($hsp->P <= $cutoff) {
                print "subjectseqname ".$hsp->subject->seqname."\n";
                print "qstart ".$hsp->query->start."\n";
                print "qend ".$hsp->query->end."\n";
                print "e = ".$hsp->P."\n";
                #print "percent = ".$hsp->percent."\n"; #What is this???
                print "match = ".$hsp->match."\n";
                print "positive = ".$hsp->positive."\n";
                print "length = ".$hsp->length."\n";
                #print "NT query start ".(3*$hsp->query->start)." (assuming protein input)\n";
                #print "NT query qend ".(3*$hsp->query->end)."\n";
                #print "querySeq ".$hsp->querySeq."\n";
                print "\n";
            }
            #print "substart ".$hsp->subject->start."\n";
            #print "subjectend ".$hsp->subject->end."\n";
            #print "subjectseqname ".$hsp->subject->seqname."\n";
            # $hsp->subject->overlaps($exon);
        }
    }
    # the following line takes you to the next report in the stream/file
    # it will return 0 if that report is empty,
    # but that is valid for an empty blast report.
    # Returns -1 for EOF.

38 Sample "look" file
    rp = Random Sequence (500 letters)
    db = blastdb/yeast.nt

    subjectseqname gi| |ref|NC_ | Saccharomyces cerevisiae chromosome XVI, complete chromosome sequence
    qstart 1
    qend 16
    e = 1.2
    match = 16
    positive = 16
    length = 16

    subjectseqname gi| |ref|NC_ | Saccharomyces cerevisiae chromosome XVI, complete chromosome sequence
    qstart 321
    qend 335
    e = 4.7
    match = 15
    positive = 15
    length = 15

39
    ./domainID.pl -bf bmp4.out -c

    WARNING MSG: SeqFeatureI::seqname() is deprecated. Please use seq_id() instead
    subjectseqname mm3_dna range=chr14: 'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none
    qstart 1
    qend 278
    e = 1e-110
    match = 258
    positive = 258
    length =

    WARNING MSG: SeqFeatureI::seqname() is deprecated. Please use seq_id() instead
    subjectseqname mm3_dna range=chr14: 'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none
    qstart 707
    qend 818
    e = 3e-32
    match = 102
    positive = 102
    length = 112

40 End

41 Notes
Note to self – exploring the percent identity, gapped, and non-gapped would be a great assignment
-- requires random sequence, alignment (clustalw)