Modules and BioPerl.



Teaching survey
The teaching survey will take place in the coming weeks (on the student personal information website).

Subroutine revision
A subroutine receives its arguments through @_ and may return a scalar or a list value:
    sub reverseComplement {
        my ($seq) = @_;
        $seq =~ tr/ACGT/TGCA/;
        $seq = reverse $seq;
        return $seq;
    }
    my $revSeq = reverseComplement("GCAGTG");   # $revSeq is now "CACTGC"
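The revision slide mentions that a subroutine may also return a list. A minimal sketch of that case follows; the subroutine name baseCounts and the sample sequence are made up for this illustration and are not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical helper: count G/C bases and A/T bases in a DNA string.
sub baseCounts {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GC//);   # tr/// with an empty replacement only counts
    my $at = ($seq =~ tr/AT//);
    return ($gc, $at);            # return a list of two scalars
}

# The returned list is unpacked with a multiple assignment.
my ($gcCount, $atCount) = baseCounts("GCAGTG");
print "GC: $gcCount, AT: $atCount\n";
```

The caller decides how to receive the list: a multiple assignment as above, or an array such as my @counts = baseCounts(...).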

Passing variables by reference
To pass an array or a hash to a subroutine as a single argument, we pass a reference to it; otherwise its elements are flattened into @_:
    my %gene = ("protein_id" => "E4a",
                "strand"     => "-",
                "CDS"        => [126,523]);
    printGeneInfo(\%gene);

    sub printGeneInfo {
        my ($geneRef) = @_;
        print "Protein $geneRef->{'protein_id'}\n";
        print "Strand $geneRef->{'strand'}\n";
        print "From: $geneRef->{'CDS'}[0] ";
        print "to: $geneRef->{'CDS'}[1]\n";
    }
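The same idea applies to arrays. The sketch below contrasts passing two arrays by reference with passing them directly; the subroutine names countBoth and countFlat and the sample coordinates are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: receives references to two arrays and
# returns the number of elements in each.
sub countBoth {
    my ($startsRef, $endsRef) = @_;
    return (scalar @$startsRef, scalar @$endsRef);
}

# Without references, both arrays are merged into one flat list in @_.
sub countFlat {
    return scalar @_;
}

my @starts = (126, 900);
my @ends   = (523, 1200, 1500);

my ($nStarts, $nEnds) = countBoth(\@starts, \@ends);
print "By reference: $nStarts starts, $nEnds ends\n";
print "Flattened: ", countFlat(@starts, @ends), " elements in total\n";
```

With references the subroutine can still tell where one array ends and the next begins, which is not possible once the elements have been flattened.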

Passing variables by reference
What if we wanted to invoke this subroutine on every gene in the hash of genes that we created in the previous exercise? Each value in %genes is a reference to one gene hash:
    %genes: NAME => {"protein_id" => PROTEIN_ID,
                     "strand"     => STRAND,
                     "CDS"        => [START, END]}
so we can loop over the values and pass each reference on:
    foreach my $geneRef (values(%genes)) {
        printGeneInfo($geneRef);
    }

Returning variables by reference
Similarly, to return a hash use a reference:
    sub getGeneInfo {
        my %geneInfo;
        ...   # fill hash with info
        return \%geneInfo;
    }
    $geneRef = getGeneInfo(...);
In this case the hash will continue to exist outside the scope of the subroutine!
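A filled-in sketch of this pattern is shown below. The argument list of getGeneInfo and the field values are assumptions chosen to match the gene hash used earlier; the slides leave them unspecified.

```perl
use strict;
use warnings;

# Hypothetical version of getGeneInfo: arguments and fields are assumed.
sub getGeneInfo {
    my ($id, $strand, $start, $end) = @_;
    my %geneInfo = (
        "protein_id" => $id,
        "strand"     => $strand,
        "CDS"        => [$start, $end],
    );
    return \%geneInfo;   # the hash stays alive while a reference to it exists
}

my $geneRef = getGeneInfo("E4a", "-", 126, 523);
print "Protein $geneRef->{'protein_id'} ";
print "from $geneRef->{'CDS'}[0] to $geneRef->{'CDS'}[1]\n";
```

Each call creates a fresh %geneInfo, so references returned by separate calls point to separate hashes.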

Modules

What are modules?
A module or a package is a collection of subroutines, usually stored in a separate file with a “.pm” suffix (Perl Module). The subroutines of a module should deal with a well-defined task. For example, Fasta.pm may contain subroutines that read and write FASTA files: readFasta, writeFasta, getHeaders, getSeqNo.

Writing a module
A module is usually written in a separate file with a “.pm” suffix. The name of the module is defined by a “package” line at the beginning of the file:
    package Fasta;
    sub getHeaders { ... }
    sub getSeqNo { ... }
The last statement of the module must evaluate to a true value, so usually we just add:
    1;

Using modules
To write a script that uses a module, add a “use” line at the beginning of the script:
    use Fasta;
Note #1: for basic use of modules, put the module file in the same directory as your script, otherwise Perl won’t find it!
Note #2: a module can itself “use” another module.
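On Perl 5.26 and later the current directory is no longer part of the default module search path, so placing the module next to the script may not be enough on its own. One common way to handle this, sketched here with the core FindBin and lib modules, is to add the script's own directory to the search path. Fasta.pm is the hypothetical module from the slides and is not actually loaded in this sketch.

```perl
use strict;
use warnings;
use FindBin;             # core module: finds the directory of the running script
use lib $FindBin::Bin;   # add that directory to the module search path (@INC)

# After these two lines, a statement such as "use Fasta;" would find
# a Fasta.pm file stored next to the script.
print "Modules are also searched for in: $FindBin::Bin\n";
```

This keeps the script working regardless of the directory it is started from, unlike relying on the current working directory.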

Using modules - namespaces
    use Fasta;
Now we can invoke a subroutine from within the namespace of that package, written as PACKAGE::SUBROUTINE(...). For example:
    $seq = Fasta::getSeqNo(3);
Note that we cannot access it without specifying the namespace:
    $seq = getSeqNo(3);
    Undefined subroutine &main::getSeqNo called at...
Perl tells us that no subroutine by that name is defined in the “main” namespace (the global namespace). There is a way to avoid this: the “Exporter” module allows a package to export its subroutine names. You can read about it here: http://www.netalive.org/tinkering/serious-perl/#namespaces_export
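A minimal sketch of the Exporter approach follows. To keep it runnable as a single file, the Fasta package is defined inline instead of in a separate Fasta.pm, and its sequences and the body of getSeqNo are placeholders invented for the example.

```perl
use strict;
use warnings;

# Inline stand-in for a Fasta.pm file (placeholder data and logic).
package Fasta {
    use Exporter 'import';           # borrow Exporter's import method
    our @EXPORT_OK = ('getSeqNo');   # names that callers may ask to import

    my @seqs = ("ACGT", "GGCC", "TTAA");

    sub getSeqNo {
        my ($n) = @_;
        return $seqs[$n - 1];        # sequences are numbered from 1 here
    }
}

package main;

# With a real Fasta.pm file this line would be: use Fasta qw(getSeqNo);
Fasta->import('getSeqNo');

print Fasta::getSeqNo(3), "\n";      # fully qualified call
print getSeqNo(3), "\n";             # works because the name was imported
```

Listing names in @EXPORT_OK rather than @EXPORT means nothing is imported unless the caller asks for it, which avoids accidental name clashes in the “main” namespace.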

BioPerl
The BioPerl project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.
Things you can do with BioPerl:
- Read and write sequence files in different formats, including Fasta, GenBank, EMBL, SwissProt and more
- Extract gene annotation from GenBank, EMBL and SwissProt files
- Read and analyse BLAST results
- Read and process phylogenetic trees and multiple sequence alignments
- Analyse SNP data
- And more…

BioPerl
BioPerl modules are named Bio::XXX. The BioPerl wiki, http://bio.perl.org/, has documentation and examples of how to use them, and working through those is the best way to learn BioPerl. We recommend beginning with the "How-tos": http://www.bioperl.org/wiki/HOWTOs
For a more in-depth look at the BioPerl modules, see the BioPerl 1.5.2 Module Documentation.

Object-oriented use of packages
Many packages are meant to be used as objects. In Perl, an object is a data structure that can use subroutines that are associated with it. To create an object from a certain package use “new”:
    my $obj = new PACKAGE;
For example:
    my $in = new FileHandle;
“new” returns a reference to a data structure, which acts as a FileHandle object. “new” can also receive arguments:
    my $obj = new PACKAGE(ARGUMENTS);
    my $in = new FileHandle("<$inFile");
(Diagram: $obj holds a reference, e.g. 0x225d14, to a data structure that is associated with the package's subroutines func() and anotherFunc().)

Object-oriented use of packages
To invoke a subroutine from the package for a specific object we use the “->” notation again:
    $line = $in->getline();
Note that this is different from accessing elements through a reference to an array or hash, because the name after the arrow is not enclosed in braces or square brackets:
    $length = $proteinLengths->{AP_000081};
    $grade = $gradesRef->[0];
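To make the link between a package and an object more concrete, here is a minimal sketch of a package written to be used as an object. The package name Gene, its fields and its cdsLength method are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;

# Hypothetical package used only for this illustration.
package Gene {
    # Constructor: stores the arguments in a hash and ties it to the package.
    sub new {
        my ($class, %args) = @_;
        my $self = { id => $args{id}, cds => $args{cds} };
        return bless $self, $class;   # bless turns the reference into an object
    }

    # Method: the object itself arrives as the first argument.
    sub cdsLength {
        my ($self) = @_;
        return $self->{cds}[1] - $self->{cds}[0] + 1;
    }
}

package main;

my $gene = Gene->new(id => "E4a", cds => [126, 523]);
print "Gene $gene->{id} has a CDS of ", $gene->cdsLength(), " bases\n";
```

Gene->new(...) is the arrow spelling of the new PACKAGE(...) form used in the slides. The example also shows both uses of “->”: $gene->{id} reads a hash element, while $gene->cdsLength() calls a subroutine of the package.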

BioPerl: the SeqIO module
The Bio::SeqIO module allows input/output of sequences from/to files, in many formats:
    use Bio::SeqIO;
    my $in = new Bio::SeqIO("-file"   => "<$inputfilename",
                            "-format" => "EMBL");
    my $out = new Bio::SeqIO("-file"   => ">$outputfilename",
                             "-format" => "Fasta");
    while ( my $seq = $in->next_seq() ) {
        $out->write_seq($seq);
    }

BioPerl: the Seq module
The Bio::SeqIO function “next_seq” returns a Bio::Seq object. This module provides functions like id, accession, length and subseq (read about them in the documentation!):
    use Bio::SeqIO;
    my $in = new Bio::SeqIO("-file"   => "<$inputfilename",
                            "-format" => "Fasta");
    while ( my $seqObj = $in->next_seq() ) {
        print "Sequence ",$seqObj->id(),"\n";
        print "First 10 bases ",$seqObj->subseq(1,10),"\n";
    }

BioPerl: get files from the web
The Bio::DB::GenBank module allows us to download a specific record from the NCBI website:
    use Bio::DB::GenBank;
    my $gb = new Bio::DB::GenBank;
    my $seqObj = $gb->get_Seq_by_acc("J00522");
    # or ... request Fasta sequence
    $gb = new Bio::DB::GenBank("-format" => "Fasta");