Lane Medical Library & Knowledge Management Center: Perl Programming for Biologists, PART 3: Tue Feb 17th 2009, Yannick Pouliot


1 Lane Medical Library & Knowledge Management Center http://lane.stanford.edu
Perl Programming for Biologists, PART 3: Tue Feb 17th 2009
Yannick Pouliot, PhD, Bioresearch Informationist, Lane Medical Library & Knowledge Management Center
© 2008 The Board of Trustees of The Leland Stanford Junior University

2 Prep
- Log into the WebEx session (http://stanford.webex.com/Meetings)
  - Pwd = “lanelib”
- Please download all class materials for the 3rd class from the FAQ at http://lane.stanford.edu/howto/index.html?id=_3824 into a directory
- Open a command window and cd to that directory
- Start Open Perl IDE (or Mac equivalent)

3 Session #3 Focus
1. Object-Oriented (“OO”) programming and why it matters
2. Exploring a powerful application of OO programming: BioPerl modules

4 But First…
Questions from last session? → Stump the teacher!

5 Part 1: Understanding Object Programming

6 Goal: Understanding Enough About Object Programming to be Dangerous
If you understand these three concepts, you understand OO programming (Tisdall, 2003):
1. Objects
2. Methods
3. Classes

7 “The key idea of OO programming is that all data is stored and modified with special data structures called objects, and each kind of object can be accessed only by its defined subroutines called methods. The user of an OO class is typically spared the effort of directly manipulating data, and can use class methods for this instead.” (Tisdall, 2003)

8 Understanding Objects
Object = collection of data that logically belongs together.
- E.g., a “genome” object has parts (“attributes”) such as:
  - Name of the species
  - Its DNA sequence
  - A list of genes, each associated with one or more transcripts
  - A list of start and end points for each exon
  - etc.
A type of object (e.g., genome object) is called a class.
- All objects derive from a class
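
In Perl, an object of this kind is usually a hash of attributes that has been attached to a class with `bless`. The following is a minimal sketch, not code from the class materials: the `Genome` class and its attribute names (`species`, `sequence`, `genes`) are invented here to mirror the genome example.

```perl
use strict;
use warnings;

# Hypothetical Genome class, invented for illustration.
package Genome;

# Constructor: gather the attributes into one hash and mark it as a Genome.
sub new {
    my ($class, %args) = @_;
    my $self = {
        species  => $args{species},
        sequence => $args{sequence},
        genes    => $args{genes} || [],   # list of gene names
    };
    return bless $self, $class;
}

package main;

my $genome = Genome->new(
    species  => 'Homo sapiens',
    sequence => 'ATGCGCTA',
    genes    => ['BRCA1', 'TP53'],
);

# ref() reports which class an object belongs to.
print ref($genome), "\n";    # prints "Genome"
```

The caller never builds the hash by hand; it asks the class for a new object and gets back something that remembers which class it came from.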

9 Understanding Methods
A method is just like a subroutine, but it is associated specifically with a class; methods are not shared between classes, except by “inheritance”.
Each type of object has one or more methods that it can call, and only those methods.
→ The only way to access the data in an object is via the methods defined for that class (in Perl this is a convention rather than something the language enforces).
E.g., a genome object might have:
- A compare method, for whole-genome comparisons
- A list-gene-families method, for listing all gene families known to exist in a genome
- A GC-percent method, for calculating %GC in specific areas of the genome, or all of it
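
A method in Perl is an ordinary `sub` that receives the object as its first argument and is called with the arrow operator. The sketch below assumes a hypothetical `Genome` class with a `species` accessor and a `gc_percent` method; the names and the short test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical Genome class, invented for illustration.
package Genome;

sub new {
    my ($class, %args) = @_;
    my $self = { species => $args{species}, sequence => $args{sequence} };
    return bless $self, $class;
}

# Accessor method: hands back one attribute of the object.
sub species {
    my ($self) = @_;
    return $self->{species};
}

# Method that computes %GC over the whole stored sequence.
sub gc_percent {
    my ($self) = @_;
    my $seq = $self->{sequence};
    return 0 unless defined $seq && length($seq) > 0;
    my $gc = ($seq =~ tr/GCgc//);    # tr/// returns the number of matching characters
    return 100 * $gc / length($seq);
}

package main;

my $genome = Genome->new(species => 'Homo sapiens', sequence => 'ATGCGCTA');

# Methods are called on the object with the arrow operator.
printf "%s: %.1f%% GC\n", $genome->species, $genome->gc_percent;
# prints "Homo sapiens: 50.0% GC"
```

Code outside the class only calls `species` and `gc_percent`; how the sequence is stored inside the object stays the class's own business.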

10 Understanding Classes
A class is an object definition + a collection of methods.
A specific object (e.g., a genome object for H. sapiens) is called an instance of a class.
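
The class/instance distinction can be sketched as one class definition producing several independent objects. `Genome` is again a hypothetical class written for this example only.

```perl
use strict;
use warnings;

# Hypothetical Genome class: one definition, shared by every instance.
package Genome;

sub new {
    my ($class, %args) = @_;
    return bless { species => $args{species} }, $class;
}

sub species {
    my ($self) = @_;
    return $self->{species};
}

package main;

# Two instances of the same class, each holding its own data.
my $human = Genome->new(species => 'H. sapiens');
my $mouse = Genome->new(species => 'M. musculus');

for my $genome ($human, $mouse) {
    printf "%s is an instance of class %s\n", $genome->species, ref($genome);
}
```

Both objects answer to the same methods, but each returns its own species name.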

11 Example of Class Definition and Inheritance
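
A minimal sketch of a class definition with one subclass, assuming hypothetical `Genome` and `MitochondrialGenome` classes (neither comes from the course files). The subclass lists its parent in `@ISA`, inherits `new` and `species` unchanged, and overrides `describe`.

```perl
use strict;
use warnings;

# Parent class (hypothetical).
package Genome;

sub new {
    my ($class, %args) = @_;
    return bless { species => $args{species} }, $class;
}

sub species {
    my ($self) = @_;
    return $self->{species};
}

sub describe {
    my ($self) = @_;
    return 'genome of ' . $self->species;
}

# Child class (hypothetical): inherits every Genome method via @ISA.
package MitochondrialGenome;
our @ISA = ('Genome');

# Override one inherited method, reusing the parent's version through SUPER.
sub describe {
    my ($self) = @_;
    return 'mitochondrial ' . $self->SUPER::describe();
}

package main;

my $mito = MitochondrialGenome->new(species => 'H. sapiens');
print $mito->species, "\n";     # inherited from Genome
print $mito->describe, "\n";    # overridden in the subclass
```

The constructor is written once in the parent; because it blesses into whatever class name it is called with, `MitochondrialGenome->new` returns a `MitochondrialGenome` object without any extra code.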

12 Remember Excel3.pl? It Reads Data From An Excel File Using OO
- Uses the Spreadsheet::BasicRead module to read Excel workbooks
- Other OO modules are also used in the program
- Creates an object of type Spreadsheet
- Accesses the getNextRow function associated with this object
- Accesses the cellValue function associated with this object

13 Q: So what does OO programming mean for you?
A: It provides the fastest way to develop a program with minimal coding
You just need to know:
1. That the functionality exists
2. How to call it

14 Part 2
BioPerl: OO Perl code for biological research that can make you fly!

15 BioPerl: Overview
BioPerl = >1,000 modules divided into 7 packages
- Free
- Maintained by volunteers (“Open Source” software)
- v1.5.2 is the latest stable release
List: http://www.bioperl.org/wiki/Category:Modules

16 Some Background - 1
Stajich et al. (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Research, 12:1611-1618.

17 Background - 2: How Much BioPerl Usage Is There?

18 But Is It Just Bioinformatics Journals?
Example usage from non-bioinformatics HighWire journals: 123 articles, latest = Jan 2009

19 From Stajich et al. (2002)

20 BioPerl: You Have A Friend In High Places
The big deal: BioPerl provides object classes for various types of sequence data and their associated features and annotations.
- These objects provide interfaces for analysis using:
  - external programs (BLAST, FASTA, clustalw and EMBOSS, to name just a few)
  - various types of databases for storage and retrieval of sequences
    - remote (GenBank, EMBL, etc.)
    - local (MySQL, MS Access, FileMaker, flat files, GFF, etc.)

21 Taming BioPerl
The most difficult thing about BioPerl is getting started using it.
- Limited good documentation
  - Sometimes wrong
  - Often written from the developer's point of view, not the user's
- Many, many modules, which can make it difficult to find the ideal module
  - Core module list: http://www.bioperl.org/wiki/Category:Core_Modules
  - Many modules you would not use directly; they are used by other modules
- BioPerl modules are written in OO style, which differs from “procedural” style → upcoming slide addresses the syntax difference.
BioPerl is not a collection of complete user-ready programs
- It is a toolkit you can dip into for help when writing your own programs. Its goal is to provide good working solutions to common bioinformatics tasks and to speed your program development.

22 Understanding OO Syntax … using BioPerl Examples
    use Bio::SeqIO;
    # Create a SeqIO object that reads EMBL-format records from myfile.dat
    my $seqin = Bio::SeqIO->new(-format => 'EMBL', -file => 'myfile.dat');
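
Reading the Bio::SeqIO call piece by piece: `Bio::SeqIO` is the class, `->new(...)` calls its constructor, the `-name => value` pairs are named arguments, and the value stored in `$seqin` is an object. The sketch below contrasts that with procedural style using a tiny stand-in class so that it runs without BioPerl. `TinyReader`, its `describe` method, and `describe_reader` are hypothetical names and are not part of BioPerl.

```perl
use strict;
use warnings;

# Procedural style: a plain subroutine, data passed in by position.
sub describe_reader {
    my ($format, $file) = @_;
    return "format=$format file=$file";
}

# OO style: TinyReader is a hypothetical stand-in class, not a BioPerl module.
package TinyReader;

# Constructor taking named arguments written as -name => value.
sub new {
    my ($class, %args) = @_;
    my $self = { format => $args{'-format'}, file => $args{'-file'} };
    return bless $self, $class;
}

# Method: works on the data stored inside the object.
sub describe {
    my ($self) = @_;
    return "format=$self->{format} file=$self->{file}";
}

package main;

# Procedural call: the caller must remember the argument order.
print describe_reader('EMBL', 'myfile.dat'), "\n";

# OO call: named arguments to the constructor, then a method on the object.
my $reader = TinyReader->new(-format => 'EMBL', -file => 'myfile.dat');
print $reader->describe, "\n";
```

Both calls print `format=EMBL file=myfile.dat`. The difference is where the data lives: in the OO version the object carries its own settings, so later method calls need no further arguments.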

23 The Key to Using BioPerl is Understanding What Each Module Does (Its “Methods”)
Example: What does Bio::SeqIO do?
- http://www.bioperl.org/wiki/Special:Search?search=Bio%3A%3ASeqIO&go=Go
- → returns this link, among others: http://www.bioperl.org/wiki/Module:Bio::SeqIO
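
Once a module's documentation lists its methods, using them follows the same pattern as any other object. A minimal sketch, assuming BioPerl is installed: it reads FASTA records with `Bio::SeqIO` and calls `display_id`, `length` and `seq` on each sequence object returned by `next_seq`. The helper `summarize_fasta` and the two short sequences are made up for illustration, and the records are read from an in-memory string rather than a file so that no data file is needed.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: returns one hash (id, length, seq) per FASTA record.
sub summarize_fasta {
    my ($fasta_text) = @_;

    # Open the string as if it were a file, then hand the filehandle to SeqIO.
    open my $fh, '<', \$fasta_text or die "Cannot open in-memory FASTA: $!";
    my $seqin = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

    my @summary;
    while (my $seq = $seqin->next_seq) {    # one sequence object per record
        push @summary, {
            id     => $seq->display_id,
            length => $seq->length,
            seq    => $seq->seq,
        };
    }
    return @summary;
}

# Made-up example data: two short FASTA records.
my $fasta = ">seq1 first example\nATGCGCTA\n>seq2 second example\nGGGCCC\n";

for my $record (summarize_fasta($fasta)) {
    print "$record->{id}\t$record->{length}\t$record->{seq}\n";
}
```

To read from a real file instead, replace `-fh => $fh` with `-file => 'myfile.dat'` and set `-format` to match the file.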

24 Key BioPerl Links
Windows: BioPerl 1.4 is now installed as part of Perl 5.8.x (what you downloaded from ActivePerl)
Not part of Mac Perl
- But you can install the Mac version of ActivePerl
BioPerl home: http://www.bioperl.org/wiki/Main_Page
http://www.bioperl.org/wiki/Getting_Started
- Lots of examples

25 Example #1: SequenceConversion2.pl (derived from Tisdall)

26 Example #2: RemoteBlast1.pl http://bio.perl.org/Core/Latest/bioscripts.html

27 Other, Non-BioPerl Modules

29 In Closing: Suggestions
- Modify the programs provided here
  - Baby steps…
- Save often
- Keep lots of prior versions so you can recover from your mistakes
- SU provides lots of documentation → use it!
- Get a quick reference card if you value your neurons
- Google is invaluable

