Lane Medical Library & Knowledge Management Center: Perl Programming for Biologists, PART 2: Tue Aug 28th 2007, Yannick Pouliot

1 Perl Programming for Biologists, PART 2: Tue Aug 28th 2007
Yannick Pouliot, PhD, Bioresearch Informationist
Lane Medical Library & Knowledge Management Center, http://lane.stanford.edu

2 To Dos
- Close all programs other than IE on your laptop
- Log into virtual room
- YP: log into Safari

3 To Do - 2
- Please download all class materials from http://lane.stanford.edu/howto/index.html?id=_2593

4 Class Focus for Session #2
1. Converting file contents
2. Introducing BioPerl
3. Perl and relational databases
And remember: ask LOTS OF QUESTIONS

5 Cautions - Reminder
- All examples pertain to MS Office 2003
  - It is unclear what to expect with MS Office 2007
- All contents pertain to Perl 5.x, not 6.x
  - Versions 5 and 6 are NOT compatible
  - Version 5 is far more common, so this is not much of an issue

6 Questions from last session?

7 Part 1: Converting file contents

8 Converting Data Stored in Flatfiles
- Input: ExampleOutputExcel3.csv
  - File generated last week by Excel3.pl
- Let’s look at and run Convert1.pl → Convert5.pl
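As a rough illustration of what converting flat-file contents can look like, here is a minimal sketch that turns comma-separated lines into tab-separated ones. It is not the code of Convert1.pl to Convert5.pl, which are not reproduced in this transcript. The helper name csv_line_to_tsv and the sample rows are assumptions made up for the example, and the naive split assumes no field contains a quoted comma.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the class files): convert one
# comma-separated line into a tab-separated line.
# Assumption: fields never contain quoted commas.
sub csv_line_to_tsv {
    my ($line) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;          # trim whitespace around each field
    return join "\t", @fields;
}

# Illustrative rows standing in for the lines of a CSV file.
my @rows = ("name,accession,length", "geneA, ACC0001 ,1200", "geneB,,");
print csv_line_to_tsv($_), "\n" for @rows;
```

For real Excel output, where fields may be quoted, a dedicated CSV parser would be a safer choice than a plain split.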

9 Part 2: BioPerl

10 BioPerl: Overview
- BioPerl = >1,000 modules divided into 7 packages
  - Not all in 1.4
  - 1.4 = stable release

11 Other, Non-BioPerl Modules

12 BioPerl: You Have A Friend In High Places
The big deal: BioPerl provides “objects” for various types of sequence data and their associated features and annotations.
These objects provide interfaces for:
- analysis of these sequences with a wide variety of external programs (BLAST, FASTA, clustalw and EMBOSS, to name just a few)
- various types of databases for storage and retrieval of sequences
  - remote (GenBank, EMBL, etc.)
  - local (MySQL, flat files, GFF, etc.)

13 So What Is This Object Business?

14 What A Biology-Related Program Looks Like When Coded According To The Object Paradigm
[Figure: objects of type Protein, DNA, RNA, Gene, Organism, Species, LivingObject, Sequence]

15 Objects Inherit From A Class Or Prior Object
- Class = prototype for all objects of this type
- Create an object (“new”)
- Derive an object from an existing object
[Figure labels: Object 1 (ancestor), Object 2; Sequence, RNA, Protein, DNA]
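The idea of a class acting as a prototype, and of one class inheriting from another, can be sketched in plain Perl. This is only an illustration under stated assumptions: the packages Sequence and DNA and the methods seq_length and revcom are made up for this example and are not BioPerl's own classes.

```perl
use strict;
use warnings;

package Sequence;            # the class: prototype for all sequence objects
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => uc($args{seq}) };
    return bless $self, $class;          # create an object ("new")
}
sub id         { my ($self) = @_; return $self->{id} }
sub seq        { my ($self) = @_; return $self->{seq} }
sub seq_length { my ($self) = @_; return CORE::length($self->{seq}) }

package DNA;                 # DNA inherits every method of Sequence
our @ISA = ('Sequence');
sub revcom {                 # behaviour that only makes sense for DNA
    my ($self) = @_;
    (my $rc = reverse $self->seq) =~ tr/ACGT/TGCA/;
    return DNA->new(id => $self->id, seq => $rc);
}

package main;
my $dna = DNA->new(id => 'example1', seq => 'atggcc');
print $dna->id, ": length ", $dna->seq_length,
      ", reverse complement ", $dna->revcom->seq, "\n";
```

Here DNA never defines new, id, seq or seq_length itself; it gets them from Sequence, which is the point of inheritance.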

16 An example: Class inheritance for shape concepts

17 Key BioPerl Links
- BioPerl 1.4 installed as part of Perl 5.8.8.822 (what you downloaded)
- BioPerl home: http://www.bioperl.org/wiki/Main_Page
- http://www.bioperl.org/wiki/Getting_Started
  - Lots of examples

18 BioPerl Example: Querying GenBank To Retrieve Sequence Properties
- Seq7.pl
- Seq8.pl
- Seq9.pl → after exercise (next slide)
- Seq11.pl → after exercise (next slide)
Related docs:
- GenBank search: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/GenBank.html
- SeqIO: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/genbank.html
- See also http://www.bioperl.org/wiki/HOWTO:SeqIO
- And most importantly: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html

19 Exercise: Print An Additional Sequence Feature
- Add an additional sequence feature to Seq8.pl
  - What to print: see Methods for Seq object at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html

20 Quiz: Questions based on Seq11.pl

    use warnings;
    use strict;
    use Bio::DB::GenBank;
    use Bio::DB::Query::GenBank;   # loaded explicitly because it is used by name below

    # ---------------------------------------------------------------------------
    # main
    $| = 1;   # Unbuffer STDOUT (the currently selected output handle)

    # Put in some restrictions as to what is retrieved and stored into the GenBank object
    my $gb = Bio::DB::GenBank->new(
        -format     => 'GenBank',
        -seq_start  => 0,
        -seq_end    => 1000,
        -strand     => 1,
        -complexity => 0,
    );

    # Get a stream via a query string
    my $query = Bio::DB::Query::GenBank->new(
        -query => 'Homo sapiens[Organism] AND M-cadherin',
        -db    => 'nucleotide',
    );
    my $seqio = $gb->get_Stream_by_query($query);

    my $i = 0;   # count total number of sequences
    while (my $seq = $seqio->next_seq) {
        print "seq id =", $seq->id,
              "\t version = ", $seq->version,
              "\t seq acc number = ", $seq->accession_number,
              "\t seq length = ", $seq->length, "\n";
        $i++;
    }
    print "retrieved $i sequences from GenBank \n";
    # ---------------------------------------------------------------------------

21 More Quizzing: Seq10.pl
- Run Seq10.pl
  - Why the warning messages?
- Specifying strands
  - 1 for plus
  - 2 for minus
- Complexity: a GenBank nucleotide entry is often part of a larger biological blob that contains other GI numbers (e.g., translated protein)
- Complexity regulates the display:
  0 - get the whole blob
  1 - get the bioseq for gi of interest (default in Entrez)
  2 - get the minimal bioseq-set containing the gi of interest
  3 - get the minimal nuc-prot containing the gi of interest
  4 - get the minimal pub-set containing the gi of interest
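One way to keep the strand and complexity codes straight is to store them in lookup tables and check them before building the argument list for the GenBank object. The sketch below does only that and does not contact GenBank. The helper genbank_args is hypothetical (it is not part of BioPerl); the code meanings are copied from the slide above.

```perl
use strict;
use warnings;

# Meanings of the codes, as listed on the slide.
my %STRAND = (1 => 'plus', 2 => 'minus');
my %COMPLEXITY = (
    0 => 'whole blob',
    1 => 'bioseq for gi of interest',
    2 => 'minimal bioseq-set containing the gi of interest',
    3 => 'minimal nuc-prot containing the gi of interest',
    4 => 'minimal pub-set containing the gi of interest',
);

# Hypothetical helper (not part of BioPerl): validate the codes and return
# named arguments in the style passed to Bio::DB::GenBank->new.
sub genbank_args {
    my (%opt) = @_;
    die "unknown strand code '$opt{strand}'\n"
        unless exists $STRAND{ $opt{strand} };
    die "unknown complexity code '$opt{complexity}'\n"
        unless exists $COMPLEXITY{ $opt{complexity} };
    return (
        -format     => 'GenBank',
        -strand     => $opt{strand},
        -complexity => $opt{complexity},
    );
}

my %args = genbank_args(strand => 2, complexity => 1);
print "strand: ", $STRAND{ $args{-strand} },
      "; complexity: ", $COMPLEXITY{ $args{-complexity} }, "\n";
```

The returned list could then be passed on as Bio::DB::GenBank->new(%args), assuming BioPerl is installed.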

22 Some Cautions
- Be careful when querying databases
  - → Have an idea of how many sequences you may be downloading/processing
  - Know that Perl might eat up all of your CPU cycles

23 Part 3: Interacting With A Database

24 Preliminaries: Updating ODBC Manager
- First we need to tell the ODBC Manager where to find the “GenesToEvaluate” DB
  - More at http://lane.stanford.edu/howto/index.html?id=_1751

25 Example Perl Programs That Interact With A Database
- Ancillary files: ExampleOutputExcel3.csv is needed as input to Access1.pl
  - Access2.pl and Access3.pl don’t need this file
- All programs rely on GenesToEvaluate.mdb (Access DB)

26 In Closing: Suggestions
- Modify the programs provided here
  - Baby steps…
- Save often
- Keep lots of prior versions so you can recover from your mistakes
- SU provides lots of documentation → use it!
- Google is invaluable

