Published by Rudolf Carson. Modified over 9 years ago.
1
Lane Medical Library & Knowledge Management Center http://lane.stanford.edu
Perl Programming for Biologists
PART 2: Tue Aug 28th 2007
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
2
To Dos
Close all programs other than IE on your laptop
Log into virtual room
YP: log into Safari
3
To Do - 2
Please download all class materials from http://lane.stanford.edu/howto/index.html?id=_2593
4
Class Focus for Session #2
1. Converting file contents
2. Introducing BioPerl
3. Perl and relational databases
And remember: Ask LOTS OF QUESTIONS
5
Cautions - Reminder
All examples pertain to MS Office 2003
  It is unclear how they will behave with MS Office 2007
All contents pertain to Perl 5.x, not 6.x
  Versions 5 and 6 are NOT compatible
  Version 5 is far more common, so this is not much of an issue
6
Questions from last session?
7
Part 1: Converting file contents
8
Converting Data Stored in Flat Files
Input: ExampleOutputExcel3.csv
  File generated last week by Excel3.pl
Let's look at and run Convert1.pl → Convert5.pl
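The Convert1.pl → Convert5.pl scripts are not reproduced in this transcript, so the following is only a minimal sketch of the kind of flat-file conversion the slide refers to: turning comma-separated lines into tab-separated ones. The helper name `csv_line_to_tsv`, the column names, and the sample values are all hypothetical, and the sketch assumes simple CSV with no quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from Convert1.pl .. Convert5.pl):
# turn one comma-separated line into a tab-separated line.
# Assumes no quoted fields and no commas inside a field.
sub csv_line_to_tsv {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                 # strip a Unix or Windows line ending
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;           # trim whitespace around each field
    return join "\t", @fields;
}

# Small in-memory stand-in for a file such as ExampleOutputExcel3.csv
# (the column names and values here are invented).
my @csv_lines = (
    "gene,length,score\n",
    "CDH15, 2445 ,0.91\r\n",
    "TP53,1182,\n",
);

print csv_line_to_tsv($_), "\n" for @csv_lines;
```

For real CSV exported by Excel, which may contain quoted fields, a dedicated parser such as the CPAN module Text::CSV is usually a safer choice than a bare `split`.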
9
Part 2: BioPerl
10
BioPerl: Overview
BioPerl = >1,000 modules divided into 7 packages
  Not all in 1.4
  1.4 = stable release
11
Other, Non-BioPerl Modules
12
BioPerl: You Have A Friend In High Places
The big deal: BioPerl provides "objects" for various types of sequence data and their associated features and annotations.
These objects provide interfaces for:
  analysis of these sequences with a wide variety of external programs (BLAST, FASTA, clustalw and EMBOSS, to name just a few)
  various types of databases for storage and retrieval of sequences
    remote (GenBank, EMBL, etc.)
    local (MySQL, flat files, GFF, etc.)
13
So What Is This Object Business?
14
What A Biology-Related Program Looks Like When Coded According To The Object Paradigm
[Diagram of object types (t): Protein, DNA, RNA, Gene, Organism, Species, LivingObject, Sequence]
15
Objects Inherit From A Class Or Prior Object
Class = prototype for all objects of this type
Create an object ("new")
Derive an object from an existing object
[Diagram labels: Object 1 (ancestor), Object 2; Sequence, RNA, Protein, DNA]
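The ideas on this slide (a class as prototype, "new" to create an object, and deriving from an ancestor) can be sketched in plain Perl 5. The `Sequence` and `DNA` packages below are hypothetical illustrations and are not BioPerl modules; the method names `seq_length` and `reverse_complement` are likewise invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; these are not BioPerl modules.
package Sequence;

# Constructor: "new" creates an object (a blessed hash) of the given class.
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => uc $args{seq} };
    return bless $self, $class;
}

sub id         { $_[0]{id} }
sub seq        { $_[0]{seq} }
sub seq_length { CORE::length $_[0]{seq} }

package DNA;
our @ISA = ('Sequence');    # DNA inherits every method defined in Sequence

# Method that only the derived class provides.
sub reverse_complement {
    my ($self) = @_;
    my $rc = reverse $self->seq;    # scalar context: reverses the string
    $rc =~ tr/ACGT/TGCA/;           # swap each base for its complement
    return $rc;
}

package main;

my $dna = DNA->new(id => 'demo1', seq => 'atgcc');
print $dna->id, "\t", $dna->seq_length, "\t", $dna->reverse_complement, "\n";
```

Here `DNA->new` works even though `DNA` defines no constructor, because the call falls through to the ancestor class. That is the same mechanism that lets specialized BioPerl sequence objects share common methods.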
16
An example: Class inheritance for shape concepts
17
Key BioPerl Links
BioPerl 1.4 installed as part of Perl 5.8.8.822 (what you downloaded)
BioPerl home: http://www.bioperl.org/wiki/Main_Page
http://www.bioperl.org/wiki/Getting_Started
  Lots of examples
18
BioPerl Example: Querying GenBank To Retrieve Sequence Properties
Seq7.pl
Seq8.pl
Seq9.pl → after exercise (next slide)
Seq11.pl → after exercise (next slide)
Related docs:
  GenBank search: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/GenBank.html
  SeqIO: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/genbank.html
  See also http://www.bioperl.org/wiki/HOWTO:SeqIO
  And most importantly: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html
19
Exercise: Print An Additional Sequence Feature
Add an additional sequence feature to Seq8.pl
What to print: see Methods for Seq object at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html
20
Quiz
Questions based on Seq11.pl

use warnings;
use strict;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;   # loaded explicitly because the query object is built below

# ---------------------------------------------------------------------------
# main
$| = 1;   # Unbuffer STDOUT so output appears as soon as it is printed.

# Put in some restrictions as to what is retrieved and stored into the GenBank object...
my $gb = Bio::DB::GenBank->new(
    -format     => 'GenBank',
    -seq_start  => 0,
    -seq_end    => 1000,
    -strand     => 1,
    -complexity => 0,
);

# Get a stream via a query string.
my $query = Bio::DB::Query::GenBank->new(
    -query => 'Homo sapiens[Organism] AND M-cadherin',
    -db    => 'nucleotide',
);
my $seqio = $gb->get_Stream_by_query($query);

my $i = 0;   # count total number of sequences
while (my $seq = $seqio->next_seq) {
    print "seq id = ", $seq->id,
          "\t version = ", $seq->version,
          "\t seq acc number = ", $seq->accession_number,
          "\t seq length = ", $seq->length, "\n";
    $i++;
}
print "retrieved $i sequences from GenBank\n";
# ---------------------------------------------------------------------------
21
More Quizzing: Seq10.pl
Run Seq10.pl
Why the warning messages?
Specifying strands
  1 for plus
  2 for minus
Complexity:
  A GenBank nucleotide entry is often a part of a larger biological blob that contains other GI numbers (e.g., translated protein)
  Complexity regulates the display:
    0 - get the whole blob
    1 - get the bioseq for gi of interest (default in Entrez)
    2 - get the minimal bioseq-set containing the gi of interest
    3 - get the minimal nuc-prot containing the gi of interest
    4 - get the minimal pub-set containing the gi of interest
22
Some Cautions
Be careful when querying databases → have an idea of how many sequences you may be downloading/processing
Know that Perl might eat up all of your CPU cycles
23
Part 3: Interacting With A Database
24
Preliminaries: Updating ODBC Manager
First we need to add an entry pointing to the "GenesToEvaluate" DB in ODBC Manager
More at http://lane.stanford.edu/howto/index.html?id=_1751
25
Example Perl Programs That Interact With A Database
Ancillary files:
  ExampleOutputExcel3.csv needed as input to Access1.pl
  Access2.pl and Access3.pl don't need this file
All programs rely on GenesToEvaluate.mdb (Access DB)
26
In Closing: Suggestions
Modify the programs provided here
  Baby steps…
Save often
  Keep lots of prior versions so you can recover from your mistakes
SU provides lots of documentation → use it!
Google is invaluable