1
Lane Medical Library & Knowledge Management Center http://lane.stanford.edu
Perl Programming for Biologists
SESSION 2: Tue, Feb 10th, 2009
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
© 2008 The Board of Trustees of The Leland Stanford Junior University
2
Prep
- Log into the WebEx session (stanford.webex.com/Meetings)
- Download all class materials for the 2nd class from the FAQ at http://lane.stanford.edu/howto/index.html?id=_3824 into a directory
- Open a command window and cd to that directory
- Start Open Perl IDE or the Mac equivalent
3
Reminder: Cautions
- All examples pertain to MS Office 2003
- From MS Office 2007, save in 2003 format to use the Perl code described here
- All contents pertain to Perl 5.x, not 6.x
4
Session #2 Focus
1. Understanding key Perl language elements
   - Scrutinizing several variant programs
2. Altering the contents of text files
And remember: ask QUESTIONS
5
Recap from Session 1
6
Recap
Questions from last session?
→ Stump the teacher!
7
Reviewing Simple1.pl
Understanding what each element does

    #!C:\Perl\bin
    # ---------------------------------------------------------------------------
    # Simple1
    # ---------------------------------------------------------------------------
    use strict;
    use warnings;

    # ---------------------------
    sub Multiply {
        my $f1 = shift;
        my $f2 = shift;
        return ($f1 * $f2);
    }

    # ---------------------------
    # main
    print "Let's test Perl \n";
    my $TempVar = 0;
    my @InputNumbers = @ARGV;
    print "The two numbers are: $InputNumbers[0] and $InputNumbers[1] \n";
    my $Result = Multiply($InputNumbers[0],$InputNumbers[1]);
    print "Here's the value of both numbers multiplied: $Result \n";
    print "I'm done! \n";
8
Simple2.pl: Introducing New Language Elements
→ Let’s look at it using Open Perl IDE and XXX
9
A Final Example: Biologically Useful Perl Program
What it does:
1. Reads input from an Excel worksheet containing public identifiers for DNA sequences associated with genes
2. Uses Entrez Utilities provided by NCBI to retrieve:
   - UniGene cluster ID
   - UniGene gene symbol
   - NCBI Gene ID
3. Writes the result into another Excel worksheet
Features a mix of procedural and object programming
Relevant links:
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
- Entrez Utilities
10
What Excel3.pl does:
11
Let’s Run Excel3.pl
Type “perl -f Excel3.pl” in the directory where you installed the demonstration programs
12
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
13
Moving On: Altering file contents
14
Converting Data Stored in Flatfiles
Input: ConvertOuput.csv = renamed file generated by Excel3.pl, converted to csv format
Let’s look at and run Convert1.pl → Convert5.pl
15
Convert1.pl
- Structure of program
- Run program
- Exercise: what is chomp?
- Understanding file handles
- What is $_ ?
- Create an error: uncomment line 22 and run
- Introducing the escape character: “\”
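The ideas on this slide can be sketched as follows. This is not the actual Convert1.pl, only a minimal illustration: the sample data and the sub name `read_lines` are made up, and the file handle is opened on a string (an in-memory file) so that the sketch runs without any input file.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the CSV input file.
my $csv_text = "ID,Cluster,Symbol\nAA001,Hs.1,GENE1\nAA002,Hs.2,GENE2\n";

# Read every line from a file handle and return the lines without newlines.
sub read_lines {
    my $fh = shift;
    my @lines;
    while (<$fh>) {        # each line read is placed in the default variable $_
        chomp;             # with no argument, chomp removes the trailing newline from $_
        push @lines, $_;
    }
    return @lines;
}

# Open a file handle on the string; with a real file you would pass
# a file name instead of \$csv_text.
open(my $in, '<', \$csv_text) or die "Cannot open input: $!";
my @lines = read_lines($in);
close($in);

# The backslash escapes the next character: \" prints a quote, \n a newline.
print "Read \"$_\"\n" for @lines;
```

Without the backslashes, the inner quotes would end the string early and Perl would report a syntax error.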
16
Convert2.pl: Like Convert1.pl, but Prints Only First Item
- Using arrays to process contents of a line
- Introducing split
- Changing directories
  - Useful to segregate data files
  - Need to change the path to make this work in your environment
  - Note difference between Mac and Windows syntax for path names
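A minimal sketch of split, array indexing, and chdir, under stated assumptions: the input lines, the sub name `first_item`, and the example paths in the comments are all hypothetical, and the directory is set to '.' so the code runs anywhere.

```perl
use strict;
use warnings;

# Directory holding the data files. '.' (the current directory) keeps this
# sketch runnable anywhere; hypothetical real values would look like
# 'C:/PerlClass/data' on Windows or '/Users/you/PerlClass/data' on a Mac.
my $data_dir = '.';
chdir($data_dir) or die "Cannot change to $data_dir: $!";

# Return the first comma-separated item of a line.
sub first_item {
    my $line = shift;
    my @fields = split(/,/, $line);   # split cuts the line at every comma
    return $fields[0];                # arrays are indexed from 0
}

# Hypothetical input lines standing in for the CSV file contents.
my @lines = ('AA001,Hs.1,GENE1', 'AA002,Hs.2,GENE2');
print first_item($_), "\n" for @lines;
```

Perl accepts forward slashes in Windows paths, which avoids having to escape each backslash inside double-quoted strings.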
17
Convert3.pl: Like Convert2.pl, but Prints Changed Order of Columns
- Run program
- Q: how would you avoid printing the title line in the input file?
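One possible answer to the question, sketched with made-up data: count the lines and skip the first one. The column layout and the sub name `reorder` are assumptions, not taken from Convert3.pl.

```perl
use strict;
use warnings;

# Hypothetical input: a title line followed by data lines.
my @lines = (
    'ID,Cluster,Symbol',
    'AA001,Hs.1,GENE1',
    'AA002,Hs.2,GENE2',
);

# Return the data lines with columns reordered as Symbol, ID, Cluster.
sub reorder {
    my @input = @_;
    my @output;
    my $line_number = 0;
    foreach my $line (@input) {
        $line_number++;
        next if $line_number == 1;            # skip the title line
        my @f = split(/,/, $line);
        push @output, join(',', $f[2], $f[0], $f[1]);
    }
    return @output;
}

my @reordered = reorder(@lines);
print "$_\n" for @reordered;
```

When reading directly from a file handle, Perl's built-in `$.` variable holds the current line number, so `next if $. == 1;` does the same job without a separate counter.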
18
Convert4.pl: Like Convert3.pl, but Removes “.” in Cluster IDs
- Run program
- Introducing the match and substitute operators:
  - Matching: ‘/something/’
  - Substituting: ‘s/something1/something2/’
  - Used in regular expressions for text matching (more later)
- Introducing the tab character: “\t”
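A short sketch of matching, substituting, and the tab character. The cluster IDs and the sub name `strip_dots` are made up for illustration and are not taken from Convert4.pl.

```perl
use strict;
use warnings;

# Remove every "." from a cluster ID, e.g. "Hs.76206" becomes "Hs76206".
sub strip_dots {
    my $id = shift;
    $id =~ s/\.//g;    # the dot is escaped because an unescaped "." matches any character
    return $id;
}

# Hypothetical cluster IDs.
my @ids = ('Hs.76206', 'Mm.1234', 'none');

foreach my $id (@ids) {
    if ($id =~ /\./) {                            # matching: does the ID contain a dot?
        print $id, "\t", strip_dots($id), "\n";   # "\t" puts a tab between the columns
    }
    else {
        print "$id\tunchanged\n";
    }
}
```

The trailing `g` makes the substitution apply to every dot in the string rather than only the first.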
19
Convert5.pl: Like Convert3.pl, but with Smarts + Prints More Elements
- Run program
- Introducing “regular expressions”
- Q: how would you modify this code to print only when a “Gene: Gene Symbol” was found?
  → Tip: use the matching operator: if (not($var =~ /something/)) { do something }
  → Try doing it: 10 min
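One possible approach to the exercise, sketched under assumptions: the input layout (gene symbol in the third column, left empty when none was found) and the sub name `with_symbol` are hypothetical, so the pattern would need adapting to the real Convert5.pl data.

```perl
use strict;
use warnings;

# Hypothetical lines: ID, cluster ID, gene symbol (empty when none was found).
my @lines = ('AA001,Hs.1,GENE1', 'AA002,Hs.2,', 'AA003,Hs.3,GENE3');

# Keep only the lines whose third column holds a gene symbol.
sub with_symbol {
    my @kept;
    foreach my $line (@_) {
        my @f = split(/,/, $line, -1);   # the limit -1 keeps trailing empty fields
        # \S matches any non-whitespace character; skip the line when there is none
        if (not($f[2] =~ /\S/)) {
            next;
        }
        push @kept, $line;
    }
    return @kept;
}

my @kept = with_symbol(@lines);
print "$_\n" for @kept;
```

The same test can be written as `$f[2] !~ /\S/`, since `!~` is the negated form of `=~`.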
20
More on Regular Expressions
- Very powerful, i.e., flexible and fast
- Complicated topic
- Can require lots of trial and error to get it right
- Quick reference card essential
- Best comprehensive resource: Friedl, 2006 (covers more than Perl)
21
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
22
Part 2: Practical examples of programs that alter file contents using regular expressions
23
Regular Expressions: More Examples
The example we’ll use: extracting clone IDs for CDH5 by…
1. Importing SOURCE results directly into Excel
2. Parsing the .csv version of that file (CDH5Clones.csv)
24
Processing EST IDs from SOURCE
Input: CDH5Clones.csv or CDH5Clones.xls
25
Clone1.pl: Filtering of Results
What it does:
- Reads .csv file of SOURCE results
- Finds all clones from PLACE library
- Returns list in single column form
Run the program
Why the error?
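The filtering step might look like the sketch below. The layout of CDH5Clones.csv is not shown on the slide, so everything about the data here is an assumption: the sample lines are made up, and PLACE clones are assumed to be named "PLACE" followed by digits. The sub name `place_clones` is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical excerpt of the .csv file: clone name, accession (made-up values).
my @lines = (
    'IMAGE:123456,AA000001',
    'PLACE7001500,AA000002',
    'PLACE7002345,AA000003',
);

# Return the PLACE clone names found in the given lines, one per element.
sub place_clones {
    my @found;
    foreach my $line (@_) {
        # Parentheses capture the matched text into $1.
        if ($line =~ /(PLACE\d+)/) {
            push @found, $1;
        }
    }
    return @found;
}

my @clones = place_clones(@lines);
print "$_\n" for @clones;    # single-column output
```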
26
Clone2.pl: Numerical Filtering of Results
Problem: Suppose you only want clones with IDs >= 7002000 because you already have the clones with IDs < 7002000.
Solution: Check the numerical value of each clone ID and decide whether to retain it or not.
→ Run program!
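A minimal sketch of the numerical check, not the actual Clone2.pl. The threshold comes from the slide; the clone names, the assumption that the numeric ID follows the "PLACE" prefix, and the sub name `filter_by_id` are hypothetical.

```perl
use strict;
use warnings;

my $min_id = 7002000;    # threshold from the slide

# Hypothetical clone names; the numeric part is assumed to follow "PLACE".
my @clones = ('PLACE7001500', 'PLACE7002000', 'PLACE7002345');

# Keep the clones whose numeric ID is at least $min.
sub filter_by_id {
    my ($min, @names) = @_;
    my @kept;
    foreach my $name (@names) {
        next unless $name =~ /^PLACE(\d+)$/;   # capture the digits in $1
        push @kept, $name if $1 >= $min;       # >= compares numbers; ge would compare strings
    }
    return @kept;
}

my @wanted = filter_by_id($min_id, @clones);
print "$_\n" for @wanted;
```

Perl converts the captured digit string to a number automatically when it is used with a numeric operator such as `>=`.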