Lane Medical Library & Knowledge Management Center Introductory Perl Programming for Biologists Part 1: 2/3/2009 PRELIMINARY VERSION.

1 Lane Medical Library & Knowledge Management Center http://lane.stanford.edu
Introductory Perl Programming for Biologists
Part 1: 2/3/2009 PRELIMINARY VERSION
Yannick Pouliot, PhD
Bioresearch Informationist, Lane Medical Library & Knowledge Management Center
© 2008 The Board of Trustees of The Leland Stanford Junior University

2 The Bioresearch Informationist: At Your Service
Yannick Pouliot, PhD, Lane Medical Library & Knowledge Management Center
- Bioresearch Informationist ≈ computational biologist in residence
- Lane Library service
- Closely coordinated with CMGM
- Role: Support laboratory researchers regarding biocomputational resources and their use …especially postdocs
- Contact: lanebioresearch@stanford.edu

3 Class Requirements
You must
- …have wireless access
- …have the admin password to your machine (or the ability to install software on it)

4 Please Log Into WebEx
Go to the workshop description (under Resources) to log into WebEx
Password = ‘lanelib’

5 To Do
Please download all class materials from http://lane.stanford.edu/howto/index.html?id=_3098 into C:\course

6 Class Goals
Understanding enough Perl for:
- Creating, writing and reading Excel files
- Reformatting data files for input to an analysis program
- Writing to and reading from a database such as MS Access or another locally installed relational database, as well as from databases available on the Internet
Remember: Ask LOTS OF QUESTIONS
… and on a procedural note, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …

7 Contents
Session 1
- Installing what you need to write and run Perl programs
- Understanding simple Perl programs
- Intro to programming concepts
- Where to get help with Perl
Session 2
- Delving into Perl language elements
  - more programs; understanding a “real” program
  - Regular expressions
  - Interacting with MS Excel, Access database
Session 3
- Understanding “Object Oriented” programming – enough to be dangerous…
- An example of OO programming: BioPerl

8 Some Cautions
- All examples pertain to MS Office 2003
  - Examples still work in MS Office 2007 when imported
  - However, Perl modules used here do not work with MS Office 2007-formatted documents
- All examples pertain to Perl 5.x, not 6.x
  - V.5 and 6 are NOT compatible
  - V.5 is far more common, so not much of an issue
- Your mileage may vary if you are using Windows Vista
  - My recommendation: Switch back to XP

9 So Why Perl?
Perl = Practical Extraction and Report Language
- Free
- Very widely used
  - Especially in biology community
- Very flexible and portable
- Not the only language of this type…
  - E.g., Python
- Not the absolute easiest
  - … but pretty easy
- Not suited for everything
  - E.g., for ultra-fast mathematically-oriented code, C is still best

10 Today’s session:
- Installing and understanding what is required to run Perl
- Understanding the basics of a Perl program

11 Part 1: Installation

12 Components to Install & Configure
1. Perl itself
   - More accurately, the Perl interpreter
   - We’ll use ActiveState Perl 5.10x (ActivePerl): http://downloads.activestate.com/ActivePerl/Windows/5.10/ActivePerl-5.10.0.1004-MSWin32-x86-287188.msi
   - Additional Perl modules
     - Module = extra functions not part of the interpreter
     - Described at Comprehensive Perl Archive Network (CPAN)
2. Open Perl IDE
   - IDE = integrated development environment:
     - Editor → to write/edit your program
     - Debugger → to find bugs
     - A compiler/interpreter → to run your program from within the IDE
   - sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
3. Configuring the ODBC manager (next week)
   - Part of Windows
   - Allows different programs to interact with databases on your machine or anywhere on the Web via a single “doorway”

13 So, what is an Interpreter?
An interpreter is a program that…
1. Translates a human-understandable instruction into the computer’s language
2. Executes it
3. Repeats the cycle until no instructions remain
→ “compiled” and executed one instruction at a time
Perl is usually used in interpreted mode
- Instructions read and executed one at a time
- Can also be compiled once (= faster)

14 Installing Perl from ActiveState
Installation for Windows – if Mac, you already have Perl!
We’ll be installing Perl 5.10x for Windows x86:
- Go to http://downloads.activestate.com/ActivePerl/Windows/5.10/ActivePerl-5.10.0.1004-MSWin32-x86-287188.msi
- Run the installer
- Install under c:\Perl

15 Installing Additional Perl Modules
- The fountain of all things Perl: CPAN
  - = Comprehensive Perl Archive Network
  - http://www.cpan.org/
- What does using a module inside a Perl program look like?
- Why modules?
  - If you find yourself struggling with a problem, chances are someone has already dealt with it, and you can use their code for free!
- Downloading & installing modules: The Perl Package Manager (PPM)
- Perl is in constant evolution
  - Different modules become part of the standard Perl distribution
  - What modules are in MY Perl?
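As a rough answer to "what does using a module look like", the sketch below loads File::Copy (one of the class modules) and calls its copy() function. The helper name copy_demo and the file names are hypothetical, and File::Spec and File::Temp are pulled in only so the example can run in a scratch directory on any system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Copy qw(copy);       # module from the class list; exports copy()
use File::Spec;                # assumption: used only to build paths portably
use File::Temp qw(tempdir);    # assumption: provides a scratch directory

# Write a small text file, copy it, and return the path of the copy.
sub copy_demo {
    my $dir    = tempdir( CLEANUP => 1 );    # removed when the program exits
    my $source = File::Spec->catfile( $dir, 'genes.txt' );           # hypothetical name
    my $backup = File::Spec->catfile( $dir, 'genes_backup.txt' );    # hypothetical name

    open( my $out, '>', $source ) or die "Cannot write $source: $!";
    print {$out} "BRCA1\nTP53\n";
    close($out);

    # copy() is supplied by File::Copy; it is not built into Perl itself
    copy( $source, $backup ) or die "Copy failed: $!";
    return $backup;
}

my $copy = copy_demo();
print "Copy exists: ", ( -e $copy ? "yes" : "no" ), "\n";
```

The "use" line is the whole cost of borrowing someone else's code: once the module is loaded, its functions are called like any other.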

16 The PPM Module: Installing Perl Modules the Easy Way
Two ways to install Perl modules:
1. Hard way: Perl modules can be downloaded and installed manually from, e.g., CPAN
2. Easy way: They can also be installed via the Perl Package Manager: PPM
What’s the difference? Installing by hand means handling two chores yourself:
1. There are bits of code that need to be moved into various directories
2. Modules often have dependencies on other modules → more installing

17 Perl Modules We’ll Be Using
Status      Name                      Function
Included    File::Copy                manipulating files
Included    File::Find                manipulating files
Included    File::Path                manipulating files
You do it!  File::Rename              manipulating files
Included    IO::File                  accessing the contents of files
Included    Spreadsheet::WriteExcel   writing into an MS Excel spreadsheet
Included    Spreadsheet::ParseExcel   parsing an MS Excel spreadsheet
Included    Spreadsheet::BasicRead    reading the contents of an MS Excel spreadsheet
Included    Win32::OLE                provides easy access to Windows (e.g., launching Excel)
Included    URI                       accessing URLs
Included    LWP::Simple               interacting with a Web site via http
Included    Array::Unique             returns unique elements of an array
Included    List::Uniq                returns unique elements of a list
Included    Switch                    switch function ("multiple if-else-then")
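Two of the modules listed above return the unique elements of an array or list. The same idea can be sketched in plain Perl with a hash, which may help in understanding what such modules do for you. This is a stand-in for illustration only, not how Array::Unique or List::Uniq are implemented; the helper name unique_in_order and the sample identifiers are made up for this example.

```perl
use strict;
use warnings;

# Return the distinct values of a list, keeping first-seen order.
sub unique_in_order {
    my @values = @_;
    my %seen;                                # remembers values already met
    return grep { !$seen{$_}++ } @values;    # keep a value only the first time
}

my @ids    = qw(NM_000546 NM_007294 NM_000546 NM_000059 NM_007294);    # sample identifiers
my @unique = unique_in_order(@ids);
print join( ", ", @unique ), "\n";    # prints: NM_000546, NM_007294, NM_000059
```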

18 Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

19 Installing an environment to run and edit Perl: Integrated Development Environment (IDE)

20 Why an IDE?
IDEs make writing code much easier/faster because you can…
- Edit → to write/edit your program
- Debug → to find bugs
- Run your program from within the IDE
IDEs provide special features that ease writing & debugging
- E.g., automatic code highlighting, easily seeing the value of variables
We’ll use Open Perl IDE
- Free, open source, but Win only (sorry)
- http://open-perl-ide.sourceforge.net/
For our Mac friends: Affrus
- Not free, but reasonably inexpensive
- Evaluation version available

21 Installing Open Perl IDE
1. Go to http://open-perl-ide.sourceforge.net/ and download the main file and the patch.
2. Create folder Program Files/OpenPerlIDE
3. Unzip into Program Files/OpenPerlIDE
4. Update the Path variable under System Properties→Advanced→Environment Variables→System Variables
→ this makes it possible to run the Open Perl IDE program from anywhere on your machine…

22 BREAK

23 Part 2: What does it all do?

24 Simple1.pl: Your First Perl Program
1. Start Open Perl IDE
2. Load Simple1.pl (File → Open…)
3. Run Simple1.pl
Simple1.pl demonstrates:
1. OS directive
2. Modules
3. Main section
4. Variable declaration
5. Reserved variables
6. Variable types: arrays
7. Subroutines
8. Running from command line using input parameters
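Simple1.pl itself is not reproduced in this transcript. The sketch below is a separate, hypothetical program that touches the same eight points in order, so the terms have something concrete attached to them. The subroutine name multiply_all and the default numbers are invented for illustration.

```perl
#!/usr/bin/perl
# 1. OS directive ("shebang"): tells Unix-like systems which interpreter to use

# 2. Modules
use strict;
use warnings;
use File::Basename qw(basename);    # assumption: any standard module would do here

# 3. Main section
# 4. Variable declaration with "my"
my $script = basename($0);          # 5. $0 is a reserved variable: the script's name

# 6. An array; 8. @ARGV holds whatever was typed after the script name
my @numbers = @ARGV ? @ARGV : ( 2, 3, 4 );

my $product = multiply_all(@numbers);
print "$script: product of @numbers is $product\n";

# 7. Subroutine
sub multiply_all {
    my @factors = @_;               # @_ (reserved) carries the arguments
    my $result  = 1;
    $result *= $_ for @factors;     # $_ (reserved) is the current element
    return $result;
}
```

Saved under a name of your choosing, say sketch.pl, running `perl sketch.pl 5 6` from the command line would multiply 5 by 6 instead of using the built-in defaults.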

25 A Second Example Program: Simple2.pl
Understanding data (= variable) types: http://en.wikipedia.org/wiki/Perl#Data_types
… and more generally, understanding the lingo
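To make the lingo concrete, here is a minimal sketch of Perl's three basic variable types: scalar, array and hash. It is not Simple2.pl; all variable names and values are chosen only for illustration.

```perl
use strict;
use warnings;

my $gene      = 'TP53';                             # scalar: a single value ($)
my @codons    = qw(ATG GCC TAA);                    # array: an ordered list (@)
my %length_of = ( TP53 => 393, BRCA1 => 1863 );     # hash: key => value pairs (%)

my $codon_count = scalar @codons;      # an array in scalar context gives its size
my $first_codon = $codons[0];          # one element is a scalar, hence the $
my $tp53_length = $length_of{$gene};   # hash lookup uses curly braces

print "Codons: @codons ($codon_count in total, first is $first_codon)\n";
print "$gene length: $tp53_length\n";

# Walk through the hash in alphabetical order of its keys
for my $name ( sort keys %length_of ) {
    print "$name => $length_of{$name}\n";
}
```

The leading symbol tells you what you are getting back: `$` for one value, `@` for a list, `%` for the whole hash.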

26 Exploring Perl’s Major Language Elements
- *** Norman Matloff’s introduction to Perl: http://heather.cs.ucdavis.edu/~matloff/Perl/PerlIntro.pdf
- Perl language reference
- ActivePerl documentation
- Stuck? Google is incredible for programming problems…
- Also handy: LaneConnex search engine → search with “Perl”

27 Key Books & Resources
- *** Learning by example: Perl Cookbook
- Learning Perl
- Perl Quick Reference Guide
- My favorite: Perl Quick Reference

28 The Next Step: Programming Tips
- PLAN your program
  - Write down how you intend to process the data using more-or-less plain language (“pseudo-code”)
  - Goal: ensure that it really does make sense
  - Hacking doesn’t really pay…
- Have documentation handy
  - ActivePerl documentation (searchable)
  - Perl language reference
  - → eBooks: help served on a silver platter
  - Lane FAQs
- When you’re stuck: Search the Web
  - Google can answer almost any programming question
  - … though quality documentation is still best
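One way to follow the planning advice above is to write the pseudo-code as comments first and then fill in the Perl underneath. The sketch below does this for a small, made-up reformatting task (tab-delimited lines to comma-delimited lines); the subroutine name reformat_lines and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Plan (pseudo-code), written before any Perl:
#   for each input line:
#       skip it if it is blank or starts with "#"
#       split it on tabs
#       skip it if it has fewer than two columns
#       keep the first two columns, joined by a comma
sub reformat_lines {
    my @lines = @_;
    my @out;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/ || $line =~ /^#/;    # blank or comment line
        my @fields = split /\t/, $line;
        next if @fields < 2;                          # too few columns
        push @out, join( ',', @fields[ 0, 1 ] );      # first two columns only
    }
    return @out;
}

my @input = (
    "# accession\tsymbol\tnote",
    "",
    "NM_000546\tTP53\tfirst sample row",
    "NM_007294\tBRCA1\tsecond sample row",
);
print "$_\n" for reformat_lines(@input);
```

Each line of the plan ends up as one or two lines of code, which makes it easy to check that the program does what you meant.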

29 Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

30 Toying with Excel3.pl, a “real” program

31 Excel3.pl: A “Real” Program
What it does:
1. Reads input from an Excel worksheet containing public identifiers for DNA sequences associated with genes
2. Uses Entrez Utilities provided by NCBI to retrieve:
   - UniGene cluster ID
   - UniGene Gene symbol
   - NCBI Gene ID
3. Writes the result into another Excel worksheet
Features a mix of procedural and object programming → Session 3 of workshop
Relevant links:
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
- Entrez Utilities
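Excel3.pl is not reproduced in this transcript, so the following is only a sketch of step 2: building the web address for an Entrez Utilities search. It assumes the ESearch endpoint at eutils.ncbi.nlm.nih.gov and, as a further assumption, queries the "gene" database rather than UniGene. The helper names percent_encode and esearch_url and the sample identifiers are hypothetical. Nothing is fetched, so the code runs without a network connection.

```perl
use strict;
use warnings;

# Base address of NCBI's Entrez Utilities (E-utilities)
my $EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Escape characters that are not safe inside a URL query value.
sub percent_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf( '%%%02X', ord($1) )/ge;
    return $text;
}

# Build an ESearch URL for a database name and a search term.
sub esearch_url {
    my ( $db, $term ) = @_;
    return sprintf( '%s/esearch.fcgi?db=%s&term=%s',
        $EUTILS_BASE, percent_encode($db), percent_encode($term) );
}

# Stand-ins for identifiers that the real program reads from Excel
my @accessions = qw(NM_000546 NM_007294);
for my $accession (@accessions) {
    print esearch_url( 'gene', $accession ), "\n";
}
```

To actually retrieve results, a program could pass each URL to the get() function of LWP::Simple (one of the class modules) and then parse the reply; that part is left out here.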

32 What Excel3.pl Does

33 Assignments
Look at Simple2.pl
- Modify it, break it
- Come up with a modification, e.g., divide instead of multiply
- Write down at least one question so we can talk about it next week

35 eBooks Rule

36 What Does A Module Look Like?
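The slide's own illustration did not survive in this transcript. As a stand-in, here is a minimal sketch of what the inside of a module can look like: a package holding one subroutine. The package name SeqTools and the function reverse_complement are hypothetical. To keep the example runnable as a single file, the package and the program that uses it are shown together.

```perl
use strict;
use warnings;

# In a real project this package would normally live in its own file,
# for example SeqTools.pm (hypothetical name), and be loaded with "use SeqTools;".
package SeqTools;

# Return the reverse complement of a DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

1;    # a module file must end with a true value

# Back in the main program
package main;

print SeqTools::reverse_complement('ATGC'), "\n";    # prints: GCAT
```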

