Lane Medical Library & Knowledge Management Center Perl Programming for Biologists SESSION 2: Tue Feb 10 th 2009 Yannick Pouliot,


Lane Medical Library & Knowledge Management Center
Perl Programming for Biologists
Session 2: Tue Feb 10th 2009
Yannick Pouliot, PhD, Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
© 2008 The Board of Trustees of The Leland Stanford Junior University

Slide 2: Prep
- Log into the WebEx session (stanford.webex.com/Meetings)
- Download all class materials for the 2nd class from the FAQ into a directory
- Open a command window and cd to that directory
- Start Open Perl IDE or the Mac equivalent

Slide 3: Reminder: Cautions
- All examples pertain to MS Office 2003
  - From MS Office 2007, save in 2003 format to use the Perl code described here.
- All contents pertain to Perl 5.x, not 6.x

Slide 4: Session #2 Focus
1. Understanding key Perl language elements
   - Scrutinizing several variant programs
2. Altering the contents of text files
And remember: ask QUESTIONS

Slide 5: Recap from Session 1

Slide 6: Recap
Questions from last session? → Stomp the teacher!

Slide 7: Reviewing Simple1.pl
Understanding what each element does

    #!C:\Perl\bin
    #
    # Simple1
    #
    use strict;
    use warnings;
    #
    # Multiply: returns the product of its two arguments
    sub Multiply {
        my $f1 = shift;    # first argument
        my $f2 = shift;    # second argument
        return ($f1 * $f2);
    }
    #
    # main
    print "Let's test Perl \n";
    my $TempVar = 0;
    # The declaration of @InputNumbers is not shown on the slide; under
    # "use strict" it must exist, so example values are assumed here.
    my @InputNumbers = (3, 4);
    print "The two numbers are: $InputNumbers[0] and $InputNumbers[1] \n";
    my $Result = Multiply($InputNumbers[0],$InputNumbers[1]);
    print "Here's the value of both numbers multiplied: $Result \n";
    print "I'm done! \n";

Slide 8: Simple2.pl: Introducing New Language Elements
→ Let’s look at it using Open Perl IDE and XXX

Slide 9: A Final Example: Biologically Useful Perl Program
What it does:
1. Reads input from an Excel worksheet containing public identifiers for DNA sequences associated with genes
2. Uses Entrez Utilities provided by NCBI to retrieve:
   - UniGene cluster ID
   - UniGene gene symbol
   - NCBI Gene ID
3. Writes the result into another Excel worksheet
Features a mix of procedural and object programming
Relevant links: Entrez Utilities

Slide 10: What Excel3.pl does:

Slide 11: Let’s Run Excel3.pl
Type “perl -f Excel3.pl” in the directory where you installed the demonstration programs

Slide 12: Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

Slide 13: Moving On: Altering file contents

Slide 14: Converting Data Stored in Flatfiles
Input: ConvertOuput.csv
- = renamed file generated by Excel3.pl, converted to csv format
Let’s look at and run Convert1.pl → Convert5.pl

Slide 15: Convert1.pl
- Structure of program
- Run program
- Exercise: what is chomp?
- Understanding file handles
- What is $_ ?
- Create an error: uncomment line 22 and run
- Introducing the escape character: “\”
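The elements listed for Convert1.pl can be seen together in a minimal sketch. This is not the class program itself: the data, the sub name read_lines, and the file name in the comment are hypothetical, and the file handle is opened on a string so the sketch runs without any input file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a few lines of a CSV file (not the real class data).
my $data = "GeneA,Hs.100\nGeneB,Hs.200\n";

# Read every line from a file handle and return the lines without their newlines.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (<$fh>) {        # each line read is placed in the default variable $_
        chomp;             # with no argument, chomp removes the trailing newline from $_
        push @lines, $_;
    }
    return @lines;
}

# A file handle opened on a reference to a string behaves like one opened on a file.
# With a real file you would write: open(my $fh, '<', 'SomeFile.csv')
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my @lines = read_lines($fh);
close($fh);

# The backslash escapes the double quotes so they print literally; \n is a newline.
print "Read \"$_\"\n" for @lines;
```

Without chomp, each stored line would still end in a newline, which tends to cause surprises when lines are compared or joined later.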

Slide 16: Convert2.pl: Like Convert1.pl, but Prints Only First Item
- Using arrays to process contents of a line
  - Introducing split
- Changing directories
  - Useful to segregate data files
  - Need to change the path to make this work in your environment
- Note difference between Mac and Windows syntax for path names
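A short sketch of the ideas on this slide, under stated assumptions: the sub first_field, the sample line, and the directory and file names are all hypothetical. It uses the standard File::Spec module as one way to sidestep the Mac versus Windows path syntax difference; Convert2.pl itself may simply hard-code a path.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first comma-separated item of a line.
sub first_field {
    my ($line) = @_;
    my @fields = split(/,/, $line);   # split breaks the line into an array of items
    return $fields[0];                # arrays are indexed from 0
}

print first_field("GeneA,Hs.100,1234"), "\n";

# File::Spec joins path parts with the separator of the operating system in use,
# so the same line works on Mac and Windows. 'data' and 'input.csv' are made-up names.
my $path = File::Spec->catfile('data', 'input.csv');
print "$path\n";

# chdir changes the working directory; here we "change" to the current
# directory so the sketch runs anywhere. Always check that chdir succeeded.
my $dir = File::Spec->curdir();
chdir($dir) or die "Cannot change to $dir: $!";
```

In a double-quoted Perl string a Windows path needs doubled backslashes ("C:\\data\\input.csv"), which is one reason building the path from parts can be less error-prone.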

Slide 17: Convert3.pl: Like Convert2.pl, but Prints Changed Order of Columns
- Run program
- Q: how would you avoid printing the title line in the input file?
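One possible way to reorder columns and skip the title line is sketched below. The column names, the sample rows, and the sub name reorder are assumptions made for illustration; the real input file may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical input: the first element plays the role of the title line.
my @input = (
    "Accession,Cluster,Symbol",
    "AA001,Hs.100,GeneA",
    "AA002,Hs.200,GeneB",
);

# Return the data lines with the third column moved to the front.
sub reorder {
    my @lines = @_;
    my @out;
    my $line_number = 0;
    foreach my $line (@lines) {
        $line_number++;
        next if $line_number == 1;            # skip the title line
        my @f = split(/,/, $line);
        push @out, join(',', @f[2, 0, 1]);    # an array slice picks columns in a new order
    }
    return @out;
}

print "$_\n" for reorder(@input);
```

When reading directly from a file handle, the built-in variable $. already holds the current line number, so a separate counter is not needed.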

Slide 18: Convert4.pl: Like Convert3.pl, but Removes “.” in Cluster IDs
- Run program
  - Introducing the match and substitute operators:
    - Matching: ‘/something/’
    - Substituting: ‘s/something1/something2/’
    - Used in regular expressions for text matching (more later)
  - Introducing the tab character: “\t”
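The two operators and the tab character can be sketched as follows. The sub name clean_cluster_id and the sample values are hypothetical; only the general "Hs." style of UniGene cluster IDs is taken as given.

```perl
use strict;
use warnings;

# Remove the "." from a cluster ID such as "Hs.100".
sub clean_cluster_id {
    my ($id) = @_;
    # s/old/new/ substitutes; the dot is escaped because an unescaped "."
    # in a pattern matches any character, not just a literal period.
    $id =~ s/\.//;
    return $id;
}

my $id = "Hs.100";   # made-up cluster ID

# /pattern/ only tests for a match; it does not change the string.
if ($id =~ /^Hs/) {
    print "ID starts with Hs\n";
}

# "\t" inside double quotes is a tab, handy for tab-delimited output.
print join("\t", "GeneA", clean_cluster_id($id)), "\n";
```

Because the substitution is applied to a copy inside the sub, the original $id is left untouched.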

Slide 19: Convert5.pl: Like Convert3.pl, but with Smarts + Prints More Elements
- Run program
- Introducing “regular expressions”
  - Q: how would you modify this code to print only when a “Gene: Gene Symbol” was found?
    → Tip: use the matching operator: if (not($var =~ /something/)) { do something }
    → Try doing it: 10 min
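One way to approach the exercise is sketched here, assuming (hypothetically) that an annotated line contains the text "Gene:" followed by a symbol. The line format, the sample lines, and the sub name gene_symbol are illustrative only; Convert5.pl and its input may differ.

```perl
use strict;
use warnings;

# Return the gene symbol found in a line, or undef when there is none.
sub gene_symbol {
    my ($line) = @_;
    # Parentheses capture what \w+ matched; the capture is available as $1.
    if ($line =~ /Gene:\s*(\w+)/) {
        return $1;
    }
    return;
}

# Made-up lines: one with a symbol, one with an empty symbol, one with none.
my @lines = ("AA001,Gene: CDH5", "AA002,Gene: ", "AA003,no annotation");

foreach my $line (@lines) {
    my $symbol = gene_symbol($line);
    next if not defined $symbol;   # skip lines where no symbol was found
    print "$symbol\n";
}
```

The same test can be written inline with the tip from the slide, for example: next if (not($line =~ /Gene:\s*\w+/));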

Slide 20: More on Regular Expressions
- Very powerful
  - i.e., flexible, fast
- Complicated topic
  - Can require lots of trial and error to get it right
  - Quick reference card essential
  - Best comprehensive resource: Friedl, 2006 (covers more than Perl)

Slide 21: Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

Slide 22: Part 2: Practical examples of programs that alter file contents using regular expressions

Slide 23: Regular Expressions: More Examples
The example we’ll use: Extracting clone IDs for CDH5 by…
1. Importing SOURCE results directly into Excel
2. Parsing the .csv version of that file (CDH5Clones.csv)

Slide 24: Processing EST IDs from SOURCE
Input: CDH5Clones.csv or CDH5Clones.xls

Slide 25: Clone1.pl: Filtering of Results
What it does:
- Reads .csv file of SOURCE results
- Finds all clones from PLACE library
- Returns list in single column form
Run the program
Why the error?
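The filtering step might look roughly like the sketch below. The clone ID formats (a library name followed by digits), the sample rows, and the sub name place_clones are assumptions; the actual layout of CDH5Clones.csv is not shown in the slides.

```perl
use strict;
use warnings;

# Return every field that looks like a PLACE clone ID, one per list element.
sub place_clones {
    my @lines = @_;
    my @found;
    foreach my $line (@lines) {
        foreach my $field (split(/,/, $line)) {
            # ^ and $ anchor the pattern so the whole field must match.
            push @found, $field if $field =~ /^PLACE\d+$/;
        }
    }
    return @found;
}

# Made-up rows mixing clones from several hypothetical libraries.
my @rows = (
    "IMAGE:123456,PLACE1000001",
    "PLACE1000002,HEMBA1000003",
);

# Printing one ID per line gives the single-column form.
print "$_\n" for place_clones(@rows);
```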

Slide 26: Clone2.pl: Numerical Filtering of Results
Problem: Suppose you only want clones with IDs at or above some threshold, because you already have the clones with lower IDs.
Solution: Check the numerical value of the clone ID and decide whether to retain it or not.
→ Run program!
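A minimal sketch of numerical filtering, assuming clone IDs end in digits. The threshold value, the sample IDs, and the sub name clones_at_or_above are hypothetical, since the cutoff used in the class is not given here.

```perl
use strict;
use warnings;

# Keep only the IDs whose trailing number is at or above the threshold.
sub clones_at_or_above {
    my ($threshold, @ids) = @_;
    my @kept;
    foreach my $id (@ids) {
        # Capture the digits at the end of the ID into $1.
        if ($id =~ /(\d+)$/) {
            # >= compares as numbers; the string operator ge would compare
            # character by character and can give a different answer.
            push @kept, $id if $1 >= $threshold;
        }
    }
    return @kept;
}

# Made-up IDs and a made-up threshold.
my @kept = clones_at_or_above(1000002,
    "PLACE1000001", "PLACE1000002", "PLACE1000010");

print "$_\n" for @kept;
```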