
Introduction to Perl Matt Hudson

Review
- blastall: do a BLAST search
- HMMER
  - hmmpfam: search against an HMM database
  - hmmsearch: search proteins with an HMM
  - hmmbuild: make an HMM from a protein alignment, as made by clustalw
- clustalw: align protein or DNA sequences
- fasta34: search a sequence using an older, slower, but sometimes more flexible algorithm

grep - my favorite
Allows you to pick out lines of a text file that match a query, count them, and retrieve lines around the match.
    grep 'Query=' myblast.txt
What sequences did I BLAST?
    grep -c '>' testprotein.txt
How many sequences are in this file?
    grep -A 10 '>' testprotein.txt
Give me each header line plus the ten lines after it, i.e. roughly the first ten lines of each protein.

ftp commands
    ftp ftp.ncbi.nih.gov    go to the NCBI site
    open                    open a connection
    ls                      same as UNIX
    cd                      same as UNIX
    get                     get me this file
    mget                    get more than one file
    put                     put a file on the server
    lcd                     local cd
    !                       local shell
    close                   close connection
    bye                     exit the ftp program

OK. You are now up and running with UNIX, and can use it to do some fairly sophisticated bioinformatics. We’re going to concentrate on Perl scripting from now on.

UNIX books
You might find that your UNIX skills need some refreshing from time to time. I recommend having one of these books around in case you need some help using the command line:
- For students who haven’t done much UNIX: Sams Teach Yourself Unix in 24 Hours (4th Edition) (Paperback) by Dave Taylor
- For more advanced UNIX users: UNIX System V: A Practical Guide (3rd Edition) (Paperback) by Mark G. Sobell
- Also, for those of you not so familiar with bioinformatics: Bioinformatics for Dummies (Paperback) by Jean-Michel Claverie and Cedric Notredame

Perl books
For some reason, although there are hundreds of Perl books out there, none of them are really that good. Here are some that might be useful, but none are completely recommended.
- This one I recommend EXCEPT that it uses tools that come with the book that are non-standard: Beginning Perl for Bioinformatics (Paperback) by James Tisdall
- This I have heard good things about but not used much myself: Beginning Perl, Second Edition (Paperback) by James Lee
- This is a classic but slow going if you know no programming: Learning Perl, Fourth Edition (Paperback) by Randal L. Schwartz, Tom Phoenix, brian d foy
- This is better if you have little programming experience, but not a textbook: Perl for Dummies (Fourth Edition) (Paperback) by Paul Hoffman
- Once you get started: Programming Perl, 3rd edition, by Larry Wall, O’Reilly, 2001

Why use Perl?
- Interpreted language: quick to program
- Easy to learn compared to most languages
- Designed for working with text files
- Free for all operating systems
- Most popular language in bioinformatics: many scripts are available that you can “borrow”, as well as ready-made modules.

Programming
In Perl, the program, or script, is just a text file. You write it with ANY text editor (we are using WordPad and/or nano).
- Run the program
- Look at the output
- Correct the errors (debugging)
- Edit the script and try again.

All programming courses traditionally start with a program that prints “Hello, world!”. So in keeping with that tradition, remember your program?
    #!/usr/bin/perl
    print "Hello, world\n";
Note: no line numbers. Each command line ends with a semicolon.

Print
Most programming languages use “print” to mean “write this to the console”, i.e. the command line. Once upon a time, the console was a typewriter, but now “print” almost never means print on a printer. print statements are necessary to keep tabs on what your program is doing. You need to tell Perl to put a newline at the end of a printed line:
- Use \n in a text string to signify a newline.
- The \ character is called “backslash”.
- It is an “escape”: it changes the meaning of the character after it. In this case it changes “n” to “newline”. Other examples are \t (tab) and \$ (print an actual dollar sign; normally a dollar sign has a special meaning).
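The escapes listed above can be tried out with a short script. This is only a sketch: the strings and the variable names ($with_newline, $with_tab, $with_dollar) are made up for illustration, and the strings are stored in variables (introduced a few slides later) simply so that each one is easy to look at on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example strings, one per escape.
my $with_newline = "Hello, world\n";   # \n ends the line
my $with_tab     = "gene\tlength\n";   # \t puts a tab between the two words
my $with_dollar  = "Cost: \$5\n";      # \$ prints a literal dollar sign

print $with_newline;
print $with_tab;
print $with_dollar;
```

Removing the backslash in front of the dollar sign is a quick way to see why the escape is needed: Perl would then try to treat $5 as a variable.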

Program details
Perl programs on UNIX start with a line like:
    #!/usr/bin/perl
This line is an instruction not to Perl but to UNIX: it says which interpreter should run the script. Elsewhere in the program, # is used for comments that explain the code, and Perl ignores anything after a # on a line. Lines that are Perl commands end with a semicolon (;).

Run your Perl program
    cd scratch
    nano helloworld.pl      (paste or type text into editor, save, and exit)
    perl helloworld.pl
Or:
    chmod 755 helloworld.pl
    ./helloworld.pl

Pseudocode
Programmers often find it easier to write out the things the program is doing in “normal” language. We call this pseudocode.
    print "Hello, world\n";
= Output the text “Hello, world” to the terminal, followed by a newline character.

Strings
In Perl, strings are very important. They are just a series of any text characters: letters, numbers, > :$%^&*, etc. In the statement
    print "Hello, world\n";
the text between the double quotes, Hello, world\n, is a string.

Numbers, etc
The other common type of data is a number. Perl can handle numbers in most common formats without any complications, including scientific notation with an exponent such as E-26.
Arithmetic operators: + (add), - (subtract), / (divide), * (multiply), ** (exponentiation)

A program using numbers
    #!/usr/bin/perl
    print "2+2\n";
    print 3*4, "\n";
    print "8/2=", 8/2, "\n";
Do you get it? Numbers in quotes are part of a string. Numbers outside quotes are numbers, and the computer does the math before printing.

Pseudocode
    print "2+2\n";
= Output “2+2”, followed by a newline, to the terminal
    print 3*4, "\n";
= Evaluate 3 x 4, and print the answer, followed by a newline, to the terminal

Variables
Up till now, we’ve been telling the computer exactly what to print. But in order for the program to generate what is printed, we need to use variables. A variable name starts with “$”. A variable can hold either a string or a number.

Assigning values
In pretty much all programming languages, = means “assign this value to this variable”. The “my” command in Perl declares the variable the first time you use it. This is optional but highly recommended. So, you assign values to a variable as follows:
    my $number = 123;
    my $dna_sequence_string = "acgt";

A program with variables
    #!/usr/bin/perl -w
    # this program uses variables containing numbers
    my $two = 2;
    my $three = $two + 1;
    print "\$two * \$three = $two * $three = ", ($two * $three);
    print "\n";

Pseudocode
    my $two = 2;
= Assign the value 2 to the variable $two
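As a rough sketch, the five arithmetic operators from the “Numbers, etc” slide can be applied to variables in the same way. The values 7 and 2 and all variable names below are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 7;   # arbitrary example values
my $y = 2;

my $sum        = $x + $y;    # add
my $difference = $x - $y;    # subtract
my $product    = $x * $y;    # multiply
my $quotient   = $x / $y;    # divide (the result is not rounded)
my $power      = $x ** $y;   # exponentiation: 7 to the power 2

print "$x + $y = $sum\n";
print "$x - $y = $difference\n";
print "$x * $y = $product\n";
print "$x / $y = $quotient\n";
print "$x ** $y = $power\n";
```

Note that division gives 3.5 here rather than 3: Perl does not silently drop the fractional part.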

Interpolation
When you print the variable, Perl gives the contents rather than the name of the variable.
    print $number;
    9
If you put a variable inside double quotes, Perl interpolates the variable:
    print "The number is $number\n";
    The number is 9
If you use single quotes, no interpolation happens:
    print 'The number is $number\n';
    The number is $number\n
A more flexible way to do this is to “escape” the $:
    print "The value of \$number is $number\n";
    The value of $number is 9

Variables - summary
- A variable name starts with a $
- It contains a number or a text string
- Use my to define a variable
- Use = to assign a value
- Use \ to stop the variable being interpolated
- Take care with variable names and with changing the contents of variables
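The last point in the summary is easy to trip over, so here is a small sketch of what “changing the contents” means in practice. The variable names are invented for this example. A string built with interpolation keeps the value the variable had at that moment; it does not update when the variable changes later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $number = 9;
my $first_report = "The number is $number";    # interpolated now, while $number is 9

$number = $number + 1;                         # change the contents of $number
my $second_report = "The number is $number";   # interpolated again, now with 10

print "$first_report\n";
print "$second_report\n";
print "The variable \$number now holds $number\n";
```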

Standard Input
To make the program do something, we need to input data.
- The angle bracket operator (<>) tells Perl to expect input, by default from the keyboard.
- Usually this is assigned to a variable:
    print "Please type a number: ";
    my $num = <STDIN>;
    print "Your number is $num\n";

Pseudocode
    my $num = <STDIN>;
= Stop the program, and wait until the user types input. Once the user hits the “enter” key, take the input (including the newline character) and put it into the variable $num.

chomp
When data is entered from the keyboard, the program waits for you to press the Enter key. But the string that is captured includes a newline at its end. You can use the chomp function to remove the newline character:
    print "Enter your name: ";
    my $name = <STDIN>;
    print "Hello $name, happy to meet you!\n";
    chomp $name;
    print "Hello $name, happy to meet you!\n";
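To see exactly what chomp removes without having to type anything, the sketch below uses a hard-coded string in place of keyboard input. The name “Ada” and the variable names are placeholders; in a real script the string would come from the keyboard as shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stands in for a line typed at the keyboard: the text plus the newline
# that is added when the user presses Enter.
my $name = "Ada\n";

my $length_before = length($name);   # counts the newline as one character
chomp $name;                         # removes the trailing newline
my $length_after = length($name);

print "Length before chomp: $length_before\n";
print "Length after chomp: $length_after\n";
print "Hello $name, happy to meet you!\n";

chomp $name;   # a second chomp is harmless: there is no newline left to remove
```

chomp only removes a trailing newline, so calling it on a string that has none leaves the string unchanged.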

if and True/False
All programming works on ones and zeros: true and false.
    if (1 == 1) {
        print "one equals one";
    }
Perl evaluates the expression (1 == 1). Note TWO, NOT ONE, EQUALS SIGNS! The if operator causes the command in curly brackets to be executed ONLY IF the expression is true.

if
if evaluates some statement in parentheses (it must be true or false).
Note: the conditional block is indented, using tabs.
- Perl doesn’t care about indents, but they make your code more “human readable”.

Comparing variables
    if ($one == $two) { print "one equals two"; }
Note there are TWO equals signs in this expression. If you remember, = means “assign this value to this variable”, so it is == that actually means “equals”. You can also use:
    >     greater than
    <     less than
    >=    greater than or equal to
    <=    less than or equal to
    !=    not equal to

Pseudocode
    if ($one == $two) { print "one equals two"; }
= If the number in the variable $one is equal to the number in the variable $two, print “one equals two”
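A minimal sketch of a numerical comparison in use is given below. The e value, the cutoff, and the variable names are all hypothetical and chosen only to make the example concrete.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $e_value = 0.001;   # hypothetical BLAST e value
my $cutoff  = 0.01;    # hypothetical threshold chosen for this example

my $verdict = "not significant";   # starting assumption
if ($e_value <= $cutoff) {
    # This block runs only when the comparison is true.
    $verdict = "significant";
}

print "This hit is $verdict\n";

if ($e_value != $cutoff) {
    print "The e value and the cutoff are different numbers\n";
}
```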

What’s a block?
In the case of an “if” statement: if the test is true, execute all the command lines inside the {} brackets. If not, go on past the closing } to the statements below. You can also do stuff in a block over and over again using a loop (more later).

die, scum
die kills your script safely and prints a message. It is often used to prevent you from doing something regrettable, e.g. running your script on a file that doesn’t exist, or overwriting an existing file.
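The sketch below shows die guarding against a missing file. Two things here are not from the slides: the -e test, which is Perl’s built-in check for whether a file exists, and eval, which is used only so that this demonstration can show the message and keep running. In a normal script you would call die without eval and the script would stop right there. The file name is hypothetical and is assumed not to exist.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "no_such_file_for_this_demo.txt";   # hypothetical, assumed missing

# eval runs the block and traps the die, so the demo can report what happened.
my $finished = eval {
    if (!-e $filename) {
        die "Cannot find $filename\n";
    }
    print "Found $filename, safe to continue\n";
    1;   # reached only if die was not called
};

my $message = $@;   # the text that die was given, if it was called
if (!$finished) {
    print "die stopped the block with: $message";
}
```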

Exercising the Perl muscles
Now let’s write a script to ask the user their age, and then deliver an insult specific to the age bracket:
- Over 25: old fogey
- Under 15: callow youth
- (insert your own insult here)

Pseudocode
- Output “Enter your age: ” to the terminal.
- Stop the program, and wait until the user types input. Once the user hits the “enter” key, take the input (including the newline character) and put it into the variable $age.
- Remove the newline from $age if present.
- If the value in $age is less than 15, output “You are too young for this kind of work!” followed by a newline, then terminate the program with the text “too young”.
- If the value in $age is more than 25, output “You’re old enough to know better!” and then terminate the program with the text “too old”.
- If the program is still running (i.e. $age is between 15 and 25), then output “You have much to learn!” followed by a newline.

Conditional Blocks, summary
An if test can be used to control multiple lines of commands, as in this example:
    print "Enter your age: ";
    my $age = <STDIN>;
    chomp $age;
    if ($age < 15) {
        print "You are too young for this kind of work!\n";
        die "too young";
    }
    if ($age > 25) {
        print "You're old enough to know better!";
        die "too old";
    }
    print "You have much to learn!\n";

Arrays
An array can store multiple pieces of data. Arrays are essential for the most useful functions of Perl. They can store data such as:
- the lines of a text file (e.g. primer sequences)
- a list of numbers (e.g. BLAST e values)
Arrays are designated with the @ symbol, for example an array called @bases:
    my @bases = ("A", "C", "G", "T");

Converting a variable to an array
split splits a variable into parts and puts them in an array.
    my $dnastring = "ACGTGCTA";
    my @bases = split //, $dnastring;
@bases is now (A, C, G, T, G, C, T, A)
    @bases = split /T/, $dnastring;
@bases is now (ACG, GC, A)

Converting an array to a variable
join combines the elements of an array into a single scalar variable (a string).
    $dnastring = join('', @bases);
The first argument is the spacer (empty here); the second says which array to join.
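Putting split and join together, a round trip might look like the sketch below. The array names (@bases, @fragments) and the “-” spacer are choices made for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dnastring = "ACGTGCTA";

# An empty pattern splits the string into single characters.
my @bases = split //, $dnastring;

# Splitting on T cuts the string at every T and throws the T away.
my @fragments = split /T/, $dnastring;

# join glues the pieces back together with the spacer between them.
my $rejoined = join('', @bases);        # empty spacer: the original string again
my $dashed   = join('-', @fragments);   # a visible spacer between fragments

print "Number of bases: ", scalar(@bases), "\n";
print "Fragments: @fragments\n";
print "Rejoined: $rejoined\n";
print "Dashed: $dashed\n";
```

scalar(@bases) gives the number of elements in the array, which is a handy way to check that split did what you expected.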

Loops
A loop repeats a bunch of statements until it is done. The statements are placed in a BLOCK: some code delimited with curly brackets {}. Loops are really useful with arrays. The “foreach” loop is probably the most useful of all:
    foreach my $base (@bases) {
        print "$base ";
    }

Comparing strings
String comparison (is the text the same?):
    eq    equal
    ne    not equal
There are others, but beware of them!

Getting part of a string
substr takes characters out of a string.
    $letter = substr($dnastring, $position, 1);
The three arguments are: which string, where in the string, and how many letters to take.
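One detail worth knowing when using substr is that positions are counted from 0, not 1. The sketch below illustrates this with a made-up sequence; the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dnastring = "ATGGCCTAA";   # made-up example sequence, 9 letters long

my $first_base  = substr($dnastring, 0, 1);   # position 0 is the first letter
my $start_codon = substr($dnastring, 0, 3);   # three letters starting at position 0
my $last_codon  = substr($dnastring, 6, 3);   # three letters starting at position 6

print "First base: $first_base\n";
print "Start codon: $start_codon\n";
print "Last codon: $last_codon\n";
```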

Combining strings
Strings can be concatenated (joined). Use the dot (.) operator:
    $seq1 = "ACTG";
    $seq2 = "GGCTA";
    $seq3 = $seq1 . $seq2;
    print $seq3;
    ACTGGGCTA

Making Decisions - review
The if operator is generally used together with numerical or string comparison operators, inside an (expression).
- numerical: ==, !=, >, <, >=, <=
- strings: eq, ne
You can make decisions on each member of an array using a loop, which puts each part of the array through the test, one at a time.
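As a sketch of a loop that puts each member of an array through a test, the script below counts how many bases in a made-up sequence are A. The sequence and the counter names are arbitrary. Because the bases are text, the comparison uses eq and ne rather than == and !=.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = split //, "GATTACA";   # made-up sequence, one letter per element

my $a_count     = 0;   # how many elements are the letter A
my $other_count = 0;   # how many are anything else

foreach my $base (@bases) {
    if ($base eq "A") {
        $a_count = $a_count + 1;
    }
    if ($base ne "A") {
        $other_count = $other_count + 1;
    }
}

print "A: $a_count\n";
print "Not A: $other_count\n";
```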

More healthy exercise
Write a program that asks the user for a DNA restriction site, and then tells them whether that particular sequence matches the site for the restriction enzyme EcoRI, BamHI, or HindIII.
- EcoRI: GAATTC
- BamHI: GGATCC
- HindIII: AAGCTT