1.1 Perl Programming for Biology G.S. Wise Faculty of Life Science Tel Aviv University, Israel October 2009 David Burstein and Ofir Cohen.

1.2 Why do biologists need computers?
Collecting and managing data
Searching databases
Interpreting data
Protein function prediction - heidelberg.de/
Gene expression
Browsing genomes

1.3 Why do biologists need to program? (or: why are you here?)

1.4 Why do biologists need to program? A real life example
Proto-oncogene activation by retroviral insertional mutagenesis
c-Myc: a proto-oncogene that is activated due to over- or misexpression. (In wild-type cells c-Myc is a transcription factor expressed mainly during the G1 phase.)

1.5 A real life example
Shmulik
    >tumor1
    TAGGAAGACTGCGGTAAGTCGTGATCTGAGCGGTTCCGTTACAGCTGCTA
    CCCTCGGCGGGGAGAGGGAAGACGCCCTGCACCCAGTGCTG...
    >tumor157
Run BLAST and save the output to a text file:
                                                                 Score   E
    Sequences producing significant alignments:                  (bits)  Value
    ref|NT_ |Mm15_39661_34 Mus musculus chromosome 15 genomic e-45
    ref|NT_ |Mm6_39393_34 Mus musculus chromosome 6 genomic c
    ref|NT_ |Mm9_39517_34 Mus musculus chromosome 9 genomic c
    ref|NT_ |Mm8_39502_34 Mus musculus chromosome 8 genomic c
    ref|NT_ |Mm3_39274_34 Mus musculus chromosome 3 genomic c
    ref|NT_ |Mm2_39247_34 Mus musculus chromosome 2 genomic c
    >ref|NT_ |Mm15_39661_34 Mus musculus chromosome 15 genomic contig, strain C57BL/6J
    Length =
    Score = 186 bits (94), Expect = 1e-45
    Identities = 100/102 (98%)
    Strand = Plus / Plus
    Query: 1 taggaagactgcggtaagtcgtgatctgagcggttccgttacagctgctaccctcggcgg 60
             ||||||||||||||| ||||||||||||||||||||||| ||||||||||||||||||||
    Sbjct:   taggaagactgcggtgagtcgtgatctgagcggttccgtaacagctgctaccctcggcgg

1.6 A Perl script can do it for you
Shmulik writes a simple Perl script to parse blast results and find all hits that are in the myc locus, or up to 10kbp from it:
Use the "Blast reading" package
Open and read file “mice.blast”
Iteration – for each blast result:
    If we hit the genomic sequence “Mm15_39661_34”
    In the coordinates of the Myc locus (±10kbp) (23,198, ,223,004)
    Then print this hit (hit number and position in locus)

1.7 A Perl script can do it for you
    use Bio::SearchIO;
    my $blast_report = new Bio::SearchIO ('-format'=>'blast', '-file' =>'mice.blast');
    while (my $result = $blast_report->next_result) {
        print "Checking query ", $result->query_name, "...\n";
        my $hit = $result->next_hit();
        my $hsp = $hit->next_hsp();
        if ($hit->name() =~ m/Mm15_39661_34/ && $hsp->hit->start() > ... && $hsp->hit->end() < ...) {
            print " hit ", $hit->name();
            print " (at position ", $hsp->hit->start(), ")\n";
        }
    }
What the script does, step by step:
Use the "Blast reading" package
Open file “mice.blast”
Iterate over all blast results
For each blast hit – ask if we hit the genomic sequence “Mm15_39661_34” in the coordinates of the Myc locus 23,198, ,223,004
If so – print hit name and position

1.8 A Perl script can do it for you
    Checking query tumor1...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor2...
    Checking query tumor3...
    Checking query tumor4...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor5...
    Checking query tumor6...
    Checking query tumor7...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor8...
    Checking query tumor9...
    Checking query tumor10...
    Checking query tumor11...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor12...

1.9 What is Perl?
Perl was created by Larry Wall (read his foreword to the book “Learning Perl”).
Perl = Practical Extraction and Report Language (or: Pathologically Eclectic Rubbish Lister)
Perl is an Open Source project.
Perl is a cross-platform programming language.

1.10 Why Perl?
Perl is an Open Source project
Perl is a cross-platform programming language
Perl is a very popular programming language, especially for bioinformatics
Perl allows a rapid development cycle
Perl is strong in text manipulation
Perl can easily handle files and directories
Perl can easily run other programs

1.11 Perl & biology
BioPerl: “An international association of developers of open source Perl tools for bioinformatics, genomics and life science research”
Many smaller projects, and millions of little pieces of biological Perl code (which should be used as references – google and find them!)

1.12 This course
No prior knowledge expected: intended for students with no experience in programming whatsoever.
Time consuming: requires more hours than your average seminar…
For you: oriented towards programming tasks for molecular biology

1.13 Some formalities …
Use the course web page:
Presentations will be available on the morning of the class.
There will be 5-7 exercises, amounting to 30% of your grade. You get full points if you do the whole exercise and genuine effort is evident, even if some of your answers are wrong.
Exercises are for individual practice. DO NOT submit exercises in pairs or copy exercises from anyone.

1.14 Some formalities …
Submit your exercises by email to your teacher (either Dudu or Ofir), and you will receive a reply.
There will be a final exam on computers.
Both learning groups will be taught the same material each week.
Presentations are in English; lessons are given in Hebrew.

1.15 Email list for the course
Everybody send us an email and please write that you’re taking the course (even if you are not enrolled yet).
Please let us know:
To which group you belong
Whether you are an undergraduate student, graduate (M.Sc. / Ph.D.) student or other
Whether you have any programming background

1.16 Example exercises
Ex. 1: Write a script that prints "I will submit my homework on time" 100 times (by the end of this lesson!)
Ex. 3: Read a GenBank file and print coordinates of ORFs
Ex. 5: Write a module of functions for reading sequence files and identifying palindromes

1.17 A first Perl script
    print "Hello world!";
A Perl statement must end with a semicolon “;”
The print function outputs some information to the terminal screen
Compare this to Java's "Hello world":
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello World!!");
        }
    }

1.18 Data types
    Data type          Description                                        Example
    scalar             A single number or string value                    "hello"
    array              An ordered list of scalar values                   (9,-15,3.5)
    associative array  Also known as a “hash”. Holds an unordered list    ('dudu' => ..., 'ofir' => ...)
                       of key-value couples.
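As a rough illustration of the three data types in the table, here is a minimal sketch. The variable names and the hash values are made up for this example (the values on the original slide are not available); only the keys 'dudu' and 'ofir' and the array contents come from the slide.

```perl
use strict;
use warnings;

# A scalar holds a single number or string.
my $greeting = "hello";

# An array holds an ordered list of scalars.
my @values = (9, -15, 3.5);

# A hash (associative array) maps keys to values.
# The values 'room A' and 'room B' are invented for illustration.
my %office = ('dudu' => 'room A', 'ofir' => 'room B');

print "scalar: $greeting\n";
print "second array element: $values[1]\n";            # array indices start at 0
print "number of elements: ", scalar(@values), "\n";
print "value for key 'ofir': $office{'ofir'}\n";
```

Note the different sigils: $ for a scalar, @ for a whole array, % for a whole hash, and $ again when reading a single element out of an array or hash.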

1.19 Scalar Data

1.20 Scalar values
A scalar is either a string or a number.
Numerical values:
    1.3e4 (= 1.3 × 10^4 = 13,000)
    6.35e-14 (= 6.35 × 10^-14)
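A short sketch of how the exponent notation behaves in a script; the variable names $large and $small are chosen only for this example.

```perl
use strict;
use warnings;

my $large = 1.3e4;      # 1.3 × 10^4
my $small = 6.35e-14;   # 6.35 × 10^-14

print "$large\n";       # prints 13000

# Numbers written in exponent notation take part in arithmetic like any other number.
my $product = $large * 2;
print "$product\n";     # prints 26000

print "small is positive\n" if $small > 0;
```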

1.21 Scalar values: strings
Single-quoted strings
    print 'hello world';                  hello world
Double-quoted strings
    print "hello world";                  hello world
    print "hello\tworld";                 hello   world   (separated by a tab)
    print 'a backslash-t: \t ';           a backslash-t: \t
Backslash is an “escape” character that gives the next character a special meaning:
    Construct   Meaning
    \n          Newline
    \t          Tab
    \\          Backslash
    \"          Double quote
    print "a backslash: \\ ";             a backslash: \
    print "a double quote: \" ";          a double quote: "

1.22 Operators
An operator takes some values (operands), operates on them, and produces a new value.
Numerical operators: + - * / ** (exponentiation) ++ -- (autoincrement and autodecrement, we will talk about them later)
    print 1+1;            2
    print ((1+1)**3);     8

1.23 Operators
An operator takes some values (operands), operates on them, and produces a new value.
String operators: . (concatenate) x (replicate)
e.g.
    print ('swiss'.'prot');           swissprot
    print (('swiss'.'prot')x3);       swissprotswissprotswissprot

1.24 String or number?
Perl decides the type of a value depending on its context:
    (9+5).'a'  →  14.'a'  →  '14'.'a'  →  '14a'
    (9x2)+1  →  ('9'x2)+1  →  '99'+1  →  99+1  →  100
Warning: When you use parentheses in print, make sure to put one pair of parentheses around the WHOLE expression:
    print (9+5).'a';      # wrong
    print ((9+5).'a');    # right
You will know that you have such a problem if you see this warning:
    print (...) interpreted as function at ex1.pl line 3.
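The context-dependent conversions on this slide can be checked with a short script. This is only a sketch; the variable names are chosen for this example.

```perl
use strict;
use warnings;

my $joined   = (9 + 5) . 'a';   # the number 14 is used as the string '14', giving '14a'
my $repeated = 9 x 2;           # the number 9 is used as the string '9', giving '99'
my $sum      = (9 x 2) + 1;     # the string '99' is used as the number 99, giving 100

print "$joined\n";    # 14a
print "$repeated\n";  # 99
print "$sum\n";       # 100

# One pair of parentheses around the WHOLE argument list of print.
print((9 + 5) . 'a', "\n");  # 14a
```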

1.25 Variables
Scalar variables can store scalar values.
Variable declaration
    my $priority;
Numerical assignment
    $priority = 1;
String assignment
    $priority = 'high';
Assign the value of variable $b to $a
    $a = $b;
Note: Here we make a copy of $b in $a.

1.26 Variables - notes and tips
Tips:
Give meaningful names to variables: e.g. $studentName is better than $n
Always declare variables explicitly using my
Note: Variable names in Perl are case-sensitive. This means that the following variables are different (i.e. they refer to different values):
    $varname = 1;
    $VarName = 2;
    $VARNAME = 3;
Note: Perl has a long list of special scalar variables ($_, $1, $2,…), so please don’t use these names for your own variables!

1.27 Variables - always use strict!
Always include the line:
    use strict;
as the first line of every script.
“Strict” mode forces you to declare all variables with my. This will help you avoid very annoying bugs, such as spelling mistakes in the names of variables.
    my $varname = 1;
    $varName++;
Perl refuses to run this script and reports:
    Global symbol "$varName" requires explicit package name at... line...
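To see the strict check in action without stopping the demonstration, the sketch below keeps the misspelled script in a string and compiles it with eval, which is not covered in this lesson and is used here only so that the error can be captured and printed. The names $script and $ok are chosen for this example.

```perl
use strict;
use warnings;

# The misspelled script from the slide, kept in a single-quoted string
# so that it is compiled separately by eval below.
my $script = 'use strict; my $varname = 1; $varName++; 1;';

# eval returns undef when the code fails to compile; the message is left in $@.
my $ok = eval $script;

if (!$ok) {
    print "strict caught the typo:\n$@";
}
```

In an ordinary script the same message appears as soon as you run perl, before any statement is executed.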

1.28 Interpolating variables into strings
    $a = 9.5;
    print "a is $a!\n";          a is 9.5!
Reminder:
    print 'a is $a!\n';          a is $a!\n

1.29 Command-line interface

1.30 Running Perl at the Command Line
Traditionally, Perl scripts are run from a command line interface (similar to the old DOS).
(Start it by clicking: Start → Accessories → Command Prompt, or: Start → Run… → cmd)
Running a Perl script:
    perl -w YOUR_SCRIPT_NAME
(To check if Perl is installed on your computer, use the ‘perl -v’ command)

1.31 Running Perl at the Command Line
Common DOS commands:
    d:              change to other drive (d in this case)
    md my_dir       make a new directory
    cd my_dir       change directory
    cd..            move one directory up
    dir             list files (dir /p to view it page by page)
    help            list all dos commands
    help dir        get help on a dos command
    Tab             (hopefully) auto-complete
    Up/Down arrow   go to previous/next command
    Ctrl-c          Emergency exit
More tips about the command line are found here.

1.32 Our first Perl script
    print "Hello world!";
A Perl statement must end with a semicolon “;”
The print function outputs some information to the terminal screen
Try it yourself!
Use Notepad to write the script in a file named “hello.pl” (Save it in D:\perl_ex)
Run it!
Click Start → Accessories → Command Prompt, or: Start → Run… → cmd
Change to the right drive ("D:") and change directory to the directory that holds the Perl script ("cd perl_ex").
Type perl -w script_name.pl (replace script_name.pl with the name of the script)

1.33 Class exercise 1
Create a directory in drive D: called "perl_ex".
Open a new file (text file) called "perl_ex1.pl".
Write a Perl script that prints the following lines:
1. The string “hello world! hello Perl!”
2. Use the operator “.” to concatenate the words “apple!”, “orange!!” and “banana!!!”
3*. Produce the line: “666:666:666:god help us!” without any 6 and with only one : in your script!
Like so:
    hello world! hello Perl!
    apple!orange!!banana!!!
    666:666:666:god help us!

1.34 Reading input
<STDIN> allows us to get input from the user:
    print "What is your name?\n";
    my $name = <STDIN>;
    print "Hello $name!";
Here is a test run:
    What is your name?
    Shmulik
    Hello Shmulik
    !
$name: "Shmulik\n"

1.35 Reading input
Use the chomp function to remove the “new-line” from the end of the string (if there is any):
    print "What is your name?\n";
    my $name = <STDIN>;
    chomp $name; # Remove the new-line
    print "Hello $name!";
Here is a test run:
    What is your name?
    Shmulik
    Hello Shmulik!
Before chomp, $name: "Shmulik\n"
After chomp, $name: "Shmulik"
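The effect of chomp can also be seen without typing any input. In this sketch a fixed string stands in for the line that <STDIN> would return; that substitution is an assumption made only so the example runs on its own.

```perl
use strict;
use warnings;

# Stand-in for a line read with <STDIN>: it ends with a new-line.
my $name = "Shmulik\n";

print "before chomp: ", length($name), " characters\n";  # 8
chomp $name;
print "after chomp: ", length($name), " characters\n";   # 7

# A second chomp finds no new-line and leaves the string unchanged.
chomp $name;
print "Hello $name!\n";  # Hello Shmulik!
```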

1.36 The length function
The length function returns the length of a string:
    print length("hi you");           6
Actually print is also a function so you could write:
    print(length("hi you"));          6

1.37 The substr function
The substr function extracts a substring out of a string. It receives 3 arguments:
    substr(EXPR,OFFSET,LENGTH)
For example:
    $str = "university";
    $sub = substr ($str, 3, 5);
$sub is now "versi", and $str remains unchanged (offsets are counted from 0, so offset 3 is the letter "v").
Note: If length is omitted, everything to the end of the string is returned. You can use variables as the offset and length parameters.
The substr function can do a lot more, google it and you will see…
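A few more calls to substr, as a sketch of the notes above. The negative offset is not mentioned on the slide; it is standard Perl behaviour (counting from the end of the string) and is included as one example of what else substr can do. Variable names are chosen for this example.

```perl
use strict;
use warnings;

my $str = "university";

my $sub  = substr($str, 3, 5);   # 5 characters starting at offset 3: "versi"
my $tail = substr($str, 3);      # length omitted: "versity"
my $last = substr($str, -4);     # negative offset counts from the end: "sity"

# Variables can be used as the offset and length.
my $offset = 0;
my $length = 3;
my $head   = substr($str, $offset, $length);   # "uni"

print "$sub $tail $last $head\n";   # versi versity sity uni
print "$str\n";                     # university (unchanged)
```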

1.38 Documentation of Perl functions
Another good place to start is the list of all basic Perl functions in the Perl documentation site:
Click the link “Functions” on the left (let's try it…)

1.39 Home exercise 1 – submit by email before the next class
1. Install Perl on your computer. Use Notepad to write scripts.
2. Write a script that prints "I will submit my homework on time" 100 times.
3. Write a script that assigns your email address into the variable $email and then prints it.
4. Write a script that reads a line and prints the length of it.
5. Write a script that reads a line and prints the first 3 characters.
6*. Write a script that reads 4 inputs:
    text line
    number representing "start" position
    number representing "end" position
    number representing "copies"
and then prints the letters of the text between the "start" and "end" positions (including the "end"), duplicated "copies" times. (An example is given in the Ex1.doc on the course web site.)
* Starred (“kohavit”) questions are a little tougher, and are not mandatory