Introduction to perl programming: the minimum to know for practice!


Introduction to perl programming: the minimum to know for practice! Fredj Tekaia tekaia@pasteur.fr Bioinformatics and Genome Analyses Institut Pasteur Tunis, Tunisia. September 18 – December 15, 2017

Perl: a basic program
#!/usr/bin/perl
# Program to print a message
print 'Hello world.';   # Print a message

Variables, Arrays
$val = 9;
$val = "9";
$val = "ABC transporter";
• Case sensitive: $val is different from $Val
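A minimal sketch of the points on this slide: a scalar can hold a number or a string, and names are case sensitive. The `use strict`, `use warnings` and `my` declarations are additions that do not appear in the slides, and `$str` and `$sum` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar can hold a number or a string; Perl converts between them as needed.
my $val = 9;              # a number
my $str = "9";            # a string that looks like a number
my $sum = $val + $str;    # numeric context: 9 + 9

# Names are case sensitive: $val and $Val are two separate variables.
my $Val = "ABC transporter";

print "sum = $sum\n";               # prints "sum = 18"
print "val = $val, Val = $Val\n";   # prints "val = 9, Val = ABC transporter"
```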

Operations and Assignment
Perl uses arithmetic operators:
$a = 1 + 2;    # Add 1 and 2 and store in $a
$a = 3 - 4;    # Subtract 4 from 3 and store in $a
$a = 5 * 6;    # Multiply 5 and 6
$a = 7 / 8;    # Divide 7 by 8 to give 0.875
$a = 9 ** 10;  # Nine to the power of 10
$a = 5 % 2;    # Remainder of 5 divided by 2
$a++;          # Return $a and then increment it by 1
$a--;          # Return $a and then decrement it by 1
For strings, Perl has, among others:
$a = $b . $c;  # Concatenate $b and $c
$a = $b x $c;  # $b repeated $c times
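The operators above can be tried out with a short script such as the following sketch. The variable names and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $quotient  = 7 / 8;     # 0.875 (division is not truncated)
my $power     = 9 ** 10;   # 3486784401
my $remainder = 5 % 2;     # 1

# Post-increment returns the old value, then adds 1.
my $count = 5;
my $old   = $count++;      # $old is 5, $count is now 6

# String operators: concatenation and repetition.
my $joined   = "ATG" . "TAA";   # "ATGTAA"
my $repeated = "AT" x 3;        # "ATATAT"

print "$quotient $power $remainder\n";
print "$old $count $joined $repeated\n";
```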

Operations and Assignment
To assign values, Perl includes:
$a = $b;   # Assign $b to $a
$a += $b;  # Add $b to $a
$a -= $b;  # Subtract $b from $a
$a .= $b;  # Append $b onto $a
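As a small illustration of the assignment operators, the sketch below accumulates a total and builds a sequence piece by piece. The names `$total` and `$seq` and the values are assumptions chosen for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $total = 10;
$total += 5;      # 15
$total -= 3;      # 12

# .= is handy for assembling a sequence read line by line.
my $seq = "ATG";
$seq .= "GCC";    # "ATGGCC"
$seq .= "TAA";    # "ATGGCCTAA"

print "$total $seq\n";   # prints "12 ATGGCCTAA"
```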

Array variables
An array variable is a list of scalars (i.e. numbers and/or strings). Array names are prefixed by @.
@SEQNAME = ("MG001", "MG002", "MG003");
$SEQNAME[2]   # MG003
Attention: indices start at 0 (0, 1, 2, ...).
@num = (0,1,2,3);
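A short sketch of array access using the slide's @SEQNAME: a single element is written with $ because it is a scalar, and counting starts at 0. The extra element "MG004" is added only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @SEQNAME = ("MG001", "MG002", "MG003");

my $first      = $SEQNAME[0];      # "MG001": the first element has index 0
my $third      = $SEQNAME[2];      # "MG003"
my $last_index = $#SEQNAME;        # 2: index of the last element
my $size       = scalar @SEQNAME;  # 3: number of elements

push @SEQNAME, "MG004";            # append an element at the end

print "$first $third $last_index $size\n";   # prints "MG001 MG003 2 3"
print "@SEQNAME\n";                          # prints "MG001 MG002 MG003 MG004"
```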

Array variables
@L_CODONS = (
'TTT','TTC','TTA','TTG',
'CTT','CTC','CTA','CTG',
'ATT','ATC','ATA','ATG',
'GTT','GTC','GTA','GTG',
'TCT','TCC','TCA','TCG',
'CCT','CCC','CCA','CCG',
'ACT','ACC','ACA','ACG',
'GCT','GCC','GCA','GCG',
'TAT','TAC','TAA','TAG',
'CAT','CAC','CAA','CAG',
'AAT','AAC','AAA','AAG',
'GAT','GAC','GAA','GAG',
'TGT','TGC','TGA','TGG',
'CGT','CGC','CGA','CGG',
'AGT','AGC','AGA','AGG',
'GGT','GGC','GGA','GGG');

Array variables
@AA = ('A','R','N','D','C','Q','E','G','H','I','L','K','M','F','P','S','T','W','Y','V','B');
@mm = ('a','r','n','d','c','q','e','g','h','i','l','k','m','f','p','s','t','w','y','v','b');

Associative arrays: hash tables
Ordinary list arrays let us access their elements by number. The first element of array @AA is $AA[0], the second is $AA[1], and so on. Perl also lets us create arrays whose elements are accessed by string. These are called associative arrays (hashes). The name of an associative array is prefixed by a % sign.

Associative arrays: hash tables
%ages = ("Michael", 39, "Angie", 27, "Willy", "21 years", "The Queen Mother", 108);
$ages{"Michael"};           # Returns 39
$ages{"Angie"};             # Returns 27
$ages{"Willy"};             # Returns "21 years"
$ages{"The Queen Mother"};  # Returns 108
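The following sketch reuses the %ages data to show three common hash operations that the slides do not cover explicitly: listing the keys, testing whether a key exists, and adding an entry. The key "Nobody" and the entry "Karim" are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The => operator is a comma that also quotes the word on its left.
my %ages = (
    "Michael"          => 39,
    "Angie"            => 27,
    "Willy"            => "21 years",
    "The Queen Mother" => 108,
);

# keys returns the keys in no particular order, so sort them for stable output.
foreach my $name (sort keys %ages) {
    print "$name: $ages{$name}\n";
}

my $known = exists $ages{"Nobody"} ? "yes" : "no";   # "no"
$ages{"Karim"} = 31;                                 # add a new key/value pair
my $n_entries = scalar keys %ages;                   # 5

print "$known $n_entries\n";   # prints "no 5"
```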

File handling
A script (cat.pl) that prints the file GMG.pep, like the UNIX cat:
#!/usr/bin/perl
open(FILE, "GMG.pep") or die "Cannot open GMG.pep: $!";
while (<FILE>) {
    print $_;
}
close(FILE);
Use: chmod a+x cat.pl ; ./cat.pl
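The same read loop can be written with a lexical filehandle and the three-argument form of open, a style many Perl programmers prefer today. This is a sketch, not the course's own script: to stay self-contained it first writes a tiny stand-in file with the core File::Temp module, and the file content is invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in for GMG.pep so the sketch runs anywhere.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} ">MG001 example protein\nMKVLAA\n";
close($tmp) or die "Cannot close $path: $!";

# Three-argument open with a lexical filehandle; '<' means read-only.
open(my $fh, '<', $path) or die "Cannot open $path: $!";
my $lines = 0;
while (my $line = <$fh>) {   # read one line per iteration until end of file
    print $line;
    $lines++;
}
close($fh);

print "Read $lines lines\n";   # prints "Read 2 lines"
```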

split
A very useful function in Perl: it splits up a string and places the pieces into an array.
#!/usr/bin/perl
open(FILE, "GMG.pep") or die "Cannot open GMG.pep: $!";
while (<FILE>) {
    @tab = split(/\s+/, $_);
    print "$tab[0]\n";
}
close(FILE);
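To see what split returns without needing a file, here is a minimal sketch applied to a single string. The header line is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = ">MG001 example protein one";

# Split on runs of whitespace; each word becomes one array element.
my @tab = split(/\s+/, $line);

print scalar(@tab), "\n";   # prints "4"
print "$tab[0]\n";          # prints ">MG001"
print "$tab[3]\n";          # prints "one"
```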

split
#!/usr/bin/perl
open(FILE, "GMG.pep") or die "Cannot open GMG.pep: $!";
while (<FILE>) {
    @tab = split(/\s+/, $_, 2);
    $NOM{$tab[0]} = $tab[1];
    print $NOM{$tab[0]};
}
close(FILE);
General form: @tab = split(/\s+/, $_, n);   # split into at most n fields
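The effect of the third argument is easiest to see on sample data. In this sketch the limit 2 keeps the whole description in one piece, and the pieces are stored in a hash keyed by identifier, as on the slide. The header lines are invented, and `$id` and `$desc` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    ">MG001 example protein one",
    ">MG002 example protein two",
);

my %NOM;
foreach my $line (@lines) {
    # With a limit of 2, everything after the first whitespace stays together.
    my ($id, $desc) = split(/\s+/, $line, 2);
    $NOM{$id} = $desc;
}

print $NOM{">MG001"}, "\n";   # prints "example protein one"
print $NOM{">MG002"}, "\n";   # prints "example protein two"
```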

Control structures: foreach
To go through each element of an array or other list-like structure (such as the lines of a file), Perl uses the foreach structure. It has the form:
foreach $nom (@SEQNAME)   # Visit each item in turn
                          # and call it $nom
{
    print "$nom\n";       # Print the item
}

Control structures
foreach $j (0 .. 2)       # Visit each value in turn
                          # and call it $j
{
    print "$SEQNAME[$j]\n";   # Print the item
}
foreach $j (0 .. $#AA)    # Visit each value in turn
                          # and call it $j
{
    print "$AA[$j]\n";    # Print the item
}
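The two foreach styles shown on these slides, looping over the elements and looping over the indices, can be compared side by side in a sketch like this one. The shortened @AA list and the names `@by_value` and `@by_index` are chosen for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @AA = ('A', 'R', 'N', 'D');

# Style 1: visit each element directly.
my @by_value;
foreach my $aa (@AA) {
    push @by_value, lc $aa;          # lower-case copy of each letter
}

# Style 2: visit each index; useful when the position itself is needed.
my @by_index;
foreach my $j (0 .. $#AA) {
    push @by_index, "$j:$AA[$j]";
}

print "@by_value\n";   # prints "a r n d"
print "@by_index\n";   # prints "0:A 1:R 2:N 3:D"
```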

Testing
Here are some tests on numbers and strings.
$a == $b   # Is $a numerically equal to $b?
           # Beware: don't use the = operator.
$a != $b   # Is $a numerically unequal to $b?
$a eq $b   # Is $a string-equal to $b?
$a ne $b   # Is $a string-unequal to $b?
You can also use logical and, or and not:
($a && $b)   # Are $a and $b both true?
($a || $b)   # Is either $a or $b true?
!($a)        # Is $a false?
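The difference between numeric and string comparison matters in practice. In the sketch below the same two values are equal as numbers but different as strings; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "1.0";
my $y = "1";

my $num_equal = ($x == $y) ? 1 : 0;   # 1: both are the number 1
my $str_equal = ($x eq $y) ? 1 : 0;   # 0: "1.0" and "1" differ as text

# Logical operators combine tests.
my $both   = ($num_equal && $str_equal) ? 1 : 0;   # 0
my $either = ($num_equal || $str_equal) ? 1 : 0;   # 1

print "$num_equal $str_equal $both $either\n";   # prints "1 0 0 1"
```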

for
for (initialise; test; inc) {
    first_action;
    second_action;
    etc....
}
for ($i = 0; $i < 10; ++$i)   # Start with $i = 0
                              # Do it while $i < 10
                              # Increment $i before repeating
{
    print "$i\n";
}
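A typical use of the three-part for loop in sequence work is walking along a string position by position. This sketch assumes a short made-up sequence and uses the built-in substr and length functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "ATGC";
my @bases;

# $i runs over the positions 0 .. length-1 of the sequence.
for (my $i = 0; $i < length($seq); ++$i) {
    my $base = substr($seq, $i, 1);   # one character at position $i
    push @bases, $base;
    print "$i\t$base\n";
}

print scalar(@bases), "\n";   # prints "4"
```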

Conditionals
if ($a) {
    print "The string is not empty\n";
} else {
    print "The string is empty\n";   # Note: the string "0" also counts as false
}
#!/usr/bin/perl
open(FILE, "GMG.pep") or die "Cannot open GMG.pep: $!";
while (<FILE>) {
    print $_ if ( m/>/ );
}
close(FILE);
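When there are more than two cases, if can be chained with elsif. The sketch below classifies a line of a FASTA-like file; the subroutine name `describe` and the sample strings are hypothetical. It tests for the empty string explicitly because a plain `if ($s)` would also treat "0" as false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: says what kind of line $s is.
sub describe {
    my ($s) = @_;
    if (!defined $s) {
        return "undefined";
    } elsif ($s eq "") {
        return "empty";
    } elsif ($s =~ /^>/) {
        return "header";      # FASTA header lines start with >
    } else {
        return "sequence";
    }
}

print describe(">MG001 example"), "\n";   # prints "header"
print describe("MKVLAA"), "\n";           # prints "sequence"
print describe(""), "\n";                 # prints "empty"
```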

String matching
$a eq $b   # Is $a string-equal to $b?
$a ne $b   # Is $a string-unequal to $b?
Here are some special RE (regular expression) characters and their meaning:
.   # Any single character except a newline
^   # The beginning of the line or string
$   # The end of the line or string
*   # Zero or more of the last character
+   # One or more of the last character
?   # Zero or one of the last character
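Each of the special characters listed above can be checked with the =~ operator. This is a small sketch on made-up strings; every variable holds 1 if the pattern matched and 0 otherwise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dot      = ("AXC"       =~ /A.C/)   ? 1 : 0;   # 1: . matches the X
my $start    = (">MG001"    =~ /^>/)    ? 1 : 0;   # 1: > is at the beginning
my $no_start = ("ATG"       =~ /^TG/)   ? 1 : 0;   # 0: TG is not at the beginning
my $end      = ("ATGAAATAA" =~ /TAA$/)  ? 1 : 0;   # 1: ends with TAA
my $star     = ("ATC"       =~ /ATG*C/) ? 1 : 0;   # 1: zero G's is allowed
my $plus     = ("ATC"       =~ /ATG+C/) ? 1 : 0;   # 0: at least one G is required
my $optional = ("AC"        =~ /AT?C/)  ? 1 : 0;   # 1: the T may be absent

print "$dot $start $no_start $end $star $plus $optional\n";   # prints "1 1 0 1 1 0 1"
```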

Some more special characters
\n   # A newline
\t   # A tab
\w   # Any alphanumeric (word) character.
     # The same as [a-zA-Z0-9_]
\W   # Any non-word character.
     # The same as [^a-zA-Z0-9_]
\d   # Any digit. The same as [0-9]
\D   # Any non-digit. The same as [^0-9]
\s   # Any whitespace character: space,
     # tab, newline, etc
\S   # Any non-whitespace character
\b   # A word boundary, outside [] only
\B   # No word boundary
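Character classes become most useful together with parentheses, which capture the matched text into $1, $2 and so on (capturing is not covered on the slides). The header format in this sketch is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $header = ">MG001 length=380";
my ($id, $len) = ("", 0);

# \w+ grabs the identifier, \s+ skips the blank, \d+ grabs the number.
if ($header =~ /^>(\w+)\s+length=(\d+)/) {
    ($id, $len) = ($1, $2);
}

print "$id $len\n";   # prints "MG001 380"
```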

Some more special characters
Characters like $, |, [, ), \, / and so on have a special meaning in regular expressions. If you want to match one of them literally, you have to precede it with a backslash (\). So:
\|   # Vertical bar
\[   # An open square bracket
\)   # A closing parenthesis
\*   # An asterisk
\^   # A caret symbol
\/   # A slash
\\   # A backslash
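Escaping matters for sequence headers, which often use | as a field separator. The sketch below splits such a header on a literal vertical bar and also shows the built-in quotemeta function, which adds the backslashes automatically. The header and the strings are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $header = ">db|12345|src|name";

# Without the backslash, | would mean "or" and the split would go wrong.
my @fields = split(/\|/, $header);

# quotemeta escapes every special character in a string.
my $literal   = "a+b";
my $pattern   = quotemeta($literal);               # a\+b
my $unescaped = ("aab" =~ /a+b/)       ? 1 : 0;    # 1: + means "one or more a"
my $escaped   = ("aab" =~ /$pattern/)  ? 1 : 0;    # 0: looks for the text a+b

print scalar(@fields), " $fields[1]\n";   # prints "4 12345"
print "$unescaped $escaped\n";            # prints "1 0"
```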

Substitution and translation
s/london/London/                 # Substitution applied to $_
$sentence =~ s/london/London/    # Substitution applied to $sentence
Options: g for global substitution (every match), i for "ignore case".
s/london/London/gi
Translation:
$sentence =~ tr/abc/edf/         # Replaces a by e, b by d, c by f
tr/a-z/A-Z/;   # Converts $_ to upper case
tr/A-Z/a-z/;   # Converts $_ to lower case
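The sketch below applies substitution and translation to copies of a string, then uses tr for two common sequence tasks: counting bases and building a reverse complement. The sentence, the DNA string and the variable names are made up for the example; the idiom `(my $copy = $original) =~ ...` modifies the copy and leaves the original untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "london, London and LONDON";

(my $first_only = $sentence) =~ s/london/London/;     # only the first match
(my $all        = $sentence) =~ s/london/London/gi;   # every match, any case

my $dna = "atgcgt";
(my $upper = $dna) =~ tr/a-z/A-Z/;                    # "ATGCGT"

# With an empty replacement list, tr changes nothing and returns a count.
my $gc = ($upper =~ tr/GC//);                         # 3

# Reverse complement: reverse the string, then swap A<->T and C<->G.
my $revcomp = reverse $upper;
$revcomp =~ tr/ACGT/TGCA/;                            # "ACGCAT"

print "$first_only\n";          # prints "London, London and LONDON"
print "$all\n";                 # prints "London, London and London"
print "$upper $gc $revcomp\n";  # prints "ATGCGT 3 ACGCAT"
```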

Practical session

Exercises: Simple scripts
- given a nucleotide sequence: base composition;
- given a protein sequence: amino-acid composition;
- given a nucleic database (in fasta format): base composition;
- given a protein database (in fasta format): amino-acid composition.

Exercises: Simple scripts
- sequence size (bases or amino-acids);
- extract a portion of a sequence (pos start; pos end);
- extract a sequence by name (from a database of sequences);
- gene sequence: codon count;
- given allxxseqnew file: script to compute frequencies of multiple matches;
- see splitfasta.pl; splitdnafasta.pl

Exercises: Simple scripts
Data manipulation exercises:
- home directory, mkdir, cd, path, pwd, find;
- naming convention: DB.pep, DB.dna, seq.dna, seq.prt;
- use "tab" as the separator;
- using sed and grep;
- the fasta format for sequences;
- count the number of sequences in a sequence database in fasta format (grep ">" DB.pep | wc -l);
- replace one character with another:
- extract the sequences from a database (file in fasta format) (splitfasta.pl, splitdnafasta.pl);
- extract part of a sequence (the sequence is in fasta format);
- amino-acid frequencies of a protein sequence;
- base frequencies of a nucleotide sequence;
- size of a sequence;
- sizes of the sequences in a database;
- codon frequencies of a coding sequence;
- codon volatility: codon/amino-acid correspondence;

Thank You