Part 4. Arrays: stacks, foreach command. Regular expressions: string structure analysis, substring extraction and substitution. Command line arguments

Teaching Materials by Ivan Ovcharenko
Presentation transcript:

part 4
Arrays: stacks, foreach command
Regular expressions: string structure analysis, substring extraction and substitution
Command line arguments: @ARGV array
Modules in Perl: how to use/share libraries of functions
Functions/subroutines: repetitive use of functional blocks
Error messages: how to interrupt a program on a mistake, die statement

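part 4 Arrays as a stack: "FIRST-COME ... LAST-SERVED"
    @numbers = (7,-1,2,4,5);    # 5 numbers array
    @numbers = ();              # zero (empty) array
    # store numbers
    push @numbers, 7; push @numbers, -1; push @numbers, 2; push @numbers, 4; push @numbers, 5;
    $lastNumber = pop @numbers;
    print "last number stored was $lastNumber\n";
Figure: jar of 5 numbers. push puts a number on top; pop takes the top number (5) back out.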
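The stack behaviour described on this slide can be sketched as a small runnable script. This is only an illustration: the array name `@jar` is an assumption, and `use strict` with `my` declarations is added so that typos in variable names are caught.

```perl
use strict;
use warnings;

# Hypothetical array name; the values are the ones from the slide.
my @jar = ();
push @jar, 7, -1, 2, 4, 5;      # store five numbers, 5 goes in last

my $lastNumber = pop @jar;      # removes and returns the last element
print "last number stored was $lastNumber\n";
print "numbers left in the jar: @jar\n";
```

After the `pop`, the array holds only the four numbers that were stored earlier, which is what "first-come, last-served" means in practice.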

part 4 When are push/pop commands useful?
    #!/usr/local/bin/perl
    # storing file
    @fileLines = ();
    open (INP, "<data.txt");
    while ($line = <INP>) { chomp($line); push @fileLines, $line; }
    close(INP);
    # calculating number of lines in the file
    $nLines = $#fileLines + 1;
    print "There are $nLines lines in data.txt file\n";
    # printing out data.txt file content
    foreach $line (@fileLines) { print "$line\n"; }
data.txt: Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number
foreach over a list of numbers:
    @digits = (1..6);
    foreach $d (@digits) { print "$d "; }
    print "\n";
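A minimal sketch of the same read, count and print loop follows. To keep it runnable without a data.txt on disk, it reads from an in-memory string (an assumption made for this example only), and it uses a lexical filehandle with three-argument `open`, which is the more common style in current Perl.

```perl
use strict;
use warnings;

# Stand-in for data.txt so the sketch runs without any file on disk.
my $text = "first line\nsecond line\nthird line\n";

open(my $inp, '<', \$text) or die "Cannot open in-memory file: $!\n";
my @fileLines;
while (my $line = <$inp>) {
    chomp $line;                 # strip the trailing newline
    push @fileLines, $line;      # append the line to the end of the array
}
close($inp);

my $nLines = scalar @fileLines;  # same value as $#fileLines + 1
print "There are $nLines lines\n";
foreach my $line (@fileLines) {
    print "$line\n";
}
```

To read a real file, replace `\$text` with the file name, for example `open(my $inp, '<', 'data.txt')`.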

part 4 Command line arguments
printFile.pl -- program, which prints out contents of a file
    #!/usr/local/bin/perl
    # determine file name
    $fName = $ARGV[0];
    # open, read and print out file
    open (INP, "<$fName");
    while ($line = <INP>) { print $line; }
    close(INP);
Files: numbers.txt, words.txt
words.txt: Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of
Command: printFile.pl numbers.txt
@ARGV -- array of arguments following the program name
    @ARGV = ("numbers.txt");

part 4 Example. Print out N-th line of the file
    #!/usr/local/bin/perl
    # determine file name, and line index
    $fName = $ARGV[0];
    $lineNo = $ARGV[1];
    # open and read file
    open (INP, "<$fName");
    while ($line = <INP>) { push @fileLines, $line; }
    close(INP);
    # print out N-th line (array indices start at 0, hence $lineNo-1)
    print $fileLines[ $lineNo-1 ];
words.txt: Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of
Command: printFile.pl words.txt 3
Output: a challenging problem. Analyzing novel
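As a variation on this example, the N-th line can also be found without storing the whole file, because Perl keeps the current input line number in the special variable `$.`. The sketch below assumes a helper named `nth_line` (a hypothetical name) and reads from an in-memory string so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return line number $n (1-based) from an open handle,
# or nothing (undef in scalar context) when the input has fewer than $n lines.
sub nth_line {
    my ($fh, $n) = @_;
    while (my $line = <$fh>) {
        return $line if $. == $n;   # $. holds the current input line number
    }
    return;
}

# Made-up three-line input standing in for words.txt.
my $text = "Finding potential\nregulatory elements\na challenging problem\n";
open(my $inp, '<', \$text) or die "Cannot open in-memory file: $!\n";
my $third = nth_line($inp, 3);
close($inp);

if (defined($third)) {
    print $third;
} else {
    print "no such line\n";
}
```

Stopping at the requested line keeps memory use constant, which matters for large files; storing all lines in an array, as on the slide, is simpler when several lines are needed.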

part 4 Error messages
How to stop a program correctly with an indication of a run problem?
Example problem: the program should be executed with 2 arguments (printFile.pl words.txt 3), but the user specifies only 1: printFile.pl 3
The program should stop and report the error.
    #!/usr/local/bin/perl
    # check whether we've got 2 arguments or not
    if ($#ARGV != 1) { die "Error. Incorrect number of arguments\n"; }
    ...
die: print out a message and stop the program.
Stop on an incorrect line number:
    ...
    if ($ARGV[1] <= 0) { die "Error. Incorrect line number: $ARGV[1]\n"; }
    ...
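The two checks can be combined into one validation routine, sketched below. The helper name `check_args` is an assumption, and so is the extra test that the line number consists of digits only (the slide checks only that it is positive). The `eval` blocks exist purely so the demonstration can show each error message and continue; a real script would call `check_args(@ARGV)` and let `die` stop the program.

```perl
use strict;
use warnings;

# Hypothetical helper: validate the argument list and die on the first problem.
sub check_args {
    my @args = @_;
    die "Error. Incorrect number of arguments\n" if @args != 2;
    my ($fName, $lineNo) = @args;
    die "Error. Incorrect line number: $lineNo\n"
        unless $lineNo =~ /^\d+$/ && $lineNo > 0;
    return ($fName, $lineNo);
}

# Three sample argument lists: one valid, one too short, one with a bad line number.
for my $try (['words.txt', 3], [3], ['words.txt', 0]) {
    my @ok = eval { check_args(@$try) };   # eval traps the die
    if (@ok) {
        print "OK: @ok\n";
    } else {
        print "Stopped: $@";               # $@ holds the message passed to die
    }
}
```

Ending the `die` message with a newline, as the slide does, suppresses the "at script line N" suffix that Perl would otherwise append.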

part 4 Defining novel functions and commands
A function is a "mini computer" inside a program: it gets input data and produces output results.
Figure: INPUT "2 Hello Everybody" -> FUNCTION (filtering out numbers) -> OUTPUT "Hello Everybody"
Defining min function, which returns minimum of 2 numbers:
    $x = min(5,3);
    print "Smallest of 5 and 3 is: $x\n";
    # Function min
    sub min {
        ($a, $b) = @_;    # INPUT parameters arrive in the @_ array
        if ($a < $b) { $small = $a; } else { $small = $b; }
        return $small;
    }
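A sketch of the same idea with lexical (`my`) variables, generalized to any number of arguments, is shown below. The name `min_of` is hypothetical. For real programs, the core module List::Util already provides a `min` function, so writing one by hand is mainly useful as an exercise.

```perl
use strict;
use warnings;

# Hypothetical name min_of: smallest value among any number of arguments.
sub min_of {
    my @values = @_;              # @_ holds whatever the caller passed in
    my $small = shift @values;    # start with the first value
    foreach my $v (@values) {
        $small = $v if $v < $small;
    }
    return $small;
}

print "Smallest of 5 and 3 is: ", min_of(5, 3), "\n";
print "Smallest of 7, -1, 2, 4, 5 is: ", min_of(7, -1, 2, 4, 5), "\n";
```

Declaring the variables with `my` keeps them private to the subroutine, so a variable with the same name elsewhere in the program is not overwritten.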

part 4 Regular expressions
    $string1 = "Total: 576 genes, 2763 exons, some introns";
    $string2 = "human -G-ACT---TTGC------AA----A---A----";
How to extract 2 numbers? How to extract just DNA sequence?
Special symbols substituting groups of common type characters (called patterns):
    \s   Match a whitespace character
    \S   Match a non-whitespace character
    \d   Match a digit character
    \D   Match a non-digit character
    ^    Match the beginning of the line
    .    Match any character (except newline)
    $    Match the end of the line
    \t   Tab character (HT, TAB)
    \n   Newline (LF, NL)
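One possible answer to the two questions on this slide, using only the symbols from the table plus `+` (one or more times) and capturing parentheses, is sketched here. The variable names `@numbers`, `$label` and `$sequence` are assumptions made for the example.

```perl
use strict;
use warnings;

my $string1 = "Total: 576 genes, 2763 exons, some introns";
my $string2 = "human -G-ACT---TTGC------AA----A---A----";

# In list context, a /g match returns every substring matched by (\d+).
my @numbers = $string1 =~ /(\d+)/g;
print "numbers: @numbers\n";

# ^(\S+) captures the label, \s+ skips the gap, (\S+)$ captures the rest of the line.
my ($label, $sequence) = $string2 =~ /^(\S+)\s+(\S+)$/;
print "$label sequence: $sequence\n";
```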

part 4 Grouping options:
    *    Match 0 or more times
    +    Match 1 or more times
    []   Character class
Patterns management (each substitution is applied to the original $string):
    $string = "Total: 576 genes, 2763 exons, some introns";
    $string =~ s/\d+/some/g;   --> "Total: some genes, some exons, some introns";
    $string =~ s/\s+/#/g;      --> "Total:#576#genes,#2763#exons,#some#introns";
    $string =~ s/\D+/\*/g;     --> "*576*2763*";
In the last case every run of non-digit characters, spaces included, is replaced by a single *.
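The three substitutions can be checked with a short script. Because `s///` changes the string in place, this sketch copies the original into a new variable before each substitution; the variable names are assumptions. The last statement adds a character class example, which the slide lists but does not demonstrate.

```perl
use strict;
use warnings;

my $string = "Total: 576 genes, 2763 exons, some introns";

# Each substitution works on its own copy so the results do not build on each other.
(my $noDigits   = $string) =~ s/\d+/some/g;
(my $noSpaces   = $string) =~ s/\s+/#/g;
(my $onlyDigits = $string) =~ s/\D+/*/g;

print "$noDigits\n$noSpaces\n$onlyDigits\n";

# Character class: [^ACGT] matches any single character that is not A, C, G or T.
(my $bases = "-G-ACT---TTGC------AA") =~ s/[^ACGT]//g;
print "$bases\n";
```

The `g` modifier at the end of each substitution means "replace every match"; without it only the first match in the string would be replaced.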

part 4 Localizing substrings:
alignment.blast:
    human -G-ACT---TTGC------AA----A---A-----CG-----G-AT TGGG---
    | ||| ||| || | | || | || ||||
    mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
    human C----GG------GA TG-AG--AGG
    | || || || || |||
    mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT
How to extract only the lines starting with 'mouse'?
    while ($line = <INP>) {
        if ($line =~ /^mouse/) { print $line; }
    }
Output:
    mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
    mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT

part 4 Obtaining substrings after localization:
alignment.blast:
    human -G-ACT---TTGC------AA----A---A-----CG-----G-AT TGGG---
    | ||| ||| || | | || | || ||||
    mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
    human C----GG------GA TG-AG--AGG
    | || || || || |||
    mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT
How to extract human and mouse sequences?
    $humanSeq = "";
    $mouseSeq = "";
    while ($line = <INP>) {
        if ($line =~ /^mouse (\S+)$/) { $mouseSeq .= $1; }
        elsif ($line =~ /^human (\S+)$/) { $humanSeq .= $1; }
    }
    print "Human sequence: $humanSeq\n";
    print "Mouse sequence: $mouseSeq\n";
/...(xxx)...(xxx)../ -- substrings enclosed in parentheses are available after a search as variables $1, $2, ...
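The capture technique can be tried on a small made-up alignment, as in the sketch below. The data, the hash `%seq`, and the use of `\s+` (so that any amount of whitespace after the label is accepted) are assumptions made for this example; the input is an in-memory string rather than the alignment.blast file.

```perl
use strict;
use warnings;

# Made-up miniature alignment standing in for alignment.blast.
my $alignment = <<'END';
human  -G-ACT---TT
       | |||   ||
mouse  TGAACTCAAGT
human  C----GG
       |    ||
mouse  CTACCTC
END

my %seq = (human => '', mouse => '');
open(my $inp, '<', \$alignment) or die "Cannot open in-memory file: $!\n";
while (my $line = <$inp>) {
    # $1 is the species label, $2 the aligned block that follows it.
    if ($line =~ /^(human|mouse)\s+(\S+)\s*$/) {
        $seq{$1} .= $2;
    }
}
close($inp);

print "Human sequence: $seq{human}\n";
print "Mouse sequence: $seq{mouse}\n";
```

Lines that hold only the `|` match markers start with whitespace, so the pattern skips them without any extra test.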

part 4 Modules:
Perl does not have built-in functions for every case, but most of the missing functions have already been programmed by other people, and they share their libraries of functions, which are called modules.
Perl does not know how to create pictures. use GD; -- now it knows
How to communicate with databases? use DBI;
How to do DNA sequence analysis? use Bio::Seq; (one of the BioPerl modules)
How to extract command line options? use Getopt::Long;
-- storage of Perl modules
use X; command indicates that functions from the X module should be used
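As one concrete case of `use X;`, the sketch below parses command line options with Getopt::Long, a module that ships with Perl. The option names `--file` and `--line` are hypothetical, and the array `@args` stands in for `@ARGV` so the example runs without any arguments being typed.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option names (--file, --line) for a printFile.pl-style script.
my @args = ('--file', 'words.txt', '--line', '3');   # stand-in for @ARGV

my ($fName, $lineNo) = ('', 1);
GetOptionsFromArray(\@args,
    'file=s' => \$fName,    # =s: the option takes a string value
    'line=i' => \$lineNo,   # =i: the option takes an integer value
) or die "Error. Incorrect options\n";

print "file: $fName, line: $lineNo\n";
```

In a real script the usual call is `GetOptions(...)`, which reads `@ARGV` directly; `GetOptionsFromArray` is used here only to keep the sketch self-contained.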