Advanced Perl for Bioinformatics Lecture 5

Regular expressions - review

You can put the pattern you want to match between //, bind the pattern to the variable with =~, then use it within a conditional:

    if ($dna =~ /GAATTC/) { print "EcoRI\n"; }

Square brackets within the match expression allow for alternative characters:

    if ($dna =~ /CAG[AT]CAG/)

A vertical bar means "or"; it lets you look for either of two completely different patterns:

    if ($dna =~ /GAAT|ATTC/)
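
The three kinds of pattern reviewed above can be tried together in one small script. This is only a sketch: the sequence is made up, the sub name matching_patterns is hypothetical, and GAATTC is used as the EcoRI recognition site.

```perl
use strict;
use warnings;

# Made-up example sequence, for illustration only.
my $dna = "TTGACAGTCAGGAATTCAA";

# Return a label for each kind of pattern that matches the sequence.
sub matching_patterns {
    my ($seq) = @_;
    my @found;
    push @found, "literal"     if $seq =~ /GAATTC/;      # one exact string
    push @found, "class"       if $seq =~ /CAG[AT]CAG/;  # A or T in the middle
    push @found, "alternation" if $seq =~ /GAAT|ATTC/;   # either of two patterns
    return @found;
}

print join(", ", matching_patterns($dna)), "\n";
```

Putting the tests in a sub means the same checks can be run on any sequence, not just $dna.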

Reading and writing files, review

Open a file for reading:

    open INPUT, "/home/class30/input.txt";

Or writing:

    open OUTPUT, ">/home/class30/output.txt";

Make sure you can open it!

    open INPUT, "input.txt" or die "Can't open file\n";
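
A round trip (write a file, then read it back) might look like the sketch below. It departs from the slide in two ways, both assumptions of this example: it uses the three-argument form of open with lexical filehandles ($out, $in), and it adds $! to the error message so Perl reports why the open failed. The file goes into a temporary directory so the script runs anywhere; the file name is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary directory, removed automatically when the script ends.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/output.txt";    # hypothetical file name

# Open for writing; die with the system error message on failure.
open(my $out, '>', $path) or die "Can't open $path for writing: $!\n";
print $out "ATGCGT\n", "GGATCC\n";
close($out) or die "Can't close $path: $!\n";

# Open the same file for reading and slurp all lines into an array.
open(my $in, '<', $path) or die "Can't open $path for reading: $!\n";
my @lines = <$in>;
close($in);
chomp @lines;    # strip the trailing newline from every line

print "Read ", scalar(@lines), " lines\n";
```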

Test time Last one…

Hashes

Perl has another super useful data structure called a hash, for want of a better name. A hash is an associative array, i.e. a collection of values, each stored under (and looked up by) its own unique key.

Making a hash of it

You can think of a hash just as if it were a set of questions and answers:

    my %matthash = (
        "first_name" => "Matt",
        "surname"    => "Hudson",
        "age"        => "secret",
        "height"     => 187,    # cm
        "hairstyle"  => "D minus",
    );

Making a hash of it

Pseudocode: Create an associative array where these keys are associated with these values:

    Key          Value
    first_name   Matt
    surname      Hudson
    age          secret
    height       187 (note in text: cm)
    hairstyle    D minus

Getting the hash back

    my %matthash = (
        "first_name" => "Matt",
        "surname"    => "Hudson",
        "age"        => "secret",
        "height"     => 187,    # cm
        "hairstyle"  => "D minus",
    );
    print "my name is ", $matthash{first_name};
    print " ", $matthash{surname}, "\n";

You can store a lot of information and recover it easily and quickly without knowing in what order you added it, unlike an array.

Getting the hash back

Pseudocode:
    Output text "my name is "
    Then value for key "first_name" in matthash
    Then a space
    Then value for key "surname" in matthash
    Then newline character

Hashes as an array

You can get the "keys" of the hash and use them like an array:

    foreach my $info (keys %matthash) {
        print "$info = $matthash{$info}\n";
    }

Note that keys returns the keys in no particular order.
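
A few more everyday hash operations, sketched with a shortened version of the hash from the slides: adding a pair after the hash is created, testing for a key with exists, and sorting the keys so the output order is repeatable. The variable names $has_age and @report are made up for this example.

```perl
use strict;
use warnings;

my %matthash = (
    first_name => "Matt",
    surname    => "Hudson",
    height     => 187,    # cm
);

# Add a new key/value pair after the hash has been created.
$matthash{hairstyle} = "D minus";

# exists tests for a key without creating it.
my $has_age = exists($matthash{age}) ? "yes" : "no";

# Sort the keys so the lines always come out in the same order.
my @report = map { "$_ = $matthash{$_}" } sort keys %matthash;

print "$_\n" for @report;
print "age stored? $has_age\n";
```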

Why are hashes useful? Exercise.

Many of you might have noticed in the exercise on restriction sites that there was no way to keep track of which sites were which using arrays. Modify your script using a hash like this one:

    my %enzymehash = (
        "EcoRI"   => "GAATTC",
        "BamHI"   => "GGATCC",
        "HindIII" => "AAGCTT",
    );

(an) answer

    foreach my $name (keys %enzymehash) {
        if ($sequence =~ /$enzymehash{$name}/) {
            print "I found a site for $name, $enzymehash{$name}\n";
        }
    }

Pseudocode:
    For every key in the hash %enzymehash
        If the sequence in $sequence contains the value for that key:
            print "I found a site for (key), (value in %enzymehash for key)"
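
One way to take the answer a step further is to record where each site occurs, not just whether it occurs. The sketch below is an extension, not part of the original exercise: the sub name find_sites and the test sequence are made up, it passes the hash by reference, and it uses GAATTC as the EcoRI site. $-[0] is Perl's built-in offset of the start of the last match.

```perl
use strict;
use warnings;

my %enzymehash = (
    EcoRI   => "GAATTC",
    BamHI   => "GGATCC",
    HindIII => "AAGCTT",
);

# Return a hash: enzyme name => reference to a list of 1-based site positions.
sub find_sites {
    my ($sequence, $enzymes) = @_;
    my %sites;
    foreach my $name (sort keys %$enzymes) {
        my $site = $enzymes->{$name};
        # /g in a while loop walks through every match in turn.
        while ($sequence =~ /\Q$site\E/g) {
            push @{ $sites{$name} }, $-[0] + 1;    # $-[0] is 0-based
        }
    }
    return %sites;
}

my $sequence = "AAGGATCCTTGAATTCAAGGATCC";    # made-up test sequence
my %found = find_sites($sequence, \%enzymehash);

foreach my $name (sort keys %found) {
    print "I found a site for $name at: @{ $found{$name} }\n";
}
```

Enzymes with no site in the sequence simply never get a key in %found, so exists can be used to ask whether an enzyme cuts at all.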

Putting data in a hash

    my %hash;
    while (<FILE>) {
        # only store something if the line actually matched
        if (/stuff(important stuff) more stuff (best stuff)/) {
            $hash{$1} = $2;
        }
    }

Or....

    while (my $line = <FILE>) {
        chomp $line;
        my @tmp = split /\t/, $line;
        $hash{$tmp[0]} = $tmp[1];
    }

Pseudocode:
    Create an empty hash %hash
    For every line in the file FILE:
        if the line matches the regex: stuff(important stuff) more stuff (best stuff)
        then store (important stuff) as a hash key and (best stuff) as a value for that key
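
The tab-delimited version can be sketched as a complete script. To keep it runnable without a real input file, the example reads from a string through an in-memory filehandle; the records in $text are made up, and skipping blank or malformed lines is a choice of this sketch rather than something the slide specifies.

```perl
use strict;
use warnings;

# Made-up tab-delimited records: name, then description.
my $text = "geneA\tkinase\n\ngeneB\ttransporter\nmalformed line\n";

# Open the string as if it were a file.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";

my %hash;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines

    # Split into at most two fields: key and value.
    my ($key, $value) = split /\t/, $line, 2;
    unless (defined $value) {
        warn "Skipping line without a tab: $line\n";
        next;
    }
    $hash{$key} = $value;
}
close($fh);

print "$_ => $hash{$_}\n" for sort keys %hash;
```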

Advanced regex

The fun isn't over yet. You can match precise numbers of characters, any number of characters, positions in a line, and precise formatting (spaces, tabs, etc.). You can get bits of the string you matched out and store them in variables. You can use regexes to substitute or to translate.
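
Each of these features can be shown in a line or two. In the sketch below the sequence and variable names are made up: {6} asks for a precise number of characters, ^ and $ anchor the pattern to the start and end of the string, s/// substitutes, and tr/// translates character by character.

```perl
use strict;
use warnings;

my $seq = "ATGAAAAAAGGCTAG";    # made-up sequence

my $has_polyA  = $seq =~ /A{6}/           ? 1 : 0;    # six As in a row
my $starts_atg = $seq =~ /^ATG/           ? 1 : 0;    # ^ anchors to the start
my $ends_stop  = $seq =~ /(TAA|TAG|TGA)$/ ? 1 : 0;    # $ anchors to the end

# Substitute: copy the sequence, then replace every T with U.
(my $rna = $seq) =~ s/T/U/g;

# Translate: reverse the sequence, then swap each base for its complement.
(my $revcomp = scalar reverse $seq) =~ tr/ACGT/TGCA/;

print "polyA: $has_polyA, start: $starts_atg, stop: $ends_stop\n";
print "RNA:     $rna\n";
print "revcomp: $revcomp\n";
```

The (my $copy = $original) =~ ... idiom changes the copy and leaves the original untouched.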

Grabbing bits of the regex

The fun isn't over yet.

    my $blastline = "Query= AT1g34399 gene CDS";
    $blastline =~ /Query= (.+) gene/;
    my $atgnumber = $1;
    print "The accession number is $atgnumber\n";

You can store the contents of the bit within brackets (parentheses), within the regex, as the special variable $1. Then use it for other stuff. If you put another pair of brackets in, its contents will be stored in $2.
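
$1 and $2 are only meaningful after a successful match, so it is safer to capture inside a conditional. The sketch below uses the same $blastline with a second pair of brackets; the variable name $feature and the slightly stricter pattern (\S+ for a run of non-space characters, \s+ for whitespace) are assumptions of this example.

```perl
use strict;
use warnings;

my $blastline = "Query= AT1g34399 gene CDS";

my ($atgnumber, $feature);
if ($blastline =~ /^Query=\s+(\S+)\s+gene\s+(\S+)/) {
    # First brackets go to $1, second brackets go to $2.
    ($atgnumber, $feature) = ($1, $2);
    print "The accession number is $atgnumber ($feature)\n";
} else {
    print "No match, so \$1 and \$2 are not safe to use\n";
}
```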

Using modules

You can use other people's modules, including those that come with Perl. These provide extra commands, or change the way your Perl script behaves. E.g.

    use strict;
    use warnings;
    use Bio::Perl;

You will see these stacked up at the beginning of more complicated Perl scripts. Some modules come with Perl (strict, warnings; see man perlmod); others you need to download and install yourself.
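
As a small illustration of a module providing extra commands, the sketch below uses List::Util, which ships with Perl, instead of Bio::Perl, which has to be installed separately. The fragment lengths are made up.

```perl
use strict;
use warnings;
use List::Util qw(sum max);    # import just the two functions we need

my @lengths = (1200, 450, 987);    # made-up fragment lengths in bp

my $total   = sum(@lengths);    # adds up every element
my $longest = max(@lengths);    # largest element

print "Total: $total bp, longest fragment: $longest bp\n";
```

Without the use List::Util line, sum and max would not exist and the script would fail.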

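Using strict

We have talked about using "my" the first time you use a variable. I recommend you always have

    use strict;

at the top of your script. That way, if you mistype a variable name, you will know: the mistyped name was never declared with my, so Perl stops with an error instead of quietly creating a new, empty variable.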
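
What strict does with a typo can be seen in the sketch below. A script containing the typo would normally just refuse to run; here the faulty code is kept in a string and compiled with eval only so the example itself stays runnable and can print the error. The misspelled name $sequnce is deliberate.

```perl
use strict;
use warnings;

# Faulty code kept in a single-quoted string: $sequnce is a typo for $sequence.
my $code = 'use strict; my $sequence = "ATG"; $sequnce = "TAG"; 1;';

# eval compiles and runs the string; on failure it returns undef
# and puts the error message in $@.
my $ok    = eval $code;
my $error = $@;

if ($ok) {
    print "No problem found\n";
} else {
    print "strict caught the typo: $error";
}
```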

A last exercise?...

Open up a BLAST output file. Spit out the name of the query sequence, the top hit, and how many hits there were. So: how might hashes help you solve this?

Programming projects

Now it's time to think of your programming projects. Hopefully you have an idea; we'll discuss how feasible they are in the time available. If not, here are some suggestions.

Suggested program functions

Translate a cDNA into protein, and then check it against the Pfam database for HMM hits.
Make a real restriction map of a DNA sequence, with predicted fragment sizes.
Align proteins of a favorite family, open the alignment and find residues that are totally conserved.
Perform BLAST against the latest version of the database files for a particular organism; the program checks whether the user has the latest files and, if not, downloads them.
Design PCR primers, to make a fragment size chosen by the user, for a sequence input from a FASTA file.
Check whether primer sites are unique in a sequenced, or partially sequenced, genome, and give an "electronic PCR" result.
Output an XML formatted version of a BLAST or HMMER text file.
Analyze codon usage in a protein coding DNA sequence and calculate the Ka/Ks ratio.