

Advanced Perl for Bioinformatics Lecture 5

Regular expressions - review
You can put the pattern you want to match between //, bind the pattern to the variable with =~, then use it within a conditional:
    if ($dna =~ /GAATTC/) { print "EcoRI\n"; }
Square brackets within the match expression allow for alternative characters:
    if ($dna =~ /CAG[AT]CAG/)
A vertical bar means “or”; it allows you to look for either of two completely different patterns:
    if ($dna =~ /GAAT|ATTC/)
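The three review patterns can be tried together in one small script. This is only a sketch: the sub name matching_patterns and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Return the labels of the review patterns that match a DNA string.
# (matching_patterns is a hypothetical helper, not part of the lecture.)
sub matching_patterns {
    my ($dna) = @_;
    my @found;
    push @found, 'literal'     if $dna =~ /GAATTC/;      # exact string
    push @found, 'char class'  if $dna =~ /CAG[AT]CAG/;  # A or T in the middle
    push @found, 'alternation' if $dna =~ /GAAT|ATTC/;   # either pattern
    return @found;
}

my $dna = 'TTCAGTCAGGG';    # made-up test sequence
print join(', ', matching_patterns($dna)), "\n";
```

Only the character-class pattern matches this sequence, because it contains CAGTCAG but neither GAAT nor ATTC.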

Reading and writing files, review
Open a file for reading:
    open INPUT, "/home/class30/input.txt";
Or writing:
    open OUTPUT, ">/home/class30/output.txt";
Make sure you can open it!
    open INPUT, "input.txt" or die "Can't open file\n";
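A round trip (write a file, then read it back) might look like the sketch below. It swaps in the three-argument form of open with lexical filehandles, which is a common modern alternative to the bareword style on the slide, and it uses a scratch file from File::Temp so it does not depend on any particular directory. The file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file (an assumption for this sketch); removed when the script exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write two lines; '>' creates the file or truncates an existing one.
open(my $out, '>', $path) or die "Can't open $path for writing: $!\n";
print {$out} "ATGGAATTC\n", "GGATCC\n";
close $out or die "Can't close $path: $!\n";

# Read the lines back; chomp removes the trailing newline from each.
open(my $in, '<', $path) or die "Can't open $path for reading: $!\n";
my @lines;
while (my $line = <$in>) {
    chomp $line;
    push @lines, $line;
}
close $in;

print scalar(@lines), " lines read\n";
```

Including $! in the die message reports why the open failed (missing file, no permission, and so on).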

Test time Last one…

Hashes
Perl has another super useful data structure called a hash, for want of a better name. A hash is an associative array, i.e. a collection of values in which each value is stored under, and looked up by, its own unique key rather than a numbered position.

Making a hash of it
You can think of a hash just as if it were a set of questions and answers:
    my %matthash = (
        "first_name" => "Matt",
        "surname"    => "Hudson",
        "age"        => "secret",
        "height"     => 187,    # cm
        "hairstyle"  => "D minus"
    );

Getting the hash back
    my %matthash = (
        "first_name" => "Matt",
        "surname"    => "Hudson",
        "age"        => "secret",
        "height"     => 187,    # cm
        "hairstyle"  => "D minus"
    );
    print "my name is ", $matthash{first_name};
    print " ", $matthash{surname}, "\n";
You can store a lot of information and recover it easily and quickly without knowing in what order you added it, unlike an array.

Hashes as an array
You can get the “keys” of the hash and use them like an array:
    foreach my $info (keys %matthash) {
        print "$info = $matthash{$info}\n";
    }
The keys come back in no particular order; use sort keys %matthash if the order matters.
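The basic hash operations (store, look up, test for a key, delete, loop over keys) can be sketched in a few lines. The hash %person and the key "office" are made up for this example.

```perl
use strict;
use warnings;

my %person = (first_name => 'Matt', surname => 'Hudson', height => 187);

$person{office} = 'unknown';     # add a new key/value pair
delete $person{office};          # and remove it again

print "has a surname\n" if exists $person{surname};

# keys returns the keys in no particular order, so sort them
# when you want repeatable output.
my @report;
foreach my $info (sort keys %person) {
    push @report, "$info = $person{$info}";
}
print "$_\n" for @report;
```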

Why are hashes useful? Exercise.
Many of you might have noticed, in the exercise on restriction sites, that there was no way to keep track of which sites were which using arrays. Modify your script using a hash like this one:
    my %enzymehash = (
        "EcoRI"   => "GAATTC",
        "BamHI"   => "GGATCC",
        "HindIII" => "AAGCTT"
    );

(an) answer
    foreach my $name (keys %enzymehash) {
        if ($sequence =~ /$enzymehash{$name}/) {
            print "I found a site for $name, $enzymehash{$name}\n";
        }
    }
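One possible extension of this answer reports where each site occurs, not just whether it occurs. The sketch below goes a little beyond the slides: it uses the /g modifier to find every occurrence, the built-in @- array for the match offset, and array references to hold several positions per enzyme. The sub name find_sites and the test sequence are made up; the hash uses GAATTC, EcoRI's recognition site.

```perl
use strict;
use warnings;

my %enzymehash = (
    EcoRI   => 'GAATTC',
    BamHI   => 'GGATCC',
    HindIII => 'AAGCTT',
);

# Return a hash of enzyme name => array ref of 1-based site start positions.
# Enzymes with no site in the sequence do not appear in the result.
sub find_sites {
    my ($sequence, $enzymes) = @_;
    my %hits;
    foreach my $name (sort keys %$enzymes) {
        my $site = $enzymes->{$name};
        while ($sequence =~ /\Q$site\E/g) {
            push @{ $hits{$name} }, $-[0] + 1;   # $-[0] is the 0-based match start
        }
    }
    return %hits;
}

my $sequence = 'AAGAATTCTTGGATCCAAGAATTC';   # made-up test sequence
my %hits = find_sites($sequence, \%enzymehash);
foreach my $name (sort keys %hits) {
    print "$name: @{ $hits{$name} }\n";
}
```

Knowing the positions is the first step towards a restriction map, since fragment sizes follow from the distances between neighbouring sites.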

Putting data in a hash
    my %hash;
    while (<INPUT>) {    # INPUT: a filehandle opened as on the earlier slide
        /stuff(important stuff) more stuff (best stuff)/;
        $hash{$1} = $2;
    }
Or….
    while (my $line = <INPUT>) {
        chomp $line;
        my @tmp = split /\t/, $line;
        $hash{$tmp[0]} = $tmp[1];
    }
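The split-based version can be tried without a real file by opening a filehandle on a string. In this sketch the gene names and lengths are invented, and %length_of is a hypothetical hash name.

```perl
use strict;
use warnings;

# Made-up two-column, tab-separated data standing in for an input file.
my $data = "geneA\t1200\ngeneB\t450\ngeneC\t987\n";

# Opening a reference to a string gives an in-memory filehandle.
open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";

my %length_of;
while (my $line = <$in>) {
    chomp $line;                     # drop the newline so it is not stored in the value
    next if $line eq '';             # skip blank lines
    my ($name, $length) = split /\t/, $line;
    $length_of{$name} = $length;     # first column is the key, second the value
}
close $in;

print "$_ => $length_of{$_}\n" for sort keys %length_of;
```

If the same key appears on two lines, the later line silently overwrites the earlier one, which is worth remembering when the first column is not guaranteed to be unique.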

Advanced regex
The fun isn’t over yet. You can match:
- precise numbers of characters
- any number of characters
- positions in a line
- precise formatting (spaces, tabs etc)
You can get bits of the string you matched out and store them in variables.
You can use regexes to substitute or to translate.
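Each of these features can be seen on a single made-up line of input. The sketch below is an illustration only; the variable names and the sequence are invented.

```perl
use strict;
use warnings;

my $line = "  seq1\tATGAAAAAAGGCTAG  ";    # made-up line with stray spaces

# Substitute: strip leading and trailing whitespace from a copy of the line.
(my $clean = $line) =~ s/^\s+|\s+$//g;

# Positions and formatting: ^ anchors to the start, $ to the end,
# \S+ is any run of non-space characters, \t is a tab.
my ($id, $dna) = $clean =~ /^(\S+)\t([ACGT]+)$/;

# Precise numbers of characters: {6,} means six or more in a row.
my $has_polyA = $dna =~ /A{6,}/ ? 1 : 0;

# Translate: tr/// maps characters one for one (DNA to RNA here).
(my $rna = $dna) =~ tr/T/U/;

# tr/// with an empty replacement list just counts the matching characters.
my $gc = ($dna =~ tr/GC//);

print "$id: polyA=$has_polyA gc=$gc rna=$rna\n";
```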

Grabbing bits of the regex
    my $blastline = "Query= AT1g34399 gene CDS";
    $blastline =~ /Query= (.+) gene/;
    my $atgnumber = $1;
    print "The accession number is $atgnumber\n";
Whatever matches the part of the regex inside round brackets is stored in the special variable $1, which you can then use for other stuff. If you put another pair of brackets in, its contents will be stored in $2.
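$1 and $2 are only meaningful after a successful match, so it is safer to test the match before using them. A minimal sketch, assuming a hypothetical helper called parse_query_line and a second bracket pair for the word after "gene":

```perl
use strict;
use warnings;

# Pull the identifier and the feature type out of a "Query=" line.
# Returns an empty list when the line does not match.
sub parse_query_line {
    my ($blastline) = @_;
    if ($blastline =~ /^Query=\s+(\S+)\s+gene\s+(\S+)/) {
        return ($1, $2);    # $1 = first bracket pair, $2 = second
    }
    return;
}

my ($atgnumber, $type) = parse_query_line('Query= AT1g34399 gene CDS');
print "The accession number is $atgnumber ($type)\n" if defined $atgnumber;
```

Using \S+ instead of .+ keeps the capture from running past the first word if "gene" happens to appear more than once on the line.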

Using modules
You can use other people’s modules, including those that come with Perl. These provide extra commands, or change the way your Perl script behaves. E.g.
    use strict;
    use warnings;
    use Bio::Perl;
You will see these stacked up at the beginning of more complicated Perl scripts. Some modules come with Perl (strict, warnings; see man perlmod); others you need to download and add in yourself.
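As a small illustration of a module that provides extra commands, the sketch below uses List::Util, which ships with Perl, rather than Bio::Perl, which has to be installed separately. The fragment sizes are made-up numbers.

```perl
use strict;      # refuse undeclared variables
use warnings;    # warn about common mistakes
use List::Util qw(sum max);    # import only the sum and max functions

my @fragment_sizes = (1200, 450, 987);    # made-up numbers
my $total   = sum(@fragment_sizes);
my $largest = max(@fragment_sizes);

print "total $total, largest $largest\n";
```

Listing the function names after the module name keeps it clear where each extra command comes from.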

A last exercise?...
So: how might hashes help you solve this? Open up a BLAST output file. Spit out the name of the query sequence, the top hit, and how many hits there were.

Programming projects
Now it’s time to think of your programming projects. Hopefully you have an idea; we’ll discuss how feasible it is in the time available. If not, here are some suggestions.

Suggested program functions
- Translate a cDNA into protein, and then check it against the Pfam database for HMM hits.
- Make a real restriction map of a DNA sequence, with predicted fragment sizes.
- Align proteins of a favorite family, open the alignment and find residues that are totally conserved.
- Perform BLAST against the latest version of the database files for a particular organism, checking whether the user has the latest files and downloading them if not.
- Design PCR primers, to make a fragment size chosen by the user, for a sequence input from a FASTA file.
- Check whether primer sites are unique in a sequenced, or partially sequenced, genome, and give an “electronic PCR” result.
- Output an XML formatted version of a BLAST or HMMER text file.
- Analyze codon usage in a protein coding DNA sequence and calculate the Ka/Ks ratio.