BINF 634 Fall 2013, Lecture 8: Modules and Maps
Thanks to John Grefenstette for many of these slides!


Topics

- Midterm Discussions
- Program 2 Discussions
- Perl Modules
- Module Getopt::Std
- Range Operator
- Restriction Maps
- Parsing Rebase File
- Quiz 4

Midterm Discussions

Here are my solutions.

Program Discussions

Here are my solutions.

Perl Modules - I

- You can put useful subroutines into modules for future use
- Only write and debug subroutines once
- Creating a module:
    - Put subroutines into a file with extension .pm, e.g. MySubs.pm
    - Include "1;" as the last line in the file, so that loading the file returns a true value

Perl Modules - II

- Using a module:
    - Include the statement: use MySubs;
    - Note: leave off the .pm from the module name in the use statement
    - Then use any subroutine defined in MySubs.pm
- Modules are included at compile time
- Example: BeginPerlBioinfo.pm contains subroutines from the Tisdall textbook (see web page)

Perl Modules: Where does Perl find the modules?

- Perl looks in the list of directories in the built-in array @INC
- @INC includes paths to standard modules such as strict.pm; in Perl versions before 5.26 it also automatically includes the current directory
- You can prepend directories by using the -I switch:

    % perl -I"/mypath/libdir" prog.pl

- You can also tell Perl where to look as follows:

    use lib "/mypath";
    use MyFASTASubs;     # file is: "/mypath/MyFASTASubs.pm"
    use MyRandomSubs;
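To see the search path in practice, here is a minimal sketch that prepends a directory with use lib and prints the resulting list. The directory "/mypath" is the placeholder from the slide, and search_path is a hypothetical helper written only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "/mypath" is the placeholder directory from the slide, not a real location.
# use lib runs at compile time and puts the directory at the front of @INC.
use lib "/mypath";

# Hypothetical helper: return the module search path as a list.
sub search_path {
    return @INC;
}

# Print one directory per line, in the order Perl will search them.
print "$_\n" for search_path();
```

The first line printed should be the directory given to use lib, followed by the standard library directories of the local Perl installation.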

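% cat RAND.pm

    #
    # subroutines for using random numbers
    #

    # Return a random real number in [$lower, $upper)
    sub random_uniform {
        my ($lower, $upper) = @_;
        return $lower + rand($upper - $lower);
    }

    # Return a random integer between $lower and $upper, inclusive
    sub random_int {
        my ($lower, $upper) = @_;
        return int($lower + rand($upper - $lower + 1));
    }

    # Return a shuffled copy of the argument list
    sub shuffle_array {
        my @a = @_;
        my $length = @a;
        for (my $i = 0; $i < $length; $i++) {
            my $j = random_int($i, $length-1);
            ($a[$i], $a[$j]) = ($a[$j], $a[$i]);
        }
        return @a;
    }

    # Return a string containing the characters of $s in random order
    sub shuffle_string {
        my ($s) = @_;
        my @a = split "", $s;
        @a = shuffle_array(@a);
        return join "", @a;
    }

    1;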

    #!/usr/bin/perl
    use strict;
    use warnings;
    # file: test_myrand.pl
    use lib 'C:\Documents and Settings\Owner\workspace\binf634_book_examples';
    use RAND;

    my $dna = "AAAAAAAATTTTTTTTTTGGGGGGGGCCCCCCCCC";
    print "$dna\n";
    $dna = shuffle_string($dna);
    print "$dna\n";

Two sample runs:

    % test_myrand.pl
    AAAAAAAATTTTTTTTTTGGGGGGGGCCCCCCCCC
    GCTACGCAGAGGTTAGTCATCTCGACATACTGCTT

    % test_myrand.pl
    AAAAAAAATTTTTTTTTTGGGGGGGGCCCCCCCCC
    TCTTCGGAACCTACGACGTTCTATACAATGCTGGG

use Getopt::Std;

Suppose prog.pl contains the statements:

    use Getopt::Std;
    my %opts = ();              # a hash to hold the options
    getopts('bnf:', \%opts);    # -b and -n are boolean flags
                                # -f takes an argument (indicated by ":")

Each hash key is a switch name; its value is the argument given to that switch, or 1 if the switch takes no argument.

Each of the following command lines sets %opts = (n => 1, f => "infile"):

    % prog.pl -f infile -n
    % prog.pl -nf infile
    % prog.pl -n -f infile
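The equivalence of the three command lines can be checked without a shell. The sketch below feeds each word list to getopts through a localized @ARGV; parse_args is a hypothetical helper written for this illustration and is not part of Getopt::Std.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse a list of command-line words the way
# prog.pl would, and return the resulting options hash.
sub parse_args {
    my @words = @_;
    local @ARGV = @words;       # getopts() reads and consumes @ARGV
    my %opts = ();
    getopts('bnf:', \%opts) or die "bad options\n";
    return %opts;
}

# The three command lines from the slide, as strings of arguments.
for my $cmd ("-f infile -n", "-nf infile", "-n -f infile") {
    my %opts = parse_args(split " ", $cmd);
    my $summary = join ", ", map { "$_=$opts{$_}" } sort keys %opts;
    print "$cmd => $summary\n";
}
```

Each line of output should show the same two entries, f=infile and n=1, and no entry for the unused -b flag.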

    #!/usr/bin/perl
    use strict;
    # file: opt.pl
    use Getopt::Std;

    my %options = ();    # empty hash of options

    # "d:" means d takes an argument
    getopts("od:fF", \%options);

    print "-o $options{o}\n" if defined $options{o};
    print "-d $options{d}\n" if defined $options{d};
    print "-f $options{f}\n" if defined $options{f};
    print "-F $options{F}\n" if defined $options{F};

    # Whatever getopts did not consume is left in @ARGV
    print "Unprocessed by Getopt::Std:\n" if @ARGV;
    foreach (@ARGV) {
        print "$_\n";
    }

Sample run:

    % opt.pl -Fd5 -o 123 foo
    -o 1
    -d 5
    -F 1
    Unprocessed by Getopt::Std:
    123
    foo

Note: processing of options stopped when it saw 123.

Restriction Maps (Tisdall Ch. 9)

- Restriction enzymes are "chemical scissors" that cut DNA at specific places specified by short sequence patterns
- HindII binds and cuts at GTY^RAC
    - Y = C or T (pyrimidines)
    - R = A or G (purines)
    - ^ indicates the cleavage site
- A restriction map of a DNA sequence shows all the positions at which a given restriction enzyme will cut
- Several hundred restriction enzymes are known
- Database: REBASE

Excerpt from the REBASE file "bionet":

    REBASE version 104                                              bionet.104
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    REBASE, The Restriction Enzyme Database
    Copyright (c) Dr. Richard J. Roberts, All rights reserved.
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Rich Roberts Mar

    AaaI (XmaIII)      C^GGCCG
    AacI (BamHI)       GGATCC
    AaeI (BamHI)       GGATCC
    AagI (ClaI)        AT^CGAT
    AaqI (ApaLI)       GTGCAC
    AarI               CACCTGCNNNN^
    AarI               ^NNNNNNNNGCAGGTG
    AatI (StuI)        AGG^CCT
    AatII              GACGT^C
    AauI (Bsp1407I)    T^GTACA
    AbaI (BclI)        T^GATCA
    AbeI (BbvCI)       CC^TCAGC
    AbeI (BbvCI)       GC^TGAGG
    AbrI (XhoI)        C^TCGAG
    AcaI (AsuII)       TTCGAA
    AcaII (BamHI)      GGATCC
    AcaIII (MstI)      TGCGCA
    AcaIV (HaeIII)     GGCC
    AccI               GT^MKAC

The IUB ambiguity codes:

    R = G or A
    Y = C or T
    M = A or C
    K = G or T
    S = G or C
    W = A or T
    B = not A (C or G or T)
    D = not C (A or G or T)
    H = not G (A or C or T)
    V = not T (A or C or G)
    N = A or C or G or T

Restriction Maps

Problem: Given a DNA sequence and the name of a restriction enzyme in REBASE, find all locations of the restriction sites on the DNA.

Pseudocode:

    read in rebase data from file "bionet"
    create hash with key = enzyme name and value = regular expression
    for each user query (in the form of an enzyme name) {
        if query is defined in the hash
            get positions that match regular expr in DNA
            report positions, if any
    }

Range Operator

    start .. end

- In list context, returns the list of values from start to end, which can be used to index an array slice:

    print "@array[10 .. 20]\n";    # print items 10 thru 20

- In scalar context with constant operands, returns true if $. (the current input line number) is between the start and end values:

    # print out first 20 lines of file:
    while (<>) {
        print if (1 .. 20);
    }

    # print out up to first line containing "foo":
    while (<>) {
        print if (1 .. /foo/);
    }
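The scalar form of the range operator can be tried without a file on disk. The sketch below reads from an in-memory filehandle and skips a header that runs from line 1 through the first line matching /Rich Roberts/, the same test used later when parsing REBASE. The helper name body_lines and the sample text are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the non-blank lines of $text that follow
# the header, where the header is line 1 through the first line that
# matches /Rich Roberts/.
sub body_lines {
    my ($text) = @_;
    my @kept = ();
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    while (<$fh>) {
        next if (1 .. /Rich Roberts/);    # true from line 1 to the match
        next if /^\s*$/;                  # skip blank lines
        chomp;
        push @kept, $_;
    }
    close $fh;
    return @kept;
}

# Made-up sample input in the shape of the REBASE excerpt.
my $sample = "REBASE version 104\nRich Roberts\n\nAatII GACGT^C\nAccI GT^MKAC\n";
print "$_\n" for body_lines($sample);
```

Only a literal number is compared against $. in this form. If the bound is held in a variable, the comparison has to be written out, for example ($. == 1 .. $. == $n).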

Finding matches with regular expressions

Reminder: If we use the global modifier g, then pos($string) returns the 0-based offset just past the end of the match, so pos($string)-1 is the offset of the last matched character:

    $string = "ATCGCATGGAA";
    $string =~ /T.G/g;
    print "$& ends at position ", pos($string)-1, "\n";
    $string =~ /T.G/g;
    print "$& ends at position ", pos($string)-1, "\n";

Output:

    TCG ends at position 3
    TGG ends at position 8

    # Example 9-2 Subroutine to parse a REBASE datafile
    # parseREBASE-Parse REBASE bionet file
    #
    # A subroutine to return a hash where
    #    key   = restriction enzyme name
    #    value = blank-separated recognition site and regular expression

    sub parseREBASE {

        my($rebasefile) = @_;

        use BeginPerlBioinfo;    # see Chapter 6 about this module

        # Declare variables
        my @rebasefile = ( );
        my %rebase_hash = ( );
        my $name;
        my $site;
        my $regexp;

        # Read in the REBASE file
        # note: incorrect on p. 191
        my $rebase_filehandle = open_file($rebasefile);

        while (<$rebase_filehandle>) {    # note: incorrect on p. 191

            # Discard header lines
            next if ( 1 .. /Rich Roberts/ );

            # Discard blank lines
            next if /^\s*$/;

            # Split the fields
            my @fields = split( " ", $_);

            # Get and store the name and the recognition site
            # grab just the first and last fields
            $name = shift @fields;
            $site = pop @fields;

            # Translate the recognition sites to regular expressions
            $regexp = IUB_to_regexp($site);

            # Store the data into the hash
            $rebase_hash{$name} = "$site $regexp";
        }

        # Return the hash containing the reformatted REBASE data
        return %rebase_hash;
    }

    #############################################################
    # Subroutine IUB_to_regexp
    #
    # Translate IUB ambiguity codes to regular expressions
    #
    # The IUB ambiguity codes
    #    R = G or A    Y = C or T    M = A or C    K = G or T
    #    S = G or C    W = A or T
    #    B = not A (C or G or T)    D = not C (A or G or T)
    #    H = not G (A or C or T)    V = not T (A or C or G)
    #    N = A or C or G or T

    sub IUB_to_regexp {

        my($iub) = @_;

        my $regular_expression = '';

        my %iub2character_class = (
            A => 'A',
            C => 'C',
            G => 'G',
            T => 'T',
            R => '[GA]',
            Y => '[CT]',
            M => '[AC]',
            K => '[GT]',
            S => '[GC]',
            W => '[AT]',
            B => '[CGT]',
            D => '[AGT]',
            H => '[ACT]',
            V => '[ACG]',
            N => '[ACGT]',
        );

        # Remove the ^ signs
        $iub =~ s/\^//g;

        # Translate the iub sequence
        for (my $i = 0; $i < length($iub); ++$i) {
            $regular_expression .= $iub2character_class{substr($iub,$i,1)};
        }

        return $regular_expression;
    }
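As a worked example of the translation, the sketch below applies the same lookup table to the HindII site GTY^RAC from the earlier slide. It is a hypothetical variant (iub_to_regexp, lower case), not the textbook subroutine: it upper-cases the input and dies on a character that is not in the table, which the original does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same IUB code table as in IUB_to_regexp above.
my %IUB = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[GA]', Y => '[CT]', M => '[AC]', K => '[GT]',
    S => '[GC]', W => '[AT]',
    B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical variant of IUB_to_regexp: translate a recognition site
# to a regular expression, failing loudly on an unknown code.
sub iub_to_regexp {
    my ($iub) = @_;
    $iub =~ s/\^//g;                        # remove the cut-site markers
    my $regexp = '';
    for my $code (split //, uc $iub) {
        die "unknown IUB code '$code'\n" unless exists $IUB{$code};
        $regexp .= $IUB{$code};
    }
    return $regexp;
}

my $hindII = iub_to_regexp("GTY^RAC");
print "$hindII\n";                          # prints: GT[CT][GA]AC
print "GTCAAC is a HindII site\n" if "GTCAAC" =~ /^$hindII$/;
```

Failing on an unknown code is a design choice for this sketch; with warnings enabled, the table lookup in the original would instead append an undefined value and emit a warning.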

Sample main program to test the parseREBASE subroutine:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %rebasehash = ();
    %rebasehash = parseREBASE("bionet");

    for my $key (sort keys %rebasehash) {
        my ($site, $regexp) = split " ", $rebasehash{$key};
        print "enzyme = $key site = $site regexp = $regexp\n";
    }

    sub parseREBASE { ... }

Output:

    enzyme = AaaI site = C^GGCCG regexp = CGGCCG
    enzyme = AacI site = GGATCC regexp = GGATCC
    enzyme = AaeI site = GGATCC regexp = GGATCC
    enzyme = AagI site = AT^CGAT regexp = ATCGAT
    enzyme = AaqI site = GTGCAC regexp = GTGCAC
    enzyme = AarI site = ^NNNNNNNNGCAGGTG regexp = [ACGT][ACGT][ACGT][ACGT][ACGT][ACGT][ACGT][ACGT]GCAGGTG


    #####################################################################
    # Subroutine match_positions
    # Find locations of a match of a regular expression in a string
    # Return an array of positions where the regular expression
    # appears in the string

    sub match_positions {

        my($regexp, $sequence) = @_;

        my @positions = ( );

        # Determine positions of regular expression matches
        while ( $sequence =~ /$regexp/ig ) {
            push @positions, pos($sequence) - length($&) + 1;
        }

        return @positions;
    }

    # sample main program:
    my $dna = "ATTGATAAGTCA";
    my @positions = match_positions("T[GC]A", $dna);
    print "@positions\n";
    exit;

Output:

    3 10

N.B.: pos returns the 0-based offset immediately after the match, so pos($sequence) - length($&) + 1 is the 1-based position of the start of the match.
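The same start positions can also be read from Perl's built-in @- array, whose first element holds the 0-based offset where the last successful match began. The sketch below is a hypothetical alternative (match_starts), not the textbook subroutine; it avoids $& and the length arithmetic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical alternative to match_positions: return the 1-based
# start position of each non-overlapping match of $regexp in $sequence.
sub match_starts {
    my ($regexp, $sequence) = @_;
    my @positions = ();
    while ($sequence =~ /$regexp/ig) {
        push @positions, $-[0] + 1;    # $-[0] is the 0-based match start
    }
    return @positions;
}

my @positions = match_starts("T[GC]A", "ATTGATAAGTCA");
print "@positions\n";                  # prints: 3 10
```

As with match_positions, the /g loop resumes after the end of each match, so overlapping sites are not reported.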


    #!/usr/bin/perl -w
    # Make restriction map from user queries on names of
    # restriction enzymes

    use strict;
    use warnings;
    use BeginPerlBioinfo;    # see Chapter 6 about this module

    # Declare and initialize variables
    my %rebase_hash = ( );
    my @file_data = ( );
    my $query = '';
    my $dna = '';
    my $recognition_site = '';
    my $regexp = '';
    my @locations = ( );

    # Read in the file "sample.dna"
    @file_data = get_file_data("sample.dna");

    # Extract the DNA sequence data from the file "sample.dna"
    $dna = extract_sequence_from_fasta_data(@file_data);

    # Get the REBASE data into a hash, from file "bionet"
    %rebase_hash = parseREBASE("bionet");

    # Prompt user for restriction enzyme names, create restriction map
    do {
        print "Search for what restriction site (or quit)?: ";

        $query = <STDIN>;
        chomp $query;

        # Exit if empty query
        if ($query =~ /^\s*$/ ) {
            exit;
        }

        # Perform the search in the DNA sequence
        if ( exists $rebase_hash{$query} ) {

            ($recognition_site, $regexp) = split " ", $rebase_hash{$query};

            # Create the restriction map
            @locations = match_positions($regexp, $dna);

            # Report the restriction map to the user
            if (@locations) {
                print "Searching for $query $recognition_site $regexp\n";
                print "A restriction site for $query at locations:\n";
                print join(" ", @locations), "\n";
            } else {
                print "A restriction enzyme $query is not in the DNA:\n";
            }
        }
        print "\n";
    } until ( $query =~ /quit/ );

    exit;

The Program in Action!!

    % rebase.pl
    Search for what restriction site (or quit)?: AceI
    Searching for AceI G^CWGC GC[AT]GC
    A restriction site for AceI at locations:

    Search for what restriction site (or quit)?: AceII
    A restriction enzyme AceII is not in the DNA:

    Search for what restriction site (or quit)?: Acc36I
    Searching for Acc36I ^NNNNNNNNGCAGGT [ACGT][ACGT][ACGT][ACGT][ACGT][ACGT][ACGT][ACGT]GCAGGT
    A restriction site for Acc36I at locations:
    166

    Search for what restriction site (or quit)?: quit
    %

Homework

Read Tisdall Chapter 9 and Appendix B.