More “What Perl can do”, with an introduction to BioPerl. Ian Donaldson, Biotechnology Centre of Oslo, MBV 3070.


2 Much of the material in this lecture is from the “Perl” lecture and lab developed for the Canadian Bioinformatics Workshops by Will Hsiao, Sohrab Shah and Sanja Rogic, and released under the Creative Commons license:

3 http://creativecommons.org/licenses/by-sa/2.5/

4 More “What Perl can do”: So far, we’ve had a very brief introduction to Perl. Next, we want to go a little deeper into:
–Use of “strict”
–Perl regular expressions
–Modules
–An introduction to object-oriented Perl and BioPerl

5 strict

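6 Effects of “use strict”: it requires you to declare variables (with my), and it catches possible typos in variable names by refusing to compile code that uses an undeclared variable.
Correct: my $DNA; $DNA = "ATCG";   or   my $DNA = "ATCG";
Incorrect: $DNA = "ATCG";   ($DNA was never declared)
No error: my $DNA = "ATCG"; $DNA =~ tr/ATCG/TAGC/;
Error: my $DNA = "ATCG"; $DAN =~ tr/ATCG/TAGC/;   (typo: $DAN was never declared)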
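You can see this for yourself with the minimal sketch below. It assumes any perl 5 with the standard strict pragma. Both versions of the tr/// line are compiled at run time with a string eval, so the strict error can be caught and printed instead of stopping the script. The variable names $correct, $compiled and $error are made up for this illustration.

```perl
use strict;
use warnings;

# Correct version: $DNA is declared, so this compiles and runs.
my $correct = eval q{
    use strict;
    my $DNA = "ATCG";
    $DNA =~ tr/ATCG/TAGC/;    # complement each base
    $DNA;                     # value returned by the eval
};
print "correct version gives: $correct\n";

# Typo version: $DAN was never declared, so strict refuses to compile it.
my $compiled = eval q{
    use strict;
    my $DNA = "ATCG";
    $DAN =~ tr/ATCG/TAGC/;
    1;
};
my $error = $@;               # the compile error message, if any
print "typo version failed: $error" unless $compiled;
```

In a normal script you do not need eval. Perl simply refuses to run and prints the same “Global symbol "$DAN" requires explicit package name” message.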

7 Why bother with “use strict”? It enforces some good programming rules, helps to prevent silly errors, makes troubleshooting your program easier, and becomes essential as your code becomes longer. We will use strict in all the code you see today and in your assignment. Bottom line: ALWAYS use strict.

8 Exercise 12: Write a program that has one function. Use a variable named “$some_variable” in this function and in the main body of the program. Prove that you can alter the value of $some_variable in the function without changing the value of $some_variable in the main body of the program. Try it yourself (15 minutes), then check the answer at the end of this lecture.

9 regular expressions

10 What is a Regular Expression? A REGEX provides pattern matching ability. It tells you whether a string contains a pattern or not (Note: it’s a yes or no question!). A regular expression looking for “dog” answers:
“I have a golden retriever” → “No” or “False”
“Yesterday I saw a big black dog” → “Yes” or “True”
“My dog ate my homework” → “Yes” or “True”
“Dog! Human’s best friend” → “No”, since REGEX is case sensitive
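Here is a minimal Perl sketch of this slide. It tests the four example sentences against m/dog/ and records “Yes” or “No” for each. The hash %contains_dog is just an illustrative name.

```perl
use strict;
use warnings;

# The four example sentences from the slide
my @sentences = (
    "I have a golden retriever",
    "Yesterday I saw a big black dog",
    "My dog ate my homework",
    "Dog! Human's best friend",
);

my %contains_dog;
for my $sentence (@sentences) {
    # m/dog/ is true if "dog" appears anywhere; matching is case sensitive
    my $answer = ($sentence =~ m/dog/) ? "Yes" : "No";
    $contains_dog{$sentence} = $answer;
    print "$answer: $sentence\n";
}
```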

11 Regular expressions are “regular”. Look at these yeast open reading frame names: YDR0001W, YDR4567C, YAL0045W, YBL0008C. While they are all different, they all follow a pattern (or regular expression):
1. Y means yeast
2. a letter between A and L represents a chromosome
3. an ‘R’ or ‘L’ refers to an arm of the chromosome
4. a four-digit number refers to an open reading frame
5. a ‘W’ or a ‘C’ refers to either the Watson or the Crick strand
You can write a regular expression to recognize ALL yeast open reading frame names.

12 Perl REGEX example:
    my $text = "The dog ate my homework";
    if ($text =~ m/dog/) {
        print "The text contains a dog\n";
    }
=~ is the binding operator and m// is the match operator. Together they ask: “does the string on the left contain the pattern on the right?” /dog/ is my pattern, or regular expression. The matching operation results in a true or false answer.

13 Regular Expressions in Perl: A pattern that matches only one string is not very useful! We need symbols to represent classes of characters. For example, say you wanted to recognize ‘Dog’ or ‘dog’ as instances of the same thing. REGEX is its own little language inside Perl:
–It has different syntax and symbols!
–Symbols you have used in Perl, such as $ . { } [ ], have totally different meanings in REGEX

14 REGEX Metacharacters: Metacharacters allow a pattern to match different strings.
–Wildcards are examples of metacharacters
–/.og/ will match “dog”, “log”, “tog”, “ og”, etc.
–So . means “any character”
–Perl REGEX has much more powerful metacharacters, used to represent classes of characters

15 Types of Metacharacters
. matches any one character, including a space, except a newline “\n”
[ ] denotes a selection of characters and matches ONE of the characters in the selection. What does [ATCG] match?
\t and \n match a tab and a newline respectively; \s matches any whitespace character (a space, a tab, a newline, etc.)
\w matches any “word” character, which for plain ASCII text is [a-zA-Z0-9_] (note the underscore)
\d matches [0-9]
\D matches anything except [0-9]
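The short sketch below tries a few of these metacharacters on made-up strings. It uses the //g modifier in list context, which returns every match; modifiers are covered a few slides later. All example strings are hypothetical.

```perl
use strict;
use warnings;

# [ATCG] matches ONE character that is A, T, C or G; with //g in list
# context every matching character is returned, so "-" and "N" are skipped.
my @bases = ("GATTACA-N" =~ m/[ATCG]/g);

# [Dd]og accepts both "Dog" and "dog" (but not "DOG")
my @dogs = grep { m/[Dd]og/ } ("Dog days", "hot dog", "DOG", "frog");

# \d matches one digit, \D one non-digit, \s one whitespace character
my @digits     = ("ORF 0045" =~ m/\d/g);
my @non_digits = ("A1B2" =~ m/\D/g);
my $has_space  = ("col1\tcol2" =~ m/\s/) ? 1 : 0;   # a tab is whitespace too

print "bases: @bases\n";
print "dog matches: @dogs\n";
print "digits: @digits\n";
print "non-digits: @non_digits\n";
print "tab counts as whitespace: $has_space\n";
```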

16 Using metacharacters to build a regular expression: YBL3456W → /Y[A-L][RL]\d\d\d\d[WC]/. Is this a good pattern for a yeast ORF name? What else does it match? What if the name only has 3 digits?
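One way to explore these questions is to run the slide's pattern over the names from slide 11 and a couple of made-up test strings (YBL345W and xxYBL3456Wxx are hypothetical). This is a sketch for experimenting, not a complete answer.

```perl
use strict;
use warnings;

my @candidates = (
    "YDR0001W", "YDR4567C", "YAL0045W", "YBL0008C",   # names from slide 11
    "YBL345W",        # only three digits
    "xxYBL3456Wxx",   # a valid-looking name buried inside other text
);

my %matches;
for my $name (@candidates) {
    # Y, chromosome A-L, arm R or L, four digits, strand W or C (no anchors)
    $matches{$name} = ($name =~ m/Y[A-L][RL]\d\d\d\d[WC]/) ? 1 : 0;
    printf "%-14s %s\n", $name, $matches{$name} ? "match" : "no match";
}
```

The buried name still matches because nothing in the pattern says where the name must start or end. The anchors on slide 20 address this.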

17 REGEX Quantifiers: What if you want to match a character more than once? What if you want to match an mRNA with a polyA tail of 5 to 12 A’s? “ATG……AAAAAAAAAAA”

18 REGEX Quantifiers
+ matches one or more copies of the previous character (or character class, such as [ATCG])
* matches zero or more copies of the previous character
? matches zero or one copy of the previous character
{min,max} matches a number of copies within the specified range
“ATG……AAAAAAAAAAA” → /ATG[ATCG]+A{5,12}/
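Below is a minimal sketch of the quantifiers on invented sequences. The poly-A strings are built with the x repetition operator ("A" x 11 is eleven A's), so the counts are exact.

```perl
use strict;
use warnings;

# A{5,12}: between 5 and 12 copies of A; [ATCG]+: one or more bases
my $polyA_mrna = "ATGCCGT" . ("A" x 11);   # 11 A's at the end
my $short_tail = "ATGCCGT" . ("A" x 4);    # only 4 A's, too short

my $mrna_ok  = ($polyA_mrna =~ m/ATG[ATCG]+A{5,12}/) ? 1 : 0;
my $short_ok = ($short_tail =~ m/ATG[ATCG]+A{5,12}/) ? 1 : 0;

# ?: zero or one copy, so the C is optional
my @optional_c = grep { m/ATGC?A/ } ("ATGA", "ATGCA", "ATGCCA");

# *: zero or more copies, so any number of C's (even none) is accepted
my @c_runs = grep { m/TGC*A/ } ("TGA", "TGCA", "TGCCCCA", "TGGA");

print "11 A's: $mrna_ok, 4 A's: $short_ok\n";
print "ATGC?A matches: @optional_c\n";
print "TGC*A matches: @c_runs\n";
```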

19 REGEX Anchors: The previous pattern is not strictly correct because:
–It’ll match a string that doesn’t start with ATG
–It’ll match a string that doesn’t end with poly A’s
Anchors tell REGEX that a pattern must occur at the beginning or at the end of a string.

20 REGEX Anchors
^ anchors the pattern to the start of a string
$ anchors the pattern to the end of a string (or just before a final newline)
/^ATG[ATCG]+A{5,12}$/
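The sketch below runs the same pattern with and without anchors to show which strings each version accepts. The sequences are hypothetical and built with "A" x 8, so the tail length is known.

```perl
use strict;
use warnings;

my @sequences = (
    "ATGCCGT" . ("A" x 8),          # starts with ATG and ends in a poly-A tail
    "GGATGCCGT" . ("A" x 8),        # does not start with ATG
    "ATGCCGT" . ("A" x 8) . "GC",   # does not end with the poly-A tail
);

my (@loose, @anchored);
for my $seq (@sequences) {
    # without anchors, the pattern may occur anywhere in the string
    push @loose,    ($seq =~ m/ATG[ATCG]+A{5,12}/)   ? 1 : 0;
    # with ^ and $, the pattern must span the whole string
    push @anchored, ($seq =~ m/^ATG[ATCG]+A{5,12}$/) ? 1 : 0;
    print "$seq  no anchors: $loose[-1]  anchored: $anchored[-1]\n";
}
```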

21 REGEX is greedy! The revised pattern is still incorrect because it’ll match a string that has more than 12 A’s at the end. [ATCG]+ can also match A, and quantifiers try to match as many copies of a sub-pattern as possible, so the extra A’s are simply swallowed by [ATCG]+. /^ATG[ATCG]+A{5,12}$/ “ATGGCCCGGCCTTTCCCAAAAAAAAAAAA”

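22 Curb that Greed! A ? after a quantifier stops it from being greedy, so it matches as few copies as possible: /^ATG[ATCG]+?A{5,12}$/ “ATGGCCCGGCCTTTCCGAAAAAAAAAAAA”. Now [ATCG]+? stops as early as it can, and A{5,12} takes as many of the trailing A’s as possible (up to 12). This changes how the string is divided between the two parts, not whether it matches: a tail of more than 12 A’s is still accepted, because [ATCG]+? extends into the extra A’s when it has to. Note that this is the second use of the question mark; what is the other use of ? in REGEX?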
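The following sketch makes the difference between greedy and lazy matching visible by capturing the tail with parentheses, which are explained on the next slide. It also shows that neither version rejects a 20-A tail. The last pattern, /^ATG[ATCG]*[CGT]A{5,12}$/, is our own suggestion for a stricter check and is not part of the original lecture. It requires a non-A base right before the tail.

```perl
use strict;
use warnings;

my $mrna      = "ATGGCCCGGCCTTTCCG" . ("A" x 12);  # 12 A's, as on the slide
my $long_tail = "ATGGCCCGGCCTTTCCG" . ("A" x 20);  # 20 A's, too many

# Greedy [ATCG]+ keeps as many bases as it can, leaving A{5,12} only 5 A's
my ($greedy_tail) = $mrna =~ m/^ATG[ATCG]+(A{5,12})$/;

# Lazy [ATCG]+? gives up as many bases as it can, so A{5,12} takes up to 12
my ($lazy_tail) = $mrna =~ m/^ATG[ATCG]+?(A{5,12})$/;

# Neither version rejects a 20-A tail: the extra A's are absorbed
my $greedy_long = ($long_tail =~ m/^ATG[ATCG]+A{5,12}$/)  ? 1 : 0;
my $lazy_long   = ($long_tail =~ m/^ATG[ATCG]+?A{5,12}$/) ? 1 : 0;

# Requiring a non-A base right before the tail does reject it
my $fixed_long = ($long_tail =~ m/^ATG[ATCG]*[CGT]A{5,12}$/) ? 1 : 0;
my $fixed_ok   = ($mrna      =~ m/^ATG[ATCG]*[CGT]A{5,12}$/) ? 1 : 0;

printf "greedy tail: %d A's, lazy tail: %d A's\n", length($greedy_tail), length($lazy_tail);
print "20-A tail accepted by greedy: $greedy_long, lazy: $lazy_long, stricter: $fixed_long\n";
```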

23 REGEX Capture: What if you want to keep the part of a string that matches your pattern? Use ( ), the “memory parentheses”: /^ATG([ATCG]+?)A{5,12}$/ applied to “ATGGCCCGGCCTTTCCGAAAAAAAAAAAA”

24 REGEX Capture: In /^ATG([ATCG]+?)(A{5,12})$/, what is matched inside the first ( ) is assigned to $1, what is matched inside the second ( ) is assigned to $2, and so on. So, for “ATGGCCCGGCCTTTCCGAAAAAAAAAAAA”, $2 eq "AAAAAAAAAAAA".
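Here is a minimal sketch using the same sequence as the slides. Its 12-A tail is built with "A" x 12, and the names $middle and $tail are illustrative.

```perl
use strict;
use warnings;

my $mrna = "ATGGCCCGGCCTTTCCG" . ("A" x 12);

my ($middle, $tail);
if ($mrna =~ m/^ATG([ATCG]+?)(A{5,12})$/) {
    $middle = $1;   # text matched by the first ( )
    $tail   = $2;   # text matched by the second ( )
}
print "\$1 = $middle\n";
print "\$2 = $tail\n";
```

The captures can also be collected in one list assignment: my ($middle, $tail) = $mrna =~ m/^ATG([ATCG]+?)(A{5,12})$/;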

25 REGEX Modifiers: Modifiers come after a pattern and affect the entire pattern. You have seen //g already, which does global matching (/T/g) and global replacement (s/T/U/g). Other useful modifiers:
//i makes the pattern case insensitive
//s lets . match a newline
//m lets ^ and $ (anchors) match next to an embedded newline
s///e treats the replacement as a Perl expression, which is evaluated
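Each modifier in action, as a minimal sketch on made-up strings (the two-record FASTA text is invented for illustration):

```perl
use strict;
use warnings;

# /i: case-insensitive match
my $found_dog = ("Dog! Human's best friend" =~ m/dog/i) ? 1 : 0;

# /g with s///: replace every T, e.g. DNA to RNA
(my $rna = "ATGCTTA") =~ s/T/U/g;

# /s: let . match a newline as well
my $two_lines = "ATG\nTAA";
my $dot_plain = ($two_lines =~ m/G.T/)  ? 1 : 0;
my $dot_s     = ($two_lines =~ m/G.T/s) ? 1 : 0;

# /m: let ^ and $ match at each embedded newline (here combined with /g)
my $fasta   = ">seq1\nATGC\n>seq2\nGGTA\n";
my @headers = ($fasta =~ m/^>(\w+)$/mg);

# s///e: the replacement is run as Perl code; the run of A's becomes its length
(my $compressed = "GCAAAAT") =~ s/(A+)/length($1)/e;

print "case-insensitive dog: $found_dog\n";
print "RNA: $rna\n";
print "dot matches newline without /s: $dot_plain, with /s: $dot_s\n";
print "FASTA headers: @headers\n";
print "compressed: $compressed\n";
```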

26 REGEX Summary: REGEX is its own little language!!! REGEX is one of the main strengths of Perl. To learn more: Learning Perl (3rd ed.), Chapters 7, 8, 9; Programming Perl (3rd ed.), Chapter 5; Mastering Regular Expressions (2nd ed.); http://www.perl.com/doc/manual/html/pod/perlre.html. A good cheat sheet is: http://www.biotek.uio.no/EMBNET/guides/guideRegExp.pdf

27 Exercise 13 In a text file, write out three strings that match the following regular expression /^ATG?C*[ATCG]+?A{3,10}$/ Write a program that reads each string from the text file and checks your answers. Try it yourself (30 min) then look at the answer at the end of this lecture.

28 modules

29 What are Modules? A module is a “logical” collection of functions. Using modules has the same advantages as using functions: it simplifies code (makes it modular) and facilitates code reuse. Each collection (or module) has its own “name space”. Name space: a table containing the names of the variables and functions used in your code.
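Below is a minimal sketch of name spaces. In Perl a name space is a package, and two packages can each define a function with the same name without clashing. Both packages sit in one file here for brevity; in a real module each would normally live in its own .pm file. The package names DNA and RNA and the function describe() are hypothetical.

```perl
use strict;
use warnings;

# Each package has its own table of names, so both can have describe()
package DNA;
sub describe { return "DNA uses A, C, G and T" }

package RNA;
sub describe { return "RNA uses A, C, G and U" }

# Back to the main program's name space
package main;

my $dna_text = DNA::describe();   # the fully qualified name picks the right one
my $rna_text = RNA::describe();
print "$dna_text\n$rna_text\n";
```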

30 Why Use Modules? Modules allow you to use others’ code to extend the functionality of your program. There are a lot of Perl modules.

31 Finding out what modules you already have: In Perl, each module is a file stored in some directory on your system. The system that this class is using stores Perl modules (like CGI.pm) in one of two directories: C:\bin\Perl\lib and C:\bin\Perl\site\lib

32 Finding out what modules you already have: To find out where modules are installed, type perl -V at the command prompt; the directories are listed under @INC at the end of the output. To find out what is installed, type perldoc perllocal at the command prompt.
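Inside a program, the same directory list is available in the special array @INC. You can test whether a module loads with require. The helper is_installed() below is a hypothetical name, and Bio::Seq is only checked, not required.

```perl
use strict;
use warnings;

# @INC holds the directories perl searches for modules
print "Module directories:\n";
print "  $_\n" for @INC;

# Try to load a module at run time; require dies if it cannot be found
sub is_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # List::Util -> List/Util.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ("List::Util", "Bio::Seq") {
    print "$module: ", (is_installed($module) ? "installed" : "not installed"), "\n";
}
```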

33 Using Modules: To use a module, you need to include the line
    use modulename;
at the beginning of your program. But you already knew that…
    use strict;
    use warnings;
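Many modules let you import just the functions you need by listing them after use. Here is a minimal sketch with List::Util, a module that ships with Perl itself. The sequences are made up.

```perl
use strict;
use warnings;
use List::Util qw(max sum);   # import only the two functions we need

# Lengths of some made-up sequences
my @lengths = map { length($_) } ("ATGC", "ATGCGT", "AT");

my $total   = sum(@lengths);
my $longest = max(@lengths);
print "total length: $total, longest: $longest\n";
```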

34 Where to find modules: You can search for modules (and documentation) that may be useful for your particular problem at http://search.cpan.org/. CPAN: Comprehensive Perl Archive Network, the central repository for Perl modules and more. “If it’s written in Perl, and it’s helpful and free, it’s probably on CPAN.” http://www.perl.com/CPAN/

35 Exercise 14: Open a web browser. Go to http://search.cpan.org/. Type in “bioperl Tools BLAST”. Follow the link to Bio::Tools::Blast. Browse through this page and the example code. Make a .plx file like this:
    #bioperl example code
    use strict;
    use warnings;
    #make the bioperl module (class) accessible to your program
    use Bio::Seq;
    print "ok - ready to use Bio::Seq\n";
Does this program run or return an error?

36 Bioperl Overview: The Bioperl project, www.bioperl.org
–Comprehensive, well documented set of Perl modules
–A bioinformatics toolkit for: format conversion, report processing, data manipulation, sequence analyses and more!
–Written in object-oriented Perl

37 Bioperl Overview: The last exercise most likely did not work (unless you have BioPerl installed). So let’s install it…

38 How to install modules: This class is using the ActiveState version of Perl, which comes with a program called ppm (Perl Package Manager). At the command prompt type >ppm and follow the instructions in Exercise 15.

39 How to install modules (without ppm): If you are not using ActiveState Perl, you can also install modules from CPAN using: >perl -MCPAN -e "install 'Some::Module'". Module dependencies are taken care of automatically. You’ll (usually) need to be root to install a module successfully.

40 Exercise 15: Install bioperl
1. At the command line prompt type >ppm
2. Then at the ppm prompt type ppm> search bioperl
3. Then type ppm> install bioperl
Try this exercise at home. Installing libraries is not possible on UiO computers.

41 What are objects? Examples of objects in real life:
–My car, my dog, my dishwasher…
Objects have ATTRIBUTES and METHODS.
Some attributes of my dog Fido: Color of fur = brown; Height = 20 cm; Owner’s Name = Ian; Weight = 2 kg; Tail position = up
Some methods of my dog Fido: Bark, Walk, Run, Eat, Wag tail

42 What is a class? A class is a type of object in the real world:
–Cars, dogs, dishwashers…
Classes have ATTRIBUTES and METHODS.
Some attributes of a dog: Color of fur, Height, Owner’s Name, Weight, Tail position
Some methods of a dog: Bark, Walk, Run, Eat, Wag tail
The class is the concept of a “dog”.

43 So an object is an instance of a class. [Diagram: the concept of “dog” (class) → Fido (object)]

44 Objects have unique names called “references”, and classes have names too. [Diagram: Dog (class name) → object, referred to by the reference Fido]

45 By convention, Perl classes have a method called new that is used to create objects. [Diagram: the Dog class creates the object Fido, which is held in a reference]
    my $fido = Dog->new();

46 A reference to an object can be used to access its properties or methods. [Diagram: Dog class, object Fido]
    print $fido->bark();    # prints: woof
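To connect the Fido slides to real Perl, here is a minimal hand-written Dog class. It is a sketch, not a BioPerl class, and all names in it are hypothetical. In Perl an object is usually a hash reference that has been blessed into a class. new is the conventional name for the constructor, and -> calls a method through the reference.

```perl
use strict;
use warnings;

# A minimal Dog class; in a real program it would live in its own file, Dog.pm
package Dog;

# Constructor: creates the object and returns a reference to it
sub new {
    my ($class, %args) = @_;
    my $self = {
        name   => $args{name}   || 'unnamed',   # attributes live in the hash
        colour => $args{colour} || 'brown',
        owner  => $args{owner},
    };
    return bless $self, $class;   # bless ties the hash to the Dog class
}

# Accessor method for the name attribute
sub name { my ($self) = @_; return $self->{name} }

# A method: something a dog can do
sub bark { return "woof" }

package main;

my $fido = Dog->new(name => 'Fido', owner => 'Ian');   # $fido is a reference
my $line = $fido->name . " says " . $fido->bark();
print "$line\n";
print "Fido belongs to the class: ", ref($fido), "\n";
```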

47 A reference to an object can be used to access its properties or methods. [Diagram: Bio::DB::RefSeq class, object $refseq; $molecule = some sequence record]
    my $refseq = new Bio::DB::RefSeq;
    my $molecule = $refseq->get_seq_by_acc("NP_01014");

48 Putting it all together: So now that you understand (sort of) classes, objects, attributes and methods, what remains is learning which classes are available in BioPerl and what you can do with them. For the next exercise, use the documentation at bioperl.org* to figure out what the following code does… *see www.bioperl.org/wiki/HOWTOs and doc.bioperl.org (then click on bioperl-live)

49 Exercise 16: Create and run a program which creates a Seq object and manipulates it:
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # make the Bio::Seq class available to my program
    use Bio::Seq;

    # create a new Bio::Seq object and initialize some attributes
    my $seq = Bio::Seq->new('-seq'              => 'CGGCGTCTGGAACTCTATTTTAAGAACCTCTCAAAACGAAACAAGC',
                            '-desc'             => 'An example',
                            '-display_id'       => 'NM_005476',
                            '-accession_number' => '6382074',
                            '-alphabet'         => 'dna');   # called -moltype in very old BioPerl

    # sequence manipulations
    my $aa  = $seq->alphabet();         # one of 'dna', 'rna', 'protein'
    my $ab  = $seq->subseq(5, 10);      # part of the sequence as a string
    my $ac  = $seq->revcom;             # returns an object of the reverse complemented sequence
    my $ac1 = $ac->seq();
    my $ad  = $seq->translate;          # returns an object of the sequence translation
    my $ad1 = $ad->seq();
    my $ae  = $seq->translate(undef, undef, 1);  # translation using frame 1 (0, 1 or 2 can be used)
    my $ae1 = $ae->seq();

    print "Molecule Type: $aa\n";
    print "Sequence from 5 to 10: $ab\n";
    print "Reverse complemented sequence: $ac1\n";
    print "Translated sequence: $ad1\n";
    print "Translated sequence (using frame 1): $ae1\n";

50 Exercise 17: Check out the code of several examples using BioPerl at: http://bip.weizmann.ac.il/course/prog2/perlBioinfo/

51 More Bioperl modules
Bio::SeqIO: Sequence Input/Output
–Retrieve sequence records and write them to files
–Convert sequence records from one format to another
Bio::Seq: Manipulating sequences
–Get subsequences ( $seq->subseq($start, $end) )
–Find the length of the object ( $seq->length )
–Reverse complement a DNA sequence
–Translate a DNA sequence ….etc.
Bio::Annotation: Annotate a sequence
–Assign journal references to a sequence, etc.
–Bio::Annotation is associated with an entire sequence record and not just part of a sequence (see also Bio::SeqFeature)
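To show what subseq and revcom return, the sketch below reimplements them on a plain string. It only illustrates the underlying string operations and is not BioPerl's implementation, which does more checking. subseq_plain and revcom_plain are hypothetical names. The sequence is the one used in Exercise 16.

```perl
use strict;
use warnings;

# subseq: 1-based, inclusive coordinates, like $seq->subseq($start, $end)
sub subseq_plain {
    my ($dna, $start, $end) = @_;
    return substr($dna, $start - 1, $end - $start + 1);
}

# revcom: reverse the string, then complement each base
sub revcom_plain {
    my ($dna) = @_;
    (my $rc = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna  = "CGGCGTCTGGAACTCTATTTTAAGAACCTCTCAAAACGAAACAAGC";
my $part = subseq_plain($dna, 5, 10);
my $rc   = revcom_plain($dna);
print "5..10: $part\n";
print "length: ", length($dna), "\n";
print "revcom: $rc\n";
```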

52 Some more Bioperl modules
Bio::SeqFeature: Associate feature annotation with a sequence
–“features” describe specific locations in the sequence
–E.g. 5’ UTR, 3’ UTR, CDS, SNP, etc.
–Using this object, you can add feature annotations to your sequences
–When you parse a GenBank file using Bioperl, the “features” of a record are stored as SeqFeature objects
Bio::DB::GenBank, GenPept, EMBL and Swissprot: Remote Database Access
–You can retrieve a sequence from remote databases (through the Internet) using these objects

53 Even more Bioperl modules
Bio::SearchIO: Parse sequence database search reports
–Parse BLAST reports (make custom reports)
–Parse HMMER, FASTA, SIM4, WABA, etc.
–Custom reports can be output to various formats (HTML, Table, etc.)
Bio::Tools::Run::StandAloneBLAST: Run standalone BLAST through Perl
–By combining this and SearchIO, you can automate and customize BLAST searches
Bio::Graphics: Draw biological entities (e.g. a gene, an exon, BLAST alignments, etc.)

54 Bioperl Summary
Online documentation:
–For this workshop: http://doc.bioperl.org/releases/bioperl-1.4/
–Tutorial: http://www.bioperl.org/wiki/HOWTO:Beginners
–HOWTOs: http://www.bioperl.org/wiki/HOWTOs
–Modules: http://www.bioperl.org/wiki/Category:Core_Modules
Literature:
–Stajich et al., The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002 Oct;12(10):1611-8. PMID: 12368254
Bioperl mailing list: bioperl-l@bioperl.org
–Best way to get help using Bioperl
–Very active list (upwards of 10 messages a day)
Use with caution: things change fast and without warning (unless you are on the mailing list…)

55 Perl Documents
In-line documentation
–POD = Plain Old Documentation
–Read POD by typing perldoc
–E.g. perldoc perl, perldoc Bio::SeqIO
On-line documentation
–http://www.cpan.org
–http://www.perl.com
–http://www.bioperl.org
Books
–Learning Perl (the best way to learn Perl if you know a bit about programming already)
–Beginning Perl for Bioinformatics (an example-based way to learn Perl for bioinformatics)
–Programming Perl (THE Perl reference book, not for the faint of heart)

56 Additional Book References
–Perl Cookbook, 2nd edition (quick solutions to 80% of what you want to do)
–Learning Perl Objects, References & Modules (for people who want to learn objects, references and modules in Perl)
–Perl in a Nutshell (an okay quick reference)
–Perl CD Bookshelf, Version 4.0 (electronic version of the above books: best value, searchable, and kills fewer trees)
–Mastering Perl for Bioinformatics (more example-based learning)
–CGI Programming with Perl (rather outdated treatment of the subject... not really recommended)
–Perl Graphics Programming (if you want to generate graphics using Perl; side note: Perl is probably not the best tool for generating graphics)


58 Answer 12
    #!/usr/bin/perl
    use strict;
    use warnings;

    #TASK: demonstrate the use of "my" in setting the
    #scope of a variable
    my $some_variable = 100;

    #body of the main program with the function call
    print "the value of some_variable is: $some_variable\n";
    subroutine1();
    print "but here, some_variable is still: $some_variable\n";

    #subroutine using $some_variable
    sub subroutine1 {
        my $some_variable = 0;
        print "in subroutine1, some_variable is: $some_variable\n";
    }

    #what happens if you comment out "use strict" and
    #remove "my" from lines 7 and 16

59 Answer 13
    #!/usr/bin/perl
    use strict;
    use warnings;

    #TASK: check your answers to the regex exercise

    #open the input file (stop with a message if it cannot be opened)
    open(my $in, '<', 'myanswers.txt') or die "Cannot open myanswers.txt: $!";

    #read the input file line-by-line
    #for each line test if it matches a regular expression
    while (<$in>) {
        chomp;
        my $is_correct = does_it_match($_);
        if ($is_correct) {
            print "$_ is a match\n";
        }
        else {
            print "$_ is NOT a match\n";
        }
    }

    #close input file and exit
    close($in);
    exit();

    #does it match
    sub does_it_match {
        my ($answer) = @_;
        my $is_correct = 0;
        if ($answer =~ m/^ATG?C*[ATCG]+?A{3,10}$/) {
            $is_correct = 1;
        }
        return $is_correct;
    }

