Lecture 6.11


An Introduction to Perl for Bioinformatics – Part 2
Will Hsiao
Simon Fraser University
Department of Molecular Biology and Biochemistry

Outline
Session 1
– Review of the previous day
– Perl: historical perspective
– Expand on Regular Expression
– General Use of Perl
– Expand on Perl Functions and introduce Modules
– Interactive demo on Modules
Break
Session 2
– Use of Perl in Bioinformatics
– Object Oriented Perl
– Bioperl Overview
– Interactive demo on Bioperl
– Introduction to the Perl assignment

Perl in Bioinformatics
Case in point 1: Human Genome data exchange
– “How Perl saved the Human Genome Project”, Lincoln Stein (1996)
– Different sequencing centres all used different data formats
– Perl allowed the various genome centres to exchange data and communicate with each other
– Introduces a project to produce modules to process all known forms of biological data (Bioperl)

Perl in Bioinformatics
Case in point 2: Ensembl
– Much of Ensembl is written in Perl
– Ensembl has an extensive Perl API that allows you to access the Ensembl database directly from your Perl code
Case in point 3: GMOD (Generic Model Organism Database)
– A joint effort by model organism system databases (worm, fly, corn, rat, yeast, E. coli, Arabidopsis, rice) to develop reusable components suitable to be adapted for other biological databases
– Written mostly in Java and Perl

Bioinformatics Spectrum
[Figure labels: Math, Biology, Computer Science, Software/data analysis; Perl, Java, C/C++]

Perl for bioinformatics in your lab
Scripting
– Automation of repetitive analyses
– Parsing results obtained from other programs
Wrapping
– Accessing other programs (e.g. BLAST) through Perl
Web CGI’ing
– Develop an interactive web page for your lab
– Create web forms
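
The scripting and wrapping ideas on this slide can be sketched in a few lines of core Perl. This is only an illustration under stated assumptions: the "external program" is a stand-in (the running perl itself, via $^X) that prints two made-up tab-separated result lines, and the score cutoff of 90 is arbitrary. A real lab script would call an actual tool such as BLAST instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for an external analysis program (assumption: a real script
# would run a tool such as BLAST here). $^X is the path of the running
# perl, so the example needs nothing else installed.
my @command = ($^X, '-e', 'print "seq1\t98.5\nseq2\t42.0\n"');

# Wrapping: run the program and capture what it prints.
open(my $pipe, '-|', @command) or die "Cannot run command: $!";
my @lines = <$pipe>;
close($pipe) or die "Command failed with status $?";

# Scripting: parse the captured output and keep only the strong hits.
my @hits;
for my $line (@lines) {
    chomp $line;
    my ($id, $score) = split /\t/, $line;
    push @hits, $id if $score >= 90;
}
print "Strong hits: @hits\n";
```

Passing the command as a list avoids the shell, so unusual characters in file names are not reinterpreted.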

Bioperl Overview
The Bioperl project
– Comprehensive, well documented set of Perl modules
– Last stable release (developer 1.5.1)
– A bioinformatics toolkit for:
    Format conversion
    Report processing
    Data manipulation
    Sequence analyses
    and more!
– Written in object-oriented Perl

What are objects?
Examples of objects in real life:
– Cars, dogs, dishwashers…
Objects have ATTRIBUTES and ACTIONS
Some attributes of a dog: color of fur, height, owner’s name, weight, tail position
Some actions of a dog: bark, walk, run, eat, wag tail

What are programming objects?
They borrow from the concept of real life objects
Attributes are stored as variables: $fur_color, $weight, $tail_position
Actions are implemented as functions: sub dye_fur{ }, sub eat{ }, sub wag_tail{ }
[Figure: a program dog object]

Object Exercise
Pair up with your neighbour (2-3 people)
In the next 2-3 minutes, come up with as many attributes and actions (aka methods) of a DNA sequence object as you can
– E.g. attributes of a DNA sequence object: $length=300, $percent_GC=50%
– E.g. methods of a DNA sequence object: Translate_to_protein, remove_polyA_tail
Share with the class

Objects belong to Classes
If we take all your suggestions and design a generic template, we can then use this template to create objects
This template is called a Class
An “instance of a class” is called an object
[Figure labels: DNA Sequence Class; DNA sequence objects 1, 2, 3, 4]
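
To make the class/instance distinction concrete, here is a minimal sketch of a DNA sequence class with two objects created from it. The class name DNASequence and the methods seq_length and percent_GC are hypothetical names chosen for this illustration (echoing the exercise's $length and $percent_GC); they are not Bioperl classes or methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class for illustration; Bioperl's real sequence classes
# are far richer than this.
package DNASequence;

# Constructor: builds one object (one instance) from the template.
sub new {
    my ($class, %args) = @_;
    my $self = { sequence => uc $args{sequence} };
    return bless $self, $class;
}

# Number of bases in the stored sequence.
sub seq_length {
    my ($self) = @_;
    return length $self->{sequence};
}

# Percentage of bases that are G or C (0 for an empty sequence).
sub percent_GC {
    my ($self) = @_;
    my $gc = ($self->{sequence} =~ tr/GC//);   # tr counts without changing
    return $self->seq_length ? 100 * $gc / $self->seq_length : 0;
}

package main;

# Two objects created from the same class, each with its own data.
my $seq1 = DNASequence->new(sequence => 'ATGCGC');
my $seq2 = DNASequence->new(sequence => 'aattaatt');

printf "seq1: length %d, GC %.1f%%\n", $seq1->seq_length, $seq1->percent_GC;
printf "seq2: length %d, GC %.1f%%\n", $seq2->seq_length, $seq2->percent_GC;
```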

How do we interact with an object?
Polo is the name of my dog
We have to refer to an object by its name
[Figure labels: POLO, WOOF]

Interact with a program object
$Polo is the name of a program dog object
[Figure: a program dog object named $Polo, with variables $fur_color, $weight, $tail_position and functions sub dye_fur{ }, sub eat{ }, sub wag_tail{ }; label: WOOF]
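
The program dog object from these slides might be written roughly as follows. This is a sketch rather than the lecture's own code: the Dog class, the bark method and the attribute values are assumptions made for illustration, while dye_fur, wag_tail, fur_color, weight and tail_position follow the names on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Dog;

sub new {
    my ($class, %attributes) = @_;
    # Attributes live in a hash; bless ties the hash to the Dog class.
    my $self = {
        fur_color     => $attributes{fur_color},
        weight        => $attributes{weight},
        tail_position => 'down',
    };
    return bless $self, $class;
}

# Actions are ordinary subroutines that receive the object first.
sub dye_fur {
    my ($self, $new_color) = @_;
    $self->{fur_color} = $new_color;
    return $self->{fur_color};
}

sub wag_tail {
    my ($self) = @_;
    $self->{tail_position} = 'wagging';
    return $self->{tail_position};
}

sub bark { return 'WOOF' }

package main;

# $Polo is the name through which the program reaches the object.
my $Polo = Dog->new(fur_color => 'brown', weight => 20);
print $Polo->bark, "\n";
$Polo->dye_fur('black');
print "Fur is now $Polo->{fur_color}\n";
```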

A name is a reference
Objects have unique names (labels)
You refer to an object by its unique name
This unique name that you give to an object is called a “reference”

Reference in Perl
A reference is a scalar (simple) variable that refers to a chunk of memory
Stored in that memory can be another variable or an object
[Figure labels: $array_ref, My Program, Memory, 1234]
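
A short sketch of a reference to an array, using only core Perl. The array contents are made up; $array_ref follows the name in the figure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# The backslash takes a reference: a scalar that points at @bases.
my $array_ref = \@bases;

# Dereference to reach the data stored in that chunk of memory.
print "First base: $array_ref->[0]\n";
print "Number of bases: ", scalar(@$array_ref), "\n";

# Changes made through the reference are visible in the original array.
push @$array_ref, 'N';
print "Original array now holds: @bases\n";
```

Because the reference and the array share the same memory, no copy of the data is made when the reference is passed around.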

Reference to an object
$my_protein is called a “reference” to an object (in this case a protein object)
To access the attributes and methods of the protein object, you have to go through its reference (i.e. $my_protein)
Objects have inherent functions that are useful
These inherent functions also have specific names
[Figure: $my_protein in My Program points to a protein object in Memory, with attributes $var{SwissProt_ID}, $var{name}, $var{length}, $var{source}, $var{%domain_location} and functions sub new{…}, sub return_ID{…}, sub get_domain{…}]
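
As a rough sketch of the protein object in the figure: the Protein class below is hypothetical, its method names (new, return_ID, get_domain) follow the slide, and all attribute values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Protein;

# Constructor: stores the attributes in a hash and blesses it.
sub new {
    my ($class, %args) = @_;
    my $self = {
        SwissProt_ID    => $args{SwissProt_ID},
        name            => $args{name},
        domain_location => $args{domain_location} || {},
    };
    return bless $self, $class;
}

sub return_ID {
    my ($self) = @_;
    return $self->{SwissProt_ID};
}

# Returns [start, end] for a named domain, or undef if it is unknown.
sub get_domain {
    my ($self, $domain) = @_;
    return $self->{domain_location}{$domain};
}

package main;

# Made-up values, for illustration only.
my $my_protein = Protein->new(
    SwissProt_ID    => 'DEMO_0001',
    name            => 'demo kinase',
    domain_location => { kinase => [10, 250] },
);

# $my_protein holds a reference; every access goes through it.
print "ID: ", $my_protein->return_ID, "\n";
my ($start, $end) = @{ $my_protein->get_domain('kinase') };
print "Kinase domain: $start-$end\n";
print "Class of the object: ", ref($my_protein), "\n";
```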

Object Oriented Programming
What is O-O Programming?
– Simple answer: a way to organize code so it interacts in certain ways and follows certain rules
– Long answer: to be found in books on O-O
Why O-O Programming?
– Provides a well-defined framework
– Promotes good practices such as code reuse, abstraction, and cleaner design
– Does have certain trade-offs (e.g. O-O Perl is usually slower than procedural Perl)
– Designing good object classes requires forethought and skill

To use an object
1. Find out which class you need and learn about the class by reading its documentation
2. Make the class available to your program
3. Create a new object of the class
4. Start using the object by modifying its attributes and calling its methods
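
The four steps can be tried with a class that ships with Perl, so nothing extra needs installing. The choice of File::Temp is an assumption made for this sketch (the lecture itself goes on to use Bioperl classes), and the FASTA record written is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: read the documentation (perldoc File::Temp).
# Step 2: make the class available to the program.
use File::Temp;

# Step 3: create a new object with the class's constructor.
my $tmp = File::Temp->new(SUFFIX => '.fasta');

# Step 4: use the object by calling its methods.
print {$tmp} ">seq1\nACGT\n";   # the object also works as a filehandle
$tmp->flush;
print "Wrote a record to ", $tmp->filename, "\n";
```

The temporary file is removed automatically when $tmp goes out of scope.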

Example of using objects
Task:
– I have a sequence file in Genbank format that I want to convert to EMBL format
How many objects do you think we need to accomplish the task above?

Find the Objects you need
Objects that we need:
1. An object that reads in sequences from a file
2. An object that represents a sequence record
3. An object that writes sequences to a file
[Figure labels: Genbank, Sequence File Input Object, Sequence Object, Memory, Sequence File Output Object, EMBL]

Example of using objects
Solution:
– I remember that Bioperl provides this functionality, so first I’ll take a look at the Bioperl documentation
– Website:

Bioperl Documentation demo
Go to the webpage and navigate to the SeqIO doc
Pay attention to:
1) the name of the module
2) Synopsis (code examples)
3) Description
4) list of methods

[Screenshots of the Bioperl documentation pages: List of Modules by Class; Complete List of Modules by Name]

Make the object class available
In Perl, classes are implemented as object-oriented modules
To include a class, simply use the module
– E.g. use Bio::SeqIO;
Note that the name of the module is case sensitive
By using Bio::SeqIO, my program automatically gains access to any modules included in Bio::SeqIO

Create an object
1. Make up a name for my object reference (e.g. $seq_in)
2. Create the object by calling the object class’s “new” method
– every class has a “constructor” method to create an object of that class
– the constructor method is often called “new”
– use the single arrow operator to call methods
3. Assign the object to the object reference
4. You can give the object you are about to create some initial attributes (e.g. the file name of my sequence record, the format of the record)
    my $seq_in = Bio::SeqIO->new( -file => "myGBrecord", -format => "genbank" );

Call object’s methods
We’ve seen the -> (single arrow) operator for calling a class method (e.g. new)
The same operator is used for calling an object method
– E.g. to ask the $seq_in object to get a sequence record from your Genbank sequence file:
    my $seq_record = $seq_in->next_seq();

Putting it all together
    #!/usr/bin/perl -w
    use strict;
    use Bio::SeqIO;    # make the Bio::SeqIO class available to my program
    # create new Bio::SeqIO objects and initialize some attributes
    my $seq_in  = Bio::SeqIO->new( -file => "myGBrecord", -format => "genbank" );
    my $seq_out = Bio::SeqIO->new( -file => ">myEMBLrec", -format => "EMBL" );
    my $seq_record = $seq_in->next_seq();    # a sequence object
    $seq_out->write_seq($seq_record);
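
The script on this slide converts only the first record in the file. A possible extension, sketched here, loops until next_seq runs out of records. The subroutine name convert_records is hypothetical; the Bio::SeqIO calls (new, next_seq, write_seq) and the file names are the ones used on the slide. Bioperl is loaded inside the subroutine so that the file still compiles on a machine without it, which is a choice made for this sketch; a normal script would simply say use Bio::SeqIO; at the top.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert every record in $in_file and return how many were written.
sub convert_records {
    my ($in_file, $in_format, $out_file, $out_format) = @_;
    require Bio::SeqIO;   # Bioperl must be installed for this to succeed

    my $seq_in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_format);
    my $seq_out = Bio::SeqIO->new(-file => ">$out_file", -format => $out_format);

    my $count = 0;
    # next_seq returns a false value once the input is exhausted.
    while (my $seq_record = $seq_in->next_seq()) {
        $seq_out->write_seq($seq_record);
        $count++;
    }
    return $count;
}

# File names follow the slide; run only when the input file is present.
if (-e 'myGBrecord') {
    my $n = convert_records('myGBrecord', 'genbank', 'myEMBLrec', 'EMBL');
    print "Converted $n record(s)\n";
}
```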

More Bioperl modules
Bio::SeqIO: Sequence Input/Output
– Retrieve sequence records and write them to files
– Convert sequence records from one format to another
Bio::Seq: Manipulating sequences
– Get subsequences ( $seq->subseq($start, $end) )
– Find the length of the sequence ( $seq->length )
– Reverse complement a DNA sequence
– Translate a DNA sequence
– ...etc.
Bio::Annotation: Annotate a sequence
– Assign journal references to a sequence, etc.
– Bio::Annotation is associated with an entire sequence record and not just part of a sequence (see also Bio::SeqFeature)
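
The Bio::Seq operations listed above might be tried as follows. This is a hedged sketch: the sequence and its id are made up, and Bioperl is loaded at run time (an assumption of this example, not the lecture's approach) so the script prints a message instead of failing when Bioperl is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bioperl is an add-on; load it at run time so a missing installation
# produces a clear message rather than a compile error.
my $have_bioperl = eval { require Bio::Seq; 1 };

if ($have_bioperl) {
    # Made-up sequence, for illustration only.
    my $seq = Bio::Seq->new(
        -id       => 'demo',
        -alphabet => 'dna',
        -seq      => 'ATGGCCTTTTAA',
    );

    print "Length: ",             $seq->length,         "\n";
    print "First codon: ",        $seq->subseq(1, 3),   "\n";  # 1-based, inclusive
    print "Reverse complement: ", $seq->revcom->seq,    "\n";
    print "Protein: ",            $seq->translate->seq, "\n";
}
else {
    print "Bioperl is not installed; skipping the Bio::Seq demo\n";
}
```

Note that revcom and translate return new sequence objects, so the string itself is obtained by calling seq on the result.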

Some more Bioperl modules
Bio::SeqFeature: Associate feature annotation with a sequence
– “Features” describe specific locations in the sequence
– E.g. 5’ UTR, 3’ UTR, CDS, SNP, etc.
– Using this object, you can add feature annotations to your sequences
– When you parse a Genbank file using Bioperl, the “features” of a record are stored as SeqFeature objects
Bio::DB::GenBank, GenPept, EMBL and Swissprot: Remote Database Access
– You can retrieve a sequence from remote databases (through the Internet) using these objects

Even more Bioperl modules
Bio::SearchIO: Parse sequence database search reports
– Parse BLAST reports (make custom reports)
– Parse HMMer, FASTA, SIM4, WABA, etc.
– Custom reports can be output in various formats (HTML, table, etc.)
Bio::Tools::Run::StandAloneBlast: Run standalone BLAST through Perl
– By combining this and SearchIO, you can automate and customize BLAST searches
Bio::Graphics: Draw biological entities (e.g. a gene, an exon, BLAST alignments, etc.)

Bioperl Summary
For online documentation:
– For this workshop:
– Tutorial:
– HOWTOs:
– Modules:
Literature:
– Stajich et al., The Bioperl toolkit: Perl modules for the life sciences. Genome Res Oct;12(10): PMID:
Bioperl mailing list:
– Best way to get help using Bioperl
– Very active list (upwards of 10 messages a day)
Use with caution: things change fast and without warning (unless you are on the mailing list…)

Interactive demo on Bioperl
Open your laptop!
Open a terminal window
Type cd ~/perl_two
Type gedit ./bioperl_demo.pl &
Let’s go over the example together

Summary for Session 2
Perl is a popular language in bioinformatics because:
– It handles text well
– It has a great user base and support (e.g. Bioperl)
Bioperl is a large collection of object-oriented Perl modules for many biological data analyses
An object is a collection of attributes and methods
You have to access an object through its reference
A reference is a name

Perl Documents
In-line documentation
– POD = Plain Old Documentation
– Read POD by typing perldoc
– E.g. perldoc perl, perldoc Bio::SeqIO
On-line documentation
Books
– Learning Perl (the best way to learn Perl if you know a bit about programming already)
– Beginning Perl for Bioinformatics (example-based way to learn Perl for bioinformatics)
– Programming Perl (THE Perl reference book, not for the faint of heart)

Additional Book References
Perl Cookbook, 2nd edition (quick solutions to 80% of what you want to do)
Learning Perl Objects, References & Modules (for people who want to learn objects, references and modules in Perl)
Perl in a Nutshell (an okay quick reference)
Perl CD Bookshelf, Version 4.0 (electronic version of the above books: best value, searchable, and kills fewer trees)
Mastering Perl for Bioinformatics (more example-based learning)
CGI Programming with Perl (rather outdated treatment of the subject; not really recommended)
Perl Graphics Programming (if you want to generate graphics using Perl; side note: Perl is probably not the best tool for generating graphics)

Introduction to the Assignment Part A
Goals:
– To convert passive knowledge to active skills
– To write some simple Perl programs by yourself
Consists of 2 modules
– Write a program to convert the temperature from F to C
– Write a program to count the frequencies of bases in a sequence (sequence MAN1.fasta can be downloaded from the Day6 wiki)

Introduction to the Assignment Part B
Goals:
– To see the power of Perl in bioinformatics
– To see how some common bioinformatics tasks are done using Perl
Consists of 3 modules
– Download E. coli O157:H7 proteins in FASTA format
– Use a regular expression to find a protein motif
– Run BLAST on all proteins in the proteome (>5000 BLAST runs)

Introduction to the Assignment Part B
Most of the code is given to you; you just have to modify it (in total, no more than 15 lines of new code!!)
You are not expected to know everything in the scripts. It takes time to learn a new language
TAs and your CS teammates will help you; don’t wait until the last minute to ask for help
Remember, you still have to hand in your own version of the assignment! No copying!

Acknowledgements
Thanks to Sohrab Shah and Sanja Rojic (CS, UBC) for their wonderful collaborative work on the lecture/lab material
Some ideas in this lecture are borrowed from Lincoln Stein’s workshop (