1 Perl Programming for Biology The Bioinformatics Unit G.S. Wise Faculty of Life Science Tel Aviv University, Israel January 2009 By Eyal Privman

2 Why do biologists need computers? Collecting and managing data. Searching databases. Interpreting data: protein function prediction (heidelberg.de/), gene expression, browsing genomes.

3 Why do biologists need to program? (or: why are you here?)

4 Why do biologists need to program? A real life example: proto-oncogene activation by retrovirus insertion. c-Myc is an example of transformation caused by over- or misexpression (in wild-type cells c-Myc is expressed only during the G1 phase).

5 A real life example
Shmulik
    >tumor1
    TAGGAAGACTGCGGTAAGTCGTGATCTGAGCGGTTCCGTTACAGCTGCTA
    CCCTCGGCGGGGAGAGGGAAGACGCCCTGCACCCAGTGCTG...
    >tumor157
Run BLAST: click “Reformat these results”, choose “Show alignment as plain text”, click “view report” and save it to a text file:
    Sequences producing significant alignments:                   Score (bits)   E Value
    ref|NT_ |Mm15_39661_34 Mus musculus chromosome 15 genomic e-45
    ref|NT_ |Mm6_39393_34 Mus musculus chromosome 6 genomic c
    ref|NT_ |Mm9_39517_34 Mus musculus chromosome 9 genomic c
    ref|NT_ |Mm8_39502_34 Mus musculus chromosome 8 genomic c
    ref|NT_ |Mm3_39274_34 Mus musculus chromosome 3 genomic c
    ref|NT_ |Mm2_39247_34 Mus musculus chromosome 2 genomic c

    >ref|NT_ |Mm15_39661_34 Mus musculus chromosome 15 genomic contig, strain C57BL/6J
    Length =
    Score = 186 bits (94), Expect = 1e-45
    Identities = 100/102 (98%)
    Strand = Plus / Plus

    Query: 1 taggaagactgcggtaagtcgtgatctgagcggttccgttacagctgctaccctcggcgg 60
             ||||||||||||||| ||||||||||||||||||||||| ||||||||||||||||||||
    Sbjct:   taggaagactgcggtgagtcgtgatctgagcggttccgtaacagctgctaccctcggcgg

6 A Perl script can do it for you
Shmulik writes a simple Perl script to parse the blast results and find all hits that are in the myc locus, or up to 10kb from it:
    Use the BioPerl package SearchIO
    Open and read the file “mice.blast”
    Iteration, for each blast result: if we hit the genomic sequence “Mm15_39661_34” in the coordinates of the Myc locus (23,198, ,223,004), then print this hit (hit number and position in locus)
We’ll get back to this later…

7 What is Perl? Perl was created by Larry Wall (read his foreword to the book “Learning Perl”). Perl = Practical Extraction and Report Language (or: Pathologically Eclectic Rubbish Lister). Perl is an Open Source project. Perl is a cross-platform programming language.

8 Why Perl? Perl is a popular programming language, especially for bioinformatics. Perl allows a rapid development cycle. Perl is strong in text manipulation. Perl can easily handle files and directories. Perl can easily run other programs. Perl doesn’t impose arbitrary limitations (e.g. memory).

9 Perl & biology BioPerl: “An international association of developers of open source Perl tools for bioinformatics, genomics and other fields in life science research”. There are also many smaller projects, and millions of little pieces of biological Perl code (which you should use as references: google and find them!).

10 This workshop No experience in programming is assumed. Hands-on practice. Programming tasks for molecular biology: read and manipulate sequence files; extract and analyze desired information from large files. For your convenience, download this presentation from: Save it on your computer (choose “Save”, not “Open”). It will be useful to copy-paste lines from my slides to your scripts…

11 Further study... I cannot teach a full Perl course in 3 hours. You could read a book: Beginning Perl for Bioinformatics. Or some of the great Perl tutorials on the internet… (Google!) Or take the full Perl course (second semester): text file handling; using complex data structures; using BioPerl tools for common tasks such as reading/writing sequence files in different formats, reverse-complementing and translating DNA sequences, and analyzing BLAST results, Genbank records and Swiss-Prot; and more…

12 Free Perl software (for Windows) Getting Perl: (follow the links to download, and choose “MSI” for Windows) Editor & debugger:

13 Perl documentation There is a lot of Perl material on the web. Use the central Perl web site: look in “Online Documentation”, “Manual Pages”, “Functions”, etc. Perl-Express: in the “Directory Window” click the “Perl Function” button (it looks like a purple book), and type the name of a Perl function. Or Google what you’re looking for, e.g. “Perl”, “reverse” and “complement”.

14 Running Perl at the DOS command prompt
Traditionally, Perl scripts are run from a command prompt (a DOS window). Start it by clicking Start → Accessories → Command Prompt, or Start → Run… → cmd.
Running a Perl script:
    perl -w YOUR_SCRIPT_NAME
(To check whether Perl is installed on your computer, use the ‘perl -v’ command.)

15 Running Perl at the DOS command prompt
Common DOS commands:
    d:           change to another drive (d in this case)
    cd my_dir    change directory
    cd ..        move one directory up
    dir          list files (dir /p to view it page by page)
    help         list all DOS commands
    help dir     get help on a DOS command

16 The Perl-Express editor

17 A first Perl script
    print "Hello world!";
A Perl statement must end with a semicolon “;”. The print function outputs some information to the terminal screen.
Try it yourself! Use Perl Express to write the script in a file named “hello.pl” (save it in D:\perl_workshop). Run it!

18 Perl Express – running a script [Screenshot callouts: “Run the script”; “Output tab” (output of run); “Warnings and errors”]

19 Data types
    Data type    Description
    scalar       A single number or string value, e.g. "hello"
    array        An ordered list of scalar values, e.g. (9,-15,3.5)

20 1. Scalar Data

21 Scalar values
A scalar is either a number, e.g. 1.3e4 (= 1.3 × 10^4), or a string:
    print "hello world";      prints: hello world
    print "hello\tworld";     prints: hello and world separated by a tab
    print "hello\nworld";     prints: hello and world on two separate lines
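
To see the escape characters at work, here is a small sketch that stores the values in variables before printing them. The variable names are made up for this illustration and do not come from the slides.

```perl
use strict;
use warnings;

# A number written in scientific notation: 1.3e4 is 1.3 x 10^4
my $number = 1.3e4;
print "$number\n";          # prints 13000

# \t inserts a tab, \n starts a new line
my $tabbed    = "hello\tworld";
my $two_lines = "hello\nworld";
print "$tabbed\n";
print "$two_lines\n";
```

Each escape sequence counts as a single character, so both strings are 11 characters long.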

22 Variables
Variable declaration:
    my $priority;
Note: everything in Perl is case sensitive, i.e. $priority is different from $Priority.
Scalar variables can store scalar values:
    $priority = 1;         # numerical assignment
    $priority = "high";    # string assignment
    $a = $b;               # copy the value of variable $b into $a
Important: to make Perl check the correctness of your variable names, always include
    use strict;
as the first line of all scripts!
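
A minimal sketch of what "use strict" buys you. The name $copy is made up for this illustration; it is used instead of $a and $b because those two names are special sorting variables in Perl that "use strict" does not check.

```perl
use strict;      # Perl now insists that every variable is declared with "my"
use warnings;

my $priority;            # declaration
$priority = 1;           # numerical assignment
$priority = "high";      # string assignment replaces the old value

my $copy = $priority;    # copy the value of $priority into $copy
print "priority: $priority, copy: $copy\n";

# Variable names are case sensitive. If you uncomment the next line, the
# script will not compile, because $Priority was never declared:
# $Priority = 2;
```

Without "use strict", the misspelled $Priority would silently become a new, separate variable, which is a common source of bugs.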

23 Interpolating variables into strings
    $a = 9.5;
    print "a is $a!\n";
Output: a is 9.5!

24 Built-in Perl functions: the length function
The length function returns the length of a string:
    print length("length");
Output: 6

25 The substr function
The substr function extracts a substring out of a string. It receives 3 arguments: substr(EXPR,OFFSET,LENGTH). The offset is counted from 0. For example:
    my $str = "university";
    my $sub = substr ($str, 3, 5);
$sub is now "versi", and $str remains unchanged.
Note: if length is omitted, everything to the end of the string is returned. You can use variables as the offset and length parameters. The substr function can do a lot more, google it and you will see…
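
The two notes above (omitting the length, and using variables as parameters) can be tried out with a short sketch like this one. The variable names $rest, $offset, $len and $start are made up for the illustration.

```perl
use strict;
use warnings;

my $str = "university";

# Offset 3, length 5: the offset is counted from 0
my $sub = substr($str, 3, 5);
print "$sub\n";      # prints versi

# Length omitted: everything from offset 3 to the end of the string
my $rest = substr($str, 3);
print "$rest\n";     # prints versity

# Variables as the offset and length parameters
my $offset = 0;
my $len    = 3;
my $start  = substr($str, $offset, $len);
print "$start\n";    # prints uni
```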

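26 Reading input
<STDIN> allows us to get input from the user:
    print "What is your name?\n";
    my $name = <STDIN>;
    print "Hello $name!";
Here is a test run (the new-line typed by the user is part of $name, so the “!” lands on the next line):
    What is your name?
    Eyal
    Hello Eyal
    !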

27 Reading input
Use the chomp function to remove the “new-line” from the end of the string:
    print "What is your name?\n";
    my $name = <STDIN>;
    chomp $name;    # Remove the new-line
    print "Hello $name!";
Test run:
    What is your name?
    Eyal
    Hello Eyal!
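
To see what chomp actually removes, here is a sketch that skips the keyboard and starts from a string that looks like what <STDIN> would hand over after the user types Eyal and presses Enter. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

# A line as it arrives from the keyboard: the text plus a new-line
my $line = "Eyal\n";

my $length_before = length($line);   # 5: four letters and the new-line
chomp $line;                         # remove the trailing new-line
my $length_after = length($line);    # 4: only the letters are left

print "Length before chomp: $length_before, after: $length_after\n";
print "Hello $line!\n";              # prints Hello Eyal!
```

chomp removes only a trailing new-line, so calling it on a string that has none does no harm.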

28 Perl Express – entering input: click “Std. Input”

29 Perl Express – entering input: click “i/o”

30 Perl Express – entering input: enter the input, then go back to “Std. Output”

31 Exercise 1
1. Write a script that prints "goodbye world!"
2. Assign your name into a variable and then print this variable
3. Read an input line and print it
4. Read a line and print the first 5 letters (use substr)
* Can you print the last 5 letters?

32 2. Lists and arrays

33 Lists and arrays
A list is an ordered set of scalar values: (1,2,3,4)
An array is a variable that holds a list:
    my @a = (1,2,3,4);
You can access an individual array element (indices start at 0):
    print $a[1];     prints: 2
    $a[0] = 6;       @a is now: (6,2,3,4)
[Diagram: array cells with indices 0 to 3 holding scalar 1 to scalar 4]
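
The array operations above, put together as one small script you can run. The variable names $second and $last are made up for the illustration.

```perl
use strict;
use warnings;

my @a = (1, 2, 3, 4);

# Indices start at 0, so $a[1] is the second element
my $second = $a[1];
print "$second\n";     # prints 2

# Assigning to one element changes only that element
$a[0] = 6;
print "@a\n";          # prints 6 2 3 4

# The last of four elements has index 3
my $last = $a[3];
print "$last\n";       # prints 4
```

Note the change of symbol: the whole array is written @a, but a single element is a scalar and is therefore written $a[1].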

34 Reading and printing arrays
You can read lines from the standard input in list context:
    my @lines = <STDIN>;
@lines will store all the lines entered until the user enters ctrl-z.
Note: ctrl-z does not work in Perl Express… use the Command Prompt to run your script.

35 Class exercise 2
Write the following scripts:
1. Read several input lines (remember to use ctrl-z) and print the 3rd line
2. Read a number n from the first line of input, then read the rest of the lines and print the nth line

36 3. Controls: Ifs and Loops

37 Controls: if
Controls allow non-sequential execution of commands, and responding to different conditions:
    print "How old are you?\n";
    my $age = <STDIN>;    # Read number
    if ($age < 18) {
        print "How about some orange juice?\n";
    }
    else {
        print "Here is your beer!\n";
    }

38 Comparison operators
    Comparison                  Numeric    String
    Equal                       ==         eq
    Not equal                   !=         ne
    Less than                   <          lt
    Greater than                >          gt
    Less than or equal to       <=         le
    Greater than or equal to    >=         ge
Examples:
    if ($age == 18)...
    if ($name eq "Yossi")...
    if ($name ne "Yossi")...
    if ($name lt "n")...
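
Why two sets of operators? A sketch, with made-up variable names and fixed values instead of keyboard input, showing that the same two values can compare differently as numbers and as strings.

```perl
use strict;
use warnings;

my $age  = 18;
my $name = "Yossi";

# Numeric comparison decides which drink to offer
my $drink;
if ($age < 18) {
    $drink = "orange juice";
}
else {
    $drink = "beer";
}
print "Drink: $drink\n";              # prints Drink: beer

# String comparison checks the name
my $greeting = "Hello stranger";
if ($name eq "Yossi") {
    $greeting = "Hello Yossi";
}
print "$greeting\n";                  # prints Hello Yossi

# As numbers, 10 is not smaller than 9. As strings, "10" sorts before "9",
# because strings are compared letter by letter and "1" comes before "9".
my $numeric_less = "no";
my $string_less  = "no";
if (10 < 9)      { $numeric_less = "yes"; }
if ("10" lt "9") { $string_less  = "yes"; }
print "10 < 9: $numeric_less, \"10\" lt \"9\": $string_less\n";
```

So use the numeric operators for numbers (ages, coordinates, scores) and the string operators for text (names, sequence identifiers).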

39 Controls: Loops
Loops allow iterating over an array of inputs, and performing some actions for each input line:
    foreach my $line (@lines) {
        my $len = length $line;
        print "$len,";
    }
Output: 4,11,8,29,5,

40 Controls: Loops
Let’s say we want the average G-C content for a file of 300 sequences…
    foreach my $line (@lines) {
        $sum = $sum + $GC;
        $count = $count + 1;
    }
    print "average = ";
    print ($sum/$count);
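
The slide does not show how $GC is obtained for each sequence. One possible way, sketched here under the assumption that every array element holds one plain DNA sequence (no FASTA header lines), is to count the G and C letters with the tr/// operator, which is not covered in this workshop. The three short sequences and the names @seqs, $gc_letters and $average are made up for the illustration.

```perl
use strict;
use warnings;

# Three made-up sequences standing in for the lines of a real file
my @seqs = ("GGCC", "ATAT", "ATGC");

my $sum   = 0;
my $count = 0;
foreach my $seq (@seqs) {
    # With an empty replacement list, tr/// changes nothing and
    # returns the number of G and C letters it found
    my $gc_letters = ($seq =~ tr/GCgc//);

    # G-C content of this sequence, as a fraction between 0 and 1
    my $GC = $gc_letters / length($seq);

    $sum   = $sum + $GC;
    $count = $count + 1;
}

my $average = $sum / $count;
print "average = $average\n";    # prints average = 0.5
```

The three sequences have G-C contents of 1, 0 and 0.5, so the average is 0.5.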

41 Class exercise 3
1. Read several protein sequences in FASTA format (see for example the file “EHD.fasta” in the zip file from the workshop webpage), and print only their header lines (lines that start with “>”)
2.* Now print the last 20 amino acids of each sequence

42 Controls: Loops
We can also repeat a loop until something happens:
    while (length $name > 1) {
        $name = <STDIN>;
        chomp $name;
        print "Hello $name!\n";
    }
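
Another sketch of a loop that repeats until something happens, this time without keyboard input so that you can run it as is. The scenario (cells that double at every division) and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Start from a single cell and count the divisions needed
# to reach at least 1000 cells
my $cells     = 1;
my $divisions = 0;

while ($cells < 1000) {
    $cells     = $cells * 2;        # every cell divides into two
    $divisions = $divisions + 1;
}

print "After $divisions divisions there are $cells cells\n";
# prints After 10 divisions there are 1024 cells
```

The condition is checked before every round, so the loop stops as soon as $cells is no longer smaller than 1000.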

43 4. BioPerl

44 Using modules
A module or a package is a collection of functions, usually stored in a separate file with a “.pm” suffix (Perl Module). The functions of a module deal with a well-defined task, e.g. the file FileHandle.pm may contain a module of functions that read and write files, such as open_file, read_directory, etc.
In order to write a script that uses a module, add a “use” line at the beginning of the script:
    use FileHandle;
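
As a concrete sketch of a “use” line, the script below loads List::Util, a module that comes with Perl itself, and asks for two of its functions, sum and max. The choice of module and the list of numbers are ours, not from the slides.

```perl
use strict;
use warnings;
use List::Util qw(sum max);   # load the module and import two of its functions

# Made-up sequence lengths
my @lengths = (4, 11, 8, 29, 5);

my $total   = sum(@lengths);
my $longest = max(@lengths);

print "Total length: $total\n";       # prints Total length: 57
print "Longest sequence: $longest\n"; # prints Longest sequence: 29
```

The qw(...) list after the module name says which functions you want to call by their short names in your script.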

45 Installing modules from the internet
The best place to search for Perl modules that can make your life easier is:
The easiest way to download and install a module is to use the Perl Package Manager (part of the ActivePerl installation).
Note: ppm installs the packages under the directory “site\lib\” in the ActivePerl directory. You can put packages there manually if you would like to download them yourself from the net, instead of using ppm.
[Screenshot callouts: choose “View all packages”; enter module name]

46 BioPerl
A very extensive collection of modules with functions to handle all sorts of biological data:
– Genbank files
– DNA and protein sequences
– BLAST results
– Phylogenetic trees
BioPerl modules are called Bio::XXXXXX. You can see all available modules in: with documentation and examples for how to use them (click BioPerl Module Documentation).

47 Using modules
In order to write a script that uses a module, add a “use” line at the beginning of the script:
    use FileHandle;

48 BioPerl: reading blast output
We can use the module Bio::SearchIO to read a text file with blast results:
    use Bio::SearchIO;
Use the new command to create a Bio::SearchIO object and open the results file:
    my $blast_report = new Bio::SearchIO ('-format' => 'blast', '-file' => 'mice.blast');
There are three levels to blast results:
    $result = $blast_report->next_result    (a blast query)
    $hit = $result->next_hit                (a blast hit)
    $hsp = $hit->next_hsp                   (a “high scoring pair”: an alignment of a certain region)
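
The three levels can be walked with three nested while loops. The following is only a sketch: it assumes BioPerl is installed and that a BLAST text report named “mice.blast” (the file name from the slide) sits in the current directory; the counter $hsp_count and the printed messages are made up for the illustration. It writes Bio::SearchIO->new(...), which is another way of writing the “new Bio::SearchIO (...)” form shown above.

```perl
use strict;
use warnings;
use Bio::SearchIO;

my $file      = 'mice.blast';   # file name taken from the slide
my $hsp_count = 0;              # how many alignments we have seen

if (-e $file) {
    my $blast_report = Bio::SearchIO->new(-format => 'blast',
                                          -file   => $file);

    # Level 1: one result per query sequence
    while (my $result = $blast_report->next_result) {
        print "Query: ", $result->query_name, "\n";

        # Level 2: every database sequence that was hit by this query
        while (my $hit = $result->next_hit) {

            # Level 3: every aligned region (HSP) within this hit
            while (my $hsp = $hit->next_hsp) {
                $hsp_count = $hsp_count + 1;
                print "  hit ", $hit->name,
                      " e-value ", $hsp->evalue, "\n";
            }
        }
    }
}
else {
    print "File $file not found\n";
}

print "Number of HSPs read: $hsp_count\n";
```

Each next_... call returns the next item, or nothing when there are no more, which is what ends each while loop.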

49 Why do biologists need to program? A real life example: proto-oncogene activation by retrovirus insertion. c-Myc is an example of transformation caused by over- or misexpression (in wild-type cells c-Myc is expressed only during the G1 phase).

50 A real life example
Shmulik
    >tumor1
    TAGGAAGACTGCGGTAAGTCGTGATCTGAGCGGTTCCGTTACAGCTGCTA
    CCCTCGGCGGGGAGAGGGAAGACGCCCTGCACCCAGTGCTG...
    >tumor157
Run BLAST: click “Reformat these results”, choose “Show alignment as plain text”, click “view report” and save it to a text file:
    Sequences producing significant alignments:                   Score (bits)   E Value
    ref|NT_ |Mm15_39661_34 Mus musculus chromosome 15 genomic e-45
    ref|NT_ |Mm6_39393_34 Mus musculus chromosome 6 genomic c
    ref|NT_ |Mm9_39517_34 Mus musculus chromosome 9 genomic c
    ref|NT_ |Mm8_39502_34 Mus musculus chromosome 8 genomic c
    ref|NT_ |Mm3_39274_34 Mus musculus chromosome 3 genomic c
    ref|NT_ |Mm2_39247_34 Mus musculus chromosome 2 genomic c

    >ref|NT_ |Mm15_39661_34 Mus musculus chromosome 15 genomic contig, strain C57BL/6J
    Length =
    Score = 186 bits (94), Expect = 1e-45
    Identities = 100/102 (98%)
    Strand = Plus / Plus

    Query: 1 taggaagactgcggtaagtcgtgatctgagcggttccgttacagctgctaccctcggcgg 60
             ||||||||||||||| ||||||||||||||||||||||| ||||||||||||||||||||
    Sbjct:   taggaagactgcggtgagtcgtgatctgagcggttccgtaacagctgctaccctcggcgg

51 A Perl script can do it for you
Shmulik writes a simple Perl script to parse the blast results and find all hits that are in the myc locus, or up to 10kb from it:
    Use the BioPerl package SearchIO
    Open and read the file “mice.blast”
    Iteration, for each blast result: if we hit the genomic sequence “Mm15_39661_34” in the coordinates of the Myc locus (23,198, ,223,004), then print this hit (hit number and position in locus)

52 A Perl script can do it for you
    # Use the BioPerl package SearchIO
    use Bio::SearchIO;
    # Open file “mice.blast”
    my $blast_report = new Bio::SearchIO ('-format'=>'blast', '-file' =>'mice.blast');
    # Iterate over all blast results
    while (my $result = $blast_report->next_result) {
        print "Checking query ", $result->query_name, "...\n";
        my $hit = $result->next_hit();
        my $hsp = $hit->next_hsp();
        # For each blast hit, ask if we hit the genomic sequence “Mm15_39661_34”
        # in the coordinates of the Myc locus 23,198, ,223,004
        if ($hit->name() =~ m/Mm15_39661_34/
            && $hsp->hit->start() > ...
            && $hsp->hit->end() < ...) {
            # If so, print hit name and position
            print " hit ", $hit->name();
            print " (at position ", $hsp->hit->start(), ")\n";
        }
    }

53 A Perl script can do it for you
    Checking query tumor1...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor2...
    Checking query tumor3...
    Checking query tumor4...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor5...
    Checking query tumor6...
    Checking query tumor7...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor8...
    Checking query tumor9...
    Checking query tumor10...
    Checking query tumor11...
     hit ref|NT_ |Mm15_39661_34 (at position )
    Checking query tumor12...

54 Class exercise 4
1. Change the script “ex4.pl”: limit the search to the 2nd and 3rd exons, which are in coordinates: Now print just hits with e-value smaller than

55 BioPerl: the SeqIO module
The Bio::SeqIO module allows reading/writing sequences from/to files, using many file formats (fasta, Genbank, EMBL…):
    use Bio::SeqIO;
    my $in = new Bio::SeqIO("-file" => "inputFileName", "-format" => "embl");
    my $out = new Bio::SeqIO("-file" => ">outputFileName", "-format" => "fasta");
    while ( my $seqObj = $in->next_seq() ) {
        $out->write_seq($seqObj);
    }

56 BioPerl: the Seq module
The Bio::SeqIO function “next_seq” returns an object of the Bio::Seq module. This module provides functions like id, accession, length and subseq (read about them in the documentation!), and other functions such as revcom, translate, etc.:
    use Bio::SeqIO;
    my $in = new Bio::SeqIO("-file" => "inputfilename", "-format" => "fasta");
    while ( my $seqObj = $in->next_seq() ) {
        print "Sequence ",$seqObj->id(),"\n";
        print "First 10 bases ";
        print $seqObj->subseq(1,10),"\n";
    }
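
To try the Bio::Seq functions without preparing a sequence file, you can also build a sequence object directly in the script. This is a sketch that assumes BioPerl is installed; the six-letter sequence, the id “demo” and the variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Bio::Seq;

# Build a sequence object in memory instead of reading it from a file
my $seqObj = Bio::Seq->new(-id => 'demo', -seq => 'ATGGCC');

my $length  = $seqObj->length();              # number of bases
my $first3  = $seqObj->subseq(1, 3);          # bases 1 to 3 (counting from 1)
my $revcom  = $seqObj->revcom()->seq();       # reverse complement, as a string
my $protein = $seqObj->translate()->seq();    # translation, as a string

print "Sequence ", $seqObj->id(), " has $length bases\n";
print "First 3 bases: $first3\n";             # prints ATG
print "Reverse complement: $revcom\n";        # prints GGCCAT
print "Translation: $protein\n";              # prints MA
```

Note that subseq counts positions from 1, unlike substr, which counts from 0. revcom and translate return new sequence objects, which is why seq() is called on their results to get the letters.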

57 Class exercise 5
1. Change the script “ex5.1.pl”: use it to convert the file “pp2c.gb” from Genbank format to fasta format (write file “pp2c.fasta”)
2. Add to the script “ex5.2.pl”: print the sequence lengths
3.* Add to “ex5.2.pl”: calculate the average sequence length

58 Thanks for your patience, and see you in the full Perl course…