Lane Medical Library & Knowledge Management Center: Nimble Perl Programming Using Scriptome. Yannick Pouliot, PhD, Bioresearch Informationist.


1 Nimble Perl Programming Using Scriptome
Yannick Pouliot, PhD, Bioresearch Informationist
Lane Medical Library & Knowledge Management Center, http://lane.stanford.edu
1/22/2009

2 Objectives
Determining whether Scriptome can:
1. Enable you to perform operations that are otherwise difficult, time-consuming, or error-prone
2. Help you learn Perl
And don’t worry: this experiment won’t hurt a bit!
Also, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery.

3 So What Is Scriptome?
Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
Originally developed by Harvard’s FAS Center for Systems Biology
→ Maintained and extended by many more volunteers not associated with Harvard

4 Why Bother With Scriptome?
Code is visible, so you can learn how to do things in Perl … or not
Can handle arbitrarily large files
→ No size limitations such as those of Excel
Free; runs on everything: PC, Mac, Linux
It’s programmatic!
→ Much faster than manual operations
→ You can string operations together and save them in, e.g., a .bat file
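To illustrate why file size is not a constraint: the Perl one-liners shown later in this deck read their input one line at a time. The sketch below follows the same pattern. The sub name table_shape and the sample data are made up for illustration; this is not Scriptome's own code.

```perl
use strict;
use warnings;

# Read a tab-delimited stream one line at a time and report its shape.
# Only the current line is held in memory, so input size is not a constraint.
sub table_shape {
    my ($fh) = @_;
    my ($rows, $max_cols) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                 # strip Unix or Windows line ending
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty cells
        $max_cols = @fields if @fields > $max_cols;
        $rows++;
    }
    return ($rows, $max_cols);
}

# An in-memory filehandle stands in for a real (possibly huge) file here.
my $data = "id\tage\tweight\nS1\t34\t70\nS2\t41\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my ($rows, $cols) = table_shape($fh);
close $fh;
print "$rows rows, up to $cols columns\n";
```

For a real file, you would open the file name instead of the in-memory string; the loop itself would not change.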

5 How Do You Use Scriptome?
You tell Scriptome which function you want it to perform (more later)
You can also string Scriptome functions into a protocol
Input: Scriptome operates on text files
→ No binary files, but you could add that capability yourself
→ E.g., process Excel files in native form using Perl modules such as ParseExcel
Output: to the command line, or written into another file

6 Scriptome: Pick Your Flavor
http://sysbio.harvard.edu/csb/resources/computational/scriptome/
http://lane.stanford.edu/howto/index.html?id=_1257

7 Installing Scriptome - Windows
1. Download Scriptome_exe.tar.gz using this link: http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
→ Final location: I suggest C:/Program Files/Scriptome
2. Create a directory named “Scriptome”
3. Decompress Scriptome_exe.tar.gz by double-clicking
→ Notice the four files inside
4. Update the PATH variable: add this string at the END of the contents of the PATH variable:
;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat

8 Scriptome Usage
1. Using a specific tool:
Scriptome flags toolname [input_filenames] [> output_filename]
Example → Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
Scriptome -t tooltype
where tooltype = Calc, Choose, Sort, Fetch, Merge, or Change
Example → Scriptome -t Calc
Let’s examine each area briefly before going over specifics…

9 Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

10 Examples and noteworthy tools

11 Calc Tool Examples - 1
Compute column sums:
Scriptome -t calc_col_sum SubjectData1.tab
→ select columns to add
IMPORTANT: column numbers start at 0, not 1
Note the visible Perl code → easy to modify, expand
perl -e " $col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~ " file.tab
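As a learning aid, the column-sum one-liner above can be unpacked into a commented script. This is a sketch, not Scriptome's own source: the sub name column_sum and the sample rows are hypothetical, and treating empty cells as 0 is an assumption.

```perl
use strict;
use warnings;

# Sum one column of tab-delimited lines. $col is 0-based, as on the slide.
# Returns (sum, number_of_lines_read).
sub column_sum {
    my ($col, @lines) = @_;
    my ($sum, $count) = (0, 0);
    for my $line (@lines) {
        (my $row = $line) =~ s/\r?\n\z//;   # strip Unix or Windows line ending
        my @fields = split /\t/, $row;
        # Assumption: a missing or empty cell counts as 0. The one-liner
        # relies on Perl's implicit numeric conversion instead.
        my $value = $fields[$col];
        $sum += $value if defined $value && length $value;
        $count++;
    }
    return ($sum, $count);
}

# Hypothetical sample rows standing in for SubjectData1.tab.
my @sample = ("S1\t10\t2.5\n", "S2\t20\t3.5\r\n", "S3\t30\t4\n");
my ($sum, $n) = column_sum(1, @sample);
print "Sum of column 1 for $n lines: $sum\n";
```

With warnings enabled, a non-numeric cell such as a header label would trigger a warning, which is one reason to skip header rows before summing.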

12 Calc Tool Examples - 2
Compute row sums:
Scriptome -t calc_row_sum SubjectData1.tab
→ enter 1 for column 1, 2 for column 2, etc
perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~ " in.tab
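The row-sum one-liner can be expanded the same way. In this sketch the column indexes are passed directly to the array subscript, exactly as in the one-liner, so they are 0-based here. The sub name add_row_sums and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Append the sum of the chosen columns to each tab-delimited line.
# $cols is an array reference of 0-based column indexes.
sub add_row_sums {
    my ($cols, @lines) = @_;
    my @out;
    for my $line (@lines) {
        (my $row = $line) =~ s/\r?\n\z//;   # strip Unix or Windows line ending
        my @fields = split /\t/, $row;
        my $sum = 0;
        for my $col (@$cols) {
            my $value = $fields[$col];
            # Assumption: a missing or empty cell counts as 0.
            $sum += $value if defined $value && length $value;
        }
        push @out, "$row\t$sum";            # original line plus one new column
    }
    return @out;
}

# Hypothetical sample rows; column 0 holds a label, columns 1-3 hold numbers.
my @sample = ("S1\t1\t2\t3\n", "S2\t4\t5\t6\n");
print "$_\n" for add_row_sums([1, 2, 3], @sample);
```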

13 Change Tool Examples - 1
Create tab-delimited file from FASTA file:
Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
→ change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files
perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_.= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~; " seqs.fna
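To make the logic of the FASTA one-liner easier to follow, here is a longer sketch that yields the same three columns (ID, description, sequence) for each record. It collects records in memory rather than printing as it goes, and it omits the record and length counts, so it is a simplification. The sub name fasta_to_tab and the sample sequences are hypothetical.

```perl
use strict;
use warnings;

# Convert FASTA lines into one "ID<TAB>description<TAB>sequence" string per record.
sub fasta_to_tab {
    my @lines = @_;
    my @records;
    my ($id, $desc, $seq);
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n\z//;       # strip Unix or Windows line ending
        $text =~ s/\t/ /g;                       # tabs would break the output columns
        if ($text =~ s/^>//) {                   # header line starts a new record
            push @records, join("\t", $id, $desc, $seq) if defined $id;
            ($id, $desc) = split / /, $text, 2;  # first space separates ID from description
            $id   = '' unless defined $id;
            $desc = '' unless defined $desc;
            $seq  = '';
        }
        elsif (defined $id) {                    # sequence line: drop spaces, append
            $text =~ s/ //g;
            $seq .= $text;
        }
    }
    push @records, join("\t", $id, $desc, $seq) if defined $id;   # flush last record
    return @records;
}

# Hypothetical two-record FASTA input.
my @fasta = (">seq1 first protein\n", "MKT AYI\n", "AKQR\n", ">seq2\n", "GATTACA\n");
print "$_\n" for fasta_to_tab(@fasta);
```

A record without a description still gets an empty middle column, so every output line has the same number of tabs.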

14 Change Tool Examples - 2
Change rows to columns or vice versa:
Scriptome -t change_transpose_table SubjectData1.tab
Note: change_transpose_table operates on tab-delimited files
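The slide does not show the Perl code behind change_transpose_table, so the following is only one possible way to transpose a tab-delimited table in Perl, not Scriptome's implementation. The sub name transpose_table and the sample table are hypothetical. Unlike the line-at-a-time one-liners, this sketch holds the whole table in memory.

```perl
use strict;
use warnings;

# Transpose tab-delimited lines: row i, column j becomes row j, column i.
# Short rows are padded with empty cells so the output stays rectangular.
sub transpose_table {
    my @lines = @_;
    my @rows;
    my $width = 0;
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n\z//;      # strip Unix or Windows line ending
        my @fields = split /\t/, $text, -1;     # -1 keeps trailing empty cells
        $width = @fields if @fields > $width;
        push @rows, \@fields;
    }
    my @out;
    for my $col (0 .. $width - 1) {
        push @out, join "\t", map { defined $_->[$col] ? $_->[$col] : '' } @rows;
    }
    return @out;
}

# Hypothetical sample table: a header row and two subjects.
my @table = ("id\tage\n", "S1\t34\n", "S2\t41\n");
print "$_\n" for transpose_table(@table);
```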

15 Change Tool Examples - 3
Convert sequence files from one format to another:
Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
→ enter ‘fasta’ as input format (no quotes)
→ enter ‘genbank’ as output format (no quotes)
change_bio_format_to_bio_format addresses the common problem of converting formats
Important: requires Bioperl to be installed
perl -MBio::SeqIO -e " $informat= qq~genbank~; $outformat= qq~fasta~; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile, -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~ " myseqs.genbank > myseqs.fasta
* Notice anything interesting? *

16 Conclusions
Scriptome is …
A good solution for manipulating medium to large data files quickly and reliably
A way to learn Perl in a “real” context (no toy problems)
Able to perform a wide range of tasks, from simple, generic file manipulations to complex, bio-specific tasks

17 Resources
For Perl help, see the resources in the workshop description for Lane’s Perl Programming for Biologists
Some recommended titles:

18 Polling Time: Do you think Scriptome will be useful to your research?
1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?
