Lane Medical Library & Knowledge Management Center

Nimble Perl Programming Using Scriptome

Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
1/22/2009

Objectives

Determining whether Scriptome can:
1. Enable you to perform operations that are otherwise difficult, time-consuming, or error-prone
2. Help you learn Perl

And don’t worry: this experiment won’t hurt a bit!

We’ll also use anonymous polling to find out whether you’re happy with the material and the speed of delivery.

So What Is Scriptome?

- Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
- Originally developed by Harvard’s FAS Center for Systems Biology
- Maintained and extended by many volunteers not associated with Harvard

Why Bother With Scriptome?

- Code is visible, so you can learn how to do things in Perl … or not
- Can handle arbitrarily large files
  - No size limitations such as those of, e.g., Excel
- Free; runs on everything: PC, Mac, Linux
- It’s programmatic!
  - Much faster than manual operations
  - You can string operations together and save them in, e.g., a .bat file

How Do You Use Scriptome?

- You tell Scriptome which function you want it to perform (more later)
- You can also string Scriptome functions into a protocol
- Input: Scriptome operates on text files
  - No binary files, but you could add that capability yourself
  - E.g., process Excel files in native form using Perl modules such as ParseExcel
- Output: to the command line, or written to another file

Scriptome: Pick Your Flavor

Installing Scriptome - Windows

1. Download Scriptome_exe.tar.gz using this link: in/Scriptome_exe.tar.gz
   → Final location: I suggest C:/Program Files/Scriptome
2. Create a directory named “Scriptome”
3. Decompress Scriptome_exe.tar.gz by double-clicking
   → Notice the four files inside
4. Update the PATH variable: add this string at the END of the contents of the PATH variable:
   ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat

Scriptome Usage

1. Using a specific tool:
       Scriptome flags toolname [input_filenames] [> output_filename]
   Example:
       Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
       Scriptome -t tooltype
   where tooltype is one of: Calc, Choose, Sort, Fetch, Merge, Change
   Example:
       Scriptome -t Calc

Let’s examine each area briefly before going over specifics…

Polling Time: How’s the speed?

1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

Examples and noteworthy tools

Calc Tool Examples - 1

Compute column sums:
    Scriptome -t calc_col_sum SubjectData1.tab
→ select columns to add
IMPORTANT: column numbers start at 0, not 1

Note the visible Perl code → easy to modify and expand:
    perl -e " $col=1; while(<>) { @F = split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~ " file.tab

Calc Tool Examples - 2

Compute row sums:
    Scriptome -t calc_row_sum SubjectData1.tab
→ enter 1 for column 1, 2 for column 2, etc.

    perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F = split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~ " in.tab

Change Tool Examples - 1

Create a tab-delimited file from a FASTA file:
    Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
→ change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files

    perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_.= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~; " seqs.fna

Change Tool Examples - 2

Change rows to columns or vice versa:
    Scriptome -t change_transpose_table SubjectData1.tab
Note: change_transpose_table operates on tab-delimited files

Change Tool Examples - 3

Convert a sequence file from one format to another:
    Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
→ enter ‘fasta’ as input format (no quotes)
→ enter ‘genbank’ as output format (no quotes)

- change_bio_format_to_bio_format addresses the common problem of converting formats
- Important: requires Bioperl to be installed

    perl -MBio::SeqIO -e " $informat= qq~genbank~; $outformat= qq~fasta~; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile, -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~ " myseqs.genbank > myseqs.fasta

* Notice anything interesting? *

Conclusions

Scriptome is …
- A good solution for manipulating medium to large data files quickly and reliably
- A way to learn Perl in a “real” context (no toy problems)
- Able to perform a wide range of tasks, from simple, generic file manipulations to bio-specific complex tasks

Resources

- For Perl help, see the resources in the workshop description of Lane’s Perl Programming for Biologists
- Some recommended titles:

Polling Time: Do you think Scriptome will be useful to your research?

1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?