1
Lane Medical Library & Knowledge Management Center http://lane.stanford.edu
Nimble Perl Programming Using Scriptome
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
1/22/2009
2
Objectives
Determining whether Scriptome can …
1. Enable you to perform operations that are otherwise difficult, time-consuming, or error-prone?
2. Help you learn Perl?
And don’t worry: this experiment won’t hurt a bit!
Also, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …
3
So What Is Scriptome?
Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
Originally developed by Harvard’s FAS Center for Systems Biology
Maintained and extended by many volunteers not associated with Harvard
4
Why Bother With Scriptome?
Code is visible, so you can learn how to do things in Perl … or not
Can handle arbitrarily large files
  No size limitations, unlike e.g. Excel
Free; runs on everything: PC, Mac, Linux
It’s programmatic!
  Much faster than manual operations
  You can string operations together and save these in, e.g., a .bat file
5
How Do You Use Scriptome?
You tell Scriptome which function you want it to perform (more later)
  You can also string Scriptome functions into a protocol
Input: Scriptome operates on text files
  No binary files, but you could add that capability yourself
  E.g., process Excel files in native form using Perl modules such as ParseExcel
Output: printed to the command line or written to another file
6
Scriptome: Pick Your Flavor
http://sysbio.harvard.edu/csb/resources/computational/scriptome/
http://lane.stanford.edu/howto/index.html?id=_1257
7
Installing Scriptome - Windows
1. Download Scriptome_exe.tar.gz using this link:
   http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
   → Final location: I suggest C:\Program Files\Scriptome
2. Create a directory named “Scriptome”
3. Decompress Scriptome_exe.tar.gz by double-clicking
   → Notice the four files inside
4. Update the PATH variable: add this string at the END of the contents of the PATH variable:
   ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat
8
Scriptome Usage
1. Using a specific tool:
    Scriptome flags toolname [input_filenames] [> output_filename]
   Example:
    Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
    Scriptome -t tooltype
   where tooltype = Calc, Choose, Sort, Fetch, Merge, or Change
   Example:
    Scriptome -t Calc
Let’s examine each area briefly before going over specifics…
9
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
10
Examples and noteworthy tools
11
Calc Tool Examples - 1
Compute column sums:
    Scriptome -t calc_col_sum SubjectData1.tab
→ select columns to add
IMPORTANT: column numbers start at 0, not 1
Note visible Perl code → easy to modify, expand
    perl -e " $col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~ " file.tab
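For readers new to Perl, the calc_col_sum one-liner can be unpacked into a short commented script. This is only a sketch of the same logic, not Scriptome's own code: the sub name col_sum and the sample rows are made up for illustration, and unlike the one-liner it runs under strict and warnings and counts a missing field as 0.

```perl
use strict;
use warnings;

# Sum one column of tab-delimited lines.
# $col is zero-based, as in the one-liner: 0 selects the first column.
sub col_sum {
    my ($col, @lines) = @_;
    my $sum = 0;
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n//;   # strip a Unix or Windows line ending
        my @fields = split /\t/, $text;
        $sum += $fields[$col] // 0;        # a missing field counts as 0
    }
    return $sum;
}

# Hypothetical sample rows: subject ID, weight, height
my @rows = ("S1\t70\t1.80\n", "S2\t82\t1.75\r\n", "S3\t65\t1.62\n");

printf "Sum of column 1 for %d lines: %s\n", scalar(@rows), col_sum(1, @rows);
# prints: Sum of column 1 for 3 lines: 217
```

Column 1 here is the second column (the weights), which is the zero-based numbering the slide warns about.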
12
Calc Tool Examples - 2
Compute row sums:
    Scriptome -t calc_row_sum SubjectData1.tab
→ enter 1 for column 1, 2 for column 2, etc
    perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~ " in.tab
13
Change Tool Examples - 1
Create tab-delimited file from FASTA file:
    Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
→ change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files
    perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_.= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~; " seqs.fna
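The FASTA-to-tab conversion is easier to follow when written as a sub that builds one record per sequence. The sketch below is an illustrative rewrite, not Scriptome's code: the name fasta_to_tab and the sample sequences are hypothetical. Like the one-liner, it produces three columns per record (ID, description, sequence), but it collects records in memory instead of streaming them.

```perl
use strict;
use warnings;

# Convert FASTA lines into tab-delimited records: ID, description, sequence.
sub fasta_to_tab {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n//;   # strip a Unix or Windows line ending
        $text =~ s/\t/ /g;                 # tabs in the data would break the columns
        if ($text =~ s/^>//) {
            # Header line: the ID runs up to the first space, the rest is description
            my ($id, $desc) = split / /, $text, 2;
            push @records, [ $id // '', $desc // '', '' ];
        }
        elsif (@records) {
            # Sequence line: drop spaces and append to the current record
            $text =~ s/ //g;
            $records[-1][2] .= $text;
        }
    }
    return map { join "\t", @$_ } @records;
}

# Hypothetical two-record FASTA input
my @fasta = (">seq1 human gene\n", "ACGT\n", "TT AA\n", ">seq2\n", "GG\n");
print "$_\n" for fasta_to_tab(@fasta);
```

Each output line holds one whole sequence, which is the layout the other tab-based Scriptome tools expect.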
14
Change Tool Examples - 2
Change rows to columns or vice versa:
    Scriptome -t change_transpose_table SubjectData1.tab
Note: change_transpose_table operates on tab-delimited files
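The slide does not show the code behind change_transpose_table, so the following is only a sketch of how a tab-delimited table can be transposed in Perl, not the tool's actual implementation. The sub name transpose_table and the sample table are hypothetical, and this version holds the whole table in memory.

```perl
use strict;
use warnings;

# Transpose a tab-delimited table given as a list of lines.
# Returns the transposed rows as strings without line endings.
sub transpose_table {
    my @lines = @_;
    my @table;
    my $width = 0;
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n//;      # strip a Unix or Windows line ending
        my @fields = split /\t/, $text, -1;   # -1 keeps trailing empty fields
        $width = @fields if @fields > $width;
        push @table, \@fields;
    }
    my @out;
    for my $c (0 .. $width - 1) {
        # Short rows are padded with empty strings so the output stays rectangular
        push @out, join "\t", map { $_->[$c] // '' } @table;
    }
    return @out;
}

# Hypothetical table: one row per measurement, one column per subject
my @rows = ("id\tS1\tS2\n", "weight\t70\t82\n");
print "$_\n" for transpose_table(@rows);
```

With the sample table, the output has one row per subject: the first row reads id and weight, followed by one row each for S1 and S2.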
15
Change Tool Examples - 3
Convert a sequence file from one format to another:
    Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
→ enter ‘fasta’ as input format (no quotes)
→ enter ‘genbank’ as output format (no quotes)
change_bio_format_to_bio_format addresses the common problem of converting formats
Important: requires Bioperl to be installed
    perl -MBio::SeqIO -e " $informat= qq~genbank~; $outformat= qq~fasta~; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile, -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~ " myseqs.genbank > myseqs.fasta
* Notice anything interesting? *
16
Conclusions
Scriptome is …
A good solution for manipulating medium to large data files quickly and reliably
A way to learn Perl in a “real” context (no toy problems)
Able to perform a wide range of tasks, from simple, generic file manipulations to bio-specific complex tasks
17
Resources
For Perl help, see the resources listed in the description of Lane’s Perl Programming for Biologists workshop
Some recommended titles:
18
Polling Time: Do you think Scriptome will be useful to your research?
1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?