Lane Medical Library & Knowledge Management Center: Nimble Perl Programming Using Scriptome. Yannick Pouliot, PhD, Bioresearch Informationist.


Session date: 1/22/2009

Objectives
Determine whether Scriptome can:
1. Enable you to perform operations that would otherwise be difficult, time-consuming, or error-prone.
2. Help you learn Perl.
And don’t worry: this experiment won’t hurt a bit! We’ll also use anonymous polling to find out whether you’re happy with the material and the speed of delivery.

So What Is Scriptome?
Scriptome is a locally installed Perl program that performs a variety of data-manipulation tasks useful to biologists.
- Originally developed by Harvard’s FAS Center for Systems Biology
- Now maintained and extended by many more volunteers not affiliated with Harvard

Why Bother With Scriptome?
- The code is visible, so you can learn how to do things in Perl … or not
- It handles arbitrarily large files: no size limits like those of, e.g., Excel
- Free; runs on everything: PC, Mac, Linux
- It’s programmatic!
  - Much faster than manual operations
  - You can string operations together and save them in, e.g., a .bat file
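Why "arbitrarily large files"? The one-liners shown in this session read their input with while(<>), one line at a time, so only the current line is held in memory. Below is a minimal stand-alone sketch of that pattern. It is not Scriptome’s own code, and the sub name stream_stats is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: read a text file one line at a time, as the Scriptome one-liners do
# with while(<>). Only the current line is held in memory, so memory use does
# not grow with file size. stream_stats is a hypothetical name, not Scriptome code.
use strict;
use warnings;

# Reads $fh line by line; returns (line count, character count without line endings).
sub stream_stats {
    my ($fh) = @_;
    my ($lines, $chars) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip Unix (\n) or Windows (\r\n) line endings
        $lines++;
        $chars += length $line;
    }
    return ($lines, $chars);
}

# Run only when given file names, so the sub can also be loaded and tested on its own.
if (!caller && @ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my ($lines, $chars) = stream_stats($fh);
        close $fh;
        print "$file\t$lines lines\t$chars characters\n";
    }
}
```

Usage, assuming the script is saved as stream_stats.pl: perl stream_stats.pl LONGhmcad.fst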

How Do You Use Scriptome?
- You tell Scriptome which function you want it to perform (more on this later)
- You can also string Scriptome functions together into a protocol
- Input: Scriptome operates on text files
  - No binary files, but you could add that capability yourself, e.g., process Excel files in their native form using a Perl module such as ParseExcel (Spreadsheet::ParseExcel on CPAN)
- Output: printed to the command line, or written to another file

Scriptome: Pick Your Flavor

Installing Scriptome - Windows
1. Download Scriptome_exe.tar.gz using this link: in/Scriptome_exe.tar.gz
   → Final location: I suggest C:\Program Files\Scriptome
2. Create a directory named "Scriptome"
3. Decompress Scriptome_exe.tar.gz by double-clicking it
   → Notice the four files inside
4. Update the PATH variable: add this string at the END of its current contents:
   ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat
   Note: Windows searches the folders listed in PATH, so the folder that holds Scriptome.bat and ScriptPack.bat (C:\Program Files\Scriptome) must itself be on PATH for the Scriptome command to work from any directory.

Scriptome Usage
1. Using a specific tool:
   Scriptome flags toolname [input_filenames] [> output_filename]
   Example: Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
   Scriptome -t tooltype
   where tooltype = Calc, Choose, Sort, Fetch, Merge, or Change
   Example: Scriptome -t Calc
Let’s examine each area briefly before going over the specifics…

Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous

Examples and Noteworthy Tools

Calc Tool Examples - 1
Compute column sums: Scriptome -t calc_col_sum SubjectData1.tab
→ select the column to add up
IMPORTANT: column numbers start at 0, not 1
Note the visible Perl code → easy to modify and expand:
perl -e " $col=1; while(<>) { @F = split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~ " SubjectData1.tab
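As an example of "easy to modify and expand," here is a sketch of calc_col_sum rewritten as a strict script that sums several columns in one pass. Several choices belong to this sketch and are not Scriptome behavior: the column list (1, 2), the sub name add_to_sums, and skipping non-numeric cells so that a header row does no harm.

```perl
#!/usr/bin/perl
# Sketch: calc_col_sum expanded to sum several columns in one pass.
# Columns are numbered from 0, as in the Scriptome code ($F[0] is the first field).
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical choice of columns to sum; edit to suit your file.
my @COLS = (1, 2);

# Adds the numeric values found in the requested columns of one
# tab-delimited line to the running totals in @$sums.
# Non-numeric or empty cells (e.g. a header row) are skipped.
sub add_to_sums {
    my ($sums, $cols, $line) = @_;
    $line =~ s/\r?\n\z//;
    my @F = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    for my $i (0 .. $#$cols) {
        my $value = $F[ $cols->[$i] ];
        $sums->[$i] += $value if defined $value && looks_like_number($value);
    }
    return;
}

# Run only when given file names, so the sub can also be loaded and tested on its own.
if (!caller && @ARGV) {
    my @sums = (0) x @COLS;
    while (my $line = <>) {
        add_to_sums(\@sums, \@COLS, $line);
    }
    warn "\nSums of columns @COLS for $. lines\n\n";
    print join("\t", @sums), "\n";
}
```

Run it the same way as the one-liner, e.g. perl col_sums.pl SubjectData1.tab (the script name is hypothetical). It prints one tab-separated total per chosen column.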

Calc Tool Examples - 2
Compute row sums: Scriptome -t calc_row_sum SubjectData1.tab
→ enter the numbers of the columns to add, e.g. 1, 2, 3 (counted from 0, as above)
perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F = split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~ " SubjectData1.tab
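For learning purposes, here is the row-sum one-liner written out as a strict, commented script. It is a sketch, not Scriptome’s own code. The name append_row_sum is hypothetical, and skipping non-numeric cells such as NA is an addition the one-liner does not make.

```perl
#!/usr/bin/perl
# Sketch: the calc_row_sum one-liner written out as a strict script.
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Columns to add, counted from 0 (same example values as the one-liner).
my @COLS = (1, 2, 3);

# Returns the line, minus its line ending, with the sum of the requested
# columns appended as one more tab-separated field.
sub append_row_sum {
    my ($line, $cols) = @_;
    $line =~ s/\r?\n\z//;    # otherwise the sum would be printed on the next line
    my @F = split /\t/, $line, -1;
    my $sum = 0;
    for my $col (@$cols) {
        my $value = $F[$col];
        # Skip missing, blank, or non-numeric cells such as "NA".
        $sum += $value if defined $value && looks_like_number($value);
    }
    return "$line\t$sum";
}

# Run only when given file names, so the sub can also be loaded and tested on its own.
if (!caller && @ARGV) {
    while (my $line = <>) {
        print append_row_sum($line, \@COLS), "\n";
    }
    warn "\nSum of columns @COLS for each line ($. lines)\n\n";
}
```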

Change Tool Examples - 1
Create a tab-delimited file from a FASTA file: Scriptome -t change_fasta_to_tab LONGhmcad.fst
→ change_fasta_to_tab is an important tool because many Scriptome tools work on tab-delimited files
perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_.= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~; " seqs.fna
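Here is a commented, strict rewrite of the change_fasta_to_tab one-liner, offered as a sketch rather than Scriptome’s own code. It produces the same three columns (ID, description, sequence). Unlike the one-liner, it collects each record before printing it and removes all whitespace from sequence lines, not just spaces. The sub name is hypothetical, and the // operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
# Sketch: a commented, strict version of the change_fasta_to_tab one-liner.
# Output: one line per record, "id<TAB>description<TAB>sequence".
use strict;
use warnings;

# Reads FASTA from $in and prints tabular records to $out.
# Returns (number of records, total sequence length).
sub fasta_to_tab {
    my ($in, $out) = @_;
    my ($count, $len) = (0, 0);
    my ($header, $seq) = (undef, '');

    # Prints the record collected so far, if there is one.
    my $flush = sub {
        return unless defined $header;
        my ($id, $desc) = split / /, $header, 2;    # split at the first space only
        print {$out} join("\t", $id // '', $desc // '', $seq), "\n";
    };

    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;
        if ($line =~ s/^>//) {                  # a header line starts a new record
            $flush->();
            ($header = $line) =~ tr/\t/ /;      # tabs in headers would break the columns
            $seq = '';
            $count++;
        }
        else {                                  # sequence line: drop whitespace
            $line =~ s/\s+//g;
            $seq .= $line;
            $len += length $line;
        }
    }
    $flush->();                                 # the last record has no following header
    return ($count, $len);
}

# Run only when given file names, so the sub can also be loaded and tested on its own.
if (!caller && @ARGV) {
    my ($count, $len) = (0, 0);
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my ($c, $l) = fasta_to_tab($fh, \*STDOUT);
        close $fh;
        $count += $c;
        $len   += $l;
    }
    warn "\nConverted $count FASTA records to tabular format\nTotal sequence length: $len\n\n";
}
```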

Change Tool Examples - 2
Change rows to columns or vice versa: Scriptome -t change_transpose_table SubjectData1.tab
Note: change_transpose_table operates on tab-delimited files
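The slide shows no code for change_transpose_table, so here is an independent sketch of a transpose in Perl. It is not the tool’s actual implementation. Note the trade-off: unlike the line-by-line tools, it has to hold the whole table in memory before it can print anything. The name transpose_table is hypothetical, and // needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
# Sketch (not Scriptome's code): transpose a tab-delimited table so rows become columns.
# Unlike the line-by-line tools, a transpose must read the whole table before it can
# print the first output line, so memory use grows with the table size.
use strict;
use warnings;

# Reads a tab-delimited table from $fh and returns the transposed table as a list of lines.
# Short rows are padded with empty cells so every output line has the same number of fields.
sub transpose_table {
    my ($fh) = @_;
    my @rows;
    my $max_cols = 0;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        $max_cols = @fields if @fields > $max_cols;
        push @rows, \@fields;
    }
    my @out;
    for my $c (0 .. $max_cols - 1) {
        push @out, join "\t", map { $_->[$c] // '' } @rows;
    }
    return @out;
}

# Run only when given a file name, so the sub can also be loaded and tested on its own.
if (!caller && @ARGV) {
    my ($file) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    print "$_\n" for transpose_table($fh);
    close $fh;
}
```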

Change Tool Examples - 3
Convert between sequence formats: Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
→ enter fasta as the input format (no quotes)
→ enter genbank as the output format (no quotes)
change_bio_format_to_bio_format addresses the common problem of converting between formats
Important: requires BioPerl to be installed
perl -MBio::SeqIO -e " $informat = qq~genbank~; $outformat = qq~fasta~; $count = 0; foreach $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile, -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~ " myseqs.genbank > myseqs.fasta
* Notice anything interesting? *

Conclusions
Scriptome is …
- A good solution for manipulating medium to large data files quickly and reliably
- A way to learn Perl in a "real" context (no toy problems)
- Able to perform a wide range of tasks, from simple, generic file manipulations to complex, bio-specific ones

Resources
For Perl help, see the resources listed in the workshop description for Lane’s Perl Programming for Biologists.
Some recommended titles:

Polling Time: Do you think Scriptome will be useful to your research?
1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?
