Taverna and SoapLab Experience @ Elda Rossi – CINECA (Italy)

What is CINECA
CINECA is a consortium of Italian universities and the CNR (National Research Council), founded in 1969 and now under the control of the Ministry of University and Research.

Resources
The most important national infrastructure in Italy for computational support to scientific research.
Mission: promoting the use of the most advanced computing systems to support public and private scientific and technological research.

R & Bioconductor
Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data. It is based on R, a language and environment for statistical computing and graphics.

R & Bioconductor (2)
Bioconductor is a collection of packages of two main types: 1. packages that provide basic infrastructure support; 2. packages that provide innovative methodology. We chose a function from the affy package (type 2).

The affy package
Package: affy
Description: The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets only concerns a few convenience functions; 'affy' is fully functional without it.
Version: 1.5.8-1
Author: Rafael A. Irizarry, Laurent Gautier, Benjamin Milo Bolstad, and Crispin Miller with contributions from …
Maintainer: Rafael A. Irizarry
Dependencies: R (>= 1.9.0), Biobase (>= 1.4.22), reposTools
Suggests: tkWidgets (>= 1.2.2), affydata
SystemRequirements: None
License: LGPL version 2 or newer
URL: None available
Function used: expresso, from raw probe intensities to expression values

The expresso function
Expression measures: the most common operation is converting probe-level data to expression values. This involves the following steps, each with a choice of methods:
1. reading in probe-level data;
2. background correction (4 methods);
3. normalization (7 methods);
4. probe-specific background correction, e.g. subtracting MM (3 methods);
5. summarizing the probe set values into one expression measure and, in some cases, a standard error for this summary (5 methods).
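Step 5 is where summary.method="medianpolish", used in the commands below, comes in. Assuming it works as we understand affy's implementation (Tukey's median polish fitted to the log2 PM intensities of one probe set, probes as rows and arrays as columns, with overall effect plus column effect reported as the expression value), the arithmetic can be sketched in plain Python. The function names are ours; this is an illustration, not affy's code, and it does not handle missing values.

```python
import math
from statistics import median


def median_polish(table, max_iter=10, eps=0.01):
    """Tukey median polish of a rows x columns table (list of lists).

    Returns (overall, row_effects, col_effects, residuals), using the same
    sweep order and stopping rule as R's stats::medpolish.
    """
    z = [list(map(float, row)) for row in table]
    n_rows, n_cols = len(z), len(z[0])
    row_eff, col_eff = [0.0] * n_rows, [0.0] * n_cols
    overall, old_sum = 0.0, 0.0
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop when the sum of absolute residuals no longer changes much.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def summarize_medianpolish(pm):
    """One expression value per array for a single probe set.

    pm: PM intensities, rows = probes, columns = arrays.
    """
    logged = [[math.log2(v) for v in row] for row in pm]
    overall, _, col_eff, _ = median_polish(logged)
    return [overall + c for c in col_eff]
```

For example, a probe set with three probes on two arrays, `summarize_medianpolish([[2, 4], [8, 16], [32, 64]])`, gives `[3.0, 4.0]`: the second array is one log2 unit (two-fold) higher. Using medians rather than means is what makes the summary robust to a few outlying probes.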

How to run expresso
Files involved: the input CEL files (Data.CEL), the output expression table (Data.out) and, in batch mode, the R script and its report.

Interactive session:
$ R
> library(affy)
> data<-ReadAffy()
> data.mas<-expresso(data, bgcorrect.method="mas", pmcorrect.method="mas", normalize.method="constant", summary.method="medianpolish")
> write.exprs(data.mas, file="Data.out")

Batch run ($ R CMD BATCH script), where the file script contains:
library(affy)
data<-ReadAffy()
data.mas<-expresso(data, bgcorrect.method="mas", pmcorrect.method="mas", normalize.method="constant", summary.method="medianpolish")
write.exprs(data.mas, file="Data.out")
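The batch variant is the one a web service can call, since it needs no interactive session. ReadAffy() with no arguments reads the CEL files in the working directory, so the script has to run where the data lives. A hedged Python sketch of generating and launching such a script; build_expresso_script and run_batch are illustrative names, and run_batch assumes an R executable on the PATH.

```python
import subprocess
from pathlib import Path


def build_expresso_script(bgcorrect="mas", pmcorrect="mas",
                          normalize="constant", summary="medianpolish",
                          outfile="Data.out"):
    """Return the text of an R batch script equivalent to the one above."""
    return "\n".join([
        "library(affy)",
        "data <- ReadAffy()",
        "data.mas <- expresso(data,",
        f'    bgcorrect.method="{bgcorrect}",',
        f'    pmcorrect.method="{pmcorrect}",',
        f'    normalize.method="{normalize}",',
        f'    summary.method="{summary}")',
        f'write.exprs(data.mas, file="{outfile}")',
        "",
    ])


def run_batch(workdir, script_text, r_exe="R"):
    """Write the script into workdir and run `R CMD BATCH script` there.

    Raises CalledProcessError if R exits with a non-zero status; R writes
    its report to script.Rout in the same directory.
    """
    Path(workdir, "script").write_text(script_text)
    subprocess.run([r_exe, "CMD", "BATCH", "script"], cwd=workdir, check=True)
    return Path(workdir, "script.Rout")
```

Generating the script from parameters is the same idea the Perl wrapper uses; the parameters should be validated before they are pasted into R code.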

The files
CEL file (input, excerpt):
[CEL]
Version=3
[HEADER]
Cols=126
Rows=126
TotalX=126
TotalY=126
Baseline=Not normalized
DatHeader=ctrl150:CLS=1167 …
[INTENSITY]
NumberCells=15876
CellHeader=X Y MEAN
0 0 551.0
1 0 10651.0
2 0 642.0
3 0 10855.0
4 0 278.0
5 0 452.0
6 0 11139.0

OUT file (output, excerpt):
Sample001.cel Sample002.cel Sample003.cel
100084_at 2.68016528652511 2.75619854567269 3.82550383255225
101482_at 2.41830136307405 2.19230548692681 3.4173900695363
31962_at 12.3667390890414 12.4534076075796 12.8658623516881
32466_at 12.4078453130306 12.5262787728982 13.2129784659009
35201_at 6.73875347104673 6.36824635919863 7.53465018481639
36189_at 6.91195864883172 6.77835938949316 7.94585515997792
36678_at 10.0269997503136 9.76893096184106 11.1443619988943
37001_at 8.7690698709579 8.57322443505215 9.80956768540462
37029_at 7.58176898579828 7.24297853600119 8.67002397585278
37046_at 4.7250160934765 4.7250160934765 5.68254863921313
37189_at 7.08125646141077 7.0999566997911 7.92512679504857
37719_at 5.33679629782696 5.33679629782696 6.39140386282694
37725_at 7.634367429284 7.41050271151406 8.85664197069339
38437_at 7.54693596951725 7.16216316289552 8.3816810916508
38730_at 7.61959398527742 7.65907193898742 9.00657184492387
39425_at 6.07663839694708 6.03298499862286 7.14769809957403
40276_at 6.33983152588017 6.21300599988174 6.85968858773872

The CEL file describes the intensities determined for every feature on a chip, without providing information about which probes correspond to which probe sets (that information is provided by the CDF or 1LQ file). This format is not a fixed standard: Affymetrix does not directly support file access, and the format may change in the future. What we describe here explains CEL files in the context of this transcriptome data set only; not all of the statements apply to CEL files for other Affymetrix chips. The file is a simple text file.
There are a number of header and footer lines; the relevant data begins after the line containing the following text (line 24):
CellHeader=X Y MEAN STDV NPIXELS
and ends at the line containing:
[MASKS]
The relevant lines are tab-delimited. There is one line for every probe on the chip, with the following fields:
field 1 = X, the x-coordinate of the probe;
field 2 = Y, the y-coordinate of the probe;
field 3 = MEAN, the 75th percentile of the pixel intensities in the stripped feature;
field 4 = STDV, the standard deviation of the pixel intensities in the stripped feature;
field 5 = NPIXELS, the number of pixels in the stripped feature.
Here "stripped feature" means what remains after the outer layer of pixels has been stripped away.
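Because the data block is delimited by the CellHeader= line and the next section header, a reader can key on those markers instead of a fixed line number. A minimal Python sketch, assuming the text (version 3) layout described above; parse_cel_v3_intensities is our name, it keeps only X, Y and MEAN, and in the pipeline itself ReadAffy() does this job.

```python
def parse_cel_v3_intensities(text):
    """Map (x, y) -> MEAN for every cell in the [INTENSITY] section of a v3 CEL file."""
    intensities = {}
    expected = None
    in_intensity = False   # inside the [INTENSITY] section
    in_data = False        # past the CellHeader= line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Any section header (e.g. [MASKS]) ends the data block.
            in_intensity = line == "[INTENSITY]"
            in_data = False
            continue
        if not in_intensity or not line:
            continue
        if line.startswith("NumberCells="):
            expected = int(line.split("=", 1)[1])
        elif line.startswith("CellHeader="):
            in_data = True
        elif in_data:
            # Tab-delimited: X, Y, MEAN (75th percentile), then STDV, NPIXELS.
            fields = line.split()
            x, y = int(fields[0]), int(fields[1])
            intensities[(x, y)] = float(fields[2])
    if expected is not None and len(intensities) != expected:
        raise ValueError(f"expected {expected} cells, found {len(intensities)}")
    return intensities
```

The NumberCells check is a cheap sanity test: for the chip above it should be 126 × 126 = 15876.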

Setting up SoapLab
A Linux-based server was chosen (vega.cineca.it); Tomcat was installed (Tomcat 5.0.28); Java was upgraded (Java 1.4); Axis was installed (Axis 1.1); SoapLab was installed (precompiled for SuSE Linux).
Up to here: no problems!

Defining the application
1. Write the application wrapper.
2. Write the ACD file for the application.
3. Convert ACD to XML.
4. Start up the SoapLab server.
5. Deploy the new service.

1. Write the application wrapper
/biotools/services/affy-expresso.pl

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# command arguments (with defaults)
my $bgcorrect = "mas";
my $normalize = "constant";
GetOptions("bgcorrect=s" => \$bgcorrect,
           "normalize=s" => \$normalize) or die "invalid options\n";

# location of R executable
my $rexe = "/biotools/R/R-2.1.0/bin/R";
# data directory
my $datadir = "/biotools/services/data";

# R code to run the analysis
open(my $affy, ">", "$datadir/affy") or die "cannot write $datadir/affy: $!\n";
print $affy <<EOF;
library(affy)
data<-ReadAffy()
data.mas<-expresso(data, bgcorrect.method="$bgcorrect", pmcorrect.method="mas", normalize.method="$normalize", summary.method="medianpolish")
write.exprs(data.mas,file="data.txt")
EOF
close($affy);

# now run the program
system("cd $datadir && $rexe CMD BATCH affy") == 0 or die "R CMD BATCH failed\n";

# print the output
open(my $out, "<", "$datadir/data.txt") or die "cannot read $datadir/data.txt: $!\n";
print while <$out>;
close($out);
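One weakness of the wrapper is that the values of bgcorrect and normalize are interpolated into the R code unchecked, so any string reaches R. A minimal Python sketch of the same option handling with a whitelist; the option names and defaults come from the wrapper, while the method lists are a partial, assumed selection that should be checked against what affy reports on your installation (for example bgcorrect.methods()).

```python
import argparse

# Assumed, partial whitelists of method names accepted by affy's expresso().
BGCORRECT_METHODS = ("mas", "rma", "none")
NORMALIZE_METHODS = ("constant", "quantiles", "loess", "invariantset")


def parse_wrapper_args(argv=None):
    """Mirror the wrapper's GetOptions call, but reject unknown method names."""
    parser = argparse.ArgumentParser(prog="affy-expresso")
    # Getopt::Long accepts both -name and --name, so accept both here too.
    parser.add_argument("-bgcorrect", "--bgcorrect", default="mas",
                        choices=BGCORRECT_METHODS,
                        help="background correction method (default: mas)")
    parser.add_argument("-normalize", "--normalize", default="constant",
                        choices=NORMALIZE_METHODS,
                        help="normalization method (default: constant)")
    return parser.parse_args(argv)
```

An unknown value makes argparse print a usage message and exit with status 2, so nothing malformed is ever written into the R script.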

2. Write the ACD file
/biotools/soapbin/analysis-interfaces/metadata/affy.acd

appl: bioconductor [
  documentation: "affy/expresso function of BioConductor"
  version: "1.0"
  groups: "Microarrays"
  nonemboss: "Y"
  executable: affy-expresso.pl
]

string: bgcorrect [
  additional: "Y"
  parameter: "Y"
  default: "mas"
]

string: normalize [
  additional: "Y"
  default: "constant"
]

outfile: output [
  default: "stdout"
]

Notes: the path to the executable is defined in the shell; input 1 is the background correction method, input 2 the normalization method; the output is the standard output.

3, 4, 5: Final steps
3. Convert ACD to XML: /biotools/soapbin/analysis-interfaces/generator/acd2xml
   from ../metadata/affy.acd to ../metadata/microarrays/affy-al.xml
4. Start up the SoapLab server: /biotools/soapbin/analysis-interfaces/run-AppLab-server
   (How to shut down the server?)
5. Deploy the new service: /biotools/soapbin/analysis-interfaces/ws/deploy-web-services

Using the service from Taverna
In the Available services window, select "Add new SoapLab scavenger" and enter our server address: http://vega.cineca.it:8082/axis/services

Using the service … (2)
The new processor appears in the microarrays folder, where you can find the affy service. After connecting the input and output ports, the service can be launched.
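What reaches the Taverna output port is the wrapper's standard output, i.e. the table written by write.exprs, as in the OUT file sample. A small Python sketch for turning that text into a usable structure, assuming whitespace-separated fields and a header row holding only the sample names; parse_expression_table is our name. With summary.method="medianpolish" the values should be on a log2 scale, so a difference of 1 between two samples corresponds to a two-fold change.

```python
def parse_expression_table(text):
    """Return (sample_names, {probe_set_id: [expression value per sample]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The header lists only the sample (CEL file) names, one fewer field
    # than the data rows, which start with the probe set ID.
    samples = lines[0].split()
    table = {}
    for ln in lines[1:]:
        probe_set, *values = ln.split()
        if len(values) != len(samples):
            raise ValueError(
                f"{probe_set}: {len(values)} values for {len(samples)} samples")
        table[probe_set] = [float(v) for v in values]
    return samples, table
```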

Problems encountered
Documentation is not very clear or complete.
How can we transfer (large) files from the personal WS to the server machine?
We need a permanent and private data area for storing data.
We would like to monitor the service while it is running; batch support (asynchronous services?).
How can we return data in addition to stdout and stderr?
…

A possible (future) workflow
1. Upload one or more CEL files to the server (WS-upload).
2. Analyse the data and get expression levels (WS-expresso, WS-plot).
3. Verify the output data: OK? (YES / NO)
4. If OK: download the output data and clear the personal space.
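The flowchart can be read as a small driver loop. A sketch in Python, assuming hypothetical client stubs for the planned WS-upload, WS-expresso and WS-plot services (none of which exist yet) plus a verify step and download/clear operations. Since the slide does not say where the NO branch leads, we assume it means re-running the analysis with a different parameter set.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class WorkflowServices:
    """Hypothetical client stubs for the planned services."""
    upload: Callable[[List[str]], str]      # WS-upload: CEL files -> personal space id
    expresso: Callable[[str, dict], Any]    # WS-expresso: expression levels
    plot: Callable[[str, Any], Any]         # WS-plot: diagnostic plot
    verify: Callable[[Any, Any], bool]      # the "OK ?" decision
    download: Callable[[str, Any], Any]     # fetch the output data
    clear: Callable[[str], None]            # clear the personal space


def run_expression_workflow(cel_files: List[str], param_sets: Iterable[dict],
                            ws: WorkflowServices) -> Optional[Any]:
    space = ws.upload(cel_files)
    for params in param_sets:
        result = ws.expresso(space, params)
        figure = ws.plot(space, result)
        if ws.verify(result, figure):
            # YES: download the output data and clear the personal space.
            data = ws.download(space, result)
            ws.clear(space)
            return data
        # NO: try the next parameter set (our assumption).
    # Nothing accepted: keep the space so the user can inspect it.
    return None
```

Passing the services in as plain callables keeps the control flow testable with fakes before any of the real services exist.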