Taverna and SoapLab Experience @ Elda Rossi – CINECA (Italy)


What is CINECA
CINECA is a consortium of Italian universities and CNR (the National Research Council), founded in 1969 and now under the supervision of the Ministry of University and Research.

Resources
The most important national infrastructure in Italy for computational support of scientific research.
Mission: promoting the use of the most advanced computing systems to support public and private scientific and technological research.

R & Bioconductor
Bioconductor is an open source, open development software project for the analysis and comprehension of genomic data. It is based on R, a language and environment for statistical computing and graphics.

R & Bioconductor
Bioconductor is a collection of "packages" of two main types:
1. packages that provide basic infrastructure support;
2. packages that provide innovative methodology.
We chose a function in the affy package (type 2).

The affy package
Package: affy
Description: The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets only concerns a few convenience functions; 'affy' is fully functional without it.
Version:
Author: Rafael A. Irizarry, Laurent Gautier, Benjamin Milo Bolstad, and Crispin Miller, with contributions from …
Maintainer: Rafael A. Irizarry
Dependencies: R (>= 1.9.0), Biobase (>= ), reposTools
Suggests: tkWidgets (>= 1.2.2), affydata
SystemRequirements: None
License: LGPL version 2 or newer
URL: None available
Function: expresso (from raw probe intensities to expression values)

The expresso function
Expression measures
The most common operation is certainly to convert probe level data to expression values. This is done in the following steps:
1. reading in probe level data
2. background correction (4 methods)
3. normalization (several methods)
4. probe specific background correction, e.g. subtracting MM (several methods)
5. summarizing the probe set values into one expression measure and, in some cases, a standard error for this summary (5 methods)
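
The summarization step can be illustrated with Tukey's median polish, the technique behind expresso's summary.method="medianpolish" option. Below is a minimal Python sketch (not affy's code; the function names are illustrative) that follows the update order and defaults of R's medpolish (maxiter = 10, eps = 0.01) on one probe set, given as a probes x chips matrix. As far as we know, affy applies median polish to log2 PM intensities and reports overall effect + chip effect as each chip's expression value, which is what summarize_probe_set returns.

```python
from statistics import median
from typing import List, Tuple


def median_polish(z: List[List[float]], max_iter: int = 10, eps: float = 0.01
                  ) -> Tuple[float, List[float], List[float], List[List[float]]]:
    """Tukey's median polish of a probes x chips matrix.

    Fits z[i][j] ~ overall + row[i] + col[j] + resid[i][j] by alternately
    sweeping out row (probe) and column (chip) medians."""
    resid = [[float(v) for v in r] for r in z]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # sweep each row median into the probe effects
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row[i] += m
        delta = median(col)
        col = [c - delta for c in col]
        overall += delta
        # sweep each column median into the chip effects
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col[j] += m
        delta = median(row)
        row = [r - delta for r in row]
        overall += delta
        # stop when the sum of absolute residuals no longer changes much
        new_sum = sum(abs(v) for r in resid for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, resid


def summarize_probe_set(log_intensities: List[List[float]]) -> List[float]:
    """One expression value per chip: overall effect + chip (column) effect."""
    overall, _, col, _ = median_polish(log_intensities)
    return [overall + c for c in col]
```

Because only medians are used, a single outlying probe intensity ends up in the residuals instead of shifting the expression value of its chip.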

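How to run expresso
Input: the CEL files (Data.CEL); output: Data.out, plus a report when run in batch mode.
ReadAffy() with no arguments reads every CEL file in the current working directory.
Interactive session:
$ R
> library(affy)
> data<-ReadAffy()
> data.mas<-expresso(data, bgcorrect.method="mas", pmcorrect.method="mas", normalize.method="constant", summary.method="medianpolish")
> write.exprs(data.mas, file="Data.out")
Batch mode (the report, i.e. the session log, is written to script.Rout):
$ R CMD BATCH script
where the file script contains:
library(affy)
data<-ReadAffy()
data.mas<-expresso(data, bgcorrect.method="mas", pmcorrect.method="mas", normalize.method="constant", summary.method="medianpolish")
write.exprs(data.mas, file="Data.out")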
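
write.exprs stores the result as a tab-separated text table: one column per sample and one row per probe set. A minimal Python sketch for loading such a file back for further processing, assuming that layout. When R writes the table with an empty first header cell (write.table with col.names=NA, which we believe is the write.exprs default), the reader drops that cell; read_exprs is an illustrative name, not part of affy.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_exprs(lines: Iterable[str]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read a tab-separated expression table.

    Header: sample names (optionally preceded by an empty cell).
    Body: one row per probe set, the probe set ID followed by one value per sample."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # drop the empty corner cell written for the row-name column, if present
    samples = header[1:] if header and header[0] == "" else header
    table: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        probe_set, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(f"{probe_set}: expected {len(samples)} values, got {len(values)}")
        table[probe_set] = values
    return samples, table
```

For example: with open("Data.out", newline="") as f: samples, table = read_exprs(f). The samples list then holds names such as Sample001.cel, and table is keyed by probe set IDs such as 100084_at.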

The files
CEL file (input, excerpt):
[CEL]
Version=3
[HEADER]
Cols=126
Rows=126
TotalX=126
TotalY=126
Baseline=Not normalized
DatHeader=ctrl150:CLS=1167 …
[INTENSITY]
NumberCells=15876
CellHeader=X Y MEAN STDV NPIXELS
OUT file (output, excerpt): one column per sample (Sample001.cel, Sample002.cel, Sample003.cel) and one row per probe set (100084_at, 101482_at, 31962_at, 32466_at, 35201_at, 36189_at, 36678_at, 37001_at, 37029_at, 37046_at, 37189_at, 37719_at, 37725_at, 38437_at, 38730_at, 39425_at, 40276_at).
The CEL file describes the intensities measured for every feature on a chip, without saying which probes belong to which probe sets (that information is provided by the CDF or 1LQ file). The format is not a fixed standard: Affymetrix does not directly support file access, and the format may change in the future. The description here applies to the CEL files of this transcriptome data set only; not all of it holds for CEL files from other Affymetrix chips.
The file is plain text, with a number of header and footer lines. The relevant data begins after the line containing the following text (line 24):
CellHeader=X Y MEAN STDV NPIXELS
and ends on the line containing:
[MASKS]
The data lines are tab-delimited, one line for every probe on the chip, with these fields:
field 1 = X, the x-coordinate of the probe
field 2 = Y, the y-coordinate of the probe
field 3 = MEAN, the 75th percentile of the pixel intensities in the stripped feature
field 4 = STDV, the standard deviation of the pixel intensities in the stripped feature
field 5 = NPIXELS, the number of pixels in the stripped feature
Here "stripped feature" means what remains after the outer layer of pixels has been stripped away.
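
The layout just described (data after the CellHeader= line, up to the [MASKS] line, five delimited fields per probe) is simple enough to read directly. A minimal Python sketch under exactly those assumptions; it covers only this text (version 3) CEL layout, not binary or newer CEL formats, and CelCell and parse_cel_v3 are illustrative names.

```python
from typing import Iterable, List, NamedTuple


class CelCell(NamedTuple):
    x: int
    y: int
    mean: float    # MEAN: 75th percentile of the pixel intensities
    stdv: float    # STDV: standard deviation of the pixel intensities
    npixels: int   # NPIXELS: pixels in the stripped feature


def parse_cel_v3(lines: Iterable[str]) -> List[CelCell]:
    """Return one CelCell per probe from a text (version 3) CEL file."""
    cells: List[CelCell] = []
    in_block = False
    for raw in lines:
        line = raw.strip()
        if not in_block:
            # the data block starts on the line after "CellHeader=..."
            in_block = line.startswith("CellHeader=")
            continue
        if "[MASKS]" in line:
            break  # end of the intensity block
        if not line:
            continue  # tolerate blank lines inside the block
        # split() handles tabs plus any space padding around the numbers
        x, y, mean, stdv, npix = line.split()[:5]
        cells.append(CelCell(int(x), int(y), float(mean), float(stdv), int(npix)))
    return cells
```

For the chip above one would expect len(parse_cel_v3(open(path))) to equal NumberCells (15876 = 126 x 126).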

Setting up SoapLab
A Linux-based server was chosen: Vega.cineca.it
Tomcat was installed
Java was upgraded (Java 1.4)
Axis was installed (Axis 1.1)
SoapLab was installed (precompiled for SuSE Linux)
Up to here: no problems!

Defining the Application
1. Write the application wrapper
2. Write the ACD file for the application
3. Convert ACD to XML
4. Start up the SoapLab server
5. Deploy the new service

1. Write the application wrapper
/biotools/services/affy-expresso.pl:
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# command arguments (with defaults)
my $bgcorrect = "mas";
my $normalize = "constant";
GetOptions("bgcorrect=s" => \$bgcorrect, "normalize=s" => \$normalize);

# location of R executable
my $rexe = "/biotools/R/R-2.1.0/bin/R";
# data directory (ReadAffy reads the CEL files found here)
my $datadir = "/biotools/services/data";

# R code to run the analysis
open(my $affy, ">", "$datadir/affy") or die "cannot write $datadir/affy: $!";
print $affy <<EOF;
library(affy)
data<-ReadAffy()
data.mas<-expresso(data, bgcorrect.method="$bgcorrect", pmcorrect.method="mas",
    normalize.method="$normalize", summary.method="medianpolish")
write.exprs(data.mas,file="data.txt")
EOF
close($affy);

# now run the program
system("cd $datadir; $rexe CMD BATCH affy") == 0 or die "R failed: $?";

# print the output on stdout
open(my $out, "<", "$datadir/data.txt") or die "cannot read output: $!";
while (<$out>) { print; }
close($out);
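
The wrapper pastes the bgcorrect and normalize options straight into the R code, so whatever a SOAP client sends ends up executed by R. A minimal Python sketch of the same wrapper that rejects anything that does not look like a plain method name before building the script. The name pattern is our assumption, not an affy rule (check bgcorrect.methods() and normalize.methods() in your affy version for the real names), and build_r_script and run_expresso are hypothetical names.

```python
import re
import subprocess
from pathlib import Path

# Assumption: valid method names look like "mas", "constant", "quantiles.robust".
_METHOD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")

R_TEMPLATE = """library(affy)
data<-ReadAffy()
data.mas<-expresso(data, bgcorrect.method="{bg}", pmcorrect.method="mas",
    normalize.method="{norm}", summary.method="medianpolish")
write.exprs(data.mas,file="data.txt")
"""


def build_r_script(bgcorrect: str = "mas", normalize: str = "constant") -> str:
    """Return the R script text, refusing values that could inject R code."""
    for name, value in (("bgcorrect", bgcorrect), ("normalize", normalize)):
        if not _METHOD_NAME.fullmatch(value):
            raise ValueError(f"invalid {name} method: {value!r}")
    return R_TEMPLATE.format(bg=bgcorrect, norm=normalize)


def run_expresso(datadir: str, rexe: str,
                 bgcorrect: str = "mas", normalize: str = "constant") -> str:
    """Write the script into datadir, run R CMD BATCH there, return data.txt."""
    workdir = Path(datadir)
    (workdir / "affy").write_text(build_r_script(bgcorrect, normalize))
    # cwd= replaces the shell's "cd $datadir;" and avoids invoking a shell at all
    subprocess.run([rexe, "CMD", "BATCH", "affy"], cwd=workdir, check=True)
    return (workdir / "data.txt").read_text()
```

With the paths from the Perl version this would be called as run_expresso("/biotools/services/data", "/biotools/R/R-2.1.0/bin/R", "mas", "constant"). Passing an argument list to subprocess.run, rather than one shell string, also keeps the data directory path from being interpreted by a shell.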

2. Write the ACD file
/biotools/soapbin/analysis-interfaces/metadata/affy.acd:
appl: bioconductor [
  documentation: "affy/expresso function of BioConductor"
  version: "1.0"
  groups: "Microarrays"
  nonemboss: "Y"
  executable: affy-expresso.pl
]

string: bgcorrect [
  additional: "Y"
  parameter: "Y"
  default: "mas"
]

string: normalize [
  additional: "Y"
  default: "constant"
]

outfile: output [
  default: "stdout"
]
Notes: the path to the executable is resolved by the shell. Input 1: background correction method. Input 2: normalization method. Output: standard output.
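
The ACD file is a list of sections of the form type: name [ attribute: value ... ]. To make that structure explicit, here is a minimal Python sketch that reads the simple subset used in affy.acd into (type, name, attributes) triples. It is not the EMBOSS or Soaplab parser: it assumes no value contains a "]" and ignores comments, nesting and the rest of the ACD grammar; parse_acd is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# "type: name [ ... ]" where the body is everything up to the first "]"
_SECTION = re.compile(r"(\w+)\s*:\s*(\w+)\s*\[(.*?)\]", re.DOTALL)
# "key: value" where value is either a quoted string or a bare token
_ATTR = re.compile(r'(\w+)\s*:\s*("[^"]*"|\S+)')


def parse_acd(text: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (section type, section name, attributes) for each section."""
    sections = []
    for kind, name, body in _SECTION.findall(text):
        attrs = {key: value.strip('"') for key, value in _ATTR.findall(body)}
        sections.append((kind, name, attrs))
    return sections
```

Applied to affy.acd this would give one appl section (bioconductor) plus the bgcorrect, normalize and output sections, each with its attributes as a dictionary.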

3, 4, 5: Final steps
3. Convert ACD to XML: /biotools/soapbin/analysis-interfaces/generator/acd2xml
   from: ../metadata/affy.acd
   to: ../metadata/microarrays/affy-al.xml
4. Start up the SoapLab server: /biotools/soapbin/analysis-interfaces/run-AppLab-server
   (open question: how do we shut down the server?)
5. Deploy the new service: /biotools/soapbin/analysis-interfaces/ws/deploy-web-services

Using the service from Taverna
From the Available Services window, select "Add new SoapLab scavenger" and enter our server address.

Using the service … (2)
The new processor appears: the affy service can be found in the microarrays folder. After connecting the input and output ports, the service can be launched.

Problems encountered
The documentation is not very clear or complete.
How can we transfer (large) files from the personal WS to the server machine?
We need a permanent, private data area for storing data.
We would like to monitor the service while it is running, and to have batch support (asynchronous services?).
How can we return data in addition to stdout and stderr?
…
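
Monitoring and batch support both point towards an asynchronous job interface: submitting a job returns an ID immediately, and the client polls for the status and fetches the result later instead of blocking on one long SOAP call. A minimal in-process Python sketch of that pattern; the JobService class, its methods and the status strings are hypothetical and are not Soaplab's API.

```python
import itertools
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class JobService:
    """Hypothetical asynchronous job interface.

    submit() returns a job ID at once; status() and result() can be called
    later, so a client never blocks on a long-running analysis."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def submit(self, fn: Callable[..., Any], *args: Any) -> str:
        with self._lock:
            job_id = f"job-{next(self._ids)}"
            self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id: str) -> str:
        fut = self._jobs[job_id]
        if fut.cancelled():
            return "CANCELLED"
        if fut.done():
            return "FAILED" if fut.exception() is not None else "DONE"
        return "RUNNING" if fut.running() else "QUEUED"

    def result(self, job_id: str, timeout: Optional[float] = None) -> Any:
        # blocks until the job finishes (or timeout); re-raises the job's exception
        return self._jobs[job_id].result(timeout=timeout)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In a real service the job table would live on the server and the three calls would be separate web-service operations; keeping results on the server until they are fetched would also give a natural place for the permanent data area mentioned above.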

A possible (future) workflow
1. Upload one or more CEL files to the server (WS-upload)
2. Analyse the data and get expression levels (WS-expresso)
3. WS-plot, then verify the output data: OK? (YES/NO)
4. If YES, download the output data and clear the personal space
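
A client-side driver for this workflow can be sketched in a few lines. In the Python sketch below each callable stands in for one of the services (WS-upload, WS-expresso, WS-plot, ...); run_workflow and its parameters are hypothetical. Two choices are our assumptions, not part of the proposal: a NO at the verification step means "try the next set of expresso parameters", and the personal space is cleared in every case, so rejected runs do not leave data behind.

```python
from typing import Any, Callable, Iterable, Optional, Sequence


def run_workflow(
    cel_files: Sequence[str],
    parameter_sets: Iterable[Any],
    upload: Callable[[Sequence[str]], Any],
    expresso: Callable[[Any, Any], Any],
    plot: Callable[[Any], Any],
    verify: Callable[[Any], bool],
    download: Callable[[Any], Any],
    clear: Callable[[Any], None],
) -> Optional[Any]:
    """Hypothetical driver: upload, analyse, plot, verify, download, clear."""
    space = upload(cel_files)                   # WS-upload: handle to the personal space
    try:
        for params in parameter_sets:
            result = expresso(space, params)    # WS-expresso
            if verify(plot(result)):            # WS-plot, then the OK? check
                return download(result)         # YES: fetch the output data
        return None                             # no parameter set was accepted
    finally:
        clear(space)                            # always free the personal space
```

Passing the services in as callables keeps the control flow testable without a server; in Taverna the same logic would be expressed as processors and links instead.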
