Enabling bioinformatics applications to access files over the grid via a GFAL plugin to Parrot
EGEE is a project funded by the European Union under contract IST-2003-508833

Presentation transcript:

Enabling bioinformatics applications to access files over the grid via a GFAL plugin to Parrot
G. Donvito, INFN Bari
In collaboration with
EGEE is a project funded by the European Union under contract IST-2003-508833

EGEE Conference September 2006, Geneve

Outlook
- Objectives
- How to access data
- A successful example: BLAST
  - DB downloading
  - DB indexing
  - Running BLAST
- Conclusions

Objectives
Bioinformatics applications:
- The biggest problem in running bioinformatics applications is data access
  - Flat-file DB
  - RDBMS
- Problems in dealing with flat files:
  - The application is usually written assuming “local” data access
  - Often these applications are not easy to modify, or the source code is not available
  - Some WNs may not have enough disk space for the input files of some applications
  - A network shared file system is often not a practical solution
  - There can be performance or local configuration problems

How to access data
- Our goal is to use the bio-applications as they are, “wrapping” all file-system calls
  - The user can “see” and access remote files with the same commands
- Two software packages are currently capable of doing that: Parrot and FUSE
  - The first is more suitable for batch execution
  - The second is more suitable for interactive execution (typically on a UI); it requires a kernel module
- Both support many protocols (http, ftp, gsiftp)
  - None of these protocols can exploit all the gLite DM functionality
- The overhead of network access is a factor of 2 (compared with local disk access)

Local access:
# cat myfile
This is my file
#

Access with Parrot:
# parrot cat /http/myserver.ba.infn.it/myfile
This is my file
#

Access with FUSE:
# httpfs /fuse/
# cat /fuse/myfile
This is my file
#
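The Parrot example on this slide exposes an HTTP URL as an ordinary path of the form /http/<host>/<path>. A minimal Python sketch of that mapping follows. It assumes the same layout (protocol name as the first directory) applies to the ftp and gsiftp protocols the slide lists. The function name url_to_parrot_path is hypothetical and is not part of Parrot.

```python
from urllib.parse import urlsplit

# Protocols listed on the slide as supported by both Parrot and FUSE.
SUPPORTED_SCHEMES = {"http", "ftp", "gsiftp"}


def url_to_parrot_path(url: str) -> str:
    """Map scheme://host/path to the path /scheme/host/path.

    Hypothetical helper: it only mirrors the layout of the slide's
    example and does no validation beyond the scheme and host.
    """
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("URL has no host part")
    return f"/{parts.scheme}/{parts.netloc}{parts.path}"


if __name__ == "__main__":
    # Reproduces the path used in the slide's Parrot example.
    print(url_to_parrot_path("http://myserver.ba.infn.it/myfile"))
```

An unmodified application would then be started as `parrot <command> <path>`, as in the `parrot cat` line of the slide.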

Why we need GFAL
With GFAL it is possible to exploit all the features of the gLite DM:
- Use Logical File Names (LFN) to hide the physical location of the files
  - And the complexity of the Physical File Name
- Hide the differences between the SE implementations
- Avoid installing specific software to provide file access
- Use VOMS authentication on the files
- Use Access Control Lists on the files

# parrot ls -l /gfal/lfn/bio/myhome/
-rw-r--r-- 1 donvito donvito 2752 Mar 30 17:39 my_out_file
-rw-r--r-- 1 donvito donvito 1244 Mar 30 17:39 my_input_file
-rw-r--r-- 1 donvito donvito 1188 Mar 30 17:39 my_out_file2

Storage elements shown in the diagram: BARI SE (dCache), CNAF SE (CASTOR), LEGNARO SE (DPM)

What is BLAST?
BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.

A typical use case: BLAST
We can run BLAST with this file access method, but to do so we need to:
- Download all the required databases
  - And update them regularly (without human intervention)
- Index the DB
  - At each new DB update (without human intervention)
- Automatically discover the available DBs
  - And new versions as they are made available
- Run the BLAST search against the chosen DB (as simply as possible)
  - Submitting hundreds of queries each time
  - Recovering the outputs
ALL THESE STEPS ARE IMPLEMENTED IN ORDER TO RUN BLAST ON THE EGEE INFRASTRUCTURE
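Submitting hundreds of queries at a time implies turning one multi-FASTA input into many separate queries. The slides do not describe how this is done, so the following Python sketch is only an illustration under that assumption: it parses a multi-FASTA text into records and groups them into batches that could each become one job. The names split_fasta and make_batches are hypothetical.

```python
def split_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def make_batches(records, size):
    """Group records into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [records[i:i + size] for i in range(0, len(records), size)]


if __name__ == "__main__":
    sample = ">q1\nACGT\nAC\n>q2\nTTGA\n>q3\nGG\n"
    for number, group in enumerate(make_batches(split_fasta(sample), 2), 1):
        print(number, [name for name, _ in group])
```

The batch size is a free parameter here; one query per job gives the most parallelism, while larger batches reduce the number of jobs to track.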

BLAST: Download Databases
- Implemented as a simple bash script that can be run from cron
- It downloads the databases, checking whether a new version of the DB is available
- It decompresses the downloaded files and creates the single file that represents the database
- It uploads the file to an SE and registers it in the file catalog (LFC)
- It is configured through a conf file that specifies:
  - The database name
  - The URL of the database repository
  - The LFC to be used
  - A list of SEs that can be used to store the DB
  - A pattern-match string to select the files to be downloaded
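The five configuration items listed above can be illustrated with a small sketch. The actual tool is a bash script whose conf file format is not shown in the slides, so the INI layout, every key name, and the host names below are assumptions made for illustration, written in Python rather than bash. The sketch loads the settings and applies the pattern to a listing of remote file names.

```python
import configparser
import fnmatch

# Hypothetical conf file: the section name, keys and hosts are invented.
EXAMPLE_CONF = """
[database]
name = nt
repository_url = ftp://ftp.example.org/blast/db/
lfc_host = lfc.example.org
storage_elements = se1.example.org, se2.example.org
file_pattern = nt.*.tar.gz
"""


def load_settings(text):
    """Read the five settings named on the slide from INI-style text."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    section = parser["database"]
    return {
        "name": section["name"],
        "repository_url": section["repository_url"],
        "lfc_host": section["lfc_host"],
        "storage_elements": [
            se.strip() for se in section["storage_elements"].split(",") if se.strip()
        ],
        "file_pattern": section["file_pattern"],
    }


def select_files(remote_listing, pattern):
    """Return the remote file names that match the shell-style pattern."""
    return sorted(f for f in remote_listing if fnmatch.fnmatchcase(f, pattern))


if __name__ == "__main__":
    settings = load_settings(EXAMPLE_CONF)
    listing = ["nt.00.tar.gz", "nr.00.tar.gz", "nt.01.tar.gz", "nt.00.tar.gz.md5"]
    print(select_files(listing, settings["file_pattern"]))
```

Shell-style globbing is used for the pattern because it matches what a bash script would naturally do; the real script may use a different matching rule.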

BLAST: Indexing the DB
- To be usable with BLAST, a DB must be “indexed”
- The indexing must be re-run at each update
  - An automatic job submission, triggered by the download script, uses the EGEE infrastructure
  - The job is submitted to the farms where BLAST is installed
- This operation requires access to the entire DB in a single file
  - This access is provided by Parrot
Diagram labels: FTP, DB Download, UI, SE, DB Registration, Index Job-Submission, Index Registration

The home - The Services

MULTI BLAST: input form /1
Upload the file with the FASTA sequences

MULTI BLAST: submit
Multi FASTA successfully submitted to WMProxy
Inspect the status of the Collection

MULTI BLAST: queue
Retrieve the output of the Collection

Statistics & Credits
% of users who ran BLAST in the last two months
Total users: 45
Multi BLAST is also available under “Current VO Services”
This work is supported in part by the LIBI and BIOINFOGRID projects.
For more information, contact:
- Giacinto Donvito
- Vihang Duhhalkar
- Nicola De Filippis
- Giuseppe La Rocca

??

MULTI BLAST: input form /2

MULTI BLAST: Data Spooler
View the output of each subjob

MULTI BLAST: LFC Catalog
Show Details
Download & View