Published by Stanley Todd. Modified over 8 years ago.
1
Enabling bioinformatics applications to access files over the grid via a GFAL plugin to Parrot
G. Donvito, INFN Bari
EGEE is a project funded by the European Union under contract IST-2003-508833 (www.eu-egee.org)
http://www.itb.cnr.it/bioinfogrid
In collaboration with http://grid-it.cnaf.infn.it/
2
EGEE Conference 2006, 25-29 September 2006, Geneva
Outline
Objectives
How to access data
A successful example: BLAST
- DB downloading
- DB indexing
- Running BLAST
Conclusions
3
Objectives
Bioinformatics applications:
The biggest problem in running bioinformatics applications is data access:
- Flat-file DB
- RDBMS
Problems with flat files:
- The application is usually written assuming “local” data access
- Often the application is not simple to modify, or its code is not available
- Some WNs (worker nodes) may not have enough disk space for the input files of some applications
- A network shared file system is often not a practicable solution: there can be performance or local configuration problems
4
How to access data
Our goal is to use the bio-applications as they are, “wrapping” all file-system calls
- The user can “see” and access remote files with the same commands
Two software packages are currently able to do that: Parrot and FUSE
- The first is more suitable for batch execution
- The second is more suitable for interactive execution (typically on a UI); it requires a kernel module
Both support many protocols (http, ftp, gsiftp)
- None of these protocols can exploit all the gLite DM functionality
The overhead of network access is a factor of 2 (compared with local disk access)
Local access:
    # cat myfile
    This is my file
    #
Access with Parrot:
    # parrot cat /http/myserver.ba.infn.it/myfile
    This is my file
    #
Access with FUSE:
    # httpfs http://192.168.0.86/ /fuse/
    # cat /fuse/myfile
    This is my file
    #
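The Parrot example on this slide suggests that a remote URL appears under a path of the form /<protocol>/<host>/<path>. A minimal sketch of that mapping follows, assuming the layout holds for the other protocols listed (ftp, gsiftp); url_to_parrot_path is a hypothetical helper name, not part of Parrot.

```shell
#!/bin/sh
# Map a URL to the path under which Parrot appears to expose it,
# following the /<protocol>/<host>/<path> layout of the slide example.
# url_to_parrot_path is a hypothetical helper, not a Parrot command.
url_to_parrot_path() {
    url=$1
    case $url in
        *://*) ;;                                   # looks like a URL, carry on
        *) echo "not a URL: $url" >&2; return 1 ;;
    esac
    proto=${url%%://*}      # text before "://", e.g. "http"
    rest=${url#*://}        # host and path, e.g. "myserver.ba.infn.it/myfile"
    printf '/%s/%s\n' "$proto" "$rest"
}

# Usage (needs Parrot installed on the node):
#   parrot cat "$(url_to_parrot_path http://myserver.ba.infn.it/myfile)"
```

The helper only builds the path string; the application itself stays unmodified and is simply started under Parrot.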
5
Why we need GFAL
With GFAL it is possible to exploit all the capabilities of the gLite DM:
- Use Logical File Names (LFN) to hide the physical location of the files and the complexity of the Physical File Name
- Hide the different implementations of the different SEs
- Avoid installing specific software to provide file access
- Use VOMS authentication on the files
- Use Access Control Lists on the files
    # parrot ls -l /gfal/lfn/bio/myhome/
    -rw-r--r-- 1 donvito donvito 2752 Mar 30 17:39 my_out_file
    -rw-r--r-- 1 donvito donvito 1244 Mar 30 17:39 my_input_file
    -rw-r--r-- 1 donvito donvito 1188 Mar 30 17:39 my_out_file2
Storage elements shown in the diagram: BARI SE (dCache), CNAF SE (CASTOR), LEGNARO SE (DPM)
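The listing on this slide reaches catalogue entries through a /gfal/lfn prefix. A minimal sketch of running an unmodified program on a logical file name follows, assuming that prefix is simply prepended to the LFN; lfn_to_parrot_path, run_with_lfn and the DRY_RUN switch are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Build the Parrot path for a logical file name, assuming the GFAL plugin
# exposes the catalogue under /gfal/lfn as in the listing on the slide.
# lfn_to_parrot_path and run_with_lfn are hypothetical helpers.
lfn_to_parrot_path() {
    lfn=${1#lfn:}           # accept "lfn:/bio/..." as well as "/bio/..."
    case $lfn in
        /*) printf '/gfal/lfn%s\n' "$lfn" ;;
        *)  echo "LFN must be absolute: $1" >&2; return 1 ;;
    esac
}

# Run an unmodified program on a catalogue file. With DRY_RUN=1 the command
# line is printed instead of executed, so no Parrot or grid proxy is needed.
run_with_lfn() {
    prog=$1
    path=$(lfn_to_parrot_path "$2") || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "parrot $prog $path"
    else
        parrot "$prog" "$path"
    fi
}
```

The caller never names an SE or a physical file name; those stay hidden behind the catalogue, which is the point made in the list above.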
6
What is BLAST?
BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases.
The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
7
A typical use case: BLAST
We can run BLAST with this file access method, but to do that we need to:
- Download all the needed databases, and update them regularly (without human intervention)
- Index the DB at each new DB update (without human intervention)
- Automatically discover the available DBs, and new versions as they are made available
- Run the BLAST search against the chosen DB (as simply as possible): submit hundreds of queries each time and recover the outputs
All these steps have been implemented in order to run BLAST on the EGEE infrastructure
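Submitting hundreds of queries at once implies turning one multi-FASTA file into many independent inputs. The sketch below shows one possible way to do that split; the presentation does not describe how it is actually done, so split_fasta and the query_N.fasta naming are hypothetical.

```shell
#!/bin/sh
# Split a multi-FASTA file into one file per sequence so that each query
# can become a separate job. split_fasta and the query_N.fasta naming are
# hypothetical; they are not taken from the presentation.
split_fasta() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^>/ {                      # a header line starts a new sequence
            if (out != "") close(out)
            n++
            out = sprintf("%s/query_%d.fasta", dir, n)
        }
        out != "" { print > out }   # copy header and sequence lines
        END { print n + 0 }         # report how many queries were written
    ' "$input"
}

# Usage: count=$(split_fasta queries.fasta ./queries)
```

Each resulting file can then be handed to its own job, and the printed count tells the caller how many outputs to expect back.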
8
BLAST: Download Databases
It is implemented as a simple bash script that can be run from cron
It downloads the databases, checking whether a new version of a DB is available
It decompresses the downloaded files and creates the single file that represents the database
It uploads the file to an SE and registers it in the file catalogue (LFC)
It is configured through a conf file that specifies:
- The database name
- The URL of the repository of the databases
- The LFC to be used
- A list of SEs that can be used to store the DB
- A pattern-match string to select the files to be downloaded
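The configuration-driven part of such a script can be sketched as below. The conf file format is not shown in the presentation, so a plain KEY=VALUE layout is assumed, and the key names (DB_NAME, REPO_URL, LFC_HOST, SE_LIST, FILE_PATTERN) as well as conf_get and select_files are hypothetical.

```shell
#!/bin/sh
# Read a KEY=VALUE configuration file and print the value of one key.
# The key names used with it (DB_NAME, REPO_URL, LFC_HOST, SE_LIST,
# FILE_PATTERN) are hypothetical stand-ins for the five settings listed.
conf_get() {
    # $1 = conf file, $2 = key; comment lines starting with "#" never match
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Keep only the file names (one per line on stdin) that match the shell
# glob given as first argument, e.g. the configured pattern-match string.
select_files() {
    pattern=$1
    while IFS= read -r name; do
        case $name in
            $pattern) printf '%s\n' "$name" ;;   # unquoted: treated as a glob
        esac
    done
}

# Usage, given a listing of the repository in remote_files.txt:
#   pat=$(conf_get blastdb.conf FILE_PATTERN)
#   select_files "$pat" < remote_files.txt
```

Keeping the selection logic separate from the download and upload steps lets the same script serve several databases by changing only the conf file.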
9
BLAST: Indexing the DB
To be usable with BLAST, a DB must be “indexed”
The indexing must be rerun at each update
- An automatic job submission is triggered from the download script, using the EGEE infrastructure
- The job is submitted to the farms on which BLAST is installed
This operation requires access to the entire DB in one file
- This access is done using Parrot
Diagram labels: FTP, DB Download, UI, SE, DB Registration, Index Job-Submission, Index Registration
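What the indexing job might execute on the worker node can be sketched as follows. The presentation does not name the indexing tool; the sketch assumes the legacy NCBI formatdb program (options -i, -p and -n) reading the single database file through the /gfal/lfn path, and index_command and the example LFN are hypothetical.

```shell
#!/bin/sh
# Print the command an indexing job might run: legacy NCBI formatdb reading
# the whole database through Parrot. Use of formatdb, the LFN layout and the
# helper name index_command are assumptions made for illustration.
index_command() {
    lfn=$1                  # e.g. /bio/blastdb/nt (hypothetical LFN)
    kind=$2                 # "nucl" or "prot"
    case $kind in
        nucl) is_protein=F ;;
        prot) is_protein=T ;;
        *) echo "unknown database type: $kind" >&2; return 1 ;;
    esac
    name=${lfn##*/}         # index files are written locally under this base name
    echo "parrot formatdb -i /gfal/lfn$lfn -p $is_protein -n $name"
}

# Usage: eval "$(index_command /bio/blastdb/nt nucl)"
```

Because the input is read through Parrot, the job does not need to copy the whole database to local disk before indexing; only the index files are produced locally and can then be registered.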
10
The home - https://glite-demo.ct.infn.it/
The Services
11
MULTI BLAST: input form /1
Upload the file with the FASTA sequences
12
MULTI BLAST: submit
Multi FASTA successfully submitted to WMProxy
Inspect the status of the Collection
13
MULTI BLAST: queue
Retrieve the output of the Collection
14
Statistics & Credits
% of users who ran BLAST in the last two months
Total users: 45
Multi BLAST is also available at https://glite-tutor.ct.infn.it/ under "Current VO Services"
This work is supported in part by the LIBI and BIOINFOGRID projects.
For more information contact:
- Giacinto Donvito giacinto.donvito@ba.infn.it
- Vihang Duhhalkar vihang007@gmail.com
- Nicola De Filippis nicola.defilippis@ba.infn.it
- Giuseppe La Rocca giuseppe.larocca@ct.infn.it
15
??
16
MULTI BLAST: input form /2
17
MULTI BLAST: Data Spooler
View the output of each subjob
18
MULTI BLAST: LFC Catalog
Show Details
Download & View