Published by Bertha Wright. Modified over 8 years ago.

Requirements for biomedical applications of grids
V. Breton, EGEE applications activity manager, CNRS

GGF11, Hawaii, June 7th 2004

Content
- From use cases to requirements
- DataGrid
- EGEE
- EGEE/DataGrid list of requirements
- e sciences
- Healthcare
- Drug discovery
- What to do next:
  - Existing on-going activities
  - Connection to GGF LSG
  - Preparation of GGF12 in Brussels
  - Satellite biomedical informatics workshop

Importance of requirements process for grid deployment projects
- DataGrid and EGEE are grid deployment projects
  - DataGrid (2001-2004) for High Energy Physics, Biomedical and Earth Observation applications
  - EGEE (2004-2006) starts with High Energy Physics and biomedical applications but will soon open to other research fields (geophysics, computational chemistry, astrophysics, drug discovery, earth observation, …)
- Application requirements were identified early as necessary for developing middleware that meets user needs
  - Different user communities have different requirements
- User requirements therefore have to be collected, classified and ranked

Requirements flow
[Diagram. Labels, in flow order:
- User communities (High Energy Physics, Biomedical, Earth observation): Expression
- Application working group: Collection, Classification
- Project technical forum: Ranking
- Project technical groups (middleware developers, security group, network): Implementation
- Test team: Generation of test cases]

Requirements classification
- Requirements are classified into eight areas:
  1. User interface
  2. Job submission
  3. Data management
  4. Information system
  5. Storage
  6. Network
  7. Security
  8. Operation
- Requirements are related to elementary use cases
  - Initial list of use cases relevant for high energy physics: LCG High Energy Physics Common Application Layer (HEPCAL): https://edms.cern.ch/document/375586/1.3
  - Additional list of use cases for earth observation and biomedical applications: EDG Application Working Group joint list of usecases: https://edms.cern.ch/document/386184

1. User Interface requirements
- 1.1 User interface installation and portability
  The user interface is a lightweight component. It should be installable on various systems with minimal effort.
- 1.2 User interface configuration
  The user interface should be easy to configure. A pointer to an information service server should be sufficient for the user interface to get all the configuration information it requires. A default information service is expected.
- 1.3 Application Programming Interface
  The application programming interface to middleware services should be available in C++ and Java. Its installation should be lightweight. It should cover all services accessible from the user interface host and the worker node: job submission, data manager, information system, etc.
- 1.4 Group/anonymous login (portals)
  Anonymous access to restricted services is required for users accessing the grid through specialised grid portals (e.g. bioinformatics algorithms).

2. Job submission requirements (1/2)
- 2.1 Short jobs execution
  The middleware should permit the execution of large numbers (thousands) of short jobs (minutes at most) without introducing a prohibitive overhead.
- 2.2 Parallel jobs execution
  Parallel execution is mandatory for some applications. The middleware is expected to enable parallel job submission (by specifying a number of hosts to allocate at submission time, at least at a site scale) and provide a message passing interface (preferably MPI).
- 2.3 Prioritised jobs execution
  It should be possible for a restricted category of users (e.g. surgeons) to order high priority jobs that will execute immediately, pre-empting resources if needed.
- 2.4 Multiple data jobs
  It should be possible to execute jobs on an input dataset (i.e. one job repeated as many times as the number of input files).
- 2.5 Compound jobs execution (pipelining)
  It should be possible to execute compound (or pipelined) jobs: jobs composed of multiple unitary tasks with any directed graph of execution flow. The mechanism should allow translation of the output data sets of one task into the input data sets of consecutive tasks. The mechanism is expected to handle full input datasets.
- 2.6 Interactive jobs
  It should be possible to execute interactive jobs (jobs with communication between the execution host and the user interface). The communication may be shell-based or application-specific (it should be possible to open a socket to transfer interactive feedback according to the application protocol). Resource reservation may be needed to ensure that interactive jobs are started at a precise time (the user has to be available when the interactive application starts).
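Requirements 2.4 and 2.5 can be illustrated with a small sketch. It is not part of the original slides and does not reflect any EGEE middleware API: the task names, file names and the `fan_out` helper are hypothetical. It also assumes the execution graph is acyclic, which is narrower than the "any directed graph" wording of 2.5.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks whose output it consumes.
pipeline = {
    "align": {"fetch"},
    "segment": {"fetch"},
    "merge": {"align", "segment"},
}


def execution_order(dag):
    """Return one valid task order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


def fan_out(task, input_files):
    """Requirement 2.4: one job instance per input file of the dataset."""
    return [(task, name) for name in input_files]


order = execution_order(pipeline)
# Every task of the pipeline is repeated over the full input dataset (end of 2.5).
jobs = [job for task in order for job in fan_out(task, ["img1.dcm", "img2.dcm"])]
```

A real scheduler would submit independent tasks (here `align` and `segment`) concurrently; the flat order above only shows that dependencies are respected.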

2. Job submission requirements (2/2)
- 2.7 Job access to data
  It should be possible to specify data required by a job. The job submission mechanism should ensure that the data is accessible without further work once the job is started (automatic data replication, etc).
- 2.8 Simple job submission
  An authorized user should be able to submit jobs. It should be possible to register the job output before the job is terminated to enable future accessibility. At submission time, it should be possible to specify the amount of resources needed (CPU time, memory usage, disk space needed), and the environment the job is to be executed in.
- 2.9 Job execution control
  It should be possible for a user to control the target (or possible targets) for a submitted job.
- 2.10 Job killing
  It should be possible for the user that has submitted a job to kill it.
- 2.11 Resources reservation
  It should be possible to make advance reservation of resources and to update such reservation prior to job execution.
- 2.12 Scalability
  Thousands of jobs at least.
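As an illustration of requirements 2.8 and 2.9, the following sketch shows a job description carrying resource needs and an optional list of target sites. The `JobSpec` and `Site` classes, their field names and the site names are assumptions made for this example; the slides do not prescribe any job description format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job description with the resources named in requirement 2.8."""
    executable: str
    cpu_minutes: int
    memory_mb: int
    disk_mb: int
    allowed_sites: tuple = ()  # requirement 2.9; an empty tuple means "any site"


@dataclass(frozen=True)
class Site:
    """Hypothetical view of what a site offers to a single job."""
    name: str
    max_cpu_minutes: int
    memory_mb: int
    free_disk_mb: int


def eligible_sites(job, sites):
    """Names of the sites that satisfy both the resource needs and the user's target list."""
    return [
        s.name
        for s in sites
        if (not job.allowed_sites or s.name in job.allowed_sites)
        and s.max_cpu_minutes >= job.cpu_minutes
        and s.memory_mb >= job.memory_mb
        and s.free_disk_mb >= job.disk_mb
    ]


sites = [
    Site("site-a", max_cpu_minutes=60, memory_mb=2048, free_disk_mb=10_000),
    Site("site-b", max_cpu_minutes=600, memory_mb=512, free_disk_mb=50_000),
    Site("site-c", max_cpu_minutes=600, memory_mb=4096, free_disk_mb=50_000),
]
job = JobSpec("segment", cpu_minutes=30, memory_mb=1024, disk_mb=500)
pinned = JobSpec("segment", cpu_minutes=30, memory_mb=1024, disk_mb=500,
                 allowed_sites=("site-c",))
```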

3. Data management requirements (1/2)
- 3.1 File names translation
  Logical file names given in the job command line should be automatically translated into physical file names when the job gets executed on a given worker node. The user should not have to write explicit application code to translate logical file names into usable instances.
- 3.2 File access interface
  A POSIX-like gridopen/read/write/close interface to access grid files is expected. It should hide the complexity of data fetching from the user. If this interface enables access to files through their logical names, it fulfils requirement 3.1.
- 3.3 Fine grain control of access rights
  Access control to files, metadata, and groups of files should be possible at individual and group levels (through Access Control List-like mechanisms).
- 3.4 Group of files
  It should be possible to create a logical collection (or group) of files (a logical collection relates to a set of logical files). It should be possible to control file access rights for groups and to submit a job that requires one or more groups as input (see R2.4).
- 3.5 Metadata associated to files
  It should be possible to associate metadata (stored in databases) with files. By default, these metadata should have the same access pattern and lifetime as the data they are associated with. It should however be possible to give different access rights to files and associated metadata when needed.
- 3.6 Access rights delegation
  It should be possible for a user with read/write access to data or metadata to grant the same access (without grant option) to another user.
- 3.7 Data updates and versioning
  It should be possible to update data (read-only datasets are not sufficient for some applications). Data versioning is expected for some applications.
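Requirement 3.1 can be pictured with a toy replica catalog that rewrites logical names on a command line before the job starts. Everything here is assumed for illustration: the `ReplicaCatalog` class, the `lfn:` prefix used to mark logical names, the site names and the paths. The fallback rule (first site in alphabetical order) is an arbitrary stand-in for a real replica selection policy.

```python
class ReplicaCatalog:
    """Toy mapping from logical file names to physical replicas, one per site."""

    def __init__(self):
        self._replicas = {}  # logical name -> {site name: physical file name}

    def register(self, lfn, site, pfn):
        self._replicas.setdefault(lfn, {})[site] = pfn

    def resolve(self, lfn, local_site):
        """Prefer a replica at the local site, else fall back to any known replica."""
        replicas = self._replicas.get(lfn)
        if not replicas:
            raise FileNotFoundError(lfn)
        if local_site in replicas:
            return replicas[local_site]
        return replicas[sorted(replicas)[0]]


def translate_args(argv, catalog, local_site):
    """Replace each 'lfn:<name>' argument with a physical file name; leave the rest untouched."""
    return [
        catalog.resolve(arg[len("lfn:"):], local_site) if arg.startswith("lfn:") else arg
        for arg in argv
    ]


catalog = ReplicaCatalog()
catalog.register("scan42.img", "lyon", "/storage/lyon/scan42.img")
catalog.register("scan42.img", "cern", "/storage/cern/scan42.img")
command = translate_args(["segment", "lfn:scan42.img", "--fast"], catalog, "lyon")
```

The application itself never sees the logical name, which is the point of the requirement: translation happens outside user code.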

3. Data management requirements (2/2)
- 3.8 File name changes
  It should be possible for an authorized user to change the logical file name of a file or a logical collection name.
- 3.9 Data replication control
  It should be possible to explicitly replicate data from one storage to another. It should be possible to control the places where a dataset can be replicated in case of automatic replication.
- 3.10 Data registration, retrieval, and deletion
  It should be possible to register a local file on grid storage. Conversely it should be possible to retrieve grid files to local disk space. It should be possible to delete files previously registered (all physical instances plus the registered logical name).
- 3.11 Data access cost estimation
  It should be possible to estimate the time needed to read a file from a specific SE or from the best SE.
- 3.12 Partial file access
  It should be possible to read only part of a registered file.
- 3.13 File browsing
  It should be possible for a user to browse the files or the metadata they are authorized to read.
- 3.14 Scalability
  Millions of files at least.
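A back-of-the-envelope version of requirement 3.11 is sketched below. The cost model (fixed latency plus size divided by bandwidth) is an assumption chosen for simplicity, not something stated in the slides, and the storage element names and figures are invented.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Estimated read time under a simple latency + size / bandwidth model."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s


def best_storage_element(size_bytes, elements):
    """Return (name, estimated seconds) of the cheapest storage element.

    `elements` maps a name to a (bandwidth in bytes/s, latency in s) pair.
    """
    costs = {
        name: transfer_seconds(size_bytes, bandwidth, latency)
        for name, (bandwidth, latency) in elements.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


# Invented figures: a nearby but slow SE and a remote but fast one.
elements = {
    "se-near": (10_000_000, 0.5),
    "se-far": (50_000_000, 2.0),
}
choice = best_storage_element(100_000_000, elements)
```

With these numbers the remote element wins for a 100 MB file, while a small file would be served faster by the nearby one, which is why the estimate has to depend on file size.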

4. Information system requirements
- 4.1 Jobs information and status notification
  It should be possible for a user to list their running jobs. For each job it should be possible to be notified of job status changes through an API, so that an application can monitor jobs without active polling and without non-programmable status reporting such as email.
- 4.2 Top level information system index
  It should be possible to obtain information about the top-level information systems through a permanent information system index.
- 4.3 Resource brokers index
  It should be possible to obtain information about available resource brokers through a permanent index.
- 4.4 Grid resource browsing
  A user should be able to browse the grid resources (VOs, RBs, CEs, SEs, etc) they have access to.
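The notification style asked for in 4.1 is essentially a callback (observer) interface. The sketch below is a minimal in-process illustration; the `JobMonitor` class, its method names and the status strings are hypothetical and do not correspond to any actual middleware API.

```python
class JobMonitor:
    """Pushes job status changes to subscribers so that applications need not poll."""

    def __init__(self):
        self._status = {}     # job id -> last known status
        self._listeners = {}  # job id -> list of callbacks

    def subscribe(self, job_id, callback):
        """Register callback(job_id, status), called on every status change of job_id."""
        self._listeners.setdefault(job_id, []).append(callback)

    def update(self, job_id, status):
        """Called by the (hypothetical) middleware side when a job changes state."""
        if self._status.get(job_id) == status:
            return  # unchanged status: nothing to notify
        self._status[job_id] = status
        for callback in self._listeners.get(job_id, []):
            callback(job_id, status)

    def running_jobs(self):
        """List the jobs currently marked as running."""
        return sorted(j for j, s in self._status.items() if s == "RUNNING")


monitor = JobMonitor()
events = []
monitor.subscribe("job-1", lambda job_id, status: events.append((job_id, status)))
monitor.update("job-1", "RUNNING")
monitor.update("job-1", "RUNNING")  # duplicate report, not forwarded
monitor.update("job-2", "RUNNING")  # no subscriber, but still listed as running
monitor.update("job-1", "DONE")
```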

5. Storage requirements
- 5.1 On-disk encryption
  It should be possible to encrypt data on disk to prevent data leaks at the storage site level.
- 5.2 Interface to new storage systems
  It should be possible to implement an interface to new storage systems to make them interoperable with the grid middleware. This implies the availability of a standard grid storage interface flexible enough to support various access control mechanisms.
- 5.3 Hook on data privacy manager
  The grid storage interface should enable application-specific access control policies by providing a hook to application code when performing data access control.
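One way to read requirement 5.3 is that the storage layer calls back into application code before granting access. The sketch below assumes that reading; the `StorageInterface` class, the hook signature `(user, name, operation) -> bool` and the example policy are all invented for illustration.

```python
class StorageInterface:
    """Toy in-memory storage that consults an application-supplied hook on every read."""

    def __init__(self, access_hook=None):
        self._files = {}
        # Hypothetical hook signature: hook(user, name, operation) -> bool
        self._access_hook = access_hook

    def put(self, name, data):
        self._files[name] = data

    def read(self, user, name):
        if self._access_hook is not None and not self._access_hook(user, name, "read"):
            raise PermissionError(f"{user} may not read {name}")
        return self._files[name]


def medical_policy(user, name, operation):
    """Example application policy: only physicians may touch files under 'patient/'."""
    if name.startswith("patient/"):
        return user.startswith("dr.")
    return True


storage = StorageInterface(access_hook=medical_policy)
storage.put("patient/0001.dcm", b"image bytes")
storage.put("public/atlas.dat", b"atlas bytes")
```

Keeping the policy outside the storage class means a different community can plug in its own rules without changing the storage interface.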

6. Network requirements
- 6.1 Communications encryption
  It should be possible to encrypt data prior to communication between sites to prevent third party listening.
- 6.2 Outbound connectivity
  Applications will need to access services external to the grid middleware. It should be possible to establish a communication between a grid node and non-grid hosts on any port.
- 6.3 Guaranteed bandwidth
  Some applications (e.g. interactive) have a need for a guaranteed network bandwidth.

7. Security requirements
- 7.1 Data privacy
  See 5.1 and 6.1: it should be possible to enforce data privacy through encryption at all levels.
- 7.2 User privacy
  It should be possible for a user to hide their usage of the grid.
- 7.3 Computations privacy
  It should be possible to hide from external users the kind of computation and the datasets that are used in an experiment.
- 7.4 Data encryption and encryption keys
  It should be possible to protect data by encryption (see 7.1). It should be possible to ensure that encryption keys are kept by an application-specific service (on a trusted site).

8. Operation requirements
- 8.1 VO creation
  Creation of a new VO and propagation of VO rights to existing sites should be a simple process to ease the integration of new applications.
- 8.2 User control
  It should be possible to grant access to an authorized user and to revoke a user's access.
- 8.3 User login
  An authorized user with a valid certificate should be able to log in and get confirmation of success.
- 8.4 Software package publication
  It should be possible to make new versions of software packages available on the grid.
- 8.5 Robustness
  The system should be able to handle thousands of jobs and multiple user loads without crashing.

Template to describe requirements
- Identifier: Reference number
- Description: A comprehensive description of the requirement, without implementation detail
- Area: Research community expressing requirement: High Energy Physics, biomedical, …
- Priority: One of:
  - Critical: absolutely blocking for applications
  - Very high: required by all means, very difficult for applications to do without it
  - High: highly desirable, very difficult for some applications to do without it
  - Medium: expected, difficult for some applications to do without it
  - Helpful: broad usage service, would ease applications development
  - Low: long term requirement, not critical for any application
  - Very low: thing to think about in a long term perspective
- Requirement date: PM? (a pilot application has to be ready by this date)
- Implementation date: PM? (the requirement has to be fulfilled by the middleware at this date)
- Implementation: proposal to explain how the requirement is supposed to be implemented
- Status tracking: A dated list of milestones toward the requirement completion
- Applicable usecases: Relevant usecases from HEPCAL and the AWG joint list of usecases related to this requirement
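The template maps naturally onto a small record type, which also makes the ranking step of the requirements flow easy to picture. The sketch below is an assumption about how such records might be represented, not the format used by the project; only a subset of the template fields is modelled, and the priorities assigned to the sample requirements are invented for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    """The seven priority levels of the template, most urgent first."""
    CRITICAL = 1
    VERY_HIGH = 2
    HIGH = 3
    MEDIUM = 4
    HELPFUL = 5
    LOW = 6
    VERY_LOW = 7


@dataclass
class Requirement:
    identifier: str
    description: str
    area: str
    priority: Priority
    applicable_usecases: list = field(default_factory=list)


def ranked(requirements):
    """Order requirements from most to least urgent (ties keep their input order)."""
    return sorted(requirements, key=lambda r: r.priority)


# Sample entries: titles come from the slides, priorities are made up.
sample = [
    Requirement("3.12", "Partial file access", "biomedical", Priority.MEDIUM),
    Requirement("2.1", "Short jobs execution", "biomedical", Priority.CRITICAL),
    Requirement("6.3", "Guaranteed bandwidth", "biomedical", Priority.HELPFUL),
]
order = [r.identifier for r in ranked(sample)]
```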

Ongoing work on biomedical requirements in Europe…
- Life sciences
  - Within the framework of EGEE
  - Within the framework of Embrace, an EC-funded network of excellence dedicated to a grid for bioinformatics
- Health
  - Healthgrid white paper publicly available on the Healthgrid web site on June 15th, 2004
  - Technical work to be done to extract requirements from it
- Drug discovery
  - Use case: grid to address diseases of developing countries (Dengue, Malaria)
  - Discussed at the next Pharmagrid conference

Grids to address diseases of the developing world
- Grid solutions may have unique capabilities to enable or enhance the cross-organizational collaborations that seem required to make progress on Rare Diseases, Diseases of the Developing World or even major research initiatives such as Cancer
- The goal is to increase the chances of developing new and better drugs and vaccines against the deadliest disease on earth, and to reduce their time to market
- Means: introduce grid technology and resources wherever relevant
  - Genomics research (comparative analysis, data access)
  - In silico drug discovery (virtual screening on compounds DB)
  - Data collection and analysis in plagued areas (DB federation)
- First step: large-scale virtual screening on P. falciparum protein targets
- Endorsed by the Healthgrid association (http://www.healthgrid.org)
- Discussed at the next Pharmagrid conference (http://www.pharmagrid.com)
- Contact points: V. Breton (breton@clermont.in2p3.fr), M. Hofmann (martin.hofmann@scai.fhg.de)

Next steps
- The EGEE requirements process will go on for the next 2 years
  - Creation of a database relating requirements, use cases and test cases (J. Montagnat): format may still evolve
  - Biomedical requirements and related use cases accessible at http://egee-na4.ct.infn.it/requirements/
  - Proposal: use EGEE templates and requirements database as a basis for LSG-RG work on requirements
- The Healthgrid white paper describes use cases and addresses issues (security, legal) relevant to grid application to healthcare
  - Use cases in mammography, radiation therapy, epidemiology, …
  - Proposal: start from the White Paper to complete the existing list of requirements
- Healthgrid/Biomedical informatics satellite workshop at GGF12 to connect to the LSG-RG meeting in Brussels
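To make the idea of a database relating requirements, use cases and test cases concrete, here is a minimal sketch using an in-memory SQLite database. The schema, table names, identifiers such as `UC-1` and `T-1`, and the sample rows are all hypothetical; the slides only say that such a database exists and that its format may still evolve.

```python
import sqlite3

# Hypothetical schema: requirements link to use cases (many-to-many) and to test cases.
SCHEMA = """
CREATE TABLE requirement (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE use_case (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE test_case (
    id TEXT PRIMARY KEY,
    requirement_id TEXT NOT NULL REFERENCES requirement(id)
);
CREATE TABLE requirement_use_case (
    requirement_id TEXT NOT NULL REFERENCES requirement(id),
    use_case_id TEXT NOT NULL REFERENCES use_case(id),
    PRIMARY KEY (requirement_id, use_case_id)
);
"""


def build_db():
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def untested_requirements(conn):
    """Requirement ids that no test case covers yet."""
    rows = conn.execute(
        "SELECT r.id FROM requirement r "
        "LEFT JOIN test_case t ON t.requirement_id = r.id "
        "WHERE t.id IS NULL ORDER BY r.id"
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db()
conn.executemany("INSERT INTO requirement VALUES (?, ?)", [
    ("2.5", "Compound jobs execution (pipelining)"),
    ("3.12", "Partial file access"),
])
conn.execute("INSERT INTO use_case VALUES ('UC-1', 'Mammography image analysis')")
conn.execute("INSERT INTO requirement_use_case VALUES ('3.12', 'UC-1')")
conn.execute("INSERT INTO test_case VALUES ('T-1', '2.5')")
missing = untested_requirements(conn)
```

A query like `untested_requirements` is the kind of cross-check such a database would allow: it shows which collected requirements the test team has not yet covered.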