Requirements for biomedical applications of grids
V. Breton, EGEE applications activity manager, CNRS

GGF11, Hawaii, June 7th
Content
- From use cases to requirements
  - DataGrid
  - EGEE
- EGEE/DataGrid list of requirements
  - e-sciences
  - Healthcare
  - Drug discovery
- What to do next:
  - Existing ongoing activities
  - Connection to GGF LSG
  - Preparation of GGF12 in Brussels: satellite biomedical informatics workshop

Importance of the requirements process for grid deployment projects
- DataGrid and EGEE are grid deployment projects
  - DataGrid: for High Energy Physics, Biomedical and Earth Observation applications
  - EGEE: starts with High Energy Physics and biomedical applications but will soon open to other research fields (geophysics, computational chemistry, astrophysics, drug discovery, earth observation, …)
- Application requirements were identified early on as necessary to develop middleware that meets user needs
  - Different user communities have different requirements
  - User requirements must be collected, classified and ranked

Requirements flow
[Diagram] Stages: expression, collection, classification, ranking, implementation, generation of test cases. Actors: user communities (High Energy Physics, Biomedical, Earth observation), application working group, project technical forum, project technical groups (middleware developers, security group, network), test team.

Requirements classification
- Requirements are classified into:
  1. User interface
  2. Job submission
  3. Data management
  4. Information system
  5. Storage
  6. Network
  7. Security
  8. Operation
- Requirements are related to elementary use cases
  - Initial list of use cases relevant for high energy physics: LCG High Energy Physics Common Application Layer (HEPCAL):
  - Additional list of use cases for earth observation and biomedical applications: EDG Application Working Group joint list of use cases:
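The requirement numbers on the following slides encode the class in their leading digit (1.x user interface, 2.x job submission, …, 8.x operation). A minimal Python sketch of that mapping, assuming identifiers of the form "N.M", optionally prefixed with "R" as in "R2.4"; the names `Category` and `category_of` are hypothetical and not part of any EGEE tool.

```python
from enum import IntEnum


class Category(IntEnum):
    """The eight requirement classes; each value equals the leading digit
    of the requirement numbers in that class (e.g. 2.x -> JOB_SUBMISSION)."""
    USER_INTERFACE = 1
    JOB_SUBMISSION = 2
    DATA_MANAGEMENT = 3
    INFORMATION_SYSTEM = 4
    STORAGE = 5
    NETWORK = 6
    SECURITY = 7
    OPERATION = 8


def category_of(req_id: str) -> Category:
    """Return the class of an identifier such as "3.6" or "R2.4".

    Raises ValueError for identifiers outside classes 1-8.
    """
    major = req_id.removeprefix("R").split(".", 1)[0]
    return Category(int(major))
```

Using an `IntEnum` keeps the classes ordered like the slides, so sorting requirements by category reproduces the presentation order.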

User interface requirements
1.1 User interface installation and portability: The user interface is a lightweight component. It should be installable on various systems with minimal effort.
1.2 User interface configuration: The user interface should be easy to configure. A pointer to an information service server should be sufficient for the user interface to obtain all the configuration information it requires. A default information service is expected.
1.3 Application Programming Interface: The application programming interface to middleware services should be available in C++ and Java. Its installation should be lightweight. It should cover all services accessible from the user interface host and the worker node: job submission, data management, information system, etc.
1.4 Group/anonymous login (portals): Anonymous access to restricted services is required for users accessing the grid through specialised grid portals (e.g. bioinformatics algorithms).

Job submission requirements (1/2)
2.1 Short jobs execution: The middleware should permit the execution of a large number (thousands) of short jobs (minutes at most) without introducing prohibitive overhead.
2.2 Parallel jobs execution: Parallel execution is mandatory for some applications. The middleware is expected to enable parallel job submission (by specifying a number of hosts to allocate at submission time, at least at site scale) and to provide a message passing interface (preferably MPI).
2.3 Prioritised jobs execution: It should be possible for a restricted category of users (e.g. surgeons) to submit high-priority jobs that execute immediately, pre-empting resources if needed.
2.4 Multiple data jobs: It should be possible to execute jobs on input datasets (i.e. one job repeated as many times as there are input files).
2.5 Compound jobs execution (pipelining): It should be possible to execute compound (or pipelined) jobs: jobs composed of multiple unitary tasks with any directed graph of execution flow. The mechanism should allow the output datasets of a task to become the input datasets of subsequent tasks, and is expected to handle full input datasets.
2.6 Interactive jobs: It should be possible to execute interactive jobs (jobs with communication between the execution host and the user interface). The communication may be shell-based or application-specific (it should be possible to open a socket to transfer interactive feedback according to the application protocol). Resource reservation may be needed to ensure that interactive jobs start at a precise time (the user has to be available when the interactive application starts).
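Requirements 2.4 and 2.5 together describe a dataflow: a job repeated over every input file, and tasks chained along a directed graph where outputs feed the next tasks. A minimal local sketch in Python, assuming the graph is acyclic (a topological order exists) and that each task is a plain function from input file names to output file names; `per_file` and `run_pipeline` are hypothetical names, and a real grid workflow engine would submit each task as a separate job instead of calling it in-process.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A task maps a list of input file names to a list of output file names.
Task = Callable[[list[str]], list[str]]


def per_file(step: Callable[[str], str]) -> Task:
    """Wrap a single-file job so it runs once per input file (requirement 2.4)."""
    return lambda files: [step(f) for f in files]


def run_pipeline(tasks: dict[str, Task],
                 deps: dict[str, set[str]],
                 inputs: list[str]) -> dict[str, list[str]]:
    """Run a compound job (requirement 2.5) sequentially.

    `deps` maps a task name to the names of the tasks it depends on; every
    name used in `deps` must also be a key of `tasks`. Tasks with no
    predecessors receive `inputs`; every other task receives the
    concatenated outputs of its predecessors. Raises graphlib.CycleError
    if the graph contains a cycle.
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    outputs: dict[str, list[str]] = {}
    for name in TopologicalSorter(graph).static_order():
        preds = sorted(graph[name])
        data = inputs if not preds else [f for p in preds for f in outputs[p]]
        outputs[name] = tasks[name](data)
    return outputs
```

For example, a hypothetical three-stage image pipeline would map `"register"` and `"segment"` through `per_file` and give `"stats"` the whole segmented dataset, with `deps = {"segment": {"register"}, "stats": {"segment"}}`.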

Job submission requirements (2/2)
2.7 Job access to data: It should be possible to specify the data required by a job. The job submission mechanism should ensure that the data is accessible without further work once the job has started (automatic data replication, etc.).
2.8 Simple job submission: An authorized user should be able to submit jobs. It should be possible to register the job output before the job terminates so that it remains accessible later. At submission time, it should be possible to specify the amount of resources needed (CPU time, memory usage, disk space) and the environment in which the job is to be executed.
2.9 Job execution control: It should be possible for a user to control the target (or possible targets) of a submitted job.
Job killing: It should be possible for the user who submitted a job to kill it.
Resource reservation: It should be possible to make advance reservations of resources and to update such reservations prior to job execution.
Scalability: Thousands of jobs at least.
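Requirements 2.8 and 2.9 combine naturally into site selection: the resources stated at submission time filter the sites, and the user may further restrict the allowed targets. A minimal Python sketch, assuming each computing element publishes its maximum CPU time, memory and free disk; the field and function names are hypothetical and are not the job description language of any particular middleware.

```python
from dataclasses import dataclass, field


@dataclass
class JobRequest:
    """What a user states at submission time (requirements 2.8 and 2.9)."""
    executable: str
    cpu_time_min: int                 # requested CPU time, minutes
    memory_mb: int                    # requested memory, MB
    disk_mb: int                      # requested scratch disk, MB
    allowed_targets: set[str] = field(default_factory=set)  # empty set = any site


@dataclass
class ComputingElement:
    """Capacities a site is assumed to publish in the information system."""
    name: str
    max_cpu_time_min: int
    memory_mb: int
    free_disk_mb: int


def candidate_targets(job: JobRequest, ces: list[ComputingElement]) -> list[str]:
    """Names of the computing elements that satisfy the request, in input order."""
    return [
        ce.name for ce in ces
        if (not job.allowed_targets or ce.name in job.allowed_targets)
        and ce.max_cpu_time_min >= job.cpu_time_min
        and ce.memory_mb >= job.memory_mb
        and ce.free_disk_mb >= job.disk_mb
    ]
```

Returning every matching site rather than a single one leaves the final choice (load, data locality) to a broker.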

Data management requirements (1/2)
3.1 File names translation: Logical file names given in the job command line should be automatically translated into physical file names when the job is executed on a given worker node. The user should not have to write explicit application code to translate logical file names into usable instances.
3.2 File access interface: A POSIX-like gridopen/read/write/close interface to access grid files is expected. It should hide the complexity of data fetching from the user. If this interface enables access to files through their logical names, it also fulfils the file names translation requirement.
Fine-grained control of access rights: Access control to files, metadata, and groups of files should be possible at individual and group levels (through Access Control List-like mechanisms).
3.4 Group of files: It should be possible to create a logical collection (or group) of files (a logical collection relates to a set of logical files). It should be possible to control file access rights for groups and to submit a job that requires one or more groups as input (see R2.4).
3.5 Metadata associated to files: It should be possible to associate metadata (stored in databases) with files. By default, these metadata should have the same access pattern and lifetime as the data they are associated with. It should however be possible to give different access rights to files and their associated metadata when needed.
3.6 Access rights delegation: It should be possible for a user with read/write access to data or metadata to grant the same access (without grant option) to another user.
3.7 Data updates and versioning: It should be possible to update data (read-only datasets are not an option for some applications). Data versioning is expected for some applications.
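The delegation rule in 3.6 is easy to get wrong: a user who received access by delegation may use it but must not pass it on. A minimal per-object ACL sketch in Python, assuming individual principals only (the group level of the fine-grained access control requirement is left out) and that the owner is the only initial holder of the grant option; all names are hypothetical.

```python
from enum import Flag, auto


class Right(Flag):
    READ = auto()
    WRITE = auto()


NONE = Right(0)


class AccessControlList:
    """Per-user rights on one file, collection or metadata entry.

    Each entry stores (rights, may_grant). Rights received through
    delegate() never carry the grant option (requirement 3.6).
    """

    def __init__(self, owner: str, rights: Right = Right.READ | Right.WRITE):
        self._entries: dict[str, tuple[Right, bool]] = {owner: (rights, True)}

    def allows(self, user: str, right: Right) -> bool:
        held, _ = self._entries.get(user, (NONE, False))
        return right in held

    def delegate(self, grantor: str, grantee: str, right: Right) -> None:
        held, may_grant = self._entries.get(grantor, (NONE, False))
        if not may_grant or right not in held:
            raise PermissionError(f"{grantor} may not grant {right}")
        current, grantee_may_grant = self._entries.get(grantee, (NONE, False))
        # Add the rights, but keep the grantee's existing grant flag unchanged.
        self._entries[grantee] = (current | right, grantee_may_grant)
```

Storing the grant option as a separate flag, rather than as a third `Right`, makes it impossible for `delegate()` to hand it out by accident.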

Data management requirements (2/2)
3.8 File name changes: It should be possible for an authorized user to change the logical file name of a file or the name of a logical collection.
3.9 Data replication control: It should be possible to explicitly replicate data from one storage to another. It should be possible to control the places where a dataset can be replicated in case of automatic replication.
Data registration, retrieval, and deletion: It should be possible to register a local file on grid storage. Conversely, it should be possible to retrieve grid files to local disk space. It should be possible to delete previously registered files (all physical instances plus the registered logical name).
Data access cost estimation: It should be possible to estimate the time needed to read a file from a specific SE or from the best SE.
Partial file access: It should be possible to read only part of a registered file.
File browsing: It should be possible for users to browse the files and metadata they are authorized to read.
Scalability: Millions of files at least.
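Data access cost estimation, partial file access and replication control interact: the best storage element (SE) depends on how much of the file is read and on which SEs are permitted. A minimal Python sketch, assuming a simple linear cost model (fixed setup latency plus length divided by bandwidth) with per-replica figures supplied by some monitoring service; the model, the SE names and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    se: str                # storage element holding one physical copy
    latency_s: float       # assumed fixed cost to open a transfer, seconds
    bandwidth_mb_s: float  # assumed sustained throughput to the worker, MB/s


def estimated_read_time(replica: Replica, length_mb: float) -> float:
    """Linear cost model: setup latency plus transfer time.

    `length_mb` may be smaller than the file for a partial read.
    """
    return replica.latency_s + length_mb / replica.bandwidth_mb_s


def best_replica(replicas: list[Replica], length_mb: float,
                 allowed_ses: Optional[set[str]] = None) -> Replica:
    """Cheapest replica, optionally restricted to allowed storage elements."""
    candidates = [r for r in replicas if allowed_ses is None or r.se in allowed_ses]
    if not candidates:
        raise LookupError("no replica on an allowed storage element")
    return min(candidates, key=lambda r: estimated_read_time(r, length_mb))
```

With a low-latency but slow SE and a high-latency but fast one, small partial reads favour the first and whole-file reads favour the second, which is why the read length is an explicit parameter.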

Information system requirements
4.1 Jobs information and status notification: It should be possible for users to list their running jobs. For each job, it should be possible to be notified of job status progress through an API, so that an application can monitor jobs without active polling and without non-programmable status reporting such as e-mails.
4.2 Top-level information system index: It should be possible to obtain information about the top-level information systems through a permanent information system index.
4.3 Resource brokers index: It should be possible to obtain information about available resource brokers through a permanent index.
4.4 Grid resource browsing: Users should be able to browse the grid resources (VOs, RBs, CEs, SEs, etc.) they have access to.
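Requirement 4.1 asks for push-style notification instead of polling. A minimal observer-pattern sketch in Python, assuming the middleware side calls `publish()` whenever a job changes state; the class, its methods and the status strings ("SUBMITTED", "RUNNING", "DONE") are hypothetical placeholders, not the job states of any specific middleware.

```python
from collections import defaultdict
from typing import Callable

# Callback signature: (job_id, new_status) -> None
StatusCallback = Callable[[str, str], None]


class JobStatusNotifier:
    """Push-based job monitoring (requirement 4.1).

    The application registers callbacks with subscribe(); the middleware
    side is assumed to call publish() on every state change, so the
    application never has to poll.
    """

    def __init__(self) -> None:
        self._subscribers: dict[str, list[StatusCallback]] = defaultdict(list)
        self._jobs: dict[str, tuple[str, str]] = {}  # job_id -> (owner, status)

    def submit(self, job_id: str, owner: str) -> None:
        self._jobs[job_id] = (owner, "SUBMITTED")

    def subscribe(self, job_id: str, callback: StatusCallback) -> None:
        self._subscribers[job_id].append(callback)

    def publish(self, job_id: str, status: str) -> None:
        owner, _ = self._jobs[job_id]
        self._jobs[job_id] = (owner, status)
        for callback in self._subscribers[job_id]:
            callback(job_id, status)

    def running_jobs(self, owner: str) -> list[str]:
        """List a user's running jobs (first half of requirement 4.1)."""
        return sorted(j for j, (o, s) in self._jobs.items()
                      if o == owner and s == "RUNNING")
```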

Storage requirements
5.1 On-disk encryption: It should be possible to encrypt data on disk to prevent data leaks at the storage site level.
5.2 Interface to new storage systems: It should be possible to implement an interface to new storage systems to make them interoperable with the grid middleware. This implies the availability of a standard grid storage interface flexible enough to accommodate various access control mechanisms.
5.3 Hook on data privacy manager: The grid storage interface should enable application-specific access control policies by providing a hook to application code when performing data access control.
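Requirements 5.2 and 5.3 suggest a layered check: a standard storage interface performs its own access control, then hands the decision to application code that may still refuse. A minimal Python sketch under that assumption; `GridStorage`, `InMemoryStorage`, the hook signature and the example paths are hypothetical, and only reads are shown.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

# Application-supplied policy: (user, path, mode) -> True if access is allowed.
PrivacyHook = Callable[[str, str, str], bool]


class GridStorage(ABC):
    """Uniform interface that a new storage back end would implement (5.2).

    The back end's own access control runs first; the optional
    application hook (5.3) can then veto the access.
    """

    def __init__(self, privacy_hook: Optional[PrivacyHook] = None) -> None:
        self._hook = privacy_hook

    def read(self, user: str, path: str) -> bytes:
        self._check(user, path, "r")
        return self._read(path)

    def _check(self, user: str, path: str, mode: str) -> None:
        if not self._acl_allows(user, path, mode):
            raise PermissionError(f"ACL denies {mode} on {path} to {user}")
        if self._hook is not None and not self._hook(user, path, mode):
            raise PermissionError(f"privacy policy denies {mode} on {path} to {user}")

    @abstractmethod
    def _acl_allows(self, user: str, path: str, mode: str) -> bool: ...

    @abstractmethod
    def _read(self, path: str) -> bytes: ...


class InMemoryStorage(GridStorage):
    """Toy back end used only to exercise the interface."""

    def __init__(self, files: dict[str, bytes], readers: dict[str, set[str]],
                 privacy_hook: Optional[PrivacyHook] = None) -> None:
        super().__init__(privacy_hook)
        self._files = files
        self._readers = readers

    def _acl_allows(self, user: str, path: str, mode: str) -> bool:
        return mode == "r" and user in self._readers.get(path, set())

    def _read(self, path: str) -> bytes:
        return self._files[path]
```

For instance, a hospital could pass a hook that lets everyone with ACL rights read anonymised images but restricts a hypothetical `/nominative/` area to named clinicians, without changing the storage back end.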

Network requirements
6.1 Communications encryption: It should be possible to encrypt data prior to communication between sites to prevent eavesdropping by third parties.
6.2 Outbound connectivity: Applications will need to access services external to the grid middleware. It should be possible to establish communication between a grid node and non-grid hosts on any port.
6.3 Guaranteed bandwidth: Some applications (e.g. interactive ones) need guaranteed network bandwidth.

Security requirements
7.1 Data privacy: See 5.1 and 6.1: it should be possible to enforce data privacy through encryption at all levels.
7.2 User privacy: It should be possible for users to hide their usage of the grid.
7.3 Computations privacy: It should be possible to hide from external users the kind of computation and the datasets used in an experiment.
7.4 Data encryption and encryption keys: It should be possible to protect data by encryption (see 7.1). It should be possible to ensure that encryption keys are kept by an application-specific service (on a trusted site).

Operation requirements
8.1 VO creation: Creating a new VO and propagating VO rights to existing sites should be a simple process, to ease the integration of new applications.
8.2 User control: It should be possible to grant access to an authorized user and to revoke a user.
8.3 User login: An authorized user with a valid certificate should be able to log in and get a success confirmation.
8.4 Software package publication: It should be possible to make new versions of software packages available on the grid.
8.5 Robustness: The system should be able to handle thousands of jobs and multiple user loads without crashing.

Template to describe requirements
Identifier: Reference number
Description: A comprehensive description of the requirement, without implementation detail
Area: Research community expressing the requirement: High Energy Physics, biomedical, …
Priority: One of:
  - Critical: absolutely blocking for applications
  - Very high: required by all means, very difficult for applications to do without it
  - High: highly desirable, very difficult for some applications to do without it
  - Medium: expected, difficult for some applications to do without it
  - Helpful: broad usage service, would ease application development
  - Low: long-term requirement, not critical for any application
  - Very low: something to think about from a long-term perspective
Requirement date: PM? (a pilot application has to be ready by this date)
Implementation date: PM? (the requirement has to be fulfilled by the middleware at this date)
Implementation proposal: Explanation of how the requirement is supposed to be implemented
Status tracking: A dated list of milestones toward completion of the requirement
Applicable use cases: Relevant use cases from HEPCAL and the AWG joint list of use cases related to this requirement
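The template maps directly onto a record type, and the seven priority levels give a natural ranking key. A minimal Python sketch, assuming dates are kept as project-month integers (None while still "PM?") and that ties in priority are broken by the earlier requirement date; the class and function names and the tie-breaking rule are hypothetical, not the EGEE database format.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    """The template's seven priority levels; a larger value is more urgent."""
    VERY_LOW = 1
    LOW = 2
    HELPFUL = 3
    MEDIUM = 4
    HIGH = 5
    VERY_HIGH = 6
    CRITICAL = 7


@dataclass
class Requirement:
    identifier: str
    description: str
    area: str                                  # research community, e.g. "biomedical"
    priority: Priority
    requirement_date_pm: Optional[int] = None  # project month; None while still "PM?"
    implementation_date_pm: Optional[int] = None
    implementation_proposal: str = ""
    status_tracking: list[tuple[str, str]] = field(default_factory=list)  # (date, milestone)
    applicable_usecases: list[str] = field(default_factory=list)


def ranked(reqs: list[Requirement]) -> list[Requirement]:
    """Most urgent first; ties broken by earliest requirement date,
    with undated requirements placed after dated ones."""
    return sorted(reqs, key=lambda r: (-r.priority,
                                       r.requirement_date_pm is None,
                                       r.requirement_date_pm or 0))
```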

Ongoing work on biomedical requirements in Europe…
- Life sciences
  - Within the framework of EGEE
  - Within the framework of Embrace, an EC-funded network of excellence dedicated to a grid for bioinformatics
- Health
  - Healthgrid white paper publicly available on the Healthgrid web site on June 15th, 2004
  - Technical work to be done to extract requirements from it
- Drug discovery
  - Use case: grid to address diseases of developing countries (Dengue, Malaria)
  - Discussed at the next Pharmagrid conference

Grids to address diseases of the developing world
- Grid solutions may have unique capabilities to enable or enhance the cross-organizational collaborations that seem to be required to make progress on rare diseases, diseases of the developing world, or even major research initiatives such as cancer
- The goal is to increase the chances of developing new and better drugs and vaccines against the deadliest disease on earth, and to reduce their time to market
- Means: introduce grid technology and resources wherever relevant
  - Genomics research (comparative analysis, data access)
  - In silico drug discovery (virtual screening on compound databases)
  - Data collection and analysis in plagued areas (database federation)
- First step: large-scale virtual screening on P. falciparum protein targets
- Endorsed by the Healthgrid association
- Discussed at the next Pharmagrid conference
- Contact points: V. Breton, M. Hofmann

Next steps
- The EGEE requirements process will go on for the next 2 years
  - Creation of a database relating requirements, use cases and test cases (J. Montagnat): the format may still evolve
  - Biomedical requirements and related use cases are accessible at na4.ct.infn.it/requirements/
  - Proposal: use the EGEE templates and requirements database as a basis for LSG-RG work on requirements
- The Healthgrid white paper describes use cases and addresses issues (security, legal) relevant to applying grids to healthcare
  - Use cases in mammography, radiation therapy, epidemiology, …
  - Proposal: start from the white paper to complete the existing list of requirements
- Healthgrid/Biomedical informatics satellite workshop at GGF12 to connect to the LSG-RG meeting in Brussels
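A database relating requirements, use cases and test cases is essentially two relations: a many-to-many link between requirements and use cases, and test cases that each point at one requirement. A minimal SQLite sketch in Python, assuming that shape; the actual format of the database at na4.ct.infn.it is not described here, so every table, column and function name below is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS requirement (
    id          TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    priority    TEXT
);
CREATE TABLE IF NOT EXISTS usecase (
    id    TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS requirement_usecase (
    requirement_id TEXT NOT NULL REFERENCES requirement(id),
    usecase_id     TEXT NOT NULL REFERENCES usecase(id),
    PRIMARY KEY (requirement_id, usecase_id)
);
CREATE TABLE IF NOT EXISTS testcase (
    id             TEXT PRIMARY KEY,
    requirement_id TEXT NOT NULL REFERENCES requirement(id),
    description    TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn


def usecases_for(conn: sqlite3.Connection, requirement_id: str) -> list[str]:
    """Use cases linked to one requirement."""
    rows = conn.execute(
        "SELECT usecase_id FROM requirement_usecase "
        "WHERE requirement_id = ? ORDER BY usecase_id",
        (requirement_id,),
    ).fetchall()
    return [row[0] for row in rows]


def untested_requirements(conn: sqlite3.Connection) -> list[str]:
    """Requirements that no test case covers yet."""
    rows = conn.execute(
        "SELECT r.id FROM requirement AS r "
        "LEFT JOIN testcase AS t ON t.requirement_id = r.id "
        "WHERE t.id IS NULL ORDER BY r.id"
    ).fetchall()
    return [row[0] for row in rows]
```

A query such as `untested_requirements` is the kind of report a test team could use to see which requirements still lack generated test cases.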