INFSO-RI Enabling Grids for E-sciencE
EGEE is a project funded by the European Union under contract IST
Report from PTF
Fabrizio Pacini
EGEE JRA1 IT/CZ Meeting, Bologna, May 2005
Outline
– Requirements evaluation
– Requirements prioritisation
– PTF sub-groups
Requirements
– More than 400 requirements have been entered in the DB by NA4/SA1/JRA3
– Requirements have been assigned to the PTF members of the relevant development clusters
– Requirements have been evaluated and assigned a status:
  – Unsatisfied
  – Partially satisfied
  – Mostly satisfied
  – Satisfied
– They will be verified by the JRA1/NA4 testing teams
Requirements
– More than 75 requirements have been assigned to IT/CZ
– I have provided feedback and assigned a status to all of them
– The working group on application requirements (Biomed, HEP, ESR, EGEODE, CompChem, Magic, Planck, Egrid) has produced a prioritisation (biomed-biased) of the unsatisfied requirements
– Comments on the JRA1 work plan have also been provided
Requirements
Short jobs and efficiency:
– #100477: Performance of short jobs submission
– #100479: Middleware overhead for short jobs
– #100480: Robustness to job submission avalanches
– #100481: Scalability of the workload management system
– #100482: Specification of job execution time for scheduling
– #100483: Job submission efficiency
– #100501: Job requirements for scheduling
– #100502: Scalability of jobs submission
– #100503: Submission of jobs with data efficiency
Communication between UI and WN:
– #100496: Communication between worker nodes and user interface
Requirements
MPI:
– #100476: Parallel job submission independence to file system
Others:
– The support for “pilot jobs” as expressed in the work plan is more broadly covered by the requirements on short jobs listed above: this is a way to ensure efficient job submission and to bypass many middleware overheads.
– Passing job parameters to local batch system (SA1)
– #100516: Submission of multiple data jobs using groups of files (Partitioning on input data)
– Defect fixing towards stability, stability ….
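Requirement #100516 (partitioning on input data) can be illustrated with a minimal sketch. It only shows one plausible reading of the requirement: a list of input files is split into fixed-size groups, one group per job. The function name `partition_inputs` and the `group_size` parameter are hypothetical and are not part of any EGEE middleware interface.

```python
from typing import List, Sequence


def partition_inputs(files: Sequence[str], group_size: int) -> List[List[str]]:
    """Split input files into consecutive groups, one group per job.

    Hypothetical helper for illustration only; the last group may be
    smaller than group_size when the file count is not a multiple of it.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [list(files[i:i + group_size]) for i in range(0, len(files), group_size)]


if __name__ == "__main__":
    # Each printed group would correspond to one submitted job.
    for group in partition_inputs(["f1", "f2", "f3", "f4", "f5"], 2):
        print(group)
```

Fixed-size grouping is only one possible policy; grouping by file size or by storage location would be equally valid readings of the requirement.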
Use Cases
Communication between user interface and worker nodes
An interactive program needs to connect to a running job:
– The user submits a job, indicating the port on which a connection is expected
– The interactive program is started 'out of the grid'
– The interactive program connects to the worker node running the job on the specified port
– The interactive program and the job can exchange any kind of data on the opened channel
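The connection steps of this use case can be sketched with plain TCP sockets. This is an illustration under stated assumptions, not the middleware mechanism: the job is assumed to listen on the agreed port, both ends run on localhost for the demo, and the names `run_job`, `interactive_exchange` and `demo` are hypothetical.

```python
import socket
import threading


def run_job(server_sock: socket.socket) -> None:
    """Job side (assumed to run on the worker node): accept one
    connection and send the received data back in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def interactive_exchange(host: str, port: int, message: bytes) -> bytes:
    """Interactive program side, started outside the grid: connect to
    the job on the specified port and exchange one message."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message)
        return sock.recv(1024)


def demo() -> bytes:
    # Port 0 lets the OS pick a free port for this local demo; in the
    # use case the port is the one indicated at job submission.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    job = threading.Thread(target=run_job, args=(server,))
    job.start()
    reply = interactive_exchange("127.0.0.1", port, b"hello job")
    job.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

In a real deployment the worker node would also have to be reachable from the user's machine on that port, which the sketch does not address.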
Use Cases
Parallel job submission
User submits an MPI job:
– The user writes a JDL specifying the nature of the job and the number of CPUs expected.
– If there are not enough CPUs available to the user, the job fails with a clear error message (preferably indicating the maximum number of CPUs accessible).
– Otherwise, a target cluster is selected and the job is started.
– The job is deployed on the target host independently of the shared or non-shared nature of the local filesystem (binaries and input files may or may not need to be copied to all target WNs, depending on this).
– The job is executed.
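The decision logic in this use case can be sketched as follows. Everything here is hypothetical and for illustration only: the `Cluster` record, the smallest-fit selection policy, and the node naming in `staging_plan` are assumptions, not the behaviour of the actual workload management system.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of a cluster as seen by the scheduler."""
    name: str
    free_cpus: int
    shared_fs: bool


class NotEnoughCpusError(Exception):
    """Raised when no cluster can provide the requested CPU count."""


def select_cluster(clusters: Sequence[Cluster], cpus_requested: int) -> Cluster:
    """Pick a target cluster, or fail with a message that states the
    maximum number of CPUs accessible to the user."""
    max_cpus = max((c.free_cpus for c in clusters), default=0)
    if cpus_requested > max_cpus:
        raise NotEnoughCpusError(
            f"requested {cpus_requested} CPUs, but at most {max_cpus} are accessible"
        )
    candidates = [c for c in clusters if c.free_cpus >= cpus_requested]
    # Smallest fit is an assumed policy, chosen only to keep the sketch simple.
    return min(candidates, key=lambda c: c.free_cpus)


def staging_plan(cluster: Cluster, files: List[str], cpus: int) -> Dict[str, List[str]]:
    """Decide where binaries and input files are copied: once to the
    shared area, or to every worker node when there is no shared filesystem."""
    if cluster.shared_fs:
        return {"shared-area": list(files)}
    return {f"{cluster.name}-wn{i}": list(files) for i in range(cpus)}


if __name__ == "__main__":
    clusters = [Cluster("a", 4, True), Cluster("b", 16, False)]
    target = select_cluster(clusters, 8)
    print(target.name, staging_plan(target, ["mpi_app", "input.dat"], 8))
```

Keeping the staging decision separate from cluster selection mirrors the use case: the user's JDL stays the same whether or not the target cluster has a shared filesystem.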
Sub-groups
– Requirements sub-group
– Interfaces sub-group
– Web Services sub-group
Representatives from all clusters except IT/CZ
Are there any volunteers?