Prototyping production and analysis frameworks for LHC experiments based on LCG, EGEE and INFN-Grid middleware

C. Aiftimiei, D. Andreotti, S. Andreozzi, S. Bagnasco, S. Belforte, D. Bonacorsi, A. Caltroni, S. Campana, P. Capiluppi, A. Cavalli, D. Cesini, V. Ciaschini, M. Corvo, F. Delli Paoli, A. De Salvo, F. Donno, G. Donvito, A. Fanfani, S. Fantinel, T. Ferrari, A. Ferraro, E. Ferro, L. Gaido, D. Galli, A. Ghiselli, F. Giacomini, C. Grandi, A. Guarise, D. Lacaprara, D. Lucchesi, L. Luminari, E. Luppi, G. Maggi, U. Marconi, M. Masera, A. Masoni, M. Mazzucato, E. Molinari, L. Perini, F. Prelz, D. Rebatto, S. Resconi, E. Ronchieri, G. Rubini, D. Salomoni, A. Sciaba, M. Selmi, M. Sgaravatto, L. Tomassetti, V. Vagnoni, M. Verlato, P. Veronesi, M. C. Vistoli

The INFN ECGI Activity

The INFN ECGI (Experiment Computing Grid Integration) working group was created to help the LHC and other HEP experiments understand and test the functionalities and features offered by the new middleware developed in the context of the EGEE, LCG and INFN-Grid efforts, and to provide adequate user documentation. The group works in close collaboration and coordination with the Experiment Integration Support at CERN and with the middleware developers. The sections below present some examples of the group's activities over the last two years.

CDF: DAG and Parametric Jobs

To be fully supported by the gLite middleware, the CDF gLiteCAF needs to submit to a gLite 1.4 WMS. Typical CDF jobs submitted to the WMS are DAG or Parametric jobs made up of many sections, so we investigated the time spent by the Resource Broker to submit jobs with a variable number of sections.

DAG via Network Server: the submission was done with the standard glite-job-submit command for DAG jobs with an increasing number of sections and no InputSandbox transfer. It takes roughly 2 seconds per section.

DAG via WMProxy: the submission was done with the new bulk-submission command glite-wms-job-submit for DAG jobs with an increasing number of sections. The cost is of the order of seconds per section, and the time per section grows as the number of sections increases.

Parametric jobs via WMProxy: the submission was done with glite-wms-job-submit for Parametric jobs with an increasing number of sections. It takes roughly 0.8 seconds per section; the growth for larger numbers of sections is still present.
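To make the measurement concrete, a bulk Parametric submission of this kind can be sketched as follows. This is only an illustrative sketch under assumed settings: the executable, output file names and the number of sections N are hypothetical placeholders, not the actual CDF test configuration.

#!/bin/bash
# Sketch of a Parametric-job submission timing test (hypothetical values).
# A single JDL describes N sections; at submission time the WMS expands the
# _PARAM_ placeholder into one node per parameter value (0 .. N-1).
N=100
cat > param.jdl <<EOF
[
  JobType        = "Parametric";
  Executable     = "/bin/hostname";
  StdOutput      = "out_PARAM_.txt";
  StdError       = "err_PARAM_.txt";
  OutputSandbox  = { "out_PARAM_.txt", "err_PARAM_.txt" };
  Parameters     = ${N};
  ParameterStart = 0;
  ParameterStep  = 1;
]
EOF
# -a delegates the user proxy automatically, -o stores the returned job id;
# "time" gives the wall-clock cost of submitting the whole bulk of sections.
time glite-wms-job-submit -a -o jobid_${N}.txt param.jdl

Dividing the measured wall-clock time by N gives the per-section cost quoted above; the analogous measurement with a DAG-type JDL, or with glite-job-submit through the Network Server, gives the other two figures.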
ALICE: Evolution of the job submission schema

In the ALICE job submission schema a whole Grid (namely, a Resource Broker) is seen by the AliEn server as a single AliEn CE, and the whole Grid storage is seen as a single, large AliEn SE. A set of interface services runs on a dedicated machine at each site, (mostly) providing common interfaces to the underlying site services. The Resource Broker can (possibly through a generic plugin mechanism) directly request jobs from the Task Queue (or any other VO job database). The interface has evolved through three stages: the first version ("Interface sites"), the current deployment (site VO-Boxes) and the next step (a Resource Broker plugin).

[Diagrams: the three stages of the ALICE/LCG interface. First version ("Interface sites", e.g. grid012.to.infn.it): AliEn CE and SE interface services between the AliEn server and the LCG UI, RB, CE/WN and SE, with job requests and status reports flowing through the interface site and data registered in the Replica Catalogue and the AliEn Data Catalogue. Current deployment (site VO-Boxes): a JobAgent on the WN, VO-Box services (SCA, SA, PackMan) pulling job requests from the Task Queue (TQ), SURL registration in the LFC and LFN registration in the File Catalogue. Next step (Resource Broker plugin): the LCG RB sends job and configuration requests directly to the TQ through a broker plugin.]

BaBar: Integration of a running experiment

The production of simulated events for the BaBar experiment is a distributed task based on a system of 25 computer farms located in 5 countries, for a total of about 1000 CPUs. Each remote site is managed by a local production manager. An object-oriented database (Objectivity) is used to read detector conditions and calibration constants. BaBar simulation jobs mix real random-trigger data with Monte Carlo generated events in order to obtain a realistic simulation of the detector backgrounds. These background files and the data produced by the simulation (the Event Store) are stored in ROOT I/O format. A typical simulation job produces 2000 events; each event takes about 10 s and results in about 20 KB of storage.

In 2001 the BaBar computing group started to evaluate the possibility of evolving toward a distributed computing model in a Grid environment. A Resource Broker specific to the BaBar computing needs was installed. The BaBar Monte Carlo software has been packaged in a tar archive of about 37 MB (130 MB uncompressed) and is installed on each WN using standard LCG tools. Four INFN-Grid sites were configured to provide Objectivity and ROOT I/O services over the WAN using the Advanced Multithreaded Server and Xrootd protocols. Sites were geographically mapped to Objectivity databases: conditions and backgrounds were read by the WN either locally or remotely. The standard BaBar simulation production tools were installed on the User Interface and configured as in the standard BaBar simulation production setup. Jobs submitted to the BaBar RB installed in Ferrara are sent to CEs satisfying the job requirements and distributed to WNs. The simulated events are stored in ROOT I/O files and transferred from the WN to the closest Storage Element.
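The last step, the transfer of the output from the WN to the closest Storage Element, can be sketched with the standard LCG data-management client as below. This is a hedged illustration: the LFN path and file name are invented, and it assumes the conventional VO_BABAR_DEFAULT_SE variable that LCG worker nodes expose as the closest SE.

#!/bin/bash
# Sketch (hypothetical names): copy a file of simulated events from the WN
# to the closest Storage Element and register it in the file catalogue.
OUTPUT=simu-events.root
# On LCG worker nodes, VO_<VO>_DEFAULT_SE conventionally points to the closest SE.
lcg-cr --vo babar \
       -d "${VO_BABAR_DEFAULT_SE}" \
       -l "lfn:/grid/babar/simprod/${OUTPUT}" \
       "file:${PWD}/${OUTPUT}"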
ATLAS: G-PBox tests

G-PBox (Grid Policy Box) allows VO or site administrators to control and check the behavior of the users who submit to the Grid. "PBoxes" are the basic elements of G-PBox: they originate and distribute the policies created by VO or site administrators, and they evaluate the requests coming from the resources and services (e.g. a WMS or a CE) contacted by the user. By coupling G-PBox with the WMS and VOMS services, VO policies can be used, for example, to allow a VO group to submit to high-priority queues, to force the VO WMSs to submit to more powerful sites (Tier-1s) during data challenges, or to ban a set of VO users.

The ATLAS test aimed at verifying a possible use of G-PBox to give higher priority to some users within a VO. Two groups ("analysis" and "production") were created in the ATLAS VOMS server, using a VOMS server hosted at CNAF instead of the production one, and were then populated with a small number of users. The policies set in the G-PBox server were of the kind: group "analysis" can submit only to low-priority queues, while group "production" can submit to both high- and low-priority queues. It was necessary to modify the batch systems of all the involved sites in order to create the necessary queues for the ATLAS VO. Priority was published through the Glue schema attribute GlueCEPolicyPriority, already present in the Glue schema 1.2 used in the EGEE grids: high-priority queues publish GlueCEPolicyPriority=1, low-priority queues GlueCEPolicyPriority=0.

[Diagram: a VOMS server with groups A, B and C; the G-PBox, coupled to the WMS, applies the policies so that group A can reach both high- and low-priority CEs, group B only low-priority CEs, and group C is denied everywhere.]

[Diagram: testbed layout. INFN certification services: cert-rb-03 (gLite WMS+LB), cert-pbox-01 (G-PBox server), cert-voms-01 (gLite 1.4.1 VOMS server), cert-bdii-01 (INFNGRID BDII) and cert-ui-01 (gLite 1.4 UI, not used during the ATLAS test since the UI was in Milano). INFNGRID production sites at CNAF, Padova, Roma1 and Milano, plus the CNAF certification site, each publishing dedicated low- and high-priority ATLAS queues (atlas_low*, atlas_high*).]

The final testbed involved the INFN certification services (gLite WMS, LCG BDII, gLite VOMS, gLite G-PBox server) and four INFNGRID production sites. A total of 518 "production" and 915 "analysis" jobs were submitted through the ATLAS submission system (Lexor). As expected, all "analysis" jobs were directed to and executed on low-priority queues, while "production" jobs reached all types of queues, showing the feasibility of this priority management and the stability of the G-PBox server under near-real production stress conditions.
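As an illustration of how the published priority can be used, the sketch below obtains a VOMS proxy carrying the "production" group and submits a job whose JDL asks for a high-priority queue. In the actual test the steering was enforced centrally by the G-PBox policies coupled to the WMS, not by user-supplied requirements, so the group path and JDL contents here are assumed examples only.

#!/bin/bash
# Hypothetical sketch: get a VOMS proxy in the "production" group and ask the
# WMS for a CE publishing the high-priority value of GlueCEPolicyPriority.
voms-proxy-init --voms atlas:/atlas/production
cat > high-priority.jdl <<'EOF'
[
  Executable    = "/bin/hostname";
  StdOutput     = "std.out";
  StdError      = "std.err";
  OutputSandbox = { "std.out", "std.err" };
  Requirements  = other.GlueCEPolicyPriority == 1;
]
EOF
glite-wms-job-submit -a -o jobid.txt high-priority.jdl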