
1 ATLAS Data Challenge 2: A Massive Monte Carlo Production on the Grid
Santiago González de la Hoz (Santiago.Gonzalez@ific.uv.es), on behalf of the ATLAS DC2 Collaboration.
IFIC, Instituto de Física Corpuscular (CSIC - Universitat de València), Spain.
EGC 2005, Amsterdam, 14/02/2005.

2 Overview
- Introduction: the ATLAS experiment and the Data Challenge programme.
- The ATLAS production system.
- DC2 production phases: the 3 Grid flavours (LCG, Grid3 and NorduGrid) and the ATLAS DC2 production.
- Distributed analysis system.
- Conclusions.

3 Introduction: LHC/CERN
(Slide shows the LHC at CERN near Geneva, with Mont Blanc, 4810 m, in the background.)

4 The challenge of LHC computing
- Storage: a raw recording rate of 0.1-1 GB/s, accumulating 5-8 PB/year, with about 10 PB of disk.
- Processing: the equivalent of 200,000 of today's fastest PCs.

5 Introduction: ATLAS
- A detector for the study of high-energy proton-proton collisions.
- The offline computing will have to deal with an output event rate of 100 Hz, i.e. about 10^9 events per year with an average event size of 1 MB.
- The researchers involved are spread all over the world.
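The 10^9 events per year follow directly from the 100 Hz output rate. A minimal back-of-the-envelope sketch, assuming the usual ~10^7 s of effective data taking per year (an assumption, not stated on the slide):

```python
# Back-of-the-envelope check of the ATLAS offline numbers quoted above.
event_rate_hz = 100           # offline output event rate
live_seconds_per_year = 1e7   # assumed effective running time per year
event_size_mb = 1.0           # average event size

events_per_year = event_rate_hz * live_seconds_per_year   # ~1e9 events
raw_volume_pb = events_per_year * event_size_mb / 1e9     # 1 PB = 1e9 MB

print(f"{events_per_year:.1e} events/year, ~{raw_volume_pb:.0f} PB/year")
# prints: 1.0e+09 events/year, ~1 PB/year
```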

6 Introduction: Data Challenges
- Scope and goals: in 2002 ATLAS computing planned a first series of Data Challenges (DCs) in order to validate its computing model, its software and its data model.
- The major features of DC1 were the development and deployment of the software required for the production of large event samples, and the production of those samples involving institutions worldwide.
- The ATLAS collaboration decided to perform DC2, and in the future DC3, using the Grid middleware developed in several Grid projects (the "Grid flavours"): the LHC Computing Grid project (LCG), to which CERN is committed; Grid3; and NorduGrid.

7 ATLAS production system
- In order to handle the task of ATLAS DC2, an automated production system was designed. It consists of four components:
- The production database, which contains abstract job definitions.
- The Windmill supervisor, which reads job definitions from the production database and presents them to the different Grid executors in an easy-to-parse XML format.
- The executors, one for each Grid flavour, which receive the job definitions in XML format and convert them to the job description language of that particular Grid (a sketch of this hand-off follows below).
- Don Quijote, the ATLAS data management system, which moves files from their temporary output locations to their final destination on some Storage Element and registers them in the Replica Location Service of that Grid.
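A minimal sketch of the supervisor-to-executor hand-off described above. The XML layout, the file and transformation names, and the JDL-like output are illustrative assumptions; the real Windmill messages and the per-Grid job languages differ in detail.

```python
# Toy executor step: parse an abstract XML job definition and emit a
# JDL-like job description for one particular Grid flavour.
import xml.etree.ElementTree as ET

JOB_XML = """
<job id="dc2.simul.000123">
  <transformation>AtlasG4_trf.py</transformation>
  <inputfile>dc2.evgen.000123.pool.root</inputfile>
  <outputfile>dc2.simul.000123.pool.root</outputfile>
</job>
"""

def xml_to_jdl(xml_text: str) -> str:
    """Turn an abstract XML job definition into a JDL-like job description."""
    job = ET.fromstring(xml_text)
    return "\n".join([
        f'Executable  = "{job.findtext("transformation")}";',
        f'Arguments   = "{job.get("id")}";',
        f'InputData   = {{"{job.findtext("inputfile")}"}};',
        f'OutputData  = {{"{job.findtext("outputfile")}"}};',
    ])

print(xml_to_jdl(JOB_XML))
```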

8 DC2 production phases
Task flow for DC2 data (persistency: Athena-POOL). Pythia event generation produces physics events (HepMC); Geant4 detector simulation produces hits and MC truth; digitization, with pile-up from minimum-bias events, produces digits (RDO) and MC truth; event mixing and byte-stream conversion produce byte-stream raw digits; reconstruction produces ESD. Approximate data volumes for 10^7 events at the various stages: ~5 TB, 20 TB, 30 TB, 20 TB and 5 TB. (A linear sketch of this chain follows below.)
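The same task flow written out linearly, as a descriptive sketch only; the stage and product names follow the slide, and the tool column is an assumption (digitization, mixing and reconstruction run in the Athena framework), not a list of actual ATLAS job types.

```python
# DC2 task flow, one stage per entry: (stage, tool, output data product).
DC2_PIPELINE = [
    ("Event generation",           "Pythia", "physics events (HepMC)"),
    ("Detector simulation",        "Geant4", "hits + MC truth"),
    ("Pile-up / digitization",     "Athena", "digits (RDO) + MC truth"),
    ("Event mixing / byte-stream", "Athena", "byte-stream raw digits"),
    ("Reconstruction",             "Athena", "ESD"),
]

for stage, tool, product in DC2_PIPELINE:
    print(f"{stage:<28} [{tool:<6}] -> {product}")
```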

9 DC2 production phases
Process | No. of events | Event size (MB) | CPU power (kSI2k-s) | Data volume (TB)
Event generation | 10^7 | 0.06 | 156 | -
Simulation | 10^7 | 1.9 | 504 | 30
Pile-up / digitization | 10^7 | 3.3 / 1.9 | ~144 / 16 | ~35
Event mixing & byte-stream | 10^7 | 2.0 | ~5.4 | ~20
- ATLAS DC2 started in July 2004 and finished the simulation part at the end of September 2004.
- 10 million events (100,000 jobs) were generated and simulated using the three Grid flavours; the Grid technologies provided the tools to generate a large Monte Carlo simulation sample.
- The digitization and pile-up part was completed in December; the pile-up was done on a sub-sample of 2 million events.
- The event mixing and byte-stream production are ongoing.
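A quick consistency check of the table above: the data volume follows from the event count and the per-event size (taking 1 TB = 10^6 MB). This is a sketch of the arithmetic only, using the slide's rounded figures.

```python
# Data volume for 1e7 events as a function of the per-event size.
N_EVENTS = 1e7

def volume_tb(event_size_mb: float) -> float:
    return N_EVENTS * event_size_mb / 1e6

print(volume_tb(2.0))   # event mixing & byte-stream: 20.0 TB, as in the table
print(volume_tb(3.3))   # pile-up output: 33.0 TB, close to the quoted ~35 TB
```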

10 The 3 Grid flavours
- LCG (http://lcg.web.cern.ch/LCG/): the job of the LHC Computing Grid project (LCG) is to prepare the computing infrastructure for the simulation, processing and analysis of LHC data for all four LHC collaborations. This includes both the common infrastructure of libraries, tools and frameworks required to support the physics application software, and the development and deployment of the computing services needed to store and process the data, providing batch and interactive facilities for the worldwide community of physicists involved in the LHC.
- NorduGrid (http://www.nordugrid.org/): the aim of the NorduGrid collaboration is to deliver a robust, scalable, portable and fully featured solution for a global computational and data Grid system. NorduGrid develops and deploys a set of tools and services, the so-called ARC middleware, which is free software.
- Grid3 (http://www.ivdgl.org/grid2003/): the Grid3 collaboration has deployed an international Data Grid with dozens of sites and thousands of processors. The facility is operated jointly by the U.S. Grid projects iVDGL, GriPhyN and PPDG, and by the U.S. participants in the LHC experiments ATLAS and CMS.
- Both Grid3 and NorduGrid follow a similar approach, using the same foundations (Globus) as LCG but with slightly different middleware.

11 The 3 Grid flavours: LCG
- 82 sites in 22 countries (these numbers are evolving very fast), ~7,269 shared CPUs, 6,558 TB of storage.
- This infrastructure has been operating since 2003.
- The resources used (computational and storage) are installed at a large number of Regional Computing Centres, interconnected by fast networks.

12 The 3 Grid flavours: NorduGrid
- 11 countries, 40+ sites, ~4,000 CPUs, ~30 TB of storage.
- NorduGrid is a research collaboration established mainly across the Nordic countries, but it includes sites from other countries as well.
- It contributed a significant part of DC1 (already using the Grid in 2002).
- It supports production on non-RedHat 7.3 platforms.

13 The 3 Grid flavours: Grid3
- 30 sites with multi-VO shared resources, ~3,000 shared CPUs (September 2004).
- The deployed infrastructure has been in operation since November 2003.
- It is currently running 3 HEP and 2 biological applications.
- Over 100 users are authorized to run on Grid3.

14 ATLAS DC2 production on LCG, Grid3 and NorduGrid
(Plot: total number of validated G4-simulation jobs per day.)

15 Typical job distribution on LCG, Grid3 and NorduGrid

16 Distributed analysis system: ADA
- Physicists want to use the Grid to perform the analysis of the data too.
- The ADA (ATLAS Distributed Analysis) project aims at putting together all the software components needed to facilitate end-user analysis.
- The ADA architecture includes:
- DIAL, which defines the job components (dataset, task, application, etc.); together with LSF or Condor it provides "interactivity" (a low response time). A toy sketch of these components follows below.
- ATPROD, the production system, to be used for low-scale mass production.
- ARDA, the analysis system to be interfaced to the EGEE middleware.
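A toy sketch of how a DIAL-style analysis job is composed from the components named above (dataset, task, application). These classes and the release number are hypothetical illustrations, not the actual DIAL API.

```python
# Hypothetical DIAL-like job composition: dataset + application + task.
from dataclasses import dataclass
from typing import List

@dataclass
class Dataset:
    name: str
    files: List[str]

@dataclass
class Application:
    name: str          # e.g. an Athena release
    version: str

@dataclass
class Task:
    algorithm: str     # user analysis code run over the dataset
    output: str

def submit(ds: Dataset, app: Application, task: Task) -> str:
    """Pretend scheduler call: run the task with the application on the dataset."""
    return (f"run {task.algorithm} with {app.name} {app.version} "
            f"over {len(ds.files)} files of {ds.name} -> {task.output}")

print(submit(Dataset("dc2.simul.A1", ["f1.root", "f2.root"]),
             Application("Athena", "9.0.4"),
             Task("MyAnalysis.py", "hist.root")))
```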

17 Lessons learned from DC2
- Main problems:
- The production system was still under development during the DC2 phase.
- The beta status of the Grid services caused trouble while the system was in operation; for example, the Globus RLS, the Resource Broker and the information system were unstable in the initial phase.
- Especially on LCG, the lack of a uniform monitoring system.
- Mis-configuration of sites and site-stability problems.
- Main achievements:
- An automatic production system making use of the Grid infrastructure.
- 6 TB (out of 30 TB) of data were moved among the different Grid flavours using the Don Quijote servers.
- 235,000 jobs were submitted by the production system.
- 250,000 logical files were produced, with 2,500-3,500 jobs per day distributed over the three Grid flavours.

18 Conclusions
- The generation and simulation of events for ATLAS DC2 have been completed using three flavours of Grid technology. They have been proven to be usable in a coherent way for a real production, and this is a major achievement.
- This exercise has taught us that all the elements involved (Grid middleware, production system, deployment and monitoring tools) need improvements.
- Between the start of DC2 in July 2004 and the end of September 2004 (corresponding to the G4-simulation phase), the automatic production system submitted 235,000 jobs, which consumed ~1.5 million SI2k-months of CPU and produced more than 30 TB of physics data.
- ATLAS is also pursuing a model for distributed analysis, which would improve the productivity of end users by profiting from the available Grid resources.

19 Backup slides

20 Supervisor and executors (Windmill)
The Windmill supervisor and the executors (1. lexor, 2. dulcinea, 3. capone, 4. legacy) communicate over a Jabber pathway using the messages numJobsWanted, executeJobs, getExecutorData, getStatus, fixJob and killJob. The supervisor reads the production database (jobs database), the executors submit to the execution sites (Grid), and Don Quijote acts as the file catalogue.
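A sketch of the supervisor-executor interface listed above. The method names come from the slide; the signatures, return types and docstrings are assumptions, and the real exchange is XML messages carried over Jabber rather than direct Python calls.

```python
# Hypothetical executor interface: one implementation per Grid flavour
# (lexor for LCG, dulcinea for NorduGrid, capone for Grid3, plus legacy).
from abc import ABC, abstractmethod
from typing import Dict, List

class Executor(ABC):

    @abstractmethod
    def numJobsWanted(self) -> int:
        """How many job definitions this executor can accept right now."""

    @abstractmethod
    def executeJobs(self, job_definitions_xml: str) -> None:
        """Translate the XML job definitions and submit them to this Grid."""

    @abstractmethod
    def getExecutorData(self) -> Dict[str, str]:
        """Executor-specific bookkeeping data for the supervisor."""

    @abstractmethod
    def getStatus(self, job_ids: List[str]) -> Dict[str, str]:
        """Current Grid status of the given jobs."""

    @abstractmethod
    def fixJob(self, job_id: str) -> None:
        """Repair or post-process a job that needs intervention."""

    @abstractmethod
    def killJob(self, job_id: str) -> None:
        """Cancel a running job on the Grid."""
```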

21 NorduGrid: ARC features
- ARC is based on the Globus Toolkit with the core services replaced; it currently uses Globus Toolkit 2.
- Alternative/extended Grid services:
- The Grid Manager, which checks user credentials and authorization, handles jobs locally on clusters (interfacing to the LRMS), and does stage-in and stage-out of files.
- A lightweight user interface with a built-in resource broker.
- An information system based on MDS with a NorduGrid schema.
- The xRSL job description language (extended Globus RSL); an illustrative example follows below.
- A Grid monitor.
- Simple, stable and non-invasive.
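An illustrative xRSL job description of the kind an ARC user interface (e.g. ngsub) would broker. The attributes shown (executable, arguments, inputFiles, stdout, stderr, cpuTime, jobName) are common ones, but treat the exact attribute set, values and file names as assumptions and consult the NorduGrid ARC documentation for the authoritative syntax.

```python
# Write out an example xRSL (extended Globus RSL) job description.
XRSL_JOB = """\
&(executable="run_dc2_simul.sh")
 (arguments="dc2.simul.000123")
 (inputFiles=("evgen.pool.root" "gsiftp://se.example.org/dc2/evgen.pool.root"))
 (stdout="simul.out")
 (stderr="simul.err")
 (cpuTime="1440")
 (jobName="dc2-simul-000123")
"""

with open("dc2_simul.xrsl", "w") as f:
    f.write(XRSL_JOB)
```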

22 LCG software
- LCG-2 core packages:
- VDT (Globus 2, Condor).
- EDG WP1 (Resource Broker, job submission tools).
- EDG WP2 (replica management tools) plus LCG tools; one central RMC and LRC for each VO, located at CERN, with an Oracle backend.
- Several bits from other WPs (configuration objects, information providers, packaging, ...).
- The GLUE 1.1 information schema plus a few essential LCG extensions.
- An MDS-based information system with significant LCG enhancements (replacements, simplifications; see poster).
- A mechanism for application (experiment) software distribution.
- Almost all components have gone through some re-engineering for robustness, scalability, efficiency and adaptation to local fabrics.
- The services are now quite stable, and the performance and scalability have been significantly improved (within the limits of the current architecture).

23 Grid3 software
- A Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT): GRAM, GridFTP, MDS, RLS, VDS.
- ...equipped with VO and multi-VO security, monitoring and operations services.
- ...allowing federation with other Grids where possible, e.g. the CERN LHC Computing Grid (LCG): US ATLAS runs GriPhyN VDS execution on LCG sites; US CMS works on storage-element interoperability (SRM/dCache).
- Delivering the US LHC Data Challenges.

24 ATLAS DC2 (CPU)

25 Typical job distribution on LCG

26 Typical job distribution on Grid3

27 Job distribution on NorduGrid

