Published by Owen Morrison. Modified over 8 years ago.
EXPERIENCE WITH ATLAS DISTRIBUTED ANALYSIS TOOLS

S. González de la Hoz (Santiago.Gonzalez@ific.uv.es), L. March (Luis.March@ific.uv.es)
IFIC, Instituto de Física Corpuscular, Centro Mixto Universitat de València – CSIC, Edificio Institutos de Investigación, Apartado de Correos 22085, E-40671 Valencia, Spain

D. Liko (Dietrich.Liko@cern.ch)
CERN, European Organization for Nuclear Research, 1211 Genève 23, Switzerland

Experience running the Analysis

PRODUCTION SYSTEM EXPERIENCE:
- Analysis has been done running our own supervisor and Lexor/CondorG instance.
- Delays due to data transfer are no longer an issue, because the AOD input is available on-site and jobs are sent only to those sites.
- The system setup is not yet able to support long queues (simulation) and short queues (analysis) in parallel:
  - queues are filled with simulation jobs
  - long pending times for analysis jobs
- The analysis has been launched over 350000.
- Z_H -> ttbar reconstructed masses, after merging the histogram files, were produced through the ATLAS production system and GANGA (“a la Grid”) with 100 input files each.
- With free resources, the system was able to process 10k-event jobs in about 10 min (total).

One dataset was used:
- 50 events per file, a total of 400 files.
- Jobs with 100 input files each were defined with ATCOM and GANGA.
- These jobs ran at several LCG sites. Each job produced three output files (ntuple, histogram and log) stored at Castor.
- ROOT has been used to merge these histogram output files in a post-processing step.
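The job splitting and histogram merging described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the file names and bin contents are made up, the function names are hypothetical, and the bin-by-bin sum merely stands in for the ROOT merge step, which is not reproduced here. Only the numbers (400 files, 50 events per file, 100 files per job) come from the text.

```python
from typing import List


def split_into_jobs(files: List[str], files_per_job: int) -> List[List[str]]:
    """Group input files into consecutive chunks, one chunk per job."""
    if files_per_job <= 0:
        raise ValueError("files_per_job must be positive")
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


def merge_histograms(histograms: List[List[float]]) -> List[float]:
    """Add histograms bin by bin; all inputs must share the same binning."""
    if not histograms:
        return []
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("histograms must have identical binning")
    return [sum(bins) for bins in zip(*histograms)]


# Hypothetical file names; 400 files, 50 events each, 100 files per job (from the text).
aod_files = [f"aod_{i:03d}.root" for i in range(400)]
jobs = split_into_jobs(aod_files, files_per_job=100)
events_per_job = [len(job) * 50 for job in jobs]

# Toy per-job histograms (made-up bin contents), one per job.
per_job_histograms = [
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0],
    [2.0, 2.0, 2.0],
    [1.0, 0.0, 1.0],
]
merged = merge_histograms(per_job_histograms)

print(len(jobs), sum(events_per_job), merged)
```

With these inputs the dataset splits into 4 jobs of 5000 events each, 20000 events in total.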
Contact: Santiago González de la Hoz

10th ICATPP Conference on Astroparticle, Particle, Space Physics, Detectors and Medical Physics Applications, 8-12 October 2007, Villa Olmo (Como), Italy

ATLAS Production System (ProdSys)

In order to handle the task of the ATLAS Data Challenges, an automated production system was designed. The ATLAS production system consists of 4 components:
- The production database, which contains abstract job definitions;
- The Eowyn supervisor, which reads the production database for job definitions and presents them to the different Grid executors in an easy-to-parse XML format;
- The Executors, one for each Grid flavour, which receive the job definitions in XML format and convert them to the job description language of that particular Grid;
- DDM, the ATLAS Distributed Data Management system, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid.

[Figure: production system components (DDM, Eowyn, CondorG, Panda/OSG)]

The ATLAS production system has been successfully used to run production of ATLAS jobs at an unprecedented scale. On successful days more than 10000 jobs were processed by the system. The experience gained in operating the system, which spans several grid systems, is considered essential for performing analysis using Grid resources as well.

Introduction

ATLAS is a detector for the study of high-energy proton-proton collisions. The offline computing will have to deal with an output event rate of 200 Hz, i.e. 10^9 events per year, with an average event size of 1.6 MB.
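As a quick consistency check of the figures quoted above, the arithmetic can be written out as follows. The three input numbers are taken from the text; the variable names are illustrative, and the conversion to PetaBytes assumes decimal units (1 PB = 10^9 MB).

```python
EVENT_RATE_HZ = 200      # output event rate quoted in the text
EVENT_SIZE_MB = 1.6      # average event size quoted in the text
EVENTS_PER_YEAR = 1e9    # events per year quoted in the text

# Raw data rate written out by the experiment.
raw_rate_mb_per_s = EVENT_RATE_HZ * EVENT_SIZE_MB

# Data-taking time per year implied by the quoted yearly event count.
implied_live_seconds = EVENTS_PER_YEAR / EVENT_RATE_HZ

# Raw data volume per year, assuming 1 PB = 1e9 MB (decimal units).
raw_volume_pb = EVENTS_PER_YEAR * EVENT_SIZE_MB / 1e9

print(raw_rate_mb_per_s, implied_live_seconds, raw_volume_pb)
```

The product of rate and event size gives 320 MB/s, which agrees with the raw recording rate of 320 MBytes/sec listed under Storage. The quoted 10^9 events per year corresponds to about 5 x 10^6 seconds of data taking at 200 Hz, and to roughly 1.6 PB of raw data alone.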
In 2002 ATLAS computing planned a first series of Data Challenges (DCs) in order to validate its:
- Computing Model
- Software
- Data Model

The ATLAS collaboration decided to perform the DCs using the Grid middleware developed in several Grid projects (Grid flavours) such as:
- LHC Computing Grid project (LCG), to which CERN is committed
- OSG
- NorduGrid

Storage:
- Raw recording rate 320 MBytes/sec
- Accumulating at 5-8 PetaBytes/year
- 20 PetaBytes of disk
- 10 PetaBytes of tape

Processing:
- 40,000 of today’s fastest PCs

Motivation for HEP-GRID solution

The ATLAS collaboration is preparing for data taking and analysis at the CERN LHC, scheduled to start operating in 2008. Physics studies in ATLAS will require analysis of data volumes of the order of PetaBytes per year. The analysis will rely on computing resources and data distributed over the world-wide collaborating institutions. These will be brought together and shared in a coordinated way using grid technology, which provides the infrastructure required to facilitate the distribution of data and the pooling of computing and storage resources between these institutions.

Setup for Distributed Analysis - Distributed Analysis Strategy

The grid-based ATLAS distributed analysis aims to meet the challenge of supporting distributed users, data and processing, enabling physicists to exploit the whole computing resource provided by the three ATLAS grid infrastructures: LCG, OSG and NorduGrid. Distributed Analysis must support all the analysis activities, including simulated data production, while shielding users from the complexities of the grid environment. According to the ATLAS computing model, Distributed Analysis will enable users to submit jobs from any location, helping them to use the grid effectively for their analysis activities.
In addition, Distributed Analysis should satisfy the ATLAS analysis model requirement: data is distributed among several computing facilities, and analysis jobs are in turn routed based on the availability of the relevant data. The ATLAS strategy takes several approaches to Distributed Analysis in order to fully exploit its major grid deployments.

Setup for Distributed Analysis

Using the latest version of the Production System:
- Supervisor: Eowyn
- Executors: Condor-G, Lexor
- Data Management: DDM and LFC catalog
- Database: dedicated DA database

A generic analysis transformation has been created, which:
- compiles the user code/package on the worker node
- processes Analysis Object Data (AOD) input files
- produces a histogram file and an n-tuple file as outputs

User Interface: AtCom4

The ATLAS Commander (ATCOM) was used as a graphical user interface. It is currently used for task and job definitions:
- task: contains summary information about the jobs to be run (input/output datasets, transformation parameters, resource and environment requirements, etc.).
- job: concrete parameters needed for running, but no Grid specifics.
- Both follow the ProdDB schema and XML description.

The algorithm of choice has been Z_H -> ttbar, a heavy Z decaying into tops in the Little Higgs model. This dataset was made in the official production for the Exotics working group using the Athena full chain simulation. A total of 400 AODs were produced, each AOD containing 50 events (20000 events in total). The analysis has been performed using the production system and GANGA.

Using GANGA:

GANGA provides a set of ATLAS-specific features, such as application configuration based on the Athena framework and input data location based on Distributed Data Management. It can be run on the command line, with Python scripts, or through a graphical interface. A job in GANGA is constructed from a set of building blocks. All jobs have to specify the software to be run (application) and the processing system (back-end) to be used.
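The building-block idea can be illustrated with a toy model in plain Python. This is explicitly not the GANGA API: the classes `Application`, `Backend` and `Job`, the options file name and the input file names are all hypothetical stand-ins, chosen only to show that a job cannot be built without both an application and a back-end.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Application:
    """What to run (hypothetical stand-in, not a GANGA class)."""
    name: str
    options: List[str] = field(default_factory=list)


@dataclass
class Backend:
    """Where to run (hypothetical stand-in, not a GANGA class)."""
    name: str


@dataclass
class Job:
    """A job assembled from building blocks; application and backend are required."""
    application: Application
    backend: Backend
    input_files: List[str] = field(default_factory=list)

    def describe(self) -> str:
        """Return a one-line summary of what would run where."""
        return (f"run {self.application.name} on {self.backend.name} "
                f"with {len(self.input_files)} input file(s)")


# Hypothetical options file and input file names, for illustration only.
job = Job(
    application=Application("Athena", options=["analysis_options.py"]),
    backend=Backend("LCG"),
    input_files=[f"aod_{i:03d}.root" for i in range(100)],
)
print(job.describe())
```

Because `application` and `backend` have no defaults, calling `Job()` without them raises a `TypeError`, which mirrors the requirement that every job specify both. Swapping the back-end object is the only change needed to target a different processing system in this toy model.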
GANGA EXPERIENCE:
- The IFIC Tier-2 infrastructure was used to process jobs, using our CE with dedicated queues for analysis jobs. Processing started within a few minutes.
- Jobs were also sent to several LCG sites. In this case the waiting time before the jobs started executing was very long, because the CE queues were occupied by production jobs. Hence, deployment of the job priority mechanism is important in order to take full advantage of the whole grid infrastructure for distributed analysis.
- GANGA has demonstrated good performance in configuring, submitting and monitoring jobs and in retrieving their output. However, error handling and recovery of failed jobs in the user analysis code need to be improved by automatic error parsing.