The ATLAS Strategy for Distributed Analysis on several Grid Infrastructures
D. Liko, IT/PSS, for the ATLAS Distributed Analysis Community

Overview
Distributed Analysis in ATLAS
- Grids, Computing Model
The ATLAS Strategy
- Production system
- Direct submission
Common Aspects
- Data management
- Transformations
- GUI
Initial experiences
- Production system on LCG
- PANDA on OSG
- GANGA

ATLAS Grid Infrastructure
Three grids
- LCG
- OSG
- NorduGrid
Significant resources, but different middleware
- Teams working on solutions are typically associated with one grid and its middleware
In principle ATLAS resources are available to all ATLAS users
- Users are interested in using their local systems with priority
- Not only a central system; flexibility concerning middleware
Poster 181: Prototype of the Swiss ATLAS Computing Infrastructure

Distributed Analysis
At this point the emphasis is on a batch model to implement the ATLAS Computing Model
- Interactive solutions are difficult to realize on top of the current middleware layer
We expect our users to send large batches of short jobs to optimize their turnaround
- Scalability
- Data access
Analysis in parallel to production
- Job priorities

ATLAS Computing Model
Data for analysis will be distributed across all Tier-1 and Tier-2 centers
- AOD & ESD
- T1 & T2 are open for analysis jobs
- The computing model foresees 50% of grid resources being allocated for analysis
Users will send jobs to the data and extract the relevant results
- typically NTuples or similar

Requirements
Data for a year of data taking
- AOD – 150 TB
- ESD
Scalability
- Last year up to jobs per day for production (job duration up to 24 hours)
- The grid and our needs will grow
- We expect that our analysis users will run much shorter jobs
- Job delivery capacity of the order of 10^6 jobs per day (see the estimate below)
  - Peak capacity
  - Involves several grids
  - Longer jobs can reduce this number (but might not always be practical)
Job priorities
- Today we need short queues
- In the future we need to steer the resource consumption of our physics and detector groups based on VOMS groups
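
To put the 10^6 jobs per day figure in perspective, here is a back-of-the-envelope estimate of the sustained submission rate it implies; the calculation is not on the slide and only the quoted peak capacity is taken from it.

    # Rough estimate of the sustained rate implied by the quoted peak capacity.
    # Only the 10**6 jobs/day figure comes from the slide; the rest is illustrative.
    JOBS_PER_DAY = 10**6
    SECONDS_PER_DAY = 24 * 60 * 60

    rate_per_second = JOBS_PER_DAY / SECONDS_PER_DAY   # ~11.6 jobs/s
    rate_per_minute = rate_per_second * 60             # ~694 jobs/min

    print(f"~{rate_per_second:.1f} jobs/s, ~{rate_per_minute:.0f} jobs/min sustained")

Compared with the 30 to 50 jobs per minute reported for a single ARC client later in the talk, this makes clear why the capacity has to come from several grids and submission systems in parallel.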

ATLAS Strategy
Production system
- Seamless access to all ATLAS grid resources
Direct submission to the grid
- LCG: LCG/gLite Resource Broker, CondorG
- OSG: PANDA
- NorduGrid: ARC middleware

[Diagram: ATLAS Prodsys architecture – the production database (ProdDB) feeds the executors (Lexor, Dulcinea, CondorG, PANDA), which submit to the grids via the Resource Broker (RB), CondorG and the Computing Elements (CE)]

Production System
Provides a layer on top of the middleware
- Increases the robustness of the system: retrial and fallback mechanisms both for workload and data management
- Our grid experience is captured in the executors
- Jobs can be run on all systems
Redesign based on the experience of last year
- New supervisor – Eowyn
- New executors
- Connects to the new data management
Adaptation for Distributed Analysis
- Configurable user jobs
- Access control based on X.509 certificates
- Graphical user interface ATCOM
Presentation 110: ATLAS Experience on Large Scale Production on the Grid

LCG
Resource Broker
- Scalability
- Reliability
- Throughput
New gLite Resource Broker
- Bulk submission
- Many other enhancements
- Studied in the ATLAS LCG/EGEE Taskforce
Special setup in Milano & Bologna
- gLite – 2-way Intel Xeon 2.8 GHz CPUs (with hyper-threading), 3 GByte memory
- LCG – 2-way Intel Xeon 2.4 GHz CPUs (without hyper-threading), 2 GByte memory
- Both are using the same BDII (52 CEs in total)
Several bug fixes and optimizations
- Steady collaboration with the developers

LCG vs gLite Resource Broker
- Bulk submission much faster (see the JDL example below)
- Sandbox handling better and faster
- Now the matchmaking is the limiting factor
  - Strong effect from ranking
[Table: per-job submission, matchmaking and overall times (sec/job) for the gLite and LCG Resource Brokers, measured for submission to any CE and to one CE; the numerical values are missing in the transcript]
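
For context (not on the slide), a job passed to the LCG or gLite Resource Broker is described in JDL; the fragment below is a hedged sketch in which the executable, sandbox files and software tag are placeholders. With the LCG RB each such job was submitted individually (e.g. with edg-job-submit), whereas the gLite WMS can submit many of them at once as a job collection, which is the bulk submission measured above.

    // Minimal illustrative JDL for an analysis job; all names are hypothetical
    Executable    = "analysis_wrapper.sh";
    Arguments     = "AnalysisSkeleton_jobOptions.py";
    StdOutput     = "stdout.log";
    StdError      = "stderr.log";
    InputSandbox  = {"analysis_wrapper.sh", "AnalysisSkeleton_jobOptions.py"};
    OutputSandbox = {"stdout.log", "stderr.log", "ntuple.root"};
    // Match only CEs advertising the required ATLAS software release
    Requirements  = Member("VO-atlas-release-11.0.5",
                           other.GlueHostApplicationSoftwareRunTimeEnvironment);
    // Ranking expression - the slide notes that ranking strongly affects matchmaking time
    Rank          = -other.GlueCEStateEstimatedResponseTime;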

CondorG
Conceptually similar to the LCG RB, but different architecture
- Scaling by increasing the number of schedulers
- No logging & bookkeeping, but the scheduler keeps track of the jobs
Used in parallel during DC2 & the Rome production and increased our use of grid resources
Submission via the production system, but direct submission is also imaginable (see the sketch below)
Presentation 401: A Grid of Grids using CondorG
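
For illustration only (not taken from the slides), direct CondorG submission to an LCG-style gatekeeper goes through a Condor submit description file in the grid universe; the contact string and file names below are placeholders.

    # Hypothetical Condor-G submit description file for one analysis job
    universe      = grid
    # Contact string of the target Computing Element (placeholder)
    grid_resource = gt2 ce.example.org:2119/jobmanager-lcgpbs
    executable    = analysis_wrapper.sh
    arguments     = AnalysisSkeleton_jobOptions.py
    transfer_input_files = AnalysisSkeleton_jobOptions.py
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    output        = stdout.log
    error         = stderr.log
    log           = condor.log
    queue

In such a setup each scheduler keeps the state of its own jobs, which is the scaling-by-schedulers and no-central-bookkeeping point made above.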

Last year's experience
Adding the CondorG-based executor to the production system helped us to increase the number of jobs on LCG

PANDA
New prodsys executor for OSG
- Pilot jobs
- Resource brokering
- Close integration with DDM
Operational in production since December
Presentation 347: PANDA: Production and Distributed Analysis System for ATLAS

PANDA
Direct submission
- Regional production
- Analysis jobs
Key features for analysis
- Analysis transformations
- Job chaining
- Easy job submission
- Monitoring
- DDM end-user tool
- Transformation repository

ARC Middleware
Standalone ARC client software – 13 MB installation
CE has extended functionality
- Input files can be staged and are cached
- Output files can be staged
- Controlled by xRSL, an extended version of Globus RSL (see the sketch below)
Brokering is part of the submission in the client software
- Job delivery rates of 30 to 50 per minute have been reported
- Logging & bookkeeping on the site
Currently about 5000 CPUs, 800 available for ATLAS
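
To make the xRSL point concrete, here is a hedged sketch of a job description the ARC client could handle; the executable, file names, runtime environment tag and limits are invented, not taken from the talk.

    & (executable = "analysis_wrapper.sh")
      (arguments = "AnalysisSkeleton_jobOptions.py")
      (inputFiles = ("AnalysisSkeleton_jobOptions.py" ""))
      (outputFiles = ("ntuple.root" ""))
      (stdout = "stdout.log")
      (stderr = "stderr.log")
      (jobName = "atlas-analysis-example")
      (cpuTime = "120")
      (runTimeEnvironment = "APPS/HEP/ATLAS-11.0.5")

Submission then goes through the client (e.g. the ngsub command), which does the brokering over the known clusters itself and lets the CE stage and cache the declared input and output files, matching the points above.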

Common Aspects
- Data management
- Transformations
- GUI

ATLAS Data Management
Based on datasets
The PoolFileCatalog API is used to hide grid differences (see the sketch below)
- On LCG, the LFC acts as local replica catalog
- Aims to provide uniform access to data on all grids
FTS is used to transfer data between the sites
Evidently data management is a central aspect of Distributed Analysis
- PANDA is closely integrated with DDM and operational
- The LCG instance was closely coupled with SC3
- Right now we run a smaller instance for test purposes
- The final production version will be based on new middleware for SC4 (FPS)
Presentation 75: A Scalable Distributed Data Management System for ATLAS
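
As an illustration of how the PoolFileCatalog hides the grid differences (the fragment is a sketch; GUID, LFN and PFN are invented), a job always sees the same catalog view no matter which replica catalog populated it:

    <!-- Illustrative PoolFileCatalog.xml fragment; GUID, PFN and LFN are made up.
         On LCG this view is filled from the LFC, elsewhere from the local
         replica catalog, so the job itself stays grid-independent. -->
    <POOLFILECATALOG>
      <File ID="A1B2C3D4-0000-1111-2222-333344445555">
        <physical>
          <pfn filetype="ROOT_All"
               name="srm://se.example.org/atlas/aod/example.AOD.pool.root"/>
        </physical>
        <logical>
          <lfn name="example.AOD._00001.pool.root"/>
        </logical>
      </File>
    </POOLFILECATALOG>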

Transformations
Common transformations are a fundamental aspect of the ATLAS strategy
Overall not a homogeneous system... but a common transformation system allows the same job to run on all supported systems
- All systems should support them
- In the end users can adapt easily to a new submission system if they do not need to adapt their jobs
Separation of functionality into grid-dependent wrappers and grid-independent execution scripts (see the sketch below)
- A set of parameters is used to configure the specific job options
A new implementation in Python is under way
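
A hedged sketch of the wrapper/execution-script split described above; the script, its parameters and the athena.py invocation are illustrative, not the actual ATLAS transformations.

    #!/usr/bin/env python
    # Illustrative sketch of the wrapper / execution-script separation.
    # The grid-dependent wrapper (one per grid flavour) only prepares the
    # environment and stages files; the grid-independent execution part
    # below would be the same on LCG, OSG and NorduGrid. Names are hypothetical.
    import subprocess
    import sys


    def run_transformation(job_options, input_files, output_file, max_events):
        """Grid-independent part: configure and run Athena with a parameter set."""
        cmd = [
            "athena.py",
            "-c", f"EvtMax={max_events}; InputFiles={input_files!r}; OutputFile={output_file!r}",
            job_options,
        ]
        return subprocess.call(cmd)


    if __name__ == "__main__":
        # The grid-dependent wrapper would have staged the inputs into the
        # working directory and translated its own conventions into these
        # plain parameters before calling the script.
        job_options, output_file, max_events = sys.argv[1], sys.argv[2], int(sys.argv[3])
        input_files = sys.argv[4:]
        sys.exit(run_transformation(job_options, input_files, output_file, max_events))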

GANGA – The GUI for the Grid
Common project with LHCb
Plugins define applications (see the sketch below)
- Currently: Athena and Gaudi, ADA (DIAL)
And backends
- Currently: Fork, LSF, PBS, Condor, LCG, gLite, DIAL and DIRAC
Presentation 318: GANGA – A Grid User Interface
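
A rough sketch of how application and backend plugins combine in a Ganga session; the exact attribute names varied between Ganga 4 releases, so the options shown are indicative rather than the definitive API.

    # Hypothetical Ganga session: an Athena application paired with the LCG backend
    j = Job()
    j.application = Athena()
    j.application.option_file = 'AnalysisSkeleton_jobOptions.py'  # attribute name indicative
    j.backend = LCG()
    j.submit()

    # Swapping only the backend plugin retargets the same job, e.g.
    # j.backend = LSF() for a local batch system.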

GANGA latest developments
- New version 4
- Job splitting
- GUI
- Work on plugins for various systems is ongoing

Initial experiences
- PANDA on OSG
- Analysis with the production system
- GANGA

PANDA on OSG
pathena (see the example below)
- Lightweight submission interface to PANDA
DIAL
- The system submits analysis jobs to PANDA to get access to grid resources
First users are working on the system
Presentation 38: DIAL: Distributed Interactive Analysis of Large Datasets
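
A typical pathena invocation looked roughly like the command below (a hedged example: the option names follow the pathena client of that period and the dataset names are placeholders):

    # Submit an Athena job options file to PANDA against an input dataset;
    # dataset names and split granularity are purely illustrative.
    pathena AnalysisSkeleton_topOptions.py \
            --inDS some.input.dataset.AOD.v1 \
            --outDS user.jdoe.mytest1 \
            --nFilesPerJob 10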

Distributed Analysis using Prodsys
Currently based on CondorG
- Lexor-based system on its way
GUI ATCOM
Central team operates the executor as a service
Several analyses were ported to the system
Selected users are testing it
Poster 264: Distributed Analysis with the ATLAS Production System

GANGA
Most relevant
- Athena application
- LCG backend
Evaluated by several users
- Simulation & analysis
- Faster submission necessary: Prodsys/PANDA/gLite/CondorG
Feedback
- All based on the CLI
- A new GUI will be presented soon

Summary
Systems have been exposed to selected users
- Positive feedback
- Direct contact with the experts is still essential
- For this year – power users and grid experts…
Main issues
- Data distribution → new DDM
- Scalability → new Prodsys/PANDA/gLite/CondorG
- Analysis in parallel to production → job priorities

Conclusions
- As of today Distributed Analysis in ATLAS is still work in progress (as is the detector)
- The expected data volume requires us to perform analysis on the grid
- Important pieces are coming into place
- We will verify Distributed Analysis according to the ATLAS Computing Model in the context of SC4