Readiness of ATLAS Computing - A personal view

Readiness of ATLAS Computing - A personal view
Michael Ernst, Brookhaven National Laboratory
WLCG Collaboration Meeting, September 1, 2007

ATLAS Production: Computer System Commissioning (CSC)
- Software integration and operations exercise, running for more than 1.5 years (since the end of 2005)
- Distributed (MC) production goals:
  - Validation of Athena software using 'standard' samples
  - Continuous production of high-statistics physics samples
  - Exercise the widely distributed computing infrastructure
- Distributed Data Management (DDM) goals:
  - Worldwide data distribution and replication
  - Controlled flow of data T0 -> T1 -> T2 -> T1 (sketched below)
  - Bring sites and services to a steady operations mode
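To make the controlled T0 -> T1 -> T2 -> T1 flow concrete, here is a minimal sketch of the replication chain the slide describes. It is not DQ2 code; the cloud, site and dataset names are invented purely for illustration.

```python
# Toy illustration (not DQ2 code) of the controlled T0 -> T1 -> T2 -> T1 flow:
# data produced at the Tier-0 is replicated to each Tier-1 cloud, pushed on to
# that cloud's Tier-2s, and Tier-2 production output flows back to the Tier-1.
# All site names and the dataset label below are made up for the example.

CLOUDS = {
    "BNL": ["MWT2", "SLAC_T2"],      # a Tier-1 and its associated Tier-2s
    "FZK": ["DESY_T2", "CSCS_T2"],
}

def distribute(dataset, tier0="CERN_T0"):
    """Return the ordered list of replication steps for one dataset."""
    steps = []
    for tier1, tier2s in CLOUDS.items():
        steps.append((tier0, tier1))          # T0 -> T1
        for tier2 in tier2s:
            steps.append((tier1, tier2))      # T1 -> T2
            steps.append((tier2, tier1))      # T2 output -> T1
    return steps

for src, dst in distribute("mc12.005001.AOD"):
    print(f"{src:9s} -> {dst}")
```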

Distributed Production Operations
- ATLAS utilizes three production grids: EGEE, NDGF and OSG (10 Tier-1s, ~40 Tier-2s, and more)
  - NDGF (Nordic): ~500 CPUs, ~60 TB disk
  - OSG (U.S.): ~2500 CPUs, ~386 TB disk
  - EGEE: ~3500 CPUs, ~358 TB disk
- Common features: task/job definition, job dispatch (supervisor), metadata (AMI), access to data through the ATLAS DDM system
- Differences: production software (executors), grid middleware, independent operations team for each grid, service architecture, storage systems, ... (see the sketch below)
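The common-supervisor / per-grid-executor split mentioned above can be illustrated with a short sketch. This is not the actual ATLAS production system code; the class and function names are hypothetical, and the real executors (PanDA on OSG, the EGEE/LCG and NorduGrid back ends) do far more than shown here.

```python
# Minimal sketch (hypothetical names) of a common supervisor dispatching
# abstract job definitions to grid-specific executors.
from abc import ABC, abstractmethod


class GridExecutor(ABC):
    """Grid-specific back end: turns an abstract job into a grid submission."""

    @abstractmethod
    def submit(self, job: dict) -> str: ...


class OSGExecutor(GridExecutor):
    def submit(self, job: dict) -> str:
        # In reality this hands the job to PanDA on OSG resources.
        return f"panda:{job['id']}"


class EGEEExecutor(GridExecutor):
    def submit(self, job: dict) -> str:
        # In reality this goes through the EGEE/LCG middleware.
        return f"egee:{job['id']}"


class NDGFExecutor(GridExecutor):
    def submit(self, job: dict) -> str:
        # In reality this goes through the NorduGrid ARC middleware.
        return f"arc:{job['id']}"


def supervise(task_queue, executors):
    """Common supervisor: pulls job definitions and dispatches them per grid."""
    for job in task_queue:
        grid_id = executors[job["grid"]].submit(job)
        print(f"job {job['id']} -> {grid_id}")


supervise(
    [{"id": 1, "grid": "OSG"}, {"id": 2, "grid": "EGEE"}, {"id": 3, "grid": "NDGF"}],
    {"OSG": OSGExecutor(), "EGEE": EGEEExecutor(), "NDGF": NDGFExecutor()},
)
```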

Production Service Infrastructure Used in the U.S.
- PanDA is the distributed computing service infrastructure deployed in the U.S. and Canada (which joined a few weeks ago)
  - Includes all U.S. and Canadian Tier-1 and Tier-2 sites, plus some opportunistic/shared sites
  - Works with both OSG (U.S.) and EGEE (Canada) middleware
- PanDA uses pilot jobs and provides a single task queue (see the sketch below)
  - Pilot jobs allow instant activation of the highest-priority tasks
  - The task queue brings a familiar 'batch system' feel to distributed grids
  - Athena jobs transparently become 'pathena' jobs running on PanDA
- PanDA provides an integrated software and computing system for U.S./Canadian ATLAS sites, managing all production and user analysis activities
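A toy model of the pilot-job and central task-queue idea may help: pilots occupy batch slots first and only then pull the highest-priority waiting job, which is why a newly queued urgent task is activated almost instantly. This is not PanDA code; all site names, job IDs and priorities below are invented.

```python
# Toy model (not PanDA code) of pilots pulling work from a central task queue.
import heapq

task_queue = []  # central queue shared by all sites: (priority, job_id, spec)

def add_job(priority, job_id, spec):
    # Lower number = higher priority, as in a typical priority queue.
    heapq.heappush(task_queue, (priority, job_id, spec))

def pilot(site):
    """A pilot lands on a batch slot, then asks the server for work."""
    if not task_queue:
        return f"{site}: no work, pilot exits"
    priority, job_id, spec = heapq.heappop(task_queue)
    return f"{site}: running job {job_id} (prio {priority}): {spec}"

add_job(100, 1001, "MC simulation task")
add_job(100, 1002, "MC simulation task")
add_job(1, 2001, "urgent user analysis")   # jumps ahead of the backlog

for site in ["BNL_T1", "MWT2", "SLAC_T2"]:
    print(pilot(site))
```

On the user side, an ordinary Athena job-options file is submitted through the pathena front end, roughly "pathena --inDS <input dataset> --outDS <output dataset> jobOptions.py" (exact option spellings depend on the client version); this is what the slide means by Athena jobs transparently becoming 'pathena' jobs.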

CSC Production Statistics
- Many hundreds of physics processes have been simulated
- Tens of thousands of tasks spanning two major releases
- Dozens of sub-releases (about one every three weeks) have been tested and validated
- Thousands of 'bug reports' fed back to software and physics
- 50M+ events completed for CSC12
- >300 TB of MC data on disk

U.S. / Canada Production Statistics (PanDA)
- PanDA has completed ~30M fully simulated physics events (simul+digit step), >30% of total central production
- Also successfully completed >15M single-particle events
- Since November, all available CPUs have been occupied (ran out of jobs only for a few days, plus a few days of service outages)
- About 400 TB of original data stored at the BNL Tier-1 (includes data generated on other grids)
- An additional ~100 TB of replicas kept at U.S. ATLAS Tier-2 sites

Resource Usage
- CPU and disk resources available to ATLAS are rising steadily
- Production system efficiencies are steadily increasing
- But much more will be required for data taking
- Additional resources are coming online soon

DDM Status – a Personal View
- ATLAS Distributed Data Management is a critical component: it catalogues and manages the flow of data
- It has not performed at the level needed
- Main issues:
  - Robustness of site services (transfer agents) – see the sketch below
  - Quality of service (completion of data movement tasks)
  - Too much manual intervention required
- What can we do?
  - Add more effort to DQ2 development
  - Develop a backup plan for data movement (the catalogues are fine)
  - Giving up on the ATLAS computing model is not an option!
  - We are discussing these topics actively and daily
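The "site services" issue above concerns the agents that fulfil dataset subscriptions. The following toy model (not DQ2 site-services code; all names are invented) shows the basic loop: copy each missing file of a subscribed dataset, retry failures, and declare the subscription done only when the destination replica is complete – the quality-of-service step that currently needs too much manual intervention.

```python
# Toy model (not DQ2 site-services code) of a dataset subscription being
# served by a transfer agent: copy each missing file to the destination,
# retry failures, and only mark the subscription done when the replica is
# complete.
import random

def transfer(filename, destination):
    """Stand-in for a real grid transfer; fails randomly to mimic reality."""
    return random.random() > 0.3   # ~70% success per attempt

def serve_subscription(dataset_files, destination, max_rounds=10):
    missing = set(dataset_files)
    for round_no in range(1, max_rounds + 1):
        missing = {f for f in missing if not transfer(f, destination)}
        print(f"round {round_no}: {len(missing)} file(s) still missing")
        if not missing:
            return True            # replica complete -> subscription fulfilled
    return False                   # this is where manual intervention starts

files = [f"AOD._{i:05d}.pool.root" for i in range(1, 21)]
complete = serve_subscription(files, "BNL_T1_DATADISK")
print("subscription complete" if complete else "subscription stuck")
```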

The End Game
- Goal: steady ramp-up of operations until the end of 2007
- MC production:
  - Achieve 10 million events/day by the end of 2007 (100-200k production jobs/day)
  - All targets that were set in 2005 have been met – so far
- DDM operations:
  - Steady operations
  - Automation and monitoring improvements
  - Computing Model data flow implemented by the end of 2007

Data Location (U.S.)
- Tier 1 – main repository of data (MC & primary):
  - Stores the complete set of ESD, AOD, AANtuple & TAG data on disk
  - A fraction of RAW and all U.S.-generated RDO data
- Tier 2 – repository of analysis data:
  - Stores the complete set of AOD, AANtuple & TAG data on disk
  - The complete set of ESD data is divided among the 5 Tier 2s
  - Data distribution to Tier 1 & Tier 2s will be managed
- Tier 3 – unmanaged data matching local interest:
  - Data arrives through locally initiated subscriptions
  - Mostly AANtuples, some AODs
  - Tier 3s will be associated with Tier 2 sites?
  - The Tier 3 model is still not fully developed – physicists' input is needed
- (The placement policy above is restated compactly in the sketch below)
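The placement policy can be written as a small lookup table; the formats and shares are taken from the slide, while the table layout, site-tier labels and the helper function are purely illustrative.

```python
# Compact restatement (illustrative only) of the U.S. data-placement policy:
# which formats a site of a given tier is expected to keep on disk.

PLACEMENT = {
    "Tier1": {"ESD", "AOD", "AANtuple", "TAG", "RAW (fraction)", "RDO (U.S.-generated)"},
    "Tier2": {"AOD", "AANtuple", "TAG", "ESD (1/5 share per site)"},
    "Tier3": {"AANtuple", "AOD (selected, via local subscriptions)"},
}

def keeps(tier, data_format):
    """True if the policy above expects this tier to hold the format on disk."""
    return any(data_format in entry for entry in PLACEMENT[tier])

for tier, formats in PLACEMENT.items():
    print(tier, "->", ", ".join(sorted(formats)))
print("Tier2 keeps AOD:", keeps("Tier2", "AOD"))
```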

Conclusion
- Computing systems are in good shape – thanks to the long-running commissioning (CSC) exercise
- The current ATLAS data replication system needs significant improvements
- 2006 – successful transition into operations mode
- 2007 – high-volume production and DDM operations
- Successful distributed MC production for the CSC notes, but many new challenges to come
- More participation by physicists is needed – as users, developers, shifters… provide feedback to ATLAS Computing
- Lots of resources are available for physicists
- Overall, progressing well towards full readiness for LHC data processing