Status and Prospects of The LHC Experiments Computing

Presentation transcript:

Status and Prospects of The LHC Experiments Computing computing models, computing commissioning and its practical problems CHEP, Prague Kors Bos, NIKHEF&CERN March 23, 2009

This Talk
Disclaimer 1: The title that Milos gave me cannot be covered in 20 minutes, and maybe not even in 20 hours. A good fraction of this whole conference is about this, so this will merely be an introduction.
Disclaimer 2: I try to talk about all 4 LHC experiments, but I am obviously biased towards one …
Disclaimer 3: I may not get things completely right when talking about VOs other than my own; I apologize beforehand and refer to the specialized talks at this conference.
Disclaimer 4: I cannot guarantee that I will explain all acronyms, but I will try.

First events

Ubiquitous Wide Area Network Bandwidth
The first Computing TDRs assumed there would not be enough network bandwidth
The MONARC project proposed the multi-Tier model with this in mind
Today network bandwidth is the least of our problems
But we still have the Tier model in the LHC experiments
The network is not yet ideal in all parts of the world (last mile)
LHCOPN provides an excellent backbone for the Tier-0 and Tier-1s
Each LHC experiment has adopted the Tier model differently

ATLAS Workflows (diagram)
Tier-0 (CAF, CASTOR): Calibration & Alignment, Express Stream Analysis, Prompt Reconstruction
Tier-1s: RAW Re-processing, HITS Reconstruction
Tier-2s: Simulation, Analysis
Rates shown in the diagram: 650 MB/sec at the Tier-0; 50-500 MB/sec at the Tier-1 level; 50-500 MB/sec at the Tier-2 level

CMS Workflows (diagram)
Tier-0 (CAF, CASTOR): Prompt Reconstruction, Calibration, Express-Stream Analysis
Tier-1s: Re-Reco, Skims
Tier-2s: Simulation, Analysis
Rates shown in the diagram: 600 MB/s at the Tier-0; 50-500 MB/s at the Tier-1 level; 50-500 MB/s and ~20 MB/s at the Tier-2 level
Source: M. Kasemann, WLCG LHCC Mini-review, February 16, 2009
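To give a feeling for the Tier-0 rates quoted in the ATLAS and CMS workflow diagrams (650 MB/sec and 600 MB/s), the short Python sketch below converts a sustained rate into a daily data volume. It assumes the rate is held constant for a full 24 hours and uses decimal units (1 TB = 1,000,000 MB); both are simplifying assumptions, not statements from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Volume in TB moved in one day at a constant rate given in MB/s.

    Assumes decimal units (1 TB = 1e6 MB) and no interruptions.
    """
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


for label, rate in [("ATLAS, 650 MB/s", 650), ("CMS, 600 MB/s", 600)]:
    print(f"{label}: {daily_volume_tb(rate):.2f} TB/day")
```

Under these assumptions the two rates correspond to roughly 56 TB and 52 TB per day.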

Similarities & Differences: CMS vs ATLAS
Tier-0 and CAF: very much the same functionality
Rates are quite similar
Functionality of Tier-1s much the same: re-reconstruction
Functionality of Tier-2s much the same: simulation and analysis
CMS: analysis jobs in Tier-2s can get data from any Tier-1
ATLAS: analysis jobs in Tier-2s can get data only from the Tier-1 within the same cloud
CMS: analysis coordinated per Tier-2
ATLAS: analysis coordinated per physics group and/or cloud
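The difference in data access between the two experiments can be written down as a small rule. The Python sketch below is only an illustration: the `Site` class, the site names and the cloud names are hypothetical and do not correspond to any experiment software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    tier: int
    cloud: str  # hypothetical grouping label; only used by the ATLAS rule


def allowed_tier1_sources(experiment: str, tier2: Site, tier1s: list[Site]) -> list[Site]:
    """Tier-1s from which an analysis job at `tier2` may get data."""
    if experiment == "CMS":
        # CMS: any Tier-1 is an acceptable source.
        return list(tier1s)
    if experiment == "ATLAS":
        # ATLAS: only the Tier-1 within the same cloud as the Tier-2.
        return [t1 for t1 in tier1s if t1.cloud == tier2.cloud]
    raise ValueError(f"no rule defined in this sketch for {experiment!r}")


# Hypothetical sites, for illustration only.
tier1s = [Site("T1_A", 1, "cloud_a"), Site("T1_B", 1, "cloud_b")]
tier2 = Site("T2_X", 2, "cloud_a")

print([s.name for s in allowed_tier1_sources("CMS", tier2, tier1s)])
print([s.name for s in allowed_tier1_sources("ATLAS", tier2, tier1s)])
```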

LHCb Workflows (diagram)
Tier-0 (CAF, CASTOR): Calibration, Express-Stream Analysis, Reconstruction, Skimming, Analysis
Tier-1s: Reconstruction, Skimming, Analysis
Tier-2s: Simulation
Data labels shown in the diagram: RAW, ESD

Similarities & Differences: CMS & ATLAS vs LHCb
CAF: very much the same functionality
Rates are much higher but the data volume is much smaller
Different functionality of the Tier-1s: reconstruction, skimming and analysis
The Tier-0 acts as another Tier-1: reconstruction, skimming and analysis
The Tier-2s do only simulation (+digitization +reconstruction) production
Output from simulation (DST) can be uploaded to any Tier-1
No cloud concept
RAW and RDST (output from reconstruction) go to tape in Tier-0/1
DST (output from skimming) goes to disk at all Tier-0/1s

ALICE Workflows (diagram)
Tier-0 (CAF, CASTOR): Calibration & Alignment, Express Stream Analysis, Prompt Reconstruction
Tier-1s (T1 AF): RAW Re-processing; Simulation, analysis (if free resources)
Tier-2s (T2 AF): Simulation, Analysis
Storage hypervisor: xrootd global redirector

Similarities & Differences: CMS vs ATLAS vs ALICE
Tier-0 and CAF: very much the same functionality
Functionality of Tier-1s much the same: re-reconstruction
If resources are available, T1s can do MC and analysis (ALICE job queue prioritization)
Functionality of Tier-2s much the same: simulation and analysis
ALICE: analysis jobs are allowed to ‘pull’ data from any storage if the local data are not found (Grid catalogue-SE discrepancy)
Through the xrootd global redirector (SE collaboration on Grid scale)
The network is ubiquitous; limited ‘ad hoc’ data transfers do not pose a problem
Allow the job to complete and fix the discrepancy afterwards
ESDs/AODs can be stored at any T1/T2 depending on resource availability; there is no ‘targeted, per data or physics type’ data placement
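The ALICE fallback described above (try the local storage, pull from elsewhere if the file is missing, fix the catalogue afterwards) might be sketched as follows. This is a toy model in Python: the dictionaries stand in for the Grid catalogue and the storage elements, and the loop over all storage elements is only a stand-in for what the xrootd global redirector does. None of the names below are part of xrootd or AliEn.

```python
def locate_replica(lfn, local_se, catalogue, storage):
    """Find a storage element (SE) that really holds the file `lfn`.

    catalogue: dict mapping file name -> set of SE names the catalogue lists
    storage:   dict mapping SE name -> set of file names actually present
    Returns (se_name or None, list of (lfn, se) discrepancies to fix later).
    """
    discrepancies = []
    if local_se in catalogue.get(lfn, set()):
        if lfn in storage.get(local_se, set()):
            return local_se, discrepancies
        # Catalogue says the file is local, but the SE does not have it.
        discrepancies.append((lfn, local_se))
    # Stand-in for the global redirector: ask every other SE.
    for se in sorted(storage):
        if se != local_se and lfn in storage[se]:
            return se, discrepancies
    return None, discrepancies


# Hypothetical example: the catalogue is out of date for the local SE.
catalogue = {"/alice/file1": {"SE_LOCAL"}}
storage = {"SE_LOCAL": set(), "SE_REMOTE": {"/alice/file1"}}
print(locate_replica("/alice/file1", "SE_LOCAL", catalogue, storage))
```

The job can continue with the remote copy, and the recorded discrepancy can be repaired afterwards.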

ATLAS: Jobs go to the Data
Example for a 200 TB T2, managed with space tokens

Data                Size         Content               Management         Label
Detector data       110 TB       RAW, ESD, AOD, DPD    Centrally managed
Simulated data      40 TB        RAW, ESD, AOD, DPD    Centrally managed  MC
Physics Group data  20 TB        DnPD, ntup, hist, ..  Group managed      GROUP
User Scratch data   20 TB        User data             Transient          SCRATCH
Local Storage       Non pledged  User data             Locally managed    LOCAL

Other labels in the diagram: CPUs, Analysis tools, @Tier-2, @Tier-3
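A quick arithmetic check of the 200 TB example, as a Python sketch. The dictionary keys are just illustrative names for the four areas with a quoted size; the slide does not say how the capacity not itemised here is used.

```python
TOTAL_TB = 200

# Sizes quoted on the slide for the pledged areas.
ITEMISED_TB = {
    "detector data": 110,
    "simulated data": 40,
    "physics group data": 20,
    "user scratch data": 20,
}

itemised = sum(ITEMISED_TB.values())
remainder = TOTAL_TB - itemised
shares = {name: size / TOTAL_TB for name, size in ITEMISED_TB.items()}

print(f"itemised: {itemised} TB, not itemised: {remainder} TB")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the site")
```

The itemised areas add up to 190 TB, so detector data alone takes a little over half of the example site.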

(without space tokens)

LHCb
Analysis is done in the place (Tier-0 and Tier-1s) where the data already is
LHCb uses 6 space tokens
ALICE
Jobs go to the data
But… data can also go to the jobs, depending on where the free resources are
ALICE doesn’t use space tokens at all

How SAM works

ALICE latest results (VOBOXes and CE)

ALICE: SAM results integrated also in MonALISA

LHCb latest results
A snapshot similar to the CMS and LHCb ones could also be retrieved for ATLAS

CMS: availability over the last 2 weeks

CMS site ranking

Functional tests in ATLAS

Practical Problem 1: Big Step at once
A run of ~1 year without interruption
Without having had a chance to test in a short period
Without having run all services of all 4 VOs at the same time
Do we have the bandwidth everywhere?
Do we have the people to run all shifts?
Have sites appreciated what it means?
Only very short (max 1 day) scheduled downtimes ..

Scheduled downtimes of the sites
We had better be prepared that not all sites are always up ..

Practical Problem 2: Tapes (but calculable)
ATLAS writes RAW data and G4 HITS to tape, and ESD from re-processing
ATLAS reads RAW back from tape for re-processing and HITS for (re-)reconstruction
CMS writes RAW data to tape
CMS reads RAW data back for re-processing
LHCb writes RAW data to tape and RDST from reconstruction
LHCb reads RAW data back for re-processing
ALICE writes RAW data to tape as well as ESD and AOD
And reads RAW back for re-processing
All these processes have been tested individually
But not all together!
A Tier-1 supporting all 4 experiments needs to worry about:
Tape families
Number of tape drives
Bandwidth to/from tape
Buffer sizes
Probably one of the biggest unknowns for the next run
Very hard to plan & test beforehand
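To see why a Tier-1 supporting all four experiments has to think about tape families and drives, the tape traffic listed above can be tabulated and counted. The Python sketch below only counts distinct data types written to and read from tape per experiment; it says nothing about rates or the number of drives needed, which the talk does not quantify.

```python
# Data types written to and read back from tape, as listed on the slide.
TAPE_WRITES = {
    "ATLAS": ["RAW", "HITS", "ESD"],
    "CMS": ["RAW"],
    "LHCb": ["RAW", "RDST"],
    "ALICE": ["RAW", "ESD", "AOD"],
}
TAPE_READS = {
    "ATLAS": ["RAW", "HITS"],
    "CMS": ["RAW"],
    "LHCb": ["RAW"],
    "ALICE": ["RAW"],
}


def concurrent_streams(experiments):
    """Count (write, read) data-type streams for the supported experiments."""
    writes = sum(len(TAPE_WRITES[e]) for e in experiments)
    reads = sum(len(TAPE_READS[e]) for e in experiments)
    return writes, reads


print(concurrent_streams(["CMS"]))
print(concurrent_streams(["ATLAS", "CMS", "LHCb", "ALICE"]))
```

A site serving a single experiment sees only a few such streams, while a site serving all four has to handle many write and read streams competing for the same drives and buffers.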

Practical Problem 3: Users (and non-calculable)
Roughly known how many there are: a few thousand
How many jobs will they run?
We already have “power users” running thousands of jobs at once
How many power users will we have?
Will they always run over all data?
Which data will they use?
Are there enough copies of the data?
Are they the right data?
Is there enough CPU capacity where the data is?
Will the free market work or do we have to regulate?
Is there enough bandwidth to the data?
Copy to the worker node?
Via a remote access protocol?
Can the protocols cope with the rate?
Will they be able to store their output?
On the grid temporarily, or locally for permanent storage?
How will physics groups want to organize their storage?
How will users do their end-analysis?
What is the role of Tier-2 and Tier-3?
What will the analysis centers provide?
The biggest unknown for the next run
We have no way of testing this beforehand

2009-2010 Run: the calculable and non-calculable
Data acquisition will work and also the data distribution
Calibration and alignment will work and also the reconstruction in the sites
Monte Carlo simulation production will work
Tape writing will work … scales with the hardware available
Tape reading may be trickier … hard to do it all efficiently
CPUs will work … but there will never be enough
Bandwidth to the data may become an issue
Users will be the big unknown … and yet it is the most important
Only this will validate or falsify the computing models
We will know better in Taipei!