ATLAS Computing Model & Service Challenges
Roger Jones, 12th October 2004, CERN

Computing Model
- A new revision started in late spring 2003
- Workshop in June 2004, with input from CDF, DØ and ATLAS
- Very little input so far from some key communities:
  - detector groups on calibration and alignment computing needs (calibration input arrived on 2 October 2004!)
  - physics groups on data access patterns
  - This is a major concern, as some have unrealistic ideas!
- Still large uncertainties on the final event sizes
  - Huge potential impact on access and on costs!
  - With the advertised assumptions, we are at the limit of the available disk
  - RAW data cannot get bigger because of the TDAQ bandwidth

The ATLAS System (10 Tier-1s assumed)
- Event Builder feeds the Event Filter (~7.5 MSI2k) at 10 GB/s; Event Filter to Tier-0 (~5 MSI2k) at 450 MB/s
- Tier-0: ~5 PB/year, no simulation; 622 Mb/s links to the Tier-1s, ~75 MB/s per T1 for ATLAS
- Tier-1 regional centres (UK/RAL, US, French, Dutch, ...): ~2 MSI2k and ~2 PB/year each; 622 Mb/s links onward to the Tier-2s
- Tier-2 centres (~200 kSI2k each, e.g. a Northern Tier of Lancaster, Liverpool, Manchester and Sheffield at ~0.25 TIPS): ~200 TB/year per T2
  - Each Tier-2 has ~15-20 physicists working on one or more channels
  - Each Tier-2 should hold the full AOD, TAG and relevant physics-group summary data
  - Tier-2s do the bulk of the simulation
- Some data for calibration and monitoring flow to the institutes; calibrations flow back
- Institute level: physics data caches and workstations; a desktop PC (2004) = ~1 kSpecInt2k

Computing Resources
- Assumptions:
  - 50 days of running in 2007 and 2008 (over the year break)
  - Tier-0 holds the raw data, the calibration data and the first-pass ESD
  - The CERN 'Tier-1' is analysis-only (a big Tier-2)
- Notes:
  - A one-off purchase of a disk buffer is needed in the first year; it keeps coherent data sets available while reprocessing
  - Efficiencies are folded into the numbers
  - The model tries to capture analysis activity; the analysis test in phase 3 of DC2 will be important to answer some of the questions
  - The year-1 semi-private reprocessing need is especially hard to quantify, but estimates are included

The Input Numbers
- Columns: Rate (Hz), sec/year, Events/y, Size (MB), Total (TB); the sketch after this slide shows how the columns combine
- Rows: Raw Data (inc. express etc.), ESD (inc. express etc.), General ESD, General AOD, General TAG, Calibration (ID, LAr, MDT), MC Raw, ESD Sim, AOD Sim, TAG Sim, Tuple
- Surviving entries in the transcript: Calibration rate 44 Hz (8 Hz long-term); Tuple size 0.01 MB (the remaining numerical values were lost)
- Nominal year: 10^7 s; accelerator efficiency: 50%
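The table columns follow directly from the nominal year and the per-event sizes. Below is a minimal sketch of that arithmetic; the function name and the 200 Hz / 1.6 MB example values are illustrative assumptions, not the official table entries.

```python
# Sketch of how the "Input Numbers" columns relate. The example values
# (rate_hz, size_mb) are illustrative placeholders, not the official ATLAS inputs.

NOMINAL_YEAR_S = 1.0e7  # nominal data-taking year in seconds, as on the slide

def yearly_volume(rate_hz: float, size_mb: float) -> tuple[float, float]:
    """Return (events per year, total volume in TB) for one data type."""
    events_per_year = rate_hz * NOMINAL_YEAR_S
    total_tb = events_per_year * size_mb / 1.0e6  # 1 TB = 1e6 MB
    return events_per_year, total_tb

if __name__ == "__main__":
    # Hypothetical example: a 200 Hz stream of 1.6 MB raw events
    events, tb = yearly_volume(rate_hz=200.0, size_mb=1.6)
    print(f"{events:.2e} events/year, {tb:.0f} TB/year")
```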

Year 1 T0 Requirements

Table Y1.1 CERN T0: storage requirement
                         Disk (TB)   Tape (TB)
  Raw                        0         3040
  General ESD (prev.)        0         1000
  Total                      0         4040

Table Y1.2 CERN T0: computing requirement
  Columns: Reconstruction, Reprocessing, Calibration, Central Analysis, User Analysis, Total; row: CPU (kSI2k)
  (the CPU values were not recoverable from the transcript)

- ESD is 24% of the tape; ESD event size 0.5 MB
- Note that the calibration load is evolving
- The aim here is for steady-state requirements (resources can then be vired towards start-up)

CERN Super-T2 (First Year)

Table Y1.3 Storage requirement
                          Disk (TB)   Auto. Tape (TB)
  Raw                        46            0
  General ESD (curr.)        26            0
  General ESD (prev.)         0           18
  AOD (curr.)               257            0
  AOD (prev.)                 0            4
  TAG (curr.)                 3            0
  TAG (prev.)                 0            2
  ESD Sim (curr.)           143            0
  ESD Sim (prev.)             0          100
  AOD Sim (curr.)            29            0
  AOD Sim (prev.)             0           20
  Tag Sim (curr.)             0.2          0
  Tag Sim (prev.)             0            0.2
  Calibration                57            0
  User Data (100 users)     173            0
  Total                    ~734         ~144

- Small-sample chaotic reprocessing: 170 kSI2k
- Calibration: 530 kSI2k
- User analysis: ~810 kSI2k
- This site does not share in the global simulation load
- The start-up balance would be very different, but we should try to respect the envelope
- ESD is 23% of the disk; ESD is 82% of the tape

Year 1 T1 Requirements
- Estimate about 1660 kSI2k for each of 10 T1s
- Central analysis (by groups, not users): ~1200 kSI2k

Table Y1.5 T1: storage requirement (typical Tier-1, Year-1 resources)
                          Disk (TB)   Auto. Tape (TB)
  Raw                        46           320
  General ESD (curr.)       257            90
  General ESD (prev.)       129            90
  AOD                       283            36
  TAG                         3             0
  Calib                      57             0
  RAW Sim                     0            40
  ESD Sim (curr.)            29            10
  ESD Sim (prev.)            14            10
  AOD Sim                    31             4
  Tag Sim                     0             0
  User Data (20 groups)      69             0
  Total                    ~918          ~600

- This includes a '1 year, 1 pass' buffer
- ESD is 47% of the disk; ESD is 33% of the tape
- Current pledges are ~55% of this requirement
- Making event sizes bigger makes things worse!

Single T1 Evolution (totals) [chart]

Single T1 Evolution (per year) [chart]

Tier-2 (First Year)

Table Y1.7 Typical storage requirement
                       Disk (TB)
  Raw                      1
  General ESD (curr.)     13
  AOD                     64
  TAG                      3
  RAW Sim                  0
  ESD Sim (curr.)          3
  AOD Sim                  7
  Tag Sim                  0
  User Group              17
  User Data               26
  Total                  134

Table Y1.8 Typical computing requirement
  Columns: Reconstruction, Reprocessing, Simulation, User Analysis, Total; row: CPU (kSI2k)
  (the CPU values were not recoverable from the transcript)

- User activity includes some reconstruction (algorithm development etc.)
- It also includes user simulation
- T2s also share the event simulation load, but not the output data storage

Overall Year-1 Resources

                 CERN       All T1       All T2       Total
  Tape (PB)       4.3         6.0          0.0         10.3
  Disk (PB)       0.7         9.2          5.4         15.3
  CPU (MSI2k)     5.6        16.6          5.7         27.9

If a T2 supports private analysis, add about 1 TB and 1 kSI2k per user.

Issues
- Lively debate on streaming and exclusive streams (e.g. the ESD/AOD meeting last Friday)
  - The model has always had streaming of AOD etc.
  - No problem with post-T0 streaming of RAW/ESD or small strims
  - We need typical selections to determine the most efficient streams
- Level of access to RAW?
  - Depends on the functionality of the ESD
  - Discussion of a small fraction of DRD (augmented RAW data)
- Much bigger concerns about non-exclusive streaming
  - How do you handle the overlaps when you spin over two streams?
  - Real use cases are needed to calculate the most efficient access
- On the input side of the T0, assume the following:
  - Primary stream: every physics event; publications should be based on this, with uniform processing
  - Calibration stream: calibration triggers plus copies of selected physics triggers; needed to reduce the latency of processing the primary stream
  - Express stream: copies of high-pT events for 'excitement' and (with the calibration stream) for detector optimisation; must be a small percentage of the total

Networking: T0-T1
- EF to T0: maximum 300 MB/s (450 MB/s with headroom)
- If the EF is away from the pit, 7 GB/s is required for the SFI inputs (10 x 10 Gbps with headroom)
- Off-site offline networking is now being calculated with David Foster
  - Recent exercise with (almost) current numbers
  - Full bandwidth estimated as requirement x 1.5 (headroom) x 2 (capacity); see the sketch below
- Propose a dedicated networking test beyond DC2
- Table of estimates for RAL (as a typical T1?) and the ATLAS total: T0-T1 rate (MB/s), Gbps per T1, full Gbps, assumed Gbps (the numerical entries were lost in the transcript)
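To make the headroom rule concrete, here is a minimal sketch of the conversion from a data-rate requirement to provisioned link capacity. The function name is illustrative; the 75 MB/s per-T1 figure and the assumption of 10 Tier-1s are taken from earlier in the talk, but the resulting numbers should be read as an illustration rather than the official table entries.

```python
# Sketch of the bandwidth rule quoted on the slide:
# provisioned capacity = requirement x 1.5 (headroom) x 2 (capacity).
# The 75 MB/s per-T1 requirement is the figure quoted earlier in the talk;
# treat the printed results as illustrative, not as the official estimates.

def provisioned_gbps(requirement_mb_s: float,
                     headroom: float = 1.5,
                     capacity: float = 2.0) -> float:
    """Convert a raw requirement in MB/s into provisioned link capacity in Gbps."""
    required_gbit = requirement_mb_s * 8 / 1000.0   # MB/s -> Gbit/s
    return required_gbit * headroom * capacity

if __name__ == "__main__":
    per_t1 = provisioned_gbps(75.0)          # one Tier-1
    total = provisioned_gbps(75.0 * 10)      # 10 Tier-1s assumed in the model
    print(f"per T1: {per_t1:.1f} Gbps, all T1s: {total:.0f} Gbps")
```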

Additional Bandwidth
- There will also be traffic directly from the online system to the outside world
  - Monitoring: low volume overall, but it may come in bursts
  - Calibration: generally low volume, but some (MDT, for example) may be large for short periods (~Gbps)
- A possibility (for dedicated periods) is offline event filtering
  - The full rate would be ~10 x 10 Gbps
  - More likely a reduced stream of several Gbps for short periods
- There are big issues here, but it should not be excluded a priori

Organisation/Resources
- The T0-T1 transfer tests are much delayed
  - Now in Nov-Dec 2004, so in parallel with the Service Challenge?
- Organisation of the tools is also slow
  - Oxana Smirnova is co-ordinating
- Higher-volume tests are needed to
  - stress the network
  - stress the data management tool (Don Quixote)
- Willing participants/partners:
  - Lancaster (with Richard Hughes-Jones, Manchester): through ESLEA, interest in dedicated light-paths; a 2-year post to be released shortly, available early in the new year
  - CERN: Brian Martin and Catalin Meirosu, continuing online tests
- These need to be integrated with the general effort

Bandwidth Measurements
- The networking is layered, from the physical transmission media (layer 1) to the application (layer 7)
- Layer 2/3 tests: relevant when the remote and local sites are logically in the same LAN
  - Example: throughput between CERN and INP Krakow, August 2004: ~1000 Mbit/s
- Layer 4 tests (TCP, UDP): relevant for general-purpose, Internet-style connectivity
  - Tests performed between Geneva and Copenhagen, Edmonton, Krakow and Manchester
  - Test equipment: server PCs running patched Linux kernels and open-source software for network measurements
- Example, Geneva to Manchester: the network can sustain 1 Gbps of UDP traffic, but the average server has problems with smaller packets; throughput degrades for packets smaller than ~1000 bytes, limited by the PC receiving the traffic (see the sketch below)
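The small-packet degradation is a per-packet-overhead effect at the receiver: throughput is roughly the payload size times the packet rate the host can absorb, so below ~1000 bytes the host, not the 1 Gbps link, becomes the limit. The sketch below illustrates that relation; the per-packet processing cost used is an assumed illustrative value, not one measured in these tests.

```python
# Illustrative model of receiver-limited UDP throughput: the link caps the
# bit rate, the host caps the packet rate. The per-packet cost below is an
# assumed illustrative value, not a measurement from the CERN tests.

LINK_GBPS = 1.0            # nominal link speed (Gbit/s)
PER_PACKET_COST_US = 10.0  # assumed receiver cost per packet (microseconds)

def achievable_mbps(payload_bytes: int) -> float:
    """Throughput in Mbit/s, limited by whichever of link or host saturates first."""
    host_pps = 1e6 / PER_PACKET_COST_US            # packets/s the host can absorb
    host_mbps = host_pps * payload_bytes * 8 / 1e6
    link_mbps = LINK_GBPS * 1e3
    return min(host_mbps, link_mbps)

if __name__ == "__main__":
    for size in (200, 500, 1000, 1472):
        print(f"{size:5d} bytes -> ~{achievable_mbps(size):6.0f} Mbit/s")
```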

Real Application in an ATLAS Context
- Simple request-response program, emulating the request-response communication between the SFI and the EFD in the Event Filter (a minimal sketch is given below)
- Runs over TCP/IP
- The client sends a small request message; the server answers with a message of up to 2 MB
- Results ... to be understood
- (Diagram: ATLAS Event Filter scenario; the EF requests an event from the SFI across the network)
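As a concrete illustration of this kind of request-response emulator, here is a minimal Python sketch. The real tests used dedicated programs on patched Linux kernels; the host/port, the length-prefix framing and the helper names here are assumptions for illustration (the 64-byte request size matches the figure quoted on the next slide).

```python
# Minimal request-response sketch: a client sends a small request, the server
# replies with a large (up to ~2 MB) "event". Illustrative only; the port,
# framing and sizes are assumptions, not the actual ATLAS test code.
import socket
import struct
import threading
import time

HOST, PORT = "127.0.0.1", 9500   # assumed address for the sketch
EVENT_SIZE = 2 * 1024 * 1024     # up to 2 MB response, as on the slide

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)

def server() -> None:
    """Answer one small request with one length-prefixed pseudo-event."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            recv_exact(conn, 64)                          # 64-byte request
            event = bytes(EVENT_SIZE)                     # dummy event payload
            conn.sendall(struct.pack("!I", len(event)) + event)

def client() -> None:
    """Send one 64-byte request and read back the full event."""
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(b"\x00" * 64)
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        event = recv_exact(sock, length)
        print(f"received event of {len(event)} bytes")

if __name__ == "__main__":
    threading.Thread(target=server, daemon=True).start()
    time.sleep(0.5)   # crude wait for the server to start listening (sketch only)
    client()
```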

Request-Response Results: CERN to Univ. of Manchester Connection
- Good response if the TCP stack is properly tuned, poor if not
- Need to understand TCP implementation issues, not only the generic protocol
- 800 Mbit/s achievable with a tuned stack, 120 Mbit/s without; the same end nodes were used in both cases!
- (Plots: out-of-the-box TCP settings vs. tuned TCP stack; 64-byte request in green, 1 MByte response in blue)
- A sketch of the kind of socket-buffer tuning involved follows below
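On a long, fast path, "tuning the stack" largely means making the TCP window at least the bandwidth-delay product. The sketch below shows the idea; the 1 Gbps capacity and ~20 ms round-trip time are assumptions for a CERN-Manchester-like path, and real tuning in 2004 also involved kernel sysctl limits and patched kernels, which are outside this sketch.

```python
# Sketch: size TCP socket buffers to the bandwidth-delay product (BDP) so a
# single stream can fill a long fat pipe. The 1 Gbps / 20 ms figures are
# assumptions for a CERN-Manchester-like path, not measured values.
import socket

LINK_GBPS = 1.0   # assumed path capacity
RTT_S = 0.020     # assumed round-trip time (~20 ms)

def bdp_bytes(gbps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: the minimum useful TCP window."""
    return int(gbps * 1e9 / 8 * rtt_s)

def tuned_socket() -> socket.socket:
    """Create a TCP socket with send/receive buffers sized to the BDP."""
    bdp = bdp_bytes(LINK_GBPS, RTT_S)   # ~2.5 MB for the assumed path
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The kernel must also allow buffers this large (net.core.rmem_max etc.),
    # otherwise these requests are silently capped.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
    return s

if __name__ == "__main__":
    print(f"BDP for the assumed path: {bdp_bytes(LINK_GBPS, RTT_S)} bytes")
```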

Conclusions and Timetable
- Computing Model documents are required by the end of the year
  - We will not now have completed the DC2 tests by then, especially the analysis tests
  - We can have serious input from the detector calibrators and the physics groups (sample sizes, access patterns)
  - We also need to know (urgently) about off-site networking from online (calibration, monitoring, ...) and about event access times
- Computing Model review in January 2005 (P. McBride); we need to have serious inputs at this point
- Documents to the April RRBs
- MoU signatures in summer 2005
- Computing & LCG TDR in June 2005