WAN area transfers and networking: a predictive model for CMS
WLCG Workshop, July 7-9, 2010
Marie-Christine Sawley, ETH Zurich

CMS Data Flow in the Computing Grid
[Diagram: data flow between the Tier-0 (prompt reconstruction), CASTOR, the CAF (calibration, express-stream analysis), the Tier-1s (re-reconstruction, skims) and the Tier-2s (simulation, analysis); quoted rates include 600 MB/s out of the Tier-0 and flows of order 20 MB/s at the Tier-2 level.]

Purpose of the predictive model
Need to:
– Re-examine the computing model according to running conditions, and adapt whenever needed
– Keep track of the deployment of the resources at the different Tiers
Develop a tool which yields reliable information for the ramp-up in the years to come
– The tool has no value without dialogue with, and comparison against, the measured rates

Exporting custodial data: methodology
T0 -> T1s: exporting FEVT
– BW = (RAW + RECO) x trigger rate x (1 + overlap factor). For the chosen parameters this yields BW = 2 MB x 300 Hz x 1.4 = 840 MB/s, or about 6.7 Gb/s
– Each T1 receives a share according to its relative size in CPUs
– The rate is therefore proportional to the trigger rate, the event size and the Tier-1 relative size
– In 2010 we will continue to send more than one copy of the data, but the event size is smaller
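As a minimal illustration of this methodology (a sketch, not the presenter's actual tool), the export bandwidth and each Tier-1's share follow directly from the slide's parameters; the site names and CPU fractions below are illustrative placeholders only:

```python
# Sketch of the Tier-0 -> Tier-1 export-bandwidth estimate from the slide.
# The per-site CPU fractions are made-up placeholders, not CMS pledge values.

FEVT_SIZE_MB = 2.0       # RAW + RECO event size used on the slide (MB)
TRIGGER_RATE_HZ = 300.0  # trigger rate (Hz)
OVERLAP = 0.4            # overlap between primary datasets (slide: factor 1.4)

def export_bandwidth_mb_s():
    """Total Tier-0 export bandwidth in MB/s."""
    return FEVT_SIZE_MB * TRIGGER_RATE_HZ * (1.0 + OVERLAP)

def tier1_shares(cpu_fractions):
    """Split the total bandwidth proportionally to each Tier-1's CPU share."""
    total = export_bandwidth_mb_s()
    return {site: total * frac for site, frac in cpu_fractions.items()}

if __name__ == "__main__":
    total = export_bandwidth_mb_s()
    print(f"Total T0->T1 export: {total:.0f} MB/s = {total * 8 / 1000:.2f} Gb/s")
    # Illustrative CPU fractions only (must sum to 1):
    print(tier1_shares({"T1_A": 0.45, "T1_B": 0.30, "T1_C": 0.25}))
```

Running the sketch gives 840 MB/s, i.e. about 6.7 Gb/s, matching the slide's arithmetic.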

CERN to Tier-1 in 2010
The rate is defined by the accelerator, the detector and the data distribution policy
– Livetime of the machine is lower than we expect for the future; the system is specified to recover between fills
– Data is over-subscribed; this will continue as resources allow
– RAW event size is smaller than our estimates
– Event rate is defined by the physics program
We expect the average rate from CERN to Tier-1s to increase, but we would like to track the changes so that planning matches measured rates
– Dimension according to expected bursts or peaks

Tier-0 to Tier-1
[Plot: CERN to Tier-1 transfer rates; the average since the beginning of the 2010 run is 600 MB/s.]

Tier-1 to Tier-1 in 2010
The CMS plan is currently ~3.5 copies of the AOD
– After a refresh of the full sample of a year's running, this is 1.6 PB of disk to update
– Using 10 Gb/s that takes 20 days
– Achieving 30 Gb/s brings it down to a week (the Computing TDR assumed 2 weeks)
In 2010 we will also be replicating large samples of RECO
Recovering from a data-loss event at a Tier-1 is more challenging because the data might be coming from one place only
– It could also take longer, with the usual risk of a double failure
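A back-of-the-envelope check of these numbers (my sketch, not the presentation's model): the ~75% effective link utilisation below is an assumption added to reconcile the naive arithmetic with the 20-day and one-week figures quoted above.

```python
# Rough time-to-refresh estimate for the 1.6 PB AOD update quoted on the slide.
# The 75% effective-utilisation factor is an assumption, not a CMS number.

def refresh_days(volume_pb, link_gbps, efficiency=0.75):
    """Days needed to move volume_pb petabytes over a link_gbps link."""
    bits = volume_pb * 1e15 * 8                   # decimal PB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400.0

for gbps in (10, 30):
    print(f"{gbps} Gb/s -> {refresh_days(1.6, gbps):.1f} days")
# ~19.8 days at 10 Gb/s and ~6.6 days at 30 Gb/s, in line with the slide.
```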

Tier-1 to Tier-1
Transfers are used to replicate RAW, RECO and AOD data, and to recover from losses and failures at Tier-1 sites.

Tier-1 to Tier-2
Data flow from Tier-1 to Tier-2 is driven by event-selection efficiency, frequency of reprocessing and level of activity
– All of these are harder to predict, but they translate into physics potential
The connections between data-production sites and analysis tiers need to allow prompt replication
– CMS is currently replicating 35 TB of data, which took 36 hours to produce, to 3 sites (~100 TB in total)
– These bursts are not atypical

Tier-1 to Tier-2 (slide from Ian Fisk, June 11)
CMS is very close to completing the commissioning of the full mesh of Tier-1 to Tier-2 transfers at a low rate
– Working on demonstrating more links at 100 MB/s
– Daily average exceeding 1 GB/s

Conditions
Constants:
– Trigger rate: 300 Hz
– RAW size: 0.500 MB
– SimRAW size: 2.00 MB
– RECO size: 0.500 MB
– AOD size: 0.200 MB
– Total number of events: 2360 MEvents
– Overlap between PDs: 40%, then 20%
– Total number of simulated events: 2076 MEvents
– Total size of RAW: 1474 TB
– Total size of RECO: 1474 TB
– Total primary AOD: 472 TB
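These constants reproduce the quoted totals to within rounding; the sketch below is mine, and the single 25% average overlap (standing in for the slide's "40%, then 20%"), applied to RAW and RECO but not to the AOD, is an assumption chosen to match the quoted numbers.

```python
# Reconstructing the total sample sizes from the slide's constants (decimal TB).
# The 25% average primary-dataset overlap is an assumption, not a quoted value.

N_EVENTS = 2360e6        # recorded events
RAW_MB, RECO_MB, AOD_MB = 0.500, 0.500, 0.200
AVG_OVERLAP = 0.25       # assumed average overlap between primary datasets

def total_tb(events, event_size_mb, overlap=0.0):
    """Total sample size in TB for a given event count and per-event size."""
    return events * event_size_mb * (1.0 + overlap) / 1e6

print(f"RAW : {total_tb(N_EVENTS, RAW_MB, AVG_OVERLAP):.0f} TB")   # ~1475 TB
print(f"RECO: {total_tb(N_EVENTS, RECO_MB, AVG_OVERLAP):.0f} TB")  # ~1475 TB
print(f"AOD : {total_tb(N_EVENTS, AOD_MB):.0f} TB")                # 472 TB
```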

Data flows for a generic Tier-1
[Diagram] Each regional Tier-1 is characterised by its pledged capacity (kSP06), its N% share of users and its affiliated Tier-2s, and sees the following flows:
1. Custodial data intake (proportional to its relative size)
2. Data taking: total sample written to the tape system
3. Tape activity during re-processing
4. AOD sync with the other Tier-1s (imports and exports)
5. MC production
6. Daily export for data analysis: from the affiliated T2s for the whole AOD sample (re-RECO), and from all Tier-2s for the custodial AOD

The results
One slide per regional Tier-1; pledged cores are for 2010. Remember: these are really raw values.
Links:
– Solid line: sustained bandwidth (data-taking and re-processing periods ONLY)
– Broken line: peak bandwidth (may happen at any time; the number shown is the total if it all happens at the same time)
For each Tier-1, the fraction of served users for analysis is a combination based on:
– the relative size of the T2s analysing the share of the first AOD at the considered Tier-1, with the number of users based on the number of supported physics groups
– the relative size of the T1 for analysing the full AOD
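One possible reading of this weighting, sketched below with entirely illustrative numbers (the 50/50 weight, the capacities and the physics-group counts are my placeholders, not the values used in the talk):

```python
# Hypothetical sketch of the "fraction of served users" combination described above.
# All inputs and the 50/50 weighting are illustrative assumptions.

def served_user_fraction(t2_groups, all_t2_groups, t1_capacity, all_t1_capacity,
                         weight_first_aod=0.5):
    """Combine the affiliated-T2 share (first AOD) with the T1 share (full AOD)."""
    t2_share = t2_groups / all_t2_groups        # users follow supported physics groups
    t1_share = t1_capacity / all_t1_capacity    # full-AOD analysis follows T1 size
    return weight_first_aod * t2_share + (1.0 - weight_first_aod) * t1_share

# Example: a Tier-1 whose affiliated T2s support 6 of 50 physics-group instances and
# which holds 10% of the total Tier-1 capacity would serve roughly 11% of the users.
print(f"{served_user_fraction(6, 50, 10.0, 100.0):.2f}")
```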

[Diagram] France — IN2P3 (6 affiliated Tier-2s, 13% of users); imports 520 TB, exports 370 TB
– Data taking (custodial share): 0.40 Gb/s; total sample 300 TB, written to tape at ~45-50 MB/s during data taking and ~20 MB/s during re-processing
– AOD sync with the other Tier-1s: 3.5 Gb/s in, 2.5 Gb/s out
– MC production: 700 TB (0.20 Gb/s)
– Daily export for data analysis: 70 TB (7 Gb/s)

[Diagram] Germany — FZK (3 affiliated Tier-2s, 8% of users); imports 522 TB, exports 367 TB
– Data taking (custodial share): 0.3 Gb/s; total sample 280 TB, written to tape at ~40 MB/s during data taking and ~20 MB/s during re-processing
– AOD sync with the other Tier-1s: 3.6 Gb/s in, 2.3 Gb/s out
– MC production: 580 TB (0.15 Gb/s)
– Daily export for data analysis: 36 TB (3.5 Gb/s)

[Diagram] Italy — CNAF (4 affiliated Tier-2s, 10% of users); imports 514 TB, exports 415 TB
– Data taking (custodial share): 0.4 Gb/s; total sample 345 TB, written to tape at ~50 MB/s during data taking and ~20 MB/s during re-processing
– AOD sync with the other Tier-1s: 3.5 Gb/s in, 2.8 Gb/s out
– MC production: 440 TB (0.12 Gb/s)
– Daily export for data analysis: 40 TB (3.6 Gb/s)

[Diagram] Spain — PIC, 5.46 kSP06 (3 affiliated Tier-2s, 8% of users); imports 553 TB, exports 181 TB
– Data taking (custodial share): 0.2 Gb/s; total sample 151 TB, written to tape at ~20 MB/s during data taking and ~10 MB/s during re-processing
– AOD sync with the other Tier-1s: 3.7 Gb/s in, 1.2 Gb/s out
– MC production: 420 TB (0.08 Gb/s)
– Daily export for data analysis: 32 TB (3.0 Gb/s)

[Diagram] Taiwan — ASGC, 14 kSP06 (3 affiliated Tier-2s, 5% of users); imports 506 TB, exports 464 TB
– Data taking (custodial share): 0.4 Gb/s; total sample 400 TB, written to tape at ~50 MB/s during data taking and ~25 MB/s during re-processing
– AOD sync with the other Tier-1s: 3.40 Gb/s in, 3.10 Gb/s out
– MC production: 250 TB (0.08 Gb/s)
– Daily export for data analysis: 5 TB (0.4 Gb/s)

[Diagram] UK — RAL, 8.04 kSP06 (4 affiliated Tier-2s, 15% of users); imports 530 TB, exports 267 TB
– Data taking (custodial share): 0.25 Gb/s; total sample 225 TB, written to tape at ~30 MB/s during data taking and ~20 MB/s during re-processing
– AOD sync with the other Tier-1s: 3.6 Gb/s in, 1.8 Gb/s out
– MC production: 700 TB (0.2 Gb/s)
– Daily export for data analysis: 39 TB (3.7 Gb/s)

[Diagram] USA — FNAL, 44.4 kSP06 (9 affiliated Tier-2s, 24% of users); imports 340 TB, exports 1.5 PB
– Data taking (custodial share): 1.30 Gb/s; total sample 1230 TB, written to tape at ~160-165 MB/s during data taking and ~80 MB/s during re-processing
– AOD sync with the other Tier-1s: 2.3 Gb/s in, 10 Gb/s out
– MC production: 1274 TB (0.34 Gb/s)
– Daily export for data analysis: 120 TB (11 Gb/s)

Data-rate intake by Tier-2: estimate
– Data import simulated using the parameters of the run
– The association with physics groups is taken into consideration
– The global processing capacity of the Tier-1s and the relative disk space of each Tier-2 are taken into consideration
– Expected rate if all physics groups work at the same time, on a single day: a sustained rate on a peak day (any better definition?)
Purpose:
– Inform sites about usage/load, for planning
– Help sites which may run into imbalance, e.g. where the WAN is likely to be a limitation (especially if the site serves more than one VO), or where there is an imbalance between the number of physics groups and the amount of local resources
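One way such a peak-day estimate could be put together (a sketch under my own assumptions; the functional form and the example numbers are illustrative, not the model actually used in the talk):

```python
# Hypothetical sketch of the peak-day Tier-2 intake described above.
# The per-group daily volume is an illustrative input; in the real model it would
# follow from the run parameters, the Tier-1 processing capacity and the site's
# relative disk space.

def t2_peak_intake_mb_s(n_physics_groups, tb_per_group_per_day):
    """Intake in MB/s if all hosted physics groups pull their daily share at once."""
    daily_tb = n_physics_groups * tb_per_group_per_day
    return daily_tb * 1e6 / 86400.0   # decimal TB/day -> MB/s

# Example: a site hosting 3 physics groups, each pulling ~2 TB on a busy day.
print(f"{t2_peak_intake_mb_s(3, 2.0):.0f} MB/s")   # ~69 MB/s
```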

Data-rate intake by Tier-2: preliminary comparison with measured rates
Data from Data Ops:
– T1 to T2, best rate from a sample of measurements taken over a few hours, between November and March (200 files of 2 GB each sent from T1 to T2)
Results:
– For 27 sites, the simulation gives a number below the measured data rate -> satisfactory
– For 9 sites, there are no usable data yet to compare against
– For 7 sites, one (or more) link is above the simulated data rate, but the average is below -> monitor and try to understand
– For 4 sites, all measured links were below the simulated data rate -> go deeper
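The four buckets above can be encoded mechanically; the snippet below is my own paraphrase of the classification, not code from the study, using exactly the comparisons stated on the slide:

```python
# Bucketing sites by comparing the simulated intake with measured per-link rates,
# following the four categories listed on the slide.

from statistics import mean

def classify_site(simulated_mb_s, measured_link_rates_mb_s):
    """Return the slide's category for one Tier-2 site."""
    if not measured_link_rates_mb_s:
        return "no data yet"
    if all(r < simulated_mb_s for r in measured_link_rates_mb_s):
        return "go deeper"                      # every measured link below the model
    if mean(measured_link_rates_mb_s) < simulated_mb_s:
        return "monitor and try to understand"  # some links above, average below
    return "satisfactory"                       # measured rates above the model

print(classify_site(50.0, [80.0, 120.0]))   # satisfactory
print(classify_site(50.0, [20.0, 70.0]))    # monitor and try to understand
print(classify_site(50.0, [20.0, 35.0]))    # go deeper
print(classify_site(50.0, []))              # no data yet
```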

Possible reasons for significant deviation between the results
– The simulation may be inaccurate for that particular site; the model keeps being refined and new measurements keep coming in
– Real limitations may come from the WAN, from a few parameters to tune at the Tier-2, or from other causes
Still very much work in progress: do not jump to conclusions before further analysis.

Data rate intake for T2s
[Table: per-site estimated incoming rate from the Tier-1s (MB/s), number of physics groups, installed WAN (Gb/s) and measured average rate.] Sites covered: Austria (T2-AT-Vienna); Belgium (T2-BE-IIHE, T2-BE-UCL); Brazil (T2-BR-UERJ, T2-BR-SPRACE); China (T2-CN-Beijing); Estonia (T2-EE-Estonia); Finland (T2-FI-HIP); France (T2-FR-IPHC, T2-FR-GRIF, T2-FR-IN2P3); Germany (T2-DE-Desy, T2-DE-RWTH); Hungary (T2-HU-Budapest).

Data rate intake for T2s
[Table continued: same columns.] Sites covered: India (T2-IN-TIFR); Italy (T2-IT-Bari, T2-IT-Legnaro, T2-IT-Pisa, T2-IT-Roma); Korea (T2-KR-KNU); Pakistan (T2-PK-NCP*); Poland (T2-PL-Warsaw*); Portugal (T2-PR-LIP, T2-PR-NGC); Russia (T2-RU-IHEP*, T2_RU_INR*, T2-RU-ITEP, T2-RU-JINR, T2-RU-PNPI*). *Model yields an insignificant rate as the disk space is very modest.

Data rate import for T2s
[Table continued: same columns.] Sites covered: Russia (T2-RU-RRC_KI, T2-RU-SINP); Spain (T2-ES-CIEMAT, T2-ES-IFCA); Switzerland (T2-CH-CSCS); Taiwan (T2-TW-TAIWAN*); Turkey (T2-TR-METU*); UK (T2-UK-IC, T2-UK-Brunel, T2-UK-Southgrid RAL); Ukraine (T2-UA-KIPT*). *Model yields an insignificant rate as the disk space is very modest.

Data rate import for T2s
[Table continued: same columns.] Sites covered: USA (T2_US_Caltech, T2_US_Florida, T2_US_MIT, T2_US_Nebraska, T2_US_Purdue, T2_US_UCSD, T2_US_Wisconsin).

Outlook
Development of the model is ongoing, regularly taking real activities into account:
– CERN to Tier-1 is driven by the detector and the accelerator
– Tier-1 to Tier-1 is driven by the need to replicate samples and to recover from problems; we see reasonable bursts that will grow with the datasets
– Tier-1 to Tier-2 is driven by activity and physics choices; large bursts already, scaling with the activity level and the integrated luminosity
– Tier-2 to Tier-2 is ramping up
Keeping the dialogue with the sites and the specialists to enrich the model.

Thank you
Acknowledgements: Ian Fisk, Daniele Bonacorsi, Josep Flix, Markus Klute, Oli Gutsche, Matthias Kasemann
Questions or feedback?