LCG Service Challenges Overview


LCG Service Challenges Overview
Based on SC conferences and meetings: http://agenda.cern.ch/displayLevel.php?fid=3l181
Victor Zhiltsov, JINR

Service Challenge Goals (SC1–SC4)
- Service Challenge 1 (end of 2004): demonstrate throughput of 500 MB/s to a Tier1 in the LCG environment.
- Service Challenge 2 (spring 2005): sustain 500 MB/s aggregate over all Tier1s for a prolonged period, and evaluate the data transfer environment at Tier0 and the Tier1s.
- Service Challenge 3 (summer to end of 2005): show reliable and stable data transfer to each Tier1: 150 MB/s to disk, 60 MB/s to tape. All Tier1s and some Tier2s involved.
- Service Challenge 4 (spring 2006): prove that the Grid infrastructure can handle LHC data at the proposed rates (from raw data transfer up to final analysis) with all Tier1s and the majority of Tier2s.
- Final goal: build the production Grid infrastructure across Tier0, Tier1 and Tier2 according to the specific needs of the LHC experiments.

Summary of Tier0/1/2 Roles
- Tier0 (CERN): safe-keeping of RAW data (first copy); first-pass reconstruction; distribution of RAW data and reconstruction output to the Tier1s; reprocessing of data during LHC down-times.
- Tier1: safe-keeping of a proportional share of RAW and reconstructed data; large-scale reprocessing and safe-keeping of the corresponding output; distribution of data products to Tier2s and safe-keeping of a share of the simulated data produced at those Tier2s.
- Tier2: handling analysis requirements and a proportional share of simulated event production and reconstruction; no long-term data storage.
N.B. there are differences in roles by experiment, so it is essential to test using the complete production chain of each!

SC2 met its throughput targets
- A daily average of >600 MB/s was achieved for 10 days (midday 23rd March to midday 2nd April).
- Not without outages, but the system showed it could recover the rate again after outages.
- Load was reasonably evenly divided over sites (given the network bandwidth constraints of the Tier-1 sites).

Division of Data between Sites (SC2, 10-day period)

  Site     Average throughput (MB/s)   Data moved (TB)
  BNL       61                          51
  FNAL       –                           –
  GridKA   133                         109
  IN2P3     91                          75
  INFN      81                          67
  RAL       72                          58
  SARA     106                          88
  TOTAL    600                         500
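
A quick consistency check on the table above: the average throughput follows directly from the volume moved and the 10-day window. The short Python sketch below is purely illustrative (it is not part of the original material); it reproduces the quoted per-site averages to within roughly 10%, the residual presumably coming from rounding and the exact averaging window.

# Rough consistency check: MB/s sustained over a 10-day window vs TB moved.
# Site figures are taken from the SC2 table above.
DAYS = 10
SECONDS = DAYS * 24 * 3600

sc2_data_moved_tb = {
    "BNL": 51, "GridKA": 109, "IN2P3": 75,
    "INFN": 67, "RAL": 58, "SARA": 88,
}

for site, tb in sc2_data_moved_tb.items():
    mb = tb * 1e6            # 1 TB = 1e6 MB (decimal units)
    rate = mb / SECONDS      # average MB/s over the 10 days
    print(f"{site:7s} {tb:4d} TB  ->  {rate:6.1f} MB/s average")

total_tb = 500
print(f"TOTAL   {total_tb} TB  ->  {total_tb * 1e6 / SECONDS:.0f} MB/s average")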

Storage and Software Used
- Most sites ran Globus GridFTP servers: CCIN2P3, CNAF, GridKa, SARA.
- The rest of the sites ran dCache: BNL, FNAL, RAL.
- Most sites used local or system-attached disk; FZK used a SAN via GPFS; FNAL used the production CMS dCache, including tape.
- Load balancing for the GridFTP sites was done by the RADIANT software running at CERN in push mode.

Tier0/1 Network Topology (April–July changes)
[Diagram: CERN Tier-0 links to the Tier-1s (Triumf, CNAF, IN2P3, GridKa, SARA, BNL, PIC, Nordic, RAL, ASCC, FNAL) via GEANT, NetherLight, StarLight, UKLight and ESNet, with per-site connectivity ranging from shared 1G and 2x1G up to 10G.]

GridPP Estimates of T2 Networking

  Experiment  T1s      T2s  Total T2 CPU  Total T2 Disk  Avg T2 CPU  Avg T2 Disk  Net In   Net Out
                            (kSI2K)       (TB)           (kSI2K)     (TB)         (Gb/s)   (Gb/s)
  ALICE        6        21  13700         2600           652         124          0.010    0.600
  ATLAS       10        30  16200         6900           540         230          0.140    0.034
  CMS          6 to 10  25  20725         5450           829         218          1.000    0.100
  LHCb         –        14   7600           23           543           2          0.008    –

(1 kSI2K corresponds to 1 Intel Xeon 2.8 GHz processor.)

The CMS figure of 1 Gb/s into a T2 comes from the following:
- Each T2 has ~10% of the current RECO data and 1/2 of the AOD (real + MC sample).
- These data are refreshed every 3 weeks, compatible with the frequency of a major selection pass at the T1s.
- See the CMS Computing Model (S-30) for more details.
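
The slide does not spell out the arithmetic behind the CMS 1 Gb/s figure. The Python sketch below shows one way to reproduce it; the ~230 TB resident-sample size is an illustrative assumption (the slide only states "~10% of RECO plus half of the AOD, refreshed every 3 weeks"), not a number from the original.

# Illustrative back-of-the-envelope for the CMS "1 Gb/s into a T2" estimate.
def refresh_rate_gbps(resident_volume_tb: float, refresh_days: float) -> float:
    """Sustained rate needed to replace a resident sample once per refresh period."""
    bits = resident_volume_tb * 1e12 * 8          # TB -> bits (decimal units)
    seconds = refresh_days * 24 * 3600
    return bits / seconds / 1e9                   # -> Gb/s

# Assumed T2-resident CMS sample (10% of RECO + half the AOD): ~230 TB (hypothetical).
print(f"{refresh_rate_gbps(230, 21):.2f} Gb/s")   # ~1.01 Gb/s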

SC3 – Milestones

SC3 – Milestone Decomposition
File transfer goals:
- Build up disk–disk transfer speeds to 150 MB/s (SC2 was 100 MB/s, agreed per site).
- Include tape, with transfer speeds of 60 MB/s (an aggregate-rate sketch follows this list).
Tier1 goals:
- Bring in additional Tier1 sites with respect to SC2; PIC and Nordic most likely added later (SC4?). US-ALICE T1? Others?
Tier2 goals:
- Start to bring Tier2 sites into the challenge.
- Agree the services that T2s offer / require.
- On-going plan (more later) to address this via GridPP, INFN etc.
Experiment goals:
- Address the main offline use cases except those related to analysis, i.e. real data flowing out through T0-T1-T2, simulation coming in from T2 to T1.
Service goals:
- Include CPU (to generate files) and storage.
- Start to add additional components: catalogs, VOs, experiment-specific solutions, 3D involvement, ...
- Choice of software components, validation, fallback, ...
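
To get a feel for what the per-site targets imply at the Tier0, the sketch below multiplies the per-Tier1 disk and tape rates by an assumed number of participating Tier1s; the site count of 8 is a hypothetical value for illustration, not taken from the slide.

# Illustrative aggregate-rate estimate for SC3; the site count is an assumption.
DISK_RATE_MB_S = 150    # per-Tier1 disk target
TAPE_RATE_MB_S = 60     # per-Tier1 tape target
assumed_tier1s = 8      # hypothetical number of participating Tier1 sites

disk_total = DISK_RATE_MB_S * assumed_tier1s   # MB/s out of the Tier0 to disk
tape_total = TAPE_RATE_MB_S * assumed_tier1s   # MB/s out of the Tier0 to tape
print(f"disk: {disk_total} MB/s aggregate, tape: {tape_total} MB/s aggregate")
# With 8 Tier1s this gives 1200 MB/s (disk) and 480 MB/s (tape).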

Key Dates for Connectivity
[Timeline graphic: SC2, SC3 and SC4 leading into LHC service operation; cosmics, first beams, first physics and the full physics run spanning 2005–2008.]
- Jun 05: Technical Design Report (credibility review by LHCC).
- Sep 05: SC3 Service – 8-9 Tier-1s sustain 1 Gbps at the Tier-1s, 5 Gbps at CERN; extended peaks at 10 Gbps at CERN and some Tier-1s.
- Jan 06: SC4 Setup – all Tier-1s; 10 Gbps at >5 Tier-1s, 35 Gbps at CERN.
- Jul 06: LHC Service – all Tier-1s; 10 Gbps at the Tier-1s, 70 Gbps at CERN.

2005 Sep-Dec – SC4 Preparation (historical slides from Les / Ian)
In parallel with the SC3 model validation period, in preparation for the first 2006 service challenge (SC4):
- Using the 500 MB/s test facility, test PIC and the Nordic T1, plus the T2s that are ready (Prague, LAL, UK, INFN, ..).
- Build up the production facility at CERN to 3.6 GB/s.
- Expand the capability at all Tier-1s to the full nominal data rate.

2006 Jan-Aug – SC4 (historical slides from Les / Ian)
SC4 – full computing model services:
- Tier-0, ALL Tier-1s and all major Tier-2s operational at full target data rates (~2 GB/s at the Tier-0): acquisition, reconstruction, recording, distribution, PLUS ESD skimming and servicing of Tier-2s.
- Goal: a stable test service for one month – April 2006.
100% Computing Model Validation Period (May-August 2006):
- Tier-0/1/2 full model test, all experiments, 100% nominal data rate, with processing load scaled to 2006 CPUs.

2006 Sep – LHC Service Available (historical slides from Les / Ian)
- The SC4 service becomes the permanent LHC service – available for the experiments' testing, commissioning, processing of cosmic data, etc.
- All centres ramp up to the capacity needed at LHC startup: TWICE nominal performance.
- Milestone to demonstrate this 3 months before first physics data: April 2007.

Tier2 Roles
Tier2 roles vary by experiment, but include:
- production of simulated data;
- production of calibration constants;
- an active role in [end-user] analysis.
Must also consider services offered to T2s by T1s, e.g.:
- safe-guarding of simulation output;
- delivery of analysis input.
No fixed dependency between a given T2 and T1 – but 'infinite flexibility' has a cost...

Tier2 Functionality
(At least) two distinct cases:
- Simulation output. This is relatively straightforward to handle. The most simplistic case: associate a T2 with a given T1; this can be reconfigured. Logical unavailability of a T1 could eventually cause T2 MC production to stall. More complex scenarios are possible – but why? Make it as simple as possible, but no simpler...
- Analysis. Much less well understood and likely much harder...

Tier2s are assumed to offer, in addition to the basic Grid functionality:
- client services whereby reliable file transfers may be initiated to / from Tier1/0 sites, currently based on the gLite File Transfer software (gLite FTS);
- managed disk storage with an agreed SRM interface, such as dCache or the LCG DPM.

To participate in the Service Challenge, Tier2 sites are required to install the gLite FTS client and a disk storage manager (dCache?). For the throughput phase no long-term storage of the transferred data is required, but they nevertheless need to agree with the corresponding Tier1 that the necessary storage area to which they upload data (analysis is not included in Service Challenge 3) and the gLite FTS backend service are provided.

Tier2 Model
As Tier2s do not typically provide archival storage, this is a primary service that must be provided to them, assumed via a Tier1. Although no fixed relationship between a Tier2 and a Tier1 should be assumed, a pragmatic approach for Monte Carlo data is nevertheless to associate each Tier2 with a 'preferred' Tier1 that is responsible for long-term storage of the Monte Carlo data produced at the Tier2. By default, it is assumed that data upload from the Tier2 will stall should the Tier1 be logically unavailable. This in turn could imply that Monte Carlo production will eventually stall, if local storage becomes exhausted, but it is assumed that these events are relatively rare and the production manager of the experiment concerned may in any case reconfigure the transfers to an alternate site in case of prolonged outage.

A Simple T2 Model (1/2)
N.B. this may vary from region to region.
- Each T2 is configured to upload MC data to, and download data via, a given T1.
- In case that T1 is logically unavailable: wait and retry; MC production might eventually stall.
- For data download, retrieve via an alternate route / T1 – which may well be slower, but this should be rare.
- Data residing at a T1 other than the 'preferred' T1 is transparently delivered through an appropriate network route; T1s are expected to have interconnectivity at least as good as to the T0.
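
The upload/retry policy described above can be summarised in a few lines; the Python sketch below is purely illustrative, with hypothetical site names, retry interval and buffer size, and placeholder functions rather than any real FTS or SRM API.

import time

# Illustrative sketch of the T2 upload policy described above:
# try the preferred T1, wait and retry while it is unavailable,
# and report a stall once local buffer space runs out.
# All names and numbers here are hypothetical.

PREFERRED_T1 = "CNAF"          # assumed 'preferred' Tier1 for this Tier2
RETRY_INTERVAL_S = 600         # wait 10 minutes between retries (assumption)
LOCAL_BUFFER_FILES = 1000      # MC files the T2 can hold before production stalls

def t1_available(site: str) -> bool:
    """Placeholder availability check (a real setup would probe the SRM/FTS endpoint)."""
    return True

def transfer(filename: str, destination: str) -> None:
    """Placeholder for a reliable file transfer (e.g. submitted through gLite FTS)."""
    print(f"uploading {filename} to {destination}")

def upload_mc_output(pending_files: list[str]) -> None:
    backlog = list(pending_files)
    while backlog:
        if len(backlog) >= LOCAL_BUFFER_FILES:
            print("local storage exhausted: MC production stalls until the T1 returns")
        if t1_available(PREFERRED_T1):
            transfer(backlog.pop(0), PREFERRED_T1)
        else:
            time.sleep(RETRY_INTERVAL_S)   # wait and retry, as per the model

upload_mc_output(["mc_run_001.root", "mc_run_002.root"])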

A Simple T2 Model (2/2)
- Each Tier-2 is associated with a Tier-1 that is responsible for getting it set up.
- Services at the T2 are managed storage and reliable file transfer. FTS: DB component at the T1, user agent also at the T2; DB for storage at the T2.
- 1 Gbit network connectivity – shared (less will suffice to start with, more may be needed!).
Tier1 responsibilities:
- provide archival storage for (MC) data uploaded from T2s;
- host the DB and the (gLite) File Transfer Server;
- (later) also data download (eventually from 3rd parties) to T2s.
Tier2 responsibilities:
- install / run dCache / DPM (managed storage software with the agreed SRM interface);
- install the gLite FTS client;
- (batch service to generate & process MC data);
- (batch analysis service – SC4 and beyond).
Tier2s do not offer persistent (archival) storage!

Service Challenge in Russia
[Diagram: T0 – T1 – T2 data flow for the Russian sites.]

Tier1/2 Network Topology

Tier2 Sites in Russia

  Institute  Link                   CPUs   Disk     OS / Middleware
  IHEP       100 Mb/s half-duplex   5+     1.6 TB   …?
  ITEP       60 Mb/s                20 ?   2 TB ?   SL-3.0.4 (kernel 2.4.21)
  JINR       40 Mb/s                10 ?            SLC3.0.X, LCG-2_4_0, Castor, gridftp, gLite?
  SINP       1 Gbit/s               30 ?            gridftp, Castor
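
For a rough feel of what these links allow, the sketch below (illustrative only; the 1 TB volume is a hypothetical example) converts each quoted link speed into the time needed to move a given volume, optimistically assuming the full bandwidth is available for the transfer.

# Illustrative: time to move a given volume over the quoted links,
# assuming (optimistically) the full link bandwidth is available.
LINKS_MBIT_S = {
    "IHEP": 100,     # half-duplex
    "ITEP": 60,
    "JINR": 40,
    "SINP": 1000,    # 1 Gbit/s
}

volume_tb = 1.0      # hypothetical volume to transfer
for site, mbit in LINKS_MBIT_S.items():
    seconds = volume_tb * 8e6 / mbit        # TB -> Mbit, divided by Mbit/s
    print(f"{site}: {seconds / 3600:.1f} h to move {volume_tb} TB")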

Summary
- The first T2 sites need to be actively involved in the Service Challenges from Summer 2005.
- ~All T2 sites need to be successfully integrated just over one year later.
- Adding the T2s and integrating the experiments' software into the SCs will be a massive effort!
- Initial T2s for SC3 have been identified; a longer-term plan is being executed.

Conclusions
- To be ready to fully exploit the LHC, significant resources need to be allocated to a series of Service Challenges by all concerned parties.
- These challenges should be seen as an essential, on-going and long-term commitment to achieving a production LCG.
- The countdown has started – we are already in (pre-)production mode.
- Next stop: 2020.