Summary of Service Challenges

Summary of Service Challenges - Service Challenge Meeting
Based on slides and discussions at the March 15 SC Meeting in Lyon

Agenda
- Status of Service Challenge 2
- Preparations for Service Challenge 3
- Involvement of Experiments in SC3
- Milestones and Workplan

Service Challenge Summary
- "Service Challenge 2" started 14th March
- Infrastructure set up to 7 sites: BNL, CNAF, FNAL, FZK, IN2P3, NIKHEF, RAL
  - 100 MB/s to each site
  - 500 MB/s combined to all sites at the same time
  - 500 MB/s to a few sites individually
- Uses the RADIANT transfer software, still with dummy data (1 GB file size) - see the sketch below
- Goal: by end of March, sustained 500 MB/s at CERN
- BNL status and plans in the next slides
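
For illustration, here is a minimal sketch of what such a multi-site transfer driver has to do. This is not the RADIANT code (which keeps persistent per-site queues, retries and monitoring); the endpoints and paths below are hypothetical placeholders.

```python
# Minimal sketch of a "push dummy 1 GB files to several sites" driver, in the
# spirit of what a tool like RADIANT does during SC2. Endpoints are hypothetical.
import subprocess, time
from concurrent.futures import ThreadPoolExecutor

FILE_SIZE_MB = 1024   # SC2 uses 1 GB dummy files
SOURCE = "gsiftp://castorgrid.cern.ch/castor/cern.ch/sc2/dummy_1GB"   # hypothetical path

# One destination per Tier-1 (hypothetical endpoints, for illustration only)
DESTINATIONS = {
    "BNL":  "gsiftp://dcache.bnl.gov/sc2/",
    "FNAL": "gsiftp://dcache.fnal.gov/sc2/",
    "RAL":  "gsiftp://gftp.ral.ac.uk/sc2/",
}

def push_one(site, dest):
    """Copy one dummy file with globus-url-copy and return (site, MB/s)."""
    start = time.time()
    subprocess.check_call(["globus-url-copy", SOURCE, dest + "dummy_1GB"])
    return site, FILE_SIZE_MB / (time.time() - start)

with ThreadPoolExecutor(max_workers=len(DESTINATIONS)) as pool:
    results = list(pool.map(lambda kv: push_one(*kv), DESTINATIONS.items()))

for site, rate in results:
    print(f"{site}: {rate:.0f} MB/s (SC2 target: 100 MB/s per site)")
# Summing per-file rates is only a rough proxy for the true aggregate rate.
print(f"aggregate: {sum(r for _, r in results):.0f} MB/s (SC2 goal: 500 MB/s at CERN)")
```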

BNL Plans
- Week of 3/14, "Functionality": demonstrate the basic features
  - Verify infrastructure
  - Check SRM interoperability
  - Check load handling (effective distribution of requests over multiple servers)
  - Verify controls (tools that manage the transfers and the cleanup)
- Week of 3/21, "Performance": run SRM/SRM transfers for 12-24 h at 80-100 MB/s
- Week of 3/28, "Robustness": run SRM/SRM transfers for the entire week at ~60% of available capacity (600 Mbit/s?)

BNL Status
- Deployed a dCache testbed for the SC, equipped with:
  - One core node and four pool servers; SRM and gridftp doors
  - 4 x 300 GB = 1.2 TB of disk space
  - Disk-only backend: the production dCache system is interfaced with HPSS, but for this SC phase we prefer not to involve tape access because of the reduced total capacity and conflicts with the critical production schedule
- Successfully tested the following configurations:
  - CERN SRM-CASTOR ---> BNL SRM-dCache
  - CERN SRM-CASTOR ---> BNL GRIDFTP-dCache
  - BNL SRM-dCache ---> CERN SRM-CASTOR
  - BNL GRIDFTP-dCache ---> CERN SRM-CASTOR
- Successfully completed a two-hour test of list-driven SRM third-party transfers between CERN SRM-CASTOR and BNL SRM-dCache (a simplified sketch of such a list-driven copy follows)
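
The real test used the SRM client's own list-driven mode; the sketch below is a simplified stand-in that just reads "source destination" pairs from a text file and invokes srmcp for each. The SURLs shown are hypothetical examples, not the actual test endpoints.

```python
# Simplified stand-in for a list-driven SRM third-party copy test.
import subprocess

COPY_LIST = "sc2_copyjobs.txt"   # hypothetical file: one "src dst" pair per line, e.g.
# srm://castorgrid.cern.ch/castor/cern.ch/sc2/file0001 srm://dcsrm.usatlas.bnl.gov/pnfs/usatlas.bnl.gov/sc2/file0001

def run_copy_list(path):
    failures = []
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            src, dst = line.split()
            # srmcp asks the two SRMs to negotiate a third-party gridftp transfer
            rc = subprocess.call(["srmcp", src, dst])
            if rc != 0:
                failures.append((src, dst, rc))
    return failures

if __name__ == "__main__":
    failed = run_copy_list(COPY_LIST)
    print(f"{len(failed)} transfers failed")
```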

Plans and Timescale (Proposals)
- Monday 14th - Friday 18th: test sites individually; 6 sites, so started early with NL!
  - Monday: IN2P3, Tuesday: FNAL, Wednesday: FZK, Thursday: INFN, Friday: RAL
- Monday 21st - Friday 1st April: test all sites together
- Monday 4th - Friday 8th April: up to 500 MB/s single-site tests (Fermi, FZK, SARA)
- Monday 11th - ...: tape tests? (Fermi, INFN, RAL); overlaps with Easter and the Storage Management workshop at CERN
- This will be updated with BNL as soon as possible

Meetings
- SC2 meetings: phone conference Monday afternoon; daily call with different sites
- Post-SC2 meetings: Tuesday 4pm (starting April)
- Storage Workshop: 5-7 April at CERN
- Pre-SC3 meeting: June 13th-15th at CERN
- Taipei SC meeting: 26th April
- HEPiX at FZK: 9-13 May
- There are also others: weekly SC meetings at CERN; T2 discussions with the UK, next June in Glasgow?; T2 INFN workshop in Bari, 26-27 May; etc.

Longer Term Planning
- Mid April - Mid June: SC2 to SC3 deployment
  - Taipei, BNL (now!), PIC, TRIUMF
- SC3 deployment: SRM tests at FZK, SARA, IN2P3, BNL (end March?)
- CERN tape write at the given rate (300 MB/s), split over 6 sites for remote write

Sep-Dec 2005 - SC3 Service: 50% Computing Model Validation Period
- The service exercised in SC3 is made available to the experiments as a stable, permanent service for computing model tests
- Additional sites are added as they come up to speed
- End-to-end sustained data rates (see the quick conversion below):
  - 500 MB/s at CERN (aggregate)
  - 60 MB/s at Tier-1s
  - Modest Tier-2 traffic
[Timeline graphic spanning 2005-2008: SC2, SC3, cosmics, first beams, first physics, full physics run]
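
To put the sustained-rate targets in perspective, a quick back-of-the-envelope conversion into daily volumes; the Tier-1 count used at the end is illustrative, not a commitment.

```python
# Convert the SC3 sustained-rate targets into daily volumes.
CERN_AGGREGATE_MBS = 500      # MB/s out of CERN (aggregate)
PER_T1_MBS = 60               # MB/s into each Tier-1
SECONDS_PER_DAY = 86400

cern_tb_per_day = CERN_AGGREGATE_MBS * SECONDS_PER_DAY / 1e6   # MB -> TB
t1_tb_per_day = PER_T1_MBS * SECONDS_PER_DAY / 1e6

print(f"CERN aggregate: {cern_tb_per_day:.1f} TB/day sustained")
print(f"Each Tier-1:    {t1_tb_per_day:.1f} TB/day sustained")
# e.g. with ~7 Tier-1s at 60 MB/s each, the aggregate demand is ~420 MB/s,
# consistent with the 500 MB/s CERN target plus some headroom.
print(f"7 Tier-1s together: {7 * PER_T1_MBS} MB/s")
```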

SC3 - Milestone Decomposition
- File transfer goals:
  - Build up disk-disk transfer speeds to 150 MB/s (SC2 was 100 MB/s, agreed per site)
  - Include tape, with transfer speeds of 60 MB/s
- Tier-1 goals:
  - Bring in additional Tier-1 sites with respect to SC2
  - PIC and Nordic most likely added later: SC4? US-ALICE T1? Others?
- Tier-2 goals:
  - Start to bring Tier-2 sites into the challenge
  - Agree on the services T2s offer / require
  - On-going plan (more later) to address this via GridPP, INFN, etc.
- Experiment goals:
  - Address the main offline use cases except those related to analysis, i.e. real data flow out T0-T1-T2 and simulation in from T2-T1
- Service goals:
  - Include CPU (to generate files) and storage
  - Start to add additional components: catalogs, VOs, experiment-specific solutions, 3D involvement, ...
  - Choice of software components, validation, fallback, ...

LCG Service Challenges: Tier2 Issues

A Simple T2 Model (N.B. this may vary from region to region)
- Each T2 is configured to upload MC data to, and download data via, a given T1
- If that T1 is logically unavailable, wait and retry (a toy sketch of this policy follows after this list)
  - MC production might eventually stall
  - For data download, retrieve via an alternate route / T1, which may well be slower, but this should hopefully be rare
- Data residing at a T1 other than the 'preferred' T1 is transparently delivered through the appropriate network route
- T1s need interconnectivity at least as good as to the T0 (?)
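
As a toy illustration only: the transfer() helper below is a hypothetical placeholder for whatever copy tool the site actually uses (gridftp, SRM copy, a transfer service, ...), and the site names are illustrative.

```python
# Toy model of the T2 upload/download policy: prefer the associated T1,
# wait and retry while it is unavailable, and for downloads fall back to an
# alternate (likely slower) T1.
import time

PREFERRED_T1 = "IN2P3"            # illustrative association
ALTERNATE_T1S = ["CNAF", "RAL"]   # illustrative fallbacks

def transfer(path, site, direction):
    """Placeholder: return True if the copy to/from 'site' succeeded."""
    raise NotImplementedError("plug in the site's actual transfer tool here")

def upload_mc(path, retries=10, wait=600):
    # MC upload only goes to the preferred T1; if it stays down long enough,
    # local buffers fill and production eventually stalls.
    for _ in range(retries):
        if transfer(path, PREFERRED_T1, "upload"):
            return True
        time.sleep(wait)
    return False   # production stalls

def download_data(path):
    # Downloads try the preferred T1 first, then the alternates.
    for site in [PREFERRED_T1] + ALTERNATE_T1S:
        if transfer(path, site, "download"):
            return site
    return None
```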

Basic T2 Services
- T2s will need to provide services for data upload and download
- Assume that this uses the same components as between T0 and T1s
- Assume that this also includes an SRM interface to a local disk pool manager
  - This / these should / could also be provided through LCG
- Networking requirements need further study, but current estimates suggest 1 Gbit connections will satisfy the needs of a given experiment (a rough conversion is shown below)
  - This can be dramatically affected by the analysis model, heavy-ion data processing, etc.
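
A rough conversion of that 1 Gbit figure into usable throughput and daily volume; the usable fraction is an assumption, since real links rarely deliver their nominal rate.

```python
# Rough sanity check on the "1 Gbit will satisfy a given experiment" estimate.
LINK_GBPS = 1.0
USABLE_FRACTION = 0.8   # assumed protocol/sharing overhead

usable_mbs = LINK_GBPS * 1000 / 8 * USABLE_FRACTION   # Gbit/s -> MB/s
print(f"usable rate: ~{usable_mbs:.0f} MB/s")
print(f"daily volume: ~{usable_mbs * 86400 / 1e6:.1f} TB/day")
# ~100 MB/s and ~8-9 TB/day per T2 link, which is why analysis-heavy or
# heavy-ion workloads could change the picture dramatically.
```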

Which Candidate T2 Sites?
- It would be useful to have:
  - Good local support from the relevant experiment(s)
  - Some experience with disk pool manager and file transfer software
  - 'Sufficient' local CPU and storage resources
  - Manpower available to participate in SC3+, and also to define relevant objectives?
  - 1 Gbit/s network connection to the T1? Probably a luxury at this stage...
- The first T2 site(s) will no doubt be a learning process
- We hope to (semi-)automate this so that adding new sites can be achieved with low overhead

Possible T2 Procedure(?)
- Work through existing structures where possible
  - INFN in Italy, GridPP in the UK, HEPiX, etc.; USATLAS, USCMS, etc.
- Probably also need some 'regional' events, complemented with workshops at CERN and elsewhere
- Work has already started with GridPP + INFN; other candidate T2s next
- Choosing T2s for SC3 together with the experiments
- Strong interest also expressed from US sites
  - Work through BNL and FNAL / US-ATLAS and US-CMS + ALICE
  - Are US T2 sites a solved issue?

Tier-2s
- New version of the 'Joining the SC as a Tier-1' document; test with DESY/FZK in April
- 'How to be a Tier-2' document
- Candidate Tier-2s: DESY (CMS + ATLAS), Lancaster (ATLAS), London (CMS), Torino (ALICE), ScotGrid (LHCb)
- Each Tier-2 is associated with a Tier-1, which is responsible for getting them set up
- Services at a Tier-2 are managed storage and reliable file transfer

File Transfer Software for SC3
Gavin McCance, JRA1 Data Management Cluster
Service Challenge Meeting, March 15, 2005, Lyon, France

Reliable File Transfer Requirements
- LCG created a set of requirements based on the Robust Data Transfer Service Challenge
- The LCG and gLite teams translated this into a detailed architecture and design document for the software and the service
- A prototype (radiant) was created to test out the architecture and was used in SC1 and SC2
- gLite FTS ("File Transfer Service") is an instantiation of the same architecture and design, and is the candidate for use in SC3 (a toy sketch of the shared channel/queue model follows)
- The current versions of FTS and the SC2 radiant software are interoperable
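
As background, the shared architecture is channel based: transfer jobs are queued on managed site-to-site channels and worked off up to a per-channel concurrency limit. The sketch below is a toy model of that idea only, not the real service, which persists its state in a database and exposes web-service and command-line interfaces.

```python
# Toy model of a channel-based transfer queue.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TransferJob:
    source_surl: str
    dest_surl: str
    state: str = "Submitted"       # Submitted -> Active -> Done / Failed

@dataclass
class Channel:
    name: str                      # e.g. "CERN-RAL", a managed point-to-point link
    max_active: int = 10           # concurrency limit tuned per link
    queue: deque = field(default_factory=deque)
    active: list = field(default_factory=list)

    def submit(self, job: TransferJob):
        self.queue.append(job)

    def schedule(self):
        # The channel agent promotes queued jobs up to the concurrency limit.
        while self.queue and len(self.active) < self.max_active:
            job = self.queue.popleft()
            job.state = "Active"
            self.active.append(job)
            # ... here the real agent would launch an SRM/gridftp copy ...

channel = Channel("CERN-RAL", max_active=5)
channel.submit(TransferJob("srm://cern.example/f1", "srm://ral.example/f1"))
channel.schedule()
print([j.state for j in channel.active])
```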

Did we do it?
- The goal was to demonstrate ~1 week of unattended continuous running of the software for the SC meeting
- Setup 1: demonstrated 1 week of ~continuous running, but 2 interventions were necessary
- Setup 2 (higher bandwidth, newer version of the software): after a false start, has been stable for a few days; 1 intervention was necessary (same cause as on setup 1)
- Issues were uncovered and bugs found (and fixed!)
- We now have a test setup we can use to keep stressing the software

Test setups
- The plan is to maintain these test setups 24/7; they have already been vital in finding issues we could not find on more limited test setups
- Test 1 (low bandwidth) should be used for testing new versions of the software
  - Make sure no bugs have been introduced
  - Try to maintain a stable service level, but upgrade to test new versions
- Test 2 (higher bandwidth) should be used to demonstrate stability and performance
  - Maintain the service level
  - Upgrade less frequently, only when new versions have demonstrated stability on the smaller test setup
  - Ramp up this test setup as we head to SC3: more machines, more sites, defined operating procedures

LCG Service Challenges: Experiment Plans

Experiment involvement
- SC1 and SC2 do not involve the experiments
- At the highest level, SC3 tests all offline use cases except for analysis
- Future Service Challenges will also include analysis
  - This adds a number of significant issues over and above robust file transfer!
  - It must be discussed and planned now

ALICE vs. LCG Service Challenges
- We could:
  - Sample (bunches of) "RAW" events stored at T0 from our catalogue
  - Reconstruct at T0
  - Ship from T0 to the T1s
  - Reconstruct at the T1s with calibration data
  - Store/catalogue the output
- As soon as T2s start to join SC3:
  - Keep going with the reconstruction steps, plus:
  - Simulate events at the T2s
  - Ship from the T2s to the T1s
  - Reconstruct at the T1s and store/catalogue the output

ALICE vs. LCG Service Challenges
- What would we need for SC3?
  - AliRoot deployed on LCG/SC3 sites (ALICE)
  - Our AliEn server (ALICE), with:
    - a task queue for SC3 jobs
    - a catalogue to sample existing MC events and mimic raw data generation from the DAQ
  - UI(s) for submission to LCG/SC3 (LCG)
  - WMS + CE/SE services on SC3 (LCG)
  - An appropriate amount of storage resources (LCG)
  - Appropriate JDL files for the different tasks (ALICE) - a generic illustration follows below
  - Access to the ALICE AliEn Data Catalogue from LCG
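
To illustrate what "appropriate JDL files for the different tasks" could look like, here is a generic, hypothetical LCG-style JDL for a reconstruction job, generated from Python so per-run parameters can be filled in. The script names, attributes and requirement are illustrative assumptions; the actual ALICE JDLs are defined by the experiment and will differ.

```python
# Generate a hypothetical reconstruction-job JDL (illustration only).
JDL_TEMPLATE = """\
Executable    = "aliroot_rec.sh";
Arguments     = "{run_number}";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {{"aliroot_rec.sh", "rec_config_{run_number}.C"}};
OutputSandbox = {{"std.out", "std.err"}};
Requirements  = other.GlueCEPolicyMaxWallClockTime > 720;
"""

def write_jdl(run_number, path=None):
    path = path or f"rec_{run_number}.jdl"
    with open(path, "w") as f:
        f.write(JDL_TEMPLATE.format(run_number=run_number))
    return path

print(open(write_jdl(1234)).read())
```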

ALICE vs. LCG Service Challenges
- Last step: try the analysis of the reconstructed data
- That is SC4: we have some more time to think about it

ATLAS & SC3
- July: SC3 phase 1, infrastructure performance demonstration
  - Little direct ATLAS involvement
  - ATLAS observes the performance of components and services to guide future adoption decisions
  - Input from the ATLAS database group

ATLAS & SC3
- September: SC3 phase 2, experiment testing of the computing model
  - Running 'real' software production and data management systems, but working with throw-away data
  - ATLAS production involvement
  - Release 11 scaling debugging: debug scaling for distributed conditions data access, calibration/alignment, DDM, event data distribution and discovery
  - T0 exercise testing

ATLAS & SC3
- Mid-October: SC3 phase 3, service phase: becomes a production facility
  - Production, adding more T2s
  - Operations at SC3 scale, producing and distributing useful data
  - New DDM system deployed and operating
  - Conduct a distributed calibration/alignment scaling test
  - Conduct the T0 exercise
  - Progressively integrate new tier centres into the DDM system
  - After the T0 exercise, move to steady-state operations for T0 processing and data handling workflows

CMS & SC3
- Would like to use the SC3 'service' as soon as possible
- Are already shipping data around and processing it, as foreseen for the SC3 service phase
- More details in e.g. the FNAL SC presentation

Service Challenge III
- Stated goal by LCG: achieve 50% of the nominal data rate
  - Evaluating tape capacity to achieve the challenge while still supporting the experiment
- Add significant degrees of additional complexity:
  - File catalogs: OSG currently has only experiment-supported catalogs, and CMS has a prototype data management system at the same time, so exactly what this means within the experiment needs to be understood
  - VO management software: anxious to reconcile the VO management infrastructure in time for the challenge; within OSG we have started using some of the more advanced functionality of VOMS that we would clearly like to maintain
- Uses CMS's offline frameworks to generate the data and drive the data movement
- Requirements are still under negotiation

LHCb
- Would like to see a robust infrastructure before getting involved; timescale: October 2005
- Expecting LCG to provide somewhat more than the other experiments do, i.e. higher-level functionality than the file transfer service

Experiment plans - Summary
- SC3 phases:
  - Setup and configuration: July + August
  - Experiment software with throwaway data: September
  - Service phase: ATLAS mid-October, ALICE July would be best..., LHCb post-October, CMS July (or sooner)
- Tier-0 exercise, distribution to Tier-1, ...

Milestones and Workplan
- Presentation to the PEB next Tuesday
- Draft SC TDR chapter on the web: http://agenda.cern.ch/askArchive.php?base=agenda&categ=a051047&id=a051047s1t3%2Fmoreinfo%2FService-Challenge-TDR.doc
  - Will be updated following the discussions this week
- Need to react as soon as possible to the experiments' SC3 plans
- More discussions with T1s and T2s on the implications of experiment involvement in SC3

Summary
- We continue to learn a lot from the ongoing SCs
  - In practice, even 'straightforward' things involve significant effort, negotiation, etc., e.g. the network configuration at CERN for SC2/SC3
- There is a clear high-level roadmap for SC3 and even SC4
  - But a frightening amount of work is buried in the detail...
  - We should not underestimate the additional complexity of SC3/4 over SC1/2!
- Thanks for all the great participation and help!