USATLAS SC4


Presentation transcript:


2 The same host name for a dual-NIC dCache door resolves to different IP addresses depending on which DNS server is queried.



5 Meeting Notes
- Use dual-homed dCache doors.
  - The external interfaces of the doors are in [...].
  - The internal interfaces of the doors are in [...].
  - All data flow (in and out) always goes through the doors.
- Use external and internal DNS to resolve the same door host name to the external or the internal IP address, depending on which DNS is used.
- Bring the routing for [...] and [...]/23 back to USATLAS SW7.
- Request an ACL for VLAN 315(?), which resides [...].
  - One end: LHC OPN address blocks or the 3+2 Tier 2s.
  - The other end will be [...].
- What about other T3 sites that contact the external interface of the dCache doors? Do they need to go through the firewall or not?
- Two types of storage (Durable and Permanent)
  - When we receive ESD2, ESD1 will be discarded, so we do not need to save ESD to HPSS. If we need them, we can get them from Tier 0 and other Tier 1 sites.
  - RAW, our fraction of ESD, AOD, and Tier 2 simulation results go to Permanent storage, which has a tape back end.
  - Other ESD and AOD go to Durable storage, which is not necessarily backed up by the tape system.
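The external/internal DNS arrangement in these notes is often called split-horizon DNS. A minimal sketch of the idea follows. Plain dictionaries stand in for the two DNS servers, and the host name and both addresses are hypothetical placeholders (the real address blocks are not given in the notes).

```python
# Hypothetical door name and addresses, for illustration only.
# 192.0.2.0/24 is a documentation range; 10.0.0.0/8 is private address space.
DOOR = "door01.example.org"

EXTERNAL_DNS = {DOOR: "192.0.2.10"}  # view answered to off-site clients
INTERNAL_DNS = {DOOR: "10.0.0.10"}   # view answered to on-site clients


def resolve(hostname: str, view: dict[str, str]) -> str:
    """Look up a host name in one DNS view (a dict stands in for a server)."""
    try:
        return view[hostname]
    except KeyError:
        raise LookupError(f"{hostname} not found in this view") from None


if __name__ == "__main__":
    # Same name, different answer, depending on which "server" is asked.
    print("external:", resolve(DOOR, EXTERNAL_DNS))
    print("internal:", resolve(DOOR, INTERNAL_DNS))
```

Because clients only ever see one name, the same door configuration can be published to both on-site and off-site users while traffic stays on the appropriate interface.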

6 BNL SC4 Plans
- Can VLAN 315 send network traffic?
- FTS and LFC will be set up.
- LCG
  - VObox: ATLAS DQ2 is also installed on top of it (done).
  - BDII provides static and dynamic monitoring information (static setup?).
  - R-GMA provides traffic monitoring from Tier 1 to Tier 2 (planned to be available before the SC4 service phase).
  - The CE is based on the BNL Condor system (planned to be ready before the SC4 service phase in June).
  - lcg-utils (done).
- dCache preparation (Durable, Permanent, information publishing).
  - Permanent: the system manages the cache; there is a tape copy; access is sometimes slow.
  - Durable: the user (VO) manages the cache; there is no tape copy; access is fast.
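The Permanent/Durable split can be written down as a small placement rule. The sketch below is one reading of the policy in the meeting notes (RAW, BNL's fraction of ESD and AOD, and Tier 2 simulation results go to Permanent; other ESD and AOD go to Durable). The `Dataset` type, the kind labels and the `bnl_share` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

PERMANENT = "permanent"  # tape back end, system-managed cache, access sometimes slow
DURABLE = "durable"      # no tape copy, VO-managed cache, fast access


@dataclass(frozen=True)
class Dataset:
    kind: str        # hypothetical labels: "RAW", "ESD", "AOD" or "SIM"
    bnl_share: bool  # True if the data is part of the fraction BNL keeps on tape


def storage_class(ds: Dataset) -> str:
    """Pick a storage class following the policy stated in the meeting notes."""
    if ds.kind in ("RAW", "SIM"):
        return PERMANENT
    if ds.kind in ("ESD", "AOD"):
        return PERMANENT if ds.bnl_share else DURABLE
    raise ValueError(f"unknown data kind: {ds.kind!r}")


if __name__ == "__main__":
    for ds in (Dataset("RAW", True), Dataset("ESD", True), Dataset("ESD", False)):
        print(ds, "->", storage_class(ds))
```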

7 Publish Information for BNL dCache
- List of transfer protocols per SE, available from the information system.
  - SRM knows what it supports and can inform the client.
- FTS channel information.
- LFC information.

    dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain,...
    [...]
    GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
    GlueSAPath: /pnfs/my_domain/durable-path/dteam
    GlueSAType: durable
    [...]
    GlueChunkKey: GlueSEUniqueID=dcache.my_domain
    [...]

    dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain,...
    [...]
    GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
    GlueSAPath: /pnfs/my_domain/permanent-path/dteam
    GlueSAType: permanent
    [...]
    GlueChunkKey: GlueSEUniqueID=dcache.my_domain
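To show how a client might use the published storage-area records, here is a minimal sketch that reads entries shaped like the ones on this slide and maps each `GlueSAType` to its `GlueSAPath`. It is a simplified line parser, not a full LDIF implementation; the attributes the slide elides with `[...]` are simply left out of the sample, and the function names are hypothetical.

```python
SAMPLE = """\
dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain
GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable
GlueChunkKey: GlueSEUniqueID=dcache.my_domain

dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain
GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
"""


def parse_entries(text: str) -> list[dict[str, str]]:
    """Split blank-line-separated records into attribute dictionaries."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:  # skip lines that are not "key: value"
                entry[key] = value
        if entry:
            entries.append(entry)
    return entries


def paths_by_type(entries: list[dict[str, str]]) -> dict[str, str]:
    """Map each storage type (durable, permanent) to its published path."""
    return {
        e["GlueSAType"]: e["GlueSAPath"]
        for e in entries
        if "GlueSAType" in e and "GlueSAPath" in e
    }


if __name__ == "__main__":
    print(paths_by_type(parse_entries(SAMPLE)))
```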

8 SC4 Pre-Production System  Pre-production service will be used as soon as it is available and its usage won't go away when SC4 starts. There may be periods where the pre-production service is not extensively used, but the goal is from now on to always develop against the pre-production service.

9 SC4 April Throughput
- Need dCache!!!
- April 3rd (Monday) to April 13th (Thursday before Easter): sustain an average daily rate to each Tier 1 at or above the full nominal rate (200 MB/s).
- We should continue to run at the same rates, unattended, over the Easter weekend ([...] April).
- Tuesday April 18th to Monday April 24th: perform the tape tests at the rates in the table below (75 MB/s).
- From after the con-call on Monday April 24th until the end of the month, experiment-driven transfers can be scheduled (LFC will be needed by then for DQ2).
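The disk target is stated as a sustained daily average, so a simple check is to compare each day's average against 200 MB/s. The sketch below does that and also converts a sustained rate into a daily volume, assuming decimal units (1 TB = 10^6 MB). The sample numbers are made up for illustration and are not measurements.

```python
NOMINAL_MB_S = 200.0     # disk target per Tier 1, taken from the slide
SECONDS_PER_DAY = 86_400


def daily_volume_tb(rate_mb_s: float) -> float:
    """Volume moved in one day at a sustained rate, in decimal TB."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6


def days_below_target(
    daily_avg_mb_s: dict[str, float], target: float = NOMINAL_MB_S
) -> list[str]:
    """Return the days whose average rate fell short of the target."""
    return [day for day, rate in daily_avg_mb_s.items() if rate < target]


if __name__ == "__main__":
    # Made-up daily averages, for illustration only.
    sample = {"04-03": 212.0, "04-04": 187.5, "04-05": 200.0}
    print("days below target:", days_below_target(sample))
    print("TB/day at nominal rate:", daily_volume_tb(NOMINAL_MB_S))
```

At the nominal rate this works out to a little over 17 TB per day per Tier 1, which gives a sense of the disk buffer needed to ride out an unattended holiday weekend.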

10 SC4 Tier 1 to Tier 1 Data Transfer (May)
- Within each VO, the details of the T1<->T1 transfers still need to be finalized. A "dTeam" phase should be foreseen to ensure that the basic infrastructure is set up. Similarly for T1->T2. A possible scenario follows:
- We have to focus first on our two sister Tier 1 sites: IN2P3 and FZK.
- All Tier 1s need to set up an FTS service and configure channels to enable transfers to/from all other Tier 1s.
- dTeam transfers at 5 MB/s (10 MB/s?) need to be demonstrated between each T1 and all other T1s.
- These tests would take place during May, after the April throughput tests and before the SC4 service begins in June.
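Requiring transfers between each Tier 1 and all others means one directed channel per ordered pair of sites, that is n(n-1) channels for n sites. The sketch below enumerates them for the three sites named on this slide. The `SRC-DST` label format is an assumption for illustration, and a real deployment would list every Tier 1.

```python
from itertools import permutations


def fts_channels(tier1s: list[str]) -> list[str]:
    """One directed channel label per ordered pair of distinct sites."""
    return [f"{src}-{dst}" for src, dst in permutations(tier1s, 2)]


def channels_for(site: str, channels: list[str]) -> list[str]:
    """Channels that start or end at the given site."""
    return [c for c in channels if site in c.split("-")]


if __name__ == "__main__":
    sites = ["BNL", "IN2P3", "FZK"]  # BNL and its two sister sites
    channels = fts_channels(sites)
    print(len(channels), "channels:", channels)
    print("BNL channels:", channels_for("BNL", channels))
```

The quadratic growth in channel count is one reason to start with the two sister sites before enabling the full mesh.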

11 ATLAS Specific Plan
- Plans (ATLAS)
- Tier 2 Plans
- Tier 2 Workshop
- Background Information (Dario's Slides)

12 Summary of requests from ATLAS
- March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
- April-May (pre-SC4): tests of distributed operations on a “small” testbed (PPS)
- Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720 MB/s + full ESD to BNL), and send AODs to (at least) a few Tier-2s
- 3 weeks in July: distributed processing tests (Part 1)
- 2 weeks in July-August: distributed analysis tests (Part 1)
- 3-4 weeks in September-October: Tier-0 test (Phase 2) with data to Tier-2s
- 3 weeks in October: distributed processing tests (Part 2)
- 3-4 weeks in November: distributed analysis tests (Part 2)

13 Tier 2 Plans
- The details of involving the Tier 2s are still being planned.
- Tier 2 dCache: dCache needs to be stable and operational at one or all of the Midwest, Southwest and Northwest sites (first week of June) in order to receive AODs at (at least) a few Tier-2s.
- All Tier 2 dCache instances should be up and in production in September.
  - Extend data distribution to all (most) Tier-2s
  - Use 3D tools to distribute calibration data
- Baseline client tools should be deployed at Tier 2 centers.
- No services other than SRM and DQ2 are required at Tier 2.

14 WLCG Tier 2 Workshop
- From Monday 12 June 2006 (11:00) to Wednesday 14 June 2006 (18:00) at CERN (Council Chamber)
- Introduction to the activities of the four experiments
- MC simulation use cases
- An overview of calibration and alignment
- Analysis use cases
- Services required at / for Tier 2s (Grid, application)
- Support and operation issues
- Takes place in the middle of June.

ATLAS plans for 2006: Computing System Commissioning and Service Challenge 4 Dario Barberis CERN & Genoa University

16 Computing System Commissioning Goals
- The main aim of Computing System Commissioning is to test the software and computing infrastructure that we will need at the beginning of 2007:
  - Calibration and alignment procedures and conditions DB
  - Full trigger chain
  - Event reconstruction and data distribution
  - Distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates

17 ATLAS Computing Model
- Tier-0:
  - Copy RAW data to Castor tape for archival
  - Copy RAW data to Tier-1s for storage and reprocessing
  - Run first-pass calibration/alignment (within 24 hrs)
  - Run first-pass reconstruction (within 48 hrs)
  - Distribute reconstruction output (ESDs, AODs & TAGS) to Tier-1s
- Tier-1s:
  - Store and take care of a fraction of RAW data
  - Run “slow” calibration/alignment procedures
  - Rerun reconstruction with better calib/align and/or algorithms
  - Distribute reconstruction output to Tier-2s
  - Keep current versions of ESDs and AODs on disk for analysis
- Tier-2s:
  - Run simulation
  - Keep current versions of AODs on disk for analysis

18 ATLAS Tier-0 Data Flow
Nodes in the diagram: EF, Castor buffer, CPU farm, Tape, T1s.

Stream  File size    File rate  Files/day   Rate      Volume
RAW     1.6 GB/file  0.2 Hz     17K f/day   320 MB/s  27 TB/day
ESD     0.5 GB/file  0.2 Hz     17K f/day   100 MB/s  8 TB/day
AOD     10 MB/file   2 Hz       170K f/day  20 MB/s   1.6 TB/day
AODm    500 MB/file  0.04 Hz    3.4K f/day  20 MB/s   1.6 TB/day

Flow labels in the diagram: RAW, AOD; RAW, ESD (2x), AODm (10x); RAW, ESD, AODm.
Aggregate figures in the diagram:
- 0.44 Hz, 37K f/day, 440 MB/s
- 1 Hz, 85K f/day, 720 MB/s
- 0.4 Hz, 190K f/day, 340 MB/s
- 2.24 Hz, 170K f/day (temp), 20K f/day (perm), 140 MB/s
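The per-stream figures on this slide follow from file size times file rate. The sketch below recomputes them, assuming decimal units (1 GB = 1000 MB, 1 TB = 10^6 MB). It also shows that the 440 MB/s and 720 MB/s aggregates are consistent with RAW + ESD + AODm and RAW + 2 x ESD + 10 x AODm respectively; that pairing is inferred from the arithmetic and the flow labels, not stated explicitly in the transcript.

```python
SECONDS_PER_DAY = 86_400

# (file size in MB, file rate in Hz), taken from the slide
STREAMS = {
    "RAW": (1600.0, 0.2),
    "ESD": (500.0, 0.2),
    "AOD": (10.0, 2.0),
    "AODm": (500.0, 0.04),
}


def rate_mb_s(size_mb: float, hz: float) -> float:
    """Bandwidth of a stream: file size times files per second."""
    return size_mb * hz


def files_per_day(hz: float) -> float:
    return hz * SECONDS_PER_DAY


def tb_per_day(mb_s: float) -> float:
    return mb_s * SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    rates = {name: rate_mb_s(*spec) for name, spec in STREAMS.items()}
    for name, (size, hz) in STREAMS.items():
        print(f"{name:5s} {rates[name]:6.1f} MB/s "
              f"{files_per_day(hz):9.0f} f/day {tb_per_day(rates[name]):6.2f} TB/day")
    # Inferred pairings of aggregate figures with flows (an assumption).
    print("RAW + ESD + AODm       :", rates["RAW"] + rates["ESD"] + rates["AODm"])
    print("RAW + 2*ESD + 10*AODm  :", rates["RAW"] + 2 * rates["ESD"] + 10 * rates["AODm"])
```

The slide rounds the results (for example 17K files per day and 27 TB per day for RAW), so small differences from the computed values are expected.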

19 Recent Update for Tier 0 to Tier 1 Data Transfer

20 BNL Data Flow (2008, Based on 20%)
Real data storage, reprocessing and distribution. Plus simulation & analysis data flow.
Nodes in the diagram: Tier-0, CPU farm, Other Tier-1s, BNL disk buffer, BNL Tape, disk storage, Tier-2s.
Flow figures, in the order they appear in the diagram ([...] marks a value missing from the transcript):

Stream          File size    File rate  Files/day   Rate       Volume
RAW             1.6 GB/file  0.04 Hz    3.4K f/day  64 MB/s    5.4 TB/day
ESD2            0.5 GB/file  0.04 Hz    3.4K f/day  20 MB/s    1.6 TB/day
AOD2            10 MB/file   0.4 Hz     34K f/day   4 MB/s     0.32 TB/day
AODm2           500 MB/file  [...] Hz   0.34K f/day 4 MB/s     0.32 TB/day
RAW ESD2 AODm                [...] Hz   7.48K f/day 88 MB/s    7.32 TB/day
RAW             1.6 GB/file  0.04 Hz    3.4K f/day  64 MB/s    5.4 TB/day
AODm2           500 MB/file  [...] Hz   0.68K f/day 4 MB/s     0.32 TB/day
ESD2            0.5 GB/file  0.04 Hz    3.4K f/day  20 MB/s    1.6 TB/day
AOD2            10 MB/file   0.4 Hz     34K f/day   4 MB/s     0.32 TB/day
ESD2            0.5 GB/file  0.02 Hz    1.7K f/day  80 MB/s    0.8 TB/day
AODm2           500 MB/file  0.03 Hz    3.0K f/day  16 MB/s    1.44 TB/day
ESD2            0.5 GB/file  0.02 Hz    1.7K f/day  20 MB/s    1.6 TB/day
AODm2           500 MB/file  [...] Hz   3.1K f/day  4*9 MB/s   1.44 TB/day
ESD1            0.5 GB/file  0.2 Hz     17K f/day   100 MB/s   8 TB/day
AODm1           500 MB/file  0.04 Hz    3.4K f/day  20 MB/s    1.6 TB/day
AODm1           500 MB/file  0.04 Hz    3.4K f/day  20 MB/s*3  1.6 TB/day
AODm2           500 MB/file  [...] Hz   0.70 f/day  4 MB/s*    [...] TB/day

Analysis: 234MB*n
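Several of the BNL figures follow from applying the 20% share in the slide title to the Tier-0 output rates (320, 100, 20 and 20 MB/s for RAW, ESD, AOD and AODm). The sketch below reproduces the 64, 20, 4 and 4 MB/s entries and the 88 MB/s aggregate for RAW + ESD2 + AODm2. Treating every stream with the same flat share is a simplifying assumption, and the function names are hypothetical.

```python
# Tier-0 output rates in MB/s, from the Tier-0 data flow slide
TIER0_MB_S = {"RAW": 320.0, "ESD": 100.0, "AOD": 20.0, "AODm": 20.0}
BNL_SHARE = 0.20  # "Based on 20%" in the slide title


def bnl_rates(share: float = BNL_SHARE) -> dict[str, float]:
    """Scale each Tier-0 stream by the site's share."""
    return {name: rate * share for name, rate in TIER0_MB_S.items()}


def inbound_total(rates: dict[str, float]) -> float:
    """Aggregate labelled RAW + ESD2 + AODm on the slide."""
    return rates["RAW"] + rates["ESD"] + rates["AODm"]


if __name__ == "__main__":
    rates = bnl_rates()
    for name, rate in rates.items():
        print(f"{name:5s} {rate:5.1f} MB/s")
    print("RAW + ESD + AODm:", inbound_total(rates), "MB/s")
```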

21 BNL to 3+2 Tier2 (Estimation!)
- Tier 1 to Tier 2 traffic is likely to be very bursty and driven by analysis demands.
- Network bandwidth to each Tier 2 is expected to be a fraction of 10 Gbps (at UC, 30% of 10 Gbps is allocated; opportunistic usage may bump this up to 10 Gbps).
- We want to reach 100 MB/s for each of the 3+2 Tier 2 clusters.
- 300 MB/s to 500 MB/s in total to BNL.
- Tier 2 to Tier 1 transfers are almost entirely continuous simulation transfers.
- The aggregate input rate to the Tier 1 center is comparable to 20%-25% of the rate from Tier 0.
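To relate the link allocations (in Gbps) to the transfer targets (in MB/s), a short conversion helps. The sketch below assumes decimal units and 8 bits per byte, and ignores protocol overhead, so the results are upper bounds rather than achievable rates.

```python
def gbps_to_mb_s(gbps: float) -> float:
    """Convert a link rate in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbps * 1000.0 / 8.0


def aggregate_mb_s(per_site_mb_s: float, n_sites: int) -> float:
    """Total rate if every site reaches the per-site target at once."""
    return per_site_mb_s * n_sites


if __name__ == "__main__":
    allocated = gbps_to_mb_s(0.30 * 10)  # 30% of a 10 Gbps link
    print("30% of 10 Gbps in MB/s:", allocated)
    print("target of 100 MB/s fits:", 100.0 <= allocated)
    print("3 sites:", aggregate_mb_s(100.0, 3), "MB/s")
    print("3+2 sites:", aggregate_mb_s(100.0, 5), "MB/s")
```

Under these assumptions a 30% allocation leaves headroom above the 100 MB/s per-site target, and the 300 to 500 MB/s total corresponds to three or all five Tier 2 clusters running at that target simultaneously.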

22 BNL Data Flow (2008)
Nodes in the diagram: Tier-0, Tier-1, BNL write buffer, BNL tape, read storage, CPU farm, Tier-2s.
Per-file figures shown ([...] marks a value missing from the transcript):
- AODm2: 500 MB/file, [...] Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day
- ESD2: 0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day
Flow labels: ESD1 100MB; AODM1 20MB; RAW 64MB; ESD2 80MB (80% EST from T1s); AODM2 16MB; ESD2 20MB; AODM2 36MB.
Aggregate labels: 88MB (RAW, ESD, AOD); 350MB (including raw data); (Analysis AOD) 500MB; 200MB(?); (304MB*20%) ~60MB; Simu 60MB (Tier 2).

23 ATLAS SC4 Tests
- Complete Tier-0 test
  - Internal data transfer from “Event Filter” farm to Castor disk pool, Castor tape, CPU farm
  - Calibration loop and handling of conditions data
    - Including distribution of conditions data to Tier-1s (and Tier-2s)
  - Transfer of RAW, ESD, AOD and TAG data to Tier-1s
  - Transfer of AOD and TAG data to Tier-2s via Tier 1
  - Data and dataset registration in DB (add meta-data information to meta-data DB)
- Distributed production
  - Full simulation chain run at Tier-2s (and Tier-1s)
    - Data distribution to Tier-1s, other Tier-2s and CAF
  - Reprocessing raw data at Tier-1s
    - Data distribution to other Tier-1s, Tier-2s and CAF
- Distributed analysis
  - “Random” job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  - Tests of performance of job submission, distribution and output retrieval

24 ATLAS SC4 Plans (1)
- Tier-0 data flow tests:
  - Phase 0: 3-4 weeks in March-April for internal Tier-0 tests
  - Phase 1: last 3 weeks of June with data distribution to Tier-1s
    - Run integrated data flow tests using the SC4 infrastructure for data distribution
    - Send AODs to (at least) a few Tier-2s
    - Automatic operation for O(1 week)
    - First version of shifter’s interface tools
    - Treatment of error conditions
  - Phase 2: 3-4 weeks in September-October
    - Extend data distribution to all (most) Tier-2s
    - Use 3D tools to distribute calibration data

25 ATLAS SC4 Plans (2)
- ATLAS includes continuous distributed simulation productions (Kaushik)
- SC4: distributed reprocessing tests:
  - Test of the computing model using the SC4 data management infrastructure
    - Needs file transfer capabilities between Tier-1s and back to CERN CAF
    - Also distribution of conditions data to Tier-1s (3D)
    - Storage management is also an issue
  - Could use 3 weeks in July and 3 weeks in October
- SC4: distributed simulation intensive tests:
  - Once reprocessing tests are OK, we can use the same infrastructure to implement our computing model for simulation productions
    - As they would use the same setup both from our ProdSys and the SC4 side
  - First separately, then concurrently.

26 Overview of requirements for SC4
- SRM (“baseline version”) on all storage systems
- VO Box per Tier-1 and in Tier-0
- LFC server per Tier-1 and in Tier-0
- FTS server per Tier-1 and in Tier-0
- Permanent Storage and Durable Storage
  - Separate SRM entry points for permanent and durable storage
- Disk space is managed by DQ2
  - Counts as online (“disk”) data in the ATLAS Computing Model
- Ability to install FTS ATLAS VO agents on Tier-1 and Tier-0 VO Box
- Ability to deploy DQ2 services on VO Box as during SC3
- No new requirements on the Tier-2s besides SRM SE

27 Overview of FTS and VO Box
- Hence, an ATLAS VO Box will contain:
  - FTS ATLAS agents
  - The remaining DQ2 persistent services (less s/w than for SC3, as some functionality has been merged into FTS in the form of FTS VO agents)
- DQ2 site services will have associated SFTs for testing

28 ATLAS SC4 Requirement (PPS)
- Small testbed with (part of) CERN, a few Tier-1s and a few Tier-2s to test our distributed systems (ProdSys, DDM, DA) prior to deployment
  - It would allow testing new m/w features without disturbing other operations
  - We could also properly tune the operations on our side
- The aim is to get to the agreed scheduled time slots with an already tested system and really use the available time for relevant scaling tests
- This setup would not interfere with concurrent large-scale tests or data transfers run by other experiments