Published by Neil Benson. Modified over 9 years ago.
1
USATLAS SC4
2
[Network diagram, annotated "?!": subnets 130.199.48.0 and 130.199.185.0.]
The same host name for a dual-NIC dCache door resolves to different IP addresses depending on which DNS server is queried.
5
Meeting Notes
- Use dual-homed dCache doors.
  - The external interfaces of the doors are in 192.12.15.0.
  - The internal interfaces of the doors are in 130.199.185.0.
  - Data flow (in and out) always goes through the doors.
- Use external and internal DNS to resolve the same door host name to the external or the internal IP address, depending on which DNS is queried.
- Bring the routing for 130.199.185.0 and 130.199.48.0/23 back to USATLAS SW7.
- Request an ACL for VLAN 315(?), in which 192.12.15.0 resides. One end: LHC OPN address blocks or the 3+2 Tier 2s. The other end: 192.12.15.0.
- Open question: what about other T3 sites that contact the external interface of the dCache doors? Do they need to go through the firewall or not?
- Two types of storage (durable and permanent):
  - When we receive ESD2, ESD1 will be discarded, so we do not need to save ESD to HPSS. If we need them, we can get them from Tier 0 and other Tier 1 sites.
  - RAW, our fraction of ESD, AOD, and Tier 2 simulation results => permanent storage, which has a tape backend.
  - Other ESD and AOD go to durable storage, which is not necessarily backed up by the tape system.
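The external/internal DNS arrangement in these notes can be sketched as a lookup that depends on where the client sits. This is only an illustration: the host name `door01.example.org`, both door addresses, and the /24 masks on 130.199.185.0 and 192.12.15.0 are assumptions, not values from the notes.

```python
import ipaddress

# Networks treated as "internal". 130.199.48.0/23 is quoted in the notes;
# the /24 mask on 130.199.185.0 is an assumption.
INTERNAL_NETS = [
    ipaddress.ip_network("130.199.185.0/24"),
    ipaddress.ip_network("130.199.48.0/23"),
]

# Hypothetical door name and addresses, one record per DNS view.
DOOR_NAME = "door01.example.org"
VIEWS = {
    "internal": {DOOR_NAME: "130.199.185.10"},
    "external": {DOOR_NAME: "192.12.15.10"},
}


def select_view(client_ip: str) -> str:
    """Return the DNS view a client would query, based on its address."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in INTERNAL_NETS):
        return "internal"
    return "external"


def resolve(name: str, client_ip: str) -> str:
    """Resolve a door name the way an internal/external DNS pair would."""
    return VIEWS[select_view(client_ip)][name]


if __name__ == "__main__":
    print(resolve(DOOR_NAME, "130.199.49.5"))   # client on an internal subnet
    print(resolve(DOOR_NAME, "198.51.100.5"))   # client outside the site
```

In a real deployment the choice is made by which DNS server the client is configured to use, not by a function call; the sketch only shows why one name yields two addresses.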
6
BNL SC4 Plans
- Can VLAN 315 send network traffic?
- FTS and LFC will be set up.
- LCG 2.7.0 VObox: ATLAS DQ2 is installed on top of it (done).
- BDII provides static and dynamic monitoring information (static setup?).
- R-GMA provides traffic monitoring from Tier 1 to Tier 2 (planned to be available before the SC4 service phase).
- CE is based on the BNL Condor system (planned to be ready before the SC4 service phase in June).
- lcg-utils (done).
- dCache preparation (durable, permanent, information publishing).

Storage type | Cache managed by | Tape copy | Access
Permanent    | System           | Yes       | Sometimes slow
Durable      | User (VO)        | No        | Fast
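The placement rule from the meeting notes (RAW, BNL's own fraction of ESD and AOD, and Tier 2 simulation results go to permanent storage; other ESD and AOD go to durable storage) can be written down as a small function. The function name, the `own_fraction` flag and the `"SIMULATION"` label are hypothetical names chosen for this sketch.

```python
PERMANENT = "permanent"  # tape backend, cache managed by the system
DURABLE = "durable"      # no tape copy, cache managed by the VO


def storage_class(data_type: str, own_fraction: bool = False) -> str:
    """Return the storage class for a data type under the rule in the notes.

    own_fraction marks ESD/AOD that belong to BNL's own share; it is
    ignored for RAW and simulation output, which are always permanent.
    """
    kind = data_type.upper()
    if kind in ("RAW", "SIMULATION"):
        return PERMANENT
    if kind in ("ESD", "AOD"):
        return PERMANENT if own_fraction else DURABLE
    raise ValueError(f"unknown data type: {data_type!r}")


if __name__ == "__main__":
    for kind, own in [("RAW", False), ("ESD", True), ("ESD", False), ("AOD", False)]:
        print(kind, own, storage_class(kind, own))
```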
7
Publish Information for BNL dCache
- List of transfer protocols per SE available from the information system.
- SRM knows what it supports and can inform the client.
- FTS channel information.
- LFC information.

    dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain,...
    [...]
    GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
    GlueSAPath: /pnfs/my_domain/durable-path/dteam
    GlueSAType: durable
    [...]
    GlueChunkKey: GlueSEUniqueID=dcache.my_domain
    [...]

    dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain,...
    [...]
    GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
    GlueSAPath: /pnfs/my_domain/permanent-path/dteam
    GlueSAType: permanent
    [...]
    GlueChunkKey: GlueSEUniqueID=dcache.my_domain
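A client that wants to find the durable or permanent path for a VO could read these entries roughly as follows. This is a minimal sketch that parses the text of the two entries shown above with plain string handling; it does not query a real BDII, and the helper names are made up for the example. The elided parts ("..." and "[...]") are left as they appear on the slide.

```python
SAMPLE = """\
dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain,...
[...]
GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable
GlueChunkKey: GlueSEUniqueID=dcache.my_domain

dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain,...
[...]
GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
"""


def parse_entries(text: str) -> list:
    """Split the text into entries; each 'dn:' line starts a new one."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # blank lines and "[...]" placeholders
        if key == "dn":
            current = {"dn": value}
            entries.append(current)
        elif current is not None:
            current[key] = value
    return entries


def paths_by_type(entries: list) -> dict:
    """Map each GlueSAType (durable, permanent) to its GlueSAPath."""
    return {e["GlueSAType"]: e["GlueSAPath"] for e in entries}


if __name__ == "__main__":
    print(paths_by_type(parse_entries(SAMPLE)))
```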
8
SC4 Pre-Production System
The pre-production service will be used as soon as it is available, and its use will continue after SC4 starts. There may be periods when the pre-production service is not used extensively, but the goal from now on is to always develop against it.
9
SC4 April Throughput
Need dCache!!!
- April 3rd (Monday) to April 13th (Thursday before Easter): sustain an average daily rate to each Tier 1 at or above the full nominal rate (200 MB/s).
- Continue to run at the same rates unattended over Easter weekend (14-16 April).
- Tuesday April 18th to Monday April 24th: perform the tape tests at the rates in the table below (75 MB/s).
- From after the con-call on Monday April 24th until the end of the month, experiment-driven transfers can be scheduled. (LFC will be needed by then for DQ2.)
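To get a feel for the storage these targets imply, the rates can be turned into daily and total volumes. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and counts both end dates of each period; neither convention is stated on the slide.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, an assumption


def daily_volume_tb(rate_mb_s: float) -> float:
    """Volume moved in one day at a sustained rate, in TB."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_TB


def period_volume_tb(rate_mb_s: float, days: int) -> float:
    """Volume moved over a number of days at a sustained rate, in TB."""
    return daily_volume_tb(rate_mb_s) * days


DISK_RATE_MB_S = 200   # nominal rate, 3-13 April
DISK_DAYS = 11         # both end dates counted
TAPE_RATE_MB_S = 75    # tape tests, 18-24 April
TAPE_DAYS = 7          # both end dates counted

if __name__ == "__main__":
    print(f"disk: {daily_volume_tb(DISK_RATE_MB_S):.2f} TB/day, "
          f"{period_volume_tb(DISK_RATE_MB_S, DISK_DAYS):.2f} TB in total")
    print(f"tape: {daily_volume_tb(TAPE_RATE_MB_S):.2f} TB/day, "
          f"{period_volume_tb(TAPE_RATE_MB_S, TAPE_DAYS):.2f} TB in total")
```

At 200 MB/s this is a little over 17 TB per day, so the disk buffer has to absorb roughly 190 TB over the 11-day window unless data is cleaned up along the way.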
10
SC4 Tier 1 to Tier 1 Data Transfer (May)
- Within each VO, the details of the T1-T1 transfers still need to be finalized.
- A "dTeam" phase should be foreseen to ensure that the basic infrastructure is set up. Similarly for T1->T2.
- A possible scenario follows:
  - We have to focus first on our two sister Tier 1 sites, IN2P3 and FZK.
  - All Tier 1s need to set up an FTS service and configure channels to enable transfers to/from all other Tier 1s.
  - dTeam transfers at 5 MB/s (10 MB/s?) need to be demonstrated between each T1 and all other T1s.
- These tests would take place during May, after the April throughput tests and before the SC4 service begins in June.
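"Channels to/from all other Tier 1s" means one directed channel per ordered pair of sites, so the number of channels grows as n(n-1). The sketch below enumerates them for the three sites named on this slide only; the full Tier 1 list is not given here, and treating each direction as a separate channel is an assumption about how the channels are configured.

```python
from itertools import permutations

# Only the sites named on the slide; the full Tier 1 list is not given.
SITES = ["BNL", "IN2P3", "FZK"]
DTEAM_RATE_MB_S = 5  # the slide also floats 10 MB/s


def t1_channels(sites: list) -> list:
    """Every directed (source, destination) pair between distinct sites."""
    return list(permutations(sites, 2))


def outbound_rate_mb_s(sites: list, per_channel_mb_s: float) -> float:
    """Total rate one site sends if all its outbound channels run at once."""
    return (len(sites) - 1) * per_channel_mb_s


if __name__ == "__main__":
    for src, dst in t1_channels(SITES):
        print(f"{src} -> {dst}")
    print(len(t1_channels(SITES)), "channels")
    print(outbound_rate_mb_s(SITES, DTEAM_RATE_MB_S), "MB/s outbound per site")
```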
11
ATLAS Specific Plan
- Plans (ATLAS)
- Tier 2 Plans
- Tier 2 Workshop
- Background Information (Dario's slides)
12
Summary of requests from ATLAS
- March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
- April-May (pre-SC4): tests of distributed operations on a “small” testbed (PPS)
- Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720 MB/s + full ESD to BNL), and send AODs to (at least) a few Tier-2s
- 3 weeks in July: distributed processing tests (Part 1)
- 2 weeks in July-August: distributed analysis tests (Part 1)
- 3-4 weeks in September-October: Tier-0 test (Phase 2 of Part 1) with data to Tier-2s
- 3 weeks in October: distributed processing tests (Part 2)
- 3-4 weeks in November: distributed analysis tests (Part 2)
13
Tier 2 Plans
- Details of how the Tier 2s will be involved are still being planned.
- Tier 2 dCache: dCache needs to be stable and operational at one or all of the Midwest, Southwest and Northwest sites (first week of June) so that AODs can be received at (at least) a few Tier-2s.
- All Tier 2 dCache systems should be up and in production in September.
- Extend data distribution to all (most) Tier-2s.
- Use 3D tools to distribute calibration data.
- Baseline client tools should be deployed at the Tier 2 centers.
- No other services are required at the Tier 2s except SRM and DQ2.
14
WLCG Tier 2 Workshop
- https://twiki.cern.ch/twiki/bin/view/LCG/WorkshopAndTutorials
- http://indico.cern.ch/conferenceDisplay.py?confId=1148&view=egee_meeting&showDate=all&showSession=all&detailLevel=contribution
- From Monday 12 June 2006 (11:00) to Wednesday 14 June 2006 (18:00) at CERN (Council Chamber).
- Topics:
  - Introduction to the four experiments' activities
  - MC simulation use cases
  - An overview of calibration and alignment
  - Analysis use cases
  - Services required at/for Tier 2s (Grid, application)
  - Support and operation issues
- Takes place in the middle of June.
15
ATLAS plans for 2006: Computing System Commissioning and Service Challenge 4 Dario Barberis CERN & Genoa University
16
Computing System Commissioning Goals
- The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
  - Calibration and alignment procedures and conditions DB
  - Full trigger chain
  - Event reconstruction and data distribution
  - Distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates.
17
ATLAS Computing Model
- Tier-0:
  - Copy RAW data to Castor tape for archival
  - Copy RAW data to Tier-1s for storage and reprocessing
  - Run first-pass calibration/alignment (within 24 hrs)
  - Run first-pass reconstruction (within 48 hrs)
  - Distribute reconstruction output (ESDs, AODs & TAGS) to Tier-1s
- Tier-1s:
  - Store and take care of a fraction of RAW data
  - Run “slow” calibration/alignment procedures
  - Rerun reconstruction with better calib/align and/or algorithms
  - Distribute reconstruction output to Tier-2s
  - Keep current versions of ESDs and AODs on disk for analysis
- Tier-2s:
  - Run simulation
  - Keep current versions of AODs on disk for analysis
18
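ATLAS Tier-0 Data Flow
[Data-flow diagram. Nodes: EF, Castor buffer, CPU farm, Tape, T1s.]

Stream | File size   | File rate | Files/day  | Throughput | Volume
RAW    | 1.6 GB/file | 0.2 Hz    | 17K f/day  | 320 MB/s   | 27 TB/day
ESD    | 0.5 GB/file | 0.2 Hz    | 17K f/day  | 100 MB/s   | 8 TB/day
AOD    | 10 MB/file  | 2 Hz      | 170K f/day | 20 MB/s    | 1.6 TB/day
AODm   | 500 MB/file | 0.04 Hz   | 3.4K f/day | 20 MB/s    | 1.6 TB/day

Stream labels on the diagram's arrows: RAW, AOD; RAW, ESD (2x), AODm (10x); RAW, ESD, AODm.
Aggregate flows shown on the diagram:
- 0.44 Hz, 37K f/day, 440 MB/s
- 1 Hz, 85K f/day, 720 MB/s
- 0.4 Hz, 190K f/day, 340 MB/s
- 2.24 Hz, 170K f/day (temp), 20K f/day (perm), 140 MB/s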
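The per-stream throughput figures follow from file size times file rate, and the aggregate figures are sums over stream combinations. The sketch below recomputes them. Which combination belongs to which arrow is not recoverable from the extracted text, so the pairing in `FLOWS` is an inference from the arithmetic, not something the slide states; the last combination (ESD + AOD + AODm) is not listed as a label in the extracted text at all and is included only because it sums to the 140 MB/s figure.

```python
SECONDS_PER_DAY = 86_400

# (file size in MB, file rate in Hz), copied from the per-stream figures.
STREAMS = {
    "RAW": (1600.0, 0.2),
    "ESD": (500.0, 0.2),
    "AOD": (10.0, 2.0),
    "AODm": (500.0, 0.04),
}


def throughput_mb_s(stream: str, copies: int = 1) -> float:
    """Throughput of one stream: file size times file rate times copies."""
    size_mb, rate_hz = STREAMS[stream]
    return size_mb * rate_hz * copies


def files_per_day(stream: str) -> float:
    """Number of files produced per day for one stream."""
    return STREAMS[stream][1] * SECONDS_PER_DAY


def aggregate_mb_s(flow: dict) -> float:
    """Sum throughput over a flow given as {stream: number of copies}."""
    return sum(throughput_mb_s(s, c) for s, c in flow.items())


# Assumed pairing of stream combinations with the aggregate figures.
FLOWS = {
    "RAW + ESD + AODm": {"RAW": 1, "ESD": 1, "AODm": 1},
    "RAW + ESD (2x) + AODm (10x)": {"RAW": 1, "ESD": 2, "AODm": 10},
    "RAW + AOD": {"RAW": 1, "AOD": 1},
    "ESD + AOD + AODm": {"ESD": 1, "AOD": 1, "AODm": 1},
}

if __name__ == "__main__":
    for name in STREAMS:
        print(f"{name}: {throughput_mb_s(name):.0f} MB/s, "
              f"{files_per_day(name):.0f} files/day")
    for label, flow in FLOWS.items():
        print(f"{label}: {aggregate_mb_s(flow):.0f} MB/s")
```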
19
Recent Update for Tier 0 to Tier 1 Data Transfer
20
BNL Data Flow (2008, Based on 20%)
[Data-flow diagram. Nodes: Tier-0, CPU farm, other Tier-1s, BNL disk buffer, BNL tape, disk storage, Tier-2s. Captions: "Real data storage, reprocessing and distribution"; "Plus simulation & analysis data flow"; "234MB*n analysis".]

Flow labels, in the order extracted from the diagram:

Stream           | File size   | File rate | Files/day  | Throughput | Volume
RAW              | 1.6 GB/file | 0.04 Hz   | 3.4K f/day | 64 MB/s    | 5.4 TB/day
ESD2             | 0.5 GB/file | 0.04 Hz   | 3.4K f/day | 20 MB/s    | 1.6 TB/day
AOD2             | 10 MB/file  | 0.4 Hz    | 34K f/day  | 4 MB/s     | 0.32 TB/day
AODm2            | 500 MB/file | 0.004 Hz  | 0.34K f/day | 4 MB/s    | 0.32 TB/day
RAW, ESD2, AODm2 |             | 0.088 Hz  | 7.48K f/day | 88 MB/s   | 7.32 TB/day
RAW              | 1.6 GB/file | 0.04 Hz   | 3.4K f/day | 64 MB/s    | 5.4 TB/day
AODm2            | 500 MB/file | 0.008 Hz  | 0.68K f/day | 4 MB/s    | 0.32 TB/day
ESD2             | 0.5 GB/file | 0.04 Hz   | 3.4K f/day | 20 MB/s    | 1.6 TB/day
AOD2             | 10 MB/file  | 0.4 Hz    | 34K f/day  | 4 MB/s     | 0.32 TB/day
ESD2             | 0.5 GB/file | 0.02 Hz   | 1.7K f/day | 80 MB/s    | 0.8 TB/day
AODm2            | 500 MB/file | 0.03 Hz   | 3.0K f/day | 16 MB/s    | 1.44 TB/day
ESD2             | 0.5 GB/file | 0.02 Hz   | 1.7K f/day | 20 MB/s    | 1.6 TB/day
AODm2            | 500 MB/file | 0.036 Hz  | 3.1K f/day | 4*9 MB/s   | 1.44 TB/day
ESD1             | 0.5 GB/file | 0.2 Hz    | 17K f/day  | 100 MB/s   | 8 TB/day
AODm1            | 500 MB/file | 0.04 Hz   | 3.4K f/day | 20 MB/s    | 1.6 TB/day
AODm1            | 500 MB/file | 0.04 Hz   | 3.4K f/day | 20 MB/s*3  | 1.6 TB/day
AODm2            | 500 MB/file | 0.008 Hz  | 0.70 f/day | 4 MB/s*3   | 0.32 TB/day
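"Based on 20%" can be checked by scaling the Tier-0 file rates by 0.20 and recomputing the throughput. The sketch below assumes the Tier-0 figures RAW 1.6 GB at 0.2 Hz, ESD 0.5 GB at 0.2 Hz, AOD 10 MB at 2 Hz and AODm 500 MB at 0.04 Hz from the ATLAS Tier-0 data-flow slide. Under that assumption the AODm2 rate comes out at 0.008 Hz, which agrees with the 0.088 Hz and 88 MB/s totals; the diagram text also shows 0.004 Hz for AODm2 in one place.

```python
BNL_SHARE = 0.20

# (file size in MB, Tier-0 file rate in Hz), assumed from the Tier-0 slide.
TIER0_STREAMS = {
    "RAW": (1600.0, 0.2),
    "ESD": (500.0, 0.2),
    "AOD": (10.0, 2.0),
    "AODm": (500.0, 0.04),
}


def bnl_stream(stream: str, share: float = BNL_SHARE) -> tuple:
    """Return (file rate in Hz, throughput in MB/s) for BNL's share."""
    size_mb, rate_hz = TIER0_STREAMS[stream]
    bnl_rate_hz = rate_hz * share
    return bnl_rate_hz, size_mb * bnl_rate_hz


def bnl_total(streams: list, share: float = BNL_SHARE) -> tuple:
    """Sum file rate and throughput over several streams."""
    rates = [bnl_stream(s, share) for s in streams]
    return sum(r[0] for r in rates), sum(r[1] for r in rates)


if __name__ == "__main__":
    for name in TIER0_STREAMS:
        hz, mb_s = bnl_stream(name)
        print(f"{name}: {hz:.3f} Hz, {mb_s:.0f} MB/s")
    total_hz, total_mb_s = bnl_total(["RAW", "ESD", "AODm"])
    print(f"RAW + ESD + AODm: {total_hz:.3f} Hz, {total_mb_s:.0f} MB/s")
```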
21
BNL to 3+2 Tier2 (Estimation!)
- See https://uimon.cern.ch/twiki/bin/view/Atlas/Tier1DataFlow
- Tier 1 to Tier 2 traffic is likely to be very bursty and driven by analysis demands.
- Network links to the Tier 2s are expected to be a fraction of 10 Gbps (for UC, 30% of 10 Gbps is allocated; opportunistic usage may go up to 10 Gbps).
- We would like to reach 100 MB/s for each of the 3+2 Tier 2 clusters.
- 300 MB/s to 500 MB/s in total to BNL.
- Tier 2 to Tier 1 transfers are almost entirely continuous simulation transfers.
- The aggregate input rate to the Tier 1 center is comparable to 20%-25% of the rate from Tier 0.
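The bandwidth figures on this slide can be cross-checked with a unit conversion. The sketch assumes 1 Gbps = 1000 Mbit/s and 8 bits per byte, and ignores protocol overhead, so the MB/s values are upper bounds on what a link could carry.

```python
def gbps_to_mb_s(gbps: float) -> float:
    """Convert a link speed in Gbps to MB/s (decimal units, no overhead)."""
    return gbps * 1000 / 8


def mb_s_to_gbps(mb_s: float) -> float:
    """Convert a data rate in MB/s to Gbps (decimal units, no overhead)."""
    return mb_s * 8 / 1000


LINK_GBPS = 10
ALLOCATED_FRACTION = 0.30   # the UC allocation quoted on the slide
TARGET_PER_SITE_MB_S = 100

if __name__ == "__main__":
    allocated = gbps_to_mb_s(LINK_GBPS * ALLOCATED_FRACTION)
    print(f"30% of 10 Gbps is about {allocated:.0f} MB/s")
    print(f"100 MB/s is about {mb_s_to_gbps(TARGET_PER_SITE_MB_S):.1f} Gbps")
    for sites in (3, 5):
        print(f"{sites} sites at 100 MB/s: {sites * TARGET_PER_SITE_MB_S} MB/s")
```

Under these assumptions the 100 MB/s per-site target fits inside a 30% allocation of a 10 Gbps link, and three to five sites at that rate give the 300 to 500 MB/s total quoted above.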
22
BNL Data Flow (2008)
[Data-flow diagram. Nodes: Tier-0, Tier-1, BNL write buffer, Tier-2s, BNL tape, read storage, CPU farm.]

Flow labels, in the order extracted from the diagram:
- AODm2: 500 MB/file, 0.008 Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day
- ESD2: 0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day
- ESD1 100MB
- AODM1 20MB
- RAW 64MB
- ESD2 80MB (80% EST from T1s)
- AODM2 16MB
- ESD2 20MB
- AODM2 36MB
- 88MB (RAW, ESD, AOD)
- 350MB (including raw data)
- (Analysis AOD) 500MB
- 200MB(?)
- (304MB*20%) ~60MB, Simu 60MB (Tier 2)
23
ATLAS SC4 Tests
- Complete Tier-0 test
  - Internal data transfer from “Event Filter” farm to Castor disk pool, Castor tape, CPU farm
  - Calibration loop and handling of conditions data
    - Including distribution of conditions data to Tier-1s (and Tier-2s)
  - Transfer of RAW, ESD, AOD and TAG data to Tier-1s
  - Transfer of AOD and TAG data to Tier-2s via Tier 1
  - Data and dataset registration in DB (add meta-data information to meta-data DB)
- Distributed production
  - Full simulation chain run at Tier-2s (and Tier-1s)
    - Data distribution to Tier-1s, other Tier-2s and CAF
  - Reprocessing raw data at Tier-1s
    - Data distribution to other Tier-1s, Tier-2s and CAF
- Distributed analysis
  - “Random” job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  - Tests of performance of job submission, distribution and output retrieval
24
ATLAS SC4 Plans (1)
- Tier-0 data flow tests:
  - Phase 0: 3-4 weeks in March-April for internal Tier-0 tests
  - Phase 1: last 3 weeks of June with data distribution to Tier-1s
    - Run integrated data flow tests using the SC4 infrastructure for data distribution
    - Send AODs to (at least) a few Tier-2s
    - Automatic operation for O(1 week)
    - First version of shifter’s interface tools
    - Treatment of error conditions
  - Phase 2: 3-4 weeks in September-October
    - Extend data distribution to all (most) Tier-2s
    - Use 3D tools to distribute calibration data
25
ATLAS SC4 Plans (2)
- ATLAS includes continuous distributed simulation productions (Kaushik)
- SC4: distributed reprocessing tests:
  - Test of the computing model using the SC4 data management infrastructure
    - Needs file transfer capabilities between Tier-1s and back to CERN CAF
    - Also distribution of conditions data to Tier-1s (3D)
    - Storage management is also an issue
  - Could use 3 weeks in July and 3 weeks in October
- SC4: distributed simulation intensive tests:
  - Once reprocessing tests are OK, we can use the same infrastructure to implement our computing model for simulation productions
    - As they would use the same setup both from our ProdSys and the SC4 side
  - First separately, then concurrently
26
Overview of requirements for SC4
- SRM (“baseline version”) on all storages
- VO Box per Tier-1 and in Tier-0
- LFC server per Tier-1 and in Tier-0
- FTS server per Tier-1 and in Tier-0
- Permanent storage and durable storage:
  - Separate SRM entry points for permanent and durable storage
  - Disk space is managed by DQ2
  - Counts as online (“disk”) data in the ATLAS Computing Model
- Ability to install FTS ATLAS VO agents on Tier-1 and Tier-0 VO Box
- Ability to deploy DQ2 services on VO Box as during SC3
- No new requirements on the Tier-2s besides SRM SE
27
Overview of FTS and VO Box
- Hence, an ATLAS VO Box will contain:
  - FTS ATLAS agents
  - The remaining DQ2 persistent services (less s/w than for SC3, as some functionality is merged into FTS in the form of FTS VO agents)
- DQ2 site services will have associated SFTs for testing
28
ATLAS SC4 Requirement (PPS)
- Small testbed with (part of) CERN, a few Tier-1s and a few Tier-2s to test our distributed systems (ProdSys, DDM, DA) prior to deployment
  - It would allow testing new m/w features without disturbing other operations
  - We could also properly tune the operations on our side
- The aim is to get to the agreed scheduled time slots with an already tested system and really use the available time for relevant scaling tests
- This setup would not interfere with concurrent large-scale tests or data transfers run by other experiments