
1 USATLAS SC4

2 ?! The same host name for a dual-NIC dCache door is resolved to different IP addresses (130.199.48.0…… or 130.199.185.0) depending on which DNS server is queried.

3 (diagram slide; no recoverable text)

4 (diagram slide; no recoverable text)

5 Meeting Notes
 Use dual-homed dCache doors.
 The external interfaces of the doors are in 192.12.15.0.
 The internal interfaces of the doors are in 130.199.185.0.
 Data flow (in/out) will always go through the doors.
 Use external/internal DNS to resolve the same door host name to the external or internal IP address, determined by which DNS server is queried (see the sketch after these notes).
 Bring the routing for 130.199.185.0 and 130.199.48.0/23 back to USATLAS SW7.
 Request an ACL for VLAN 315(?), where 192.12.15.0 resides.
 One end: the LHC OPN address blocks or the 3+2 Tier-2s.
 The other end: 192.12.15.0.
 Open question: how do other Tier-3 sites contact the external interface of the dCache doors? Do they need to go through the firewall or not?
 Two types of storage (Durable and Permanent):
 When we receive ESD2, ESD1 will be discarded; therefore we do not need to save ESD to HPSS. If we need it, we can fetch it from the Tier-0 and other Tier-1 sites.
 RAW, our fraction of ESD, AOD, and Tier-2 simulation results => Permanent storage, which has a tape backend.
 Other ESD and AOD go to Durable storage, which is not necessarily backed by the tape system.
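A minimal way to verify the split-horizon DNS behaviour described above is to query the internal and external DNS servers directly and compare the answers. The sketch below uses the dnspython library (2.x API); the door host name and resolver addresses are hypothetical placeholders, not the actual BNL values.

    # Sketch: check that a dual-homed dCache door resolves differently
    # on the internal vs. external DNS (split-horizon setup).
    # Requires dnspython (pip install dnspython).
    import dns.resolver

    DOOR_HOST = "dcdoor01.example.bnl.gov"   # hypothetical door host name
    RESOLVERS = {
        "internal": "130.199.185.1",         # hypothetical internal DNS server
        "external": "192.12.15.1",           # hypothetical external DNS server
    }

    for view, nameserver in RESOLVERS.items():
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [nameserver]
        answer = resolver.resolve(DOOR_HOST, "A")
        print(view, "DNS ->", sorted(rr.address for rr in answer))

    # Expected: the internal view returns a 130.199.185.0 address,
    # the external view a 192.12.15.0 address.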

6 BNL SC4 Plans
 VLAN 315 can send network traffic?
 FTS and LFC will be set up.
 LCG 2.7.0.
 VObox: we also installed ATLAS DQ2 on top of it (done).
 BDII provides static and dynamic monitoring information (static setup?).
 R-GMA provides traffic monitoring from Tier 1 to Tier 2 (plan to make it available before the SC4 service phase).
 CE is based on the BNL Condor system (planned to be ready before the SC4 service phase in June).
 lcg-utils (done).
 dCache preparation (Durable, Permanent, information publishing):
 Permanent: system manages the cache, tape copy, access sometimes slow.
 Durable: user (VO) manages the cache, WITHOUT tape copy, access fast.
 (A sketch of this storage-class policy follows.)
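The durable/permanent split combines with the policy from the meeting notes into a simple routing rule. The sketch below is an illustration only, not production DQ2/dCache logic; the data-type labels are mine, loosely following the names used in these slides.

    # Sketch of the storage-class policy: RAW, BNL's fraction of ESD, AOD
    # and Tier-2 simulation output go to Permanent (tape-backed) storage;
    # other ESD/AOD copies go to Durable (disk-only) storage.
    PERMANENT_TYPES = {"RAW", "ESD_OWN_FRACTION", "AOD", "T2_SIMULATION"}

    def storage_class(data_type: str) -> str:
        """Return 'permanent' (tape-backed) or 'durable' (disk-only)."""
        return "permanent" if data_type in PERMANENT_TYPES else "durable"

    for dt in ("RAW", "ESD_OWN_FRACTION", "ESD_OTHER", "AOD"):
        print(dt, "->", storage_class(dt))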

7 Publish Information for BNL dCache
 List of transfer protocols per SE available from the information system.
 SRM knows what it supports and can inform the client.
 FTS channel information.
 LFC information.

dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain,...
[...]
GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable
[...]
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
[...]

dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain,...
[...]
GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
[...]
GlueChunkKey: GlueSEUniqueID=dcache.my_domain

An example query against this published information follows.
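Once the GlueSA records above are published, a client can confirm them by querying the site BDII over LDAP. A minimal sketch with the Python ldap3 library; the BDII host name is a placeholder, while port 2170 and base "o=grid" are the usual LCG information-system conventions.

    # Sketch: query a site BDII for the durable/permanent GlueSA records.
    # Requires the ldap3 package (pip install ldap3).
    from ldap3 import Server, Connection, ALL

    server = Server("bdii.example.bnl.gov", port=2170, get_info=ALL)
    conn = Connection(server, auto_bind=True)   # anonymous bind

    conn.search(
        search_base="o=grid",
        search_filter="(&(objectClass=GlueSA)(GlueSALocalID=dteam-*))",
        attributes=["GlueSALocalID", "GlueSARoot", "GlueSAPath", "GlueSAType"],
    )
    for entry in conn.entries:
        print(entry.GlueSALocalID, entry.GlueSAType, entry.GlueSAPath)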

8 SC4 Pre-Production System
 The pre-production service will be used as soon as it is available, and its usage will not stop when SC4 starts. There may be periods where the pre-production service is not used extensively, but the goal from now on is always to develop against the pre-production service.

9 SC4 April Throughput
 Need dCache!!!
 April 3rd (Monday) - April 13th (Thursday before Easter): sustain an average daily rate to each Tier-1 at or above the full nominal rate (200 MB/s).
 We should continue to run at the same rates unattended over the Easter weekend (April 14-16).
 Tuesday April 18th - Monday April 24th: perform the tape tests at the rates in the table below (75 MB/s).
 From after the con-call on Monday April 24th until the end of the month, experiment-driven transfers can be scheduled. (LFC will be needed by then for DQ2.)
 (A quick volume check on these rates follows.)
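As a sanity check on what these rates mean in daily volume terms (my arithmetic, not figures from the slides):

    # Volume implied by the April throughput targets.
    SECONDS_PER_DAY = 86_400

    for label, rate_mb_s in (("disk target", 200), ("tape test", 75)):
        tb_per_day = rate_mb_s * SECONDS_PER_DAY / 1e6   # MB -> TB
        print(f"{label}: {rate_mb_s} MB/s ~= {tb_per_day:.1f} TB/day")

    # disk target: 200 MB/s ~= 17.3 TB/day per Tier-1
    # tape test:    75 MB/s ~=  6.5 TB/day per Tier-1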

10 SC4 Tier 1 to Tier 1 Data Transfer (May)
 Within each VO, the details of the T1-T1 transfers still need to be finalized. A "dTeam" phase should be foreseen to ensure that the basic infrastructure is set up; similarly for T1->T2. A possible scenario follows:
 We have to focus on our two sister Tier-1 sites first: IN2P3 and FZK.
 All Tier-1s need to set up an FTS service and configure channels to enable transfers to/from all other Tier-1s (see the channel-count sketch below).
 dTeam transfers at 5 MB/s (10 MB/s?) need to be demonstrated between each T1 and all other T1s.
 These tests would take place during May, after the April throughput tests and before the SC4 service begins in June.
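Configuring channels to/from every other Tier-1 means a full directed mesh: N sites need N*(N-1) channels. A small enumeration sketch; BNL, IN2P3 and FZK appear in these slides, but the real Tier-1 list is longer, so the site list here is illustrative.

    # Enumerate the FTS channels needed for a full Tier-1 mesh.
    from itertools import permutations

    TIER1_SITES = ["BNL", "IN2P3", "FZK"]   # illustrative; real list is longer

    channels = [f"{src}-{dst}" for src, dst in permutations(TIER1_SITES, 2)]
    print(len(channels), "channels for", len(TIER1_SITES), "sites:", channels)
    # 3 sites -> 6 channels; 10 Tier-1s would already need 90 channels.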

11 ATLAS Specific Plan
 Plans (ATLAS)
 Tier 2 Plans
 Tier 2 Workshop
 Background Information (Dario's slides)

12 Summary of requests from ATLAS
 March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
 April-May (pre-SC4): tests of distributed operations on a "small" testbed (PPS)
 Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720 MB/s + full ESD to BNL), and send AODs to (at least) a few Tier-2s
 3 weeks in July: distributed processing tests (Part 1)
 2 weeks in July-August: distributed analysis tests (Part 1)
 3-4 weeks in September-October: Tier-0 test (Phase 2) with data to Tier-2s
 3 weeks in October: distributed processing tests (Part 2)
 3-4 weeks in November: distributed analysis tests (Part 2)

13 Tier 2 Plans
 Details of involving the Tier 2s are in planning too.
 Tier 2 dCache: dCache needs to be stabilized and operational at one or all of the Midwest, Southwest and Northwest sites (first week of June) for receiving AODs at (at least) a few Tier-2s.
 All Tier 2 dCache instances should be up and in production in September.
 Extend data distribution to all (most) Tier-2s.
 Use 3D tools to distribute calibration data.
 Baseline client tools should be deployed at Tier 2 centers.
 No other services are required at Tier 2s except SRM and DQ2.

14 WLCG Tier 2 Workshop
 https://twiki.cern.ch/twiki/bin/view/LCG/WorkshopAndTutorials
 http://indico.cern.ch/conferenceDisplay.py?confId=1148&view=egee_meeting&showDate=all&showSession=all&detailLevel=contribution
 From Monday 12 June 2006 (11:00) to Wednesday 14 June 2006 (18:00) at CERN (Council Chamber), i.e. in the middle of June.
 Four experiment activities introduction.
 MC simulation use cases.
 An overview of calibration & alignment.
 Analysis use cases.
 Services required at / for Tier-2s (Grid, Application).
 Support and operation issues.

15 ATLAS plans for 2006: Computing System Commissioning and Service Challenge 4 (Dario Barberis, CERN & Genoa University)

16 Computing System Commissioning Goals
 The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
 Calibration and alignment procedures and conditions DB
 Full trigger chain
 Event reconstruction and data distribution
 Distributed access to the data for analysis
 At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates

17 ATLAS Computing Model
 Tier-0:
 Copy RAW data to Castor tape for archival
 Copy RAW data to Tier-1s for storage and reprocessing
 Run first-pass calibration/alignment (within 24 hrs)
 Run first-pass reconstruction (within 48 hrs)
 Distribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s
 Tier-1s:
 Store and take care of a fraction of RAW data
 Run "slow" calibration/alignment procedures
 Rerun reconstruction with better calib/align and/or algorithms
 Distribute reconstruction output to Tier-2s
 Keep current versions of ESDs and AODs on disk for analysis
 Tier-2s:
 Run simulation
 Keep current versions of AODs on disk for analysis

18 ATLAS Tier-0 Data Flow (diagram: EF CPU farm -> Castor buffer -> tape and Tier-1s; per-stream figures recovered below)

Stream  File size  Rate     Files/day  Bandwidth  Volume
RAW     1.6 GB     0.2 Hz   17K        320 MB/s   27 TB/day
ESD     0.5 GB     0.2 Hz   17K        100 MB/s   8 TB/day
AOD     10 MB      2 Hz     170K       20 MB/s    1.6 TB/day
AODm    500 MB     0.04 Hz  3.4K       20 MB/s    1.6 TB/day

Aggregate flows from the diagram: export to Tier-1s (RAW + ESD x2 + AODm x10) at 1 Hz, 85K f/day, 720 MB/s; other edges at 0.44 Hz, 37K f/day, 440 MB/s and 0.4 Hz, 190K f/day, 340 MB/s; tape at 2.24 Hz, 170K f/day (temp), 20K f/day (perm), 140 MB/s. (A cross-check of the per-stream numbers follows.)
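The per-stream bandwidths follow directly from file size times event rate; a quick cross-check (my arithmetic, using only the numbers above):

    # Check: bandwidth = file size x event rate; volume = bandwidth x day.
    streams = {            # name: (file size in MB, event rate in Hz)
        "RAW":  (1600, 0.2),
        "ESD":  (500, 0.2),
        "AOD":  (10, 2.0),
        "AODm": (500, 0.04),
    }
    for name, (size_mb, hz) in streams.items():
        mb_s = size_mb * hz
        tb_day = mb_s * 86_400 / 1e6
        print(f"{name:5s} {mb_s:6.0f} MB/s  {tb_day:5.1f} TB/day")

    # RAW 320 MB/s / ~27 TB/day, ESD 100 MB/s / ~8 TB/day,
    # AOD 20 MB/s / ~1.7 TB/day, AODm 20 MB/s / ~1.7 TB/day -- matching the slide.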

19 Recent Update for Tier 0 to Tier 1 Data Transfer

20 BNL Data Flow (2008, based on a 20% share) (diagram: Tier-0 CPU farm -> BNL disk buffer -> BNL tape / disk storage / other Tier-1s / Tier-2s; flow groupings reconstructed from the diagram layout)

Tier-0 -> BNL disk buffer:
RAW    1.6 GB/file  0.04 Hz   3.4K f/day   64 MB/s  5.4 TB/day
ESD2   0.5 GB/file  0.04 Hz   3.4K f/day   20 MB/s  1.6 TB/day
AOD2   10 MB/file   0.4 Hz    34K f/day    4 MB/s   0.32 TB/day
AODm2  500 MB/file  0.004 Hz  0.34K f/day  4 MB/s   0.32 TB/day
Aggregate (RAW + ESD2 + AODm2): 0.088 Hz, 7.48K f/day, 88 MB/s, 7.32 TB/day

BNL disk buffer -> BNL tape:
RAW    1.6 GB/file  0.04 Hz  3.4K f/day  64 MB/s  5.4 TB/day

Disk storage:
ESD2   0.5 GB/file  0.04 Hz   3.4K f/day   20 MB/s  1.6 TB/day
AOD2   10 MB/file   0.4 Hz    34K f/day    4 MB/s   0.32 TB/day
AODm2  500 MB/file  0.008 Hz  0.68K f/day  4 MB/s   0.32 TB/day

Exchange with other Tier-1s:
ESD2   0.5 GB/file  0.02 Hz  1.7K f/day  80 MB/s  0.8 TB/day
AODm2  500 MB/file  0.03 Hz  3.0K f/day  16 MB/s  1.44 TB/day

BNL -> Tier-2s:
ESD2   0.5 GB/file  0.02 Hz   1.7K f/day  20 MB/s    1.6 TB/day
AODm2  500 MB/file  0.036 Hz  3.1K f/day  4*9 MB/s   1.44 TB/day

First-pass products:
ESD1   0.5 GB/file  0.2 Hz    17K f/day    100 MB/s   8 TB/day
AODm1  500 MB/file  0.04 Hz   3.4K f/day   20 MB/s    1.6 TB/day
AODm1  500 MB/file  0.04 Hz   3.4K f/day   20 MB/s*3  1.6 TB/day
AODm2  500 MB/file  0.008 Hz  0.70K f/day  4 MB/s*3   0.32 TB/day

Plus simulation & analysis data flow; real data storage, reprocessing and distribution; 234 MB/s * n for analysis. (A 20% scaling check follows.)
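The BNL figures are the nominal Tier-0 export rates from the Tier-0 data-flow slide scaled by the 20% share; a quick check (my arithmetic):

    # Check: BNL's 2008 rates = 20% of the nominal Tier-0 export rates.
    NOMINAL_MB_S = {"RAW": 320, "ESD": 100, "AOD": 20, "AODm": 20}
    SHARE = 0.20

    for stream, rate in NOMINAL_MB_S.items():
        print(f"{stream:5s} {rate:3d} MB/s x 20% = {rate * SHARE:.0f} MB/s")

    # RAW 64, ESD 20, AOD 4, AODm 4 MB/s -- matching the BNL figures above.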

21 BNL to 3+2 Tier-2s (Estimation!)
 See https://uimon.cern.ch/twiki/bin/view/Atlas/Tier1DataFlow
 Tier 1 to Tier 2 traffic is likely to be very bursty and driven by analysis demands. Network capacity to each Tier 2 is expected to be a fraction of 10 Gbps (e.g. UC has 30% of 10 Gbps allocated; opportunistic usage may go up to 10 Gbps).
 Desire to reach 100 MB/s to each of the 3+2 Tier-2 clusters (a quick aggregate check follows).
 300 MB/s ~ 500 MB/s in total at BNL.
 Tier 2 to Tier 1 transfers are almost entirely continuous simulation transfers.
 The aggregate input rate to the Tier 1 center is comparable to 20%~25% of the rate from Tier 0.
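For scale (my arithmetic): the desired per-cluster rate times the number of clusters gives the upper end of the range quoted above; the lower end reflects that not all clusters pull at full rate simultaneously.

    # Aggregate BNL -> Tier-2 estimate at the desired per-cluster rate.
    clusters = 3 + 2            # the "3+2" Tier-2 clusters
    per_cluster_mb_s = 100      # desired rate per cluster
    print(clusters * per_cluster_mb_s, "MB/s aggregate at full rate")
    # 500 MB/s -- the upper end of the 300-500 MB/s range on the slide.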

22 BNL Data Flow (2008), summary diagram (rates in MB/s, reconstructed from the diagram layout):
Tier-0 -> BNL write buffer: 88 (RAW, ESD, AOD); RAW 64, ESD1 100, AODm1 20
From other Tier-1s: ESD2 80 (80% est. from T1s), AODm2 16
BNL -> Tier-2s: ESD2 20, AODm2 36; simulation from Tier-2s (304*20%) ~ 60
BNL tape: 350 (including raw data); read storage for analysis (AOD): 500; CPU farm: 200 (?)
Read storage entries: AODm2 500 MB/file, 0.008 Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day; ESD2 0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day

23 ATLAS SC4 Tests
 Complete Tier-0 test
 Internal data transfer from "Event Filter" farm to Castor disk pool, Castor tape, CPU farm
 Calibration loop and handling of conditions data
 Including distribution of conditions data to Tier-1s (and Tier-2s)
 Transfer of RAW, ESD, AOD and TAG data to Tier-1s
 Transfer of AOD and TAG data to Tier-2s via Tier-1s
 Data and dataset registration in DB (add meta-data information to meta-data DB)
 Distributed production
 Full simulation chain run at Tier-2s (and Tier-1s)
 Data distribution to Tier-1s, other Tier-2s and CAF
 Reprocessing raw data at Tier-1s
 Data distribution to other Tier-1s, Tier-2s and CAF
 Distributed analysis
 "Random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
 Tests of performance of job submission, distribution and output retrieval

24 ATLAS SC4 Plans (1)
 Tier-0 data flow tests:
 Phase 0: 3-4 weeks in March-April for internal Tier-0 tests
 Phase 1: last 3 weeks of June with data distribution to Tier-1s
 Run integrated data flow tests using the SC4 infrastructure for data distribution
 Send AODs to (at least) a few Tier-2s
 Automatic operation for O(1 week)
 First version of shifter's interface tools
 Treatment of error conditions
 Phase 2: 3-4 weeks in September-October
 Extend data distribution to all (most) Tier-2s
 Use 3D tools to distribute calibration data

25 ATLAS SC4 Plans (2)
 ATLAS includes continuous distributed simulation productions (Kaushik)
 SC4 distributed reprocessing tests:
 Test of the computing model using the SC4 data management infrastructure
 Needs file transfer capabilities between Tier-1s and back to the CERN CAF
 Also distribution of conditions data to Tier-1s (3D)
 Storage management is also an issue
 Could use 3 weeks in July and 3 weeks in October
 SC4 distributed simulation intensive tests:
 Once the reprocessing tests are OK, we can use the same infrastructure to implement our computing model for simulation productions
 As they would use the same setup both from our ProdSys and the SC4 side
 First separately, then concurrently.

26 Overview of requirements for SC4
 SRM ("baseline version") on all storage
 VO Box per Tier-1 and at Tier-0
 LFC server per Tier-1 and at Tier-0
 FTS server per Tier-1 and at Tier-0
 Permanent storage and durable storage
 Separate SRM entry points for permanent and durable storage
 Disk space is managed by DQ2
 Counts as online ("disk") data in the ATLAS Computing Model
 Ability to install FTS ATLAS VO agents on Tier-1 and Tier-0 VO Boxes
 Ability to deploy DQ2 services on the VO Box as during SC3
 No new requirements on the Tier-2s besides an SRM SE

27 Overview of FTS and VO Box
 Hence, an ATLAS VO Box will contain:
 FTS ATLAS agents
 The remaining DQ2 persistent services (less s/w than for SC3, as some functionality was merged into FTS in the form of FTS VO agents)
 DQ2 site services will have associated SFTs for testing

28 ATLAS SC4 Requirement (PPS)
 Small testbed with (part of) CERN, a few Tier-1s and a few Tier-2s to test our distributed systems (ProdSys, DDM, DA) prior to deployment
 It would allow testing new m/w features without disturbing other operations
 We could also properly tune the operations on our side
 The aim is to get to the agreed scheduled time slots with an already-tested system and really use the available time for relevant scaling tests
 This setup would not interfere with concurrent large-scale tests or data transfers run by other experiments

