
1 ARC-CE: updates and plans
Oxana Smirnova, NeIC/Lund University
Grid 2014, Dubna, 1 July 2014
Using input from: D. Cameron, A. Filipčič, J. Kerr Nilsen, B. Kónya

2 ARC-CE: brief summary
• Compute element by NorduGrid
• A key component of ARC middleware
  – Other components: clients, information services
• A key enabler of the Nordic Tier-1
  – Along with the distributed dCache storage
• Used by all LHC experiments
  – Mostly by ATLAS (so far); increasing use by CMS
  – The only CE used in the Nordic and Baltic countries
  – Main CE in Slovenia and Ukraine
  – Increasing use in the UK and Germany

3 ARC-CE instances in GOCDB
cf. CREAM-CE: 566 instances as of today

4 ARC-CE geography
Data as of end-2013

5 Optimisation for data-intensive jobs
• ARC-CE can do all the data transfers
  – Allows caching of frequently used files
  – Minimizes bandwidth use
  – Maximizes worker-node usage efficiency
• ARC-CE is a rather complex service
  – Built of many individual services and tools
  – Requires high-end storage for the cache
[Diagram: worker nodes in a Linux cluster read input through the ARC-CE cache backed by storage, while another CE leaves the transfers to its worker nodes]
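To make the cache setup concrete, below is a minimal A-REX configuration sketch, assuming the classic (pre-ARC 6) arc.conf syntax; the paths and cleaning thresholds are illustrative placeholders, not recommendations:

    [grid-manager]
    controldir="/var/spool/arc/jobstatus"
    sessiondir="/var/spool/arc/session"
    # Cache downloaded input files so that repeated jobs reuse them
    cachedir="/var/cache/arc/cache"
    # Start cleaning when the cache disk is 80% full, stop at 70%
    cachesize="80 70"

With a cache configured, A-REX checks each requested input file against the cache before downloading it again, which is what saves bandwidth and keeps the worker nodes busy.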

6 ARC-CE components on a (SLURM) cluster
[Diagram of the head node: A-REX with its control directory and DTR data staging, ARIS/JURA infoproviders, gridftp/http interfaces, the registration process, user mapping, CA certificates and users, alongside the SLURM controller; ARC components highlighted in orange]
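As a rough sketch of how these components map onto configuration, a skeleton arc.conf for such a head node could look as follows (again assuming the classic pre-ARC 6 syntax; the hostname, paths and queue name are placeholders):

    [common]
    hostname="ce.example.org"
    lrms="slurm"                               # submit through the SLURM controller
    gridmap="/etc/grid-security/grid-mapfile"  # user mapping

    [grid-manager]                             # A-REX, including DTR data staging
    controldir="/var/spool/arc/jobstatus"      # the control directory
    sessiondir="/var/spool/arc/session"

    [gridftpd]                                 # gridftp job submission interface

    [gridftpd/jobs]
    path="/jobs"
    plugin="jobplugin.so"

    [infosys]                                  # ARIS infoproviders and registration

    [queue/batch]                              # one section per SLURM partition
    name="batch"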

7 Integration with EGI services

8 Control tower for pilots: aCT
• ARC Control Tower is the layer between ATLAS and ARC
  – Picks up job descriptions from Panda
  – Converts them to XRSL job descriptions
  – Submits and manages jobs on ARC-CEs
  – Fetches output, handles common failures and updates Panda
• aCT2: based on the ARC API
• Modular
  – ARC actors: submitter, status checker, fetcher, cleaner
  – ATLAS actors: autopilot, panda2arc, atlas status checker, validator
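For reference, an XRSL job description of the kind aCT generates could look roughly like this; every value below is an illustrative placeholder rather than a real ATLAS job parameter:

    &(executable="runpilot.sh")
     (jobName="panda-job-example")
     (inputFiles=("pilot.tar.gz" "http://pandaserver.example.org/cache/pilot.tar.gz"))
     (outputFiles=("output.root" "srm://dcache.example.org/atlas/output.root"))
     (stdout="stdout.txt")
     (stderr="stderr.txt")
     (memory="2000")
     (cpuTime="1440")
     (runTimeEnvironment="ENV/PROXY")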

9 aCT2
• Runs on two servers at CERN, each with 24 CPUs and 50 GB RAM
  – Usually one for production, one for testing
  – Goal: 1M jobs/day
• The ATLAS part can be replaced by another workflow manager
  – Implemented in Python
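The modular design is easy to picture in Python. The sketch below is a hypothetical illustration of the actor pattern, not actual aCT code; all names in it are invented:

    import time

    class Actor:
        """One actor per job state; all actors share a job database."""
        state = None  # the job state this actor is responsible for

        def __init__(self, db):
            self.db = db  # shared job table (interface assumed)

        def process(self, job):
            raise NotImplementedError

        def run(self, poll_interval=60):
            # Each actor is an independent polling loop, so ARC-side and
            # ATLAS-side actors stay decoupled and can be swapped out.
            while True:
                for job in self.db.get_jobs(state=self.state):
                    self.process(job)
                time.sleep(poll_interval)

    def translate_to_xrsl(job):
        # Placeholder: in aCT this builds the XRSL from the Panda description
        return '&(executable="runpilot.sh")'

    def submit_to_ce(xrsl, ce):
        # Placeholder: in aCT submission goes through the ARC client API
        pass

    class Submitter(Actor):
        """ARC-side actor: pushes newly arrived jobs to an ARC-CE."""
        state = "tosubmit"

        def process(self, job):
            xrsl = translate_to_xrsl(job)
            submit_to_ce(xrsl, ce="ce.example.org")
            self.db.update(job, state="submitted")

Replacing the ATLAS-side actors (panda2arc and friends) with ones that talk to a different workflow manager leaves the ARC-side actors untouched.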

10 Gateways to supercomputers
• ATLAS is pursuing the use of HPC resources
  – Long-term funding guaranteed
  – Still not suitable for all kinds of jobs
• In Europe, ARC is often used as a “gateway”
  – C2PAP/SuperMUC, Hydra: via aCT
  – Piz Daint: via the ARC-CE SSH back-end (testing)
• Still, a lot remains to be done on both the HEP and HPC sides to make this usable

11 HPC challenges to overcome
• WAN access on nodes is limited or not available
• Whole-node, whole-socket or whole-partition scheduling
• Shared filesystems may suffer under heavy-I/O jobs
• Limited access to login/edge nodes
• HPC policies and procedures
  – Sites are tuned for a few classes of massively parallel applications with relatively low I/O
  – Limited time slots allocated
  – Access via an SSH login front node

12 Overcoming the difficulties
• Custom ARC-CE service machines:
  – Using RHEL6 packages on unsupported OSes
  – Porting to unsupported OSes (SLES11)
• User-mode services:
  – Adapting ARC-CE to run as a non-privileged user
  – Limited usability (uid mapped to a single batch account)
• No WAN access:
  – aCT, manual software synchronisation, Parrot, no database access
  – Experienced a GPFS lock-up due to heavy software-area usage (bug fixed by IBM)
• Limited connectivity to edge nodes:
  – ARC SSH back-end in the works

13 ATLAS@Home: BOINC via ARC
• The scale of volunteer computing:
  – At the Amazon EC2 price of 1.16 USD/hour for a CPU-intensive Windows/Linux instance (~1.5 GFLOPS), all BOINC activity combined would cost ~(7.2 PFLOPS / 1.5 GFLOPS) × 1.16 USD/hour ≈ 4.8 million instances × 1.16 USD/hour ≈ 5.57M USD/hour
  – Successful individual projects: Einstein@home (280 TFLOPS), SETI@home (581 TFLOPS), LHC@home (2 TFLOPS)
Slide by Wenjing Wu

14 ATLAS@Home
• Why use volunteer computing?
  – It’s free!
  – Public outreach
• Considerations
  – Low-priority jobs with a high CPU-to-I/O ratio: non-urgent Monte Carlo simulation
  – Virtualisation is needed for the ATLAS software environment: CERNVM image and CVMFS
  – No grid credentials or access on volunteer hosts: ARC middleware does the data staging
  – The resources should look like a regular Panda queue: ARC Control Tower

15 ATLAS@Home architecture
[Diagram: the ARC Control Tower pulls jobs from the Panda server (using a proxy certificate and a DB) and submits them to an ARC CE; a BOINC LRMS plugin moves work between the CE session directory and the BOINC server’s shared directory; volunteer PCs run the BOINC client inside a VM; inputs and outputs go through grid catalogs and storage]
http://atlasathome.cern.ch

16 Participants
No idea who they are!

17 Summary and outlook
• ARC-CE is now well established beyond the Nordics
  – More contributors, too
• aCT2 opens up many more possibilities, such as the use of HPC and volunteer computing
  – Very new; much optimisation is still needed
  – … and also documentation and packaging
• Future directions:
  – Focus on LHC requirements
  – Enhanced support for HPC systems, more batch-system options
  – More user-friendly task schedulers, building on the aCT2 experience

