ARC-CE: updates and plans
Oxana Smirnova, NeIC/Lund University
1 July 2014, Grid 2014, Dubna
Using input from: D. Cameron, A. Filipčič, J. Kerr Nilsen, B. Kónya
ARC-CE: brief summary
Compute element by NorduGrid
A key component of the ARC middleware
– Other components: clients, information services
A key enabler of the Nordic Tier-1
– Along with the distributed dCache storage
Used by all LHC experiments
– Mostly by ATLAS (so far); increasing use by CMS
– The only CE used in the Nordic and Baltic countries
– Main CE in Slovenia and Ukraine
– Increasing use in the UK and Germany
ARC-CE instances in GOCDB
(cf. CREAM-CE: 566 instances as of today)
ARC-CE geography
Data as of end-2013
Optimisation for data-intensive jobs
ARC-CE can do all the data transfers
– Allows caching of frequently used files
– Minimises bandwidth usage
– Maximises worker-node usage efficiency
ARC-CE is a rather complex service
– Built of many individual services and tools
– Requires high-end storage for the cache (see the configuration sketch below)
[Diagram: a Linux cluster whose worker nodes receive input via the ARC-CE cache on dedicated storage, contrasted with another CE whose worker nodes pull data from storage directly]
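As an illustration, a minimal arc.conf sketch of how the cache is typically enabled on an ARC-CE (classic [grid-manager] syntax of the ARC 4/5 era; paths and thresholds are illustrative, not a recommendation):

    [grid-manager]
    controldir="/var/spool/arc/jobstatus"
    sessiondir="/var/spool/arc/session"
    # cache on dedicated high-end storage; cached inputs are linked
    # (or copied) into the job's session directory
    cachedir="/var/spool/arc/cache"
    # clean the cache when the filesystem passes 80% full,
    # down to 70% ("high low" watermarks, in percent)
    cachesize="80 70"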
ARC-CE components on a (SLURM) cluster
[Diagram: head node running the SLURM controller alongside A-REX with its DTR data-staging component, ARIS/JURA infoproviders, gridftp/http interfaces, the control directory, user mapping, CA certificates and the registration process; ARC components highlighted in orange]
Integration with EGI services
Control tower for pilots: aCT
ARC Control Tower is the layer between ATLAS and ARC
– Picks up job descriptions from Panda
– Converts them to xRSL job descriptions (illustrated below)
– Submits and manages jobs on ARC-CEs
– Fetches output, handles common failures and updates Panda
aCT2: based on the ARC API
Modular
– ARC actors: submitter, status checker, fetcher, cleaner
– ATLAS actors: autopilot, panda2arc, atlas status checker, validator
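For illustration, a minimal sketch of the kind of xRSL job description aCT generates from a Panda job (the attribute names are standard xRSL; the script name, queue, file names and URLs are hypothetical):

    &( executable = "runpilot.sh" )
     ( arguments = "--queue" "NDGF_MCORE" )
     ( inputFiles = ( "EVNT.pool.root" "srm://srm.example.org/atlas/EVNT.pool.root" ) )
     ( outputFiles = ( "HITS.pool.root" "srm://srm.example.org/atlas/HITS.pool.root" ) )
     ( stdout = "pilot.out" )( stderr = "pilot.err" )
     ( gmlog = "gmlog" )
     ( jobName = "panda-1234567890" )
     ( count = "1" )( memory = "2000" )( cpuTime = "1440" )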
aCT2
Implemented in Python
Runs on two 24-CPU, 50 GB RAM servers at CERN
– Usually one for production, one for testing
– Goal: 1 Mjobs/day
The ATLAS part can be replaced by another workflow manager (the actor pattern is sketched below)
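A hypothetical Python sketch of that actor pattern (class and method names are illustrative, not actual aCT code; "ce" stands for an assumed CE client interface). Each actor polls a shared job table and acts only on jobs in "its" state, which is what keeps the ARC side and the experiment side decoupled:

    import time

    class JobTable:
        """Toy stand-in for aCT's database of jobs."""
        def __init__(self, jobs):
            self.jobs = jobs                      # list of dicts

        def fetch(self, state):
            return [j for j in self.jobs if j["state"] == state]

    class Submitter:
        """ARC actor: push queued jobs to an ARC-CE."""
        def __init__(self, table, ce):
            self.table, self.ce = table, ce

        def run_once(self):
            for job in self.table.fetch("tosubmit"):
                job["arcid"] = self.ce.submit(job["xrsl"])
                job["state"] = "submitted"

    class StatusChecker:
        """ARC actor: poll the CE and record state transitions."""
        def __init__(self, table, ce):
            self.table, self.ce = table, ce

        def run_once(self):
            for job in self.table.fetch("submitted"):
                job["state"] = self.ce.query(job["arcid"])

    def run(actors, interval=60):
        while True:                               # each actor polls independently
            for actor in actors:
                actor.run_once()
            time.sleep(interval)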
Gateways to supercomputers
ATLAS is pursuing the usage of HPC resources
– Long-term funding guaranteed
– Still not suitable for all kinds of jobs
In Europe, ARC is often used as a “gateway”
– C2PAP/SuperMUC, Hydra: via aCT
– Piz Daint: via the ARC-CE SSH back-end (testing)
Still, a lot remains to be done on both the HEP and HPC sides to make this usable
HPC challenges to overcome
WAN access on nodes is limited or not available
Whole-node, whole-socket, whole-partition scheduling
Shared filesystem might suffer under heavy-I/O jobs
Limited access to login/edge nodes
HPC policies and procedures
– Sites are tuned for a few classes of massively parallel applications with relatively low I/O
– Limited time slots allocated
– Access via an SSH login front node
Overcoming the difficulties
Custom ARC-CE service machines:
– Using RHEL6 packages on unsupported OSes
– Porting to unsupported OSes (SLES11)
User-mode services:
– Adapting ARC-CE to run as a non-privileged user
– Limited usability (uid mapped to one batch account)
No WAN access:
– aCT, manual software sync, Parrot, no database access
Experienced a GPFS lock-up due to heavy software-area usage (bug fixed by IBM)
Limited connectivity to edge nodes:
– ARC SSH back-end in the works (idea sketched below)
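The idea behind an SSH back-end can be sketched as follows: the CE runs outside the HPC machine and drives the batch system through the site's SSH login node. A toy Python illustration, not the actual ARC implementation (the host name is made up):

    import subprocess

    LOGIN = "user@login.hpc.example.org"   # assumed SSH front node

    def sbatch_over_ssh(script_path):
        """Copy a job script to the login node and submit it with sbatch."""
        subprocess.run(["scp", script_path, f"{LOGIN}:job.sh"], check=True)
        out = subprocess.run(["ssh", LOGIN, "sbatch", "job.sh"],
                             check=True, capture_output=True, text=True)
        return out.stdout.split()[-1]      # sbatch prints "Submitted batch job <id>"

    def squeue_over_ssh(job_id):
        """Query the job state with squeue on the login node."""
        out = subprocess.run(["ssh", LOGIN, "squeue", "-h", "-o", "%T", "-j", job_id],
                             capture_output=True, text=True)
        return out.stdout.strip() or "COMPLETED"   # gone from the queue = finished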
BOINC via ARC
The scale of volunteer computing:
– At the Amazon EC2 price of 1.16 USD/hour for a CPU-intensive Windows/Linux instance (~1.5 GFLOPS), all BOINC activity would cost ~ (7.2 PetaFLOPS / 1.5 GFLOPS) × 1.16 USD ≈ 5.56M USD/hour (worked out below)
– Successful individual projects: (280 TeraFLOPS), (581 TeraFLOPS), (2 TeraFLOPS)
Slide by Wenjing Wu
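A quick back-of-envelope check of that figure in Python, using only the numbers from the slide:

    # Price one volunteer-equivalent EC2 instance and scale to all of BOINC
    total_flops = 7.2e15      # aggregate BOINC throughput: 7.2 PetaFLOPS
    instance_flops = 1.5e9    # one CPU-intensive EC2 instance: ~1.5 GFLOPS
    price = 1.16              # USD per instance-hour

    cost = total_flops / instance_flops * price
    print(cost / 1e6)         # 5.568, i.e. the slide's ~5.56M USD/hour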
Why use volunteer computing?
– It’s free!
– Public outreach
Considerations
– Low-priority jobs with a high CPU-to-I/O ratio: non-urgent Monte Carlo simulation
– Virtualisation is needed for the ATLAS software environment: CERNVM image and CVMFS
– No grid credentials or access on volunteer hosts: ARC middleware for data staging
– The resources should look like a regular Panda queue: ARC Control Tower
Architecture
[Diagram: Panda Server ↔ ARC Control Tower (holding the proxy cert) → ARC CE with its session directory → BOINC LRMS plugin → BOINC server DB → BOINC client on a volunteer PC, running a VM with a shared directory; input/output staged against grid catalogs and storage]
Participants
No idea who they are!
Summary and outlook
ARC-CE is now well established beyond the Nordics
– More contributors, too
aCT2 opens up many more possibilities, such as the use of HPC and volunteer computing
– Very new; much optimisation is still needed
– … and also documentation and packaging
Future directions:
– Focus on LHC requirements
– Enhance support for HPC systems, more batch-system options
– Develop more user-friendly task schedulers, building on aCT2 experience