CMS Experience with Indigo DataCloud


CMS Experience with Indigo DataCloud
Future challenges for HEP Computing, CNAF, 24.10.2016
Sonia Taneja, Stefano dal Pra: INFN-CNAF
Marica Antonacci, Giacinto Donvito: INFN-BA
Tommaso Boccali: INFN-PI
Daniele Spiga: INFN-PG

Outline
- Brief introduction to INDIGO-DataCloud (slides from Davide Salomoni and Giacinto Donvito)
- The CMS experience using INDIGO solutions: motivations and objectives, architecture, status, next steps

The Project: INDIGO-DataCloud

INDIGO "simplified" architecture

Status of the INDIGO Project

INDIGO: The building blocks

The CMS experience using INDIGO… …or how we use (some) INDIGO-developed solutions to integrate a well-consolidated (and complex) computing model with modern technologies.
Our challenge is to develop a solution for generating an on-demand, container-based cluster to run CMS analysis workflows, in order to allow:
- the effective use of opportunistic resources, such as general-purpose campus facilities;
- the dynamic extension of an already existing dedicated facility.

HTCondor@CMS
The CMS Workload Management System relies entirely on HTCondor. CMS runs three main HTCondor pools:
- ITB, for integration/testing purposes
- Tier0, for all T0 activities
- Global Pool (all the rest): analysis and production
CMS follows a pilot-driven model: pilots are generated either by GlideinWMS or via autostart mechanisms. End users are provided with a friendly interface called CRAB to submit analysis jobs to the HTCondor queue: CRAB packs the user's local analysis environment (configs, libs, etc.) and submits the actual job to HTCondor. A minimal configuration sketch follows.
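As an illustration only (request name, dataset and storage site below are hypothetical placeholders, not values from the talk), a minimal CRAB3 configuration of the kind a user passes to `crab submit` might look like this:

```python
# Minimal CRAB3 configuration sketch (illustrative values, not taken from the talk)
from CRABClient.UserUtilities import config

config = config()

config.General.requestName = 'indigo_mesos_test'   # hypothetical request name
config.General.workArea    = 'crab_projects'

config.JobType.pluginName  = 'Analysis'
config.JobType.psetName    = 'pset.py'             # the user's CMSSW configuration file

config.Data.inputDataset   = '/SomePrimaryDataset/SomeCampaign/MINIAOD'  # placeholder dataset
config.Data.splitting      = 'FileBased'
config.Data.unitsPerJob    = 10

config.Site.storageSite    = 'T2_IT_Pisa'          # hypothetical output storage site
```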

HTCondor: a central role
Putting everything together, our objective is to demonstrate the ability to dynamically add opportunistic resources to an already existing HTCondor pool. Moreover, we aim to provide a friendly and secure solution. CMS represents a perfect case study to test and evaluate our solution.

Expected impacts
By simplifying and automating the process of creating, managing and accessing a pool of computing resources, the project aims to improve:
- Site management: a simple solution for dynamic/elastic T2 extensions on "opportunistic"/stable resources, and a friendly procedure to dynamically instantiate a spot "Tier3-like resource centre".
- User experience: generation of an ephemeral T3 on demand, seen by the experiment computing infrastructure as a personal WLCG-type facility serving a group of collaborators. The system must allow the use of standard CMS tools such as CRAB.
- Experiment/collaboration resources: a comprehensive approach to opportunistic computing, and a solution to access and orchestrate multiple providers (e.g. campus centres), harvesting all the free CPU cycles without major deployment effort.

Four Pillars
- Cluster management: Mesos clusters as the solution for running Docker containers for all the services required by a regular CMS site (worker nodes, HTCondor schedd and squids). Marathon guarantees the dynamic scaling up and down of resources, a key point (see the sketch after this list).
- AuthN/Z & credential management: the INDIGO Identity and Access Management (IAM) service is responsible for AuthN/Z at cluster generation. The Token Translation Service (TTS) enables the conversion of IAM tokens into X.509 certificates. Note: this allows Mesos slaves (running the HTCondor startd daemon) to join the CMS central queue (HTCondor schedd) as regular Grid WNs.
- Data management: Dynafed & FTS is the approach currently followed by the project. We will investigate Oneclient (from Onedata) as a tool for mounting a remote POSIX file system.
- Automation: TOSCA templates, meant to be managed by the INDIGO PaaS Orchestrator, allow the automation of the overall setup. The aim is to produce a single YAML file describing the setup of all required services and dependencies. CLUES is able to scale the Mesos cluster as needed by the load of the user jobs.
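As a sketch of the cluster-management pillar (not code from the project: the Marathon endpoint, application id, Docker image and resource figures are all assumptions), an HTCondor worker-node container could be registered with Marathon and later rescaled through its REST API:

```python
# Sketch: deploy and rescale an HTCondor WN container via the Marathon REST API.
# Endpoint, app id, image and resource sizes are illustrative assumptions.
import requests

MARATHON = 'http://marathon.example.org:8080'   # hypothetical Marathon endpoint

wn_app = {
    'id': '/cms/htcondor-wn',                   # hypothetical application id
    'instances': 3,                             # number of WN containers to start
    'cpus': 4.0,
    'mem': 8192,
    'container': {
        'type': 'DOCKER',
        'docker': {'image': 'example/cms-wn:latest', 'network': 'HOST'},  # placeholder image
    },
}

# Create the application (one WN container per instance)
requests.post(MARATHON + '/v2/apps', json=wn_app).raise_for_status()

# Later: scale the WN pool up or down simply by changing the instance count
requests.put(MARATHON + '/v2/apps/cms/htcondor-wn',
             json={'instances': 10}).raise_for_status()
```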

Architecture (diagram on the slide)
The user submits, via a CRAB config, an analysis job description pointing to SITENAME to a schedd (CMS central or private). The PaaS Orchestrator reads the SITENAME description (number and type of services: squids, a schedd if needed, the desired range of WNs, Onedata/Dynafed-attached storage, TFC rules, fallback strategy, temporary storage to be used) and, with IAM providing AuthN/Z and the TTS translating credentials, instantiates a Mesos cluster on one or more clouds (e.g. VM#1 running Squid1, VM#2-4 running WN1-3). The WNs join the HTCondor pool, and data are accessed as defined in the TFC (Onedata, Dynafed, Xrootd federation).

The prototype… it is working (Marathon apps monitoring)
The Mesos cluster generation has been fully automated (not yet with TOSCA). We added the CMS specifications to the HEAT (+Ansible) INDIGO template for a Mesos cluster: Squid proxy (Docker), CVMFS setup, WN (Docker), proxy manager service (Docker). Currently, from the user's perspective, generating such a cluster means launching a single stack-creation command (shown on the slide), where env_heat allows a few parameters, such as the size of the cluster, to be specified. A programmatic sketch follows.
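Purely as a hedged illustration of what such a stack creation could look like programmatically (the Heat endpoint, token, stack name and file names are assumptions; the actual procedure used the OpenStack Heat client with the INDIGO template plus an env_heat environment file):

```python
# Sketch: create the Mesos-cluster HEAT stack with python-heatclient.
# Endpoint, token, stack name and file names are illustrative assumptions.
import yaml
from heatclient.client import Client

HEAT_ENDPOINT = 'https://cloud.example.org:8004/v1/TENANT_ID'   # hypothetical Heat endpoint

heat = Client('1', endpoint=HEAT_ENDPOINT, token='KEYSTONE_TOKEN')  # placeholder token

with open('mesos_cluster.yaml') as f:     # INDIGO HEAT template with the CMS additions
    template = yaml.safe_load(f)
with open('env_heat.yaml') as f:          # environment: cluster size, flavors, ...
    environment = yaml.safe_load(f)

heat.stacks.create(stack_name='cms-ephemeral-t3',
                   template=template,
                   environment=environment)
```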

A real test with CRAB jobs
Regular CRAB jobs successfully run on the generated Mesos cluster (the slide shows an excerpt of the CRAB config, the condor_q output and the CMS Dashboard job monitoring; a query sketch follows).
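To illustrate the kind of check behind the condor_q screenshot (purely a sketch: the job owner below is an assumption, and the real monitoring used the standard condor_q CLI and the CMS Dashboard), the HTCondor Python bindings could be used like this:

```python
# Sketch: query the schedd for a user's jobs with the HTCondor Python bindings.
# The owner name is illustrative; the talk used plain condor_q and the CMS Dashboard.
import htcondor

coll = htcondor.Collector()                            # collector from the local config
schedd_ad = coll.locate(htcondor.DaemonTypes.Schedd)   # locate the (default) schedd
schedd = htcondor.Schedd(schedd_ad)

jobs = schedd.query('Owner == "cmsuser"',              # hypothetical user
                    ['ClusterId', 'ProcId', 'JobStatus', 'RemoteHost'])
for ad in jobs:
    print(ad.get('ClusterId'), ad.get('ProcId'),
          ad.get('JobStatus'), ad.get('RemoteHost'))
```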

… Inside the Mesos cluster
Worker nodes (Docker containers) are running and executing the CMS analysis payload.

About AuthN/Z
The Mesos cluster generation requires a valid IAM access token (OpenID Connect). All the rest is handled transparently to the user thanks to:
- delegation of credentials among services;
- credential renewal (through the token-exchange feature);
- the Token Translation Service, which creates credentials for services that do not integrate OIDC.
We generate our cluster passing the IAM token as the incoming credential, and our WNs (startd) authenticate with the schedd using the resulting X.509 proxy (shown on the slide). A token-exchange sketch follows.
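As a hedged illustration of the token-exchange step (the IAM endpoint, client id/secret and scopes are assumptions, not values from the project), a standard OAuth 2.0 token-exchange request (RFC 8693), as supported by INDIGO IAM, looks roughly like this:

```python
# Sketch: exchange a subject (user) access token for a new token via OAuth 2.0
# token exchange (RFC 8693). Endpoint and client credentials are illustrative.
import requests

IAM_TOKEN_ENDPOINT = 'https://iam.example.org/token'   # hypothetical IAM token endpoint
incoming_iam_access_token = '...'                      # the user's IAM access token (placeholder)

resp = requests.post(
    IAM_TOKEN_ENDPOINT,
    auth=('my-client-id', 'my-client-secret'),         # hypothetical registered client
    data={
        'grant_type': 'urn:ietf:params:oauth:grant-type:token-exchange',
        'subject_token': incoming_iam_access_token,
        'subject_token_type': 'urn:ietf:params:oauth:token-type:access_token',
        'scope': 'openid offline_access',              # e.g. to obtain a refresh token
    },
)
resp.raise_for_status()
new_tokens = resp.json()   # contains access_token (and possibly refresh_token)
```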

Data Management
So far we rely on the storage solutions already adopted by the CMS computing model:
- job input data are accessed through the Xrootd data federation;
- job output is copied out with gfal-cp (a sketch of such a copy follows).
As a next step, the plan for the second phase of this project foresees the exploitation of the INDIGO data management solutions: Onedata and Dynafed.
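For illustration only (source and destination URLs are placeholders, and the jobs themselves call the gfal command-line tools rather than the Python API), a stage-out copy with the gfal2 Python bindings would look like:

```python
# Sketch: copy a job output file to remote storage with the gfal2 Python bindings.
# The actual jobs use the gfal CLI; the URLs below are placeholders.
import gfal2

ctx = gfal2.creat_context()

params = ctx.transfer_parameters()
params.overwrite = True          # replace an existing destination file if present
params.create_parent = True      # create missing destination directories

src = 'file:///tmp/analysis_output.root'                         # local job output
dst = 'srm://storage.example.org/cms/store/user/output.root'     # hypothetical SE path

ctx.filecopy(params, src, dst)
```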

Upcoming challenges
- Moving from HEAT to TOSCA in order to exploit the INDIGO PaaS Orchestrator.
- Dynamic auto-scaling of the cluster, e.g. by integrating CLUES.
- Including a schedd among the services running on the Mesos cluster: this allows additional setups to be tested (flocking configuration) and represents a step towards a more general solution.
- Multi-IaaS interoperability: testing the proposed solution making effective use of opportunistic and possibly public providers.

Conclusion
- We developed a prototype for generating an "ephemeral T3 site as a service".
- We enabled the complete automation of the cluster generation.
- We integrated the INDIGO credential harmonization.
- We successfully executed CMS analysis jobs.
Although the presented prototype focuses on CMS workflows, the adopted approach can easily be generalized, e.g. to anyone who can integrate HTCondor.