Running LHC jobs using Kubernetes

Presentation transcript:

Running LHC jobs using Kubernetes
Andrew Lahiff, STFC Rutherford Appleton Laboratory
International Symposium on Grids & Clouds 2017

Overview
- Introduction
- Kubernetes
- Kubernetes on public clouds
- Running jobs from LHC experiments
- Summary

Introduction
- This work began as a Research Councils UK Cloud Working Group Pilot Project
- Aims to investigate:
  - portability between on-premises resources & public clouds
  - portability between multiple public clouds
- Using particle physics workflows as an example
  - initially using the CMS experiment
  - now also looking at ATLAS & LHCb

Introduction
- It can be difficult to transparently use both on-premises resources & public clouds
- It's even more difficult to use multiple public clouds
  - different APIs
  - providers offer different services with different APIs (auto scalers, load balancers, storage, ...)
- In HEP a lot of effort has gone into developing software to interact with the APIs of different cloud providers
- Question: is there a simpler way? Why can't we just instantly use any cloud?

Kubernetes
- Kubernetes is an open-source container cluster manager
  - originally developed by Google, donated to the Cloud Native Computing Foundation
  - schedules & deploys containers onto a cluster of machines, e.g. ensures that a specified number of instances of an application are running
  - provides service discovery, distribution of configuration & secrets, ...
  - provides access to persistent storage
- Pod: the smallest deployable unit of compute (a minimal manifest is sketched below)
  - consists of one or more containers that are always co-located, co-scheduled & run in a shared context
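To make the Pod concept concrete, a minimal Pod manifest could look roughly as follows. This is an illustrative sketch: the name, image and resource values are placeholders, not taken from the talk.

```yaml
# Minimal Pod sketch -- the name, image & resource values are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: example-pilot
spec:
  containers:
  - name: pilot
    image: example.org/lhc-pilot:latest   # hypothetical pilot image
    resources:
      requests:
        cpu: "1"        # one CPU core requested for this container
        memory: 2Gi
```

Such a manifest would be submitted with kubectl create -f pod.yaml; Kubernetes then schedules the Pod onto a suitable node in the cluster.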

Kubernetes Architecture
- API server: provides a REST API
- kubelet: runs pods
- scheduler: assigns pods to nodes
- kube-proxy: TCP/UDP forwarding
- controller manager: runs control loops
- etcd: key-value store
[Diagram: master components (API server, scheduler, controller manager, etcd) and node components (kubelet, kube-proxy)]

Why Kubernetes?
- It can be run anywhere
  - on-premises: bare metal, OpenStack, ...
  - public clouds: Google, Azure, AWS, ...
- Aim is to use Kubernetes as an abstraction layer
  - migrate to containerised applications managed by Kubernetes & use only the Kubernetes API
  - can then run out-of-the-box on any Kubernetes cluster
- Avoid vendor lock-in as much as possible by not using any vendor-specific APIs or services
  - except where Kubernetes provides an abstraction, e.g. storage, load balancers

Kubernetes on public clouds
- Availability on the major cloud providers:
  [Table: native support & node auto-scaling for Google, Azure & AWS]
- There are a variety of CLI tools available that automate deployment of Kubernetes
  - enable production-quality Kubernetes clusters on AWS & other clouds to be deployed quickly & easily, including node auto-scaling

Kubernetes on public clouds
- Third parties can also provide Kubernetes on public clouds
  - e.g. StackPointCloud: supports Google, Azure, AWS & DigitalOcean
  - provides features such as node auto-scaling

How LHC jobs are run
- All 4 LHC experiments use pilot frameworks to run jobs
- What does this mean?
  - a pilot factory submits pilot jobs to sites
  - pilot jobs connect to the experiment's job queue, pull down the actual payload job & execute it
[Diagram: experiment workload management system (job creator, job queue, pilot factory) submitting pilot jobs to a grid site, where the pilot runs the payload job]

Previous work on clouds
- Much work by many groups over the past 5-6 years
  - both private (e.g. OpenStack) & public clouds (mainly AWS, more recently Azure, Google)
  - generally all involve variations on methods for instantiating VMs on demand
- The VMs either:
  - are virtual worker nodes in an HTCondor batch system
  - directly run the experiment's pilot

Issues with this approach
- Usually involves software which needs to talk to each cloud API in order to provision VMs
  - e.g. the CMS experiment uses HTCondor, which can provision VMs on clouds supporting the EC2 or Google APIs
  - adding the ability to use Azure would require modifications to both CMS software & third-party software (HTCondor)
- Some projects have used cloud-provider-specific services (sometimes many services)
  - e.g. one HEP tool uses 10 AWS services
  - this is a problem if you want or need to use resources from a different provider as well
- Using Kubernetes as an abstraction layer avoids these problems

Running LHC jobs
- Some ways LHC jobs could be run on Kubernetes:
  - each experiment's workload management system could talk to the Kubernetes API
    - not possible for a site to do on its own, requires effort from the experiments
  - modify a grid Computing Element (CE) to support Kubernetes as an alternative batch system
  - use the "vacuum" model
    - directly run the experiment's pilot agent
    - scale up if there is work, scale down when there is no work
- Using the vacuum model is the simplest method
  - we're also looking at running an HTCondor pool on Kubernetes

Container image
- We need to run the pilot agent in a container with
  - appropriate software installed (like an SL6 grid worker node)
  - access to the CernVM File System (CVMFS)
- Need to do this in a way that doesn't require any modifications to the host where the container will run
  - e.g. installing CVMFS on the host breaks portability
- Currently the only option is to run a privileged container which both
  - provides CVMFS
  - runs the pilot agent as an unprivileged user
- The container exits when the pilot exits (a sketch of such a Pod spec follows below)
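A hedged sketch of what such a privileged pilot Pod might look like, assuming a hypothetical image that mounts CVMFS itself and then drops to an unprivileged user to run the pilot (the image name is a placeholder, not the one used in this work):

```yaml
# Sketch of a privileged pilot Pod -- image name is a placeholder
apiVersion: v1
kind: Pod
metadata:
  name: grid-pilot
spec:
  restartPolicy: Never          # the container exits when the pilot exits
  containers:
  - name: pilot
    image: example.org/sl6-pilot-cvmfs:latest   # hypothetical SL6 + CVMFS + pilot image
    securityContext:
      privileged: true          # required so the container can mount CVMFS itself
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
```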

Resources used for testing
- On-premises resources & public clouds:
  - RAL: Kubernetes installed using kubeadm on bare-metal nodes (see the sketch below)
  - Azure: Azure Container Service (initially used a CLI from Weaveworks)
  - Google: Google Container Engine
  - Amazon: used StackPointCloud to provision Kubernetes
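For reference, a kubeadm-based installation on bare metal roughly follows the pattern below. This is a minimal sketch: the exact flags differ between Kubernetes versions, the token and address are placeholders, and the pod-network add-on step is omitted.

```sh
# Minimal kubeadm sketch (flags vary by version; network add-on step omitted)

# On the master node:
kubeadm init

# On each bare-metal worker node, using the token reported by 'kubeadm init':
kubeadm join --token <token> <master-address>:<port>
```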

A simple test on Azure
- HTCondor pool + squid caches running in containers on Kubernetes
- Manually create & submit CMS MC production jobs to the HTCondor schedd
- Replication controllers managing the negotiator, collector, schedd, worker nodes & squids
- Horizontal pod autoscaler for the worker nodes, scaling based on CPU usage (sketched below)
[Diagram: HTCondor collector, negotiator & schedd pods, squid pods, and worker-node pods managed by a horizontal pod autoscaler]
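For illustration, a CPU-based horizontal pod autoscaler targeting the worker-node replication controller could be written roughly as follows. The names, replica limits and CPU threshold are placeholders, not the values used in this test.

```yaml
# Sketch of a CPU-based autoscaler for the HTCondor worker-node pods
# (names & thresholds are illustrative placeholders)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: condor-workers
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: condor-workers               # RC managing the worker-node pods
  minReplicas: 1
  maxReplicas: 50                      # illustrative upper limit
  targetCPUUtilizationPercentage: 60   # scale up when average CPU exceeds 60%
```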

A simple test on Azure
[Plots: cluster resource limits & usage, and example squid container resource limits & usage — the number of worker pods increases when CMS MC production jobs are submitted and decreases when the jobs are killed]
- Monitoring used Heapster with InfluxDB: https://github.com/kubernetes/heapster/blob/master/docs/influxdb.md

Issues
- Auto-scaling based on CPU usage is of course not perfect
  - CPU usage can vary during the life of a pilot (or job)
  - Kubernetes won't necessarily kill 'idle' pods when scaling down, so it's quite likely that running jobs will be killed
  - this is definitely a problem for real jobs; confirmed this does happen for ATLAS & LHCb
[Plot: CPU usage of an ATLAS pilot over time]

Improvements
- Instead of using a replication controller to manage the worker nodes (or pilots), do something else
- Create a custom 'controller': a Python script which can create pods as needed
  - runs in a container in the cluster, uses the Kubernetes REST API
  - doesn't kill running pods, lets them finish on their own
- Two versions (a sketch of the worker-pool flavour follows below):
  - worker pool: maintains a specified number of running pods
  - vacuum: scales up & down depending on how much work there is
  - could also monitor an HTCondor pool & create pods if there are idle jobs
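As an illustration of the worker-pool flavour, a minimal controller along these lines could look roughly as follows. This is a reconstruction, not the actual script from the talk: it uses the official Kubernetes Python client (the talk only says the script uses the Kubernetes REST API), and the namespace, label, image and target pod count are placeholders.

```python
# Worker-pool controller sketch: keep a fixed number of pilot pods active and
# never delete pods, so finished pilots simply drop out of the count.
import time
from kubernetes import client, config

NAMESPACE = "default"
LABEL_SELECTOR = "app=pilot"
TARGET_PODS = 10                      # desired number of active pilot pods


def make_pilot_pod():
    """Build a simple pilot Pod object (the image is a placeholder)."""
    return client.V1Pod(
        metadata=client.V1ObjectMeta(
            generate_name="pilot-",
            labels={"app": "pilot"},
        ),
        spec=client.V1PodSpec(
            restart_policy="Never",   # let each pilot finish on its own
            containers=[
                client.V1Container(
                    name="pilot",
                    image="example.org/lhc-pilot:latest",
                )
            ],
        ),
    )


def main():
    config.load_incluster_config()    # the controller runs inside the cluster
    v1 = client.CoreV1Api()
    while True:
        pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR)
        active = [p for p in pods.items
                  if p.status.phase in ("Pending", "Running")]
        # Only ever create pods; running pilots are never killed
        for _ in range(TARGET_PODS - len(active)):
            v1.create_namespaced_pod(NAMESPACE, make_pilot_pod())
        time.sleep(60)


if __name__ == "__main__":
    main()
```

The vacuum flavour would differ only in how the target is chosen: instead of a fixed TARGET_PODS, it would be derived from how much work the experiment currently has.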

Credentials
- Need to provide an X509 proxy to authenticate to the experiment's pilot framework
  - store a host certificate & key as a Kubernetes Secret
  - a cron runs a container which creates a short-lived proxy from the host certificate & updates a Secret with this proxy
- Containers running jobs (see the sketch below):
  - the proxy is made available as a file
  - new containers will have the updated proxy
  - the proxy is updated in already running containers
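A sketch of how the refreshed proxy Secret could be consumed by a pilot Pod. The Secret name, mount path and image are placeholders; X509_USER_PROXY is the conventional grid environment variable pointing at the proxy file.

```yaml
# Sketch: mounting the proxy Secret as a file inside the pilot container
# (Secret name, mount path & image are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: pilot-with-proxy
spec:
  containers:
  - name: pilot
    image: example.org/lhc-pilot:latest       # placeholder image
    env:
    - name: X509_USER_PROXY
      value: /etc/proxy/x509up                # the pilot reads the proxy here
    volumeMounts:
    - name: proxy
      mountPath: /etc/proxy
      readOnly: true
  volumes:
  - name: proxy
    secret:
      secretName: grid-proxy                  # Secret refreshed by the cron job
```

Because Secret volumes are projected as files that Kubernetes refreshes when the Secret changes, already running containers see the updated proxy without being restarted, matching the behaviour described above.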

Running LHC jobs
- To summarize, we can start with
  - a single yaml file
  - a host certificate & key
- Run a single command (kubectl create -f file.yaml) to run jobs from an LHC experiment on any Kubernetes cluster
  - services managed by Kubernetes
  - self-healing: e.g. if a node dies, any services running on it will be re-scheduled onto another node
  - elastic: the number of pilots will scale depending on work

More realistic CMS testing
- Used standard CRAB3 analysis jobs with a test glideinWMS instance at RAL
  - CRAB: system for running CMS analysis
  - glideinWMS: pilot framework used by CMS
[Diagram: workflows submitted to the CRAB server at CERN create jobs in the glideinWMS HTCondor pool at RAL, which run on startds (with squid caches) in the Kubernetes cluster]

More realistic CMS testing
- Monte Carlo generation & simulation is a CPU-intensive workflow (simplest CMS workflow)
- Ran the same workflow (~5000 jobs in total) at RAL & on 2 public clouds
  - data staged-out to a standard grid site
  - no attempt at any tuning or optimisations
- CPU efficiency & failure rate similar across the 3 resources
- CPU & wall time ~30% faster on Azure compared to RAL

ATLAS & LHCb
- Over the past week we have been running real ATLAS & LHCb jobs on Kubernetes at RAL (at a small scale)
- Using containers which
  - run a startd which joins an existing ATLAS HTCondor pool
  - run the LHCb DIRAC agent
[Plots: successful ATLAS jobs; cluster resource limits & usage]

Summary
- Using Kubernetes is a promising way of running LHC workloads
  - identical for both on-premises resources & public clouds
  - very simple: run a single command to create an elastic, self-healing site
  - have successfully run ATLAS, CMS & LHCb jobs
- Ongoing & future work
  - large-scale tests with ATLAS on Azure
    - a new site within ATLAS is currently being set up
  - will also test Azure blob storage (via Dynafed) & I/O-intensive workloads

Summary
- Further improvements
  - once Kubernetes supports different mount propagation modes, we should be able to run jobs & CVMFS in different containers
    - will then be able to run jobs in unprivileged containers
- Kubernetes federations
  - make it very easy to run applications across multiple clusters
  - have done some basic tests, but have not yet run real jobs
  - have the potential to be very useful for bursting from a local cluster to public clouds

Acknowledgements
- Microsoft (Azure Research Award)
- Google
- Amazon