1
Running LHC jobs using Kubernetes
Andrew Lahiff, STFC Rutherford Appleton Laboratory
International Symposium on Grids & Clouds 2017
2
Overview
  Introduction
  Kubernetes
  Kubernetes on public clouds
  Running jobs from LHC experiments
  Summary
3
Introduction
This work began as a Research Councils UK Cloud Working Group Pilot Project
Aim: to investigate
  portability between on-premises resources & public clouds
  portability between multiple public clouds
Using particle physics workflows as an example
  initially using the CMS experiment
  now also looking at ATLAS & LHCb
4
Introduction
It can be difficult to transparently use
  on-premises resources
  public clouds
It's even more difficult to use multiple public clouds
  different APIs
  different services, each with their own APIs: auto-scalers, load balancers, storage, ...
In HEP a lot of effort has gone into developing software to interact with the APIs of different cloud providers
Question: is there a simpler way? Why can't we just instantly use any cloud?
5
Kubernetes
Kubernetes is an open-source container cluster manager
  originally developed by Google, donated to the Cloud Native Computing Foundation
  schedules & deploys containers onto a cluster of machines, e.g. ensures that a specified number of instances of an application are running
  provides service discovery, distribution of configuration & secrets, ...
  provides access to persistent storage
Pod: the smallest deployable unit of compute
  consists of one or more containers that are always co-located, co-scheduled & run in a shared context
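As a minimal illustration (not taken from the talk), a pod with a single container can be described by a manifest like the following; the names and image are placeholders:

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-pod            # hypothetical name
    labels:
      app: example
  spec:
    containers:
    - name: main                 # a pod may list several co-located containers here
      image: busybox             # placeholder image
      command: ["sleep", "3600"]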
6
Kubernetes Architecture
  API server: provides a REST API
  scheduler: assigns pods to nodes
  controller manager: runs control loops
  etcd: key-value store
  kubelet: runs pods
  kube-proxy: TCP/UDP forwarding
(Diagram: the API server, scheduler, controller manager & etcd run on the master; the kubelet & kube-proxy run on each node)
7
Why Kubernetes?
It can be run anywhere
  on-premises: bare metal, OpenStack, ...
  public clouds: Google, Azure, AWS, ...
Aim is to use Kubernetes as an abstraction layer
  migrate to containerised applications managed by Kubernetes & use only the Kubernetes API
  can then run out-of-the-box on any Kubernetes cluster
Avoid vendor lock-in as much as possible by not using any vendor-specific APIs or services, except where Kubernetes provides an abstraction (e.g. storage, load balancers)
8
Kubernetes on public clouds
Availability on the major cloud providers:
              Native support    Node auto-scaling
  Google      ✔                 ✔
  Azure       ✔                 ✗
  AWS         ✗                 ✗
There are a variety of CLI tools available
  automate deployment of Kubernetes
  enable production-quality Kubernetes clusters on AWS & other clouds to be deployed quickly & easily, including node auto-scaling
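As one illustration (the talk does not name a specific tool), kops is a CLI of this kind; a cluster on AWS could be created along these lines, with the cluster name, zone and state bucket purely placeholders:

  # kops is given here only as an example of such a CLI tool
  kops create cluster --name=k8s.example.com \
      --zones=eu-west-1a \
      --state=s3://example-kops-state \
      --yes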
9
Kubernetes on public clouds
Third parties can also provide Kubernetes on public clouds
  e.g. StackPointCloud: supports Google, Azure, AWS & DigitalOcean
  provides features such as node auto-scaling
10
How LHC jobs are run
All 4 LHC experiments use pilot frameworks to run jobs. What does this mean?
  a pilot factory submits pilot jobs to sites
  pilot jobs connect to the experiment's job queue, pull down the actual payload job & execute it
(Diagram: the experiment workload management system (job creator, job queue, pilot factory) submits pilot jobs to a grid site, where each pilot retrieves & runs a payload job)
11
Previous work on clouds
Much work by many groups over the past 5-6 years
  both private (e.g. OpenStack) & public clouds (mainly AWS, more recently Azure & Google)
  generally all involve variations on methods for instantiating VMs on demand
  the VMs either:
    are virtual worker nodes in a HTCondor batch system
    directly run the experiment's pilot
12
Issues with this approach
Usually involves software which needs to talk to each cloud API in order to provision VMs
  e.g. the CMS experiment uses HTCondor, which can provision VMs on clouds supporting the EC2 or Google API
  adding the ability to use Azure would require modifications to both CMS software & third-party software (HTCondor)
Some projects have used cloud-provider-specific services (sometimes many services)
  e.g. one HEP tool uses 10 AWS services
  this is a problem if you want or need to use resources from a different provider as well
Using Kubernetes as an abstraction layer avoids these problems
13
Running LHC jobs
Some ways LHC jobs could be run on Kubernetes:
  each experiment's workload management system could talk to the Kubernetes API
    not possible for a site to do on its own; requires effort from the experiments
  modify a grid Computing Element (CE) to support Kubernetes as an alternative batch system
  use the "vacuum" model
    directly run the experiment's pilot agent
    scale up if there is work, scale down when there is no work
Using the vacuum model is the simplest method
  we're also looking at running a HTCondor pool on Kubernetes
14
Container image
We need to run the pilot agent in a container with
  appropriate software installed (like an SL6 grid worker node)
  access to the CernVM File System (CVMFS)
Need to do this without requiring any modifications to the host where the container will run
  e.g. installing CVMFS on the host breaks portability
Currently the only option is to run a privileged container which
  provides CVMFS
  runs the pilot agent as an unprivileged user
  exits when the pilot exits
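A rough sketch of such a pod is below; the image name, entrypoint behaviour and resource requests are assumptions rather than the actual container used in this work:

  apiVersion: v1
  kind: Pod
  metadata:
    name: pilot
  spec:
    restartPolicy: Never              # the pod finishes when the pilot exits
    containers:
    - name: pilot
      image: example/sl6-cvmfs-pilot  # hypothetical image: SL6 worker-node software + CVMFS client + pilot
      securityContext:
        privileged: true              # needed so the container can mount CVMFS itself
      resources:
        requests:
          cpu: "1"
          memory: 2Gi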
15
Resources used for testing
On-premises resources & public clouds:
  RAL: Kubernetes installed using kubeadm on bare metal nodes
  Azure: Azure Container Service (initially used a CLI from Weaveworks)
  Google: Google Container Engine
  Amazon: used StackPointCloud to provision Kubernetes
16
A simple test on Azure
  HTCondor pool + squid caches running in containers on Kubernetes
  manually create & submit CMS MC production jobs to the HTCondor schedd
  replication controllers managing the negotiator, collector, schedd, worker nodes & squids
  horizontal pod autoscaler for the worker nodes, scaling based on CPU usage
(Diagram: HTCondor collector, negotiator, schedd, worker nodes & squids running as pods, with a horizontal pod autoscaler managing the worker nodes)
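As a reference point, an autoscaler of this kind could be declared roughly as below (autoscaling/v1 API); the controller name and the thresholds are illustrative, not the values used in the test:

  apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    name: condor-workers
  spec:
    scaleTargetRef:
      apiVersion: v1
      kind: ReplicationController
      name: condor-worker              # hypothetical controller managing the worker-node pods
    minReplicas: 1
    maxReplicas: 50
    targetCPUUtilizationPercentage: 60 # scale out when average CPU usage exceeds 60%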
17
A simple test on Azure
  CMS MC production jobs submitted: the number of pods increases
  jobs killed: the number of pods decreases
(Plots: cluster resource limits & usage; example squid container resource limits & usage)
18
Improvements
Issues: auto-scaling based on CPU usage is of course not perfect
  CPU usage can vary during the life of a pilot (or job)
  Kubernetes won't necessarily kill 'idle' pods when scaling down
    quite likely that running jobs will be killed
    this is definitely a problem for real jobs; confirmed that this does happen for ATLAS & LHCb
(Plot: CPU usage of an ATLAS pilot)
19
Improvements
Instead of using a replication controller to manage the worker nodes (or pilots), do something else
  create a custom 'controller': a Python script which can create pods as needed
    runs in a container in the cluster, uses the Kubernetes REST API
    doesn't kill running pods; lets them finish on their own
  two versions:
    worker pool: maintains a specified number of running pods
    vacuum: scales up & down depending on how much work there is
  could also monitor a HTCondor pool & create pods if there are idle jobs
20
Credentials
Need to provide an X509 proxy to authenticate to the experiment's pilot framework
  store a host certificate & key as a Kubernetes Secret
  a cron runs a container which
    creates a short-lived proxy from the host certificate
    updates a Secret with this proxy
Containers running jobs
  the proxy is made available as a file
  new containers will have the updated proxy
  the proxy is also updated in already running containers
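A hedged sketch of how this could look with kubectl; the secret names, file names and the proxy-creation command are assumptions (and the dry-run syntax depends on the kubectl version):

  # store the host certificate & key as a Secret (done once)
  kubectl create secret generic host-cert \
      --from-file=hostcert.pem --from-file=hostkey.pem

  # run periodically, e.g. from cron: make a short-lived proxy from the host
  # certificate, then update the proxy Secret in place
  voms-proxy-init --cert hostcert.pem --key hostkey.pem --out x509up   # proxy tool & options are an assumption
  kubectl create secret generic x509-proxy --from-file=x509up \
      --dry-run=client -o yaml | kubectl apply -f -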
21
Running LHC jobs
To summarize, we can start with
  a single yaml file
  a host certificate & key
Run a single command (kubectl create -f file.yaml) to run jobs from an LHC experiment on any Kubernetes cluster
  services managed by Kubernetes
  self-healing: e.g. if a node dies, any services running on it will be re-scheduled onto another node
  elastic: the number of pilots will scale depending on work
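To make this concrete, the single yaml file could contain something along the lines of the replication controller below, which keeps a fixed number of pilot pods running with the proxy Secret mounted in; all names, paths and counts are illustrative, not the actual configuration used:

  apiVersion: v1
  kind: ReplicationController
  metadata:
    name: pilots
  spec:
    replicas: 10                         # number of concurrent pilots
    selector:
      app: pilot
    template:
      metadata:
        labels:
          app: pilot
      spec:
        containers:
        - name: pilot
          image: example/sl6-cvmfs-pilot # hypothetical pilot + CVMFS image
          securityContext:
            privileged: true             # see the container-image slide
          volumeMounts:
          - name: proxy
            mountPath: /etc/proxy        # hypothetical path where the pilot expects the X509 proxy
            readOnly: true
        volumes:
        - name: proxy
          secret:
            secretName: x509-proxy       # the Secret kept up to date by the cron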
22
More realistic CMS testing
Used standard CRAB3 analysis jobs with a test glideinWMS instance at RAL
  CRAB: system for running CMS analysis
  glideinWMS: pilot framework used by CMS
(Diagram: workflows submitted to the CRAB server at CERN; the glideinWMS HTCondor pool at RAL creates jobs which run on startds, alongside squids, in Kubernetes)
23
More realistic CMS testing
Monte Carlo generation & simulation is a CPU-intensive workflow (the simplest CMS workflow)
Ran the same workflow (~5000 jobs in total) at RAL & on 2 public clouds
  data staged-out to a standard grid site
  no attempt at any tuning or optimisations
CPU efficiency & failure rate similar across the 3 resources
CPU & wall time ~30% faster on Azure compared to RAL
24
ATLAS & LHCb
Over the past week have been running real ATLAS & LHCb jobs on Kubernetes at RAL (at a small scale)
Using containers which
  run a startd which joins an existing ATLAS HTCondor pool
  run the LHCb DIRAC agent
(Plots: successful ATLAS jobs; cluster resource limits & usage)
25
Summary
Using Kubernetes is a promising way of running LHC workloads
  identical for both on-premises resources & public clouds
  very simple: run a single command to create an elastic, self-healing site
  have successfully run ATLAS, CMS & LHCb jobs
Ongoing & future work
  large-scale tests with ATLAS on Azure
    a new site within ATLAS is currently being set up
  will also test Azure blob storage (via Dynafed)
  I/O-intensive workloads
26
Summary
Further improvements
  once Kubernetes supports different mount propagation modes we should be able to run jobs & CVMFS in different containers (see the sketch below)
    will be able to run jobs in unprivileged containers
Kubernetes federations
  make it very easy to run applications across multiple clusters
  have done some basic tests, but have not yet run real jobs
  have the potential to be very useful for bursting from a local cluster to public clouds
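For context, later Kubernetes releases added this as the mountPropagation field on volume mounts; a rough sketch of how CVMFS and the job could then live in separate containers (all image names and paths hypothetical):

  apiVersion: v1
  kind: Pod
  metadata:
    name: pilot-split
  spec:
    containers:
    - name: cvmfs                          # privileged helper that mounts CVMFS
      image: example/cvmfs-service         # hypothetical image
      securityContext:
        privileged: true
      volumeMounts:
      - name: cvmfs
        mountPath: /cvmfs
        mountPropagation: Bidirectional    # propagate mounts made here back to the shared host path
    - name: pilot                          # unprivileged job/pilot container
      image: example/sl6-pilot             # hypothetical image
      volumeMounts:
      - name: cvmfs
        mountPath: /cvmfs
        mountPropagation: HostToContainer  # sees the mounts created by the cvmfs container
    volumes:
    - name: cvmfs
      hostPath:
        path: /var/lib/cvmfs-mounts        # hypothetical host directory used as the shared mount point
        type: DirectoryOrCreate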
27
Acknowledgements
  Microsoft (Azure Research Award)
  Google
  Amazon