2
Enabling Containers for High-Performance Computing
Abdulrahman Azab, Services for Sensitive Data (TSD), Research Support Services Group, University Center for Information Technology (USIT)
3
IEEE International Conference on Cloud Engineering (IC2E) 2017
Problem statement
Software dependency hell.
Software targeted at a specific OS environment (biology people are in love with Ubuntu, while physics people are in love with CentOS).
Users want to execute their scientific applications and workflows in the same environment used for development or adopted by their community.

HPC centers are increasingly struggling to keep pace with the rapid expansion in applications, libraries, and tools demanded by the user community, especially new data-intensive communities. This growth is driven by a number of factors. In some cases, large communities are developing software to serve their specific scientific community. In other cases, users may be interested in specific tools that are difficult to install, have a long list of dependencies, and are difficult to port. In some cases, this software may be specifically targeted at an OS environment that is common for their domain but may conflict with the requirements of another community. For example, the biology and genomics community may adopt Ubuntu as their base OS with a specific version of Perl and Python, while the High-Energy Physics community may use Scientific Linux as their platform of choice with very specific requirements for certain libraries, compilers, and scripting tools.

While porting these tools to other OS versions may be possible, the overhead of doing the port and validating it may be too high for a community. Environment Modules can be used to support different versions of libraries, scripting tools, etc., but building a robust, well-tested stack with the exact combination of dependencies can be tedious and challenging. In many cases, what users desire is the ability to easily execute their scientific applications and workflows in the same environment used for development or adopted by their community, in some cases going seamlessly from their desktop to the HPC environment. Some communities have turned to the cloud because it promises to provide this flexibility. However, using a cloud environment can be challenging, as users typically have to provide all of the components that would normally come with a managed cluster or HPC center: workload management, file systems, and basic provisioning. The overhead of addressing these requirements solely to gain flexibility over the software stack is typically too large to be feasible.
4
Containers: What?
(Diagram: virtual machines, each with its own guest OS on top of a VM monitor, versus containers sharing the host OS kernel through a container engine; both stacks sit on the host OS and hardware.)
VMs:
Dedicated resources for each VM (more VMs = more resources).
A guest OS for each VM = wasted resources.
The host has no control over VM processes; a VM is a big black box.
Containers:
Containers use the host kernel.
Libs/bins can be very lightweight (e.g. an Ubuntu VM is 600 MB while a full Ubuntu Docker base image is 200 MB).
A container process is visible and manageable by the host OS like any other process, and can thus be manually classified into cgroups for resource control (unlike VMs, where resources stay allocated as long as the VM is up, even if it is running nothing).
5
dockerplay]$ docker run -d centos sleep 100
dd453b6b4a876d35279e255ee4f3aa8a44fef2382f978578acb963ccc1f6bf47
dockerplay]$ ps -ef | grep sleep
root :55 ? 00:00:00 sleep 100
azab :56 pts/2 00:00:00 grep sleep
The container's sleep process shows up in the host's process table like any other process, here owned by root.
6
Containers: Why?
Containers are build-once-run-anywhere.
Most of our module-based tools have images on Docker Hub; we may very rarely need to create in-house images.

With Linux containers, researchers can install tools on their platform of choice, e.g. Ubuntu for bioinformaticians and CentOS for physicists, as a Docker image and publish it on Docker Hub, or just share the Dockerfile with collaborators. Anyone who has a Docker engine and a suitable Linux kernel can then run the image and get the same functionality. This makes life easier for software developers, who no longer need to write multiple installation guides and test on different Linux distributions. It also makes life easier for system administrators: instead of receiving software requests of the type "I need software X, and here is the installation guide, please install it!", requests become "I need software X, here is the name of its Docker image, please pull it". In addition, no software maintenance is needed and no dependency conflicts arise when installing new software. (A minimal Dockerfile sketch follows below.)
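To make "publish it as a Docker image or share the Dockerfile" concrete, here is a minimal Dockerfile sketch; the base image, packages, and image name are illustrative placeholders, not taken from the talk:

FROM ubuntu:16.04
# install the tool and its dependencies once, in the community's preferred OS
RUN apt-get update && \
    apt-get install -y --no-install-recommends python perl bowtie && \
    rm -rf /var/lib/apt/lists/*
# what runs by default when the image is started
CMD ["bowtie", "--version"]

Collaborators then either build it themselves (docker build -t myorg/bowtie .) or pull a published image (docker pull myorg/bowtie) and get the identical environment.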
7
Containers: Why? Queue = 'usit-hpc-software'
8
Containers: Why?
We don't have to maintain the software ourselves.
Very little performance degradation compared to native.
9
Performance
10
Docker
11
Images and Layers
12
Images and Layers
ubuntu: 200 MB
ubuntu + R: 250 MB
ubuntu + MATLAB: 250 MB
All three together: 300 MB (layers shared between images are stored only once)
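The layer sharing can be checked with standard Docker commands; a small hedged example (the stock ubuntu image stands in for the images above, and sizes will differ):

$ docker pull ubuntu
$ docker history ubuntu    # lists the individual layers the image is built from
$ docker system df -v      # per-image SHARED SIZE vs UNIQUE SIZE actually stored on disk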
13
Containers are a technology
Docker is a company
14
Want to play with Docker?
play-with-docker.com training.play-with-docker.com
15
Containers without Docker
16
dockerplay]$ cat /etc/*-release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.9 (Santiago)
dockerplay]$ apt-get
-bash: apt-get: command not found
dockerplay]$ wget
dockerplay]$ sudo tar -zxf rootfs.tar.gz
[sudo] password for azab:
dockerplay]$ ls rootfs
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
17
dockerplay]$ sudo chroot rootfs /bin/bash
[sudo] password for azab:
ls /
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="
SUPPORT_URL="
BUG_REPORT_URL="
apt-get
apt for amd64 compiled on Mar :31:
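chroot only changes the filesystem root; the remaining container ingredients are kernel namespaces and cgroups. A hedged sketch of going one step further with stock util-linux tools on the same extracted rootfs (the flags are illustrative, not from the talk):

dockerplay]$ sudo unshare --uts --ipc --net --pid --fork chroot rootfs /bin/bash
# --uts/--ipc/--net/--pid give the shell its own hostname, IPC, network and PID namespaces
# --fork makes the shell PID 1 inside the new PID namespace
# resource limits would then come from placing this process in a cgroup, as the later Slurm slides show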
18
High-performance computing (HPC)
19
HPC cluster with containers (diagram): the user tells the scheduler, for example, "I need a Scientific Linux with 2GB RAM!"; the scheduler places the job on compute nodes, each running a container engine; the nodes pull shared images from a registry or from the shared file system.
20
Containers for HPC: secure and efficient
21
Secure
How to securely run containers within cluster jobs?
Enforce running containers in unprivileged mode, as the user, not as root: a container process then runs with the same privileges as any other user process.
Drop Linux capabilities that may provide an attack surface, e.g. SETUID and SETGID.
How to limit the resource usage of a container job to the limits defined by the queuing system?
For queuing systems that control resource usage through cgroups, e.g. Slurm and Moab: enforce classifying the container processes into the job's cgroups.
For Docker images, only automated-build images should be supported (no black boxes).
A sketch of such an unprivileged invocation follows below.
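Roughly what the first two points look like on the docker command line, as a minimal sketch; the image, script, and exact flag choice are placeholders (a wrapper such as socker composes something along these lines on the user's behalf):

$ docker run --rm \
      --user "$(id -u):$(id -g)" \           # run as the submitting user, not as root
      --cap-drop SETUID --cap-drop SETGID \  # drop capabilities that allow privilege changes
      -v "$PWD:$PWD" -w "$PWD" \             # expose only the job's working directory
      centos:7 ./my_analysis.sh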
22
Efficient
Build minimal images (centos 200 MB, alpine 3.9 MB).
Use multi-stage builds.
docker system prune: removes all unused containers, volumes, networks and images (both dangling and unreferenced).
23
Docker Multi-stage build
FROM golang:1.7.3 as builder            # 700 MB
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest                      # 3.9 MB
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
24
Efficient
As few layers as possible; avoid adding large files; use a .dockerignore file.
Clean up:
docker image prune -a
docker system prune -a
(docker system prune removes all unused containers, volumes, networks and images, both dangling and unreferenced.)
See the sketch below for keeping the layer count down.
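A hedged Dockerfile fragment illustrating the layer advice (package names are placeholders): chaining related commands into one RUN instruction yields one layer instead of three, and cleaning the package cache in the same instruction keeps that layer small.

# one RUN = one layer; clean up inside the same instruction so the cache never lands in a layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*

A .dockerignore file next to the Dockerfile keeps large, irrelevant files (e.g. .git, data/, *.tar.gz) out of the build context.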
25
Socker (diagram): the user invokes socker, a small setuid wrapper (-rwsr-sr-x 1 root root) installed by the sysadmin, which in turn calls the Docker binary and talks to the Docker Engine on the user's behalf.
26
Socker workflow (flowchart):
1. Start: check that the dockerroot account exists and is a member of the docker group; collect the calling user's info.
2. Become root; check that Docker is installed and that the image and command are valid.
3. Compose the docker run command; become dockerroot and execute it.
4. If running inside a Slurm job: [cg] <-- Slurm's cgroups for the job; classify the container processes into [cg].
5. Wait for the container to exit; remove the container; end.
A hedged sketch of step 4 follows below.
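Step 4 amounts to moving the container's host-side PIDs into the cgroups Slurm created for the job (compare the two /proc/<pid>/cgroup listings on the next slides). A minimal bash sketch, assuming cgroup v1 paths like those shown on the socker slide and treating IMAGE as a placeholder variable; socker's real implementation differs in detail and performs this step with elevated privileges:

#!/bin/bash
# start the container detached and remember its ID
CID=$(docker run -d "$IMAGE" "$@")
# every process inside the container, as seen from the host
PIDS=$(docker top "$CID" -o pid | tail -n +2)
# the Slurm job's memory cgroup (repeat for cpuset, freezer, ... as needed); writing here needs root
JOBCG=/sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}
for PID in $PIDS; do
    echo "$PID" > "$JOBCG/tasks"
done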
27
Slurm: docker run (the container processes land in Docker's cgroups, outside the Slurm job's control)
cat /proc/4822/cgroup
11:blkio:/docker/d034ac18549fed...
10:net_cls:/
8:devices:/docker/d034ac18549fed...
6:cpuacct:/docker/d034ac18549fed...
5:cpu:/docker/d034ac18549fed...
3:memory:/docker/d034ac18549fed...
:cpuset:/docker/d034ac18549fed...
:freezer:/docker/d034ac18549fed91...
28
Slurm: socker run (the container processes are classified into the Slurm job's cgroups)
cat /proc/5409/cgroup
11:blkio:/
10:net_cls:/
8:devices:/
6:cpuacct:/
5:cpu:/
3:memory:/slurm/uid_238869/job_...
:cpuset:/slurm/uid_238869/job_...
:freezer:/slurm/uid_238869/job_...
29
Thanks …
30
Singularity
dockerplay]$ module load singularity
dockerplay]$ singularity exec ../singularity-img/Centos7-abel.img sleep 100 &
[1]
dockerplay]$ ps -ef | grep sleep
azab :08 pts/2 00:00:00 /cluster/software/VERSIONS/singularity/2.2.1/libexec/singularity/sexec sleep 100
azab :08 pts/2 00:00:00 sleep 100
azab :08 pts/2 00:00:00 grep sleep
Unlike the Docker example earlier, the container processes here run as the calling user (azab), not as root.
31
Containers: How? Container runners: Singularity, Socker

On a local node: pull from Docker Hub (or elsewhere), or create your own container.
Singularity:
$ module load singularity
$ singularity import $HOME/<container-file> docker://<image-name>
Socker (plain Docker):
$ docker pull <image-name>
$ docker create --name <container-name> <image-name>
$ docker export -o $HOME/<container-file> <container-name>

On the cluster: run the container.
Singularity:
$ singularity run -B $PWD:$PWD $HOME/<container-file> <command>
Socker:
$ module load socker
$ socker run -v $PWD:$PWD $HOME/<container-file> <command>
32
Docker in HTCondor

universe = docker
executable = /usr/local/bin/bowtie
arguments = -p 8 -S -v 2 -m 1 /work/hg19/hg19 /work/bowtie-input/$(process).normal.fastq
docker_image = genomicpariscentre/bowtie1
output = $(process).normal.sam
error = $(process).normal.err
log = normal.log
queue 46
33
Singularity in HTCondor
Create a Singularity container from the Docker CentOS image
$ singularity create --size 2048 /shared/tmp/Centos7.img
$ singularity import /shared/tmp/Centos7.img docker://centos:latest
Create an HTCondor submit file
$ nano /tmp/sing.submit
executable = /bin/hostname
universe = vanilla
+SingularityImage = "/shared/tmp/Centos7.img"
+DESIRED_OS = "CentOS6"
output = example1.$(process).out
error = example1.$(process).err
log = example1.$(process).log
queue 1
Submit…
$ condor_submit /tmp/sing.submit
Note: Shifter can run a container only inside a Slurm script, while Socker and Singularity are independent.
34
Socker in Slurm

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=docker
#SBATCH --image-name=genomicpariscentre/bowtie1
module load socker
socker run bowtie -p 8 -S -v 2 -m 1 /work/hg19/hg19 /work/bowtie-input/normal.fastq
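Submitting and monitoring then use the ordinary Slurm tooling; the script name below is an illustrative placeholder:

$ sbatch socker_bowtie.sm     # queue the job
$ squeue -u $USER             # watch it run
$ sacct -j <jobid>            # accounting once it has finished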
35
MPI support
Singularity: built-in support for MPI (OpenMPI, MPICH, IntelMPI).
Shifter: relies on the MPICH Application Binary Interface (ABI).
Socker: uses IBM's Docker MPICH integration.
Data analysis example: computing principal components for the first 10,000 variants from the 1000 Genomes Project, chromosomes 21 and 22.

Singularity has built-in support for MPI (OpenMPI, MPICH, IntelMPI). Once the application is built with the MPI libraries, execute mpirun as normal, replacing the usual binary with the single-file Singularity image. While the application runs inside the Singularity container, Process Management Interface (PMI/PMIx) calls pass through the Singularity launcher on to ORTED. This is what MPI is designed to do, no hacks, so it works well.

Shifter relies on the MPICH Application Binary Interface (ABI). Applications that use vanilla MPI compatible with MPICH will work. Site-specific MPI libraries need to be copied into the container at run time, and LD_LIBRARY_PATH needs to include, for example, /opt/udiImage/...

Socker uses IBM's Docker MPICH integration, internally running e.g.: dockermpi -f myhosts mycontainer mycommand myargs
Note: Socker is planned to support both Docker and Singularity containers. Unlike Singularity and Shifter, Docker containers are not converted to another format first; they are used as they are.

The data analysis example in detail:
wget
wget
LANG=C CHUNKSIZE= mpirun -x LANG -x CHUNKSIZE -np 2 singularity run -H $(pwd) variant_pca.img
36
Norway is leading the Containers task for PRACE
Let’s make Norway great again!
37
What have we done so far?
Socker is being tested (final stages) on two clusters in Norway (Abel and Colossus), one cluster in Denmark (Computerome), and the Finnish cloud (cPouta).
Singularity is in production on Abel and Colossus, and is being tested on Computerome.
Shifter is being tested on a test cluster, Snabel.
Queuing systems we have worked with: Slurm, Moab, HTCondor.
38
Publications (Google: “azab docker secure”)
2016: Software Provisioning Inside a Secure Environment as Docker Containers using STROLL File-system 2017: Enabling containers for high-performance and many task computing
40
Support Slides
41
Additional slide for those who want to know more about container security features. Source: “Understanding and Hardening Linux Containers”, page 96.
42
Singularity
43
Socker (diagram): the user invokes socker, a small setuid wrapper (-rwsr-sr-x 1 root root) installed by the sysadmin, which in turn calls the Docker binary and talks to the Docker Engine on the user's behalf.
44
Docker Protection methods 1
Enable/disable Linux capabilities. Docker by default drops a list of Linux capabilities:
CAP_AUDIT_WRITE = Audit log write access
CAP_AUDIT_CONTROL = Configure Linux Audit subsystem
CAP_MAC_OVERRIDE = Override kernel MAC policy
CAP_MAC_ADMIN = Configure kernel MAC policy
CAP_NET_ADMIN = Configure networking
CAP_SETPCAP = Process capabilities
CAP_SYS_MODULE = Insert and remove kernel modules
CAP_SYS_NICE = Priority of processes
CAP_SYS_PACCT = Process accounting
CAP_SYS_RAWIO = Modify kernel memory
CAP_SYS_RESOURCE = Resource limits
CAP_SYS_TIME = System clock alteration
CAP_SYS_TTY_CONFIG = Configure tty devices
CAP_SYSLOG = Kernel syslogging (printk)
CAP_SYS_ADMIN = All others
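The capability set can also be tightened per container with standard docker run flags; a hedged sketch (image, port, and command are placeholders):

# start from nothing and add back only what the workload needs
$ docker run --rm --cap-drop ALL --cap-add NET_BIND_SERVICE -p 80:80 myorg/webapp
# the Cap* lines show the capability bitmasks of the container's PID 1
$ docker exec <container> cat /proc/1/status | grep Cap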
45
Docker Protection methods 2
seccomp: Secure Computing, or seccomp, helps with the creation of sandboxes. It does so by defining what system calls should be blocked. The latest version of seccomp provides this syscall filtering by using the Berkeley Packet Filter (BPF). Containers currently have the following syscalls disabled (since LXC 1.0.5):
kexec_load
open_by_handle_at
init_module
finit_module
delete_module
When any of the blocked syscalls is made, the kernel sends a SIGKILL signal to stop the related process.
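In Docker this maps onto a JSON seccomp profile passed via --security-opt. A minimal hedged sketch that kills a process on the syscalls listed above (the file name and policy are illustrative; Docker ships a much larger default profile):

$ cat > kill-module-syscalls.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    { "names": ["kexec_load", "open_by_handle_at", "init_module",
                "finit_module", "delete_module"],
      "action": "SCMP_ACT_KILL" }
  ]
}
EOF
$ docker run --rm --security-opt seccomp=kill-module-syscalls.json centos:7 /bin/true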
46
Docker Protection methods 3
User namespaces
/etc/subuid (on the host): dockremap : : 1
/etc/subgid (on the host):
Then run the daemon with the following option:
docker daemon --userns-remap=default
47
High-performance Computing (MPI test)
48
Performance evaluation for HPC
49
Many-Task Computing (Inter-cluster test platform)
50
Inter-cluster test platform (diagram): a broker overlay (Condor01, Condor-C/HTCondor) connecting the Abel and Snabel clusters (USIT, Slurm workers), the cPouta cloud, a registry, and Slick; jobs are submitted via socker_submit.

Submission script:
universe = docker
……
docker_image = biodckr/bowtie
executable = /usr/bin/bowtie
Image_size = 28 Meg
…….
queue 60