Putting your users in a Box John (TJ) Knoeller Center for High Throughput Computing.

Slides:



Advertisements
Similar presentations
Putting your users in a Box Greg Thain Center for High Throughput Computing.
Advertisements

Lightweight virtual system mechanism Gao feng
Greg Thain HTCondor Week 2015
HTCondor / HEP Partnership and Activities HEPiX Fall 2014 Todd Tannenbaum Center for High Throughput Computing Department of Computer Sciences University.
1 Process Description and Control Chapter 3 = Why process? = What is a process? = How to represent processes? = How to control processes?
CVMFS: Software Access Anywhere Dan Bradley Any data, Any time, Anywhere Project.
Jaeyoung Yoon Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Condor Project Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Jaime Frey Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Hands-On Virtual Computing
HTCondor at the RAL Tier-1 Andrew Lahiff STFC Rutherford Appleton Laboratory European HTCondor Site Admins Meeting 2014.
Section 3.1: Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random.
Linux in a Virtual Environment Nagarajan Prabakar School of Computing and Information Sciences Florida International University.
Guide to Linux Installation and Administration, 2e1 Chapter 10 Managing System Resources.
Grid job submission using HTCondor Andrew Lahiff.
Privilege separation in Condor Bruce Beckles University of Cambridge Computing Service.
The Roadmap to New Releases Derek Wright Computer Sciences Department University of Wisconsin-Madison
Derek Wright Computer Sciences Department University of Wisconsin-Madison New Ways to Fetch Work The new hook infrastructure in Condor.
Breaking Barriers Exploding with Possibility Breaking Barriers Exploding with Possibility The Cloud Era Unveiled.
HTCondor Recent Enhancement and Future Directions HTCondor Recent Enhancement and Future Directions HEPiX Fall 2015 Todd Tannenbaum Center for High Throughput.
Docker and Container Technology
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Docker Overview Automating.
CSC414 “Introduction to UNIX/ Linux” Lecture 6. Schedule 1. Introduction to Unix/ Linux 2. Kernel Structure and Device Drivers. 3. System and Storage.
Lecture 02 File and File system. Topics Describe the layout of a Linux file system Display and set paths Describe the most important files, including.
Condor Project Computer Sciences Department University of Wisconsin-Madison Running Interpreted Jobs.
Oracle Virtualization Last Update Copyright 2012 Kenneth M. Chipps Ph.D.
CVMFS: Software Access Anywhere Dan Bradley Any data, Any time, Anywhere Project.
Support for Vanilla Universe Checkpointing Thomas Downes University of Wisconsin-Milwaukee (LIGO)
Linux Containers and Docker
Effective use of cgroups with HTCondor
Containing HTCondor Greg Thain INFN 2016.
HTCondor Annex (There are many clouds like it, but this one is mine.)
Improvements to Configuration
Fundamentals Sunny Sharma Microsoft
Dockerize OpenEdge Srinivasa Rao Nalla.
HTCondor Security Basics
Intermediate HTCondor: Workflows Monday pm
Linux Containers Overview & Roadmap
UBUNTU INSTALLATION
Things you may not know about HTCondor
Workload Management System
High Availability in HTCondor
Things you may not know about HTCondor
Process Creation Processes get created (and destroyed) all the time in a typical computer Some by explicit user command Some by invocation from other running.
Operating System Structure
Machine Learning Workshop
Containers and Virtualisation
Andrew Pruski SQL Server & Containers
Containers in HPC By Raja.
Condor: Job Management
Integration of Singularity With Makeflow
Troubleshooting Your Jobs
Microsoft Ignite NZ October 2016 SKYCITY, Auckland.
Haiyan Meng and Douglas Thain
HTCondor Security Basics HTCondor Week, Madison 2016
Job Matching, Handling, and Other HTCondor Features
IS3440 Linux Security Unit 7 Securing the Linux Kernel
Intro about Contanier and Docker Technology
HTCondor Training Florentia Protopsalti IT-CM-IS 1/16/2019.
The Condor JobRouter.
Docker Some slides from Martin Meyer Vagrant Box:
Introduction to Docker
Process Description and Control in Unix
Condor-G Making Condor Grid Enabled
Azure Container Service
Troubleshooting Your Jobs
PU. Setting up parallel universe in your pool and when (not
Putting your users in a Box
Thursday AM, Lecture 1 Lauren Michael
Presentation transcript:

Putting your users in a Box John (TJ) Knoeller Center for High Throughput Computing

› Why put jobs in a box? › Boxes that work everywhere › Shiny new Linux-only boxes › Older Linux-only boxes 2 Outline

1) Protect the machine from the job. 2) Protect the job from the machine. 3) Protect one job from another. 3 Protections 3

› Allows nesting › Need not require root › Can’t be broken out of › Portable to all OSes › Allows full management:  Creation // Destruction  Monitoring  Limiting The ideal box 4

 CPU  Memory  Disk  Network.  Signals Resources a job can (ab)use 5

› Some people see this problem, and say › “I know, we’ll use a Virtual Machine” The Big Hammer 6

› Might need hypervisor installed  The right hypervisor (the right Version…) › Need to keep full OS image maintained › Difficult to debug › Hard to see into › Hard to federate › Just too heavyweight Problems with VMs 7

› One way glass  Opaque from the inside not from the outside  Work with Best feature of HTCondor ever Which is …. ? › Linux containers (LXC) applicable here  But rootly powers required How far can we get without root? Containers, not VMs 8

› HTCondor Preempt expression PREEMPT = TARGET.MemoryUsage > threshold ProportionalSetSizeKb > threshold › setrlimit call USER_JOB_WRAPPER STARTER_RLIMIT_AS Boxing without root 9

ASSIGN_CPU_AFFINITY = true › Pros  Works with dynamic slots  Works even if not root  Any Linux/Windows version › Cons  Glide-ins don’t know which CPU to use  Doesn’t allow the job to soak up idle cycles CPU AFFINITY 10

› PID namespaces › Named Chroots › Cgroups › Mount under scratch and/or › Docker * * Not root exactly, but might as well be… With root comes… 11

› You can’t kill what you can’t see › Requirements:  HTCondor 8.0+  RHEL 6 or later  USE_PID_NAMESPACES = true (off by default)  Must be root PID namespaces 12

PID Namespaces 13 Init (1) Master (pid 15) Startd (pid 26) Starter (pid 39) Job B (pid 2) Starter (pid 73) Job A (pid 2) Condor_init (pid 1)

› Isolate the filesystem from the job › Startd advertises available chroots NAMED_CHROOT = /foo/R1,/foo/R2 › Job picks one: +RequestedChroot = “/foo/R1” › Make sure path is secure! Named Chroots 14

Ancient History: Chroot › HTCondor used to chroot every job:  No job could touch the file system  Private files in host machine stayed private But…

Chroot: more trouble than value Increasingly difficult to work: Shared libraries /dev /sys /etc /var/run pipes for syslog, etc. How to create root filesystem? Easier now with yum, apt get, etc., but still hard:

We gave up! HTCondor no longer chroots all jobs But you can optionally do so. Very few site sites do…

Enter Docker Docker manages Linux containers. And gives Linux processes a private: Root file system Process space NATed network UID space

Examples This is an “ubuntu” container This is my host OS, running Fedora Processes in other containers on this machine can NOT see what’s going on in this “ubuntu” container

Docker Container vs. Image Image is like Unix program on disk read only, static Container is like Unix process Docker run starts a container from an image Container states: like a condor job: Running Stopped

Docker Images Images provide the user level filesystem Doesn’t contain the linux kernel Or device drivers Or swap space Very small: ubuntu: 200Mb. Images are READ ONLY

At the Command Line $ cat /etc/redhat-release Fedora release 20 (Heisenbug) $ docker run ubuntu cat /etc/debian_version jessie/sid $ time docker run ubuntu sleep 0 real0m1.825s user0m0.017s sys0m0.024s

Isn’t this a Virtual Machine? › Containers share Linux kernel with host › Host can “ps” into container  One-way mirror, not black box › Docker containers do not run system daemons  CUPS, , cron, init, fsck, (think about security!) › Docker images much smaller than VM ones  Just a set of files, not a disk image › Docker provides namespace for images › Much more likely to be universally available

Docker run two step docker run ubuntu cat /etc/deb.. Every image that docker runs must be local Docker run implies docker pull run can fail if image doesn’t exist or can’t be downloaded

Where images come from Docker, inc provides a public-access hub Contains 10,000+ publically usable images behind a CDN What’s local? $ docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE new_ubu latest b df7 8 weeks ago MB dd58b0ec6b9a 8 weeks ago MB 1d19dc9e2e4f 8 weeks ago MB rocker/rstudio latest 14fad19147b6 8 weeks ago 787 MB ubuntu latest d0955f21bf24 8 weeks ago MB busybox latest 4986bf8c months ago MB How to get $ docker search image-name $ docker pull image-name

Full Image name hub.demo.org:8080/user/image:ver Name of the hub (default docker-io) Image name and version: (default: “latest”)

Run your own docker hub › Docker hub is an image! $ docker run docker/docker-registry ( and a bunch of setup – google for details) Any production site will want to run own hub Or put a caching proxy in front of the public one

HTCondor docker universe Need condor 8.4+ Need docker (maybe from EPEL) $ yum install docker-io Docker is moving fast: docker 1.8+, ideally Condor needs to be in the docker group! $ useradd –G docker condor $ service docker start

What? No Knobs? › condor_starter detects docker by default $ condor_status –l | grep –i docker HasDocker = true DockerVersion = "Docker version 1.5.0, build a8a31ef/1.5.0" › If docker is in a non-standard place DOCKER = /usr/bin/docker

“Docker” Universe jobs universe = docker docker_image = deb7_and_HEP_stack executable = /bin/my_executable arguments = arg1 transfer_input_files = some_input output = out error = err log = log queue

A docker Universe Job Is a Vanilla job › Docker containers have the job-nature  condor_submit  condor_rm  condor_hold  Write entries to the user log event log  condor_dagman works with them  Policy expressions work.  Matchmaking works  User prio / job prio / group quotas all work  Stdin, stdout, stderr work  Etc. etc. etc.*

Docker Universe universe = docker docker_image =deb7_and_HEP_stack # executable = /bin/my_executable Image is the name of the docker image on the execute machine. Docker will pull it Executable is from submit machine or image NEVER FROM execute machine! Executable is optional (Images can name a default command)

Docker Universe and File transfer universe = docker transfer_input_files = When_to_transfer_output = ON_EXIT HTCondor volume mounts the scratch dir And sets the cwd of job to scratch dir RequestDisk applies to scratch dir, not container Changes to container are NOT transferred back Container destroyed after job exits

Docker Resource limiting RequestCpus = 4 RequestMemory = 1024M RequestDisk = Somewhat ignored… RequestCpus translated into cgroup shares RequestMemory enforced If exceeded, job gets OOM killed job goes on hold RequestDisk applies to the scratch dir only 10 Gb limit rest of container

› Install docker › Give HTCondor access to docker › Profit! But can you effectively box jobs without it? Docker universe “just works” 35

› Two basic kernel abstractions: 1) nested groups of processes 2) “controllers” which limit resources Control Groups aka “cgroups” 36

› Implemented as filesystem  Mounted on /sys/fs/cgroup, or /cgroup or … › User-space tools in flux  Systemd  Cgservice › /proc/self/cgroup › Docker manages cgroups for containers Control Cgroup setup 37

› Cpu  Allows fractional cpu limits › Memory  Need to limit swap also or else… › Freezer  Suspend / Kill groups of processes Cgroup controllers 38

› Requires:  RHEL6, RHEL7 even better  HTCondor 8.0+  Rootly condor  BASE_CGROUP=htcondor  And… cgroup fs mounted… Enabling cgroups 39

› Starter puts each job into own cgroup  Named exec_dir + job id › Procd monitors  Procd freezes and kills atomically › MEMORY attr into memory controller › CGROUP_MEMORY_LIMIT_POLICY  Hard or soft  Job goes on hold with specific message Cgroups with HTCondor 40

› Or, “Shared subtrees” › Goal: protect /tmp from shared jobs › Requires  Condor 8.0+  RHEL 5  HTCondor must be running as root  MOUNT_UNDER_SCRATCH = /tmp,/var/tmp MOUNT_UNDER_SCRATCH 41

MOUNT_UNDER_SCRATCH=/tmp,/var/tmp Each job sees private /tmp, /var/tmp Downsides: No sharing of files in /tmp MOUNT_UNDER_SCRATCH 42

Questions? 43