Putting your users in a Box

Slides:



Advertisements
Similar presentations
Putting your users in a Box Greg Thain Center for High Throughput Computing.
Advertisements

Lightweight virtual system mechanism Gao feng
PlanetLab Operating System support* *a work in progress.
European HTCondor Workshop December 2014 summary Ian Collier (Brial Bockelman, Greg Thain, Todd Tannenbaum) GDB 10th December 2014.
Greg Thain HTCondor Week 2015
Processes & Daemons Chapter IV / Part III. Commands Internal commands: alias, cd, echo, pwd, time External commands, code is in a file: grep, ls, more.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
CVMFS: Software Access Anywhere Dan Bradley Any data, Any time, Anywhere Project.
Jaeyoung Yoon Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
To run the program: To run the program: You need the OS: You need the OS:
Jaime Frey Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Methodologies, strategies and experiences Virtualization.
Configuring Disk Quotas Linux System Administration To implement disk quotas, use the following steps: Enable quotas per file system by modifying /etc/fstab.
Chapter 3 Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Chapter 3.1:Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
26/4/2001VMware - HEPix - LAL 2001 Windows/Linux Coexistence : VMware Approach HEPix – LAL Apr Michel Jouvin
HTCondor at the RAL Tier-1 Andrew Lahiff STFC Rutherford Appleton Laboratory European HTCondor Site Admins Meeting 2014.
Section 3.1: Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random.
Linux in a Virtual Environment Nagarajan Prabakar School of Computing and Information Sciences Florida International University.
Recent advances in the Linux kernel resource management Kir Kolyshkin, OpenVZ
4P13 Week 1 Talking Points. Kernel Organization Basic kernel facilities: timer and system-clock handling, descriptor management, and process Management.
Greg Thain Computer Sciences Department University of Wisconsin-Madison cs.wisc.edu Interactive MPI on Demand.
Processes and Threads CS550 Operating Systems. Processes and Threads These exist only at execution time They have fast state changes -> in memory and.
Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison.
Dealing with real resources Wednesday Afternoon, 3:00 pm Derek Weitzel OSG Campus Grids University of Nebraska.
Lecture 4 Page 1 CS 111 Online Modularity and Virtualization CS 111 On-Line MS Program Operating Systems Peter Reiher.
Virtualization One computer can do the job of multiple computers, by sharing the resources of a single computer across multiple environments. Turning hardware.
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Docker Overview Automating.
VM vs Container Xen, KVM, VMware, etc. Hardware emulation / paravirtualization Can run different OSs on the same box Dozens of instances OS sprawl problem.
Oracle Virtualization Last Update Copyright 2012 Kenneth M. Chipps Ph.D.
CVMFS: Software Access Anywhere Dan Bradley Any data, Any time, Anywhere Project.
System Software (1) The Operating System
Condor Week Apr 30, 2008Pseudo Interactive monitoring - I. Sfiligoi1 Condor Week 2008 Pseudo-interactive monitoring in Condor by Igor Sfiligoi.
Greg Quinn Computer Sciences Department University of Wisconsin-Madison Privilege Separation in Condor.
Putting your users in a Box John (TJ) Knoeller Center for High Throughput Computing.
VM vs Container Xen, KVM, VMware, etc. ● Hardware emulation / paravirtualization ● Can run different OSs on the same box ● Dozens of instances ● OS sprawl.
OpenShift & SELinux Dan Walsh Twitter: #rhatdan
Introduction to threads
CHTC Policy and Configuration
Effective use of cgroups with HTCondor
Containing HTCondor Greg Thain INFN 2016.
HTCondor Annex (There are many clouds like it, but this one is mine.)
Matt Lemons Nate Mayotte
Current Generation Hypervisor Type 1 Type 2.
Application Sandboxes
Intermediate HTCondor: Workflows Monday pm
Linux Containers Overview & Roadmap
Operating System Structure
Course Introduction CSSE 332 Operating Systems
Containers and Virtualisation
CS490 Windows Internals Quiz 2 09/27/2013.
Virtualization overview
Accounting in HTCondor
Integration of Singularity With Makeflow
CIT 480: Securing Computer Systems
VMPCS-OGC Virtual Machine Protection and Checking System using Out-of-Guest Control ferify.
OS Virtualization.
Microsoft Ignite NZ October 2016 SKYCITY, Auckland.
Haiyan Meng and Douglas Thain
Container technology. Let’s dive into the world of docker and kubernetes Bjarte Brandt, DevOps Architect TV2.
Virtualization Techniques
Upgrading Condor Best Practices
Partition Starter Find out what disk partitioning is, state key features, find a diagram and give an example.
Virtualization and Persistence
Condor: Firewall Mirroring
Linux Filesystem Management
Process Management and System Monitoring
CSE 153 Design of Operating Systems Winter 2019
Troubleshooting Your Jobs
Configuring Disk Quotas
Presentation transcript:

Putting your users in a Box Greg Thain Condor Week 2013

Outline Why put job in a box? Old boxes that work everywhere* *Everywhere that isn’t Windows New shiny boxes

3 Protections Protect the machine from the job. Protect the job from the machine. Protect one job from another. Actually impossible to do with pure POSIX. Especially because a job is a bunch of processes.

The perfect box Allows nesting Need not require root Can’t be broken out of Portable to all OSes Allows full management: Creation // Destruction Monitoring Limiting

A Job ain’t nothing but work Resources a job can (ab)use CPU Memory Disk Signals Network.

Previous Solutions HTCondor Preempt expression setrlimit call TARGET.MemoryUsage > threshold ProportionalSetSizeKb > threshold setrlimit call USER_JOB_WRAPPER STARTER_RLIMIT_AS

From here on out… Newish stuff

The Big Hammer Some people see this problem, and say “I know, we’ll use a Virtual Machine”

Problems with VMs Might need hypervisor installed The right hypervisor (the right Version…) Need to keep full OS image maintained Difficult to debug Hard to federate Just too heavyweight

Containers, not VMs Want opaque box Much LXC work applicable here Work with Best feature of HTCondor ever?

CPU AFFINITY ASSIGN_CPU_AFFINITY=true Now works with dynamic slots Need not be root Any Linux version Only limits the job

PID namespaces You can’t kill what you can’t see Requirements: HTCondor 7.9.4+ RHEL 6 USE_PID_NAMESPACES = true (off by default) Doesn’t work with privsep Must be root

PID Namespaces Init (1) Master (pid 15) Startd (pid 26) Starter (pid 73) Starter (pid 39) Job (pid 1) Job (pid 1)

Named Chroots “Lock the kids in their room” Startd advertises set NAMED_CHROOT = /foo/R1,/foo/R2 Job picks one: +RequestedChroot = “/foo/R1” Make sure path is secure!

Control Groups aka “cgroups” Two basic kernel abstractions: 1) nested groups of processes 2) “controllers” which limit resources

Control Cgroup setup Implemented as filesystem Mounted on /sys/fs/cgroup, or /cgroup or … User-space tools in flux Systemd Cgservice /proc/self/cgroup

Cgroup controllers Cpu Memory freezer

Enabling cgroups Requires: RHEL6 HTCondor 7.9.5+ Rootly condor No privsep BASE_CGROUP=htcondor And… cgroup fs mounted…

Cgroups Starter puts each job into own cgroup Procd monitors Named exec_dir + job id Procd monitors Procd freezes and kills atomically MEMORY attr into memory controller CGROUP_MEMORY_LIMIT_POLICY Hard or soft Job goes on hold with “xxx” message

Cgroup artifacts StarterLog: ProcLog 04/22/13 11:39:08 Requesting cgroup htcondor/condor_exec_slot1@localhost for job … ProcLog cgroup to htcondor/condor_exec_slot1@localhost for ProcFamily 2727. 04/22/13 11:39:13 : PROC_FAMILY_GET_USAGE 04/22/13 11:39:13 : gathering usage data for family with root pid 2724 04/22/13 11:39:17 : PROC_FAMILY_GET_USAGE 04/22/13 11:39:17 : gathering usage …

$ condor_q -- Submitter: localhost : <127.0.0.1:58873> : localhost  ID      OWNER            SUBMITTED RUN_TIME ST PRI SIZE CMD                   2.0   gthain          4/22 11:36 0+00:00:02 R 0 0.0 sleep 3600 $ ps ax | grep 3600 gthain 2727  4268 4880 condor_exec.exe 3600    

A process with Cgroups $ cat /proc/2727/cgroup 3:freezer:/htcondor/condor_exec_slot1@localhost 2:memory:/htcondor/condor_exec_slot1@localhost 1:cpuacct,cpu:/htcondor/condor_exec_slot1@localhost

$ cd /sys/fs/cgroup/memory/htcondor/condor_exec_slot1@localhost/ $ cat memory.usage_in_bytes 258048 $ cat tasks 2727

MOUNT_UNDER_SCRATCH Or, “Shared subtrees” Goal: protect /tmp from shared jobs Requires Condor 7.9.4+ RHEL 5 Doesn’t work with privsep HTCondor must be running as root MOUNT_UNDER_SCRATCH = /tmp,/var/tmp

MOUNT_UNDER_SCRATCH MOUNT_UNDER_SCRATCH=/tmp,/var/tmp Each job sees private /tmp, /var/tmp Downsides: No sharing of files in /tmp

Future work Per job FUSE and other mounts? non-root namespaces

Not covered in this talk Prevent jobs from messing with everyone on the network: See Lark talk at ?????

Conclusion Questions? See cgroup reference material in kernel doc https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt LKN article about shared subtree mounts: http://lwn.net/Articles/159077/