Securing HTCondor Flocking


Securing HTCondor Flocking
Kevin Hrpcek, UW-Madison Space Science and Engineering Center

SSEC
- Earth atmospheric research
  - Weather, climate, numerical weather prediction
  - CIMSS, SIPS, SDS, McIDAS
  - Collaboration with NOAA, NASA, NWS
- Ice
  - Ice core drilling
  - Antarctica weather stations
- Engineering
  - S-HIS sounder
  - High-speed photometer on Hubble (removed to fix the optics)
- Off-earth atmosphere

Satellite Data Processing
- High-throughput satellite data processing
- Polar orbiters
  - MODIS (Terra 1999, Aqua 2002)
  - VIIRS (SNPP 2011, NOAA20 2017)
  - CrIS (SNPP 2011, NOAA20 2017)
- GEO (experimental)
  - ABI (GOES-16)
  - AHI (Himawari 8/9)
- Forward-stream processing for the polar orbiters uses ~20% of the cluster day to day
- Periodic mission reprocessing: days to weeks of processing

Flocking
- Bidirectional sharing of compute resources among HTCondor clusters
- On the UW campus: CHTC, SSEC, WID, HEP, IceCube, Physics, DoIT, BioStat, BioChem
- Bidirectional sharing isn't required
- Jobs need to be architected to work over the internet or a WAN; this is what keeps my team from flocking out
- A flocked job runs like a normal Condor job, but as the nobody user (a configuration sketch follows below)
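
As a rough sketch (not shown in the deck), flocking is a few lines of condor_config on each side: FLOCK_TO names the pools a schedd will overflow to, and FLOCK_FROM on the remote pool's central manager names the schedds it will accept. The host names here are hypothetical:

    # Submit side: overflow to these pools when local resources are full
    FLOCK_TO = cm.chtc.wisc.edu

    # Remote central manager: accept flocked jobs from these schedds
    FLOCK_FROM = submit.ssec.wisc.edu
    ALLOW_WRITE = $(ALLOW_WRITE), submit.ssec.wisc.edu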

Network
- Unrouted private network for resources
- A few hosts, such as the Condor submitter, have multiple network connections so they can be reached from outside the private network
- Compute needs many resources on the private network: Ceph, NFS, databases

Flocking Security Problems
- Condor provides some security: jobs run as the nobody user
- Not really secure... a flocked job can still:
  - Probe network resources
  - Break out of its working directory
  - Download anything onto the compute nodes
- We are primarily relying on Linux user security

Possible Solutions
- Lots of firewall rules?
- Don't flock?
- Let it be and hope for the best?
- Virtual machines?
- Docker?
- Something else?

Docker
- Start from a clean container with each restart; something breaks? Restart it
- Can provide network isolation by specifying which NIC to use
- Less overhead than a VM
- Easily modifiable, and building images is easy
- Doesn't require overhauling my infrastructure

Flocking+Docker Theory
- Create a new VLAN and trunk it to all switch ports for compute and the Condor submitter
- The HTCondor submitter acts as the flocking VLAN's gateway to the internet: it is the VLAN's default route and does NAT
- The HTCondor submitter acts as a firewall between the flocking and SIPS networks (very important)
- Each compute node runs Docker and a CentOS 7 based container running condor_master
- A management script controls the regular startd and the flocking startd
(A sketch of the gateway rules follows below.)
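
A minimal sketch of what the submitter's NAT/firewall role might look like with iptables; the slide doesn't give the actual rules, so the interface names, the 10.27.0.0/16 flocking subnet (inferred from the container address 10.27.2.5 later in the deck), and the private-network range are all assumptions:

    # em1.2512 = flocking VLAN, em2 = internet uplink (names assumed)
    # NAT the flocking VLAN out to the internet
    iptables -t nat -A POSTROUTING -s 10.27.0.0/16 -o em2 -j MASQUERADE
    # Never let the flocking VLAN reach the private SIPS network (range assumed)
    iptables -A FORWARD -s 10.27.0.0/16 -d 192.168.0.0/16 -j DROP
    # Otherwise allow it out
    iptables -A FORWARD -s 10.27.0.0/16 -o em2 -j ACCEPT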

The Docker Image
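
The original slide showed the image itself. The deck doesn't include the Dockerfile, but given the "CentOS 7 based container running condor_master" description it was presumably something along these lines; the package source, config path, and file names are assumptions:

    FROM centos:7
    # HTCondor ("condor") is packaged in EPEL for CentOS 7 (source assumed)
    RUN yum install -y epel-release && \
        yum install -y condor && \
        yum clean all
    # Pool-specific configuration baked into the image (file name assumed)
    COPY condor_config.local /etc/condor/config.d/99-flock.conf
    # Run the master in the foreground as the container's main process
    # (the docker run command later in the deck overrides this with /bin/bash)
    CMD ["/usr/sbin/condor_master", "-f"]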

Docker Network
- Need the container to run on a specific VLAN with no access to the host's routes or other network interfaces
- Macvlan driver: directly connects a host's 'physical' interface to a running container
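
Creating the macvlan network used by the docker run command on the next slide might look like this; em1.2512 and macvlan2512 appear on the Puppet slide, while the subnet and gateway values are assumptions:

    # Bind a Docker network directly to the flocking VLAN interface
    docker network create -d macvlan \
      --subnet=10.27.0.0/16 \
      --gateway=10.27.0.1 \
      -o parent=em1.2512 \
      macvlan2512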

Host Network

Container Network

    docker run --hostname f205.sips --name flocking_startd \
      --network macvlan2512 --ip=10.27.2.5 --dns=8.8.8.8 \
      -it -v /dev/shm --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=64g \
      sipsdev.sips:5000/centos7-flock /bin/bash

Old Network

New Network

Monitoring from HTCondor
- Regular startd hosts start with 'p'; flocking containers start with 'f'
- All show up on the condor master
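
For example, the naming convention makes the flocking containers easy to pick out (an illustrative query, not from the slides):

    # List currently-advertised flocking startds by their 'f' hostname prefix
    condor_status -autoformat Machine | sort -u | grep '^f'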

Shepherd
- Python program that manages the flock; runs on the condor master
- Uses the Python bindings to keep track of everything
- Turns the regular and flocking startds on and off as necessary
- /tmp/flockoff override file
- Always prefers local work to flocking; leaves ~25% of the cluster to never flock
- Run with circus or systemd

Shepherd Script Logic
- If /tmp/flockoff exists: ensure all flocking is disabled
- Else:
  - Get the status of all hosts, regular and flock, and store it
  - Check the condor queue
  - If idle queue < 600 and not all hosts are flocking:
    - condor_off $x regular startds (e.g. p220)
    - condor_on the flock container on each of those physical hosts (e.g. f220)
    - Disable startd process monitoring in Icinga2
  - Elif idle queue > 600 and there is active flocking:
    - condor_off $y flocking startds, condor_on the corresponding physical condor startds
    - Enable startd process monitoring in Icinga2
- Sleep 5 min and repeat
(A Python sketch of this loop follows.)
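
Shepherd's source isn't in the deck; below is a minimal sketch of the loop above using the htcondor Python bindings. The 600-job watermark, the /tmp/flockoff override, and the p/f naming come from the slides; the collector location, the host-pairing logic, the one-host-per-pass batching, and the stubbed-out Icinga2 calls are assumptions:

    import os
    import time

    import htcondor  # HTCondor Python bindings, per the previous slide

    IDLE_THRESHOLD = 600  # idle-queue watermark from the slide
    SLEEP_SECS = 300      # "sleep 5 min and repeat"

    def startd_machines(coll):
        """Split advertised startds into regular ('p...') and flocking ('f...') hosts."""
        ads = coll.query(htcondor.AdTypes.Startd, projection=["Machine"])
        names = sorted({ad["Machine"] for ad in ads})
        return ([n for n in names if n.startswith("p")],
                [n for n in names if n.startswith("f")])

    def idle_jobs(schedd):
        """Count idle jobs (JobStatus == 1) in the local queue."""
        return len(schedd.query(constraint="JobStatus == 1", projection=["ClusterId"]))

    def set_daemons(coll, machine, on):
        """condor_on/condor_off equivalent: command the condor_master on a host."""
        master = coll.locate(htcondor.DaemonTypes.Master, machine)
        cmd = htcondor.DaemonCommands.DaemonsOn if on else htcondor.DaemonCommands.DaemonsOff
        htcondor.send_command(master, cmd)

    def main():
        coll = htcondor.Collector()  # assumes Shepherd runs on the central manager
        schedd = htcondor.Schedd()
        while True:
            regular, flock = startd_machines(coll)
            idle = idle_jobs(schedd)
            if os.path.exists("/tmp/flockoff"):
                for f in flock:  # override: ensure all flocking is disabled
                    set_daemons(coll, f, on=False)
                    set_daemons(coll, "p" + f[1:], on=True)
            elif idle < IDLE_THRESHOLD and regular:
                # Queue is short: hand a regular startd (p220) over to its
                # flocking container (f220); one host per pass for simplicity.
                p = regular[0]
                set_daemons(coll, p, on=False)
                set_daemons(coll, "f" + p[1:], on=True)
                # ...disable startd monitoring for p in Icinga2 (omitted)...
            elif idle > IDLE_THRESHOLD and flock:
                # Queue is long again: reclaim a flocking host for local work.
                f = flock[0]
                set_daemons(coll, f, on=False)
                set_daemons(coll, "p" + f[1:], on=True)
                # ...re-enable startd monitoring in Icinga2 (omitted)...
            time.sleep(SLEEP_SECS)

    if __name__ == "__main__":
        main()

The "leave ~25% of the cluster local" rule from the previous slide would bound how many hosts may flock at once; that bookkeeping is elided here.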

Shepherd Status
- Prints the current status of all Shepherd-managed hosts

Puppet
- Install Docker
- Set up the em1.2512 host interface
- Set up the macvlan2512 Docker network
- Install a systemd service to manage the flocking container (a sketch follows below)
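
The unit file itself isn't shown; here is a minimal sketch of a systemd service that keeps the flocking container running. The container name matches the docker run command earlier; everything else is assumed:

    [Unit]
    Description=HTCondor flocking startd container
    Requires=docker.service
    After=docker.service

    [Service]
    # Start the pre-created container (see the docker run command earlier)
    ExecStart=/usr/bin/docker start -a flocking_startd
    ExecStop=/usr/bin/docker stop flocking_startd
    Restart=always

    [Install]
    WantedBy=multi-user.target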

What does all this get me?
- An unprivileged user
- An unprivileged container
- Reduced capabilities
- On a firewalled host
- On a firewalled VLAN with no access to my private network

Risks
- Break out of the container
  - Keep the kernel up to date to mitigate this
  - Only /dev/shm is shared into the container
- A slip-up in the firewall rules could expose my private network
- Other?

Questions?