Condor: Firewall Mirroring

UK Condor Week 2004

Outline
- The problem of firewalls within a Condor pool
- Options to alleviate these problems
- Our solution

Firewalls within a Condor Pool
- Some resource owners have firewalls on their personal workstations.
- Since Condor needs each submit node to be able to talk to every potential execute node, this does not scale well.

Job Startup (slide based on one from the University of Wisconsin-Madison)

[Diagram: Central Manager (Negotiator, Collector), Submit Machine (Schedd, Shadow), Execute Machine (Startd, Starter, Job with Condor Syscall Lib)]

Steps:
1. The startd sends the collector a ClassAd describing itself. (The schedd does as well, but it has nothing interesting to say yet.)
2. The user calls condor_submit to submit a job. The job is handed off to the schedd and condor_submit returns.
3. The schedd alerts the collector that it now has a job waiting.
4. The negotiator asks the collector for a list of machines able to run jobs and schedd queues with waiting jobs.
5. The negotiator contacts the schedd to learn about the waiting job.
6. The negotiator matches the waiting job with the waiting machine.
7. The negotiator alerts the schedd and the startd that there is a match.
8. The schedd contacts the startd to claim the match.
9. The schedd starts a shadow to monitor the job.
10. The startd starts a starter to start the job.
11. The starter and the shadow contact each other.
12. The starter starts the job.
13. If the job is using the Condor syscall library (typically through being condor_compiled), it contacts the shadow to access necessary files.
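As an illustration of step 2, a minimal submit description file might look like the sketch below; the executable and file names are hypothetical, and the user would hand it to the schedd with condor_submit job.sub.

    # job.sub -- hypothetical minimal submit description file
    # program to run
    executable = my_job
    # where the job's stdout, stderr and Condor's event log go
    output = my_job.out
    error  = my_job.err
    log    = my_job.log
    # queue one instance of the job
    queue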

What if the firewall is out of step?
- A job from a newly added machine may still be matched to the firewalled resource.
- This job will not be able to run.
- Parts of the system jam as a result: condor_q on the submitting node, and other parts of the submit-side machinery (perhaps also parts of the central manager node).

A Related Problem
- Similar "jams" occur if part of your pool (or flock of pools) is on a network that is unreachable from some of the other nodes.
- How can we permit jobs from submit nodes that can access the private network to run on these nodes, whilst preventing Condor from sending jobs there from other submit nodes?

How can we get round this?
Options:
- Restrict the number of submit nodes
- Automatically update the firewall files
- Ensure everything is up-to-date
- Permit the pool to evolve whilst persuading Condor to "avoid" going to nodes where the job can't run

Restrict the number of submit nodes:
- Only these nodes need to be updated when new machines are added to the pool.
- Users must all have accounts on at least one of the submit nodes.

Automatically update the firewall files:
- Resource owners who are serious enough about security to have their own firewall are unlikely to want their firewall files messed with by a script which runs as root!

Ensure everything is up-to-date:
- Infeasible.

Firewall Mirroring (1)
Each machine with a firewall declares the fact in its ClassAds:

    HAS_FIREWALL = TRUE

It also declares which machines and/or subnets it permits to access its Condor ports (mirroring its firewall table settings):

    FW_ALLOWS_113 = TRUE
    FW_ALLOWS_rjavig6 = TRUE

Finally, it needs to export these settings:

    STARTD_EXPRS = HAS_FIREWALL, FW_ALLOWS_113, \
                   FW_ALLOWS_rjavig6
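Putting the three pieces together, the local configuration of a firewalled execute node might look like the sketch below. The subnet label 113 and host name rjavig6 are just the examples from these slides; substitute whatever your firewall actually admits. (Newer HTCondor versions use STARTD_ATTRS as the equivalent of the older STARTD_EXPRS setting.)

    # Sketch: local config for a firewalled execute node
    # Declare that this machine runs its own firewall
    HAS_FIREWALL = TRUE
    # Mirror the firewall rules: these subnets/hosts may reach our Condor ports
    FW_ALLOWS_113 = TRUE
    FW_ALLOWS_rjavig6 = TRUE
    # Publish the attributes in the startd's ClassAd
    STARTD_EXPRS = HAS_FIREWALL, FW_ALLOWS_113, FW_ALLOWS_rjavig6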

Firewall Mirroring (2)
To ensure that jobs only go to resources they can reach, each submit machine declares its subnet and hostname:

    MY_SUBNET = 113
    MY_HOST = condor

These values are used in the following macro, which is appended to the Requirements of all jobs from this machine:

    OK_FOR_THIS_MACHINE = ( \
        (HAS_FIREWALL =!= TRUE) || \
        (FW_ALLOWS_$(MY_HOST) == TRUE) || \
        (FW_ALLOWS_$(MY_SUBNET) == TRUE) )
    APPEND_REQUIREMENTS = $(OK_FOR_THIS_MACHINE)
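To make the macro expansion concrete: on a submit node configured with MY_HOST = condor and MY_SUBNET = 113 as above, the clause appended to each job's Requirements expands roughly as shown below (purely an illustration of how the $(...) substitution plays out).

    # Expansion of $(OK_FOR_THIS_MACHINE) with MY_HOST = condor, MY_SUBNET = 113.
    # Note: "=!=" is the ClassAd "is not identical to" operator, so the first
    # clause is also satisfied when HAS_FIREWALL is undefined, i.e. on machines
    # that never declare a firewall.
    ( (HAS_FIREWALL =!= TRUE) ||
      (FW_ALLOWS_condor == TRUE) ||
      (FW_ALLOWS_113 == TRUE) )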

And Private Networks?
The same solution can be used for private networks: pretend the nodes on the private network have a firewall, and declare which other nodes have access to that network.
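For example, an execute node that sits on a private network reachable only from a single gateway submit host might carry configuration like the sketch below; the names gateway and 10 are hypothetical stand-ins for whichever submit hosts and subnets can actually route to that network.

    # Sketch: execute node on a private network (no real firewall involved)
    # Pretend the network boundary is a firewall
    HAS_FIREWALL = TRUE
    # Only the (hypothetical) submit host "gateway" and subnet "10" can reach us
    FW_ALLOWS_gateway = TRUE
    FW_ALLOWS_10 = TRUE
    # Publish the attributes in the startd's ClassAd
    STARTD_EXPRS = HAS_FIREWALL, FW_ALLOWS_gateway, FW_ALLOWS_10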

Conclusion
While this solution does not solve the firewalled-workstation problem, it does make it nicer to live in the presence of such firewalls!