Published by Mia Lyon. Modified over 10 years ago.
1
Building a secure Condor® pool in an open academic environment
Bruce Beckles
University of Cambridge Computing Service
2
Condor pool characteristics
- Large number (~1000) of similar/identical workstations
- Workstations centrally managed
- Primary purpose of workstations not for running Condor jobs
- Workstations are public access machines, i.e. available to all members of institution
3
Fundamental requirements
Condor service in this environment must be:
- Stable: Must not make machines any less stable
- Low impact: Must be unnoticeable to ordinary users
- Secure: Must not significantly increase the attack surface
4
Stability
- Only use the current Condor stable series, not the development series
- Extensive testing (months, 1000s of test jobs) on small pool of workstations
- Disable any features of Condor not required by users
- Support only limited subset of Condor functionality (only Vanilla and Java universes)
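One way a submission wrapper could enforce the "Vanilla and Java universes only" restriction is to inspect the submit description before handing it to Condor. The following is a minimal sketch under stated assumptions, not the service's actual wrapper: the function name `check_universe`, the requirement that the universe be named explicitly, and the "last setting wins" rule are all choices made for this illustration.

```python
import re

# Assumption: the two universes named on the slide are the only ones allowed.
ALLOWED_UNIVERSES = {"vanilla", "java"}

# Matches lines such as "universe = vanilla" (case-insensitive).
_UNIVERSE_LINE = re.compile(r"^\s*universe\s*=\s*(\S+)\s*$", re.IGNORECASE)


def check_universe(submit_text):
    """Return the requested universe if it is allowed, else raise ValueError.

    Hypothetical helper: it requires an explicit universe line rather than
    relying on whatever default the installed Condor version applies.
    """
    universe = None
    for line in submit_text.splitlines():
        match = _UNIVERSE_LINE.match(line)
        if match:
            # In this sketch the last universe line in the file wins.
            universe = match.group(1).lower()
    if universe is None:
        raise ValueError("submit description must name a universe explicitly")
    if universe not in ALLOWED_UNIVERSES:
        raise ValueError(f"universe {universe!r} is not supported on this pool")
    return universe
```

A wrapper would call `check_universe` on the contents of the submit file and refuse to proceed if it raises.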
5
Low impact
- Gather usage statistics of target workstations and only allow Condor to run at periods when they would normally be idle
- Will not run jobs if a user is logged in
  - Custom ClassAd attribute with number of users logged in
- Any user activity aggressively preempts Condor job
  - Issue under standard Linux 2.6 kernels: USB mouse and keyboard activity not detected
- Control Condor job's environment and sterilise environment after job completion
  - Handles jobs using up all available disk space and not cleaning up after themselves, etc.
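The custom ClassAd attribute could be produced by a small helper that counts logged-in users and prints an `Attribute = value` line. The sketch below assumes the count is derived from `who`-style output and that distinct user names (not sessions) are counted; the attribute name `LoggedInUsers` is hypothetical, and the slides do not say how the value is fed into the machine ClassAd.

```python
def count_logged_in_users(who_output):
    """Count distinct user names in `who`-style output (first column)."""
    users = {
        line.split()[0]
        for line in who_output.splitlines()
        if line.strip()  # skip blank lines
    }
    return len(users)


def classad_line(who_output, attribute="LoggedInUsers"):
    """Format the count as a ClassAd-style 'Attribute = value' line.

    The attribute name is an assumption made for this example.
    """
    return f"{attribute} = {count_logged_in_users(who_output)}"
```

A machine policy could then refuse to start jobs unless the attribute evaluates to zero.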
6
Security
- What is our threat landscape?
  - What are we worried about?
  - How does this specifically relate to Condor?
- Specific security concerns…
- …and how we addressed them
7
Threat landscape
- Threats internal to the environment are at least as significant as external threats:
  - Largest body of users (students) are untrusted
  - No clear separation of use of machines by trusted and untrusted users
- Access (often wholly or largely unrestricted) to the public Internet is a core requirement:
  - Both for normal use of the machines and for Condor jobs
  - Firewalls are of little help
8
Specific security concerns (1)
- Reliable identification of machines:
  - IP addresses useless as identifiers (IP spoofing)
  - So strong authentication required
- Do not significantly increase the attack surface of machines:
  - No daemons running as root that listen to the network
  - Privilege separation (see following talk)
- Control access to the Condor pool:
  - Easiest at point of job submission
  - Restricted number of centralised submit nodes
9
Specific security concerns (2)
- Controlling the job execution environment:
  - Inspect job prior to running on machine
  - Start job in a sterile environment
  - Sterilise environment after job has run
  - Job run under dedicated unprivileged user account
- Restrict access to the Condor commands:
  - Ideally develop separate front-end to Condor system
  - Currently just wrapper scripts for Condor commands
  - Can be circumvented (in some cases), so piloting service with relatively trusted users
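The "start sterile, sterilise afterwards" steps can be illustrated with a short sketch. It assumes a POSIX system, and the function name `run_sterile` and the particular minimal environment are choices made for this example. It does not switch to the dedicated unprivileged account, which would require privileges this sketch deliberately avoids.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_sterile(argv, timeout=None):
    """Run argv in a fresh scratch directory with a minimal environment.

    Hypothetical helper: the scratch directory is removed afterwards, even
    if the job leaves files behind or fails.
    """
    scratch = tempfile.mkdtemp(prefix="job-")
    # Pass only a small, fixed environment; nothing is inherited from the caller.
    env = {"PATH": "/usr/bin:/bin", "HOME": scratch, "TMPDIR": scratch}
    try:
        result = subprocess.run(
            argv,
            cwd=scratch,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode, result.stdout
    finally:
        # Sterilise: delete everything the job wrote into its scratch area.
        shutil.rmtree(scratch, ignore_errors=True)
```

Files written outside the scratch directory are not caught by this approach, which is one reason the slides pair it with a dedicated unprivileged account.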
10
Strong authentication
- Currently only available under UNIX/Linux
- Kerberos or GSI
- GSI:
  - Flawed security paradigm (mandates daemons run as root, etc.)
  - Serious usability and scalability issues
- Kerberos:
  - KDCs provide separate audit trail
  - Plan to use Kerberos elsewhere in the University
- Support for Kerberos under Windows and MacOS X is being added to Condor; support for GSI is not (functional GSI libraries not available)
- Bug in Kerberos support in the stable series of Condor:
  - Backported patch from development series to fix
- Kerberos has proved surprisingly easy to deploy and administer in our setup
11
Scalability / Performance
- condor_schedd (job queue management) doesn't scale well:
  - Monolithic process: performs too many different tasks
  - Uses blocking connections in stable series
- In our experience:
  - Performs very badly above 4,000 jobs
  - Falls over above 10,000 jobs
  - Cannot handle significant numbers of short-running (less than 5 minute) jobs
  - Job overhead is such that jobs need to be about 10 minutes long to be worth running under Condor
- Not much we can do about this:
  - Add more submit nodes as demand on our service rises
  - Educate our users to use service sensibly (e.g. batch up short-running jobs)
  - Wrap / replace Condor commands to encourage sensible behaviour / mitigate some of these problems
  - Lobby Condor Team to re-design the condor_schedd daemon
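"Batching up" short-running jobs can be sketched as grouping tasks until each batch reaches roughly the 10-minute mark mentioned above. This is an illustration only: the function name `batch_tasks`, the per-task runtime estimates, and the simple greedy grouping are assumptions, not something the service is known to provide.

```python
def batch_tasks(tasks, target_seconds=600):
    """Group (command, estimated_seconds) pairs into batches.

    A batch is closed as soon as its estimated total reaches target_seconds
    (600 s, i.e. about 10 minutes, by default). Task order is preserved.
    """
    batches, current, total = [], [], 0
    for command, seconds in tasks:
        current.append(command)
        total += seconds
        if total >= target_seconds:
            batches.append(current)
            current, total = [], 0
    if current:
        # Whatever is left over forms a final, shorter batch.
        batches.append(current)
    return batches
```

Each resulting batch would then be submitted as a single Condor job that runs its commands one after another, so the queue holds far fewer, longer jobs.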
12
Partitioning the pool
- Require ability to only allow jobs from certain users to run on certain machines:
  - No sensible way provided to do this
  - Restriction via lists of users or machines in configuration files / ClassAd attributes is unwieldy and doesn't scale
- Our method:
  - Machines configured to only accept jobs with particular ClassAd attribute
  - Set automatically by our wrapper scripts based on user's identity
  - On execute nodes, cross-check user against independently maintained and distributed (via LDAP) ACL; this prevents users falsifying the ClassAd attributes
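The execute-node cross-check can be sketched as a two-step test: the job's claimed attribute must match what the machine accepts, and the job's owner must appear in the independently maintained ACL for that attribute. In this sketch the ACL is a plain dictionary standing in for data that would be distributed via LDAP; the function name `may_run` and the notion of a "group" label are hypothetical.

```python
def may_run(job_owner, claimed_group, machine_group, acl):
    """Decide whether a job may run on this machine.

    acl maps a group label to the set of users allowed to claim it. It is
    obtained independently of the job, so a falsified attribute in the job
    ClassAd is not enough to pass the check.
    """
    if claimed_group != machine_group:
        # The machine only accepts jobs carrying its own group attribute.
        return False
    return job_owner in acl.get(claimed_group, set())
```

For example, with `acl = {"chemistry": {"alice"}}`, a job owned by `mallory` that claims the `chemistry` attribute is rejected on a `chemistry` machine, while `alice`'s job is accepted.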
13
Architectural overview
- Large number of centrally managed public access workstations running Linux
- Jobs only run when no users are logged in
- Centralised submit node(s)
- Wrappers around Condor commands
- Restricted (but still useful) subset of Condor's functionality
- Machine identity strongly authenticated
- Improved Condor security model:
  - Privilege separation on execute nodes
  - Strict control of job environment
14
Conclusion
- Although Condor not designed for a hostile environment, it can be used relatively securely in such environments (some caveats naturally)…
  - …under Linux…
  - …but a lot of development work is required to achieve this…
  - …and it requires the supporting infrastructure of a stable, centrally managed workstation service.
- Improvements to Condor would make this significantly easier:
  - Design for a hostile environment. These days, most environments are.