Current methods for negotiating firewalls for the Condor® system
Bruce Beckles (University of Cambridge Computing Service)
Se-Chang Son (University of Wisconsin)
John Kewley (CCLRC Daresbury Laboratory)

What is Condor?
- A specialised, cross-platform, distributed batch scheduling system
- Often used for utilising idle CPU cycles on workstations
- Distributed systems architecture: different components run on different machines
- Can provide greater resilience and improve performance…
- …at the expense of simplicity (particularly simplicity of its use of the network)

Main Condor machine roles
- Central Manager: monitors all Condor nodes and matches jobs to execute nodes
- Submit nodes: submit jobs to the pool
- Execute nodes: execute jobs
- Checkpoint server (optional): stores checkpoints of jobs (for supported job types)
- Machines may have more than one role, and there may be multiple machines with each of the above roles (except there can only ever be one active central manager); see the configuration sketch below
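As an illustration of how these roles are typically assigned, the sketch below shows the kind of condor_config settings involved. This is a minimal sketch: the host name is invented for this example and exact daemon lists vary between Condor versions, so it should not be read as a recommended configuration.

    # Hypothetical condor_config fragments; the host name is an example only.
    CONDOR_HOST = condor-cm.example.ac.uk

    # Central Manager: runs the matchmaking daemons.
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

    # Submit node: runs the schedd, which holds the job queue.
    # DAEMON_LIST = MASTER, SCHEDD

    # Execute node: runs the startd, which advertises the machine and runs jobs.
    # DAEMON_LIST = MASTER, STARTD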

Diagrammatic overview
[Diagram of a Condor pool showing the Central Manager, a Submit Node, an Execute Node and a Checkpoint Server; the diagram's labels are summarised below.]
- Central Manager: Condor daemons normally listen on ports 9614 and 9618 (see the configuration sketch below)
- The Submit Node tells the Central Manager about the job; the Central Manager tells it to which Execute Node it should send the job
- The Execute Node tells the Central Manager about itself; the Central Manager tells it when to accept a job from the Submit Node
- The Submit Node starts the job on the Execute Node; the Execute Node sends results back to the Submit Node
- On the Execute Node, the Condor daemons spawn the user's job (the user's executable code plus the Condor libraries) and signal it when to abort, suspend or checkpoint
- For some jobs, system calls are performed as remote procedure calls back to the Submit Node
- Checkpoint Server: Condor daemons listen on ports 5651–5654; for some jobs the Execute Node writes checkpoints to the Checkpoint Server and checks its status, and the Checkpoint Server advertises itself to the Central Manager
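For reference, the two well-known Central Manager ports in the diagram are the ones that appear in a pool's configuration. A minimal sketch, assuming a Central Manager called condor-cm.example.ac.uk (an invented host name):

    # Hypothetical condor_config fragment; the host name is an example only.
    # 9618 is the collector's well-known port, 9614 the negotiator's.
    COLLECTOR_HOST  = condor-cm.example.ac.uk:9618
    NEGOTIATOR_HOST = condor-cm.example.ac.uk:9614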

Who? How?
- Machine communication: which machine talks to which
- Protocol(s) used
- (Does not include the high availability daemon (Condor and later))

Firewalls: Basic problems
- Pattern of network communication: many-to-many, often bidirectional
- Port usage: large range of dynamic ports (see the sketch below); checkpoint server ports not configurable
- Protocols used: TCP and UDP
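A common partial mitigation for the dynamic port usage (though not for the checkpoint server's fixed ports) is to confine Condor's dynamically allocated ports to a known range, which the firewall exceptions can then reference. A minimal sketch; the size and position of the range are examples only, and the range must be large enough for the pool's traffic:

    # Hypothetical condor_config fragment: restrict the ports Condor
    # allocates dynamically so that matching firewall rules can be written.
    LOWPORT  = 9600
    HIGHPORT = 9999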

Firewalls: Other problems
- Administrative overhead: a large pool may mean many firewall exceptions
- Personal firewalls: like having a different firewall for each machine(!)
- Condor does not handle certain network connectivity failures gracefully
- Inadequate/inaccurate documentation
- Bugs in Condor:
  - Didn't always set SO_KEEPALIVE (now fixed)
  - Machines disappearing from the pool (although the machine still has network connectivity)
  - Problems with Windows Firewall (now resolved?)

Solutions: Identified requirements
- Respect the security boundary
- Reduce administrative overhead
- Minimal impact on firewall performance
- NAT/firewall traversal
- Allow incremental implementation
- Scalability
- Robustness (in the face of network problems)
- Fail gracefully
- Integrated into Condor's security framework
- Logging
- Documentation

Types of solution
- Mitigation (avoidance): mitigating the effects of firewalls
- Altering the pattern of network communication: reducing it from many-to-many to one-to-many, few-to-many, etc.
- NAT/firewall traversal: traversing the security boundary

Current solutions
- CCLRC's Firewall Mirroring (FM)
- Using centralised submit nodes (CS)
- Remote job submission/Condor-C (C-C)
- Generic Connection Brokering (GCB)
- Dynamic Port Forwarding (DPF)

Firewall Mirroring
- Developed by John Kewley
- Ensures that jobs are never given to execute nodes that cannot run them because of network connectivity issues (e.g. personal firewalls)
- Achieved by duplicating the firewall configuration in the machine's ClassAd and then modifying the job requirements appropriately (see the sketch below)
- Works well with personal firewalls
- Some administrative overhead
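The slides do not give the actual ClassAd attribute names that Firewall Mirroring uses, so the sketch below illustrates only the general technique; the attribute name OpenInboundPorts and the port range are invented for this example.

    # Hypothetical execute-node condor_config fragment: publish (part of)
    # the local firewall policy in the machine ClassAd.
    OpenInboundPorts = "9600-9999"
    STARTD_EXPRS = $(STARTD_EXPRS) OpenInboundPorts

    # Hypothetical addition to a job's submit description: only match
    # machines that advertise a firewall policy at all.
    # requirements = (TARGET.OpenInboundPorts =!= UNDEFINED)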

Centralised submit nodes
- Reduce the pattern of network communication (few-to-many or better)
- Lowers administrative overhead
- Can have minimal impact on firewall performance…
- …but may impact the performance of the Condor pool
- Ideal for centrally managed campus grid scenarios

Remote job submission/Condor-C
- Remote job submission: the submit node submits the job to a different submit node, which then submits the job to the Condor pool
  - Poorly documented
  - Doesn't scale well
  - Security implications
- Condor-C:
  - New feature as of Condor
  - Moves the job submission queue of one submit node to another submit node (see the sketch below)
  - Scales gracefully when compared with Condor's flocking mechanism
  - Maintains only a single network connection between the two submit nodes
  - Can be used to reduce the pattern of network communication
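For illustration, a Condor-C job is described on the local submit node along the lines below. The remote schedd and central manager names are invented for this example, and the exact submit-file syntax (and any additional attributes required) varies between Condor versions, so treat this as a sketch only.

    # Hypothetical Condor-C submit description; host names are examples only.
    universe      = grid
    grid_resource = condor remote-schedd.example.ac.uk remote-cm.example.ac.uk
    executable    = my_job
    output        = my_job.out
    error         = my_job.err
    log           = my_job.log
    queue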

Generic Connection Brokering
- NAT/firewall traversal technique
- Developed by Se-Chang Son
- Transparent to the application
- Can reverse the direction of a network connection…
- …or relay network packets between two machines that could not otherwise communicate (see the sketch below)
- Some scalability issues
- Not yet part of any official Condor release
- Not yet integrated into Condor's security framework
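In Condor builds that include GCB support, a node behind a firewall or NAT is pointed at a GCB broker with settings along the lines below. The broker address is invented for this example, and the macro names are those documented for later GCB-enabled releases, so in the context of this talk they should be treated as assumptions rather than released configuration options.

    # Hypothetical condor_config fragment for a GCB-enabled Condor build;
    # the broker's address is an example only.
    NET_REMAP_ENABLE  = TRUE
    NET_REMAP_SERVICE = GCB
    NET_REMAP_INAGENT = 192.0.2.10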

Dynamic Port Forwarding
- NAT/firewall traversal technique
- Developed by Se-Chang Son
- Add-on to the firewall: currently only supports Linux netfilter-based firewalls
- The application asks DPF to open a hole in the firewall; DPF closes the hole when the connection is finished
- Highly scalable
- Not yet part of any official Condor release
- Not yet integrated into Condor's security framework

Solutions v. Requirements
- See the paper for notes and explanations

Conclusion
- No perfect solution (none meets all the requirements)
- Careful design of the Condor pool can help
- Many solutions are still experimental / not yet generally available
- Se-Chang is working on further technical solutions not discussed here
- Some issues are best addressed within Condor (e.g. failing gracefully on loss of network connectivity)
- Further development of Condor is required to properly address many of these issues