NMI and Condor: Status + Future Plans
ETICS All Hands Meeting, Bologna, October 23-25, 2006
Andy Pavlo, Peter Couvares, Becky Gietzel

Overview
Introduction
Cross-site Job Migration
Improving Documentation
Virtual Machines
Generic Connection Broker
Future Plans
Q & A

Introduction
The University of Wisconsin team is dedicated to improving Condor technologies and the NMI framework.
The Condor user base continues to grow.
Expecting an upcoming surge of NSF users for NMI.

Cross-site Job Migration
Pools of ETICS computing resources are installed at INFN, CERN, and the University of Wisconsin.
Jobs are automatically routed to remote sites when local resources cannot satisfy their requirements.
Routing is transparent to users.

Cross-site Job Migration
[Diagram: an NMI build/test submission flowing from the local site to a remote site. Components at each site: Condor Schedd, Condor Schedd-on-the-Side, Resource Advertiser, Condor Matchmaker. The Schedd-on-the-Side uses a grid resource routing table to turn a Condor job into a Condor-C job.]
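
The routing step in the diagram can be pictured with a small Python sketch. It is a hypothetical illustration, not the NMI or Condor implementation: it assumes a job is a plain dictionary of submit-description settings, that free-slot counts per site are already known (standing in for the full requirements check), and that the routing table maps each site to a remote schedd and pool; all host names are placeholders. The `universe = grid` and `grid_resource = condor <schedd> <pool>` settings are the standard Condor-C submit commands.

```python
from typing import Dict, Tuple

# Hypothetical routing table: site name -> (remote schedd, remote pool collector).
# The host names are placeholders, not real ETICS machines.
ROUTING_TABLE: Dict[str, Tuple[str, str]] = {
    "INFN": ("schedd.infn.example.org", "cm.infn.example.org"),
    "CERN": ("schedd.cern.example.org", "cm.cern.example.org"),
}


def route_job(job: Dict[str, str], local_free_slots: int,
              remote_free_slots: Dict[str, int]) -> Dict[str, str]:
    """Keep the job local if a local slot is free; otherwise rewrite a copy of
    it as a Condor-C job aimed at the first remote site with a free slot."""
    if local_free_slots > 0:
        return job  # local resources can satisfy the job: no routing needed
    for site, free in remote_free_slots.items():
        if free > 0 and site in ROUTING_TABLE:
            schedd, pool = ROUTING_TABLE[site]
            routed = dict(job)  # copy so the user's original job is untouched
            routed["universe"] = "grid"
            # Condor-C target: "condor <remote schedd name> <remote pool>"
            routed["grid_resource"] = f"condor {schedd} {pool}"
            return routed
    return job  # nowhere to send it: the job waits in the local queue


if __name__ == "__main__":
    job = {"universe": "vanilla", "executable": "build.sh"}
    print(route_job(job, 0, {"INFN": 0, "CERN": 3}))
```

Rewriting a copy rather than the user's own job description is one way to keep the migration transparent: the user submits an ordinary job, and only the routed copy carries the grid settings.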

Cross-site Job Migration
[Diagram: the NMI universe. Resource Advertisers at CERN, INFN, and the University of Wisconsin publish their available resources. Beyond ETICS: OMII-UK, OMII-Europe.]

Cross-site Job Migration
Current status:
–Explicit job routing is available in the NMI framework
Future plans:
–Initial deployment (without prerequisite information): November 2006
–Improved matchmaking: December 2006
Still to be determined:
–Authorization/authentication method(s)
–Scalable distributed data dissemination
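
The plan separates an initial deployment without prerequisite information from later, improved matchmaking. One way to picture the difference is a match that also checks prerequisites. This sketch assumes each site's Resource Advertiser publishes its platform, free slots, and installed prerequisites; the platform and package names are made up. Real Condor matchmaking is two-sided (both the job's and the machine's Requirements must hold, and Rank expressions choose among matches), so this one-sided check is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ResourceAd:
    """What a (hypothetical) site Resource Advertiser might publish."""
    site: str
    platform: str
    free_slots: int
    prereqs: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class JobAd:
    """A build/test job's platform and the prerequisites it needs."""
    platform: str
    prereqs: FrozenSet[str] = frozenset()


def matches(job: JobAd, ad: ResourceAd) -> bool:
    # One-sided check: right platform, at least one free slot, and every
    # required prerequisite already installed at the site.
    return (ad.platform == job.platform and ad.free_slots > 0
            and job.prereqs <= ad.prereqs)


def best_site(job: JobAd, ads: List[ResourceAd]) -> Optional[str]:
    candidates = [ad for ad in ads if matches(job, ad)]
    if not candidates:
        return None
    # Rank stand-in: prefer the matching site with the most free slots.
    return max(candidates, key=lambda ad: ad.free_slots).site
```

Without prerequisite information, a job needing `java-1.5` could be sent to a site that has a free slot on the right platform but lacks the package; with it, only sites that already have the package qualify.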

Documentation
Emphasis on creating complete documentation and user tutorials for the NMI framework.
Additional contributions from Michael Bletzinger (NCSA).
Target deadline: December 2006 to January 2007
New website:

Virtual Machines
Jobs are sandboxed inside a virtual machine:
–Changes to the system are isolated to the local VM.
Allows for more robust build and test scenarios.
Current status in Condor:
–Preliminary support for VMware is in Condor 6.9
–Users must create the VM image beforehand.
–Future plan is to create VMs dynamically and insert jobs.
–Plan to support Xen and VirtualPC.
Condor's current VM support is not directly usable by the NMI framework.

Virtual Machines: Future Plans
NMI and ETICS could provide a standard image per OS, configured with prerequisite software.
Images are stored in a cache and dynamically deployed with builds and tests.
Users need only add a single line to their submission file.
NMI framework enhancements:
–Maintain a cache of available OS VM images.
–Inject build and test scripts into the VM image.
–Extract appropriate status, logs, and job artifacts.
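
A minimal sketch of the image cache and the single submission-file line, under stated assumptions: images are plain files named after the platform, submission files use simple `key = value` lines, and the new line is a hypothetical `vm_image = <platform>` setting (the slide does not name the key). Each job gets its own copy of the cached image, which is what keeps changes isolated. Injecting scripts into the image and extracting logs would need hypervisor-specific tooling and is left out.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def parse_submit(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    spec: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            spec[key.strip()] = value.strip()
    return spec


class VMImageCache:
    """Hypothetical cache holding one pristine image file per OS platform."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def image_for(self, platform: str) -> Path:
        image = self.root / f"{platform}.img"
        if not image.exists():
            raise KeyError(f"no cached VM image for platform {platform!r}")
        return image

    def clone_for_job(self, platform: str, workdir: Path) -> Path:
        # Each job runs against its own copy, so whatever the build or test
        # changes inside the VM never touches the cached base image.
        clone = workdir / f"{platform}-job.img"
        shutil.copyfile(self.image_for(platform), clone)
        return clone


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "x86_64_sl4.img").write_bytes(b"base image")
        spec = parse_submit("project = demo\nvm_image = x86_64_sl4\n")
        print(VMImageCache(root).clone_for_job(spec["vm_image"], root))
```

A full file copy is the simplest form of isolation; a real deployment would more likely use the hypervisor's copy-on-write or snapshot features to avoid copying multi-gigabyte images per job.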

Generic Connection Broker
One way for Condor jobs to traverse firewalls.
A daemon that acts as a proxy at the edge of firewalls.
Acts as a broker, then steps out of the way.
Low “maintenance”:
–Works with NATs and multiple private networks.
–No changes to firewall configuration.
[Diagram: Matchmaker, Executor, Submitter, and GCB]
1) Executor registers with GCB
2) Executor advertises to matchmaker
3) After match, submitter contacts executor via GCB
4) GCB tells executor to open connection
5) Executor opens connection to submitter
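
The five steps can be traced with a toy, in-process Python model. There is no real networking, and every class and method name is hypothetical; the point is the ordering. The executor only ever makes outbound connections (first to GCB, then to the submitter), which is why no firewall changes are needed.

```python
from typing import Dict, List, Tuple


class GCB:
    """Broker at the edge of the firewall (toy, in-process model)."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.registered: Dict[str, "Executor"] = {}

    def register(self, executor: "Executor") -> None:
        # 1) The executor, inside the private network, registers with GCB.
        self.registered[executor.name] = executor
        self.log.append(f"1) {executor.name} registers with GCB")

    def relay(self, submitter: "Submitter", executor_name: str) -> None:
        # 4) GCB asks the executor to connect out, then steps out of the way.
        self.log.append(f"4) GCB tells {executor_name} to open connection")
        self.registered[executor_name].connect_to(submitter)


class Matchmaker:
    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.ads: Dict[str, GCB] = {}

    def advertise(self, executor_name: str, gcb: GCB) -> None:
        # 2) The ad records the GCB through which the executor is reachable.
        self.ads[executor_name] = gcb
        self.log.append(f"2) {executor_name} advertises to matchmaker")

    def match(self) -> Tuple[str, GCB]:
        return next(iter(self.ads.items()))


class Executor:
    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log

    def connect_to(self, submitter: "Submitter") -> None:
        # 5) Outbound connection from behind the firewall: no inbound port needed.
        self.log.append(f"5) {self.name} opens connection to {submitter.name}")
        submitter.peers.append(self.name)


class Submitter:
    def __init__(self, name: str, log: List[str]) -> None:
        self.name, self.log = name, log
        self.peers: List[str] = []

    def contact(self, executor_name: str, gcb: GCB) -> None:
        # 3) After the match, the submitter reaches the executor via GCB.
        self.log.append(f"3) {self.name} contacts {executor_name} via GCB")
        gcb.relay(self, executor_name)


def simulate() -> Tuple[List[str], Submitter]:
    log: List[str] = []
    gcb, matchmaker = GCB(log), Matchmaker(log)
    executor, submitter = Executor("executor", log), Submitter("submitter", log)
    gcb.register(executor)
    matchmaker.advertise(executor.name, gcb)
    name, broker = matchmaker.match()
    submitter.contact(name, broker)
    return log, submitter


if __name__ == "__main__":
    for step in simulate()[0]:
        print(step)
```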

Generic Connection Broker
Currently supported only in Condor 6.8 on Linux.
The Wisconsin team is working to improve GCB:
–Clean up the code base and remove testing logic
–Port to other operating systems
–Improve scalability and network performance

Other Future Plans: NMI
Parallel scheduling enhancements:
–Task synchronization
–Primitives today, high-level dependency specification/management tomorrow?
–Scalability testing: 10^1, 10^2, 10^3, 10^4 nodes?
Refactored database schema:
–Improved DB scalability and performance
–Improved build/test artifact provenance
–Project hierarchy
–Users and groups
–Builds and tests coupled to projects
–Task-level metrics
Fuzz testing mechanisms
Website enhancements (maybe):
–Consolidate the "old" and "new" web interfaces
–May focus more on debugging info than status info
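
The "high-level dependency specification" idea might look something like this sketch, which is an assumption rather than a planned NMI interface: each task declares the tasks it depends on, and Python's standard `graphlib.TopologicalSorter` (Python 3.9+) releases them in waves whose members could run on separate nodes in parallel. Task names are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves: each task depends only on tasks in earlier waves,
    so all tasks within one wave could be dispatched to nodes concurrently."""
    sorter = TopologicalSorter(deps)  # maps each task to its prerequisites
    sorter.prepare()  # raises graphlib.CycleError if the spec has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # synchronization point: the whole wave is finished
    return waves


if __name__ == "__main__":
    spec = {
        "build_linux": {"fetch"},
        "build_mac": {"fetch"},
        "test_linux": {"build_linux"},
        "test_mac": {"build_mac"},
        "publish": {"test_linux", "test_mac"},
    }
    for number, wave in enumerate(parallel_waves(spec), start=1):
        print(number, wave)
```

Marking a whole wave done acts as a barrier, which is the simplest synchronization primitive. A real scheduler would call `done()` for each task as it finishes, so that `test_linux` could start while `build_mac` is still running.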

Other Future Plans: Condor
New development series: Condor 6.9
Improved scalability:
–Modularize schedd tasks
–Non-blocking I/O
Privilege separation:
–Daemons no longer need to start with setuid permissions
–Integration with glexec/sudo
Enhanced security:
–Continue with source code audits
–Signed ClassAds
Parallel scheduling:
–Document and understand current issues in a pool doing both independent and parallel work
–Improve incrementally based on production experience
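
The slide only names signed ClassAds as a goal. As an illustration of the general idea, not Condor's design, this sketch signs a canonical serialization of an ad with a shared-secret HMAC-SHA256 from Python's standard library, so a receiver holding the same key can detect tampering. A production scheme might use public-key signatures instead; the key and attribute values are made up.

```python
import hashlib
import hmac
from typing import Dict, Union

Value = Union[str, int, float, bool]


def canonical(ad: Dict[str, Value]) -> bytes:
    """Serialize an ad deterministically: attribute names lowercased (ClassAd
    attribute names are case-insensitive), sorted, one attribute per line."""
    lines = [f"{name.lower()} = {ad[name]!r}" for name in sorted(ad, key=str.lower)]
    return "\n".join(lines).encode("utf-8")


def sign(ad: Dict[str, Value], key: bytes) -> str:
    return hmac.new(key, canonical(ad), hashlib.sha256).hexdigest()


def verify(ad: Dict[str, Value], key: bytes, signature: str) -> bool:
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(sign(ad, key), signature)


if __name__ == "__main__":
    key = b"pool-shared-secret"
    ad = {"Name": "slot1@exec.example.org", "Memory": 2048}
    print(sign(ad, key))
```

Signing the canonical form rather than the raw text means two ads that differ only in attribute order or name case verify against the same signature, while any change to a value does not.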

Q & A