Published by Anita Sundqvist. Modified over 5 years ago.
1
JRA 1 Progress Report
ETICS 2 All-Hands Meeting
Alain Roy and Becky Gietzel
University of Wisconsin-Madison
Palermo, October 2008
2
Personnel Change
- Peter Couvares has left the Condor Project & ETICS
- Becky Gietzel now manages the UW build and test facility
- Todd Miller now manages the Metronome software
- Alain Roy is the ETICS JRA 1 Work Package Manager
- Nate Griswold is system administrator
- Peter is now at: visiblecertainty.com
JRA 1 Progress Report Palermo, October 2008
3
Major focuses of activity right now
- Focus 1: Remote job submission
- Focus 2: Submission to other batch systems
4
Focus 1: Remote Job Submission
- Goal: Ability to submit from one build and test facility to another.
- Approach: When a job cannot be run locally, run the job with Condor-C on a remote pool.
- Questions you might ask:
  - Why can’t a job run locally?
  - What is this Condor-C stuff?
5
Question: Why couldn’t a job run locally?
When you submit the job, even if you allow job migration:
- Condor will run the job locally, if a computer is available.
- You might have computers available locally, but they’re busy.
- You might not have computers available locally: perhaps you are requesting a platform that only exists at a remote site.
Metronome will try to run the job remotely when:
- 5 minutes have passed without a match (configurable).
- … and the Metronome administrator allows remote job submission.
- … and the job owner allows remote job submission.
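The three conditions above can be sketched as a single predicate. This is only an illustration of the rule as stated on the slide, not Metronome's code: the names `JobState`, `should_try_remote`, and `match_timeout_seconds` are hypothetical, and the 300-second default simply mirrors the "5 minutes (configurable)" figure.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical view of a queued job, for illustration only."""
    seconds_unmatched: float   # time spent waiting without a local match
    owner_allows_remote: bool  # job owner opted in to remote submission


def should_try_remote(job: JobState,
                      admin_allows_remote: bool,
                      match_timeout_seconds: float = 300.0) -> bool:
    """Return True only when all three conditions from the slide hold."""
    waited_long_enough = job.seconds_unmatched >= match_timeout_seconds
    return waited_long_enough and admin_allows_remote and job.owner_allows_remote
```

For example, a job that has waited six minutes is eligible only if both the administrator and the owner allow it; a job that has waited one minute is never eligible, whatever the permissions.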
6
Question: How do you run the job remotely? What is this Condor-C stuff?
There are two components:
- Job Router:
  - Watches for a job that can migrate
  - Rewrites the job very slightly
    - No longer a “vanilla” Condor job
    - A Condor-C job
- Condor-C:
  - Instead of matching a job to a computer, runs a job at a remote Condor site
  - Instead of submitting a job to a Condor startd (execution computer), submits to a Condor schedd (submit computer)
  - Implication: matching will happen again at the remote site
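The "slight rewrite" can be pictured as follows. In a Condor submit description, a Condor-C job uses `universe = grid` together with a `grid_resource` line of the form `condor <remote schedd> <remote collector>`. The function below is a minimal sketch of that idea using a plain dictionary; the real Job Router works on ClassAds and is driven by configured routes, and the function name and host names here are hypothetical.

```python
def route_to_condor_c(job_ad: dict, remote_schedd: str, remote_collector: str) -> dict:
    """Return a copy of job_ad rewritten as a Condor-C job (illustrative only).

    Jobs that are not vanilla-universe jobs are returned unchanged.
    """
    routed = dict(job_ad)  # never mutate the caller's description
    if routed.get("universe") != "vanilla":
        return routed
    routed["universe"] = "grid"
    # Condor-C target: the remote schedd, located through the remote collector.
    routed["grid_resource"] = f"condor {remote_schedd} {remote_collector}"
    return routed


# Hypothetical job and hosts, for illustration.
local_job = {"universe": "vanilla", "executable": "run_build.sh"}
remote_job = route_to_condor_c(local_job, "schedd.remote.example.org",
                               "collector.remote.example.org")
```

Everything else in the job description is left as it was, which is why the remote schedd can match the job to a worker node a second time at the remote site.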
7
Diagram of Remote Job Submission
[Diagram: the local site and the remote site each contain a Condor Submitter (Schedd), a Condor Matchmaker (for computers), and Condor Worker Nodes (startd). The labels 1 and 2 mark steps in the diagram.]
8
State of Remote Job Submission
- Tested in testbed: it works well!
  - Running 24 jobs per day (1 per hour)
  - Working 100%
- Currently moving to pre-production
  - We hope to demonstrate in pre-production very soon
- Requires software upgrades:
  - Metronome upgrade to 2.5.x
  - Condor upgrade to 7.1.x
9
Focus 2: Submission to Other Batch Systems
We are currently prototyping submission to other batch systems.
Approach: Use Condor-G
Conceptually similar to Condor-C, but instead of submitting to Condor, we can submit to:
- Unicore
- CREAM
- NorduGrid
- GRAM 2 (pre-web services GRAM)
- GRAM 4 (web-services GRAM)
- PBS
- LSF
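With Condor-G the target system is chosen by the first word of the `grid_resource` line in the submit description. The sketch below maps the systems listed above to the grid-type keywords as the Condor manual gives them, to the best of my knowledge; check them against the Condor version in use. The helper name `grid_resource_line` and the host name are hypothetical, and the remaining words on each line (gatekeeper address, queue, and so on) differ per system and are passed through untouched.

```python
# Grid-type keyword per target system (assumed from the Condor manual).
GRID_TYPES = {
    "Condor": "condor",      # Condor-C, shown for comparison
    "Unicore": "unicore",
    "CREAM": "cream",
    "NorduGrid": "nordugrid",
    "GRAM 2": "gt2",
    "GRAM 4": "gt4",
    "PBS": "pbs",
    "LSF": "lsf",
}


def grid_resource_line(system: str, *location: str) -> str:
    """Build the value of grid_resource for the given target system."""
    if system not in GRID_TYPES:
        raise ValueError(f"unsupported batch system: {system}")
    return " ".join((GRID_TYPES[system],) + location)


# Hypothetical GRAM 2 gatekeeper fronting a PBS cluster.
gram2 = grid_resource_line("GRAM 2", "gatekeeper.example.org/jobmanager-pbs")
local_pbs = grid_resource_line("PBS")
```

Keeping the mapping in one table makes the similarity to Condor-C visible: only the keyword and the location words change, while the rest of the job description stays the same.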
10
Tradeoffs
When we don’t use plain old Condor or Condor-C, there are tradeoffs. Some apply to using Condor-G, some apply when you use another, non-Condor solution.
- Metronome uses Condor streaming I/O for real-time updates.
- Metronome uses Condor DAGMan to control the set of jobs that makes up a build/test
  - Works great with Condor-G and Condor-C
- Condor has mechanisms to recover and/or restart failed jobs
  - Some work with Condor-G
- Hawkeye for computer information (used for matching)
- Co-scheduling (parallel jobs)
11
Questions?