What's New in Condor
Zach Miller
Computer Sciences Department, University of Wisconsin-Madison

Overview
› Condor Development Process
   Stable vs. Development
› New Features in
› Significant improvements which are covered in other talks:
   What's New in Condor-G, covered by Todd Tannenbaum
   Hawkeye, covered by Nick LeRoy
   COD (Computing On Demand), covered by Derek Wright
   Packaging and Testing, covered by Alain Roy

Condor Development Process
› We maintain two different releases at all times
   Stable Series: second digit is even, e.g. 6.4.7
   Development Series: second digit is odd, e.g. 6.5.2

Stable Series
› Heavily tested
› Runs on our production pool of nearly 1,000 CPUs
› No new features, only bugfixes, are allowed into a stable series
› A given stable release is always compatible with other releases from the same series
   6.4.X is compatible with 6.4.Y
› Recommended for production pools

Development Series
› Less heavily tested
› Runs on our smaller test pool
› New features and new technology are added frequently
› Versions from the same development series are not always compatible with each other

Overview of New Features
 Windows
 DAGMan
 Better Security
 Central Manager
 Improved Negotiation
 Black Holes
 New Utilities
 Smarter File Transfer
 Submit-time file staging
 New Installer
 ClassAd improvements
 And More!!

Improvements in Condor for Windows
› Ability to run SCHEDULER universe jobs
   DAGMan
   Any executable or batch file
› JAVA universe support
   JVM provided by execution site
   Better error management
   Ability to use CHIRP (Remote I/O)
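As a sketch, a scheduler-universe submit file for a Windows batch file might look like the following (the executable and file names here are hypothetical, not from the talk):

```
# Run a batch file under the scheduler universe,
# i.e. on the submit machine itself rather than a matched execute node
universe   = scheduler
executable = mydag.bat
output     = mydag.out
error      = mydag.err
log        = mydag.log
queue
```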

Improvements in Condor for Windows (cont)
› New support for:
   Windows XP
   Foreign-language versions of Windows
   Legacy 16-bit apps
› Improved Windows-to-UNIX job submission, and vice versa
› BirdWatcher, a system tray icon that gives basic status and control of Condor

New Features in DAGMan
› DAGMan previously required that all jobs share one log file
› Each job can now have its own log file
› Understands XML userlogs
› Can produce .dot file graphs
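A sketch of a small DAG where each node keeps its own log file (node names and file names are illustrative, not from the talk):

```
# diamond.dag: each node has its own submit file, and each
# submit file can now name its own userlog
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A CHILD B C
PARENT B C CHILD D

# a.sub (b.sub, c.sub, d.sub are similar, each with its own log):
#   executable = a.exe
#   log        = a.log
#   queue
```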

Better Security
› GSI (X.509 certificates) implementation more complete and customizable
   Each Condor daemon can have its own certificate
   You can run a "Personal Condor" with your user proxy
› Easier configuration
   If you already have Globus installed, very little additional configuration of Condor is necessary to start using X.509 certificates for authentication
› Improved error messages if something goes wrong
   Tells you whether the problem was network-, authentication-, or authorization-related
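A minimal condor_config sketch for turning on GSI authentication; the macro names follow the Condor manual of this era, and the certificate paths are hypothetical:

```
# Require authentication and use GSI (X.509) to do it
SEC_DEFAULT_AUTHENTICATION         = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = GSI

# Per-daemon credentials (paths are examples only)
GSI_DAEMON_CERT = /etc/grid-security/condorcert.pem
GSI_DAEMON_KEY  = /etc/grid-security/condorkey.pem
```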

Central Manager New Features
› Keeps statistics on missed updates
› Can use TCP instead of UDP, if you must
› Redundant central managers can be run with the SECONDARY_COLLECTOR_LIST parameter
   If the main central manager goes down, you may still run administrative commands
› Central Manager daemons can now run on any port
   COLLECTOR_HOST = condor.cs.wisc.edu:9019
   NEGOTIATOR_HOST = condor.cs.wisc.edu:9020

Improved Negotiation
› Allows the condor_schedd (the job queue manager) to send "classes" of jobs to the negotiator for matching
› Previously, jobs were sent one at a time
› Now, 1000 copies of the same job take the same time to negotiate as 100, 10, or just one
› Currently, job classes are defined in the condor_config file; very soon, they will be determined automatically
   "Buckets" will be needed
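As a sketch of how a job class was defined in configuration at this time: jobs whose values for a listed set of attributes are identical are treated as one class and negotiated together. The macro name and attribute list below are an assumption based on the Condor manual of this era, not taken from the talk:

```
# Jobs agreeing on all of these attributes form one negotiation class
SIGNIFICANT_ATTRIBUTES = Owner, Requirements, Rank
```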

Avoiding Black Holes
› Condor can keep track of the last N resource matches
› This can be used to prefer the same machine if a job is restarted
› It can also be used to avoid a machine on restart, which is a first step toward avoiding "black holes": machines that consume jobs but always fail to run them
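A submit-file sketch of using the match history to avoid landing on the same machine twice; the `match_list_length` command and `LastMatchName0` attribute are taken from the Condor manual and should be checked against the version in use:

```
# Remember the last match, and refuse to match that machine again
match_list_length = 1
requirements      = (TARGET.Name =!= LastMatchName0)
```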

New Utilities
› condor_q -held gives you a list of held jobs and the reason they were put on hold
› condor_config_val -config tells you where (file and line number) an attribute is defined
› condor_rm -f will forcefully remove a job, which is particularly useful when the Globus jobmanager is not cooperating
› condor_fetch_log will grab a log file from a remote machine:
   condor_fetch_log c2-15.cs.wisc.edu STARTD

Smarter File Transfer
› New file transfer mechanism:
   ShouldTransferFiles = YES | NO | IF_NEEDED
   YES: always transfer files to the execution site
   NO: rely on a shared filesystem
   IF_NEEDED: automatically transfer the files if the submit and execute machines are not in the same FileSystemDomain
› Very useful for cross-platform submitting and also for flocking
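A submit-file sketch using the new mechanism (executable and input file names are hypothetical; `when_to_transfer_output` is the companion command documented alongside `should_transfer_files` in the Condor manual):

```
# Transfer files only when submit and execute machines
# are not in the same FileSystemDomain
universe                = vanilla
executable              = analyze
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
transfer_input_files    = data.in
queue
```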

Submit-Time File Staging
› When submitting a job, you can tell Condor to create a "sandbox" of all necessary input files with condor_submit -s
› After completion, the job can stay in the queue with a leave_in_queue expression
› Output files are then fetched manually
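A sketch of a submit file for this workflow; the expression below keeps the job queued once it completes (JobStatus 4 means "Completed"), and the file names are illustrative:

```
# job.sub: stage inputs into a sandbox at submit time with
#   condor_submit -s job.sub
# and keep the finished job queued until its output is fetched
executable     = analyze
leave_in_queue = (JobStatus == 4)
queue
```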

New Installer
› For Windows
   Based on MSI (Microsoft Software Installer)
   Batch install option
› For UNIX
   Version will be available in RPMs
   Command-line options specify the installation parameters, and no questions are asked
   Easier to automate

ClassAds
› ClassAd attributes can be dynamically linked to external functions
   Example: [ label = "uptime"; value = some_func_that_calls_uptime() ]

Misc New Features
› Jobs can be submitted via GRAM (the Globus Gatekeeper)
› Daemons do not have to run as root or condor to have multiple different users submitting
› Rudimentary load balancing between checkpoint servers by picking one randomly from a list
› More job policy expressions
   PERIODIC_RELEASE
   GLOBUS_RESUBMIT
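A sketch of a periodic-release policy in a submit file; the `periodic_release` command and `NumSystemHolds` attribute appear in the Condor manual, and the retry limit here is an arbitrary example:

```
# Automatically release a held job, but give up after 3 system holds
periodic_release = (NumSystemHolds < 3)
```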

Conclusion
› Todd Tannenbaum will tell you about the roadmap for future work
› Questions?