
1 HTCondor Project Plans
Zach Miller
zmiller@cs.wisc.edu
OSG AHM 2013

2 Even More Lies
Zach Miller
zmiller@cs.wisc.edu
OSG AHM 2013

3 chtc.cs.wisc.edu Outline
› Fewer Lies Predictions
› More Accomplishments
› Some hints at Future Work

4 HTCondor
› As you've probably noticed, the project's name has changed.
› HTCondor specifically is the software developed by the Center for High-Throughput Computing at UW-Madison.
› However, the code, the names of binaries, and all configuration file names and entries have NOT changed.

5 User Tools
› condor_ssh_to_job
 If enabled by the admin, allows users to get a shell in the remote execution sandbox as the user that is running the job
 Great for debugging!

% condor_ssh_to_job 115.0
Welcome to slot26@ingwe.cs.wisc.edu!
Your condor job is running with pid(s) 8010.
> ls
condor_exec.exe  _condor_stderr  _condor_stdout
> whoami
zmiller

6 User Tools
› condor_q -analyze
 No longer needs to (but still can) fetch user priorities, removing the need to contact the negotiator daemon; this results in much improved response time for busy pools
 Can also analyze the Requirements of the condor_startd, in addition to the Requirements of the job

7 User Tools
› condor_tail
 Allows users to view the output of their jobs while the job is still running
 Like the UNIX "tail -f", it allows following the contents of a file (real-time streaming)
 Not yet part of the HTCondor release (but should be there Real Soon Now™)

8 User Tools
› condor_qsub
 Allows a user to submit a PBS or SGE job to HTCondor directly
 Translates the command-line arguments, as well as inline (#PBS or #$) directives, to their equivalent condor_submit commands
 This is in no way complete: the aim is not to emulate every feature of qsub, but to capture the main functionality that supports the majority of simple use cases
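As a rough sketch of the kind of translation condor_qsub performs (the file names here are hypothetical, and the exact output may differ), PBS directives map onto condor_submit commands roughly like this:

```
# PBS script directives such as:
#   #PBS -N myjob
#   #PBS -o myjob.out
#   #PBS -e myjob.err
# might be translated into a condor_submit description along these lines:
executable = myjob.sh
output     = myjob.out
error      = myjob.err
queue
```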

9 User Tools
› condor_ping
 Tests the authentication, mapping, and authorization of a user submitting a job to the running HTCondor daemons
 Tries to provide helpful debugging info in the case of failure
 Similar in spirit to globusrun -a

10 Admin Tools
› condor_ping
 Tests the authentication, mapping, and authorization of daemon-to-daemon communications of running HTCondor daemons
 Helps assure administrators they have configured things correctly
 Tries to provide helpful debugging info in the case of failure
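condor_ping exercises the pool's security configuration; a minimal sketch of the kind of settings it verifies (the hostnames are hypothetical, and the right knobs depend on your setup) might look like:

```
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD
ALLOW_WRITE = *.example.edu
ALLOW_ADMINISTRATOR = condor@cm.example.edu
```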

11 Admin Tools
› condor_who
 Shows which jobs by which user are running on the local machine
 Does not depend on contacting HTCondor daemons; it gets all info from logs and 'ps'

% condor_who
OWNER                CLIENT             SLOT JOB   RUNTIME    PID  PROGRAM
zmiller@cs.wisc.edu  ingwe.cs.wisc.edu  26   117.0 0+00:00:53 8375 /scratch/

12 Networking
› IPv6 support in HTCondor has been around for some time, but we continue to test and harden that code
› The condor_shared_port daemon allows an HTCondor instance to listen on a single port, easing configuration in firewalled environments
› We would like this to be the default, with port 9618 (registered by name with IANA)
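A sketch of enabling the shared port daemon in the configuration (knob names as understood for this era of HTCondor; verify against your version's manual):

```
USE_SHARED_PORT = True
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
SHARED_PORT_ARGS = -p 9618
```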

13 Security
› Creation of an official security policy
› Documented on the HTCondor web site
 Reporting process
 Release process
 Known vulnerabilities
› Running Coverity nightly

14 Security
› Added the ability to increase the number of bits in delegated proxies
› condor_ping, already mentioned
› Audit Log
 Will record the authenticated identity of each user and as much information as is practical about each job that runs
 In progress…

15 Sandboxing
› Per-job PID namespaces
 Makes it impossible for jobs to see outside their process tree, and therefore unable to interfere with the system or with other jobs, even if those jobs are owned by the same user
 Allows filesystem namespaces, such that each job can have its own /tmp directory, which is actually mounted in HTCondor's temporary execute directory
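A sketch of the execute-node configuration for these features on Linux (knob names as understood for the 8.0 series; check your manual before relying on them):

```
# give each job its own PID namespace
USE_PID_NAMESPACES = True
# remount these paths per-job inside the scratch execute directory
MOUNT_UNDER_SCRATCH = /tmp, /var/tmp
```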

16 Sandboxing
› cgroups
 Allows for more accurate accounting of a job's memory and CPU usage
 Guarantees proper cleanup of jobs
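A sketch of enabling cgroup-based tracking on Linux execute nodes (assumes a cgroup hierarchy named "htcondor" exists on the machine; knob names as understood for this release series):

```
BASE_CGROUP = htcondor
CGROUP_MEMORY_LIMIT_POLICY = soft
```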

17 Partitionable Slots
› p-slots
 Can contain "generic" resources, beyond CPU, Memory, and Disk
 Now work in HTCondor's "parallel" universe
 Support "quick claiming", where a whole machine can be subdivided and claimed in a single negotiation cycle
› Can lead to "fragmenting" of a pool
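A sketch of defining a single partitionable slot that also advertises a custom "generic" resource (the GPU count here is purely illustrative):

```
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = True
# a custom machine resource, advertised alongside Cpus/Memory/Disk
MACHINE_RESOURCE_gpus = 2
```

Jobs then carve dynamic slots out of the p-slot by stating request_cpus, request_memory, and (for the custom resource) request_gpus in their submit description.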

18 Defrag
› condor_defrag
 A daemon which periodically drains and recombines a certain portion of the pool
 Leads to the re-creation of larger slots, which can then be used for larger jobs
 Necessarily causes some "badput"
› condor_drain is a command-line tool that does this on an individual machine
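A sketch of running the defrag daemon with deliberately conservative rates (the numbers are illustrative, not recommendations):

```
DAEMON_LIST = $(DAEMON_LIST), DEFRAG
DEFRAG_INTERVAL = 600
DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
DEFRAG_MAX_WHOLE_MACHINES = 20
DEFRAG_SCHEDULE = graceful
```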

19 Statistics
› HTCondor daemons can now report a wide variety of statistics to the condor_collector
 Statistics about the daemons, such as response times to incoming requests
 About jobs and quantities of data transferred
› What remains to be done is tools that help make sense of those statistics, either as a new and improved CondorView or as a Gratia probe
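A sketch of turning up statistics publishing in the configuration (the subsystem names and verbosity levels here are illustrative; consult the manual's description of STATISTICS_TO_PUBLISH for the exact syntax):

```
STATISTICS_TO_PUBLISH = SCHEDD:2, TRANSFER:2
```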

20 Scalability
› Working on reducing the memory footprint of daemons, particularly the condor_shadow
› Queued file transfers are now processed round-robin instead of FIFO, so individual users are not starved
› ClassAd caching in the schedd and collector has resulted in 30-40% savings in memory

21 HTCondor Version 8.0.0
› Will contain all of the above goodness
› Should be released approximately "during HTCondor Week", April 29 – May 3
› What lies beyond?

22 Future Work
› Scalability
 We always need to be improving this in an attempt to stay ahead of Igor the curve
 New tools will be needed; nobody wants to run condor_status and see individual state for 100,000 cores
 Reducing the amount of per-job memory used on the submit machine
 Collector hierarchies to deal with high-latency, wide-area cloud pools

23 Future Work
› Weather report: 100% chance of clouds
 Support for more types of resource acquisition models: EC2 Spot Instances, Azure, OpenStack, The Next Big Thing™
 Simple creation of single-purpose clusters: homogeneous, ephemeral, single user, single job
 Seamless integration of cloud resources with locally deployed infrastructure

24 Future Work
› Dealing with more hardware complexity
 More and more cores
 GPUs
› Simplifying deployment across widely disparate use cases
› Improving support for black-box / commercial applications
› Meeting increasing data challenges

25 Conclusion
› Many things accomplished…
› Many more to do…
› Questions? Ask me! Or email me at zmiller@cs.wisc.edu
› Thanks!

