Talking Points: Deployment on Big Infrastructures
INFN HTCondor Workshop, October 2016
Examples

Example: UW-Madison CHTC
- Pool size: ~15k slots
- Central Manager: 8 cores (load average of 2), 8GB RAM (5GB in use), no special config
- Submit machines: ~80 submit machines, 3 "big" general-purpose ones; each big one typically has ~10k running / 100k queued jobs, 32 cores, 96GB RAM, SSD

Example: Global CMS Pool
- Pool size: ~150k - 200k slots
- Central Manager: collector tree, no preemption
- Submit machines: 15, with ~15k running
Central Manager Planning
- Memory: ~1GB of RAM per 4,000 slots, plus RAM for any other services (e.g. monitoring)... or even better, run those services somewhere else
- CPU: 4 cores can work if < 20k slots; 8 cores if bigger or if there are many users
  - Speed per core (clock rate) helps
- A 1 gig network connection is OK
- Create CCB brokers separate from the Central Manager at > ~20k slots
Central Manager Planning, cont.
- Use a "collector tree" if using strong authentication / encryption, especially over the WAN
  - Hides latency, gives more parallelism
  - [Diagram: the Negotiator talks to a top-level Collector; child Collectors, one per ~1,500 execute nodes, sit between the top-level Collector and the execute nodes]
- See the HOWTO at https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors
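A minimal configuration sketch along the lines of that HOWTO, assuming one extra child collector; the daemon name, port, and host name below are illustrative, not values from the talk:

    # On the central manager: run a child collector next to the main one
    COLLECTOR_CHILD1 = $(COLLECTOR)
    COLLECTOR_CHILD1_ARGS = -f -p 10002
    COLLECTOR_CHILD1_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/CollectorChild1Log"
    DAEMON_LIST = $(DAEMON_LIST) COLLECTOR_CHILD1

    # Child collectors forward ads up to the top-level collector
    # (via CONDOR_VIEW_HOST; see the HOWTO for the exact per-daemon setup)

    # On execute nodes: report to a child collector instead of the top level
    COLLECTOR_HOST = cm.example.org:10002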
Submit Machine Planning
- Memory: ~50KB per queued job, ~1MB per running job (actual is ~400KB; the rest is a safety factor)
- CPU: 2 or 3 cores are fine, BUT base the CPU decision on the needs of logged-in users (i.e. compiling, test jobs, etc.)
- More than 5-10k jobs? Buy an SSD!
- Our setup typically has a dedicated, small, low-latency SSD for the job queue, AND large high-throughput (striped) storage for user home/working directories
- Network: 1 gig, or 10 gig if primarily using HTCondor File Transfer
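One way to split the queue onto an SSD while keeping bulk data on big storage is the JOB_QUEUE_LOG knob; the mount points below are hypothetical:

    # Put the schedd's job queue transaction log on a small, low-latency SSD
    JOB_QUEUE_LOG = /ssd/condor/job_queue.log
    # Keep the rest of SPOOL (staged job sandboxes) on large striped storage
    SPOOL = /data/condor/spool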
How to move files from the submit machine to the execute machine?
- Shared file system (NFS, AFS, Gluster, …)
  - Pro: Less work for users - no need to specify input files
  - Con: No management often leads to meltdown
- HTCondor File Transfer (see the submit-file sketch below)
  - Con: Users need to specify input and/or output files
  - Pro: File transfers are managed
  - Pro: Makes it simpler to run the job offsite
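A minimal submit-file sketch using HTCondor File Transfer; the executable and file names are illustrative:

    executable              = analyze.sh
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = data.csv, config.json
    output                  = analyze.out
    error                   = analyze.err
    log                     = analyze.log
    queue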
Note Brian B's warning…
Horizontal Scaling
- Submit node scaling problems? Add more submit nodes
  - A pool can have an arbitrary number of schedds
  - How many are needed? Depends on many things:
    - Hertz rate of jobs (a schedd is safe at ~10-20+ job starts/sec)
    - Submitting one job at a time vs. in big batches
    - Amount of job I/O
  - How to detect a scaling problem? RecentDaemonCoreDutyCycle > 98% (see the query sketch below)
  - SCHEDD_HOST in ~/.condor/user_config can point users to a remote schedd
- Central manager scaling problems? Add more central managers, then federate them via "flocking"
  - How to detect a scaling problem? Metrics on dropped packets and negotiation cycle time (UW-Madison's is typically a couple of minutes)
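A sketch of how to spot a saturated schedd and of schedd-side flocking config; the host name is illustrative, and the other pool must also be configured to accept flocked jobs (FLOCK_FROM plus the usual ALLOW settings):

    # Values approaching 0.98+ mean the schedd is saturated
    condor_status -schedd -af Name RecentDaemonCoreDutyCycle

    # Schedd-side flocking to a second pool's central manager
    FLOCK_TO = cm2.example.org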
Some User/Admin Training
- Train users to submit jobs in large batches. Instead of running condor_submit 5,000 times, do:
    executable = /bin/foo.exe
    initialdir = run_$(Process)
    queue 5000
- Train users on: what is a reasonable number of queued jobs? a reasonable job runtime?
- Avoid constant polling with condor_[q|status]
  - Consider the job event log or a DAGMan POST script (see the condor_wait sketch below)
  - Consider monitoring with condor_gangliad or Fifemon
- Use selection and projection
  - Bad:  condor_status -l | grep Busy
  - Good: condor_status -cons 'Activity=="Busy"' -af Name
- Custom print formats (https://is.gd/jB7m4q)
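One way to avoid polling condor_q in a loop is to block on the job event (user) log with condor_wait; the log file name and timeout below are illustrative:

    # Returns when all jobs writing to this log have finished, or after one hour
    condor_wait -wait 3600 myjobs.log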
Tuning and Customization for large scale
- Kernel tuning: done automatically with HTCondor v8.4.x+
- Enable the shared port daemon: done automatically with HTCondor v8.5.x+
- CCB is required to let one schedd have more than ~25k running jobs
- "Circuit breaker" config knobs - we have lots of knobs (see the sketch below)
  - Schedd: MAX_JOBS_PER_OWNER, MAX_JOBS_PER_SUBMISSION, MAX_JOBS_RUNNING, FILE_TRANSFER_DISK_LOAD_THROTTLE, MAX_CONCURRENT_UPLOADS / MAX_CONCURRENT_DOWNLOADS, …
  - Central Manager: NEGOTIATOR_MAX_TIME_PER_SCHEDD, NEGOTIATOR_MAX_TIME_PER_SUBMITTER
  - Schedd: SUBMIT_REQUIREMENTS
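A sketch of how a few of these circuit-breaker knobs and a SUBMIT_REQUIREMENTS rule might look; the numeric values and the requirement name are illustrative, not site recommendations:

    # Schedd-side limits (illustrative values)
    MAX_JOBS_PER_OWNER = 100000
    MAX_JOBS_PER_SUBMISSION = 20000
    MAX_JOBS_RUNNING = 15000
    FILE_TRANSFER_DISK_LOAD_THROTTLE = 2.0

    # Example submit requirement: reject jobs that do not request memory
    SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) HasMemoryRequest
    SUBMIT_REQUIREMENT_HasMemoryRequest = RequestMemory > 0
    SUBMIT_REQUIREMENT_HasMemoryRequest_REASON = "Please set request_memory in your submit file"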
Tuning, cont.
- Improve scalability by disabling unneeded features, e.g.:
  - Preemption: NEGOTIATOR_CONSIDER_PREEMPTION = False
  - Job ranking of machines: NEGOTIATOR_IGNORE_JOB_RANKS = True
  - Durable commits in the event of power failure: CONDOR_FSYNC = False
- Improve scalability by enabling experimental features
Questions?