CMS Week, June 7-11, 2004. CMS Production in Wisconsin: Status of recent developments. Dan Bradley, Sridhara Dasu, Vivek Puttabuddhi, Wesley Smith + The Condor Team


1 CMS Production in Wisconsin: Status of recent developments
Dan Bradley, Sridhara Dasu, Vivek Puttabuddhi, Wesley Smith + The Condor Team
CMS Week, June 7-11, 2004

2 Investigating User Mode Linux
- Designed a UML job wrapper for CMS
  - Provides a “blessed” Linux environment on demand.
  - Works transparently with almost any software stack.
- But there’s no such thing as a free Linux
  - 15-20% performance drop for OSCAR
  - Host “skas” kernel patch only gains ~3%
  - I/O-intensive jobs will be even worse without skas
  - UML is only ported to Linux x86 (no Windows)
  - 80 MB tarball + install time, but this is easily cached

3 Condor Glidein
- What is it?
  - Provides Condor job management under other batch systems (e.g. on the grid)
  - Matchmaking, checkpointing, job migration, etc.
- Some improvements & experiments
  - MDS schema requirements now optional
  - More automation in setup and installation
  - Testing GridShell as a Glidein submission agent, with Vladimir Litvin and Edward Walker
- No such thing as a free Condor?
  - Difficult to work across a firewall.

4 Better Condor Preemption
- Limitations of claim-based preemption
  - Very flexible policy expressions, but…
  - Not sensitive to job boundaries.
  - Policy: “machine X should never kill job Y within bound Z” interferes with preemption of claims.
  - Fair-sharing problems noticed on Grid 2003.
- Added claim “retirement”
  - Claim retires on job boundary or limit.
  - Uniform negotiation of retirement preferences and requirements.
  - Works with any preemption policy.
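The retirement rule on this slide ("claim retires on job boundary or limit") can be illustrated with a small sketch. This is not Condor's implementation; the `Claim` class, the `can_release` function, and the choice to measure the limit from the start of the running job are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical model of a claimed machine (not a Condor data structure)."""
    job_running: bool      # True while a job is executing under this claim
    job_started_at: float  # start time of the current job, in seconds
    max_retirement: float  # assumed limit, measured from the job's start


def can_release(claim: Claim, now: float, preempt_requested: bool) -> bool:
    """Decide whether a retiring claim may be handed to another user."""
    if not preempt_requested:
        return False  # nobody is asking for the machine
    if not claim.job_running:
        return True   # job boundary reached: retire the claim cleanly
    # A job is still running: only release once the retirement limit has passed.
    return now - claim.job_started_at >= claim.max_retirement
```

Under these assumptions, a preemption request never interrupts a job that is still within its limit, which is the "machine X should never kill job Y within bound Z" guarantee, while the claim itself still changes hands at the next job boundary.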

5 Orphaned Jobs
- Trouble with orphans on Grid3
  - Jobs with no Globus jobmanager
  - Affects all batch systems, but is most annoying in Condor because it doesn’t give up unless told to do so.
  - Jobs with missing GASS files try forever.
- Globus jobmanager for Condor patched
  - Runs job with time-to-live, periodically renewed
  - Automatically halts failing or queued orphans.
  - Jobmanager may still reattach within longer-term garbage collection cycle.
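The time-to-live idea on this slide can be sketched as a renewable lease. This is an illustration only, not the actual jobmanager patch; `JobLease`, `sweep`, and the numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobLease:
    """Hypothetical lease attached to a job (names are illustrative)."""
    ttl: float         # seconds each renewal extends the job's life
    expires_at: float  # absolute time at which the lease lapses

    def renew(self, now: float) -> None:
        # Called periodically while a jobmanager is still attached.
        self.expires_at = now + self.ttl

    def expired(self, now: float) -> bool:
        return now >= self.expires_at


def sweep(leases: Dict[str, JobLease], now: float) -> List[str]:
    """Return ids of jobs whose lease has lapsed, i.e. likely orphans to halt."""
    return sorted(job_id for job_id, lease in leases.items() if lease.expired(now))


# Usage: job "a" is renewed by its jobmanager, job "b" is not.
leases = {"a": JobLease(ttl=600.0, expires_at=600.0),
          "b": JobLease(ttl=600.0, expires_at=600.0)}
leases["a"].renew(now=500.0)
orphans = sweep(leases, now=700.0)
```

In this sketch a job with no attached jobmanager simply stops being renewed, so it is halted on the next sweep instead of retrying forever.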

6 CMS on Grid Laboratories of Wisconsin
- 1st round of GLOW now commissioned.
  - 300 2.4 GHz Xeons
  - 1.2 TB/rack cache
- 2 more rounds in the pipeline…

