Moving CHTC from RHEL 6 to RHEL 7

Slides:



Advertisements
Similar presentations
2 nd May 2007Condor Week 2007, Madison WI1 Condor Gotchas II John Kewley Daresbury Laboratory e-Science Centre ( this time it's cynical!)
Advertisements

Dan Bradley Computer Sciences Department University of Wisconsin-Madison Schedd On The Side.
Overview of Wisconsin Campus Grid Dan Bradley Center for High-Throughput Computing.
Greg Thain Computer Sciences Department University of Wisconsin-Madison Condor Parallel Universe.
Condor Project Computer Sciences Department University of Wisconsin-Madison Condor's Use of the Cisco Unified Computing System.
More HTCondor 2014 OSG User School, Monday, Lecture 2 Greg Thain University of Wisconsin-Madison.
Jaeyoung Yoon Computer Sciences Department University of Wisconsin-Madison Virtual Machine Universe in.
Jaeyoung Yoon Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Condor Project Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Jaime Frey Computer Sciences Department University of Wisconsin-Madison Virtual Machines in Condor.
Communicating with Users about HTCondor and High Throughput Computing Lauren Michael, Research Computing Facilitator HTCondor Week 2015.
SCD FIFE Workshop - GlideinWMS Overview GlideinWMS Overview FIFE Workshop (June 04, 2013) - Parag Mhashilkar Why GlideinWMS? GlideinWMS Architecture Summary.
Alain Roy Computer Sciences Department University of Wisconsin-Madison An Introduction To Condor International.
High Throughput Parallel Computing (HTPC) Dan Fraser, UChicago Greg Thain, Uwisc.
The Glidein Service Gideon Juve What are glideins? A technique for creating temporary, user- controlled Condor pools using resources from.
1 Evolution of OSG to support virtualization and multi-core applications (Perspective of a Condor Guy) Dan Bradley University of Wisconsin Workshop on.
High Throughput Parallel Computing (HTPC) Dan Fraser, UChicago Greg Thain, UWisc Condor Week April 13, 2010.
Greg Thain Computer Sciences Department University of Wisconsin-Madison cs.wisc.edu Interactive MPI on Demand.
Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison.
Workflows: from development to Production Thursday morning, 10:00 am Greg Thain University of Wisconsin - Madison.
The Roadmap to New Releases Derek Wright Computer Sciences Department University of Wisconsin-Madison
Condor Project Computer Sciences Department University of Wisconsin-Madison Grids and Condor Barcelona,
GLIDEINWMS - PARAG MHASHILKAR Department Meeting, August 07, 2013.
© 2005 Altera Corporation © 2006 Altera Corporation Batch Computing at Altera Condor, Quill and The Enterprise.
Greg Thain Computer Sciences Department University of Wisconsin-Madison Configuring Quill Condor Week.
22 nd Oct 2008Euro Condor Week 2008 Barcelona 1 Condor Gotchas III John Kewley STFC Daresbury Laboratory
Dan Bradley Condor Project CS and Physics Departments University of Wisconsin-Madison CCB The Condor Connection Broker.
How High Throughput was my cluster? Greg Thain Center for High Throughput Computing.
HTCondor Security Basics HTCondor Week, Madison 2016 Zach Miller Center for High Throughput Computing Department of Computer Sciences.
Taming Local Users and Remote Clouds with HTCondor at CERN
UCS D OSG Summer School 2011 Overlay systems OSG Summer School An introduction to Overlay systems Also known as Pilot systems by Igor Sfiligoi University.
Condor Week May 2012No user requirements1 Condor Week 2012 An argument for moving the requirements out of user hands - The CMS experience presented.
Greg Quinn Computer Sciences Department University of Wisconsin-Madison Privilege Separation in Condor.
CHTC Policy and Configuration
Improvements to Configuration
HTCondor Networking Concepts
Scheduling Policy John (TJ) Knoeller Condor Week 2017.
HTCondor Networking Concepts
HTCondor Security Basics
Dynamic Deployment of VO Specific Condor Scheduler using GT4
Scheduling Policy John (TJ) Knoeller Condor Week 2017.
Examples Example: UW-Madison CHTC Example: Global CMS Pool
Operating a glideinWMS frontend by Igor Sfiligoi (UCSD)
Outline Expand via Flocking Grid Universe in HTCondor ("Condor-G")
ATLAS Cloud Operations
Matchmaker Policies: Users and Groups HTCondor Week, Madison 2017
Moving CHTC from RHEL 6 to RHEL 7
High Availability in HTCondor
Building Grids with Condor
Accounting in HTCondor
The Scheduling Strategy and Experience of IHEP HTCondor Cluster
Docker with HTCondor at FNAL
Negotiator Policy and Configuration
Accounting, Group Quotas, and User Priorities
Haiyan Meng and Douglas Thain
Semiconductor Manufacturing (and other stuff) with Condor
HTCondor Security Basics HTCondor Week, Madison 2016
Job Matching, Handling, and Other HTCondor Features
Condor Glidein: Condor Daemons On-The-Fly
Basic Grid Projects – Condor (Part I)
Upgrading Condor Best Practices
Introduction to High Throughput Computing and HTCondor
The Condor JobRouter.
Condor: Firewall Mirroring
Condor-G Making Condor Grid Enabled
GLOW A Campus Grid within OSG
Negotiator Policy and Configuration
Securing HTCondor Flocking
Job Submission Via File Transfer
Condor-G: An Update.
Presentation transcript:

Moving CHTC from RHEL 6 to RHEL 7 Greg Thain HTCondor Week 2017

Migration without Migraines

Actually from SL  CentOS, not RHEL Need ecryptfs for encrypted sandbox (Actually CentOSplus…) executable = calculate.exe encrypt_execute_directory = true queue

“EL”

Today’s CHTC Users OS %age of CHTC users Require EL 7 5% Either EL 6 or 7 90% ???? Require EL 6 5% ????

Who are EL 7 only users?

Who are EL 6 only users? Standard universe jobs that have started

Tomorrow’s CHTC Users OS %age of CHTC users Require EL 7 Was 5%, going ↑ Either EL 6 or 7 Was 90%, going ? Require EL 6 Was 5%, going ↓

Observation OS %age of CHTC users Require EL 7 Was 5%, going ↑ Either EL 6 or 7 Was 90%, going ? Require EL 6 Was 5%, going ↓

Moving CHTC machines -> RHEL 7

Moving CHTC machines -> CentOS7

Easy – not the focus of this talk Moving CHTC machines -> CentOS7 Easy – not the focus of this talk

Moving CHTC machines -> CentOS7 jobs

Moving CHTC jobs -> CentOS7 users

I know, I’ll deploy containers! Containers as an Infrastructure, or as jobs? CaaI or CaJ? Containers as job require user work to setup Tricky for GPU jobs GOAL: minimize user work!

The Time is now! CHTC formed a team in Nov 2016 Condor + Ops + RCF + OSG Controlled transition with minimum user pain

Plan on a slide Before: Every job gets EL 6 Phase 1: Flag Day! Roll out some # of EL 7 machines, but Default to EL6, users can opt into EL7 or BOTH Communicate to EVERY user Flag Day! Default to EL7, users can opt into EL6 or BOTH

CHTC Pool CHTC execute machines jobs CHTC submit machines foreign execute machines (flocking, OSG glidein, etc.) Foreign submit machines

CHTC Pool CHTC execute machines Want all configuration changes in our Schedds – it’s the point we control the most! jobs CHTC submit machines foreign execute machines (flocking, OSG glidein, etc.) Foreign submit machines

Attributes: Useful and Otherwise Find the EL attribute? How to force a job to land here? OpSys = "LINUX" OpSysAndVer = "CentOS7" OpSysLongName = "CentOS Linux release 7.3.1611 (Core)" OpSysMajorVer = 7 OpSysName = "CentOS" OpSysShortName = "CentOS" OpSysVer = 703

Forcing a job to EL 7 Requirements = OpSysMajorVer == 7 Submit file OpSys = "LINUX" OpSysAndVer = "CentOS7" OpSysLongName = "CentOS Linux release 7.3.1611 (Core)" OpSysMajorVer = 7 OpSysName = "CentOS" OpSysShortName = "CentOS" OpSysVer = 703 Requirements = OpSysMajorVer == 7 Submit file Works with no startd changes – existing glideins, etc.!

Forcing a job to either Requirements = (OpSysMajorVer == 6) OpSys = "LINUX" OpSysAndVer = "CentOS7" OpSysLongName = "CentOS Linux release 7.3.1611 (Core)" OpSysMajorVer = 7 OpSysName = "CentOS" OpSysShortName = "CentOS" OpSysVer = 703 Requirements = (OpSysMajorVer == 6) || (OpSysMajorVer == 7) Submit file

Pop Quiz: What about default? OpSys = "LINUX" OpSysAndVer = "CentOS7" OpSysLongName = "CentOS Linux release 7.3.1611 (Core)" OpSysMajorVer = 7 OpSysName = "CentOS" OpSysShortName = "CentOS" OpSysVer = 703 Requirements = nothing said Submit file

Hint: APPEND_REQUIREMENTS won’t work OpSys = "LINUX" OpSysAndVer = "CentOS7" OpSysLongName = "CentOS Linux release 7.3.1611 (Core)" OpSysMajorVer = 7 OpSysName = "CentOS" OpSysShortName = "CentOS" OpSysVer = 703 APPEND_REQUIREMENTS is unconditional!

Schedd xforms to the Rescue! # https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=OsMigrationHints JOB_TRANSFORM_NAMES = EL, EL_VER JOB_TRANSFORM_EL @= end REQUIREMENTS JobUniverse == 5 && Regexp("OpSysMajorVer",UnParse(Requirements),"i") =?= false && Regexp("\"WINDOWS\"", UnParse(Requirements)) =?= false SET Requirements (Target.OpSysMajorVer == 6) && $(MY.Requirements) @end JOB_TRANSFORM_EL_VER @= end [ eval_set_WantELVer = isError(int(Regexps("OpSysMajorVer[[:space:]]*=[?]?=[[:space:]]*([0-9])", UnParse(Requirements), "\\\\1", "i"))) ? 0 : int(Regexps("OpSysMajorVer[[:space:]]*=[?]?=[[:space:]]*([0-9])", UnParse(Requirements), "\\\\1", "i")); ]

Regexp("OpSysMajorVer",UnParse(Requirements),"i")

Engineering vs. Science Requirements = blah blah blah Requirements = (Target.OpSysMajorVer == 6) && blah blah blah

Engineering vs. Science Requirements = OpSysMajorVer == 7 && blah Requirements = OpSysMajorVer == 7 && blah

Current Status ~ 10 % of CHTC pool running CentOS 7 Some users invited to try CentOS 7 Every one successfully running EL 6 binaries on CentOS 7 with no changes! No transform surprises!

Future work What about OSG? Keep working with users How to ask for EL6 vs EL7? Keep working with users

Summary Users need RHEL 7 sooner than you may think With some work, transition may be less painful for them than you may think

Avoid User Migraines, not ours!