Operating a glideinWMS frontend by Igor Sfiligoi (UCSD)

1 Operating a glideinWMS frontend by Igor Sfiligoi (UCSD)
Condor CERN
CERN Feb 14th Frontend operations

2 Overview
Refresher – What is a VO frontend
Setup and configuration
Troubleshooting

3 Refresher – VO frontend
The VO frontend is the glideinWMS matchmaker
  Deciding where glideins are submitted
[Diagram. Node labels: Submit node, Frontend node, Factory node, Central manager, Worker node, Execution node. Other labels: Condor, Frontend, Factory, glidein, Startd, CREAM, Globus, Job, Monitor, Match, Request glideins, Submit glideins]

4 Frontend operations
Setup and configuration

5 Setup and configuration
glideinWMS comes with an installer
  Actually two: command line and RPM
  Will help you install all the needed software and do most of the config
You will still likely want to tweak the frontend config
  It is in XML format

6 The Condor part
The Condor components use standard config
  But all communication security is GSI based
  Will need x509 certs and proper security configuration (as explained this morning)
If you used the glideinWMS installer, you should already be all set (assuming it succeeded)
[Diagram. Labels: Submit node (three of them), Central manager, Schedd, Collector, Negotiator]

7 Frontend security
VO frontend needs to talk to the factory
  Mutual authentication, x509 based
  Both parties put their respective DNs into mapfiles
  Frontend thus needs an x509 proxy
Factory collector puts a special attribute into every ClassAd
  AuthenticatedIdentity
  Used for application-level authorization
[Diagram. Labels: Factory node, Frontend node, Condor, Frontend, Factory]

8 2nd level authorization
Frontend also has a “security name”
  Agreed upon string with the factory
  X509 mapping + security name whitelisted by factory
Frontend admin needs to configure this
  <security classad_proxy="proxy_path" proxy_DN="blah" security_name="whatever"/>
  <factory><collectors>
    <collector node="something" DN="blah"/>
  </collectors></factory>
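A quick sanity check of these settings can be sketched in Python. This is only an illustration, not part of glideinWMS: the attribute names are the ones shown on the slide, while the `<frontend>` wrapper element, the sample values, and the helper `check_security_config` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample fragment: attribute names come from the slide; the <frontend>
# wrapper element and all values are placeholders (assumptions).
SAMPLE = """
<frontend>
  <security classad_proxy="proxy_path" proxy_DN="blah" security_name="whatever"/>
  <factory>
    <collectors>
      <collector node="something" DN="blah"/>
    </collectors>
  </factory>
</frontend>
"""

def check_security_config(xml_text):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    root = ET.fromstring(xml_text)
    problems = []
    security = root.find("security")
    if security is None:
        problems.append("missing <security> element")
    else:
        for attr in ("classad_proxy", "proxy_DN", "security_name"):
            if not security.get(attr):
                problems.append("<security> lacks " + attr)
    collectors = root.findall("factory/collectors/collector")
    if not collectors:
        problems.append("no factory <collector> listed")
    for coll in collectors:
        for attr in ("node", "DN"):
            if not coll.get(attr):
                problems.append("<collector> lacks " + attr)
    return problems
```

Calling `check_security_config(SAMPLE)` returns an empty list; dropping `security_name` from the fragment makes it report that attribute as missing. Whether the factory actually whitelists the pair can only be verified with the factory admins.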

9 Proxy for glidein submission
The frontend needs a second proxy
  For glidein submission
  Will be delegated to the factory
Sharing the same proxy for both tasks is possible, but may make debugging more confusing
  <security><proxies>
    <proxy absfname="abspath"/>
  </proxies></security>
[Diagram. Labels: Frontend node, Frontend, Factory node, Factory, Condor, Globus, Execution node, glidein]

10 Matchmaking
Deciding where to submit glideins is the major task a frontend performs
What to match on is decided at frontend level
  Not by the users
  May not be ideal, but this is how it is
Admin must understand user needs and decide how matchmaking should be done

11 Selecting the factory entries
When using a shared factory, a frontend likely can use only a subset of entries
  Not all Grid sites support all VOs
First level filter can be applied as a Condor constraint expression
  Both for entries and jobs
  <match>
    <factory query_expr='stringListMember("CMS",GLIDEIN_Supported_VOs)'/>
    <job query_expr='(JobUniverse==5)'/>
  </match>
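The effect of these two filters can be sketched in plain Python. This is a rough emulation for illustration only: the real filtering is done by Condor when it evaluates the constraint expressions. The entry names, job ids, and the helpers `string_list_member`, `select_entries` and `select_jobs` are hypothetical.

```python
import re

def string_list_member(item, string_list, delims=" ,"):
    """Rough stand-in for ClassAd stringListMember(): exact, case-sensitive match."""
    parts = re.split("[" + re.escape(delims) + "]", string_list)
    return item in [p for p in parts if p]

# Hypothetical factory entries and user jobs, reduced to the attributes used here.
ENTRIES = [
    {"name": "entry_A", "GLIDEIN_Supported_VOs": "CMS,ATLAS"},
    {"name": "entry_B", "GLIDEIN_Supported_VOs": "ATLAS"},
    {"name": "entry_C", "GLIDEIN_Supported_VOs": "OSG, CMS"},
]
JOBS = [
    {"id": "1.0", "JobUniverse": 5},  # 5 is the vanilla universe
    {"id": "2.0", "JobUniverse": 7},  # not vanilla, filtered out
]

def select_entries(entries, vo):
    """Emulates query_expr='stringListMember("<vo>",GLIDEIN_Supported_VOs)'."""
    return [e["name"] for e in entries
            if string_list_member(vo, e.get("GLIDEIN_Supported_VOs", ""))]

def select_jobs(jobs):
    """Emulates query_expr='(JobUniverse==5)'."""
    return [j["id"] for j in jobs if j.get("JobUniverse") == 5]
```

With the sample data, only the two entries advertising CMS and the single vanilla job survive the first-level filter; everything else never reaches the matchmaking expression.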

12 Matching the attributes
The matchmaking expression uses Python syntax
  Against attributes obtained from factory/entries and user schedds
The admin is required to list all attributes that will be used
  <match match_expr='glidein["attrs"]["GLIDEIN_SE"] in job["DESIRED_SEs"].split(",")'>
    <factory><match_attrs>
      <match_attr name="GLIDEIN_SE" type="string"/>
    </match_attrs></factory>
    <job><match_attrs>
      <match_attr name="DESIRED_SEs" type="string"/>
    </match_attrs></job>
  </match>
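Because the expression is ordinary Python, it can be tried out by hand before it goes into the config. The sketch below is an assumption about how one might test it, not the frontend's actual code: it evaluates the expression against `glidein` and `job` dictionaries shaped as the expression itself implies. The sample entries, jobs, host names, and the helper `count_matches` are hypothetical.

```python
# Expression as it would appear in match_expr (attribute names from the slide).
MATCH_EXPR = 'glidein["attrs"]["GLIDEIN_SE"] in job["DESIRED_SEs"].split(",")'

def count_matches(match_expr, glideins, jobs):
    """Hypothetical helper: count, per entry, the jobs the expression accepts."""
    code = compile(match_expr, "<match_expr>", "eval")
    counts = {}
    for glidein in glideins:
        matched = 0
        for job in jobs:
            try:
                if eval(code, {}, {"glidein": glidein, "job": job}):
                    matched += 1
            except KeyError:
                # The job or entry lacks an attribute the expression uses;
                # this sketch simply treats that as "no match".
                pass
        counts[glidein["name"]] = matched
    return counts

# Hypothetical data for a dry run.
GLIDEINS = [
    {"name": "entry_A", "attrs": {"GLIDEIN_SE": "se1.example.org"}},
    {"name": "entry_B", "attrs": {"GLIDEIN_SE": "se2.example.org"}},
]
JOBS = [
    {"DESIRED_SEs": "se1.example.org,se3.example.org"},
    {"DESIRED_SEs": "se1.example.org"},
    {},  # a user who did not follow the convention
]
```

A dry run like `count_matches(MATCH_EXPR, GLIDEINS, JOBS)` makes mismatches visible early: entries with a count of zero would never receive glidein requests for these jobs, and jobs missing `DESIRED_SEs` match nothing.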

13 Matching by users
Users are expected to follow the convention
  But there is no way to force them
  Although one could set reasonable defaults at system level
    Universe = Vanilla
    Requirements = stringListMember(GLIDEIN_SE,DESIRED_SEs)
    +DESIRED_SEs="blah1,blah2"
[Diagram. Labels: Submit node, Worker node, Frontend, Factory, Central manager, Condor, glidein, Startd, Job, Monitor, Match]
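One way to help users follow the convention is to generate the submit description for them. The sketch below is purely illustrative and assumes a wrapper script of this kind exists on the submit node; the helper `render_submit`, the `DEFAULTS` table and the executable name are hypothetical, while the three convention lines are the ones from the slide.

```python
# Defaults mirroring the convention on the slide.
DEFAULTS = {
    "universe": "vanilla",
    "requirements": "stringListMember(GLIDEIN_SE,DESIRED_SEs)",
}

def render_submit(executable, desired_ses, extra=None):
    """Hypothetical helper: build a submit description that follows the convention."""
    settings = dict(DEFAULTS)
    settings["executable"] = executable
    settings.update(extra or {})  # user-supplied settings override the defaults
    lines = ["%s = %s" % (key, value) for key, value in settings.items()]
    # Custom job attribute consumed by the frontend match expression.
    lines.append('+DESIRED_SEs = "%s"' % ",".join(desired_ses))
    lines.append("queue")
    return "\n".join(lines)
```

For example, `render_submit("job.sh", ["blah1", "blah2"])` yields a description containing the vanilla universe, the `stringListMember` requirement and the `+DESIRED_SEs` line, so the job carries the attribute the frontend expects.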

14 Glidein customization
Glideins can be used as-is out of the box
But a VO admin can customize any Condor knob
  Using frontend attributes
Some attributes reserved (see manual)
  Start, rank, glexec
  glideinWMS-specific alternatives exist
  <attr name="USE_MATCH_AUTH" type="string" value="True" glidein_publish="False" job_publish="False"/>

15 Complex customization
If more than a simple value is needed, the VO admin can provide code to be executed
  Can be used for validation (script fails == glidein will abort)
  Can be used to dynamically configure the startd (e.g. discover the installed CMS software and publish the version list)
  <file absfname="path_to_script" executable="True" wrapper="False" after_entry="True" after_group="False"/>
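The logic of such a script can be sketched as follows. This is a minimal illustration under stated assumptions, not a working glideinWMS script: the arguments the glidein passes to custom scripts and the way attributes are published back are described in the manual and are not reproduced here. The directory layout (one subdirectory per installed version) and the helpers `discover_versions` and `validate` are hypothetical.

```python
import os
import sys

def discover_versions(sw_dir):
    """Return the sorted names of the subdirectories of sw_dir (one per version)."""
    if not os.path.isdir(sw_dir):
        return []
    return sorted(name for name in os.listdir(sw_dir)
                  if os.path.isdir(os.path.join(sw_dir, name)))

def validate(sw_dir):
    """Return (exit_code, version_list); a non-zero code means validation failed."""
    versions = discover_versions(sw_dir)
    if not versions:
        sys.stderr.write("no software found under %s\n" % sw_dir)
        return 1, ""
    return 0, ",".join(versions)
```

A real script would end by exiting with the returned code, so that a worker node without the expected software makes the glidein abort instead of accepting jobs that cannot run there, and would publish the version list through the mechanism the manual prescribes.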

16 A note about reconfigs
Frontend has an init.d-like maintenance script
  ./frontend_startup start|stop|reconfig
Editing the config file is unconventional
  Cannot edit the master config file (frontend.xml)
  Must edit a copy of it, typically in ../instance_bla.cfg/frontend.xml
  Then tell reconfig where the copy is
  ./frontend_startup reconfig ../instance_bla.cfg/frontend.xml
Takes some time to get used to it

17 Frontend operations
Monitoring and troubleshooting

18 Monitoring - Condor
The Condor pool can be monitored using standard Condor tools
  condor_q, condor_status
  Logs
Frontend provides a historical view in Web form

19 Monitoring - Frontend
Frontend has extensive logs
  Can get a good idea of what's going on there
No graphics/Web interface yet

20 Troubleshooting - Condor
Standard Condor approach (see prev. talk)
Glidein-induced problems:
  Firewall issues (jobs matching but not starting)
  Security issues (as above)
  Unexpected preemptions (glideins being killed on Grid resources)
  Unused glideins (matchmaking mismatch between frontend and user requirements)
  Jobs never starting (as above)
The same techniques still apply (just be aware of it)

21 Troubleshooting - frontend
Most problems are due to mismatching
  Getting it right may be tricky
  And no tools to really help you
Even if mostly hidden, still see Grid problems
  Glideins never registering, early termination, ...
  Need collaboration with factory admins and Grid sites (more details in the factory talk)

22 Frontend operations
And the summary

23 Summary
Initial setup typically the most complex task
  Getting the security right
  Deciding the matching strategy
  Customizing the glideins
Day-to-day operations similar to a dedicated Condor pool
  But can be more challenging due to two-level matchmaking
Still see some Grid-related problems
  But can be mostly offloaded to factory admins

24 Pointers
The official project Web page is [link missing from transcript]
CMS frontend at UCSD [link missing from transcript]
glideinWMS development team is reachable at [address missing from transcript]

