OSG Operations All Hands Meeting Rob Quick (Ops Coordinator) Slides by: Scott Teige and Kyle Gross
March 2011 Support Overview
- Communications Hub
- Coordinate Ticketing & Exchanges
- End-user Support
- OSG RA
- Documentation
Communications Hub
- 24x7 Telephone – x7 – 24x7 Ticket Creation
- Leverage the 24-hour coverage of the GRNOC at IU
- Community Notification Tools
  - Blogspot postings, Twitter and RSS feed
  - Twitter: OSGGOC (test)
- Weekly Operations Meeting Mondays
Ticketing & Ticket Exchange
- Central OSG Ticket System
- GOCTicket interface
- Ticket Exchange – SC, GGUS, GOC-TX
- 10,000 ticket milestone – 2/22/2011
End User Support
- OIM Registration
- VOMS (MIS, OSGEDU, CSIU)
- Certificate Requests
- Twiki Support
OSG RA
- Alain Deximo as new OSG RA
- Updating Procedures/Docs for effective backup
- Other than new POC (Alain), transparent to users
Documentation
- Work with OSG Documentation Team
- Help them with Twiki setup
- Cleaning up Operations Docs
Service Overview
- Information Services
  - Information to people
  - Information to machines
- Accounting Services
- Monitoring Services
- Collaborative Services
MyOSG
Display
OIM
- Open Science Grid Information Management
- Semi-static information to people and machines
- Find contacts, VO information, resources, much more
BDII
- Berkeley Database Information Index
- Mostly provides information to machines
- Most critical service for GOC
- Dynamic information, ~2 minute period
- Many services depend on BDII
- Some information to people
Ticket
- Don’t get stuck, cut a ticket
- Ticket Exchange
  - The GOC ticketing system interacts with other support organizations’ ticket systems via the ticket exchange.
  - This allows seamless interaction of multiple ticket systems, which appear to behave as one system.
RSV
- Resource and Service Validation
WLCG Comparison
- An accounting service
- Some OSG resources are also WLCG resources
- Separate accounting systems
Software Cache
- Pointers to VDT software
- Certificate Authority Distribution
- VO package
- Certificate requests
xxx-ITB
- Ditto above but for testing
- 1st and 3rd Tuesdays: updates to ITB
  - You are encouraged to test services, particularly those of interest to you
- 2nd and 4th Tuesdays: updates to Prod.
- 5th Tuesday: the GOC rests.
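The Tuesday update cadence above can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide; the function names `tuesday_ordinal` and `release_target` are made up here and are not part of any GOC tool.

```python
import datetime


def tuesday_ordinal(day: datetime.date) -> int:
    """Return which Tuesday of its month `day` is (1 to 5)."""
    if day.weekday() != 1:  # Monday is 0, so Tuesday is 1
        raise ValueError(f"{day} is not a Tuesday")
    return (day.day - 1) // 7 + 1


def release_target(day: datetime.date) -> str:
    """Map a Tuesday to the environment updated that day (hypothetical helper)."""
    ordinal = tuesday_ordinal(day)
    if ordinal in (1, 3):
        return "ITB"
    if ordinal in (2, 4):
        return "Production"
    return "none"  # 5th Tuesday: the GOC rests


if __name__ == "__main__":
    # The five Tuesdays of March 2011.
    for dom in (1, 8, 15, 22, 29):
        day = datetime.date(2011, 3, dom)
        print(day.isoformat(), release_target(day))
```

Raising on non-Tuesdays keeps the helper honest: the slide only defines the schedule for Tuesdays.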
Change Management and Ops Meetings
- Change Management Review Tuesdays
- ngeMgmtMeetingMinutes
Recap from the Ops Coordinator
- 15 Minutes
- Sustainability
- “Yet, in spite of these spectacular strides in science and technology, and still unlimited ones to come, something basic is missing… We have learned to fly the air like birds and swim the sea like fish, but we have not learned the simple art of living together as brothers.” -MLK
Three things you’ve just gotta know about the VDT (And Frank) Alain Roy Open Science Grid Software Coordinator
But first a poem
I have a flower on my head
By Andrea Roy
    I have a Flower on my head
    What should I do?
    Should I water it?
    I think so.
The three things you just gotta know about the VDT
1. RSV is way cooler
2. RPMs for the VDT are on the way
3. CREAM is coming to the VDT soon
RSV is way cooler
As of February 7th, OSG , RSV is just so much cooler for three main reasons:
1. Common RSV tasks are made simple with the new rsv-control command.
2. It is really easy to extend RSV with new probes.
   - If you can write a script to test something, you can put it into RSV.
   - Is there something else you’d like to test?
3. Standalone installations are much easier (with config.ini).
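To make "a script to test something" concrete, here is a minimal sketch of a check that prints its result as key/value lines ending in `EOT`, the layout of the sample `rsv-control --run` output in this deck. It does not use or represent the actual RSV probe API. The metric name `org.example.disk-free`, the host `host.example.org`, the timestamp format, and the `CRITICAL` status are all assumptions made for illustration.

```python
import datetime
import shutil


def format_record(metric_name, status, details, service_type, host, now=None):
    """Render a result in the key/value layout of the deck's sample output."""
    now = now or datetime.datetime.now()
    lines = [
        f"metricName: {metric_name}",
        "metricType: status",
        f"timestamp: {now:%Y-%m-%d %H:%M:%S}",  # assumed format
        f"metricStatus: {status}",
        f"serviceType: {service_type}",
        f"serviceURI: {host}",
        f"gatheredAt: {host}",
        f"summaryData: {status}",
        f"detailsData: {details}",
        "EOT",
    ]
    return "\n".join(lines)


def check_free_space(path="/", min_free_fraction=0.10):
    """Hypothetical test: is at least `min_free_fraction` of `path` free?"""
    usage = shutil.disk_usage(path)
    free = usage.free / usage.total
    status = "OK" if free >= min_free_fraction else "CRITICAL"
    return status, f"{free:.0%} of {path} is free"


if __name__ == "__main__":
    status, details = check_free_space("/")
    print(format_record("org.example.disk-free", status, details,
                        "OSG-CE", "host.example.org"))
```

Keeping the check (`check_free_space`) separate from the output formatting means the same formatter can wrap any other test you care about.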
Easy to list your RSV probes!
    % rsv-control --list
    Metrics enabled for host: osg-edu.cs.wisc.edu:10443 | Service
    org.osg.srm.srmcp-readwrite | OSG-SRM
    org.osg.srm.srmping | OSG-SRM
    Metrics enabled for host: osg-edu.cs.wisc.edu | Service
    org.osg.batch.jobmanager-default-status | OSG-CE
    org.osg.batch.jobmanagers-available | OSG-CE
    org.osg.certificates.cacert-expiry | OSG-CE
    ...
Easy to see the RSV jobs!
    % rsv-control --job-list
    Hostname: osg-edu.cs.wisc.edu
    ID OWNER ST NEXT RUN TIME METRIC
    rsv I :08 org.osg.globus.gridftp-simple
    rsv I :32 org.osg.gip.lastrun
    rsv R :47 org.osg.general.vdt-version
    ...
    Hostname: osg-edu.cs.wisc.edu:10443
    ID OWNER ST NEXT RUN TIME METRIC
    rsv I :33 org.osg.srm.srmping
    rsv R :28 org.osg.srm.srmcp-readwrite
    ID OWNER ST CONSUMER
    rsv R html-consumer
    rsv R gratia-consumer
Easy to enable/disable RSV probes!
    % rsv-control --enable --host osg-edu.cs.wisc.edu \
        org.osg.ress.ress-classad-exists
    Enabling metric 'classad-exists' for host 'osg-edu.cs.wisc.edu'
    One or more metrics have been enabled and will be started the next time RSV is started. To turn them on immediately run 'rsv-control --on'.
Easy to run a probe right now!
    % rsv-control --run --host osg-edu.cs.wisc.edu org.osg.general.osg-version
    Running metric org.osg.general.osg-version:
    metricName: org.osg.general.osg-version
    metricType: status
    timestamp: :24:42 CST
    metricStatus: OK
    serviceType: OSG-CE
    serviceURI: osg-edu.cs.wisc.edu
    gatheredAt: osg-edu.cs.wisc.edu
    summaryData: OK
    detailsData: OSG
    EOT
Easy to run all probes to refresh
    % rsv-control --run --all-enabled
    Running metric org.osg.certificates.cacert-expiry (1 of 24)
    metricName: org.osg.certificates.cacert-expiry
    metricType: status
    timestamp: :40:40 CST
    metricStatus: OK
    serviceType: OSG-CE
    serviceURI: osg-edu.cs.wisc.edu
    gatheredAt: osg-edu.cs.wisc.edu
    summaryData: OK
    detailsData: Security Probe Version: 1.1
    OK: CAs are in sync with OSG distribution
    EOT
    Running metric org.osg.general.osg-directories-CE-permissions (2 of 24)
    ...
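When all probes are run at once, the output is a stream of records like the one above. The sketch below parses such a stream and tallies statuses. It assumes each record is a run of `key: value` lines closed by `EOT`, and that any lines after `detailsData` belong to it. That layout is inferred from the slide, not taken from RSV documentation, and the second sample record (including its `CRITICAL` status) is invented for illustration.

```python
from collections import Counter


def parse_records(text):
    """Split 'key: value' lines into one dict per EOT-terminated record."""
    records, current, in_details = [], {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "EOT":
            if current:
                records.append(current)
            current, in_details = {}, False
        elif not line:
            continue
        elif in_details:
            # Assumed: everything after detailsData up to EOT is detail text.
            current["detailsData"] += "\n" + line
        elif line.startswith("Running metric"):
            continue  # progress banner, not part of the record
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key] = value.strip()
            in_details = key == "detailsData"
    return records


def summarize(records):
    """Count records per metricStatus value."""
    return Counter(r.get("metricStatus", "UNKNOWN") for r in records)


SAMPLE = """\
Running metric org.osg.certificates.cacert-expiry (1 of 24)
metricName: org.osg.certificates.cacert-expiry
metricType: status
metricStatus: OK
serviceType: OSG-CE
summaryData: OK
detailsData: Security Probe Version: 1.1
OK: CAs are in sync with OSG distribution
EOT
Running metric org.example.hypothetical-probe (2 of 24)
metricName: org.example.hypothetical-probe
metricStatus: CRITICAL
detailsData: made-up failure for illustration
EOT
"""

if __name__ == "__main__":
    for status, count in summarize(parse_records(SAMPLE)).items():
        print(status, count)
```

Treating everything after `detailsData` as free text avoids misreading detail lines such as `OK: CAs are in sync...` as new fields.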
Straightforward to get debugging info
    % rsv-control --verify
    Testing if Condor-Cron is running... OK
    Testing if metrics are running... OK (24 running metrics)
    Testing if consumers are running... OK (2 running consumers)
    Checking which consumers are configured...
    The following consumers are enabled: html-consumer gratia-consumer
    % rsv-control --profile
    Running the rsv-profiler...
    OSG-RSV Profiler
    Analyzing...
    Making tarball (rsv-profiler.tar.gz)
And now a slight detour: Frank
- Frank [ last-name removed ]
- Wrote some code for Condor that “worked”.
- But he meant: Works == Compiles
- A common mistake for beginners, so we won’t hold it against him.
- But it’s a useful indication of progress: A lot has been done, but it requires more before you can test it.
RPMs for the VDT are on the way
- We have franked binary RPMs without configuration for:
  - gLexec (Actually, they’ve been tested pretty well)
  - Xrootd
  - 95% of the worker node (56/59 RPMs)
    - Currently missing: FTS client
- They are in a yum repo and will be available for testing soon.
CREAM is coming to the VDT soon
- Basic CREAM install via Pacman
  - Currently franks, but known problems
  - End of March
- CREAM install via RPMs
  - End of April
- And then a period of testing/finalizing
  - Ready for production by September
- Timeline driven by ATLAS needs
I’m happy if you leave with those three things
1. RSV is way cooler
2. RPMs for the VDT are on the way
3. CREAM is coming to the VDT soon
But I’ll say two more things:
Two More Things
- Plan for next round of OSG:
  - Do RPMs right: source packages, intermix with external dependencies neatly…
  - Community-oriented distributions
- We are getting better about collecting accurate requirements and reporting work plans/time lines
But wait! There’s more!
- The Second Annual OSG Summer School!
- June 26-30, 2011
- Learn about high-throughput computing, OSG, and more!
- Tell anyone that would be interested, spread the word!
- ucation/OSGSummerSchool
Any Questions?
- I’m here until Thursday. Please come and talk to me.
- Or me: