USCMS T2 Site Admin Toolkit. Samir Cury. MTF Meeting, May 26th, 2011.

Presentation transcript:


How it began
- OSG All Hands Meeting 2010, Fermilab
- Yearly T2 Workshop
- Gathering of site admins
- A lot of ideas/comments
- Some code: scripts

About site admins
- Frontline of site management
- On a daily basis they deal with:
  - Many requests
  - Many issues
  - Many workarounds. What happens to these?
- Relevant feedback for CMS
  - Lack of features in existing software
  - Lack of monitoring in existing systems
    - May lead to operating them blindly
- Is there always someone to listen?
  - Thanks, Monitoring Task Force!

Workarounds
- Following from the previous slide: workarounds are what this toolkit is all about.
- Complaining is not always the best way
  - The request may never be implemented
  - Not everyone will see the benefit/cost
  - Different needs
- Developers do not always think about all user/ops needs
  - Scripts are written to cover these needs
- These scripts can give ops a different approach
  - Monitoring tools focused on admins' needs
  - Can improve response time and error/waste detection
    - Example: GridFTP Spy
    - JobView / CPU efficiency on T1s
- Not essential, but normally saves some time.

The goal
- What is really missing
  - An official place for unofficial code
  - Encouragement for people to share
- Call for tools
  - Take the generic ones -> package into RPMs
  - Take the specific ones -> make them generic, then package into RPMs
- Standard place (repository)
- Standard deploy procedure
  - If it's not quick, no one tries. -> RPMs
- Helping us to help ourselves.

What it is
- Full documentation/reference available :
  - Where we document each tool included in the toolkit, future plans, etc.
- A collection of scripts that may need some work to get running
  - We try to avoid that by providing RPMs with all dependencies included, either as packages or in the repos.
- A free-time task for everyone involved
  - We normally don't have schedules, but we do have a plan.
- Shameless “coders”: that's what we need!
  - We don't care how “badly written” it is, as long as it works

What it certainly is not
- Something that is maintained by a lot of people
  - But some people contribute tools
  - One dependency-solver / packager (me)
    - Would appreciate some help
- Something that will solve all problems
  - That is not the goal; the goal is just to put specific tools together
- Something that has “professional quality”
  - The people involved are very capable, but proportionally time-constrained

What we can learn
- “Sites” can also generate some useful code
  - They will probably write it for themselves, so don't expect:
    - High-quality code
    - Something with few dependencies
  - Expect:
    - Tools that you can adapt for your site with little effort
    - To contribute and make it better instead of complaining
- “Sites” should be shameless enough to publish (and send us) tools they find useful.
  - Ken Bloom gave me a slot at a USCMS T2 support meeting to present the proposal, and after that some tools showed up. (Thanks, Ken!)
  - T2 Coordinators could inform us when they see something useful in their support meetings, and also remind these sites that the toolkit is there

What I did learn
- Going from a script to an RPM takes more work than I thought: many details, dependencies, etc.
- We will be better off with a step before this : Toolkit
  - People can download/edit from there. It is a shortcut for those who really want to spend some time understanding and deploying the tools that don't have an RPM yet.
  - It helped me patch Stale Data, improving the CLI

Tools we have right now
- CondorView (Caltech) - RPM ready
- GridFTP Spy (Caltech) - RPM ready
- Condor4Web (UERJ) - RPM ready
- Stale Data (Nebraska) - tested, needs packaging
- Condor Extract Mail (Nebraska) - to be tested
- dCache tools (Wisconsin) - to be tested
- Your tool here

CondorView
- GUI for managing Condor
- Lists every single job
  - Can list ALL ClassAds for a given job
  - Can do what you see in the menu
- Run from the cluster frontend
  - Can SSH to the node, directly into the running job's temp dir
- Run from the site's CE
  - Can kill/release/restart jobs
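As an illustration of the kind of listing described above, here is a minimal sketch, not CondorView's actual code. It assumes job ClassAds are available as `Attribute = value` text with a blank line between jobs, which is the layout `condor_q -long` prints. The sample text, the `parse_classads` and `summarize` helpers, and the owners are made up for the example; the quote handling is a simplification that only covers plain string values.

```python
# Hypothetical sample in the "Attribute = value" layout, one blank line per job.
SAMPLE = """ClusterId = 101
ProcId = 0
Owner = "alice"
JobStatus = 2

ClusterId = 102
ProcId = 0
Owner = "bob"
JobStatus = 5
"""

# HTCondor JobStatus codes: 1 idle, 2 running, 5 held.
STATUS = {"1": "idle", "2": "running", "5": "held"}


def parse_classads(text):
    """Split ClassAd text into one dict of attribute -> value per job."""
    jobs = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current job's ClassAd.
            if current:
                jobs.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # not an "Attribute = value" line
        # Simplification: unquote plain string values only.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        current[name] = value
    if current:
        jobs.append(current)
    return jobs


def summarize(jobs):
    """Return (job id, owner, status) tuples for a compact job list."""
    rows = []
    for job in jobs:
        job_id = f'{job["ClusterId"]}.{job["ProcId"]}'
        status = STATUS.get(job.get("JobStatus"), "other")
        rows.append((job_id, job.get("Owner", "?"), status))
    return rows


if __name__ == "__main__":
    for row in summarize(parse_classads(SAMPLE)):
        print(*row)
```

On a real frontend the text would come from the batch system rather than a constant; keeping the parser separate from the data source makes it easy to try on a saved dump first.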

GridFTP Spy
- Shows active GridFTP transfers in near real time
  - Very useful for optimizing link usage / server settings
- Somewhat tricky to deploy
  - Needs a shared FS for harvesting logs
- It works by reading the logs in real time and gathering interesting info
- Never tested it myself: testers are welcome!
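The "read the logs in real time" idea can be sketched as follows. This is not GridFTP Spy's code, and the `EVENT=`, `ID=`, `USER=` and `FILE=` fields are invented for illustration; they are not the real GridFTP log format. The sketch only shows the general pattern: follow a growing file, parse each new line, and keep a table of transfers that have started but not finished.

```python
import time


def parse_line(line):
    """Turn 'KEY=value KEY=value' into a dict; tokens without '=' are ignored."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def follow(path, poll_seconds=1.0):
    """Yield lines appended to a file, similar to `tail -f` (runs forever)."""
    with open(path) as handle:
        handle.seek(0, 2)  # jump to the end; only new lines are of interest
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def update_active(active, fields):
    """Add a transfer on a 'start' event and drop it on an 'end' event."""
    transfer_id = fields.get("ID")
    if transfer_id is None:
        return active
    if fields.get("EVENT") == "start":
        active[transfer_id] = fields
    elif fields.get("EVENT") == "end":
        active.pop(transfer_id, None)
    return active


# Hypothetical log lines used to exercise the bookkeeping without a live log.
SAMPLE_LINES = [
    "EVENT=start ID=t1 USER=alice FILE=/store/a.root",
    "EVENT=start ID=t2 USER=bob FILE=/store/b.root",
    "EVENT=end ID=t1",
]


def replay(lines):
    """Feed a fixed list of lines through the same logic used for live logs."""
    active = {}
    for line in lines:
        update_active(active, parse_line(line))
    return active


if __name__ == "__main__":
    for transfer in replay(SAMPLE_LINES).values():
        print(transfer["USER"], transfer["FILE"])
```

For live use, `follow(path)` would replace `SAMPLE_LINES`; with several servers writing to a shared filesystem, one follower per log file would feed the same `active` table.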

Condor4Web
- Real-time batch system monitoring
  - Visible from any corner of the world
- Your users like it
  - They know what's going on with their jobs after the CE
- MC people like it
  - For the same reason.
- Live demos :
- If you don't use Condor, try JobView : isOpsT2Monitoring

Stale Data
- Looks like the (un)popularity data service
  - Shows which datasets people didn't run a single job against
- Tested. Works fine, but has a lot of dependencies, which should be included in the RPM

Sample output:

    date = , Starting Date =
    Getting json
    Datasets idle since
    /JetMET/Run2010A-Dec4ReReco_v1/AOD, GB, Owned by AnalysisOps
    /G2Jets_Pt-20to60_TuneZ2_7TeV-alpgen/Fall10-START38_V12-v1/AODSIM, GB, Owned by top
    /W2Jets_ptW-0to100_TuneZ2_7TeV-alpgen-tauola/Fall10-START38_V12-v1/GEN, GB, Owned by DataOps
    /QCD6Jets_Pt120to280-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO, GB, Owned by top
    /W1Jets_ptW-800to1600_TuneD6T_7TeV-alpgen-tauola/Fall10-START38_V12-v1/AODSIM, GB, Owned by top
    (Suppressed)
    Space taken by stale datasets = TB
    Broken down by group:
    tracker-dpg =>
    top =>
    AnalysisOps =>
    undef =>
    FacOps =>
    b-tagging =>
    local =>
    DataOps =>
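The report above boils down to two steps: pick the datasets with no jobs since a starting date, then total their size per owning group. A minimal sketch of that logic follows; it is not the Stale Data script itself. The record layout (`name`, `size_gb`, `group`, `last_job`), the dataset names and all numbers are hypothetical, and the real tool gets its input as JSON from a service rather than from a hard-coded list.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; last_job is None when no job ever ran on the dataset.
DATASETS = [
    {"name": "/ExampleA/Run-v1/AOD", "size_gb": 100.0, "group": "top",
     "last_job": None},
    {"name": "/ExampleB/Run-v1/AODSIM", "size_gb": 50.0, "group": "top",
     "last_job": date(2011, 1, 10)},
    {"name": "/ExampleC/Run-v1/GEN", "size_gb": 25.0, "group": None,
     "last_job": date(2011, 5, 1)},
    {"name": "/ExampleD/Run-v1/RECO", "size_gb": 10.0, "group": None,
     "last_job": None},
]


def find_stale(datasets, since):
    """Return datasets that had no job on or after the `since` date."""
    return [d for d in datasets
            if d["last_job"] is None or d["last_job"] < since]


def space_by_group(datasets):
    """Sum dataset sizes per owning group; ownerless datasets go to 'undef'."""
    totals = defaultdict(float)
    for d in datasets:
        totals[d["group"] or "undef"] += d["size_gb"]
    return dict(totals)


if __name__ == "__main__":
    stale = find_stale(DATASETS, since=date(2011, 3, 1))
    for d in stale:
        print(f'{d["name"]}, {d["size_gb"]} GB, Owned by {d["group"] or "undef"}')
    print("Broken down by group:")
    for group, size in space_by_group(stale).items():
        print(f"{group} => {size} GB")
```

Keeping the selection and the aggregation as separate functions makes it straightforward to change the staleness rule, for example to count accesses instead of dates, without touching the per-group report.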

“Condor Extract Mail”
- Fetches, from the grid proxies in your CEs, the e-mail addresses of the users running jobs in your cluster

    ~]# ~bbockelm/extract_ "Bockelman"

How CMS can profit
- Better than the code: the ideas
  - Usability: you may find potential features for existing real software here
  - Adapt ideas or tools that deserve it into CMS central monitoring, such as cmsweb
- Gives an overview of site admin needs and what they would like to see in the software they use.
  - Some become patches, like Brian Bockelman's script
- The model / idea of a free software community is a good example to follow
  - Small patches from many people turn small things into great ones. Share!

Thanks to all involved
- Ken Bloom, Michael Thomas: initial effort to set up and make everything public
- Authors that submitted tools :
  - Caltech - Michael Thomas
    - CondorView
    - GridFTP Spy
  - Nebraska - Carl Lundstedt and Brian Bockelman
    - Condor Extract Mail
    - Stale Data
  - Wisconsin - Will
    - dCache Tools
  - UERJ - Samir
    - Condor4Web

Feel free to send :
- Tools
- Suggestions
- Help
But first, we recommend some (small) reading here :

For the future
- 2 trainees interested in helping (UERJ)
- Migrate YUM repos to CERN webservers
- Finish testing/packaging the tools we already have.

Contacts

Recommended toolkit

Thanks!