

First test of the PoC

Caveats
I am not a developer ;)
I was also a beta tester of CRAB3+WMA in 2011; I restarted testing it ~2 weeks ago to have a 1-to-1 comparison
The first 2 weeks of the PoC test were mainly:
  – Finding a problem
  – Communicating it to the developers
  – Getting a new version
  – Trying again
  – I simply skip this part, which is ok; I speak about the results after all the fixes

What I tested (with both)
A complicated workflow: the official (V)H->bb analysis step1 (see nalysisNewCode#NtupleV42_CMSSW_5_3_3_patch2), which takes ~2 hours just to compile
  – Indeed the ISB (input sandbox) is ~45 MB, with 56 user-compiled libraries
Running on dataset /DoubleElectron/Run2012B-PromptReco-v1/AOD
  – 40 LS (lumi sections) per job -> ~1200 jobs, a couple of hours each
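The job count above follows from simple lumi-based splitting arithmetic. A minimal sketch, assuming every job gets at most a fixed number of lumi sections; the function name and the total of 48,000 lumi sections are hypothetical, chosen only to land on the ~1200 jobs quoted on the slide (the real splitter works from the lumi mask and may pack jobs less evenly):

```python
def jobs_for_lumi_split(total_lumis: int, units_per_job: int) -> int:
    """Return how many jobs are needed if each job takes at most units_per_job lumi sections."""
    if total_lumis < 0 or units_per_job <= 0:
        raise ValueError("total_lumis must be >= 0 and units_per_job must be > 0")
    # Integer ceiling division: a partially filled last job still counts as a job.
    return (total_lumis + units_per_job - 1) // units_per_job


if __name__ == "__main__":
    # Hypothetical total; 40 is the unitsPerJob value used in the test.
    print(jobs_for_lumi_split(48_000, 40))
```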

Where I tested
CRAB3/Panda: the test is restricted to a few sites (FNAL, Pisa, DESY, …)
  – Among the PoC sites, the sample is only at FNAL and Pisa
CRAB3/WMA: 8 T2s available, some of poor quality (T2_RU_*)
Always used Pisa as storage site

Moreover
The PoC is not expected to provide full CRAB3 functionality, just (as in the […] I got):
  – Submit
  – Resubmit
  – Kill
  – Status
  – Getoutput
  – Getlog
So I stick to these also for CRAB3/WMA (i.e. I do not do DBS publication)

Configs
Panda:
    from WMCore.Configuration import Configuration
    import os
    from datetime import datetime
    config = Configuration()
    config.section_("General")
    config.General.serverUrl = 'poc3test.cern.ch'
    config.General.ufccacheUrl = 'cmsweb-testbed.cern.ch'
    config.section_("JobType")
    config.JobType.pluginName = 'Analysis'
    config.JobType.psetName = 'patData.py'
    config.section_("Data")
    config.Data.inputDataset = '/DoubleElectron/Run2012B-PromptReco-v1/AOD'
    config.Data.publishDataName = os.path.basename(os.path.abspath('.')) + "_tom"
    config.Data.lumiMask = 'Lumi.json'
    config.Data.publishDbsUrl = " _writer/servlet/DBSServlet"
    config.Data.splitting = 'LumiBased'
    config.Data.unitsPerJob = 40
    config.section_("User")
    config.User. = ''
    config.section_("Site")
    config.Site.storageSite = 'T2_IT_Pisa'
WMA:
    from WMCore.Configuration import Configuration
    import os
    config = Configuration()
    config.section_("General")
    config.General.requestName = 'request_name2'
    config.General.serverUrl = 'crab3-test.cern.ch'
    config.General.ufccacheUrl = 'cmsweb.cern.ch'
    config.section_("JobType")
    config.JobType.pluginName = 'Analysis'
    config.JobType.psetName = 'patData.py'
    config.section_("Data")
    config.Data.inputDataset = '/DoubleElectron/Run2012B-PromptReco-v1/AOD'
    config.Data.splitting = 'LumiBased'
    config.Data.unitsPerJob = 40
    config.Data.lumiMask = 'Lumi.json'
    config.section_("User")
    config.User. = ''
    config.section_("Site")
    config.Site.storageSite = 'T2_IT_Pisa'
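To see at a glance how little separates the two configurations, the settings can be compared as plain dictionaries. This is only an illustrative sketch: it does not use WMCore, the dictionary layout and the helper name config_diff are assumptions, and the publication and User entries are left out because they are elided or unused in the test.

```python
# Settings copied from the two configs on the slide, as flat "Section.key" entries.
COMMON = {
    "JobType.pluginName": "Analysis",
    "JobType.psetName": "patData.py",
    "Data.inputDataset": "/DoubleElectron/Run2012B-PromptReco-v1/AOD",
    "Data.lumiMask": "Lumi.json",
    "Data.splitting": "LumiBased",
    "Data.unitsPerJob": 40,
    "Site.storageSite": "T2_IT_Pisa",
}
PANDA = {
    **COMMON,
    "General.serverUrl": "poc3test.cern.ch",
    "General.ufccacheUrl": "cmsweb-testbed.cern.ch",
}
WMA = {
    **COMMON,
    "General.requestName": "request_name2",
    "General.serverUrl": "crab3-test.cern.ch",
    "General.ufccacheUrl": "cmsweb.cern.ch",
}


def config_diff(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for keys that differ; a missing key shows as None."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    for key, (panda_value, wma_value) in config_diff(PANDA, WMA).items():
        print(f"{key}: Panda={panda_value!r}  WMA={wma_value!r}")
```

With these inputs only the General section differs (server, cache URL, request name), which matches the intent of a 1-to-1 comparison.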

Soon after submit
Panda (no information per site, link to monitoring present):
    bash-3.2$ crab status -t crab_ _ i
    Registering user credentials
    Task name: tboccali_crab_ _113729_121127_
    Panda url: &user=Tommaso%20Boccali
    Details:
    running 0.78 % (10/1279)
    activated % (1269/1279)
    Information per site are not available.
    Log file is /afs/cern.ch/work/b/boccalio/PoC/CMSSW_5_3_3_patch2/src/VHbbAnalysis/HbbAnalyzer/test/PoCTests/crab_ _113729/crab.log
WMA (no link to dashboard? One has to find it by hand):
    bash-3.2$ crab status -t crab_request_name2 -i
    Registering user credentials
    Task Status: running
    Using 7 site(s):
    Jobs Details: submitted % ( running % pending % )
    T2_US_Florida: submitted %
    T2_FR_GRIF_IRFU: submitted %
    T2_RU_JINR: submitted %
    T2_UK_London_IC: submitted %
    T2_FR_GRIF_LLR: submitted %
    T2_IT_Pisa: submitted %
    T2_ES_IFCA: submitted %
    Log file is /afs/cern.ch/work/b/boccalio/PoC/CMSSW_5_3_3_patch2/src/VHbbAnalysis/HbbAnalyzer/test/Crab3Tests/crab_request_name2/crab.log
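Since some percentages are missing from the captured output, they can be recomputed from the job counts in parentheses. A minimal sketch, assuming status lines of the form "state [pct] % (n/total)" as in the Panda output above; the regular expression and function name are hypothetical and not part of the crab client:

```python
import re

# Matches e.g. "running 0.78 % (10/1279)" and also "activated % (1269/1279)",
# where the printed percentage is missing.
STATE_PATTERN = re.compile(
    r"(?P<state>[A-Za-z]+)\s+[\d.]*\s*%\s*\((?P<n>\d+)/(?P<total>\d+)\)"
)


def parse_states(text: str) -> dict:
    """Return {state: (jobs_in_state, total_jobs)} for every state found in text."""
    return {
        m["state"]: (int(m["n"]), int(m["total"]))
        for m in STATE_PATTERN.finditer(text)
    }


if __name__ == "__main__":
    sample = "Details: running 0.78 % (10/1279) activated % (1269/1279)"
    for state, (n, total) in parse_states(sample).items():
        print(f"{state}: {100 * n / total:.2f} % ({n}/{total})")
```

On the sample line, the recomputed share for "running" agrees with the 0.78 % printed by the client.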

A few considerations
Let's start from the obvious: with both systems I reached 100% done, with some "resubmit" (site problems)
Feature: with Panda a resubmit is a second task (with a second web page)… I am not used to it, but it is not a critical issue (you just need to get used to it)

ASO
It worked flawlessly in both cases
Nothing more to say, I guess… (I did not even need to look into the ASO monitoring)
You can get the files before ASO has operated (I guess lcg-cp is used, …)

Issues with Panda
Kill did not work for me; I understood it was simply a timeout that needs a different threshold, and I did not check further

Is resubmit working fine?
In both cases, it was for me
Caveat: the PoC-enabled sites are generally good/very good. No chance to test a massive failure scenario

Let's go straight to the point
Up to here, the executive summary could be: "Limiting the scenario to what the PoC is supposed to allow me to do, PANDA performs at least as well as WMA" (again, this _after_ the two weeks of initial testing)

What is different
Panda monitoring seems by far better than what we are used to

Dashboard/WMA… (as usual)

…Plus WMStats
Some debugging info added, but not that much (where is the WN name? where is the LSF id?)

Features we usually do not have
All the logs (pilots + stderr + stdout) are on the web
  – All: not only snippets for failed jobs
  – I guess ph support would love it, instead of asking users to upload logs
  – Support can get all the info from the web, with no need to ask the (maybe not too skilled) user
  – Snippets are not ok in general: a failure can depend on a bad environment variable… which cannot be seen from the snippet alone
There is a link from the pilot to the LSF id! I considered this lost since we left gLite, and it is a MAJOR help to debug strange problems (like WNs acting as black holes)
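As an illustration of why the worker-node information matters, here is a minimal sketch of how one might flag "black hole" WNs once each job can be tied to a node. Everything in it is hypothetical: the record layout (node, succeeded, wall-clock seconds), the thresholds, and the node names are invented for the example and do not come from the PoC or its monitoring.

```python
from collections import defaultdict


def suspect_black_holes(jobs, min_jobs=5, max_fail_seconds=120, min_fail_fraction=0.9):
    """Return sorted node names where almost all jobs failed very quickly.

    jobs: iterable of (node, succeeded, wall_seconds) tuples (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # node -> [total jobs, fast failures]
    for node, succeeded, wall_seconds in jobs:
        counts[node][0] += 1
        if not succeeded and wall_seconds <= max_fail_seconds:
            counts[node][1] += 1
    return sorted(
        node
        for node, (total, fast_failures) in counts.items()
        if total >= min_jobs and fast_failures / total >= min_fail_fraction
    )


if __name__ == "__main__":
    # Invented records: wn042 eats jobs in seconds, wn007 behaves normally.
    records = [("wn042", False, 15)] * 20 + [("wn007", True, 7200)] * 10
    records.append(("wn007", False, 3600))
    print(suspect_black_holes(records))
```

A node that fails a few long-running jobs is not flagged; only nodes that fail nearly everything within a couple of minutes are, which is the typical black-hole signature.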

Pilot log -> WN LSF id

Logs (full logs present, not just snippets guessed as interesting by the system)
Full logs uploaded to SE

Other features I liked
Panda seems user friendly when scheduling jobs: if you submit a task, even if your priority is very low, a few jobs are executed almost immediately, allowing you to spot broken workflows early
It seems I can resubmit at any time (no need to wait for the task to be in cooloff…)
  – Is it because ACDC is not in the game? Is there a price we pay for this (side effects I am not aware of)?

Conclusions?
As said, functionally both were doing what was asked
  – PANDA does not look behind at all
I cannot speak about what is NOT supposed to be in the PoC (which is not a small subset)
The major differences to me are:
  – Monitoring: way better in the PoC, with full disclosure of all the info
  – The early prioritization of some jobs is a lot of help (it goes far beyond a simple python sanity check)
  – You seem to be able to resubmit at any time, with no cooloff needed; this potentially cuts the time to process tails