22/10/2007, Software Week. Distributed analysis user feedback (I). Carminati Leonardo, Università degli Studi e sezione INFN di Milano.

Presentation transcript:

Slide 1: Distributed analysis user feedback (I)
Carminati Leonardo, Università degli Studi e sezione INFN di Milano

Slide 2: Distributed analysis: the user dream
[Figure: diagram with the labels DPD, Distributed analysis, AOD]

Slide 3: Distributed analysis with GANGA
- A fair report on ganga performance is complicated because it depends strongly on:
  - Data distribution: AODs are not completely replicated to all T1s as they should be (incomplete datasets)
  - Site configuration: jobs fail at some sites for several (local) reasons
- The discriminating variable is the COMPLETENESS of a dataset:
  - If a dataset is COMPLETE somewhere then ganga works perfectly: the jobs are sent correctly to sites which have a complete replica of the dataset.
- The problem of ‘bad sites’ (= sites on which my jobs fail) is clearly an issue for users:
  - Often the jobs fail because the site on which they are executed is not properly configured: this does not depend (directly) on ganga
  - A less ambitious approach: restricting access to a minimal list of GOOD sites on which I am sure my jobs will run (automatic procedure?) would greatly reduce the failure rate and improve user satisfaction
  - The possibility of defining a ‘black list’ is included in the latest ganga release

Slide 4: Distributed analysis with ganga: current situation
If a dataset is INCOMPLETE everywhere then you might run into problems: the current picture is probably not optimal from the user's point of view.
Suppose you have a dataset of 30 files which is incomplete everywhere:
  Subjob 1: files 1-10
  Subjob 2: files [...]
  Subjob 3: files [...]
  Site A: incomplete dataset, files 5-15
  Site B: incomplete dataset, files [...]
  Site C: incomplete dataset, files 15-20
In this configuration you will get 10 output files:
- Files 1-5 will be lost because they are missing everywhere
- Files [...] will be lost because of the job assignment to different sites
- A subsequent submission might give different results

Slide 5: Distributed analysis with ganga: current situation
- For the moment it is not safe to let ganga decide for you: it is better to select the site with the largest number of files and send jobs there.
- So here things start to get complicated (at least from a common user's point of view) and users tend to become nervous.
- A user has to perform some operations which are not always simple:
  - Find out where the files are and select the site with the largest number of files
  - Be sure that the selected site is a good one
  - Send jobs there
- A good example is the AsymFilter gamma jet sample 6379: ~4M events
  - Four tids are available, all incomplete at all lcg sites (except for tid [...], which according to dq2 is complete at DESY)

Slide 6: Ask ganga where the files are: d.list_locations_num_files(‘... ‘)

trig1_misal1_mc PythiaPhotonJet_AsymJetFilter.recon.AOD.v _tid
{'CERNPROD': 10, 'DESY-HH': 0, 'NIKHEF': 2236, 'CPPM': 0, 'RALDISK': 0, 'CNAFDISK': 171, 'SACLAY': 0, 'ASGCDISK': 3931, 'AU-UNIMELB': 349, 'TORON': 1084, 'NIPNE_02': 0, 'TOKYO': 0, 'AGLT2': 0, 'NDGFT1DISK': 0, 'TRIUMFDISK': 3931, 'BNLDISK': 0, 'LAL': 0, 'BEIJING': 0, 'WUP': 3927, 'LAPP': 0, 'LYONDISK': 0, 'FZKDISK': 3928, 'LPNHE': 0, 'CERNCAF': 1120}

trig1_misal1_mc PythiaPhotonJet_AsymJetFilter.recon.AOD.v _tid
{'CERNPROD': 82, 'DESY-HH': 0, 'ASGCDISK': 549, 'RALDISK': 0, 'TW-FTT': 0, 'NAPOLI': 124, 'NIKHEF': 835, 'CNAFDISK': 2654, 'DESY-ZN': 2543, 'LNF': 151, 'CYF': 1756, 'TOKYO': 20, 'AGLT2': 0, 'NDGFT1DISK': 0, 'TRIUMFDISK': 2610, 'BNLDISK': 0, 'PICDISK': 2157, 'WUP': 1900, 'MILANO': 118, 'ROMA1': 95, 'LYONDISK': 170, 'FZKDISK': 1580, 'CERNCAF': 520}

trig1_misal1_mc PythiaPhotonJet_AsymJetFilter.recon.AOD.v _tid
{'MWT2_IU': 0, 'CNAFDISK': 0, 'TW-FTT': 0, 'TOKYO': 232, 'AGLT2': 0, 'NIKHEF': 161, 'FZU': 0, 'LYONDISK': 126, 'PICDISK': 1808, 'UTA_SWT2': 0, 'RALDISK': 0, 'FZKDISK': 1592, 'BU_DDM': 0, 'NDGFT1DISK': 0, 'BNLPANDA': 0, 'CERNCAF': 705, 'ASGCDISK': 256}

trig1_misal1_mc PythiaPhotonJet_AsymJetFilter.recon.AOD.v _tid
{'IFAE': 1598, 'DESY-HH': 0, 'NIKHEF': 540, 'UTA_SWT2': 0, 'MWT2_IU': 0, 'RALDISK': 2183, 'TOKYO': 297, 'SACLAY': 179, 'ASGCDISK_V2': 510, 'UAM': 1951, 'CYF': 881, 'DESY-ZN': 1523, 'AU-UNIMELB': 425, 'CNAFDISK': 723, 'FZU': 1700, 'NDGFT1DISK': 0, 'BNLDISK': 0, 'PICDISK': 1317, 'WUP': 331, 'LYONDISK': 2249, 'FZKDISK': 207, 'BU_DDM': 0, 'CERNCAF': 462}

How reliable is this output?
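Choosing the site with the largest number of files, while skipping sites known to be bad, can be sketched in plain Python. The sketch assumes only that the location query returns a dictionary mapping site name to file count, as in the output shown here. The helper `rank_sites` and the blacklist entry are hypothetical and are not part of ganga. The dictionary is an excerpt of the first output on this slide.

```python
def rank_sites(locations, blacklist=()):
    """Return (site, n_files) pairs sorted by file count, largest first.

    Sites with no files and sites in the blacklist are skipped.
    """
    usable = {site: n for site, n in locations.items()
              if n > 0 and site not in blacklist}
    # sorted() is stable, so sites with equal counts keep their input order.
    return sorted(usable.items(), key=lambda item: item[1], reverse=True)


# Excerpt of the first dictionary printed on the slide.
locations = {
    'CERNPROD': 10, 'NIKHEF': 2236, 'CNAFDISK': 171, 'ASGCDISK': 3931,
    'TRIUMFDISK': 3931, 'WUP': 3927, 'LYONDISK': 0, 'FZKDISK': 3928,
    'CERNCAF': 1120,
}

# Hypothetical blacklist: pretend jobs are known to fail at this site.
ranked = rank_sites(locations, blacklist={'ASGCDISK'})
best_site, n_files = ranked[0]
print(best_site, n_files)
```

The ranking only reflects what the catalogue reports, so it is exactly as reliable as the output it is fed with.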

Slide 7: Distributed analysis with ganga: running jobs
- Running jobs is smooth on good sites:
  - Lyon and FZK are my favorite sites: plenty of files related to my analysis, fast execution (1h to run over 1M events).
  - I also ran successfully at DESY, Madrid and CNAF
  - Very high efficiency (~90%) running at those sites on the available files
- But again: where are the missing files?
  - My result on the expected 4M AsymJet sample: ~2.1M -> 50%
  - You can't directly compare this number with the one obtained with Pathena, because ganga suffers from the problem of asymmetric data distribution
- Clearly, with a more careful search for good sites one can obtain better performance, but in any case this should be done automatically by ganga
- A new ganga working model is being provided:

Slide 8: Distributed analysis with ganga: new model
Send to each site a subjob running on files which are really present at that site:
  Subjob 1: files 5-15
  Subjob 2: files [...]
  Subjob 3: files [...]
  Site A: incomplete dataset, files 5-15
  Site B: incomplete dataset, files [...]
  Site C: incomplete dataset, files 15-20
In this configuration you will get 25 output files: minimal user intervention and maximal result.
Clearly this would be a big improvement with respect to the current situation: I think this is really the key to dealing with the incomplete dataset issue and to enlarging the number of ganga clients.
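The idea of building subjobs from the files actually present at each site can be sketched as follows. This is a plain-Python illustration and not ganga's implementation: the greedy ordering (largest replica first) is an assumption, and the site contents are hypothetical values chosen so that 25 of the 30 files exist somewhere, which matches the count on this slide.

```python
def assign_by_location(site_files):
    """Build one subjob per site from files that site really holds.

    Each file is given to exactly one site. Sites are visited from the
    largest replica to the smallest (ties broken by site name).
    """
    remaining = set().union(*site_files.values())
    subjobs = {}
    for site in sorted(site_files, key=lambda s: (-len(site_files[s]), s)):
        take = site_files[site] & remaining
        if take:
            subjobs[site] = sorted(take)
            remaining -= take
    return subjobs


# Hypothetical replica contents (assumption, not taken from the slide).
site_files = {
    "A": set(range(6, 16)),   # files 6-15
    "B": set(range(20, 31)),  # files 20-30 (assumed)
    "C": set(range(15, 21)),  # files 15-20
}

subjobs = assign_by_location(site_files)
total = sum(len(files) for files in subjobs.values())
print(total)
for site, files in sorted(subjobs.items()):
    print(site, files[0], files[-1], len(files))
```

Every file that exists at some site is processed once, so the only losses left are the files missing everywhere, and repeated submissions give the same result.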

Slide 9: Distributed analysis with Pathena
- Not many comments here. Pathena is the closest thing to the user dream: you just choose the input dataset name and the output dataset name, and your jobs will on average succeed with very high efficiency (no additional worries!).
    pathena --inDS trig1_misal1_csc PythiaH120gamgam.recon.AOD.v _tid --outDS user.LeonardoCarminati.trig1_misal1_csc PythiaH120gamgam.recon.AOD --split 10 HggAnalysis_jobOptions.py
- Many users love Pathena for this!
- OK, so where's the problem? I still have some questions about Pathena:
  - Pathena benefits from the fact that (almost) all AODs are copied to BNL: what happens if data are not collected at BNL?
  - When I run Pathena it seems to me that the jobs go to BNL only: is this correct?
  - Is this model scalable as the number of distributed analysis clients increases?

Slide 10: Conclusions
- From a user's point of view, distributed analysis can be very easy (and efficient) or very difficult, depending on the AOD distribution and the quality of the computing sites
- Using pathena these complications are hidden because almost all data are replicated at BNL, and the users are happy.
- Using GANGA the problems are more evident, although they are not caused by ganga itself.
- In order to allow users to run without panic and at the same level as with pathena:
  - Ensure a correct AOD distribution, at least to the Tier1s
  - Strongly support the new ganga job assignment model
  - Provide an automatic mechanism to prevent users from running at ‘bad’ sites