Overview of ATLAS Distributed Analysis
Nurcan Ozturk, University of Texas at Arlington
Grid User Training for Local Community
TUBITAK ULAKBIM, Ankara, Turkey, April 5-9, 2010
Outline
- User's Work-flow for Data Analysis
- ATLAS Distributed Data Analysis System
- What Type of Data You Can Run On
- What Type of Jobs You Can Run
- How to Find Your Input Data: AMI, dq2 end-user tools, ELSSI (Event Level Selection Service Interface)
- Tips for Submitting Your Jobs
- How to Check Status of ATLAS Releases
- How to Check Status of Installation of Releases at Sites
- Tips for Retrieving Your Outputs
- Tips for Debugging Failures
- User Support
- Pathena Example on 7 TeV Collision Data
User's Work-flow for Data Analysis
1. Locate the data
2. Set up the analysis code
3. Set up the analysis job
4. Submit to the Grid
5. Retrieve the results
6. Analyze the results
ATLAS Distributed Data Analysis System
Three layers:
- User interface
- Execution infrastructure
- Gateways to resources
What Type of Data You Can Run On
Real data:
- RAW: raw data from the DAQ system
- ESD (Event Summary Data): output of the reconstruction of RAW data
- AOD (Analysis Object Data): condensed version of the ESD
- D1PD (Primary Derived Physics Data): ESD or AOD with certain events removed (skimming), some data objects within an event removed (thinning), or certain data members within an object removed (slimming). Commissioning and performance DPDs are made from ESDs; physics DPDs are made from AODs.
- DnPD (n-th Derived Physics Data, e.g. D2PD, D3PD): specific formats defined by physics/performance groups or users
- TAG: event tags, i.e. event thumbnails with pointers to the full event in the AOD, giving fast access to specific events (file-based or database-based)
Monte Carlo data:
- EVNT (event generator data): output of event generation
- HITS: output of detector simulation (uses EVNT as input)
- RDO (Raw Data Object): output of digitization (uses HITS as input); the MC equivalent of raw data
- AOD and DPD
You can run on all of the above; however, RAW/HITS/RDO are on tape, so you need to request DDM replication to disk storage first.
What Type of Jobs You Can Run
- Athena jobs with official production transformations: event generation, simulation, pileup, digitization, reconstruction, merging.
- General jobs (non-Athena type analysis): ROOT (CINT, C++, PyROOT), ARA (AthenaROOTAccess), Python, a user's executable, a shell script, etc. See the sketch after this list.
- Jobs with multiple input streams (e.g. the reconstruction transformation): Cavern, MinimumBias, BeamHalo, BeamGas inputs.
- TAG selection jobs.
- Jobs with nightly builds.
- Jobs with an arbitrary DBRelease. A database release contains the Conditions, Geometry and Trigger data; more info about DBReleases is in the backup slides.
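The slides do not name the tool used to submit general (non-Athena) jobs; in the panda-client of this era that is typically prun. A minimal sketch, assuming prun is set up in the same way as pathena and that macro.C is your own ROOT macro (all names here are illustrative):
  prun --exec "root -b -q macro.C" \
       --outDS user10.YourNickname.rootmacro.test \
       --outputs out.root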
How to Find Your Input Data
- AMI
- dq2 end-user tools
- ELSSI (Event Level Selection Service Interface)
What is AMI
ATLAS Metadata Interface, a generic cataloguing framework:
- a dataset discovery tool
- the Tag Collector tool (release management)
Where does AMI get its data?
- Real data: DAQ data from the Tier-0.
- Monte Carlo and reprocessing: pulled from the Task Request Database (tasks, dataset names, Monte Carlo and reprocessing configuration tags) and from the Production Database (finished tasks: files and metadata).
- From physicists: Monte Carlo input files needed for event generation; Monte Carlo dataset number info and physics group owner; corrected cross sections and comments; DPD tags.
AMI Portal Page. There is also a read-only server at CERN.
7 TeV Datasets
AMI Tutorial Page
AMI Fast Tutorial Page
Simple Search in AMI: search by name. Type a pattern to search for the latest 7 TeV collision datasets, e.g.: data10_7TeV%physics%MinBias%AOD%
Simple Search in AMI: various useful links. The results can be grouped ("Group by") and filtered ("Apply filter").
Simple Search in AMI: the DQ2 link. Clicking the DQ2 link shows the dataset in DQ2. Always use the merged container datasets, whose names end with '/'.
Simple Search in AMI: the PANDA link. Clicking the PANDA link opens the corresponding Panda monitor page.
Simple Search in AMI: interpretation of tags (1)
Simple Search in AMI: interpretation of tags (2)
Simple Search in AMI: Run Summary (shown by clicking on the Run_Summary link)
Simple Search in AMI: Run Queries (shown by clicking on the Run_Query link)
dq2 End-User Tools (1)
Users interact with the DDM system via the dq2 end-user tools:
- querying, retrieving and creating datasets
- requesting dataset replication, dataset deletion, etc.
How to set them up (on lxplus):
  source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
  voms-proxy-init --voms atlas
More info on the setup is on the "ready_to_use_the_Grid" twiki page.
dq2 End-User Tools (2)
How to use them:
- List the available MinBias datasets in the DDM system (same search as in AMI):
  dq2-ls 'data10_7TeV*physics*MinBias*'
- Search for merged AODs in container datasets:
  dq2-ls 'data10_7TeV*physics*MinBias*merge*AOD*'
- Find the locations of a container dataset (a group of datasets, name ending with '/'):
  dq2-list-dataset-replicas-container data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/
- List the files in a container dataset:
  dq2-ls -f data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/
- Copy one file locally:
  dq2-get -n 1 data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/
More info is on the DQ2ClientsHowTo page; a complete session is sketched below.
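Putting the pieces together, a typical lookup-then-download session on lxplus might look like this sketch (the container name is illustrative; replace <runNumber> with a real run number):
  source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
  voms-proxy-init --voms atlas                      # obtain an ATLAS VO proxy
  dq2-ls 'data10_7TeV*physics*MinBias*merge*AOD*'   # find candidate containers
  dq2-list-dataset-replicas-container data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/
  dq2-get -n 1 data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/   # fetch one file for a local test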
DQ2ClientsHowTo (extensive info here)
DQ2Tutorial
ELSSI (Event Level Selection Service Interface)
Goal: retrieve a TAG file from the TAG database:
- Define a query to select runs, streams, data quality, trigger chains, etc.
- Review the query.
- Execute the query and retrieve the TAG file (a ROOT file) to be used in an Athena job.
Tips for Submitting Your Jobs
- Always test your job locally before submitting it to the Grid.
- Use the latest version of pathena/Ganga.
- Submit your jobs on (merged) container datasets.
- If your dataset is on tape, first request data replication to disk storage.
- Do not specify a site name in your job submission; pathena/Ganga will choose the best available site. See the sketch after this list.
- If you are using a new release that is not meant to be used for user analysis (e.g. for cosmic data or Tier-0 reconstruction), it will not be available at all sites. You can check the release and installation status as shown on the next slides.
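A minimal submission sketch, assuming your job options file is MyAnalysis_topOptions.py and that user10.YourNickname matches your registered grid nickname (both names are illustrative):
  pathena --inDS data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/ \
          --outDS user10.YourNickname.minbias.test \
          MyAnalysis_topOptions.py
Note that no site name is given; the brokerage picks a site that has both the input data and the release installed.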
How to Check Status of ATLAS Releases
How to Check Status of Installation of Releases at Sites (1)
How to Check Status of Installation of Releases at Sites (2)
Tips for Retrieving Your Outputs
If everything went fine, you need to retrieve your outputs from the Grid. Decide where and how to store your output datasets:
- Request data replication: your output files will stay on the Grid as a dataset. You need to freeze the dataset before requesting replication.
- Download to your local disk with dq2-get. Note that no more than 30-40 GB/day can be copied with dq2-get.
By default, user datasets are created on the _SCRATCHDISK area of the site where the jobs ran. All datasets on _SCRATCHDISK are deleted after a certain period (~30 days), so if you expect to use them on the Grid, consider replication. A sketch of the two options follows; details are on the DQ2ClientsHowTo page.
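A minimal sketch for the two options above (the output dataset name is illustrative, and the dq2-freeze-dataset command is assumed to be available in this DQ2 client version):
  dq2-freeze-dataset user10.YourNickname.minbias.test   # freeze before requesting replication
  dq2-get user10.YourNickname.minbias.test              # or download it to your local disk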
Tips for Debugging Failures
- Try to understand whether the error is job related or site related; the job log files tell you most of the time.
- If the site went down while your jobs were being executed, you can check its status in the ATLAS Computer Operations Logbook.
- If input files are corrupted at a given site, you can exclude that site (until the files are recopied) and notify DAST (the Distributed Analysis Support Team); see the sketch after this list.
- Look at the FAQ (Frequently Asked Questions) sections of the pathena/Ganga twiki pages for the most common problems and their solutions.
- If your problem is not listed there, search the archive of the distributed-analysis-help forum in e-groups.
- If you still need help, send a message to the e-group; DAST and other users will help you.
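A sketch of resubmitting while excluding a problematic site, using pathena's --excludedSite option (the site and dataset names are illustrative):
  pathena --inDS data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/ \
          --outDS user10.YourNickname.minbias.test2 \
          --excludedSite ANALY_BADSITE \
          MyAnalysis_topOptions.py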
Does your job require conditions database access?
- If you are running at CERN or at a Tier-1, you may not even have noticed: they provide direct access to the Oracle databases. You may have seen occasional overload problems (errors in the job logs).
- What is stored in the Oracle databases: the geometry and most of the conditions data. LAr calibrations and InDet alignments are too large to be stored effectively in Oracle; they are stored as POOL files and replicated to all Tier-1s (soon to Tier-2s).
- If your job runs at a Tier-2/Tier-3, remote access to the Oracle databases is possible.
- Solution for users: ship the conditions DB release together with the job. A DB release is an extraction of the needed constants into a tar file that is copied to the worker node and accessed locally. It is available as a dataset in the DDM system and is passed with --dbRelease <name of dataset> (in pathena/Ganga), as sketched below.
- Now in place: user jobs access the Oracle databases through FroNtier/Squid caches. Jobs can run anywhere without configuration changes; this solves the latency problems of jobs running at Tier-2s/Tier-3s.
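A sketch of shipping a DB release with a pathena job. The DB release dataset name below is illustrative (check the DDM system for the current ones), and the exact value format of --dbRelease may differ between client versions:
  pathena --inDS data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/ \
          --outDS user10.YourNickname.minbias.dbrel \
          --dbRelease ddo.000001.Atlas.Ideal.DBRelease.v090601 \
          MyAnalysis_topOptions.py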
ATLAS Computer Operations Logbook: site/service problems are reported here.
pathena FAQ
Ganga FAQ
User Support (1)
DAST members, covering the EU and NA time zones: Daniel van der Ster, Nurcan Ozturk, Mark Slater, Alden Stradling, Hurng-Chun Lee, Sergey Panitkin, Bjorn Samset, Kamile Yagci, Christian Kummer, Bill Edson, Maria Shiyakova, Wensheng Deng, Jaroslava Schovancova, Manuj Jha, Karl Harrison, Elena Oliver Garcia
User Support (2)
- DAST started in September 2008 to provide combined support for pathena and Ganga users.
- It is the first point of contact for distributed analysis questions.
- All kinds of problems are discussed, not just pathena/Ganga-related ones: analysis tool problems, DDM problems, and Athena problems.
- Coverage is 15 hours with 2 people on shift (one in the NA, one in the EU time zone). The plan is to have 2 people in each time zone.
- DAST helps directly, by solving the problem or escalating it to the relevant experts.
- More shifters and more user-to-user support are needed; shift work counts toward OTSMOU credit (currently Category-2 shifts).
- User feedback is extremely useful for debugging the distributed analysis tools and for exploring the features pathena/Ganga have to offer, so feel free to write to this forum.
Pathena Example on 7 TeV Collision Data
Everyone needs to complete this tutorial:
1. Set up CMT and Athena as explained above. You can use the latest release (production cache):
   source cmthome/setup.sh -tag=<release>,AtlasProduction,setup,32
2. Use an ntuple (D3PD) making package to run on AODs, for example the SUSYD3PDMaker package.
3. Get one file locally to test that Athena runs fine:
   dq2-get -n 1 data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/
4. Configure the job options file and run Athena locally as explained on the SUSYD3PDMaker wiki page:
   athena SUSYD3PDMaker_topOptions.py
5. Set up pathena on lxplus, or install it yourself as explained on the pathena wiki page:
   source /afs/cern.ch/atlas/offline/external/GRID/DA/panda-client/latest/etc/panda/panda_setup.sh
6. Submit your job to the Grid with pathena:
   pathena --inDS data10_7TeV.<runNumber>.physics_MinBias.merge.AOD.f241_p115/ --outDS user10.NurcanOzturk.trgrid.test SUSYD3PDMaker_topOptions.py
7. Monitor your job in the Panda monitor or with pbook (the bookkeeping tool for pathena); see the sketch after this list.
8. Once your job is completed, get your output (ROOT files and log files) to your local machine:
   dq2-get user10.NurcanOzturk.trgrid.test
9. Open ROOT and plot some histograms to see what the real data looks like.
10. Use analysis codes (D3PD readers) to do more detailed analysis on the ntuples. The code can be based on C++ (example on the SUSYD3PDMaker wiki page), Python (for instance SPyRoot), etc.
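For step 7, a short pbook session might look like the sketch below (the JobIDs are illustrative; sync/show/retry/kill are the standard pbook commands of this client version):
  pbook
  >>> sync()      # update the local bookkeeping database from the Panda server
  >>> show()      # list your jobs and their status
  >>> show(42)    # details of JobID 42
  >>> retry(42)   # resubmit the failed subjobs of JobID 42
  >>> kill(43)    # kill JobID 43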